18 January 2011
well, sometimes you use LWP::UserAgent or WWW::Mechanize to scrape webpages, and those pages contain some javascript code that sets cookies, and the site relies on those cookies to let you continue.

for example, one site has some js code like:

<script type="text/javascript">function test(){
// complicated js code to generate different code each time
document.cookie = "TS884e96_75=" + "b0f056f808cab30029f1dfed8117af84:"
 + chlg + ":" + slt + ":" + crc + ";Max-Age=3600;path=/";
}
</script>
<body onload="test()">

OK, now you're totally blocked out.

luckily we have JE, which I talked about in my PerlChina Advent last year: http://advent.perlchina.org/2010/JE.html
it's pretty amazing, and the solution is very simple:

use WWW::Mechanize;
use JE;

# cookie_jar => {} gives the UA an in-memory cookie jar
my $ua = WWW::Mechanize->new(
    agent => 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv: Gecko/20101203 Firefox/3.6.13',
    stack_depth => 1,
    autocheck   => 0,
    cookie_jar  => {},
);
my $resp = $ua->get($url);

# get the js code
my ($js) = ($ua->content =~ m{<script type="text/javascript">(.*?)</script>}s);
$js =~ s/document\.cookie\s*=/return/;
$js .= "\ntest();"; ## use return and run it for JE
## run the js and get the cookie string
my $j = JE->new;
my $v = $j->eval($js);

## hand the cookie to the cookie jar through the response
$resp->header('Set-Cookie', $v->value);
$ua->cookie_jar->extract_cookies($resp);
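
if you want to try the rewrite step without hitting a real site, here is a minimal offline sketch. the HTML string, the shortened cookie value and the chlg/slt/crc values are made up for illustration; only the "document.cookie =" to "return" trick and the JE calls come from the code above.

```perl
use strict;
use warnings;
use JE;

# Made-up page that mimics the site's script (values are placeholders).
my $html = <<'HTML';
<html><head>
<script type="text/javascript">function test(){
var chlg = "1", slt = "2", crc = "3";
document.cookie = "TS884e96_75=" + "b0f0:"
 + chlg + ":" + slt + ":" + crc + ";Max-Age=3600;path=/";
}
</script>
</head><body onload="test()"></body></html>
HTML

# Pull out the body of the first inline script.
my ($js) = $html =~ m{<script type="text/javascript">(.*?)</script>}s
    or die "no inline script found";

# Make the function return the cookie string instead of assigning it,
# then call the function so the eval yields that string.
$js =~ s/document\.cookie\s*=/return/;
$js .= "\ntest();";

my $v = JE->new->eval($js);
defined $v or die "javascript error: $@";

my $cookie = $v->value;
print "$cookie\n";
```

the die on an undefined result is worth keeping in real code too, since a site can change its script at any time and JE reports javascript errors through $@.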


pretty simple, and that's almost all you need to write. a little explanation:
1. cookie_jar => {} gives the UA an in-memory cookie jar; read the HTTP::Cookies docs for more details.
2. we change "document.cookie =" to "return" in the javascript code so that we get the cookie string back when we eval the js.
3. on the HTTP::Response, we set the Set-Cookie header to do the job the js would do in a browser.
4. calling extract_cookies on the cookie_jar (HTTP::Cookies) then adds that cookie to the WWW::Mechanize UA.
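
steps 3 and 4 can also be checked in isolation, without any network access. this is only a sketch under assumptions: the URL and the cookie value are hypothetical, and the response is built by hand instead of coming from $ua->get.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;
use HTTP::Response;

# Hypothetical URL and cookie string, for illustration only.
my $url           = 'http://www.example.com/';
my $cookie_string = 'TS884e96_75=abc:1:2:3;Max-Age=3600;path=/';

my $jar = HTTP::Cookies->new;    # in-memory jar, like cookie_jar => {}

# extract_cookies reads the host and path from the request attached to
# the response, so a hand-built response needs one.
my $resp = HTTP::Response->new(200, 'OK');
$resp->request(HTTP::Request->new(GET => $url));

# Step 3: pretend the server sent the cookie that the js would have set.
$resp->header('Set-Cookie', $cookie_string);

# Step 4: let the jar pick it up.
$jar->extract_cookies($resp);

# A later request to the same site should now carry the cookie.
my $next = HTTP::Request->new(GET => $url . 'page2');
$jar->add_cookie_header($next);
my $sent = $next->header('Cookie');
print "Cookie header: ", (defined $sent ? $sent : '(none)'), "\n";
```

with WWW::Mechanize the jar is the one returned by $ua->cookie_jar, and $resp already has its request attached because it came from $ua->get.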

have fun and enjoy!

