[pdb-discuss] british library screen scraping
Nathan Lewis
nathan_lewis at mac.com
Tue Apr 11 21:40:48 UTC 2006
FYI: Here is Perl code that uses WWW::Mechanize to open the British
Library Sound Archive catalogue and search for the year 1900.
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
my $url  = 'http://cadensa.bl.uk/cgi-bin/webcat';

# load the catalogue start page and go to the advanced search form
$mech->get( $url );
$mech->follow_link( text_regex => qr/Advanced search/ );

# enter the year 1900 and submit the form
$mech->submit_form(
    form_name => 'searchform',
    fields    => { pubyear => 1900 },
);

# print the HTML of the results page
print $mech->content;
__END__
From the retrieved page we can easily extract data such as:
1CD0028844 D1 S1 BD31 SYMPOSIUM
Faust (Act 4)/Gounod
Unnamed Male Chorus
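
As a rough sketch of that extraction step, the snippet below parses the
sample record above once it has been reduced to plain text. The helper
name parse_records, the three-lines-per-record layout, and the reading of
the second line as "title/composer" are all my assumptions based on this
one example; the real results page may need different handling.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample text copied from the record shown above. Real pages may differ.
my $text = <<'END';
1CD0028844 D1 S1 BD31 SYMPOSIUM
Faust (Act 4)/Gounod
Unnamed Male Chorus
END

# Hypothetical helper: assumes every record is exactly three non-blank
# lines (identifier line, "title/composer", performer).
sub parse_records {
    my ($text) = @_;
    my @lines = grep { /\S/ } split /\n/, $text;
    my @records;
    while ( @lines >= 3 ) {
        my ( $id_line, $work, $performer ) = splice @lines, 0, 3;
        # Split on the first slash only; the composer part is an assumption.
        my ( $title, $composer ) = split m{/}, $work, 2;
        push @records, {
            id_line   => $id_line,
            title     => $title,
            composer  => $composer,
            performer => $performer,
        };
    }
    return @records;
}

for my $r ( parse_records($text) ) {
    print join( ' | ', map { $r->{$_} // '' } qw(id_line title composer performer) ), "\n";
}
```

To use this on the live page, the text would first have to be pulled out
of the HTML returned by $mech->content, which is not shown here.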
It appears we can also get the full details of each record by pressing
its "Details" submit button, for example:
<input type="submit" value="Details" name="VIEW^1" id="VIEW1"
class="itemdetails">
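
Here is an untested sketch of how that might be done with
WWW::Mechanize. It repeats the search from the script above, looks for
the form that contains the VIEW^1 button, and presses it with
click_button. The helper name first_record_details is made up for
illustration, and I am assuming the button sits inside an ordinary form
on the results page and needs no JavaScript.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: search for a year, then press the first "Details" button.
sub first_record_details {
    my ($year) = @_;
    my $button = 'VIEW^1';    # name attribute of the button quoted above

    my $mech = WWW::Mechanize->new();
    $mech->get('http://cadensa.bl.uk/cgi-bin/webcat');
    $mech->follow_link( text_regex => qr/Advanced search/ );
    $mech->submit_form(
        form_name => 'searchform',
        fields    => { pubyear => $year },
    );

    # Assumption: one of the forms on the results page holds the button.
    my @forms = $mech->forms;
    my ($n) = grep { $forms[ $_ - 1 ]->find_input($button) } 1 .. @forms;
    die "No form with a '$button' button found\n" unless $n;

    $mech->form_number($n);                    # make that form current
    $mech->click_button( name => $button );    # submit via the Details button
    return $mech->content;
}

# Only contacts the site when a year is given, e.g.: perl details.pl 1900
print first_record_details( $ARGV[0] ) if @ARGV;
```

The other records presumably have buttons named VIEW^2, VIEW^3 and so
on, but I have not checked that.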
I hope this helps,
Nathan