Difference between revisions of "PhD Masterclass - How to Build a Web Crawler"

From edegan.com
Revision as of 19:02, 31 January 2011

This page provides resources for the PhD Masterclass "How to Build a Web Crawler", which I gave on Friday 28th January 2011 to interested PhD students at Haas.

Tools

  • Perl - Available with a large set of useful modules for Windows from ActiveState as ActivePerl
  • Komodo - An integrated development environment for Perl available from ActiveState
  • Textpad - A powerful shareware text editor that supports regular expressions

You should download a trial of Komodo to help you learn. The trial is valid for 21 days (longer if you keep changing your system clock). Komodo will let you step through your code, line by line, and see the values that your variables take on.

Perl is a free and open language, with a rich history, so you will find a wealth of information on the web to help you learn and use it.

Sample Perl Code

We wrote a couple of simple scripts together to get to grips with Perl.

The first was (save it in a file called Script1.pl in the root of your R drive):

print "Hello World";

To execute the script we can either open a command prompt and run the script:

Start->Run->"cmd.exe"
R:
perl Script1.pl

Or we can run it from inside Komodo by choosing:

Debug->Go

(Under Preferences->Debugger tick the box to avoid being prompted by the debug dialog each time)

Or we can shell on to Bear and run it there:

Use PuTTY to connect to bear.haas.berkeley.edu (see Research Computing At Haas).
perl Script1.pl
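
Since the same script can be run from the command prompt, from Komodo, or on Bear, a small variation makes it easy to see which environment actually ran it. The sketch below was not part of the class; it relies only on Perl's built-in variables $^O (operating system name) and $] (interpreter version), and the variable names $os and $version are chosen here for illustration.

```perl
use strict;
use warnings;

# $^O holds the name of the operating system this Perl was built for,
# for example "MSWin32" on Windows or "linux" on a Linux server.
my $os = $^O;

# $] holds the version number of the running Perl interpreter.
my $version = $];

print "Hello World from $os, running Perl $version\n";
```

Save it next to Script1.pl and run it in each of the three ways described above; the message should change with the machine.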


Modules

One of the joys of Perl is CPAN - The Comprehensive Perl Archive Network, which acts as a repository for Perl modules (as well as scripts, distros and much else). There are modules written by people from all over the world for almost every conceivable purpose. There is usually no need to reinvent the wheel in Perl - just grab a module (e.g. Wheel::Base)!
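
As a minimal sketch of what "grabbing a module" looks like in practice, the script below loads List::Util with a use statement and calls two of its functions. List::Util was picked here only because it ships with Perl itself, so nothing needs to be installed; it is not a module named in the class, and the link counts are made-up illustrative data.

```perl
use strict;
use warnings;

# Load the module and import just the two functions we need.
# List::Util is bundled with Perl, so no CPAN install is required.
use List::Util qw(sum max);

# Hypothetical data: number of links found on each of five pages.
my @links_per_page = (12, 7, 30, 0, 5);

# sum() and max() come from the module; we did not write them ourselves.
my $total   = sum(@links_per_page);
my $largest = max(@links_per_page);

print "Total links: $total\n";
print "Most links on one page: $largest\n";
```

Modules fetched from CPAN are loaded with the same use statement once they have been installed.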