PhD Masterclass - How to Build a Web Crawler

From edegan.com

This page provides resources for the PhD Masterclass "How to Build a Web Crawler", which I gave on Friday 28th January 2011 to interested PhD students at Haas.

Tools

  • Perl - Available with a large set of useful modules for Windows from ActiveState as ActivePerl
  • Komodo - An integrated development environment for Perl available from ActiveState
  • Textpad - A powerful shareware text editor that supports regular expressions

You should download a trial of Komodo to help you learn. The trial is valid for 21 days (longer if you keep changing your system clock). Komodo will let you step through your code, line by line, and see the values that your variables take on.

Perl is a free and open language, with a rich history, so you will find a wealth of information on the web to help you learn and use it.

Modules

One of the joys of Perl is CPAN, the Comprehensive Perl Archive Network, which acts as a repository for Perl modules (as well as scripts, distros and much else). There are modules written by people from all over the world for almost every conceivable purpose. There is usually no need to reinvent the wheel in Perl - just grab a module (e.g. Wheel::Base)!
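
As a rough illustration of what "grabbing a module" looks like in practice, here is a minimal sketch that uses HTTP::Tiny to download a page, which is the first step of any crawler. HTTP::Tiny is my choice for the example, not necessarily the module used in the masterclass; it is distributed on CPAN and has shipped with Perl itself since version 5.14. The helper name fetch_page and the idea of passing the URL on the command line are assumptions made for this sketch.

```perl
use strict;
use warnings;
use HTTP::Tiny;    # on CPAN; bundled with Perl 5.14 and later

# Hypothetical helper: fetch a URL and return the page body,
# or undef if the request did not succeed.
sub fetch_page {
    my ($url) = @_;
    my $http     = HTTP::Tiny->new( timeout => 10 );
    my $response = $http->get($url);
    return $response->{success} ? $response->{content} : undef;
}

# Only run when a URL is supplied on the command line,
# e.g.  perl fetch.pl http://www.example.com/
if (@ARGV) {
    my $url  = $ARGV[0];
    my $html = fetch_page($url);
    if ( defined $html ) {
        print length($html), " bytes fetched from $url\n";
    }
    else {
        print "Could not fetch $url\n";
    }
}
```

If a module is not already installed, the standard cpan command-line client can usually fetch it by name, for example cpan HTTP::Tiny. The module does the networking work, so the script only has to decide what to do with the page it gets back.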