Changes

NHL (view source)

Revision as of 16:15, 14 March 2016

1,550 bytes added , 16:15, 14 March 2016

Documentation of General Fanager Webcrawler

The LWP::Simple library makes it easy to pull the HTML off the website with a single call:

$content = get('your url as a string here');

The General Fanager page containing data for all the players is at [http://www.generalfanager.com/players http://www.generalfanager.com/players]. Occasionally get will pull a webpage without any actual content in it. I don't know why this happens, and it does not happen consistently.
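One way to cope with the occasional empty page is to retry the request a few times. The following is only a minimal sketch: the helper name fetch_with_retry, the retry count, the delay, and the check for a table tag as a sign of "actual content" are all assumptions made for illustration and are not part of the original crawler. The only LWP::Simple call it relies on is get, which returns undef when a request fails.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch a page, retrying when the response is missing
# or looks empty. The name, the defaults, and the "<table" test are
# assumptions for this sketch, not part of the original crawler.
sub fetch_with_retry {
    my ($url, %opt) = @_;
    my $tries = $opt{tries} // 3;    # how many attempts to make
    my $delay = $opt{delay} // 1;    # seconds to wait between attempts

    # The default fetcher is LWP::Simple's get(), loaded only when needed.
    # A different code ref can be passed in, e.g. for offline testing.
    my $fetch = $opt{fetch}
        // sub { require LWP::Simple; LWP::Simple::get($_[0]) };

    for my $attempt (1 .. $tries) {
        my $content = $fetch->($url);

        # Accept the page only if something came back and it has a table.
        return $content if defined $content && $content =~ /<table\b/i;

        sleep $delay if $attempt < $tries;
    }
    return;    # every attempt came back empty
}
```

It could then be used in place of a bare get, for example as my $content = fetch_with_retry('http://www.generalfanager.com/players') or die "no content"; so that the script stops instead of parsing an empty page.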

Next, the HTML::Tree library allows us to parse the HTML code into a more accessible tree structure. Calling

$tree->look_down('_tag', 'tag of what you are looking for here');

returns an array in which each element contains the HTML tree from the point where the tag was found downward. I used the tag table because it was the most specific tag above the player stats, and put the results into the @tables variable.
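As a small illustration of look_down, the sketch below parses an inline HTML string instead of the live page. The markup and the player names in it are invented for the example. HTML::TreeBuilder is the parser class that ships with the HTML::Tree distribution.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# A tiny stand-in for the real page; this markup is invented for the example.
my $html = <<'HTML';
<html><body>
<table><tbody><tr><td>Player A</td></tr></tbody></table>
<table><tbody><tr><td>Player B</td></tr></tbody></table>
</body></html>
HTML

# Parse the string into a tree of HTML::Element objects.
my $tree = HTML::TreeBuilder->new_from_content($html);

# In list context, look_down returns every element matching the criteria.
my @tables = $tree->look_down(_tag => 'table');

printf "found %d table element(s)\n", scalar @tables;

# as_text returns the text content of an element and its descendants.
print $_->as_text, "\n" for @tables;
```

In scalar context look_down returns only the first match (or undef if nothing matches), which is handy when a single element is expected.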

Now, to access the data of each individual player, you must look inside the @tables variable. I found an array containing an HTML tree for each player at

@{$tables[0]->{_content}[1]->{_content}}

However, the content of the first element of this array is an empty array, and the last 2 elements of this array have no content. The rest of the elements should be players.

There are 2 different ways the HTML tree can be formed: one for a player without his own page, and one for a player with his own page. I created

my %playerdicit;

to store all the data from each player. For the players without their own page, these are the locations where I found their data:

foreach my $player (@{$tables[0]->{_content}[1]->{_content}}){
    $name = $player->{_content}[0]->{_content}[0];
    $position = $player->{_content}[1]->{_content}[0];
    $age = $player->{_content}[2]->{_content}[0];

For players with their own page, the position and age can be found at the same place, but the name and the link to their page are found elsewhere:

    $name = $player->{_content}[0]->{_content}[0]->{_content}[0];
    $link = $player->{_content}[0]->{_content}[0]->{href};

The link should be of the form /players/playerid#. Prepend http://www.generalfanager.com to it to get <nowiki>http://www.generalfanager.com/players/playerid#</nowiki>, which is the link to that player's page. With that link, you can use the same method as described above to pull the HTML from that page and parse it into a tree structure.
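The same extraction can also be sketched with HTML::Element accessor methods (look_down, as_text, attr) instead of reaching into the _content hash keys directly. This is a different technique from the one described above, shown only as an alternative. The sample markup, the player id 123, the helper name extract_players, and the layout of the returned hash are assumptions for illustration; the real page may be structured differently.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $base = 'http://www.generalfanager.com';

# Invented sample rows: one player with his own page, one without,
# plus an empty row like the ones found on the real page.
my $html = <<'HTML';
<table><tbody>
<tr></tr>
<tr><td><a href="/players/123">Player A</a></td><td>C</td><td>27</td></tr>
<tr><td>Player B</td><td>D</td><td>31</td></tr>
</tbody></table>
HTML

# Hypothetical helper: returns a hash ref keyed by player name.
sub extract_players {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my %players;

    for my $row ($tree->look_down(_tag => 'tr')) {
        my @cells = $row->look_down(_tag => 'td');
        next if @cells < 3;    # skip rows without name, position, and age

        my ($name, $position, $age) = map { $_->as_text } @cells[0 .. 2];

        # A linked name means the player has his own page.
        my $anchor = $cells[0]->look_down(_tag => 'a');
        my $href   = $anchor ? $anchor->attr('href') : undef;

        $players{$name} = {
            position => $position,
            age      => $age,
            link     => defined $href ? $base . $href : undef,
        };
    }

    $tree->delete;    # free the parse tree once the data is copied out
    return \%players;
}

my $players = extract_players($html);
for my $name (sort keys %$players) {
    my $link = $players->{$name}{link} // '(no page)';
    print "$name: $players->{$name}{position}, $players->{$name}{age}, $link\n";
}
```

Because as_text collects the text of an element and everything below it, the name comes out the same way whether or not it is wrapped in a link, so both row forms go through one code path.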

Anonymous user

imported>Sahil
