====Site Map Generator====
'''Part I URL Extraction from HTML'''
 
The goal here is to identify the URLs in the HTML source of a website. Hyperlinks are held in anchor tags (<a>), and the href attribute of each anchor tag contains the URL we are looking for (see the example below).
<a href="/wiki/Listing_Page_Classifier_Progress" title="Listing Page Classifier Progress"> Progress Log (updated on 4/15/2019)</a>
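A minimal sketch of this extraction step, assuming Python and its standard-library html.parser module (the project's actual implementation is not shown here). The names AnchorHrefParser and extract_hrefs are hypothetical and used only for illustration.

```python
from html.parser import HTMLParser


class AnchorHrefParser(HTMLParser):
    """Collects the href value of every <a> tag it encounters (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def extract_hrefs(html):
    """Return the raw href strings found in anchor tags, in document order."""
    parser = AnchorHrefParser()
    parser.feed(html)
    return parser.hrefs


sample = (
    '<a href="/wiki/Listing_Page_Classifier_Progress" '
    'title="Listing Page Classifier Progress"> Progress Log</a>'
)
print(extract_hrefs(sample))
```

The returned values are exactly what appears in the href attribute, so they may still be relative paths.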
 
Issues that may occur:
* The href may not contain the full URL. In the example above, it omits the domain name: "http://www.edegan.com"
* Other href values do include the domain name, so both cases must be handled when extracting the URL
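One way to handle both cases is to resolve every href against the URL of the page it came from. The sketch below assumes Python's standard urllib.parse.urljoin, which leaves an already absolute URL unchanged and prepends the scheme and domain to a relative one. The function name to_absolute is hypothetical.

```python
from urllib.parse import urljoin


def to_absolute(base_url, href):
    """Resolve href against base_url (hypothetical helper).

    A relative href gains the scheme and domain of base_url;
    an href that is already absolute is returned as-is.
    """
    return urljoin(base_url, href)


base = "http://www.edegan.com"
relative = "/wiki/Listing_Page_Classifier_Progress"
absolute = "http://www.edegan.com/wiki/Listing_Page_Classifier_Progress"

print(to_absolute(base, relative))
print(to_absolute(base, absolute))
```

Both calls yield the same full URL, so later steps can compare links without caring how they were written in the HTML.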
 
'''Part II Algorithm for Collecting Internal Links'''
[[File:WebPageTree.png|700px|thumb|center|Site Map Tree]]