====URL Extraction from HTML====
The goal here is to identify URLs in the HTML code of a website. We can do this by finding the placeholder for a hyperlink, which is the anchor tag <a>. Within the anchor tag, the href attribute holds the URL we are looking for (see the example below).
<a href="/wiki/Listing_Page_Classifier_Progress" title="Listing Page Classifier Progress"> Progress Log (updated on 4/15/2019)</a>
'''Note:''' the [https://www.crummy.com/software/BeautifulSoup/bs4/doc/ Beautiful Soup] package is used to pull data out of HTML.
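As a rough sketch of the anchor-tag approach described above, the snippet below collects the href value of every <a> tag. To keep it dependency-free it swaps in Python's standard-library <code>html.parser.HTMLParser</code> instead of Beautiful Soup; the class and function names (<code>AnchorHrefParser</code>, <code>extract_hrefs</code>) are hypothetical and chosen for illustration only. With Beautiful Soup installed, the equivalent one-liner would be <code>[a["href"] for a in BeautifulSoup(html, "html.parser").find_all("a", href=True)]</code>.

```python
from __future__ import annotations

from html.parser import HTMLParser


class AnchorHrefParser(HTMLParser):
    """Hypothetical helper: collect the href value of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        # HTMLParser lower-cases tag and attribute names; attrs is a list
        # of (name, value) pairs, where value is None for bare attributes.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.hrefs.append(value)


def extract_hrefs(html: str) -> list[str]:
    """Return the href of each anchor tag, in document order."""
    parser = AnchorHrefParser()
    parser.feed(html)
    parser.close()
    return parser.hrefs


if __name__ == "__main__":
    sample = (
        '<a href="/wiki/Listing_Page_Classifier_Progress" '
        'title="Listing Page Classifier Progress"> Progress Log (updated on 4/15/2019)</a>'
    )
    print(extract_hrefs(sample))  # ['/wiki/Listing_Page_Classifier_Progress']
```

Anchors without an href (for example <a name="top">) are skipped, so the result contains only actual link targets. Relative links such as /wiki/... are returned as written in the page.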
 
====Distinguish Internal Links====