Listing Page Classifier Progress

Revision as of 14:15, 8 April 2019

This page records progress on the Listing Page Classifier Project.

3/28/2019

Assigned Tasks:

Suggested Approaches:

Work on the site map first:

4/1/2019

Site map:

Some hrefs may not include the home_page URL, e.g. /careers
Updated urlcrawler.py (still having trouble identifying internal links that do not start with "/"); will work on this part tomorrow
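Both href issues above come down to resolving each href against the URL of the page it appears on and then comparing hosts. The sketch below is a minimal illustration using only the standard library (`html.parser`, `urllib.parse`). The project's urlcrawler.py is not shown here, so `HrefCollector` and `internal_links` are hypothetical names. It also assumes that "internal" means "same host as home_page", so `www.example.com` and `example.com` count as different sites.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class HrefCollector(HTMLParser):
    """Hypothetical helper: collects the raw href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(home_page, page_url, html):
    """Return absolute, de-duplicated URLs on `html` that share home_page's host."""
    collector = HrefCollector()
    collector.feed(html)
    home_host = urlparse(home_page).netloc.lower()
    links = []
    for href in collector.hrefs:
        # urljoin resolves "/careers", "contact" and "../jobs" against page_url;
        # absolute hrefs and other schemes (e.g. mailto:) come back unchanged.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc.lower() == home_host:
            if absolute not in links:
                links.append(absolute)
    return links
```

Resolving against the current page rather than the home page matters for hrefs like `contact` with no leading "/": on `https://example.com/about/team.html` it becomes `https://example.com/about/contact`, not `https://example.com/contact`.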

4/2/2019

Site map:

Solved the second item from yesterday (identifying internal links that do not start with "/")
Recursively collecting internal links from each page causes an HTTPError on some websites (should set up a depth constraint; will work on this tomorrow)
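A depth constraint for the recursive crawl could look roughly like the sketch below. It is an illustration, not the project's code. `crawl_recursive` is a hypothetical name, and `get_links(url)` is an assumed caller-supplied function that downloads a page and returns its internal links. It is passed in so the crawl logic can be tested without network access. The sketch stops expanding pages at `max_depth` and skips pages that raise an HTTPError instead of letting one failure abort the whole crawl.

```python
from urllib.error import URLError


def crawl_recursive(url, get_links, max_depth, depth=0, visited=None):
    """Depth-limited recursive (depth-first) crawl; returns the set of visited URLs.

    get_links(url) is an assumed helper that fetches `url` and returns
    its internal links as absolute URLs.
    """
    if visited is None:
        visited = set()
    if url in visited:
        return visited
    visited.add(url)
    if depth >= max_depth:
        return visited  # depth limit reached: record the page but do not expand it
    try:
        links = get_links(url)
    except URLError:  # HTTPError subclasses URLError, so 403/404 pages are skipped too
        return visited
    for link in links:
        crawl_recursive(link, get_links, max_depth, depth + 1, visited)
    return visited
```

Bounding the depth also keeps the recursion far below Python's default recursion limit. One caveat of depth-first order with a shared `visited` set: a page first reached along a long path is not re-expanded if it is later found along a shorter one, so some pages within `max_depth` clicks of the home page can be missed.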

4/3/2019

Site map:

4/4/2019

Site map (BFS approach is DONE):

Test-run a couple of sites to see whether there are edge cases I missed
Implement the BFS code: try to output the result to a txt file
Will work on the DFS approach next week
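A breadth-first site map that writes its result to a txt file might be sketched as below. This is an assumed illustration, not the project's BFS code. `bfs_site_map`, `write_site_map`, the `max_pages` cap and the tab-separated "depth, URL" file format are all hypothetical choices. As with any crawl helper here, `get_links(url)` is an assumed caller-supplied function that returns a page's internal links as absolute URLs.

```python
from collections import deque
from urllib.error import URLError


def bfs_site_map(home_page, get_links, max_depth=2, max_pages=500):
    """Breadth-first crawl from home_page; returns (url, depth) pairs in visit order."""
    visited = {home_page}            # URLs already queued, so none is queued twice
    queue = deque([(home_page, 0)])
    site_map = []
    while queue and len(site_map) < max_pages:
        url, depth = queue.popleft()
        site_map.append((url, depth))
        if depth >= max_depth:
            continue                 # record pages at the depth limit, but do not expand them
        try:
            links = get_links(url)
        except URLError:             # also covers HTTPError; skip pages that fail to load
            continue
        for link in links:
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return site_map


def write_site_map(site_map, path):
    """Write one 'depth<TAB>url' line per page to a plain-text file."""
    with open(path, "w", encoding="utf-8") as f:
        for url, depth in site_map:
            f.write(f"{depth}\t{url}\n")
```

Because the queue is processed level by level, each page is recorded at its shortest click distance from the home page. A depth-limited recursive crawl does not guarantee this.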