Changes

Jump to navigation Jump to search
*Given the above idea, we have built 2 following algorithms to find all internal links of a web page with 2 given user inputs: homepage url and depth
'''''Breadth-First Search(BFS)approach''''':
we examine all pages(nodes) at the same depth before going down to the next depth.
E:\projects\listing page identifier\Internal_Link\Internal_url_BFS.py
'''''Depth-First Search (DFS) approach''''':
we visit a page(node)"A" and then all its children on the current path will be visited before we visit A's neighbor node "B".
[[File:screenshotEx.png|5px|thumb|right|Output File Example]]
E:\projects\listing page identifier\screen_shot\screen_shot_tool.py
 
 
===Image Processing===
This method would likely rely on a [https://en.wikipedia.org/wiki/Convolutional_neural_network convolutional neural network (CNN)] to classify HTML elements present in web page screenshots. Implementation could be achieved by combining the VGG16 model or ResNet architecture with batch normalization to increase accuracy in this context.
227

edits

Navigation menu