Listing Page Extractor
The objective of this project is to build a tool that automatically extracts the list of client companies from an incubator's website. The first step of the project is to develop the LP Extractor Protocol.
| Listing Page Extractor | |
|---|---|
| Project Information | |
| Has title | Listing Page Extractor |
| Has start date | |
| Has deadline date | |
| Has project status | Active |
| Does subsume | LP Extractor Protocol |
| Has sponsor | Kauffman Incubator Project |

Copyright © 2019 edegan.com. All Rights Reserved.
LP Extractor Protocol
The LP Extractor Protocol currently envisages marking data locations on webpages, converting each webpage into a simplified Domain Specific Language (DSL), and then encoding the DSL into a matrix. The marked data locations would be encoded into a companion matrix. Both matrices would then be fed into a neural network, which would be trained to reproduce the markings given the DSL. To date, we have conducted a literature review that found papers describing similar "paired input" networks, and we are in the process of refining our understanding of the pre-existing code and work related to each step.
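The protocol has not yet fixed a DSL or an encoding, so the following is only an illustrative sketch of how the paired inputs might be produced, not the project's implementation. It assumes a hypothetical DSL that keeps five structural tags (div, ul, li, a, p), maps every other tag to OTHER and every non-blank text node to TEXT. It also assumes that an annotator marks data locations with a hypothetical `data-lp-mark` attribute. The output is a one-hot DSL matrix (one row per token) and a companion matrix with one 0/1 mark per row. The neural network itself is not shown.

```python
from html.parser import HTMLParser

# Hypothetical DSL vocabulary: five structural tags are kept, every other
# tag collapses to "OTHER", and every non-blank text node becomes "TEXT".
KEPT_TAGS = ("div", "ul", "li", "a", "p")
DSL_VOCAB = ([f"<{t}>" for t in KEPT_TAGS] + [f"</{t}>" for t in KEPT_TAGS]
             + ["OTHER", "TEXT"])
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # never closed
MARK_ATTR = "data-lp-mark"  # hypothetical attribute an annotator adds


class DSLConverter(HTMLParser):
    """Turn HTML into DSL tokens plus one 0/1 data-location mark per token."""

    def __init__(self):
        super().__init__()
        self.tokens = []
        self.marks = []
        self._stack = []  # (tag, inside_marked_region) for each open element

    def _in_marked_region(self):
        return bool(self._stack) and self._stack[-1][1]

    def _emit(self, token, mark=0):
        self.tokens.append(token)
        self.marks.append(mark)

    def handle_starttag(self, tag, attrs):
        self._emit(f"<{tag}>" if tag in KEPT_TAGS else "OTHER")
        if tag not in VOID_TAGS:
            # An element is marked if it carries the attribute or sits
            # inside an element that does.
            marked = self._in_marked_region() or any(n == MARK_ATTR for n, _ in attrs)
            self._stack.append((tag, marked))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> open nothing, so skip the stack.
        self._emit(f"<{tag}>" if tag in KEPT_TAGS else "OTHER")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        self._emit(f"</{tag}>" if tag in KEPT_TAGS else "OTHER")
        # Pop back to the matching open tag; stray end tags leave the stack alone.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            # Only TEXT tokens carry marks, since they hold the data to extract.
            self._emit("TEXT", 1 if self._in_marked_region() else 0)


def one_hot(tokens):
    """Encode DSL tokens as a len(tokens) x len(DSL_VOCAB) 0/1 matrix."""
    index = {tok: i for i, tok in enumerate(DSL_VOCAB)}
    return [[1 if index[tok] == j else 0 for j in range(len(DSL_VOCAB))]
            for tok in tokens]


def to_training_pair(html):
    """Return (tokens, DSL matrix, companion mark matrix) for one page."""
    parser = DSLConverter()
    parser.feed(html)
    parser.close()
    return parser.tokens, one_hot(parser.tokens), [[m] for m in parser.marks]


if __name__ == "__main__":
    page = ('<div><p>Our portfolio</p><ul data-lp-mark="1">'
            '<li><a href="/acme">Acme Corp</a></li><li>Beta Labs</li></ul></div>')
    tokens, x, y = to_training_pair(page)
    for tok, row in zip(tokens, y):
        print(f"{tok:8} mark={row[0]}")
```

On the sample page, only the two TEXT rows for "Acme Corp" and "Beta Labs" receive a mark of 1. Marking only TEXT tokens keeps the companion matrix aligned row by row with the DSL matrix, which is the pairing a "paired input" network would consume. A real annotation scheme or DSL could reasonably differ.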