Latest revision as of 12:47, 21 September 2020
The objective of this project is to build a tool that automatically extracts the listing of client companies from an incubator's website. The first step of the project is to develop the LP Extractor Protocol.
| Listing Page Extractor | |
|---|---|
| Project Information | |
| Has title | Listing Page Extractor |
| Has start date | |
| Has deadline date | |
| Has project status | Active |
| Does subsume | LP Extractor Protocol |
| Has sponsor | Kauffman Incubator Project |
| Has project output | Tool |
| Copyright © 2019 edegan.com. All Rights Reserved. | |
LP Extractor Protocol
The LP Extractor Protocol currently envisages marking data locations on webpages, converting the webpages into a simplified Domain Specific Language (DSL), and then encoding the DSL into a matrix. The markings of data locations would be encoded into a companion matrix. Both matrices would then be fed into a neural network, which is trained to produce the markings given the DSL. To date, we have conducted a literature review, which found papers describing similar "paired input" networks, and we are in the process of refining our understanding of the pre-existing code and work related to each step.
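
The paired encoding described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the project's actual design: the five-token DSL vocabulary, the one-hot encoding, the fixed sequence length, and the names `DslParser` and `encode` are all hypothetical, and data locations are marked here simply by matching known company names against text nodes. It uses only Python's standard `html.parser` module and stops before the neural network step.

```python
from html.parser import HTMLParser

# Hypothetical DSL vocabulary; the protocol does not specify one.
VOCAB = ["<pad>", "open", "close", "text", "link"]
TOKEN_ID = {token: i for i, token in enumerate(VOCAB)}


class DslParser(HTMLParser):
    """Reduce HTML to a flat DSL token sequence and record marked tokens."""

    def __init__(self, targets):
        super().__init__()
        self.targets = set(targets)  # text strings marked as data locations
        self.tokens = []             # one DSL token per parser event
        self.marks = []              # 1 if the token holds marked data, else 0

    def handle_starttag(self, tag, attrs):
        self._emit("link" if tag == "a" else "open", 0)

    def handle_endtag(self, tag):
        self._emit("close", 0)

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text nodes
            self._emit("text", int(text in self.targets))

    def _emit(self, token, mark):
        self.tokens.append(token)
        self.marks.append(mark)


def encode(html, targets, length):
    """Return (dsl_matrix, mark_matrix), truncated or padded to `length` rows."""
    parser = DslParser(targets)
    parser.feed(html)
    parser.close()
    tokens = parser.tokens[:length]
    marks = parser.marks[:length]
    pad = length - len(tokens)
    # One-hot row per DSL token; padding rows select the "<pad>" column.
    dsl_matrix = [[int(TOKEN_ID[t] == j) for j in range(len(VOCAB))] for t in tokens]
    dsl_matrix += [[int(j == 0) for j in range(len(VOCAB))] for _ in range(pad)]
    # Companion matrix: one row per token, 1 where the token is marked.
    mark_matrix = [[m] for m in marks] + [[0] for _ in range(pad)]
    return dsl_matrix, mark_matrix


# Made-up listing page fragment with one marked company name.
page = '<ul><li><a href="/acme">Acme Corp</a></li><li>About us</li></ul>'
dsl_matrix, mark_matrix = encode(page, targets=["Acme Corp"], length=12)
```

Both matrices have the same number of rows, so row i of the companion matrix labels row i of the DSL matrix. A network trained on such pairs would take the DSL matrix as input and the companion matrix as the target, which is one plausible reading of "trained to produce the markings given the DSL".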