Difference between revisions of "Listing Page Extractor"
Latest revision as of 13:47, 21 September 2020
The objective of this project is to build a tool that automatically extracts the listing of client companies from an incubator's website. The first step of the project is to develop the LP Extractor Protocol.
| Listing Page Extractor | |
|---|---|
| Project Information | |
| Has title | Listing Page Extractor |
| Has start date | |
| Has deadline date | |
| Has project status | Active |
| Does subsume | LP Extractor Protocol |
| Has sponsor | Kauffman Incubator Project |
| Has project output | Tool |

Copyright © 2019 edegan.com. All Rights Reserved.
LP Extractor Protocol
The LP Extractor Protocol currently envisages marking data locations on webpages, converting webpages into a simplified Domain Specific Language (DSL), and then encoding the DSL into a matrix. The markings of data locations would be encoded into a companion matrix. Both matrices will then be fed into a neural network, which is trained to produce the markings given the DSL. To date, we have conducted a literature review that found papers describing similar "paired input" networks, and we are in the process of refining our understanding of the pre-existing code and work related to each step.
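
To make the envisaged data flow concrete, the following is a minimal sketch of the first three steps only (marking, DSL conversion, matrix encoding), written under several assumptions. The token vocabulary, the names DslTokenizer, to_dsl, encode_dsl and encode_markings, and the sample HTML are all hypothetical and are not taken from the project's code. The actual DSL and encodings may differ substantially, and the neural network itself is omitted.

```python
from html.parser import HTMLParser

# Hypothetical DSL vocabulary (an assumption for illustration only):
# a few structural tokens, plus TEXT for visible text and OTHER for any other tag.
VOCAB = ["OPEN_UL", "CLOSE_UL", "OPEN_LI", "CLOSE_LI",
         "OPEN_A", "CLOSE_A", "TEXT", "OTHER"]
INDEX = {tok: i for i, tok in enumerate(VOCAB)}


class DslTokenizer(HTMLParser):
    """Flattens HTML into a sequence of (token, text) pairs."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append((self._name("OPEN_", tag), ""))

    def handle_endtag(self, tag):
        self.tokens.append((self._name("CLOSE_", tag), ""))

    def handle_data(self, data):
        # Skip whitespace-only text nodes.
        if data.strip():
            self.tokens.append(("TEXT", data.strip()))

    @staticmethod
    def _name(prefix, tag):
        tok = prefix + tag.upper()
        return tok if tok in INDEX else "OTHER"


def to_dsl(html):
    """Convert a webpage into the simplified token sequence."""
    parser = DslTokenizer()
    parser.feed(html)
    parser.close()
    return parser.tokens


def encode_dsl(tokens):
    """One-hot matrix: one row per token, one column per vocabulary entry."""
    return [[1 if INDEX[tok] == j else 0 for j in range(len(VOCAB))]
            for tok, _ in tokens]


def encode_markings(tokens, company_names):
    """Companion matrix: one row per token, one 0/1 column marking data locations."""
    return [[1 if tok == "TEXT" and text in company_names else 0]
            for tok, text in tokens]


# Made-up listing page; the marked company names stand in for hand-made markings.
html = ("<ul><li><a href='/a'>Acme Robotics</a></li>"
        "<li><a href='/b'>Beta Labs</a></li></ul><p>Contact us</p>")
tokens = to_dsl(html)
X = encode_dsl(tokens)                                        # DSL matrix
Y = encode_markings(tokens, {"Acme Robotics", "Beta Labs"})   # companion matrix
```

In this sketch X and Y have the same number of rows, so each row of the companion matrix lines up with one DSL token. That row alignment is one simple way to present the two matrices as a paired input and target; it is a design assumption here, not a description of the project's chosen encoding.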