2. If the file contains a list of cohort companies, mark it as 1 in the "cohort" column; if not, mark it as 0. The page does not have to be about the accelerator in that row: any list of cohort companies for any demo day counts.
It is probably better not to work through the files sequentially, because a dataset that covers many types of pages is more useful. Also, if a certain page shows up many times (for example, the "Pardon Our Interruption" page), you don't need to classify it multiple times. Just leave the repeats blank.
It is also better to have a balanced set of 1's and 0's. A huge list of 0's with only a few 1's is not very useful, because the classifier only takes as many 0's as there are 1's, giving a 50/50 set. So it's probably better to look first at pages that are likely to list cohort companies.
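To see why extra 0's don't help, here is a rough sketch of the kind of 50/50 balancing described above. This is not the classifier's actual code; the function name `balance_labels`, the row format (a list of dicts), and the `seed` parameter are assumptions made up for illustration. Rows left blank are simply skipped.

```python
import random

def balance_labels(rows, label_key="cohort", seed=0):
    """Undersample so that 1's and 0's appear equally often (hypothetical helper)."""
    # Rows whose label is blank (neither 1 nor 0) are ignored.
    ones = [r for r in rows if r.get(label_key) == 1]
    zeros = [r for r in rows if r.get(label_key) == 0]

    # Keep only as many of each label as the smaller group has.
    n = min(len(ones), len(zeros))
    rng = random.Random(seed)
    balanced = rng.sample(ones, n) + rng.sample(zeros, n)
    rng.shuffle(balanced)
    return balanced

# Example: 3 pages marked 1, 20 marked 0, 2 left blank.
rows = (
    [{"file": f"a{i}.html", "cohort": 1} for i in range(3)]
    + [{"file": f"b{i}.html", "cohort": 0} for i in range(20)]
    + [{"file": f"c{i}.html", "cohort": ""} for i in range(2)]
)
training_rows = balance_labels(rows)
print(len(training_rows))
```

In this example only 6 of the 25 rows end up in the training set, so 17 of the 20 rows marked 0 go unused. Finding more 1's is what actually grows the set.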
If you want examples of pages with and without cohort lists, you can look at some of the already classified examples.