Changes

Jump to navigation Jump to search
[[Kyran Adams]] [[Work Logs]] [[Kyran Adams (Work Log)|(log page)]]
2018-04-05: The classifier doesn't work as well when there is an imbalance of positive and negative training cases, so I made it use the same number of cases from each. Also had a meeting, my next task is to run the google crawler to create a larger dataset, which we can then use to improve the classifier.
2018-04-04: Continued to classify examples, and tried using images as features. It didn't give great results, so I abandoned that. Currently the features are wordcounts in the headers and title. I might consider the number of "simple" links in the page, like "www.abc.com". Complicated links would be used a lot for many things, but simple links are often used to bring someone to a home page, as in a cohort company's page, so this might be a good indicator of a list of cohort companies.
226

edits

Navigation menu