Changes

Kyran Adams (Work Log) (view source)

Revision as of 17:12, 16 April 2018

47 bytes added , 17:12, 16 April 2018

→‎Spring 2018

[[Kyran Adams]] [[Work Logs]] [[Kyran Adams (Work Log)|(log page)]]

2018-04-16: I think I'm going to transition from using hand-picked feature words to automatically generated features. [http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html This webpage] has a good example. I could also use n-grams, instead of unigrams. I might also consider using a SVM instead of a random forest, or a combination of the two.

2018-04-12: Continued increasing the dataset size as well as going back and correcting some wrong classifications in the dataset. I'm wondering whether the accuracy would be improved most by an increased dataset, a different approach to features, or changes to the model itself. I am considering using something like word2vec with, for example, five words before and after each instance of the words "startup" or "demo day" in the pages. The problem with this is that this would need its own dataset (which would be easier to create). However, semantic understanding of the text might be an improvement. Or, maybe, I could train this on the title of the article, because the title should have enough semantic meaning. But even this dataset might have to be curated, because a lot of the 0's are demoday pages, they just don't list the cohorts.

Kyranstar

226

edits

Changes

Kyran Adams (Work Log) (view source)

Revision as of 17:12, 16 April 2018

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Sites

Sections

Organizations

Help

Tools