739 bytes added, 14:37, 17 February 2017
===FindTrainData.py===
Builds a tab-delimited text file containing 200 companies for each industry classification (e.g. 200 biotech, 200 media). Hopefully, training on this balanced set will give us more accurate classifications.
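
As a rough illustration of the balanced sampling described above, here is a minimal sketch. It is not the actual FindTrainData.py: the field names <code>name</code>, <code>industry</code> and <code>description</code>, the function names, and the fixed random seed are all assumptions made for the example.

```python
import csv
import random
from collections import defaultdict


def sample_per_industry(rows, per_class=200, seed=0):
    """Return up to `per_class` rows for each industry label.

    `rows` is a list of dicts; the "industry" key is an assumed field name.
    """
    by_industry = defaultdict(list)
    for row in rows:
        by_industry[row["industry"]].append(row)

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for industry in sorted(by_industry):
        group = by_industry[industry]
        # Industries with fewer than `per_class` companies are kept whole.
        k = min(per_class, len(group))
        sample.extend(rng.sample(group, k))
    return sample


def write_tsv(rows, path, fields=("name", "industry", "description")):
    """Write the sampled rows as a tab-delimited text file with a header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=fields, delimiter="\t", extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(rows)
```

Capping every class at the same size keeps a large industry from dominating the training data; an industry with fewer than 200 companies simply contributes all of them.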
 
===FixDescriptions.py===
Deals with the problem that the output files from SDC are poorly formatted when a description runs beyond one line. Outputs a tab-delimited text file in which each description sits entirely on one line and can be read.
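
The line merging could look something like the sketch below. It is not the actual FixDescriptions.py, and it rests on an assumption about the SDC layout that these notes do not confirm: that a new record starts in the first column while wrapped description lines are indented. The function name is also hypothetical.

```python
def merge_wrapped_lines(lines):
    """Join wrapped description lines onto the record they belong to.

    Assumption (not confirmed by the SDC format): a new record starts in
    column 0, and continuation lines begin with whitespace.
    """
    records = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines between records
        if line[0].isspace() and records:
            # Continuation line: append its text to the previous record.
            sep = "" if records[-1].endswith(("\t", " ")) else " "
            records[-1] += sep + line.strip()
        else:
            records.append(line)
    return records
```

For example, the three lines <code>"Acme\tWe build"</code>, <code>"    widgets for"</code> and <code>"    cars."</code> collapse into a single tab-delimited record whose description is on one line.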
===Addresses.txt===
This text file contains the investment info, name, address, city, and state of portfolio companies.
 
===Descriptions.txt===
See an example [https://exceljet.net/formula/count-matches-between-two-columns here].
 
 
=Comments and Thoughts=
 
'''2/17/17'''
 
Christy: No matter what parameters I change in the NN, I can't get the accuracy above roughly 30%. Looking at the descriptions that the classifier fails on, I realized that it pretty much guesses randomly much of the time when the descriptions are terrible, like "We provide services to our customers." I think we need to be training and classifying based on the longer description, which is why I started working on the FixDescriptions.py script.