1,349 bytes added ,  13:47, 21 September 2020
{{Project|Has project output=Tool|Has sponsor=McNair ProjectsCenter
|Has title=Industry Classifier
|Has owner=Christy Warden,
=Summer 2018 Work=
Test data will come from Crunchbase. The database is called crunchbase2 and is located in:
/bulk/crunchbase2

The pulled information is in:
E:\McNair\Projects\Accelerators\Summer 2018\Industry Classifier update\Our companies with other info.xlsx
The code to build tables to pull all info is in:
E:\McNair\Projects\Accelerators\Summer 2018\Industry Classifier update\BuildTestData.sql

==MLP Classifier==
The new version that I am editing is:
E:\McNair\Projects\Accelerators\Summer 2018\Industry Classifier update\IndustryClassifierCONDENSED-USETHIS.py

Small training and testing data are called:
2018traindata.txt
NewTestData2018.txt

Larger training and testing data are called:
bigtrain2018.txt
bigtest2018.txt

This file modifies the Classifier.pkl file, which stores the components of the model. Eventually, we should be able to run this dataset through FinalIndustryClassifier.py.

The Crunchbase data in my training data has almost 40 different labels, and with that many classifications I could not get the accuracy rate of this model above 30%. However, if you assign only 3 labels, the accuracy rate goes up to 50%.

==LSTM Model==
See old page here [[Deep Text Classifier]]. I updated the preprocessing file to run on python3. I tried updating this code to run on the new venture capital data from Crunchbase. The files used, for which we need to rebuild a coding system, are located in:
E:\McNair\Projects\Accelerators\Summer 2018\Industry Classifier update\Yang's Code

You should first run the preprocessing file and then use the classification file. I could not figure out why the accuracy on this model was only 10% with 40 labels and around 30% with 5-8 labels. The accuracy of this one should be higher than the MLP classifier.
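The accuracy figures above are hard to compare directly, because the number of labels changes what counts as a good score: uniform guessing gets 1/40 = 2.5% with 40 labels but about 33% with 3 labels. The sketch below is not taken from the project code; it is a minimal, standard-library-only illustration of how accuracy could be reported next to a chance baseline and a majority-class baseline, before and after collapsing fine labels into coarser ones. The function names, the example labels, and the label mapping are all hypothetical.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def majority_baseline(true_labels):
    """Accuracy of always predicting the most common true label."""
    if not true_labels:
        raise ValueError("label list must not be empty")
    most_common_count = Counter(true_labels).most_common(1)[0][1]
    return most_common_count / len(true_labels)


def uniform_chance(num_labels):
    """Expected accuracy of guessing uniformly among num_labels classes."""
    if num_labels < 1:
        raise ValueError("num_labels must be at least 1")
    return 1.0 / num_labels


def collapse_labels(labels, mapping, default="Other"):
    """Map fine-grained labels to coarse ones; unmapped labels go to default."""
    return [mapping.get(label, default) for label in labels]


if __name__ == "__main__":
    # Hypothetical labels, for illustration only (not from the Crunchbase data).
    true = ["Software", "Biotech", "Software", "Hardware"]
    pred = ["Software", "Software", "Software", "Biotech"]
    coarse = {"Software": "Tech", "Hardware": "Tech"}  # assumed grouping

    print("fine accuracy:    ", accuracy(true, pred))
    print("majority baseline:", majority_baseline(true))
    print("chance, 40 labels:", uniform_chance(40))
    print("chance, 3 labels: ", uniform_chance(3))
    print("coarse accuracy:  ",
          accuracy(collapse_labels(true, coarse), collapse_labels(pred, coarse)))
```

Reporting the baselines alongside each accuracy number makes it easier to tell whether a jump such as 30% to 50% reflects a better model or only an easier labeling problem.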
=New Notes=
