Changes

1,349 bytes added , 13:47, 21 September 2020


{{Project|Has project output=Tool|Has sponsor=McNair Center

|Has title=Industry Classifier

|Has owner=Christy Warden,

=Summer 2018 Work=

Test data will come from Crunchbase. The database is called crunchbase2 and is located in /bulk/crunchbase2. The pulled information is in:

E:\McNair\Projects\Accelerators\Summer 2018\Industry Classifier update\Our companies with other info.xlsx

The code to build the tables that pull all of this info is in:

E:\McNair\Projects\Accelerators\Summer 2018\Industry Classifier update\BuildTestData.sql

==MLP Classifier==

The new version that I am editing is:

E:\McNair\Projects\Accelerators\Summer 2018\Industry Classifier update\IndustryClassifierCONDENSED-USETHIS.py

The small training and testing data files are 2018traindata.txt and NewTestData2018.txt. The larger training and testing data files are bigtrain2018.txt and bigtest2018.txt.

The classifier script modifies the Classifier.pkl file, which stores the components of the model. Eventually, we should be able to run this through FinalIndustryClassifier.py.

The Crunchbase data in my training data has almost 40 labels, and I could not get the accuracy rate of this model to go past 30%. However, if you assign only 3 labels, the accuracy rate goes up to 50%.

==LSTM Model==

See the old page here: [[Deep Text Classifier]].

I updated the preprocessing file to run on Python 3. I tried updating this code to run on the new data from Crunchbase. The files used are located in:

E:\McNair\Projects\Accelerators\Summer 2018\Industry Classifier update\Yang's Code

You should first run the preprocessing file and then use the classification file. I could not figure out why the accuracy of this model was only 10% with 40 labels and around 30% with 5-8 labels. The accuracy of this model should be higher than that of the MLP classifier.
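The accuracy figures above are easier to compare across label counts when they are set next to a simple baseline. The following is a minimal sketch, not part of the classifier code: the function names and the toy labels are hypothetical, and the uniform-chance figure assumes balanced classes, which may not hold for the Crunchbase data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one example")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def uniform_chance(n_labels):
    """Expected accuracy of guessing uniformly at random among n_labels."""
    return 1.0 / n_labels


def majority_baseline(y_train, y_test):
    """Accuracy of always predicting the most common training label."""
    most_common_label, _ = Counter(y_train).most_common(1)[0]
    return accuracy(y_test, [most_common_label] * len(y_test))


# Hypothetical toy labels, not taken from the Crunchbase data.
y_train = ["software", "software", "biotech", "software", "energy"]
y_test = ["software", "biotech", "software", "energy"]
y_pred = ["software", "software", "software", "energy"]

print(accuracy(y_test, y_pred))            # 0.75
print(majority_baseline(y_train, y_test))  # 0.5
print(uniform_chance(40))                  # 0.025
```

Under the balanced-class assumption, random guessing gives 2.5% with 40 labels and about 33% with 3 labels, so 30% on 40 labels is further above chance than 50% on 3 labels. If the real labels are skewed, the majority baseline is the more informative comparison.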

=New Notes=


