[[Minh Le]] [[Work Logs]] [[Minh Le (Work Log)|(log page)]]
2018-08-03:
*For some reason, when we search for Capital Innovators, there are more options in the "Tools" section. Need to figure out a way around this. Tried a few quick fixes, but nothing permanent.
*Finished crawling, started classifying.
*Finished classifying.
*Pushed the batch to MTurk.
2018-08-02:
*Cleaned up the code.
*Published the big MTurk batch.
*Got results after 2 hours.
*Processed the data and trimmed off the extra columns.
*Helped Grace with her minor code.
*Helped Maxine with the URL classifier.
*Improved the crawler to take date arguments, per Ed's request.
*Ran the crawler again.
2018-08-01:
*Built the SeedDB parser with Maxine and Connor.
*Finished getting the data from SeedDB and sent it to Connor.
2018-07-31:
*Talked to Connor and Maxine to figure out SeedDB.
*Published the first small MTurk batch with interjudge reliability (2 workers per HIT) and got good results.
*Tested the SeedDB server.
2018-07-30:
*Finalized the design for MTurk and sent it to Ed for thoughts and opinions.
*Tried publishing a batch on MTurk using the sandbox, and talked to Connor to test it out together.
2018-07-29:
*Worked on the HTML mockup for MTurk.
*Crawled data for MTurk.
2018-07-28:
*Worked on the HTML mockup for MTurk.
2018-07-27:
*Worked on MTurk.
2018-07-26:
*Worked on collecting data with others.
*Skyped Ed and Hira along with others.
2018-07-25:
*Worked on MTurk with Connor.
*Talked with Ed about the project's progress. We agreed that the RNN can wait and that we should focus on collecting data, because the data seems much more usable now.
*Hand-collected data along with fellow interns.
2018-07-24:
*Tried to tweak some more. Still no progress. I might finally change to word2vec?
*Looked into MTurk.
2018-07-23:
*The tuning has not completed yet. However, judging from the results so far, the last 6 parameters do not seem to significantly affect the result?
*This tuning has been fruitless. I stopped the code.
*Looked into using Yang's preprocessing code.
*Maxine was borrowing my crawler for her work and found a bug where the crawler would never take the first result. I think this is because Google updated its results page layout? Anyway, fixed it.
*Worked on the wiki page.
2018-07-20:
*Ran parameter tuning to tweak 11 different parameters: dropout_rate_firstlayer, dropout_rate_secondlayer, rec_dropout_rate_firstlayer, rec_dropout_rate_secondlayer, embedding_vector_length, firstlayer_units, secondlayer_units, dropout_rate_dropout_layer, epochs, batch_size, validation_split.
*Talked to Ed about potentially just doing a test run with the RandomForest model because we needed data soon.
2018-07-19:
*Helped Grace with her Thicket project.
*Helped Maxine with her classifier.
*Delegated the data collection task to Connor.
*Continued optimizing the current Keras LSTM. The accuracy is around 50% right now.
2018-07-18:
*Edited the wiki page with more content and ideas.
*Tried an MLP with the lbfgs solver and got around 60% accuracy:
FINISHED classifying. Train accuracy score: 1.0
FINISHED classifying. Test accuracy score: 0.652542372881356
*Building a full-fledged LSTM (not a prototype) to see how things go.
2018-07-17:
*Tried tuning the LSTM in Keras but did not manage to increase the accuracy by much. Accuracy fluctuates around 50%.
2018-07-16:
*Worked on adapting the data for the RNN.
*Installed Keras for BOTH Python 2 and 3.
*For Python 2, installed using the command:
pip install keras
*For Python 3, installed by first cloning the GitHub repo:
git clone https://github.com/keras-team/keras.git
then running the following commands:
cd keras
python3 setup.py install
Normally, running the Python 2 command should be sufficient, but we have both anaconda2 and anaconda3, and for some reason pip can't detect the anaconda3 folder, so we have to install it manually like that. Note that you can also run:
python setup.py install
to install to Python 2 as well (and skip the pip installation). Source: https://keras.io/
*Prototyped a simple LSTM in Keras, and the accuracy was 0.53. This is promising; after I complete the full model, the accuracy may be much higher.
2018-07-13:
*Finished installing TensorFlow for all users. Created a new folder on the DBServer for working with TensorFlow. The folder can be found here:
Z:\AcceleratorDemoDay
or, if accessed from PuTTY, use the following command:
cd /bulk/AcceleratorDemoDay
*The new RNN currently uses word frequencies as input features.
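For reference, a minimal sketch of what word-frequency input features could look like, assuming a plain bag-of-words count over a fixed vocabulary. This is not the project's actual code: the function names, the tokenizer, and the sample strings are all hypothetical.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(documents, max_words=1000):
    """Pick the max_words most frequent tokens across all documents.

    Ties are broken alphabetically so the feature order is deterministic.
    """
    counts = Counter()
    for text in documents:
        counts.update(tokenize(text))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:max_words]]


def word_frequency_vector(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


# Hypothetical sample pages, only for illustration.
pages = ["Demo day is on Friday", "The demo day schedule"]
vocabulary = build_vocabulary(pages)
features = [word_frequency_vector(page, vocabulary) for page in pages]
```

Each page becomes one fixed-length vector of counts, which is the shape a classifier expects. Note that a count vector discards word order, so a recurrent model gains little from it compared with a sequence of word indices or embeddings.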
2018-07-12:
*Followed the instructions at https://www.tensorflow.org/install/install_linux#InstallingVirtualenv and installed TensorFlow with Wei. Specifics are below.
*1. Installed CUDA Toolkit 9.0 Base Installer. The toolkit is in