[[Minh Le]] [[Work Logs]] [[Minh Le (Work Log)|(log page)]]
 
2018-08-03:
*For some reason, when we search for Capital Innovators, there are more options in the "Tools" section. Need to figure out a way around this. Applied a quick fix, but nothing permanent yet.
*Finished crawling, started classifying.
*Finished classifying.
*Pushed the batch to MTurk.
 
2018-08-02:
*Cleaned up code
*Published the big MTurk batch.
*Got results after 2 hours.
*Processed the data and trimmed extra columns off.
*Helped Grace with a minor code fix
*Helped Maxine with the url classifier
*Improved the crawler to take date arguments, per Ed's request.
*Ran the crawler again.
 
2018-08-01:
*Built the SeedDB parser with Maxine and Connor
*Finished getting the data from Seed DB and sent it to Connor.
 
2018-07-31:
*Talked to Connor and Maxine to figure out SeedDB
*Published the first small MTurk batch with inter-judge reliability (2 workers per HIT) and got good results
*Tested SeedDB server
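With 2 workers per HIT, inter-judge reliability can be checked with a simple agreement statistic. A minimal Cohen's kappa sketch; the helper function and the example labels are illustrative, not project code:

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Agreement between two workers' labels, corrected for chance."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of HITs where both workers agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each worker's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Example: labels from two workers on the same five HITs.
print(cohen_kappa([1, 1, 0, 1, 0], [1, 1, 0, 0, 0]))
```

Values near 1 mean the two workers per HIT are labeling consistently; values near 0 mean agreement is no better than chance.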
 
2018-07-30:
*Finalized the MTurk design and sent it to Ed for thoughts and opinions
*Tried publishing a batch on MTurk using the sandbox, and talked to Connor to test it out together.
 
2018-07-29:
*Worked on HTML mockup for MTurk
*Crawled data for MTurk
 
2018-07-28:
*Worked on HTML mockup for MTurk
 
2018-07-27:
*Worked on MTurk
 
2018-07-26:
*Worked on collecting data with others.
*Skyped with Ed and Hira, along with others.
 
2018-07-25:
*Worked on MTurk with Connor
*Talked with Ed about the project's progress. We agreed that the RNN can wait and that we should focus on collecting the data, because the data seems much more usable now.
*Hand-collected data along with fellow interns.
 
2018-07-24:
*Tried tweaking some more. Still no progress. I might finally switch to word2vec?
*Looked into MTurk
 
2018-07-23:
*The tuning has not completed yet. However, judging from the results so far, the last 6 parameters did not seem to significantly affect the outcome.
*The tuning was fruitless, so I stopped the code.
*Looked into using Yang's preprocessing code.
*Maxine was borrowing my crawler for her work and found a bug where the crawler would never take the first result, probably because Google updated their results page. Anyway, fixed it.
*Worked on the wiki page
 
 
2018-07-20:
*Ran parameter tuning to tweak 11 different parameters: dropout_rate_firstlayer, dropout_rate_secondlayer, rec_dropout_rate_firstlayer, rec_dropout_rate_secondlayer, embedding_vector_length, firstlayer_units, secondlayer_units, dropout_rate_dropout_layer, epochs, batch_size, validation_split
*Talked to Ed about potentially just doing a test run with the RandomForest model, because we needed data soon.
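The tuning loop over those 11 parameters can be sketched as a random search. The value grids below are illustrative guesses (not the ones actually used), and `train_and_score` is a hypothetical helper that would build and fit the LSTM:

```python
import random

# Hypothetical search space mirroring the 11 tuned parameters.
SEARCH_SPACE = {
    "dropout_rate_firstlayer": [0.1, 0.2, 0.3, 0.5],
    "dropout_rate_secondlayer": [0.1, 0.2, 0.3, 0.5],
    "rec_dropout_rate_firstlayer": [0.0, 0.1, 0.2],
    "rec_dropout_rate_secondlayer": [0.0, 0.1, 0.2],
    "embedding_vector_length": [32, 64, 128],
    "firstlayer_units": [50, 100, 200],
    "secondlayer_units": [50, 100],
    "dropout_rate_dropout_layer": [0.2, 0.5],
    "epochs": [3, 5, 10],
    "batch_size": [32, 64],
    "validation_split": [0.1, 0.2],
}

def sample_config(rng=random):
    """Draw one random configuration from the grid."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

# train_and_score(config) would build and fit the LSTM and return test
# accuracy; here we only show the sampling loop.
trials = [sample_config() for _ in range(20)]
```

Random search tends to cover an 11-dimensional space far better than an exhaustive grid for the same number of training runs.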
 
 
2018-07-19:
*Helped Grace with her Thicket project
*Helped Maxine with her classifier
*Delegated the data collecting task to Connor
*Continued optimizing the current Keras LSTM. The accuracy is around 50% right now
 
 
2018-07-18:
*Edited the wiki page with more content and ideas.
*Tried an MLP with the lbfgs solver and got around 60% accuracy:
FINISHED classifying. Train accuracy score:
1.0
FINISHED classifying. Test accuracy score:
0.652542372881356
*Building a full-fledged LSTM (not a prototype) to see how things go
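For reference, an MLP with the lbfgs solver along those lines can be set up in scikit-learn; the toy data below stands in for the real feature matrix, and the layer size is an illustrative guess:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Toy stand-in for the crawled-page feature matrix.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# lbfgs is a full-batch quasi-Newton solver; it works well on small datasets.
clf = MLPClassifier(solver="lbfgs", hidden_layer_sizes=(50,),
                    max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print("Train accuracy:", clf.score(X_train, y_train))
print("Test accuracy:", clf.score(X_test, y_test))
```

A train accuracy of 1.0 against a much lower test accuracy, as in the log output above, is the classic overfitting signature.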
 
2018-07-17:
*Tried tuning the LSTM in Keras but did not manage to increase the accuracy by much. Accuracy fluctuates around 50%
 
2018-07-16:
*Worked to adapt the data to the RNN
*Installed Keras for BOTH Python 2 and 3.
*For Python 2, installed using the command:
pip install keras
*For Python 3, installed by first cloning the GitHub repo:
git clone https://github.com/keras-team/keras.git
then running the following commands:
cd keras
python3 setup.py install
Normally, the pip command for Python 2 should be sufficient, but we have both anaconda2 and anaconda3, and for some reason pip can't detect the anaconda3 folder, so we have to manually install it like that.
Note that you can run:
python setup.py install
to install for Python 2 as well (and skip the pip installation). Source: https://keras.io/
*Prototyped a simple LSTM in Keras, and the accuracy was 0.53. This is promising; after I complete the full model, the accuracy should be much higher.
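A minimal sketch of such a prototype, assuming `tensorflow.keras`; the vocabulary size, embedding width, and unit counts are illustrative guesses, not the values actually used:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

VOCAB_SIZE = 5000  # assumed number of distinct word indices

model = Sequential([
    Embedding(VOCAB_SIZE, 32),       # word index -> 32-d dense vector
    LSTM(100),                       # single recurrent layer
    Dense(1, activation="sigmoid"),  # binary label (demo-day page or not)
])
model.compile(loss="binary_crossentropy", optimizer="adam",
              metrics=["accuracy"])

# One forward pass on dummy padded sequences, just to check shapes.
dummy = np.zeros((2, 200), dtype="int32")
print(model(dummy).shape)  # (2, 1)
```

Training would then be `model.fit(X_train, y_train, epochs=..., batch_size=..., validation_split=...)` over integer-encoded, padded sequences.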
 
2018-07-13:
*Finished installing TensorFlow for all users. Created a new folder on the DBServer for TensorFlow work. The folder can be found here:
Z:\AcceleratorDemoDay
or, if accessed from PuTTY, use the following command:
cd /bulk/AcceleratorDemoDay
*The new RNN currently uses word frequencies as input features
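Word-frequency features of this kind can be sketched with a fixed vocabulary; the `VOCAB` list and sample document below are illustrative, not the project's real word list:

```python
from collections import Counter

# Hypothetical vocabulary; real code would use the project's word list.
VOCAB = ["demo", "day", "startup", "accelerator"]

def word_freq_features(text):
    """Map a document to a fixed-length vector of word counts over VOCAB."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in VOCAB]

print(word_freq_features("Demo Day for the accelerator demo"))  # → [2, 1, 0, 1]
```

Each document then becomes a fixed-length count vector that any classifier, including the RNN's input layer, can consume.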
 
2018-07-12:
*Followed the instructions here: https://www.tensorflow.org/install/install_linux#InstallingVirtualenv and installed TensorFlow with Wei. Specifics are below.
*1. Installed the CUDA Toolkit 9.0 Base Installer. The toolkit is in
/usr/local/cuda-9.0
Did NOT install the NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81 (we believe we have a different graphics driver, a much newer version, 396.26).
Installed the CUDA 9.0 samples in
HOME/MCNAIR/CUDA-SAMPLES.
*2. Installed Patches 1, 2 and 3. The command to install was
sudo sh cuda_9.0.176.2_linux.run # (9.0.176.1 for patch 1 and 9.0.176.3 for patch 3)
*3. This was supposed to be what to do next:
"""
Set up the environment variables:
The PATH variable needs to include /usr/local/cuda-9.0/bin
To add this path to the PATH variable:
$ export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
In addition, when using the runfile installation method, the LD_LIBRARY_PATH variable needs to contain /usr/local/cuda-9.0/lib64 on a 64-bit system
To change the environment variables for 64-bit operating systems:
$ export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Note that the above paths change when using a custom install path with the runfile installation method.
"""
But when we navigated to /usr/local/ we saw cuda-9.2, which we did not install. So we are WAITING for Yang to get back to us before proceeding.
*For now, I can't build anything without TensorFlow, so I am going to continue classifying data.
*Helped Grace with Google Scholar Crawler's regex
*All installation notes can be seen here: [[Installing TensorFlow]]
 
2018-07-11:
*With an extended dataset, the accuracy of the random forest model went down. Accuracy: 0.71 (+/- 0.15)
*Built code for an RNN; ran into the problem of not having TensorFlow installed
*Helped Grace with her Google Scholar Crawler.
*Asked Wei to help with installing tensorflow GPU version.
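The "Accuracy: 0.71 (+/- 0.15)" format above looks like scikit-learn's cross-validation recipe; a sketch on toy data (the dataset and settings below are stand-ins, not the real ones):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Toy data standing in for the extended demo-day dataset.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# 5-fold cross-validation: mean accuracy +/- two standard deviations.
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
```

The +/- spread is worth watching: a wide interval like 0.15 suggests the score depends heavily on which slice of the data lands in the test fold.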
 
2018-07-10:
*Did further research into how an RNN can be used for classification
*Reorganized the code under a new folder, "Experiment", to prepare for testing a new RNN
*Ran the reorganized code to make sure there were no problems. I kept running into this error: "TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''"
*Apparently this was caused by stray question marks (??) in the column. Removed them and it seems to run fine.
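That `isfinite` error typically means a numeric column ended up with object dtype because of non-numeric entries. A defensive cleanup sketch, assuming the data lives in a pandas DataFrame (column name and values are illustrative):

```python
import pandas as pd

# Column with stray "??" entries like the ones that triggered the error.
df = pd.DataFrame({"score": ["0.8", "??", "0.6"]})

# Coerce non-numeric entries to NaN instead of letting them force an
# object dtype that numpy's isfinite ufunc cannot handle.
df["score"] = pd.to_numeric(df["score"], errors="coerce")
print(df["score"].tolist())  # [0.8, nan, 0.6]
```

After coercion the NaN rows can be dropped with `df.dropna()` or imputed, rather than hunting for question marks by hand.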
 
2018-07-09:
*Continued studying machine learning models.
*Helped Grace with her LinkedIn Crawler.
*Cleaned up working folder.
*Populated the project page with some information.
 
2018-07-06:
*Reviewed Augi's classified training data to make sure it meets the requirements.
*Continued studying machine learning models and neural nets
 
2018-07-05:
*Studied different machine learning models and different classifier algorithms to prepare to build the RNN.
*Worked on classifying more training data.
 
2018-07-03:
*Ran a 0.84 classifier on the newly crawled data from the Chrome driver. From observation, the data still was not good enough, so I started building the RNN
*Still waiting for Augi to release the lock on the new Excel data so I can work on it.
 
2018-07-02:
*Why did the code stop running while I was logged out of RDP? These jobs had been running for 3 hours the last time I logged off :(
*The accuracy got to 0.875 today with just the new improved word list, which I thought might have overfitted the data. This was also rare, because I never got it again
*Ran the improved crawler again to see how it went. (The run started at 10AM. It has been about 5 hours and it has only processed 50% of the list.)
*After painfully watching Firefox crawl (literally) through webpages, I installed chromedriver in the working folder and changed DemoDayCrawler.py back to the Chrome WebDriver
*It seems Firefox has a tendency to pause randomly when I don't log into RDP and keep an eye on it. Chrome resolves this problem

2018-06-29:
*Delegated building the training data to Augi.
*Started work on the classifier by studying machine learning models
*Edited words.txt with new words and removed words that I don't think help with the classification. Removed: march. Added: rundown, list, mentors, overview, graduating, company, founders, autumn.
*The new words.txt increased the accuracy from 0.76 to 0.83 on the first run.
*The accuracy really fluctuated: it got as low as 0.74, but the highest run has been 0.866.
*Note: testing inside KyranGoogleClassifier instead of the main folder, because the main folder was testing the new improved crawler.
*It also seemed that rundown and autumn were the least important, with 0.0 scores, so I removed them

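Importance scores like the 0.0 for rundown and autumn can be read off a fitted random forest; a sketch on toy data (the feature names and dataset are illustrative, not the real words.txt features):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in: the real features were keyword counts from words.txt.
feature_names = ["list", "mentors", "overview", "rundown", "autumn"]
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X, y)

# feature_importances_ sums to 1; near-zero entries are pruning candidates.
for name, score in sorted(zip(feature_names, clf.feature_importances_),
                          key=lambda pair: -pair[1]):
    print(f"{name}: {score:.3f}")
```

Dropping the near-zero features and re-running, as done above with rundown and autumn, is a quick sanity check that they carried no signal.
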
2018-06-28: