
5,428 bytes added, 17:14, 3 August 2018


[[Minh Le]] [[Work Logs]] [[Minh Le (Work Log)|(log page)]]

2018-08-03:

*For some reason, when we search for Capital Innovators, more options appear in the "Tools" section. Need to figure out a way around this. Applied a quick workaround, but nothing permanent yet.

*Finished crawling, started classifying.

*Finished classifying.

*Pushed the batch to MTurk.

2018-08-02:

*Cleaned up the code

*Published the big MTurk batch.

*Got results after 2 hours.

*Processed the data and trimmed extra columns off.

*Helped Grace with some minor coding

*Helped Maxine with the URL classifier

*Improved the crawler to take date arguments, as per Ed's request.

*Ran the crawler again.
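The log does not show how the crawler's date arguments work, so here is a minimal sketch using Python's standard `argparse`, assuming ISO `YYYY-MM-DD` input. The flag names `--start-date`/`--end-date` and the helpers `iso_date`, `google_date_filter` and `main` are hypothetical, not the crawler's real interface. The `cdr:1,cd_min:...,cd_max:...` string is an assumption about Google's custom date-range search parameter and should be checked against the live site before relying on it.

```python
import argparse
from datetime import date, datetime


def iso_date(text):
    """argparse type: parse YYYY-MM-DD into a datetime.date."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        raise argparse.ArgumentTypeError("expected YYYY-MM-DD, got %r" % text)


def build_parser():
    # Hypothetical flag names; the real crawler's arguments may differ.
    parser = argparse.ArgumentParser(description="Search crawler (sketch)")
    parser.add_argument("query", help="search query, e.g. an accelerator name")
    parser.add_argument("--start-date", type=iso_date, required=True)
    parser.add_argument("--end-date", type=iso_date, default=date.today())
    return parser


def google_date_filter(start, end):
    """ASSUMED format of Google's custom date-range parameter (verify before use)."""
    return "cdr:1,cd_min:%d/%d/%d,cd_max:%d/%d/%d" % (
        start.month, start.day, start.year, end.month, end.day, end.year)


def main(argv=None):
    """Parse arguments, validate the range, and return (query, date filter)."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.start_date > args.end_date:
        parser.error("--start-date must not be after --end-date")
    return args.query, google_date_filter(args.start_date, args.end_date)
```

To use it as a script, add `if __name__ == "__main__": print(main())` and call it like `python crawler.py "Capital Innovators" --start-date 2018-07-01 --end-date 2018-08-02` (the script name is hypothetical). Invalid dates make `argparse` print a usage error and exit instead of crawling with a bad range.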

2018-08-01:

*Built the SeedDB parser with Maxine and Connor

*Finished getting the data from SeedDB and sent it to Connor.
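The log does not describe SeedDB's page structure, so this is only a sketch of one way such a parser can work, assuming the data sits in an ordinary HTML `<table>`. It uses the standard-library `html.parser.HTMLParser`; the class `TableParser` and the function `parse_table` are hypothetical names, not the parser built on 2018-08-01.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <a> links) still belongs to the cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return the table's rows as lists of stripped cell text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

The first row usually holds the column headers, so `dict(zip(rows[0], row))` for each later row gives one record per company, which is easy to write out with `csv.DictWriter` before sending it on.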

2018-07-31:

*Talked to Connor and Maxine to figure out SeedDB

*Published the first small MTurk batch with an inter-judge reliability check (2 workers per HIT) and got good results

*Tested SeedDB server
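The log does not say which reliability measure was used for the 2-workers-per-HIT batch. As one common choice, this sketch computes raw percent agreement and Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each worker's label frequencies. It assumes the two workers' answers are aligned in two lists, one entry per HIT; the function names are hypothetical.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of HITs on which the two workers gave the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    n = len(labels_a)
    p_o = percent_agreement(labels_a, labels_b)
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both workers independently pick the same label.
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # convention: both workers used one identical label throughout
    return (p_o - p_e) / (1 - p_e)
```

For example, with `a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]` and `b = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1]`, the workers agree on 8 of 10 HITs (p_o = 0.8), but because both label "1" 60% of the time, p_e = 0.52 and κ = 0.28 / 0.48 ≈ 0.58. Kappa is stricter than raw agreement when one label dominates.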

2018-07-30:

*Finalized the design for MTurk, sent to Ed for thoughts and opinions

*Tried publishing a batch on MTurk using the sandbox, and talked to Connor to test it out together.

2018-07-29:

*Worked on the HTML mockup for MTurk

*Crawled data for MTurk

2018-07-28:

*Worked on the HTML mockup for MTurk

2018-07-27:

*Worked on MTurk

2018-07-26:

*Worked on collecting data with others.

*Skyped with Ed, Hira, and others.

2018-07-25:

*Worked on MTurk with Connor

*Talked with Ed about the project's progress. We agreed that the RNN can wait and that we should focus on collecting data, because the data now seems much more usable.

*Hand-collected data along with fellow interns.

2018-07-24:

*Tried tweaking the model some more, still with no progress. Maybe I should finally switch to word2vec.

*Looked into MTurk

2018-07-23:

*The tuning has not finished yet. However, judging from the partial results, the last 6 parameters do not seem to significantly affect the result.

*The tuning turned out to be fruitless, so I stopped the code.

*Looked into using Yang's preprocessing code.

*Maxine was borrowing my crawler for her work, and she found a bug where the crawler would never take the first result. I think this is because Google updated its results page layout. Anyway, fixed it.

*Worked on the wiki page

2018-07-20:

*Ran parameter tuning over 11 different parameters:

dropout_rate_firstlayer, dropout_rate_secondlayer, rec_dropout_rate_firstlayer, rec_dropout_rate_secondlayer, embedding_vector_length, firstlayer_units, secondlayer_units, dropout_rate_dropout_layer, epochs, batch_size, validation_split

*Talked to Ed about possibly just doing a test run with the random forest model, because we needed data soon.
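An exhaustive grid over 11 parameters grows multiplicatively: even with only two candidate values per parameter it needs 2^11 = 2048 training runs, which helps explain why a full tuning pass takes so long. The log does not say which search strategy or value ranges were used, so the value lists below are HYPOTHETICAL; only the parameter names come from the list above. The sketch shows the exhaustive grid next to random search, a common cheaper alternative; the training and scoring step is deliberately left out.

```python
import itertools
import random

# Parameter names from the log; the value lists are HYPOTHETICAL examples.
PARAMS = {
    "dropout_rate_firstlayer": [0.2, 0.5],
    "dropout_rate_secondlayer": [0.2, 0.5],
    "rec_dropout_rate_firstlayer": [0.0, 0.2],
    "rec_dropout_rate_secondlayer": [0.0, 0.2],
    "embedding_vector_length": [32, 64],
    "firstlayer_units": [50, 100],
    "secondlayer_units": [50, 100],
    "dropout_rate_dropout_layer": [0.2, 0.5],
    "epochs": [5, 10],
    "batch_size": [32, 64],
    "validation_split": [0.1, 0.2],
}


def grid_size(param_values):
    """Number of runs an exhaustive grid search needs (product of list lengths)."""
    size = 1
    for values in param_values.values():
        size *= len(values)
    return size


def grid(param_values):
    """Yield every combination of parameter values as a dict."""
    names = sorted(param_values)
    for combo in itertools.product(*(param_values[n] for n in names)):
        yield dict(zip(names, combo))


def random_search(param_values, n_trials, seed=0):
    """Sample n_trials random configurations instead of trying the full grid."""
    rng = random.Random(seed)
    names = sorted(param_values)
    return [{n: rng.choice(param_values[n]) for n in names} for _ in range(n_trials)]
```

Here `grid_size(PARAMS)` is 2048. If the last 6 parameters really barely matter, as the 2018-07-23 entry suggests, fixing each of them to a single value leaves 2^5 = 32 combinations, a much more manageable grid.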

2018-07-19:

*Helped Grace with her Thicket project

*Helped Maxine with her classifier

*Delegated the data collecting task to Connor

*Continued optimizing the current Keras LSTM. The accuracy is around 50% right now

2018-07-18:

*Edited the wiki page with more content and ideas.

*Tried an MLP with the lbfgs solver and got around 65% test accuracy (with 100% train accuracy, which points to overfitting):

FINISHED classifying. Train accuracy score:

1.0

FINISHED classifying. Test accuracy score:

0.652542372881356

*Building a full-fledged LSTM (not a prototype) to see how things go

2018-07-17:

*Tried tuning the LSTM in Keras but did not manage to increase the accuracy by much. Accuracy fluctuates around 50%

2018-07-16:

*Worked on adapting the data for the RNN

*Installed Keras for BOTH Python 2 and 3.

*For Python 2, installed using the command:

pip install keras

*For Python 3, installed by first cloning the GitHub repo:

git clone https://github.com/keras-team/keras.git

then running the following commands:

cd keras

python3 setup.py install

Normally, running the pip command above should be sufficient, but since we have both Anaconda2 and Anaconda3 installed, pip for some reason can't detect the Anaconda3 folder, so we had to install Keras manually as above.

Note that you can run:

python setup.py install

to install for Python 2 as well (and skip the pip installation). Source: https://keras.io/

*Prototyped a simple LSTM in Keras, and the accuracy was 0.53. This is promising; I expect the accuracy to be much higher once the full model is complete.
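Since pip and the interpreter got out of sync on a machine with both Anaconda2 and Anaconda3, it helps to ask each interpreter directly where it would import Keras from. A minimal sketch using the standard `importlib.util.find_spec` (Python 3.4+ only, so it covers the Python 3 side); the helper names `module_location` and `report` are hypothetical.

```python
import importlib.util
import sys


def module_location(name):
    """Return the file `name` would be imported from by THIS interpreter, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def report(name="keras"):
    """One-line summary: which interpreter is running and whether it sees `name`."""
    location = module_location(name)
    status = location if location else "NOT importable"
    return "%s (Python %d.%d at %s): %s" % (
        name, sys.version_info[0], sys.version_info[1], sys.executable, status)
```

Running `print(report())` under each `python3` on the server shows which installation actually picked up Keras. Separately, `python3 -m pip install keras` runs pip as a module of that exact interpreter, which is a standard way to avoid a pip that points at the wrong Anaconda folder (not tested on this server).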

2018-07-13:

*Finished installing TensorFlow for all users. Created a new folder on the DBServer for working with TensorFlow. The folder can be found here:

Z:\AcceleratorDemoDay

or, if accessing the server from PuTTY, use the following command:

cd /bulk/AcceleratorDemoDay

*The new RNN currently uses word frequencies as input features
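The log says the RNN takes word frequencies as input features but not how they are computed. This is a minimal plain-Python sketch under that assumption: build a shared vocabulary across all documents, then turn each document into a vector of relative word frequencies. The tokenizer and the helper names `tokenize`, `build_vocab` and `frequency_vector` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters, digits and apostrophes (deliberately simple).
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(docs, max_words=None):
    """Map the most frequent words across all docs to column indices."""
    counts = Counter(word for doc in docs for word in tokenize(doc))
    return {word: i for i, (word, _) in enumerate(counts.most_common(max_words))}


def frequency_vector(doc, vocab):
    """Relative frequency of each vocabulary word in one document."""
    tokens = tokenize(doc)
    counts = Counter(t for t in tokens if t in vocab)
    vec = [0.0] * len(vocab)
    for word, c in counts.items():
        vec[vocab[word]] = c / len(tokens)
    return vec
```

One design point: a frequency vector discards word order, which is exactly what an RNN is meant to exploit. An order-preserving alternative with the same vocabulary is the index sequence `[vocab[w] for w in tokenize(doc) if w in vocab]`, which can be fed to an embedding layer.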

2018-07-12:

*Followed the instructions at https://www.tensorflow.org/install/install_linux#InstallingVirtualenv and installed TensorFlow with Wei. The specifics are below.

*1. Installed the CUDA Toolkit 9.0 base installer. The toolkit is in /usr/local/cuda-9.0. Did NOT install the NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81, because we believe we already have a different, much newer graphics driver (396.26). Installed the CUDA 9.0 samples in HOME/MCNAIR/CUDA-SAMPLES.

*2. Installed patches 1, 2 and 3. The command to install patch 2 was:

sudo sh cuda_9.0.176.2_linux.run # (cuda_9.0.176.1_linux.run for patch 1 and cuda_9.0.176.3_linux.run for patch 3)

*3. This was supposed to be the next step:

"""

Set up the environment variables:

The PATH variable needs to include /usr/local/cuda-9.0/bin

To add this path to the PATH variable:

$ export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}

In addition, when using the runfile installation method, the LD_LIBRARY_PATH variable needs to contain /usr/local/cuda-9.0/lib64 on a 64-bit system

To change the environment variables for 64-bit operating systems:

$ export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Note that the above paths change when using a custom install path with the runfile installation method.

"""

But when we went to /usr/local/, we saw cuda-9.2, which we did not install. So we are WAITING for Yang to get back to us so we can proceed.

*For now, I can't build anything without TensorFlow, so I am going to continue classifying data.

*Helped Grace with the regex for her Google Scholar crawler

*All installation notes can be seen here: [[Installing TensorFlow]]
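The `${VAR:+:${VAR}}` expansion in the `export` lines of step 3 expands to `:` followed by the old value only when the variable is already set and non-empty; otherwise it expands to nothing. That avoids leaving a trailing `:`, i.e. an empty path entry, which the shell treats as the current directory. A small Python sketch of the same logic, which can also build an environment for a subprocess without touching the shell profile; the helper names are hypothetical and the default CUDA path assumes the 9.0 install location from step 1.

```python
import os


def prepend_path(new_dir, current):
    """Mimic NEW${VAR:+:${VAR}}: append ':' + old value only if it is non-empty."""
    return new_dir + (":" + current if current else "")


def cuda_env(cuda_home="/usr/local/cuda-9.0", environ=None):
    """Copy of the environment with CUDA's bin and lib64 directories prepended."""
    env = dict(os.environ if environ is None else environ)
    env["PATH"] = prepend_path(cuda_home + "/bin", env.get("PATH", ""))
    env["LD_LIBRARY_PATH"] = prepend_path(
        cuda_home + "/lib64", env.get("LD_LIBRARY_PATH", ""))
    return env
```

Passing the result as `env=cuda_env()` to `subprocess.run` would launch a TensorFlow job against a specific toolkit version, which makes it easy to confirm which CUDA directory is actually being used while the cuda-9.2 question is open.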

2018-07-09 to 2018-07-11:

*With an extended dataset, the accuracy of the random forest model went down. Accuracy: 0.71 (+/- 0.15)

*Wrote code for an RNN, but ran into the problem of not having TensorFlow installed
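The `0.71 (+/- 0.15)` format matches the scikit-learn cross-validation documentation, which prints the mean of the per-fold scores and twice their standard deviation. Assuming the same convention was used here, this sketch reproduces that summary with the standard library. NumPy's default `std` is the population standard deviation, hence `statistics.pstdev`; the function names are hypothetical.

```python
import statistics


def summarize_cv(scores):
    """Mean and two (population) standard deviations of per-fold accuracies."""
    return statistics.mean(scores), 2 * statistics.pstdev(scores)


def format_cv(scores):
    """Render the summary in the 'Accuracy: mean (+/- 2*std)' style."""
    mean, spread = summarize_cv(scores)
    return "Accuracy: %0.2f (+/- %0.2f)" % (mean, spread)
```

For instance, two folds scoring 0.6 and 0.8 give `Accuracy: 0.70 (+/- 0.20)`. A spread of 0.15 around 0.71 is wide, so small differences between runs on different datasets may be within fold-to-fold noise.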

Leminh.ams
