Difference between revisions of "Minh Le (Work Log)"

From edegan.com

Revision as of 13:04, 29 June 2018

Summer 2018

Minh Le Work Logs (log page)

2018-06-29:

  • Delegated building the training data to Augi.
  • Started to work on the classifier by studying machine learning models.

2018-06-28:

  • Continued to find ways to optimize the crawler: added several constraints and blacklisted websites such as Eventbrite, LinkedIn and Twitter. Eventbrite would need a way to bypass its time-expiry script, LinkedIn requires a login before showing details, and Twitter posts were too short and frankly distracting.
  • Ran improved results on the classifier.
  • Classified some training data.
  • Helped Grace debug the LinkedIn Crawler.
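
The blacklist step from this entry could look roughly like the sketch below. It is only an illustration: the three domains are the ones named in the log, while the names BLACKLISTED_DOMAINS, is_blacklisted and filter_results are hypothetical and not taken from the crawler itself.

```python
from urllib.parse import urlparse

# Hypothetical blacklist; the log names only these three sites.
BLACKLISTED_DOMAINS = {"eventbrite.com", "linkedin.com", "twitter.com"}


def is_blacklisted(url, blacklist=BLACKLISTED_DOMAINS):
    """Return True if the URL's host is a blacklisted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blacklist)


def filter_results(urls):
    """Keep only the URLs whose host is not blacklisted, preserving order."""
    return [u for u in urls if not is_blacklisted(u)]
```

Matching on the parsed hostname rather than on a substring of the URL avoids dropping unrelated pages that merely mention a blacklisted site in their path.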

2018-06-27:

  • Worked on optimizing and fixing issues with the crawler.
  • Observed that we may not need to change our criteria for the demo day pages: the page containing the cohort list often includes dates (a piece of data we now need to find). I might add more words to the words bag to improve it further, but that seems unnecessary for now.
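
As a rough illustration of the words-bag criterion plus the date check mentioned above, a page could be scored as follows. The keyword list and the year pattern are assumptions made for this sketch; the actual words bag used by the crawler is not shown in the log.

```python
import re

# Hypothetical keyword list; the actual words bag is not shown in the log.
WORDS_BAG = {"demo", "day", "cohort", "accelerator", "startups"}

# Loose pattern for four-digit years, a rough proxy for "the page includes dates".
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def score_page(text, words_bag=WORDS_BAG):
    """Return (number of words-bag hits, whether a year-like date appears)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = sum(1 for t in tokens if t in words_bag)
    has_date = bool(YEAR_PATTERN.search(text))
    return hits, has_date
```

Adding words to the bag only changes the first value, so the date check can be tuned separately if dates turn out to be the more useful signal.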

2018-06-26:

  • Finished running the Analysis code (for some reason the shell didn't run after I logged off of RDP).
  • Talked to Ed about where to head with the code.
  • Connected the 2 projects together: got rid of Kyran's crawler and Peter's analysis script for now (we might want the analysis code later on to see how good the crawler was).
  • Ran it on the list of accelerators Connor gave me. Got mixed results (probably because the 80% is low), and we had to deal with websites with an expiry timestamp like Eventbrite (the HTML contained the list, but displaying the HTML in the web browser did not show it). Found a problem: the crawler only gets the results from the first page, so it would not work if we want to gather a large number of results.
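
One way to get past the first-page limit is to keep requesting pages until enough results have been collected. The sketch below assumes a hypothetical fetch_page(start, count) callable that returns a list of results beginning at offset start; it stands in for whatever search call the crawler actually makes, which the log does not show.

```python
def collect_results(fetch_page, wanted, page_size=10):
    """Gather up to `wanted` results by requesting successive pages.

    `fetch_page(start, count)` is a hypothetical callable returning a list
    of results that begins at offset `start`.
    """
    results = []
    start = 0
    while len(results) < wanted:
        page = fetch_page(start, page_size)
        if not page:
            # No more results available; stop instead of looping forever.
            break
        results.extend(page)
        start += len(page)
    return results[:wanted]
```

For example, with a fake source such as `lambda s, c: data[s:s + c]`, asking for 15 results walks through two pages and returns the first 15 items.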

2018-06-25:

  • Fixed the Python 3 compatibility issue in Peter's Parser. All code can now be used with Python 3.
  • Ran through everything in the Parser on a small test set.
  • Completed moving all the files.
  • Ran the Parser on the entire list.
  • The run took 3h45m to execute the crawling (not counting the other steps) with 5 results per accelerator.
  • Update @6:00PM: the Analysis has been running for an hour and 30 minutes and is only 80% done. I need to go home now, but these steps are taking a lot of time.

2018-06-22:

  • Moved Peter's Parser into my project folder. Details can be read under the folder "E:\McNair\Projects\Accelerator Demo Day\Notes. READ THIS FIRST\movelog".
  • The current Selenium version and Chrome seem to hate each other on the RDP (throwing a bunch of errors on registry key), so I had to switch to a Firefox webdriver. Adjusted the code and inserted a bunch of sleep statements.
  • For some reason (yet to be understood), saving HTML pages with the utf-8 encoding fails, so I commented that out for now.
  • The code seemed slow compared to the code in Kyran's project. Might attempt to optimize and parallelize it?
  • It seems that Python 3 does not support write(stuff).encoding('utf-8')?
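
On the utf-8 question: in Python 3, a file opened in text mode expects str rather than bytes, so the encoding is normally passed to open() instead of being applied to the string or to the result of write(). A minimal sketch, assuming that was the source of the error; save_html and load_html are illustrative names, not functions from the Parser.

```python
import os
import tempfile


def save_html(path, html):
    """Write an HTML string to disk as UTF-8 (Python 3 text mode)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)


def load_html(path):
    """Read a UTF-8 encoded HTML file back into a string."""
    with open(path, encoding="utf-8") as f:
        return f.read()


# Round trip through a temporary directory with a non-ASCII character.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page.html")
    save_html(path, "<p>café</p>")
    restored = load_html(path)
```

If the page content is already bytes (for example a raw HTTP response body), opening the file with mode "wb" and writing the bytes directly is the other common option.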

2018-06-21:

  • Continued reading through past projects (it's so disorganized...)
  • Moved Kyran's Google Classifier to my project folder. Details can be read under the folder "Notes. READ THIS FIRST\movelog".
  • Tried running the Classifier from a new folder. The shell crashed once on web_demo_feature.py.
  • Ran through everything in the Classifier. Things seemed to be functioning, with occasional error messages.
  • Talked to Kyran about the project and cleared up some confusion.
  • Made a to-do list in the general note file ("Notes. READ THIS FIRST\NotesAndTasks.txt").

2018-06-20:

  • Set up Work Log page.
  • Edited Profile page with more information.
  • Created project page: Accelerator Demo Day.
  • Made new project folder at E:\McNair\Projects\Accelerator Demo Day.
  • Read through old projects and started copying scripts over as well as cleaned things up.
  • Created movelog.txt to track these moving details.
  • Talked to Ed more about the project goals and purposes.

2018-06-19: More SQL. Talked to Ed and received my project (Demo Day Crawler).

2018-06-18: Set up RDPs, Slacks, Profile page and learned about SQL.