[[Grace Tan]] [[Work Logs]] [[Grace Tan (Work Log)|(log page)]]
 
2018-07-13: Fixed the problem where PDF URLs were not being saved to the txt file. Created a second txt file to save URLs that are not PDFs. Didn't run into a single reCAPTCHA all morning. Towards the end of the day, reCAPTCHA started catching me at the 7th query, which forced the program to restart.
 
2018-07-12: Figured out how to save BibTeX files to the computer. Still had to do reCAPTCHA tests. After giving it some time, I was able to run the crawler to completion once, but I only got 49 BibTeX files and about half as many PDF links. When I tried to work on it further, the reCAPTCHA wouldn't load and gave me the error "Cannot contact reCAPTCHA. Check your connection and try again." I ended up moving to the Selenium computer and spent the rest of the day converting the code to Python 3 and debugging the regex, which for some reason wasn't matching the text correctly.
2018-07-11: Started on the Google Scholar Crawler for the Patent Thicket Project. I'm not sure what the problem is. The code seems to work, except that Google constantly blocks me with reCAPTCHA tests. I am also not sure whether the crawler is saving any data to txt files and, if so, where those files are located.