[[Grace Tan]] [[Work Logs]] [[Grace Tan (Work Log)|(log page)]]
2018-07-18: Finished running the rest of the 100 pages. It took quite a long time because Google Scholar was catching me after 2-5 pages rather than 5-10. It helped to switch between the different wifi networks (Rice Visitor, Rice Owls, eduroam). Altogether this resulted in 958 bibtex files and 613 pdfs from 1000 entries. There might be more entries but I'm not sure where to find them. I saved the data and code onto the rdp by connecting to it from the selenium box.
2018-07-17: Ran Google Scholar crawler. When Google Scholar blocks me with a 403 error code, I exit the program and rerun it at the page that it last looked at by clicking on the correct page number before crawling. I finished running through 68/100 pages of Google Scholar.
2018-07-16: Ran through 10 pages of Google Scholar first thing without a problem. Tried running through all 100 pages but kept on getting caught. Helped Augi with discrepancies in data and will try the Google Scholar crawler again tomorrow.
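Note on the restart procedure: the "exit on 403, then rerun from the last page" routine could be sketched roughly as below. This is only a minimal sketch, not the actual crawler code. The names `Blocked`, `fetch_page`, `crawl`, and the JSON checkpoint file are all hypothetical, and the real crawler resumes by clicking the page number in the browser rather than reading a file.

```python
import json
from pathlib import Path


class Blocked(Exception):
    """Hypothetical signal: the page fetcher saw an HTTP 403 response."""


def load_checkpoint(path):
    """Return the next page to crawl, or 1 if no checkpoint exists yet."""
    p = Path(path)
    if p.exists():
        return json.loads(p.read_text())["next_page"]
    return 1


def save_checkpoint(path, next_page):
    """Record the next page to crawl so a later run can resume there."""
    Path(path).write_text(json.dumps({"next_page": next_page}))


def crawl(fetch_page, total_pages, checkpoint_path):
    """Crawl pages in order, starting from the checkpointed page.

    fetch_page(page) is an assumed callable that processes one results
    page and raises Blocked when the site answers with a 403.
    Stops at the first block and returns the next page still to crawl.
    """
    page = load_checkpoint(checkpoint_path)
    while page <= total_pages:
        try:
            fetch_page(page)
        except Blocked:
            break  # exit now; the next run picks up at this same page
        page += 1
        save_checkpoint(checkpoint_path, page)
    return page
```

Rerunning `crawl` with the same checkpoint file after a block continues from the page that was interrupted, so pages that already finished are not fetched again.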
2018-07-13: Fixed problem where pdf urls were not saving to the txt file. Created another txt file to save urls that are not pdfs. Didn't run into a single reCAPTCHA all morning. Towards the end, Google Scholar started catching me at the 7th query and forced the program to restart. For some reason, selecting the css element triggered Google Scholar to detect me. I changed the selector for the "next" button from the css element tag to the path and was able to get through the 4th page. It is still not able to click on the actual link but I'm not sure if that's supposed to do anything.
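Note on the two txt files: splitting urls into a pdf list and a non-pdf list could look something like the sketch below. This is an illustration only, not the crawler's actual code. The function names and file arguments are hypothetical, and treating a url as a pdf when its path ends in ".pdf" is an assumption; the crawler may decide this differently.

```python
from urllib.parse import urlparse


def is_pdf_url(url):
    """Assumed rule: a url counts as a pdf if its path ends in '.pdf'."""
    return urlparse(url).path.lower().endswith(".pdf")


def save_urls(urls, pdf_file, other_file):
    """Append each url to the pdf list or to the non-pdf list.

    Returns (pdf_count, other_count) so the run can be sanity-checked.
    """
    pdf_count = 0
    other_count = 0
    with open(pdf_file, "a", encoding="utf-8") as pdf_out, \
         open(other_file, "a", encoding="utf-8") as other_out:
        for url in urls:
            if is_pdf_url(url):
                pdf_out.write(url + "\n")
                pdf_count += 1
            else:
                other_out.write(url + "\n")
                other_count += 1
    return pdf_count, other_count
```

Opening both files in append mode means a restarted run adds to the existing lists instead of overwriting them.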