=Lauren's LOG=
Things are good! Today I made the program so that we can get however many pages of search results we want and pull the PDF links from all the results we can see. Towards the end of the day, Google Scholar picked up that we were a robot and started blocking me. Hopefully the block goes away by the time I am back on Monday. Now working on parsing the txt file so we can go to the websites we saved and download the PDFs. Should not be particularly difficult.
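For reference, the crawling step works roughly like the sketch below. The actual code isn't in this entry, so the Scholar URL parameters, the CSS selector for the [PDF] sidebar links, the delay length, and the output filename are all illustrative guesses, not the exact program:

<pre>
# Rough sketch of the crawler step: fetch N pages of Google Scholar results
# for a search term and save every direct PDF link to a text file.
# The selectors and parameters here are assumptions about Scholar's markup.
import time
import requests
from bs4 import BeautifulSoup

def collect_pdf_links(term, num_pages, outfile="pdf_links.txt"):
    headers = {"User-Agent": "Mozilla/5.0"}  # bare requests get blocked quickly
    links = []
    for page in range(num_pages):
        # Scholar paginates with the 'start' parameter, 10 results per page
        resp = requests.get(
            "https://scholar.google.com/scholar",
            params={"q": term, "start": page * 10},
            headers=headers,
        )
        soup = BeautifulSoup(resp.text, "html.parser")
        # The [PDF] badge links sit in a sidebar div next to each result
        for anchor in soup.select("div.gs_ggs a"):
            href = anchor.get("href")
            if href:
                links.append(href)
        time.sleep(30)  # go slowly; Scholar blocks clients that look like robots
    with open(outfile, "w") as f:
        f.write("\n".join(links))
    return links
</pre>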
 
'''11/28'''
 
Basically everything is ready to go, so long as Google Scholar leaves me alone. We currently have a program that takes in a search term and the number of pages you want to search. The crawler will pull as many PDF links from those pages as possible (it'll go slowly to avoid getting caught). Next, it will download all the PDFs discovered by the crawler (and possibly also save the links for journals whose PDFs were not linked on Scholar). It will then convert all the PDFs to text. Finally, it will search through each paper for a list of terms and for any definitions of patent thickets. I will be writing documentation for these pieces of code today.
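The downstream steps (download, convert to text, search) look roughly like the sketch below, assuming the crawler wrote one URL per line to pdf_links.txt. The pdfminer call stands in for whatever PDF-to-text converter the real code uses, and the term list and definition-matching pattern are just examples of how the search step might work:

<pre>
# Rough sketch of the pipeline after crawling: download each PDF, convert it
# to text, then scan the text for search terms and patent-thicket definitions.
# Filenames, terms, and the regex are placeholders, not the actual program.
import re
import requests
from pdfminer.high_level import extract_text

TERMS = ["patent thicket", "licensing", "royalty stacking"]  # example terms

def download_and_search(link_file="pdf_links.txt"):
    with open(link_file) as f:
        urls = [line.strip() for line in f if line.strip()]
    for i, url in enumerate(urls):
        pdf_path = "paper_%d.pdf" % i
        with open(pdf_path, "wb") as out:
            out.write(requests.get(url).content)  # save the PDF locally
        text = extract_text(pdf_path).lower()     # convert the PDF to text
        hits = [t for t in TERMS if t in text]
        # pull sentences that look like a definition of a patent thicket
        definitions = re.findall(
            r"[^.]*patent thickets? (?:is|are|refers? to)[^.]*\.", text
        )
        print(url, hits, definitions[:1])
</pre>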