170 bytes added ,  13:47, 21 September 2020
{{Project|Has project output=Tool|Has sponsor=McNair Center
|Has title=Patent Thicket
|Has owner=Grace Tan
===Location of Files===
E:://McNair/Software/Patent_Thicket
Downloaded PDFs:
===Google Scholar Crawler===
Used [[Google Scholar Crawler]].
I ran it on the selenium box and switched among the Rice Visitor, Rice Owls, and eduroam networks to prevent Google Scholar from blocking me.
This program converts all PDFs to txt files. It also generates two log files, _LOG_ERR.txt and _LOG_RUN.txt, which list the names of the PDFs that could not be converted and of those that were converted successfully, respectively. Some of the successfully converted files, especially the very small ones, do not contain the text of the paper.
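The batch conversion described above could look roughly like the following sketch. It is not the actual program used for this project: the function names, the <code>min_bytes</code> threshold, and the choice of Poppler's <code>pdftotext</code> command-line tool as the converter are all assumptions made for illustration. It writes the two log files and also returns a list of "suspicious" outputs, meaning files that converted without error but are too small to plausibly hold the paper's text.

```python
from pathlib import Path
import subprocess


def pdftotext_convert(pdf_path, txt_path):
    """Assumed converter: runs Poppler's pdftotext CLI, raising on failure."""
    subprocess.run(["pdftotext", str(pdf_path), str(txt_path)], check=True)


def convert_all(pdf_dir, txt_dir, convert=pdftotext_convert, min_bytes=200):
    """Convert every *.pdf in pdf_dir and write _LOG_RUN.txt / _LOG_ERR.txt.

    min_bytes is a hypothetical cutoff for flagging near-empty txt files.
    """
    pdf_dir, txt_dir = Path(pdf_dir), Path(txt_dir)
    txt_dir.mkdir(parents=True, exist_ok=True)
    ok, failed, suspicious = [], [], []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        txt = txt_dir / (pdf.stem + ".txt")
        try:
            convert(pdf, txt)
        except Exception:
            # Any converter error counts as a failed conversion.
            failed.append(pdf.name)
            continue
        ok.append(pdf.name)
        # A conversion can "succeed" yet produce little or no text.
        if not txt.exists() or txt.stat().st_size < min_bytes:
            suspicious.append(pdf.name)
    # One file name per line in each log.
    (txt_dir / "_LOG_RUN.txt").write_text("".join(n + "\n" for n in ok))
    (txt_dir / "_LOG_ERR.txt").write_text("".join(n + "\n" for n in failed))
    return ok, failed, suspicious
```

Passing the converter in as a parameter keeps the logging logic independent of whichever PDF tool is actually installed, and the suspicious list gives a starting point for checking the very small txt files by hand.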
 
There were 573 successful txt files and 36 files that failed to convert. That adds up to 609 rather than 608, and I'm not sure why.
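One way to track down an off-by-one like this is to compare the names in the two logs against the list of PDFs. The sketch below is only a suggestion: the function names are hypothetical, and it assumes each log holds one file name per line, which may not match the real log format.

```python
from collections import Counter
from pathlib import Path


def read_names(log_path):
    """Read one name per line, skipping blank lines (assumed log format)."""
    lines = Path(log_path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def reconcile(pdf_names, run_names, err_names):
    """Report names that would make the log totals differ from the PDF count."""
    logged = Counter(run_names) + Counter(err_names)
    pdfs = set(pdf_names)
    return {
        # Listed twice, or present in both the success and the error log.
        "logged_more_than_once": sorted(n for n, c in logged.items() if c > 1),
        # Appears in a log but matches no PDF in the folder.
        "logged_but_not_a_pdf": sorted(set(logged) - pdfs),
        # A PDF in the folder that neither log mentions.
        "pdf_never_logged": sorted(pdfs - set(logged)),
    }
```

A name in <code>logged_more_than_once</code> or <code>logged_but_not_a_pdf</code> would account for one extra entry in the totals.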
