Scholar Crawler Main Program

Revision as of 13:47, 21 September 2020 by Ed

Project Information
Title: Scholar Crawler Main Program
Owner: Christy Warden
Start date: 10/23/2017
Keywords: Google Scholar, Python
Project status: Active
Sponsor: McNair Center
Project output: Tool


This code is located at E:/McNair/Software/Google_Scholar_Crawler/. It calls on several other pieces of code to form a cohesive program for the patent thicket project. The program takes in a search term and a number of pages. It then searches Google Scholar for that term, downloads as many papers as it can from the results, converts them to text, and searches the text for key terms and a definition of patent thicket. Each piece of code can also be used individually for other applications.

Stage 1

Sets up a series of directories for results to go in.
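
As a rough illustration of this stage, the sketch below creates one results folder per search term with a subfolder for each later stage. The function name make_result_dirs and the subfolder names (pdfs, text, results) are assumptions made for the example, not the names used in the actual program.

```python
from pathlib import Path

# Hypothetical subfolder names, one per later stage of the pipeline.
SUBDIRS = ("pdfs", "text", "results")


def make_result_dirs(base_dir, search_term, subdirs=SUBDIRS):
    """Create base_dir/<search term>/<subdir> for each subdir.

    Returns a dict mapping each subfolder name to its Path.
    """
    # Replace spaces so the search term makes a convenient folder name.
    root = Path(base_dir) / search_term.strip().replace(" ", "_")
    paths = {}
    for name in subdirs:
        path = root / name
        # exist_ok=True lets the program be rerun for the same search term.
        path.mkdir(parents=True, exist_ok=True)
        paths[name] = path
    return paths
```

Calling make_result_dirs with a base folder and the search term "patent thicket" would create a patent_thicket folder containing the three subfolders.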

Stage 2

Runs the Google Scholar Crawler (see the heading of that name).

Stage 3

PDF Downloader
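
A minimal sketch of what a download step could look like, using only the Python standard library. The function name download_pdf, the fallback file name paper.pdf, and the check for the %PDF file signature are assumptions for illustration; the actual PDF Downloader may work differently.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def download_pdf(url, dest_dir, timeout=30):
    """Save the PDF at url into dest_dir.

    Returns the saved Path, or None if the download fails or the
    response does not look like a PDF.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Take the file name from the URL path, with a hypothetical fallback.
    name = Path(urlparse(url).path).name or "paper.pdf"
    if not name.lower().endswith(".pdf"):
        name += ".pdf"

    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = response.read()
    except (OSError, ValueError):
        # Network errors, timeouts, and malformed URLs all count as failures.
        return None

    # PDF files begin with the bytes "%PDF"; skip HTML pages and the like.
    if not data.startswith(b"%PDF"):
        return None

    target = dest_dir / name
    target.write_bytes(data)
    return target
```

Returning None instead of raising lets a caller loop over many search results and simply skip the papers that cannot be downloaded.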

Stage 4

PDF to Text Converter

Stage 5

Key Terms Search
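
The sketch below shows one way the key term search and the search for a definition of patent thicket could be approached on the converted text. The function names count_key_terms and find_definition_sentences are hypothetical, and treating every sentence that mentions the phrase as a candidate definition is an assumption, not necessarily the method the program uses.

```python
import re


def count_key_terms(text, terms):
    """Return {term: count} of case-insensitive whole-phrase matches."""
    counts = {}
    for term in terms:
        # Allow any whitespace (including line breaks) between the words.
        words = [re.escape(word) for word in term.split()]
        pattern = r"\b" + r"\s+".join(words) + r"\b"
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def find_definition_sentences(text, phrase="patent thicket"):
    """Return the sentences that mention phrase, as candidate definitions."""
    # Collapse line breaks and repeated spaces left over from PDF conversion.
    flat = " ".join(text.split())
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", flat)
    needle = phrase.lower()
    return [s for s in sentences if needle in s.lower()]
```

The sentence split is deliberately simple and will break on abbreviations such as "e.g."; a reviewer would still need to read the candidate sentences to decide which ones are actual definitions.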