Google Scholar Crawler

Revision as of 17:00, 10 November 2016


McNair Project
Copyright © 2016 edegan.com. All Rights Reserved.


Overview

Google does not provide an official API for Google Scholar. This page investigates alternative methods for crawling and parsing Google Scholar data.

Existing Libraries

A couple of Python parsers for Google Scholar exist, but none of them meets all of our requirements for this crawler.

Scholar.py

The scholar.py script (https://github.com/ckreibich/scholar.py) is the most extensive command-line tool for parsing Google Scholar information. Given a search query, it returns, for each result, fields such as the title, URL, year, number of citations, number of versions, Cluster ID, Citations list, Versions list, and an excerpt.
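As a rough illustration of these fields, the sketch below models one result as a Python dataclass. The class name `ScholarResult`, its field names, and the two URL-building methods are hypothetical and are not part of scholar.py. The URL patterns are only inferred from the sample output further down this page, so Google may change them at any time.

```python
from dataclasses import dataclass
from urllib.parse import urlencode

# Host and path as they appear in the sample output on this page.
SCHOLAR_BASE = "http://scholar.google.com/scholar"


@dataclass
class ScholarResult:
    """One search result. Field names are illustrative, not scholar.py's own."""
    title: str
    url: str
    year: int
    citations: int
    versions: int
    cluster_id: str  # kept as a string because the IDs run to 20 digits
    excerpt: str = ""

    def citations_list_url(self) -> str:
        # Assumption: pattern copied from the "Citations list" lines in the sample output.
        query = {"cites": self.cluster_id, "as_sdt": "2005", "sciodt": "0,5", "hl": "en"}
        return SCHOLAR_BASE + "?" + urlencode(query, safe=",")

    def versions_list_url(self) -> str:
        # Assumption: pattern copied from the "Versions list" lines in the sample output.
        query = {"cluster": self.cluster_id, "hl": "en", "as_sdt": "0,5"}
        return SCHOLAR_BASE + "?" + urlencode(query, safe=",")


result = ScholarResult(
    title="Profiting from technological innovation: Implications for "
          "integration, collaboration, licensing and public policy",
    url="http://www.sciencedirect.com/science/article/pii/0048733386900272",
    year=1986,
    citations=10397,
    versions=38,
    cluster_id="14785720633759689821",
)
print(result.citations_list_url())
print(result.versions_list_url())
```

Both list URLs appear to be keyed only by the Cluster ID, which suggests that the Cluster ID is the one field a crawler needs to store in order to follow citations and versions later.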

For example, once scholar.py is downloaded and its dependencies are installed, the following command:

python scholar.py -c 3 --phrase "innovation" 

produces the following result:

         Title Mastering the dynamics of innovation: how companies can seize opportunities in the face of technological change
           URL http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1496719
          Year 1994
     Citations 5107
      Versions 5
    Cluster ID 6139131108983230018
Citations list http://scholar.google.com/scholar?cites=6139131108983230018&as_sdt=2005&sciodt=0,5&hl=en
 Versions list http://scholar.google.com/scholar?cluster=6139131108983230018&hl=en&as_sdt=0,5
       Excerpt Abstract: Explores how innovation transforms industries, suggesting a strategic model to help firms to adjust to ever-shifting market dynamics. Understanding and adapting to innovation--'at once the creator and destroyer of industries and corporations'--is essential ...

         Title National innovation systems: a comparative analysis
           URL http://books.google.com/books?hl=en&lr=&id=YFDGjgxc2CYC&oi=fnd&pg=PR7&dq=%22innovation%22&ots=Opaxro2BTV&sig=9-svcPMAzs8nHezDp94Z-HATdRk
          Year 1993
     Citations 8590
      Versions 6
    Cluster ID 13756840170990063961
Citations list http://scholar.google.com/scholar?cites=13756840170990063961&as_sdt=2005&sciodt=0,5&hl=en
 Versions list http://scholar.google.com/scholar?cluster=13756840170990063961&hl=en&as_sdt=0,5
       Excerpt The slowdown of growth in Western industrialized nations in the last twenty years, along with the rise of Japan as a major economic and technological power (and enhanced technical sophistication of Taiwan, Korea, and other NICs) has led to what the authors believe to be ...

         Title Profiting from technological innovation: Implications for integration, collaboration, licensing and public policy
           URL http://www.sciencedirect.com/science/article/pii/0048733386900272
          Year 1986
     Citations 10397
      Versions 38
    Cluster ID 14785720633759689821
Citations list http://scholar.google.com/scholar?cites=14785720633759689821&as_sdt=2005&sciodt=0,5&hl=en
 Versions list http://scholar.google.com/scholar?cluster=14785720633759689821&hl=en&as_sdt=0,5
       Excerpt Abstract This paper attempts to explain why innovating firms often fail to obtain significant economic returns from an innovation, while customers, imitators and other industry participants benefit Business strategy—particularly as it relates to the firm's decision to ...
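
Because this output is plain text rather than structured data, a crawler built on scholar.py would have to turn it back into records. The following is a minimal sketch of one way to do that, assuming each line starts with one of the labels seen in the sample output above and that every result begins with a Title line. The function name `parse_scholar_output` is hypothetical, and the sample string is abridged from the output above.

```python
# Field labels as they appear in the sample output. Longer labels come first
# so that "Citations list" is not mistaken for "Citations".
LABELS = (
    "Citations list", "Versions list", "Cluster ID",
    "Citations", "Versions", "Title", "URL", "Year", "Excerpt",
)


def parse_scholar_output(text):
    """Split scholar.py's plain-text output into one dict per result."""
    records = []
    current = None      # dict for the result being read
    last_label = None   # label of the most recently read field
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        label = next((name for name in LABELS if line.startswith(name + " ")), None)
        if label is None:
            # No known label: treat the line as a wrapped continuation
            # of the previous field (typically a long excerpt).
            if current is not None and last_label is not None:
                current[last_label] += " " + line
            continue
        if label == "Title":
            # Assumption: a Title line always opens a new result.
            current = {}
            records.append(current)
        if current is None:
            continue  # ignore anything before the first Title
        current[label] = line[len(label):].strip()
        last_label = label
    return records


sample = """\
         Title National innovation systems: a comparative analysis
          Year 1993
     Citations 8590
      Versions 6
    Cluster ID 13756840170990063961

         Title Profiting from technological innovation: Implications for integration, collaboration, licensing and public policy
          Year 1986
     Citations 10397
      Versions 38
    Cluster ID 14785720633759689821
"""

for record in parse_scholar_output(sample):
    print(record["Year"], record["Citations"], record["Title"])
```

All values are kept as strings. One known weakness of this approach is that a wrapped excerpt line that happens to begin with a label word such as "Year" would be misread as a new field.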