Difference between revisions of "Collecting SBIR Data"

From edegan.com

Revision as of 11:15, 31 October 2017


McNair Project
Collecting SBIR Data
Project Information
Project Title: Collecting SBIR Data
Owner: Adrian Smart
Start Date: June 6, 2017
Keywords: Data, Tool
Has project status: Complete
Subsumes: SBIR Concatenation, SBIR Evaluation
Copyright © 2016 edegan.com. All Rights Reserved.


Manual Collection

Files are in:

E:\McNair\Projects\SBIR

Each file contains a group of 1,000 companies, and the groups are numbered sequentially.

Rough notes

  • Get the data from https://www.sbir.gov/sbirsearch/award/all
  • Built a Selenium web driver script, stored in E:\McNair\Software\Scripts\Selenium Web Drivers
  • The script does not work, because a CAPTCHA must be entered after selecting the XLS download

Notes on building a Selenium Web Driver:

In your Python script:

  • Make sure you set the chromedriver path correctly if chromedriver is not in a location Selenium searches by default. For example: webdriver.Chrome("/Users/adriansmart/PycharmProjects/SeleniumTest/chromedriver")
  • Use driver.find_element_by_xpath to select an element on the HTML page. The function takes the element's XPath, so first load the website in a regular browser to look it up.
  • Next, right-click the page element whose XPath you want and select Inspect. This opens the HTML inspector and highlights the relevant lines of code.
  • Right-click the highlighted line of code and select Copy > Copy XPath.
  • Paste the XPath into your Python script as the function's argument. For example: driver.find_element_by_xpath("//*[@id='solr-print-dropdown-button']")
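The notes above use Selenium 3 syntax; later Selenium 4 releases removed both find_element_by_xpath and the positional driver path to webdriver.Chrome. The following is a minimal sketch of the same steps in Selenium 4 style, not the original script stored in the Scripts folder. The helper names xpath_for_id and click_print_dropdown are hypothetical, and the sketch assumes the search page still has an element with id "solr-print-dropdown-button".

```python
from typing import Optional


def xpath_for_id(element_id: str) -> str:
    """Return an XPath matching any element whose id attribute equals element_id."""
    return f"//*[@id='{element_id}']"


def click_print_dropdown(url: str, chromedriver_path: Optional[str] = None) -> None:
    """Hypothetical helper: open url in Chrome and click the print/export dropdown.

    Assumes Selenium 4 is installed and that the page still contains an
    element with id 'solr-print-dropdown-button'.
    """
    # Imported inside the function so this module loads without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    # Pass an explicit chromedriver path only when Selenium cannot find it by default.
    service = Service(chromedriver_path) if chromedriver_path else Service()
    driver = webdriver.Chrome(service=service)
    try:
        driver.get(url)
        button = driver.find_element(By.XPATH, xpath_for_id("solr-print-dropdown-button"))
        button.click()
        # The XLS download after this step still requires a CAPTCHA,
        # so the export cannot be completed automatically.
    finally:
        driver.quit()
```

A call would look like click_print_dropdown("https://www.sbir.gov/sbirsearch/award/all", "/path/to/chromedriver"), where the chromedriver path is a placeholder. Building the XPath from the id keeps it in the same form as the copied example above.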

SBIR Concatenation

Objective

The objective of this project was to concatenate 162 xlsx files into one large tab-delimited text file.

Script

The Python script can be found here:

  E:\McNair\Projects\SBIR\concat_excel.py

The resulting file is located here:

  E:\McNair\Projects\SBIR\SBIR.txt
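concat_excel.py itself is not reproduced on this page. The following is a minimal sketch of one way to do the same job, assuming that each file's active worksheet begins with the same header row, that each file name contains its group number, and that openpyxl is installed. All function names are hypothetical.

```python
import csv
import glob
import io
import os
import re
from typing import Iterable, List, Sequence


def numeric_key(path: str) -> List[int]:
    """Sort key that orders file names by the numbers they contain (2 before 10)."""
    return [int(n) for n in re.findall(r"\d+", os.path.basename(path))]


def concat_tables(tables: Iterable[Sequence[Sequence[object]]]) -> List[List[object]]:
    """Stack tables that share a header row, keeping the header only once."""
    combined: List[List[object]] = []
    header = None
    for table in tables:
        rows = [list(row) for row in table]
        if not rows:
            continue  # skip empty sheets
        if header is None:
            header = rows[0]
            combined.append(header)
        elif rows[0] != header:
            raise ValueError(f"header mismatch: {rows[0]!r} != {header!r}")
        combined.extend(rows[1:])
    return combined


def rows_to_tsv(rows: Iterable[Sequence[object]]) -> str:
    """Serialize rows as tab-delimited text; None becomes an empty field."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()


def read_xlsx_rows(path: str) -> List[List[object]]:
    """Read every row of the workbook's active worksheet (assumes openpyxl)."""
    from openpyxl import load_workbook  # imported here so the rest runs without it

    wb = load_workbook(path, read_only=True)
    try:
        return [list(row) for row in wb.active.iter_rows(values_only=True)]
    finally:
        wb.close()


def concat_folder(folder: str, out_path: str) -> int:
    """Concatenate every .xlsx file in folder into one TSV; return the file count."""
    paths = sorted(glob.glob(os.path.join(folder, "*.xlsx")), key=numeric_key)
    text = rows_to_tsv(concat_tables(read_xlsx_rows(p) for p in paths))
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        f.write(text)
    return len(paths)
```

For example, concat_folder(r"E:\McNair\Projects\SBIR", r"E:\McNair\Projects\SBIR\SBIR_sketch.txt") would write to a new, hypothetical file name rather than overwrite SBIR.txt. Sorting numerically keeps the groups in their sequential order even when file numbers are not zero-padded. The standard csv module quotes any cell that itself contains a tab; pandas (read_excel, concat, then to_csv with sep="\t") is a common alternative if it is already installed.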