1,553 bytes added, 13:41, 21 September 2020
{{Project
|Has project output=Data
|Has sponsor=McNair Center
|Has title=Collecting SBIR Data
|Has owner=Adrian Smart,
|Has start date=June 6, 2017
|Has keywords=Data, Tool
|Has project status=Complete
|Does subsume=SBIR Evaluation,
}}
==Manual Collection==
 
Files are in:
E:\McNair\Projects\SBIR
Each file contains a group of 1000 companies, and the groups are numbered sequentially.
==Rough notes==
*Get the data from https://www.sbir.gov/sbirsearch/award/all
*A Selenium Web Driver script was built and is stored in E:\McNair\Software\Scripts\Selenium Web Drivers
*This approach does not work because a captcha must be entered after selecting the xls download

==Notes on building a Selenium Web Driver:==
In your Python script:
*Make sure that you properly set the chromedriver path if you don't have it under root. For example: webdriver.Chrome("/Users/adriansmart/PycharmProjects/SeleniumTest/chromedriver")
*Use driver.find_element_by_xpath to select an element on the HTML page. This function takes an XPath as its argument, so first load the website in a browser.
*Next, right-click the page element whose XPath you want and select Inspect. This launches the HTML inspector and highlights the relevant lines of code.
*Right-click what looks like the right piece of code and select "Copy xpath data".
*Paste the copied XPath into your Python script where a path is expected. For example: driver.find_element_by_xpath("//*[@id='solr-print-dropdown-button']")

=SBIR Concatenation=
==Objective==
The objective of this project was to concatenate 162 xlsx files into one large tab-delimited text file.

==Script==
The Python script can be found here: E:\McNair\Projects\SBIR\concat_excel.py

The resulting file is located here: E:\McNair\Projects\SBIR\SBIR.txt
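
The contents of concat_excel.py are not reproduced on this page, so the following is only a minimal sketch of how such a concatenation could be done, not the project's actual script. It assumes that every workbook keeps its data on the first worksheet, that all workbooks share the same header row, and that the third-party openpyxl package is available. The function names concat_to_tsv and read_xlsx_rows are hypothetical.

```python
import csv


def read_xlsx_rows(path):
    """Yield each row of the first worksheet as a list of cell values.

    Assumption: the third-party openpyxl package is installed. It is
    imported here so the rest of the sketch can be used without it.
    """
    from openpyxl import load_workbook

    workbook = load_workbook(path, read_only=True)
    try:
        for row in workbook.worksheets[0].iter_rows(values_only=True):
            # Empty cells come back as None; write them as empty strings.
            yield ["" if value is None else value for value in row]
    finally:
        workbook.close()


def concat_to_tsv(paths, out_path, read_rows=read_xlsx_rows):
    """Write the rows of every input file to one tab-delimited text file.

    The header row is taken from the first non-empty file and written once.
    The first row of each later file is checked against it and skipped.
    Returns the number of data rows written.
    """
    header = None
    written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        for path in paths:
            rows = iter(read_rows(path))
            first = next(rows, None)
            if first is None:
                continue  # nothing in this file
            first = list(first)
            if header is None:
                header = first
                writer.writerow(header)
            elif first != header:
                raise ValueError(f"{path}: header differs from the first file")
            for row in rows:
                writer.writerow(row)
                written += 1
    return written


# Hypothetical usage (the *.xlsx file naming is an assumption):
# from pathlib import Path
# files = sorted(Path(r"E:\McNair\Projects\SBIR").glob("*.xlsx"))
# concat_to_tsv(files, r"E:\McNair\Projects\SBIR\SBIR.txt")
```

Raising an error on a mismatched header is a deliberate choice in this sketch: it stops the run instead of silently misaligning columns in the combined file. Note that sorted() orders file names as text, so if row order matters the numbered files may need a numeric sort key.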
