Read the tutorial and instructions before pushing anything to the git server.
=Repositories on McNair [[Software Repository|git server]]=
=Tools not currently in the repository=
=Tools in the repository=
==Center IT Sysadmin==
This repository contains all tools and scripts meant for system administration, such as backup scripts.
*See the [[Center IT]] page for current documentation.
 
==Harvard Dataverse==
This repository contains all tools and scripts related to Harvard Dataverse.
*The [[Harvard Dataverse]] page provides instructions on how to access the data.
----
 
'''Contents'''
 
''cleaning db.sql''

''copytable.sql''
 
''createtables.sql''
 
''droptables.sql''
 
==ExecutiveOrderCrawler==
This repository downloads and parses Executive Orders from Archives.gov using Scrapy and Selenium.
 
This project is documented on the McNair Wiki under the Code section/Executive Orders at: http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report#Code
----
'''Contents'''
 
''Order_links.txt''
This text file contains a list of links to the executive orders that need to be downloaded.
 
''Executive Spider''
The executive folder contains all the necessary components to run a successful Scrapy crawl.
 
''Extractor.py''
This script runs through the text files of the executive orders, and outputs a CSV with a 1 if the order hit a buzzword, and a 0 if it did not.
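
The flagging step described for ''Extractor.py'' can be sketched roughly as follows. This is a minimal illustration, not the script itself: the buzzword list, the function names <code>hits_buzzword</code> and <code>write_flags</code>, the column names, and the use of an in-memory dictionary in place of the downloaded text files are all assumptions made for the example.

```python
import csv

# Hypothetical buzzword list; the list actually used by Extractor.py is not given here.
BUZZWORDS = ["entrepreneur", "small business", "innovation"]


def hits_buzzword(text, buzzwords=BUZZWORDS):
    """Return 1 if any buzzword occurs in the text (case-insensitive), else 0."""
    lowered = text.lower()
    return int(any(word.lower() in lowered for word in buzzwords))


def write_flags(orders, out_file, buzzwords=BUZZWORDS):
    """Write one CSV row per order: its name and a 1/0 buzzword flag.

    `orders` maps a (hypothetical) file name to the order's full text.
    """
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["order", "hit"])
    for name, text in sorted(orders.items()):
        writer.writerow([name, hits_buzzword(text, buzzwords)])
```

To try it, pass a dictionary of order texts and any writable text file object, for example one opened with <code>open("flags.csv", "w", newline="")</code>.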
 
==Geocoding Inventor Locations==
This repository holds software for matching Inventor addresses to known locations.
Two scripts do the same job: an older one implemented in Perl and a newer one implemented in Python. You should probably use the newer tool.
*See [[Geocoding Inventor Locations (Tool)]] for documentation on the older version implemented in Perl.
*See [[Geocode.py]] for the newer version in Python.
 
----
'''Contents'''
 
''MatchLocations.pl'' Perl version
 
''Geocode.py'' Python version
 
==GovTrack==
 
The code in this repository scrapes the GovTrack website and runs analytics on the retrieved data. This could prove helpful in the ongoing entrepreneurship research at the McNair Center.
----
'''Contents'''
 
''GovPassed.py''
Python code. Filters the bills, keeping only those that passed.
 
''GovTrack.py''
Python code. Takes all bills from a directory and creates a tab-delimited text file of their stats, such as number of words and number of entrepreneurship buzzwords.
 
''Govtrack_webcrawler.pl'' Perl code.
 
''Govtrack_webcrawler_AllEnactedBills.pl'' Perl code.
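
The kind of per-bill statistics described for ''GovTrack.py'' above can be sketched as follows. This is only an illustration under stated assumptions: the buzzword list, the function names <code>bill_stats</code> and <code>write_stats</code>, and the column names are hypothetical, and the bills are passed in as a dictionary rather than read from a directory.

```python
import csv
import re

# Hypothetical entrepreneurship buzzwords; the real list is not given on this page.
BUZZWORDS = ["entrepreneur", "startup", "small business"]


def bill_stats(text, buzzwords=BUZZWORDS):
    """Return (number of words, number of buzzword occurrences) for one bill."""
    words = re.findall(r"[A-Za-z']+", text)
    lowered = text.lower()
    # Substring counting, so "entrepreneurs" also counts as a hit for "entrepreneur".
    buzz_hits = sum(lowered.count(word.lower()) for word in buzzwords)
    return len(words), buzz_hits


def write_stats(bills, out_file, buzzwords=BUZZWORDS):
    """Write a tab-delimited table with one row of stats per bill.

    `bills` maps a (hypothetical) file name to the bill's full text.
    """
    writer = csv.writer(out_file, delimiter="\t", lineterminator="\n")
    writer.writerow(["bill", "num_words", "num_buzzwords"])
    for name, text in sorted(bills.items()):
        num_words, num_buzz = bill_stats(text, buzzwords)
        writer.writerow([name, num_words, num_buzz])
```

Counting substrings keeps the sketch short but also matches buzzwords inside longer words; a stricter version would match on word boundaries.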
==Geocoding Inventor Locations==
*The [[Patent Data (Tool)]] and [[Patent Data Extraction Scripts (Tool)]] pages on the wiki describe our Patent Database schema and the corresponding XML parsing tools.
*Also see [[USPTO Assignees Data]], which explains the Patent Assignee Database schema and the relevant XML parsing tools.
 
==TwitterAPI==
Python code for interacting with Twitter.

This project is documented on the McNair Wiki at:
 
http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_1)
 
http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_2)
----
'''Contents'''
 
''AutoFollower.py''
Twitter crawler application 1 above. Incomplete file attempting to automate the process.
 
''Automate.py''
Twitter crawler application 2 above. Complete Python code.
 
''InfoGrabber.py''
Gets information about a given Twitter user.
 
''Twitter_Follower_Finder.py''
The first instantiation of the Twitter Crawler application 1 described above.
==Utilities==
*[[PhD Masterclass - How to Build a Web Crawler]]: Ed's class on building a web crawler.
*[[LinkedIn Crawler (Tool)]]: A web scraper for LinkedIn.
 
[[category:McNair Admin]]
[[admin_classification::Software Repository| ]]
