This repository contains all tools and scripts used for system administration (e.g., backup scripts).
*See the [[Center IT]] page for current documentation.
 
==Matcher==
This repository contains the matcher tool, which is used to match firm names given two lists.
*See [[The Matcher (Tool)]] for documentation.

==Patent Data Parser==
This repository contains all tools developed for patent data parsing.
*The [[Patent Data (Tool)]] and [[Patent Data Extraction Scripts (Tool)]] pages on the wiki describe our Patent Database schema and the corresponding XML parsing tools.
*Also see [[USPTO Assignees Data]], which explains the Patent Assignee Database schema and the relevant XML parsing tools.
==ExecutiveOrderCrawler==
''Extractor.py''
This script runs through the text files of the executive orders and outputs a CSV with a 1 if an order contains a buzzword and a 0 if it does not.
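The general idea can be sketched briefly. The snippet below is an illustration only, not the actual ''Extractor.py'': the buzzword list, directory name, and output columns are placeholder assumptions.
<pre>
# Illustrative sketch only: the real Extractor.py, its buzzword list, and its
# input/output layout are assumptions here, not the script's actual code.
import csv
import glob
import os

BUZZWORDS = ["regulation", "tariff"]  # placeholder buzzwords, not the real list

def flag_orders(text_dir="executive_orders", out_csv="buzzword_hits.csv"):
    """Write one CSV row per executive-order text file: filename and a 0/1 buzzword flag."""
    with open(out_csv, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["file", "hit"])
        for path in sorted(glob.glob(os.path.join(text_dir, "*.txt"))):
            with open(path, encoding="utf-8", errors="ignore") as order:
                text = order.read().lower()
            hit = int(any(word in text for word in BUZZWORDS))
            writer.writerow([os.path.basename(path), hit])

if __name__ == "__main__":
    flag_orders()
</pre>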
 
==Geocoding Inventor Locations==
This repository holds software for matching inventor addresses to known locations.
There are two programs that do the same job: one is implemented in Perl (old) and the other in Python (new). You should probably use the newer tool. A schematic sketch of the matching step is included after the contents listing below.
*See [[Geocoding Inventor Locations (Tool)]] for documentation on the older version implemented in Perl.
*See [[Geocode.py]] for the newer version in Python.
 
----
'''Contents'''
 
''MatchLocations.pl'' Perl version
 
''Geocode.py'' Python version
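As a schematic illustration of the matching step, the sketch below normalizes a free-text city/state pair and looks it up in a table of known locations. This is not the actual ''MatchLocations.pl'' / ''Geocode.py'' logic; the lookup table, the normalization rules, and the address format are assumptions for illustration.
<pre>
# Minimal illustration of the address-to-known-location matching idea; the
# actual MatchLocations.pl / Geocode.py logic and data files are not shown here.
import re

# Hypothetical lookup of known locations keyed by (city, state) strings.
KNOWN_LOCATIONS = {
    ("houston", "tx"): (29.7604, -95.3698),
    ("cambridge", "ma"): (42.3736, -71.1097),
}

def normalize(part):
    """Lower-case one address component and strip punctuation and extra whitespace."""
    return re.sub(r"[^a-z ]", "", part.lower()).strip()

def match_address(address):
    """Return (lat, lon) for a 'City, ST' style inventor address, or None if unknown."""
    parts = [normalize(p) for p in address.split(",")]
    if len(parts) < 2:
        return None
    return KNOWN_LOCATIONS.get((parts[-2], parts[-1]))

print(match_address("Houston, TX"))   # (29.7604, -95.3698)
print(match_address("Nowhere, ZZ"))   # None
</pre>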
==GovTrack==
''Govtrack_webcrawler_AllEnactedBills.pl'' Perl web crawler for all enacted bills on GovTrack.
 
 
 
==Harvard Dataverse==
This repository contains all tools and scripts related to Harvard Dataverse.
*The [[Harvard Dataverse]] page provides instructions on how to access the data.
----
 
'''Contents'''
 
''cleaning db.sql''
 
''copytable.sql''
 
''createtables.sql''
 
''droptables.sql''
 
==Utilities==
This repository contains various utilities developed for text processing and other generally useful tools. See the wiki pages for each tool's documentation.
*[[Fuzzy match names (Tool)]]
*[[Godo (Tool)]]
*[[Normalizer Documentation | Normalizer]]. On the [[Software Repository|git-server]] there are several variants, such as normalize fixed width and normalize surnames; a sketch of the general normalize-and-compare idea follows this list.
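As a rough illustration of what the fuzzy-match and normalizer utilities do conceptually, the sketch below lower-cases names, strips punctuation and common legal suffixes, and compares the results with a standard string-similarity ratio. The suffix list and example names are assumptions for illustration, not the tools' actual rules.
<pre>
# Schematic normalize-then-compare example; the suffix list and any thresholds
# are illustrative assumptions, not the actual rules used by the tools.
import difflib
import re

SUFFIXES = {"inc", "llc", "corp", "co", "ltd"}  # placeholder legal suffixes

def normalize_name(name):
    """Lower-case, strip punctuation, and drop common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)

def similarity(a, b):
    """String-similarity ratio (0 to 1) of two normalized names."""
    return difflib.SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()

print(normalize_name("ACME Widgets, Inc."))                            # acme widgets
print(round(similarity("ACME Widgets, Inc.", "Acme Widget Corp"), 2))  # high similarity
</pre>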
 
==Web Crawler==
This repository contains all software for web crawlers.
*[[Whois Parser]] pulls the Whois information given a list of URLs; a minimal query sketch follows this list.
*[[PhD Masterclass - How to Build a Web Crawler]]: Ed's class on building a web crawler.
*[[LinkedIn Crawler (Tool)]]: web scraper for LinkedIn.
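To give a sense of what a Whois lookup involves, here is a bare-bones query over the WHOIS protocol (TCP port 43). It is a sketch only, not the actual [[Whois Parser]]: the registry server shown handles .com/.net domains, network access is required, and no parsing of the response fields is attempted.
<pre>
# Minimal WHOIS query sketch; the real Whois Parser tool, its input list format,
# and its output fields are not reproduced here.
import socket

def whois_query(domain, server="whois.verisign-grs.com", port=43):
    """Send a WHOIS query for one domain and return the raw text response."""
    with socket.create_connection((server, port), timeout=10) as sock:
        sock.sendall((domain + "\r\n").encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

# In practice this would loop over the list of URLs/domains fed to the parser.
print(whois_query("example.com")[:300])
</pre>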
[[category:McNair Admin]]
[[admin_classification::Software Repository| ]]
