This repository contains all tools and scripts used for system administration (e.g., backup scripts).
*See the [[Center IT]] page for current documentation.
 
==Matcher==
This repository contains the matcher tool, which is used to match firm names given two lists.
*See [[The Matcher (Tool)]] for documentation.

==Patent Data Parser==
This repository contains all tools developed for patent data parsing.
*The [[Patent Data (Tool)]] and [[Patent Data Extraction Scripts (Tool)]] pages on the wiki describe our Patent Database schema and the corresponding XML parsing tools.
*Also see [[USPTO Assignees Data]], which explains the Patent Assignee Database schema and the relevant XML parsing tools.
==ExecutiveOrderCrawler==
''Extractor.py''
This script runs through the text files of the executive orders and outputs a CSV with a 1 if an order contains a buzzword and a 0 if it does not.
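The general idea can be sketched briefly. The snippet below is an illustration only, not the actual ''Extractor.py'': the buzzword list, directory name, and output columns are placeholder assumptions.
<pre>
# Illustrative sketch only: the real Extractor.py, its buzzword list, and its
# input/output layout are assumptions here, not the script's actual code.
import csv
import glob
import os

BUZZWORDS = ["regulation", "tariff"]  # placeholder buzzwords, not the real list

def flag_orders(text_dir="executive_orders", out_csv="buzzword_hits.csv"):
    """Write one CSV row per executive-order text file: filename and a 0/1 buzzword flag."""
    with open(out_csv, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["file", "hit"])
        for path in sorted(glob.glob(os.path.join(text_dir, "*.txt"))):
            with open(path, encoding="utf-8", errors="ignore") as order:
                text = order.read().lower()
            hit = int(any(word in text for word in BUZZWORDS))
            writer.writerow([os.path.basename(path), hit])

if __name__ == "__main__":
    flag_orders()
</pre>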
 
==Geocoding Inventor Locations==
This repository holds software for matching inventor addresses to known locations.
There are two programs that do the same job: one is implemented in Perl (old) and the other in Python (new). You should probably use the newer tool. A schematic sketch of the matching step is included after the contents listing below.
*See [[Geocoding Inventor Locations (Tool)]] for documentation on the older version implemented in Perl.
*See [[Geocode.py]] for the newer version in Python.
 
----
'''Contents'''
 
''MatchLocations.pl'' Perl version
 
''Geocode.py'' Python version
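As a schematic illustration of the matching step, the sketch below normalizes a free-text city/state pair and looks it up in a table of known locations. This is not the actual ''MatchLocations.pl'' / ''Geocode.py'' logic; the lookup table, the normalization rules, and the address format are assumptions for illustration.
<pre>
# Minimal illustration of the address-to-known-location matching idea; the
# actual MatchLocations.pl / Geocode.py logic and data files are not shown here.
import re

# Hypothetical lookup of known locations keyed by (city, state) strings.
KNOWN_LOCATIONS = {
    ("houston", "tx"): (29.7604, -95.3698),
    ("cambridge", "ma"): (42.3736, -71.1097),
}

def normalize(part):
    """Lower-case one address component and strip punctuation and extra whitespace."""
    return re.sub(r"[^a-z ]", "", part.lower()).strip()

def match_address(address):
    """Return (lat, lon) for a 'City, ST' style inventor address, or None if unknown."""
    parts = [normalize(p) for p in address.split(",")]
    if len(parts) < 2:
        return None
    return KNOWN_LOCATIONS.get((parts[-2], parts[-1]))

print(match_address("Houston, TX"))   # (29.7604, -95.3698)
print(match_address("Nowhere, ZZ"))   # None
</pre>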
==GovTrack==
''Govtrack_webcrawler_AllEnactedBills.pl'' Perl web crawler for all enacted bills on GovTrack.
 
 
 
==Harvard Dataverse==
This repository contains all tools and scripts related to Harvard Dataverse.
*The [[Harvard Dataverse]] page provides instructions on how to access the data.
----
 
'''Contents'''
 
''cleaning db.sql''
 
''copytable.sql''
 
''createtables.sql''
 
''droptables.sql''
 
==Utilities==
This repository contains various utilities developed for text processing and other generally useful tools. See the wiki pages for each tool's documentation.
*[[Fuzzy match names (Tool)]]
*[[Godo (Tool)]]
*[[Normalizer Documentation | Normalizer]]. On the [[Software Repository|git-server]] there are several variants, such as normalize fixed width and normalize surnames; a sketch of the general normalize-and-compare idea follows this list.
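As a rough illustration of what the fuzzy-match and normalizer utilities do conceptually, the sketch below lower-cases names, strips punctuation and common legal suffixes, and compares the results with a standard string-similarity ratio. The suffix list and example names are assumptions for illustration, not the tools' actual rules.
<pre>
# Schematic normalize-then-compare example; the suffix list and any thresholds
# are illustrative assumptions, not the actual rules used by the tools.
import difflib
import re

SUFFIXES = {"inc", "llc", "corp", "co", "ltd"}  # placeholder legal suffixes

def normalize_name(name):
    """Lower-case, strip punctuation, and drop common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)

def similarity(a, b):
    """String-similarity ratio (0 to 1) of two normalized names."""
    return difflib.SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()

print(normalize_name("ACME Widgets, Inc."))                            # acme widgets
print(round(similarity("ACME Widgets, Inc.", "Acme Widget Corp"), 2))  # high similarity
</pre>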
 
==Web Crawler==
This repository contains all software for web crawlers.
*[[Whois Parser]] pulls the Whois information given a list of URLs; a minimal query sketch follows this list.
*[[PhD Masterclass - How to Build a Web Crawler]]: Ed's class on building a web crawler.
*[[LinkedIn Crawler (Tool)]]: web scraper for LinkedIn.
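To give a sense of what a Whois lookup involves, here is a bare-bones query over the WHOIS protocol (TCP port 43). It is a sketch only, not the actual [[Whois Parser]]: the registry server shown handles .com/.net domains, network access is required, and no parsing of the response fields is attempted.
<pre>
# Minimal WHOIS query sketch; the real Whois Parser tool, its input list format,
# and its output fields are not reproduced here.
import socket

def whois_query(domain, server="whois.verisign-grs.com", port=43):
    """Send a WHOIS query for one domain and return the raw text response."""
    with socket.create_connection((server, port), timeout=10) as sock:
        sock.sendall((domain + "\r\n").encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

# In practice this would loop over the list of URLs/domains fed to the parser.
print(whois_query("example.com")[:300])
</pre>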
[[category:McNair Admin]]
[[admin_classification::Software Repository| ]]
