Tools, Scripts, and Data
All rights are reserved on my tools and scripts, and on my documentation of how to work with various data sources. The data sources themselves belong to their respective owners.
My four most popular tools are:
- The Matcher: A firm name matching tool capable of joining very large datasets that identify firms by name. It has been used to join the NBER patent data to COMPUSTAT and CRSP for the NBER Patent Data Project, and has a loyal following.
- Normalizer.pl: A script for processing SDC data into third normal form, ready for import into a database.
- STATA-Fix-Regressions.pl: A script that had a loyal following before OUTREG2 (and other tools) got so good.
- BibTucker.pl: A script for doing weird and wonderful things to large files of BibTeX entries.
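The Matcher's own algorithm is not described on this page. As a rough illustration of the general idea behind joining datasets on firm names, here is a minimal Python sketch that standardizes each name (lowercasing, replacing "&" with "and", dropping punctuation, stripping trailing legal-form suffixes) and then joins on the cleaned key. The suffix list, the function names `standardize` and `match`, and the exact-key matching rule are all hypothetical assumptions for this sketch, not The Matcher's implementation.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately short list of legal-form suffixes to strip.
SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co",
            "company", "ltd", "limited", "llc", "plc"}

def standardize(name: str) -> str:
    """Lowercase, replace '&' with 'and', drop punctuation,
    and strip trailing legal-form suffixes."""
    name = name.lower().replace("&", " and ")
    tokens = re.findall(r"[a-z0-9]+", name)
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

def match(left, right):
    """Join two lists of (id, firm_name) pairs on the standardized name.

    Builds a hash index on the right-hand list, so the join is linear in
    the combined size of the inputs. Returns (left_id, right_id) pairs."""
    index = defaultdict(list)
    for rid, rname in right:
        key = standardize(rname)
        if key:  # skip names that reduce to nothing (e.g. just "Co.")
            index[key].append(rid)
    pairs = []
    for lid, lname in left:
        for rid in index.get(standardize(lname), []):
            pairs.append((lid, rid))
    return pairs

# Usage with made-up identifiers:
patents = [("P1", "International Business Machines Corp."), ("P2", "AT&T Inc")]
firms = [("C1", "INTERNATIONAL BUSINESS MACHINES CORP"), ("C2", "AT & T INC")]
print(match(patents, firms))
```

Exact matching on a cleaned key scales well to very large files because it is a single hash join; a production matcher would typically add a much richer standardization dictionary and some form of approximate matching for names that still differ after cleaning.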
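Normalizer.pl's handling of SDC files is not documented here. To illustrate the kind of restructuring that normalization involves, the following minimal Python sketch takes flat deal records with a hypothetical multi-valued `acquirers` field (names separated by semicolons) and splits them into three relations: deals, firms (each name stored once), and a deal-acquirer link table. The field names, the separator, and the function `normalize` are assumptions for illustration; real SDC exports use their own layouts.

```python
def normalize(records, sep=";"):
    """Split flat deal records with a multi-valued 'acquirers' field
    into three relations: deals, firms, and deal-acquirer links."""
    deals = []           # (deal_id, target_firm_id)
    firms = {}           # firm name -> firm_id; each name stored once
    deal_acquirers = []  # (deal_id, acquirer_firm_id), one row per pair

    def firm_id(name):
        # Assign a new integer key the first time a name is seen.
        name = name.strip()
        if name not in firms:
            firms[name] = len(firms) + 1
        return firms[name]

    for rec in records:
        deals.append((rec["deal_id"], firm_id(rec["target"])))
        # Remove the repeating group: one link row per acquirer.
        for acq in rec["acquirers"].split(sep):
            if acq.strip():
                deal_acquirers.append((rec["deal_id"], firm_id(acq)))

    firm_rows = [(fid, name) for name, fid in firms.items()]
    return deals, firm_rows, deal_acquirers

# Usage with made-up records:
records = [
    {"deal_id": "D1", "target": "Alpha Corp", "acquirers": "Beta Inc; Gamma LLC"},
    {"deal_id": "D2", "target": "Delta Ltd", "acquirers": "Beta Inc"},
]
deals, firm_rows, links = normalize(records)
```

Each output list maps directly onto a database table with a simple primary key, which is what makes the result straightforward to import.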
These tools are available to researchers who provide sufficient reassurances.
Pages on data sources include:
Tools and Scripts (by application)
- Classifying Names by Culture
- Culture Based Classifications
- Normalizing Surnames
- Sources of Surname Data
- Extracting Features from Surnames