Difference between revisions of "Start-Ups of Houston (Map)"

Revision as of 15:36, 5 July 2016


McNair Project
Start-Ups of Houston (Map)
Copyright © 2016 edegan.com. All Rights Reserved.


Abstract

Using lists mined from websites, web lists, and databases, this map will locate and diagram the startups of Houston, TX. Later additions will include corresponding wiki pages for individual companies, as well as maps of startup resources (including accelerators, incubators, angels, and VC firms).

Report

From file: HStartupMaster7.xlsx

Processes

Steps taken

  1. Mined websites such as AngelList, Crunchbase, StartupBlink, and Houston Startups List
  2. Cleaned the data, checking that:
    • Columns align with headers all the way down
    • Websites actually belong to the company (not YouTube or AngelList)
    • There are no line breaks ("new lines") within individual cells
    • There are no unclosed quotes (ideally, no quotes at all)
  3. Uploaded the data into the Houston psql database
    • Saved the files with UTF-8 encoding
  4. Used Matcher to match the compiled names against themselves
    • Used the matched file to standardize/normalize names for future data consolidation
  5. Made a distinct list of Houston startups using the file above
  6. Made a priority list for importing data into the Masterfile
  7. Using the priority list, populated empty columns in the Masterfile from each of the mined tables
    • Had to go back and separate out some fields, such as addresses or multiple accelerators
  8. Exported the Masterfile to Excel
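
The cleaning checks and the priority-based fill described above could be sketched roughly as follows. This is an illustration only, not the scripts actually used for the project: the function names, the `website` column name, the `NON_COMPANY_HOSTS` set, and the dictionary layout of the Masterfile are all assumptions made for the example.

```python
import csv
import io
from urllib.parse import urlparse

# Assumed list of hosts that indicate a profile page, not a company's own site.
NON_COMPANY_HOSTS = {"youtube.com", "angel.co"}


def check_rows(csv_text, website_column="website"):
    """Return (record_number, problem) pairs for the cleaning rules."""
    problems = []
    # newline="" lets the csv module see line breaks inside quoted cells.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    header = next(reader)
    site_idx = header.index(website_column) if website_column in header else None
    for record_number, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append((record_number, "column count does not match header"))
            continue
        for cell in row:
            if "\n" in cell or "\r" in cell:
                problems.append((record_number, "line break inside a cell"))
            if '"' in cell:
                problems.append((record_number, "quote character inside a cell"))
        if site_idx is not None and row[site_idx]:
            url = row[site_idx]
            # urlparse only fills netloc when the URL has a scheme or leading "//".
            host = urlparse(url if "//" in url else "//" + url).netloc.lower()
            if host.startswith("www."):
                host = host[4:]
            if host in NON_COMPANY_HOSTS:
                problems.append((record_number, "website is not the company's own"))
    return problems


def fill_from_priority(master, sources):
    """Fill empty Masterfile fields from sources ordered highest priority first.

    master and each source map a normalized company name to a dict of fields.
    Only companies already in master are touched; filled values are never
    overwritten by a lower-priority source.
    """
    for source in sources:
        for name, record in source.items():
            if name not in master:
                continue
            for field, value in record.items():
                if value and not master[name].get(field):
                    master[name][field] = value
    return master
```

Because a value is only written when the Masterfile field is still empty, the order of `sources` alone decides which mined table wins for each column.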

Future Steps

  1. Use a WHOIS parser to find missing addresses
  2. Upload individual startups into their own wiki pages
  3. Repeat these steps with venture firms, angels (and angel groups), accelerators, incubators, service firms, flex and co-working spaces, event spaces, etc.
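
As a rough idea of what the WHOIS step might involve, the sketch below pulls a registrant address out of raw WHOIS text. It is a minimal illustration under assumptions: the function name is hypothetical, the input is assumed to use the common "Registrant Street / City / State/Province / Postal Code" labels, and many real WHOIS records use other labels or redact these fields entirely.

```python
# Labels assumed to appear in the WHOIS text, in the order they are joined.
ADDRESS_LABELS = (
    "Registrant Street",
    "Registrant City",
    "Registrant State/Province",
    "Registrant Postal Code",
)


def parse_registrant_address(whois_text):
    """Return a one-line registrant address, or "" if none is found."""
    fields = {}
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in ADDRESS_LABELS:
            # Keep the first occurrence of each label.
            fields.setdefault(key, value.strip())
    parts = [fields.get(label, "") for label in ADDRESS_LABELS]
    return ", ".join(part for part in parts if part)
```

An empty result would mean the address still has to be found another way, so the Masterfile cell should stay blank rather than be filled with a guess.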

References

https://angel.co/

http://www.startupblink.com/Houston-startups

http://houston.startups-list.com/

https://www.crunchbase.com/#/home/index

SDC Platinum

Ed Egan