Research Plans

Ravali Kruthiventi

Project - USPTO Assignees, Patent and Citation Data

Assignees Data

  • Data source: patent database (merged data from patent_2015 and patentdata databases)
    • Issues: citations data contains non-numeric patent numbers (likely application numbers, etc.)
    • Solution:
      • Segregate these rows into smaller tables so that Amir and Marcela can identify patterns
      • Link them back to the appropriate patent numbers from the patent table (see the SQL sketch after this list)
    • Time to implement: 1 day
    • Priority:
    • Teams waiting for it:
    • Deadline:
  • Data Source: USPTO Bulk Data repository
    • Issues:
      • The load script inserts duplicate copies of records into the tables.
      • Analysis is required to make sure the data was inserted correctly from the XML files.
      • Analysis is also required to determine whether this data is better than the data we currently have in the patent database.
        • Action owners: Amir and Marcela
    • Solution:
      • Amir and Marcela and/or I need to look at the data to determine its quality
        • If they find that any of this data is better than the data we currently have, I will have to figure out a way to integrate it into our data model for patent data.
      • Amir and Marcela and/or I will need to delete the duplicate copies (the sketch below also counts them)
    • Time to implement:
    • Priority:
    • Teams waiting for it:
    • Deadline:
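
A minimal sketch of the two clean-up tasks above (segregating non-numeric citation numbers and finding duplicate rows), assuming the patent database is PostgreSQL and using made-up table and column names (citation, citation_number, patent_number, uspto_bulk_citation) rather than the real schema; the psycopg2 connection string is also a placeholder:

  # Sketch only: all table/column names and the connection string are assumptions
  # about the patent database, which is assumed here to be PostgreSQL.
  import psycopg2

  conn = psycopg2.connect("dbname=patent")
  cur = conn.cursor()

  # 1. Segregate citation rows whose cited patent number is not purely numeric
  #    into a side table for Amir and Marcela to inspect for patterns.
  cur.execute("""
      CREATE TABLE citation_nonnumeric AS
      SELECT * FROM citation
      WHERE citation_number !~ '^[0-9]+$';
  """)

  # 2. Count duplicate rows loaded from the USPTO bulk XML files
  #    (substitute the actual bulk table for uspto_bulk_citation).
  cur.execute("""
      SELECT patent_number, citation_number, COUNT(*) AS copies
      FROM uspto_bulk_citation
      GROUP BY patent_number, citation_number
      HAVING COUNT(*) > 1
      ORDER BY copies DESC;
  """)
  for patent_number, citation_number, copies in cur.fetchall():
      print(patent_number, citation_number, copies)

  conn.commit()
  cur.close()
  conn.close()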

Project - Lex Machina Data

  • Data Source:
    • Issues:
    • Solution:
    • Time to implement:
    • Priority:
    • Teams waiting for it:
    • Deadline:


Project - Pattern Recognition on Patent Data through Machine Learning

  • Data Source: The patent database.
    • Plan:
      • Technique
        • Determine research question to be asked
        • Scrub data
        • Determine 3-4 data mining/machine learning techniques to best extract patterns
        • Train the algorithms
        • Run the algorithms on a sample dataset
        • Determine the algorithm with the best results (see the sketch after this list)
        • Implement the best-performing algorithm on the full dataset
    • Known Issues:
      • The dataset must be cleaned and its quality analyzed, as specified above.
    • Deliverables
      • Set of patterns to base further research on
      • Research paper (?)
        • Documentation - Wiki page
    • Time to implement:
    • Priority:
    • Teams waiting for it: None
    • Deadline:
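
Once a research question and a cleaned sample exist, the train/compare/select loop in the plan above could look roughly like the following sketch; scikit-learn is an assumed choice, and X and y are random placeholders standing in for patent features and labels:

  # Sketch only: X and y are random stand-ins; the real features depend on the
  # research question and the cleaned patent data described above.
  import numpy as np
  from sklearn.model_selection import cross_val_score
  from sklearn.linear_model import LogisticRegression
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.naive_bayes import GaussianNB

  X = np.random.rand(500, 10)           # placeholder patent features
  y = np.random.randint(0, 2, 500)      # placeholder labels

  candidates = {
      "logistic_regression": LogisticRegression(max_iter=1000),
      "random_forest": RandomForestClassifier(n_estimators=100),
      "naive_bayes": GaussianNB(),
  }

  # Score each candidate technique on the sample dataset and keep the best.
  scores = {name: cross_val_score(model, X, y, cv=5).mean()
            for name, model in candidates.items()}
  best = max(scores, key=scores.get)
  print(scores)
  print("Best technique on the sample:", best)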


Dylan Dickens


Ben Baldazo

Ben Baldazo Research Plans (Plan Page)

Houston Entrepreneurship

Startups of Houston Interactive Maps - The Whole Process

Start-Ups of Houston (Map)

Use Google Maps to find Longitude and Latitude

  • Document how to run Geocode.py and what might go wrong (a sketch of the underlying Geocoding API call follows below)
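
For reference while documenting Geocode.py, here is a minimal sketch of one way to turn an address into a latitude and longitude with Google's Geocoding API; this is not Geocode.py itself, and YOUR_API_KEY is a placeholder:

  # Sketch only: not Geocode.py, just the general shape of a Geocoding API call.
  import requests

  def geocode(address, api_key="YOUR_API_KEY"):
      resp = requests.get(
          "https://maps.googleapis.com/maps/api/geocode/json",
          params={"address": address, "key": api_key},
      )
      resp.raise_for_status()
      data = resp.json()
      if data["status"] != "OK":   # e.g. OVER_QUERY_LIMIT or ZERO_RESULTS
          return None
      loc = data["results"][0]["geometry"]["location"]
      return loc["lat"], loc["lng"]

  print(geocode("6100 Main St, Houston, TX"))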

Put through R code to make an interactive map

  • We may need to find and document the process required to run the R code
  • It works on Carto and looks really cool
  • We will eventually need a plug-in and a Carto account so that we can post this on the wiki (a stand-in mapping sketch follows below)
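
A minimal stand-in for the mapping step, written in Python with folium rather than the actual R code; the startup names and coordinates below are placeholders for the geocoded list:

  # Sketch only: a Python/folium stand-in for the R interactive-map step.
  import folium

  # Placeholder points; in practice these come from the geocoded startup list.
  startups = [
      ("Example Startup A", 29.717, -95.401),
      ("Example Startup B", 29.760, -95.370),
  ]

  houston_map = folium.Map(location=[29.76, -95.37], zoom_start=11)  # centered on Houston
  for name, lat, lng in startups:
      folium.Marker([lat, lng], popup=name).add_to(houston_map)

  houston_map.save("houston_startups_map.html")  # open in a browser or embed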

Accelerator Quality Issue Brief

Houston Accelerators (issue brief)

Factors to look at

  • Value Added
    • How should we measure this, though?
  • Market vs Non-Market
    • Philanthropic funding?
      • If non-profit: ProPublica will document contributions
      • If for-profit: call them?
    • Founded bottom-up or top-down?
      • See if it was founded by a group or individual with actual industry connections (online or phone)
  • Location
    • Proximity to resources
      • We may have to update the startups in Houston map for this
  • Available Resources (we should generally be able to call or find this on their websites; these should be things they brag about)
    • Flex Space
    • Events
    • Co-Working
    • Connections/Mentorship
      • This can also be assigned a judged value based on the perceived experience of mentors/connections
    • Funding (This may be hard but if they offer their own VC we can check through SDC)
    • Userbase
  • Leadership/Experience qualifications (Linkedin, profiles on their own websites, other bios)
    • Have they driven a startup before, or been in the backseat?
    • What other qualifications do they have
  • Criteria from the Accelerator Rankings (as long as we have portfolios, we can use SDC for this)
    • VC funding history
    • IPO
    • Acquired
  • Any other reviews possible
    • Other articles (Xconomy, Houston Chronicle, etc.)
    • Info from actually calling the accelerators (put a list of questions up on the discussion page for Houston Accelerators (issue brief))
    • perhaps reviews from startups themselves
      • Could look specifically into startups that have gone through multiple accelerators; hopefully we have phone numbers in File:HSM10.xlsx


Jake Silberman

Jake Silberman Research Plans Plan Page

Leveraged Buyout Innovation (Academic Paper)

  • Finalize Hazard Model
    • Determine the best regression model (Cox, or something else that makes more assumptions)
    • Determine finalized variable set
    • Predict based on model
  • Match LBO and non-LBO companies based on hazard model predictions (see the sketch after this list)
    • Generate buckets, e.g. break down by industry, decade, etc.
    • Determine metric for matching
  • Integrate new patent data
    • Create stocks of patents
    • Add in patent assignment data
  • Analysis of control group and study group for first results
    • Refine matching if necessary
    • Test for endogeneity/other issues
  • Lit review for variables
    • Revise preexisting regression variable write-up and reformat it to appropriate academic paper form
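
A minimal sketch of the Cox fit and the hazard-based matching, using the lifelines package as an assumed tool; the column names and the synthetic data are placeholders rather than the finalized variable set:

  # Sketch only: synthetic data and placeholder column names stand in for the
  # real LBO panel and the still-to-be-finalized covariates.
  import numpy as np
  import pandas as pd
  from lifelines import CoxPHFitter

  rng = np.random.default_rng(0)
  n = 400
  df = pd.DataFrame({
      "duration": rng.exponential(5, n),      # years until exit/failure
      "failed": rng.integers(0, 2, n),        # event indicator
      "log_ebitda": rng.normal(0, 1, n),
      "leverage": rng.uniform(0, 1, n),
      "patent_stock": rng.poisson(3, n),
      "lbo": rng.integers(0, 2, n),           # treatment flag
      "industry": rng.integers(0, 5, n),
      "decade": rng.choice([1990, 2000, 2010], n),
  })

  # 1. Fit the Cox proportional hazards model on the candidate covariates.
  covariates = ["log_ebitda", "leverage", "patent_stock"]
  cph = CoxPHFitter()
  cph.fit(df[["duration", "failed"] + covariates],
          duration_col="duration", event_col="failed")
  cph.print_summary()

  # 2. Predict a relative hazard for each firm, then match every LBO firm to the
  #    nearest non-LBO firm within its industry x decade bucket.
  df["hazard"] = cph.predict_partial_hazard(df[covariates]).squeeze()
  matches = []
  for _, bucket in df.groupby(["industry", "decade"]):
      lbo, ctrl = bucket[bucket["lbo"] == 1], bucket[bucket["lbo"] == 0]
      for idx, firm in lbo.iterrows():
          if ctrl.empty:
              continue
          matches.append((idx, (ctrl["hazard"] - firm["hazard"]).abs().idxmin()))
  print(len(matches), "LBO/control pairs")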


Venture Capital (Data)

  • Do a final, correct pull of the SDC data (just include IPOs)
  • Clean the data, throwing out duplicate names and keeping only the most recently invested record (see the sketch after this list)
  • Rank cities by venture capital on different metrics, either in SQL or Excel
  • Write up issue brief
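
A minimal pandas sketch of the de-duplication and ranking steps; the column names (companyname, city, investmentdate, roundamount) are assumptions about the SDC pull, and the rows are placeholders:

  # Sketch only: placeholder rows and assumed column names stand in for the SDC pull.
  import pandas as pd

  vc = pd.DataFrame({
      "companyname": ["Acme", "Acme", "Beta", "Gamma"],
      "city": ["Houston", "Houston", "Austin", "Houston"],
      "investmentdate": pd.to_datetime(["2014-01-01", "2015-06-01", "2015-03-01", "2016-02-01"]),
      "roundamount": [5.0, 8.0, 3.0, 12.0],   # $M
  })

  # Throw out duplicate names, keeping only the most recently invested record.
  vc = (vc.sort_values("investmentdate")
          .drop_duplicates(subset="companyname", keep="last"))

  # Rank cities on total dollars invested and deal counts.
  by_city = (vc.groupby("city")
               .agg(total_invested=("roundamount", "sum"),
                    deals=("companyname", "count"))
               .sort_values("total_invested", ascending=False))
  print(by_city)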


Shoeb Mohammed

Shoeb Mohammed Research Plans Research Plan page

Short Term

  • Create a listing on the wiki for all software developed at the McNair Center. - Completed
  • Build a Linux box to run the crawler. - Completed

Long Term

  • Optimize/re-design the 'Matcher' software. In particular, speed up fuzzy matching (see the sketch after this list) and possibly re-structure the code to make usage easier.
  • Develop the crawler. Try to begin with code that Dan has. - Completed
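
One common way to speed up fuzzy matching is to block candidate names on a cheap key so the expensive comparison only runs within small groups. A minimal standard-library sketch of the idea (this is not the Matcher's current code, and the three-character blocking key is an arbitrary choice):

  # Sketch only: not the Matcher's code, just the blocking idea in stdlib Python.
  import difflib
  from collections import defaultdict

  def block_key(name):
      # Cheap key: first three characters of the normalized name (an assumed choice).
      return name.lower().replace(" ", "")[:3]

  def build_index(names):
      index = defaultdict(list)
      for name in names:
          index[block_key(name)].append(name)
      return index

  def fuzzy_match(query, index, cutoff=0.85):
      # Only compare against names that share the query's block key.
      candidates = index.get(block_key(query), [])
      hits = difflib.get_close_matches(query, candidates, n=1, cutoff=cutoff)
      return hits[0] if hits else None

  index = build_index(["Acme Corporation", "Acme Corp", "Beta Industries"])
  print(fuzzy_match("Acme Corp.", index))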

Side Tasks

  • If possible, redo the patent parser (previously coded by Kranti) to also pull in patent citation data (see the sketch below).
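
A minimal sketch of pulling cited document numbers out of one grant document; the element names (us-citation, patcit, document-id, doc-number) follow the recent USPTO grant XML format and must be checked against the files we actually parse, and the sample string is made up:

  # Sketch only: element names vary across USPTO bulk XML versions, and the bulk
  # files concatenate many documents, so they must be split before parsing.
  import xml.etree.ElementTree as ET

  SAMPLE = """<us-patent-grant>
    <us-references-cited>
      <us-citation><patcit><document-id><doc-number>5123456</doc-number></document-id></patcit></us-citation>
      <us-citation><patcit><document-id><doc-number>2004/0123456</doc-number></document-id></patcit></us-citation>
    </us-references-cited>
  </us-patent-grant>"""

  def extract_citations(grant_xml):
      """Return cited doc-numbers from a single patent grant document."""
      root = ET.fromstring(grant_xml)
      citations = []
      for cit in root.iter("us-citation"):
          doc = cit.find("patcit/document-id/doc-number")
          if doc is not None and doc.text:
              citations.append(doc.text.strip())
      return citations

  print(extract_citations(SAMPLE))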

Veeral Shah

Veeral Shah (Research Plans)

James Chen

James Chen Research Plans (Plan Page)

  • Short term:
    • Refine variables to include in hazard rate model
      • Industrygroup
      • Log or ratio transforms for tax, EBITDA, etc. (see the sketch after this list)
  • Long term:
    • Finish hazard model
    • Complete hazard rate matching
    • Test for endogeneity, variables list
    • Incorporate new patent data (stock, transfers, etc.)
    • Complete the literature review using the final variable list
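
A minimal sketch of the log and ratio forms mentioned above; the column names (ebitda, tax, totalassets) and the values are placeholders:

  # Sketch only: placeholder firm data; ebitda, tax, totalassets are assumed names.
  import numpy as np
  import pandas as pd

  firms = pd.DataFrame({"ebitda": [120.0, 45.0, 0.0],
                        "tax": [12.0, 4.0, 1.0],
                        "totalassets": [900.0, 300.0, 50.0]})

  # Candidate covariate forms to compare in the hazard rate model:
  firms["log_ebitda"] = np.log1p(firms["ebitda"].clip(lower=0))  # log form
  firms["tax_ratio"] = firms["tax"] / firms["totalassets"]       # ratio form
  print(firms)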

Ariel Sun

Ariel Sun Research Plans (Plan Page)

Hubs (Academic Paper)

  • VC table
    • Waiting for the patent data to be fixed before joining it to the VC table
    • Import the VC data into Stata
    • Hazard rate model
    • Diff-in-diff (a stand-in sketch follows after this list)
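
A minimal stand-in sketch of the diff-in-diff step, written with Python's statsmodels rather than Stata; the outcome and the treated/post indicators are placeholder columns on synthetic data:

  # Sketch only: synthetic panel standing in for the joined hubs/VC table.
  import numpy as np
  import pandas as pd
  import statsmodels.formula.api as smf

  rng = np.random.default_rng(1)
  n = 800
  panel = pd.DataFrame({
      "treated": rng.integers(0, 2, n),    # 1 if the city is a hub (assumed flag)
      "post": rng.integers(0, 2, n),       # 1 after the treatment date (assumed flag)
  })
  panel["vc_investment"] = (1.0 + 0.5 * panel["treated"] * panel["post"]
                            + rng.normal(0, 1, n))

  # Difference-in-differences: the coefficient on treated:post estimates the effect.
  did = smf.ols("vc_investment ~ treated * post", data=panel).fit()
  print(did.summary())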


Todd Rachowin

Todd Rachowin Research Plans (Plan Page)

  1. Short-Term = Hubs List (Hubs: Mechanical Turk)
    1. Creating a comprehensive list of potential hubs
    2. Determining the best variables for the scorecard
    3. Building "filters" for automating the collection
    4. Running and auditing of the automation
    5. Collecting the remaining manual data
  2. Long-Term = Everything Else
    1. Hazard Rate Model (determine proper one, run it, etc.)
    2. Diff-in-diff

Gunny Liu

Gunny Liu Research Plans (Plan Page)

Week VII

7/11 thru 7/15

  • Finalize Twitter Webcrawler version Alpha, discuss roadmap ahead with research fellows
  • Expand Semantic MediaWiki capabilities on our wiki and provide documentation for existing data structures
  • Configure the transfer of startup data from local storage to the wiki with Ben

Week VIII

7/18 thru 7/22

  • Alpha Exploration & development of existing Google Maps API script
  • Advanced development of the Twitter Webcrawler to populate McNair databases (see the sketch after this list)
    • Input: previously documented mothernodes and entrepreneurship buzzwords
  • Advanced development of the Eventbrite Webcrawler to populate McNair databases
    • To integrate with the Google Maps API to provide an updated map of active entrepreneurship events in Houston
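
For reference while developing the Twitter Webcrawler, a minimal sketch of a single call to Twitter's v1.1 search endpoint; the bearer token and the buzzword list are placeholders, and this is not the crawler's actual code:

  # Sketch only: not the Twitter Webcrawler itself, just the shape of one search call.
  import requests

  BEARER_TOKEN = "YOUR_BEARER_TOKEN"                       # placeholder credential
  BUZZWORDS = ["startup Houston", "accelerator Houston"]   # placeholder seed terms

  def search_tweets(query, count=100):
      resp = requests.get(
          "https://api.twitter.com/1.1/search/tweets.json",
          headers={"Authorization": "Bearer " + BEARER_TOKEN},
          params={"q": query, "count": count, "result_type": "recent"},
      )
      resp.raise_for_status()
      return resp.json()["statuses"]

  for word in BUZZWORDS:
      for tweet in search_tweets(word):
          print(tweet["user"]["screen_name"], tweet["text"][:80])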

Week IX

7/25 thru 7/29

  • Alpha Exploration & development of Techcrunch API
  • Alpha Exploration & development of Facebook API

Week X

8/1 thru 8/5

  • Advanced development of all API scripts to populate McNair databases

Week XI

8/8 thru 8/12

  • Last day of summer internship: 8/8