Revision as of 16:16, 15 July 2016
Ravali Kruthiventi
Project - USPTO Assignees, Patent and Citation Data
Assignees Data
- Data source: patent database (merged data from patent_2015 and patentdata databases)
- Issues: citations data contains non-numeric patent numbers (likely application numbers, etc.)
- Solution:
- Segregate into smaller tables so that Amir and Marcela can identify patterns
- link back to appropriate patent numbers from the patent table
- Time to implement: 1 day
- Priority:
- Teams waiting for it:
- Marcela and Amir
- Project : Patent data analysis
- Jake and James, who could potentially need this down the line
- Project : LBO data
- Marcela and Amir
- Deadline:
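The segregation step above could be sketched roughly as follows. This is a minimal illustration, not the actual script: the sample citation strings are made up, and the real work would run against the citation tables in the patent database.

```python
import re

# Hedged sketch: the regexes below are illustrative guesses at the number
# formats; the real patterns come from inspecting the citation table.
GRANT_RE = re.compile(r"^\d{7,8}$")        # plain granted-patent numbers
APP_RE = re.compile(r"^\d{4}/?\d{6,7}$")   # looks like an application/publication number

def classify_citation(cited: str) -> str:
    """Bucket a cited 'patent number' so odd patterns can be reviewed."""
    cited = cited.strip().replace(",", "")
    if GRANT_RE.match(cited):
        return "granted"
    if APP_RE.match(cited):
        return "application"
    return "other"  # foreign patents, typos, etc.

# Made-up examples of the kinds of values seen in the citations data
rows = ["5123456", "2004/0123456", "EP1234567", "7,654,321"]
buckets = {}
for r in rows:
    buckets.setdefault(classify_citation(r), []).append(r)
```

Each bucket would then become one of the smaller tables for Amir and Marcela to review, with the "granted" bucket linked back to the patent table directly.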
- Data Source: USPTO Bulk Data repository
- Issues:
- The script inserts duplicate copies of data into the tables.
- Analysis is required to make sure the data was inserted correctly from the XML files.
- Analysis is also required to determine whether this data is better than the data currently in the patent database.
- Action owners : Amir and Marcela
- Solution:
- Amir and Marcela and/or I need to look at the data to determine quality
- If they find that any of this data is better than the data we currently have, I will have to figure out a way to integrate this data into our data model for patent data.
- Amir and Marcela and/or I will need to delete the copies
- Time to implement:
- Priority:
- Teams waiting for it:
- Deadline:
Project - Lex Machina Data
- Data Source:
- Issues:
- Solution:
- Time to implement:
- Priority:
- Teams waiting for it:
- Deadline:
Project - Pattern Recognition on Patent Data through Machine Learning
- Data Source: The patent database.
- Plan:
- Technique
- Determine research question to be asked
- Scrub data
- Determine 3-4 mining/machine learning techniques to best extract patterns
- Train the algorithms
- Run the algos on sample dataset
- Determine the algo with best results
- Implement the chosen technique
- Known Issues:
- Dataset to be cleaned and its quality analyzed, as specified above.
- Deliverables
- Set of patterns to base further research on
- Research paper (?)
- Documentation - Wiki page
- Time to implement:
- Priority:
- Teams waiting for it: None
- Deadline:
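The "train the algorithms, run them on a sample, keep the best" steps above could be sketched as below. This is purely illustrative: the two toy rules stand in for real mining/ML techniques, and the tiny (feature, label) sample is made up.

```python
# Hedged sketch of model selection on a sample dataset. The two "models"
# here are stand-ins for the 3-4 real techniques to be compared.
sample = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1), (0.1, 0), (0.8, 1)]

def majority_rule(x):
    # Baseline: always predict the majority class
    return 0

def threshold_rule(x):
    # Simple learned rule: split the feature at 0.5
    return 1 if x > 0.5 else 0

def accuracy(model, data):
    """Fraction of sample points the model predicts correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

candidates = {"majority": majority_rule, "threshold": threshold_rule}
best = max(candidates, key=lambda name: accuracy(candidates[name], sample))
```

In the real project the candidates would be trained algorithms and the metric might not be plain accuracy, but the selection loop has the same shape.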
Amir Kazempour
Introduction
The research plan at this point consists of roughly five smaller tasks: the data project relevant to the little guy paper; the Lex Machina data pull, which is a vital first step for the little guy paper; the application for SBA funding for the little guy paper; and improving and updating the wiki pages relevant to the little guy project and to Marcela's work on the patent litigation process.
Data Project
We have found a few issues with the patent database that need to be addressed in order to produce the datasets required for the little guy paper.
- USPTO historical assignment data
- The issue with table keys is potentially resolved.
- Create a table to enable us to track patent ownership through the life of a patent.
- Assignees data
- Identify U.S. assignees in the data for all assignees without a valid country or state entry.
- Maintenance fee data
- Create a table with all the active patents and their expected remaining life using the maintenance fee event codes.
- Citation data
- We need to recognize patterns in the cited patent numbers. Low-hanging fruit would be to match all the publication numbers to the granted patent numbers in the histpatent table. A few other repeating patterns seem to point to foreign-issued patents.
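The publication-to-grant matching could be sketched as below. This is an assumption-laden illustration: the `pub_to_grant` mapping stands in for a lookup built from the histpatent table, and the example number pair is made up.

```python
# Hedged sketch: assumes a {publication_number: granted_patent_number}
# mapping pulled from the histpatent table; names and values are illustrative.
pub_to_grant = {
    "20040123456": "6789012",  # made-up example pair
}

def resolve_cited(cited: str):
    """Map a cited publication number back to its granted patent, if known."""
    cited = cited.strip().replace("/", "")
    if len(cited) == 11 and cited.isdigit():   # US pre-grant publication format
        return pub_to_grant.get(cited)         # None if no grant found
    return cited if cited.isdigit() else None  # keep plain grant numbers; flag the rest
```

Citations that resolve to `None` would be the residual set to examine for foreign-patent patterns.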
Lex Machina data pull
A data pull proposal/request has been prepared. We need to get in touch with Brian to discuss the first pull. Upon acquiring the first data pull, we need to assess feasibility of the little guy paper as discussed in the proposal mentioned above.
Little guy paper
We need to consider the possibility of new research questions based on the hypotheses discussed in Lerner, Josh, Andrew Speen, and Ann Leamon. "The Leahy-Smith America Invents Act: A Preliminary Examination of Its Impact on Small Businesses."
Further progress on the paper is conditional on the first data pull from Lex Machina.
Wiki pages
Making sure that:
- All the SQL code is available on the wiki and up to date.
- Full data description for the USPTO database and the USPTO historical assignment data are available on the wiki.
- Pages are linked correctly.
SBA grant application
The goal is to go over the Instruction to Offerors and Statement of Work documents, both available on the Research on the Changing Value of Patents to Startups page, and gauge the compatibility of the little guy paper's research question with the one in those documents. The main focus of the literature review will be Lerner, Josh, Andrew Speen, and Ann Leamon, "The Leahy-Smith America Invents Act: A Preliminary Examination of Its Impact on Small Businesses".
Dylan Dickens
Ben Baldazo
Ben Baldazo Research Plans (Plan Page)
Startups of Houston Interactive Maps - The Whole Process
Use Google Maps to find Longitude and Latitude
- Document how to work Geocode.py and what might go wrong
Put through R code to make an interactive map
- Finding and documenting the processes required to run the R code may be necessary
- Works on Carto and looks really cool
- We will eventually need a plug-in and a Carto account so that we can post this on the wiki
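The longitude/latitude step could look roughly like the sketch below. This is not Geocode.py itself: it just shows the likely shape of the work, building a Google Geocoding API request URL and pulling lat/lng out of the JSON reply. The key placeholder and the sample reply are made up, and no network call happens here.

```python
import json
import urllib.parse

def geocode_url(address: str, key: str = "YOUR_KEY") -> str:
    """Build a Google Geocoding API request URL for one address."""
    query = urllib.parse.urlencode({"address": address, "key": key})
    return "https://maps.googleapis.com/maps/api/geocode/json?" + query

def extract_latlng(response_json: str):
    """Return (lat, lng) from a Geocoding API JSON payload, or None."""
    data = json.loads(response_json)
    if data.get("status") != "OK":
        return None
    loc = data["results"][0]["geometry"]["location"]
    return loc["lat"], loc["lng"]

# Fabricated reply in the documented response shape (Houston city center)
fake_reply = json.dumps({"status": "OK", "results": [
    {"geometry": {"location": {"lat": 29.7604, "lng": -95.3698}}}]})
```

Things that "might go wrong" and belong in the documentation include non-"OK" statuses (quota exceeded, zero results) and ambiguous addresses returning multiple results.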
Accelerator Quality Issue Brief
Houston Accelerators (issue brief)
Factors to look at
- Value Added
- How to look at this though?
- Market vs Non-Market
- Philanthropic funding?
- If Non-Profit: Propublica will document contributions
- If for profit: call?
- Founded Bottom up? or Top down?
- See if it was founded by a group or individual with actual industry connections (online or phone)
- Philanthropic funding?
- Location
- Proximity to resources
- We may have to update the startups in Houston map for this
- Proximity to resources
- Available Resources (we should generally be able to call or find this on website, these should be things they brag about)
- Flex Space
- Events
- Co-Working
- Connections/Mentorship
- This can also be given a judged value based on the perceived experience of mentors/connections
- Funding (This may be hard but if they offer their own VC we can check through SDC)
- Userbase
- Leadership/Experience qualifications (Linkedin, profiles on their own websites, other bios)
- Have they driven a startup before, or been in the backseat?
- What other qualifications do they have
- Criteria from the Acc Rankings (as long as we have portfolios then we can use SDC for this)
- VC funding history
- IPO
- Acquired
- Any other reviews possible
- Other articles (Xconomy, Houston Chronicle, etc.)
- Info from actually calling the accelerators (a list of questions is posted on the discussion page for Houston Accelerators (issue brief))
- perhaps reviews from startups themselves
- Could look specifically into startups that have gone through multiple accelerators; hopefully we have phone numbers on File:HSM10.xlsx
Jake Silberman
Jake Silberman Research Plans (Plan Page)
Leveraged Buyout Innovation (Academic Paper)
- Finalize Hazard Model
- Determine the best regression model (Cox, or a parametric alternative that makes more assumptions)
- Determine finalized variable set
- Predict based on model
- Match LBO and non LBO companies based on hazard model predictions
- Generate buckets, i.e. break down by industry, decade, etc...
- Determine metric for matching
- Integrate new patent data
- Create stocks of patents
- Add in patent assignment data
- Analysis of control group and study group for first results
- Refine matching if necessary
- Test for endogeneity/other issues
- Lit review for variables
- Revise preexisting regression variable write-up and reformat it to appropriate academic paper form
- Do final correct pull of SDC Data (just include IPOs)
- Clean data, throwing out duplicate names and only take most recently invested one
- Rank cities by venture capital on different metrics, either in SQL or Excel
- Write up issue brief
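The "match LBO and non-LBO companies based on hazard model predictions" step above could be sketched as a greedy nearest-neighbor match without replacement. Everything here is illustrative: the firm names and scores are made up, and the real predicted hazards would come from the fitted model (Cox or otherwise).

```python
# Hedged sketch of matching treated (LBO) firms to control (non-LBO) firms
# on predicted hazard scores; names and scores are fabricated.
lbo = {"FirmA": 0.30, "FirmB": 0.72}
non_lbo = {"FirmX": 0.28, "FirmY": 0.75, "FirmZ": 0.50}

def match_on_score(treated, controls):
    """Pair each treated firm with the closest unused control firm."""
    pool = dict(controls)
    pairs = {}
    for name, score in sorted(treated.items(), key=lambda kv: kv[1]):
        best = min(pool, key=lambda c: abs(pool[c] - score))
        pairs[name] = best
        del pool[best]  # match without replacement
    return pairs

pairs = match_on_score(lbo, non_lbo)
```

The "determine metric for matching" bullet is exactly the choice hidden in `abs(pool[c] - score)`; a caliper or within-bucket (industry, decade) restriction would slot in at the `min` step.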
Shoeb Mohammed
Shoeb Mohammed Research Plans (Research Plan Page)
Short Term
- Create a listing on the wiki for all software developed at the McNair Center. - Completed
- Build a Linux box to run the crawler. - Completed
Long Term
- Optimize/re-design the 'Matcher' software. In particular, speed-up fuzzy matching and possibly re-structure the code to make usage easier.
- Develop the crawler. Try to begin with code that Dan has. - Completed
Side Tasks
- If possible, redo the patent parser (previously coded by Kranti) to also pull in patent citation data.
Veeral Shah
Short Term:
- Build Web Crawler Tool that can obtain company descriptions for a list of companies using HTML and Python.
- Collaborate with Ben to help obtain and organize information on Houston startup companies.
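A minimal sketch of the description crawler, using only the standard library: pull the `<meta name="description">` tag out of a company page. The fetch step (e.g. `urllib.request.urlopen`) is omitted so the example stays offline, and the sample HTML is made up.

```python
from html.parser import HTMLParser

class DescriptionParser(HTMLParser):
    """Collect the content of <meta name="description"> while parsing."""
    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content")

def get_description(html: str):
    parser = DescriptionParser()
    parser.feed(html)
    return parser.description

# Fabricated sample page for illustration
page = '<html><head><meta name="description" content="A Houston startup."></head></html>'
```

Looping `get_description` over fetched pages for the company list would produce the description column; pages with no meta description would need a fallback (first paragraph, About page, etc.).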
James Chen
James Chen Research Plans (Plan Page)
- Short term:
- Refine variables to include in hazard rate model
- Industrygroup
- Log or ratio for tax, ebitda, etc.
- Refine variables to include in hazard rate model
- Long term:
- Finish hazard model
- Complete hazard rate matching
- Test for endogeneity, variables list
- Incorporate new patent data (stock, transfers, etc.)
- Complete literature Review using final variable list
Ariel Sun
Ariel Sun Research Plans (Plan Page)
- Hubs
- Get scorecard system completed (Hubs: Mechanical Turk)
- Mechanical Turk for potential hubs (Mechanical Turk (Tool))
- Match identified hubs to CMSAs
- VC table
- Waiting for patent data to be fixed to join to VC table
- Import VC data to STATA
- Hazard rate model
- Diff-in-diff
Todd Rachowin
Todd Rachowin Research Plans (Plan Page)
- Short-Term = Hubs List (Hubs: Mechanical Turk)
- Creating a comprehensive list of potential hubs
- Determining the best variables for the scorecard
- Building "filters" for automating the collection
- Running and auditing of the automation
- Collecting the remaining manual data
- Long-Term = Everything Else
- Hazard Rate Model (determine proper one, run it, etc.)
- Diff-in-diff
Gunny Liu
Gunny Liu Research Plans (Plan Page)
Week VII
7/11 thru 7/15
- Finalize Twitter Webcrawler version Alpha, discuss roadmap ahead with research fellows
- Expand semantic mediawiki capabilities on our wiki and provide documentation for existing data structures
- Configuration of data transfer of startup data from local to wiki wrt Ben
Week VIII
7/18 thru 7/22
- Alpha Exploration & development of existing Google Maps API script
- Advanced development of Twitter Webcrawler to populate McNair databases
- Input: previously documented mothernodes and entrepreneurship buzzwords
- Advanced development of the Eventbrite Webcrawler to populate McNair databases
- To integrate with Google Maps API to provide updated mapping of active entrepreneurship events in Houston
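The "mothernodes and entrepreneurship buzzwords" input above suggests a relevance filter somewhere in the Twitter Webcrawler. A hedged, stdlib-only sketch of what that filter might look like; the buzzword list and tweet texts are fabricated, not the documented ones.

```python
# Hedged sketch of a buzzword relevance filter; the word list is illustrative.
BUZZWORDS = {"startup", "accelerator", "founder", "seed round"}

def is_relevant(tweet_text: str) -> bool:
    """True if the tweet mentions any tracked entrepreneurship buzzword."""
    text = tweet_text.lower()
    return any(word in text for word in BUZZWORDS)
```

Tweets passing the filter would be the candidates written to the McNair databases, with the mothernode accounts supplying the initial timeline pulls.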
Week IX
7/25 thru 7/29
- Alpha Exploration & development of Techcrunch API
- Alpha Exploration & development of Facebook API
Week X
8/1 thru 8/5
- Advanced development of all API scripts to populate McNair databases
Week XI
8/8 thru 8/12
- Last day of summer internship: 8/8