Difference between revisions of "Work Logs"
Suchen-teh (talk | contribs) |
Revision as of 17:06, 20 February 2017
Work Logs are organized by the two divisions of the McNair Center: long-term deliverables (academic papers) and short-term deliverables (general content). Individuals are listed under the division they work in. Anyone who works in both divisions is listed in both locations.
Academic Papers
This division of the McNair Center pursues longer term projects, such as peer-reviewed academic papers.
Jake Silberman
Jake Silberman Work Logs (log page)
Will Cleland
Will Cleland Work Logs (log page)
Todd Rachowin
Todd Rachowin Work Logs (log page)
Amir Kazempour
Amir Kazempour Work Logs (log page)
Content
This division of the McNair Center focuses on shorter term projects, including blog posts, tweets, and issue briefs.
Dylan Dickens
2018-03-06: Troubleshot Key Terms program with Christy, continued to read articles.
2018-03-05: Tested the Key Terms program, found it not to be working. Troubleshot and alerted Christy.
2018-03-01: Started to read articles for key-terms testing.
2018-02-28: Adjusted some wiki pages, started testing the revamped tools.
2018-02-27: Drafted email with concerns to Ed, met with Ed to resolve concerns. Created action plan of testing the revamped tools and codifying a subset of known papers.
2018-02-26: Reviewed Christy's new documentation, prepared to meet with Ed.
2018-02-22: Tested RegEx-Excel Filter process, flagged some additional questions that need guidance from Ed. Met with Christy and worked to resolve coding issues.
2018-02-21: Finished RegEx-Excel Filter process, spoke with Ed about long-term goals of project.
2018-02-20: Continued working on the RegEx-Excel filter.
2018-02-19: Continued working on the RegEx-Excel filter.
2018-02-15: Started developing a RegEx and Excel filter for processing and cross-referencing sources.
2018-02-14: Identified the status of all codes. Drafted an email to Christy about returning temporarily to help with the codes.
2018-02-13: Ran the KeyTerms and PDF Converter Python Codes.
2018-02-12: Finished troubleshooting crawler, reached out to Ed for guidance. Was redirected to testing Key Terms code.
2018-02-07: Troubleshot the crawler with Christy.
2018-02-06: Troubleshot the crawler with Christy.
2018-02-05: Reached out to Ed for guidance, was redirected to testing the scholar crawler.
2018-02-01: Continued PDF - BibTex filtering
2018-01-31: Started PDF - BibTex filtering process as per meetings with Christy and Lauren.
2018-01-30: Met with both Christy and Lauren.
2018-01-29: Reviewed the current state of PTLR project in order to prepare for meetings on Tuesday.
2018-01-26: Assisted with the McNair Center Event.
2018-01-25: Reached out to previous project owners to gather information for next steps. Was on standby to assist with the Lyceum Research Page
2018-01-24: Searched for tools to accomplish the strategies outlined in Patent Thicket Strategic Planning. Had a hard time locating anything, or getting a good grasp on where exactly the project is and what it needs. Gathered contact information for previous owners to make communications later this week. Also continued to prep the Lyceum Research Page for Ed.
2018-01-23: Finished Patent Thicket Strategic Planning and sent to Ed. Ed approved.
2018-01-22: Read Patent Thicket literature. Met with Ed to discuss broad strategy, began planning for next steps. Patent Thicket Strategic Planning
2018-01-18: Met with Ed to discuss Patent Thicket Project. Helped complete his research for the Amazon HQ2 Report.
2018-01-11: Finalized sourcing for Venture Capital Gap for Women
2018-01-10: Sourced all of Venture Capital Gap for Women, downloaded PDFs for about 3/4 of sources
2018-01-09: Found additional sources on the Venture Capital Gap for Women, as well as Fondren availability for a portion of the sources.
9/23/2016 2:00-4:30 Introductory explanation/exploration, helped Catherine find source for Ed
9/26/2016 2:00-4:00 Began research for blog post about largely unknown entrepreneurial hubs, checked links on McNair Center blog
9/27/2016 4:00-6:00 Set up personal and work log pages, researched for blog post
Eliza Martin
Eliza Martin Work Logs (log page)
Meghana Gaur
Meghana Gaur Work Logs (log page)
2017-12-1: worked with Ed to build tables with firm/portco data on distance and fund/portco data on performance
2017-11-16: finished calculating great circle distances between firms, portco's, and branch offices (look at roundlinewithgcd table)
2017-11-14: worked on getting all roundline tables down to the firm level, instead of fund; running into small problems with calculating gcd between firms and portco's (will discuss with Ed)
2017-11-14: worked on joining ipo information to roundline; aggregated ipo information to the fund level (rather than fund)
2017-11-09: reloaded firm coords and also fund coords - re-building roundlinewithgcd (code is written, but fund coords weren't correctly loaded, so this code will be re-run), wrote code for fundtofirms and portcotofirms, but this code will be re-run once the firm codes are correctly loaded; working on joining portcoexitmaster to roundlinejoinerlean
2017-11-08: loaded roundlinewithgcd table (calculating gcd between portcos and funds), created GCD example with notes in datawork folder in MatchingEntrepsToVCs, worked on building portcostofirms
2017-11-07: loaded portcocoords table, joined portcocoords to roundlinejoinerlean, calculate gcd distance between funds and portco's, work on joining funds to firms
2017-11-03: loaded table/sql script for firms office locations into vcdb2 with latitude and longitude coordinates; joined coordinates to all clean base tables for firms, funds, branch offices, joined co and fund coordinates to roundlinejoinerlean in new table: roundlinecoords
2017-11-02: met with Ed; loaded tables/sql script for branch office and fund office locations into vcdb2 with latitude and longitude coordinates
2017-10-27: come up with next steps for matching firms to funds - for geocoding branch offices
2017-10-26: update VC Database Rebuild wiki; identify key for bocore table; verify that fundbasecore table was correctly cleaned after being rebuilt by Ed
2017-10-24: met with Ed to discuss firmbase and branch office tables; find key for firmbasecore table; remove undisclosed firms from both firmbasecore and bocore
2017-10-12: peer edit and put Shelby's blog post into Wordpress; see what needs to be done on VC project; continue literature review for matching models
2017-10-11: finished loading tables (firmbase and branchoffice)
2017-10-6: load data using SQL code into tables, which is on Retrieving US VC Data From SDC
2017-09-29: completed pulling/normalizing data, still need to load data using SQL code into tables, which is on Retrieving US VC Data From SDC
2017-09-28: met with Ed, worked on pulling firm and branch office data from SDC
2017-09-22: join portcos and funds; begin literature review of matching games/venture capital (located in the "Matching Entreps to VC's project folder" on E drive)
2017-09-21: work with Ed on research project
2017-09-19: continue to work on joining portcoexits and roundlinejoiner tables in vcdb2, in MatchingEntrepsToVC folder under project management
2017-09-15: work on joining portcoexits and roundlinejoiner; create txt file called "Notes on Matching Funds to portcos" in the "Matching Entreps to VC's project folder" on E drive.
2017-09-14: build table roundlinejoinerapprop (appropriate the funds between funds); work on joining portcoexits and roundlinejoiner
2017-09-27: rebuild portcoexits and work on apportioning amounts in roundlinejoiner
2017-09-07: work with Ed to familiarize with SQL script for VC project/vcdb2 database
2017-09-05: receive project from Ed; reacquaint with wiki, RDP, etc.
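The entries above refer to calculating great-circle distances (gcd) between firm, fund, branch office, and portco coordinates, but the log does not show the calculation itself. A minimal sketch of one common approach, the haversine formula, is given below. The function name, the Earth-radius constant, and the choice of kilometers are assumptions made for illustration and are not taken from the project's SQL scripts.

```python
import math

# Assumed constant: mean Earth radius in kilometers.
EARTH_RADIUS_KM = 6371.0088


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

Under the assumed radius, two points on the equator 180 degrees of longitude apart come out at about 20,015 km, which is half the Earth's circumference.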
Marcela Interiano
Marcela Interiano Work Logs (log page)
Veeral Shah
Veeral Shah Work Logs (log page)
Ariel Sun
Ariel Sun Work Logs (log page)
Gunny Liu
Gunny Liu Work Logs (Work Log)
Ben Baldazo
Ben Baldazo Work Logs (Work Log)
contributing Projects
Crunchbase Data / Accelerator Seed List (Data) : Combined this data in a table discussed on Crunchbase Data
Houston Entrepreneurship Ecosystem Project
Houston Accelerators and Incubators (Report)
Cofounding in Exchange for Equity
worklog
2017-11-21: Worked with Ed to set up all of the groundwork to begin joining tables for the purpose stated in yesterday's work log. Should be able to finish it upon returning next week, but until then, notes are all held within "Z:\bulk\crunchbase\AccFunding.psql" with the important parts under the header of "From Ed on 21st of Nov. To finish on Nov 27"
2017-11-20: Attempting to link 3 tables from psql crunchbasebulk to find accelerators that have invested in companies. Likely found success with the table "Acc_Funded_Cos" but the investor column is dirty, thus trying to do it cleaner with the aforementioned 3 table link
- This is all noted in "Z:\bulk\crunchbase\AccFunding.psql" and the code for "Acc_Funded_Cos" is emphasized
2017-09-25: Followed Talk:Ben Baldazo (Work Log) to create documentation infrastructure for Augusta Startup Ecosystem
Shoeb Mohammed
Shoeb Mohammed Work Logs (Work Log page)
James Chen
James Chen Work Logs (log page)
Albert Nabiullin
Albert Nabiullin Work Logs (log page)
Carlin Cherry
Carlin Cherry Work Logs (log page)
Julia Wang
Julia Wang Work Logs (log page)
12/4-12/8 finalizing University Patents report
- 12/4 9-12 edits, sent to Ed, confirming catering for party
- 12/5 9-12 final edits, sent to Ed
- 12/6 1-4 making City Agglomeration graphics
- 12/7 1-3 wrapping up everything
11/27-12/1
- 11/27 10-12 edits, sent to Ed
- 11/29 10-12 catering order for lunch party, wiki page organization
- 11/30 2:30-4:30 edits
- 12/1 10-12 met with Ed, edits
11/20-11/22
- 11/20 10-12 edits
- 11/21 10-12 met with Ed, edits
- 11/22 10-12 met with Ed, edits
11/13-11/17 deadline 11/16 final draft
- 11/13 10-12 redoing reg table
- 11/14 10-12 edits
- 11/15 10-12:30 edits
- 11/16 2:30-4 met with Ed, edits
- 11/17 10-12 edits
11/6-11/10 deadline 11/16 final draft
- 11/6 10-12 4th draft
- 11/8 10-12 met with Ed, editing
- 11/9 2:30-5:30 redoing graphs, restructuring introduction
- 11/10 10-12, 3-4:30 redoing charts, rewriting body
10/30-11/3 deadline 11/16 final draft
- 10/30 10-12 revisions
- 10/31 10-12 revisions
- 11/1 10-12 revisions, new data for basic funding
- 11/2 10-12 revisions
10/23-10/27
- 10/23 10-12 editing University Patents
- 10/24 10-12 reran regressions, fixed problem with Cornell!
- 10/25 10-12 sent 2nd draft to Anne
- 10/26 2:30-4:30 revisions
- 10/27 10-12 sent 3rd draft
10/16-10/20
- 10/16 10-12 pulled Houston patent addresses
- 10/17 10-12 pulled Houston patent addresses
- 10/18 10-12 edited University Patents
- 10/19 2:30-4:30 edited University Patents, tabled patent database reorganization until it is cleaned by Oliver/Shelby/Ed
- 10/20 10-12 edited University Patents
10/11-10/13
- 10/11 10-12 pulling Houston patents
- 10/12 10-12 University patents, sent draft to Anne
10/2-10/6
- 10/2 10-12 work on University Patents draft, close to sending
- 10/3 10-12 Distracted by Augusta project, Reorganizing patent database
- 10/4 10-12 Reorganizing whole patent database by city, state, pulling Crunchbase data for Augusta
- 10/5 2:15-2:45 Augusta patents
- 10/6 10-12 Reorganizing patents, figure out misspellings
9/25-9/29 finish draft
- 9/25 10-12 remaking charts
- 9/26 10-12 data pull for Augusta University
- 9/27 10-12 data pull for Augusta University
- 9/29 10-12 reran log regressions
9/18-9/22/2017
- 9/18 10-12 cleaning data
- 9/19 10-12 cleaning data
- 9/20 10-12 created time-series data set
- 9/21 2:30-4 reran regressions
- 9/22 10-12, 2-3 remade charts
9/11-9/15/2017 Deadline: 9/15 - convert data to time-series, new charts
- 9/11 10-12 Converting to time-series
- 9/12 10-12 Check accuracy, converting to time-series, talked to Jeemin about next project
- 9/13 10-12 Fix R&D data, previous SQL code
- 9/14 2:30pm-4pm, 10:30pm-12am Fix R&D data
- 9/15 10-12, 2-3 join data
9/5-9/8/2017 Putting together University Patents report
- 9/5 10-12 Looked at report, created artifacts, cleaned University Patents folder
- 9/6 10-12 Spoke with Ed about project organization
- 9/7 2:30-4:30 Writing report
- 9/8 10-12 Making data into time series: gyear, make tables and charts
Ramee Saleh
Ramee Saleh Work Logs (log page)
Avesh Krishna
Avesh Krishna Work Logs (Log Page)
Shrey Agarwal
Shrey Agarwal Work Logs (log page)
1/23/18 15:00 - 17:00
- Became reacclimatized with the project, spoke with Ed about the direction for the rest of the semester
1/25/18 15:00 - 17:00
- Began examining the data on pulled webpages relating to demo days
1/26/18 13:00 - 17:00
- Began categorizing demo day pages based on: 1) relevance to accelerators, 2) relevance to the particular accelerator (got to 200)
1/30/18 15:00 - 17:00
- Continued working through the demo day pages, spoke with Ed about using the data to work a better set (got to 450)
2/01/18 15:00 - 17:00
- Finished the match and created pivot tables to count the number of repetitions (companies going through more than one accelerator)
2/06/18 15:00 - 17:00
- Discussed with Matthew the best way to collect the VC data from the repetitions. We tried different matches through our SDC data to no avail
2/08/18 15:00 - 18:00
- Continued attempting to match with SDC the different columns. Didn't work without separating the data into individual files, a very tedious process.
2/13/18 15:00 - 17:00
- Spoke with Ed about incubators project, will begin as soon as we can time the accelerator startup investments. Ed is expecting us to begin sometime in the next two months, using a similar process as we did for incubators. The process should be handled by a new worker.
2/15/18 15:00 - 17:00
- Talked to Ed about next steps for the project. Practiced accessing the CrunchBase database on SQL and brushed up on SQL code.
2/16/18 13:00 - 17:00
- Sifted through the database for Crunchbase investment information.
2/20/18 15:00 - 17:00
- Pulled the funding rounds table from SQL and matched it with the companies that have received VC funding in order to gather round dates.
2/22/18 15:00 - 18:00
- Went through the matched data. Brainstormed ways to get the dates for cohort companies going through accelerators.
2/27/18 15:00 - 17:00
- Looked into using the WhoIs Parser in order to find when the companies went through their accelerators.
9/19/17 15:00 - 17:00
- Became reacclimatized with the project, spoke with Ed about the direction for the rest of the semester
9/20/17 15:00 - 17:00
- Worked on setting up a new pull for the updated SDC data
9/21/17 15:00 - 17:00
- Finished the pull and sorted the data from the updated accelerator list
9/22/17 15:00 - 17:00
- Tried to set up the matcher with Matthew; ran into some difficulties on Power Shell, returning a blank file in the output
9/26/17 15:00 - 17:00
- Finished the match and created pivot tables to count the number of repetitions (companies going through more than one accelerator)
9/27/17 15:00 - 17:00
- Discussed with Matthew the best way to collect the VC data from the repetitions. We tried different matches through our SDC data to no avail
9/28/17 16:00 - 17:00
- Continued attempting to match with SDC the different columns. Didn't work without separating the data into individual files, a very tedious process.
9/29/17 15:00 - 17:00
- Spoke with Ed about incubators project, will begin as soon as we can time the accelerator startup investments. Ed is expecting us to begin sometime in the next two months, using a similar process as we did for incubators. The process should be handled by a new worker.
10/02/17 15:00 - 17:00
- Talked to Ed about next steps for the project. Practiced accessing the CrunchBase database on SQL and brushed up on SQL code.
10/03/17 15:00 - 17:00
- Sifted through the database for Crunchbase investment information.
10/04/17 15:00 - 17:00
- Pulled the funding rounds table from SQL and matched it with the companies that have received VC funding in order to gather round dates.
10/06/17 15:00 - 17:00
- Went through the matched data. Brainstormed ways to get the dates for cohort companies going through accelerators.
10/11/17 15:00 - 17:00
- Looked into using the WhoIs Parser in order to find when the companies went through their accelerators.
10/12/17 15:00 - 17:00
- Discovered that the Wayback Machine will not be a good option for identifying the time when a company went through the accelerator. Created a list of VC Companies and their earliest round date. Included a column for the date they went through their accelerators and will fill it in when we find a good method of finding this date.
10/16/17 15:00 - 17:00
- Continued working on sorting VCCompanies by their earliest round date.
10/17/17 15:00 - 17:00
- Worked with Ben to find a solution to our problem of data acquisition. Finalized earliest round date for VCCompanies.
10/18/17 15:00 - 17:00
- Updated our VC data with Ed's help in order to increase the accuracy and completion of our data.
10/19/17 15:00 - 17:00
- Organized all of our matched data and updated it in order to reflect the most recent SDC pull with Ed. Matched Crunchbase data with our cohort companies.
10/20/17 15:00 - 17:00
- Generated the new list of VCCompanies as well as their earliest round dates.
10/23/17 15:00 - 17:00
- Worked on sorting out the discrepancies in our matched data.
10/24/17 15:00 - 17:00
- Went through list of VCCompanies and began adding respective accelerators in order to proceed with VCPercentage table.
10/25/17 15:00 - 17:00
- Continued going through list of VCCompanies and adding accelerators.
10/26/17 15:00 - 17:00
- Continued going through list of VCCompanies and adding accelerators. Will have this completed on Monday.
10/30/17 15:00 - 17:00
- Finished adding all of the accelerators to the list of VCCompanies. Added a column indicating whether or not the company went through two or more accelerators.
10/31/17 15:00 - 17:00
- Began compiling data in the column for the dates that a specific company went through an Accelerator.
11/01/17 15:00 - 17:00
- Finalized entering dates for Y Combinator cohort companies.
11/02/17 15:00 - 17:00
- Continued entering cohort company dates into Excel file.
11/06/17 15:00 - 17:00
- Began looking at keywords for identifying the cohort class dates for each company
11/07/17 15:00 - 17:00
- Received list from Peter with the accelerator founders matched from the Crunchbase LinkedIn URLs and proceeded to find the links for those founders without a match on Crunchbase. Data found in "Unfound Founders List" in the Fall 2017 folder
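Several entries above describe building a list of VC-funded companies with their earliest round date from the funding rounds table. The log does not record how this was done (it mentions SQL and Excel), so the following is only a sketch of the idea in Python. The function name and the input shape, a sequence of (company, round date) pairs, are hypothetical.

```python
from datetime import date


def earliest_round_dates(rounds):
    """Map each company to the earliest round date seen for it.

    rounds: iterable of (company_name, round_date) pairs (hypothetical shape).
    """
    earliest = {}
    for company, round_date in rounds:
        # Keep the date only if it is the first or an earlier one for this company.
        if company not in earliest or round_date < earliest[company]:
            earliest[company] = round_date
    return earliest
```

In SQL the same result would typically come from grouping the rounds by company and taking the minimum round date.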
Tay Jacobe
Taylor Jacobe Work Logs (log page)
2017-12-01: Finished up the California post, ready to publish.
2017-11-30: Finished and published the Augusta post. Worked on California post in Wordpress, adding a bit more content suggested by Ed.
2017-11-29: Cleaned up Augusta findings post and cleaned out spam comments on Wordpress; there were almost 2000 spam comments within the last 3 weeks, which is concerning. Maybe there is a reason it has increased so quickly?
2017-11-17: Worked on Augusta Findings post.
2017-11-16: Finished first draft of California post, California Growth (Blog Post). Continued looking into a "Future of Communication" post and what that would look like. Anne also suggested I write a post about Augusta Findings (Blog Post), so I began that!
2017-11-15: Peer edited Yunnie and Dianna's blog post drafts. Anne suggested another post: McNair projects>Agglomeration>PeterHarrison. Research growth of high-tech high-growth enterprises in California from 1986-2016. Use file of maps. Started working on the post.
2017-11-10: Spent the morning cleaning out the spam comments on the blog. More than 1000 of them! Kept investigating Blockchain; I think I've determined that it might not be worth doing a post about because there are already a lot of sources that have published pieces that explain blockchain in simple terms. Continued looking for future blog post ideas: new social media (https://www.techworld.com/social-media/bumble-founder-whitney-wolfe-herd-talks-harvey-weinstein-linkedin-future-of-social-network-3666350/), 3D printing, security in a time of increasing automation and digitalization, the future of communication (smartphones, etc.: what's next?)
2017-11-09: Reorganized work log. Continued researching blockchain and began a draft of a post that will explain the concept in simpler terms and discuss potential impacts of this new technology! Created graphs for the Fund of Funds post. Finished the post and put everything into wordpress.
2017-11-08: Compiled a list of cities in Greater Cincinnati to use for data for blog post. Tried to educate myself on blockchain to eventually write a post about it
2017-11-01: Tried to gather research to improve the VC FOF post. Edited and redrafted. Investigated other potential blog posts.
2017-10-26: Worked on Fund of Funds in VC Blog post
2017-10-25: Looked over summary and edited. Started working on a blog post on the role of fund of funds in venture capital by direction of Anne and Ed
2017-10-23: Worked on a summary document for the Houston Innovation District project, verbal summaries of data analysis
2017-10-19: Worked on Houston Innovation District more. All work is documented on the wiki page
2017-10-18: Working on Houston Innovation District project. Figuring out what we've done and what needs to be done. McNair Center servers went down while I was working, so I lost a decent amount of the summary work I had done and had to start over once they were rebooted. Cleaned up the wiki page and summarized where we are so far: what data we have and where it is, what data we are currently collecting, and what data we still want/need. Began collecting information about tax codes and incentives for development offered in Houston.
2017-10-11: Worked on prep for Houston Innovation District project
2017-10-06: Added more slides and edited. Updated wiki page with info.
2017-10-05: Spent quite a while trying to figure out source data for patent data in slides for Augusta, then worked on cleaning up and adding to Augusta slides that were unfinished/not great, created cybersecurity slide
2017-10-04: Continued working on Augusta project
2017-09-27: Worked on data analysis and research for Augusta Project, looked into Augusta business news (there isn't very much of it!)
2017-09-21: Continued preparation for Augusta Startup Ecosystem and Houston Innovation District Projects
2017-09-20: Preliminary preparations for Augusta and Houston Projects
Matthew Ringheanu
Matthew Ringheanu Work Logs (log page)
9/11/2017 2:00-5:00 pm
- Spoke to Ed about the project going forward. Organized the current updated data for our project.
9/12/2017 3:00-5:00 pm
- Began going through the Cleaned Cohort Data Excel file and found a few problems with it. Will continue the cleaning process for the rest of the week.
9/13/2017 2:00-5:00 pm
- Sorted through Cleaned Cohort Data and finalized our List of Accelerators. We can begin the process of creating our PercentVC table.
9/14/2017 3:00-5:00 pm
- Completely finalized our dataset of accelerators and startups. Met with Michelle Passo to discuss objectives of the research for credit course.
9/18/2017 2:00-4:00 pm
- Talked with Peter about the LinkedIn crawler data. Went through VC page that Meghana sent me.
9/19/2017 3:00-5:00 pm
- Completed SDC pull of updated VC Data.
9/20/2017 2:00-5:00 pm
- Attempted several times to run the Matcher. Cleaned our pulled data.
9/21/2017 3:00-5:00 pm
- Came extremely close to running the Matcher correctly. Reviewed the final LinkedIn data from Peter.
9/25/2017 2:00-5:00 pm
- Finalized the matched file of accelerator companies with VC portfolio companies. Gave Ben the data on Georgia accelerators.
9/26/2017 3:00-5:00 pm
- Worked on finding the duplicates in our Matched file in order to have the most accurate data.
9/27/2017 2:00-5:00 pm
- Attempted to find a way to organize the duplicate matches.
9/28/2017 4:00-5:00 pm
- Continued running through matched data in order to organize it effectively.
10/2/2017 2:00-5:00 pm
- Talked to Ed about next steps for the project. Practiced accessing the crunchbase database on SQL. Brushed up on SQL code.
10/3/2017 3:00-5:00 pm
- Searched the database for crunchbase investment information.
10/4/2017 2:00-5:00 pm
- Pulled the funding rounds table from SQL and matched it with the companies that have received VC funding in order to gather round dates.
10/6/2017 3:00-5:00 pm
- Went through the matched data. Brainstormed ways to get the dates for cohort companies going through accelerators.
10/11/2017 2:00-3:30 pm
- Looked into using the WhoIs Parser in order to find when the companies went through their accelerators.
10/12/2017 3:00-5:00 pm
- Discovered that the Wayback Machine will not be a good option for finding when companies went through their accelerators. Created a list of VCCompanies and their earliest round date. Included a column for the date they went through their accelerators and will fill it in when we find a good method of finding this date.
10/16/2017 2:00-3:30 pm
- Continued working on sorting VCCompanies by their earliest round date.
10/17/2017 3:00-5:00 pm
- Worked with Ben to find a solution to our problem of data acquisition. Finalized earliest round date for VCCompanies.
10/18/2017 2:00-5:00 pm
- Updated our VC data with Ed's help in order to increase the accuracy and completion of our data.
10/19/2017 3:00-5:00 pm
- Organized all of our matched data and updated it in order to reflect the most recent SDC pull with Ed. Matched Crunchbase data with our cohort companies.
10/20/2017 2:00-3:30 pm
- Generated the new list of VCCompanies as well as their earliest round dates.
10/23/2017 2:00-3:30 pm
- Worked on sorting out the discrepancies in our matched data.
10/24/2017 3:00-5:00 pm
- Went through list of VCCompanies and began adding respective accelerators in order to proceed with VCPercentage table.
10/25/2017 2:00-5:00 pm
- Continued going through list of VCCompanies and adding accelerators.
10/26/2017 3:30-5:30 pm
- Continued going through list of VCCompanies and adding accelerators. Will have this completed on Monday.
10/30/2017 2:00-3:30 pm
- Finished adding all of the accelerators to the list of VCCompanies. Added a column indicating whether or not the company went through two or more accelerators.
10/31/2017 3:00-5:00 pm
- Began compiling data in the column for Date Company went through Accelerator.
11/1/2017 2:00-4:00 pm
- Finalized entering dates for Y Combinator cohort companies.
11/2/2017 4:00-5:30 pm
- Continued entering cohort company dates into Excel file.
11/6/2017 2:00-4:00 pm
- Continued entering cohort company dates into Excel file. Began compiling a list of keywords for demo day press releases.
11/7/2017 3:00-5:00 pm
- Finished coming up with keywords for demo day crawler. Sent the final list to Peter.
11/8/2017 2:00-3:30 pm
- Spoke to Ed and organized all of our current data.
11/9/2017 3:00-5:00 pm
- Created a new project page called Accelerator Data and listed all relevant files as well as descriptions.
11/14/2017 3:00-5:00 pm
- Looked up URLs and decided whether or not the website was relevant.
11/15/2017 2:00-5:00 pm
- Created SQL database entitled "acceleratordata" and began creating tables from folder of All Relevant Files.
11/16/2017 3:00-5:00 pm
- Continued to input tables into SQL database.
11/20/2017 2:00-5:00 pm
- Cleaned text files in order to import tables into SQL database.
11/27/2017 2:00-5:00 pm
- Worked with Peter to find and exclude irrelevant keywords on HTML pages. Began categorizing relevant demo day pages.
11/28/2017 3:00-5:00 pm
- Finished inputting tables of relevant files into SQL database.
11/29/2017 2:00-5:00 pm
- Went through accelerator HTML URLs. Spoke with Ed about going through HTMLs and classifying based on overall and specific relevance.
12/1/2017 3:00-5:00 pm
- Worked through accelerator links and classified pages based on whether or not they provided relevant information about startup timing.
12/4/2017 10:00-12:00 pm
- Continued running through demo day crawl URLs and scoring them based on relevance.
12/7/2017 1:00-4:30 pm
- Finalized scoring of demo day URLs for the original crawl. Last day of work for this semester.
Meghana Pannala
Meghana Pannala Work Logs (log page)
Technical
Harsh
Harsh Upadhyay Work Logs (log page)
Peter Jalbert
Peter Jalbert Work Logs (log page)
2017-12-21: Last minute adjustments to the Moroccan Data. Continued working on Selenium Documentation.
2017-12-20: Working on Selenium Documentation. Wrote 2 demo files. Wiki Page is available here. Created 3 spreadsheets for the Moroccan data.
2017-12-19: Finished fixing the Demo Day Crawler. Changed files and installed as appropriate to make LinkedIn crawler compatible with the RDP. Removed some of the bells and whistles.
2017-12-18: Continued finding errors with the Demo Day Crawler analysis. Rewrote the parser to remove any search terms that were in the top 10000 most common English words according to Google. Finished uploading and submitting Moroccan data.
2017-12-15: Found errors with the Demo Day Crawler. Fixed scripts to download Moroccan Law Data.
2017-12-14: Uploading Morocco Parliament Written Questions. Creating script for next Morocco Parliament download. Begin writing Selenium documentation. Continuing to download TIGER data.
2017-12-06: Running Morocco Parliament Written Questions script. Analyzing Demo Day Crawler results. Continued downloading for TIGER geocoder.
2017-11-28: Debugging Morocco Parliament Crawler. Running Demo Day Crawler for all accelerators and 10 pages per accelerator. TIGER geocoder is back to Forbidden Error.
2017-11-27: Rerunning Morocco Parliament Crawler. Fixed KeyTerms.py and running it again. Continued downloading for TIGER geocoder.
2017-11-20: Continued running Demo Day Page Parser. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.
2017-11-16: Continued running Demo Day Page Parser. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.
2017-11-15: Continued running Demo Day Page Parser. Wrote a script to extract counts that were greater than 2 from Keyword Matcher. Continued downloading for TIGER Geocoder. Finished re-formatting work logs.
2017-11-14: Continued running Demo Day Page Parser. Wrote an HTML to Text parser. See Parser Demo Day Page for file location. Continued downloading for TIGER Geocoder.
2017-11-13: Built Demo Day Page Parser.
2017-11-09: Running demo version of Demo Day crawler (Accelerator Google Crawler). Fixing work log format.
2017-11-07: Created file with 0s and 1s detailing whether crunchbase has the founder information for an accelerator. Details posted as a TODO on Accelerator Seed List page. Still waiting for feedback on the PostGIS installation from Tiger Geocoder. Continued working on Accelerator Google Crawler.
2017-11-06: Contacted Geography Center for the US Census Bureau, here, and began email exchange on PostGIS installation problems. Began working on the Selenium Documentation. Also began working on an Accelerator Google Crawler that will be used with Yang and ML to find Demo Days for cohort companies.
2017-11-01: Attempted to continue downloading, but ran into HTTP Forbidden errors. Listed the errors on the Tiger Geocoder Page.
2017-10-31: Began downloading blocks of data for individual states for the Tiger Geocoder project. Wrote out the new wiki page for installation, and began writing documentation on usage.
2017-10-30: With Ed's help, was able to get the national data from Tiger installed onto a database server. The process required much jumping around and changing users, and all the things we learned are outlined in the database server documentation under "Editing Users".
2017-10-25: Continued working on the TigerCoder Installation.
2017-10-24: Task: load some addresses into a database, then run the address normalizer and geocoder on them. May need to install additional software. Details on the installation process can be found on the PostGIS Installation page.
2017-10-23: Finished Yelp crawler for Houston Innovation District Project.
2017-10-19: Continued work on Yelp crawler for Houston Innovation District Project.
2017-10-18: Continued work on Yelp crawler for Innovation District Project.
2017-10-17: Constructed ArcGIS maps for the agglomeration project. Finished maps of points for every year in the state of California. Finished maps of Route 128. Began working on Selenium Yelp crawler to get cafe locations within the 610-loop.
2017-10-16: Assisted Harrison on the USITC project. Looked for natural language processing tools to extract complainants and defendants along with their locations from case files. Experimented with pulling based on part-of-speech tags, as well as using geotext or geograpy to pull locations from a case segment.
2017-10-13: Updated various project wiki pages.
2017-10-12: Continued work on Patent Thicket project, awaiting further project specs.
2017-10-05: Emergency ArcGIS creation for Agglomeration project.
2017-10-04: Emergency ArcGIS creation for Agglomeration project.
2017-10-02: Worked on ArcGIS data. See Harrison's Work Log for the details.
2017-09-28: Added collaborative editing feature to PyCharm.
2017-09-27: Worked on big database file.
2017-09-25: New task -- Create text file with company, description, and company type.
- VC Database Rebuild
- psql vcdb2
- table name, sdccompanybasecore2
- Combine with Crunchbasebulk
- TODO: Write wiki on LinkedIn crawler, write wiki on creating accounts.
2017-09-21: Wrote wiki on LinkedIn crawler, met with Laura about patents project.
2017-09-20: Finished running LinkedIn crawler. Transferred data to RDP. Will write wikis next.
2017-09-19: Began running LinkedIn crawler. Helped Yang create RDP account, get permissions, and get wiki set up.
2017-09-18: Finished implementation of Experience Crawler, continued working on Education Crawler for LinkedIn.
2017-09-14: Continued implementing LinkedIn Crawler for profiles.
2017-09-13: Implemented LinkedIn Crawler for main portion of profiles. Began working on crawling Experience section of profiles.
2017-09-12: Continued working on the LinkedIn Crawler for Accelerator Founders Data. Added to the wiki on this topic.
2017-09-11: Continued working on the LinkedIn Crawler for Accelerator Founders Data.
2017-09-06: Combined the founders data retrieved through the Crunchbase API with the crunchbasebulk data to get LinkedIn URLs for different accelerator founders. For more information, see here.
2017-09-05: Post Harvey. Finished retrieving founder names from the Crunchbase API. Next step is to query the crunchbasebulk database to get LinkedIn URLs. For more information, see here.
2017-08-24: Began using the Crunchbase API to retrieve founder information for accelerators. Halfway through compiling a dictionary that translates accelerator names into proper Crunchbase API URLs.
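The name-to-URL dictionary described above might be built along these lines. This is a sketch under stated assumptions, not the project's code: API_BASE is a placeholder and not the real Crunchbase endpoint, the slug rule in slugify is a guess at how names map to URL segments, and the overrides argument is a hypothetical way to hand-correct names that the rule gets wrong.

```python
import re

# Placeholder base URL: an assumption for illustration, not the real API endpoint.
API_BASE = "https://api.example.invalid/organizations/"


def slugify(name):
    """Lowercase the name and collapse runs of non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def build_url_map(names, overrides=None):
    """Map each accelerator name to a URL, preferring hand-written overrides."""
    overrides = overrides or {}
    return {name: API_BASE + overrides.get(name, slugify(name)) for name in names}


url_map = build_url_map(
    ["Y Combinator", "Techstars"],
    overrides={"Y Combinator": "y-combinator"},
)
```

Keeping manual overrides separate from the automatic rule makes it easy to see which accelerator names needed hand correction.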
2017-08-23: Decided with Ed to abandon LinkedIn crawling to retrieve accelerator founder data, and instead use Crunchbase. Spent the day navigating the crunchbasebulk database and seeing what useful information it contained.
2017-08-22: Discovered that LinkedIn Profiles cannot be viewed through LinkedIn if the target is 3rd degree or further. However, if entering LinkedIn through a Google search, the profile can still be viewed if the user has previously logged into LinkedIn. Devising a workaround crawler that utilizes Google search. Continued blog post here under Section 4.
2017-08-21: Began work on extracting founders for accelerators through LinkedIn Crawler. Discovered that Python3 is not installed on RDP, so the virtual environment for the project cannot be fired up. Continued working on Ubuntu machine.
Administrative
Su Chen Teh
Su Chen Teh Work Logs (log page)
Archive
This is the work log for archived members.