===Fall 2017===
<onlyinclude>
[[Matthew Ringheanu]] [[Work Logs]] [[Matthew Ringheanu (Work Log)|(log page)]]

9/11/2017 2:00-5:00 pm
*Spoke to Ed about the project going forward. Organized the current updated data for our project.
9/12/2017 3:00-5:00 pm
*Began going through the Cleaned Cohort Data Excel file and found a few problems with it. Will continue the cleaning process for the rest of the week.
9/13/2017 2:00-5:00 pm
*Sorted through Cleaned Cohort Data and finalized our List of Accelerators. We can begin the process of creating our PercentVC table.
9/14/2017 3:00-5:00 pm
*Completely finalized our dataset of accelerators and startups. Met with Michelle Passo to discuss objectives of the research for credit course.
9/18/2017 2:00-4:00 pm
*Talked with Peter about the LinkedIn crawler data. Went through VC page that Meghana sent me.
9/19/2017 3:00-5:00 pm
*Completed SDC pull of updated VC Data.
9/20/2017 2:00-5:00 pm
*Attempted several times to run the Matcher. Cleaned our pulled data.
9/21/2017 3:00-5:00 pm
*Came extremely close to running the Matcher correctly. Reviewed the final LinkedIn data from Peter.
9/25/2017 2:00-5:00 pm
*Finalized the matched file of accelerator companies with VC portfolio companies. Gave Ben the data on Georgia accelerators.
9/26/2017 3:00-5:00 pm
*Worked on finding the duplicates in our Matched file in order to have the most accurate data.
9/27/2017 2:00-5:00 pm
*Attempted to find a way to organize the duplicate matches.
9/28/2017 4:00-5:00 pm
*Continued running through matched data in order to organize it effectively.
10/2/2017 2:00-5:00 pm
*Talked to Ed about next steps for the project. Practiced accessing the crunchbase database on SQL. Brushed up on SQL code.
10/3/2017 3:00-5:00 pm
*Searched the database for crunchbase investment information.
10/4/2017 2:00-5:00 pm
*Pulled the funding rounds table from SQL and matched it with companies that have received VC funding in order to gather round dates.
10/6/2017 3:00-5:00 pm
*Went through the matched data. Brainstormed ways to get the dates for cohort companies going through accelerators.
10/11/2017 2:00-3:30 pm
*Looked into using the WhoIs Parser in order to find when the companies went through their accelerators.
10/12/2017 3:00-5:00 pm
*Discovered that the Wayback Machine will not be a good option for finding when companies went through their accelerators. Created a list of VCCompanies and their earliest round date. Included a column for the date they went through their accelerators and will fill it in when we find a good method of finding this date.
10/16/2017 2:00-3:30 pm
*Continued working on sorting VCCompanies by their earliest round date.
10/17/2017 3:00-5:00 pm
*Worked with Ben to find a solution to our problem of data acquisition. Finalized earliest round date for VCCompanies.
10/18/2017 2:00-5:00 pm
*Updated our VC data with Ed's help in order to increase the accuracy and completeness of our data.
10/19/2017 3:00-5:00 pm
*Organized all of our matched data and updated it in order to reflect the most recent SDC pull with Ed. Matched Crunchbase data with our cohort companies.
10/20/2017 2:00-3:30 pm
*Generated the new list of VCCompanies as well as their earliest round dates.
10/23/2017 2:00-3:30 pm
*Worked on sorting out the discrepancies in our matched data.
10/24/2017 3:00-5:00 pm
*Went through list of VCCompanies and began adding respective accelerators in order to proceed with VCPercentage table.
10/25/2017 2:00-5:00 pm
*Continued going through list of VCCompanies and adding accelerators.
10/26/2017 3:30-5:30 pm
*Continued going through list of VCCompanies and adding accelerators. Will have this completed on Monday.
10/30/2017 2:00-3:30 pm
*Finished adding all of the accelerators to the list of VCCompanies. Added a column indicating whether or not the company went through two or more accelerators.
10/31/2017 3:00-5:00 pm
*Began compiling data in the column for Date Company went through Accelerator.
11/1/2017 2:00-4:00 pm
*Finalized entering dates for Y Combinator cohort companies.
11/2/2017 4:00-5:30 pm
*Continued entering cohort company dates into Excel file.
11/6/2017 2:00-4:00 pm
*Continued entering cohort company dates into Excel file. Began compiling a list of keywords for demo day press releases.
11/7/2017 3:00-5:00 pm
*Finished coming up with keywords for demo day crawler. Sent the final list to Peter.
11/8/2017 2:00-3:30 pm
*Spoke to Ed and organized all of our current data.
11/9/2017 3:00-5:00 pm
*Created a new project page called Accelerator Data and listed all relevant files as well as descriptions.
11/14/2017 3:00-5:00 pm
*Looked up URLs and decided whether or not the website was relevant.
11/15/2017 2:00-5:00 pm
*Created SQL database entitled "acceleratordata" and began creating tables from folder of All Relevant Files.
11/16/2017 3:00-5:00 pm
*Continued to input tables into SQL database.
11/20/2017 2:00-5:00 pm
*Cleaned text files in order to import tables into SQL database.
11/27/2017 2:00-5:00 pm
*Worked with Peter to find and exclude irrelevant keywords on HTML pages. Began categorizing relevant demo day pages.
11/28/2017 3:00-5:00 pm
*Finished inputting tables of relevant files into SQL database.
11/29/2017 2:00-5:00 pm
*Went through accelerator HTML URLs. Spoke with Ed about going through HTMLs and classifying based on overall and specific relevance.
12/1/2017 3:00-5:00 pm
*Worked through accelerator links and classified pages based on whether or not they provided relevant information about startup timing.
12/4/2017 10:00-12:00 pm
*Continued running through demo day crawl URLs and scoring them based on relevance.
12/7/2017 1:00-4:30 pm
*Finalized scoring of demo day URLs for the original crawl. Last day of work for this semester.
</onlyinclude>
===Spring 2017===
1/18/2017 1:00-5:00 pm
*Continued collecting data for accelerator project. Helped Catherine draft tweets for the McNair Center twitter account.
1/20/2017 1:00-3:00 pm
*Continued collecting data on accelerators. Attended McNair Center team meeting.
1/23/2017 1:00-5:00 pm
*Began combing through accelerator list, determining which accelerators are still missing data and documenting these in a TextPad file. Finished through #115.
1/25/2017 1:00-5:00 pm
*Continued looking through accelerator list.
1/27/2017 1:00-3:00 pm
*Continued going through accelerator list. Left off on #226 with Shrey.
1/20/2017 1:00-5:00 pm
*Continued going through accelerator list. Finished through #440.
2/1/2017 1:00-5:00 pm
*Finished going through the list of accelerators looking for incomplete files. Began completing the files that were not done.
2/3/2017 1:00-3:00 pm
*Continued working on completing accelerator files.
2/6/2017 1:00-4:30 pm
*Finished data set of accelerators. Began going through and making sure that all text files and cohort files are of the same format so Peter can easily pull the information. Left for 30 minutes for an interview from 2:30-3:00 pm.
2/8/2017 1:00-5:00 pm
*Finished formatting through #137. Spoke with Ed about project.
2/13/2017 1:00-5:00 pm
*Completed formatting for all accelerator text files.
2/15/2017 3:00-5:00 pm
*Made copy of the completed data set. Spoke to Ed about future steps to take for project including gathering founder data and obtaining the crunchbase api.
2/17/2017 1:00-3:00 pm
*Went through final Excel spreadsheet for cohort information. Still need to run the crawler one more time after the completion of the editing process. Found the application for the crunchbase api which will hopefully allow us to gain access.
2/20/2017 1:00-5:00 pm
*Filled out another application for Crunchbase research access; Found the first source for the incubator project on angel.co, will hopefully work with Peter to make a crawler similar to f6s
2/22/2017 1:00-5:00 pm
*Pulled data from SDC for Ed and normalized it. Learned how to use SDC and the normalizer.
2/24/2017 1:00-3:00 pm
*Finished cleaning up the cohort data for Y-combinator on the Final Cohort Excel Spreadsheet.
2/27/2017 1:00-5:00 pm
*Continued cleaning up the cohort data in the Excel file. Finished Cohort Number and Year.
3/1/2017 2:00-5:00 pm
*Worked with Ben and Shrey to pull data from SDC for all VC funded companies and normalized it to put it in an Excel document.
3/3/2017 1:00-2:30 pm
*Worked with Ben to try and repeat down the VC data without it going too far.
3/6/2017 1:00-4:00 pm
*Worked with Shrey to finish cleaning the cohort data. It is ready to be run through the matcher with Ben.
3/8/2017 1:00-5:00 pm
*Matched the VC Data with the list of Cohort Companies and got one list of all cohort companies that have received VC funding.
3/10/2017 12:00-2:00 pm
*Put a write-up on the top of the Accelerator wiki page detailing where we are in the project currently as well as what data we have accumulated on the RDP.
3/20/2017 1:00-5:00 pm
*Began gathering the URLs of all accelerators in a TextPad file called Accelerator URLs. Participated in the SQL training session.
3/22/2017 1:00-5:00 pm
*Made tables in Terminal for Accelerator companies matched with VC companies and for Cohort Data.
3/27/2017 1:00-4:00 pm
*Compiled all URLs of accelerators into a TextPad file.
3/29/2017 1:00-5:00 pm
*Worked on the matched data with Ben. Next time I will run the RegEx code that will filter the URLs, and I will look through the duplicates where two different VC backed company names matched to one cohort company name.
3/31/2017 1:00-2:00 pm
*Ran the code for accelerator urls which are ready to be run through the wayback machine in order to get the start dates. Also began looking through vc backed company names.
4/3/2017 1:00-5:00 pm
*Continued looking through double matched VC companies. Learned more SQL from Ed.
4/5/2017 1:00-5:00 pm
*Made the final vc percentage table on terminal and for next time I will collect missing accelerator data.
4/7/2017 1:00-3:00 pm
*Began collecting cohort data for big accelerators that were missing from our list in order to add it to our final list of cohort companies.
4/10/2017 1:00-5:00 pm
*Finished gathering cohort company names for big accelerators that we were missing and put them into the Cleaned Cohort Companies Excel file. Ben is looking through Crunchbase data in order to possibly find more missing accelerators.
4/14/2017 1:00-4:00 pm
*Began working through "Crunchbase Potential Accelerators" textpad that may contain missing accelerators and wrote notes on the ones that I was able to go through. Need to finish this textpad before moving forward.
4/17/2017 1:00-4:00 pm
*Continued going through potential Crunchbase accelerators that we may have missed. Talked to Ed about getting a more comprehensive list from Excel file and by the end of the semester have the tables and data collected and done.
4/19/2017 1:00-4:00 pm
*Worked with Jeemin to generate an entire list of potential US accelerators from crunchbase. Worked to find a way to classify accelerators just based on their descriptions.
4/21/2017 1:00-4:00 pm
*Continued working through the list identifying accelerators that we do not have. Ramee and Juliette are now helping us gather cohort data for those missing accelerators.
4/24/2017 9:00-1:00 pm
*Updated Veeral on current state of project. Typed up a to-do list on the discussion wiki for Veeral. Got new cohort data on an accelerator and added it to Excel file.
5/3/2017 11:00-1:00 pm
*Talked to Ed and Anne about future report. Continued working through list of crunchbase potential accelerators. Last day of work for this semester.
===Fall 2016===
10/17/2016 2:00-5:00 pm
*Created personal wiki page as well as work log; Read about the research project to which I have been assigned; Wrote a short summary of what I believe it is and included some helpful links
10/18/2016 4:00-6:00 pm
*Met with research partner Shrey who filled me in on where we are with the project; Began looking on websites of certain accelerators how to determine their cohorts and listed these steps on the wiki
10/19/2016 2:00-5:00 pm
*Finished looking on the remaining accelerator websites and wrote the steps on determining how to manually locate the cohorts.
10/20/2016 4:00-6:00 pm
*Met with Peter and Christy to discuss the possibility of creating a web crawler that will pull data from individual accelerator sites.
10/24/2016 2:00-5:00 pm
*Brainstormed with Albert and Julia about changes to the category name for SBDE. Spoke to Ed about full scope of accelerator project.
10/25/2016 4:00-6:00 pm
*Brainstormed with Shrey about different potential industry focuses within accelerators, as well as different variables to search for in terms of accelerators, startups, cohorts, etc.
10/26/2016 2:00-5:00 pm
*Began searching for more databases including lists of accelerators as well as some characteristics of those accelerators; Began searching for characteristics that identify accelerators on their websites
10/27/2016 4:00-6:00 pm
*Continued searching for relevant lists of accelerators to include on our page. Added some links that have high potential under the tab (Obtained from List of Accelerators or various Google searches).
10/31/2016 2:00-5:00 pm
*Began constructing a list of variables that clearly distinguish an accelerator on its website. This is in an effort to allow a crawler to crawl through many Google searches and identify accelerators.
11/1/2016 4:00-6:00 pm
*Continued looking for variables that could identify accelerators from their websites. Searched through numerous different websites of accelerators obtained from our current databases.
11/2/2016 2:00-4:00 pm
*Continued combing through websites of numerous accelerators, well-known and other, in the hopes of finding identifying variables.
11/3/2016 4:00-6:00 pm
*Finalized my list of variables that could be used to distinguish the websites of accelerators. Slightly re-arranged our list of accelerator databases in order of relevance.
11/7/2016 2:00-5:00 pm
*Began compiling the list of all accelerators. Created a new TextPad document with information from a new database.
11/8/2016 4:00-6:00 pm
*Worked with Shrey and Ben in order to compile all of our accelerator databases into one long list on Textpad.
11/9/2016 2:00-5:00 pm
*Continued formulating a database for all accelerators and all of the available info given.
11/10/2016 4:00-6:00 pm
*Worked with Shrey and Peter in order to develop a crawler for f6s.
11/14/2016 2:00-5:00 pm
*Began sorting the Seed-DB database in an Excel document.
11/15/2016 4:00-6:00 pm
*Conducted some Google searches in an attempt to find more accelerator databases. Began looking through Executive Orders searching for keywords.
11/16/2016 2:00-5:00 pm
*Completed searching through Executive Orders.
11/17/2016 4:00-6:00 pm
*Continued working on Google searches for state accelerator list. Looked through f6s for common words that can be used to distinguish accelerators once we have finalized the crawler.
11/21/2016 2:00-5:00 pm
*Randomly chose 10 accelerators from Excel list of accelerators on the RDP. Went through each website and listed the steps that I took in order to determine whether or not the website belonged to an accelerator. Will continue extracting cohort information tomorrow.
11/22/2016 4:00-6:00 pm
*Listed out all steps for extracting cohort information from the ten randomly chosen accelerators. Worked with Peter in order to build a tool that will search all of the HTMLs and attempt to identify each one as an accelerator as well as extract some basic information.
11/28/2016 2:00-5:00 pm
*Merged the F6S accelerator list with our other list, then posted it on the project page. Learned process for accelerator data extraction from Ed.
11/29/2016 4:00-6:00 pm
*Began process of collecting data from the 20 accelerators that I am responsible for.
11/30/2016 2:00-5:00 pm
*Continued collecting data from accelerators. Finished 15/20.
12/1/2016 4:00-6:00 pm
*Continued collecting data from accelerators. Finished original 20, picked up a new set of 20.
12/2/2016 2:00-5:00 pm
*Continued collecting data from accelerators. Finished next 20.
12/8/2016 1:00-3:00 pm
*Completed collecting data from accelerators for the semester.
[[Matthew Ringheanu]] [[Work Logs]] [[Matthew Ringheanu (Work Log)|(log page)]]
[[Category:Work Log]]
