Changes

===Fall 2017===
<onlyinclude>
[[Matthew Ringheanu]] [[Work Logs]] [[Matthew Ringheanu (Work Log)|(log page)]]

9/11/2017 2:00-5:00 pm
*Spoke to Ed about the project going forward. Organized the current updated data for our project.
9/12/2017 3:00-5:00 pm
*Began going through the Cleaned Cohort Data Excel file and found a few problems with it. Will continue the cleaning process for the rest of the week.
9/13/2017 2:00-5:00 pm
*Sorted through Cleaned Cohort Data and finalized our List of Accelerators. We can begin the process of creating our PercentVC table.
9/14/2017 3:00-5:00 pm
*Completely finalized our dataset of accelerators and startups. Met with Michelle Passo to discuss objectives of the research for credit course.
9/18/2017 2:00-4:00 pm
*Talked with Peter about the LinkedIn crawler data. Went through VC page that Meghana sent me.
9/19/2017 3:00-5:00 pm
*Completed SDC pull of updated VC Data.
9/20/2017 2:00-5:00 pm
*Attempted several times to run the Matcher. Cleaned our pulled data.
9/21/2017 3:00-5:00 pm
*Came extremely close to running the Matcher correctly. Reviewed the final LinkedIn data from Peter.
9/25/2017 2:00-5:00 pm
*Finalized the matched file of accelerator companies with VC portfolio companies. Gave Ben the data on Georgia accelerators.
9/26/2017 3:00-5:00 pm
*Worked on finding the duplicates in our Matched file in order to have the most accurate data.
9/27/2017 2:00-5:00 pm
*Attempted to find a way to organize the duplicate matches.
9/28/2017 4:00-5:00 pm
*Continued running through matched data in order to organize it effectively.
10/2/2017 2:00-5:00 pm
*Talked to Ed about next steps for the project. Practiced accessing the Crunchbase database using SQL. Brushed up on SQL code.
10/3/2017 3:00-5:00 pm
*Searched the database for Crunchbase investment information.
10/4/2017 2:00-5:00 pm
*Pulled the funding rounds table from SQL and matched it with the companies that have received VC funding in order to gather round dates.
10/6/2017 3:00-5:00 pm
*Went through the matched data. Brainstormed ways to get the dates for cohort companies going through accelerators.
10/11/2017 2:00-3:30 pm
*Looked into using the WhoIs Parser in order to find when the companies went through their accelerators.
10/12/2017 3:00-5:00 pm
*Discovered that the Wayback Machine will not be a good option for finding when companies went through their accelerators. Created a list of VCCompanies and their earliest round date. Included a column for the date they went through their accelerators and will fill it in when we find a good method of finding this date.
10/16/2017 2:00-3:30 pm
*Continued working on sorting VCCompanies by their earliest round date.
10/17/2017 3:00-5:00 pm
*Worked with Ben to find a solution to our problem of data acquisition. Finalized earliest round date for VCCompanies.
10/18/2017 2:00-5:00 pm
*Updated our VC data with Ed's help in order to increase the accuracy and completeness of our data.
10/19/2017 3:00-5:00 pm
*Organized all of our matched data and updated it in order to reflect the most recent SDC pull with Ed. Matched Crunchbase data with our cohort companies.
10/20/2017 2:00-3:30 pm
*Generated the new list of VCCompanies as well as their earliest round dates.
10/23/2017 2:00-3:30 pm
*Worked on sorting out the discrepancies in our matched data.
10/24/2017 3:00-5:00 pm
*Went through list of VCCompanies and began adding respective accelerators in order to proceed with VCPercentage table.
10/25/2017 2:00-5:00 pm
*Continued going through list of VCCompanies and adding accelerators.
10/26/2017 3:30-5:30 pm
*Continued going through list of VCCompanies and adding accelerators. Will have this completed on Monday.
10/30/2017 2:00-3:30 pm
*Finished adding all of the accelerators to the list of VCCompanies. Added a column indicating whether or not the company went through two or more accelerators.
10/31/2017 3:00-5:00 pm
*Began compiling data in the column for Date Company went through Accelerator.
11/1/2017 2:00-4:00 pm
*Finalized entering dates for Y Combinator cohort companies.
11/2/2017 4:00-5:30 pm
*Continued entering cohort company dates into Excel file.
11/6/2017 2:00-4:00 pm
*Continued entering cohort company dates into Excel file. Began compiling a list of keywords for demo day press releases.
11/7/2017 3:00-5:00 pm
*Finished coming up with keywords for demo day crawler. Sent the final list to Peter.
11/8/2017 2:00-3:30 pm
*Spoke to Ed and organized all of our current data.
11/9/2017 3:00-5:00 pm
*Created a new project page called Accelerator Data and listed all relevant files as well as descriptions.
11/14/2017 3:00-5:00 pm
*Looked up URLs and decided whether or not the website was relevant.
11/15/2017 2:00-5:00 pm
*Created SQL database entitled "acceleratordata" and began creating tables from folder of All Relevant Files.
11/16/2017 3:00-5:00 pm
*Continued to input tables into SQL database.
11/20/2017 2:00-5:00 pm
*Cleaned text files in order to import tables into SQL database.
11/27/2017 2:00-5:00 pm
*Worked with Peter to find and exclude irrelevant keywords on HTML pages. Began categorizing relevant demo day pages.
11/28/2017 3:00-5:00 pm
*Finished inputting tables of relevant files into SQL database.
11/29/2017 2:00-5:00 pm
*Went through accelerator HTML URLs. Spoke with Ed about going through HTMLs and classifying based on overall and specific relevance.
12/1/2017 3:00-5:00 pm
*Worked through accelerator links and classified pages based on whether or not they provided relevant information about startup timing.
12/4/2017 10:00-12:00 pm
*Continued running through demo day crawl URLs and scoring them based on relevance.
12/7/2017 1:00-4:30 pm
*Finalized scoring of demo day URLs for the original crawl. Last day of work for this semester.
</onlyinclude>

===Spring 2017===
1/18/2017 1:00-5:00 pm
*Continued collecting data for accelerator project. Helped Catherine draft tweets for the McNair Center twitter account.
1/20/2017 1:00-3:00 pm
*Continued collecting data on accelerators. Attended McNair Center team meeting.
1/23/2017 1:00-5:00 pm
*Began combing through accelerator list, determining which accelerators are still missing data and documenting these in a TextPad file. Finished through #115.
1/25/2017 1:00-5:00 pm
*Continued looking through accelerator list.
1/27/2017 1:00-3:00 pm
*Continued going through accelerator list. Left off on #226 with Shrey.
1/20/2017 1:00-5:00 pm
*Continued going through accelerator list. Finished through #440.
2/1/2017 1:00-5:00 pm
*Finished going through the list of accelerators looking for incomplete files. Began completing the files that were not done.
2/3/2017 1:00-3:00 pm
*Continued working on completing accelerator files.
2/6/2017 1:00-4:30 pm
*Finished data set of accelerators. Began going through and making sure that all text files and cohort files are of the same format so Peter can easily pull the information. Left for 30 minutes for an interview from 2:30-3:00 pm.
2/8/2017 1:00-5:00 pm
*Finished formatting through #137. Spoke with Ed about project.
2/13/2017 1:00-5:00 pm
*Completed formatting for all accelerator text files.
2/15/2017 3:00-5:00 pm
*Made copy of the completed data set. Spoke to Ed about future steps to take for project including gathering founder data and obtaining the Crunchbase API.
2/17/2017 1:00-3:00 pm
*Went through final Excel spreadsheet for cohort information. Still need to run the crawler one more time after the completion of the editing process. Found the application for the Crunchbase API which will hopefully allow us to gain access.
2/20/2017 1:00-5:00 pm
*Filled out another application for Crunchbase research access. Found the first source for the incubator project on angel.co; will hopefully work with Peter to make a crawler similar to f6s.
2/22/2017 1:00-5:00 pm
*Pulled data from SDC for Ed and normalized it. Learned how to use SDC and the normalizer.
2/24/2017 1:00-3:00 pm
*Finished cleaning up the cohort data for Y Combinator on the Final Cohort Excel Spreadsheet.
2/27/2017 1:00-5:00 pm
*Continued cleaning up the cohort data in the Excel file. Finished Cohort Number and Year.
3/1/2017 2:00-5:00 pm
*Worked with Ben and Shrey to pull data from SDC for all VC funded companies and normalized it to put it in an Excel document.
3/3/2017 1:00-2:30 pm
*Worked with Ben to try and repeat down the VC data without it going too far.
3/6/2017 1:00-4:00 pm
*Worked with Shrey to finish cleaning the cohort data. It is ready to be run through the matcher with Ben.
3/8/2017 1:00-5:00 pm
*Matched the VC Data with the list of Cohort Companies and got one list of all cohort companies that have received VC funding.
3/10/2017 12:00-2:00 pm
*Put a write-up on the top of the Accelerator wiki page detailing where we are in the project currently as well as what data we have accumulated on the RDP.
3/20/2017 1:00-5:00 pm
*Began gathering the URLs of all accelerators in a TextPad file called Accelerator URLs. Participated in the SQL training session.
3/22/2017 1:00-5:00 pm
*Made tables in Terminal for Accelerator companies matched with VC companies and for Cohort Data.
3/27/2017 1:00-4:00 pm
*Compiled all accelerator URLs into a TextPad file.
3/29/2017 1:00-5:00 pm
*Worked on the matched data with Ben. Next time I will run the RegEx code that will filter the URLs, and I will look through the duplicates where two different VC-backed company names matched to one cohort company name.
3/31/2017 1:00-2:00 pm
*Ran the code for accelerator URLs, which are ready to be run through the Wayback Machine in order to get the start dates. Also began looking through VC-backed company names.
4/3/2017 1:00-5:00 pm
*Continued looking through double matched VC companies. Learned more SQL from Ed.
4/5/2017 1:00-5:00 pm
*Made the final VC percentage table in Terminal. Next time I will collect missing accelerator data.
4/7/2017 1:00-3:00 pm
*Began collecting cohort data for big accelerators that were missing from our list in order to add it to our final list of cohort companies.
4/10/2017 1:00-5:00 pm
*Finished gathering cohort company names for big accelerators that we were missing and put them into the Cleaned Cohort Companies Excel file. Ben is looking through Crunchbase data in order to possibly find more missing accelerators.
4/14/2017 1:00-4:00 pm
*Began working through "Crunchbase Potential Accelerators" textpad that may contain missing accelerators and wrote notes on the ones that I was able to go through. Need to finish this textpad before moving forward.
4/17/2017 1:00-4:00 pm
*Continued going through potential Crunchbase accelerators that we may have missed. Talked to Ed about getting a more comprehensive list from Excel file and about having the tables and data collected and done by the end of the semester.
4/19/2017 1:00-4:00 pm
*Worked with Jeemin to generate an entire list of potential US accelerators from Crunchbase. Worked to find a way to classify accelerators just based on their descriptions.
4/21/2017 1:00-4:00 pm
*Continued working through the list identifying accelerators that we do not have. Ramee and Juliette are now helping us gather cohort data for those missing accelerators.
4/24/2017 9:00-1:00 pm
*Updated Veeral on current state of project. Typed up a to-do list on the discussion wiki for Veeral. Got new cohort data on an accelerator and added it to Excel file.
5/3/2017 11:00-1:00 pm
*Talked to Ed and Anne about future report. Continued working through list of Crunchbase potential accelerators. Last day of work for this semester.
===Fall 2016===
10/17/2016 2:00-5:00 pm
*Created personal wiki page as well as work log; Read about the research project to which I have been assigned; Wrote a short summary of what I believe it is and included some helpful links
10/18/2016 4:00-6:00 pm
*Met with research partner Shrey who filled me in on where we are with the project; Began looking on websites of certain accelerators for how to determine their cohorts and listed these steps on the wiki
10/19/2016 2:00-5:00 pm
*Finished looking on the remaining accelerator websites and wrote the steps on determining how to manually locate the cohorts.
10/20/2016 4:00-6:00 pm
*Met with Peter and Christy to discuss the possibility of creating a web crawler that will pull data from individual accelerator sites.
10/24/2016 2:00-5:00 pm
*Brainstormed with Albert and Julia about changes to the category name for SBDE. Spoke to Ed about full scope of accelerator project.
10/25/2016 4:00-6:00 pm
*Brainstormed with Shrey about different potential industry focuses within accelerators, as well as different variables to search for in terms of accelerators, startups, cohorts, etc.
10/26/2016 2:00-5:00 pm
*Began searching for more databases including lists of accelerators as well as some characteristics of those accelerators; Began searching for characteristics that identify accelerators on their websites
10/27/2016 4:00-6:00 pm
*Continued searching for relevant lists of accelerators to include on our page. Added some links that have high potential under the tab (Obtained from List of Accelerators or various Google searches).
10/31/2016 2:00-5:00 pm
*Began constructing a list of variables that clearly distinguish an accelerator on its website. This is in an effort to allow a crawler to crawl through many Google searches and identify accelerators.
11/1/2016 4:00-6:00 pm
*Continued looking for variables that could identify accelerators from their websites. Searched through numerous different websites of accelerators obtained from our current databases.
11/2/2016 2:00-4:00 pm
*Continued combing through websites of numerous accelerators, well-known and other, in the hopes of finding identifying variables.
11/3/2016 4:00-6:00 pm
*Finalized my list of variables that could be used to distinguish the websites of accelerators. Slightly re-arranged our list of accelerator databases in order of relevance.
11/7/2016 2:00-5:00 pm
*Began compiling the list of all accelerators. Created a new TextPad document with information from a new database.
11/8/2016 4:00-6:00 pm
*Worked with Shrey and Ben in order to compile all of our accelerator databases into one long list on Textpad.
11/9/2016 2:00-5:00 pm
*Continued formulating a database for all accelerators and all of the available info given.
11/10/2016 4:00-6:00 pm
*Worked with Shrey and Peter in order to develop a crawler for f6s.
11/14/2016 2:00-5:00 pm
*Began sorting the Seed-DB database in an Excel document.
11/15/2016 4:00-6:00 pm
*Conducted some Google searches in an attempt to find more accelerator databases. Began looking through Executive Orders searching for keywords.
11/16/2016 2:00-5:00 pm
*Completed searching through Executive Orders.
11/17/2016 4:00-6:00 pm
*Continued working on Google searches for state accelerator list. Looked through f6s for common words that can be used to distinguish accelerators once we have finalized the crawler.
11/21/2016 2:00-5:00 pm
*Randomly chose 10 accelerators from Excel list of accelerators on the RDP. Went through each website and listed the steps that I took in order to determine whether or not the website belonged to an accelerator. Will continue extracting cohort information tomorrow.
11/22/2016 4:00-6:00 pm
*Listed out all steps for extracting cohort information from the ten randomly chosen accelerators. Worked with Peter in order to build a tool that will search all of the HTMLs and attempt to identify each one as an accelerator as well as extract some basic information.
11/28/2016 2:00-5:00 pm
*Merged the F6S accelerator list with our other list, then posted it on the project page. Learned process for accelerator data extraction from Ed.
11/29/2016 4:00-6:00 pm
*Began process of collecting data from the 20 accelerators that I am responsible for.
11/30/2016 2:00-5:00 pm
*Continued collecting data from accelerators. Finished 15/20.
12/1/2016 4:00-6:00 pm
*Continued collecting data from accelerators. Finished original 20, picked up a new set of 20.
12/2/2016 2:00-5:00 pm
*Continued collecting data from accelerators. Finished next 20.
12/8/2016 1:00-3:00 pm
*Completed collecting data from accelerators for the semester.
[[Matthew Ringheanu]] [[Work Logs]] [[Matthew Ringheanu (Work Log)|(log page)]]
[[Category:Work Log]]
