===Fall 2017===
<onlyinclude>
[[Matthew Ringheanu]] [[Work Logs]] [[Matthew Ringheanu (Work Log)|(log page)]]
9/11/2017 2:00-5:00 pm
*Spoke to Ed about the project going forward. Organized the current updated data for our project.
9/12/2017 3:00-5:00 pm
*Began going through the Cleaned Cohort Data Excel file and found a few problems with it. Will continue the cleaning process for the rest of the week.
9/13/2017 2:00-5:00 pm
*Sorted through Cleaned Cohort Data and finalized our List of Accelerators. We can begin the process of creating our PercentVC table.
9/14/2017 3:00-5:00 pm
*Completely finalized our dataset of accelerators and startups. Met with Michelle Passo to discuss objectives of the research for credit course.
9/18/2017 2:00-4:00 pm
*Talked with Peter about the LinkedIn crawler data. Went through VC page that Meghana sent me.
9/19/2017 3:00-5:00 pm
*Completed SDC pull of updated VC Data.
9/20/2017 2:00-5:00 pm
*Attempted several times to run the Matcher. Cleaned our pulled data.
9/21/2017 3:00-5:00 pm
*Came extremely close to running the Matcher correctly. Reviewed the final LinkedIn data from Peter.
9/25/2017 2:00-5:00 pm
*Finalized the matched file of accelerator companies with VC portfolio companies. Gave Ben the data on Georgia accelerators.
9/26/2017 3:00-5:00 pm
*Worked on finding the duplicates in our Matched file in order to have the most accurate data.
9/27/2017 2:00-5:00 pm
*Attempted to find a way to organize the duplicate matches.
9/28/2017 4:00-5:00 pm
*Continued running through matched data in order to organize it effectively.
10/2/2017 2:00-5:00 pm
*Talked to Ed about next steps for the project. Practiced accessing the crunchbase database on SQL. Brushed up on SQL code.
10/3/2017 3:00-5:00 pm
*Searched the database for crunchbase investment information.
10/4/2017 2:00-5:00 pm
*Pulled the funding rounds table from SQL and matched it with the companies that have received VC funding in order to gather round dates.
10/6/2017 3:00-5:00 pm
*Went through the matched data. Brainstormed ways to get the dates for cohort companies going through accelerators.
10/11/2017 2:00-3:30 pm
*Looked into using the WhoIs Parser in order to find when the companies went through their accelerators.
10/12/2017 3:00-5:00 pm
*Discovered that the Wayback Machine will not be a good option for finding when companies went through their accelerators. Created a list of VCCompanies and their earliest round date. Included a column for the date they went through their accelerators and will fill it in when we find a good method of finding this date.
10/16/2017 2:00-3:30 pm
*Continued working on sorting VCCompanies by their earliest round date.
10/17/2017 3:00-5:00 pm
*Worked with Ben to find a solution to our problem of data acquisition. Finalized earliest round date for VCCompanies.
10/18/2017 2:00-5:00 pm
*Updated our VC data with Ed's help in order to increase the accuracy and completeness of our data.
10/19/2017 3:00-5:00 pm
*Organized all of our matched data and updated it in order to reflect the most recent SDC pull with Ed. Matched Crunchbase data with our cohort companies.
10/20/2017 2:00-3:30 pm
*Generated the new list of VCCompanies as well as their earliest round dates.
10/23/2017 2:00-3:30 pm
*Worked on sorting out the project; Began looking into discrepancies in our matched data.
10/24/2017 3:00-5:00 pm
*Went through list of VCCompanies and began adding respective accelerators in order to proceed with VCPercentage table.
10/25/2017 2:00-5:00 pm
*Continued going through list of VCCompanies and adding accelerators.
10/26/2017 3:30-5:30 pm
*Continued going through list of VCCompanies and adding accelerators. Will have this completed on Monday.
10/30/2017 2:00-3:30 pm
*Finished adding all of the accelerators to the list of VCCompanies. Added a column indicating whether or not the company went through two or more accelerators.
10/31/2017 3:00-5:00 pm
*Began compiling data in the column for Date Company went through Accelerator.
11/1/2017 2:00-4:00 pm
*Finalized entering dates for Y Combinator cohort companies.
11/2/2017 4:00-5:30 pm
*Continued entering cohort company dates into Excel file.
11/6/2017 2:00-4:00 pm
*Continued entering cohort company dates into Excel file. Began compiling a list of keywords for demo day press releases.
11/7/2017 3:00-5:00 pm
*Finished coming up with keywords for demo day crawler. Sent the final list to Peter.
11/8/2017 2:00-3:30 pm
*Spoke to Ed and organized all of our current data.
11/9/2017 3:00-5:00 pm
*Created a new project page called Accelerator Data and listed all relevant files as well as descriptions.
11/14/2017 3:00-5:00 pm
*Looked up URLs and decided whether or not the website was relevant.
11/15/2017 2:00-5:00 pm
*Created SQL database entitled "acceleratordata" and began creating tables from folder of All Relevant Files.
11/16/2017 3:00-5:00 pm
*Continued to input tables into SQL database.
11/20/2017 2:00-5:00 pm
*Cleaned text files in order to import tables into SQL database.
11/27/2017 2:00-5:00 pm
*Worked with Peter to find and exclude irrelevant keywords on HTML pages. Began categorizing relevant demo day pages.
11/28/2017 3:00-5:00 pm
*Finished inputting tables of relevant files into SQL database.
11/29/2017 2:00-5:00 pm
*Went through accelerator HTML URLs. Spoke with Ed about going through HTMLs and classifying based on overall and specific relevance.
12/1/2017 3:00-5:00 pm
*Worked through numerous accelerator links and classified pages based on whether or not they provided relevant information about startup timing.
12/4/2017 10:00-12:00 pm
*Continued running through demo day crawl URLs and scoring them based on relevance.
12/7/2017 1:00-4:30 pm
*Finalized my scoring of demo day URLs for the original crawl. Last day of work for this semester.
</onlyinclude>
===Spring 2017===
1/18/2017 1:00-5:00 pm
*Continued collecting data for accelerator project. Helped Catherine draft tweets for the McNair Center twitter account.
1/20/2017 1:00-3:00 pm
*Continued collecting data on accelerators. Attended McNair Center team meeting.
1/23/2017 1:00-5:00 pm
*Began combing through accelerator list, determining which accelerators are still missing data and documenting these in a TextPad file. Finished through #115.
1/25/2017 1:00-5:00 pm
*Continued looking through accelerator list.
1/27/2017 1:00-3:00 pm
*Continued going through accelerator list. Left off on #226 with Shrey.
1/20/2017 1:00-5:00 pm
*Continued going through accelerator list. Finished through #440.
2/1/2017 1:00-5:00 pm
*Finished going through the list of accelerators looking for incomplete files. Began completing the files that were not done.
2/3/2017 1:00-3:00 pm
*Continued working on completing accelerator files.
2/6/2017 1:00-4:30 pm
*Finished data set of accelerators. Began going through and making sure that all text files and cohort files are of the same format so Peter can easily pull the information. Left for 30 minutes for an interview from 2:30-3:00 pm.
2/8/2017 1:00-5:00 pm
*Finished formatting through #137. Spoke with Ed about project.
2/13/2017 1:00-5:00 pm
*Completed formatting for all accelerator text files.
2/15/2017 3:00-5:00 pm
*Made copy of the completed data set. Spoke to Ed about future steps to take for project including gathering founder data and obtaining the crunchbase api.
2/17/2017 1:00-3:00 pm
*Went through final Excel spreadsheet for cohort information. Still need to run the crawler one more time after the completion of the editing process. Found the application for the crunchbase api which will hopefully allow us to gain access.
2/20/2017 1:00-5:00 pm
*Filled out another application for Crunchbase research access; Found the first source for the incubator project on angel.co, will hopefully work with Peter to make a crawler similar to f6s.
2/22/2017 1:00-5:00 pm
*Pulled data from SDC for Ed and normalized it. Learned how to use SDC and the normalizer.
2/24/2017 1:00-3:00 pm
*Finished cleaning up the cohort data for Y-combinator on the Final Cohort Excel Spreadsheet.
2/27/2017 1:00-5:00 pm
*Continued cleaning up the cohort data in the Excel file. Finished Cohort Number and Year.
3/1/2017 2:00-5:00 pm
*Worked with Ben and Shrey to pull data from SDC for all VC funded companies and normalized it to put it in an Excel document.
3/3/2017 1:00-2:30 pm
*Worked with Ben to try and repeat down the VC data without it going too far.
3/6/2017 1:00-4:00 pm
*Worked with Shrey to finish cleaning the cohort data. It is ready to be run through the matcher with Ben.
3/8/2017 1:00-5:00 pm
*Matched the VC Data with the list of Cohort Companies and got one list of all cohort companies that have received VC funding.
3/10/2017 12:00-2:00 pm
*Put a write-up on the top of the Accelerator wiki page detailing where we are in the project currently as well as what data we have accumulated on the RDP.
3/20/2017 1:00-5:00 pm
*Began gathering the URLs of all accelerators in a TextPad file called Accelerator URLs. Participated in the SQL training session.
3/22/2017 1:00-5:00 pm
*Made tables in Terminal for Accelerator companies matched with VC companies and for Cohort Data.
3/27/2017 1:00-4:00 pm
*Compiled all URLs of accelerators into a TextPad file.
3/29/2017 1:00-5:00 pm
*Worked on the matched data with Ben. Next time I will run the RegEx code that will filter the URLs, and I will look through the duplicates where two different VC backed company names matched to one cohort company name.
3/31/2017 1:00-2:00 pm
*Ran the code for accelerator urls which are ready to be run through the wayback machine in order to get the start dates. Also began looking through vc backed company names.
4/3/2017 1:00-5:00 pm
*Continued looking through double matched VC companies. Learned more SQL from Ed.
4/5/2017 1:00-5:00 pm
*Made the final vc percentage table on terminal and for next time I will collect missing accelerator data.
4/7/2017 1:00-3:00 pm
*Began collecting cohort data for big accelerators that were missing from our list in order to add it to our final list of cohort companies.
4/10/2017 1:00-5:00 pm
*Finished gathering cohort company names for big accelerators that we were missing and put them into the Cleaned Cohort Companies Excel file. Ben is looking through Crunchbase data in order to possibly find more missing accelerators.
4/14/2017 1:00-4:00 pm
*Began working through "Crunchbase Potential Accelerators" textpad that may contain missing accelerators and wrote notes on the ones that I was able to go through. Need to finish this textpad before moving forward.
4/17/2017 1:00-4:00 pm
*Continued going through potential Crunchbase accelerators that we may have missed. Talked to Ed about getting a more comprehensive list from Excel file and by the end of the semester have the tables and data collected and done.
4/19/2017 1:00-4:00 pm
*Worked with Jeemin to generate an entire list of potential US accelerators from crunchbase. Worked to find a way to classify accelerators just based on their descriptions.
4/21/2017 1:00-4:00 pm
*Continued working through the list, identifying accelerators that we do not have. Ramee and Juliette are now helping us gather cohort data for those missing accelerators.
4/24/2017 9:00-1:00 pm
*Updated Veeral on current state of project. Typed up a to-do list on the discussion wiki for Veeral. Got new cohort data on an accelerator and added it to Excel file.
5/3/2017 11:00-1:00 pm
*Talked to Ed and Anne about future report. Continued working through list of crunchbase potential accelerators. Last day of work for this semester.
===Fall 2016===
10/17/2016 2:00-5:00 pm
*Created personal wiki page as well as work log; Read about the research project to which I have been assigned; Wrote a short summary of what I believe it is and included some helpful links.
10/18/2016 4:00-6:00 pm
*Met with research partner Shrey who filled me in on where we are with the project; Began looking on websites of certain accelerators for how to determine their cohorts and listed these steps on the wiki.
10/19/2016 2:00-5:00 pm
*Finished looking on the remaining accelerator websites and wrote the steps on determining how to manually locate the cohorts.
10/20/2016 4:00-6:00 pm
*Met with Peter and Christy to discuss the possibility of creating a web crawler that will pull data from individual accelerator sites.
10/24/2016 2:00-5:00 pm
*Brainstormed with Albert and Julia about changes to the category name for SBDE. Spoke to Ed about full scope of accelerator project.
10/25/2016 4:00-6:00 pm
*Brainstormed with Shrey about different potential industry focuses within accelerators, as well as different variables to search for in terms of accelerators, startups, cohorts, etc.
10/26/2016 2:00-5:00 pm
*Began searching for more databases including lists of accelerators as well as some characteristics of those accelerators; Began searching for characteristics that identify accelerators on their websites.
10/27/2016 4:00-6:00 pm
*Continued searching for relevant lists of accelerators to include on our page. Added some links that have high potential under the tab (Obtained from List of Accelerators or various Google searches).
10/31/2016 2:00-5:00 pm
*Began constructing a list of variables that clearly distinguish an accelerator on its website. This is in an effort to allow a crawler to crawl through many Google searches and identify accelerators.
11/1/2016 4:00-6:00 pm
*Continued looking for variables that could identify accelerators from their websites. Searched through numerous different websites of accelerators obtained from our current databases.
11/2/2016 2:00-4:00 pm
*Continued combing through websites of numerous accelerators, well-known and other, in the hopes of finding identifying variables.
11/3/2016 4:00-6:00 pm
*Finalized my list of variables that could be used to distinguish the websites of accelerators. Slightly re-arranged our list of accelerator databases in order of relevance.
11/7/2016 2:00-5:00 pm
*Began compiling the list of all accelerators. Created a new TextPad document with information from a new database.
11/8/2016 4:00-6:00 pm
*Worked with Shrey and Ben in order to compile all of our accelerator databases into one long list on TextPad.
11/9/2016 2:00-5:00 pm
*Continued formulating a database for all accelerators and all of the available info given.
11/10/2016 4:00-6:00 pm
*Worked with Shrey and Peter in order to develop a crawler for f6s.
11/14/2016 2:00-5:00 pm
*Began sorting the Seed-DB database in an Excel document.
11/15/2016 4:00-6:00 pm
*Conducted some Google searches in an attempt to find more accelerator databases. Began looking through Executive Orders searching for keywords.
11/16/2016 2:00-5:00 pm
*Completed searching through Executive Orders.
11/17/2016 4:00-6:00 pm
*Continued working on Google searches for state accelerator list. Looked through f6s for common words that can be used to distinguish accelerators once we have finalized the crawler.
11/21/2016 2:00-5:00 pm
*Randomly chose 10 accelerators from Excel list of accelerators on the RDP. Went through each website and listed the steps that I took in order to determine whether or not the website belonged to an accelerator. Will continue extracting cohort information tomorrow.
11/22/2016 4:00-6:00 pm
*Listed out all steps for extracting cohort information from the ten randomly chosen accelerators. Worked with Peter in order to build a tool that will search all of the HTMLs and attempt to identify each one as an accelerator as well as extract some basic information.
11/28/2016 2:00-5:00 pm
*Merged the F6S accelerator list with our other list, then posted it on the project page. Learned process for accelerator data extraction from Ed.
11/29/2016 4:00-6:00 pm
*Began process of collecting data from the 20 accelerators that I am responsible for.
11/30/2016 2:00-5:00 pm
*Continued collecting data from accelerators. Finished 15/20.
12/1/2016 4:00-6:00 pm
*Continued collecting data from accelerators. Finished original 20, picked up a new set of 20.
12/2/2016 2:00-5:00 pm
*Continued collecting data from accelerators. Finished next 20.
12/8/2016 1:00-3:00 pm
*Completed collecting data from accelerators for the semester.
[[Matthew Ringheanu]] [[Work Logs]] [[Matthew Ringheanu (Work Log)|(log page)]]
[[Category:Work Log]]
