Changes

===Fall 2017===
<onlyinclude>
[[Matthew Ringheanu]] [[Work Logs]] [[Matthew Ringheanu (Work Log)|(log page)]]

9/11/2017 2:00-5:00 pm:
*Spoke to Ed about the project going forward. Organized the current updated data for our project.
9/12/2017 3:00-5:00 pm:
*Began going through the Cleaned Cohort Data Excel file and found a few problems with it. Will continue the cleaning process for the rest of the week.
9/13/2017 2:00-5:00 pm:
*Sorted through Cleaned Cohort Data and finalized our List of Accelerators. We can begin the process of creating our PercentVC table.
9/14/2017 3:00-5:00 pm:
*Completely finalized our dataset of accelerators and startups. Met with Michelle Passo to discuss objectives of the research for credit course.
9/18/2017 2:00-4:00 pm:
*Talked with Peter about the LinkedIn crawler data. Went through VC page that Meghana sent me.
9/19/2017 3:00-5:00 pm:
*Completed SDC pull of updated VC Data.
9/20/2017 2:00-5:00 pm:
*Attempted several times to run the Matcher. Cleaned our pulled data.
9/21/2017 3:00-5:00 pm:
*Came extremely close to running the Matcher correctly. Reviewed the final LinkedIn data from Peter.
9/25/2017 2:00-5:00 pm:
*Finalized the matched file of accelerator companies with VC portfolio companies. Gave Ben the data on Georgia accelerators.
9/26/2017 3:00-5:00 pm:
*Worked on finding the duplicates in our Matched file in order to have the most accurate data.
9/27/2017 2:00-5:00 pm:
*Attempted to find a way to organize the duplicate matches.
9/28/2017 4:00-5:00 pm:
*Continued running through matched data in order to organize it effectively.
10/2/2017 2:00-5:00 pm:
*Talked to Ed about next steps for the project. Practiced accessing the crunchbase database on SQL. Brushed up on SQL code.
10/3/2017 3:00-5:00 pm:
*Searched the database for crunchbase investment information.
10/4/2017 2:00-5:00 pm:
*Pulled the funding rounds table from SQL and matched it with the companies that have received VC funding in order to gather round dates.
10/6/2017 3:00-5:00 pm:
*Went through the matched data. Brainstormed ways to get the dates for cohort companies going through accelerators.
10/11/2017 2:00-3:30 pm:
*Looked into using the WhoIs Parser in order to find when the companies went through their accelerators.
10/12/2017 3:00-5:00 pm:
*Discovered that the Wayback Machine will not be a good option for finding when companies went through their accelerators. Created a list of VCCompanies and their earliest round date. Included a column for the date they went through their accelerators and will fill it in when we find a good method of finding this date.
10/16/2017 2:00-3:30 pm:
*Continued working on sorting VCCompanies by their earliest round date.
10/17/2017 3:00-5:00 pm:
*Worked with Ben to find a solution to our problem of accelerator data acquisition. Finalized earliest round date for VCCompanies.
10/18/2017 2:00-5:00 pm:
*Updated our VC data with Ed's help in order to increase the accuracy and completeness of our data.
10/19/2017 3:00-5:00 pm:
*Organized all of our matched data and updated it in order to reflect the most recent SDC pull with Ed. Matched Crunchbase data with our cohort companies.
10/20/2017 2:00-3:30 pm:
*Generated the new list of VCCompanies as well as their earliest round dates.
10/23/2017 2:00-3:30 pm:
*Worked on sorting out the discrepancies in our matched data.
10/24/2017 3:00-5:00 pm:
*Went through list of VCCompanies and began adding respective accelerators in order to proceed with VCPercentage table.
10/25/2017 2:00-5:00 pm:
*Continued going through list of VCCompanies and adding accelerators.
10/26/2017 3:30-5:30 pm:
*Continued going through list of VCCompanies and adding accelerators. Will have this completed on Monday.
10/30/2017 2:00-3:30 pm:
*Finished adding all of the accelerators to the list of VCCompanies. Added a new column indicating whether or not the company went through two or more accelerators.
10/31/2017 3:00-5:00 pm:
*Began compiling data in the column for Date Company went through Accelerator.
11/1/2017 2:00-4:00 pm:
*Finalized entering dates for all of the Y Combinator cohort companies.
11/2/2017 4:00-5:30 pm:
*Continued entering cohort company dates into Excel file.
11/6/2017 2:00-4:00 pm:
*Continued entering cohort company dates into Excel file. Began compiling a list of keywords for demo day press releases.
11/7/2017 3:00-5:00 pm:
*Finished coming up with keywords for demo day crawler. Sent the final list to Peter.
11/8/2017 2:00-3:30 pm:
*Spoke to Ed and organized all of our current data.
11/9/2017 3:00-5:00 pm:
*Created a new project page called Accelerator Data and listed all relevant files as well as descriptions.
11/14/2017 3:00-5:00 pm:
*Looked up URLs and decided whether or not each website was relevant.
11/15/2017 2:00-5:00 pm:
*Created SQL database entitled "acceleratordata" and began creating tables from the folder of All Relevant Files.
11/16/2017 3:00-5:00 pm:
*Continued to input tables into SQL database.
11/20/2017 2:00-5:00 pm:
*Cleaned text files in order to import tables into SQL database.
11/27/2017 2:00-5:00 pm:
*Worked with Peter to find and exclude irrelevant keywords on HTML pages. Began categorizing relevant demo day pages.
11/28/2017 3:00-5:00 pm:
*Finished inputting tables of relevant files into SQL database.
11/29/2017 2:00-5:00 pm:
*Went through accelerator HTML URLs. Spoke with Ed about going through HTMLs and classifying based on overall and specific relevance.
12/1/2017 3:00-5:00 pm:
*Worked through accelerator links and classified pages based on whether or not they provided relevant information about startup timing.
12/4/2017 10:00-12:00 pm:
*Continued running through demo day crawl URLs and scoring them based on relevance.
12/7/2017 1:00-4:30 pm:
*Finalized scoring of demo day URLs for the original crawl. Last day of work for this semester.
</onlyinclude>
===Spring 2017===
1/18/2017 1:00-5:00 pm:
*Continued collecting data for accelerator project. Helped Catherine draft tweets for the McNair Center twitter account.
1/20/2017 1:00-3:00 pm:
*Continued collecting data on accelerators. Attended McNair Center team meeting.
1/23/2017 1:00-5:00 pm:
*Began combing through accelerator list, determining which accelerators are still missing data and documenting these in a TextPad file. Finished through #115.
1/25/2017 1:00-5:00 pm:
*Continued looking through accelerator list.
1/27/2017 1:00-3:00 pm:
*Continued going through accelerator list. Left off on #226 with Shrey.
1/20/2017 1:00-5:00 pm:
*Continued going through accelerator list. Finished through #440.
2/1/2017 1:00-5:00 pm:
*Finished going through the list of accelerators looking for incomplete files. Began completing the files that were not done.
2/3/2017 1:00-3:00 pm:
*Continued working on completing accelerator files.
2/6/2017 1:00-4:30 pm:
*Finished data set of accelerators. Began going through and making sure that all text files and cohort files are of the same format so Peter can easily pull the information. Left for 30 minutes for an interview from 2:30-3:00 pm.
2/8/2017 1:00-5:00 pm:
*Finished formatting through #137. Spoke with Ed about project.
2/13/2017 1:00-5:00 pm:
*Completed formatting for all accelerator text files.
2/15/2017 3:00-5:00 pm:
*Made copy of the completed data set. Spoke to Ed about future steps to take for project including gathering founder data and obtaining the crunchbase api.
2/17/2017 1:00-3:00 pm:
*Went through final Excel spreadsheet for cohort information. Still need to run the crawler one more time after the completion of the editing process. Found the application for the crunchbase api which will hopefully allow us to gain access.
2/20/2017 1:00-5:00 pm:
*Filled out another application for Crunchbase research access; Found the first source for the incubator project on angel.co, will hopefully work with Peter to make a crawler similar to f6s.
2/22/2017 1:00-5:00 pm:
*Pulled data from SDC for Ed and normalized it. Learned how to use SDC and the normalizer.
2/24/2017 1:00-3:00 pm:
*Finished cleaning up the cohort data for Y-combinator on the Final Cohort Excel Spreadsheet.
2/27/2017 1:00-5:00 pm:
*Continued cleaning up the cohort data in the Excel file. Finished Cohort Number and Year.
3/1/2017 2:00-5:00 pm:
*Worked with Ben and Shrey to pull data from SDC for all VC funded companies and normalized it to put it in an Excel document.
3/3/2017 1:00-2:30 pm:
*Worked with Ben to try and repeat down the VC data without it going too far.
3/6/2017 1:00-4:00 pm:
*Worked with Shrey to finish cleaning the cohort data. It is ready to be run through the matcher with Ben.
3/8/2017 1:00-5:00 pm:
*Matched the VC Data with the list of Cohort Companies and got one list of all cohort companies that have received VC funding.
3/10/2017 12:00-2:00 pm:
*Put a write-up on the top of the Accelerator wiki page detailing where we are in the project currently as well as what data we have accumulated on the RDP.
3/20/2017 1:00-5:00 pm:
*Began gathering the URLs of all accelerators in a TextPad file called Accelerator URLs. Participated in the SQL training session.
3/22/2017 1:00-5:00 pm:
*Made tables in Terminal for Accelerator companies matched with VC companies and for Cohort Data.
3/27/2017 1:00-4:00 pm:
*Compiled all accelerator URLs into a TextPad file.
3/29/2017 1:00-5:00 pm:
*Worked on the matched data with Ben. Next time I will run the RegEx code that will filter the URLs, and I will look through the duplicates where two different VC backed company names matched to one cohort company name.
3/31/2017 1:00-2:00 pm:
*Ran the code for accelerator urls which are ready to be run through the wayback machine in order to get the start dates. Also began looking through vc backed company names.
4/3/2017 1:00-5:00 pm:
*Continued looking through double matched VC companies. Learned more SQL from Ed.
4/5/2017 1:00-5:00 pm:
*Made the final vc percentage table on terminal and for next time I will collect missing accelerator data.
4/7/2017 1:00-3:00 pm:
*Began collecting cohort data for big accelerators that were missing from our list in order to add it to our final list of cohort companies.
4/10/2017 1:00-5:00 pm:
*Finished gathering cohort company names for big accelerators that we were missing and put them into the Cleaned Cohort Companies Excel file. Ben is looking through Crunchbase data in order to possibly find more missing accelerators.
4/14/2017 1:00-4:00 pm:
*Began working through "Crunchbase Potential Accelerators" textpad that may contain missing accelerators and wrote notes on the ones that I was able to go through. Need to finish this textpad before moving forward.
4/17/2017 1:00-4:00 pm:
*Continued going through potential Crunchbase accelerators that we may have missed. Talked to Ed about getting a more comprehensive list from Excel file and by the end of the semester have the tables and data collected and done.
4/19/2017 1:00-4:00 pm:
*Worked with Jeemin to generate an entire list of potential US accelerators from crunchbase. Worked to find a way to classify accelerators just based on their descriptions.
4/21/2017 1:00-4:00 pm:
*Continued working through the list identifying accelerators that we do not have. Ramee and Juliette are now helping us gather cohort data for those missing accelerators.
4/24/2017 9:00-1:00 pm:
*Updated Veeral on current state of project. Typed up a to-do list on the discussion wiki for Veeral. Got new cohort data on an accelerator and added it to Excel file.
5/3/2017 11:00-1:00 pm:
*Talked to Ed and Anne about future report. Continued working through list of crunchbase potential accelerators. Last day of work for this semester.
===Fall 2016===
10/17/2016 2:00-5:00 pm:
*Created personal wiki page as well as work log; Read about the research project to which I have been assigned; Wrote a short summary of what I believe it is and included some helpful links.
10/18/2016 4:00-6:00 pm:
*Met with research partner Shrey who filled me in on where we are with the project; Began looking on websites of certain accelerators for how to determine their cohorts and listed these steps on the wiki.
10/19/2016 2:00-5:00 pm:
*Finished looking on the remaining accelerator websites and wrote the steps on determining how to manually locate the cohorts.
10/20/2016 4:00-6:00 pm:
*Met with Peter and Christy to discuss the possibility of creating a web crawler that will pull data from individual accelerator sites.
10/24/2016 2:00-5:00 pm:
*Brainstormed with Albert and Julia about changes to the category name for SBDE. Spoke to Ed about full scope of accelerator project.
10/25/2016 4:00-6:00 pm:
*Brainstormed with Shrey about different potential industry focuses within accelerators, as well as different variables to search for in terms of accelerators, startups, cohorts, etc.
10/26/2016 2:00-5:00 pm:
*Began searching for more databases including lists of accelerators as well as some characteristics of those accelerators; Began searching for characteristics that identify accelerators on their websites.
10/27/2016 4:00-6:00 pm:
*Continued searching for relevant lists of accelerators to include on our page. Added some links that have high potential under the tab (Obtained from List of Accelerators or various Google searches).
10/31/2016 2:00-5:00 pm:
*Began constructing a list of variables that clearly distinguish an accelerator on its website. This is in an effort to allow a crawler to crawl through many Google searches and identify accelerators.
11/1/2016 4:00-6:00 pm:
*Continued looking for variables that could identify accelerators from their websites. Searched through numerous different websites of accelerators obtained from our current databases.
11/2/2016 2:00-4:00 pm:
*Continued combing through websites of numerous accelerators, well-known and other, in the hopes of finding identifying variables.
11/3/2016 4:00-6:00 pm:
*Finalized my list of variables that could be used to distinguish the websites of accelerators. Slightly re-arranged our list of accelerator databases in order of relevance.
11/7/2016 2:00-5:00 pm:
*Began compiling the list of all accelerators. Created a new TextPad document with information from a new database.
11/8/2016 4:00-6:00 pm:
*Worked with Shrey and Ben in order to compile all of our accelerator databases into one long list on Textpad.
11/9/2016 2:00-5:00 pm:
*Continued formulating a database for all accelerators and all of the available info given.
11/10/2016 4:00-6:00 pm:
*Worked with Shrey and Peter in order to develop a crawler for f6s.
11/14/2016 2:00-5:00 pm:
*Began sorting the Seed-DB database in an Excel document.
11/15/2016 4:00-6:00 pm:
*Conducted some Google searches in an attempt to find more accelerator databases. Began looking through Executive Orders searching for keywords.
11/16/2016 2:00-5:00 pm:
*Completed searching through Executive Orders.
11/17/2016 4:00-6:00 pm:
*Continued working on Google searches for state accelerator list. Looked through f6s for common words that can be used to distinguish accelerators once we have finalized the crawler.
11/21/2016 2:00-5:00 pm:
*Randomly chose 10 accelerators from Excel list of accelerators on the RDP. Went through each website and listed the steps that I took in order to determine whether or not the website belonged to an accelerator. Will continue extracting cohort information tomorrow.
11/22/2016 4:00-6:00 pm:
*Listed out all steps for extracting cohort information from the ten randomly chosen accelerators. Worked with Peter in order to build a tool that will search all of the HTMLs and attempt to identify each one as an accelerator as well as extract some basic information.
11/28/2016 2:00-5:00 pm:
*Merged the F6S accelerator list with our other list, then posted it on the project page. Learned process for accelerator data extraction from Ed.
11/29/2016 4:00-6:00 pm:
*Began process of collecting data from the 20 accelerators that I am responsible for.
11/30/2016 2:00-5:00 pm:
*Continued collecting data from accelerators. Finished 15/20.
12/1/2016 4:00-6:00 pm:
*Continued collecting data from accelerators. Finished original 20, picked up a new set of 20.
12/2/2016 2:00-5:00 pm:
*Continued collecting data from accelerators. Finished next 20.
12/8/2016 1:00-3:00 pm:
*Completed collecting data from accelerators for the semester.
[[Matthew Ringheanu]] [[Work Logs]] [[Matthew Ringheanu (Work Log)|(log page)]]
[[Category:Work Log]]
