===Fall 2017===
<onlyinclude>
[[Matthew Ringheanu]] [[Work Logs]] [[Matthew Ringheanu (Work Log)|(log page)]]

9/11/2017 2:00-5:00 pm
*Spoke to Ed about the project going forward. Organized the current updated data for our project.
9/12/2017 3:00-5:00 pm
*Began going through the Cleaned Cohort Data Excel file and found a few problems with it. Will continue the cleaning process for the rest of the week.
9/13/2017 2:00-5:00 pm
*Sorted through Cleaned Cohort Data and finalized our List of Accelerators. We can begin the process of creating our PercentVC table.
9/14/2017 3:00-5:00 pm
*Completely finalized our dataset of accelerators and startups. Met with Michelle Passo to discuss objectives of the research for credit course.
9/18/2017 2:00-4:00 pm
*Talked with Peter about the LinkedIn crawler data. Went through VC page that Meghana sent me.
9/19/2017 3:00-5:00 pm
*Completed SDC pull of updated VC Data.
9/20/2017 2:00-5:00 pm
*Attempted several times to run the Matcher. Cleaned our pulled data.
9/21/2017 3:00-5:00 pm
*Came extremely close to running the Matcher correctly. Reviewed the final LinkedIn data from Peter.
9/25/2017 2:00-5:00 pm
*Finalized the matched file of accelerator companies with VC portfolio companies. Gave Ben the data on Georgia accelerators.
9/26/2017 3:00-5:00 pm
*Worked on finding duplicates in our Matched file in order to have the most accurate data.
9/27/2017 2:00-5:00 pm
*Attempted to find a way to organize the duplicate matches.
9/28/2017 4:00-5:00 pm
*Continued running through matched data in order to organize it effectively.
10/2/2017 2:00-5:00 pm
*Talked to Ed about next steps for the project. Practiced accessing the crunchbase database on SQL. Brushed up on SQL code.
10/3/2017 3:00-5:00 pm
*Searched the database for crunchbase investment information.
10/4/2017 2:00-5:00 pm
*Pulled the funding rounds table from SQL and matched it with the companies that have received VC funding in order to gather round dates.
10/6/2017 3:00-5:00 pm
*Went through the matched data. Brainstormed ways to get the dates for cohort companies going through accelerators.
10/11/2017 2:00-3:30 pm
*Looked into using the WhoIs Parser in order to find when the companies went through their accelerators.
10/12/2017 3:00-5:00 pm
*Discovered that the Wayback Machine will not be a good option for finding when companies went through their accelerators. Created a list of VCCompanies and their earliest round date. Included a column for the date they went through their accelerators; will fill it in when we find a good method of finding this date.
10/16/2017 2:00-3:30 pm
*Continued working on sorting VCCompanies by their earliest round date.
10/17/2017 3:00-5:00 pm
*Worked with Ben to find a solution to our problem of data acquisition. Finalized earliest round date for VCCompanies.
10/18/2017 2:00-5:00 pm
*Updated our VC data with Ed's help in order to increase the accuracy and completeness of our data.
10/19/2017 3:00-5:00 pm
*Organized all of our matched data and updated it in order to reflect the most recent SDC pull with Ed. Matched Crunchbase data with our cohort companies.
10/20/2017 2:00-3:30 pm
*Generated the new list of VCCompanies as well as their earliest round dates.
10/23/2017 2:00-3:30 pm
*Worked on sorting out the discrepancies in our matched data.
10/24/2017 3:00-5:00 pm
*Went through list of VCCompanies and began adding respective accelerators in order to proceed with VCPercentage table.
10/25/2017 2:00-5:00 pm
*Continued going through list of VCCompanies and adding accelerators.
10/26/2017 3:30-5:30 pm
*Continued going through list of VCCompanies and adding accelerators. Will have this completed on Monday.
10/30/2017 2:00-3:30 pm
*Finished adding all of the accelerators to the list of VCCompanies. Added a column indicating whether or not the company went through two or more accelerators.
10/31/2017 3:00-5:00 pm
*Began compiling data for the "Date Company went through Accelerator" column.
11/1/2017 2:00-4:00 pm
*Finalized entering dates for Y Combinator cohort companies.
11/2/2017 4:00-5:30 pm
*Continued entering cohort company dates into Excel file.
11/6/2017 2:00-4:00 pm
*Continued entering cohort company dates into Excel file. Began compiling a list of keywords for demo day press releases.
11/7/2017 3:00-5:00 pm
*Finished coming up with keywords for demo day crawler. Sent the final list to Peter.
11/8/2017 2:00-3:30 pm
*Spoke to Ed and organized all of our current data.
11/9/2017 3:00-5:00 pm
*Created a new project page called Accelerator Data and listed all relevant files as well as descriptions.
11/14/2017 3:00-5:00 pm
*Looked up URLs and decided whether or not the website was relevant.
11/15/2017 2:00-5:00 pm
*Created SQL database entitled "acceleratordata" and began creating tables from folder of All Relevant Files.
11/16/2017 3:00-5:00 pm
*Continued to input tables into SQL database.
11/20/2017 2:00-5:00 pm
*Cleaned text files in order to import tables into SQL database.
11/27/2017 2:00-5:00 pm
*Worked with Peter to find and exclude irrelevant keywords on HTML pages. Began categorizing relevant demo day pages.
11/28/2017 3:00-5:00 pm
*Finished inputting tables of relevant files into SQL database.
11/29/2017 2:00-5:00 pm
*Went through accelerator HTML URLs. Spoke with Ed about going through the HTMLs and classifying based on overall and specific relevance.
12/1/2017 3:00-5:00 pm
*Worked through accelerator links and classified pages based on whether or not they provided relevant information about startup timing.
12/4/2017 10:00-12:00 pm
*Continued running through demo day crawl URLs and scoring them based on relevance.
12/7/2017 1:00-4:30 pm
*Finalized scoring of demo day URLs for the original crawl. Last day of work for this semester.
</onlyinclude>
===Spring 2017===
1/18/2017 1:00-5:00 pm
*Continued collecting data for accelerator project. Helped Catherine draft tweets for the McNair Center twitter account.
1/20/2017 1:00-3:00 pm
*Continued collecting data on accelerators. Attended McNair Center team meeting.
1/23/2017 1:00-5:00 pm
*Began combing through accelerator list, determining which accelerators are still missing data and documenting these in a TextPad file. Finished through #115.
1/25/2017 1:00-5:00 pm
*Continued looking through accelerator list.
1/27/2017 1:00-3:00 pm
*Continued going through accelerator list. Left off on #226 with Shrey.
1/20/2017 1:00-5:00 pm
*Continued going through accelerator list. Finished through #440.
2/1/2017 1:00-5:00 pm
*Finished going through the list of accelerators looking for incomplete files. Began completing the files that were not done.
2/3/2017 1:00-3:00 pm
*Continued working on completing accelerator files.
2/6/2017 1:00-4:30 pm
*Finished data set of accelerators. Began going through and making sure that all text files and cohort files are of the same format so Peter can easily pull the information. Left for 30 minutes for an interview from 2:30-3:00 pm.
2/8/2017 1:00-5:00 pm
*Finished formatting through #137. Spoke with Ed about project.
2/13/2017 1:00-5:00 pm
*Completed formatting for all accelerator text files.
2/15/2017 3:00-5:00 pm
*Made copy of the completed data set. Spoke to Ed about future steps to take for project including gathering founder data and obtaining the crunchbase api.
2/17/2017 1:00-3:00 pm
*Went through final Excel spreadsheet for cohort information. Still need to run the crawler one more time after the completion of the editing process. Found the application for the crunchbase api which will hopefully allow us to gain access.
2/20/2017 1:00-5:00 pm
*Filled out another application for Crunchbase research access; Found the first source for the incubator project on angel.co, will hopefully work with Peter to make a crawler similar to f6s.
2/22/2017 1:00-5:00 pm
*Pulled data from SDC for Ed and normalized it. Learned how to use SDC and the normalizer.
2/24/2017 1:00-3:00 pm
*Finished cleaning up the cohort data for Y-combinator on the Final Cohort Excel Spreadsheet.
2/27/2017 1:00-5:00 pm
*Continued cleaning up the cohort data in the Excel file. Finished Cohort Number and Year.
3/1/2017 2:00-5:00 pm
*Worked with Ben and Shrey to pull data from SDC for all VC funded companies and normalized it to put it in an Excel document.
3/3/2017 1:00-2:30 pm
*Worked with Ben to try and repeat down the VC data without it going too far.
3/6/2017 1:00-4:00 pm
*Worked with Shrey to finish cleaning the cohort data. It is ready to be run through the matcher with Ben.
3/8/2017 1:00-5:00 pm
*Matched the VC Data with the list of Cohort Companies and got one list of all cohort companies that have received VC funding.
3/10/2017 12:00-2:00 pm
*Put a write-up on the top of the Accelerator wiki page detailing where we are in the project currently as well as what data we have accumulated on the RDP.
3/20/2017 1:00-5:00 pm
*Began gathering the URLs of all accelerators in a TextPad file called Accelerator URLs. Participated in the SQL training session.
3/22/2017 1:00-5:00 pm
*Made tables in Terminal for Accelerator companies matched with VC companies and for Cohort Data.
3/27/2017 1:00-4:00 pm
*Compiled all accelerator URLs into a TextPad file.
3/29/2017 1:00-5:00 pm
*Worked on the matched data with Ben. Next time I will run the RegEx code that will filter the URLs, and I will look through the duplicates where two different VC backed company names matched to one cohort company name.
3/31/2017 1:00-2:00 pm
*Ran the code for accelerator urls which are ready to be run through the wayback machine in order to get the start dates. Also began looking through vc backed company names.
4/3/2017 1:00-5:00 pm
*Continued looking through double matched VC companies. Learned more SQL from Ed.
4/5/2017 1:00-5:00 pm
*Made the final vc percentage table on terminal and for next time I will collect missing accelerator data.
4/7/2017 1:00-3:00 pm
*Began collecting cohort data for big accelerators that were missing from our list in order to add it to our final list of cohort companies.
4/10/2017 1:00-5:00 pm
*Finished gathering cohort company names for big accelerators that we were missing and put them into the Cleaned Cohort Companies Excel file. Ben is looking through Crunchbase data in order to possibly find more missing accelerators.
4/14/2017 1:00-4:00 pm
*Began working through "Crunchbase Potential Accelerators" textpad that may contain missing accelerators and wrote notes on the ones that I was able to go through. Need to finish this textpad before moving forward.
4/17/2017 1:00-4:00 pm
*Continued going through potential Crunchbase accelerators that we may have missed. Talked to Ed about getting a more comprehensive list from Excel file and by the end of the semester have the tables and data collected and done.
4/19/2017 1:00-4:00 pm
*Worked with Jeemin to generate an entire list of potential US accelerators from crunchbase. Worked to find a way to classify accelerators just based on their descriptions.
4/21/2017 1:00-4:00 pm
*Continued working through the list identifying accelerators that we do not have. Ramee and Juliette are now helping us gather cohort data for those missing accelerators.
4/24/2017 9:00-1:00 pm
*Updated Veeral on current state of project. Typed up a to-do list on the discussion wiki for Veeral. Got new cohort data on an accelerator and added it to Excel file.
5/3/2017 11:00-1:00 pm
*Talked to Ed and Anne about future report. Continued working through list of crunchbase potential accelerators. Last day of work for this semester.
===Fall 2016===
10/17/2016 2:00-5:00 pm
*Created personal wiki page as well as work log; Read about the research project to which I have been assigned; Wrote a short summary of what I believe it is and included some helpful links.
10/18/2016 4:00-6:00 pm
*Met with research partner Shrey who filled me in on where we are with the project; Began looking on websites of certain accelerators for how to determine their cohorts and listed these steps on the wiki.
10/19/2016 2:00-5:00 pm
*Finished looking on the remaining accelerator websites and wrote the steps on determining how to manually locate the cohorts.
10/20/2016 4:00-6:00 pm
*Met with Peter and Christy to discuss the possibility of creating a web crawler that will pull data from individual accelerator sites.
10/24/2016 2:00-5:00 pm
*Brainstormed with Albert and Julia about changes to the category name for SBDE. Spoke to Ed about full scope of accelerator project.
10/25/2016 4:00-6:00 pm
*Brainstormed with Shrey about different potential industry focuses within accelerators, as well as different variables to search for in terms of accelerators, startups, cohorts, etc.
10/26/2016 2:00-5:00 pm
*Began searching for more databases including lists of accelerators as well as some characteristics of those accelerators; Began searching for characteristics that identify accelerators on their websites.
10/27/2016 4:00-6:00 pm
*Continued searching for relevant lists of accelerators to include on our page. Added some links that have high potential under the tab (Obtained from List of Accelerators or various Google searches).
10/31/2016 2:00-5:00 pm
*Began constructing a list of variables that clearly distinguish an accelerator on its website. This is in an effort to allow a crawler to crawl through many Google searches and identify accelerators.
11/1/2016 4:00-6:00 pm
*Continued looking for variables that could identify accelerators from their websites. Searched through numerous different websites of accelerators obtained from our current databases.
11/2/2016 2:00-4:00 pm
*Continued combing through websites of numerous accelerators, well-known and other, in the hopes of finding identifying variables.
11/3/2016 4:00-6:00 pm
*Finalized my list of variables that could be used to distinguish the websites of accelerators. Slightly re-arranged our list of accelerator databases in order of relevance.
11/7/2016 2:00-5:00 pm
*Began compiling the list of all accelerators. Created a new TextPad document with information from a new database.
11/8/2016 4:00-6:00 pm
*Worked with Shrey and Ben in order to compile all of our accelerator databases into one long list on Textpad.
11/9/2016 2:00-5:00 pm
*Continued formulating a database for all accelerators and all of the available info given.
11/10/2016 4:00-6:00 pm
*Worked with Shrey and Peter in order to develop a crawler for f6s.
11/14/2016 2:00-5:00 pm
*Began sorting the Seed-DB database in an Excel document.
11/15/2016 4:00-6:00 pm
*Conducted some Google searches in an attempt to find more accelerator databases. Began looking through Executive Orders searching for keywords.
11/16/2016 2:00-5:00 pm
*Completed searching through Executive Orders.
11/17/2016 4:00-6:00 pm
*Continued working on Google searches for state accelerator list. Looked through f6s for common words that can be used to distinguish accelerators once we have finalized the crawler.
11/21/2016 2:00-5:00 pm
*Randomly chose 10 accelerators from Excel list of accelerators on the RDP. Went through each website and listed the steps that I took in order to determine whether or not the website belonged to an accelerator. Will continue extracting cohort information tomorrow.
11/22/2016 4:00-6:00 pm
*Listed out all steps for extracting cohort information from the ten randomly chosen accelerators. Worked with Peter in order to build a tool that will search all of the HTMLs and attempt to identify each one as an accelerator as well as extract some basic information.
11/28/2016 2:00-5:00 pm
*Merged the F6S accelerator list with our other list, then posted it on the project page. Learned process for accelerator data extraction from Ed.
11/29/2016 4:00-6:00 pm
*Began process of collecting data from the 20 accelerators that I am responsible for.
11/30/2016 2:00-5:00 pm
*Continued collecting data from accelerators. Finished 15/20.
12/1/2016 4:00-6:00 pm
*Continued collecting data from accelerators. Finished original 20, picked up a new set of 20.
12/2/2016 2:00-5:00 pm
*Continued collecting data from accelerators. Finished next 20.
12/8/2016 1:00-3:00 pm
*Completed collecting data from accelerators for the semester.
[[Matthew Ringheanu]] [[Work Logs]] [[Matthew Ringheanu (Work Log)|(log page)]]
[[Category:Work Log]]
