Grace:
*Process and join in new timing data - new data located in Z:/accelerator/Formatted Timing Info.txt
*Make a category group to minor code lookup - in E:/McNair/Projects/Accelerators/Summer 2018/Cohorts Final - minor code priority Grace.xlsx
*Run WHOIS crawler on all valid URLs (not Facebook pages, etc.) - Maxine did this
*Founders Experience: Match employers to VC funds/firms and VC-backed startups (requires data from Augi) - added 2 columns to The File to Rule Them All (VC and VC startup)
==Accelerator Data Assembly Progress (Hira) ==
*All data files are in Z:/accelerator
*The SQL file that loads all data is: LoadAccData.sql. It is located in E:\McNair\Projects\Accelerators\Summer 2018.
=== Data assembly details ===
6) founders_experience - source file: "The File to Rule Them All - Founders experience sheet"
*Accelerator text
*First_Name text
*Last_Name text
*Full_Name text
*Employer text
*VC varchar(5)
*VC_backed_startup varchar(5)
*OLD_Job_Title text
*NEW_Job_Title text
*Dates_Employed text
*Time_Employed text
*Location text
*Extra_Description text

7) additional_timing_info - source file: "merging_work.xlxs" located in: E:\Projects\McNair\Seed DB

8) additional_timing_info2 - source file: "formatted timing info2.txt" located in E:\Projects\McNair\Accelerators\Summer 2018. This was collected through MTurk.

Tables 7 and 8 include columns:
*coname
*acceleratorname
*cohort_name
*date
*month
*year
*season

9) timing_combined - This table combines all the timing information we have by appending tables 4, 7 and 8.

10) cohortcompanies_wtiming - merges the data in tables cohortcompany and timing_combined.

==Grace's Code==
===format_timing.py===
E:/McNair/Projects/Accelerators/Summer 2018/format_timing.py

Input: a txt file with each accelerator mapped to multiple companies (in a single cell separated by columns or in separate rows)

Output: a txt file with companies mapped to accelerators, along with all the other information in the original file

===prioritycodecategory.py===
E:/McNair/Projects/Accelerators/Summer 2018/prioritycodecategory.py

Input: a txt file with a list of category groups (column Y of Cohorts Final in The File to Rule Them All)

Output: a txt file with line number and minor code

Final output: I took the txt file, copied the codes and pasted them into the added column Z of the Cohorts Final sheet from The File to Rule Them All.

E:/McNair/Projects/Accelerators/Summer 2018/Cohorts Final - minor code priority Grace.xlsx

The script chooses a code based on importance, using a priority ranking dictionary, before falling back to an arbitrary choice.
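The priority-based choice described for prioritycodecategory.py can be sketched roughly as follows. This is a minimal illustration, not the script itself: the lookup table, the priority ranking, the code values, and the function names are all hypothetical, and it assumes each input line holds comma-separated category groups.

```python
# Hypothetical category group -> minor code lookup (not the real table).
LOOKUP = {"Software": "SW", "Health Care": "HC", "Hardware": "HW", "Media": "MD"}

# Hypothetical priority ranking: a lower number means a more important code.
PRIORITY = {"HC": 1, "SW": 2}


def pick_minor_code(category_groups, lookup=LOOKUP, priority=PRIORITY):
    """Return one minor code for a comma-separated list of category groups."""
    groups = [g.strip() for g in category_groups.split(",") if g.strip()]
    codes = [lookup[g] for g in groups if g in lookup]
    if not codes:
        return ""  # no known category group on this line
    ranked = [c for c in codes if c in priority]
    if ranked:
        # Prefer the most important code according to the ranking.
        return min(ranked, key=lambda c: priority[c])
    # Fall back to an arbitrary choice: here, the first code found.
    return codes[0]


def code_lines(lines):
    """Pair each input line number (starting at 1) with its chosen code."""
    return [f"{i}\t{pick_minor_code(line)}" for i, line in enumerate(lines, start=1)]
```

Calling code_lines on the lines of the category group txt file would give tab-separated "line number, minor code" rows in the same shape as the output described above.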
===codecategory.py===
E:/McNair/Projects/Accelerators/Summer 2018/codecategory.py

Input: a txt file with a list of category groups (column Y of Cohorts Final in The File to Rule Them All)

Output: a txt file with line number and multiple minor codes

Final output: I took the minor codes and copied them into column Z of this sheet (a copy of The File to Rule Them All with this added column):

E:/McNair/Projects/Accelerators/Summer 2018/Cohorts Final - minor code (no priority) Grace.xlsx

I arbitrarily chose the first code when multiple were given. I did this in Excel by separating on commas. I also coded a lot of them manually, which is why there are more values in this file than in the one with priority.
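For reference, the reshaping described for format_timing.py (one accelerator row with several companies becoming one row per company) can be sketched as below. This is only an illustration under assumptions: the "companies" field name and the function name are hypothetical, the companies in a single cell are assumed to be comma-separated, and rows are represented as dictionaries rather than read from the actual txt file.

```python
def explode_companies(rows, company_field="companies", sep=","):
    """Turn accelerator rows listing several companies into one row per company.

    Every other field of the original row is carried over unchanged.
    """
    out = []
    for row in rows:
        for name in row[company_field].split(sep):
            name = name.strip()
            if not name:
                continue  # skip empty entries such as a trailing separator
            new_row = {k: v for k, v in row.items() if k != company_field}
            new_row["coname"] = name
            out.append(new_row)
    return out


# Made-up example rows; the names are placeholders, not real data.
example = [
    {"acceleratorname": "Accel A", "companies": "Foo, Bar", "year": "2016"},
    {"acceleratorname": "Accel A", "companies": "Baz", "year": "2017"},
]
result = explode_companies(example)
```

Rows that already list a single company pass through as one output row, so both input layouts mentioned above (one cell with several companies, or separate rows) are handled by the same loop.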