Changes

Grace:
*Process and join in new timing data - new data located in Z:/accelerator/Formatted Timing Info.txt
*Make a category group to minor code lookup - in E:/McNair/Projects/Accelerators/Summer 2018/Cohorts Final - minor code priority Grace.xlsx
*Run WHOIS crawler on all valid URLs (not Facebook pages, etc.) - Maxine did this
*Founders Experience: Match employers to VC funds/firms and VC-backed startups (requires data from Augi) - added 2 columns to The File to Rule Them All (VC and VC start up)
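
For the timing data in the first item, the reshaping that format_timing.py is described as doing (one accelerator row listing several companies becomes one row per company) might look like the minimal sketch below. This is not the actual script. The column names <code>acceleratorname</code>, <code>companies</code> and <code>year</code>, the tab delimiter, the comma separator inside the cell, and the sample rows are all assumptions made for illustration.

```python
import csv
import io


def explode_companies(rows, company_field="companies", sep=","):
    """Yield one output row per company, copying all other columns unchanged."""
    for row in rows:
        for name in row[company_field].split(sep):
            name = name.strip()
            if not name:
                continue  # ignore empty entries, e.g. from a trailing separator
            out = {key: value for key, value in row.items() if key != company_field}
            out["coname"] = name
            yield out


# Hypothetical sample standing in for the real input file (assumed tab-delimited).
SAMPLE = (
    "acceleratorname\tcompanies\tyear\n"
    "AccelA\tFoo, Bar\t2017\n"
    "AccelB\tBaz\t2018\n"
)

reader = csv.DictReader(io.StringIO(SAMPLE), delimiter="\t")
result = list(explode_companies(reader))

for r in result:
    print(r["coname"], r["acceleratorname"], r["year"], sep="\t")
```

Rows that already hold a single company pass through unchanged, so the same function covers input where companies are listed in separate rows.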
==Accelerator Data Assembly Progress (Hira) ==
*All data files are in Z:/accelerator
*The SQL file that loads all data is LoadAccData.sql. It is located in E:\McNair\Projects\Accelerators\Summer 2018.
=== Data assembly details ===
The SQL file LoadAccData.sql currently loads data on Cohorts Final and Founders from: E:\McNair\Projects\Accelerators\Summer 2018\The File To Rule Them All.xlsx

It creates the following tables:

1) cohortsfinal - source file: Cohorts Final sheet in "The File to Rule Them All".

2) CohortCompany - this uses data in cohortsfinal and creates a table with the following:
*conamestd
*conameorg
*colocation
*city
*state_code
*country_code
*address
*codescription
*short_desc
*long_desc
*cosectors
*costatus
*cofundingstage
*courl
*uuid
*category_list
*category_group_list
*founded_on
*employee_count
*emp_count_scale
*linkedin_url
*gotvc

3) CohortParticipation - uses table cohortsfinal
*cohort
*year
*quarter
*accelerator
*conamestd

4) timing_final - This table is based on the most up-to-date timing information, compiled by Grace in source file: Z:/accelerator/Formatted Timing Info.txt. It includes:
*coname
*acceleratorname
*keyword
*url
*webpage
*predicted
*gooddata
*page_details
*full_date
*month
*year
*cohort_name
*notes
*prog_duration_wks
*actual_date
*actual_month
*actual_year
*season

5) Founders - source file: "The File to Rule Them All - Founders main sheet"
*Accelerator
*First_Name
*Last_Name
*Full_Name
*Current_Job
*Current_Location

6) founders_experience - source file: "The File to Rule Them All - Founders experience sheet"
*Accelerator
*First_Name
*Last_Name
*Full_Name
*Employer
*VC
*VC_backed_startup
*OLD_Job_Title
*NEW_Job_Title
*Dates_Employed
*Time_Employed
*Location
*Extra_Description

7) additional_timing_info - source file: "merging_work.xlsx" located in: E:\Projects\McNair\Seed DB

8) additional_timing_info2 - source file: "formatted timing info2.txt" located in E:\Projects\McNair\Accelerators\Summer 2018. This was collected through MTurk.

Tables 7 and 8 include columns:
*coname
*acceleratorname
*cohort_name
*date
*month
*year
*season

9) timing_combined - This table combines all the timing information we have by appending tables 4, 7 and 8.

10) cohortcompanies_wtiming - merges data in tables cohortcompany and timing_combined

==Grace's Code==
===format_timing.py===
E:/McNair/Projects/Accelerators/Summer 2018/format_timing.py

Input: a txt file with accelerator mapped to multiple companies (in a single cell separated by columns or in separate rows)

Output: txt file with companies mapped to accelerators with all the other information in the original file
===prioritycodecategory.py===
E:/McNair/Projects/Accelerators/Summer 2018/prioritycodecategory.py

Input: txt file with a list of category groups (column Y of Cohorts Final in The File to Rule Them All)

Output: txt file with line number and minor code

Final output: I took the txt file, copied the codes, and pasted them into the added column Z of the Cohorts Final sheet from The File to Rule Them All: E:/McNair/Projects/Accelerators/Summer 2018/Cohorts Final - minor code priority Grace.xlsx

The script chooses a code based on its importance in the priority ranking dictionary, and only chooses arbitrarily when the ranking does not decide.
===codecategory.py===
E:/McNair/Projects/Accelerators/Summer 2018/codecategory.py

Input: txt file with a list of category groups (column Y of Cohorts Final in The File to Rule Them All)

Output: txt file with line number and multiple minor codes

Final output: I took the minor codes and copied them into column Z of this sheet (a copy of The File to Rule Them All with this added column): E:/McNair/Projects/Accelerators/Summer 2018/Cohorts Final - minor code (no priority) Grace.xlsx

I arbitrarily chose the first code when multiple were given. I did this in Excel by separating on commas. I also coded a lot of them manually, which is why there are more values in this file than in the one with priority.
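
The difference between the two category scripts (take the first code versus consult a priority ranking first) can be sketched as follows. This is a minimal illustration, not the code in either script. The lookup <code>GROUP_TO_MINOR</code>, the ranking <code>PRIORITY</code>, the code values, and the assumption that category groups are comma-separated are all hypothetical.

```python
# Hypothetical category-group -> minor-code lookup (not the real codes).
GROUP_TO_MINOR = {
    "software": "SW",
    "health care": "HC",
    "financial services": "FS",
}

# Hypothetical priority ranking: earlier entries are treated as more important.
PRIORITY = ["HC", "FS", "SW"]


def minor_codes(category_groups, lookup=GROUP_TO_MINOR):
    """Map a comma-separated category group cell to a list of distinct minor codes."""
    codes = []
    for group in category_groups.split(","):
        code = lookup.get(group.strip().lower())
        if code and code not in codes:
            codes.append(code)
    return codes


def choose_first(codes):
    """No-priority behaviour: take whichever code happens to come first."""
    return codes[0] if codes else None


def choose_by_priority(codes, priority=PRIORITY):
    """Prefer the highest-ranked code; fall back to the first code otherwise."""
    ranked = [code for code in priority if code in codes]
    return ranked[0] if ranked else choose_first(codes)


cell = "Software, Health Care"
codes = minor_codes(cell)
print(codes, choose_first(codes), choose_by_priority(codes))
```

With the hypothetical ranking above, the two strategies disagree on this cell: the first-code rule keeps the code for Software, while the priority rule picks the code for Health Care.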