Changes

1,566 bytes added , 13:38, 8 October 2019

no edit summary

[[Jeemin Sim]] [[Work Logs]] [[Jeemin Sim (WorkLog)|(log page)]] ===Spring 2018=~~2/6/~~==2017 ~~MONDAY ''2PM~~-~~6PM''~~01-11:*Was briefed on [[Urban Start-up Agglomeration and Venture Capital Investment]]*Going forward - running regressions in R in PostgreSQL and with desired layers (instead of levels) ===Fall 2017=== <onlyinclude> 2017-12-05:* Uploaded tables with KML files in TIF folder in bulk(E:) drive** Allegheny County** Atlanta** Chicago** Columbus, OH** ~~Set up wikiPage & remote desktop~~Dublin, OH** Vermont** Washington, D.C. * ~~Started working on python version of web crawler~~Updated documentation for: ** http://www.edegan. ~~So far it successfully prints out a catchphrase~~com/wiki/ ~~description~~ TIF_Project#Uploading_TIF_Data_onto_database_.28tigertest.29 * shp2pgsql needs to be installed to upload Shapefiles to PostgreSQL database** applies for ~~one website~~Houston, TX and Dallas, TX** DONE 2017-12-04:* To upload KML file into database with specified table name: researcher@McNairDBServ:/bulk/tigertest$ ogr2ogr -f PostgreSQL PG:"dbname=tigertest" chicagotif.kml -nln chicagotif* chicagotif table now resides in tigertest database 2017-11-30:* Displayed map with TIF districts and startups* Location: E:\McNair\Projects\Agglomeration\TIF\McNair_Color_Chicago_TIF_and_startups. ~~To be worked on~~png* Added information and steps to ArcMap/ArcGIS documentation page* Create a project page for 'Working with POSTGIS' and add instructions for uploading KML file onto POSTGIS** Command used : *** Logged in as : researcher@McNairDBServ /bulk/tigertest$ ogr2ogr -f PostgreSQL PG:"dbname=tigertest" chicagotif. ~~The python~~ kml * chicago TIF kml file currently downloaded in tigertest with a table name of Layer0** Figure out how to change layer name while loading kml file ~~can be found~~ ** Instructions pulled from : http://wiki.wildsong.biz/index.php/Loading_data_into_PostGIS#Loading_data_from_KMZ_files 2017-11-28:* Created and edited [[ArcMap / ArcGIS Documentation]]* Plotted points for TIFS and Startups in Chicago inone map.** Location: E:\McNair\Projects\~~Accelerators~~Agglomeration\~~Python WebCrawler~~TIF\~~webcrawlerpython~~Jeemin_Chicago_TIF_and_Startups_Attempt1* Used https://mygeodata.cloud/result to convert from KML file to CSV (which was then saved as txt file to be uploaded onto ArcMap)* Text files located in Local Disk (C:) Drive 2017-11-27:*Download Chicago TIF data 2017-11-13:* 2017-11-09: * Notes on data downloaded:** Year 2010-2012 data are based on total population, not 25 yrs or over (case for all other tables)*** Record appears five times total, with same exact column name***For exmaple: 'Total; Estimate; High school graduate (includes equivalency)' appears five times, with different values.* TODO:** Make Projects page for ACS Data***[[American Community Survey (ACS) Data]] 2017-11-07:*Yesterday, narrowed down columns of interest from ACS_S1501_educationattain_2016 table.py

~~====2/8/2017 WEDNESDAY~~ Id Id2 Geography Total; Estimate; Population 25 years and over Total; Estimate; Population 25 years and over - High school graduate (includes equivalency) Total; Margin of Error; Population 25 years and over - High school graduate (includes equivalency) Total; Margin of Error; Population 25 years and over - High school graduate (includes equivalency) Percent; Margin of Error; Population 25 years and over - High school graduate (includes equivalency) Total; Estimate; Population 25 years and over - Associate's degree Total; Margin of Error; Population 25 years and over - Associate'~~9AM~~s degree Percent; Estimate; Population 25 years and over -~~11AM~~Associate's degree Percent; Margin of Error; Population 25 years and over - Associate'~~====~~ s degree* Attempted to come up with possible cases for locating the description Total; Estimate; Population 25 years and over - Bachelor's degree Total; Margin of Error; Population 25 years and over - Bachelor's degree Percent; Estimate; Population 25 years and over - Bachelor's degree Percent; Margin of Error; Population 25 years and over - Bachelor's degree Total; Estimate; Population 25 years and over - Graduate or professional degree Total; Margin of Error; Population 25 years and over - Graduate or professional degree Percent; Estimate; Population 25 years and over - Graduate or professional degree Percent; Margin of ~~accelerators~~ Error; Population 25 years and over - ~~pick up from extracting bodies~~ Graduate or professional degree Percent; Estimate; Percent high school graduate or higher Percent; Margin of ~~text from the about page (given that it exists)~~Error; Percent high school graduate or higher Percent; Estimate; Percent bachelor's degree or higher Percent; Margin of Error; Percent bachelor's degree or higher

~~====2/13/2017 MONDAY~~ *Complications:**For csv files corresponding to years 2015 & 2016, all of the above columns exist.**For csv files corresponding to years 2005 - 2014, no 'Percent'~~2PM-6PM~~columns exist*** Instead their 'Total'~~====~~columns are percentage values* ~~Goals (for trials): 1) Build ER Diagram 2)~~ *For ~~each entity~~csv file corresponding to year 2005, ~~get XML snippet 3) Build a parser/ripper for single file; the python parser can be found at: E:\McNair\Projects\FDA Trials\Jeemin_Project~~columns regarding Graduate or professional degree are labeled differently.* ~~[[Trial Data Project]]~~*2012 data doesn't correspond to Population 25 years and over.

~~====2/15/2017 WEDNESDAY ''9AM-11AM''====~~*Temporary Solution:* ~~Discussed with Catherine what to do with FDA Trial data and decided~~ *Since the above problems may be specific to ~~have a dictionary with zip-codes as keys and number~~ this set of ~~trials occurred in that zipcode as values. Was still attempting to loop~~ tables, will go through ~~the~~ csv files ~~without the code having to exist in the same directory as the XML files. Plan to write to excel via tsv, with zip-code as one column~~ and ~~# of occurrence as the other~~adjust columns.

~~====2/17/2017 FRIDAY ''2PM-6PM''====~~* Completed code for counting the number of occurrences for each unique zipcode. (currently titled & located: E:\McNair\Projects\FDA Trials\Jeemin_Project\Jeemin_Running_File.py). It has been running for 20+min because of the comprehensive XML data files. Meanwhile started coding to create a dictionary with the keys corresponding to each unique trial ID, mapped to every other information (Python script location~~, sponsors, phase, drugs ...etc.) (currently titled & located~~: E:\McNair\Projects\~~FDA Trials~~Agglomeration\~~Jeemin_Project~~ACS_Downloaded_Data\~~Jeemin_FDATrial_as_key_data_ripping~~pullCertainColumns.py).

~~====2/20/~~2017 ~~MONDAY ''2PM~~-410-31:~~30PM''====~~* ~~Continued working on Jeemin_FDATrial_as_key_data_ripping~~Finished doanloading files from ACS.* Started loading tables into tigertest.~~py to find tags and place all of those information~~ * Commands run could be found in a list. The other zipcode file did not finish executing after 2+ hours of running it - considering the possibility of splitting the record file into smaller bits, or running the processing on a faster machine E:\McNair\Projects\Agglomeration\ACS_Downloaded_Data\DataLoading_SQL_Commands.txt

~~====2/22/~~2017 ~~WEDNESDAY ''9AM~~-1210-30:~~30PM''====~~* ~~Finished Jeemin_FDATrial_as_key_data_ripping.py (E:\McNair\Projects\FDA Trials\Jeemin_Project\Jeemin_FDATrial_as_key_data_ripping.py)~~Downloaded data from ACS, ~~which outputs~~ to be continued* File path: E:\McNair\Projects\~~FDA Trials~~Agglomeration\~~Jeemin_Project\general_data_ripping_output.txt; TODO~~ACS_Downloaded_Data* Fields of interest: ~~output four different tables & replace the write in the same for~~ S1401 SCHOOL ENROLLMENT S1501 EDUCATIONAL ATTAINMENT S2301 EMPLOYMENT STATUS B01003 TOTAL POPULATION B02001 RACE B07201 GEOGRAPHICAL MOBILITY B08303 TRAVEL TIME TO WORK B19013 MEDIAN HOUSEHOLD INCOME B19053 SELF-~~loop as going through each file~~EMPLOYMENT INCOME IN THE PAST 12 MONTHS FOR HOUSEHOLDS B19083 GINI INDEX OF INCOME INEQUALITY B25003 TENURE B25105 MEDIAN MONTHLY HOUSING COSTS B28011 INTERNET SUBSCRIPTIONS IN HOUSEHOLD G001 GEOGRAPHIC IDENTIFIERS

~~====2/24/~~2017 ~~FRIDAY ''2:30PM~~-610-23:~~30PM''====~~* ~~Continued working on producing multiple tables - first two are done~~Talked to Ed with Peter & Oliver about upcoming tasks & projects. ~~Was working~~ * Loaded acs_place table 2017 (does not contain population) on ~~location, as there are multiple location tags per location~~tigertest.** SQL commands used:

~~====2/27/2017 MONDAY ''2PM-6PM''====~~* Finished producing tables from Jeemin_FDATrial_as_key_data_ripping.py* Talked to Julia about LinkedIn data extracting - to be discussed further with Julia & Peter.* Started web crawler for Wikipedia - currently pulls Endowment, Academic staff, students, undergraduates, and postgraduates info found on Rice Wikipedia page. Can be found in : E:\McNair\Projects\University Patents\Jeemin_University_wikipedia_crawler.py DROP TABLE acs_place;

~~====3/1/2017 WEDNESDAY ''9AM-12PM''====~~ CREATE TABLE acs_place (* Started re-running Jeemin_FDATrial_as_key_data_ripping.py USPS varchar(5), GEOID varchar(30), ANSICODE varchar(30), NAME varchar(100), LSAD varchar(30), FUNCSTAT varchar(10), ALAND varchar(30), AWATER varchar(30), ALAND_SQMI varchar(30), AWATER_SQMI varchar(30), INTPTLAT varchar(30), INTPTLONG varchar(30) );

~~====3~~ \COPY acs_place FROM '/3bulk/~~2017 FRIDAY~~ 2017_Gaz_place_national.txt'~~'2PM~~; --~~5PM''====~~* Attempted to output sql tablesCOPY 29578

~~====3/6/017 MONDAY ''2PM-6PM''====~~* TODO:* ~~[[Installing python in a database]]~~* Find acs place 2016 data for population* ~~Added building Python function section to [[Working with PostgreSQL]] at the bottom of the page.~~* ~~Ran FDA Trial data ripping again~~Find larger acs files, as ideally at the ~~text output files were wiped.~~ place level* ~~Plan on discussing with Julia and Meghana again about pulling universities and other relevant institutions from the Assignee List USA.~~ * ~~Talked to Sonia about pulling city, state, zipcode information, hence python was installed in a database. Will work with Sonia~~ Provide more documentation on ~~Wednesday afternoon and see how best a regex function could be implemented~~POSTGIS & geocoding

~~====3/8/~~2017 ~~WEDNESDAY ''9AM~~-~~12PM''====~~10-16:* ~~Output sql tables~~ Exported maps of points from ~~finished run of Jeemin_FDATrial_as_key_data_ripping~~the Bay Area each year.py * ~~Ran through assigneelist_USA.txt to see how many different ways UNIVERSITY could be spelled wrong. There were many~~* Map used location: E:\McNair\Projects\Agglomeration\HarrisonPeterWorkArcGIS\Jeemin_Bay_Area Points_Every_Year\BayAreaEveryYearMap** Zoom scale: 1:650.000* ~~Tried to logic through creating a pattern that could catch all different versions~~ Location of ~~UNIVERSITY. Discuss further on whether UNIVERSITIES and those that include UNIVERSITIES but include~~ Bay Area Points png files: ~~INC in the end should be pulled as relevant information~~E:\McNair\Projects\Agglomeration\HarrisonPeterWorkArcGIS\Jeemin_Bay_Area Points_Every_Year

~~====3~~2017-10-10:*Discoveries/ Struggles regarding.gdb to .shp file conversion:** Esri Production Mapping (costly)*** License needs to be purchasesd: http://8www.esri.com/~~2017 WEDNESDAY ''2PM~~software/arcgis/extensions/production-~~5PM ''====~~mapping/pricing* ~~Wrote regex pattern that identifies all "university" matchings~~ * Use ogr2ogr from gdal package*** https://gis.stackexchange.com/questions/14432/migrating-geodatabase-data-into-postgis-without-esri- ~~can be found in E~~apps*** Command:~~\McNair\Projects\University Patents\university_pulled_from_assignee_list_USA~~ ogr2ogr -~~- is an output~~ f "ESRI Shapefile" [Destination of shapefile] [path to gdb file]* ~~Talked to Sonia, but didn't come to solid conclusion on identifying whether key words associate with city or country by running a python function~~** Problem installing gdal

~~====3/13/~~2017 ~~MONDAY ''12PM~~-~~2PM~~10-09:* TODO'~~'====~~S:** Downloading data onto tigertest*** Road*** Railway*** Coastline* ~~For University Patent Data Matching - matched SCHOOL (output~~** Instructions: Ehttp:~~\McNair\Projects\University Patents\school_pulled_from_assignee_list_USA~~//www.edegan.com/wiki/PostGIS_Installation#Bulk_Download_TIGER_Shapefiles** Configure census data from American Community Survey (ACS)*** 1) ~~and matched INSTITUTE~~Work out what data is of our interest (~~output~~confirm ACS)*** 2) Determine appropriate shape file unit: ~~E:\McNair\Projects\University Patents\institute_pulled_from_assignee_list_USA)~~**** census block vs. census block group vs. census track* ~~[[University Patent Matching]]~~ * ~~To be worked on later: Grant XML parsing & general name matcher~~* 3) Load into tigertest

~~====3~~* Done:** Downloaded data from https://www.census.gov/cgi-bin/14geo/~~2017 TUESDAY ''12PM~~shapefiles/index.php tl_2017_us_coastline -- 4209 tl_2017_us_primaryroads -- 11574 tl_2017_us_rails -~~2PM''====~~- 176237* ~~Started pulling academy cases but there are too many cases~~ * Link found to ~~worry about, in terms of institution of interest~~potentially download ACS data: https://www. ~~A document is located in E:\McNair\Projects\University Patents\academies_verify_cases~~census.~~txt~~* Need Juliagov/geo/maps-data/data/~~Meghana to look through the hits and see which are relevant & extract pattern from there~~tiger-data. html* ~~Having trouble outputting txt file without double quotes around every line.~~ * Thinking that one text file should be output for all keywords instead of having one each, to avoid overlap (ex) COLLEGE and UNIVERSITY are both keywords; ALBERT EINSTEIN COLLEGE OF YESHIVA UNIVERSITY will be hit twice if * But most files on it ~~were counted as two separate instances, one accounting for COLLEGE~~ come with .gdb extension and ~~the other for UNIVERSITY) - either in the form of if-elseif statements or one big regex check~~not .shp

~~====3/15/~~2017 ~~WEDNESDAY ''9AM~~-~~1PM''====~~10-03:* ~~Todo: write a wikipage~~ Installed PostGIS & is now visible on ~~possible input/output info on string matcher~~* Wrote part of XML parser, extracted yearly data into E:\McNair\Projects\Federal Grant Data\NSF\NSF Extracted Data (up to year 2010)pgAdmin III

~~====~~* ArcGIS (connect to postgis database):** 1) Open ArcMap** 2) Either open blank or open existing file/project** 3~~/16/2017 THURSDAY~~ ) Click on 'Add Data'~~12PM~~button with a cross and a yellow diamond (under Selection toolbar)** 4) Go to the top-~~2PM~~most directory by pressing on the arrow that points left-then-up (on the left of home button)** 5) Click on 'Database Connections'** 6) Click on 'Add Database Connection'~~====~~(if Connection to localhost.sde) does not exist already)** 7) Fill in the following fields:*** Database Platform: PostgreSQL*** Instance: localhost* ~~Further documented [[University Patent Matching]]~~** User name: postgres* ~~Finished writing XML Parser~~** Password: *** Database: tigertest** 8) Press 'OK'** 9) Now you'll have 'Connection to localhost.sde' in your Database Connections** 10) Double click on 'Connection to localhost.sde'** 11) Double click on the table of interest** 12) Click 'Finish'** 13) You'll see information populated on map, as one of the 'Layers'*** Tested with: tigertest.public.copointplacescontains

~~====3/20/2017 MONDAY~~ * On running & altering Oliver'~~'2PM~~s script: ** Location: E:\McNair\Projects\OliverLovesCircles\src\python\vc_circles.py** Ed manipulated file names so that underscores would replace dots (St.Louis --~~6PM''====~~> St_Louis)* ~~Talked to Julia about universal matcher, want to combine all University~~ * Takes in instances and sweep times as part of ~~California's to University of California~~the argument, ~~The Regents of~~but not impactful as those variables are hardcoded in the script* ~~Converted crunchbase2013 data from mySQL to PostgreSQL, but having trouble~~ * Ran vc_circles.py with the ~~last table - cb_relationships, complains about syntax error at or near some places - but generally all tables exist in database called crunchbase~~following variables with changed values:*** SWEEP_CYCLE_SECONDS = 10 (used to be 30)*** NUMBER_INSTANCES = 16 (used to be 8)* ~~Federal Grant Data XML Parser was run - the three~~ * New output ~~textfiles can~~ to be found in : E:\McNair\Projects\~~Federal Grant Data~~OliverLovesCircles\~~NSF~~out

~~====3/22/~~2017 ~~WEDNESDAY ''9AM~~-~~12PM~~10-02:* Talked to Harrison & Peter regarding ArcGIS** Currently have points plotted on Houston** Trouble interpreting geometry type, as currently reads in from text file** Documents located in : E:\McNair\Projects\Agglomeration\HarrisonPeterWorkArcGIS* Attempted to install PostGIS spatial extention from PostgreSQL but getting 'spatial database creation failed'~~====~~error message.* ~~Read string matching & calculating distance, below are relevant links~~* Referenced instructions: * ~~[http~~** https://www.~~cs.cmu~~gpsfiledepot.~~edu~~com/~~~wcohen~~tutorials/~~postscript/ijcai~~installing-and-setting-up-postgresql-wswith-~~2003.pdf]~~postgis/* ~~[http://web.archive.org/web/20081224234350/~~** http://www.~~dcs~~bostongis.~~shef.ac.uk~~com/~~~sam/stringmetrics~~PrinterFriendly.~~html]~~aspx?content_name=postgis_tut01

~~====3/24/~~2017 ~~FRIDAY ''2PM~~-~~5PM''====~~09-26:* ~~Discussed with Julia & Meghana about university keys~~ Created a table that maps a state to ~~use to count~~ the database name.** http://www.edegan.com/wiki/PostGIS_Installation# ~~of occurrences~~Translating_Table_names_to_corresponding_States* Added more GIS-information (functions, ~~including aliases and misspellings~~realm & outliers to consider)* ~~Thoughts~~* http: ~~to use a scoring metric with a key of UNIVERSITY OF CALIFORNIA SYSTEM, it should have a 'better' score when compared to MATHEMATICAL SCIENCES PUBLISHERS C~~/O UNIVERSITY OF CALIFORNIA BERKELEY or CALIFORNIA AT LOS ANGELES, UNVIERSITY OF than when compared to UNIVERSITY OF SOUTHERN CALIFORNIA, which may pose a challenge when attempting to implement this in a more general sense/www. ~~In normalizing a string, strip "THE", "," and split words by spaces and compare each keyword from the two strings~~edegan. ~~Deciding on which strings~~ com/wiki/Urban_Start-up_Agglomeration#GIS_Resources* Visualization in PostGIS or connecting to ~~compare will be another issue - length~~ ArcGIS for visualization (~~within some range maybe~~import/export data) ~~could be an option.~~ * ~~Federal Grant Data XML Parser was rerun - same output textfiles~~Spatial indexing:** http://revenant.ca/www/postgis/workshop/indexing.html

~~====3/27/~~2017 ~~MONDAY ''2PM~~-~~6PM''====~~09-25:* ~~Writing code for university matches - decided~~ Talked to ~~go through keys instead~~ Ed about GIS, Census data, and going about determining the correctness of reported 'place.' Currently script makes a cross product of each ~~dataitem~~reported place and an existing place, outputting a column of boolean value to indicate whether the reported place's coordinates fell within a place's geometric boundaries. ~~Use keywords in each key~~ One other way of going about this which we discussed is to first check if the reported place does fall within that place's boundaries. If it isn't, we'll go ~~through~~ about the ~~dataitem - misspellings are currently unaccounted for~~cross product method.

~~====3~~* To add documentation : ** http:/29/~~2017 WEDNESDAY''2PM-5PM''====~~* Troubled by the variety of cases - separating keys by keywords will not work favorably when it hits University of California vswww. ~~University of Southern California case - find a way to match University of Southern California first (more specific ones first) - but how to generalize~~edegan...~~====4~~com/12wiki/~~2017 WEDNESDAY''9AM-12PM''====~~PostGIS_Installation* ~~Finishing up cleaning the columns for Federal Grant Data - NIH. The output excel files can be accessed at~~* http: ~~E:\McNair\Projects\Federal Grant Data\NIH\Grants~~ ~~Titled:~~ ~~Jeemin_combined_files 1986-2001~~//www.~~csv~~ ~~Jeemin_combined_files 2002-2012~~edegan.~~csv~~ ~~Jeemin_combined_files 2013~~com/wiki/Urban_Start-~~2015.csv~~up_Agglomeration

* ~~psql table formula:~~Discussed the need to maintain venture capital database.

~~CREATE TABLE all_grants (~~*Relevant File paths: ~~APPLICATION_ID integer,~~**E:\McNair\Projects\Agglomeration\TestGIS.sql ~~ACTIVITY varchar(3),~~**Z:\VentureCapitalData\SDCVCData\vcdb2\ProecssingCoLevelSimple.sql ~~ADMINISTERING_IC varchar(2),~~ ~~APPLICATION_TYPE varchar(1),~~ ~~ARRA_FUNDED varchar(1),~~ ~~AWARD_NOTICE_DATE date,~~ ~~BUDGET_START date,~~ ~~BUDGET_END date,~~ ~~CFDA_CODE varchar(3),~~ ~~CORE_PROJECT_NUM varchar(11),~~ ~~ED_INST_TYPE varchar(30),~~ ~~FOA_NUMBER varchar(13),~~ ~~FULL_PROJECT_NUM varchar(35),~~ ~~FUNDING_ICs varchar(40),~~ ~~FUNDING_MECHANISM varchar(23),~~ ~~FY smallint,~~ ~~IC_NAME varchar(77),~~ ~~NIH_SPENDING_CATS varchar(295),~~ ~~ORG_CITY varchar(20),~~ ~~ORG_COUNTRY varchar(16),~~ ~~ORG_DEPT varchar(30),~~ ~~ORG_DISTRICT smallint,~~ ~~ORG_DUNS integer,~~ ~~ORG_FIPS varchar(2),~~ ~~ORG_NAME varchar(60),~~ ~~ORG_STATE varchar(2),~~ ~~ORG_ZIPCODE integer,~~ ~~PHR varchar(200),~~ ~~PI_IDS varchar(30),~~ ~~PI_NAMEs varchar(200),~~ ~~PROGRAM_OFFICER_NAME varchar(36),~~ ~~PROJECT_START date,~~ ~~PROJECT_END date,~~ ~~PROJECT_TERMS varchar(200),~~ ~~PROJECT_TITLE varchar(244),~~ ~~SERIAL_NUMBER smallint,~~ ~~STUDY_SECTION varchar(4),~~ ~~STUDY_SECTION_NAME varchar(100),~~ ~~SUBPROJECT_ID smallint,~~ ~~SUFFIX varchar(2),~~ ~~SUPPORT_YEAR smallint,~~ ~~DIRECT_COST_AMT integer,~~ ~~INDIRECT_COST_AMT integer,~~ ~~TOTAL_COST integer,~~ ~~TOTAL_COST_SUB_PROJECT integer~~ ); **Z:\VentureCapitalData\SDCVCData\vcdb2\~~COPY all_grants FROM 'Jeemin_combined_files 1986-2001~~CitiesWithGT10Active.~~csv' WITH DELIMITER AS E'\t' HEADER NULL AS ''CSV~~txt

~~====4/14/~~2017 ~~FRIDAY~~-09-21:* Functions for Linear Referencing: '''ST_LineInterpolatePoint(geometry A, double measure)''': Returns a point interpolated along a line. '''ST_LineLocatePoint(geometry A, geometry B)''~~2PM-5PM~~': Returns a float between 0 and 1 representing the location of the closest point on LineString to the given Point. '''ST_Line_Substring(geometry A, double from, double to)''': Return a linestring being a substring of the input one starting and ending at the given fractions of total 2d length. '''ST_Locate_Along_Measure(geometry A, double measure)''': Return a derived geometry collection value with elements that match the specified measure. '''ST_Locate_Between_Measures(geometry A, double from, double to)'''~~====~~: Return a derived geometry collection value with elements that match the specified range of measures inclusively.* Loaded Federal Grants Data into database '''ST_AddMeasure(geometry A, double from, double to)''': Return a derived geometry with measure elements linearly interpolated between the start and end points. If the geometry has no measure dimension, one is added.

~~====4/17/2017 MONDAY~~*3-D Functions: '''ST_3DClosestPoint''' — Returns the 3-dimensional point on g1 that is closest to g2. This is the first point of the 3D shortest line. '''ST_3DDistance''' — For geometry type Returns the 3-dimensional cartesian minimum distance (based on spatial ref) between two geometries in projected units. '''ST_3DDWithin''' — For 3d (z) geometry type Returns true if two geometries 3d distance is within number of units. '''ST_3DDFullyWithin''' — Returns true if all of the 3D geometries are within the specified distance of one another. '''~~2PM~~ST_3DIntersects''' — Returns TRUE if the Geometries “spatially intersect” in 3d -~~4PM~~only for points and linestrings ''~~====~~'ST_3DLongestLine''' — Returns the 3-dimensional longest line between two geometries '''ST_3DMaxDistance''' — For geometry type Returns the 3-dimensional cartesian maximum distance (based on spatial ref) between two geometries in projected units. '''ST_3DShortestLine''' — Returns the 3-dimensional shortest line between two geometries

~~====4/17/2017 WEDNESDAY''9AM-12PM''====~~* ~~To pull accelerators~~Relevant PostgreSQL Commands: ~~Wrote simple python regex-based script that ran on organizations data.~~ ~~Code: E:~~'''\~~McNair\Projects\Accelerators\Crunchbase Snapshot\accelerator keywords~~dt *.py*''' Show all tables ~~Matched output (885 mathces) : E:~~'''\~~McNair\Projects\Accelerators\Crunchbase Snapshot\Jeemin_885_accel_matches~~q''' Exit table

~~====9~~*Specifities/~~11/2017 MONDAY''4PM-6PM''====~~Outliers to consider: New York (decompose) Princeton area (keep Princeton unique) Reston, Virginia (keep) San Diego (include La Jolla)* Ensured that documentation exists for the projects worked on last semester. Silicon Valley (all distinct)

~~====9~~* Continue reading from: https:/12/~~2017 TUESDAY''9AM-10:40AM & 1PM-2:20PM & 4PM-5:30PM''====~~* Clarified University Matching output filepostgis.* Helped Christy with pdf-reader, capturing keywords in readable formatnet/docs/postgis_installation.html

..2017-09-20:* Attended first intro to GIS course yesterday* Updated above notes on GIS

~~====9/18/2017 MONDAY''4PM-6PM''====~~

* Read documentation on PostGIS and tiger geocoder

* Continue reading from: http://workshops.boundlessgeo.com/postgis-intro/joins.html

~~====9/19/~~2017 ~~TUESDAY''9AM~~-~~10:40AM & 1PM~~09-219:~~20PM''====~~

* Useful functions for spatial joins:

* Continue reading from: http://workshops.boundlessgeo.com/postgis-intro/geography.html

~~====9/20/2017 WEDNESDAY''3:50PM-5:20PM''====~~

* Attended first intro to GIS course yesterday

* Updated above notes on GIS

* Continue reading from:

~~====9/21/~~2017 ~~THURSDAY''9AM~~-1209-18:~~30PM''====~~* ~~Functions for Linear Referencing~~Read documentation on PostGIS and tiger geocoder* Continue reading from: ~~'''ST_LineInterpolatePoint(geometry A, double measure)'''~~http: ~~Returns a point interpolated along a line~~//workshops.boundlessgeo.com/postgis-intro/joins.html ~~'''ST_LineLocatePoint(geometry A, geometry B)'''~~2017-09-12: ~~Returns a float between 0 and 1 representing the location of the closest point on LineString to the given Point~~* Clarified University Matching output file. ~~'''ST_Line_Substring(geometry A, double from~~* Helped Christy with pdf-reader, ~~double to)''': Return a linestring being a substring of the input one starting and ending at the given fractions of total 2d length~~capturing keywords in readable format. ~~'''ST_Locate_Along_Measure(geometry A, double measure)'''~~2017-09-11: ~~Return a derived geometry collection value with elements that match the specified measure.~~ ~~'''ST_Locate_Between_Measures(geometry A, double from, double to)''': Return a derived geometry collection value with elements~~ * Ensured that ~~match~~ documentation exists for the ~~specified range of measures inclusively.~~ '''ST_AddMeasure(geometry A, double from, double to)''': Return a derived geometry with measure elements linearly interpolated between the start and end points. If the geometry has no measure dimension, one is addedprojects worked on last semester.

*3-D Functions: ~~'''ST_3DClosestPoint''' — Returns the 3-dimensional point on g1 that is closest to g2. This is the first point of the 3D shortest line.~~ ~~'''ST_3DDistance''' — For geometry type Returns the 3-dimensional cartesian minimum distance (based on spatial ref) between two geometries in projected units.~~ ~~'''ST_3DDWithin''' — For 3d (z) geometry type Returns true if two geometries 3d distance is within number of units.~~ ~~'''ST_3DDFullyWithin''' — Returns true if all of the 3D geometries are within the specified distance of one another.~~ ~~'''ST_3DIntersects''' — Returns TRUE if the Geometries “spatially intersect” in 3d - only for points and linestrings~~ ~~'''ST_3DLongestLine''' — Returns the 3-dimensional longest line between two geometries~~ ~~'''ST_3DMaxDistance''' — For geometry type Returns the 3-dimensional cartesian maximum distance (based on spatial ref) between two geometries in projected units.~~ ~~'''ST_3DShortestLine''' — Returns the 3-dimensional shortest line between two geometries~~</onlyinclude>

*Relevant PostgreSQL Commands: ~~'''\dt *.*''' Show all tables~~ ~~'''\q''' Exit table~~===Spring 2017===

2017-04-17:*~~Specifities/ Outliers to consider~~To pull accelerators:Wrote simple python regex-based script that ran on organizations data. ~~New York (decompose)~~Code: E:\McNair\Projects\Accelerators\Crunchbase Snapshot\accelerator keywords.py ~~Princeton area~~ Matched output (~~keep Princeton unique)~~ ~~Reston, Virginia (keep)~~ ~~San Diego (include La Jolla)~~ ~~Silicon Valley (all distinct~~885 mathces): E:\McNair\Projects\Accelerators\Crunchbase Snapshot\Jeemin_885_accel_matches

2017-04-14:* ~~Continue reading from: https://postgis.net/docs/postgis_installation.html~~Loaded Federal Grants Data into database

~~====9/25/~~2017 ~~MONDAY''4PM~~-~~6PM''====~~04-12:* ~~Talked to Ed about GIS, Census data, and going about determining~~ Finishing up cleaning the ~~correctness of reported 'place~~columns for Federal Grant Data - NIH.' Currently script makes a cross product of each reported place and an existing place, outputting a column of boolean value to indicate whether the reported place's coordinates fell within a place's geometric boundariesThe output excel files can be accessed at: E:\McNair\Projects\Federal Grant Data\NIH\Grants Titled: Jeemin_combined_files 1986-2001. ~~One other way of going about this which we discussed is to first check if the reported place does fall within that place's boundaries~~csv Jeemin_combined_files 2002-2012. ~~If it isn't, we'll go about the cross product method~~csv Jeemin_combined_files 2013-2015.csv

* ~~To add documentation~~ psql table formula: ** http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation** http://mcnair.bakerinstitute.org/wiki/Urban_Start-up_Agglomeration

* Discussed the need to maintain venture capital database CREATE TABLE all_grants ( APPLICATION_ID integer, ACTIVITY varchar(3), ADMINISTERING_IC varchar(2), APPLICATION_TYPE varchar(1), ARRA_FUNDED varchar(1), AWARD_NOTICE_DATE date, BUDGET_START date, BUDGET_END date, CFDA_CODE varchar(3), CORE_PROJECT_NUM varchar(11), ED_INST_TYPE varchar(30), FOA_NUMBER varchar(13), FULL_PROJECT_NUM varchar(35), FUNDING_ICs varchar(40), FUNDING_MECHANISM varchar(23), FY smallint, IC_NAME varchar(77), NIH_SPENDING_CATS varchar(295), ORG_CITY varchar(20), ORG_COUNTRY varchar(16), ORG_DEPT varchar(30), ORG_DISTRICT smallint, ORG_DUNS integer, ORG_FIPS varchar(2), ORG_NAME varchar(60), ORG_STATE varchar(2), ORG_ZIPCODE integer, PHR varchar(200), PI_IDS varchar(30), PI_NAMEs varchar(200), PROGRAM_OFFICER_NAME varchar(36), PROJECT_START date, PROJECT_END date, PROJECT_TERMS varchar(200), PROJECT_TITLE varchar(244), SERIAL_NUMBER smallint, STUDY_SECTION varchar(4), STUDY_SECTION_NAME varchar(100), SUBPROJECT_ID smallint, SUFFIX varchar(2), SUPPORT_YEAR smallint, DIRECT_COST_AMT integer, INDIRECT_COST_AMT integer, TOTAL_COST integer, TOTAL_COST_SUB_PROJECT integer ); \COPY all_grants FROM 'Jeemin_combined_files 1986-2001.csv' WITH DELIMITER AS E'\t' HEADER NULL AS ''CSV

*Relevant File paths2017-03-29:**E:\McNair\Projects\Agglomeration\TestGIS.sql**Z:\VentureCapitalData\SDCVCData\vcdb2\ProecssingCoLevelSimple.sql**Z:\VentureCapitalData\SDCVCData\vcdb2\CitiesWithGT10ActiveTroubled by the variety of cases - separating keys by keywords will not work favorably when it hits University of California vs.~~txt~~University of Southern California case - find a way to match University of Southern California first (more specific ones first) - but how to generalize

~~====9/26/~~2017 ~~TUESDAY ''9AM~~-~~12PM''====~~* Created a table that maps a state to the database name.** http03-27:~~//mcnair.bakerinstitute.org/wiki/PostGIS_Installation#Translating_Table_names_to_corresponding_States~~* ~~Added more GIS~~Writing code for university matches -~~information (functions, realm & outliers~~ decided to ~~consider)~~** http://mcnair.bakerinstitutego through keys instead of each dataitem.~~org/wiki/Urban_Start-up_Agglomeration#GIS_Resources~~* Visualization Use keywords in ~~PostGIS or connecting~~ each key to ~~ArcGIS~~ go through the dataitem - misspellings are currently unaccounted for ~~visualization (import/export data)~~* Spatial indexing:** http://revenant.ca/www/postgis/workshop/indexing.~~html~~

...~~====10/2/~~2017 ~~MONDAY ''4PM~~-~~6PM''====~~03-24:* ~~Talked~~ Discussed with Julia & Meghana about university keys to use to ~~Harrison & Peter regarding ArcGIS~~** Currently have points plotted on Houston** Trouble interpreting geometry typecount # of occurrences, ~~as currently reads in from text file~~including aliases and misspellings** Documents located in Thoughts: ~~E:\McNair\Projects\Agglomeration\HarrisonPeterWorkArcGIS~~* Attempted to ~~install PostGIS spatial extention from PostgreSQL but getting~~ use a scoring metric with a key of UNIVERSITY OF CALIFORNIA SYSTEM, it should have a '~~spatial database creation failed~~better' ~~error message.~~** Referenced instructions: *** https:score when compared to MATHEMATICAL SCIENCES PUBLISHERS C/~~/www~~O UNIVERSITY OF CALIFORNIA BERKELEY or CALIFORNIA AT LOS ANGELES, UNVIERSITY OF than when compared to UNIVERSITY OF SOUTHERN CALIFORNIA, which may pose a challenge when attempting to implement this in a more general sense.~~gpsfiledepot~~In normalizing a string, strip "THE", "," and split words by spaces and compare each keyword from the two strings.~~com/tutorials/installing~~Deciding on which strings to compare will be another issue -~~and-setting-up-postgresql-with-postgis/~~length (within some range maybe) could be an option. *** http://www.bostongis.com/PrinterFriendly.aspx?content_name=postgis_tut01Federal Grant Data XML Parser was rerun - same output textfiles

~~====10/3/~~2017 ~~TUESDAY ''10AM~~-~~12PM & 1PM~~03-222:~~30PM''====~~* ~~Installed PostGIS~~ Read string matching & ~~is now visible on pgAdmin III~~calculating distance, below are relevant links* [http://www.cs.cmu.edu/~wcohen/postscript/ijcai-ws-2003.pdf]* [http://web.archive.org/web/20081224234350/http://www.dcs.shef.ac.uk/~sam/stringmetrics.html]

* ArcGIS (connect to postgis database)2017-03-22:** 1) Open ArcMap** 2) Either open blank or open existing file/project** 3) Click on Talked to Julia about universal matcher, want to combine all University of California'~~Add Data' button with a cross and a yellow diamond (under Selection toolbar)~~s to University of California, The Regents of** 4) Go Converted crunchbase2013 data from mySQL to PostgreSQL, but having trouble with the ~~top~~last table -~~most directory by pressing on the arrow that points left-then~~cb_relationships, complains about syntax error at or near some places -~~up (on the left of home button)~~** 5) Click on 'Database Connections'** 6) Click on 'Add Database Connection' (if Connection to localhost.sde) does not but generally all tables exist ~~already)~~in database called crunchbase** 7) Fill in Federal Grant Data XML Parser was run - the ~~following fields:~~*** Database Platform: PostgreSQL*** Instance: localhost*** User name: postgres*** Password: *** Database: tigertest** 8) Press 'OK'** 9) Now you'll have 'Connection to localhost.sde' three output textfiles can be found in ~~your Database Connections~~** 10) Double click on 'Connection to localhost.sde'** 11) Double click on the table of interest** 12) Click 'Finish'** 13) You'll see information populated on map, as one of the 'Layers'*** Tested withE: ~~tigertest.public.copointplacescontains~~\McNair\Projects\Federal Grant Data\NSF

* On running & altering Oliver's script: ** Location: E:\McNair\Projects\OliverLovesCircles\src\python\vc_circles.py** Ed manipulated file names so that underscores would replace dots (St.Louis 2017-03-~~> St_Louis)~~** Takes in instances and sweep times as part of the argument, but not impactful as those variables are hardcoded in the script** Ran vc_circles.py with the following variables with changed values16:*** SWEEP_CYCLE_SECONDS = 10 (used to be 30)Further documented [[University Patent Matching]]*** NUMBER_INSTANCES = 16 (used to be 8)** New output to be found in: E:\McNair\Projects\OliverLovesCircles\outFinished writing XML Parser

~~====10/9/~~2017 ~~MONDAY''3PM~~-~~6PM''====~~* TODO'S03-15:** Downloading data onto tigertest*** Road*** Railway*** Coastline*** Instructions: httpTodo:write a wikipage on possible input/~~/mcnair.bakerinstitute.org/wiki/PostGIS_Installation#Bulk_Download_TIGER_Shapefiles~~output info on string matcher** Configure census Wrote part of XML parser, extracted yearly data ~~from American Community Survey (ACS)~~*** 1) Work out what data is of our interest into E:\McNair\Projects\Federal Grant Data\NSF\NSF Extracted Data (~~confirm ACS)~~*** 2) Determine appropriate shape file unit: **** census block vs. census block group vs. census track*** 3up to year 2010) ~~Load into tigertest~~

* Done2017-03-14:** Downloaded data from httpsStarted pulling academy cases but there are too many cases to worry about, in terms of institution of interest. A document is located in E:~~//www~~\McNair\Projects\University Patents\academies_verify_cases.~~census.gov~~txt* Need Julia/~~cgi-bin/geo/shapefiles/index~~Meghana to look through the hits and see which are relevant & extract pattern from there.~~php~~ ~~tl_2017_us_coastline -- 4209~~ ~~tl_2017_us_primaryroads -- 11574~~ ~~tl_2017_us_rails -- 176237~~** ~~Link found to potentially download ACS data: https://www.census.gov/geo/maps-data/data/tiger-data~~Having trouble outputting txt file without double quotes around every line.~~html~~*** But most files on Thinking that one text file should be output for all keywords instead of having one each, to avoid overlap (ex) COLLEGE and UNIVERSITY are both keywords; ALBERT EINSTEIN COLLEGE OF YESHIVA UNIVERSITY will be hit twice if it ~~come with .gdb extension~~ were counted as two separate instances, one accounting for COLLEGE and ~~not~~ the other for UNIVERSITY) - either in the form of if-elseif statements or one big regex check.~~shp~~

~~====10/10/~~2017 ~~TUESDAY''9AM~~-~~12PM''====~~*Discoveries/ Struggles regarding.gdb to .shp file conversion03-13:** Esri Production Mapping For University Patent Data Matching - matched SCHOOL (~~costly~~output: E:\McNair\Projects\University Patents\school_pulled_from_assignee_list_USA)*** License needs to be purchasesdand matched INSTITUTE(output: ~~http~~E:~~//www~~\McNair\Projects\University Patents\institute_pulled_from_assignee_list_USA).~~esri.com/software/arcgis/extensions/production-mapping/pricing~~** Use ogr2ogr from gdal package*** https://gis.stackexchange.com/questions/14432/migrating-geodatabase-data-into-postgis-without-esri-apps*** Command: ogr2ogr -f "ESRI Shapefile" [~~Destination of shapefile~~[University Patent Matching] ~~[path to gdb file~~]*** Problem installing gdalTo be worked on later: Grant XML parsing & general name matcher

~~====10/16/~~2017 ~~MONDAY''4PM~~-~~6PM''====~~03-08:* ~~Exported maps of points from the Bay Area each year.~~ ** Map used location: Wrote regex pattern that identifies all "university" matchings - can be found in E:\McNair\Projects\~~Agglomeration~~University Patents\~~HarrisonPeterWorkArcGIS\Jeemin_Bay_Area Points_Every_Year\BayAreaEveryYearMap~~university_pulled_from_assignee_list_USA -- is an output file*Talked to Sonia, but didn't come to solid conclusion on identifying whether key words associate with city or country by running a python function* ~~Zoom scale: 1:650~~Output sql tables from finished run of Jeemin_FDATrial_as_key_data_ripping.py * Ran through assigneelist_USA.txt to see how many different ways UNIVERSITY could be spelled wrong. There were many.~~000~~* ~~Location~~ Tried to logic through creating a pattern that could catch all different versions of ~~Bay Area Points png files:~~ UNIVERSITY. Discuss further on whether UNIVERSITIES and those that include UNIVERSITIES but include ~~E:\McNair\Projects\Agglomeration\HarrisonPeterWorkArcGIS\Jeemin_Bay_Area Points_Every_Year~~INC in the end should be pulled as relevant information

2017-03-06:* [[Installing python in a database]]* Added building Python function section to [[Working with PostgreSQL]] at the bottom of the page.* Ran FDA Trial data ripping again, as the text output files were wiped.* Plan on discussing with Julia and Meghana again about pulling universities and other relevant institutions from the Assignee List USA.* Talked to Sonia about pulling city, state, zipcode information, hence python was installed in a database. Will work with Sonia on Wednesday afternoon and see how best a regex function could be implemented

~~====10/23/~~2017 ~~MONDAY''4PM~~-~~6PM''====~~03-03:* ~~Talked~~ Attempted to ~~Ed with Peter & Oliver about upcoming tasks & projects.~~* Loaded acs_place table 2017 (does not contain population) on tigertest.** SQL commands used:output sql tables

~~DROP TABLE acs_place;~~2017-03-01:* Started re-running Jeemin_FDATrial_as_key_data_ripping.py

~~CREATE TABLE acs_place (~~2017-02-27: ~~USPS varchar(5),~~* Finished producing tables from Jeemin_FDATrial_as_key_data_ripping.py ~~GEOID varchar(30),~~* Talked to Julia about LinkedIn data extracting - to be discussed further with Julia & Peter. ~~ANSICODE varchar(30)~~* Started web crawler for Wikipedia - currently pulls Endowment, ~~NAME varchar(100)~~Academic staff, ~~LSAD varchar(30)~~students, ~~FUNCSTAT varchar(10)~~undergraduates, ~~ALAND varchar(30),~~ ~~AWATER varchar(30),~~ ~~ALAND_SQMI varchar(30),~~ ~~AWATER_SQMI varchar(30),~~ ~~INTPTLAT varchar(30),~~ ~~INTPTLONG varchar(30)~~ );and postgraduates info found on Rice Wikipedia page. Can be found in : E:\McNair\Projects\University Patents\Jeemin_University_wikipedia_crawler.py

~~\COPY acs_place FROM '/bulk/2017_Gaz_place_national.txt';~~2017-02-24: * Continued working on producing multiple tables -~~-COPY 29578~~first two are done. Was working on location, as there are multiple location tags per location.

* TODO2017-02-22:** Find acs place 2016 data for population** Find larger acs filesFinished Jeemin_FDATrial_as_key_data_ripping.py (E:\McNair\Projects\FDA Trials\Jeemin_Project\Jeemin_FDATrial_as_key_data_ripping.py), ~~ideally at~~ which outputs to E:\McNair\Projects\FDA Trials\Jeemin_Project\general_data_ripping_output.txt; TODO: output four different tables & replace the ~~place level~~** Provide more documentation on POSTGIS & geocodingwrite in the same for-loop as going through each file

2017-02-20:* Continued working on Jeemin_FDATrial_as_key_data_ripping.py to find tags and place all of those information in a list.The other zipcode file did not finish executing after 2+ hours of running it - considering the possibility of splitting the record file into smaller bits, or running the processing on a faster machine.

~~====10/30/~~2017 ~~MONDAY''4PM~~-~~6PM''====~~02-17:* ~~Downloaded~~ Completed code for counting the number of occurrences for each unique zipcode. (currently titled & located: E:\McNair\Projects\FDA Trials\Jeemin_Project\Jeemin_Running_File.py). It has been running for 20+min because of the comprehensive XML data ~~from ACS~~files. Meanwhile started coding to create a dictionary with the keys corresponding to each unique trial ID, mapped to ~~be continued~~* File pathevery other information (location, sponsors, phase, drugs ...etc.) (currently titled & located: E:\McNair\Projects\~~Agglomeration~~FDA Trials\~~ACS_Downloaded_Data~~* Fields of interest: ~~S1401 SCHOOL ENROLLMENT~~ ~~S1501 EDUCATIONAL ATTAINMENT~~ ~~S2301 EMPLOYMENT STATUS~~ ~~B01003 TOTAL POPULATION~~ ~~B02001 RACE~~ ~~B07201 GEOGRAPHICAL MOBILITY~~ ~~B08303 TRAVEL TIME TO WORK~~ ~~B19013 MEDIAN HOUSEHOLD INCOME~~ ~~B19053 SELF-EMPLOYMENT INCOME IN THE PAST 12 MONTHS FOR HOUSEHOLDS~~ ~~B19083 GINI INDEX OF INCOME INEQUALITY~~ ~~B25003 TENURE~~ ~~B25105 MEDIAN MONTHLY HOUSING COSTS~~ ~~B28011 INTERNET SUBSCRIPTIONS IN HOUSEHOLD~~ ~~G001 GEOGRAPHIC IDENTIFIERS~~Jeemin_Project\Jeemin_FDATrial_as_key_data_ripping.py).

~~====10/31/~~2017 ~~TUESDAY''9AM~~-~~12PM''====~~02-15:* ~~Finished doanloading~~ Discussed with Catherine what to do with FDA Trial data and decided to have a dictionary with zip-codes as keys and number of trials occurred in that zipcode as values. Was still attempting to loop through the files ~~from ACS.~~* Started loading tables into tigertest.* Commands run could be found without the code having to exist in ~~E:\McNair\Projects\Agglomeration\ACS_Downloaded_Data\DataLoading_SQL_Commands~~the same directory as the XML files.~~txt~~ ~~====11/07/2017 TUESDAY''9AM~~Plan to write to excel via tsv, with zip-~~12PM''====~~code as one column and # of occurrence as the other.

2017-02-13:*~~Yesterday~~Goals (for trials): 1) Build ER Diagram 2) For each entity, ~~narrowed down columns of interest from ACS_S1501_educationattain_2016 table.~~get XML snippet 3) Build a parser/ripper for single file; the python parser can be found at: E:\McNair\Projects\FDA Trials\Jeemin_Project

Id ~~Id2~~ ~~Geography~~ ~~Total; Estimate; Population 25 years and over~~ ~~Total; Estimate; Population 25 years and over~~ 2017- ~~High school graduate (includes equivalency)~~ ~~Total; Margin of Error; Population 25 years and over~~ 02- ~~High school graduate (includes equivalency)~~08: ~~Total; Margin~~ * Attempted to come up with possible cases for locating the description of ~~Error; Population 25 years and over~~ accelerators - ~~High school graduate (includes equivalency)~~ ~~Percent; Margin~~ pick up from extracting bodies of ~~Error; Population 25 years and over - High school graduate~~ text from the about page (~~includes equivalency~~given that it exists) ~~Total; Estimate; Population 25 years and over - Associate's degree~~ ~~Total; Margin of Error; Population 25 years and over - Associate's degree~~ ~~Percent; Estimate; Population 25 years and over - Associate's degree~~ ~~Percent; Margin of Error; Population 25 years and over - Associate's degree~~ ~~Total; Estimate; Population 25 years and over - Bachelor's degree~~ ~~Total; Margin of Error; Population 25 years and over - Bachelor's degree~~ ~~Percent; Estimate; Population 25 years and over - Bachelor's degree~~ ~~Percent; Margin of Error; Population 25 years and over - Bachelor's degree~~ ~~Total; Estimate; Population 25 years and over - Graduate or professional degree~~ ~~Total; Margin of Error; Population 25 years and over - Graduate or professional degree~~ ~~Percent; Estimate; Population 25 years and over - Graduate or professional degree~~ ~~Percent; Margin of Error; Population 25 years and over - Graduate or professional degree~~ ~~Percent; Estimate; Percent high school graduate or higher~~ ~~Percent; Margin of Error; Percent high school graduate or higher~~ ~~Percent; Estimate; Percent bachelor's degree or higher~~ ~~Percent; Margin of Error; Percent bachelor's degree or higher~~* [[Trial Data Project]]

*Complications2017-02-06:**For csv files corresponding to years 2015 Set up wikiPage & ~~2016, all of the above columns exist~~remote desktop.**For csv files corresponding to years 2005 - 2014, no 'Percent' columns exist*** Instead their 'Total' columns are percentage values**For csv Started working on python version of web crawler. So far it successfully prints out a catchphrase/ description for one website. To be worked on. The python file ~~corresponding to year 2005, columns regarding Graduate or professional degree are labeled differently~~can be found in: E:\McNair\Projects\Accelerators\Python WebCrawler\webcrawlerpython.py

[[Category:Work Log]]

Ed

Bureaucrats, Interface administrators, Administrators (Semantic MediaWiki), Administrators

7,613

edits

Changes

Jeemin Sim (Work Log) (view source)

Revision as of 13:38, 8 October 2019

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Sites

Sections

Organizations

Help

Tools