Changes

Jump to navigation Jump to search
no edit summary
{{Project|Has project output=Data|Has sponsor=McNair ProjectsCenter
|Has title=American Community Survey (ACS) Data
|Has owner=Jeemin Sim,
=File Location=
 
=New Files=
 
All of the corrected files and the Load.sql script are in:
E:\projects\agglomeration\ACS\CleanFiles
 
The Load.sql script was run against [[Vcdb4]]
 
==Old Files==
Python Script to pull certain columns from excel file:
B28011 INTERNET SUBSCRIPTIONS IN HOUSEHOLD
G001 GEOGRAPHIC IDENTIFIERS
 
At the end, I chose to have full descriptions but not geographic components.
=Notes on Data=
*Educational Attainment are based on population of age 25 years or higher (except for years 2010, 2011, and 2012 - that's why there are fewer entries in those tables)
=To Access Downloaded TablesLoading to a Dbase=
1) ssh researcher@128The zip file from AFF was downloaded and extracted to E:\projects\agglomeration\ACS\AFFFiles There are 13 files entitled ACS_17_5YR_table_with_ann.csv, where table is S1501, etc. The files have two header lines and are CSV, with quoted strings of the format "Abanda CDP, Alabama" for the display label. There are 5 columns in the first file:*Geo.id e.g., 1600000US0100100*GEO.id2 e.g., 0100100*GEO.42display-label e.44g.181 on terminal, "Abanda CDP, Alabama" 2*HD01_VD01 (Estimate; Total) cd /bulke.g., 174*HD02_VD01 (Margin of Error; Total) 3e.g., 158 Geo.id2 appears to match our Geoids. Note the leading zero, so these fields should be converted varchars(7) psql tigertestwith front padding.  Checking Geo.id2 of 0103076: SELECT * from Tigerplaces WHERE geoid='0103076'; To view all tables:statecode statename gid statefp placefp placens geoid name namelsad lsad classfp pcicbsa pcinecta mtfcc funcstat aland awater intptlat intptlon geom \dtAL Alabama 380 01 03076 02403132 0103076 Auburn Auburn city 25 C1 Y N G4110 A 152375113 2646161 +32.6077220 -085.4895446 1600000US0103076,0103076,"Auburn city, Alabama",61462,65 To view contents of a All files are renamed table.txt and have their second header line removed. The first three columns are common to all files, but then files can have multiple estimate and margin columns. Redundant ones are deleted. The state is also split out. The following are kept: \d (Table Name)*B01003 - Population - Total, margin*B02001 - Race - Total, WhiteAlone, BlackAlone, IndianAlone, AsianAlone, IslanderAlone, OtherAlone, TwoPlus*B07201 - Moving - Total, Same1yr, DiffinUS1yr, DiffinMSA1yr, Abroad1yr*B08303 - Driving - Total, Lt5, Btw5And9, Btw10And14, Btw..., Btw60And89, Gt90*B19013 - HHIncome - MedHHInc2017*B19053 - SelfEmployment - Total, NumWSelfEmpIncome, NumWOSelfEmpInc*B19083 - GiniIndex - Gini, margin*B25003 - Accomodation - Total, Owner, Renter*B25105 - Housing - monthlyhousingcost*B28011 - Internet - Total, InternetSub, DialUp, Broadband, Satellite, Other, NoSub, NoInternet*S1501 - Education - Total25plus, Male25Plus, Female25Plus, Assoc25Plus, Bach25Plus, Grad25Plus, HSOrHigher, BachOrHigher, PCPovertyDueToEdu, MedEarnings*S2301 - Labor - Total, LFPR, EmpPopRatio, UnEmpRate

Navigation menu