Difference between revisions of "American Community Survey (ACS) Data"

From edegan.com
Jump to navigation Jump to search
(20 intermediate revisions by 2 users not shown)
Line 1: Line 1:
{{McNair Projects
+
{{Project
 +
|Has project output=Data
 +
|Has sponsor=McNair Center
 
|Has title=American Community Survey (ACS) Data
 
|Has title=American Community Survey (ACS) Data
 
|Has owner=Jeemin Sim,
 
|Has owner=Jeemin Sim,
 
}}
 
}}
  
http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation
+
See also:
http://mcnair.bakerinstitute.org/wiki/Urban_Start-up_Agglomeration
+
*[[PostGIS Installation]]
 +
*[[Urban Start-up Agglomeration]]
 +
 
 +
=Notes=
 +
 
 +
The approach below, using American Fact Finder, seems to be the most standard. However, it does look like it works if you choose the right data year -- 2012 ACS 1-year estimates or 5 year estimates -- seems to have all of the information needed. See the [https://factfinder.census.gov/faces/nav/jsf/pages/download_center.xhtml Fact Finder Download Center]. Note that the 1 yr estimates are only available for places with populations of 65,000 people or more, the 1yr supplemental estimates cover down to places with 20,000 people or more, and the 5 yr estimates cover a whopping 29,573 places [https://www.census.gov/programs-surveys/acs/geography-acs/areas-published.html], as compared with just 630 for the 1yr and 2,323 or the 1yr supplements.
 +
 
 +
One can download the PUMS data for 2017 here: https://www2.census.gov/programs-surveys/acs/data/pums/2017/1-Year/ However, PUMS is only available for places in the [https://en.wikipedia.org/wiki/Public_Use_Microdata_Area Public Use Microdata Areas], which have a population 100,000 or more.
 +
 
 +
And state-by-state, it looks possible to get the latest data profiles (2013-2017 5 yr estimates) but not place by place as data, just as an aggregate:
 +
https://www.census.gov/acs/www/data/data-tables-and-tools/data-profiles/
 +
 
 +
For just population, also see: http://census.ire.org/data/bulkdata.html. This seems to be based on the 2000 or 2010 census data only.
 +
 
 +
=File Location=
 +
 
 +
=New Files=
 +
 
 +
All of the corrected files and the Load.sql script are in:
 +
E:\projects\agglomeration\ACS\CleanFiles
 +
 
 +
The Load.sql script was run against [[Vcdb4]]
 +
 
 +
==Old Files==
 +
 +
Python Script to pull certain columns from excel file:
 +
*E:\McNair\Projects\Agglomeration\ACS_Downloaded_Data\pullCertainColumns.py
 +
 
 +
SQL Commands to create tables and load data:
 +
*E:\McNair\Projects\Agglomeration\ACS_Downloaded_Data\DataLoading_SQL_Commands
 +
 
 +
Excel files downloads from ACS website:
 +
*E:\McNair\Projects\Agglomeration\ACS_Downloaded_Data\csvFiles
  
 
=Steps to Obtain Data=
 
=Steps to Obtain Data=
Line 26: Line 60:
 
  11) Press Next
 
  11) Press Next
  
=Tables to Pull=
+
For the new pull, I used 2017 5yr Estimates.
 +
 
 +
=Tables Pulled=
  
S1401 SCHOOL ENROLLMENT
 
 
  S1501 EDUCATIONAL ATTAINMENT
 
  S1501 EDUCATIONAL ATTAINMENT
 
  S2301 EMPLOYMENT STATUS
 
  S2301 EMPLOYMENT STATUS
Line 43: Line 78:
 
  G001 GEOGRAPHIC IDENTIFIERS
 
  G001 GEOGRAPHIC IDENTIFIERS
  
=To Access Downloaded Tables=
+
At the end, I chose to have full descriptions but not geographic components.
 +
 
 +
=Notes on Data=
 +
 
 +
*Educational Attainment are based on population of age 25 years or higher (except for years 2010, 2011, and 2012 - that's why there are fewer entries in those tables)
  
1) ssh researcher@128.42.44.181 on terminal
+
=Loading to a Dbase=
2) cd /bulk
+
 
  3) psql tigertest
+
The zip file from AFF was downloaded and extracted to E:\projects\agglomeration\ACS\AFFFiles
   
+
 
  To view all tables:
+
There are 13 files entitled ACS_17_5YR_table_with_ann.csv, where table is S1501, etc. The files have two header lines and are CSV, with quoted strings of the format "Abanda CDP, Alabama" for the display label. There are 5 columns in the first file:
  \dt
+
*Geo.id e.g., 1600000US0100100
   
+
*GEO.id2 e.g., 0100100
To view contents of a table:
+
*GEO.display-label e.g., "Abanda CDP, Alabama"
\d (Table Name)
+
*HD01_VD01 (Estimate; Total) e.g., 174
 +
*HD02_VD01 (Margin of Error; Total) e.g., 158
 +
 
 +
Geo.id2 appears to match our Geoids. Note the leading zero, so these fields should be converted varchars(7) with front padding.
 +
 
 +
Checking Geo.id2 of 0103076:
 +
  SELECT * from Tigerplaces WHERE geoid='0103076';
 +
  statecode statename gid statefp placefp placens geoid name namelsad lsad classfp pcicbsa pcinecta mtfcc funcstat aland awater intptlat intptlon geom
 +
  AL Alabama 380 01 03076 02403132 0103076 Auburn Auburn city 25 C1 Y N G4110 A 152375113 2646161 +32.6077220 -085.4895446
 +
  1600000US0103076,0103076,"Auburn city, Alabama",61462,65
 +
 
 +
All files are renamed table.txt and have their second header line removed. The first three columns are common to all files, but then files can have multiple estimate and margin columns. Redundant ones are deleted. The state is also split out. The following are kept:
 +
*B01003 - Population - Total, margin
 +
*B02001 - Race - Total, WhiteAlone, BlackAlone, IndianAlone, AsianAlone, IslanderAlone, OtherAlone, TwoPlus
 +
*B07201 - Moving - Total, Same1yr, DiffinUS1yr, DiffinMSA1yr, Abroad1yr
 +
*B08303 - Driving - Total, Lt5, Btw5And9, Btw10And14, Btw..., Btw60And89, Gt90
 +
*B19013 - HHIncome - MedHHInc2017
 +
*B19053 - SelfEmployment - Total, NumWSelfEmpIncome, NumWOSelfEmpInc
 +
*B19083 - GiniIndex - Gini, margin
 +
*B25003 - Accomodation - Total, Owner, Renter
 +
*B25105 - Housing - monthlyhousingcost
 +
*B28011 - Internet - Total, InternetSub, DialUp, Broadband, Satellite, Other, NoSub, NoInternet
 +
*S1501 - Education - Total25plus, Male25Plus, Female25Plus, Assoc25Plus, Bach25Plus, Grad25Plus, HSOrHigher, BachOrHigher, PCPovertyDueToEdu, MedEarnings
 +
*S2301 - Labor - Total, LFPR, EmpPopRatio, UnEmpRate

Revision as of 13:41, 21 September 2020


Project
American Community Survey (ACS) Data
Project logo 02.png
Project Information
Has title American Community Survey (ACS) Data
Has owner Jeemin Sim
Has start date
Has deadline date
Has project status
Has sponsor McNair Center
Has project output Data
Copyright © 2019 edegan.com. All Rights Reserved.


See also:

Notes

The approach below, using American Fact Finder, seems to be the most standard. However, it does look like it works if you choose the right data year -- 2012 ACS 1-year estimates or 5 year estimates -- seems to have all of the information needed. See the Fact Finder Download Center. Note that the 1 yr estimates are only available for places with populations of 65,000 people or more, the 1yr supplemental estimates cover down to places with 20,000 people or more, and the 5 yr estimates cover a whopping 29,573 places [1], as compared with just 630 for the 1yr and 2,323 or the 1yr supplements.

One can download the PUMS data for 2017 here: https://www2.census.gov/programs-surveys/acs/data/pums/2017/1-Year/ However, PUMS is only available for places in the Public Use Microdata Areas, which have a population 100,000 or more.

And state-by-state, it looks possible to get the latest data profiles (2013-2017 5 yr estimates) but not place by place as data, just as an aggregate: https://www.census.gov/acs/www/data/data-tables-and-tools/data-profiles/

For just population, also see: http://census.ire.org/data/bulkdata.html. This seems to be based on the 2000 or 2010 census data only.

File Location

New Files

All of the corrected files and the Load.sql script are in:

E:\projects\agglomeration\ACS\CleanFiles

The Load.sql script was run against Vcdb4

Old Files

Python Script to pull certain columns from excel file:

  • E:\McNair\Projects\Agglomeration\ACS_Downloaded_Data\pullCertainColumns.py

SQL Commands to create tables and load data:

  • E:\McNair\Projects\Agglomeration\ACS_Downloaded_Data\DataLoading_SQL_Commands

Excel files downloads from ACS website:

  • E:\McNair\Projects\Agglomeration\ACS_Downloaded_Data\csvFiles

Steps to Obtain Data

1) Go to https://factfinder.census.gov/faces/nav/jsf/pages/download_center.xhtml
2) Select 'I know the dataset or table(s) that I want to download.'
3) Press Next
4) For 'Select a program:' choose
       'American Community Survey'
5) For 'Select a dataset and click Add to Your Selections:' choose
       '<YEAR OF INTEREST> ACS 1-year estimates'
6) Press 'Add To Your Selections'
7) Press Next
8) For 'Select a geographic type:' choose
       'Place - 160'
9) For Select a state:
       Don't choose a state, as we wish to download all.
10) For 'Select one or more geographic areas...' choose
       'All Places within United States and Puerto Rico'
11) Press Next

For the new pull, I used 2017 5yr Estimates.

Tables Pulled

S1501 EDUCATIONAL ATTAINMENT
S2301 EMPLOYMENT STATUS
B01003 TOTAL POPULATION
B02001 RACE
B07201 GEOGRAPHICAL MOBILITY
B08303 TRAVEL TIME TO WORK
B19013 MEDIAN HOUSEHOLD INCOME
B19053 SELF-EMPLOYMENT INCOME IN THE PAST 12 MONTHS FOR HOUSEHOLDS
B19083 GINI INDEX OF INCOME INEQUALITY
B25003 TENURE
B25105 MEDIAN MONTHLY HOUSING COSTS
B28011 INTERNET SUBSCRIPTIONS IN HOUSEHOLD
G001 GEOGRAPHIC IDENTIFIERS

At the end, I chose to have full descriptions but not geographic components.

Notes on Data

  • Educational Attainment are based on population of age 25 years or higher (except for years 2010, 2011, and 2012 - that's why there are fewer entries in those tables)

Loading to a Dbase

The zip file from AFF was downloaded and extracted to E:\projects\agglomeration\ACS\AFFFiles

There are 13 files entitled ACS_17_5YR_table_with_ann.csv, where table is S1501, etc. The files have two header lines and are CSV, with quoted strings of the format "Abanda CDP, Alabama" for the display label. There are 5 columns in the first file:

  • Geo.id e.g., 1600000US0100100
  • GEO.id2 e.g., 0100100
  • GEO.display-label e.g., "Abanda CDP, Alabama"
  • HD01_VD01 (Estimate; Total) e.g., 174
  • HD02_VD01 (Margin of Error; Total) e.g., 158

Geo.id2 appears to match our Geoids. Note the leading zero, so these fields should be converted varchars(7) with front padding.

Checking Geo.id2 of 0103076:

SELECT * from Tigerplaces WHERE geoid='0103076';
statecode	statename	gid	statefp	placefp	placens	geoid	name	namelsad	lsad	classfp	pcicbsa	pcinecta	mtfcc	funcstat	aland	awater	intptlat	intptlon	geom
AL	Alabama	380	01	03076	02403132	0103076	Auburn	Auburn city	25	C1	Y	N	G4110	A	152375113	2646161	+32.6077220	-085.4895446	
1600000US0103076,0103076,"Auburn city, Alabama",61462,65

All files are renamed table.txt and have their second header line removed. The first three columns are common to all files, but then files can have multiple estimate and margin columns. Redundant ones are deleted. The state is also split out. The following are kept:

  • B01003 - Population - Total, margin
  • B02001 - Race - Total, WhiteAlone, BlackAlone, IndianAlone, AsianAlone, IslanderAlone, OtherAlone, TwoPlus
  • B07201 - Moving - Total, Same1yr, DiffinUS1yr, DiffinMSA1yr, Abroad1yr
  • B08303 - Driving - Total, Lt5, Btw5And9, Btw10And14, Btw..., Btw60And89, Gt90
  • B19013 - HHIncome - MedHHInc2017
  • B19053 - SelfEmployment - Total, NumWSelfEmpIncome, NumWOSelfEmpInc
  • B19083 - GiniIndex - Gini, margin
  • B25003 - Accomodation - Total, Owner, Renter
  • B25105 - Housing - monthlyhousingcost
  • B28011 - Internet - Total, InternetSub, DialUp, Broadband, Satellite, Other, NoSub, NoInternet
  • S1501 - Education - Total25plus, Male25Plus, Female25Plus, Assoc25Plus, Bach25Plus, Grad25Plus, HSOrHigher, BachOrHigher, PCPovertyDueToEdu, MedEarnings
  • S2301 - Labor - Total, LFPR, EmpPopRatio, UnEmpRate