VM Instructions
Current VM version: 1.0.0, February 7, 2015
This virtual machine is intended to assist researchers working with U.S. Census Current Population Survey (CPS) data from March 2014. See http://www.realworlddivorce.com/Methodology for how we used the data.
First-time Setup Instructions
VM Usage Instructions
Census Data Usage
We've preloaded data from the U.S. Census Bureau Current Population Survey (http://www.census.gov/cps/) into census_cps_all and created a potentially helpful view called census_cps. CPS is based on gathering data, via interviews, on nearly 150,000 Americans every month. (There are 139,414 rows in the March 2014 data set that we loaded.)
There are also some helper tables:
Each record in census_cps_all contains information about one person. It also contains information about his or her family and his or her household. When there are multiple people in a family or household this means the data are not fully normalized, e.g., because information about a household will be duplicated in multiple records. There are more than 700 columns in this table and the field names are described in the Current Population Survey, 2013 Annual Social and Economic (ASEC) Supplement (http://www.census.gov/prod/techdoc/cps/cpsmar13.pdf ).
Here's the view definition…
CREATE VIEW census_cps AS SELECT
a_age as age,
IF(a_sex=1, "M", "F") as sex,
census_states_decode.state_name as state,
if(pehruslt=-4, 40, pehruslt) as weekly_hours,
wsal_val as annual_wages,
census_education_decode.level_name as education_level,
IF(alm_yn=0, NULL, IF(alm_yn=2, 0, 1)) as receiving_alimony, -- no info changed to null, yes to 1, and no to 0
alm_val as annual_alimony,
IF(csp_yn=0, NULL, IF(csp_yn=2, 0, 1)) as receiving_child_support, -- no info changed to null, yes to 1, and no to 0
csp_val as annual_child_support,
IF(halm_yn=0, NULL, IF(halm_yn=2, 0, 1)) as household_receiving_alimony,
IF(hcsp_yn=0, NULL, IF(hcsp_yn=2, 0, 1)) as household_receiving_child_support,
IF(resnssi1=1, 1, 0) as receiving_ssdi,
IF(hfoodsp=1, 1, 0) as receiving_food_stamps
FROM census_cps_all, census_states_decode, census_education_decode
WHERE
census_states_decode.state_id=census_cps_all.gestfips AND
census_education_decode.level_id=census_cps_all.a_hga;
Some sample uses of the data:
-- where do men receiving alimony live?
select state, age, annual_wages, annual_alimony, round(100*annual_alimony/(annual_wages+annual_alimony))
from census_cps
where receiving_alimony = 1
and sex = "M";
--- percentage of people of one gender receiving child support?
SELECT state, count(receiving_child_support)*100.0/count(1) AS percentage
FROM
(SELECT state, IF(receiving_child_support=1, 1, NULL) AS receiving_child_support
FROM census_cps WHERE age BETWEEN 30 AND 40 AND sex="M") t
GROUP BY state
ORDER BY percentage desc;
-- median wage for one state in one query
create view census_cps_massachusetts_subgroup
as
select * from census_cps
WHERE annual_wages > 0
AND age BETWEEN 22 AND 36
AND education_level = "Bachelor's"
AND weekly_hours >= 30
AND state = "Massachusetts";
SELECT x.annual_wages
from census_cps_massachusetts_subgroup x, census_cps_massachusetts_subgroup y
GROUP BY x.annual_wages
HAVING SUM(SIGN(1-SIGN(y.annual_wages-x.annual_wages)))/COUNT(*) > .5
LIMIT 1
Additional Instructions
$ cd three-day-rdbms
$ vagrant up