VM Instructions

Current VM version: 1.0.0, February 7, 2015

This virtual machine is intended to assist researchers working with U.S. Census Current Population Survey (CPS) data from March 2014. See http://www.realworlddivorce.com/Methodology for how we used the data.

First-time Setup Instructions

  1. Install Virtualbox
  2. Install Vagrant
  3. Create a folder on your computer called three-day-rdbms
  4. Open a command shell
  1. In OSX: Applications -> Utilities -> Terminal
  2. in Windows: Start Menu -> type ‘cmd’ in the Search Programs and Applications textbox
  1. Do not use cygwin.  Vagrant will overwrite your machine data each time your reboot.
  1. To install (1.2GB download), type the following commands (without the $)$ mkdir three-day-rdbms$ cd three-day-rdbms
    $ vagrant init aegrumet/three-day-rdbms
    $ vagrant up
  2. Wait for the virtual machine to boot.  If performing a network install for this first time, this could take a while.  Once it has a copy of the VM image, Vagrant will create a Virtualbox VM with 2G ram and 40g disk (dynamically allocated), launch the image, map guest ports to host ports and set up file sharing between the host and guest.

VM Usage Instructions

Census Data Usage

We've preloaded data from the U.S. Census Bureau Current Population Survey (http://www.census.gov/cps/) into census_cps_all and created a potentially helpful view called census_cps. CPS is based on gathering data, via interviews, on nearly 150,000 Americans every month. (There are 139,414 rows in the March 2014 data set that we loaded.)

There are also some helper tables:


Each record in
census_cps_all contains information about one person. It also contains information about his or her family and his or her household. When there are multiple people in a family or household this means the data are not fully normalized, e.g., because information about a household will be duplicated in multiple records. There are more than 700 columns in this table and the field names are described in the Current Population Survey, 2013 Annual Social and Economic (ASEC) Supplement (http://www.census.gov/prod/techdoc/cps/cpsmar13.pdf ).

Here's the view definition…

CREATE VIEW census_cps AS SELECT

    a_age as age,

    IF(a_sex=1, "M", "F") as sex,

    census_states_decode.state_name as state,

        if(pehruslt=-4, 40, pehruslt) as weekly_hours,

    wsal_val as annual_wages,

    census_education_decode.level_name as education_level,

    IF(alm_yn=0, NULL, IF(alm_yn=2, 0, 1)) as receiving_alimony, -- no info changed to null, yes to 1, and no to 0

        alm_val as annual_alimony,

    IF(csp_yn=0, NULL, IF(csp_yn=2, 0, 1)) as receiving_child_support, -- no info changed to null, yes to 1, and no to 0

        csp_val as annual_child_support,

        IF(halm_yn=0, NULL, IF(halm_yn=2, 0, 1)) as household_receiving_alimony,

    IF(hcsp_yn=0, NULL, IF(hcsp_yn=2, 0, 1)) as household_receiving_child_support,

        IF(resnssi1=1, 1, 0) as receiving_ssdi,

        IF(hfoodsp=1, 1, 0) as receiving_food_stamps

    FROM census_cps_all, census_states_decode, census_education_decode

    WHERE

        census_states_decode.state_id=census_cps_all.gestfips AND

        census_education_decode.level_id=census_cps_all.a_hga;

Some sample uses of the data:

-- where do men receiving alimony live?

select state, age, annual_wages, annual_alimony, round(100*annual_alimony/(annual_wages+annual_alimony))

from census_cps

where receiving_alimony = 1

and sex = "M";

--- percentage of people of one gender receiving child support?

SELECT state, count(receiving_child_support)*100.0/count(1) AS percentage

FROM

    (SELECT state, IF(receiving_child_support=1, 1, NULL) AS receiving_child_support

        FROM census_cps WHERE age BETWEEN 30 AND 40 AND sex="M") t

GROUP BY state

ORDER BY percentage desc;

-- median wage for one state in one query

create view census_cps_massachusetts_subgroup

as

select * from census_cps

WHERE annual_wages > 0

AND age BETWEEN 22 AND 36

AND education_level = "Bachelor's"

AND weekly_hours >= 30

AND state = "Massachusetts";

SELECT x.annual_wages

from census_cps_massachusetts_subgroup x, census_cps_massachusetts_subgroup y

GROUP BY x.annual_wages

HAVING SUM(SIGN(1-SIGN(y.annual_wages-x.annual_wages)))/COUNT(*) > .5

LIMIT 1

Additional Instructions

        $ cd three-day-rdbms

$ vagrant up