The Capstone Project offers qualified learners the opportunity to apply their knowledge by analyzing and comparing multiple data sources on the same topic. Students will develop a research question, access and analyze relevant data, and critically examine the quality of each data source.
At the completion of this capstone, students will have demonstrated hands-on data analysis capability and evaluated the quality of different data sources using the Total Survey Error approach. This involves at least some of the following: comparing weighted non-probability samples to data collected from probability samples, using sampling techniques to correct for coverage errors, and tracking and assessing the ease of use of an online questionnaire that they implement.
Develop a questionnaire for the capstone project. You can choose between two topics: a post-course evaluation survey for the Coursera MOOC “Questionnaire Design for Social Surveys” or a National Health Interview survey focusing on fitness, exercise, and nutrition. The survey should take no more than 10 minutes (no more than 30 questions) and should be self-administered online.
This part of the capstone project involves (1) implementing the questionnaire you developed in the Questionnaire Design for Social Surveys part of the capstone project as an online questionnaire designed for both mobile and desktop (PC) administration, (2) deploying the questionnaire to 100 respondents, and (3) interpreting some of the data.
Sampling and Weighting
Conduct a two-stage sampling project. You will select a two-stage stratified sample from a population that is based on data from the 2015 National Health Interview Survey conducted in the United States. You will complete three steps: (1) Select a sample of two block groups in each stratum with probabilities proportional to the number of persons in each block group. This requires counting the number of persons in each block group by stratum to plan five separate selections of two block groups in each stratum. (2) Select a simple random sample without replacement (srswor) of 10% of the persons within each sampled block group. (3) Calculate weights for each sample case and compute weighted estimates from the sample.
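The three steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the course's prescribed procedure: the population is synthetic (five strata of eight block groups each, with made-up sizes), the first stage uses systematic PPS selection as one common way to draw units with probability proportional to size, and the names pps_systematic and two_stage_sample are hypothetical helpers.

```python
import random

def pps_systematic(sizes, n, rng):
    """Pick n units with probability proportional to size (systematic PPS).

    Returns (index, inclusion_probability) pairs. Assumes every size is
    below total/n, so no unit can be hit twice and all probabilities are < 1.
    """
    total = sum(sizes)
    step = total / n
    start = rng.random() * step            # random start in [0, step)
    picks, cum, i = [], 0, 0
    for k in range(n):
        point = start + k * step
        # advance to the unit whose cumulative size range contains the point
        while i < len(sizes) - 1 and cum + sizes[i] <= point:
            cum += sizes[i]
            i += 1
        picks.append((i, n * sizes[i] / total))
    return picks

def two_stage_sample(population, n_psu=2, rate=0.10, seed=0):
    """population: {stratum: {block_group: [person_id, ...]}} (assumed layout)."""
    rng = random.Random(seed)
    sample = []
    for stratum, groups in population.items():
        names = list(groups)
        sizes = [len(groups[g]) for g in names]
        # Step 1: two block groups per stratum, PPS on number of persons
        for idx, pi_psu in pps_systematic(sizes, n_psu, rng):
            persons = groups[names[idx]]
            # Step 2: srswor of about 10% of persons in the block group
            m = max(1, round(rate * len(persons)))
            pi_person = m / len(persons)
            # Step 3: weight = inverse of the overall selection probability
            weight = 1.0 / (pi_psu * pi_person)
            for pid in rng.sample(persons, m):
                sample.append({"stratum": stratum, "block_group": names[idx],
                               "person": pid, "weight": weight})
    return sample

# Synthetic population: 5 strata x 8 block groups, 40 to 120 persons each
gen = random.Random(42)
population, next_id = {}, 0
for h in range(1, 6):
    population[h] = {}
    for b in range(1, 9):
        size = gen.randint(40, 120)
        population[h][f"{h}-{b}"] = list(range(next_id, next_id + size))
        next_id += size

sample = two_stage_sample(population, seed=1)
pop_total = next_id
est_total = sum(r["weight"] for r in sample)   # weighted estimate of population size
```

With this design each person's weight in a sampled block group works out to the stratum population divided by twice the number of persons sampled in that block group, so the weights sum to the population size. Comparing est_total with pop_total is therefore a quick sanity check on the weight calculation.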
Here you will analyze the National Health Interview Survey sample data file, NHISsamdat.csv. You should be careful to properly deal with missing values in the data set.
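One way to keep missing values from distorting a weighted estimate is to drop incomplete cases from both the numerator and the denominator. The sketch below is illustrative only: the column names bmi and weight, the missing-value markers (blank and "NA"), and the inline demo data are assumptions, since the actual layout of NHISsamdat.csv is not described here. Check the file's codebook for the real variable names and missing codes.

```python
import csv
import io

# Assumed missing-value markers; the real file may use other codes.
MISSING = {"", "NA"}

def weighted_mean(rows, value_col, weight_col):
    """Weighted mean of value_col, skipping rows with a missing value or weight.

    Returns (mean, number_of_rows_used).
    """
    num = den = 0.0
    used = 0
    for row in rows:
        v = row[value_col].strip()
        w = row[weight_col].strip()
        if v in MISSING or w in MISSING:
            continue                      # exclude incomplete cases
        num += float(w) * float(v)
        den += float(w)
        used += 1
    mean = num / den if den else float("nan")
    return mean, used

# Tiny made-up data set standing in for the real file
demo = io.StringIO("id,bmi,weight\n1,22.0,100\n2,NA,150\n3,30.0,300\n4,,50\n")
rows = list(csv.DictReader(demo))
mean_bmi, n_used = weighted_mean(rows, "bmi", "weight")
```

To read the real file, open it with open("NHISsamdat.csv", newline="") in place of the io.StringIO object. Reporting n_used next to each estimate shows how many cases were dropped for missing data.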