Statistics

Statistics assignment. Some of the questions require using Stata. See attachments for assignment.

1. Consider three education variables whose survey questions and response categories are listed below. What types/forms of variables are these? What do you see as some challenges to describing the distribution of education (in the target population) using each of these variables? Are there other ways you might gather education data that might be more informative?

"What is the highest degree you have completed?"
1=high school diploma
2=GED
3=associate's degree
4=bachelor's degree
5=master's degree
6=PhD
7=JD
8=MD
9=other (specify)

"How many years of school have you completed?"
0 to 25 years

"How far did you get in school?"
9=9th grade
10=10th grade
11=11th grade
12=12th grade
13=high school diploma or GED
14=some college
15=associate's degree
16=bachelor's degree
17=master's or professional degree
18=doctoral degree

Q1-A: What types of variables are these? 3 pts

Q1-B: What do you see as some challenges to describing the distribution of education (in the target population) using each of these variables? 4 pts

Q1-C: Are there other ways you might gather education data that might be more informative? 3 pts

2. What education variables are available in the South African CSG Impact Evaluation dataset?

Q2-A: Identify three of these variables and produce appropriate descriptive statistics for each one. 6 pts

Q2-B: What is limiting about these CSG education measures? 3 pts

3. Using the South African CSG Impact Evaluation dataset, produce a histogram, box plot and summary statistics for the variable (ad3q20a2) that shows the total amount of expenditures (in Rands) on school fees by households. Describe the patterns you see in these data. How many households are missing information on the amount of school fees for this variable (hint: use the codebook command and browse the data to assess the cases where no fee amount is recorded)? What do you learn? Does no response imply no school fees were paid?

Q3-A: Produce a histogram, box plot and summary statistics for the variable (ad3q20a2) that shows the total amount of expenditures (in Rands) on school fees by households. 3 pts

Q3-B: Describe the patterns you see in these data. 3 pts

Q3-C: How many households are missing information on the amount of school fees for this variable (hint: use the codebook command and browse the data to assess the cases where no fee amount is recorded)? 4 pts

Q3-D: What do you learn? Does no response imply no school fees were paid? 4 pts
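
For Q3-C and Q3-D, it can help to keep "no amount recorded" separate from "zero Rands recorded". The sketch below is in Python rather than Stata and uses made-up fee values (not CSG data); the list name `fees` and the use of `None` to mark a missing amount are assumptions for illustration only.

```python
import statistics

# Made-up fee amounts in Rands; None marks a household with no amount recorded.
# These values are illustrative only and do not come from the CSG dataset.
fees = [0, 120, None, 450, None, 80, 0, 2000, None, 300]

# Separate recorded amounts from missing ones.
recorded = [f for f in fees if f is not None]
n_missing = len(fees) - len(recorded)
n_zero = sum(1 for f in recorded if f == 0)

# Mean over recorded amounts only (missing values excluded).
mean_recorded = statistics.mean(recorded)

# Mean if every missing amount is assumed to be zero (a strong assumption).
mean_if_zero = statistics.mean([0 if f is None else f for f in fees])

print(f"missing: {n_missing}, recorded zeros: {n_zero}")
print(f"mean (recorded only): {mean_recorded:.2f}")
print(f"mean (missing treated as 0): {mean_if_zero:.2f}")
```

In this made-up example the two means differ, which is why missing amounts and recorded zeros are worth counting separately before deciding what a non-response means.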

4. Examine the information in Table 6 of the CSG Impact Evaluation Fieldwork Report.

Q4-A: Based on the information reported, where did the fieldwork team have the greatest success in securing completed surveys (in terms of sampling unit and geography)? Present statistics to explain your answer. 4 pts

Q4-B: Where did they have the lowest success rates? Present statistics to explain your answer. 3 pts

5. Look at Table 15 in the CSG Impact Evaluation Fieldwork Report (p. 80).

Q5-A: What is the total number of questionnaires generated in the fieldwork? 3 pts

Q5-B: What fraction of all questionnaires obtained in the fieldwork were invalid? 3 pts

Q5-C: What paypoint/province/household type combination generated the largest number of invalid questionnaires? 3 pts

Q5-D: For that paypoint/province/household type combination with the largest number of invalid questionnaires, what fraction of total attempted questionnaires did the invalid questionnaires represent? (Show your math work in your response.) 3 pts

6. Open the Los Angeles public school student data ("LA students") in Stata. Assume that you have the population of students eligible for free extra academic assistance (as mandated under No Child Left Behind) in the 2012-13 school year. Calculate basic descriptive statistics (including the mean and standard deviation) and produce a histogram for 2012-13 math test scores in this dataset. Characterize the distribution of test scores (in words) for the LA school district administrator, keeping in mind what an education leader might want to understand from this distribution.

Q6-A: Calculate basic descriptive statistics (including the mean and standard deviation) and produce a histogram for 2012-13 math test scores in this dataset. 4 pts

Q6-B: Characterize the distribution of test scores (in words) for the LA school district administrator, keeping in mind what an education leader might want to understand from this distribution. 3 pts

7. Draw a random sample of 5,000 students from the LA students dataset and compute and record the mean and standard deviation of 2012-13 math test scores in your sample. Then "clear" the data and repeat these steps (i.e., sampling 5,000 students and computing and recording the sample mean and standard deviation of their 2012-13 math test scores). Also be sure to track the number of observations for each set of statistics recorded. Continue to repeat this step until you have 10 sample means and standard deviations. 10 pts
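
The repeated-sampling procedure in #7 can be sketched as follows. This is a Python illustration of the logic, not the Stata commands the assignment asks for. The population is synthetic (60,000 normally distributed scores with an assumed mean of 300 and SD of 50, none of which comes from the LA students data), and the function name `draw_samples` is hypothetical.

```python
import random
import statistics

# Synthetic stand-in for the population of 2012-13 math scores.
# Size, mean, and SD are assumptions; the values are made up.
rng = random.Random(2013)
population = [rng.gauss(300, 50) for _ in range(60_000)]


def draw_samples(scores, sample_size, repetitions=10, seed=1):
    """Draw repeated simple random samples (without replacement) and record
    the number of observations, mean, and standard deviation of each one."""
    sampler = random.Random(seed)
    records = []
    for rep in range(1, repetitions + 1):
        # Each repetition starts again from the full population.
        sample = sampler.sample(scores, sample_size)
        records.append({
            "rep": rep,
            "n": len(sample),
            "mean": statistics.mean(sample),
            "sd": statistics.stdev(sample),  # sample SD (n - 1 denominator)
        })
    return records


records_5000 = draw_samples(population, 5000)
for row in records_5000:
    print(f"rep={row['rep']:2d}  n={row['n']}  "
          f"mean={row['mean']:.2f}  sd={row['sd']:.2f}")
```

Calling `draw_samples(population, 500)` or `draw_samples(population, 50)` would give the smaller-sample versions of the same table.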

8. Follow the same steps in #7, but instead draw 10 samples of 500 students each, computing the same statistics and tracking the number of observations. Repeat these steps once more to obtain 10 samples of 50 students each.

Q8-A: Follow the same steps in #7, but instead draw 10 samples of 500 students each, computing the same statistics and tracking the number of observations. 5 pts

Q8-B: Repeat these steps once more to obtain 10 samples of 50 students each. 5 pts

9. Now treat each set of sample statistics for a given sample size (5,000, 500 and 50) as a dataset (n=10) and compute the mean and standard deviation for each. How do these means compare with the mean you calculated in #6 for all students with 2012-13 math test scores in this dataset? How do the three sets of sample standard deviations compare to each other and to the standard deviation that you calculated in #6 for all students?

Q9-A: Now treat each set of sample statistics for a given sample size (5,000, 500 and 50) as a dataset (n=10) and compute the mean and standard deviation for each. 10 pts.

Q9-B: How do these means compare with the mean you calculated in #6 for all students with 2012-13 math test scores in this dataset? 3 pts.

Q9-C: How do the three sets of sample standard deviations compare to each other and to the standard deviation that you calculated in #6 for all students? 3 pts.
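
As an illustration of what #9 asks for, the sketch below treats the 10 recorded means and the 10 recorded standard deviations at each sample size as small datasets of their own and summarizes them. It is written in Python, not Stata, and runs on a synthetic population (assumed mean 300, SD 50, 60,000 students), so the printed numbers say nothing about the actual LA data. The dictionary name `summary` and its keys are hypothetical.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of scores (made-up values, not the LA data).
population = [rng.gauss(300, 50) for _ in range(60_000)]
pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)  # population SD (N denominator)

summary = {}
for size in (5000, 500, 50):
    means, sds = [], []
    for _ in range(10):
        sample = rng.sample(population, size)  # without replacement
        means.append(statistics.mean(sample))
        sds.append(statistics.stdev(sample))
    # Each list of 10 recorded statistics is treated as its own dataset (n=10).
    summary[size] = {
        "mean_of_means": statistics.mean(means),
        "sd_of_means": statistics.stdev(means),
        "mean_of_sds": statistics.mean(sds),
        "sd_of_sds": statistics.stdev(sds),
    }

print(f"population: mean={pop_mean:.2f} sd={pop_sd:.2f}")
for size, stats in summary.items():
    print(size, {name: round(value, 2) for name, value in stats.items()})
```

Comparing `sd_of_means` across the three sample sizes, and `mean_of_means` against `pop_mean`, mirrors the comparisons requested in Q9-B and Q9-C.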

10. If you had been asked to draw samples of 50,000 students each (10 different times), how would you expect the mean and standard deviation of those resulting statistics to compare to those you already calculated above? Explain your answer. 5 pts.
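
One standard result that may help in reasoning about #10 is the standard error of a sample mean under simple random sampling without replacement. The symbols are assumptions introduced here, not notation from the assignment: \sigma is the population standard deviation, n the sample size, and N the population size.

```latex
\operatorname{SE}(\bar{x}) \;=\; \frac{\sigma}{\sqrt{n}} \, \sqrt{\frac{N - n}{N - 1}},
\qquad
\operatorname{E}(\bar{x}) \;=\; \mu
```

The second factor is the finite population correction. It is close to 1 when n is small relative to N and shrinks toward 0 as n approaches N, which matters when a sample of 50,000 is a large share of the population.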