First, save this spreadsheet to your computer files, adding your name to the filename (e.g. SmithM2Assign.xls).
Save in .xls format.
Open the file on your computer, work on the problems, re-saving your file in .xls format always
You may copy the data and/or problems to additional sheets in this workbook or work on this page.
To add a sheet, click Shift+F11. Or right-click on the sheet tab, which also allows you to rename the tab.
Attach the file in the assignment drop box for Module 2.
Be sure to show your calculations and/or paste in results from StatCrunch, DDXL, etc.
Remember to use 4-Step problem solving process: Declare, Strategize, Execute, Deduce.
Problem 1 (30 points)
This problem is about examining data for validity. You need to examine a set of data and determine whether there are
"nonsense" entries (or errors). For example, we have data sent in by previous students on Height, Weight, Age, Sex (HWAS).
Some data was entered incorrectly. Students were asked to provide height in inches, but there were entries of 5 (perhaps feet?).
Using data that is obviously incorrect can cause errors in the data analysis. Read the Instructor Commentary for more on the topic.
Click on the Sheet tab below named Bad.Data and complete problem 1 on that sheet.
For Problems 2 and 3, download the HWAS data set provided in the HWAS folder, if you have not already done so. This is a set
of "clean" data that you will use for problems 2 and 3, as well as other assignments. Save it to your files as CleanData.xls.
You may copy the Clean data set into this spreadsheet on a separate sheet, or you may prefer to copy just the column you need.
Problem 2 (35 points)
a. Show the distribution of Age visually, using 2 approaches, one of which is a histogram.
b. Compare/Discuss the 2 methods.
c. Describe the age distribution and make conclusions.
Problem 3 (35 points)
a. Show the distribution of Height, using a histogram and a boxplot.
b. Provide a Five-Number Summary, the Mean and Standard Deviation.
c. Discuss the type of information that you can glean from the graphs and the numbers.
Process for Cleaning Bad Data from data set
1. To locate bad data items, you may sort the data (be sure to select all the data columns with headers — each row is a person).
Numbers that are at the extremes of the range will show up as the min or the max. Bad data is nonsense data, such as 6 inches tall.
While an adult with a height of 48 inches is unusal, it is possible. A graph will generally make it easy to spot extreme outliers.
If you remove or modify data, that information should be noted below the data set along with reason.
2. If you need to remove or modify data, that information should be noted below the data set along with reason.
Once you locate the bad data record, put an x in the extra column if eliminating the bad data record
Or, put an m in the extra column if modifying a value , and explain the modification. Make correction to the data item.
3. Do this for each column to find the bad data. There may be more than one bad data item in a column.
4. Finally, sort the data set by the Corrections column values. Copy any rows with a x to area below the list, then delete the original row.
A summary list of all changes and deletions should be provided.
Data set containing Bad Data…