The “Region” predictor
Reconsider the wine quality data in Table E3.4. The “Region” predictor refers to three distinct geographical regions where the wine was produced. Note that this is a categorical variable.
a. Fit the model using the “Region” variable as it is given in Table E3.4. What potential difficulties could be introduced by including this variable in the regression model using the three levels shown in Table E3.4?
b. An alternative way to include the categorical variable “Region” would be to introduce two indicator variables x1 and x2 as follows:
Region  x1  x2
1       0   0
2       1   0
3       0   1
Why is this approach better than just using the codes 1, 2, and 3?
c. Rework Exercise 3.7 using the indicator variables defined in part b for “Region.”
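The indicator coding in part b can be sketched in a few lines of Python. This is only an illustration of the coding scheme, not a solution to the exercise: the names `INDICATOR_CODES`, `region_indicators`, `design_row`, `d1`, and `d2` are hypothetical, and the indicators are called `d1` and `d2` here (rather than x1 and x2) only to avoid a clash with the Clarity and Aroma columns of Table E3.4. The three sample rows are copied from the table, one per region.

```python
# Indicator (dummy) coding for the three-level "Region" factor, as in part b.
# Region 1 is the baseline level: both indicators are 0.
INDICATOR_CODES = {
    1: (0, 0),
    2: (1, 0),
    3: (0, 1),
}


def region_indicators(region):
    """Return the pair of indicators (d1, d2) for a region code of 1, 2, or 3."""
    if region not in INDICATOR_CODES:
        raise ValueError(f"unknown region code: {region!r}")
    return INDICATOR_CODES[region]


def design_row(predictors, region):
    """Build one design-matrix row: intercept, quantitative predictors, d1, d2."""
    d1, d2 = region_indicators(region)
    return [1.0, *predictors, float(d1), float(d2)]


# Rows 1, 12, and 9 of Table E3.4, one from each region:
# ((clarity, aroma, body, flavor, oakiness), region)
sample = [
    ((1.0, 3.3, 2.8, 3.1, 4.1), 1),
    ((0.5, 3.3, 5.4, 4.3, 3.6), 2),
    ((1.0, 4.3, 4.3, 3.9, 2.9), 3),
]

rows = [design_row(x, r) for x, r in sample]
for row in rows:
    print(row)
```

With this coding, the coefficient on each indicator measures the shift in mean quality for Region 2 or Region 3 relative to Region 1, so the two shifts are estimated separately. Entering the codes 1, 2, and 3 as a single numeric predictor would instead force the Region 1 to Region 2 shift to equal the Region 2 to Region 3 shift.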
TABLE E3.4 Wine Quality Data (Found in Minitab)

Clarity, x1  Aroma, x2  Body, x3  Flavor, x4  Oakiness, x5  Quality, y  Region
1            3.3        2.8       3.1         4.1           9.8         1
1            4.4        4.9       3.5         3.9           12.6        1
1            3.9        5.3       4.8         4.7           11.9        1
1            3.9        2.6       3.1         3.6           11.1        1
1            5.6        5.1       5.5         5.1           13.3        1
1            4.6        4.7       5           4.1           12.8        1
1            4.8        4.8       4.8         3.3           12.8        1
1            5.3        4.5       4.3         5.2           12          1
1            4.3        4.3       3.9         2.9           13.6        3
1            4.3        3.9       4.7         3.9           13.9        1
1            5.1        4.3       4.5         3.6           14.4        3
0.5          3.3        5.4       4.3         3.6           12.3        2
0.8          5.9        5.7       7           4.1           16.1        3
0.7          7.7        6.6       6.7         3.7           16.1        3
1            7.1        4.4       5.8         4.1           15.5        3
0.9          5.5        5.6       5.6         4.4           15.5        3
1            6.3        5.4       4.8         4.6           13.8        3
1            5          5.5       5.5         4.1           13.8        3
1            4.6        4.1       4.3         3.1           11.3        1
0.9          3.4        5         3.4         3.4           7.9         2
0.9          6.4        5.4       6.6         4.8           15.1        3
1            5.5        5.3       5.3         3.8           13.5        3
0.7          4.7        4.1       5           3.7           10.8        2
0.7          4.1        4         4.1         4             9.5         2
1            6          5.4       5.7         4.7           12.7        3
1            4.3        4.6       4.7         4.9           11.6        2
1            3.9        4         5.1         5.1           11.7        1
1            5.1        4.9       5           5.1           11.9        2
1            3.9        4.4       5           4.4           10.8        2
1            4.5        3.7       2.9         3.9           8.5         2
1            5.2        4.3       5           6             10.7        2
0.8          4.2        3.8       3           4.7           9.1         1
1            3.3        3.5       4.3         4.5           12.1        1
1            6.8        5         6           5.2           14.9        3
0.8          5          5.7       5.5         4.8           13.5        1
0.8          3.5        4.7       4.2         3.3           12.2        1
0.8          4.3        5.5       3.5         5.8           10.3        1
0.8          5.2        4.8       5.7         3.5           13.2        1
