ECONOMETRICS COURSEWORK ASSIGNMENT 2

For this assignment, use the dataset eaef_as2.dta, which has 1200 observations and is downloadable from SyD. The data contain information on the wages and characteristics of workers in the United States in 2002. The dataset includes a variable ‘catgov’, which indicates whether the person works for government.

TASK:

a) Build a model to estimate how much more or less workers earn on average when they work for government as opposed to the private sector, holding the other determinants of wages constant. Interpret the findings. [80%]

b) Assess whether government is a more ‘meritocratic’ employer. Do this by expanding the model in a) to test whether ability (asvabc) and years of schooling (s) have a larger effect on earnings in government than in the private sector. Interpret the findings. [20%]
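
One way to set up part b), offered as a minimal sketch rather than a model answer: interact catgov with s and asvabc, so that the coefficients on the interaction terms measure how much the returns to schooling and ability differ in government relative to the private sector. The sketch assumes catgov is coded 0/1 (1 = government). The interaction names gov_s and gov_asvabc are made up for illustration, and the controls (tenure, female) are simply borrowed from the sample answer, not recommended.

```stata
* Sketch only: assumes catgov is a 0/1 dummy (1 = government)
use eaef_as2.dta, clear

* Hypothetical names for the interaction terms
generate gov_s      = catgov * s
generate gov_asvabc = catgov * asvabc

* Controls borrowed from the sample answer; replace with your own preferred set
regress earnings catgov s asvabc gov_s gov_asvabc tenure female

* Joint test that neither slope differs between the two sectors
test gov_s gov_asvabc
```

A positive and significant coefficient on gov_s (or gov_asvabc) would point to a larger return to schooling (or ability) in government. Note that once the interactions are included, the coefficient on catgov measures the government pay gap for a worker with s = 0 and asvabc = 0, so it is no longer directly comparable with the estimate from part a).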

Practical notes:

It may be sensible to use a do-file to avoid retyping the regression commands multiple times. If, however, you prefer to work from the command line, note that pressing the Page Up/Page Down keys recalls previous commands to the command line; this may be the quickest way to modify your estimation.
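
As an illustration of the do-file approach, a minimal skeleton might look like the following. The file names assignment2.do and assignment2.log are hypothetical, the dataset is assumed to sit in the current working directory, and the regression line simply repeats the specification from the sample answer.

```stata
* assignment2.do (hypothetical file name)
capture log close                        // close any log left open by an earlier run
log using assignment2.log, replace text  // keep a plain-text record of the output

use eaef_as2.dta, clear   // assumes the dataset is in the current working directory
describe
summarize

* Constructed variable used in the sample answer
generate tenure2 = tenure^2

* Edit this line and re-run the file instead of retyping the command
regress earnings catgov s tenure tenure2 female

log close
```

Running `do assignment2` from the command line executes the whole file, so changing the model means editing one line and re-running.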

Points to keep in mind:

-Are you controlling for all relevant variables?

-Could you improve the model by transforming variables or allowing for non-linearities?

-Could interactions help you improve the model?

-Are there outliers that distort your model?

-Can you rule out endogeneity of your explanatory variables?

-To tabulate 2 variables use tabulate. Example: tabulate female catgov

-To look up how a command works, use help, such as help regress.

-Are the key assumptions of OLS holding (note that some of them can’t be directly tested)?

-Study the dataset and variables and think what you can and can’t do with it.

-Do your results make sense to you? (No need for literature review or outside references!)
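
Several of the points above can be checked with a few standard commands. The following is a sketch under stated assumptions, not a prescribed procedure: the regressors are placeholders, resid and lnearn are made-up variable names, and the log specification is only one possible transformation.

```stata
* Sketch only: run after loading eaef_as2.dta; the regressors are placeholders
regress earnings catgov s tenure female

estat hettest              // Breusch-Pagan / Cook-Weisberg test; H0: constant variance
rvfplot, yline(0)          // residuals against fitted values, to spot outliers and patterns

predict resid, residuals   // 'resid' is a made-up variable name
sktest resid               // skewness/kurtosis test; H0: residuals are normal

* If heteroscedasticity is indicated, robust standard errors are one common response
regress earnings catgov s tenure female, vce(robust)

* A log specification is one transformation to consider (requires earnings > 0)
generate lnearn = ln(earnings)
regress lnearn catgov s tenure female
```

With lnearn as the dependent variable, the coefficient on catgov is read as an approximate proportional difference in earnings rather than a difference in dollars.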

How to submit an answer?

The submitted answer should consist of a maximum of 3 printed pages, using font size 10 or 12. To align Stata output nicely, use Courier 10 Pitch or Courier New at font size 10 in Word. For the estimations in parts a) and b), include only the 4 sections shown on the next page. Nothing more, please. Longer answers will be penalised by at least 5 points.

The next page is a simplified sample answer for part a). The answer to part b) should follow a similar format.

Grading is based on how sensible the preferred models are and on their correct interpretation and testing. There is no one right answer for this assignment. Finally, while this is an open-ended project, the returns to further effort diminish quickly after a certain point.

SAMPLE ANSWER for part a) (Note: this answer would not receive a high grade)

Candidate number 12345

a)

1. PREFERRED FINAL MODEL

      Source |       SS       df       MS              Number of obs =    1200
-------------+------------------------------           F(  5,  1194) =   71.77
       Model |  64141.7848     5   12828.357           Prob > F      =  0.0000
    Residual |  213421.268  1194  178.744781           R-squared     =  0.2311
-------------+------------------------------           Adj R-squared =  0.2279
       Total |  277563.053  1199  231.495457           Root MSE      =   13.37

------------------------------------------------------------------------------
    earnings |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      catgov |  -3.358658   .9409986    -3.57   0.000    -5.204853   -1.512464
           s |   2.479537   .1581709    15.68   0.000     2.169213    2.789861
      tenure |   .7054635   .2014462     3.50   0.000     .3102357    1.100691
     tenure2 |   -.015971   .0093648    -1.71   0.088    -.0343443    .0024024
      female |  -5.913968   .7880701    -7.50   0.000    -7.460125   -4.367812
       _cons |  -14.35778   2.225892    -6.45   0.000    -18.72488   -9.990688
------------------------------------------------------------------------------

2. EXPLANATION OF CONSTRUCTED VARIABLES:

tenure2 = tenure^2

3. INTERPRETATION (Key findings, and rationale for the choice of the final model)

Working for government reduces earnings by 3 dollars per hour, given the control

variables above. I have left out variables that weren’t significant or

interesting for wage determination.

4. DIAGNOSTIC TESTS FOR NORMALITY AND HETEROSCEDASTICITY

Command ‘estat hettest’ suggests I have heteroscedasticity, the null is

rejected.

‘sktest’ on the residuals suggests that they are not normally distributed.
