ECONOMETRICS COURSEWORK ASSIGNMENT 2
For this assignment, use the dataset eaef_as2.dta, which has 1200 observations and is
downloadable from SyD. The data contain information on the wages and characteristics of workers in
the United States in 2002. The dataset includes a variable ‘catgov’, which indicates whether the
person works for the government.
TASK:
a) Build a model to estimate how much more or less workers earn on average when they work
for the government as opposed to the private sector, holding the other determinants of wages
constant. Interpret the findings. [80%]
b) Assess whether government is a more ‘meritocratic’ employer. Do this by expanding the
model in a) to test whether ability (asvabc) and years of schooling (s) have a larger effect on
earnings in government than in the private sector. Interpret the findings. [20%]
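One possible way to set up the test in part b) is sketched below. It is an illustration under assumptions, not the required specification: it assumes catgov is coded 0/1, the names gov_s and gov_asvabc are hypothetical, and the control variables are only placeholders for whatever your preferred model from part a) contains.

```stata
* Sketch for part b). Assumes catgov is a 0/1 dummy.
* gov_s and gov_asvabc are hypothetical names for the constructed interactions.
use eaef_as2.dta, clear
generate gov_s      = catgov * s
generate gov_asvabc = catgov * asvabc

* The interaction coefficients measure how much the return to schooling and
* ability differs in government relative to the private sector.
regress earnings catgov s asvabc gov_s gov_asvabc tenure female

* Joint test that neither return differs between the two sectors.
test gov_s gov_asvabc
```

A positive and significant coefficient on an interaction term would be consistent with a larger effect in government; the joint test alone does not tell you the direction of the difference.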
Practical notes:
It may be sensible to use a do-file to avoid retyping the regression commands multiple times. If you
prefer to work from the command line, note that pressing the ‘page up/down’ buttons recalls
previous commands to the command line. This may be the quickest way to modify
your estimation.
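A minimal do-file skeleton might look like the following. The file names as2.do and as2_log.txt are hypothetical, and the sketch assumes eaef_as2.dta is in Stata's current working directory; the regression shown is simply the one from the sample answer, not a recommended model.

```stata
* as2.do (hypothetical file name). Run with: do as2.do
clear all
capture log close
log using as2_log.txt, text replace   // hypothetical log file name

* Assumes the dataset is in the current working directory.
use eaef_as2.dta, clear
describe
summarize

* Constructed variable used in the sample answer.
generate tenure2 = tenure^2

* Edit this line and rerun the do-file instead of retyping the command.
regress earnings catgov s tenure tenure2 female

log close
```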
Points to keep in mind:
-Are you controlling for all relevant variables?
-Could you improve the model by transforming variables or allowing for non-linearities?
-Could interactions help you improve the model?
-Are there outliers that distort your model?
-Can you rule out endogeneity of your explanatory variables?
-To cross-tabulate two variables, use tabulate. Example: tabulate female catgov
-To look up how a command works, use help, such as help regress.
-Are the key assumptions of OLS holding (note that some of them can’t be directly tested)?
-Study the dataset and variables and think about what you can and can’t do with them.
-Do your results make sense to you? (No need for literature review or outside references!)
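Some of the checks above could be started along these lines. This is only a sketch: lnearn is a hypothetical variable name, the log transformation is one candidate among several, and the regressors are borrowed from the sample answer rather than recommended.

```stata
use eaef_as2.dta, clear

* How many men and women work in each sector?
tabulate female catgov

* Percentiles and extreme values of earnings, to look for outliers and skewness.
summarize earnings, detail

* One possible transformation: log earnings (missing where earnings <= 0).
generate lnearn = ln(earnings)
histogram lnearn

* With a log dependent variable, the catgov coefficient is approximately the
* proportional earnings difference between government and private sector.
generate tenure2 = tenure^2
regress lnearn catgov s tenure tenure2 female
```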
How to submit an answer?
The submitted answer should consist of a maximum of 3 printed pages, using font size 10 or 12.
To align Stata output nicely, use the Courier 10 Pitch or Courier New font at size 10 in
Word. For the estimations in parts a) and b), include only the 4 sections shown on the next page. Nothing
more, please. Longer answers will be penalised by at least 5 points.
The next page is a simplified sample answer for part a). The answer to part b) should follow a
similar format.
Grading is based on the overall sensibility of the preferred models and on their correct interpretation
and testing. There is no one right answer for this assignment. Finally, while this is an open-ended
project, the returns to further effort diminish quickly after a certain point.
SAMPLE ANSWER for part a) (Note: This answer would not receive a high grade)
Candidate number 12345
a)
1. PREFERRED FINAL MODEL
      Source |       SS       df       MS              Number of obs =    1200
-------------+------------------------------           F(  5,  1194) =   71.77
       Model |  64141.7848     5   12828.357           Prob > F      =  0.0000
    Residual |  213421.268  1194  178.744781           R-squared     =  0.2311
-------------+------------------------------           Adj R-squared =  0.2279
       Total |  277563.053  1199  231.495457           Root MSE      =   13.37
------------------------------------------------------------------------------
    earnings |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      catgov |  -3.358658   .9409986    -3.57   0.000    -5.204853   -1.512464
           s |   2.479537   .1581709    15.68   0.000     2.169213    2.789861
      tenure |   .7054635   .2014462     3.50   0.000     .3102357    1.100691
     tenure2 |   -.015971   .0093648    -1.71   0.088    -.0343443    .0024024
      female |  -5.913968   .7880701    -7.50   0.000    -7.460125   -4.367812
       _cons |  -14.35778   2.225892    -6.45   0.000    -18.72488   -9.990688
------------------------------------------------------------------------------
2. EXPLANATION OF CONSTRUCTED VARIABLES:
tenure2 = tenure^2
3. INTERPRETATION (Key findings, and rationale for the choice of the final
model)
Working for government reduces earnings by 3 dollars per hour, given the control
variables above. I have left out variables that weren’t significant or
interesting for wage determination.
4. DIAGNOSTIC TESTS FOR NORMALITY AND HETEROSCEDASTICITY
Command ‘estat hettest’ suggests I have heteroscedasticity, the null is
rejected.
‘sktest’ on the residuals suggests that they are not normally distributed.
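For reference, the diagnostics named in section 4 of the sample can be produced roughly as follows. This is a sketch that re-estimates the sample model; uhat is a hypothetical name for the residual variable, and the final robust regression is one common response to heteroscedasticity, not something the sample answer itself reports.

```stata
use eaef_as2.dta, clear
generate tenure2 = tenure^2
regress earnings catgov s tenure tenure2 female

* Breusch-Pagan / Cook-Weisberg test; the null hypothesis is constant variance.
estat hettest

* Save the residuals (uhat is a hypothetical name) and test them for normality.
predict uhat, residuals
sktest uhat

* If heteroscedasticity is indicated, robust standard errors are one option.
regress earnings catgov s tenure tenure2 female, vce(robust)
```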
