
# INTRODUCTION TO ECONOMETRICS

L1090


THE UNIVERSITY OF SUSSEX

## INTRODUCTION TO ECONOMETRICS

Assessment Period: January 2018 (A1)

DO NOT TURN OVER UNTIL INSTRUCTED TO BY THE LEAD INVIGILATOR

Candidates must attempt BOTH questions in SECTION 1 and ONE question from SECTION 2. The use of approved calculators is permitted.

Duration: 2 hours

Examination handout:  Statistical Tables and Formulae sheet

SECTION 1 – Answer both questions

1. A researcher is studying the ceteris paribus effect of reading scores on math scores. Using data on math scores, reading scores, family income and parental education, he estimates a number of models. The table below shows the OLS estimates, with standard errors in brackets.

| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| Dependent variable | Math score (0-100 scale) | Math score (0-100 scale) | Math score (0-100 scale) |
| **Independent variables** | | | |
| Reading standardized score (0-100 scale) | 0.714 (0.008) | 0.677 (0.008) | 0.602 (0.008) |
| Natural logarithm of annual family income | | 1.691 (0.098) | 0.941 (0.107) |
| Mother’s education (completed years) | | | 0.344 (0.046) |
| Father’s education (completed years) | | | 0.395 (0.042) |
| Intercept | 15.153 (0.432) | -0.445 (1.004) | -0.894 (0.992) |
| N | 7430 | 7430 | 7430 |
| R2 | 0.504 | 0.523 | 0.539 |

1. a) [6 marks] Interpret carefully the coefficients in model 1.

Reading score: on average and ceteris paribus [1 point], a 1-point increase in the reading score (0-100 scale) raises the predicted math score by 0.714 points, i.e. about 0.7 points. [1 point for the units, 1 point for the effect]

Intercept: for an individual with a reading score of zero [1 point], the predicted math score is 15.15. [2 points]

Students lost marks for: not specifying units

Some students forgot to interpret the intercept

1. b) [5 marks] Test at the 5% level whether reading standardised score has a statistically significant effect on math standardised score, using the estimate in model 1.

H0: b(reading) = 0 vs Ha: b(reading) ≠ 0 [2 points]. t-stat = 0.714/0.008 = 89.25 [1 point]

Critical value for the two-tailed test at 5% with 7428 degrees of freedom: 1.96. [1 point]

Hence we reject the null and conclude that reading score has a statistically significant effect on math score [1 point].

Most students answered this part correctly.

Some students set this up as a one tailed test: we awarded marks if they provided a rationale for why.
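The t-test above can be checked with a few lines of Python. This is only a sketch: the coefficient and standard error are taken from the Model 1 column of the table, the critical value is the large-sample 5% two-tailed value quoted in the answer, and the variable names are illustrative.

```python
# Two-sided t-test for the reading coefficient in Model 1.
coef = 0.714            # estimated coefficient on reading score (from the table)
se = 0.008              # its standard error (from the table)
critical_value = 1.96   # 5% two-tailed critical value quoted in the answer

# Test statistic under H0: b(reading) = 0
t_stat = coef / se

# Reject H0 when the statistic exceeds the critical value in absolute terms
reject_null = abs(t_stat) > critical_value

print(f"t = {t_stat:.2f}, reject H0: {reject_null}")
```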

1. c) [2 marks] Interpret carefully the coefficient of the natural logarithm of annual family income in model 2.

The dependent variable is in points and income is in logs (a level-log model), hence we divide the ln(income) coefficient by 100. On average and ceteris paribus, a 1% increase in family income raises the math score by 1.691/100 ≈ 0.0169 points.

Students lost marks for: not interpreting in terms of an increase in income
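As a rough check on the divide-by-100 rule, the sketch below compares the approximation with the exact change implied by a 1% rise in income. The coefficient comes from the Model 2 column of the table; the variable names are illustrative.

```python
import math

beta_log_income = 1.691  # coefficient on ln(income) in Model 2 (from the table)

# Level-log approximation: a 1% rise in income changes y by about beta/100
approx_effect = beta_log_income / 100

# Exact change: beta * [ln(1.01 * income) - ln(income)] = beta * ln(1.01)
exact_effect = beta_log_income * math.log(1.01)

print(f"approximate effect: {approx_effect:.4f} points")
print(f"exact effect:       {exact_effect:.4f} points")
```

The two figures agree to about three decimal places, which is why the approximation is acceptable for small percentage changes.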

1. d) [3 marks] Using model 3, predict the math score of a student with a reading score of 50, whose mother and father each finished 14 years of schooling, with natural logarithm of annual family income of 10. [2 points for correct substitution, 1 point for correct predicted value]

Predicted math score = -0.894 + 0.602*50 + 0.941*10 + 0.344*14 + 0.395*14 = 48.96

Some students forgot to include the intercept here.
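The substitution can be sketched in Python as follows. The coefficients are copied from the Model 3 column of the table (note that the ln(income) coefficient there is 0.941); the dictionary keys are illustrative names, not variable names from the original data set.

```python
# Model 3 coefficients, taken from the table above
coefficients = {
    "intercept": -0.894,
    "reading": 0.602,
    "log_income": 0.941,
    "mother_educ": 0.344,
    "father_educ": 0.395,
}

# Characteristics of the student described in the question
student = {
    "reading": 50,
    "log_income": 10,
    "mother_educ": 14,
    "father_educ": 14,
}

# Prediction = intercept + sum of coefficient * value over the regressors
predicted_math = coefficients["intercept"] + sum(
    coefficients[name] * value for name, value in student.items()
)

print(f"Predicted math score: {predicted_math:.2f}")
```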

1. e) [2 marks] Is the intercept in model 3 meaningful? Explain your answer.

Based on model 3, the intercept is the predicted math score, -0.894, of an individual with a reading score of zero, a log family income of zero (that is, an annual family income of 1) and parents with zero years of schooling. This is not meaningful. Since it is rare to observe an individual with such income, reading score and parental education, the model does poorly in predicting the math score of such an individual.

[1 point to explain what the intercept captures, 1 point to discuss that such an observation is rare]

Only a few students answered this part correctly. Some students explained that a negative math score is not reasonable, and they obtained a point for it.

1. f) [4 marks] Why did the coefficient and standard error of the natural logarithm of annual family income change after introducing mother’s and father’s education to the model?

The decrease in the coefficient suggests that parents’ education and family income are correlated. Therefore model 2 overestimated the effect of family income, i.e. it has an upward bias [2 points].

The increase in the standard error suggests that there is multicollinearity in the model, meaning part of the variation in family income is explained by father’s and mother’s education [2 points].

1. g) [5 marks] Test the overall significance of model 3 at the 1% level.

H0: R2 = 0 vs Ha: R2 > 0 (nb must be set up as 1-tailed) [2 points]

F test statistic [1 point]

Critical value at the 1% level with 4 and 7425 degrees of freedom is 3.3192. [1 point]

Hence we reject the null hypothesis and conclude that the model does explain some of the variation in math score. [1 point]

Most students did this part correctly.

Students lost marks for not correctly specifying the null and alternative.
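The marking guide does not print the value of the F statistic. One way to obtain it, assuming the standard R-squared form of the overall significance test, F = (R2/k) / ((1 - R2)/(n - k - 1)), is sketched below with the Model 3 figures from the table. The critical value is the one quoted above.

```python
# Overall significance of Model 3 using the R-squared form of the F test.
r_squared = 0.539       # R2 of Model 3 (from the table)
n_obs = 7430            # sample size (from the table)
k_regressors = 4        # reading, ln(income), mother's and father's education
critical_value = 3.3192 # 1% critical value quoted in the answer

df_denominator = n_obs - k_regressors - 1  # 7425

f_stat = (r_squared / k_regressors) / ((1 - r_squared) / df_denominator)
reject_null = f_stat > critical_value

print(f"F({k_regressors}, {df_denominator}) = {f_stat:.1f}, reject H0: {reject_null}")
```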

1. h) [3 marks] Suppose the researcher presents these results to a school manager and the manager says “The findings show that to improve math scores we just need to improve reading scores, so we should hire more reading tutors.” How would you respond to this comment? As a hint, if you instead regress read12 on math12, what would you expect to find?

If we regress read12 on math12 we would expect to see similar results [1 point]. This result shows that holding everything else constant, students with higher reading score are predicted to have higher math score as well [1 point]. However, this does not mean higher reading score causes higher math score. This model does not show us the direction of the causality [1 point]

Students who answered this part mostly explained that correlation does not mean causation, but did not manage to explain that we would expect similar results from such a regression.

1. Briefly explain each of the following, clearly describing the relevant formulas where applicable:

1. a) [6 marks] OLS Residual sum of squares

The residual for each observation in the OLS regression is defined as the difference between the actual observation and its predicted value from the estimated model: uhat_i = y_i - yhat_i. [4 points] Squaring the residual for each observation [1 point] and summing these values over the sample gives the residual sum of squares [1 point]

[Or 4 points to define the residual and 2 points to explain the sum of squared residuals]

[6 points for stating that it is the variation in the dependent variable that is not explained by the OLS model]
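The definition can be illustrated with a small function. The data below are made up purely for illustration and do not come from the exam; the function name is likewise hypothetical.

```python
def residual_sum_of_squares(actual, predicted):
    """Sum of squared differences between observed and fitted values."""
    residuals = [y - y_hat for y, y_hat in zip(actual, predicted)]
    return sum(e ** 2 for e in residuals)


# Hypothetical observed values and fitted values from some estimated model
y = [3.0, 5.0, 7.0]
y_hat = [2.5, 5.5, 7.0]

rss = residual_sum_of_squares(y, y_hat)
print(f"RSS = {rss}")
```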

1. b) [6 marks] Unbiased estimator

Unbiasedness is a feature of the sampling distribution of betahat. Across all possible samples, betahat is considered a random variable [2 points]. If the expected value of betahat is equal to its population value, then we say betahat is an unbiased estimator of the true population value [4 points]. If students say it is an estimator of a parameter that satisfies the GM assumptions, they get 4 points.

1. c) [6 marks] Multicollinearity and a way to measure it

A problem that arises when collinearity between two or more independent variables leads to a lack of statistically significant coefficients even when a satisfactory overall explanatory power of the model is obtained [3 points] (if students state in their own words that multicollinearity is correlation between two or more independent variables, they get 3 points). We can measure it with the variance inflation factor [1 point]: VIF_j = 1/(1 - R2_j), where R2_j is the R2 from regressing variable j on all other explanatory variables. [2 points: 1 point for the formula, 1 point for explaining what R2_j is]
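A minimal sketch of the variance inflation factor formula given above. The auxiliary R-squared values are hypothetical and chosen only to show how the VIF grows as a regressor becomes better explained by the others.

```python
def variance_inflation_factor(r_squared_j):
    """VIF_j = 1 / (1 - R2_j), where R2_j comes from regressing x_j on the other regressors."""
    if not 0 <= r_squared_j < 1:
        raise ValueError("R2_j must lie in [0, 1)")
    return 1 / (1 - r_squared_j)


# Hypothetical auxiliary R-squared values
for r2_j in (0.0, 0.5, 0.8, 0.95):
    print(f"R2_j = {r2_j:.2f} -> VIF = {variance_inflation_factor(r2_j):.1f}")
```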

1. d) [6 marks] AR(1) process

A time-series process in which the series is produced by its once lagged value plus an independent error term: Yt = a + bYt-1 + ut
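The recursion Yt = a + bYt-1 + ut can be sketched as below. The parameter values, starting value and shocks are hypothetical; the shocks are passed in explicitly so the example is reproducible.

```python
def simulate_ar1(a, b, shocks, y0=0.0):
    """Generate Y_t = a + b * Y_{t-1} + u_t for the given sequence of shocks u_t."""
    series = []
    previous = y0
    for u in shocks:
        current = a + b * previous + u
        series.append(current)
        previous = current
    return series


# Hypothetical parameters with all shocks set to zero: the series moves
# towards its long-run mean a / (1 - b) = 2 when |b| < 1.
path = simulate_ar1(a=1.0, b=0.5, shocks=[0.0, 0.0, 0.0], y0=0.0)
print(path)
```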

1. e) [6 marks] Homoskedasticity

Constant variance of error terms conditional on values of x.

SECTION 2 – Answer ONE question

1. A researcher estimates a model of the following form

Yt = a + b1X1t + b2X2t + ut            t=1 to 15

1. a) [8 marks] Explain what autocorrelation is, how it might arise and discuss the consequences for the OLS estimates.

Autocorrelation is a problem affecting the errors: they are no longer iid but are serially correlated with each other. A simple case is where ut and ut-1 are correlated. [4 points]

It can occur for various reasons: inertia in time series data, lags in responses to events, adjustment, etc. Meaningful examples provided by students are acceptable for this part. [1 point]

OLS estimates are no longer BLUE and tests based on t and F are no longer valid. [3 points]

1. b) [10 marks] The correlation coefficient between the residuals and the lagged residuals from the model is calculated to be -0.574. Use this to implement a test for autocorrelation, specifying clearly the null and alternative hypotheses. Interpret your results.

This provides the basis for the DW test [1 point for correct recognition]: DW = 2(1 - p), where p is the correlation between the residuals and the lagged residuals. So DW = 2(1 + 0.574) = 3.148. [1 point]

The hypotheses being tested are H0: p = 0 and Ha: p < 0 [2 points]. DW is bounded by 0 and 4.

The critical values, with 2 independent variables and n = 15 in our original model, are dL = 0.946 and dU = 1.543, and we use these to mark the bounds of the inconclusive region. [2 points]

Decision rule [3 points]

Either the graph or the decision rule below.

Students don’t have to draw the graph exactly, but they must show the decision zones on a horizontal axis; alternatively they can clearly write the decision rule:

• If DW > 4 − dL: reject H0
• If DW < 4 − dU: do not reject H0
• If 4 − dU < DW < 4 − dL: the test is inconclusive

Based on these values, DW = 3.148 exceeds 4 − dL = 3.054, so the test rejects the null in favour of negative autocorrelation [1 point]
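The test for negative autocorrelation can be sketched in Python as follows. The correlation and the Durbin-Watson bounds are the values quoted in the answer; the function name is illustrative.

```python
def dw_negative_autocorrelation_decision(dw, d_lower, d_upper):
    """Apply the Durbin-Watson decision rule against negative autocorrelation."""
    if dw > 4 - d_lower:
        return "reject H0"
    if dw < 4 - d_upper:
        return "do not reject H0"
    return "inconclusive"


rho_hat = -0.574          # correlation between residuals and lagged residuals
dw = 2 * (1 - rho_hat)    # approximate DW statistic

decision = dw_negative_autocorrelation_decision(dw, d_lower=0.946, d_upper=1.543)
print(f"DW = {dw:.3f}, upper bounds: {4 - 1.543:.3f} to {4 - 0.946:.3f}, decision: {decision}")
```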

1. c) [10 marks] The researcher estimates a second model of the form:

Yt =  a + b1X1t + b2X2t + b3Yt-1 + ut           t=1 to 15

and obtains a Durbin-Watson statistic of 1.86. The coefficient of Yt-1 is estimated to be 0.79 with a standard error of 0.09. Use this information to test for autocorrelation in this model.

Since we have a lagged dependent variable [1 point] we need to use Durbin’s h test, which is defined as

$$h = p\,\sqrt{\frac{T}{1 - T\,\widehat{\operatorname{var}}(\hat{b}_3)}}$$ [3 points]

We need to use the estimated DW to get p: DW = 2(1 - p), 1.86 = 2(1 - p), so p = 0.07 [2 points], and using the information about the standard error of the coefficient of Yt-1 (0.09) we calculate

$$h = 0.07\,\sqrt{\frac{14}{1 - 14 \times 0.09^2}} = 0.07\,\sqrt{\frac{14}{0.887}} \approx 0.28$$ [2 points for correct calculation]

NB T=15 in original model but now we have a lagged dep var, T=14.

[Students lose one point for using 15 instead of 14]

This is less than any critical value at conventional levels from the normal   tables so we can conclude that we no longer have serial correlation in the model. [2  points]
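The Durbin's h calculation can be reproduced with the sketch below. All inputs are the figures given in the question and answer; the comparison value 1.96 is the usual 5% two-tailed normal critical value and is an assumption here, since the answer only refers to "conventional levels".

```python
import math

dw = 1.86            # Durbin-Watson statistic of the second model
se_lagged_y = 0.09   # standard error of the coefficient on Y(t-1)
n_obs = 14           # 15 observations minus one lost to the lag

# Back out the residual autocorrelation from DW = 2(1 - p)
rho_hat = 1 - dw / 2

# Durbin's h statistic: p * sqrt(T / (1 - T * var(b3)))
h_stat = rho_hat * math.sqrt(n_obs / (1 - n_obs * se_lagged_y ** 2))

# Assumed comparison with the 5% two-tailed standard normal critical value
reject_null = abs(h_stat) > 1.96

print(f"p = {rho_hat:.2f}, h = {h_stat:.2f}, reject H0: {reject_null}")
```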

1. d) A Breusch-Godfrey test with four lags is implemented, yielding the following output in Stata:

Breusch-Godfrey LM test for autocorrelation

| lags(p) | chi2 | df | Prob > chi2 |
|---|---|---|---|
| 4 | 14.576 | 4 | 0.0057 |

H0: no serial correlation

1. [6 marks] What is the conclusion of the Breusch-Godfrey test?

We are given the test stat = 14.576 [1 point]. From the table of critical values for the chi-squared distribution we obtain the critical value at the 5% level of significance with 4 degrees of freedom [2 points]. The critical value is 9.49 [1 point]

(Students get one point for correct critical values at different significance levels: at 1% the critical value is 13.28, at 10% it is 7.779.)

Since the test stat is larger than the critical value, we reject the null of no serial correlation and conclude that there is serial correlation of up to order 4 in the errors, e.g. an AR(4) error structure. [2 points]
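A short sketch of the decision at each significance level mentioned above. The critical values are the tabulated chi-squared values quoted in the answer, not computed here; the dictionary name is illustrative.

```python
bg_statistic = 14.576  # chi2 statistic from the Stata output

# Chi-squared critical values with 4 degrees of freedom, as quoted in the answer
critical_values = {0.10: 7.779, 0.05: 9.49, 0.01: 13.28}

# Reject H0 (no serial correlation) when the statistic exceeds the critical value
decisions = {
    level: bg_statistic > critical
    for level, critical in critical_values.items()
}

for level, reject in decisions.items():
    print(f"{level:.0%} level: reject H0 = {reject}")
```

Rejection at all three levels is consistent with the reported p-value of 0.0057, which is below 0.01.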

1. [6 marks] Explain how the Breusch-Godfrey test is implemented.

The BG test is an LM test and is implemented with an auxiliary regression. The residuals from the original regression are saved and regressed on the X variables plus the residuals lagged 1, 2, 3 and 4 periods:

$$\hat{u}_t = a + b_1 x_{1t} + b_2 x_{2t} + \rho_1 \hat{u}_{t-1} + \rho_2 \hat{u}_{t-2} + \rho_3 \hat{u}_{t-3} + \rho_4 \hat{u}_{t-4}$$

The R2 from this auxiliary regression is obtained and the test statistic (T − q)R2 is calculated, where q = 4 is the number of lags.

Under the null this follows a chi-squared distribution with 4 degrees of freedom (because we are allowing for 4 lags).

1. A researcher wants to test whether there is a difference in college GPA (grade point average) for male and female college athletes. He obtains the following:

𝐺𝑃𝐴̂ = 1.39 − 0.310 female + 0.02SAT − 0.09 hsperc + 0.03 tothrs

(0.18)   (0.05)            (0.002)        (0.012)         (0.007)

His sample size is 366. GPA is the average college GPA, female is a dummy variable equal to 1 for females and 0 for males, SAT is the SAT score, hsperc is the high school rank percentile (for example, a student is among the top 5%, top 20% etc.), tothrs is the total hours of college courses, and R2 = 0.398.

1. a) [6 marks] Test whether being a female has significant effect on college GPA at the 5% level.

[2 points for the null and alternative of a two-sided test]

[1 point for test stat = -0.310/0.05 = -6.2, i.e. 6.2 in absolute value]

[1 point for critical value = 1.96]

[2 points for interpretation: we reject the null and conclude that being female has a significant effect on GPA]

1. b) [5 marks] Calculate the difference between male and female GPAs based on the above results, holding other factors constant.

Holding other factors constant, the predicted GPA of women in this sample is 0.310 points lower than that of men.

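1. c) The model is estimated again with interaction terms between female and each of the other explanatory variables, and the following results are obtained (R2 = 0.366):

$$\widehat{GPA} = \underset{(0.2)}{1.48} - \underset{(0.04)}{0.35}\,female + \underset{(0.002)}{0.01}\,SAT + \underset{(0.0004)}{0.0007}\,female \cdot SAT - \underset{(0.01)}{0.08}\,hsperc - \underset{(0.003)}{0.005}\,female \cdot hsperc + \underset{(0.001)}{0.02}\,tothrs - \underset{(0.002)}{0.001}\,female \cdot tothrs$$

Standard errors are shown in brackets below each coefficient.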

1. [5 marks] Predict the difference in college GPA between males and females at SAT=1100, hsperc = 10, and tothrs= 50. Compare your results with part (b) above.

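The predicted difference between a woman and a man with the above characteristics is −0.35 + 0.0007(1100) − 0.005(10) − 0.001(50) = 0.32, which shows that at these values the GPA of women is predicted to be 0.32 points higher than that of men, in contrast to the 0.310-point gap in favour of men found in part (b). (Male GPA = 12.68, Female GPA = 13.00)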
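The two predictions and their difference can be reproduced with the sketch below. The coefficients are those of the interaction model in part (c); the dictionary keys and the function name are illustrative.

```python
# Coefficients of the interaction model reported in part (c)
coef = {
    "intercept": 1.48,
    "female": -0.35,
    "sat": 0.01,
    "female_sat": 0.0007,
    "hsperc": -0.08,
    "female_hsperc": -0.005,
    "tothrs": 0.02,
    "female_tothrs": -0.001,
}


def predict_gpa(female, sat, hsperc, tothrs):
    """Predicted GPA; female is 1 for women and 0 for men."""
    return (
        coef["intercept"]
        + coef["female"] * female
        + (coef["sat"] + coef["female_sat"] * female) * sat
        + (coef["hsperc"] + coef["female_hsperc"] * female) * hsperc
        + (coef["tothrs"] + coef["female_tothrs"] * female) * tothrs
    )


male_gpa = predict_gpa(female=0, sat=1100, hsperc=10, tothrs=50)
female_gpa = predict_gpa(female=1, sat=1100, hsperc=10, tothrs=50)
difference = female_gpa - male_gpa

print(f"male = {male_gpa:.2f}, female = {female_gpa:.2f}, difference = {difference:.2f}")
```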

1. [10 marks] Interpret the effect of tothrs on college GPA and how this differs between male and female college athletes.

For males the effect of tothrs on GPA is just the coefficient of tothrs (2 marks): a 1 hour increase in tothrs raises GPA by 0.02 points, on average and ceteris paribus. (2 marks)

For females the effect of tothrs on GPA is 0.02 − 0.001 = 0.019 (2 marks), i.e. a 1 hour increase in courses taken raises GPA by 0.019 points, on average and ceteris paribus. (2 marks)

Therefore an extra hour of courses has a lower return for females’ GPA. (2 marks)

1. [14 marks] Based on the model in part (c), state the null hypothesis that college GPA follows the same model for males and females. Test this null at the 5% significance level, given that the R2 of the restricted model is 0.352. State the result of your test. Would you reach the same conclusion if you instead tested the significance of each term in the second model that allows men and women to be different?

The null: the coefficients of female and all interactions = 0. (3 marks) The alternative: at least one of these coefficients is different from zero.

F statistic = [(Rsquared_ur − Rsquared_r)/4]/[(1 − Rsquared_ur)/(366 − 7 − 1)] = [(0.366 − 0.352)/4]/[(1 − 0.366)/358] ≈ 1.98.

(4 marks: 1 for correct 4, 1 for correct 358, 1 for correct restricted and unrestricted and 1 for correct F-stat value)

Critical value from the F distribution at the 5% level of significance is 2.37 (1 mark)

We fail to reject the null and don’t have enough evidence that men and women athletes follow different GPA models. (2 marks)

No, we would not reach the same conclusion (1 mark), because the coefficient on female is statistically significant (1 mark), while none of the interaction terms is significant (1 mark). We could do another Chow test to test for the difference in intercepts (1 mark)
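The joint test can be sketched in Python using the R-squared form of the F statistic given above. The R-squared values, sample size and critical value are those stated in the question and answer; the variable names are illustrative.

```python
r2_unrestricted = 0.366   # model with female and its interactions
r2_restricted = 0.352     # model without female and the interactions
n_obs = 366
n_restrictions = 4        # female plus three interaction terms
k_unrestricted = 7        # regressors in the unrestricted model
critical_value = 2.37     # 5% critical value quoted in the answer

df_denominator = n_obs - k_unrestricted - 1  # 358

f_stat = ((r2_unrestricted - r2_restricted) / n_restrictions) / (
    (1 - r2_unrestricted) / df_denominator
)
reject_null = f_stat > critical_value

print(f"F({n_restrictions}, {df_denominator}) = {f_stat:.2f}, reject H0: {reject_null}")
```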

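1. Consider the estimated model relating R&D intensity to firm sales:

$$\widehat{RD} = \underset{(1.4)}{0.26} + \underset{(0.01)}{0.04}\,sales - \underset{(0.002)}{0.007}\,sales^2$$

Standard errors are shown in brackets below each coefficient. Variables are measured in thousand dollars. N = 32, R2 = 0.148.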

1. a) [5 marks] What is the purpose of the sales squared term in the model?

The sales^2 term picks up a non-linear relationship between R&D intensity and sales: the effect of sales on R&D intensity may vary with the level of sales.

In this model it has a negative coefficient, which suggests that while sales increase R&D, they do so at a decreasing rate, before eventually decreasing it.

The t-stat is 0.007/0.002 = 3.5 in absolute value, which is above conventional critical values, so there is some justification for the variable to be included.

1. b) [5 marks] At what point does the marginal effect of sales on RD become negative?

$$\frac{\partial RD}{\partial sales} = 0.04 - 2 \times 0.007\,sales = 0$$

therefore at approximately sales = 0.04/0.014 ≈ 2.86 the marginal effect turns negative and R&D intensity falls.
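The turning point follows from setting the derivative of the fitted quadratic to zero. A minimal sketch, using the coefficients of the estimated model and illustrative function names:

```python
b_sales = 0.04       # coefficient on sales
b_sales_sq = -0.007  # coefficient on sales squared


def marginal_effect(sales):
    """Derivative of the fitted quadratic with respect to sales."""
    return b_sales + 2 * b_sales_sq * sales


# The marginal effect is zero where b_sales + 2 * b_sales_sq * sales = 0
turning_point = -b_sales / (2 * b_sales_sq)

print(f"turning point: sales = {turning_point:.3f}")
print(f"marginal effect at sales = 2: {marginal_effect(2):.3f}")
print(f"marginal effect at sales = 4: {marginal_effect(4):.3f}")
```

The marginal effect is positive below the turning point and negative above it, which is the inverted U shape asked for in the sketch question.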

1. c) [5 marks] Would you keep the quadratic term in the model? Explain.

Yes the coefficient is statistically significant, indicating a nonlinear relationship between sales and R&D.

1. d) [5 marks] Sketch a graph illustrating the relationship between RD and sales; be sure to label your graph.

An inverted U-shaped graph with sales on the horizontal axis, R&D on the vertical axis, and the peak of the graph indicated as the turning point, sales ≈ 2.86.

1. e) [20 marks] Describe how you would implement TWO tests for heteroscedasticity.

Any two from:

White’s test; Breusch-Pagan; Goldfeld-Quandt

In your answers the null must be clear, and the form of the auxiliary regression (if relevant) and the test statistic must be accurately described.

• H0: $\sigma_1^2 = \sigma_2^2$ (homoscedastic error variance)
• Ha: $\sigma_1^2 < \sigma_2^2$ (heteroscedastic error variance)
