THE UNIVERSITY OF SUSSEX
Assessment Period: January 2018 (A1)
DO NOT TURN OVER UNTIL INSTRUCTED TO BY THE LEAD INVIGILATOR
Candidates must attempt BOTH questions in SECTION 1 and ONE question from SECTION 2. The use of approved calculators is permitted.
Duration: 2 hours
Examination handout: Statistical Tables and Formulae sheet
SECTION 1 – Answer both questions
 | Model 1 | Model 2 | Model 3
Dependent variable | Math score (0-100 scale) | Math score (0-100 scale) | Math score (0-100 scale)
Independent variables | | |
Reading standardized score (0-100 scale) | 0.714 (0.008) | 0.677 (0.008) | 0.602 (0.008)
Natural logarithm of annual family income | | 1.691 (0.098) | 0.941 (0.107)
Mother's education (completed years) | | | 0.344 (0.046)
Father's education (completed years) | | | 0.395 (0.042)
Intercept | 15.153 (0.432) | -0.445 (1.004) | -0.894 (0.992)
N | 7430 | 7430 | 7430
R2 | 0.504 | 0.523 | 0.539
Standard errors in parentheses.
Reading score: on average and ceteris paribus [1 point], an increase in the reading score of 1 percentage point will increase the math score by about 0.7 percentage points. [1 point for the units, 1 point for the effect]
Intercept: for an individual with a reading score of zero [1 point], the predicted math score is 15.15. [2 points]
Students lost marks for not specifying units; some students also forgot to interpret the intercept.
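For reference, the Model 1 fitted line implied by the table, which is what the interpretation above reads off, is

\[ \widehat{math} = 15.153 + 0.714\,read, \qquad \frac{\partial \widehat{math}}{\partial read} = 0.714. \]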
H0: b(reading) = 0 vs Ha: b(reading) ≠ 0 [2 points]; t-stat = 0.714/0.008 = 89.25 [1 point]
Critical value for the two-tailed test at 5% with 7428 degrees of freedom: 1.96. [1 point]
Hence we reject the null and conclude that the reading score has a statistically significant effect on the math score [1 point].
Most students answered this part correctly.
Some students set this up as a one-tailed test; we awarded marks if they provided a rationale for doing so.
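A quick numerical check of this t-test (Model 1 values from the table):

```python
from scipy import stats

beta, se, df = 0.714, 0.008, 7428
t_stat = beta / se                         # 89.25
crit = stats.t.ppf(0.975, df)              # about 1.96 (5%, two-tailed)
print(t_stat, crit, abs(t_stat) > crit)    # -> reject H0
```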
The LHS is in percentage points and income is now in logs, so we divide the ln(income) coefficient by 100: on average and ceteris paribus, an increase in family income of 1% raises the math score by 1.69/100 = 0.0169 percentage points.
Students lost marks for: not interpreting the effect in terms of an increase in income.
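A one-line justification for dividing the coefficient by 100 (standard level-log algebra; 1.691 is the Model 2 coefficient on ln(income)):

\[ \Delta \widehat{math} = 1.691\,\Delta \ln(income) \approx 1.691 \times \frac{\%\Delta income}{100} = 0.0169 \times \%\Delta income. \]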
Predicted math score = -0.894 + 0.602*50 + 0.094*10 + 0.344*14 + 0.395*14 = 40.49
Some students forgot to include the intercept here.
Based on Model 3, for an individual with a reading score of zero, zero (log) family income and zero years of schooling for both the mother and the father, the predicted math score is -0.894. This is not meaningful: since it is rare to observe an individual with zero income, a zero reading score and no parental education, the model does poorly at predicting the math score of such an individual.
[1 point for explaining what the intercept captures, 1 point for discussing that such an observation is rare]
Only a few students answered this part correctly. Some students explained that a negative math score is not reasonable, and they were awarded a point for this.
The decrease in the coefficient suggests that parents' education and family income are correlated; therefore Model 2 overestimated the effect of family income / the estimate has an upward bias [2 points].
The increase in the standard error suggests that there is multicollinearity in the model, meaning that part of the variation in family income is explained by the father's and mother's education [2 points].
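This is the standard omitted-variable-bias argument. Sketching it with a single omitted education variable educ,

\[ E\big[\tilde{\beta}_{\ln(income)}\big] = \beta_{\ln(income)} + \beta_{educ}\,\delta, \]

where \(\delta\) is the slope from an auxiliary regression of educ on ln(income), given the other included regressors. With \(\beta_{educ} > 0\) and \(\delta > 0\) (education and income positively correlated), the bias is upward, consistent with the fall from 1.691 to 0.941.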
H0: R2 = 0 vs Ha: R2 > 0 (NB: must be set up as a one-tailed test) [2 points]
F test statistic [1 point]
Critical value at the 1% level with 4 and 7425 degrees of freedom: 3.3192. [1 point]
Hence we reject the null hypothesis and conclude that the model does explain some of the variation in the math score. [1 point]
Most students did this part correctly.
Students lost marks for not correctly specifying the null and alternative.
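As a worked check (the mark scheme does not require the number), plugging the Model 3 values into the overall-significance F formula gives

\[ F = \frac{R^2/k}{(1-R^2)/(n-k-1)} = \frac{0.539/4}{(1-0.539)/7425} \approx 2170, \]

which is far above the 1% critical value of 3.3192.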
If we regressed read12 on math12 we would expect to see similar results [1 point]. This result shows that, holding everything else constant, students with higher reading scores are predicted to have higher math scores as well [1 point]. However, this does not mean that a higher reading score causes a higher math score; the model does not tell us the direction of causality [1 point].
Students who answered this part mostly explained that correlation does not imply causation, but did not manage to explain that we would expect similar results from such a regression.
The residual for each observation in the OLS regression is defined as the difference between the actual observation and its predicted value from the estimated model: uhat_i = y_i - yhat_i. [4 points] Squaring the residual for each observation [1 point] and summing these values over the sample gives the residual sum of squares [1 point].
[Or 4 points to define the residual and 2 points to explain the sum of squared residuals]
[Or 6 points for explaining that it is the variation in the dependent variable that is not explained by the OLS model]
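In symbols, matching the definition above:

\[ RSS = \sum_{i=1}^{n} \hat{u}_i^{\,2} = \sum_{i=1}^{n} \big(y_i - \hat{y}_i\big)^2. \]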
Unbiasedness is a feature of the sampling distribution of betahat: betahat across all possible samples is considered a random variable [2 points]. If the expected value of betahat is equal to the true population value, then we say betahat is an unbiased estimator of that population value [4 points].
If students say it is an estimator/parameter that satisfies the Gauss-Markov assumptions, they get 4 points.
A time-series process in which the series is produced by its once-lagged value plus an independent error term: Y_t = a + b Y_{t-1} + u_t.
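A minimal simulation sketch of such a first-order autoregressive process (the parameter values a = 1, b = 0.5 are illustrative, not taken from the exam):

```python
import numpy as np

# Simulate Y_t = a + b*Y_{t-1} + u_t with independent N(0,1) errors.
rng = np.random.default_rng(0)
a, b, T = 1.0, 0.5, 200          # illustrative values only
y = np.zeros(T)
u = rng.normal(size=T)           # independent error terms
for t in range(1, T):
    y[t] = a + b * y[t - 1] + u[t]
print(y[:5])
```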
Constant variance of error terms conditional on values of x.
SECTION 2 – Answer ONE question
Y_t = a + b1 X1_t + b2 X2_t + u_t,   t = 1 to 15
Autocorrelation is a problem affecting the errors: they are no longer iid and are serially correlated with each other. A simple case is where u_t and u_{t-1} are correlated. [4 points]
It can occur for various reasons: inertia in time-series data, lags in responses to events, adjustment, etc. If students provide meaningful examples for this part it is acceptable. [1 point]
OLS estimates are no longer BLUE and tests based on t and F are no longer valid. [3 points]
This provides the basis for the DW test [1 mark for correct recognition]: DW = 2(1 - p), where p is the correlation between successive errors, so DW = 3.148. [1 point]
The null being tested is H0: p = 0 against Ha: p < 0 [2 points]. DW is therefore bounded between 0 and 4.
The critical values, with 2 independent variables and n = 15 in our original model, are dL = 0.946 and dU = 1.543, and we use these to mark the bounds of the inconclusive region. [2 points]
Decision rule [3 points]
Either a graph or the decision rule below.
Students do not have to draw the graph exactly; they must either show the decision zones on a horizontal axis or clearly write out the decision rule.
If DW > 4 - dL, reject H0. If DW < 4 - dU, do not reject H0. If 4 - dU < DW < 4 - dL, the test is inconclusive.
Based on these values, the test result is to reject the null in favour of negative autocorrelation. [1 point]
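A quick numerical check of this decision, using the DW value and the bounds quoted above:

```python
# Durbin-Watson decision for the negative-autocorrelation test (Ha: rho < 0).
DW = 3.148
dL, dU = 0.946, 1.543            # bounds for k = 2 regressors, n = 15
if DW > 4 - dL:
    print("reject H0: evidence of negative autocorrelation")
elif DW < 4 - dU:
    print("do not reject H0")
else:
    print("inconclusive")
print("implied rho =", 1 - DW / 2)   # about -0.574, since DW = 2(1 - rho)
```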
The model is then re-estimated as Y_t = a + b1 X1_t + b2 X2_t + b3 Y_{t-1} + u_t, t = 1 to 15, and a Durbin-Watson statistic of 1.86 is obtained. The coefficient of Y_{t-1} is estimated to be 0.79 with a standard error of 0.09. Use this information to test for autocorrelation in this model.
Since we have a lagged dependent variable [1 point] we need to use Durbin's h test, which is defined as

h = p_hat * sqrt( T / (1 - T * Var(b_hat)) ),   [3 points]

where b_hat is the estimated coefficient on the lagged dependent variable.
We use the estimated DW to get p_hat: DW = 2(1 - p_hat), so 1.86 = 2(1 - p_hat) and p_hat = 0.07 [2 points]. Using the standard error of the coefficient on Y_{t-1} we calculate

h = 0.07 * sqrt( 14 / (1 - 14 * 0.09^2) ) = 0.07 * sqrt(14 / 0.887) ≈ 0.28

approximately. [2 points for the correct calculation]
NB T=15 in original model but now we have a lagged dep var, T=14.
[Students lose one point for using 15 instead of 14]
This is less than any critical value at conventional levels from the normal tables so we can conclude that we no longer have serial correlation in the model. [2 points]
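The same h calculation as a short script, using the values given above:

```python
import math

# Durbin's h with T = 14, DW = 1.86 and se of the lagged-Y coefficient = 0.09.
T, DW, se_lag = 14, 1.86, 0.09
rho_hat = 1 - DW / 2                              # 0.07
h = rho_hat * math.sqrt(T / (1 - T * se_lag**2))  # about 0.28
print(round(h, 2))
```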
Breusch-Godfrey LM test for autocorrelation
lags(p) | chi2 | df | Prob > chi2
4 | 14.576 | 4 | 0.0057
H0: no serial correlation
We are given the test statistic = 14.576 [1 point]. From the table of critical values for the chi-squared distribution we obtain the critical value at the 5% level of significance with 4 degrees of freedom [2 points]. The critical value is 9.49. [1 point]
(Students get one point for correct critical values at other significance levels: at 1% the critical value is 13.28; at 10% it is 7.779.)
Since the test statistic is larger than the critical value, we reject the null of no serial correlation and conclude that the errors are serially correlated, with an AR(4) error structure. [2 points]
The BG test is an LM test and is implemented with an auxiliary regression. The residuals from the original regression are saved and regressed on the X variables plus their own values lagged 1, 2, 3 and 4 periods:

uhat_t = a + b1 X1_t + b2 X2_t + p1 uhat_{t-1} + p2 uhat_{t-2} + p3 uhat_{t-3} + p4 uhat_{t-4} + e_t

The R2 from this auxiliary regression is obtained and the test statistic (T - q)R2 is calculated. This follows a chi-squared distribution with 4 degrees of freedom (because we are allowing for 4 lags).
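For reference, a minimal sketch of how the BG test could be run in practice with statsmodels; the data and the variable names x1 and x2 below are simulated placeholders, not the exam data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

# Placeholder data standing in for the y, x1, x2 of the question.
rng = np.random.default_rng(0)
T = 100
x1, x2 = rng.normal(size=T), rng.normal(size=T)
y = 1 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=T)

X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.OLS(y, X).fit()
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=4)
print(lm_stat, lm_pval)   # compare lm_stat with the chi2(4) critical value 9.49
```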
GPAhat = 1.39 - 0.310 female + 0.02 SAT - 0.09 hsperc + 0.03 tothrs
(0.18) (0.05) (0.002) (0.012) (0.007)
The sample size is 366. GPA is the average college GPA; female is a dummy variable equal to 1 for females and 0 for males; SAT is the SAT score; hsperc is the high-school rank percentile (for example, a student is among the top 5%, top 20%, etc.); tothrs is the total hours of college courses; and R2 = 0.398.
[2 points for the null and alternative of the two-sided test]
[1 point for the test stat = 0.310/0.05 = 6.2]
Since 6.2 is well above the 5% two-tailed critical value of about 1.96, the coefficient is statistically significant: on average and ceteris paribus, women's GPA in this sample is 0.310 points lower than that of men.
GPAhat = 1.48 - 0.35 female + 0.01 SAT + 0.0007 female.sat - 0.08 hsperc - 0.005 female.hsperc + 0.02 tothrs - 0.001 female.tothrs
(0.2) (0.04) (0.002) (0.0004) (0.01) (0.003) (0.001) (0.002)
The predicted difference between a woman and a man with the above characteristics is 0.32, which shows that at these values the GPA of women is predicted to be 0.32 points higher than that of men (male GPA = 12.68, female GPA = 13).
For males the effect of tothrs on GPA is just the coefficient of tothrs (2 marks): a 1-hour increase in tothrs raises GPA by 0.02 points, on average and ceteris paribus. (2 marks)
For females the effect of tothrs on GPA is 0.02 - 0.001 = 0.019 (2 marks), i.e. a 1-hour increase in courses taken raises GPA by 0.019 points, on average and ceteris paribus (2 marks).
Therefore an extra hour of courses has a lower return for women's GPA (2 marks).
The null: the coefficients on female and on all the interaction terms are equal to 0. (3 marks) The alternative: at least one of these coefficients is different from zero.
F statistic = [(Rsquared_ur - Rsquared_r)/4] / [(1 - Rsquared_ur)/(366 - 7 - 1)] = 1.974.
(4 marks: 1 for correct 4, 1 for correct 358, 1 for correct restricted and unrestricted and 1 for correct F-stat value)
Critical value from F-dist at 5% level of significance is 2.37 (1 mark)
We fail to reject the null and do not have enough evidence that men and women athletes follow different GPA models. (2 marks)
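A small helper reproducing the F formula used above; the two R-squared values come from the question paper and are not reproduced here, so the call is left as a comment:

```python
def f_stat(r2_ur: float, r2_r: float, q: int, df_resid: int) -> float:
    """F statistic for q exclusion restrictions, from the restricted and
    unrestricted R-squared values and the unrestricted residual df."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_resid)

# e.g. f_stat(r2_ur, r2_r, q=4, df_resid=358) with the two R-squareds from
# the question reproduces the 1.974 reported above (df_resid = 366 - 7 - 1).
```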
No, we don't (1 mark), because the coefficient on female is statistically significant (1 mark) while none of the interaction terms is significant (1 mark); we could do another Chow test to test for the difference in intercepts (1 mark).
RDhat = 0.26 + 0.04 sales - 0.007 sales^2
(1.4) (0.01) (0.002)
Variables are measured in thousand dollars. N=32, R2 = 0.148.
sales^2 picks up a possible non-linear relationship between R&D intensity and sales: the responsiveness of R&D to sales may vary with the level of sales.
In this model it has a negative coefficient, which suggests that R&D rises with sales at a decreasing rate, before eventually falling.
The t-stat is 3.5 which is above conventional critical values, so there is some justification for the variable to be included.
dRD/dsales = 0.04 - 2(0.007)*sales = 0, therefore at approximately sales = 2.85 the R&D intensity starts to fall.
Yes the coefficient is statistically significant, indicating a nonlinear relationship between sales and R&D.
An inverted U-shaped graph with sales on the horizontal axis and R&D on the vertical axis, with the peak of the graph indicated as the turning point at sales = 2.85.
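A quick check of the turning point and the t-statistic quoted above:

```python
# Coefficients from the estimated R&D equation.
b_sales, b_sales2, se_sales2 = 0.04, -0.007, 0.002

turning_point = -b_sales / (2 * b_sales2)   # about 2.86 thousand dollars
t_stat = b_sales2 / se_sales2               # about -3.5
print(round(turning_point, 2), round(abs(t_stat), 1))
```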
Any two from:
White's test; Breusch-Pagan; Goldfeld-Quandt.
In the answer the null must be clearly stated, the form of the auxiliary regression (if relevant) given, and the test statistic accurately described.
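A minimal sketch (not part of the mark scheme) of one of these, the Breusch-Pagan test, run with statsmodels on simulated placeholder data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Placeholder data whose error variance grows with |sales|.
rng = np.random.default_rng(1)
n = 200
sales = rng.normal(size=n)
rd = 0.3 + 0.04 * sales + rng.normal(scale=1 + 0.5 * np.abs(sales), size=n)

X = sm.add_constant(np.column_stack([sales, sales**2]))
res = sm.OLS(rd, X).fit()
lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(res.resid, res.model.exog)
print(lm_stat, lm_pval)   # H0: homoskedasticity; a small p-value rejects it
```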