L1090
THE UNIVERSITY OF SUSSEX
BA AND BSc
Jan 2019 (A1)
Assessment Period: January 2019 (A1)
DO NOT TURN OVER UNTIL INSTRUCTED TO BY THE LEAD INVIGILATOR
 | Model 1 | Model 2 | Model 3 |
Dependent variable | educ | educ | educ |
Independent variables | | | |
motheduc | 0.30 (0.03) | – | 0.17 (0.03) |
fatheduc | 0.19 (0.02) | – | 0.11 (0.02) |
abil | – | 0.52 (0.03) | 0.39 (0.03) |
abil² | – | 0.05 (0.01) | 0.05 (0.01) |
Number of siblings | – | -0.15 (0.03) | -0.10 (0.03) |
Intercept | 6.94 (0.32) | 12.14 (0.12) | 8.74 (0.31) |
N | 1230 | 1230 | 1230 |
SSR | 5114.31 | 4193.89 | 3741.82 |
R² | 0.25 | 0.38 | 0.45 |
The ceteris paribus effect is the effect of one variable on the dependent variable when all other factors are held constant [2 marks].
In simple linear regression, every factor other than the single explanatory variable is left in the error term of the model, and the effect of those factors is assumed to be zero on average.
In multiple linear regression we explicitly control for the factors that we believe might have an effect on the dependent variable. By controlling for those factors in the model, we are more confident that we can capture the ceteris paribus effect of the variable of interest on the dependent variable [3 marks].
Model 1 has 2 slope coefficients and one intercept.
The coefficient of motheduc predicts that an extra level of education of the mother increases the respondent’s education by 0.30 of a grade, on average and ceteris paribus.
The corresponding effect of an extra grade of father’s education is predicted to be 0.19 of a grade, on average and ceteris paribus.
The intercept predicts that for individuals whose parents have zero education, the highest level of education is on average 6.94. [2 marks per coefficient].
The coefficient of abil² measures how the marginal effect of ability on educ changes with the level of ability (students can explain this in terms of a nonlinear effect, or how the effect of ability varies at different levels and is not fixed) [2 marks]
To test for linear returns we test whether the coefficient of abil² is statistically different from zero. The null hypothesis of this test is:
𝛽abil² = 0 [1 mark]
against the alternative 𝛽abil² ≠ 0 [1 mark]; no marks for a one-tailed alternative.
The test statistic is t = 0.05/0.01 = 5. [1 mark]
To test this null we compare the test statistic with the critical value from the z distribution. The critical value at the 5% significance level for a two-tailed test is 1.96. [1 mark]
If a student lost a mark above for setting up a one-tailed test but has obtained the correct critical value for a one-tailed test, they should be given the 1 mark for this section.
Since the test statistic is greater than the critical value, we reject the null that the returns to ability are linear. [1 mark]
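The t-test above can be checked numerically. The following is a minimal sketch using the abil² estimate and standard error from the table (0.05 and 0.01) and the 5% two-tailed critical value of 1.96; the function name `t_test_zero` is illustrative and not part of the paper.

```python
def t_test_zero(coef, se, critical_value):
    """Two-tailed t-test of H0: beta = 0 against H1: beta != 0.

    Returns the t-statistic and whether H0 is rejected at the
    significance level implied by critical_value.
    """
    t_stat = coef / se
    reject = abs(t_stat) > critical_value
    return t_stat, reject


# abil^2 coefficient and standard error taken from the regression table
t_stat, reject = t_test_zero(coef=0.05, se=0.01, critical_value=1.96)
print(f"t = {t_stat:.2f}, reject H0 of linear returns: {reject}")
```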
Here we are testing that father’s and mother’s education have no effect on the dependent variable, hence the null hypothesis is 𝛽motheduc = 𝛽fatheduc = 0. This hypothesis excludes two of the variables from the model.
To test this hypothesis, we compare the r-squared of the restricted and unrestricted models through an F-test.
The unrestricted model is model (3) and the restricted model is model (2), where we have excluded motheduc and fatheduc from the model.
The test statistic is
$F = \frac{(R^2_{ur} - R^2_{r})/q}{(1 - R^2_{ur})/(n - k - 1)}$
where $R^2_{ur}$ is the r-squared of the unrestricted model, model 3 in this case; $R^2_{r}$ is the r-squared of the restricted model, model 2; and $q$ is the number of restrictions, 2 restrictions in this case.
𝑘 is the number of independent variables in the unrestricted model, k = 5 in this case.
$F = \frac{(0.45 - 0.38)/2}{(1 - 0.45)/(1230 - 5 - 1)} \approx 77.89$. The critical value for $F_{2,\,1230-5-1}$ at the 1% significance level is 4.605. Since the F-statistic is greater than the critical value we reject the null that father’s and mother’s education have no effect on an individual’s educational attainment.
Students can choose a different significance level to form their answer.
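The joint test of motheduc and fatheduc can be reproduced from the table. The sketch below computes the F-statistic in its r-squared form and, as a cross-check, in its SSR form, using the values reported for model 2 (restricted) and model 3 (unrestricted). The function names are illustrative. The two forms are algebraically equivalent, so the small gap between the two results presumably reflects the rounding of the r-squared values in the table.

```python
def f_stat_r2(r2_ur, r2_r, q, n, k):
    """F-statistic for q exclusion restrictions, r-squared form."""
    return ((r2_ur - r2_r) / q) / ((1.0 - r2_ur) / (n - k - 1))


def f_stat_ssr(ssr_ur, ssr_r, q, n, k):
    """F-statistic for q exclusion restrictions, SSR form."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))


# Model 3 is unrestricted (k = 5); model 2 drops motheduc and fatheduc (q = 2)
f_from_r2 = f_stat_r2(r2_ur=0.45, r2_r=0.38, q=2, n=1230, k=5)
f_from_ssr = f_stat_ssr(ssr_ur=3741.82, ssr_r=4193.89, q=2, n=1230, k=5)

critical_1pct = 4.605  # critical value quoted in the answer
print(f"F (r-squared form) = {f_from_r2:.2f}")
print(f"F (SSR form)       = {f_from_ssr:.2f}")
print("reject H0:", f_from_r2 > critical_1pct and f_from_ssr > critical_1pct)
```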
Comparing model 1 and model 3, we can see that after controlling for ability, the coefficients of both mother’s and father’s education have fallen [2 marks for commenting on the effect].
This suggests that ability is correlated with the dependent variable and with father’s and mother’s education [3 marks for mentioning the correlation between ability and the other regressors].
Without further information we cannot verify the size of the bias. Model 3 also controls for ability squared and number of siblings, therefore we cannot isolate the omitted variable bias that is due to excluding ability from the model.
One possible scenario is that ability has a positive, though non-linear effect on educational attainment (as seen by its coefficient). It might be the case that father’s and mother’s education are positively correlated with ability as well. Therefore, not controlling for ability will create an omitted variable bias. If people whose parents are better educated have higher level of ability, then not controlling for ability results in an upward bias in the coefficient of father’s and mother’s education.
[5 marks for a similar discussion on omitted variable bias].
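The upward bias described in this scenario can be illustrated with a small simulation. This is only a sketch under invented assumptions: the data-generating process, the variable names (`paeduc`, `abil`, `educ`) and all coefficient values are hypothetical and are not taken from the exam data. With a true parental-education effect of 0.2, an ability effect of 0.5 and ability rising by 0.5 per unit of parental education, the regression that omits ability should give a slope near 0.2 + 0.5 × 0.5 = 0.45.

```python
import random


def simulate_ovb(n=20000, seed=0):
    """Compare the parental-education slope with and without ability."""
    rng = random.Random(seed)
    # Hypothetical data: ability is positively correlated with parental
    # education, and both raise the child's education.
    paeduc = [rng.gauss(0, 1) for _ in range(n)]
    abil = [0.5 * p + rng.gauss(0, 1) for p in paeduc]
    educ = [0.2 * p + 0.5 * a + rng.gauss(0, 1) for p, a in zip(paeduc, abil)]

    def cov(x, y):
        mx, my = sum(x) / n, sum(y) / n
        return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

    # Short regression: educ on paeduc only (ability omitted).
    b_short = cov(paeduc, educ) / cov(paeduc, paeduc)

    # Long regression: educ on paeduc and abil (two-regressor OLS formula).
    s11, s22, s12 = cov(paeduc, paeduc), cov(abil, abil), cov(paeduc, abil)
    s1y, s2y = cov(paeduc, educ), cov(abil, educ)
    b_long = (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)
    return b_short, b_long


b_short, b_long = simulate_ovb()
print(f"slope omitting ability:    {b_short:.3f}")
print(f"slope controlling ability: {b_long:.3f}")
```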
We can do an F-test for the statistical significance of the overall model:
The null hypothesis of the test is that all the coefficients are zero, against the alternative that at least one coefficient is different from zero.
$F = \frac{R^2/k}{(1 - R^2)/(n - k - 1)} \sim F_{k,\,n-k-1}$
$F = \frac{0.45/5}{(1 - 0.45)/(1230 - 5 - 1)}$
The F-statistic is approximately 200.3 and the critical value for $F_{5,1224}$ at the 1% significance level is 3.017. Hence we reject the null and conclude that the model is overall significant.
1 mark for correctly indicating each value.
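As a check on the overall significance test, the statistic can be computed directly. This sketch assumes the test refers to model 3 (R² = 0.45, k = 5, n = 1230), which is what the quoted degrees of freedom suggest; the function name `overall_f` is illustrative.

```python
def overall_f(r2, n, k):
    """F-statistic for H0: all k slope coefficients are zero."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Assumed inputs: model 3 from the table
f_stat = overall_f(r2=0.45, n=1230, k=5)
critical_1pct = 3.017  # critical value quoted in the answer

print(f"F = {f_stat:.2f}, reject H0: {f_stat > critical_1pct}")
```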
The R² is a measure of goodness of fit: it indicates the proportion of the variation in y that is explained by our fitted line.
The R² can be measured by dividing the explained sum of squares in y by the total sum of squares in y:
$R^2 = \frac{\text{explained sum of squares}}{\text{total sum of squares}}$
Students can use other versions of the r-squared formula.
[5 marks]
Homoscedasticity is one of the Gauss-Markov assumptions and it says that in a simple or multiple linear regression model, the errors of the regression have the same variance given any values of the explanatory variable(s).
Or an answer like this: Constant variance of error terms conditional on values of x.
Stationarity has to do with the joint distribution of a process as it moves through time. A time series is stationary if its stochastic properties and its temporal dependence structure do not change over time.
The residual for an observation, is the difference between the actual observation and its predicted value.
A problem that arises when collinearity between two or more independent variables leads to a lack of statistically significant coefficients even when a satisfactory overall explanatory power of the model is obtained. (If students state in their own words that multicollinearity is correlation between two or more independent variables that can lead to an increase in the standard errors of the estimators, they should be given full marks.)
$HR_i = \beta_0 + \beta_1 UE_i + \beta_2 INCOME_i + \beta_3 SOUTH_i + \beta_4 ETHNIC_i + u_i, \quad i = 1, \dots, 51$
Where HR is the number of homicides (murders) per 100,000 population in state i, UE is the male unemployment rate in percentages, INCOME is mean per capita income in dollars, SOUTH is a binary variable which takes a value of 1 if the state is southern, 0 otherwise, and ETHNIC is the percentage of the state population that is not white.
The model is estimated with OLS and the following results are found (standard errors are shown in brackets).
$\widehat{HR} = \underset{(1.3)}{-8} + \underset{(0.26)}{0.65}\,UE + \underset{(0.0002)}{0.0005}\,INCOME + \underset{(1.0)}{2.4}\,SOUTH + \underset{(0.04)}{0.21}\,ETHNIC \qquad (1)$
R² = 0.58
Interpretation: On average and ceteris paribus, the homicide rate in the south is predicted to be 2.4 per 100,000 population higher than other states.
To test for the significance of the coefficient, we use a two-tailed t-test as the following:
𝐻0: 𝛽𝑠𝑜𝑢𝑡ℎ = 0
𝑯𝟏: 𝛽𝑠𝑜𝑢𝑡ℎ ≠ 𝟎
The t-stat = 2.4/1.0 = 2.4 (under the null, the t-stat follows a t-distribution with n-k-1 degrees of freedom).
n=51 and k=4, so there are 46 degrees of freedom. The critical value at the 5% significance level is approximately 2.02.
At the 5% level of significance, the t-stat is larger than the critical value, hence we reject the null and conclude that the coefficient of south is statistically significant.
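$\widehat{HR} = \underset{(3.2)}{-15.7} + \underset{(0.31)}{0.99}\,UE + \underset{(0.0003)}{0.0009}\,INCOME + \underset{(7.8)}{18.7}\,SOUTH + \underset{(0.04)}{0.19}\,ETHNIC - \underset{(0.54)}{0.88}\,SOUTH{*}UE - \underset{(0.0004)}{0.0008}\,SOUTH{*}INCOME - \underset{(0.04)}{0.12}\,SOUTH{*}ETHNIC \qquad (2)$
R² = 0.62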
In northern states the effect of UE on HR is just the coefficient of UE: a 1 percentage point increase in UE raises HR by 0.99 murders per 100,000 population, on average and ceteris paribus.
In southern states the effect of UE on HR is 0.99 – 0.88 = 0.11. A 1 percentage point rise in UE in southern states raises HR by 0.11 homicides per 100,000 population, on average and ceteris paribus.
Northern states:
HR=-15.7 + 0.99UE + 0.0009INCOME +0.19ETHNIC
Southern states:
HR = -15.7 + 0.99UE + 0.0009INCOME + 18.7 +0.19ETHNIC -0.88*UE -0.0008*INCOME – 0.12*ETHNIC
The expression for the predicted difference is:
∆𝑯𝑹 = 18.7-0.88*UE – 0.0008*INCOME – 0.12*ETHNIC
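The expression for the predicted difference can be verified by evaluating equation (2) with SOUTH set to 1 and to 0. The sketch below does this for one set of state characteristics; the values of UE, INCOME and ETHNIC used are hypothetical and chosen only for illustration, and the function names are not from the paper.

```python
def predicted_hr(ue, income, ethnic, south):
    """Fitted homicide rate from equation (2); south is 1 or 0."""
    return (-15.7 + 0.99 * ue + 0.0009 * income + 18.7 * south
            + 0.19 * ethnic
            - 0.88 * south * ue
            - 0.0008 * south * income
            - 0.12 * south * ethnic)


def south_gap(ue, income, ethnic):
    """Predicted southern minus northern homicide rate."""
    return 18.7 - 0.88 * ue - 0.0008 * income - 0.12 * ethnic


# Hypothetical state characteristics (assumed, not from the data)
ue, income, ethnic = 5.0, 10000.0, 20.0

gap_from_fit = (predicted_hr(ue, income, ethnic, south=1)
                - predicted_hr(ue, income, ethnic, south=0))
print(f"difference from fitted values: {gap_from_fit:.2f}")
print(f"difference from expression:    {south_gap(ue, income, ethnic):.2f}")
```

Because the difference depends on UE, INCOME and ETHNIC, it can be positive for some states and negative for others.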
Yes we can. The model in part (a) restricts the determinants of homicide rate to be the same between southern and northern states while the model in part (c) allows these determinants to be different. [2 marks] If the answer says no we can’t as we don’t have the RSS of the two models, students should be given the 2 points, as they only practiced this F-test with the RSS and not r-squared.
To test whether these coefficients are statistically different between the two regions we use an F-test to test whether the interacted coefficients are jointly significant:
𝐻0: the three interaction terms have coefficients=0
𝐻𝐴: At least one of these three coefficients is different from zero
Then we use the R² of the restricted and unrestricted models to construct the F-statistic and carry out an F-test.
The $R^2_{ur}$ = 0.62 and the $R^2_{r}$ = 0.58; the number of restrictions is 3.
[8 marks]
(up to this point suffices for full mark for this question as students were not asked to carry out the F-test).
F = ((0.62-0.58)/3) / ((1-0.62)/(51-7-1)) = 0.01333/0.00884 = 1.51
Critical value of F at 5% with 3 and 43 degrees of freedom = 2.84
So we do not reject the null and we conclude that the determinants of HR are not statistically different between southern and non-southern states.
𝑌𝑡= a + 𝑏1𝑋1𝑡 + 𝑏2𝑋2𝑡 + 𝑢𝑡 t=1 to 16
Autocorrelation refers to the correlation between error terms over time. More specifically autocorrelation is the violation of the following assumption:
Conditional on the explanatory variables, the unobserved factors must not be correlated over time.
[4 marks]
Autocorrelation can occur for various reasons. For example, when conditional on knowing the values of the independent variables, omitted factors are correlated over time we will have an autocorrelation issue.
Another situation that might result in autocorrelation is when past values of the dependent variable feed forward to future values of explanatory variables. [2 marks – one reason is enough]
The consequences of autocorrelation for OLS are that the OLS estimators are no longer BLUE and that tests based on t and F are no longer valid. [4 marks]
This provides the basis for the DW test: [2 marks for correct recognition]
DW=2(1-p), where p is the correlation between successive errors. So DW=1.088.
The null being tested is that p=0 and the alternative is that p≠0. DW is bounded by 0 and 4, since the correlation coefficient p lies between -1 and 1.
The critical values, with 2 independent variables and n=16 in our original model, are approximately 0.946 and 1.543, and we use these to mark the bounds of the inconclusive region. Since DW=1.088 lies between these two bounds, the test is inconclusive.
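The decision rule for the DW test can be written out explicitly. The sketch below converts a correlation p into DW = 2(1 - p) and classifies a DW value against the lower and upper bounds quoted above. The function names and the wording of the returned labels are illustrative, and p = 0.456 is the value implied by DW = 1.088 rather than a figure stated in the text.

```python
def dw_from_rho(rho):
    """Approximate Durbin-Watson statistic implied by error correlation rho."""
    return 2.0 * (1.0 - rho)


def dw_region(dw, d_lower, d_upper):
    """Classify a DW statistic using the lower and upper critical bounds."""
    if not 0.0 <= dw <= 4.0:
        raise ValueError("DW must lie between 0 and 4")
    if dw < d_lower:
        return "positive autocorrelation"
    if dw <= d_upper:
        return "inconclusive"
    if dw < 4.0 - d_upper:
        return "no autocorrelation"
    if dw <= 4.0 - d_lower:
        return "inconclusive"
    return "negative autocorrelation"


dw = dw_from_rho(0.456)  # implied by DW = 1.088
print(f"DW = {dw:.3f}: {dw_region(dw, d_lower=0.946, d_upper=1.543)}")
```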
𝑌𝑡= a + 𝑏1𝑋1𝑡 + 𝑏2𝑋2𝑡 + 𝑏3𝑌𝑡−1 + 𝑢𝑡 t=1 to 16
and obtains a Durbin-Watson statistic of 1.75. The coefficient of 𝑌𝑡−1 is estimated to be 0.65 with a standard error of 0.06. Use this information to test for autocorrelation in this model. [10 marks]
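Since we have a lagged dependent variable we need to use Durbin’s h test, which is defined as
$h = \hat{p}\,\sqrt{\dfrac{T}{1 - T\,\widehat{\mathrm{var}}(\hat{b}_3)}}$
where $\widehat{\mathrm{var}}(\hat{b}_3)$ is the estimated variance of the coefficient of 𝑌𝑡−1.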
[ 4 marks for recognising the test statistic]
We need to use the estimated DW to get p:
DW=2(1-p), 1.75=2(1-p) so p=0.125, and using the information about the variance of the coefficient of 𝑌𝑡−1, 0.06² = 0.0036, we calculate [2 marks]
h = 0.125 × √(15/(1 − 15 × 0.0036)) ≈ 0.50 [1 mark]
T=16 in the original model, but now that we have a lagged dependent variable, T=15. [1 mark; students should not be penalised twice if they insert the wrong T in the formula above. They should lose only the 1 mark dedicated to the correct realisation of T].
The null and alternative are the same as in part (b) of this question. Under the null the h statistic follows a standard normal distribution [1 mark].
This h is less than any critical value at conventional levels from the z table so we cannot reject the null that there is no serial correlation [1 mark].
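The Durbin's h calculation can be reproduced from the three quantities given in the question: DW = 1.75, the standard error of the lagged dependent variable's coefficient (0.06), and T = 15. This is a minimal sketch; the function name `durbin_h` and its argument names are illustrative.

```python
import math


def durbin_h(dw, se_lag_coef, t):
    """Durbin's h statistic for a model with a lagged dependent variable.

    dw          : Durbin-Watson statistic of the regression
    se_lag_coef : standard error of the coefficient on the lagged
                  dependent variable
    t           : number of usable observations
    """
    rho_hat = 1.0 - dw / 2.0          # from DW = 2(1 - p)
    var_lag = se_lag_coef ** 2
    denom = 1.0 - t * var_lag
    if denom <= 0:
        raise ValueError("h is undefined when T * var >= 1")
    return rho_hat * math.sqrt(t / denom)


h = durbin_h(dw=1.75, se_lag_coef=0.06, t=15)
print(f"h = {h:.3f}, reject H0 at 5%: {abs(h) > 1.96}")
```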
$\widehat{tobacco} = \underset{(24.08)}{-3.64} + \underset{(0.728)}{0.88}\,income - \underset{(0.167)}{0.501}\,educ + \underset{(0.160)}{0.571}\,age - \underset{(0.0017)}{0.0057}\,age^2$
where tobacco is the number of cigarettes consumed per week, income is annual income measured in £1000s, educ is years of schooling and age is the age of each customer, also measured in years. (Standard errors are in parentheses).
The age squared controls for nonlinear relationship between age and tobacco consumption. This says that the effect of an increase in age on tobacco consumption depends on the level of age. [2 marks]
If the coefficient of age-squared is statistically significant then it should be included in the model. In this case, the absolute value of the t-statistic is 0.0057/0.0017 = 3.35. With a sample size of 807 we can compare this statistic with the critical value from the z distribution. At the 1% level of significance the critical value is approximately 2.57, which the t-statistic exceeds, hence age-squared should be included in the model. [2 marks]
∆𝑡𝑜𝑏𝑎𝑐𝑐𝑜̂ = 0.571∆𝑎𝑔𝑒 − 0.0114 𝑎𝑔𝑒 ∗ ∆𝑎𝑔𝑒
[2 marks for either of these expressions]
For ∆𝑎𝑔𝑒 = 1 at different age levels we have:
At 20 years old, the effect of getting one year older on tobacco consumption is an increase in consumption by 0.343 of a cigarette per week:
∆𝑡𝑜𝑏𝑎𝑐𝑐𝑜̂ = 0.571 ∗ 1 − 0.0114 ∗ 20 ∗ 1 = 0.343
At 60 years old on the other hand, the effect of getting one year older on tobacco consumption is a decrease of 0.113 of a cigarette per week:
∆𝑡𝑜𝑏𝑎𝑐𝑐𝑜̂ = 0.571 ∗ 1 − 0.0114 ∗ 60 ∗ 1 = -0.113
[4 marks for similar answer]
[Figure: sketch of tobacco consumption against age, with age on the horizontal axis]
[4 marks]
To find the turning point we use the first order condition and set the first derivative with respect to age equal to zero:
$\partial\,\widehat{tobacco}/\partial\,age = 0.571 - 0.0114\,age = 0$
Solving this gives age ≈ 50. Therefore at about age 50, cigarette consumption starts to decline. [3 marks]
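The age effects and the turning point can be computed from the two age coefficients in the estimated equation. The sketch below is illustrative only; the constant and function names are not from the paper.

```python
# Coefficients on age and age squared from the estimated tobacco equation
B_AGE = 0.571
B_AGE_SQ = -0.0057


def marginal_effect_of_age(age):
    """Approximate change in weekly cigarettes for one more year of age."""
    return B_AGE + 2 * B_AGE_SQ * age


def turning_point():
    """Age at which the marginal effect is zero (first order condition)."""
    return -B_AGE / (2 * B_AGE_SQ)


for age in (20, 60):
    print(f"age {age}: {marginal_effect_of_age(age):+.3f} cigarettes per week")
print(f"turning point: {turning_point():.1f} years")
```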
We can do an F-test for the statistical significance of the overall model:
The null hypothesis of the test is that all the coefficients are zero, against the alternative that at least one coefficient is different from zero.
$F = \frac{R^2/k}{(1 - R^2)/(n - k - 1)} \sim F_{k,\,n-k-1}$
The F-statistic is 11.22 and the critical value for $F_{4,802}$ (k = 4, n = 807) at the 5% significance level is 2.371. Hence we reject the null and conclude that the model is overall significant.
The standard errors of the OLS estimators are not the smallest under heteroskedasticity.
ii) The F-tests [3 marks]
With heteroskedastic errors, the F-statistic does not follow an F distribution, making the test invalid.
iii) The bias in OLS estimators [3 marks]
Heteroskedasticity has no effect on the bias.
END OF PAPER