
# INTRODUCTION TO ECONOMETRICS Assessment Period: January 2019 L1090


THE UNIVERSITY OF SUSSEX

BA AND BSc

Jan 2019 (A1)

## INTRODUCTION TO ECONOMETRICS

Assessment Period: January 2019 (A1)

DO NOT TURN OVER UNTIL INSTRUCTED TO BY THE LEAD INVIGILATOR

## SECTION A – Answer all questions

1. A researcher is studying the effect of parents’ level of education on educational achievement of individuals. Using data on highest grade of education completed (educ), mother’s level of education (motheduc), father’s level of education (fatheduc), a measure of cognitive ability (abil), and the number of siblings per individual, the researcher estimates a number of models. The table below shows the OLS estimates, with standard errors in brackets.

| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| Dependent variable | educ | educ | educ |
| **Independent variables** | | | |
| motheduc | 0.30 (0.03) | – | 0.17 (0.03) |
| fatheduc | 0.19 (0.02) | – | 0.11 (0.02) |
| abil | – | 0.52 (0.03) | 0.39 (0.03) |
| abil² | – | 0.05 (0.01) | 0.05 (0.01) |
| Number of siblings | – | -0.15 (0.03) | -0.10 (0.03) |
| Intercept | 6.94 (0.32) | 12.14 (0.12) | 8.74 (0.31) |
| N | 1230 | 1230 | 1230 |
| SSR | 5114.31 | 4193.89 | 3741.82 |
| R² | 0.25 | 0.38 | 0.45 |

1. First explain what is a ceteris paribus effect. Secondly, regarding ceteris paribus analysis, explain the advantage of using multiple regression analysis, over simple regression analysis. [5 marks]

The ceteris paribus effect is the effect of one variable on the dependent variable when all other factors are held constant [2 marks].

In simple linear regression, we assume that no factors other than the single explanatory variable of the model affect the dependent variable systematically. The effects of all other factors are assumed to be zero on average, and they show up in the error term of the model.

In multiple linear regression we explicitly control for the factors that we believe might have an effect on the dependent variable. By controlling for those factors in the model, we can be more confident that we capture the ceteris paribus effect of the variable of interest on the dependent variable [3 marks].

1. Interpret each of the OLS estimates in model (1). [6 marks]

Model 1 has 2 slope coefficients and one intercept.

The coefficient of motheduc predicts that an extra level of education of the mother increases the respondent’s education by 0.30 of a grade, on average and ceteris paribus.

The corresponding effect of an extra grade of father’s education is predicted to be 0.19 of a grade, on average and ceteris paribus.

The intercept predicts that for individuals whose parents have zero education, the highest grade of education completed is 6.94 on average. [2 marks per coefficient].

1. Interpret the coefficient of abil 2, in model (3). Can you reject the claim that the returns to ability are linear? Be explicit about the hypothesis that you make in order to answer the question. [7 marks]

The coefficient of abil² measures how the marginal effect of ability on educ changes with the level of ability (students can explain this in terms of a nonlinear effect, or how the effect of ability varies at different levels and is not fixed) [2 marks]

To test for linear returns we test whether the coefficient of abil² is statistically different from zero. The null hypothesis of this test is:

$\beta_{abil^2} = 0$ [1 mark]

against the alternative $\beta_{abil^2} \neq 0$ [1 mark]; no marks for a one-tailed alternative.

t-statistic = 0.05/0.01 = 5 [1 mark]

To test this null we compare the test statistic with the critical value from the z distribution. The critical value at the 5% significance level for a two-tailed test is 1.96.

[1 mark]

If a student lost a mark above for setting up a one-tailed test but has obtained the correct critical value for a one-tailed test, they should be given the 1 mark for this section.

Since the test statistic is greater than the critical value, we reject the null that the returns to ability are linear. [1 mark]
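The comparison in this answer can be checked with a few lines of Python. This is only an illustrative sketch: the coefficient and standard error are read from Model 3 in the table, the 1.96 critical value is the one quoted in the answer, and the variable names are invented for the example.

```python
# Coefficient and standard error of abil^2 in Model 3 (from the results table).
coef_abil_sq = 0.05
se_abil_sq = 0.01

# Two-tailed 5% critical value from the standard normal, as quoted in the answer.
critical_value = 1.96

# t-statistic for H0: beta_abil^2 = 0
t_stat = coef_abil_sq / se_abil_sq

# Two-tailed decision rule: reject H0 when |t| exceeds the critical value.
reject_linearity = abs(t_stat) > critical_value

print(f"t = {t_stat:.2f}, reject H0 of linear returns: {reject_linearity}")
```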

1. Comparing model (2) with model (3), at 5% level of significance, test the hypothesis that parents’ (mother and father) education has no effect on educational achievements of an individual. Be explicit about the hypothesis that you make. [7 marks]

Here we are testing that father’s and mother’s education have no effect on the dependent variable, hence the null hypothesis is  𝛽motheduc = 𝛽fatheduc = 0. This hypothesis excludes two of the variables from the model.

To test this hypothesis, we compare the R-squared of the restricted model with that of the unrestricted model, through an F-test.

The unrestricted model is model (3) and the restricted model is model (2), where we have excluded motheduc and fatheduc from the model.

The test statistic is

$$F = \frac{(R^2_{ur} - R^2_{r})/q}{(1 - R^2_{ur})/(n - k - 1)}$$

where $R^2_{ur}$ is the R-squared of the unrestricted model (model 3 in this case), $R^2_{r}$ is the R-squared of the restricted model (model 2), and $q$ is the number of restrictions (2 in this case).

$k$ is the number of independent variables in the unrestricted model, k = 5 in this case.

F = ((0.45 - 0.38)/2) / ((1 - 0.45)/(1230 - 5 - 1)) ≈ 77.9. The critical value for $F_{2,\,1230-5-1}$ at the 1% significance level is 4.605. Since the F-statistic is greater than the critical value we reject the null that father’s and mother’s education have no effect on an individual’s educational attainment.

Students can choose a different significance level to form their answer.
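As a rough numerical check of this exclusion test, the sketch below computes the F-statistic in its R-squared form and, for comparison, in the equivalent SSR form using the sums of squared residuals from the table. The function names are hypothetical and the 4.605 critical value is simply the figure quoted in the answer. The two forms differ slightly here only because the reported R-squared values are rounded to two decimals.

```python
def f_stat_r2(r2_unrestricted, r2_restricted, n, k, q):
    """F statistic for q exclusion restrictions, R-squared form.

    k is the number of regressors in the unrestricted model.
    """
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1 - r2_unrestricted) / (n - k - 1)
    return numerator / denominator


def f_stat_ssr(ssr_restricted, ssr_unrestricted, n, k, q):
    """Same test written with sums of squared residuals (SSR form)."""
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - k - 1)
    return numerator / denominator


# Model 2 is the restricted model, Model 3 the unrestricted one.
f_from_r2 = f_stat_r2(0.45, 0.38, n=1230, k=5, q=2)
f_from_ssr = f_stat_ssr(4193.89, 3741.82, n=1230, k=5, q=2)

critical_value = 4.605  # 1% critical value quoted in the answer

print(f"F (R-squared form) = {f_from_r2:.2f}")
print(f"F (SSR form)       = {f_from_ssr:.2f}")
print("Reject H0:", f_from_r2 > critical_value and f_from_ssr > critical_value)
```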

1. e) Comparing model 1 and model 3, why did the coefficient of parents’ (mother and father) education change after introducing ability to the model? [10 marks]

Comparing model 1 and model 3, we can see that after controlling for ability, the coefficients of both mother’s and father’s education have fallen [2 marks for commenting on the effect].

This suggests that ability is correlated with the dependent variable and with father’s and mother’s education [3 marks for mentioning the correlation between ability and the other regressors].

Without further information we cannot verify the size of the bias. Model 3 also controls for ability squared and number of siblings, so we cannot isolate the omitted variable bias that is due to excluding ability from the model.

One possible scenario is that ability has a positive, though non-linear effect on educational attainment (as seen by its coefficient). It might be the case that father’s and mother’s education are positively correlated with ability as well. Therefore, not controlling for ability will create an omitted variable bias. If people whose parents are better educated have higher level of ability, then not controlling for ability results in an upward bias in the coefficient of father’s and mother’s education.

[5 marks for a similar discussion on omitted variable bias].

1. f) Test the overall significance of model (3). [5 marks]

We can do an F-test for the statistical significance of the overall model:

The null hypothesis of the test is that all the slope coefficients are zero, against the alternative that at least one coefficient is different from zero.

$$F = \frac{R^2/k}{(1 - R^2)/(n - k - 1)} \sim F_{k,\,n-k-1}$$

$$F = \frac{0.45/5}{(1 - 0.45)/(1230 - 5 - 1)}$$

The F-statistic is approximately 200.3 and the critical value for $F_{5,1224}$ at the 1% significance level is 3.017. Hence we reject the null and conclude that the model is overall significant.
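The overall-significance statistic depends only on the R-squared, the sample size and the number of regressors, so it is easy to verify. The following is a minimal sketch with a hypothetical helper name; the inputs are the Model 3 figures from the table and the critical value is the one quoted above.

```python
def overall_f_stat(r2, n, k):
    """F statistic for H0: all k slope coefficients are zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Model 3: R-squared = 0.45, N = 1230, k = 5 regressors.
f_model3 = overall_f_stat(0.45, n=1230, k=5)

critical_value_1pct = 3.017  # quoted in the answer for F(5, 1224)

print(f"F = {f_model3:.1f}, reject H0: {f_model3 > critical_value_1pct}")
```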

1. The graph below is the plot of a simple linear regression of the dependent variable (Y) against the independent variable (X).
2. Copy the diagram on your answer sheet and on it, label the mean value of y (𝑌̅), the fitted regression line, and for the point indicated by the arrow label the actual observation (𝑦𝑖), the residual (𝑢𝑖) and the predicted value (𝑦̂𝑖).

1 mark for correctly indicating each value.

1. Explain what is 𝑅2 of the regression, how is it measured and what is its interpretation.

The 𝑅2 is a measure of goodness of fit and it indicates the proportion of the variation in y that is explained by our fitted line.

The 𝑅2 can be measured by dividing the explained sum of squares (SSE) by the total sum of squares (SST) of y:

$$R^2 = \frac{SSE}{SST}$$

Students can use other version of the r-squared formula.

[5 marks]
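To make the decomposition concrete, here is a small sketch that fits a simple regression by hand and computes the R-squared both as SSE/SST and as 1 - SSR/SST. The data are invented purely for illustration and do not come from the exam; SSE denotes the explained sum of squares and SSR the sum of squared residuals, following the notation used in this paper.

```python
# Hypothetical data, invented only to illustrate the calculation.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS slope and intercept for a simple regression of y on x.
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

# Predicted values and residuals (actual minus predicted).
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

sst = sum((yi - y_bar) ** 2 for yi in y)       # total sum of squares
sse = sum((fi - y_bar) ** 2 for fi in fitted)  # explained sum of squares
ssr = sum(u ** 2 for u in residuals)           # residual sum of squares

r2 = sse / sst
r2_alternative = 1 - ssr / sst

print(f"R-squared = {r2:.4f} (alternative formula: {r2_alternative:.4f})")
```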

1. Briefly explain each of the following concepts:

1. Homoscedasticity                                                                       [5 marks]

Homoscedasticity is one of the Gauss-Markov assumptions. It says that in a simple or multiple linear regression model, the errors of the regression have the same variance given any values of the explanatory variable(s).

Or an answer like this: Constant variance of error terms conditional on values of x.

1. Stationary process                                                              [5 marks]

Stationarity has to do with the joint distribution of a process as it moves through time. A time series is stationary if its stochastic properties and its temporal dependence structure do not change over time.

1. Regression residual                                                                       [5 marks]

The residual for an observation, is the difference between the actual observation and its predicted value.

1. Multicollinearity                                                                        [5 marks]

A problem that arises when collinearity between two or more independent variables leads to a lack of statistically significant coefficients even when a satisfactory overall explanatory power of the model is obtained. (If students state in their own words that multicollinearity is correlation between two or more independent variables that can lead to an increase in standard errors of estimators, they should be given full marks.)

## SECTION B – Answer ONE question

1. A model of homicide rates in the USA is estimated, using state level data, as follows:

HRi = β0 + β1UEi + β2INCOMEi + β3SOUTHi + β4ETHNICi + ui,    i = 1, …, 51

Where HR is the number of homicides (murders) per 100,000 population in state i, UE is the male unemployment rate in percentages, INCOME is mean per capita income in dollars, SOUTH is a binary variable which takes a value of 1 if the state is southern, 0 otherwise, and ETHNIC is the percentage of the state population that is not white.

The model is estimated with OLS and the following results are found (standard errors are shown in brackets).

𝐻𝑅̂ = -8 + 0.65 UE + 0.0005INCOME + 2.4SOUTH +0.21ETHNIC   (1)

(1.3)   (0.26)      (0.0002)                   (1.0)              (0.04)

R2=0.58

1. Interpret the coefficient of SOUTH and test for its statistical significance at the 5% level. [5 marks]

Interpretation: On average and ceteris paribus, the homicide rate in southern states is predicted to be 2.4 per 100,000 population higher than in other states.

• marks]

To test for the significance of the coefficient, we use a two-tailed t-test as the following:

𝐻0: 𝛽𝑠𝑜𝑢𝑡ℎ = 0

𝑯𝟏: 𝛽𝑠𝑜𝑢𝑡ℎ ≠ 𝟎

The t-stat = 2.4/1.0 = 2.4 (under the null, the t-stat follows a t-distribution with n-k-1 degrees of freedom).

n=51 and k=4. The critical value at the 5% significance level with 46 degrees of freedom is approximately 2.021.

At the 5% level of significance, the t-stat is larger than the critical value, hence we reject the null and conclude that the coefficient of SOUTH is statistically significant.

• marks of significance test]

1. Sketch a simple graph of HR against INCOME which illustrates how homicide rates are predicted to differ between southern and non-southern states. [5 marks]
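One way to see what the sketch should look like is to evaluate equation (1) for southern and non-southern states over a range of incomes. The snippet below is an illustration only: the function name is made up, and the values chosen for UE, ETHNIC and INCOME are hypothetical. Because SOUTH enters only as an intercept shift, the two lines are parallel, with the southern line 2.4 higher at every income level.

```python
def predicted_hr(ue, income, south, ethnic):
    """Predicted homicide rate from equation (1)."""
    return -8 + 0.65 * ue + 0.0005 * income + 2.4 * south + 0.21 * ethnic


# Hypothetical values at which the other regressors are held fixed.
ue_fixed = 5.0
ethnic_fixed = 10.0

for income in (10000, 20000, 30000):
    non_south = predicted_hr(ue_fixed, income, south=0, ethnic=ethnic_fixed)
    south = predicted_hr(ue_fixed, income, south=1, ethnic=ethnic_fixed)
    gap = south - non_south
    print(f"INCOME={income}: non-south={non_south:.2f}, south={south:.2f}, gap={gap:.2f}")
```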

1. The model was re-estimated with an interaction term between SOUTH and each of the continuous variables, and the following results were obtained.

𝐻𝑅̂ = -15.7 + 0.99UE + 0.0009INCOME + 18.7SOUTH + 0.19ETHNIC

(3.2)   (0.31)   (0.0003)   (7.8)   (0.04)

- 0.88SOUTH*UE - 0.0008SOUTH*INCOME - 0.12SOUTH*ETHNIC   (2)

(0.54)   (0.0004)   (0.04)

R2=0.62

1. Interpret the effect of the male unemployment rate on homicide rates and how this differs between southern and non-southern states. [5 marks]

In non-southern states the effect of UE on HR is just the coefficient of UE: a 1 percentage point increase in UE raises HR by 0.99 murders per 100,000 population, on average and ceteris paribus.

In southern states the effect of UE on HR is 0.99 - 0.88 = 0.11: a 1 percentage point rise in UE raises HR by 0.11 homicides per 100,000 population, on average and ceteris paribus.

1. Write down an expression that shows the predicted difference in homicide rates between southern and non-southern states. [5 marks]

Non-southern states:

HR = -15.7 + 0.99UE + 0.0009INCOME + 0.19ETHNIC

Southern states:

HR = -15.7 + 0.99UE + 0.0009INCOME + 18.7 + 0.19ETHNIC - 0.88*UE - 0.0008*INCOME - 0.12*ETHNIC

The expression for the predicted difference (southern minus non-southern) is:

∆𝑯𝑹 = 18.7 - 0.88*UE - 0.0008*INCOME - 0.12*ETHNIC
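The interaction model implies a south/non-south gap that varies with the other regressors, which a short function makes explicit. This is a sketch with hypothetical function names; the coefficients come from equation (2), while the state characteristics plugged in at the end are invented for illustration.

```python
def hr_gap_south_minus_non_south(ue, income, ethnic):
    """Predicted difference in homicide rates implied by equation (2)."""
    return 18.7 - 0.88 * ue - 0.0008 * income - 0.12 * ethnic


def ue_effect(south):
    """Effect of a 1 percentage point rise in UE; south is 0 or 1."""
    return 0.99 - 0.88 * south


print("UE effect, non-southern states:", round(ue_effect(0), 2))
print("UE effect, southern states:    ", round(ue_effect(1), 2))

# Hypothetical state: UE = 5%, INCOME = $20,000, ETHNIC = 10%.
example_gap = hr_gap_south_minus_non_south(ue=5.0, income=20000, ethnic=10.0)
print("Predicted gap at the example values:", round(example_gap, 2))
```

The sign of the gap depends on where it is evaluated, which is why the answer is an expression rather than a single number.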

1. Can you use the provided information to carry out an F-test of whether the determinants of homicide rates differ significantly between southern and non-southern states? Explain what would you have to do to compute the F-statistic. [10 marks]

Yes we can. The model in part (a) restricts the determinants of homicide rate to be the same between southern and northern states while the model in part (c) allows these determinants to be different.  [2 marks] If the answer says no we can’t as we don’t have the RSS of the two models, students should be given the 2 points, as they only practiced this F-test with the RSS and not r-squared.

To test whether these coefficients are statistically different between the two regions we use an F-test to test whether the interacted coefficients are jointly significant:

𝐻0: the three interaction terms have coefficients=0

𝐻𝐴: At least one of these three coefficients is different from zero

Then we use the 𝑅2 of restricted and unrestricted models to construct the F-statistics and carry out an F-test.

The $R^2$ of the unrestricted model is 0.62 and the $R^2$ of the restricted model is 0.58; the number of restrictions is 3.

[8 marks]

(up to this point suffices for full mark for this question as students were not asked to carry out the F-test).

F = ((0.62-0.58)/3) / ((1-0.62)/(51-7-1)) = 0.01333/0.00884 = 1.51

Critical value of F at 5% with 3 and 43 degrees of freedom =2.84

So we do not reject the null and we conclude that determinants of HR are not statistically different between southern and non-southern states.

1. A researcher estimates a model of the following form

𝑌𝑡=  a + 𝑏1𝑋1𝑡  + 𝑏2𝑋2𝑡 + 𝑢𝑡    t=1 to 16

1. Explain what autocorrelation is, how it might arise and discuss the consequences for the OLS estimates. [10 marks]

Autocorrelation refers to the correlation between error terms over time. More specifically autocorrelation is the violation of the following assumption:

Conditional on the explanatory variables, the unobserved factors must not be correlated over time.

[4 marks]

Autocorrelation can occur for various reasons. For example, when conditional on knowing the values of the independent variables, omitted factors are correlated over time we will have an autocorrelation issue.

Another situation that might result in autocorrelation is when past values of the dependent variable feed forward to future values of explanatory variables. [2 marks – one reason is enough]

The consequences of autocorrelation for OLS are that the OLS estimators are no longer BLUE and the usual t and F tests are no longer valid.  [4 marks]

1. The correlation coefficient between the residuals 𝑢̂𝑡 and the lagged residuals 𝑢̂𝑡−1 from the model is calculated to be 0.456. Use this to implement a test for autocorrelation, specifying clearly the null and alternative hypotheses. Interpret your results. [10 marks]

This provides the basis for the DW test: [2 marks for correct recognition]

DW=2(1-p) where p is the correlation between the errors. So DW=1.088.

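The null being tested is that p=0 and the alternative is that p0. DW is bounded by 0 and 4, since the correlation coefficient lies between -1 and 1.

The critical values, with 2 independent variables and n=16 in our original model, are approximately 0.946 and 1.543, and we use these to mark on the bounds of the inconclusive region. Since DW = 1.088 lies between these two values, it falls in the inconclusive region and the test does not allow a firm conclusion about autocorrelation.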
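The Durbin-Watson calculation and the bounds comparison can be sketched as follows. The residual correlation and the two bounds are the figures given in this answer; the variable names and the three-way decision rule (written for a test against positive autocorrelation) are assumptions of this example rather than part of the marking scheme.

```python
# Correlation between residuals and lagged residuals, as given in the question.
rho_hat = 0.456

# Approximate Durbin-Watson statistic.
dw = 2 * (1 - rho_hat)

# Lower and upper bounds quoted in the answer.
d_lower, d_upper = 0.946, 1.543

# Bounds test against positive autocorrelation (assumed form of the decision rule).
if dw < d_lower:
    conclusion = "reject H0: evidence of positive autocorrelation"
elif dw > d_upper:
    conclusion = "do not reject H0"
else:
    conclusion = "inconclusive"

print(f"DW = {dw:.3f}: {conclusion}")
```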

1. The researcher estimates a second model of the form:

𝑌𝑡=  a + 𝑏1𝑋1𝑡  + 𝑏2𝑋2𝑡 + 𝑏3𝑌𝑡−1 + 𝑢𝑡            t=1 to 16

and obtains a Durbin-Watson statistic of 1.75. The coefficient of 𝑌𝑡−1 is estimated to be 0.65 with a standard error of 0.06. Use this information to test for autocorrelation in this model.  [10 marks]

Since we have a lagged dependent variable we need to use Durbin’s h test, which is defined as

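$$h = \hat{p}\,\sqrt{\frac{T}{1 - T\,\widehat{\mathrm{var}}(\hat{b}_3)}}$$

where $\hat{b}_3$ is the estimated coefficient of $Y_{t-1}$.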

[ 4 marks for recognising the test statistic]

We need to use the estimated DW to get p

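DW = 2(1-p), so 1.75 = 2(1-p) and p = 0.125. Using the information about the standard error of the coefficient of $Y_{t-1}$ (0.06, so its estimated variance is 0.06² = 0.0036), we calculate h [2 marks]

With T = 15, h = 0.125 × √(15/(1 - 15 × 0.0036)) ≈ 0.50 [1 mark]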

T=16 in original model but now we have a lagged dependent variable, T=15.  [1 mark,  student should not be penalised twice if they insert wrong T in the formula above. They should lose only the 1 mark dedicated to correct realisation of T].

The null and alternative are the same as in part (b) of this question. Under the null the h statistic follows a standard normal distribution [1 mark].

This h is less than any critical value at conventional levels from the z table, so we cannot reject the null that there is no serial correlation [1 mark].
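A short sketch of the Durbin's h calculation, under the assumptions used in this answer (p recovered from DW = 2(1 - p), and T = 15 because one observation is lost to the lag). The function name is hypothetical, and the guard clause reflects the known limitation that h is undefined when T times the variance is 1 or more.

```python
import math


def durbin_h(dw, se_lagged_dep, t_obs):
    """Durbin's h statistic from a DW statistic and the standard error
    of the coefficient on the lagged dependent variable."""
    rho_hat = 1 - dw / 2             # from DW = 2(1 - rho)
    variance = se_lagged_dep ** 2    # estimated variance of the coefficient
    denominator = 1 - t_obs * variance
    if denominator <= 0:
        raise ValueError("h is undefined when T * var >= 1")
    return rho_hat * math.sqrt(t_obs / denominator)


h_stat = durbin_h(dw=1.75, se_lagged_dep=0.06, t_obs=15)

# Two-tailed 5% critical value from the standard normal distribution.
reject_no_serial_correlation = abs(h_stat) > 1.96

print(f"h = {h_stat:.3f}, reject H0: {reject_no_serial_correlation}")
```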

1. A tobacco company is investigating the determinants of tobacco consumption. It estimates the following model using OLS (N=807, R2=0.053)

𝑡𝑜𝑏𝑎𝑐𝑐𝑜̂ = -3.64 + 0.88income – 0.501educ + 0.571age  – 0.0057age2

(24.08)  (0.728)           (0.167)        (0.160)       (0.0017)

where tobacco is the number of cigarettes consumed per week, income is annual income measured in £1000s, educ is years of schooling and age is the age of each customer, also measured in years. (Standard errors are in parentheses).

1. What is the purpose of the age squared term? Explain whether or not you think it should be included in the model. [4 marks]

The age squared controls for nonlinear relationship between age and tobacco consumption. This says that the effect of an increase in age on tobacco consumption depends on the level of age. [2 marks]

If the coefficient of age-squared is statistically significant then it should be included in the model. In this case, the t-statistic is -0.0057/0.0017 = -3.35, so its absolute value is 3.35. With a sample size of 807 we can compare this statistic with the critical value from the z distribution. At the 1% level of significance the critical value is approximately 2.57, hence age-squared should be included in the model. [2 marks]

1. Write down an expression that shows the effect of a change in age on tobacco consumption, and use this to show how the effect of being a year older on tobacco consumption is different for a person aged 20 and a person aged 60. [6 marks]

∆𝑡𝑜𝑏𝑎𝑐𝑐𝑜̂  = 0.571∆𝑎𝑔𝑒 − 0.0114 𝑎𝑔𝑒 ∗ ∆𝑎𝑔𝑒

[2 marks for either of these expressions]

For ∆𝑎𝑔𝑒 = 1 at different age levels we have:

At 20 years old, the effect of getting one year older on tobacco consumption is an increase in consumption by 0.343 of a cigarette per week:

∆𝑡𝑜𝑏𝑎𝑐𝑐𝑜̂  = 0.571 ∗ 1 − 0.0114 ∗ 20 ∗ 1 = 0.343

At 60 years old on the other hand, the effect of getting one year older on tobacco consumption is a decrease of 0.113 of a cigarette per week:

∆𝑡𝑜𝑏𝑎𝑐𝑐𝑜̂  = 0.571 ∗ 1 − 0.0114 ∗ 60 ∗ 1 = -0.113

1. Sketch a graph illustrating the relationship between age and tobacco consumption, and calculate at what age tobacco consumption is predicted to decline. [7 marks]

[Figure: sketch of predicted tobacco consumption (vertical axis) against age (horizontal axis)]

[4 marks]

To find the turning point we need to use the first order condition and set the first derivative equal to zero:

$$\frac{\partial\,\widehat{tobacco}}{\partial\,age} = 0.571 - 0.0114\,age = 0$$

Solving for this gives age=50. Therefore at age 50, cigarette consumption starts to decline.   [3 marks]
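The marginal effect of age and the turning point follow directly from the two age coefficients, as this sketch shows. The constant and function names are invented for the example; the coefficients are those of the estimated tobacco equation.

```python
# Coefficients on age and age squared from the estimated tobacco equation.
B_AGE = 0.571
B_AGE_SQ = -0.0057


def marginal_effect_of_age(age):
    """Approximate change in weekly cigarettes for one more year of age."""
    return B_AGE + 2 * B_AGE_SQ * age


# First-order condition: B_AGE + 2 * B_AGE_SQ * age = 0.
turning_point = -B_AGE / (2 * B_AGE_SQ)

print(f"Effect at age 20: {marginal_effect_of_age(20):.3f}")
print(f"Effect at age 60: {marginal_effect_of_age(60):.3f}")
print(f"Turning point:    {turning_point:.1f} years")
```

Since the coefficient on age squared is negative, the relationship is an inverted U: consumption rises with age up to the turning point and falls afterwards.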

1. Test the overall significance of the fitted line in this question, at the 5% level. [4 marks]

We can do an F-test for the statistical significance of the overall model:

The null hypothesis of the test is that all the slope coefficients are zero, against the alternative that at least one coefficient is different from zero.

$$F = \frac{R^2/k}{(1 - R^2)/(n - k - 1)} \sim F_{k,\,n-k-1}$$

$$F = \frac{0.053/4}{(1 - 0.053)/(807 - 4 - 1)}$$

The F-statistic is 11.22 and the critical value for $F_{4,802}$ at the 5% significance level is 2.371. Hence we reject the null and conclude that the model is overall significant.

1. You are told that the errors are heteroskedastic. What is the consequence on each of the following:

1. i) The standard error of the OLS estimators [3 marks]

The standard errors of the OLS estimators are not the smallest under heteroskedasticity.

ii) The F-tests [3 marks]

With heteroskedastic errors, the F-statistic does not follow an F distribution, making the test invalid.

iii) The bias in OLS estimators [3 marks]

Heteroskedasticity has no effect on the bias.

END OF PAPER
