MSCI 212 Statistical Methods for Business
PART II (Second, Third and Final Year)
WRITTEN COURSEWORK ASSIGNMENT (DUE WEEK 12)
Please read the following instructions carefully.
- Your coursework must be submitted electronically to the MSCI212 Moodle site as a single document by 10:00am on Monday, 18th January 2021. The place to submit your file is in the Coursework section of the MSCI212 Moodle site.
- Your coursework will NOT be accepted unless you sign and return a declaration form (available from the Web Board) that includes the statement that you have read and understood the University regulations relating to plagiarism. Plagiarism includes:
- Collusion, where a piece of work prepared by a group is represented as if it were the student’s own;
- The purchase of a paper from a commercial service, including Internet sites, whether pre-written or specially prepared for the student concerned;
- The submission of a paper written by any other person, including a friend, a fellow student or a person who is not a member of the university;
- The submission of another student’s work, whether with or without that other student’s knowledge or consent.
Incidents of plagiarism are recorded on a student’s file. Penalties are in line with the institutional framework of the University.
- In accordance with University regulations, marks are deducted from any coursework which is not submitted by the deadline. This penalty will apply for 3 days after the deadline and then a mark of zero will be given to any work not submitted. However, if an extension is given then the rule applies from the date of the extension.
- This coursework has two questions and you must answer both questions in full.
- The two questions carry equal marks (50% each). You should be able to begin Question 1 immediately; some of the tools needed for Question 2 will be covered later.
- Each of your answers should state clearly your reasoning. Please also state clearly any assumptions that you have made in addition to those given in the questions.
- You may submit handwritten answers, but you should include carefully selected extracts of your SPSS output to justify them. Please write neatly: if we cannot read your handwriting, your answer will NOT be marked.
Question 1 [Worth 50% of the marks]
As in Workshop 2, use <Transform><Random Number Generators> to set your unique starting point for the SPSS random number generator. For this coursework question use the last four digits of your library card PLUS 1, i.e. if your library card ends ‘4321’ type in ‘4322’, and if your library card ends ‘4329’ type in ‘4330’. Record these four digits at the top of your answer.
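The "PLUS 1" rule is simple arithmetic, and a minimal Python sketch of it is given below purely as a check on your own working. The function name `seed_from_card` is hypothetical, and the instructions above do not say what to do if your card ends in ‘9999’, so that case is left as plain addition here.

```python
def seed_from_card(last_four_digits: str) -> int:
    """Return the starting point: the last four library-card digits plus 1."""
    if len(last_four_digits) != 4 or not last_four_digits.isdigit():
        raise ValueError("expected exactly four digits")
    # Plain addition; the '9999' case is not covered by the instructions.
    return int(last_four_digits) + 1


print(seed_from_card("4321"))  # 4322
print(seed_from_card("4329"))  # 4330
```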
Accident and Emergency Departments (AEDs) in hospitals in England have to deal with highly variable workloads, providing high quality healthcare within a reasonable length of time. The government has set a target that 95% of patients should leave the AED within 4 hours. Data gathered on the 8781 patients who attended one such AED over a 4-week period are contained in the SPSS data file ‘AED4weeks.sav’ in the following variables:
- Age – of patient in years;
- Day – numbers the days 1 to 28;
- DayofWeek – Monday … Sunday;
- Period – patient’s time of arrival at AED, where ‘0’ means between midnight and 1am, ‘1’ means between 1am and 2am, etc.
- LoS – Patient’s length of stay in the AED (in minutes);
- Breachornot – patients who spend more than 4 hours in the AED have breached the NHS 4-hour performance target;
- HRG – Health Related Group – this is a code that categorises the treatment that the patient received;
- noofinvestigations – number of diagnostic tests and other investigations that were performed on the patient;
- nooftreatments – number of treatments that the patient received;
- noofpatients – the number of patients who were already in the AED when the patient arrived.
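To make the link between LoS, Breachornot and the 95% target concrete, here is a small Python sketch under stated assumptions. The length-of-stay values are invented for illustration and are NOT taken from ‘AED4weeks.sav’; treating a stay of exactly 240 minutes as not a breach is an assumption based on the wording "more than 4 hours".

```python
TARGET_MINUTES = 4 * 60   # the 4-hour target expressed in minutes
TARGET_SHARE = 0.95       # 95% of patients should leave within 4 hours


def is_breach(los_minutes: float) -> bool:
    """A patient breaches if the stay is MORE than 4 hours (assumption: 240 is not a breach)."""
    return los_minutes > TARGET_MINUTES


def share_within_target(los_values) -> float:
    """Proportion of patients who left the AED within the 4-hour target."""
    within = sum(1 for los in los_values if not is_breach(los))
    return within / len(los_values)


# Hypothetical lengths of stay in minutes, for illustration only.
sample_los = [35, 120, 239, 240, 241, 410]
share = share_within_target(sample_los)
meets_target = share >= TARGET_SHARE
print(round(share, 3), meets_target)
```

In a real sample the same proportion could be read directly from a frequency table of Breachornot.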
You have been asked by the AED management to use a sample of this data to help them describe to the hospital management the nature of the workload going through the AED, and the extent to which causes of patients breaching (or just staying in the AED a long time) can be identified.
Draw a random sample of 400 patients from this 4-week period and investigate your sample using SPSS.
- In no more than 6 pages, describe the main features of your sample as if to the hospital management. You should include the main features of individual variables and interesting relationships between them. You may include SPSS numerical and graphical output and/or quote values from your SPSS output. (The clarity and content of your report are both important.) [Worth about 75% of the marks for Q1]
- Using your sample as evidence, explain what you believe may be causes of patients breaching (or just staying in the AED a long time). [No more than 2 pages. Worth about 25% of the marks for Q1]
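The idea of a seeded random sample of 400 from the 8781 patients can be sketched in Python as follows. This is only an illustration of sampling without replacement from a fixed starting point; it uses Python's own generator, so it will NOT reproduce the sample that SPSS draws for the same starting point. The function name `draw_sample` is hypothetical, and 4322 is the example value quoted earlier.

```python
import random

POPULATION_SIZE = 8781   # patients in the 4-week data set
SAMPLE_SIZE = 400        # sample size required by the question


def draw_sample(seed: int) -> list:
    """Return 400 distinct row numbers (1-based), reproducible for a given seed."""
    rng = random.Random(seed)  # private generator, so the seed fully determines the sample
    return sorted(rng.sample(range(1, POPULATION_SIZE + 1), SAMPLE_SIZE))


rows = draw_sample(4322)  # example starting point from the instructions
print(len(rows), rows[:5])
```

Running `draw_sample` twice with the same seed returns the same rows, which is the reason for recording your starting point at the top of your answer.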
Question 2 [Worth 50% of the marks]
You should include key parts of your SPSS output in your answer. You must explain your answer clearly and you are limited to a maximum of 6 pages plus 3 pages of Appendix.
Matching production with demand is essential in modern-day manufacturing to avoid huge inventory costs. To achieve this, a good prediction of sales is important so that companies can avoid producing too much or too little. The BSW-Bens motorbike company is a major motorbike manufacturer in the UK. Their only custom-made model is also named after the company and has been produced since 1990. The task in this question is to build linear regression models to predict monthly sales using economic indicators of the UK as well as Google search queries. The data for this problem is contained in ‘BSW-Bens.csv’. Each observation in the file is a month, from January 2010 to February 2014. The variables are given in Table 1.
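As a reminder of what a single-predictor linear regression does, the following Python sketch fits a least-squares line and uses it to estimate sales at a given unemployment rate. The data values are invented for illustration and are NOT taken from ‘BSW-Bens.csv’; the function names are hypothetical, and in the coursework itself the fitting is done in SPSS.

```python
def fit_simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new):
    """Point estimate from the fitted line."""
    return intercept + slope * x_new


# Hypothetical illustration data (unemployment rate vs monthly sales).
unemployment = [5.0, 6.0, 7.0, 8.0]
sales = [900.0, 800.0, 700.0, 600.0]

intercept, slope = fit_simple_regression(unemployment, sales)
estimate = predict(intercept, slope, 9.0)
print(intercept, slope, estimate)
```

Note that an unemployment rate of 9 lies outside the range of the illustrative data (5 to 8), so the estimate is an extrapolation. Checking whether a scenario lies inside the observed range is one consideration when judging how far an estimate can be trusted.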
- (a) Using descriptive and graphical methods, conduct an exploratory analysis of the given data and generate some insights for the regression analysis. [Worth about 10% of the marks for Q2]
- (b) Use linear regression to determine which, if any, of unemployment, Google queries, CPI.All and CPI.Energy show evidence of a significant linear relationship with sales. Justify your answer. [Worth about 24% of the marks for Q2]
- (c) Which of the four models do you think is best? Justify your answer. Carry out a common-sense examination of the residuals for your chosen model and comment on your findings. [Worth about 6% of the marks for Q2]
- (d) Use your models from part (c) to estimate the value of sales for the following scenarios: (i) at unemployment rate 9, with 200 Google queries, with CPI.All 220 and CPI.Energy equal to 200; (ii) at unemployment rate 5, with 700 Google queries, with CPI.All 250 and CPI.Energy equal to 220. Explain your beliefs about the accuracy of your estimates as if to a sales manager of the company. [Worth about 10% of the marks for Q2]
- (e) Investigate as far as you are able the possibility of a multiple linear regression model using unemployment, Google queries, CPI.All and CPI.Energy as explanatory variables. Explain clearly what you believe your results are telling you. [Worth about 20% of the marks for Q2]
- (f) Modify your model in part (e) by modelling monthly seasonality. Compare and contrast the resulting model with the model in part (e). [Worth about 30% of the marks for Q2]
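One common way to model monthly seasonality in a regression is to add indicator (dummy) variables for the months, leaving one month out as the baseline. The Python sketch below shows only that coding step for the January 2010 to February 2014 period. The choice of January as the baseline and the function names are assumptions for illustration; the question does not prescribe a particular method.

```python
from datetime import date


def month_sequence(start_year, start_month, n_months):
    """Return the first day of each of n_months consecutive months."""
    months = []
    year, month = start_year, start_month
    for _ in range(n_months):
        months.append(date(year, month, 1))
        month += 1
        if month > 12:
            month = 1
            year += 1
    return months


def month_dummies(d, baseline=1):
    """Eleven 0/1 indicators, one per month except the baseline month (assumed January)."""
    return [1 if d.month == m else 0 for m in range(1, 13) if m != baseline]


# January 2010 to February 2014 inclusive is 50 monthly observations.
observations = month_sequence(2010, 1, 50)
rows = [month_dummies(d) for d in observations]
print(observations[-1], len(rows[0]))
```

Only eleven indicators are used because a twelfth would be perfectly collinear with the intercept. Each indicator's coefficient is then read as the difference in expected sales between that month and the baseline month, with the other explanatory variables held fixed.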