# Data Analysis Report

2/19/2015

Data Analysis Methods Mid-Term Project

Govind Ramchander, Vinay Gupta, Utkarsh Srivastava

MS-IS

Report:

Goal: The goal of this report is to study which factors affect the landing distance of a commercial flight, and how, so that the risk of a landing overrun can be reduced.

Approach: We have landing data for 800 commercial flights. We use it to analyse the factors and build a model that predicts landing distance from the other parameters in the data supplied. We follow the steps below to arrive at the final model.

1. Import the data from the CSV file.
2. Clean the data according to the following requirements:
   - Duration should always be greater than 40 minutes.
   - Ground speed should be between 30 mph and 140 mph.
   - Air speed should be between 30 mph and 140 mph.
   - Height should be at least 6 m.
   - Distance should be less than 6000 feet.
3. Examine the correlations between the variables in the data set.
4. Fit a multiple linear regression model.
5. Re-explore and re-model the data to find the parameters that most strongly affect the landing distance.

Result: We found that speed_ground and speed_air are strongly correlated. We therefore retained only speed_ground in our model, both because it was complete (i.e. it had no missing values) and to avoid multicollinearity. We then fitted a multiple linear regression model, assuming that distance is affected by all the remaining variables in the dataset. Based on our first iteration, firstmodel, we eliminated three factors, namely duration, no_pasg and pitch, because they did not significantly affect the response. In the next model we left these variables out and performed residual analysis on it to check its validity. The residuals followed a trend with respect to speed_ground, so we revised the model to include the squared value of speed_ground. Our final model, revisedmodel, showed