Chapter 2 Proposal
2.1 Research topic
Since 1987, Airlines have reported on-time performance for non-stop domestic flights to the U.S. Department of Transportation(DOT). Reporting was modified in 1995 to include reporting of mechanical delays. Later on, DOT formed an Air Carrier On-Time Reporting Advisory Committee in August 2000 to consider changes to the current on-time reporting system so that the public would have clear information about the nature and sources of airline delays and cancellations. In 2001, Bureau of Transportation Statistics(BTS) conducted a pilot program with four airlines to test the monthly reporting of causation. Beginning in June 2003, based on the result of the program, airlines started to report information on causes of delays in five broad categories: air carrier, extreme weather, national aviation system, late-arrival aircraft, and security.
Since that time, all the information has been published by the U.S. Department of Transportation monthly as a summary of airlines on-time performance, including cause of delays, in the Air Travel Consumer Report. So this project is meant to perform an exploratory data analysis on the variety of on-time and delay information published by the Bureau of Transportation Statistics to find underlying relationships between airline on-time performance and other factors.
2.2 Data availability
What data source we chose and why:
Initially, we found the Airline Reporting Carrier On-Time Performance DataSet on IBM. Then according to the original source listed on IBM, we traced the data source to the Bureau of Transportation Statistics(BTS). With quick googling about BTS, we realized that all statistics regarding transportations are published there, so we decided to use BTS as our data source. On the BTS website, there are tons of databases, since we are interested in airline on-time performance, we chose the Airline On-Time Performance Data as the data set for this project.
Institution that maintains the data source:
The Bureau of Transportation Statistics, part of The Department of Transportation, is the institution that maintains the data source.
How the data being collected:
The data are collected by carriers reporting information regarding on-time performance and causes of cancellations and delays for the flights they operate by month and year to The Department of Transportation.
Frequency of data updates and time range of data being considered for this project:
The Department of Transportation publishes a monthly summary of airline on-time performance. Thus the data set is updated monthly. The data has been collected since 1987 with the latest update in August 2022. Since the cause of delays was not reported to BTS until 2003, data before that will not be taken into consideration for this project.
Available data format:
The data can be downloaded from the BTS website with filters for which years, months, states and features to include. The downloaded data format is csv files.
How to import the data:
We need to download the data from the BTS website. It seems like we can only download data for one month at one time. Therefore, in order to analyze the trend between months and years, we need to combine the different months data together after downloading. Since the data format is csv files, we can import and read data through the read.csv() function in R.
Contact information for questions on data:
If we have any questions regarding the data, we can contact The Bureau of Transportation Statistics. The contact information is listed under the contact us page with office location, office hour schedule, and phone number.
Issues regarding the data:
There are no known issues relating to the data quality. However, the data set is extremely large, so it’s going to take a huge amount of time to download and to subset the data for better analysis in terms of efficiency.
Citations to the data source:
Link to data set description: https://www.transtats.bts.gov/Fields.asp?gnoyr_VQ=FGJ
Link to download page: https://www.transtats.bts.gov/DL_SelectFields.aspx?gnoyr_VQ=FGJ&QO_fu146_anzr=b0-gvzr