Another awesomely awful statistical analysis of airlines

Apparently it is the season of producing bad statistical interpretations of the airline industry. First there was the NY Times piece that failed quite notably to account for the primary factors of airfare pricing. And now we have the annual Airline Quality Report and the associated article, America’s Meanest Airlines.

With a headline like that you’re bound to get plenty of readers. What you do not get, however, is a particularly useful analysis of the underlying data. The authors of the AQR describe their research as an improvement on the subjective analysis that previously dominated the industry. They use objective measures instead. Sort of.

Using the Airline Quality Rating system of weighted averages and monthly performance data in the areas of on-time arrivals, involuntary denied boardings, mishandled baggage, and a combination of 12 customer complaint categories, airlines’ comparative performance for the calendar year of 2010 is reported.

The first three categories are definitely objective, though still skewed as noted below. That last category, not so much. After all, not only must there be a service problem but the customer must also know that they can file a complaint to the DoT and then figure out how to do so. Most customers simply don’t. So while it may be the best data we have, it is not great data in that category.

The researchers also conducted significant surveys to determine how important the 4 categories of statistics are to passengers. On a scale of 1-10 the four categories range between 7.17 and 8.63. That’s not a ton of range, though having a bit of weighting between them is better than nothing I suppose.

But my biggest beef with reporting the results as a meaningful measure of airline quality is that there is rarely a statistically significant difference between the values being compared. This is most noticeable in the "customer complaints" category, where the values range between 0.44 and 2.00 in this year’s results. That’s the number of complaints per 100,000 passengers. Suggesting that such a tiny variation across such a large user population matters is a specious claim, yet it represents nearly 20% of the value of the AQR score, sort of.

On-time performance has the broadest spread, from 75.7% to 92.5%, and also gets the highest weighting (and then some, as noted below).

The spread on mishandled baggage for mainline carriers is between 1.63 and 3.49 instances per 1,000 passengers. This can either be reported as one being twice as bad or both being incredibly good; it is all about how sensational you want the headline to be. And whether one regularly checks bags or not probably skews the value of this metric. Priority baggage handling (and no fees) for elite passengers also probably skews the importance of this factor more in recent years than in the past, though the weighting has remained constant.
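To put numbers on the "twice as bad or both incredibly good" point, here is a quick sketch using only the two baggage rates quoted above. The function and variable names are mine, and treating "1 minus the rate" as a success percentage is a simplifying assumption, since the figure counts reports per passenger rather than per checked bag.

```python
# Mishandled-bag reports per 1,000 passengers, as quoted in the post.
BEST_PER_1000 = 1.63
WORST_PER_1000 = 3.49


def relative_ratio(worst: float, best: float) -> float:
    """How many times higher the worst rate is than the best rate."""
    return worst / best


def success_percent(rate_per_1000: float) -> float:
    """Share of passengers (in percent) with no mishandled-bag report.

    Assumption: at most one report per passenger.
    """
    return 100.0 * (1.0 - rate_per_1000 / 1000.0)


ratio = relative_ratio(WORST_PER_1000, BEST_PER_1000)
best_ok = success_percent(BEST_PER_1000)
worst_ok = success_percent(WORST_PER_1000)

print(f"Sensational framing: worst is {ratio:.2f}x the best")
print(f"Boring framing: {best_ok:.3f}% vs {worst_ok:.3f}% of passengers had no problem")
```

Same two numbers, two very different headlines: a ratio a bit over two, or a gap of less than two tenths of a percentage point.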

The range for involuntary denied boarding is 0 to 2.26 instances per 10,000 mainline passengers. It would certainly suck if you’re one of the two, but it is not clear that there is sufficient statistical significance within that range to discern that one carrier is better than another.
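The three rate-based metrics are each reported against a different base (per 100,000, per 1,000 and per 10,000 passengers), which makes them hard to eyeball side by side. A small sketch that puts the quoted ranges on one common scale, events per million passengers, might look like this. The dictionary layout and names are my own; the numbers are the ones quoted in this post.

```python
# (lowest, highest, reporting base) for each rate quoted in the post.
RATES = {
    "customer complaints": (0.44, 2.00, 100_000),
    "mishandled baggage": (1.63, 3.49, 1_000),
    "involuntary denied boarding": (0.0, 2.26, 10_000),
}


def per_million(value: float, base: int) -> float:
    """Convert a rate reported per `base` passengers to a per-million rate."""
    return value * 1_000_000 / base


# Map each metric to its (low, high) range in events per million passengers.
normalized = {
    name: (per_million(low, base), per_million(high, base))
    for name, (low, high, base) in RATES.items()
}

for name, (low, high) in normalized.items():
    print(f"{name}: {low:,.1f} to {high:,.1f} per million passengers")
```

On a common scale the complaint numbers are tiny next to the baggage numbers, which is worth keeping in mind when each category gets a roughly similar weight.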

The most common category for customer complaints in each of the 12 months is "Flight Problems." This category covers "cancellations, delays and other deviations from schedule" which seems to be quite similar to on-time performance ratings. I happen to agree that getting where I’m going, hopefully close to on-time, is important, but not so much so that it is worth giving it extra weight above and beyond the established 8.63 weighting. With nearly 30% of the complaints falling in this category, schedule problems are effectively worth an extra 2.4 points.

The second highest ranking category of complaints in 10/12 months was baggage related. This is on top of the wholly dedicated missing baggage category in the formula. Similar to the on-time performance numbers above, this results in baggage receiving an extra 1.14 points of significance in the scoring.
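The double-counting arithmetic works roughly like this: the share of complaints that falls into a sub-category, multiplied by the weight of the complaint category, is the extra weight that topic picks up. The exact complaint weight is not quoted here, so the sketch below only assumes it lies somewhere in the 7.17 to 8.63 range mentioned earlier and shows the bounds that follow. The 0.30 share is the "nearly 30%" figure rounded up, and the function names are hypothetical.

```python
# Bounds on the category weights quoted earlier in the post.
WEIGHT_MIN = 7.17
WEIGHT_MAX = 8.63


def extra_points(share: float, complaint_weight: float) -> float:
    """Extra weight a topic picks up via its share of all complaints."""
    return share * complaint_weight


def implied_share(points: float, complaint_weight: float) -> float:
    """Complaint share that would produce the given extra weight."""
    return points / complaint_weight


# Flight problems: roughly 30% of complaints (assumed 0.30 here).
flight_low = extra_points(0.30, WEIGHT_MIN)
flight_high = extra_points(0.30, WEIGHT_MAX)

# Baggage: work backwards from the 1.14 extra points quoted above.
bag_share_low = implied_share(1.14, WEIGHT_MAX)
bag_share_high = implied_share(1.14, WEIGHT_MIN)

print(f"Flight problems add between {flight_low:.2f} and {flight_high:.2f} points")
print(f"Baggage share implied by 1.14 points: {bag_share_low:.1%} to {bag_share_high:.1%}")
```

The 2.4-point figure sits inside the computed band, and the baggage number implies that somewhere around 13% to 16% of complaints were baggage related, under these assumptions.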

At the end of the day, the fact that they’re using consistent calculations and mostly objective numbers means that tracking trends is viable. The question is what the value of those trends really is. Sadly, in this case, the reporting trends toward the sensational rather than toward identifying statistically significant differences. But these guys get a lot of press and the airlines who win tend to brag about it. I guess objective analysis is overrated.

Read the AQR report here (PDF!).