What would you do with 5 million award search data points?


I love good data. Taking huge chunks of information and trying to distill trends, patterns and links has always been interesting to me. And so I find myself wondering this afternoon just what to do with a massive batch of data related to airline award searches. See, for about the past year (probably longer, actually) I’ve had a tool available online to allow people to search for awards on Star Alliance carriers. And those searches each return some collection of data. Over time the data collected added up and I now realize that I have more than 5 million rows of search results available.

And now I cannot help but wonder what I should do with it. Also, I’m not entirely sure I know how to tease the data out into something useful.

Are there trends in when seats are released or booked? Are certain months or routes really more likely to have seats available? More likely to be searched on?

What else? What types of information do you want me to try to pull out of the data?

No promises, as I’m not entirely sure I know where to begin with the analysis, but I’m definitely willing to give it a shot if anyone has a suggestion of something that seems like a useful query to run.

Thoughts??


Seth Miller

I'm Seth, also known as the Wandering Aramean. I was bit by the travel bug 30 years ago and there's no sign of a cure. I fly ~200,000 miles annually; these are my stories. You can connect with me on Twitter, Facebook, and LinkedIn.

24 Comments

  1. Open source the data set? (unless you see some exceptional value in it and you think you could sell it to some travel-related website)

  2. Make it publicly available, and let crowdsourcing work its magic. You’ll probably be surprised by the different approaches or analyses taken by the hive mind.

  3. I’ve read a few recent posts about specific carriers tightening availability, particularly in premium cabins, and tightening industry-wide in general – something along those lines would be interesting. % of queries returning an available option by month and operating carrier?

  4. The problem with seeing “what is searched” is that your tool is a very non-representative sample. People have to know about your tool, then choose to use it over other options like airline websites or Award Nexus. That wouldn’t provide much information.

    Availability information, on the other hand, is much more useful. Saying “what percentage of queries from the USA to NRT showed availability” is helpful. Again, it’s not going to be spot-on representative because of who uses your site. That said, the Switchfly study that everyone quotes is so horrible that pretty much anything from you would be an improvement.

  5. I think a chart showing award availability % from 331 days out up to the day before departure would be useful. It is probably equally important to break out first class and business. And finally, can you group availability by region rather than by specific routes, which are sometimes not that helpful?

  6. I’m curious about the most common destinations searched. I’m convinced most frequent flyers look for tickets to Europe and Hawaii (again, not sure if your users are representative of most frequent flyers).

  7. I’m sure it’s worth more than any of those award travel reports published online.
    You get real people, looking for real flights with real results. It’s a great deal of information on where and when people want to travel and who makes it happen with seats available.

  8. 5m rows is still a very small data set and it’ll be limited for some of the broader analyses you could do.

    Email me if you’d like some insights. I currently manage mid-size DBs (~500 GB per DB). I average about 4m hits daily and analyze that data pretty heavily.

  9. Please do NOT make it publicly available unless you want to see consistent patterns of inventory dry up, like LH/LX F, which used to be a gimme. Great job building the set, but keep it to yourself. It’s in your own self-interest.

  10. Does your privacy policy cover the collection of this data and how it can be used or shared? Is it compliant with each individual country’s laws? Just curious. Wouldn’t want to see this data collection die on the vine due to legal/privacy concerns.

  11. The irregular frequency at which your data is sampled poses a unique problem. If it’s possible, I think it may be beneficial to assume a much lower sampling frequency and decimate the extra data points in between to get some periodicity. Then you could do some neat things (and more quickly). You can always look at the higher-resolution samples later if need be.

  13. Would be interesting to see what is the most common award travel route people search.

  14. I like the open source idea. At least a few of us have training in statistics and might have some good ideas.

    I’d start with a high-level analysis along the lines of North America to Asia. Look at the average rate of successful searches. If N.A. to Asia has high variation and N.A. to Europe has low variation, then break down N.A.-Asia into individual routes and see which ones are more successful than others. Put N.A.-Europe on hold.

  15. Since it’s Star Alliance exclusive…

    Something that helps us see which gateway route from the Star hubs has the best C/F avail.

    The users are self-selected as travel hackers, so might as well offer a cut of data that’s aimed at what they’re looking for.

    Doubt coach tickets to Orlando are a big set of this data. But lots of flights to FRA, ZRH, NRT…

  16. It’s hard to tell without seeing the data. I would suggest making it available too.

  17. Build a pretty map of city pairs and seats available by month. Maybe the visualization tools in R can do it.

  18. Alerts triggered: by airline, by specific route (NYC – LON), by transit region (transpacific / transatlantic / domestic). Those could be really intriguing. Thanks for the great work Seth!

  19. And on the flip side, rather than analyzing targeted alerts, it’d be interesting to know which routes are searched for the most. That should help with the demand-side analysis.

  20. @abcx makes a good point. S/he needs to keep all that award space open for a first class ticket to North Korea for reprogramming.

    As for me, I vote to “open source” the data set. Cool shit.
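
Two of the suggestions above, the share of searches returning availability by month and operating carrier, and thinning the irregular samples down to a regular interval, could be sketched roughly like this in Python. This is only a sketch under stated assumptions: the post does not describe the table layout, so the fields (search timestamp, carrier, route, seats found), the function names, and the sample rows are all made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows: (searched_at, operating carrier, route, seats_found).
# The real schema is not described in the post; these fields and values are made up.
SAMPLE_ROWS = [
    ("2012-01-03T09:15", "LH", "JFK-FRA", 2),
    ("2012-01-03T09:40", "LH", "JFK-FRA", 0),
    ("2012-01-17T14:05", "NH", "SFO-NRT", 0),
    ("2012-02-02T08:30", "LH", "JFK-FRA", 1),
    ("2012-02-09T19:45", "NH", "SFO-NRT", 4),
    ("2012-02-09T20:10", "NH", "SFO-NRT", 0),
]


def availability_by_month_and_carrier(rows):
    """Return {(YYYY-MM, carrier): share of searches that found at least one seat}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for searched_at, carrier, _route, seats_found in rows:
        key = (searched_at[:7], carrier)  # first 7 characters are YYYY-MM
        totals[key] += 1
        if seats_found > 0:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


def decimate_daily(rows):
    """Keep only the latest sample per (day, carrier, route) for a regular daily series."""
    latest = {}
    for row in sorted(rows, key=lambda r: datetime.fromisoformat(r[0])):
        searched_at, carrier, route, _seats = row
        latest[(searched_at[:10], carrier, route)] = row  # later rows overwrite earlier ones
    return list(latest.values())


if __name__ == "__main__":
    rates = availability_by_month_and_carrier(SAMPLE_ROWS)
    for (month, carrier), rate in sorted(rates.items()):
        print(f"{month} {carrier}: {rate:.0%} of searches found a seat")
    print(len(decimate_daily(SAMPLE_ROWS)), "rows after daily decimation")
```

Keeping the latest sample per day is just one way to decimate; averaging within each day, or picking a coarser interval such as a week, would be equally reasonable depending on how often each route is actually searched.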
