Fare dealing

Remind me again: what’s the purpose of opening up all this public data?

Ah yes, that’s it. To create value. And you can’t get a much stronger example of real value in the real world than showing people how to save money when buying train tickets.

Fare pricing is a fairly hit-and-miss business, as you’ve probably noticed. We don’t have a straight relationship between distance and price. Far from it.

The many permutations of route, operator and ticket type throw up some strange results. We hear of first class tickets being cheaper than standard, returns cheaper than singles, and you can definitely get a lower overall price by buying your journey in parts, provided that the train stops at the place where the tickets join.

The rules here are a bit weird: although station staff have an obligation to quote the cheapest overall price for a particular route, they aren’t allowed to advertise “split-fare” deals, even where they know they exist. Huh?

Why this distinctly paternalistic approach? Well, say the operators: if a connection runs late, your second ticket might not be eligible, and there might be little details of the terms and conditions of component tickets that trip you up, and, and, and…well, it’s all just too complicated for you. Better you get a coherent through-price (and we pocket the higher fare, hem hem).

There’s no denying it is complicated. Finding the “split-fare” deal you need is a tiresome, labour-intensive process of examining every combination of route, terms and price, and stitching together some sense out of it all. It also means taking on a bit of risk if some of those connections don’t run to time.

You might be lucky, and have an assistant who will hack through fares tables and separate websites to do that for you. But you’d really be wasting their time (and your money).

Because that sort of task is exactly what technology is good at.

Taking vast arrays of semi-structured data and finding coherent answers. Quickly. And if there’s some risk involved, making that clear. We’re grown-ups. We can cope.
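To make that concrete, here is a minimal sketch of the sort of search involved, assuming (and it is only an assumption) that you already hold a table of fares between pairs of stops on a single train. The station names, the prices in pence and the function name cheapest_split are all invented for illustration; real fares data carries restrictions, ticket types and validity rules that this ignores entirely.

```python
from math import inf
from typing import Dict, List, Tuple

# Hypothetical fares table: (origin, destination) -> price in pence.
Fares = Dict[Tuple[str, str], int]


def cheapest_split(stops: List[str], fares: Fares) -> Tuple[int, List[Tuple[str, str]]]:
    """Cheapest set of tickets covering stops[0] to stops[-1].

    `stops` is the ordered list of places the train actually calls at,
    so every split point is somewhere the train stops.
    """
    n = len(stops)
    best = [inf] * n   # best[j]: cheapest known price from stops[0] to stops[j]
    prev = [-1] * n    # prev[j]: index where the last ticket into stops[j] starts
    best[0] = 0
    for j in range(1, n):
        for i in range(j):
            price = fares.get((stops[i], stops[j]))
            if price is not None and best[i] + price < best[j]:
                best[j] = best[i] + price
                prev[j] = i
    if best[-1] == inf:
        raise ValueError("no combination of tickets covers the journey")
    # Walk back from the destination to recover the individual tickets.
    legs = []
    j = n - 1
    while j > 0:
        legs.append((stops[prev[j]], stops[j]))
        j = prev[j]
    legs.reverse()
    return best[-1], legs


# Invented example: four calling points and six fares.
stops = ["Alphaton", "Betaford", "Gammaby", "Deltamouth"]
fares = {
    ("Alphaton", "Deltamouth"): 5000,  # the through fare
    ("Alphaton", "Betaford"): 1500,
    ("Betaford", "Deltamouth"): 2800,
    ("Alphaton", "Gammaby"): 3000,
    ("Gammaby", "Deltamouth"): 2500,
    ("Betaford", "Gammaby"): 1000,
}
total, legs = cheapest_split(stops, fares)
saving = fares[("Alphaton", "Deltamouth")] - total
print(total, legs, saving)
# 4300 [('Alphaton', 'Betaford'), ('Betaford', 'Deltamouth')] 700
```

Every ticket after the first is a join that could go wrong, so a grown-up version would show the number of splits alongside the saving and let the passenger decide.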

There’s no doubt at all that the raw materials–the fares for individual journey segments–are public information. Nobody would ever want, or try, to hide a fare for a specific route.

So when my esteemed colleague Jonathan Raper (doyen of opening up travel-related information and making it useful, in his work at Placr and elsewhere) put his mind to the question of how new services could crunch up the underlying data to drive out better deals for passengers, I don’t doubt that some operators started to get very nervous indeed.

Jonathan got wind–after the November 2011 meeting of the Transport Sector Transparency Board–that a most intriguing piece of advice had been given by the Association of Train Operating Companies (ATOC) to the Department for Transport on the “impact of fare-splitting on rail ticket revenues”.

Well, you’d sort of expect an association which represents the interests of train operators to have a view on something that might be highly disruptive to their business models, wouldn’t you?

So what was that advice? He put in a Freedom of Information request to find out.

And has just had it refused, on grounds of commercial confidentiality.

This is pretty shocking–and will certainly be challenged, with good reason.

Perhaps more than most, I have some sympathy with issues of commercial reality in relation to operational data. We set up forms of “competition” between providers for contracts, and in order to make that real, it’s inevitable that some details–perhaps relating to detailed breakdowns of internal costs, or technical logistics data–might make a difference to subsequent market interest (and pricing strategy) were they all to be laid out on the table. I really do understand that.

But a fare is a fare. It’s a very public fact. It’s not hidden in any way. So what could ATOC have said to DfT that is so sensitive?

The excuse given by DfT that this advice itself is the sort of commercial detail that would prejudice future openness is, frankly, nonsense.

I look forward to the unmasking of this advice. And in due course to the freeing-up of detailed fares data.

And then to people like Jonathan and Money Saving Expert creating smart new business models that allow us to use information like it’s supposed to be used: to empower service users, to increase choice, and to deliver real, pound-notes value into the hands of real people.

That’s why we’re doing all this open data stuff, remember?

A bit more about train information

If you were reading my outpourings a year ago you may remember a distinct preoccupation with train operating information. In the great range of public-facing datasets out there, the ones that offer the very highest utility, in my opinion, are those about real-time and real-world things: a picture of what’s happening right now and in the near future.

Transport information, weather, location, revised opening hours, where things are etc. etc. Sure, there may be treasures to dig out from the big dumps of auditable history in other datasets, but when it comes to actually building things people will find useful, there are some targets which are clearly more promising than others. (It’s probably no coincidence that data about timetables, postcodes, maps, operating information and the like are those which are also the most commercially tangled. Value breeds impediments, it would seem.)

I wrote about the problems of there being different versions of the truth about train operation. I wrote it at a time when ice and snow were crippling normal running. So, unsurprisingly, I’m back to revisit what’s happened since.

My idea a year ago, born of frustration with inaccurate data systems (one could get a different answer from the web, the train station office, the train platform sign, apps and feedback from other travellers via Twitter, for example), was to rethink the way that trains are tracked and described in times of extreme disruption. I’m talking here about normal running disrupted to the point that existing timetables have become meaningless (and have been abandoned), where all trains are out of their normal positions, and the only meaningful data points that might relate to a particular physical train are its current physical location and its proposed calling points.

The notion of “Where’s my train?” was that if these basic data points were captured at the level of the train and made available as a feed, then in the event of utter chaos you would still be able to see the whereabouts of the next train going where you needed to go (even if you couldn’t do much about making it move with any predictability, or at all). Very much about the “where”, rather than the “when”.

This was a departure from information systems which relied on a forecast of train running (that abandoned timetable) or on a train having passed a particular point (for the monitoring of live running information). If trains had GPS tracking (which I heard they did) and the driver knew where the train would call (I was told this was generally the case) then a quantum of data existed which could drive such a feed.
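For what it’s worth, that quantum of data is small enough to sketch. What follows is purely illustrative: the record layout, the train identifiers, the coordinates and the function names are all my own assumptions, not anything an operator actually publishes. Each train reports only where it is and where it intends to call; the passenger asks which trains still plan to call at their station, nearest first.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import List, Tuple


@dataclass
class TrainReport:
    """One hypothetical feed record: the 'where', with no 'when' at all."""
    train_id: str
    lat: float
    lon: float
    calling_points: List[str]  # stops still to come, as entered by the driver


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (haversine) distance, taking the Earth's radius as 6371 km."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def trains_for(station: str, station_pos: Tuple[float, float],
               reports: List[TrainReport]) -> List[Tuple[str, float]]:
    """Trains intending to call at `station`, nearest first, as (train_id, km)."""
    matches = [
        (r.train_id, distance_km(r.lat, r.lon, station_pos[0], station_pos[1]))
        for r in reports
        if station in r.calling_points
    ]
    return sorted(matches, key=lambda m: m[1])


# Invented reports: two trains heading for "Y", one heading elsewhere.
reports = [
    TrainReport("T1", 51.50, -0.10, ["X", "Y"]),
    TrainReport("T2", 51.45, -0.10, ["Y"]),
    TrainReport("T3", 51.40, -0.10, ["Z"]),
]
nearby = trains_for("Y", (51.40, -0.10), reports)
for train_id, km in nearby:
    print(f"{train_id}: {km:.1f} km away")
```

Note what this does not tell you: straight-line distance says nothing about whether the track in between is passable, and the calling points are only as good as the last thing the driver entered.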

It didn’t get very far. I talked through the principles with operational staff in two train operating companies. One blockage was the usual one in contingency planning: such extreme disruptions are simply not common enough to warrant the development of a specific response. In addition, that certainty about where the train was going didn’t seem that certain, after all. Drivers would punch in their intended stops, but right up to and even beyond the point of departure this could change. Better information, intended to give comfort to the stranded, might be replaced with false hope and ultimately do more harm than good. And the nagging thought I’d had originally remained: how much use was it really to know there was a train four miles up the track which was going to your stop, if it was quite possible that the points between you and it had no chance of being unfrozen?

So, no more of that for the moment. There’ve been other developments. The live running information now seems to be much more accurate. The release of that and other information, such as timetables, is now a political football – should it remain a commercial asset of the train operators to be resold, and controlled, under licence, or are there greater benefits in releasing it to all who may make use of it?

I’m fairly sure that the data will be freely available eventually. There is some sterling work going on from innovators who are banging on the door of the information holders to make better use of it (all Malcolm’s recent posts make fascinating reading). To my mind there is a big difference between the commercial position of providing a physical service, and the commercialisation of the information that describes and records the performance of that service. But we won’t have reached the end of this story until that information is dependable. And at the moment, it’s not. You can still see differences between web information, feed information via an app, and trackside information. All were different on my line a day or so ago.

Yet the teasing paradox is that there is only one ‘truth’ at a particular point in time about a train’s running, even if it may then vary over time. And sometimes, as I experienced last Tuesday evening in South London, that truth is “we don’t know where this train is going”. In extreme circumstances, when lines are blocked and points jammed, I might have known where my train was (I was sitting in it) but I had as much idea as any of the crew (i.e. none) where it was going.

Distributing and presenting this information is far from being a trivial task. I don’t know the details of the architecture behind train information systems. But I can postulate that there are many different models by which highly volatile information, in thousands of different places, could be brought together, indexed, shared, distributed and so on. And they’re all pretty complicated. Before you blithely say “well, they just need to update a single database”, think what that might mean.

The track record (sorry) of mega-aggregation isn’t great. Without doubt it’s been attempted before in this area. Perhaps linked data mark-ups of distributed information sources hold the answer? I’d be interested in any thoughts on this.

But I’m clear about this:

the sorry position in late 2010 that it is still a matter of detective work and guesswork to find out where a large, highly-tracked piece of public-service equipment is, when it’s coming, and where it’s going, cannot be allowed to continue.

I’m fairly forgiving of physical failures in cases of extreme disruption – there is much real-world complexity that lies behind simple-sounding services. But on information failures? We can, and must, do better.