honestlyreal

A bit more about train information

If you were reading my outpourings a year ago you may remember a distinct preoccupation with train operating information. In the great range of public-facing datasets out there, the ones that offer the very highest utility, in my opinion, are those about real-time and real-world things: a picture of what’s happening right now and in the near future.

Transport information, weather, location, revised opening hours, where things are, and so on. Sure, there may be treasures to dig out from the big dumps of auditable history in other datasets, but when it comes to actually building things people will find useful, there are some targets which are clearly more promising than others. (It’s probably no coincidence that data about timetables, postcodes, maps, operating information and the like are also the most commercially tangled. Value breeds impediments, it would seem.)

I wrote about the problems of there being different versions of the truth about train operation. I wrote it at a time when ice and snow were crippling normal running. So, unsurprisingly, I’m back to revisit what’s happened since.

My idea a year ago, born of frustration with inaccurate data systems (one could get a different answer from the web, the train station office, the train platform sign, apps and feedback from other travellers via Twitter, for example), was to rethink the way that trains are tracked and described in times of extreme disruption. I’m talking here about normal running disrupted to the point that existing timetables have become meaningless (and have been abandoned), where all trains are out of their normal positions, and the only meaningful data points that might relate to a particular physical train are its current physical location and its proposed calling points.

The notion of “Where’s my train?” was that if these basic data points were captured at the level of the train and made available as a feed, then in the event of utter chaos you would still be able to see the whereabouts of the next train going where you needed to go (even if you couldn’t do much about making it move with any predictability, or at all). Very much about the “where”, rather than the “when”.

This was a departure from information systems which relied on a forecast of train running (that abandoned timetable) or on a train having passed a particular point (for the monitoring of live running information). If trains had GPS tracking (which I heard they did) and the driver knew where the train would call (I was told this was generally the case) then a quantum of data existed which could drive such a feed.
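To make that quantum of data concrete, here is a minimal sketch of what one report in such a feed might carry, and how a stranded traveller’s app might query it. Everything in it is my own assumption for illustration: the `TrainReport` record, its field names, the `trains_calling_at` helper, and the sample trains and coordinates are all made up, and none of it describes a real operator’s system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TrainReport:
    """One report in a hypothetical per-train feed (all field names are assumptions)."""
    train_id: str
    latitude: float   # current position, e.g. from on-board GPS
    longitude: float
    # Remaining stops the driver currently intends to make, in order.
    calling_points: List[str] = field(default_factory=list)


def trains_calling_at(reports: List[TrainReport], station: str) -> List[TrainReport]:
    """Return the reports whose remaining calling points include `station`."""
    return [r for r in reports if station in r.calling_points]


# Illustrative feed: two made-up trains with made-up positions.
feed = [
    TrainReport("A", 51.46, -0.17, ["Clapham Junction", "Balham", "Streatham Common"]),
    TrainReport("B", 51.49, -0.14, ["Victoria"]),
]

# "Where's my train?": show where each train heading to my stop is right now.
for report in trains_calling_at(feed, "Balham"):
    print(report.train_id, report.latitude, report.longitude)
```

Note that there is no time field at all: the sketch answers “where”, not “when”, and it is only as trustworthy as the calling points the driver has entered.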

It didn’t get very far. I talked through the principles with operational staff in two train operating companies. One blockage was that such extreme disruptions are so rare that the usual contingency-planning problem arises: they are just not a common enough occurrence to warrant the development of a specific response. In addition, that certainty of where the train was going didn’t seem that certain, after all. Drivers would punch in their intended stops, but right up to and even beyond the point of departure this could change. Better information, intended to give comfort to the stranded, might instead offer false hope and ultimately do more harm than good. And the nagging thought I’d had originally remained: how much use was it really to know there was a train four miles up the track which was going to your stop, if it was quite possible that the points between you and it had no chance of being unfrozen?

So, no more of that for the moment. There’ve been other developments. The live running information now seems to be much more accurate. The release of that and other information, such as timetables, is now a political football – should it remain a commercial asset of the train operators to be resold, and controlled, under licence, or are there greater benefits in releasing it to all who may make use of it?

I’m fairly sure that the data will be freely available eventually. There is some sterling work going on from innovators who are banging on the door of the information holders to make better use of it (all Malcolm’s recent posts make fascinating reading). To my mind there is a big difference between the commercial position of providing a physical service, and the commercialisation of the information that describes and records the performance of that service. But we won’t have reached the end of this story until that information is dependable. And at the moment, it’s not. You can still see differences between web information, feed information via an app, and trackside information. All were different on my line a day or so ago.

Yet the teasing paradox is that there is only one ‘truth’ at a particular point in time about a train’s running, even if it may then vary over time. And sometimes, as I experienced last Tuesday evening in South London, that truth is “we don’t know where this train is going”. In extreme circumstances, when lines are blocked and points jammed, I might have known where my train was (I was sitting in it) but I had as much idea as any of the crew (i.e. none) where it was going.

Distributing and presenting this information is far from being a trivial task. I don’t know the details of the architecture behind train information systems. But I can postulate that there are many different models by which highly volatile information, in thousands of different places, could be brought together, indexed, shared, distributed and so on. And they’re all pretty complicated. Before you blithely say “well, they just need to update a single database”, think what that might mean.

The track record (sorry) of mega-aggregation isn’t great. Without doubt it’s been attempted before in this area. Perhaps linked data mark-ups of distributed information sources hold the answer? I’d be interested in any thoughts on this.

But I’m clear about this:

the sorry position in late 2010 that it is still a matter of detective work and guesswork to find out where a large, highly-tracked piece of public-service equipment is, when it’s coming, and where it’s going, cannot be allowed to continue.

I’m fairly forgiving of physical failures in cases of extreme disruption – there is much real-world complexity that lies behind simple-sounding services. But on information failures? We can, and must, do better.