Fare dealing

Remind me again: what’s the purpose of opening up all this public data?

Ah yes, that’s it. To create value. And you can’t get a much stronger example of real value in the real world than showing people how to save money when buying train tickets.

Fare pricing is a fairly hit-and-miss business, as you’ve probably noticed. We don’t have a straight relationship between distance and price. Far from it.

The many permutations of route, operator and ticket type throw up some strange results. We hear of first class tickets being cheaper than standard, returns cheaper than singles, and you can definitely get a lower overall price by buying your journey in parts, provided that the train stops at the place where the tickets join.

The rules here are a bit weird: although station staff have an obligation to quote the cheapest overall price for a particular route, they aren’t allowed to advertise “split-fare” deals, even where they know they exist. Huh?

Why this distinctly paternalistic approach? Well, say the operators: if a connection runs late, your second ticket might not be eligible, and there might be little details of the terms and conditions of component tickets that trip you up, and, and, and…well, it’s all just too complicated for you. Better you get a coherent through-price (and we pocket the higher fare, hem hem).

There’s no denying it is complicated. Precisely how to find the “split-fare” deal you need is a tiresome, labour-intensive process of examining every route, terms and price combination, and stitching together some sense out of it all. And, indeed, in taking on a bit of risk if some of those connections don’t run to time.

You might be lucky, and have an assistant who will hack through fares tables and separate websites to do that for you. But you’d really be wasting their time (and your money).

Because that sort of task is exactly what technology is good at.

Taking vast arrays of semi-structured data and finding coherent answers. Quickly. And if there’s some risk involved, making that clear. We’re grown-ups. We can cope.
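To make that concrete, here is a minimal sketch of the sort of search involved. It assumes you already have a table of fares between calling points on a single train. The station names A to D, the prices and the function name `cheapest_split` are all invented for illustration and are not real fares data. It only ever joins tickets at stops the train actually calls at, and it knows nothing about ticket terms or late-running connections, so the risk still has to be spelled out separately.

```python
from typing import Dict, List, Tuple

Fare = Dict[Tuple[str, str], float]


def cheapest_split(stops: List[str], fares: Fare) -> Tuple[float, List[Tuple[str, str]]]:
    """Cheapest set of tickets covering stops[0] -> stops[-1].

    Tickets may only join at calling points, in order along the route.
    Returns (total price, list of (from, to) tickets).
    """
    n = len(stops)
    best = [float("inf")] * n  # best[j]: cheapest known price to reach stops[j]
    prev = [-1] * n            # prev[j]: index where the last ticket to stops[j] starts
    best[0] = 0.0

    for j in range(1, n):
        for i in range(j):
            fare = fares.get((stops[i], stops[j]))
            if fare is not None and best[i] + fare < best[j]:
                best[j] = best[i] + fare
                prev[j] = i

    # Walk backwards from the destination to recover the tickets used.
    tickets: List[Tuple[str, str]] = []
    j = n - 1
    while j > 0 and prev[j] != -1:
        tickets.append((stops[prev[j]], stops[j]))
        j = prev[j]
    tickets.reverse()
    return best[-1], tickets


# Invented example: one train calling at A, B, C and D.
stops = ["A", "B", "C", "D"]
fares = {
    ("A", "D"): 60.0,  # the through fare
    ("A", "B"): 12.0,
    ("B", "C"): 20.0,
    ("B", "D"): 30.0,
    ("A", "C"): 40.0,
    ("C", "D"): 25.0,
}

total, tickets = cheapest_split(stops, fares)
print(total, tickets)  # 42.0 [('A', 'B'), ('B', 'D')]
```

With these made-up numbers the through fare is 60.0, but splitting at B comes to 42.0. The real job is harder mainly because the fares table is vast and the ticket types carry conditions, not because the search itself is clever.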

There’s no doubt at all that the raw materials–the fares for individual journey segments–are public information. Nobody would ever want, or try, to hide a fare for a specific route.

So when my esteemed colleague Jonathan Raper–doyen of opening up travel-related information and making it useful–in his work at Placr and elsewhere, put his mind to the question of how new services could crunch up the underlying data to drive out better deals for passengers, I don’t doubt that some operators started to get very nervous indeed.

Jonathan got wind–after the November 2011 meeting of the Transport Sector Transparency Board–that a most intriguing piece of advice had been given by the Association of Train Operating Companies (ATOC) to the Department for Transport on the “impact of fare-splitting on rail ticket revenues”.

Well, you’d sort of expect an association which represents the interests of train operators to have a view on something that might be highly disruptive to their business models, wouldn’t you?

So what was that advice? He put in a Freedom of Information request to find out.

And has just had it refused, on grounds of commercial confidentiality.

This is pretty shocking–and will certainly be challenged, with good reason.

Perhaps more than most, I have some sympathy with issues of commercial reality in relation to operational data. We set up forms of “competition” between providers for contracts, and in order to make that real, it’s inevitable that some details–perhaps relating to detailed breakdowns of internal costs, or technical logistics data–might make a difference to subsequent market interest (and pricing strategy) were they all to be laid out on the table. I really do understand that.

But a fare is a fare. It’s a very public fact. It’s not hidden in any way. So what could ATOC have said to DfT that is so sensitive?

The excuse given by DfT that this advice itself is the sort of commercial detail that would prejudice future openness is, frankly, nonsense.

I look forward to the unmasking of this advice. And in due course to the freeing-up of detailed fares data.

And then to people like Jonathan and Money Saving Expert creating smart new business models that allow us to use information like it’s supposed to be used: to empower service users, to increase choice, and to deliver real, pound-notes value into the hands of real people.

That’s why we’re doing all this open data stuff, remember?

Neither one thing nor the other

In which I look more closely at one particular, well-known data set: what makes it what it is, and what we might draw from the way it’s managed to help us with some other challenging questions about privacy and transparency.

Surely data is open, or it isn’t?

(I’m using “open” here as shorthand for the ability to be reached and reused, not with any particular commercial or licensing gloss. It’s a loaded term. But let’s not snag on it at the beginning, hey?)

Data is either out there, on the internet, without encryption or paywall, or it isn’t. And if it is, then that’s that. Anyone can reach it, rearrange it or republish it, restrained or hampered only by such man-made contrivances as copyright and data protection laws.

Maybe. Maybe not.

I’ve been involved in some interesting discussions recently about the tricky issues surrounding the publication of personal data. By that, I mean data which identifies individuals. To be specific: some of the information in the criminal justice sector about court hearings, convictions and the like.

You’ll have seen much in the press, especially following the riots, about a renewed political and societal interest in this type of publication.

Without making this post all about the detailed nuances of those questions, this broader issue about the implications of “open” publication seems to me to need a bit more exploration before we can sensibly make judgements about such cases.

And to do that I took a close look at one very well-known data set: the electoral register.

What is it? Well, it’s a register of those who’ve expressed their entitlement, being over 18 (or about to be) and otherwise eligible, to vote in local and national elections, through returning a form sent to them by their council each year. If you’re reading this, you’re probably on it. I am.

It’s therefore not: a complete list of people in the UK (or even of those entitled to vote); a citizenship register; a census; a single, master database of everyone; accurate; or a distillation of lots of big government systems holding personal information.

What’s it for? An interesting question. I suppose its primary existence is to support the validation of those entitled to vote, at and around election time. But you’ll know, if you have voted, that it’s more of an afterthought to the actual process; most people show up with polling cards in hand, and anyway, there’d be no possibility of any real form of authentication, as the register doesn’t contain signatures, photos, privileged information or any other usable method of assurance. It’s not even concealed from view. (More on that here.)

But it does some other things, doesn’t it? It provides a means for political candidates to be able to make contact for canvassing purposes with their electorate. And I suppose, for that reason, it has this interesting status as a “public document”. Which we’ll come back to in a moment.

And to complete the picture, a subset of it (the “edited register”) is also sold to commercial organisations for marketing purposes, enabling them, amongst other things, to compile pretty comprehensive databases of people.

…and as a byproduct of that it also forms an important part of credit-checking processes–with said commercial organisations able to offer services, at a price, to anyone who wants to run a check that at least someone claiming to have X name has at some point claimed to live at Y address. (Remember, it’s all pretty weak information really, self-asserted with no comprehensive checking process.) You can opt out of the edited register if you choose, but you’re included by default.

[Update 2 Oct: Matthew, below, comments that I’m not quite right here–the full register is also available to be used for credit checking]

There’s probably more, but let’s get stuck into some of this.

First off, I will happily add that the whole business of why it needs to be public at all seems highly questionable. And I don’t remember the public debate where we all thought that it was a great idea to try and make a few quid off the back of this potentially highly-sensitive data? Do you? How do you feel about that?

And the idea that the process of democracy would be terminally hampered were candidates, agents and parties not able to make checklists of who’d been canvassed? Really? Couldn’t they perhaps just knock on doors anyway? As a potential representative would I only be willing to learn from encountering those who had a vote? I suggest not.

So, moving on past those knotty questions about “why do we have it, and why do we sell it?”, we have in practice established some conventions about managing it as “a public document”.

Can I, as a member of the public, request a copy be sent to me? Certainly not. Ok, perhaps I can download it then? Nope. Search it online? Hell no.

I can go and see it in my local library.

So I did.

I heartily recommend you do the same. It is a real eye-opener in terms of the idea of data being “semi-public”.

I trotted up to the (soon-to-be-closed [boo hiss]) information desk at the library under Westminster City Hall.

–Can I see the electoral register please?

–Sure. We only have the edited version here: if you want the whole thing, you have to go through there and ask for Electoral Services.

(He pointed at a forbidding and not-at-all-public-looking door).

–You’re ok, I’ll just have a look at this one.

And out from the back window-ledge comes a battered green lever-arch file, containing bundles of papers.

–You know how to use this? he says

I shake my head. It seems the top bundle of papers is a street index. The personal information (names grouped by cohabitation, basically) is listed by street, then house name/number within street. Not by names.

So, you can’t, easily, find someone you’re stalking. (Did I say that? I mean, “whose democratic participative standing you have a legitimate interest in establishing.”)

But you can if you’re patient. Or if their name, like that of one Mr Portillo, leaps off the page at you. I intentionally chose the register of the area immediately around the Houses of Parliament, for just this reason. Curiously, I couldn’t actually find the HoP itself listed, but Buckingham Palace does have over 50 registered voters (none of whom are called Windsor.)

But back to the process: as I picked up the box to head towards an empty desk a finger came down on the lid: –you have to read it here, he says.

I look at the lid. Wow.

I ask the question about photocopying anyway, just to judge the reaction. Kitten-killer, his eyes say.

But I take it a few paces away anyway and have a closer look.

Fascinating. I see a bunch of well-known people from industry and politics, their home addresses, and who else lives with them.

I’m sure I’d go grey in chokey if I actually published unredacted screenshots in this post, but I’m pretty sure this one will be ok; if nothing else I think its historical interest justifies it… (RIP, Brian.)

Now, in all the fuss we make about child benefit claimant data being mislaid via CD, and in all the howling we make about anonymisation of health records and other sensitive data, and through all the fog that surrounds the commercialisation of public information and the Public Data Corporation etc., isn’t this the sort of information that we would normally expect to be the subject of an enormous public debate about even its very existence? And I’m walking off the street and making notes of it, and, and…

And I can see what’s happening here.

Yes, it’s “public”. Sort of. But so much friction has been thrown in the way of the process–from the shirty look as I have the temerity to request it, to the deliberate choice over structure that minimises me being able to quickly find my target–that I would strongly argue it to be “semi-public” rather than public.

There are some important lessons here perhaps when considering the mode, and the consequence, of publishing data online. Clearly, structure is highly relevant. If I am able to sort, and index it, that instantly creates a whole universe of permanent, additional consequences. Not all of which may be that desirable. “A perpetual, searchable, SEO-friendly database of all those ever summoned to court, convicted or not, you say? Certainly sir…coming right up.”
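A small sketch shows how little it takes to remove that friction once the same records are in machine-readable form. Everything here is invented: the streets, the names, and the helper names `find_by_scanning`, `build_name_index` and `cohabitants` are hypothetical, and this is not how any real register is stored. The paper file is ordered by street, so finding a person means reading every line; one pass to build an index turns that into a direct lookup, and the cohabitation grouping comes along for free.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (street, house, name) in street order, like the paper bundles. All invented.
register: List[Tuple[str, str, str]] = [
    ("Example Street", "1", "Smith, Alice"),
    ("Example Street", "1", "Smith, Bob"),
    ("Example Street", "2", "Jones, Carol"),
    ("Sample Road", "7", "Smith, Alice"),
]


def find_by_scanning(records, name: str) -> List[Tuple[str, str]]:
    """The library method: read every entry until the name turns up."""
    return [(street, house) for street, house, n in records if n == name]


def build_name_index(records) -> Dict[str, List[Tuple[str, str]]]:
    """One pass over the data and the street ordering no longer protects anyone."""
    index = defaultdict(list)
    for street, house, name in records:
        index[name].append((street, house))
    return dict(index)


def cohabitants(records, street: str, house: str) -> List[str]:
    """Everyone listed at the same address."""
    return sorted(n for s, h, n in records if s == street and h == house)


index = build_name_index(register)
print(index["Jones, Carol"])                         # [('Example Street', '2')]
print(cohabitants(register, "Example Street", "1"))  # ['Smith, Alice', 'Smith, Bob']
```

The data is identical in both cases. What changes is the cost of asking the question, which is the whole point about structure.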

If I’m able to relate information–by association with others–I can also help the cause of those wishing to track someone or something down. Look at Facebook. It does a great job of finding people you search for, even those with very common names amongst its hundreds of millions of accounts, by this type of associative referencing. Powerful stuff.

And let’s not forget that ALL this information is pretty easily available online anyway. You just have to pay for it. The best-known provider that I’ve looked at, 192.com, has an interesting model. You’ll be giving them at least a tenner, and more like £30, to buy some credits to search their databases. And they have the ominous rider that their really sexy information–the historic registers–is only available at an entry-level price of £150 a year. For that reason, I haven’t actually given them a penny as yet. But it’s no obstacle to the serious stalker. I mean, researcher.

I’m sure there are all sorts of impediments, from download limits to penalties for misuse, that attempt to put further spokes in the wheel of it becoming a common commodity. But how long, really, before the whole register is available as a torrent on the Pirate Bay? Maybe it is already?

And we’re not bothered about this? It’s amazing, isn’t it? Yes, this whole industry is built on data that we’re required to submit to public authorities–and if we don’t, we’re disenfranchised.

This is a scandal, and one that urgently needs review.

But do take away the point that there is such a concept as “semi-public” – at least for now. It’s the ability to process, to restructure, to index, that makes online data different from those box files in the library.

The friction we throw into the system, whether it’s (intentionally?) releasing information via pdf, or slipping a local journalist a hand-written note of the names of those in court, is perhaps more than just dumb intransigence in the face of “information that wants to be free”. And it can serve some potentially legitimate social purposes.

Think how you’d feel if those frictions weren’t there around the electoral roll? Even the money that 192.com require for you to buy back the data you gave up in the first place?

Happy that every comment you made online under your own name, every mention in the press, could be traced back to your real address along with the names of your (18+) family? I think perhaps not.

So, a very big public debate is required on the consequences of any personal data being put online. But remember, stealthily or not, we’ve had experience of these issues for years. We just need to look on the library window-ledges to find it.

A question of trust

In seeking an antidote to the selfish ravings of Somalia-bound Liz Jones (I’m not linking. You’ll work it out, but I don’t suggest you try too hard), a kind soul pointed me towards the wise words of Barry Schwartz on society’s loss of wisdom. It’s a great piece: one of those tub-thumping, uplifting TED talks that gets you nodding and waving along with his thesis. Whooping, even.

Basically, he says we’ve dispensed with our humanity in our quest for efficiency and profit. The wrong things are being measured. What really counts in any public-facing service is an appreciation of the softer aspects of, well, human interaction. We’ve lost the wisdom that gives us sensible decision-making, discretion and the ability to “get” all this. Perhaps not “lost”, as much as “designed-out”, in order to please all sorts of other gods.

What’s not to like? How could he possibly be wrong?

There he is, pointing to the job description of the janitor who has a whole load of specified tasks to perform. Mop the floor. Straighten the curtains. Swab the sink. But nowhere, nowhere, does it say: “Be nice to people. Be human. Be flexible.” (In a really perverse way, Bonkers Liz was saying something similar. But from a position of ignorance and vacuous moral bankruptcy, so basically, she can fuck right off.)

And, one might argue, does a job description need to spell out the requirement to be nice? I don’t know, perhaps it would make some difference if it were written down? I’m not convinced.

In the murky world of measurability and management, what does it even mean, anyway? If you put your cleaning out to tender, and one company comes back with a price that’s 10% higher than their competitor, but they promise to smile a lot more at people, and leave a bit of cleaning until tomorrow if someone really just needs a nice chat instead…what then?

Because when you do start buying into this idea, and go down the road of rewarding the soft stuff like satisfaction and happiness, all sorts of strange things are going to happen.

Only last month I heard tales from a friend whose former employer was very keen for staff to “revisit” customer surveys that weren’t high enough, point out to the customer that their personal bonuses were connected to the score, emphasise that the survey wasn’t the place for all their woes with the company to be vented, and see if they couldn’t nudge it up a couple of points. Seriously.

You get what you measure, remember?

Or rather, you get the measurements that lead to a benefit for the person being measured.

And there’s a double-edged sword in all of this. Mr Schwartz and his cheering audience are doing a great TED-style job of assuming good intent. They’re thinking of all the upside that comes from freeing people up to be a bit nicer. Like that extra latitude to go and make a cup of tea for Mrs Jones through being given a bit of slack on the amount of loo-scrubbing they have to do.

They’re probably not thinking of the janitor who is a living misery to the people around him, but who, when challenged, points to the mopped floor, the straight curtains, the swabbed sink… Fancy taking on that performance review? Substituting the subjective judgements of whether someone “has the right attitude” for the hard measures of dustiness or shine? Subjectivity that puts feudalistic power back in the hands of managers who can bully or fire pretty much at will? Always a trade-off, isn’t there?

One person’s empowered janitor is another person’s slacker-in-waiting. One person’s disability benefit is another’s disempowering handout. One banker’s justified performance bonus is…ok, perhaps that’s too far.

But it’s just Red vs Blue. The eternal debate. Centralise, decentralise. Liberate, control. Trust, assure.

Reds are great at spending someone else’s money. Blues think that pain is a far better motivator.

Trust. Trust. It all really comes down to trust. And so much of trust is based on visibility.

What we decide, what we believe, is based on what we see. The stories we’re told. And here there is an asymmetry. Negative stories travel fast, and easily become powerful myths. If conservative forces don’t believe, deep down, in public service provision at all, that will drive the narrative.

Transparency means that we get a lot more narrative. Blue editors have no end of material, and mass-consumption platforms on which to put it, to propagate Schwartz’s death of wisdom. And when they also claim to be willing to wave aside protocol and contract to “do the right thing”, the dissonance can be shocking.

I’ll end this by mentioning a fantastic piece by Onora O’Neill, one of the most enlightened people it’s my pleasure to know. She thinks rather harder about these things than most. Join her.

The trust paradox

Although we think that “being open” will increase trust and transparency, the reverse is more likely.

I came to this paradoxical conclusion after reading an interesting piece on perverse economics [link; but summarised here to save you jumping around]: why the decreasing cost of something over time doesn’t mean that overall expenditure on it is reduced; instead, usage goes up at a relatively greater rate, and therefore so does overall expenditure.

It was first formally proposed by William Stanley Jevons in relation to coal production in the c19th and has been applied to lots of other resources including, in that linked piece, the cost of computers. Now I’m thinking about it in relation to the issues of trust in our public services and government.
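To put some numbers on the effect described above, here is a tiny worked example. The figures are invented purely to show the arithmetic and are not taken from Jevons or from the linked piece: the unit price halves, usage triples, and total expenditure goes up rather than down.

```python
def expenditure(unit_price: float, units_used: float) -> float:
    """Total spend is simply price per unit times the number of units used."""
    return unit_price * units_used


# Invented figures: the price halves, but usage triples.
before = expenditure(unit_price=10.0, units_used=100.0)
after = expenditure(unit_price=5.0, units_used=300.0)

print(before, after)   # 1000.0 1500.0
print(after / before)  # 1.5
```

Spend only falls if usage grows by less than the price drops; here usage grew by more, so the bill went up by half.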

We express a wish for our politicians to be more open—to share more about the detail of their lives, and not just at the lobbyist-lunching, shady-room-negotiating level. About them as people. We have social media and other channels now that make it faster and easier to do so. The boundary between their (and our) public and private lives gets fuzzy. We love this, when we see it serving our interests.

We have more direct access to our representatives. We can exchange a few words with a government minister via Facebook updates, or hear an opinion from the front bench even before the House does. We love that we can do this with our celebrities too, and we perhaps blur the categories at times. It’s all “public interest”, and the more open the better, hey?

And then things go wrong. With wholly predictable regularity. A public figure says something they shouldn’t. Perhaps something careless, a bit dumb, or misinformed, or—indeed—showing up actual malpractice in either a professional or personal capacity. The resources of a 100-hour working week, 200-mile commuting MP with a family and private life to manage are suddenly matched against sharp-eyed and keen-witted bloggers sitting at home with hours to spend forensically dissecting every statement, every inconsistency. And with no incentive to preserve any of those category boundaries, especially between professional and personal capacity. MPs are there to be kicked, particularly if they’re not of your favourite political colour.

You probably know the sort of thing I mean. The MP may not be whiter than white. But it was always our delusion that they ever would be. They are human. And they’ll get filleted in what amounts to asymmetric warfare. Openness goes up. Honesty and dishonesty are revealed. We amplify the dishonesty and ignore the rest. And trust goes down.

There are similar arguments at play with openness in relation to published data. Throwing everything over the wall creates the appearance of transparency. Surely it must increase our trust? But like a good astrologer we’ll expertly search for the material that confirms our thesis, and glide swiftly past the rest. And I’m not necessarily talking here about material that is genuinely in the public interest: the big fraud, the unambiguous cover-up—I’m talking about the trivial, the amusing, the petty contradictions that arise when serving many complex interests at the same time. The sieve that’s required to separate the two is a rare thing indeed.

Openness goes up. Trust goes down.

There are two ways this effect could be countered: by withdrawing openness (either outright or by stealth) or by drawing on the trusty old “sunlight=disinfectant” argument—that nobody will do anything stupid or wrong any more as they know they’ll be spotted. Good luck if you think the latter is more likely.

The speed camera and the Public Data Corporation

Think of a speed camera.

Think of the proposal for the Public Data Corporation.

One of them has attracted controversy. This seems to be based on instinct or ideology, without much groundwork being put in on the complex models and circumstances that surround it, and what it might mean as part of a bigger picture.

Its supporters see it as a way of bringing some order to a complex system; of ensuring that things actually do move more quickly by introducing an element of regulation. That it will actually bring some accountability and ensure things don’t run recklessly out of control.

Its detractors see it as a cynical front for raising cash for the government.

Oh, and the other one is a speed camera… :-)

What I’m saying, of course, is: we don’t really have much evidence as yet – perhaps it would be good to tease some out before taking a strong position either way?

There’s data, and there’s data

I’m enjoying the latest flowerings of open data, and the recent quality posts from Ingrid Koehler and Steph Gray on what it all might mean. As well as quality action from Rewired State and others to actually demonstrate it in practice. (ooh, I just spotted that a reel of my photos is running on the Rewired State home page – thanks guys)

We’re getting a better understanding of what data actually is now that we’re seeing more of the things that were previously tucked away.

I’ll add my own observations: it helps me, at least, when thinking about complicated things to break them down a bit. My suggestion is to think in terms of four broad types:

1. Historical data

What’s happened in the past: how organisations and people have performed – what’s been said in meetings – what’s been spent – where the pollution has been – how children performed in tests…

2. Planning data

What’s projected to happen, or will shape what will happen: this and next year’s budget – legislation in progress – consultations – proposed housing developments – manifestos…

3. Infrastructural data

The building blocks of useful services. Boring stuff, doesn’t change that often, but when it does, it needs to be swiftly and accurately updated: postcodes – boundaries – base maps – contact directories – opening hours – organisation structures – “find my nearest…”

4. Operational data

The real-time stuff; what’s happening NOW: where’s my train/bus? – crime in progress – emergency information – school closures – traffic reports – happening in your area today…

These are not unrelated: what’s happened in the past will often guide what’s planned for the future. Today’s operational information becomes tomorrow’s history. And so on. There’s plenty of overlap. They’re intended as concepts, not hard definitions. The types can also be combined in every way conceivable: that’s part of the point of releasing the data in the first place.

I’m deliberately drawing no great distinction here between ‘information’ and ‘data’: the latter is a structured, interpretable incarnation of the former. That’s another set of issues in itself. I’ve also skipped over questions of interpretation and spin – this is a blog post, not a chapter of my book ;) And I’ve omitted “personal data” as a type – this is woven through all areas and carries with it its own baggage. I’m thinking more about the basics of function and purpose. Which lead on to usefulness. Which, as I’ve said before, is the test that all this is taking us in the right direction.

“Useful to whom” does of course vary by type: 1 and 2 are great for those holding public service to account (press, public, whoever). 2 is for those who will make change happen. 3 will benefit ordinary people in day-to-day life (and I’m careful here not to imply that these ordinary people ever have to see ‘data’ or an ‘e-service’ themselves: their local paper, toddler group, or community centre noticeboard are all valid intermediaries here). 4 will do things for the e-enabled – the mobile generation, the data natives, as well as for places that can serve an offline public (screens in train stations, visuals at bus-stops).

As a practical suggestion, I would love to see some of the current initiatives to build repositories and access to data recognising these distinctions exist. A little more signposting about the type of data that’s being released may help to highlight which types are being overlooked. For as we know, opening up the narrative helps to drive the change itself.
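As a rough sketch of what that signposting might look like, a repository could tag each release with one of the four types and then report which types have nothing against them. The catalogue entries below and the names `DataType`, `coverage` and `overlooked` are all made up for illustration; no real repository is being described.

```python
from collections import Counter
from enum import Enum
from typing import Dict, List, Tuple


class DataType(Enum):
    HISTORICAL = "historical"
    PLANNING = "planning"
    INFRASTRUCTURAL = "infrastructural"
    OPERATIONAL = "operational"


# Invented catalogue: (dataset title, type tag).
catalogue: List[Tuple[str, DataType]] = [
    ("Spending over a threshold", DataType.HISTORICAL),
    ("Test results by school", DataType.HISTORICAL),
    ("Postcode lookup", DataType.INFRASTRUCTURAL),
]


def coverage(entries) -> Dict[DataType, int]:
    """Count releases per type, including the types with none."""
    counts = Counter(data_type for _, data_type in entries)
    return {data_type: counts.get(data_type, 0) for data_type in DataType}


def overlooked(entries) -> List[DataType]:
    """Types with no releases at all."""
    return [data_type for data_type, n in coverage(entries).items() if n == 0]


print([t.value for t in overlooked(catalogue)])  # ['planning', 'operational']
```

A single tag per dataset is a simplification, given that the types overlap and are meant as concepts rather than hard definitions, but even this much would make the gaps visible.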

And how are we doing against these four types?

Pretty good on historical (it’s quite easy to dump old files online); weak on the future planning stuff (trickier, because if there’s no means of action accompanying the data, will publishing do anything other than frustrate?); getting there on infrastructural (though licensing, linking and standards offer the greatest challenges); struggling on operational (contractual, accuracy, standards).

That’s a one-line summary. What do you think? Where should we be putting more effort?