Just because you can…

An interesting piece appeared on the Guardian data blog on Friday. It describes a wealth of new data being released relating to court and conviction information.

The database shows sentencing in 322 magistrates and crown courts in England and Wales. Defendants’ names are excluded but details such as age, ethnicity, type of offence and sentence are not. Any computer user can analyse aspects such as how many white people were sent to jail for driving offences.

All good stuff. There’s definitely value to be gained from this type of analysis. It’s being released as a database (hopefully with a commitment to regular ongoing publication), and it brings consistency to often haphazard arrangements for making data available. These are positive moves, and should be welcomed.


Transparency campaigner William Perrin, who advises the Ministry of Justice on opening up its data, says the release is a big step: “Publishing the details of each sentence handed down in each court is a great leap forward for transparency in the UK, for which MoJ should be warmly praised. Courts have to be accountable to the local populations they serve.” But he, like some campaigners, believes the MoJ should go further, releasing the names of defendants. “The data published is anonymised, flying in the face of hundreds of years of tradition of open courts and public justice.

“The MoJ need to have an open and public debate about the conflict between the central role in our society of open public courts where you can hear the name and details of offenders read out in public and crude misapplication of data protection.”

My concern lies with the consequences of releasing the names of individuals, as proposed here, in a completely accessible and reusable way.

William draws a parallel between the act of reading out names in public court and publishing them on the Internet. (Disclosure: William and I both sit on the Transparency Sector Panel in MoJ.)

Were it a simple parallel, with the same consequences, I’d be pretty comfortable with the principle of release, too. But I see one very big difference: raw content on the Internet is (almost always) indexed by search engines. And search engines have very, very long memories. The (only) two things that the Internet has fundamentally changed are the ease with which information can be found, and the duration and extent over which it persists–as I’ve banged on about on this blog before.

So, this proposal (if taken at face value) would lead to a couple of consequences which might not be wholly desirable. Firstly, entering a name into a search engine could quite feasibly turn up details of an offence and the sentence that followed, and go on doing so indefinitely. What implications does that have for the rehabilitation of offenders? If your conviction has been spent, and a potential employer does a quick check and finds that the only thing you've ever been noted for on the Internet is… Well, would that feel just to you?

Ah, I hear you say, but look at court reporting now: those journalists who do manage to get intelligible information out of a clerk, so they can write their pieces accurately, end up with their content being indexed (paywalls permitting), and the Google ghosts will be there to do their haunting anyway. Yes. They will. But this is an issue of scale and ease, not principle. Journalists today, even those with perfect information, exercise some choice over what they print. Maybe this is just because of space constraints; maybe there are other factors at play. But the "release everything for reuse" stance would dramatically increase the scale of publication.

You may say that this is a good thing: along similar lines to "nothing to hide, nothing to fear", this extra hangover from a criminal's downfall may be of real benefit to society. Another deterrent to criminality, maybe? I don't know about that, but I do know that we would then face a reappraisal of what we mean by rehabilitation, as a direct consequence of data release.

And, as William says, that needs proper public debate.

But it's not just a matter of scale. We find, when public data is released en masse, that new business opportunities spring up. Imagine the entrepreneur who gathers all the data on convictions and charges for their own employee-vetting service. They might adhere to principles of time limitation on their data. They might not. They might mash up this data set with other information. They might not. They might put profit before principle.

We attempt to control such reuse of information with regulation, but on the Internet it is much harder to make this stick in practice. Again, by releasing data like this, we risk changing the landscape of what it means to be convicted.

I'm fascinated by how even something like the current Data Protection Act relates to the indexing of personal information by search engines. Almost by definition, the end purpose of such indexing cannot be known, so surely Principle 2 (Personal data shall be obtained only for one or more specified and lawful purposes, and shall not be further processed in any manner incompatible with that purpose or those purposes–source: ICO) must be creaking already?

So, I’m not so keen on making it indexable. Can this be avoided? Is there a middle ground which acknowledges the shambles that is the current practice in courts–with some prepared to supply information in machine-readable format, others insisting on hand-written notes being passed, and some seemingly actively obstructive in providing information?

I think there might be. There are some "government" datasets which, although they could be released for reuse, aren't. For fairly good reasons. The database of car registrations, for example. I suspect we'd consider it a bad thing if a road rage incident could easily be followed up with some bricks through windows, just by typing in the offending registration plate when you got home.

Similarly, we have a curious set of "frictions" in place that give us an electoral roll which is both "publicly viewable" (provided you go to a library) and searchable online only if you pay a good chunk of cash. A big hmmm from me to that latter part, by the way, but you can read much more on electoral roll issues here.

The way this data is structured matters too. It means we can't, for example, easily go online, type in an address down the road, get a full list of occupants' names and pop round there with all sorts of social engineering stories designed to make trouble/extract money/dig for further info/groom/be very creepy. Again, I'd suggest we do this for good reasons, and we know how to build machinery to keep this equilibrium in our society.

We may solve the problem by carefully choosing the format for release, the means by which it's referenced, and even to whom it's released. Yes, I know, those wretched privileged accessors again (just like the Police, DVLA, local authorities, credit agencies etc etc etc.). Always a subject to raise the temperature in open data discussions!

But I’m not arguing for wilful obfuscation of this data, merely putting forward some of the alternative perspectives to “everything, raw, now”. We do need this public debate, and we need to be reasonably confident that we’re getting a net societal benefit from whatever action we take.

Let’s tread carefully here–just because you can, doesn’t always mean you should.

[I’d be commenting on the Guardian article if I could, but it doesn’t seem to have comments open, so I’ve written this in response.]

Not quite public

That old question came up recently: What really good online service experiences has government ever given us?

As usual, the first (and, sadly, often the last) answer: the online tax disc service. There’s no doubt it’s a properly good use of the online channel to save people hours of queueing and paper fiddling. It achieves its magic not with any fancy visual design–its interface isn’t that great (those five questions up front–why, just why?). And it still stubbornly refuses to update its strapline from one that was phased out several years ago. (Did you notice? No, of course not. Straplines are irrelevant.)

What matters are two bits of genius: one, the removal of any burdensome personal identification at the front end. No Government Gateway, no personal identifiers. Just a reference number that you type straight off the paper form that's sent to you when it's due. That's it. If you have that number, and you have the means to pay, a tax disc will soon be on its way. Whoever you are. It's about the car, not you.

The second miracle is the joining up of databases at the back-end. The car’s registration is used to call on MOT and insurance databases (information from completely different sectors, let alone organisations) to save you digging out slips of paper and doing all that queueing only to find out that one of them is a little bit out of date. Don’t underestimate how valuable that service join-up is.

But this post is not about tax discs. It's about another online service, this one from VOSA [updated: the MOT scheme is run by VOSA, a different Department for Transport agency, not DVLA], that far fewer people know about.

And, for a change, it's not to illustrate a service point but to explore an information point.

The MOT.

The evil cousin of the tax disc. It doesn’t display its expiry date for the world to see on your windscreen. Well, it does if you choose to fill in the little sticker that you get with your MOT pass certificate. But that depends on your choice. And nobody is going to punish you if you don’t do it, or if the little sticky peels off and gets lost. So we know what that means.

When’s your car’s MOT due? Yes, you! Do you know? Without finding the last certificate, which, let me guess, isn’t about your person or your desk as you read this.

You could find out online, you know. There’s a nifty little utility here. And what do you need to get access to that magic expiry date? Well, you need the registration number of your vehicle, naturally. Which you probably know.

And you need a reference number from your last test certificate (or failure notice).

Really. No, I’m not joking.

OK, I'm exaggerating a little: if you don't have the test certificate to hand, there is a fallback. You can use the number on your blue V5 form, the one that we oldsters still call "the logbook". Now, let me take a wild guess as to where your logbook is kept. Any possibility it might be in the same drawer as your… You're there already, aren't you? Give yourself a quick kick on the inside of your shins, VOSA service designers.

So we have a potentially brilliant online service, that, if promoted, could stop tens of thousands (my guess) of people slipping past their MOT expiry dates without realising. The only time they think of these things is in idle hours at their desks at work, while the documents they need languish in a dusty study drawer at home.

And what would make the service brilliant? Just making it usable on the basis of the registration number alone. Which would mean that anyone could look up anyone else’s MOT expiry status. (The crowd suck through their teeth…is that, I mean is that, ok?)

And is it?

The point (which I have finally got to) is that MOT status information is a curious dataset. It’s not quite private (well, it’s barely “protected” to any appreciable extent), and it’s definitely not public. Instead we’ve built a little friction around accessing it (needing to drag out a hard-to-find bit of paper rather than an easy-to-find remembered–or seen in the street–fact).

Does it feel like personal data to you? Would it bother you if your nosy neighbour could look up your missed test date and start leaving little passive-aggressive notes on your windscreen? Or should it be a public data set? Nothing to hide, nothing to fear and all that. And the bloody tax disc expiry date is printed loud and clear for all in the street to see, isn’t it? What’s the difference?

The only risks I can think of that are headed off by this rigmarole are the nosy neighbour one, or possibly a local garage touting for business on the basis they’ve spotted your car is coming up for a test soon, or a miserable Lazy Wail underling sitting in a grey basement tapping in slebs’ car registrations in the hope of getting a pathetic non-story.

That’s not a lot, is it? Am I missing something? Is that the entirety of the reason why we are denied an incredibly easy-to-implement online tool which would save us real time and real money?

Over to you. And over to you, VOSA, if you're reading. Which I hope you are.

I’ll revisit this concept of quasi-public data soon. Things that aren’t quite public, aren’t quite private, and may well be personal. Things like the electoral roll, for example :)

Data.gov.uk one year on

A year, almost to the day, after the launch of data.gov.uk, it seems clearer that it was really trying to hit three targets simultaneously: transparency, usefulness and good old commercial value. Three targets with some overlap, but also some inherent tensions. How well has it done?

On transparency, we heard much along the lines of "sunlight being the best disinfectant": that the very act of openly publishing information, particularly on accounting and spending, would do much to reduce wrongdoing and rebuild trust. It might not matter so much if the information wasn't actually read regularly or in detail; what mattered most was that it was published. We were told that tools would emerge to make general understanding easier and that amateur auditors would audit from their armchairs, and indeed there has been some progress in this area. But there hasn't been a dramatic unveiling of hitherto concealed horrors, just some visualisations and a tendency to focus on quirky details that make interesting stories, with no substantive follow-up.

On the subject of usefulness, things have gone less well. We haven’t seen much in the way of new apps and services driven by data.gov.uk data which actually deliver value to people in their day-to-day lives. Political pressure has been focused on driving out more of the spending data, perhaps at the expense of data that may be practically useful. We can speculate about the political factors at work here: gleeful exposure of the excesses of the last government and the current tensions between central and local government on spending priorities both spring to mind. But it does mean that the genuinely “useful”—the data that describes things in real people’s lives: maps, postcodes, contact information, opening hours, forthcoming events—and the real-time stuff, such as live running transport information, are falling behind. And that’s where the really useful apps and services are going to come from. Certainly, recent moves such as the release of Ordnance Survey maps under reusable licence are steps in the right direction, but much more political will is needed here to level things up.

And on the last target (the billions in commercial value that were touted as being locked up in government data), things don't seem to be going too well at all. Some of this value was no doubt to be derived from the opening up of key enabling datasets, such as maps and postcodes, allowing new business opportunities to really take off. But some of it would have to come from inherent value in the data itself, or be released by combining datasets to produce new products: taking data and finding new markets for it. Quite where this is currently headed remains shrouded in vagueness, but a new Public Data Corporation is now proposed, which lists among its objectives the management of the conflict between revenues from the sale of data and the benefits of making it freely available. This doesn't actually seem that unreasonable. If one considers data as a national asset, why would it not be sensible to secure appropriate commercial value from it, as with any other asset? But the proposal has triggered questions, and some criticism from open data campaigners that this wasn't how it was supposed to be. The extent to which commitments to release data free of charge were actually made or implied is now coming under scrutiny.

So where do we go from here? In the light of what we've learned over the last year, I'd prescribe the following: a rebalancing of the data held within data.gov.uk in favour of the genuinely useful; swift clarification of what is to be made available free of charge and what is not; a more mature approach to engaging developers and entrepreneurs if we're really to see apps and services flourish (it's going to take more than just a few "hack days"); and some exploration of how to demonstrate the value returned from what government spends. This last point deserves particular attention: at the launch last November of central government spending data, I reminded Francis Maude and the Transparency Board of Wilde's description of those who knew the price of everything and the value of nothing…