About me, but not of me?


No man is an island,
Entire of itself.
Each is a piece of the continent,
A part of the main.

Bear with me here.

Lots of agitation at the moment about the prospect of our health records being flogged to the highest bidder and scattered to the four winds in the interests of progress and profit.

So here’s a thought to chew on: how are we so sure they are actually our records?

What are these records, anyway? A gathering of facts – of some very personal facts for sure – my weight, my addictions, my phobias, my illnesses. But it’s also a record of my transactions, medical interventions, successes and failures.

And that latter aspect takes us, in this contorted line of reasoning, to a more complex place than merely a collection of details about me.

For transactions have two sides. Givers and receivers. Ministers and ministered. To choke off the supply of feedback on interventions is to poke a stick in the eye of rationality and science, surely?

There’s a ready assumption that it’s “my record this” and “my record that” – but what if we were to reframe this? What if we were to accept (after a huge public thrashing-out that has shown no sign of taking place so far) that by receiving we also have to give? That the quid pro quo of that new medication is the giving up of that transaction, of its success or failure, so that others may learn, and that we all may benefit? Thus science would march onwards with its boots reinforced by the tough leather of real-world evidence.

Of course those two constructs I mentioned above: facts about us, and about our transactions, can’t be neatly separated like that. To make sense of my intervention you have to know about my underlying condition. Evidence of interaction may not have much meaning without a historical context. So while it may be more palatable to argue for the sharing of intervention experiences for the greater good, pieces of the “us” stuff will inevitably be attached.

But pause for a moment – consider what the debate about personal data would look like were we to acknowledge that just because something is about me, it isn’t necessarily of me.

Insistence on opt-outs from data collection would start to look like an act of selfish resistance, a dogmatic adherence to an ideology of paranoia, a one-way street in terms of the flow of benefits. Yeah, give me the treatments, but don’t expect to be able to learn anything based on the outcomes.

Deanonymisation (the jigsaw rebuilding of supposedly laundered data to reidentify personal records) would start to look less like an absolute evil, and more an equivocal risk to be weighed against benefits. See those benefits of sharing as benefits to us all, and the harsh black and white of much of the debate around rights and records melds to a rather more nuanced sea of greys.
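For anyone who wants that jigsaw made concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the records, the field names and the `reidentify` helper are assumptions, not anything drawn from a real release. It simply joins a supposedly laundered dataset to a named public one on the details they share, and keeps only the matches that point to exactly one person.

```python
# Hypothetical "anonymised" health extract: names removed, other details kept.
anonymised = [
    {"age": 34, "postcode_district": "AB1", "sex": "F", "diagnosis": "asthma"},
    {"age": 34, "postcode_district": "AB1", "sex": "M", "diagnosis": "diabetes"},
    {"age": 61, "postcode_district": "CD2", "sex": "M", "diagnosis": "angina"},
]

# Hypothetical public dataset that does carry names.
public = [
    {"name": "Person A", "age": 34, "postcode_district": "AB1", "sex": "F"},
    {"name": "Person B", "age": 61, "postcode_district": "CD2", "sex": "M"},
    {"name": "Person C", "age": 61, "postcode_district": "CD2", "sex": "M"},
]

# The shared details used as jigsaw edges (an assumption for this sketch).
QUASI_IDENTIFIERS = ("age", "postcode_district", "sex")


def key(record):
    """Build the linking key from the shared fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymised_records, public_records):
    """Return {name: diagnosis} for records that match exactly one named person."""
    names_by_key = {}
    for person in public_records:
        names_by_key.setdefault(key(person), []).append(person["name"])

    matches = {}
    for record in anonymised_records:
        names = names_by_key.get(key(record), [])
        if len(names) == 1:  # an unambiguous match is a reidentification
            matches[names[0]] = record["diagnosis"]
    return matches


reidentified = reidentify(anonymised, public)
print(reidentified)
```

In this toy data only one record is pinned to a single name; one has no match at all and another fits two people equally well. That is the equivocal part: the risk depends on how unique the shared details happen to be.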

This is hardly a popular line of thought. But I think it merits a bit more of an airing. Where else would you expect to preserve such transactional asymmetry? What is so sacrosanct about our physical existence that makes it right to fight against information sharing when this may act against the rational interests of our collective societal body?

Having set your stall out against data sharing or anonymisation, or in favour of informed consent to share, are you still so sure of the moral rock on which you’ve built it?

Digital by default

As I write this, I’m sitting on a stationary train. In a station. The rail app on my phone tells me it’s the train I want. But the signs on the platform are totally blank. And the guy in uniform on the train doing the uncoupling says he doesn’t know where it’s going.

So, do I believe what the app tells me? Rather than embark on an exercise in Bayesian conditional probability, it’s making me think about that phrase “digital by default”.

Because I’m still not entirely sure I know what it means. Or, even if I do, that I’m seeing it used consistently.

And this experience with the phone app right now is a good reflection of what I think it should mean: that a service has been built, first and foremost, so that its delivery in digital channels is the way that it works best.

–that information in the digital channel is “the truth”.

–that if the train is switched to another platform, the digital channel will be the first to reflect this.

–that train staff will be looking at their own digital devices for information before they look at platform signs, or paper print-outs of departures, or get on the internal intercom to the driver.

That, to me, is digital by default.

An underpinning design principle that the service is supposed to be like this. Not, as has so often been the case, with digital features as a sort of awkward bolt-on after the fact.

A few weeks ago, a member of station staff tried to stop me, and I pointed out that I was going through the gates to platform 10 because this device in my hand was telling me my train would be there. And I trusted it, at least enough to wait there.

He looked in incomprehension at this device. It wasn’t part of the script. The situation was the very opposite of “digital by default”.

So, apart from this nice, rosy, optimistic definition, what else have I seen it used to mean?

Well – sadly, sometimes as the Mr Nasty of channel-shift enthusiasts: the reason why counter services will be closed, the hammer that will force people to abandon their Luddite ways, the only real means of forcing out cash savings in this techno-progressive world we were told so much about.

And if people don’t want to shift, then tough. They won’t have the option. Default, innit? Capisce? Ok, if they’re really incapable, because of disability or crap connectivity, there’ll be some sort of stop-gap. A bolt-on, if you like. After the fact.

Now, does that sound somewhat familiar?

Or, for a third flavour, how about Mr Nasty’s gentler cousin: the service redesign that still has the closure of non-digital channels at its heart, but attempts to do so by attraction to a better, digital alternative, rather than brute imposition?

The interpretation you hear is connected to the source you hear it from, I guess. These versions all have different political palatability, and provoke different passions in different audiences.

So i) which do you think it really means now? And ii) which one would you like it to mean?

A – a fundamental design principle from the ground up
B – channel shift by imposition and removal of choice
C – channel shift by being more attractive than non-digital

Your answers, below, if you please:

Just because you can…

An interesting piece appeared on the Guardian data blog on Friday. It describes a wealth of new data being released relating to court and conviction information.

The database shows sentencing in 322 magistrates and crown courts in England and Wales. Defendants’ names are excluded but details such as age, ethnicity, type of offence and sentence are not. Any computer user can analyse aspects such as how many white people were sent to jail for driving offences.

All good stuff. There’s definitely value to be gained from this type of analysis. It’s being released as a database (hopefully with a commitment to regular ongoing publication), and it brings consistency to often haphazard arrangements for making data available. These are positive moves, and should be welcomed.


Transparency campaigner William Perrin, who advises the Ministry of Justice on opening up its data, says the release is a big step: “Publishing the details of each sentence handed down in each court is a great leap forward for transparency in the UK, for which MoJ should be warmly praised. Courts have to be accountable to the local populations they serve.” But he, like some campaigners, believes the MoJ should go further, releasing the names of defendants. “The data published is anonymised, flying in the face of hundreds of years of tradition of open courts and public justice.

“The MoJ need to have an open and public debate about the conflict between the central role in our society of open public courts where you can hear the name and details of offenders read out in public and crude misapplication of data protection.”

My concern lies with the consequences of releasing the names of individuals, as proposed here, in a completely accessible and reusable way.

William draws a parallel between the act of reading out names in public court and publishing them on the Internet. (Disclosure: William and I both sit on the Transparency Sector Panel in MoJ.)

Were it a simple parallel, with the same consequences, I’d be pretty comfortable with the principle of release, too. But I see one very big difference: raw content on the Internet is (almost always) indexed by search engines. And search engines have very, very long memories. The (only) two things that the Internet has fundamentally changed are the ease with which information can be found, and the duration and extent over which it persists–as I’ve banged on about on this blog before.

So, this proposal (if taken at face value) would lead to a couple of consequences which might not be wholly desirable: firstly, a name would quite feasibly, if entered into a search engine, throw up information about an offence and the consequent sentencing for an indefinite time. What implications does that have for rehabilitation of offenders? If your conviction has been spent, and your potential employer does a quick check and finds that the only thing you’ve ever been noted for on the Internet is… Well, would that feel just to you?

Ah, I hear you say–but look at court reporting now: those journalists that do manage to get intelligible information out of a clerk so they can write their pieces accurately end up with their content being indexed (paywalls permitting), and the Google ghosts will be there to do their haunting anyway. Yes. They will. But this is an issue of scale and ease, not principle. Journalists today, even those with perfect information, exercise some choice over what they choose to print. Maybe this is just because of space constraints, maybe there are other factors at play. But the “release everything for reuse” stance would dramatically increase this scale of publication.

You may say that this is a good thing: along similar lines as “nothing to hide, nothing to fear”, this extra hangover from a criminal’s downfall may be a very positive thing for society. Another deterrent to criminality, maybe? I don’t know about that, but I do know that we then face a reappraisal about what we mean by rehabilitation as a direct consequence of data release.

And, as William says, that needs proper public debate.

But it’s not just a matter of scale. We find, when public data is released en masse, that new business opportunities spring up. Imagine the entrepreneur who gathers all data on convictions and charges for their own employee check service. They might adhere to principles of time limitation on their data. They might not. They might mash-up this data set with other information. They might not. They might put profit before principle.

We attempt to control such reuse of information with regulation, but on the Internet, it gets very much harder to make this stick in practice. Again, we risk changing the landscape of what it means to be convicted, by releasing data like this.

I’m fascinated by how even something like the current Data Protection Act relates to the indexing of personal information within search engines. Surely, almost by definition, the end purpose of such indexing cannot be known, and therefore Principle 2 (Personal data shall be obtained only for one or more specified and lawful purposes, and shall not be further processed in any manner incompatible with that purpose or those purposes–source: ICO) must surely be creaking already?

So, I’m not so keen on making it indexable. Can this be avoided? Is there a middle ground which acknowledges the shambles that is the current practice in courts–with some prepared to supply information in machine-readable format, others insisting on hand-written notes being passed, and some seemingly actively obstructive in providing information?

I think there might be. There are some “government” datasets which, although they could be released for reuse, aren’t. For fairly good reasons. The database of car registrations, for example. I suspect we’d consider it a bad thing if a road rage incident could be easily followed up with some bricks through windows on the basis of typing in the offending registration plate when you got home.

Similarly, we have a curious set of “frictions” in place to allow us to have an electoral roll which is at the same time both “publicly viewable” (provided you go to a library) and searchable online only if you pay up a good chunk of cash. A big hmmm from me to that latter part, by the way, but you can read much more on electoral roll issues here.

And the way that this data is structured is also important: so that we can’t, for example, easily go online, type in an address down the road, get a full list of occupants’ names and pop round there with all sorts of social engineering stories designed to make trouble/extract money/dig for further info/groom/be very creepy. Again, I’d suggest we do this for good reasons, and we know how to build machinery to keep this equilibrium in our society.

We may solve the problem through choosing carefully the format for release, the means by which it’s referenced, and even to whom it’s released. Yes, I know, those wretched privileged accessors again (just like the Police, DVLA, local authorities, credit agencies etc etc etc.) Always a subject to warm the temperature in open data discussions!

But I’m not arguing for wilful obfuscation of this data, merely putting forward some of the alternative perspectives to “everything, raw, now”. We do need this public debate, and we need to be reasonably confident that we’re getting a net societal benefit from whatever action we take.

Let’s tread carefully here–just because you can, doesn’t always mean you should.

[I’d be commenting on the Guardian article if I could, but it doesn’t seem to have comments open, so I’ve written this in response.]

The Accidental Data Controller

It happened a few months back.

Facebook (that hideous, grunt-cheering, dumb-arse cesspit of a privacy clusterfuck–but let me try and remain objective) started to put some rather strange suggestions for new “friends” up on the top right. People who weren’t unknown to me, exactly, but whose electronic link to me could only have been derived in one way.

From email addresses.

These were people who I may once, ever, have emailed. Or who had emailed me, maybe just the once.

And this latter angle got me worried.

Because I know I have never, ever pressed that “find my friends by pillaging my address book” button. Not in Facebook; not in any other service.

And some of those names weren’t in my address book anyway. But mine must have been in someone else’s… And I started to whiff a potentially horrible thing. However, this being Facebook, and Facebook being full of horrible things, I tucked it into a mental back drawer and let it go. That time.

Then, last week, I got another email invite to some new whizzy networking service. The invite came from someone I’ve got a lot of time for, so I figured there’d be no harm in signing up and having a quick look around.

The first thing I was greeted with on entering the new service was the message: “Ah – it looks like you already know Rich D—-; why don’t you connect to him on here?”

And that, dear reader, brought the whole sorry mess tumbling out of that back drawer in my head.

This service was entirely greenfield territory to me. I had shared absolutely nothing with it, other than my name and email address (by virtue of using it as the basis of my registration).

So the only way this matching could have occurred would be if Rich had clicked on the “Pillage Me!” button, and passed his entire address book to the new service, there to be held in limbo until such time as happy little matches like me popped up to trigger this unwelcome welcome.
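I can only guess at what sits behind that button, but a minimal sketch of the mechanism might look like the Python below. The `ContactMatcher` class, its method names and the example addresses are all hypothetical; no real service's code or API is being described here.

```python
from collections import defaultdict


class ContactMatcher:
    """Hypothetical service-side store of uploaded address books."""

    def __init__(self):
        # email address -> users whose uploaded address book contained it
        self._pending = defaultdict(set)

    @staticmethod
    def _normalise(email):
        return email.strip().lower()

    def upload_address_book(self, uploader, emails):
        """Called when a user hands over their whole address book."""
        for email in emails:
            self._pending[self._normalise(email)].add(uploader)

    def register(self, email):
        """Called when someone new signs up; returns who already 'knows' them."""
        return sorted(self._pending.get(self._normalise(email), set()))


matcher = ContactMatcher()
# One user uploads an address book that happens to contain my address.
matcher.upload_address_book("rich", ["Me@Example.org", "someone.else@example.org"])

# Later I register, sharing nothing but my own email address.
suggestions = matcher.register("me@example.org")
print(suggestions)
```

The point of the sketch is that the address sits in the service's store from the moment of upload, whether or not its owner ever registers.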

I know I’ve agonised on this blog before about what makes personal data personal. About how uniqueness, utility and linkability all have a big bearing on just how “personal” a piece of data is (and how much we should therefore be bothered by its loss or misappropriation).

Just having one bit of data floating about would be concerning enough, but–and this is a big but: what if that address book pillaging also took not just the raw email address itself, but also the associated name (or indeed any other fields)?

Anon@freetibetbyforce.com may just be an address to a dead-drop online account, but if it’s ever been associated with a real name, manually entered, in someone’s address book…(you see where I’m going here?)…the consequences could be pretty horrendous. Obviously this is an extreme example–but it makes the point–third parties are sharing your email address and perhaps related personal data in vast quantities, without really realising they are doing so, with services that hold it…where? how securely? for how long? IN ORDER TO MATCH YOU UP ON SOME LAME SKILLS NETWORK SITE?

When companies first started this sort of indiscriminate hoarding and sharing of personal data, we created the Data Protection Act as a countermeasure. Clearly, it’s getting hopelessly out of date and was never designed for this sort of scenario.

But humour me, and assume we should still adhere to its principles.

That would mean that you, me, anyone with an address book, could (or should?) be required to register as a Data Controller–mindful of the fact that our own address books have powerful, valuable content and with one click we become complicit in a process that spreads it way beyond the bounds of any purpose we could sensibly be said to have consented to.

I think this is hugely important, as no matter how careful we are with our own information, we are entirely reliant on the caution of others not to compromise it.

It’s an interesting one. Exam question for the Information Commissioner’s Office then: how big does your address book have to be before you need to register it under the Data Protection Act?

About that Data Protection myth

If you follow me on Twitter you might have spotted a recent exchange of views over the last few days with Vodafone. They do a fair job, it has to be said, of engaging in that channel. I’m not sure how joined-up or consistent it is with their other channels, but at least it’s nice to be able to ask a question and get a sort-of-answer.

My question stemmed from a curious experience when trying to contact the Vodafons via their website. They’ve taken the “use our webform, not an email address” approach. And to use the webform, I have to be logged in to the Vodasite using what I consider to be fairly strong credentials: i.e. to register on the site in the first place I had to have the physical phone to hand so that an SMS could be received and a time-limited security code typed in (as well as account details and so on)–you get the picture, nice use of a reasonably secure channel to confirm who I am. [See update below: the same web form is available even if you’re not logged in, going some way to explaining the subsequent requests for further information by email.]

I’m also required, during registration, to supply an email address. In this case, the same one as I then supplied on their webform for further contact.

So having duly completed and sent off my webform, I was surprised to receive the following email two days later [extract, verbatim]:

At Vodafone, we are very particular about the security of every customer’s account to ensure that account specific information is not being shared with a non-account holder.

For me to access your phone account and provide you the account information, please provide me below mentioned security details:

– First Line of Address with Postcode
– Date of Birth
– Payment method
– Account number

Now this seems like an awful lot of personal data to be supplying simply to “prove” that the email address which sits in my securely-registered account is actually mine. Doesn’t it? Is it just me?

And being a bit twitchy about personal data exchange, especially via a channel as insecure as unencrypted email, I take it up with them. And via Twitter, I get that old favourite answer for this odd request: “…because of Data Protection” — and later “…in order to pass Data Protection”.

It’s worth reminding ourselves at this point what the Data Protection Act actually says and does. It’s built around eight fundamental principles which are all fair and reasonable provisions like “you must have consent from someone for the purpose for which you want to hold and process their data”. That sort of thing.

Principle number seven is an interesting one: it requires the company holding personal information to have adequate measures in place to protect it.

And here’s where this particular Data Protection myth arises. A company will often say “Data Protection makes us…” when what they mean is: “in order to mitigate the risk of bad things happening with your data, we’ve decided to implement some internal procedures which we think do the job”.

See the difference?

Let’s just scrutinise what’s happening here: I am being asked to provide personal information via an insecure channel to validate identical information that’s held within an account already held by them, which was created in a more secure channel.

And the company have the brass neck to tell me that “Data Protection” is making them do this?

Frankly, how well or badly they choose to implement their own processes is up to them. Up until the point at which their customers think they’re just so awful that they move to another service provider. That’s the free market; and perhaps this sort of oddness isn’t so whingeworthy.

But what’s made this into a blog post, and something I will be following up with the Information Commissioner’s Office, is this lazy use of tired, old mythspeak to try and present a poorly-designed, internal attempt at risk mitigation as something that the nasty old government has forced them to do.

(I’ve asked for a contact in Vodafone’s Data Protection team to explore this further, but haven’t received one at the time of writing.)

UPDATE: 2100, 17 Oct

Well, Vodafone certainly got engaged (at an accelerated pace once I’d posted this, and it had had a bit of RT love). Tweets, the address for the Data Protection team, and finally a very friendly phone call. Nice work. So it turns out I made an inaccurate assumption in the post above, which puts a different cast on some of the story, but raises other questions. You don’t have to be logged in to the site to use the “contact us” web form. In fact, whether you’re logged in or not (I happened to be), the web form simply has the function of sending an email to Vodafone, to which they will then respond via “standard” email. One might ask why they don’t just provide an email address: I suppose they avoid some spam this way, but you also lose the benefit of being able to see what you reported in your sent items… Swings and roundabouts.

More serious though is that much is made of the web form being secure (https). A level of comfort which is then utterly undermined by the subsequent request for that personal information to be sent back to them in clear email. I offered some alternative approaches, including taking advantage of the ability to log in securely in order to establish a much smoother, and less risky, communication channel. And a few pointers on copywriting to ensure that users don’t get the sort of surprise I did at being asked to email a bunch of personal data back at them.

It makes a certain, convoluted sense that they then have to ask these personal information questions in order to satisfy their Principle Seven obligations, but only because they’ve paid insufficient attention to contact design in the first place. I noted that in all the online transactions I’ve used (and that’s quite a lot), some of them involving rather bigger lumps of money, or data of greater sensitivity, than a phone account, I’d never been asked to provide information in clear like this. And that by itself should be a clue that all was not as it should be. The combination of address, date of birth, and an account number provides a malefactor with a heck of a headstart in further social engineering, and there’s really no excuse for asking for it to be passed over like that.

We’ll see what changes.

Getting personal

For a long time, I’ve shied away from writing here about personal data. Or even thinking that deeply about it. The nature of identity, yes. The usefulness of data, yes. Personal data, no. Why?

Not because it isn’t fascinating, or important. Mainly because it’s so…damn…nebulous. And difficult. Time to get over that, I think. Very significant things are happening in this area, and we all need to raise our game in how we understand and engage with the concepts involved.

As I’ve surmised before, the only things that are really different in the Internet age are the ease with which information can be found, and the ease with which it can be stored.

Two things, really. That’s all.

The first embraces everything around indexing, cross-referencing, labelling, structure and searching. The latter takes us into the territory of copying (and of course copyright), archiving, and the general issue of persistence.

And when we look at personal data in that context, there is an immediacy–and potential toxicity–in what emerges.

We saw early rumblings of this long before the Internet, of course, when computers were first used for the mass processing of information about people. Things could be done with databases that simply weren’t possible with big paper ledgers.

We created Data Protection legislation which attempted to put reins on the ability to make free use of some types of information. Gathering stuff about people, from the basic facts of who and where, to how to contact them, who they were connected to, and what their tastes and preferences were. Pure gold, used in the right (or wrong) ways.

Data Protection set out some pretty sound, but general, principles. The overarching one being that the purpose to which data could be put should always be made clear to whoever provides it, at the time of providing. Lots of other stuff about processing, storage, where and how long, and so forth–but that issue of consent always seemed the most important, to me.

And we scratched about a bit to actually try and define what we meant by “personal data”. Some things were easy. Names. Addresses and phone numbers. They’re just obvious.

But what about our tastes? Our buying history? The movements of our mobile phone from cell to cell? A journey we took? As one takes informational side-steps away from the individual, the obviousness diminishes, but if you can make meaningful connections back to the person…

…and remember the first thing that the Internet really changes?

Being able to make those tenuous links between blocks of information into something really substantive.

And the second thing? That information and those links are now permanent. You can’t delete them, once they’re there.

All those things that databases couldn’t previously do, because they all conformed to different standards, and weren’t connected together? They can now. Things can be done via the Internet that simply weren’t possible with just the databases.

Bit by bit, it’s been possible to build up the most humongous repositories about people. Maybe entirely within the law, maybe in other ways as well. Maybe with explicit and informed consent all the way down the line. And maybe not.

Who’s to know? We find strange things going on with data that we provide in order to use one or other service–or even to exercise our democratic rights. Didn’t it ever strike you as slightly weird that the electoral roll could be sold on for commercial purposes? (Much more on the electoral roll in another post coming soon. Update: now here)

We have big companies that have built successful businesses just like this: perhaps using aggregated personal information for credit referencing, perhaps to sell to marketeers to give them a better understanding of demographics.

The genie is very much out of the bottle. Your rights to see the information that a particular company holds on you may exist, but you have to have a fair idea of which company to ask in the first place. Can you ever see the full picture of what others know about you?

Of course not.

And it’s unreasonable to suggest that we’ll ever be able to do that. Instances of data multiply more rapidly than does our capability to track them. (There must be a Law of Internet Entropy out there that says something like that. If not, I just invented one.)

(As an aside, a dear friend once uttered the memorable line “somewhere out there, there’s a database with your dick size on it”. That was in 1989.)

So what can we do?

Realistically, all that’s available to us are firebreaks and friction.

We can’t get that genie back in the bottle, but we can slow it down a bit, and find ways to mitigate the impacts.

Do we need an updated definition of personal data? It’s MUCH harder than it seems at first glance to create one. The best I can find at the moment in terms of an “official” position is here.

And it’s clumsier than you think. Essentially, it’s a list of ever-widening filters that assess whether a particular piece of information can be connected to a specific individual. Culminating in the rather wonderful catch-all of the final category:

8. Does the data impact or have the potential to impact on an individual, whether in a personal, family, business or professional capacity?

Yes The data is ‘personal data’ for the purposes of the DPA.
No The data is unlikely to be ‘personal data’.

Even though the data is not usually processed by the data controller to provide information about an individual, if there is a reasonable chance that the data will be processed for that purpose, the data will be personal data.

That’s pretty general, no? In fact, going by that, an awful lot of things are now personal data. I really like the emphasis it puts on the outcome of the data use, not attempting to over-define things like form and structure.

I’d go as far as to say we should probably throw away that big long document, and just run with this definition:

Personal data is information that affects you when it’s used. Either directly, or through being linked to other information using technologies that exist now, or may exist in the future.

Broad enough? ;)

(So my beloved photos: they’re personal data. I take them with a camera that has a unique number, held in metadata in the picture file. That provides a way to link all the pictures it takes together, and then, through the various accounts I put them in online, back to me. Think how many other trails you leave…)
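A minimal sketch of that photo trail, in Python. The file names, account names and the `camera_serial` field are made up for illustration; real picture files carry this sort of thing in their metadata, but I'm not reading any real file format here, just plain dictionaries standing in for it.

```python
from collections import defaultdict

# Hypothetical photos found in different places online, with the
# camera's serial number already pulled out of each file's metadata.
photos = [
    {"file": "beach.jpg", "account": "photo-site/alias1", "camera_serial": "SN-0001"},
    {"file": "cat.jpg", "account": "forum/alias2", "camera_serial": "SN-0001"},
    {"file": "hill.jpg", "account": "photo-site/other", "camera_serial": "SN-0002"},
]


def accounts_by_camera(photo_records):
    """Group the accounts that have posted pictures from the same camera."""
    linked = defaultdict(set)
    for photo in photo_records:
        linked[photo["camera_serial"]].add(photo["account"])
    return {serial: sorted(accounts) for serial, accounts in linked.items()}


links = accounts_by_camera(photos)
print(links["SN-0001"])
```

Two accounts under different names end up tied together by nothing more than the serial number of the camera.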

But again, all we really have are firebreaks, and friction. There’s a sort of reverse entropy at work. Unlike almost every other instance of entropy–where things get more chaotic over time (china plates get broken, they never put themselves back together again)–personal information is, relentlessly, only going to get more linked. More aggregated. More pervasive. More permanent.

(So, maybe I just invented The Law of Reverse Internet Entropy as well? Not bad going for one post…)

And if someone tells you that big blocks of personal data can be “anonymised”, be very sceptical indeed. (You can read some wise thoughts on the issues involved here and elsewhere on that blog.)

We can undertake some pretty noble fire-breaking: like ensuring the state doesn’t become the source of a global universal identifier for you. And we will certainly see more developments around multiple personas: compartments of your life associated with particular tasks, contexts, or connections. I think we’ll have to. (The concept of federated identity helps here, but that’s too much to go into for this post. Read more thoughts from the team working up these concepts for government.)

And we’ll adjust. Society has seen some pretty dramatic upheavals. Often associated with a new technology, or philosophy. If we adjust our societal norms faster than the upheaval, we don’t notice. If we’re slower to change, it’s painful. For a bit.

But we get through. We adapt. And we change. Always.


I’ve written before about something that would really set a rocket under the opening up of data: the vigorous pursuit of the useful stuff.

When we’ve been given access to transport data, wonderful things have happened. When we get real-time feeds, useful services follow hot on their heels. Let’s make those infrastructural building blocks of services available for free, unfettered use: the maps, the postcodes, the electoral roll, your personal health records.

(Ok, I didn’t mean the latter two. Or did I? It gets complicated. Still writing that post…)

Here’s a vision:

Roll forward to a time when the first priority of any service owner within the public sector is not “how shall I display the accounting information about the costs of this service” (or indeed “how shall I obfuscate the accounting information..?”).

No. Instead, it is: WHERE is the service? WHEN is the service? WHAT is the service? HOW DO I USE the service? (And maybe even: WHAT DO PEOPLE THINK about the service?)

Those basic, factual jigsaw pieces that allow any service to be found, understood, described and interacted with. From a map of where things can be found, to always-up-to-date information about their condition, and a nice set of APIs with which others can build ways in.

The genius of this type of thinking being that many of the operational headaches of current service delivery simply fall away. They are no longer a concern for the service owner. “Our content management system can’t show the information quite like that.” “We haven’t got the staff to go building a mapping interface.” “We’re not quite sure how we’d slot all that into our website’s information architecture.”

Pouf. No more. Gone. The primary concern becomes: is the data that describes this service accurate (or accurate enough–with some canny thinking about how it might then be written to and corrected), and available (using a broad definition of availability which considers things like interoperability standards).

Well, Paul. Nice. But what a load of flowery language, you theoretical arm-waver. Can’t you give a more practical example?

Well, reader. Yes I can.


That’s right. Public conveniences. A universal need. A universal presence. But where are they? When are they open? And what about their special features? Disabled access? Disabled parking? Baby-changing?

There’s actually a bit more to think about (once you start to think hard) than just location and description. But not a whole lot more. The wonderful Gail Knight has been banging this drum for a while, and has made some good progress, especially on things like the specification for data you’d need to have to make a useful loo finder service.
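As a rough illustration of "location and description", here is a minimal sketch in Python of what one record and one query might look like. The field names, the sample loos and their coordinates are all assumptions for the sake of the example; this is not the specification mentioned above.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Loo:
    """One public convenience. Field names are illustrative only."""
    name: str
    lat: float
    lon: float
    opens: time
    closes: time
    disabled_access: bool = False
    disabled_parking: bool = False
    baby_changing: bool = False

    def is_open(self, at: time) -> bool:
        # Simplification: assumes opening hours do not run past midnight.
        return self.opens <= at < self.closes


def find_loos(loos, at, **required):
    """Names of loos open at `at` that have every required feature."""
    return [
        loo.name
        for loo in loos
        if loo.is_open(at)
        and all(getattr(loo, feature) == wanted for feature, wanted in required.items())
    ]


# Invented sample data.
LOOS = [
    Loo("Market Square", 51.50, -0.12, time(8), time(18),
        disabled_access=True, baby_changing=True),
    Loo("Station Road", 51.51, -0.10, time(6), time(23)),
]

morning_with_baby = find_loos(LOOS, time(9, 30), baby_changing=True)
late_evening = find_loos(LOOS, time(20, 0))
```

Even this toy version raises the awkward questions straight away: who fills in the opening hours, and who corrects them when they change?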

Why’s this really interesting? Really, really interesting? Because having got a good idea of the usefulness of the data [tick] and a description of what good data looks like [tick] we then find all the other little gems that stand between A Great Idea, and a Service That Ordinary People Can Easily Use.

Who collects the data? Where does it get put? Who updates it? Who’s responsible if it’s wrong? How do people know they can trust it? Can people make money from it? (I could go on…)

Bear in mind that any additional burden of work on a local authority (who have some duties around the provision of public loos) probably isn’t going to fly too high in the current climate of cuts. Bear in mind also that anyone else who does a whole load of work like this is probably going to want something in return. Bear in mind also that “having a sensible standard” and “having a standard that everyone agrees is sensible” are two different things. Oh, and I need hardly add that much of this data will not currently be held in nice, accessible, extractable formats. If, indeed, it exists at all.

Two characters usually step forward at this point.

The first is the Big Stick Wielder (“well, they should just make councils publish this stuff. Send them a strong letter from the PM saying that this is now mandatory. That’s the standard. Get on with it. It’s only dumping a file from a database to somewhere on the Internet, innit?”) BSW may get a bit vague after this about precisely where on the Internet, and may, after a bit of mumbling start talking about a national database, or “a portal”, or how Atos could probably knock one up for under a million… (and it’s usually at this point that some clever flipchart jockey will say “Why just loos? Let’s make a generic, EVERYTHING-finder! Let’s stretch out that scope until we’ve got something really unwieldy and massive on our hands”.) We know how this song goes, don’t we?

The second is the Cuddly Crowd-Sourcer (“forget all that heavy top-down stuff, man. We have the tools. We have some data to start from. Let’s crack on and start building! Use a wiki. Get people involved. Make it all open and free.”) CCS’s turn to go a bit vague happens when pushed on things like: will this project ever move beyond a proof-of-concept? how do we get critical mass? does it need any marketing? can people charge for apps that reuse the data and add value to it? how do we choose the right tools?

Both have some good points, of course. And some shakier ones. That’s why this is a debate. If it were clear-cut, we’d have sorted it by now, and all be looking at apps that find useful stuff for us. And it isn’t just a matter of WDTJ (Why don’t they just..?).

My suggestion? CCS is nearer the mark. Create a data collection tool which can take in and build on what already exists. Use Open Street Map as the destination for gathered data. Do get on with it.

Matthew Somerville’s excellent work to get an accurate data set of postbox locations and the Blue Plaque finder are obvious examples to draw inspiration from. Once in OSM, data can be got out again should the need arise. There will be a few wrinkles around the edges as app developers seek to make a return on what they build using the data. There may well be a case for publicly-funded development on top of the open data. But get the data there first. Make it a priority.

Because if, after years of trying to make real-world, practical, open, useful services based on data we continue as we are, with a pitiful selection of half-baked novelties and demonstrators of “what useful might look like, at some point in the indeterminate future” we’re badly letting ourselves down.

Basically, what I’m saying is: if we can’t get this right for something as well-defined and basic as loos, a lot of what we dream of in our hack-days and on our blogs about the potential of data will just go down the pan.



OK, so it seems it already exists. Or at least a London version of it anyway. Don’t you love it when that happens? Would be good to see how it progresses, and what its business model looks like. I like the way that data descriptions have been used e.g. “Pseudo-public” for that class of loos which aren’t formally public conveniences, but can easily be accessed and used – e.g. those in libraries, and cooperative shops. The crowd-update function looks good too.

In a way, this also shows up another headache that arises when spontaneous services start to appear: there is only one set of loos in the real world. But each representation of them in an app or online service must go through the same process of ensuring accuracy and extent of coverage. Distributed information is always tricky to manage. Should we hope that several competing services make it into production, with the market determining which succeeds? Will that be the one with the best data? Or is there scope for an underpinning data service that feeds them all? (But then we court the central, mega-project problems again…)

Answers on a postcard, please.

On the shifting of control of personal data

If you’ve been locked in a cupboard for the last five (or more) years, you’re excused from observing this thematic shift:

In the longer term, data about people is more likely to be owned and controlled by them. Rather than having many instances of personal information scattered around organisations and agencies, to be confused, duplicated, corrupted and left on buses, simpler technologies have emerged to put the data owner, you, back in control.

We see this theme emerging with several different labels: from vendor relationship management, to volunteered personal information, to personal datastores, to a “control shift” in the concept of personal data.

I agree that this shift is inevitable, to a greater or lesser extent. Everyone wants it. What’s not to like? Less cost of processing, greater security, reinforcement of personal rights etc. etc.

We start to make the ideologically satisfying separation of identification and authentication/entitlement more of a reality. More of this in other posts.

I just have two snagging issues which I’d love to hear a response on from those who want to get us moving on this now:

The first is a transitional one, but an important one. As the group of “personal data holders” grows, the infrastructure and operations required to support the other group won’t change. There’ll be a double running of systems. Although this is inevitable with any system change, it puts an immediate disincentive on any service provider to explore this route. (But this is not my point here.)

My point is that strange things will start to happen in terms of operational continuity and completeness. There will be “gaps” in databases, where the personal data holders used to be. Instead of their information, there will be links and interfaces to the data they control for themselves. Will this create all sorts of headaches and risks just by itself? Enough to seriously dampen any service provider’s enthusiasm for adopting volunteered personal information?

The second will persist, and is perhaps more problematic. Because your personal information (whether it’s about your identity, other descriptive information about you, or about your authorisation to a particular service) is going to have to be assured by someone. This may not, and indeed should not–in the case of identity–be the exclusive province of government agencies, but someone is going to have to do it.

Some will do it well: banks, for example, are rather more incentivised (and skilled as a result) to be damn sure you are who you claim to be. But some won’t. And when we get down to the level of a patchwork of assurers, in any system, we start to get some problems. When things go wrong (and they will)–have a vision of a functional world by all means, but build for the real, dysfunctional one–the untangling of liability may consume more resource than was ever saved by enabling the shift of control in the first place?

Thoughts? I’d love to be convinced. I really would. But I’m a healthy sceptic at the moment.

De devil tail

I saw a statistical tweet this morning: the sort that makes me pause and think. What is it really saying, and what is the story that could be told?

The average local government public sector pension is £4200 a year. For women it is £2870. Gold-plated? Really? I think not.

Now, setting aside the comment about gold-plating (with which I heartily agree, for the record–these numbers are insultingly small) and the fact that no source is given, which is perhaps forgivable given the space constraints of a tweet–what is the actual story here?

We don’t know if the £4200 figure is for male pensioners, or for all pensioners. It makes a big difference to the resulting headline.

This is why.

Let’s say the £4200 is the average male pension. Assuming equal numbers of men and women, the overall average is then £3535, which gives us headline comparisons that “women get 19% less than average, and 32% less than men”. 32%. Gulp. Powerful, huh?

But what if that £4200 is the average across all pensioners? To get any further, we need some information about the male:female ratio of local authority pensioners. Let’s assume 50:50 on the basis of no further information.

Now our headline comparisons are “women get 32% less than average, and 48% less than men”. Wow. This is looking much worse. (I’ve resisted the crap journalism temptation to put “over 48%”, by the way. Rounding is rounding.)

However, I’ll go out on a limb here and suggest the male:female ratio is tilted towards female pensioners. This is based on women living longer than men, having an earlier retirement age, and my gut feel that large swathes of local government employment have lots of women.

If we go for 35:65 male:female, we now get the headline comparisons: “women get 32% less than average (as before), and a whopping 57% less than men”.

Now that is a headline. (But are you getting a funny feeling about the attempt to compare one party with another, and with their collective average, in a two-party situation? Perhaps you should?)

So there you have a story about a 19% disadvantage, or a 57% one, from the same numbers, depending on how you interpret them. That’s quite a difference.
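For anyone who wants to check the sums, here is a minimal sketch in Python. The two pension figures come from the tweet; the male:female splits are the guesses used above, and the 19% figure in the first scenario only comes out if that split is also taken to be 50:50, which is assumed here. The function names are just for illustration.

```python
QUOTED = 4200   # the "average" figure from the tweet
WOMEN = 2870    # the women's figure from the tweet


def pct_less(value, reference):
    """Whole-number percentage by which `value` falls short of `reference`."""
    return round(100 * (1 - value / reference))


def male_average(overall, female, male_share):
    """Solve overall = male_share * male + (1 - male_share) * female for male."""
    return (overall - (1 - male_share) * female) / male_share


# Scenario 1: £4200 is the male average; assume a 50:50 split for the overall mean.
overall_1 = 0.5 * QUOTED + 0.5 * WOMEN
scenario_1 = (pct_less(WOMEN, overall_1), pct_less(WOMEN, QUOTED))

# Scenario 2: £4200 is the overall average; assume a 50:50 split.
men_2 = male_average(QUOTED, WOMEN, 0.5)
scenario_2 = (pct_less(WOMEN, QUOTED), pct_less(WOMEN, men_2))

# Scenario 3: £4200 is the overall average; assume 35:65 male:female.
men_3 = male_average(QUOTED, WOMEN, 0.35)
scenario_3 = (pct_less(WOMEN, QUOTED), pct_less(WOMEN, men_3))

for label, (vs_average, vs_men) in [
    ("male average, 50:50", scenario_1),
    ("overall average, 50:50", scenario_2),
    ("overall average, 35:65", scenario_3),
]:
    print(f"{label}: {vs_average}% less than average, {vs_men}% less than men")
```

The implied male pension climbs from £4200 to £5530 to £6670 across the three scenarios, which is where the widening gap comes from. The women's figure never moves.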

Were I writing up this story as a data journalist or campaigner, I’d want to pin down this type of population detail, and also correct for differences in the amount of time worked over a career for each sex (assuming that this contributes to calculating a pension figure).

I’d also want to show how much difference the underlying disparities in base pay rates were making to the pension figures. Were historical (and current!) disparities in pay passing straight through to affect pension entitlements, or were there other factors (and perhaps inequalities) at work?

And is this getting better, compared with data from say 10 years ago, given some of the recent moves towards addressing long-standing gender inequality in pay?

Lots there, isn’t there?

We’re opening up a lot of data.

Let’s make sure we open up our analytical skills to match.

Not quite public

That old question came up recently: What really good online service experiences has government ever given us?

As usual, the first (and, sadly, often the last) answer: the online tax disc service. There’s no doubt it’s a properly good use of the online channel to save people hours of queueing and paper fiddling. It achieves its magic not with any fancy visual design–its interface isn’t that great (those five questions up front–why, just why?). And it still stubbornly refuses to update its strapline from one that was phased out several years ago. (Did you notice? No, of course not. Straplines are irrelevant.)

What matters are two bits of genius: one, the removal of any burdensome personal identification at the front end. No Government Gateway, no personal identifiers. Just a reference number that you type straight off the paper form that’s sent to you when it’s due. That’s it. If you have that number, and you have the means to pay, a tax disc will soon be on its way. Whoever you are. It’s about the car, not you.

The second miracle is the joining up of databases at the back-end. The car’s registration is used to call on MOT and insurance databases (information from completely different sectors, let alone organisations) to save you digging out slips of paper and doing all that queueing only to find out that one of them is a little bit out of date. Don’t underestimate how valuable that service join-up is.

But this post is not about tax discs. It’s about another online service, from VOSA [updated: not DVLA; the MOT scheme is run by another Department for Transport agency, VOSA], that far fewer people know about.

And it’s not to illustrate a service point, for a change, it’s to explore an information point.

The MOT.

The evil cousin of the tax disc. It doesn’t display its expiry date for the world to see on your windscreen. Well, it does if you choose to fill in the little sticker that you get with your MOT pass certificate. But that depends on your choice. And nobody is going to punish you if you don’t do it, or if the little sticky peels off and gets lost. So we know what that means.

When’s your car’s MOT due? Yes, you! Do you know? Without finding the last certificate, which, let me guess, isn’t about your person or your desk as you read this.

You could find out online, you know. There’s a nifty little utility here. And what do you need to get access to that magic expiry date? Well, you need the registration number of your vehicle, naturally. Which you probably know.

And you need a reference number from your last test certificate (or failure notice).

Really. No, I’m not joking.

Ok, I’m exaggerating a little: if you don’t have the test certificate to hand, there is a fallback. You can use the number on your blue V5 form. The one that we oldsters still call “the logbook”. Now, let me take a wild guess as to where your logbook is kept? Any possibility it might be in the same drawer as your… You’re there already aren’t you? Give yourself a quick kick on the inside of your shins, VOSA service designers.

So we have a potentially brilliant online service, that, if promoted, could stop tens of thousands (my guess) of people slipping past their MOT expiry dates without realising. The only time they think of these things is in idle hours at their desks at work, while the documents they need languish in a dusty study drawer at home.

And what would make the service brilliant? Just making it usable on the basis of the registration number alone. Which would mean that anyone could look up anyone else’s MOT expiry status. (The crowd suck through their teeth…is that, I mean is that, ok?)

And is it?

The point (which I have finally got to) is that MOT status information is a curious dataset. It’s not quite private (well, it’s barely “protected” to any appreciable extent), and it’s definitely not public. Instead we’ve built a little friction around accessing it (needing to drag out a hard-to-find bit of paper rather than an easy-to-find remembered–or seen in the street–fact).

Does it feel like personal data to you? Would it bother you if your nosy neighbour could look up your missed test date and start leaving little passive-aggressive notes on your windscreen? Or should it be a public data set? Nothing to hide, nothing to fear and all that. And the bloody tax disc expiry date is printed loud and clear for all in the street to see, isn’t it? What’s the difference?

The only risks I can think of that are headed off by this rigmarole are the nosy neighbour one, or possibly a local garage touting for business on the basis they’ve spotted your car is coming up for a test soon, or a miserable Lazy Wail underling sitting in a grey basement tapping in slebs’ car registrations in the hope of getting a pathetic non-story.

That’s not a lot, is it? Am I missing something? Is that the entirety of the reason why we are denied an incredibly easy-to-implement online tool which would save us real time and real money?

Over to you. And over to you, VOSA, if you’re reading. Which I hope you are.

I’ll revisit this concept of quasi-public data soon. Things that aren’t quite public, aren’t quite private, and may well be personal. Things like the electoral roll, for example :)