midata: revolution or enigma?

No technology contracts bigger than £100m.

Bye-bye proprietary software monopolies–hello Open alternatives.

An avalanche of government data to generate new business opportunities and pump billions into the economy.

Fast broadband for (almost) all.

Agility, everywhere–no more risk-averse, unchangeable systems–instead, a commitment to diversity and experimentation.

Reskilling in-house tech teams, reducing dependence on external suppliers with vested interests.

And after years of false dawns, services actually joined up around–and designed for–their users.

There’s not a lot not to like, really. Is there?

Just before the election we heard a torrent of such promises. Watching the gathered geeks and entrepreneurs around me at the launch of the Conservative Technology Manifesto last March, I could see tongues virtually hanging out. We weren’t just being offered the keys to the sweetshop: Francis Maude and Jeremy Hunt were pretty much proposing ripping its doors off.

How many of these sweeties have actually been delivered post-election is a story for another day (ah, the shackles of that Coalition Agreement, I’m sure…).

But over recent weeks and months we’ve seen glimpses of another what’s-not-to-like initiative. And now it’s been launched.


[Ok, try this link. I was making a dodgy CMS point with the first one, that Google (and BIS site search!) gave me…]

So here comes the grumpy blogger to get all picky with what on the face of it is a risk-free, consumer-enriching move willingly volunteered by industry, facilitated by government, to make real people’s lives easier at no cost. (Coz there’s loads of those.)

Well, not so much of the picky, really–just an interest in shining a light into some of the corners of this debate. Because corners and angles there most certainly are.

The first thing to get to grips with is that there seem to be two big agendas wrapped up together here.

Both can be connected to the words “me” and “data”. But they seem to be quite different in their nature and purpose. That’s always a recipe for confusion if not properly unpacked. So let’s see what we have.

Agenda 1: better information for consumers

We have a consumer empowerment angle here, clearly. “Giving people back their data” is billed as putting the customer back in control when forming or reviewing a relationship with a vendor. For some services, especially things like utilities and telecoms, the case is very tangibly made.

We generate a lot of data in consuming the service. Understanding our consumption patterns in detail would help us when making future choices about service provider, as we’d be able to match the terms that were on offer with what we actually needed.

So far so good.

This also extends to things like preference data: as we go about buying things (and even just looking at them) we generate a cloud of information about our preferences, choices, needs and their timing. This has a value–how much, nobody really knows, though there are some florid estimates–to marketeers, and could drive better deals and more targeted, less intrusive advertising.

Agenda 2: proving your identity online

The moment we started to move transactions away from being with someone you knew personally in your village, we increased the complexity of how you prove things: who you are, can you pay, entitlement-by-residence and so on. Online, it’s pretty horrible, and attempts at building something that’s simultaneously secure and usable by normal people have foundered.

(There is more elsewhere on this blog about these issues–otherwise this post would be very long.)

Suffice it to say that the current approach (which actually looks pretty promising) is that of “federated identity assurance”: not trying to create one massive database of information about people against which things can be checked, but using information sourced from a number of existing trusted relationships, in combination, to give sufficient assurance of identity.

Which means that both these agendas are the same, doesn’t it? They both involve consumers getting their hands on personal data that’s previously been locked up in companies.

Well, actually, I don’t think it does.

Why not?

A definition of “personal data” is harder to pin down than might seem initially apparent [more here]. Lots of things that don’t look that personal by themselves (points on a map, equipment serial numbers etc.) take on a whole new power when linked to an individual.

There’s the obvious “personal facts” stuff, of course: name, address, account number etc. which usually (but not always) identify an individual.

Then there’s operational data, made much of by midata: what we’ve used, what we’re interested in, what service choices we made etc.

Releasing structured chunks of this latter type could well meet Agenda 1’s objectives. And there are design choices to be made here which will have a big impact on risk and privacy.

Would it be sufficient to get a log of mobile calls by time band and number type, for example, rather than a detailed list of the numbers actually called and precisely when each call was made? The former could well be enough to allow a better contract to be found; the latter, if it were mislaid, would be a potential privacy nightmare, not just for the caller but also for the people they called.
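To make the difference between those two releases concrete, here is a minimal sketch, assuming a hypothetical call-record format (start time, number dialled, duration in seconds) and made-up time bands and number-type rules. No real operator's export format or tariff structure is implied; the point is only that the summary keeps what a tariff comparison needs and drops who was called and when.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallRecord:
    """One line of a detailed (privacy-sensitive) call log. Hypothetical format."""
    start: datetime
    number: str
    seconds: int


def time_band(ts: datetime) -> str:
    # Assumed bands: real tariffs define their own.
    if ts.weekday() >= 5:  # Saturday or Sunday
        return "weekend"
    return "peak" if 8 <= ts.hour < 19 else "off-peak"


def number_type(number: str) -> str:
    # Assumed, simplified UK-style prefix rules (illustrative only).
    if number.startswith("07"):
        return "mobile"
    if number.startswith(("0800", "0808")):
        return "freephone"
    if number.startswith(("01", "02")):
        return "landline"
    return "other"


def summarise(calls: list[CallRecord]) -> dict[tuple[str, str], int]:
    """Total seconds per (time band, number type); no numbers or exact times survive."""
    totals: Counter = Counter()
    for call in calls:
        totals[(time_band(call.start), number_type(call.number))] += call.seconds
    return dict(totals)


# Usage: three detailed records collapse into three tariff-shaped buckets.
calls = [
    CallRecord(datetime(2011, 11, 7, 9, 30), "07700900123", 120),   # Monday morning
    CallRecord(datetime(2011, 11, 7, 21, 0), "02079460000", 600),   # Monday evening
    CallRecord(datetime(2011, 11, 12, 11, 0), "07700900456", 300),  # Saturday
]
print(summarise(calls))
# {('peak', 'mobile'): 120, ('off-peak', 'landline'): 600, ('weekend', 'mobile'): 300}
```

The summary is enough to price against a tariff built on bands and destinations, yet losing it would reveal very little about anyone.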

My point being that meeting a consumer empowerment agenda requires the “giving back” of information with certain characteristics–i.e. tailored to fit the way that consumer services are packaged.

But the giving back of information to help confirm an identity relationship–Agenda 2–seems to me to be a very different beast.

Because I thought the whole concept of using a number of different identity providers was that you asked them to pass confirmations of trust around–not the actual personal data itself? So one might ask a bank to confirm electronically that some submitted data matched a record that they held, but that’s not the same as handing the requestor (or indeed the individual) chunks of personal data.
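A minimal sketch of that “confirm, don’t disclose” pattern, assuming a hypothetical bank that holds account records and offers a single yes/no check. The class, field names and matching rule are invented for illustration and are not any real identity assurance scheme’s API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """Details a user has typed into some other service's form (hypothetical fields)."""
    account_number: str
    surname: str
    postcode: str


def _norm(text: str) -> str:
    # Ignore case and spacing so "ab1 2cd" matches "AB1 2CD".
    return "".join(text.split()).upper()


class HypotheticalBankVerifier:
    """Illustrative identity provider: answers 'does this match?' and never returns the record."""

    def __init__(self, records: dict[str, Claim]):
        self._records = records  # held privately, keyed by account number

    def confirm(self, claim: Claim) -> bool:
        held = self._records.get(claim.account_number)
        if held is None:
            return False
        return (_norm(held.surname) == _norm(claim.surname)
                and _norm(held.postcode) == _norm(claim.postcode))


# Usage: the requesting service learns one bit per check, not the bank's data.
bank = HypotheticalBankVerifier({"12345678": Claim("12345678", "Smith", "AB1 2CD")})
print(bank.confirm(Claim("12345678", "smith", "ab12cd")))   # True
print(bank.confirm(Claim("12345678", "Smith", "ZZ9 9ZZ")))  # False
```

Authenticating the requester, limiting repeated guesses and signing the answers, which a real service would need, are all left out.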

So I fear that in an attempt “not to go into too much detail” we’ve got a conflation of two separate, interesting, important issues under the midata flag.

One can always argue that “it’s the principle that counts–we should establish that first, then let the clever people get on with the solutions”. Well, yes. Ok.

We did that with electronic patient records, with Post Office smartcards, with national identity cards and registers… At some point we do need a public airing of the underlying principles in a greater level of detail than the initial press release. And before a major delivery programme has been commissioned, I’d suggest.

Other than this “issue overlap” there are a few other points that strike me about midata. There is this underlying sentiment that consumers have a right to “their data”. But what is it that actually makes a particular piece of data “theirs”?

Information about usage is a hybrid of personal facts (e.g. who is the account holder?) and operational information as a consequence of service use. How far does it extend? Basic consumption patterns? Probably yes. Detailed, time-stamped records of every purchase and all parties involved? Hmm. Maybe. Serial numbers and last maintenance dates of the precise routers and masts that were used to deliver a phone call? Well, now you’re being silly, Paul.

Yes, I am, of course. But I’m trying to illustrate that the translation of this “right to data” into reality involves more than just signing a memorandum of understanding. Update: there’s a more detailed post about “Whose data is it anyway?” here now.

And then there’s the cost angle. Even if we assume that the addition of a simple bit of code will suddenly enable service providers to spit out raw chunks of data onto the Internet (aka the “it can’t be that hard to get their systems to…” fallacy), the midata announcement is already talking about a greater degree of sophistication: particularly the bit about “access, retrieve and store their data securely”. Who’s going to pay for that?

And do we have robust evidence that there is interest and demand for this type of data release, other than from the vociferous lobbyists with their eyes on constructing a wealth of new “personal data store” opportunities?

It’s great to see entrepreneurial spirit flourishing, but how much is this about solving real consumer problems, and how much about playing yet more variations on the “consumer as product” theme–you tell us about your interests, and we’ll give you better deals (but only as a share of what we’re really making by selling that information to other vendors).

The argument that better information increases customer choice, and therefore power, is of course another “what’s-not-to-like”. But if you take a step back, and look at the implied problem that “people don’t know which is the best deal as they’re all so complicated and people don’t really know what they use anyway…”

…would you put your energy into releasing chunks of data to help make a better match with a complicated tariff, or would you have another look at the issue of tariffs in general, and simplify them? Yes, both represent some form of intervention, and I can see the political attractiveness of the former, as (especially under a voluntary scheme like midata) it plays down the regulatory role in favour of cheerful vendors all quite happy to be a lot more transparent with their/your operational information. But one wonders just how sustainable this level of voluntary cooperation would actually be in the longer term in highly competitive markets…

That’s a bit like imagining a set of doors with fantastically complicated locks, and giving people the right to have equally complicated keys cut–rather than pushing for simpler locks in the first place.

So, a lot of questions remain. Conceptually, midata isn’t something that could or should be objected to. And this post is not written to criticise, but to suggest a few areas that need more detail and analysis.

When we see press releases that let fly with cool talk of data, empowerment and choice we should be getting a lot more eager to ask the next level of questions. What does this really mean? How will it work in practice? And what might some of the broader economic, competitive, social and privacy implications be?

Until we do, we’ll be dazzled by press releases and then a bit disappointed when delivery swings into action. And it’s usually too late by then to do much about it.

Getting personal

For a long time, I’ve shied away from writing here about personal data. Or even thinking that deeply about it. The nature of identity, yes. The usefulness of data, yes. Personal data, no. Why?

Not because it isn’t fascinating, or important. Mainly because it’s so…damn…nebulous. And difficult. Time to get over that, I think. Very significant things are happening in this area, and we all need to raise our game in how we understand and engage with the concepts involved.

As I’ve surmised before, the only things that are really different in the Internet age are the ease with which information can be found, and the ease with which it can be stored.

Two things, really. That’s all.

The first embraces everything around indexing, cross-referencing, labelling, structure and searching. The second takes us into the territory of copying (and of course copyright), archiving, and the general issue of persistence.

And when we look at personal data in that context, there is an immediacy–and potential toxicity–in what emerges.

We saw early rumblings of this long before the Internet, of course, when computers were first used for the mass processing of information about people. Things could be done with databases that simply weren’t possible with big paper ledgers.

We created Data Protection legislation which attempted to put reins on the ability to make free use of some types of information. Gathering stuff about people, from the basic facts of who and where, to how to contact them, who they were connected to, and what their tastes and preferences were. Pure gold, used in the right (or wrong) ways.

Data Protection set out some pretty sound, but general, principles. The overarching one being that the purpose to which data could be put should always be made clear to whoever provides it, at the time of providing. Lots of other stuff about processing, storage, where and how long, and so forth–but that issue of consent always seemed the most important, to me.

And we scratched about a bit to actually try and define what we meant by “personal data”. Some things were easy. Names. Addresses and phone numbers. They’re just obvious.

But what about our tastes? Our buying history? The movements of our mobile phone from cell to cell? A journey we took? As one takes informational side-steps away from the individual, the obviousness diminishes, but if you can make meaningful connections back to the person…

…and remember the first thing that the Internet really changes?

Being able to make those tenuous links between blocks of information into something really substantive.

And the second thing? That information and those links are now permanent. You can’t delete them, once they’re there.

All those things that databases couldn’t previously do, because they all conformed to different standards, and weren’t connected together? They can now. Things can be done via the Internet that simply weren’t possible with just the databases.

Bit by bit, it’s been possible to build up the most humongous repositories about people. Maybe entirely within the law, maybe in other ways as well. Maybe with explicit and informed consent all the way down the line. And maybe not.

Who’s to know? We find strange things going on with data that we provide in order to use one or other service, or even to exercise our democratic rights. Didn’t it ever strike you as slightly weird that the electoral roll could be sold on for commercial purposes? (Much more on the electoral roll in another post coming soon. Update: now here.)

We have big companies that have built successful businesses just like this: perhaps using aggregated personal information for credit referencing, perhaps to sell to marketeers to give them a better understanding of demographics.

The genie is very much out of the bottle. Your rights to see the information that a particular company holds on you may exist, but you have to have a fair idea of which company to ask in the first place. Can you ever see the full picture of what others know about you?

Of course not.

And it’s unreasonable to suggest that we’ll ever be able to do that. Instances of data multiply more rapidly than does our capability to track them. (There must be a Law of Internet Entropy out there that says something like that. If not, I just invented one.)

(As an aside, a dear friend once uttered the memorable line “somewhere out there, there’s a database with your dick size on it”. That was in 1989.)

So what can we do?

Realistically, all that’s available to us are firebreaks and friction.

We can’t get that genie back in the bottle, but we can slow it down a bit, and find ways to mitigate the impacts.

Do we need an updated definition of personal data? It’s MUCH harder than it seems at first glance to create one. The best I can find at the moment in terms of an “official” position is here.

And it’s clumsier than you think. Essentially, it’s a list of ever-widening filters that assess whether a particular piece of information can be connected to a specific individual. Culminating in the rather wonderful catch-all of the final category:

8. Does the data impact or have the potential to impact on an individual, whether in a personal, family, business or professional capacity?

Yes: The data is ‘personal data’ for the purposes of the DPA.
No: The data is unlikely to be ‘personal data’.

Even though the data is not usually processed by the data controller to provide information about an individual, if there is a reasonable chance that the data will be processed for that purpose, the data will be personal data.

That’s pretty general, no? In fact, going by that, an awful lot of things are now personal data. I really like the emphasis it puts on the outcome of the data use, not attempting to over-define things like form and structure.

I’d go as far as to say we should probably throw away that big long document, and just run with this definition:

Personal data is information that affects you when it’s used. Either directly, or through being linked to other information using technologies that exist now, or may exist in the future.

Broad enough? ;)

(So my beloved photos: they’re personal data. I take them with a camera that has a unique number, held in metadata in the picture file. That provides a way to link all the pictures it takes together, and then, through the various accounts I put them in online, back to me. Think how many other trails you leave…)
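How little effort that linking takes can be shown in a few lines. A minimal sketch, assuming the camera serial number has already been read out of each file’s metadata; the field names, file names, serials and account names below are all made up.

```python
from collections import defaultdict

# Hypothetical metadata, assumed already extracted from each picture file.
photos = [
    {"file": "IMG_0001.JPG", "camera_serial": "0420301234", "posted_as": "photosite/jane_doe"},
    {"file": "IMG_0002.JPG", "camera_serial": "0420301234", "posted_as": None},
    {"file": "street.jpg", "camera_serial": "0420301234", "posted_as": "forum/anon_walker"},
    {"file": "DSC_0099.JPG", "camera_serial": "9999000011", "posted_as": None},
]


def link_by_serial(photos):
    """Group files by camera serial and collect every online identity seen with each serial."""
    files = defaultdict(list)
    identities = defaultdict(set)
    for photo in photos:
        serial = photo["camera_serial"]
        files[serial].append(photo["file"])
        if photo["posted_as"]:
            identities[serial].add(photo["posted_as"])
    return dict(files), dict(identities)


files, identities = link_by_serial(photos)
# One camera ties a named account to an "anonymous" one, plus a picture never posted at all.
print(sorted(identities["0420301234"]))  # ['forum/anon_walker', 'photosite/jane_doe']
```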

But again, all we really have are firebreaks, and friction. There’s a sort of reverse entropy at work. Unlike almost every other instance of entropy–where things get more chaotic over time (china plates get broken, they never put themselves back together again)–personal information is, relentlessly, only going to get more linked. More aggregated. More pervasive. More permanent.

(So, maybe I just invented The Law of Reverse Internet Entropy as well? Not bad going for one post…)

And if someone tells you that big blocks of personal data can be reliably “anonymised”, be very sceptical indeed. (You can read some wise thoughts on the issues involved here and elsewhere on that blog.)

We can undertake some pretty noble fire-breaking: like ensuring the state doesn’t become the source of a global universal identifier for you. And we will certainly see more developments around multiple personas: compartments of your life associated with particular tasks, contexts, or connections. I think we’ll have to. (The concept of federated identity helps here, but that’s too much to go into for this post. Read more thoughts from the team working up these concepts for government.)

And we’ll adjust. Society has seen some pretty dramatic upheavals. Often associated with a new technology, or philosophy. If we adjust our societal norms faster than the upheaval, we don’t notice. If we’re slower to change, it’s painful. For a bit.

But we get through. We adapt. And we change. Always.

Google Plus Ungood

I know many people have managed to get up and running with Google+ fairly easily. The usual snags have been reported, of course, as users get used to the idiosyncrasies of the network, and as new etiquette and conventions emerge.

Today, it’s become clear that there are some deeper issues emerging, as Google enforces a “real names only” policy. Erm, good luck with that, in a hard identity sense, guys. Unless you’re going to try and peg people back to a state-issued identifier… (no, I’m not even going to go down that road of horror).

There’ve also been a few nasties creeping out of the woodwork as users realise some of the drawbacks of putting it all in the cloud. One wrong step with your service provider, and you’ll be writing a rant like this as thousands of hours of curation, not to mention thousands of irreplaceable and irretrievable content files, are briskly wiped out.

But for me, Google’s latest foray into social networking has pretty much been a non-experience. Although I was invited fairly early on, and was up and running for a few days, it all went belly-up pretty soon afterwards.

Why? Because of the cack-handed way in which Google identities work, that’s why. Here’s the detail.

Like most people, at some point I signed up for a Gmail account. I didn’t get a very nice address, as I wasn’t in there early enough, but it is a version of my name.

What I do have, and use instead, is a funky email address that I set up 10 years ago, and a couple of years ago moved over to Google Apps. (Bear with me.)

That email address is pretty much the way in which I’m identified for all services I use that are based on email address. In many ways it is my self-asserted identity on the Internet.

So it won’t surprise you to learn that when I came to create a Google profile, based on an email address, I used my “home” email identity.

So far so good, and for a couple of years everything worked smoothly. Google Apps did the things Google Apps did (email, calendar, contacts). And for the other Google services I used (Analytics, and probably not much more than that), I logged in with my Google profile. All was well. I did have a slightly uncomfortable feeling that there might be trouble down the line, though, with two identically named identities that were logically separate.

And I was right.

A few days after I joined Google+ I got a friendly-but-firm email from Google. “We’re consolidating your accounts,” they told me. This dual use of the same email address can’t go on. Not optional. Indeed.

As I’d been invited to Google+ using the “profile version” of my email address, I feared the worst. And I was right. That was the account which was going to be stripped of my preferred email address. To be replaced by a “temporary address”–something horrible with a percentage sign in the middle of it. Great. The G+ connections dried up–nobody knows me as “the percentage sign email guy”–they know me as my ordinary, erm, email address. Bugger.

It got worse. To be able to get into G+ at all I didn’t just have to log in to Google using the temporary profile, I also had to log OUT of Apps (explicitly, even if I were already logged out of “normal” Google)–otherwise Google thought I was attempting to access G+ using a “business identity”. The horror!

The solution–according to Google–was to assign my Google profile an entirely new email address. Right. A new identity, for what could emerge as a pretty important service, should Google actually get their act together. An identity, and email address, that I didn’t need or use anywhere else. Not. Ideal.

So we have an impasse. I am hanging on, the temporary account unused and unloved, in the hope that Apps users will at some point be able to use their Apps email as a G+ identity. (It’s a rather faint hope, given the strategic direction that Google seem to be taking with identity.)

Why would I waste time now building up a social network where I, quite literally, don’t know who I am?

But it explains why I’m not part of this party, remain unconvinced of Google’s ability to handle the basics of social interaction, and am pursuing a wholesale review of my domains, addresses and identities for what now seems an inevitable clean break, sooner or later, with Google. Nice work, chaps.

Update, 25 July: a few morning-after-posting thoughts

Is there any real significance in all this? Surely this is just the moaning of yet another free-service user who didn’t read the Ts&Cs? Nothing paid, nothing to complain about.

Well, this is significant, because:

  • Identity, and cross-platform identity, are hugely important in an ever-more-connected world. Mess with those and you mess with the core of user experience: user existence.
  • Like it or not, it’s hard to see how a relationship with Google won’t form some part or other of everyone’s Internet activity at least over the next few years. This makes a Google profile (whatever neglect Google may have shown for it to date) disproportionately important.
  • The attempt to enforce “realness” is weak. Google’s requests for reference to “government-issued ID” (redacted or not), whether to “prove” age or identity, are a troubling step. They put a little friction in the path of anonymity, sure, but if you want to be anonymous, you can be.

And these characteristics (inflexibility, heavy-handedness, dependence) are all indicators of things that we’ll need to worry much more about in the future.


Google account administration functions really are up the spout. Here’s a good piece by Dan Harrison on Google administration in general, and another on Google Apps deficiencies in particular. I’ve said it before: if a profit-focused, cash-rich organisation like Google finds identity so difficult, do we really hold out much hope for government?


Google Wave also revealed some of these flaws. I thought, briefly at the time, that the whole Wave concept was actually a Trojan Horse to get people to sign up for a Google profile (or to take one more seriously if they already had one). And what did they force me to have as my Wave ID, despite me already having a friendly Apps address and a slightly less friendly Gmail address? Something like paulclarke0001@gmail.com (I actually forget how many zeroes.) Face hits palm.

On the shifting of control of personal data

If you’ve been locked in a cupboard for the last five (or more) years, you’re excused from observing this thematic shift:

In the longer term, data about people is more likely to be owned and controlled by them. Rather than having many instances of personal information scattered around organisations and agencies, to be confused, duplicated, corrupted and left on buses, simpler technologies have emerged to put the data owner, you, back in control.

We see this theme emerging with several different labels: from vendor relationship management, to volunteered personal information, to personal datastores, to a “control shift” in the concept of personal data.

I agree that this shift is inevitable, to a greater or lesser extent. Everyone wants it. What’s not to like? Lower processing costs, greater security, reinforcement of personal rights etc. etc.

We start to make the ideologically satisfying separation of identification and authentication/entitlement more of a reality. More of this in other posts.

I just have two snagging issues which I’d love to hear a response on from those who want to get us moving on this now:

The first is a transitional one, but an important one. As the group of “personal data holders” grows, the infrastructure and operations required to support everyone else won’t go away. There’ll be a double running of systems. Although this is inevitable with any system change, it puts an immediate disincentive on any service provider to explore this route. (But this is not my point here.)

My point is that strange things will start to happen in terms of operational continuity and completeness. There will be “gaps” in databases, where the personal data holders used to be. Instead of their information, there will be links and interfaces to the data they control for themselves. Will this create all sorts of headaches and risks just by itself? Enough to seriously dampen any service provider’s enthusiasm for adopting volunteered personal information?

The second will persist, and is perhaps more problematic. Because your personal information (whether it’s about your identity, other descriptive information about you, or about your authorisation to a particular service) is going to have to be assured by someone. This may not, and indeed should not–in the case of identity–be the exclusive province of government agencies, but someone is going to have to do it.

Some will do it well: banks, for example, are rather more incentivised (and skilled as a result) to be damn sure you are who you claim to be. But some won’t. And when we get down to the level of a patchwork of assurers, in any system, we start to get some problems. When things go wrong (and they will: have a vision of a functional world by all means, but build for the real, dysfunctional one), might untangling the liability consume more resources than the shift of control ever saved in the first place?

Thoughts? I’d love to be convinced. I really would. But I’m a healthy sceptic at the moment.

Preaching to the unconverted

I’ve been getting this blogging thing all wrong. Three years of grinding out thoughts about public services and technology, generally pointed towards an audience already versed in the issues, have all been for nothing.

I’ve been missing the real audience. The one that truly needs to understand more about this stuff.

A spirited discussion on Tuesday with a doughty advocate for public transparency convinced me that I need a change of approach.

Our debate arose from his astonishment that it wasn’t possible for “government” to say at any one time how many people it employed. Despite this being an “obvious” factual issue in his eyes, no amount of requests seemed to be able to produce a meaningful answer.

My response (“well, it’s not really a meaningful question”) didn’t go down too well. Even having navigated the complexities of what “being employed” might mean, with all its colour and texture of vacant posts, secondments, part-funded posts, long-term absentees and part-timers, I felt there were still problems with the concept of such a broad question.

If asked by an economist with a specialism in operational research or organisational productivity, I could possibly, possibly see some sort of tangible purpose to a question, but more likely a version targeted at a more specific organisation or sector than just “all of government”. Possibly.

I know this is heresy: information should be free, yadayadayada, and the motivation of the questioner unimportant. But open your mind just for a moment to the possibility that context may have some value, in light of what came next in our debate.

The moment when I realised I’d got all my public service technology blogging pointing in completely the wrong direction was when my interlocutor said “you technical guys – you can sort all this out – surely the systems know how many people are on each payroll? Just add them up every night. You could if you wanted to.”

Here was an acclaimed expert in transparency of information, someone who’d spent much of his professional life pursuing the dark corners of government’s secrecy and intransigence. And he thought that a few lines of code and a diktat to “just f-ing report it daily” would meet this requirement.

(A spurious requirement, I’d say, as the journalist asking the question would be likely to write the same story whatever the actual number they got in response to their question. Any Big Number would do the job – and hey, if no meaningful answer came forth, that would be an even better story. “How stupid are they! They don’t even know…” Win, whichever way you look at it.)

I blame the Daily Mail, of course (shorthand for any form of lazy, populist, press). As with most difficult public policy issues, from asylum seekers to disability claimants to identity, there’s always an easy, quick answer that will get heads nodding in the pub and taxi.

But which is almost always utterly, hopelessly, WRONG. Who wouldn’t like an easy answer to a hard question? To avoid any deeper thinking about the subject. Or acknowledgement of history, personal responsibility or sense of others? To gloss past the difficulties that arise when something that looks (from a huge distance) a tiny bit like a simple, familiar, backyard activity is attempted on a scale of tens of millions of people and transactions.

So here’s the plan: a post, or small series of posts, called “The Daily Mail Reader’s Guide to Public Services Technology”.

Taking some of the favourite old chestnuts (Why can’t they count X? Surely if everyone just had one ID number? Why so many different systems essentially doing the same job?) and really, anything else that begins with: “I don’t see why they can’t just…”

And writing them up in language that DM folk may identify with. Analogies from golf clubs, caravan parks, tea shops. You get the drift.

I’ll make a start, but do please add your suggestions here for topics that you’d like to see given the treatment.

How the Government Gateway works

Caveat: this is not a technical description of how the Gateway works. Nor does it cover the behind-the-scenes services that the Gateway provides in terms of messaging and interoperation between various government systems. But it is my description of the way it works at the front end–the signing-on bit–of government services. Because that’s where it’s most apparent, and that’s the bit that’s often misunderstood. I wrote this because I haven’t been able to find such a description anywhere else on the Internet. Which is slightly odd (isn’t it?) given that the Gateway has been around for about ten years.

For a service that plays a part in millions of online public service transactions a year, the Government Gateway is surprisingly poorly understood, and described. What you can find online varies from the noble attempt (but not exactly functionally descriptive) to the flamboyant, to the technical, and on to the slightly bizarre.

But nothing in plain language that really sets out what’s going on. And, perhaps, what isn’t. I have something of a fascination around the mechanics of authorisation and authentication, particularly when applied to government services, so here goes.

You want to use a service that has the Gateway sign-on apparatus at its front end. Like Income Tax Self-Assessment. So you go to HMRC’s Self-Assessment service and register as a new, Individual, user (as opposed to an Organisation, Agent or Pensions administrator). Very quickly you’re taken through a brief request for your name and a password, a few warnings about the seriousness of what you’re about to do and the type of documentation you’ll need with you later on, and behold: a big long formal 12-digit User ID pops up. 848355815693 is the one I just registered.

Shriek! Did I just put my Gateway User ID out there on the Internet? Why, yes I did. (We’ll come back to why that doesn’t matter in a moment.) HMRC are now asking me to continue through the process and ‘enrol’ in the service. But we’ll pause there for the moment.

The Government Gateway uses an approach called “Registration and Enrolment” (R&E). First you have to register for a User ID (we just did that). Then you have to enrol in the various services you want to use with it. Enrolment means you go through a process, specific to the service you’re trying to use, of giving proof of who you are and that you’re entitled to use the service. Leaving it up to the service to decide how much proof is needed is a really good thing, surely? No avalanche of information required to use a simple, low-value, low-risk service? We’ll see…

In theory, therefore, you can add more and more services to your ID, leading to what becomes a single sign-on for lots of services, using the same User ID and password. In theory.

The great genius of the Gateway R&E design is that it does the reverse of what you’d expect. Instead of trying to be all secure up front–insisting you prove entitlement and identity straight away–it wilfully ignores all that and gives you a wholly anonymous, “throwaway” ID number. You can go and get as many as you like. Try it yourself, now. Really, go and do it a few times. You can either do it via hmrc.gov.uk (just my little joke) or at the Gateway’s own site. They both work the same way.

It was once memorably described by a much cleverer colleague as “an insecure keyring to which you can attach secure keys”. (Great, until you need to find your keyring.)

The great folly of R&E is that it is utterly pointless, unsupportable, and ultimately valueless for normal people in real life. Have you spotted the gaping holes yet? Before we expose them in more detail, let’s quickly look at enrolment.

For HMRC self-assessment the enrolment process is the bit where you enter your Tax Reference Number and a few other bits of identifying information. And then you wait. For a PIN to arrive in the post. As a means of confirming you are who you say you are, before you can go any further. Not quite a seamless electronic transaction there, then. In the days leading up to Jan 31st the post seems to move very slowly indeed. And you might lose that 12-digit number in the meantime.
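To make the register-then-enrol split concrete, here is a toy model in Python. It is a sketch under stated assumptions, not the real Gateway: the class and method names (`ToyGateway`, `register`, `enrol`, `activate`) are hypothetical, passwords are left out entirely, and the "posted" PIN is simply returned by `enrol`. Note that `register` checks nothing at all, so nothing links two IDs issued to the same person.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Enrolment:
    service_ref: str       # the service's own identifier, e.g. a tax reference
    pin: str               # activation PIN that would be sent by post
    active: bool = False   # True once the posted PIN has been entered online


@dataclass
class GatewayUser:
    name: str                                       # unverified: whatever was typed
    enrolments: dict = field(default_factory=dict)  # service name -> Enrolment


class ToyGateway:
    """Hypothetical model of Registration and Enrolment, not the real Gateway."""

    def __init__(self):
        self.users = {}  # 12-digit User ID -> GatewayUser

    def register(self, name):
        # Registration checks nothing: every call hands out a fresh, anonymous
        # 12-digit ID (the "insecure keyring"). Passwords are omitted here.
        while True:
            user_id = "".join(secrets.choice("0123456789") for _ in range(12))
            if user_id not in self.users:
                break
        self.users[user_id] = GatewayUser(name)
        return user_id

    def enrol(self, user_id, service, service_ref):
        # Service-specific step: in this sketch the service just issues a PIN.
        pin = f"{secrets.randbelow(10**6):06d}"
        self.users[user_id].enrolments[service] = Enrolment(service_ref, pin)
        return pin  # in real life this goes in the post, not back to the caller

    def activate(self, user_id, service, pin):
        # Attaching the "secure key" needs the PIN that arrived in the post.
        enrolment = self.users[user_id].enrolments[service]
        if secrets.compare_digest(pin, enrolment.pin):
            enrolment.active = True
        return enrolment.active


gateway = ToyGateway()
first = gateway.register("A. Tortoise")
second = gateway.register("A. Tortoise")  # nothing links this to `first`
posted_pin = gateway.enrol(first, "self-assessment", "1234567890")
```

In this model the two IDs registered under the same name are unrelated, and only the enrolment carries anything the service can actually check.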

DVLA have a twist on the process: not for them the “give us a name and here’s your ID” approach. Oh no. They ask for lots of other qualifying information (name, address, date of birth, passport number) and, of course, money before they get to the bit where they spit out your new provisional driving licence. Not bad, really.

They’ve almost masked the presence of the Gateway entirely. There’s a question at the very beginning saying: “While applying, you’ll be issued with a Government Gateway user ID. If you already have a Government Gateway User ID, simply enter it with your password.” And if you haven’t, can’t remember it, or can’t be bothered—don’t fret, you can just get another one.

Getting a sinking feeling about the value of this User ID yet? (And actually, people will fret. They will spot this sort of “do I/don’t I need to…” ambiguity, and it will slow some people down and put others off using the service altogether.) Doubt is something you really want to design out of online transactions.

So, behind the scenes, DVLA just went and generated you another Gateway User ID. One you’ll probably never need again, and one which carries no security risk, but isn’t necessarily anything to do with your other Gateway relationships. Unless you happened to have a previous one to hand when you applied. (I’d love to see some stats on how many do this, by the way.)

So, let’s look at what’s really bad about all this (and I stress again that I am talking about the user experience of the Gateway as a front end to transactions: Gateway R&E. Not about the back-end messaging standards which also form part of the Gateway suite of services):

1. Unsupportable. You can’t find your Gateway ID or password: what do you do? No point approaching the Government Gateway team—they don’t know who you are. They only recorded a name and password (which you might have lost). If you’re going to start resetting passwords and handing out IDs by email you need some better checks than that. They don’t have any information to check against. (And you’ve probably spawned several by now as you’ve been navigating through various online services. Which one have you lost?) So you approach HMRC, or whoever you need to deal with at the time. And they ask for your Tax Reference Number. Because your relationship is with them and that’s how they know you. The Gateway adds no value.

2. Take-up. Despite a bit of official posturing about it being government’s preferred online transaction authentication solution, and a few high-profile services which incorporate the front-end bit in some inconsistent way, most services routinely ignore it. Look at this service list. And this service has been operating for how many years, and has had how much spent on it? The Gateway is routinely ignored at the front end because it adds no value.

3. Lack of transparency or challenge. Try and find another piece like this on the internet that explains what’s going on and casts a critical eye over value. People seem remarkably reluctant to discuss something that is a pretty big feature on the government technology landscape. If they do praise it, it looks like this, emphasising the benefits to service providers of using its protocols and messaging, but glossing over the broken stuff with phrases like “allows citizens to have one user ID and password”. Yes. In theory. Oh pur-lease.

4. It’s not Your Account for Government. It never can be. It’s designed not to be. This is a particularly pernicious failing. It raises expectations that it should, somehow, be a single connection point between citizen and state online. When it’s compromised, we panic. When it fails to add any value, we’re disappointed. We’ve been, effectively, duped into thinking some sort of useful, usable functionality has been added. It hasn’t.

5. It fundamentally misreads individual user behaviour online. People do share and lose their IDs and passwords. Putting in a wait for the postman does result in everything having to be redone, and in sapping user confidence in government’s online services. The situation is slightly better for businesses, and I will concede that for business-facing transactions (and for accountants, agents and other intermediaries), Gateway R&E probably does add some value. But there’s a hell of a difference between employing someone whose job it is to get these processes right, and providing services to individuals.
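Point 1 can be illustrated with a sketch of what a registration-only record can hold. It assumes, hypothetically, that the Gateway keeps just a name and a salted password hash per User ID (the hashing scheme is an assumption; the IDs, names and tax reference are made up). A "lost my ID" request then has nothing to be checked against except a typed-in name, while the service’s own reference is what actually picks you out.

```python
import hashlib
import secrets


def hash_password(password, salt):
    # Assumption: a salted PBKDF2 hash; the real storage scheme isn't described here.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def make_record(name, password):
    # All a registration-only record holds: an unverified name and a password hash.
    salt = secrets.token_bytes(16)
    return {"name": name, "salt": salt, "hash": hash_password(password, salt)}


def check_password(record, password):
    return secrets.compare_digest(record["hash"], hash_password(password, record["salt"]))


def recovery_candidates(records, claimed_name):
    # A recovery request can only be matched on the name, which anyone
    # can claim and which several records may share.
    return sorted(uid for uid, rec in records.items() if rec["name"] == claimed_name)


# Hypothetical Gateway records: two IDs spawned by one person, one by another.
gateway = {
    "111111111111": make_record("A. Tortoise", "first-password"),
    "222222222222": make_record("A. Tortoise", "second-password"),
    "333333333333": make_record("A. Tortoise", "someone-else"),
}

# The service's own relationship: tax reference -> the ID enrolled against it.
hmrc_enrolments = {"1234567890": "222222222222"}
```

All three records answer to the same name and nothing in them says which is yours; the tax reference held by the service does.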

One can see why Gateway R&E had some attractions: ten years ago, when it started, there was massive political pressure to bring public services online. Earlier attempts to build a secure authentication framework across all services had foundered (and still founder; see numerous other posts here on this). This half-way house created a way in which the press and public could be fed stuff like that BCS line above, and we, the public, could be left to pick up the pieces of a miserable, broken user experience.

A value-adding single sign-on experience can be yours. If only you don’t do stupid stuff like lose passwords, IDs, or a strange little card we send you, and if you can manage to navigate around the workarounds (like that DVLA “if you already have…” stuff) that we have to build into every service to make them actually get used.

Time for a few pointed questions and FOIs, I think. Because this is fundamentally difficult territory, I think it’s had a bit of an easy ride.

Why the big fuss?

The usual parade of whimsy on this blog about this or that in public services, or things-that-make-my-head-hurt-in-general, has been rudely interrupted by a series of diatribes on identity and trust online, with a focus on people interacting with government.

Why, why all this attention to some rather obscure mental masturbation?—I hear you cry.

By way of brief explanation:

1. It’s fascinating: intellectually, socially, and philosophically; here we find very real and somewhat abstract concepts fusing together to try and do an important job.

2. Did I say important? It’s really, really important. Progress on this issue has the potential to shape some pretty fundamental things about our privacy, freedom and relationship with the state (and with each other).

3. It’s big. A brief look at the history of computerising National Insurance or patient health records is enough to show that national-scale anything of this nature is not to be taken lightly. Grand schemes have to be very well designed before implementation (and not just technically, but socially and behaviourally); start-small-and-scale requires a good understanding of what will really change as things grow.

4. It’s fraught with paradoxes: it’s easy to imagine tempting answers – very hard to design workable solutions. What seems easy in broad outline dissolves into complexity in the detail. The ingredients themselves are elusive: shape-shifters at times—identity information can be there to act as a reference (help me look your record up), a verification (we’ll check that out against our database), a diversion of risk (well, we ask all callers for their date of birth), a red herring (no, I do need your mobile phone number before I’m allowed to talk to you…Data Protection innit?)…and more. And the best hope we currently have for a solution relies on concepts which are far from our mental models of how such things should work.

5. I don’t know all the answers—I know a few of the questions, that’s all. I am happy to be set straight about any of this—if you can describe a simple, workable solution, please do so. Just don’t start “well, can’t we just give everyone a unique number…?” ok? If you hear someone spout that we should be able to knock up an Amazon-type “account for government” tomorrow, gently ask them to go a little further. Ask a few questions. Ask if it has to be “the real you” holding the account. Ask if you can have more than one. Ask if you’ll have to have one, even in the distant future. But be nice.

Finally, a consoling thought, before I leave this topic for a while. There are some parallels here with another tricky technology/people problem: for thousands of years cryptography was beset by one major problem—how do you get a key from Alice to Bob so that Bob can unlock a message when he receives it? Anyone intercepting the key could then intercept the message that followed and open it. Seemingly intractable—one’s only option was to find clever ways to exchange or vary the keys—it was blown away by a neat bit of maths in the mid-70s, leading to a simple form of code-making (involving no exchange of secret keys) which underpins secure ecommerce to this day. Perhaps there’s something out there, as yet undiscovered, which will allow us to square these circles of usability, privacy and assurance. A public-key cryptography equivalent for identity. I just wish I could find it first.
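For the curious, here is a minimal sketch of one of those mid-70s results, Diffie-Hellman key agreement (1976), using toy parameters that are far too small for real use. Alice and Bob each keep a private number and publish only G raised to that number modulo a prime P; each can then compute the same shared secret, which itself never crosses the wire.

```python
import secrets

# Toy parameters, far too small for real use: P is the Mersenne prime 2**127 - 1.
# Real systems use standardised groups of 2048 bits or more, or elliptic curves.
P = 2**127 - 1
G = 3


def keypair():
    private = secrets.randbelow(P - 3) + 2   # kept secret, never sent
    public = pow(G, private, P)              # safe to send in the clear
    return private, public


alice_private, alice_public = keypair()
bob_private, bob_public = keypair()

# Each side combines its own private number with the other's public number:
# (G**b)**a == (G**a)**b (mod P), so both arrive at the same value.
alice_secret = pow(bob_public, alice_private, P)
bob_secret = pow(alice_public, bob_private, P)
```

Only `alice_public` and `bob_public` would travel between them; at proper key sizes an eavesdropper holding both cannot feasibly compute the shared value, as far as anyone knows.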

The Nature of the Relationship, part 2

In which we look more deeply into that business of what an online trusted relationship actually means—over and above the mechanics of actually “proving” something about it to a particular degree.

New readers will probably want to read an introductory piece, a logical separation of issues relating to trust from those of identity relationship, and the post immediately preceding this one. (Keener-eyed regular readers may now be getting some clues as to what this oddity was all about.)

So we’ve found so far that some of the stuff we imagine should be quite simple, isn’t. A single log-in using one identifier to get to lots of services is a shaky concept. In theory, it should be fine (we can create models very easily in our minds of things that work like that and don’t cause much difficulty). But in practice, at national or even widespread local scale, it quickly creates data management and security nightmares. It leaves the way open for other things, perhaps unwanted, to be attached to that identifier, covertly or overtly. And, assuming that you provide a few different passwords or other tokens, or even add in some biometric checks to the mix (coz you wouldn’t want to lock all your possessions using just one type of key, would you?), you begin, very quickly, to make things very much more complex. And we’re trying to use online channels to simplify, save money and increase access, aren’t we?

There’s an inherent tension here: if the credentials you use are powerful enough to actually be trusted and useful, then they quickly become fraught with risk and unusability. I’d suggest that the risks scale faster than the benefits, which might account for the fact that a plain old general “account” type relationship with government hasn’t made much progress in well over a dozen years of (expensive) trying.

There are some twists too that come from the fact that it’s government we’re talking about here, not an online bookseller. We take a different view, as I wrote in the previous post, about business risks that attach to public sector transactions. Many people quite naturally think of government as one indivisible entity, even though many different agencies, people, standards, systems and contractors may be involved. That’s just reality. We want government to have an overall view of us as a whole when it suits us, for instance when changing name, address, or informing of a death (à la Tell Us Once programme), but on the other hand we don’t want everything too joined up. We really don’t. Contradictions, paradoxes, tensions…

A few other twists: because these services are public, we expect (and deserve) the very highest standards of accessibility. And, if we’re serious about building them as part of the infrastructure of life in the UK, having a decent quality connection to actually get to them is a good start. We’d like to have more options about where transactions are served—to have more flexible models of delivery so that government might offer an interface to its processing engine, allowing other bodies to run a user front-end. But we want to be absolutely sure we don’t create brand confusion, or create gaps that accountability can fall through. Contradictions, paradoxes…

Oh, just one more—if your bookseller stuffs up with your account, you go to another bookseller: there isn’t another government—how do you really think you’re going to get your data back? (There’s more, but this is just a quick glide through some of the reasons we can’t just take a completely standard ecommerce approach to this.)

But, and it’s a big but, many of these challenges arise if we’re trying to envisage an account-type relationship with government. We’re conditioned to do so. We’ve been trained. By customer relationship systems in the commercial sector—we have Amazon, Google, eBay and our bank accounts—and we even have an HMRC online tax account. It looks, and feels, a bit like any other financial service. Surely there’s nothing more natural than trying to extend this concept to accessing health records, to applying for things like licences, to making complex choices about social care? If you’re getting a bit wary that an “account” is a bit of a conceptual stretch for something you do only once every ten years (bearing in mind what’s gone before in these posts about the problems with a “general purpose” relationship) then you’re probably right. But that’s another side-turning we might explore separately.

If, and I believe this to be true, the concept of a general citizen account—a governmental panopticon which stores, links and serves us up as a unified whole—lies out of our reach, whether for privacy, security, complexity or technical constraints (and combinations thereof), is there another way?

The answer, we hope, is yes. US and UK policy at the moment is bent on developing along these lines, anyway.

The concept behind this alternative, a trusted identity framework, is tricky. There’s a particularly good description here of some of the concepts involved—which I’m not going to attempt to rewrite, for the moment. Except to note that it contains useful concepts such as what I like to call “transferability of trust”—the ability to reuse a trusted relationship (a classic one being that if you log into your bank online, it’s seriously likely that the bank will hold the correct address for you, and be able to confirm it) to do other things. You don’t have to re-enter or re-prove it, but crucially, government doesn’t have to go through the business of verifying and processing it, either.
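Here is a rough sketch of transferability of trust, assuming a hypothetical arrangement in which a bank vouches for a confirmed address and a government service accepts that assertion without re-verifying it. To stay self-contained it uses an HMAC over a key shared in advance by the two parties; a real trusted identity framework would more likely use public-key signatures, and the key, attribute names and values here are invented for illustration.

```python
import hashlib
import hmac
import json

# Hypothetical key agreed in advance between an identity provider (a bank) and a
# government service. A real framework would more likely use public-key signatures.
SHARED_KEY = b"demo-key-agreed-out-of-band"


def _tag(payload):
    return hmac.new(SHARED_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_assertion(attributes):
    # The bank vouches only for what was asked for, e.g. a confirmed address.
    payload = json.dumps(attributes, sort_keys=True)
    return {"payload": payload, "tag": _tag(payload)}


def accept_assertion(assertion):
    # The service checks the tag instead of re-verifying the address itself.
    if not hmac.compare_digest(_tag(assertion["payload"]), assertion["tag"]):
        return None  # tampered with, or not from the trusted provider
    return json.loads(assertion["payload"])
```

The service never sees how the bank established the address; it only learns that a party it trusts has vouched for it.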

But fragmenting the Nature of the Relationship like this is not without its problems. It doesn’t give us Tell Us Once (about changes of circumstances). Far from it—it deliberately compartmentalises an interaction so that just those bits which need to be proved, get proved. The eventual models of relationship that emerge are still being determined, I think—with relationships emerging that vary from entirely anonymous (though verified where needed) to increasingly rich with personal information. Maybe there is a Tell Us Once-type account at the bottom of the well, but I seriously doubt it. It’s all going to be a long and tricky journey.
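One way to compartmentalise, sketched under assumptions: the identity provider derives a different, stable pseudonym for each service from a secret only it holds. A council and a health service each recognise you when you return, but cannot match their records on a shared number. This resembles the "pairwise" subject identifiers in OpenID Connect; the secret and service names below are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by the identity provider, never by services.
PROVIDER_SECRET = b"identity-provider-secret"


def pairwise_id(user_ref, service):
    # Same person + same service -> same pseudonym every time (so you are
    # recognised on return); same person + different service -> unrelated value.
    message = f"{user_ref}|{service}".encode()
    return hmac.new(PROVIDER_SECRET, message, hashlib.sha256).hexdigest()[:16]
```

Unlike a single number used everywhere, joining the council’s and the health service’s records here would need the provider’s co-operation.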

I might bring back the Greek and his pet for a play with a trusted identity framework later. Might. This is heavy going ;)

The Nature of the Relationship, part 1

This is where the going gets tougher. The previous post here was about the different things we use to bodge our way around the minor inconvenience that you can’t actually prove anything about identity with absolute certainty (and it’s all even harder on the Internet). Accepting that we’re all just a collection of risks and uncertainties to be managed, and that we’ve got quite a few tricks (good and bad) at our disposal for doing so, we move towards an even knottier problem.

But to help us do this, let’s bring back Tortoise. Who is on a bit of a mission.

Tortoise: Achilles, Achilles—I’ve been reading all this waffly crap about online identity and I just want to get on with things.

Achilles: How so?

Tortoise: Screw it, fella. I just want my unique identifier now, please. I’ve got nothing to hide. I’m volunteering for you to strap everything you like on to me. Tie it to my old shell, big boy.

Achilles: You sure? Well, if it’s to make a bloody good point about what happens if you do—for the purposes of illustration—I’m game. You want it public or private?

Tortoise: How do you mean?

A: Do you want your identifier to be kept a secret that only you know about, or do you want it splashed everywhere in public?

T: Well, secret, I guess? Is it really a straight choice like that?

A: I’m afraid so. What type of things were you hoping to use it for?

T: Well, to log on to my local council services, naturally. And to see my health record. And to book a driving test. And to pay my taxes. And, and, and…

A: And you reckon that using this fiddly little string of numbers in what already adds up to hundreds of systems from that little list you’ve just given me means that number can be kept…secret? [raises a well-groomed Grecian eyebrow]

T: Fair point. So at some point I have to be ok with the fact that an abandoned hard disk…but surely encryption and good local security management policy will take care of that?…oh, wait, yeah, I see…I have to be ok with the fact that a big list of unique identifiers is going to wind up on Wikileaks or something like that eventually?

A: You do.

T: OK. I accept. I’m ok with that. After all, it’s just a string of fiddly little numbers. It’s not about me, the actual Tortoise that is me. Oh, or is it?

A: What do you think?

T: Well, I don’t really know. It could be. Or it might not be. If it isn’t, then is it really that much use? And if it is, I have this creeping feeling you’re about to show me cracking ice and swooping vultures. Hell’s bells, this has gone and got difficult already, hasn’t it? Why does this always happen? What’s the right answer, Achilles?

A: I guess it depends on whether you want to be identified as you, the real Tortoise, in all these transactions. And you do, don’t you? You have nothing to hide, remember?

T: Sure. But doesn’t that mean…oh I see what you’ve done, you clever bugger. You’ve let me neatly draw out the conclusion that the actual identifier is no great shakes, it’s what it’s attached to that really matters.

A: Quite. And as you’ve said that you’ve got nothing to hide, let’s take your Tortoise Insurance Number (TINO) and from henceforth make it the only identifier about you to be used anywhere in government. After all, lots of people keep banging on about how that must be the long-overdue common-sense solution to all this identity uncertainty. £1,500 please.

T: What? You’re going to charge me for giving me a number you’ve already given me?

A: No, don’t be absurd. This is just a one-off charge for all the migration work.

T: Migration?

A: Changing every single existing government system so they all sing and dance and recognise you off of this here TINO.

T: [Gulps] Is that strictly necessary?

A: Well, perhaps not. We could build some elaborate middleware and interfaces and yada yada yada. Might be a bit shonky and fall over from time to time. Or scramble your records with someone else’s. But you’re ok with that, aren’t you? £1,500, remember?

T: It just all seems so expensive.

A: That’s because this is the real world, old son. I know when you were just out of the shell, you used to line up all the other tortoises and make up your own Little Tortoise Club stuff, giving everyone a secret name and a password?

T: Bloody hell – so I did. How did you know?

A: We all did. And it worked, didn’t it? You kept pretty strict records of, oh, a whole 10 individuals. Nothing leaked, nothing got mixed up, and it was all beautifully administered. And you used that as a mental model in your horny wee head of how identities and secrets and all that might work in the big world. But you know what, dear little chap? You were utterly wrong. This is a world of baddies, of fraudsters, of the incompetent and the helpless, of the excluded and the disabled. It’s a world of error, of approximation, of faults and mistakes. Lots of gritty reality that, if I’m honest, tends to bugger up enterprise-scale secrecy/identity/security systems faster than we can actually squeeze benefits out of them.

T: Lawks! Have you finished?

A: Yeah. But then I start again, and spend another £100m repeating all the mistakes I made last time. Just using a different firm of consultants. Boom boom!

T: So, to recap, I’ll be able to use my TINO wherever I like, accepting that at some point the relationship between it and me will come into the open somewhere, and that it provides a handy hook for anyone, anywhere, with or without me knowing, to hang whatever facts, associations or other metadata they like on me—which may be used against my interests to sell me stuff, compromise me or do loads of other bad things? And that I’ll be reliant on a panoply of passwords and other tokens to associate with my TINO to unlock the various doors that need unlocking in such a way that losing one of them doesn’t give the bad guys control of my entire life, but at the same time, a panoply that I will find easily manageable? I don’t see how that’s possible.

A: S’ok, my shelled friend. You have nothing to hide, remember?

T: I’m really not liking this much at all now. Is there an alternative to my ill-thought-through quick and dirty answer?

A: Why yes, there is. But we’ve just gone over 1,000 words, and according to the rules, that means waiting for the next post.

T: Oh, cloacas.
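The Tortoise’s "handy hook" worry is easy to demonstrate. In this sketch, with made-up datasets and a hypothetical TINO format, three separately held lists keyed on the same identifier can be merged by whoever gets hold of copies, with no involvement from the Tortoise.

```python
# Hypothetical, separately held datasets that all use the same universal ID (TINO).
council = {"TINO-0001": {"parking_fines": 3}, "TINO-0002": {"parking_fines": 0}}
clinic = {"TINO-0001": {"last_appointment": "2010-11-02"}}
retailer = {"TINO-0001": {"loyalty_spend": 412.50}}


def link(*datasets):
    # Anyone holding copies can merge them on the shared key, without the
    # person concerned knowing or being asked.
    profiles = {}
    for data in datasets:
        for tino, facts in data.items():
            profiles.setdefault(tino, {}).update(facts)
    return profiles


profiles = link(council, clinic, retailer)
```

The identifier itself reveals nothing; it is the three-way join it makes trivial that does the damage.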

Who are you again?

This online identity stuff is very difficult—as I’ve written here before: much harder to truly grasp than it should be, in a peculiar way. I think that one of the reasons is that there are really two, logically separate things going on. Unless one puts a bit of mental legwork into understanding them—well, almost philosophically—all that follows in terms of technical solutions and so on can be irrelevant, at best.

So, those two parts: 1. how do you “prove” you are who you say you are? and 2. (the bit that’s perhaps harder to encapsulate) what is the relationship model that’s constructed when such a “proof” transaction takes place?

Let me try it another way: (1) what are you trying to prove and how do you go about that? and (2) what are the consequences of you having done that “proving”?

I hope to make some progress in illustrating why they’re quite different, but both very, very important. The first of those two parts—the “what and how you prove” bit—is the subject of this post. Probably because it’s the easier of the two. Though still complicated.

You never really prove anything, of course. If we are going to get into the business of cutting people open to extract a bit of DNA from their very bones and analysing it against some sort of uber-register of genome sequences…yeah, yeah, yeah. But we’re not. So stop being silly. (And they might have implanted somebody else’s bones, anyway. Ok, that’s silly. Or is it? Let’s move on. You see the point: every obstacle is just another challenge.)

What we do instead is use a number of arbitrary proxies for identity: tokens that either alone or in combination give a certain sense of assurance that their presenter is who they claim to be. The passport is a common (and relatively strong) example. There’s the photo ID (with a government issued driving licence being rather more trusted than a cheaply-laminated snooker club membership card). There’s the infamous utility bill—which has the benefit of also fixing the presenter to a physical location of residence. You get the picture. Sometimes the detail is checked against something else, sometimes it’s recorded, and sometimes it’s not checked in any meaningful way, but the request itself is enough to dissuade naughtiness.

Because, for most of the transactions one carries out with government (central, local, police, whatever) checks like this are pretty damn important. (At least they are perceived to be, anyway, certainly in comparison to some private sector transactions. Compare the following headlines: “x% of cardholder-not-present credit card transactions are fraudulent, costing £Ybn per year” with “x% of online benefits claims are fraudulent, costing £Ybn per year”. Which one will have the nation frothing that Something Must Be Done? But that’s for another post…)

The guys at the gate of Caterham tip ask for a utility bill to confirm that you’re allowed to dump there. (Well, only when it’s busy, it seems.) To them, a location is the only important fact that’s been asserted—who I am, or indeed whether that utility bill matches anything else about me or my car, are unimportant. At the supermarket checkout, the young-looking booze buyer will only be troubled for something featuring a date of birth, and so on.
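The tip and the checkout each care about exactly one attribute. A sketch, with a hypothetical `Evidence` type and a made-up postcode district, of checks that look only at what the transaction actually needs:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Evidence:
    # What a presented document shows; None where it shows nothing.
    kind: str
    postcode: Optional[str] = None
    date_of_birth: Optional[date] = None


def tip_allows(evidence, local_districts):
    # Residence is all that matters at the tip; who you are is irrelevant.
    if evidence.postcode is None:
        return False
    return evidence.postcode.split()[0] in local_districts


def old_enough(evidence, today, min_age=18):
    # The checkout needs a date of birth and nothing else.
    dob = evidence.date_of_birth
    if dob is None:
        return False
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    age = today.year - dob.year - (0 if had_birthday else 1)
    return age >= min_age
```

A utility bill passes at the tip and is useless at the till; a driving licence could serve at either, but each check ignores everything on it except the one field it cares about.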

The tokens we use to give that degree of proof don’t have to be physical bits of paper, of course. We can memorise PIN numbers, or be asked for known facts about our previous transactions which only we’d be likely to know the answers to. We can set up “shared secrets” in advance so that only we will know the answer when challenged by our remote interlocutor.

We can have combinations of things used together—to see my bank statements online I now have to put my bank card into a reader the bank have sent me, pass a challenge, and then enter a result online. Sure, if you have my card, my reader, know my PIN and at the same time can open a session of my online banking you are me, at least as far as my bank is concerned. But that’s a lot of hardware and effort, and reasonably proportionate to the stakes involved, I’d say. We talk of “something you have and something you know” as a basic type of multi-factor authentication, or “something you have, something you know and something you are” if we add in a biometric component.
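The card-and-reader routine can be sketched as a generic challenge-response, assuming a secret key shared between the card and the bank. This is a simplified HMAC-based illustration, not the actual protocol UK bank card readers use, and all names and the PIN are hypothetical. The card (something you have) only answers when given the right PIN (something you know).

```python
import hashlib
import hmac
import secrets


def _code(key, challenge):
    # Turn an HMAC of the challenge into an 8-digit code a person can type.
    digest = hmac.new(key, challenge.encode(), hashlib.sha256).digest()
    return f"{int.from_bytes(digest[:4], 'big') % 10**8:08d}"


class Card:
    """Something you have: holds a secret key, and checks something you know."""

    def __init__(self, key, pin):
        self._key = key
        self._pin = pin

    def respond(self, pin, challenge):
        if not hmac.compare_digest(pin, self._pin):
            return None  # wrong PIN: the card refuses to answer
        return _code(self._key, challenge)


class Bank:
    """Holds a copy of the card's key, so it can check the reader's answer."""

    def __init__(self, key):
        self._key = key

    def new_challenge(self):
        return f"{secrets.randbelow(10**8):08d}"

    def verify(self, challenge, response):
        return response is not None and hmac.compare_digest(_code(self._key, challenge), response)
```

Knowing the PIN without the card, or holding the card without the PIN, gets an attacker nowhere, which is the point of combining factors.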

You see the point?—there isn’t really any proving going on. Just an exchange of information that gives a certain level of assurance, upon which trust can then be built. Sometimes it’s done well. And sometimes it’s not. Sometimes the requests for “proof” information are proportionate to the task being undertaken. And sometimes they’re not. But the request/risk relationship is likely to be quite specific to the task being attempted.

You’ll notice that I freely used offline examples above, when normally I bang on about how hard all this is in the online world. Well, the concepts are the same. It’s just that there are some characteristics of online channels that tilt the tables of risk. The lack of a face-to-face element removes some of the visual cues we might use to strengthen trust in a claimed identity. But this applies to the phone as well (how many times have I assumed the guise of “Mrs-C-with-a-cold” to try and sort out a minor squabble with a utility company?).

No, what makes things really very different in the online channel are those two old favourites: accessibility and recordability. The friction of having to find a benefits office, queue up, and try it on with the clerk by wearing a false moustache all disappears. You can be fast, anonymous and massively multi-tasked, using tools to try thousands of entry points and potential tokens simultaneously.

And what you do undertake, successfully or unsuccessfully, creates a record—leading to all sorts of other consequences—something that doesn’t happen when a guy in a fluorescent jacket glances at your water bill. Nobody writes anything down in lots of offline transactions—that’s important. Or captures and indexes it, for example, on video. (The indexing bit matters, by the way…but that’s taking us into the next area: the Nature of the Relationship.)

Oh, and I fear there’s one other powerful reason why this is so challenging for those who “think digitally”—a digital relationship is generally conceived as one of certainty—the bits match the requirement, ergo the door is unlocked; whereas everything above is an assembly of probabilities, seeing people less as people but as a collection of analogue risks, in a context where “good intent” and “assurance” are just shades of grey. No wonder we experience some cognitive dissonance in this area.
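The "assembly of probabilities" view can be put as simple arithmetic. If, as a strong simplifying assumption, each check is fooled independently with probabilities p1, p2, … pn, the chance a determined impostor gets past all of them is p1 × p2 × … × pn, and whether that is good enough depends on what the transaction is worth. The figures below are made up purely to show the shape of the calculation.

```python
from math import prod


def residual_risk(fool_chances):
    # Chance that every check is fooled at once, assuming the checks are
    # fooled independently of one another (a big assumption).
    return prod(fool_chances)


def good_enough(fool_chances, tolerable_risk):
    # Proportionality: the tolerable risk depends on what is at stake.
    return residual_risk(fool_chances) <= tolerable_risk


# Made-up figures: a glanced-at utility bill (0.1) plus a photo card check (0.05).
bill_and_photo = [0.1, 0.05]
```

Independence is the weak spot: if the checks are correlated (one good forgery making the next easier), the real residual risk will be higher than the product suggests.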

If you’re now drowning in a sea of uncertainty and looking lovingly back at that idea of sawing people open and extracting an inarguable(?) DNA sequence—congratulations. This is a highly normal response. Rushing back to a “unique identifier” to solve everything is pretty common. Engadget managed to do that neatly in their headline yesterday on the latest moves in US federal identity assurance—even though the source material talks about something rather different—a distributed identity framework. I’ll cover this, and the fallacy of the “unique ID” as a solution, in the next post: this dark business of the relationship that’s created as a result of digital transactions.

I might need my Greek hero and his friendly chelonian to help with that one. This stuff is not easy.

But what helps me sometimes, when thinking about this topic, is that this is a game you can play at home. Sort of. Every time you exchange anything about you (whether that involves your facial features, your money, or information about you) with anyone, anyone at all, online or offline, think about what’s actually being exchanged, why, and what the consequences could be. Try withholding everything except what turns out to be absolutely essential. Lie, subvert, play (within reason). It’s going to be useful to hone this awareness and these skills, I suspect.

Now read on…