Know Me, Know Me Not


A featureless airport departures hall.

Behind the check-in desk, a large warrior stands, strip-lighting lending a pale lilac wash to his magnificent plumed helmet.

Half-way along the queue is a rather dishevelled Tortoise, surrounded by heavy bags.


Achilles (for he’s back again): Oi, Tortoise!

Tortoise [po-faced and unresponsive]


Tortoise: WTF? How do you know my number? Thought that was just between me and the hatchery?

Achilles: See this print-out of your markings? [holds up said print-out] Got this off of Google; on CheloniansOfNote.com it was. That’s you, isn’t it? Blotch, blotch, stripe, worn patch, shape that looks a bit like David Willetts’ head? Yes? Got a few other bits of info here too, to help me recognise you and the better to meet your every need.

T: Um, so I see. But how dare you…

A: Hang on, my horny-carapaced friend. Shuffle up to the front here. Let’s have a quiet word about this. [Tortoise makes the painfully slow journey to the head of the queue, nudging his bags one by one with his nose.] This is what you wanted, see?


A: You told us. You did. Well, not you individually, Tortoise NP150…


A: Ok, ok. Well, collectively, our customers said things like “Hey Trojan Air, time to wake up to the new world and start treating us like people. We’re not just lumps of flesh with wallets. We want you to throw away all that stiff, corporate formality. Get to know us. Empower yourselves. Adapt. Use a bit of bloody initiative. See us for who we are.” So we have.

T: Yeah, but you can’t just go gathering information like that about me, without my permission. It’s like me shell’s been invaded. Horrible. Oi moi!

A: Don’t go getting classical on me: these characterisations are only pixel-deep. Now, look over there, now, at the SleazyJet desk. See that queue? Hundreds of them. Hot and knackered, they are. And going nowhere for a couple of hours yet. Now, I know, and the SleazyStaff know, that there’s a nice little waiting room round the back. With just one very comfy seat in it. And air-con. They can’t tell everyone, it’d get rammed. But see that woman just there? With the huge bump? Could drop any minute. You think it’s ok for the staff to, you know, use their bloody EYES to spot her, and offer her that seat? Or are you going to go all “no, no, they must know nothing, they must treat us all-equal-and-anonymous like”?

T: Well, I suppose that’s a bit different.

A: So it’s ok to use my bloody EYES to infer stuff about my customers, so’s I can make their service better, but it has to stop when I use, what? A computer? A phone? A database?

T: Now you come to mention it…

A: Because isn’t that where mechanical process (oh so twentieth century) stops, and service begins? When we start inferring? When we use one of the very few gifts that mankind seems to be blessed with – pattern recognition – to judge that if someone is cross-legged and hopping from foot to foot, it might be politic to proactively remind them where the loo is? To check on our systems so that their seventeen letters of complaint that they keep getting woken for meals when they’d rather sleep haven’t been an utter waste of time? To infer, beyond this, that similar awakenings for important matters of Shop-In-The-Sky sales might also receive an unfavourable response even though they haven’t actually WRITTEN TO US ABOUT THIS NOR GIVEN US EXPLICIT PERMISSION TO EVEN GUESS IT MIGHT MATTER TO THEM?

T: Steady on, old boy.

A: Sorry. Emotive stuff, this. Which is why this post is written as a dialogue – less confrontational that way. Where were we? Oh yes – look over there! PoshAir have got one of their regulars arriving. He’s a FTSE-100 Chairman, he is. Yeah, I know. Miserable and anonymous, grey and crumpled, to you and me. But to him? The Grand Kahoona. The Large Cheese. He wants to be recognised. And look again: by the sort of chance that only occurs in allegorical blog posts, he happens to be featured on the cover of this month’s Kahoona magazine over there on that newsstand. Now, shall we ask their staff to shield their eyes so that there is no prospect of them contaminating their green-field minds with this inarguably public-domain factuality of who the fuck he is?

T: Yeah, but it’s invasive. He might not want to be recognised.

A: Isn’t that a matter for their judgement? They are, remember, humans. Providing a service. Let’s at least hope they have some basic lightness of touch. They do not have to march up and shout “Mr Cheese great to have you back it has been 34 days and 2 hours since you flew with us shame about the collapse of the zinc deal in Bolivia your usual gin and valium then?” A mere “Mr Cheese, good to see you again. Let us know if you need anything” isn’t invasive. Invasive is ferreting through information that’s not public. Invasive is phoning people up or emailing them out of the blue, forcibly taking their time away. This stuff here is just observation, inference and discretion.

T: Ah, but it’s where it could all lead, innit. That dossier on me that you’ve got behind the desk…

A: Dossier? Ooooh how very Le Carré! You got that out of that article, didn’t you? One of many using lurid language to play on everyone’s fears about “where it could all lead”.

T: Call it what you will. You are reprocessing data and creating databases and riding a chariot and horses through the provisions of the Data Protection Act (1998). And you know it.

A: I am, and that’s a very fair challenge. I am struggling to justify it – hey, hang on, pass me your phone for a minute.

T: No bloody chance. You know enough about me already.

A: I just wanted a quick peep at your contacts book.

T: That’s none of your business.

A: And yet you download all these apps to your phone and give them permission to access what must be hundreds, maybe even more, personal records and upload them to Morin Towers and gods knows where else, and remind me at what point did you register yourself with the Information Commissioner let alone do any of that “seeking consent” hoo-ha?

T: Yeah, well, that’s for organisations. I’m just Tortoise.

A: Tortoise With A Talent, Ltd, according to my, erm, “dossier”. You still think the boundary between individual and organisation is that clear, and in any case serves as any sort of robust moral framework for this sort of issue about data responsibility? You still content that the DPA (1998) is in any way fit for purpose for the world we now live in? A world of massive volunteered personal information? A world where even if you don’t put your own pics up somebody is going to tag your face and you will be able to do jack all about it and will just have to get over this unassailable fact?

T: I suppose. That’s all going to need clearing up when they refresh the Data Protection Act, innit?

A: Just. A. Bit. But in one final attempt to justify my creepy snooping, can I at least appeal to your libertarian side? It’s one thing to berate the state for acting like this, for gathering information and building megadatabases about individuals. Its civic hygiene may one day become suspect, its motivation potentially questionable, and it’s pretty hard to avoid. But this is a freaking airline. You don’t like what we do, if you think we’re creepy, then you’ll stop using us, and we’ll change the way we work to get you back again. Less of this Big Brother Watch angst; save that for those who really deserve it. Frankly Tortoise, there’s some cognitive dissonance going on here. I know (coz it says so in your dossier) that you hate all this state intervention stuff. You really want businesses to be able to do a good job with the very lightest hand of regulation ‘pon them. Now you’re making no sense with all this paranoid guff.

T: Ok, ok. The jig’s up. I guess what’s really going on is that a general, non-specific feeling of impending doom about personal data in the cloud (and in our hands/claws) is creating a toxic environment where any story that even touches on search, or social networks, or biometrics leads us to throw all common sense out of the window. I guess.

A&T: Oi moi! Ta’las! Tlê’môn!

midata: revolution or enigma?

No technology contracts bigger than £100m.

Bye-bye proprietary software monopolies–hello Open alternatives.

An avalanche of government data to generate new business opportunities and pump billions into the economy.

Fast broadband for (almost) all.

Agility, everywhere–no more risk-averse, unchangeable systems–instead, a commitment to diversity and experimentation.

Reskilling in-house tech teams, reducing dependence on external suppliers with vested interests.

And after years of false dawns, services actually joined up around–and designed for–their users.

There’s not a lot not to like, really. Is there?

Just before the election we heard a torrent of such promises. Watching the gathered geeks and entrepreneurs around me at the launch of the Conservative Technology Manifesto last March I could see tongues virtually hanging out. We weren’t just being offered the keys to the sweetshop–Francis Maude and Jeremy Hunt were pretty much proposing ripping its doors off.

How many of these sweeties have actually been delivered post-election is a story for another day (ah, the shackles of that Coalition Agreement, I’m sure…).

But over recent weeks and months we’ve seen glimpses of another what’s-not-to-like initiative. And now it’s been launched.


[Ok, try this link. I was making a dodgy CMS point with the first one, that Google (and BIS site search!) gave me…]

So here comes the grumpy blogger to get all picky with what on the face of it is a risk-free, consumer-enriching move willingly volunteered by industry, facilitated by government, to make real people’s lives easier at no cost. (Coz there’s loads of those.)

Well, not so much of the picky, really–just an interest in shining a light into some of the corners of this debate. Because corners and angles there most certainly are.

The first thing to get to grips with is that there seem to be two big agendas wrapped up together here.

Both can be connected to the words “me” and “data”. But they seem to be quite different in their nature and purpose. That’s always a recipe for confusion if not properly unpacked. So let’s see what we have.

Agenda 1: better information for consumers

We have a consumer empowerment angle here, clearly. “Giving people back their data” is billed as putting the customer back in control when forming or reviewing a relationship with a vendor. For some services, especially things like utilities and telecoms, the case is very tangibly made.

We generate a lot of data in consuming the service. Understanding our consumption patterns in detail would help us when making future choices about service provider, as we’d be able to match the terms that were on offer with what we actually needed.

So far so good.

This also extends to things like preference data: as we go about buying things (and even just looking at them) we generate a cloud of information about our preferences, choices, needs and their timing. This has a value–how much, nobody really knows, though there are some florid estimates–to marketeers, and could drive better deals and more targeted, less intrusive advertising.

Agenda 2: proving your identity online

The moment we started to move transactions away from being with someone you knew personally in your village, we increased the complexity of how you prove things: who you are, can you pay, entitlement-by-residence and so on. Online, it’s pretty horrible, and attempts at building something that’s simultaneously secure and usable by normal people have foundered.

(There is more elsewhere on this blog about these issues–otherwise this post would be very long.)

Suffice to say that the current approach (which actually looks pretty promising) is that of “federated identity assurance”. Not trying to create one massive database of people information against which things can be checked, but to use information sourced from a number of existing trusted relationships, in combination, to give sufficient assurance of identity.

Which means that both these agendas are the same, doesn’t it? They both involve consumers getting their hands on personal data that’s previously been locked up in companies.

Well, actually, I don’t think it does.

Why not?

A definition of “personal data” is harder to pin down than might seem initially apparent [more here]. Lots of things that don’t look that personal by themselves (points on a map, equipment serial numbers etc.) take on a whole new power when linked to an individual.

There’s the obvious “personal facts” stuff, of course: name, address, account number etc. which usually (but not always) identify an individual.

Then there’s operational data, made much of by midata: what we’ve used, what we’re interested in, what service choices we made etc.

Releasing structured chunks of this latter type could well meet Agenda 1’s objectives. And there are design choices to be made here which will have a big impact on risk and privacy.

Would it be sufficient to get a log of mobile calls by time band and number type, for example, rather than a detailed list of the numbers actually called, and precisely when the calls were made? The former could well be enough to allow a better contract to be found: the latter would be a potential privacy nightmare if it were mislaid, not just for the caller, but also for whoever they called.
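To make that design choice a bit more concrete, here is a minimal sketch in Python of the "time band and number type" version. Everything in it is an assumption for illustration: the log entries are invented, the peak/off-peak banding is made up, and the names (`DETAILED_LOG`, `time_band`, `summarise`) are hypothetical rather than anything midata has specified.

```python
from collections import Counter
from datetime import datetime

# Hypothetical detailed call log: (timestamp, number called, number type).
# Every value here is invented for illustration.
DETAILED_LOG = [
    ("2011-11-03 08:15", "07700900001", "mobile"),
    ("2011-11-03 13:40", "02079460002", "landline"),
    ("2011-11-03 19:05", "07700900001", "mobile"),
    ("2011-11-04 20:30", "07700900003", "mobile"),
    ("2011-11-05 10:00", "08081570004", "freephone"),
]


def time_band(timestamp):
    """Assumed banding: 07:00-18:59 on a weekday is 'peak', anything else 'off-peak'."""
    moment = datetime.strptime(timestamp, "%Y-%m-%d %H:%M")
    if moment.weekday() < 5 and 7 <= moment.hour < 19:
        return "peak"
    return "off-peak"


def summarise(log):
    """Count calls per (time band, number type), dropping numbers and exact times."""
    return Counter(
        (time_band(timestamp), number_type)
        for timestamp, _number, number_type in log
    )


summary = summarise(DETAILED_LOG)
for (band, number_type), calls in sorted(summary.items()):
    print(band, number_type, calls)
```

The summary is probably all a tariff comparison needs, and it contains neither the numbers dialled nor the moments they were dialled. If it goes astray, nobody's contacts go with it.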

My point being that meeting a consumer empowerment agenda requires the “giving back” of information with certain characteristics–i.e. tailored to fit the way that consumer services are packaged.

But the giving back of information to help confirm an identity relationship–Agenda 2–seems to me to be a very different beast.

Because I thought the whole concept of using a number of different identity providers was that you asked them to pass confirmations of trust around–not the actual personal data itself? So one might ask a bank to confirm electronically that some submitted data matched a record that they held, but that’s not the same as handing the requestor (or indeed the individual) chunks of personal data.
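As a rough sketch of what "passing confirmations around" might look like, here is a toy model in Python. It is an illustration under stated assumptions, not a description of any real identity assurance scheme: the `Claim` and `IdentityProvider` classes, the `sufficiently_assured` function and the "two confirmations is enough" rule are all hypothetical, and the personal details are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """Details an individual submits about themselves (all values invented)."""
    name: str
    postcode: str
    date_of_birth: str


class IdentityProvider:
    """Hypothetical provider, e.g. a bank, that answers yes or no and nothing more."""

    def __init__(self, label, records):
        self.label = label
        self._records = set(records)  # held privately, never returned to callers

    def confirms(self, claim):
        """Return True if the submitted claim matches a record held here."""
        return claim in self._records


def sufficiently_assured(claim, providers, needed=2):
    """Assumed rule: identity is accepted once `needed` providers confirm a match."""
    confirmations = sum(1 for provider in providers if provider.confirms(claim))
    return confirmations >= needed


tortoise = Claim("A. Tortoise", "ZZ1 1ZZ", "1950-01-01")
impostor = Claim("A. Tortoise", "ZZ9 9ZZ", "1950-01-01")

bank = IdentityProvider("bank", [tortoise])
telco = IdentityProvider("telco", [tortoise])
utility = IdentityProvider("utility", [])

print(sufficiently_assured(tortoise, [bank, telco, utility]))
print(sufficiently_assured(impostor, [bank, telco, utility]))
```

The point of the shape is that the requestor only ever receives a boolean. No provider hands over its record, which is quite different from releasing chunks of data to the individual or anyone else.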

So I fear that in an attempt “not to go into too much detail” we’ve got a conflation of two separate, interesting, important issues under the midata flag.

One can always argue that “it’s the principle that counts–we should establish that first, then let the clever people get on with the solutions”. Well, yes. Ok.

We did that with electronic patient records, with Post Office smartcards, with national identity cards and registers… At some point we do need a public airing of the underlying principles in a greater level of detail than the initial press release. And before a major delivery programme has been commissioned, I’d suggest.

Other than this “issue overlap” there are a few other points that strike me about midata. There is this underlying sentiment that consumers have a right to “their data”. But what is it that actually makes a particular piece of data “theirs”?

Information about usage is a hybrid of personal facts (e.g. who is the account holder?) and operational information as a consequence of service use. How far does it extend? Basic consumption patterns? Probably yes. Detailed, time-stamped records of every purchase and all parties involved? Hmm. Maybe. Serial numbers and last maintenance dates of the precise routers and masts that were used to deliver a phone call? Well, now you’re being silly, Paul.

Yes, I am, of course. But I’m trying to illustrate that the translation of this “right to data” into reality involves more than just signing a memorandum of understanding. Update: there’s a more detailed post about “Whose data is it anyway?” here now.

And then there’s the cost angle. Even if we assume that the addition of a simple bit of code will suddenly enable service providers to spit out raw chunks of data onto the Internet (aka the “it can’t be that hard to get their systems to…” fallacy), the midata announcement is already talking about a greater degree of sophistication: particularly the bit about “access, retrieve and store their data securely”. Who’s going to pay for that?

And do we have robust evidence that there is interest and demand for this type of data release, other than from the vociferous lobbyists with their eyes on constructing a wealth of new “personal data store” opportunities?

It’s great to see entrepreneurial spirit flourishing, but how much is this about solving real consumer problems, and how much about playing yet more variations on the “consumer as product” theme–you tell us about your interests, and we’ll give you better deals (but only as a share of what we’re really making by selling that information to other vendors).

The argument that better information increases customer choice, and therefore power, is of course another “what’s-not-to-like”. But if you take a step back, and look at the implied problem that “people don’t know which is the best deal as they’re all so complicated and people don’t really know what they use anyway…”

…would you put your energy into releasing chunks of data to help make a better match with a complicated tariff, or would you have another look at the issue of tariffs in general, and simplify them? Yes, both represent some form of intervention, and I can see the political attractiveness of the former, as (especially under a voluntary scheme like midata) it plays down the regulatory role in favour of cheerful vendors all quite happy to be a lot more transparent with their/your operational information. But one wonders just how sustainable this level of voluntary cooperation would actually be in the longer term in highly competitive markets…

That’s a bit like imagining a set of doors with fantastically complicated locks, and giving people the right to have equally complicated keys cut–rather than pushing for simpler locks in the first place.

So, a lot of questions remain. Conceptually, midata isn’t something that could or should be objected to. And this post is not written to criticise, but to suggest a few areas that need more detail and analysis.

When we see press releases that let fly with cool talk of data, empowerment and choice we should be getting a lot more eager to ask the next level of questions. What does this really mean? How will it work in practice? And what might some of the broader economic, competitive, social and privacy implications be?

Until we do, we’ll be dazzled by press releases and then a bit disappointed when delivery swings into action. And it’s usually too late by then to do much about it.

Getting personal

For a long time, I’ve shied away from writing here about personal data. Or even thinking that deeply about it. The nature of identity, yes. The usefulness of data, yes. Personal data, no. Why?

Not because it isn’t fascinating, or important. Mainly because it’s so…damn…nebulous. And difficult. Time to get over that, I think. Very significant things are happening in this area, and we all need to raise our game in how we understand and engage with the concepts involved.

As I’ve surmised before, the only things that are really different in the Internet age are the ease with which information can be found, and the ease with which it can be stored.

Two things, really. That’s all.

The first embraces everything around indexing, cross-referencing, labelling, structure and searching. The second takes us into the territory of copying (and of course copyright), archiving, and the general issue of persistence.

And when we look at personal data in that context, there is an immediacy–and potential toxicity–in what emerges.

We saw early rumblings of this long before the Internet, of course, when computers were first used for the mass processing of information about people. Things could be done with databases that simply weren’t possible with big paper ledgers.

We created Data Protection legislation which attempted to put reins on the ability to make free use of some types of information. Gathering stuff about people, from the basic facts of who and where, to how to contact them, who they were connected to, and what their tastes and preferences were. Pure gold, used in the right (or wrong) ways.

Data Protection set out some pretty sound, but general, principles. The overarching one being that the purpose to which data could be put should always be made clear to whoever provides it, at the time of providing. Lots of other stuff about processing, storage, where and how long, and so forth–but that issue of consent always seemed the most important, to me.

And we scratched about a bit to actually try and define what we meant by “personal data”. Some things were easy. Names. Addresses and phone numbers. They’re just obvious.

But what about our tastes? Our buying history? The movements of our mobile phone from cell to cell? A journey we took? As one takes informational side-steps away from the individual, the obviousness diminishes, but if you can make meaningful connections back to the person…

…and remember the first thing that the Internet really changes?

Being able to make those tenuous links between blocks of information into something really substantive.

And the second thing? That information and those links are now permanent. You can’t delete them, once they’re there.

All those things that databases couldn’t previously do, because they all conformed to different standards, and weren’t connected together? They can now. Things can be done via the Internet that simply weren’t possible with just the databases.

Bit by bit, it’s been possible to build up the most humongous repositories about people. Maybe entirely within the law, maybe in other ways as well. Maybe with explicit and informed consent all the way down the line. And maybe not.

Who’s to know? We find strange things going on with data that we provide in order to use one or other service–or even to exercise our democratic rights. Didn’t it ever strike you as slightly weird that the electoral roll could be sold on for commercial purposes? (Much more on the electoral roll in another post coming soon. Update: now here)

We have big companies that have built successful businesses just like this: perhaps using aggregated personal information for credit referencing, perhaps to sell to marketeers to give them a better understanding of demographics.

The genie is very much out of the bottle. Your rights to see the information that a particular company holds on you may exist, but you have to have a fair idea of which company to ask in the first place. Can you ever see the full picture of what others know about you?

Of course not.

And it’s unreasonable to suggest that we’ll ever be able to do that. Instances of data multiply more rapidly than does our capability to track them. (There must be a Law of Internet Entropy out there that says something like that. If not, I just invented one.)

(As an aside, a dear friend once uttered the memorable line “somewhere out there, there’s a database with your dick size on it”. That was in 1989.)

So what can we do?

Realistically, all that’s available to us are firebreaks and friction.

We can’t get that genie back in the bottle, but we can slow it down a bit, and find ways to mitigate the impacts.

Do we need an updated definition of personal data? It’s MUCH harder than it seems at first glance to create one. The best I can find at the moment in terms of an “official” position is here.

And it’s clumsier than you think. Essentially, it’s a list of ever-widening filters that assess whether a particular piece of information can be connected to a specific individual. Culminating in the rather wonderful catch-all of the final category:

8. Does the data impact or have the potential to impact on an individual, whether in a personal, family, business or professional capacity?

Yes The data is ‘personal data’ for the purposes of the DPA.
No The data is unlikely to be ‘personal data’.

Even though the data is not usually processed by the data controller to provide information about an individual, if there is a reasonable chance that the data will be processed for that purpose, the data will be personal data.

That’s pretty general, no? In fact, going by that, an awful lot of things are now personal data. I really like the emphasis it puts on the outcome of the data use, not attempting to over-define things like form and structure.

I’d go as far as to say we should probably throw away that big long document, and just run with this definition:

Personal data is information that affects you when it’s used. Either directly, or through being linked to other information using technologies that exist now, or may exist in the future.

Broad enough? ;)

(So my beloved photos: they’re personal data. I take them with a camera that has a unique number, held in metadata in the picture file. That provides a way to link all the pictures it takes together, and then, through the various accounts I put them in online, back to me. Think how many other trails you leave…)
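A small sketch of how that linking could work, in Python. The photo records, the field names (`camera_serial`, `posted_by`) and the function `accounts_by_camera` are all invented for illustration; real picture metadata is messier, and not every camera or website keeps a serial number in the file.

```python
from collections import defaultdict

# Hypothetical metadata scraped from photos on different sites.
# Field names and values are made up for this example.
PHOTOS = [
    {"file": "beach.jpg", "camera_serial": "SN-1234", "posted_by": "photo_site_account"},
    {"file": "march.jpg", "camera_serial": "SN-1234", "posted_by": "anonymous_blog"},
    {"file": "cat.jpg", "camera_serial": "SN-9999", "posted_by": "someone_else"},
    {"file": "office.jpg", "camera_serial": "SN-1234", "posted_by": "work_profile"},
]


def accounts_by_camera(photos):
    """Group the accounts that posted pictures taken with the same camera."""
    linked = defaultdict(set)
    for photo in photos:
        linked[photo["camera_serial"]].add(photo["posted_by"])
    return {serial: sorted(accounts) for serial, accounts in linked.items()}


links = accounts_by_camera(PHOTOS)
print(links["SN-1234"])
```

Three accounts that look unrelated turn out to share one camera. Attach a name to any one of them and the other two come along for the ride.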

But again, all we really have are firebreaks, and friction. There’s a sort of reverse entropy at work. Unlike almost every other instance of entropy–where things get more chaotic over time (china plates get broken, they never put themselves back together again)–personal information is, relentlessly, only going to get more linked. More aggregated. More pervasive. More permanent.

(So, maybe I just invented The Law of Reverse Internet Entropy as well? Not bad going for one post…)

And if someone tells you that big blocks of personal data can be “anonymised”, be very sceptical indeed. (You can read some wise thoughts on the issues involved here and elsewhere on that blog.)

We can undertake some pretty noble fire-breaking: like ensuring the state doesn’t become the source of a global universal identifier for you. And we will certainly see more developments around multiple personas: compartments of your life associated with particular tasks, contexts, or connections. I think we’ll have to. (The concept of federated identity helps here, but that’s too much to go into for this post. Read more thoughts from the team working up these concepts for government.)

And we’ll adjust. Society has seen some pretty dramatic upheavals. Often associated with a new technology, or philosophy. If we adjust our societal norms faster than the upheaval, we don’t notice. If we’re slower to change, it’s painful. For a bit.

But we get through. We adapt. And we change. Always.

Google Plus Ungood

I know many people have managed to get up and running with Google+ fairly easily. The usual snags have been reported, of course, as users get used to the idiosyncrasies of the network, and as new etiquette and conventions emerge.

Today, it’s become clear that there are some deeper issues emerging, as Google enforces a “real names only” policy. Erm, good luck with that, in a hard identity sense, guys. Unless you’re going to try and peg people back to a state-issued identifier… (no, I’m not even going to go down that road of horror).

There’ve also been a few nasties creeping out of the woodwork as users realise some of the drawbacks of putting it all in the cloud. One wrong step with your service provider, and you’ll be writing a rant like this as thousands of hours of curation, not to mention thousands of irreplaceable and irretrievable content files, are briskly wiped out.

But for me, Google’s latest foray into social networking has pretty much been a non-experience. Although I was invited fairly early on, and was successfully signed up for a few days, it all went belly-up pretty soon afterwards.

Why? Because of the cack-handed way in which Google identities work, that’s why. Here’s the detail.

Like most people, at some point I signed up for a Gmail account. I didn’t get a very nice address, as I wasn’t in there early enough, but it is a version of my name.

What I do have, and use instead, is a funky email address that I set up 10 years ago, and a couple of years ago moved over to Google Apps. (Bear with me.)

That email address is pretty much the way in which I’m identified for all services I use that are based on email address. In many ways it is my self-asserted identity on the Internet.

So it won’t surprise you to learn that when I came to create a Google profile, based on an email address, I used my “home” email identity.

So far so good, and for a couple of years everything worked smoothly. Google Apps did the things Google Apps did (email, calendar, contacts). And for the other Google services I used (Analytics, and probably not much more than that), I logged in with my Google profile. All was well. I had a slightly uncomfortable feeling that there may be trouble down the line though, with two identically-named identities that were logically separate.

And I was right.

A few days after I joined Google+ I got a friendly-but-firm email from Google. “We’re consolidating your accounts,” they told me. This dual use of the same email address can’t go on. Not optional. Indeed.

As I’d been invited to Google+ using the “profile version” of my email address, I feared the worst. And I was right. That was the account which was going to be stripped of my preferred email address. To be replaced by a “temporary address”–something horrible with a percentage sign in the middle of it. Great. The G+ connections dried up–nobody knows me as “the percentage sign email guy”–they know me as my ordinary, erm, email address. Bugger.

It got worse. To be able to get into G+ at all I didn’t just have to log in to Google using the temporary profile, I also had to log OUT of Apps (explicitly, even if I were already logged out of “normal” Google)–otherwise Google thought I was attempting to access G+ using a “business identity”. The horror!

The solution–according to Google–was to assign my Google profile an entirely new email address. Right. A new identity, for what could emerge as a pretty important service, should Google actually get their act together. An identity, and email address, that I didn’t need or use anywhere else. Not. Ideal.

So we have an impasse. I am hanging on, the temporary account unused and unloved, in the hope that Apps users will at some point be able to use their Apps email as a G+ identity. (It’s a rather faint hope, given the strategic direction that Google seem to be taking with identity.)

Why would I waste time now building up a social network where I, quite literally, don’t know who I am?

But it explains why I’m not part of this party, remain unconvinced of Google’s ability to handle the basics of social interaction, and am pursuing a wholesale review of my domains, addresses and identities for what now seems an inevitable clean break, sooner or later, with Google. Nice work, chaps.

Update, 25 July: a few morning-after-posting thoughts

Is there any real significance in all this? Surely this is just the moaning of yet another free-service user who didn’t read the Ts&Cs? Nothing paid, nothing to complain about.

Well, this is significant, because:

  • Identity, and cross-platform identity, are hugely important in an ever-more-connected world. Mess with those and you mess with the core of user experience: user existence.
  • Like it or not, it’s hard to see how a relationship with Google won’t form some part or other of everyone’s Internet activity at least over the next few years. This makes a Google profile (whatever neglect Google may have shown for it to date) disproportionately important.
  • The attempt to enforce “realness” is weak. Google’s request for reference to “government-issued ID” (redacted or not)–whether to “prove” age or identity–is a troubling step. It puts a little friction in the path of being anonymous, sure, but if you want to, you can be.

And these characteristics (inflexibility, heavy-handedness, dependence) are all indicators of things that we’ll need to worry much more about in the future.


Google account administration functions really are up the spout. Here’s a good piece by Dan Harrison on Google administration in general, and another on Google Apps deficiencies in particular. I’ve said it before: if a profit-focused, cash-rich organisation like Google finds identity so difficult, do we really hold out much hope for government?


Google Wave also revealed some of these flaws. I thought, briefly at the time, that the whole Wave concept was actually a Trojan Horse to get people to sign up for a Google profile (or to take one more seriously if they already had one). And what did they force me to have as my Wave ID, despite my already having a friendly Apps address, and a slightly less friendly Gmail address? Something like paulclarke0001@gmail.com. (I forget how many zeroes.) Face hits palm.

On the shifting of control of personal data

If you’ve been locked in a cupboard for the last five (or more) years, you’re excused from observing this thematic shift:

In the longer term, data about people is more likely to be owned and controlled by them. Rather than having many instances of personal information scattered around organisations and agencies, to be confused, duplicated, corrupted and left on buses, simpler technologies have emerged to put the data owner, you, back in control.

We see this theme emerging with several different labels: from vendor relationship management, to volunteered personal information, to personal datastores, to a “control shift” in the concept of personal data.

I agree that this shift is inevitable, to a greater or lesser extent. Everyone wants it. What’s not to like? Less cost of processing, greater security, reinforcement of personal rights etc. etc.

We start to make the ideologically satisfying separation of identification and authentication/entitlement more of a reality. More of this in other posts.

I just have two snagging issues which I’d love to hear a response on from those who want to get us moving on this now:

The first is a transitional one, but an important one. As the group of “personal data holders” grows, the infrastructure and operations required to support the other group won’t change. There’ll be a double running of systems. Although this is inevitable with any system change, it puts an immediate disincentive on any service provider to explore this route. (But this is not my point here.)

My point is that strange things will start to happen in terms of operational continuity and completeness. There will be “gaps” in databases, where the personal data holders used to be. Instead of their information, there will be links and interfaces to the data they control for themselves. Will this create all sorts of headaches and risks just by itself? Enough to seriously dampen any service provider’s enthusiasm for adopting volunteered personal information?

The second will persist, and is perhaps more problematic. Because your personal information (whether it’s about your identity, other descriptive information about you, or about your authorisation to a particular service) is going to have to be assured by someone. This may not, and indeed should not–in the case of identity–be the exclusive province of government agencies, but someone is going to have to do it.

Some will do it well: banks, for example, are rather more incentivised (and skilled as a result) to be damn sure you are who you claim to be. But some won’t. And when we get down to the level of a patchwork of assurers, in any system, we start to get some problems. When things go wrong (and they will)–have a vision of a functional world by all means, but build for the real, dysfunctional one–the untangling of liability may consume more resource than was ever saved by enabling the shift of control in the first place.

Thoughts? I’d love to be convinced. I really would. But I’m a healthy skeptic at the moment.

Why the big fuss?

The usual parade of whimsy on this blog about this or that in public services, or things-that-make-my-head-hurt-in-general, has been rudely interrupted by a series of diatribes on identity and trust online, with a focus on people interacting with government.

Why, why all this attention to some rather obscure mental masturbation?—I hear you cry.

By way of brief explanation:

1. It’s fascinating: intellectually, socially, and philosophically; here we find very real and somewhat abstract concepts fusing together to try and do an important job.

2. Did I say important? It’s really, really important. Progress on this issue has the potential to shape some pretty fundamental things about our privacy, freedom and relationship with the state (and with each other).

3. It’s big. A brief look at the history of computerising National Insurance or patient health records is enough to show that national-scale anything of this nature is not to be taken lightly. Grand schemes have to be very well designed before implementation (and not just technically, but socially and behaviourally); start-small-and-scale requires a good understanding of what will really change as things grow.

4. It’s fraught with paradoxes: it’s easy to imagine tempting answers – very hard to design workable solutions. What seems easy in broad outline dissolves into complexity in the detail. The ingredients themselves are elusive: shape-shifters at times—identity information can be there to act as a reference (help me look your record up), a verification (we’ll check that out against our database), a diversion of risk (well, we ask all callers for their date of birth), a red herring (no, I do need your mobile phone number before I’m allowed to talk to you…Data Protection innit?)…and more. And the best hope we currently have for a solution relies on concepts which are far from our mental models of how such things should work.

5. I don’t know all the answers—I know a few of the questions, that’s all. I am happy to be set straight about any of this—if you can describe a simple, workable solution, please do so. Just don’t start “well, can’t we just give everyone a unique number…?” ok? If you hear someone spout that we should be able to knock up an Amazon-type “account for government” tomorrow, gently ask them to go a little further. Ask a few questions. Ask if it has to be “the real you” holding the account. Ask if you can have more than one. Ask if you’ll have to have one, even in the distant future. But be nice.

Finally, a consoling thought, before I leave this topic for a while. There are some parallels here with another tricky technology/people problem. For thousands of years cryptography was beset by one major problem: how do you get a key from Alice to Bob so that Bob can unlock a message when he receives it? Anyone intercepting the key could then intercept the message that followed and open it. Seemingly intractable (one’s only option was to find clever ways to exchange or vary the keys), it was blown away by a neat bit of maths in the mid-70s, leading to a simple form of code-making (involving no exchange of secret keys) which underpins secure ecommerce to this day. Perhaps there’s something out there, as yet undiscovered, which will allow us to square these circles of usability, privacy and assurance. A public-key cryptography equivalent for identity. I just wish I could find it first.
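
For the curious, here is roughly what that neat bit of maths looks like. This is a minimal sketch of a Diffie-Hellman-style key agreement, one member of the public-key family, and not necessarily the exact scheme any given shop uses. The prime, the base and the function names are my own choices for illustration, and the numbers are toy-sized: nothing here is fit for real use.

```python
import secrets

# Toy public parameters, assumed for illustration only.
# Both values can be shouted across a crowded room; neither is secret.
P = 2**127 - 1  # a known prime (a Mersenne prime), not a vetted crypto group
G = 3           # public base


def make_keypair():
    """Pick a private number and derive the public value to send."""
    private = secrets.randbelow(P - 3) + 2  # somewhere in [2, P - 2]
    public = pow(G, private, P)             # safe to send in the open
    return private, public


def shared_secret(own_private, other_public):
    """Combine my private number with the other side's public value."""
    return pow(other_public, own_private, P)


# Alice and Bob each keep their private number to themselves...
alice_private, alice_public = make_keypair()
bob_private, bob_public = make_keypair()

# ...swap only the public values, and still land on the same secret.
alice_secret = shared_secret(alice_private, bob_public)
bob_secret = shared_secret(bob_private, alice_public)
```

An eavesdropper sees P, G and both public values, but never either private number, which is the whole trick: the thing that unlocks the message never travels.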

The Nature of the Relationship, part 2

In which we look more deeply into that business of what an online trusted relationship actually means—over and above the mechanics of actually “proving” something about it to a particular degree.

New readers will probably want to read an introductory piece, a logical separation of issues relating to trust from those of identity relationship, and the post immediately preceding this one. (Keener-eyed regular readers may now be getting some clues as to what this oddity was all about.)

So we’ve found so far that some of the stuff we imagine should be quite simple, isn’t. A single log-in using one identifier to get to lots of services is a shaky concept. In theory, it should be fine (we can create models very easily in our minds of things that work like that and don’t cause much difficulty). But in practice it creates what—at scales of national, or even widespread local level—quickly become data management and security nightmares. It leaves the way open for other things, perhaps unwanted, to be attached to that identifier, covertly or overtly. And, assuming that you provide a few different passwords or other tokens, or even add in some biometric checks to the mix (coz you wouldn’t want to lock all your possessions using just one type of key, would you?), you begin, very quickly, to make things very much more complex. And we’re trying to use online channels to simplify, save money and increase access, aren’t we?

There’s an inherent tension here: if the credentials you use are powerful enough to actually be trusted and useful, then they quickly become fraught with risk and unusability. I’d suggest that the risks scale faster than the benefits, which might account for the fact that a plain old general “account” type relationship with government hasn’t made much progress in well over a dozen years of (expensive) trying.

There are some twists too that come from the fact that it’s government we’re talking about here, not an online bookseller. We take a different view, as I wrote in the previous post, about business risks that attach to public sector transactions. Many people quite naturally think of government as one indivisible entity, even though many different agencies, people, standards, systems and contractors may be involved. That’s just reality. We want government to have an overall view of us as a whole when it suits us, for instance when changing name, address, or informing of a death (à la Tell Us Once programme), but on the other hand we don’t want everything too joined up. We really don’t. Contradictions, paradoxes, tensions…

A few other twists: because these services are public, we expect (and deserve) the very highest standards of accessibility. And, if we’re serious about building them as part of the infrastructure of life in the UK, having a decent quality connection to actually get to them is a good start. We’d like to have more options about where transactions are served—to have more flexible models of delivery so that government might offer an interface to its processing engine, allowing other bodies to run a user front-end. But we want to be absolutely sure we don’t create brand confusion, or create gaps that accountability can fall through. Contradictions, paradoxes…

Oh, just one more—if your bookseller stuffs up with your account, you go to another bookseller: there isn’t another government—how do you really think you’re going to get your data back? (There’s more, but this is just a quick glide through some of the reasons we can’t just take a completely standard ecommerce approach to this.)

But, and it’s a big but, many of these challenges arise if we’re trying to envisage an account-type relationship with government. We’re conditioned to do so. We’ve been trained. By customer relationship systems in the commercial sector (we have Amazon, Google, eBay and our bank accounts), and we even have an HMRC online tax account. It looks, and feels, a bit like any other financial service. Surely there’s nothing more natural than trying to extend this concept to accessing health records, to applying for things like licences, to making complex choices about social care? If you’re getting a bit wary that an “account” is a bit of a conceptual stretch for something you do only once every ten years (bearing in mind what’s gone before in these posts about the problems with a “general purpose” relationship) then you’re probably right. But that’s another side-turning we might explore separately.

If, and I believe this to be true, the concept of a general citizen account (a governmental panopticon which stores, links and serves us as a unified whole) lies out of our reach, whether for privacy, security, complexity or technical constraints (and combinations thereof), is there another way?

The answer, we hope, is yes. US and UK policy at the moment is bent on developing along these lines, anyway.

The concept of this alternative, a trusted identity framework, is tricky. There’s a particularly good description here of some of the concepts involved, which I’m not going to attempt to rewrite for the moment, except to note that it contains useful concepts such as what I like to call “transferability of trust”: the ability to reuse a trusted relationship (a classic one being that if you log into your bank online, it’s seriously likely that the bank will hold the correct address for you, and be able to confirm it) to do other things. You don’t have to re-enter or re-prove it, but crucially, government doesn’t have to go through the business of verifying and processing it, either.

But fragmenting the Nature of the Relationship like this is not without its problems. It doesn’t give us Tell Us Once (about changes of circumstances). Far from it—it deliberately compartmentalises an interaction so that just those bits which need to be proved, get proved. The eventual models of relationship that emerge are still being determined, I think—with relationships emerging that vary from entirely anonymous (though verified where needed) to increasingly rich with personal information. Maybe there is a Tell Us Once-type account at the bottom of the well, but I seriously doubt it. It’s all going to be a long and tricky journey.
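
To make “just those bits which need to be proved, get proved” a little more concrete, here is a minimal sketch of the idea. Everything in it is hypothetical: the record, the field names, the postcode and the two question functions are invented for illustration, and a real framework would also need signatures, consent and a great deal else. The point is only the shape: the party that already holds trusted data answers one narrow yes/no question, and the full record never leaves the building.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class HeldRecord:
    """What a trusted party (a bank, say) already knows about a customer."""
    name: str
    date_of_birth: date
    postcode: str


def is_over(record: HeldRecord, years: int, today: date) -> bool:
    """Answer only 'is this person at least N years old?'."""
    birthday_passed = (today.month, today.day) >= (
        record.date_of_birth.month,
        record.date_of_birth.day,
    )
    age = today.year - record.date_of_birth.year - (0 if birthday_passed else 1)
    return age >= years


def lives_in(record: HeldRecord, postcode_area: str) -> bool:
    """Answer only 'does this person live in this postcode area?'."""
    return record.postcode.upper().startswith(postcode_area.upper())


# A made-up record; the asking service never gets to see it.
tortoise = HeldRecord("Tortoise", date(1950, 6, 1), "CR3 5XX")

residence_answer = lives_in(tortoise, "CR3")           # the service learns: True
age_answer = is_over(tortoise, 18, date(2011, 7, 25))  # the service learns: True
```

Neither answer carries a name, a birth date or a full address, which is also exactly why this model doesn’t hand you Tell Us Once for free.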

I might bring back the Greek and his pet for a play with a trusted identity framework later. Might. This is heavy going ;)

The Nature of the Relationship, part 1

This is where the going gets tougher. The previous post here was about the different things we use to bodge our way around the minor inconvenience that you can’t actually prove anything about identity with absolute certainty (and it’s all even harder on the Internet). Accepting that we’re all just a collection of risks and uncertainties to be managed, and that we’ve got quite a few tricks (good and bad) at our disposal for doing so, we move towards an even knottier problem.

But to help us do this, let’s bring back Tortoise. Who is on a bit of a mission.

Tortoise: Achilles, Achilles—I’ve been reading all this waffly crap about online identity and I just want to get on with things.

Achilles: How so?

Tortoise: Screw it, fella. I just want my unique identifier now, please. I’ve got nothing to hide. I’m volunteering for you to strap everything you like on to me. Tie it to my old shell, big boy.

Achilles: You sure? Well, if it’s to make a bloody good point about what happens if you do—for the purposes of illustration—I’m game. You want it public or private?

Tortoise: How do you mean?

A: Do you want your identifier to be kept a secret that only you know about, or do you want it splashed everywhere in public?

T: Well, secret, I guess? Is it really a straight choice like that?

A: I’m afraid so. What type of things were you hoping to use it for?

T: Well, to log on to my local council services, naturally. And to see my health record. And to book a driving test. And to pay my taxes. And, and, and…

A: And you reckon that using this fiddly little string of numbers in what already adds up to hundreds of systems from that little list you’ve just given me means that number can be kept…secret? [raises a well-groomed Grecian eyebrow]

T: Fair point. So at some point I have to be ok with the fact that an abandoned hard disk…but surely encryption and good local security management policy will take care of that?…oh, wait, yeah, I see…I have to be ok with the fact that a big list of unique identifiers is going to wind up on Wikileaks or something like that eventually?

A: You do.

T: OK. I accept. I’m ok with that. After all, it’s just a string of fiddly little numbers. It’s not about me, the actual Tortoise that is me. Oh, or is it?

A: What do you think?

T: Well, I don’t really know. It could be. Or it might not be. If it isn’t, then is it really that much use? And if it is, I have this creeping feeling you’re about to show me cracking ice and swooping vultures. Hell’s bells, this has gone and got difficult already, hasn’t it? Why does this always happen? What’s the right answer, Achilles?

A: I guess it depends on whether you want to be identified as you, the real Tortoise, in all these transactions. And you do, don’t you? You have nothing to hide, remember?

T: Sure. But doesn’t that mean…oh I see what you’ve done, you clever bugger. You’ve let me neatly draw out the conclusion that the actual identifier is no great shakes, it’s what it’s attached to that really matters.

A: Quite. And as you’ve said that you’ve got nothing to hide, let’s take your Tortoise Insurance Number (TINO) and from henceforth make it the only identifier about you to be used anywhere in government. After all, lots of people keep banging on about how that must be the long-overdue common-sense solution to all this identity uncertainty. £1,500 please.

T: What? You’re going to charge me for giving me a number you’ve already given me?

A: No, don’t be absurd. This is just a one-off charge for all the migration work.

T: Migration?

A: Changing every single existing government system so they all sing and dance and recognise you off of this here TINO.

T: [Gulps] Is that strictly necessary?

A: Well, perhaps not. We could build some elaborate middleware and interfaces and yada yada yada. Might be a bit shonky and fall over from time to time. Or scramble your records with someone else’s. But you’re ok with that aren’t you. £1,500, remember?

T: It just all seems so expensive.

A: That’s because this is the real world, old son. I know when you were just out of the shell, you used to line up all the other tortoises and make up your own Little Tortoise Club stuff, giving everyone a secret name and a password?

T: Bloody hell – so I did. How did you know?

A: We all did. And it worked, didn’t it? You kept pretty strict records of, oh, a whole 10 individuals. Nothing leaked, nothing got mixed up, and it was all beautifully administered. And you used that as a mental model in your horny wee head of how identities and secrets and all that might work in the big world. But you know what, dear little chap? You were utterly wrong. This is a world of baddies, of fraudsters, of the incompetent and the helpless, of the excluded and the disabled. It’s a world of error, of approximation, of faults and mistakes. Lots of gritty reality that, if I’m honest, tends to bugger up enterprise-scale secrecy/identity/security systems faster than we can actually squeeze benefits out of them.

T: Lawks! Have you finished?

A: Yeah. But then I start again, and spend another £100m repeating all the mistakes I made last time. Just using a different firm of consultants. Boom boom!

T: So, to recap, I’ll be able to use my TINO wherever I like, accepting that at some point the relationship between it and me will come into the open somewhere, and that it provides a handy hook for anyone, anywhere, with or without me knowing, to hang whatever facts, associations or other metadata they like on me—which may be used against my interests to sell me stuff, compromise me or do loads of other bad things? And that I’ll be reliant on a panoply of passwords and other tokens to associate with my TINO to unlock the various doors that need unlocking in such a way that losing one of them doesn’t give the bad guys control of my entire life, but at the same time, a panoply that I will find easily manageable? I don’t see how that’s possible.

A: S’ok, my shelled friend. You have nothing to hide, remember?

T: I’m really not liking this much at all now. Is there an alternative to my ill-thought-through quick and dirty answer?

A: Why yes, there is. But we’ve just gone over 1,000 words, and according to the rules, that means waiting for the next post.

T: Oh, cloacas.

Who are you again?

This online identity stuff is very difficult—as I’ve written here before: much harder to truly grasp than it should be, in a peculiar way. I think that one of the reasons is that there are really two, logically separate things going on. Unless one puts a bit of mental legwork into understanding them—well, almost philosophically—all that follows in terms of technical solutions and so on can be irrelevant, at best.

So, those two parts: 1. how do you “prove” you are who you say you are? and 2. (the bit that’s perhaps harder to encapsulate) what is the relationship model that’s constructed when such a “proof” transaction takes place?

Let me try it another way: (1) what are you trying to prove and how do you go about that? and (2) what are the consequences of you having done that “proving”?

I hope to make some progress in illustrating why they’re quite different, but both very, very important. The first of those two parts—the “what and how you prove” bit—is the subject of this post. Probably because it’s the easier of the two. Though still complicated.

You never really prove anything, of course. If we are going to get into the business of cutting people open to extract a bit of DNA from their very bones and analysing it against some sort of uber-register of genome sequences…yeah, yeah, yeah. But we’re not. So stop being silly. (And they might have implanted somebody else’s bones, anyway. Ok, that’s silly. Or is it? Let’s move on. You see the point: every obstacle is just another challenge.)

What we do instead is use a number of arbitrary proxies for identity: tokens that either alone or in combination give a certain sense of assurance that their presenter is who they claim to be. The passport is a common (and relatively strong) example. There’s the photo ID (with a government-issued driving licence being rather more trusted than a cheaply-laminated snooker club membership card). There’s the infamous utility bill, which has the benefit of also fixing the presenter to a physical location of residence. You get the picture. Sometimes the detail is checked against something else, sometimes it’s recorded, and sometimes it’s not checked in any meaningful way, but the request itself is enough to dissuade naughtiness.

Because, for most of the transactions one carries out with government (central, local, police, whatever) checks like this are pretty damn important. (At least they are perceived to be, anyway, certainly in comparison to some private sector transactions. Compare the following headlines: “x% of cardholder-not-present credit card transactions are fraudulent, costing £Ybn per year” with “x% of online benefits claims are fraudulent, costing £Ybn per year”. Which one will have the nation frothing that Something Must Be Done? But that’s for another post…)

The guys at the gate of Caterham tip ask for a utility bill to confirm that you’re allowed to dump there. (Well, only when it’s busy, it seems.) To them, a location is the only important fact that’s been asserted—who I am, or indeed whether that utility bill matches anything else about me or my car, are unimportant. At the supermarket checkout, the young-looking booze buyer will only be troubled for something featuring a date of birth, and so on.

The tokens we use to give that degree of proof don’t have to be physical bits of paper, of course. We can memorise PINs, or be asked for known facts about our previous transactions which only we’d be likely to know the answers to. We can set up “shared secrets” in advance so that only we will know the answer when challenged by our remote interlocutor.

We can have combinations of things used together—to see my bank statements online I now have to put my bank card into a reader the bank have sent me, pass a challenge, and then enter a result online. Sure, if you have my card, my reader, know my PIN and at the same time can open a session of my online banking you are me, at least as far as my bank is concerned. But that’s a lot of hardware and effort, and reasonably proportionate to the stakes involved, I’d say. We talk of “something you have and something you know” as a basic type of multi-factor authentication, or “something you have, something you know and something you are” if we add in a biometric component.
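
Here is a rough sketch of how a “something you have and something you know” challenge might hang together. It is emphatically not how my bank (or any real card reader) does it: the function names, the eight-digit codes and the use of HMAC-SHA256 are all assumptions made for illustration. The card’s secret key stands in for the thing you have, the PIN for the thing you know.

```python
import hashlib
import hmac
import secrets


def reader_response(card_key: bytes, pin: str, challenge: str) -> str:
    """What the (hypothetical) reader displays: an 8-digit code that
    depends on the card's secret, the PIN typed in, and the challenge."""
    mac = hmac.new(card_key, f"{pin}:{challenge}".encode(), hashlib.sha256).digest()
    return f"{int.from_bytes(mac[:4], 'big') % 10**8:08d}"


def bank_accepts(card_key: bytes, pin: str, challenge: str, response: str) -> bool:
    """The bank holds the same key and PIN, so it can recompute the code."""
    expected = reader_response(card_key, pin, challenge)
    return hmac.compare_digest(expected, response)


# Set-up: the bank issues a card holding a secret key (something you have).
card_key = secrets.token_bytes(32)

# Log-in: the bank shows a fresh challenge, the customer feeds it to the reader
# along with the PIN (something you know) and types the result back in.
challenge = f"{secrets.randbelow(10**8):08d}"
code = reader_response(card_key, "4921", challenge)
accepted = bank_accepts(card_key, "4921", challenge, code)
```

Note that even here nothing is proved: the bank only learns that whoever is at the keyboard has the card and knows the PIN. A fresh challenge each time simply means a code overheard yesterday is no use today.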

You see the point?—there isn’t really any proving going on. Just an exchange of information that gives a certain level of assurance, upon which trust can then be built. Sometimes it’s done well. And sometimes it’s not. Sometimes the requests for “proof” information are proportionate to the task being undertaken. And sometimes they’re not. But the request/risk relationship is likely to be quite specific to the task being attempted.

You’ll notice that I freely used offline examples above, when normally I bang on about how hard all this is in the online world. Well, the concepts are the same. It’s just that there are some characteristics of online channels that tilt the tables of risk. The lack of a face-to-face element removes some of the visual cues we might use to strengthen trust in a claimed identity. But this applies to the phone as well (how many times have I assumed the guise of “Mrs-C-with-a-cold” to try and sort out a minor squabble with a utility company?).

No, what makes things really very different in the online channel are those two old favourites: accessibility and recordability. The friction of having to find a benefits office, queue up, and try it on with the clerk by wearing a false moustache all disappears. You can be fast, anonymous and massively multi-tasked, using tools to try thousands of entry points and potential tokens simultaneously.

And what you do undertake, successfully or unsuccessfully, creates a record—leading to all sorts of other consequences—something that doesn’t happen when a guy in a fluorescent jacket glances at your water bill. Nobody writes anything down in lots of offline transactions—that’s important. Or captures and indexes it, for example, on video. (The indexing bit matters, by the way…but that’s taking us into the next area: the Nature of the Relationship.)

Oh, and I fear there’s one other powerful reason why this is so challenging for those who “think digitally”: a digital relationship is generally conceived as one of certainty (the bits match the requirement, ergo the door is unlocked), whereas everything above is an assembly of probabilities, seeing people less as people than as a collection of analogue risks, in a context where “good intent” and “assurance” are just shades of grey. No wonder we experience some cognitive dissonance in this area.

If you’re now drowning in a sea of uncertainty and looking lovingly back at that idea of sawing people open and extracting an inarguable(?) DNA sequence—congratulations. This is a highly normal response. Rushing back to a “unique identifier” to solve everything is pretty common. Engadget managed to do that neatly in their headline yesterday on the latest moves in US federal identity assurance—even though the source material talks about something rather different—a distributed identity framework. I’ll cover this, and the fallacy of the “unique ID” as a solution, in the next post: this dark business of the relationship that’s created as a result of digital transactions.

I might need my Greek hero and his friendly chelonian to help with that one. This stuff is not easy.

But what helps me sometimes, when thinking about this topic, is that this is a game you can play at home. Sort of. Every time you exchange anything about you (whether that involves your facial features, your money, or information about you) with anyone, anyone at all, online or offline, think about what’s actually being exchanged, why, and what the consequences could be. Try withholding everything except what turns out to be absolutely essential. Lie, subvert, play (within reason). It’s going to be useful to hone this awareness and these skills, I suspect.

Now read on…

Petitions and democracy

Tortoise: Y’know Achilles, when we were last talking about this identity business we got into all sorts of hot water very quickly in trying to find ways to use a definitive identity to do governmenty things on the Internet. But I’ve found a brilliant use for one this morning!

Achilles: Really? What’s that then?

Tortoise: Well – this new idea to transform our democratic participation by cutting a swathe through centuries of saggy old unsexy representative democracy and allowing us, through the power of the Interwebs, to have our say directly about what does and doesn’t get gamed into the Parliamentary timetable.

Achilles: Gamed?

Tortoise: I mean, debated. Sorry. We haven’t got to that bit yet, have we?

Achilles: And it’s also a great excuse for some cheap headlines about the X-factor, isn’t it?

T: Naturally.

A: So what have you read?

T: That this new petitiony thing is coming in and it will let you band together in a free and open way and get really popular people’s choices some proper Parliamentary time.

A: And will this change anything?

T: Dunno. But giving the important stuff some proper Parliamentary time has got to be a good thing in itself, hasn’t it? Especially stuff which is bound to be based on issues that get people to join their voices together, really quickly, using the Internet? Oh…

A: Indeed. But you mentioned something about identity?

T: Yeah. But aren’t you meant to be the personification of the State in these dialogues?

A: I am. Sorry. That’s what happens when you start to mess around with the model of who really holds the power, hey? Just my little joke. Sorry.

T: Accepted.

A: So. Tortoise. I have realised that with this direct democracy business it’s pretty important that we only hear from those from whom we should hear. If you get my drift. So, if you’re not on the electoral roll, I’m sorry, your voice has no place here.

T: Couldn’t agree more.

A: So, are you on the electoral roll then?

T: Is that it? Is that the test – you ask me, and I say I am, and then my voice gets heard? Is that all?

A: It’s what happens when you vote in a polling station, pretty much. There’s nothing by way of a very rigorous identity check, is there? Got a little piece of card, you vote. Not got one, you say your name, my guys check it’s on a big paper list, you vote. What’s the difference?

T: Have you heard of channel friction, Achilles?

A: Yes, I had a touch of that when Agamemnon stuck his javelin… What do you mean, Tortoise?

T: Well, it’s a bit weak to say that just because something works one way in the physical world then its online analogue must be just the same. There’s a certain amount of bother involved in diddling votes down the polling station. You have to queue up, you might see someone who knows you and says “Hi Tortoise!” just as you’re squeaking “I’m Mr Mouse” to the teller, and you can only get away with it once in the same place or you’re really asking for trouble. That all takes time and effort. Think of it as a kind of ‘friction’ associated with the physical voting approach that sort of acts as a check on all the other bad things that might happen. It’s not perfect, but it’s worked just about well enough for quite a while now.

A: Whereas the Internet is very much a frictionless channel, isn’t it? Hmm. It would seem, Tortoise, that those who want to create mischief or subvert the democratic process can do so easily, at great speed, in great fictitious numbers and all without having to leave their bedroom and feign an honest face to the bobby looming at the school doorway. Yes, I see your point.

T: You’re getting there…

A: We’d better stiffen it up then. I need, Tortoise, for you to prove, online, that you are the same Tortoise who is on my electoral roll. Otherwise this whole petitiony thing is quickly going to descend into discredited chaos. (If I’m not to quietly drop the bit about electoral roll verification, that is, hem hem.)

T: And how are you going to do that then?

A: Well, I tell you what – I’ll build this massive database which has a unique identifier associated with every person who appears on the electoral roll, and then I will, having verified through the physical examination of something like your passport, securely give you that identifier and some associated credentials…oh bollocks. We’re here again, aren’t we?

T: I’m afraid so.

A: And we haven’t even got to the bit where any attempt at online democratic participation is going to be holed below the waterline morally, and possibly legally, when so much of our population doesn’t have decent Internet access anyway?

T: I’m glad you got there before Cyberdoyle did.

A: Quite. One for a future conversation?

T: With pleasure.