honestlyreal

The Accidental Data Controller

It happened a few months back.

Facebook (that hideous, grunt-cheering, dumb-arse cesspit of a privacy clusterfuck–but let me try and remain objective) started to put some rather strange suggestions for new “friends” up on the top right. People who weren’t unknown to me, exactly, but whose electronic link to me could only have been derived in one way.

From email addresses.

These were people who I may once, ever, have emailed. Or who had emailed me, maybe just the once.

And this latter angle got me worried.

Because I know I have never, ever pressed that “find my friends by pillaging my address book” button. Not in Facebook; not in any other service.

And besides, some of those names weren’t in my address book anyway. But mine must have been in someone else’s… And I started to whiff a potentially horrible thing. However, this being Facebook, and Facebook being full of horrible things, I tucked it into a mental back drawer and let it go. That time.

Then, last week, I got another email invite to some new whizzy networking service. The invite came from someone I’ve got a lot of time for, so I figured there’d be no harm in signing up and having a quick look around.

The first thing I was greeted with on entering the new service was the message: “Ah – it looks like you already know Rich D—-; why don’t you connect to him on here?”

And that, dear reader, brought the whole sorry mess tumbling out of that back drawer in my head.

This service was entirely greenfield territory to me. I had shared absolutely nothing with it, other than my name and email address (by virtue of using it as the basis of my registration).

So the only way this matching could have occurred would be if Rich had clicked on the “Pillage Me!” button, and passed his entire address book to the new service, there to be held in limbo until such time as happy little matches like me popped up to trigger this unwelcome welcome.
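For the technically curious, here is a minimal sketch of how that sort of matching could work. It is my guess at the mechanism, not any real service's code: the class name `ContactMatcher`, its methods and the example addresses are all made up for illustration.

```python
from collections import defaultdict


class ContactMatcher:
    """Hypothetical sketch: match new sign-ups against uploaded address books."""

    def __init__(self):
        # email address -> members who uploaded an address book containing it
        self._uploaded_by = defaultdict(set)
        # email address -> names that other people typed against it
        self._names_for = defaultdict(set)

    @staticmethod
    def _normalise(email):
        # Assumption: addresses are compared case-insensitively.
        return email.strip().lower()

    def upload_address_book(self, uploader, contacts):
        """Store every (name, email) pair from the uploader's address book."""
        for name, email in contacts:
            key = self._normalise(email)
            self._uploaded_by[key].add(uploader)
            if name:
                self._names_for[key].add(name)

    def register(self, email):
        """Return who already holds this address, and the names stored with it."""
        key = self._normalise(email)
        holders = sorted(self._uploaded_by.get(key, ()))
        names = sorted(self._names_for.get(key, ()))
        return holders, names


matcher = ContactMatcher()
# Rich clicks the button; his whole address book goes into limbo.
matcher.upload_address_book(
    "rich",
    [("Paul Example", "Paul@Example.org"), ("", "someone@example.net")],
)
# Much later, I register with nothing but my email address.
suggestions, names = matcher.register("paul@example.org")
print(suggestions, names)
```

Note what the newcomer contributes in this sketch: one email address. Everything else, including the name attached to it, was supplied by somebody else.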

I know I’ve agonised on this blog before about what makes personal data personal. About how uniqueness, utility and linkability all have a big bearing on just how “personal” a piece of data is (and how much we should therefore be bothered by its loss or misappropriation).

Just having one bit of data floating about would be concerning enough, but–and this is a big but: what if that address book pillaging also took not just the raw email address itself, but also the associated name (or indeed any other fields)?

Anon@freetibetbyforce.com may just be an address to a dead-drop online account, but if it’s ever been associated with a real name, manually entered, in someone’s address book…(you see where I’m going here?)…the consequences could be pretty horrendous. Obviously this is an extreme example–but it makes the point–third parties are sharing your email address and perhaps related personal data in vast quantities, without really realising they are doing so, with services that hold it…where? how securely? for how long? IN ORDER TO MATCH YOU UP ON SOME LAME SKILLS NETWORK SITE?

When companies first started this sort of indiscriminate hoarding and sharing of personal data, we created the Data Protection Act as a countermeasure. Clearly, it’s getting hopelessly out of date and was never designed for this sort of scenario.

But humour me, and assume we should still adhere to its principles.

That would mean that you, me, anyone with an address book, could (or should?) be required to register as a Data Controller–mindful of the fact that our own address books have powerful, valuable content and with one click we become complicit in a process that spreads it way beyond the bounds of any purpose we could sensibly be said to have consented to.

I think this is hugely important, as no matter how careful we are with our own information, we are entirely reliant on the caution of others not to compromise it.

It’s an interesting one. Exam question for the Information Commissioner’s Office then: how big does your address book have to be before you need to register it under the Data Protection Act?

About that Data Protection myth

If you follow me on Twitter you might have spotted a recent exchange of views over the last few days with Vodafone. They do a fair job, it has to be said, of engaging in that channel. I’m not sure how joined-up or consistent it is with their other channels, but at least it’s nice to be able to ask a question and get a sort-of-answer.

My question stemmed from a curious experience when trying to contact the Vodafons via their website. They’ve taken the “use our webform, not an email address” approach. And to use the webform, I have to be logged in to the Vodasite using what I consider to be fairly strong credentials: i.e. to register on the site in the first place I had to have the physical phone to hand so that an SMS could be received and a time-limited security code typed in (as well as account details and so on)–you get the picture, nice use of a reasonably secure channel to confirm who I am. [See update below: the same web form is available even if you’re not logged in, going some way to explaining the subsequent requests for further information by email.]

I’m also required, during registration, to supply an email address. In this case, the same one as I then supplied on their webform for further contact.

So having duly completed and sent off my webform, I was surprised to receive the following email two days later [extract, verbatim]:

At Vodafone, we are very particular about the security of every customer’s account to ensure that account specific information is not being shared with a non-account holder.

For me to access your phone account and provide you the account information, please provide me below mentioned security details:

– First Line of Address with Postcode
– Date of Birth
– Payment method
– Account number

Now this seems like an awful lot of personal data to be supplying simply to “prove” that the email address which sits in my securely-registered account is actually mine. Doesn’t it? Is it just me?

And being a bit twitchy about personal data exchange, especially via a channel as insecure as unencrypted email, I take it up with them. And via Twitter, I get that old favourite answer for this odd request: “…because of Data Protection” — and later “…in order to pass Data Protection”.

It’s worth reminding ourselves at this point what the Data Protection Act actually says and does. It’s built around eight fundamental principles which are all fair and reasonable provisions like “you must have consent from someone for the purpose for which you want to hold and process their data”. That sort of thing.

Principle number seven is an interesting one: it requires the company holding personal information to have adequate measures in place to protect it.

And here’s where this particular Data Protection myth arises. A company will often say “Data Protection makes us…” when what they mean is: “in order to mitigate the risk of bad things happening with your data, we’ve decided to implement some internal procedures which we think do the job”.

See the difference?

Let’s just scrutinise what’s happening here: I am being asked to provide personal information via an insecure channel to validate identical information that’s held within an account already held by them, which was created in a more secure channel.

And the company have the brass neck to tell me that “Data Protection” is making them do this?

Frankly, how well or badly they choose to implement their own processes is up to them. Up until the point at which their customers think they’re just so awful that they move to another service provider. That’s the free market; and perhaps this sort of oddness isn’t so whingeworthy.

But what’s made this into a blog post, and something I will be following up with the Information Commissioner’s Office, is this lazy use of tired, old mythspeak to try and present a poorly-designed, internal attempt at risk mitigation as something that the nasty old government has forced them to do.

(I’ve asked for a contact in Vodafone’s Data Protection team to explore this further, but haven’t received one at the time of writing.)

UPDATE: 2100, 17 Oct

Well, Vodafone certainly got engaged (at an accelerated pace once I’d posted this, and it had had a bit of RT love). Tweets, the address for the Data Protection team, and finally a very friendly phone call. Nice work. So it turns out I made an inaccurate assumption in the post above, which puts a different cast on some of the story, but raises other questions. You don’t have to be logged in to the site to use the “contact us” web form. In fact, whether you’re logged in or not (I happened to be), the web form simply has the function of sending an email to Vodafone, to which they will then respond via “standard” email. One might ask why they don’t just provide an email address: I suppose they avoid some spam this way, but you also lose the benefit of being able to see what you reported in your sent items… Swings and roundabouts.

More serious though is that much is made of the web form being secure (https). A level of comfort which is then utterly undermined by the subsequent request for that personal information to be sent back to them in clear email. I offered some alternative approaches, including taking advantage of the ability to log in securely in order to establish a much smoother, and less risky, communication channel. And a few pointers on copywriting to ensure that users don’t get the sort of surprise I did at being asked to email a bunch of personal data back at them.

It makes a certain, convoluted sense that they then have to ask these personal information questions in order to satisfy their Principle Seven obligations, but only because they’ve paid insufficient attention to contact design in the first place. I noted that in all the online transactions I’ve used (and that’s quite a lot), some of them involving rather bigger lumps of money, or data of greater sensitivity, than a phone account, I’d never been asked to provide information in clear like this. And that by itself should be a clue that all was not as it should be. The combination of address, date of birth, and an account number provides a malefactor with a heck of a head start in further social engineering, and there’s really no excuse for asking for it to be passed over like that.

We’ll see what changes.

Getting personal

For a long time, I’ve shied away from writing here about personal data. Or even thinking that deeply about it. The nature of identity, yes. The usefulness of data, yes. Personal data, no. Why?

Not because it isn’t fascinating, or important. Mainly because it’s so…damn…nebulous. And difficult. Time to get over that, I think. Very significant things are happening in this area, and we all need to raise our game in how we understand and engage with the concepts involved.

As I’ve surmised before, the only things that are really different in the Internet age are the ease with which information can be found, and the ease with which it can be stored.

Two things, really. That’s all.

The first embraces everything around indexing, cross-referencing, labelling, structure and searching. The second takes us into the territory of copying (and of course copyright), archiving, and the general issue of persistence.

And when we look at personal data in that context, there is an immediacy–and potential toxicity–in what emerges.

We saw early rumblings of this long before the Internet, of course, when computers were first used for the mass processing of information about people. Things could be done with databases that simply weren’t possible with big paper ledgers.

We created Data Protection legislation which attempted to put reins on the ability to make free use of some types of information. Gathering stuff about people, from the basic facts of who and where, to how to contact them, who they were connected to, and what their tastes and preferences were. Pure gold, used in the right (or wrong) ways.

Data Protection set out some pretty sound, but general, principles. The overarching one being that the purpose to which data could be put should always be made clear to whoever provides it, at the time of providing. Lots of other stuff about processing, storage, where and how long, and so forth–but that issue of consent always seemed the most important, to me.

And we scratched about a bit to actually try and define what we meant by “personal data”. Some things were easy. Names. Addresses and phone numbers. They’re just obvious.

But what about our tastes? Our buying history? The movements of our mobile phone from cell to cell? A journey we took? As one takes informational side-steps away from the individual, the obviousness diminishes, but if you can make meaningful connections back to the person…

…and remember the first thing that the Internet really changes?

Being able to make those tenuous links between blocks of information into something really substantive.

And the second thing? That information and those links are now permanent. You can’t delete them, once they’re there.

All those things that databases couldn’t previously do, because they all conformed to different standards, and weren’t connected together? They can now. Things can be done via the Internet that simply weren’t possible with just the databases.

Bit by bit, it’s been possible to build up the most humongous repositories about people. Maybe entirely within the law, maybe in other ways as well. Maybe with explicit and informed consent all the way down the line. And maybe not.

Who’s to know? We find strange things going on with data that we provide in order to use one or other service–or even to exercise our democratic rights. Didn’t it ever strike you as slightly weird that the electoral roll could be sold on for commercial purposes? (Much more on the electoral roll in another post coming soon. Update: now here)

We have big companies that have built successful businesses just like this: perhaps using aggregated personal information for credit referencing, perhaps to sell to marketeers to give them a better understanding of demographics.

The genie is very much out of the bottle. Your rights to see the information that a particular company holds on you may exist, but you have to have a fair idea of which company to ask in the first place. Can you ever see the full picture of what others know about you?

Of course not.

And it’s unreasonable to suggest that we’ll ever be able to do that. Instances of data multiply more rapidly than does our capability to track them. (There must be a Law of Internet Entropy out there that says something like that. If not, I just invented one.)

(As an aside, a dear friend once uttered the memorable line “somewhere out there, there’s a database with your dick size on it”. That was in 1989.)

So what can we do?

Realistically, all that’s available to us are firebreaks and friction.

We can’t get that genie back in the bottle, but we can slow it down a bit, and find ways to mitigate the impacts.

Do we need an updated definition of personal data? It’s MUCH harder than it seems at first glance to create one. The best I can find at the moment in terms of an “official” position is here.

And it’s clumsier than you think. Essentially, it’s a list of ever-widening filters that assess whether a particular piece of information can be connected to a specific individual. Culminating in the rather wonderful catch-all of the final category:

8. Does the data impact or have the potential to impact on an individual, whether in a personal, family, business or professional capacity?

Yes: The data is ‘personal data’ for the purposes of the DPA.
No: The data is unlikely to be ‘personal data’.

Even though the data is not usually processed by the data controller to provide information about an individual, if there is a reasonable chance that the data will be processed for that purpose, the data will be personal data.

That’s pretty general, no? In fact, going by that, an awful lot of things are now personal data. I really like the emphasis it puts on the outcome of the data use, not attempting to over-define things like form and structure.

I’d go as far as to say we should probably throw away that big long document, and just run with this definition:

Personal data is information that affects you when it’s used. Either directly, or through being linked to other information using technologies that exist now, or may exist in the future.

Broad enough? ;)

(So my beloved photos: they’re personal data. I take them with a camera that has a unique number, held in metadata in the picture file. That provides a way to link all the pictures it takes together, and then, through the various accounts I put them in online, back to me. Think how many other trails you leave…)
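To make that trail concrete, here is a small sketch of the linking step. It assumes the metadata has already been read out of each file into a plain record; the field names `camera_serial` and `account`, the function name and the sample values are all invented for illustration, not taken from any real tool.

```python
from collections import defaultdict


def link_photos_by_camera(photos):
    """Map each camera serial number to the set of accounts its photos appear in."""
    accounts_by_serial = defaultdict(set)
    for photo in photos:
        serial = photo.get("camera_serial")
        if serial:  # photos with stripped metadata cannot be linked this way
            accounts_by_serial[serial].add(photo["account"])
    return dict(accounts_by_serial)


# Hypothetical records: two accounts that look unrelated, one camera.
photos = [
    {"file": "beach.jpg", "account": "photo-site/realname", "camera_serial": "SN-0001"},
    {"file": "protest.jpg", "account": "forum/anon123", "camera_serial": "SN-0001"},
    {"file": "cat.jpg", "account": "forum/other", "camera_serial": None},
]
linked = link_photos_by_camera(photos)
print(linked)
```

One shared serial number is enough to tie the "anonymous" account to the named one. The third photo, with no serial in its metadata, stays unlinked, which is about as much of a firebreak as this sketch has to offer.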

But again, all we really have are firebreaks, and friction. There’s a sort of reverse entropy at work. Unlike almost every other instance of entropy–where things get more chaotic over time (china plates get broken, they never put themselves back together again)–personal information is, relentlessly, only going to get more linked. More aggregated. More pervasive. More permanent.

(So, maybe I just invented The Law of Reverse Internet Entropy as well? Not bad going for one post…)

And if someone tells you that big blocks of personal data can be “anonymised”, be very sceptical indeed. (You can read some wise thoughts on the issues involved here and elsewhere on that blog.)

We can undertake some pretty noble fire-breaking: like ensuring the state doesn’t become the source of a global universal identifier for you. And we will certainly see more developments around multiple personas: compartments of your life associated with particular tasks, contexts, or connections. I think we’ll have to. (The concept of federated identity helps here, but that’s too much to go into for this post. Read more thoughts from the team working up these concepts for government.)

And we’ll adjust. Society has seen some pretty dramatic upheavals. Often associated with a new technology, or philosophy. If we adjust our societal norms faster than the upheaval, we don’t notice. If we’re slower to change, it’s painful. For a bit.

But we get through. We adapt. And we change. Always.