honestlyreal

About that Data Protection myth

If you follow me on Twitter you might have spotted a recent exchange of views over the last few days with Vodafone. They do a fair job, it has to be said, of engaging in that channel. I’m not sure how joined-up or consistent it is with their other channels, but at least it’s nice to be able to ask a question and get a sort-of-answer.

My question stemmed from a curious experience when trying to contact the Vodafons via their website. They’ve taken the “use our webform, not an email address” approach. And to use the webform, I have to be logged in to the Vodasite using what I consider to be fairly strong credentials: i.e. to register on the site in the first place I had to have the physical phone to hand so that an SMS could be received and a time-limited security code typed in (as well as account details and so on)–you get the picture, nice use of a reasonably secure channel to confirm who I am. [See update below: the same web form is available even if you're not logged in, going some way to explaining the subsequent requests for further information by email.]

I’m also required, during registration, to supply an email address. In this case, the same one as I then supplied on their webform for further contact.

So having duly completed and sent off my webform, I was surprised to receive the following email two days later [extract, verbatim]:

At Vodafone, we are very particular about the security of every customer’s account to ensure that account specific information is not being shared with a non-account holder.

For me to access your phone account and provide you the account information, please provide me below mentioned security details:

- First Line of Address with Postcode
- Date of Birth
- Payment method
- Account number

Now this seems like an awful lot of personal data to be supplying simply to “prove” that the email address which sits in my securely-registered account is actually mine. Doesn’t it? Is it just me?

And being a bit twitchy about personal data exchange, especially via a channel as insecure as unencrypted email, I take it up with them. And via Twitter, I get that old favourite answer for this odd request: “…because of Data Protection” — and later “…in order to pass Data Protection”.

It’s worth reminding ourselves at this point what the Data Protection Act actually says and does. It’s built around eight fundamental principles which are all fair and reasonable provisions like “you must have consent from someone for the purpose for which you want to hold and process their data”. That sort of thing.

Principle number seven is an interesting one: it requires the company holding personal information to have adequate measures in place to protect it.

And here’s where this particular Data Protection myth arises. A company will often say “Data Protection makes us…” when what they mean is: “in order to mitigate the risk of bad things happening with your data, we’ve decided to implement some internal procedures which we think do the job”.

See the difference?

Let’s just scrutinise what’s happening here: I am being asked to provide personal information via an insecure channel to validate identical information that’s held within an account already held by them, which was created in a more secure channel.

And the company have the brass neck to tell me that “Data Protection” is making them do this?

Frankly, how well or badly they choose to implement their own processes is up to them. Up until the point at which their customers think they’re just so awful that they move to another service provider. That’s the free market; and perhaps this sort of oddness isn’t so whingeworthy.

But what’s made this into a blog post, and something I will be following up with the Information Commissioner’s Office, is this lazy use of tired, old mythspeak to try and present a poorly-designed, internal attempt at risk mitigation as something that the nasty old government has forced them to do.

(I’ve asked for a contact in Vodafone’s Data Protection team to explore this further, but haven’t received one at the time of writing.)

UPDATE: 2100, 17 Oct

Well, Vodafone certainly got engaged (at an accelerated pace once I’d posted this, and it had had a bit of RT love). Tweets, the address for the Data Protection team, and finally a very friendly phone call. Nice work. So it turns out I made an inaccurate assumption in the post above, which puts a different cast on some of the story, but raises other questions. You don’t have to be logged in to the site to use the “contact us” web form. In fact, whether you’re logged in or not (I happened to be), the web form simply has the function of sending an email to Vodafone, to which they will then respond via “standard” email. One might ask why they don’t just provide an email address: I suppose they avoid some spam this way, but you also lose the benefit of being able to see what you reported in your sent items… Swings and roundabouts.

More serious though is that much is made of the web form being secure (https). A level of comfort which is then utterly undermined by the subsequent request for that personal information to be sent back to them in clear email. I offered some alternative approaches, including taking advantage of the ability to log in securely in order to establish a much smoother, and less risky, communication channel. And a few pointers on copywriting to ensure that users don’t get the sort of surprise I did at being asked to email a bunch of personal data back at them.

It makes a certain, convoluted sense that they then have to ask these personal information questions in order to satisfy their Principle Seven obligations, but only because they’ve paid insufficient attention to contact design in the first place. I noted that in all the online transactions I’ve used (and that’s quite a lot), some of them involving rather bigger lumps of money, or data of greater sensitivity, than a phone account, I’d never been asked to provide information in clear like this. And that by itself should be a clue that all was not as it should be. The combination of address, date of birth, and an account number provides a malefactor with a heck of a headstart in further social engineering, and there’s really no excuse for asking it to be passed over like that.

We’ll see what changes.

On communal grief

We’re all entitled to our own reaction.

To catastrophe, to unexpected joy, to death.

When people get very involved in the death of someone they didn’t know, I am slightly puzzled. Of course it is their right. But all behaviours meet a need. And I’m a little baffled as to what need is actually being met in these cases.

Anyway, no sermon. Do what meets your needs, and respect that others may meet theirs by not feeling any urge to join you. And that’s fine too.

A small anecdote:

A dozen or so years ago, a close relative by marriage was hit by a speeding police car in Croydon town centre. She died a day or so later from her severe head injuries. A tall, beautiful girl, 17, with her place at Cambridge secured. Devastating.

We drove past the scene a couple of days after it happened (I’m not sure if it was by chance, or that sort of “by chance” that is actually quite intentional.)

A few tragic bundles of flowers were taped to the lamp post across the road from the library. Small, bedraggled cards from schoolfriends. Very moving.

A few days later, we passed by again. This time, a mountain of flowers was there. We never realised she was so popular. I stopped and got out to look at the first one.

“Dear Diana, you will forever be in our hearts”.

And so it went on, right down the pile.

It was early September, 1997. The good people of Croydon had clearly been struggling to know where the “official” flower-laying place was, until this little scattering of children’s tributes appeared.

Alice gave them that, at least.

Neither one thing nor the other

In which I look more closely at one particular, well-known data set: what makes it what it is, and what we might draw from the way it’s managed to help us with some other challenging questions about privacy and transparency.

Surely data is open, or it isn’t?

(I’m using “open” here as shorthand for the ability to be reached and reused, not with any particular commercial or licensing gloss. It’s a loaded term. But let’s not snag on it at the beginning, hey?)

Data is either out there, on the internet, without encryption or paywall, or it isn’t. And if it is, then that’s that. Anyone can reach it, rearrange it or republish it, restrained or hampered only by such man-made contrivances as copyright and data protection laws.

Maybe. Maybe not.

I’ve been involved in some interesting discussions recently about the tricky issues surrounding the publication of personal data. By that, I mean data which identifies individuals. To be specific: some of the information in the criminal justice sector about court hearings, convictions and the like.

You’ll have seen much in the press, especially following the riots, about a renewed political and societal interest in this type of publication.

Without making this post all about the detailed nuances of those questions, this broader issue about the implications of “open” publication seems to me to need a bit more exploration before we can sensibly make judgements about such cases.

And to do that I took a close look at one very well-known data set: the electoral register.

What is it? Well, it’s a register of those who’ve expressed their entitlement, being over 18 (or about to be) and otherwise eligible, to vote in local and national elections, through returning a form sent to them by their council each year. If you’re reading this, you’re probably on it. I am.

It’s therefore not: a complete list of people in the UK (or even of those entitled to vote); a citizenship register; a census; a single, master database of everyone; accurate; or a distillation of lots of big government systems holding personal information.

What’s it for? An interesting question. I suppose its primary existence is to support the validation of those entitled to vote, at and around election time. But you’ll know, if you have voted, that it’s more of an afterthought to the actual process; most people show up with polling cards in hand, and anyway, there’d be no possibility of any real form of authentication, as the register doesn’t contain signatures, photos, privileged information or any other usable method of assurance. It’s not even concealed from view. (More on that here.)

But it does some other things, doesn’t it? It provides a means for political candidates to be able to make contact for canvassing purposes with their electorate. And I suppose, for that reason, it has this interesting status as a “public document”. Which we’ll come back to in a moment.

And to complete the picture, a subset of it (the “edited register”) is also sold to commercial organisations for marketing purposes, enabling them, amongst other things, to compile pretty comprehensive databases of people.

…and as a byproduct of that it also forms an important part of credit-checking processes–with said commercial organisations able to offer services, at a price, to anyone who wants to run a check that at least someone claiming to have X name has at some point claimed to live at Y address. (Remember, it’s all pretty weak information really, self-asserted with no comprehensive checking process.) You can opt out of the edited register if you choose, but you’re included by default.

[Update 2 Oct: Matthew, below, comments that I'm not quite right here--the full register is also available to be used for credit checking]

There’s probably more, but let’s get stuck into some of this.

First off, I will happily add that the whole business of why it needs to be public at all seems highly questionable. And I don’t remember the public debate where we all thought that it was a great idea to try and make a few quid off the back of this potentially highly-sensitive data? Do you? How do you feel about that?

And the idea that the process of democracy would be terminally hampered were candidates, agents and parties not able to make checklists of who’d been canvassed? Really? Couldn’t they perhaps just knock on doors anyway? As a potential representative would I only be willing to learn from encountering those who had a vote? I suggest not.

So, moving on past those knotty questions about “why do we have it, and why do we sell it?”, we have in practice established some conventions about managing it as “a public document”.

Can I, as a member of the public, request a copy be sent to me? Certainly not. Ok, perhaps I can download it then? Nope. Search it online? Hell no.

I can go and see it in my local library.

So I did.

I heartily recommend you do the same. It is a real eye-opener in terms of the idea of data being “semi-public”.

I trotted up to the (soon-to-be-closed [boo hiss]) information desk at the library under Westminster City Hall.

–Can I see the electoral register please?

–Sure. We only have the edited version here: if you want the whole thing, you have to go through there and ask for Electoral Services.

(He pointed at a forbidding and not-at-all-public-looking door).

–You’re ok, I’ll just have a look at this one

And out from the back window-ledge comes a battered green lever-arch file, containing bundles of papers.

–You know how to use this? he says

I shake my head. It seems the top bundle of papers is a street index. The personal information (names grouped by cohabitation, basically) is listed by street, then house name/number within street. Not by names.

So, you can’t, easily, find someone you’re stalking. (Did I say that? I mean, “whose democratic participative standing you have a legitimate interest in establishing.”)

But you can if you’re patient. Or if their name, like that of one Mr Portillo, leaps off the page at you. I intentionally chose the register of the area immediately around the Houses of Parliament, for just this reason. Curiously, I couldn’t actually find the HoP itself listed, but Buckingham Palace does have over 50 registered voters (none of whom are called Windsor.)

But back to the process: as I picked up the box to head towards an empty desk a finger came down on the lid: –you have to read it here, he says.

I look at the lid. Wow.

I ask the question about photocopying anyway, just to judge the reaction. Kitten-killer, his eyes say.

But I take it a few paces away anyway and have a closer look.

Fascinating. I see a bunch of well-known people from industry and politics, their home addresses, and who else lives with them.

I’m sure I’ll go grey in chokey if I actually published unredacted screen shots in this post, but I’m pretty sure this one will be ok; if nothing else I think its historical interest justifies it… (RIP, Brian.)

Now, in all the fuss we make about child benefit claimant data being mislaid via CD, and in all the howling we make about anonymisation of health records and other sensitive data, and through all the fog that surrounds the commercialisation of public information and the Public Data Corporation etc., isn’t this the sort of information that we would normally expect to be the subject of an enormous public debate about even its very existence? And I’m walking off the street and making notes of it, and, and…

And I can see what’s happening here.

Yes, it’s “public”. Sort of. But so much friction has been thrown in the way of the process–from the shirty look as I have the temerity to request it, to the deliberate choice over structure that minimises me being able to quickly find my target–that I would strongly argue it to be “semi-public” rather than public.

There are some important lessons here perhaps when considering the mode, and the consequence, of publishing data online. Clearly, structure is highly relevant. If I am able to sort, and index it, that instantly creates a whole universe of permanent, additional consequences. Not all of which may be that desirable. “A perpetual, searchable, SEO-friendly database of all those ever summoned to court, convicted or not, you say? Certainly sir…coming right up.”
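To make that point about structure a little more concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the streets, the names, and the function names are all hypothetical, and it makes no claim about how any real register or commercial service is actually stored. It simply lays a few made-up entries out the way the library copy is ordered (street, then house number, then names) and shows how little effort it takes to re-key the same data by name.

```python
from collections import defaultdict

# Hypothetical, invented entries laid out like the library copy:
# street -> house number -> names registered there.
register_by_street = {
    "Example Street": {
        "1": ["A. Smith", "B. Smith"],
        "2": ["C. Jones"],
    },
    "Sample Road": {
        "10": ["D. Patel"],
        "12": ["A. Smith"],
    },
}


def find_by_scanning(register, name):
    """Look for a name the library way: read every street and house in turn."""
    matches = []
    for street, houses in register.items():
        for number, names in houses.items():
            if name in names:
                matches.append((number, street))
    return matches


def build_name_index(register):
    """Re-key the same data by name, so later lookups need no scan at all."""
    index = defaultdict(list)
    for street, houses in register.items():
        for number, names in houses.items():
            for person in names:
                index[person].append((number, street))
    return dict(index)


name_index = build_name_index(register_by_street)

# Same answer both ways; only the effort per lookup differs.
print(find_by_scanning(register_by_street, "A. Smith"))
print(name_index["A. Smith"])
```

The information is identical in both layouts. The paper file only ever offers the slow scan, and only to someone standing at the desk; once the data is digital, the second function is a few lines and a single pass, and after that every name is one lookup away. That is roughly the friction the lever-arch file preserves.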

If I’m able to relate information–by association with others–I can also help the cause of those wishing to track someone or something down. Look at Facebook. It does a great job of finding people you search for, even those with very common names amongst its hundreds of millions of accounts, by this type of associative referencing. Powerful stuff.

And let’s not forget that ALL this information is pretty easily available online anyway. You just have to pay for it. The best-known provider that I’ve looked at, 192.com, has an interesting model. You’ll be giving them at least a tenner, and more like £30 to buy some credits to search their databases. And they have the ominous rider that their really sexy information–the historic registers–is only available at an entry-level price of £150 a year. For that reason, I haven’t actually given them a penny as yet. But it’s no obstacle to the serious stalker. I mean, researcher.

I’m sure there are all sorts of impediments, from download limits to penalties for misuse, that attempt to put further spokes in the wheel of it becoming a common commodity. But how long, really, before the whole register is available as a torrent on the Pirate Bay? Maybe it is already?

And we’re not bothered about this? It’s amazing, isn’t it? Yes, this whole industry is built on data that we’re required to submit to public authorities–and if we don’t, we’re disenfranchised.

This is a scandal, and one that urgently needs review.

But do take away the point that there is such a concept as “semi-public” – at least for now. It’s the ability to process, to restructure, to index, that makes online data different from those box files in the library.

The friction we throw into the system, whether it’s (intentionally?) releasing information via pdf, or slipping a local journalist a hand-written note of the names of those in court, is perhaps more than just dumb intransigence in the face of “information that wants to be free”. And it can serve some potentially legitimate social purposes.

Think how you’d feel if those frictions weren’t there around the electoral roll? Even the money that 192.com require for you to buy back the data you gave up in the first place?

Happy that every comment you made online under your own name, every mention in the press, could be traced back to your real address along with the names of your (18+) family? I think perhaps not.

So, a very big public debate is required on the consequences of any personal data being put online. But remember, stealthily or not, we’ve had experience of these issues for years. We just need to look on the library window-ledges to find it.

Coming out

Nobody tells me how to think.

That’s important. A core value.

Influence me, by all means. Educate me as much as you can. Push me to see something from a different angle. Lend me your shoes and let me walk a few miles in them.

But don’t try and control me.

And that, in a nutshell, is a big problem I’ve had with organised politics. I wrote a smuggish piece last year about why I was oh-so-special–why I remained above and outside any formal political machinery–because…well, because of what, really? (I didn’t even post it on this blog, such was my trepidation about the subject.)

Maybe it has some parallels with religion. Having an (intermittent) sense of faith is one thing. Becoming a card-carrying, incense-swinging, habit-wearing adherent is quite another. Boundaries spring up. Positions are taken. There’s only so far you can go before those boundaries are hit.

In short, I wasn’t sure there was a church broad enough to fit me in. And I didn’t know how to react if I didn’t like parts of the sermon.

I had some interesting feedback from braver, political friends about that post. Was I really being honest about my reasons? Was I actually evading responsibility? Actively shunning ways in which I might make some difference? Thinking that politics was something that other people got involved in…what sort of stance was that?

And then I took a hard look at some of my own writing and thinking. How I would robustly challenge any cherry-picking of a particular bit of policy that wasn’t seen in its wider context… And I’m the one that’s been banging on about things being interconnected, and needing to be tackled as wholes, not parts.

And I looked around me. I realised that the party system, whether at local or national level, does a job. Not perfectly, of course (and I still don’t fully understand its relevance at local authority level, but that’s another post).

Nevertheless, it’s a huge part of how we make these things called society, and government, work. Whatever imperfections it may have, it’s there, and I wasn’t engaging with it.

So: a choice.

To stay on the sidelines hoping to shape things a little through acerbic blogposts and a few pointed questions in think-tank debates? Well, ok. But is that enough? I’m not sure.

Or, my other option: to give it a go, and pull my wagon up to the campfire.

And I looked a little harder at the current state of our democracy, and the way we’ve allowed politics to depart from the things I hold very dear: rationality, honesty, liberality, inclusion.

And putting all that together, I made my choice.

Some fears, of course: that I’ll lose friends, that I’ll lose respect, that I’ll lose work (I’ve traded on political neutrality to some extent, in my work on public information projects, and in the access that I get as a photographer). My decision may not be without some disadvantages.

And that dodgy sermon thing? What do you do when your friends are dicks? One of the perpetual dilemmas I’ve found in a networked world is the issue of tribalism. When a friend screws up, perhaps even conflicting with another friend, how do you react? How do you maintain your own integrity when the actions of others inevitably challenge it?

I may not accept, or even understand, a party line on everything. That’s a reality. The easy crutch that party membership presents–of having someone else’s opinion available, on a matter I haven’t properly researched for myself–is problematic.

However, I propose to put my energies into the things I really do know a bit about. The relationship between technology and society. What liberty will come to mean in a networked world. Access to democracy. Fairness. And a few more. There’s enough there to chew on without me feeling I have to take on the whole lot all at once.

So, what was my choice of party?

Easy, really. What all my experience and thinking leads to, time and time again, is the importance of the societal consequences of everything we do and permit.

Society? I mean people, really. Real people. Not the privileged, the articulate, the ones that some choose to populate the little fictional worlds they create in their heads.

No, the full, gritty reality of what it’s really like. And there’s only one party that has a hope of doing that, as far as I can see.

So I joined the Labour Party.

It’s not perfect. There are some, but not many, areas on which I find the accepted line challenging. But I propose to bring my energies to respond to that challenge: to debating and understanding from inside the tent. To helping in the areas in which I can, and learning in the areas that I can’t yet.

(By complete coincidence, as I was finishing this post, a friend tweeted me this link. It raised a wry smile.)

So, I’m absolutely thrilled to be heading to my first party conference tomorrow. As a member, not just an observer.

Bring. It. On.

Gadget envy

I shouldn’t rise to the Taxpayers’ Alliance. I really shouldn’t.

Ok, perhaps just this once.

We see outrage this week that a Council–a publicly-funded service commissioner and provider, mark you–has taken the desperate and profligate step of installing iPads into its bin lorries. You know, using technology to improve the way it does business, and communicates with its residents? Remember, like that Conservative Tech Manifesto said we should see more of?

How can it be, fulminate the TPA, and the local Conservative MP, that a toy, A TOY, is being used like this?

A pencil and paper would be better, surely, for recording information? Well, yes, if you are happy with delays and transcription errors. But it is possible, just possible, that cruder forms of non-digital recording have been tried and found wanting. Really.

I mean you could, possibly, conceivably, trust the people who’ve thought through better ways to improve their service. Who’ve had to justify every penny of expenditure to armies of auditors, scrutineers, members, and the general public. Who are making this investment decision in the face of dire cuts to other services, having to prioritise carefully. Because THAT IS THEIR JOB.

None of that thought appears. None. Because the iPad is a toy. We’ve seen similar things before, in the mindset with which mobile phones and broadband are considered.

We should sharpen our critical faculties here. If we’re going to moan about and amplify a story based just on the word “iPad” we need to be able to ask the more meaningful questions. How much will it save? Let’s see how those figures are derived. Let’s look at the old way of doing it–commission a hideous, unsupportable, proprietary bit of kit from an old-school hardware-cum-services vendor (which if you could ever unpick the morass of add-on charges would probably cost you north of £10k a unit for the hardware alone)–and see how it might just be more sensible to use a robust, usable commodity with little or no training overhead and a mature developer base.

I must be careful not to prejudge the business case, of course. It has to stack up by itself, not because some fanboy or blogger thinks one thing or another.

But I’d rather read the actual bloody document, than drivel like the reporting around it.

The opposite of outrage?

“Elderly war-hero imprisoned for six months for making a recording of an event.” Here’s the protest site.

There’s enough in there to trigger outrage on so many levels!

Where’s our respect for someone who fought for our freedoms? We thought proportional sentencing was going up the spout after the riots, but this? Isn’t this just like Kate Belgrave and her great work to bring public council meetings under greater scrutiny?

In short, you’d expect Twitter to have exploded.

That Messrs Fry, Linehan et al would have been besieged with appeals to amplify the story. That they may even have responded. You’d think some of those wingnuts who made such a blogtastic fuss about my namesake and his shotgun would be right on it. (You can Google all that for yourselves. It began with libertarian outrage but quickly crossed into mainstream.)

But there hasn’t been an explosion.

In that particularly awkward dance of observer and observation, this blog post will no doubt “raise awareness” and make a few more people notice Mr Scarth and his plight.

Anyway, straight away I’m looking for mainstream media coverage of this travesty of justice. And I’m not finding much. Results for “Norman Scarth jailed” give me a lot of blog posts, but that’s all.

Weird.

My attention is now very piqued.

I talk to @newsmary about it. She’s noticed this too.

And mindful of my recent post about negative dynamics in networks, I start to realise that the absence of “expected” outrage is, of itself, modifying how I feel about Mr Scarth’s case.

In short, it doesn’t smell right.

I found some other stuff too, about his previous brushes with the law. About a conviction for violence. About time spent in jail. (Google can be harshly unforgiving like that, no matter if the sentence has been served. More on this anon, in relation to my membership of the Sector Panel looking at transparency in respect of criminal justice data use and release by government.)

But I’m very eager not to judge the case one way or the other. It’s not my role, and there isn’t enough decent evidence available, even if I were so inclined.

What I am observing is that the lack of an expected response can by itself modify feelings about a situation.

Is this another sign of the conditioning we’re experiencing as our melding with the online world matures?

Getting personal

For a long time, I’ve shied away from writing here about personal data. Or even thinking that deeply about it. The nature of identity, yes. The usefulness of data, yes. Personal data, no. Why?

Not because it isn’t fascinating, or important. Mainly because it’s so…damn…nebulous. And difficult. Time to get over that, I think. Very significant things are happening in this area, and we all need to raise our game in how we understand and engage with the concepts involved.

As I’ve surmised before, the only things that are really different in the Internet age are the ease with which information can be found, and the ease with which it can be stored.

Two things, really. That’s all.

The first embraces everything around indexing, cross-referencing, labelling, structure and searching. The latter takes us into the territory of copying (and of course copyright), archiving, and the general issue of persistence.

And when we look at personal data in that context, there is an immediacy–and potential toxicity–in what emerges.

We saw early rumblings of this long before the Internet, of course, when computers were first used for the mass processing of information about people. Things could be done with databases that simply weren’t possible with big paper ledgers.

We created Data Protection legislation which attempted to put reins on the ability to make free use of some types of information. Gathering stuff about people, from the basic facts of who and where, to how to contact them, who they were connected to, and what their tastes and preferences were. Pure gold, used in the right (or wrong) ways.

Data Protection set out some pretty sound, but general, principles. The overarching one being that the purpose to which data could be put should always be made clear to whoever provides it, at the time of providing. Lots of other stuff about processing, storage, where and how long, and so forth–but that issue of consent always seemed the most important, to me.

And we scratched about a bit to actually try and define what we meant by “personal data”. Some things were easy. Names. Addresses and phone numbers. They’re just obvious.

But what about our tastes? Our buying history? The movements of our mobile phone from cell to cell? A journey we took? As one takes informational side-steps away from the individual, the obviousness diminishes, but if you can make meaningful connections back to the person…

…and remember the first thing that the Internet really changes?

Being able to make those tenuous links between blocks of information into something really substantive.

And the second thing? That information and those links are now permanent. You can’t delete them, once they’re there.

All those things that databases couldn’t previously do, because they all conformed to different standards, and weren’t connected together? They can now. Things can be done via the Internet that simply weren’t possible with just the databases.

Bit by bit, it’s been possible to build up the most humongous repositories about people. Maybe entirely within the law, maybe in other ways as well. Maybe with explicit and informed consent all the way down the line. And maybe not.

Who’s to know? We find strange things going on with data that we provide in order to use one or other service–or even to exercise our democratic rights. Didn’t it ever strike you as slightly weird that the electoral roll could be sold on for commercial purposes? (Much more on the electoral roll in another post coming soon. Update: now here)

We have big companies that have built successful businesses just like this: perhaps using aggregated personal information for credit referencing, perhaps to sell to marketeers to give them a better understanding of demographics.

The genie is very much out of the bottle. Your rights to see the information that a particular company holds on you may exist, but you have to have a fair idea of which company to ask in the first place. Can you ever see the full picture of what others know about you?

Of course not.

And it’s unreasonable to suggest that we’ll ever be able to do that. Instances of data multiply more rapidly than does our capability to track them. (There must be a Law of Internet Entropy out there that says something like that. If not, I just invented one.)

(As an aside, a dear friend once uttered the memorable line “somewhere out there, there’s a database with your dick size on it”. That was in 1989.)

So what can we do?

Realistically, all that’s available to us are firebreaks and friction.

We can’t get that genie back in the bottle, but we can slow it down a bit, and find ways to mitigate the impacts.

Do we need an updated definition of personal data? It’s MUCH harder than it seems at first glance to create one. The best I can find at the moment in terms of an “official” position is here.

And it’s clumsier than you think. Essentially, it’s a list of ever-widening filters that assess whether a particular piece of information can be connected to a specific individual. Culminating in the rather wonderful catch-all of the final category:

8. Does the data impact or have the potential to impact on an individual, whether in a personal, family, business or professional capacity?

Yes: The data is ‘personal data’ for the purposes of the DPA.
No: The data is unlikely to be ‘personal data’.

Even though the data is not usually processed by the data controller to provide information about an individual, if there is a reasonable chance that the data will be processed for that purpose, the data will be personal data.

That’s pretty general, no? In fact, going by that, an awful lot of things are now personal data. I really like the emphasis it puts on the outcome of the data use, not attempting to over-define things like form and structure.

I’d go as far as to say we should probably throw away that big long document, and just run with this definition:

Personal data is information that affects you when it’s used. Either directly, or through being linked to other information using technologies that exist now, or may exist in the future.

Broad enough? ;)

(So my beloved photos: they’re personal data. I take them with a camera that has a unique number, held in metadata in the picture file. That provides a way to link all the pictures it takes together, and then, through the various accounts I put them in online, back to me. Think how many other trails you leave…)
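To make that linking concrete, here is a minimal sketch of how a shared camera serial number could tie otherwise unrelated accounts together. Everything in it is an assumption for illustration: the `camera_serial` and `account` field names, the sample records and the `link_accounts_by_camera` helper are all hypothetical, and real metadata extraction is left out entirely.

```python
from collections import defaultdict


def link_accounts_by_camera(photos):
    """Group the accounts that have published photos from the same camera.

    `photos` is a list of dicts with hypothetical keys
    'camera_serial' and 'account'.
    """
    accounts_by_serial = defaultdict(set)
    for photo in photos:
        accounts_by_serial[photo["camera_serial"]].add(photo["account"])
    # Sort for a stable, readable result.
    return {serial: sorted(accounts)
            for serial, accounts in accounts_by_serial.items()}


# Made-up sample records: two accounts share one camera.
photos = [
    {"file": "beach.jpg", "camera_serial": "SN-0001", "account": "photo-site/alice"},
    {"file": "cat.jpg", "camera_serial": "SN-0001", "account": "forum/anon123"},
    {"file": "hill.jpg", "camera_serial": "SN-0002", "account": "photo-site/bob"},
]

linked = link_accounts_by_camera(photos)
print(linked["SN-0001"])
```

One identifier shared across two uploads is enough to connect a named account to a supposedly anonymous one, which is the trail the paragraph above describes.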

But again, all we really have are firebreaks, and friction. There’s a sort of reverse entropy at work. Unlike almost every other instance of entropy–where things get more chaotic over time (china plates get broken, they never put themselves back together again)–personal information is relentlessly only going to get more linked. More aggregated. More pervasive. More permanent.

(So, maybe I just invented The Law of Reverse Internet Entropy as well? Not bad going for one post…)

And if someone tells you that big blocks of personal data can be “anonymised”, be very sceptical indeed. (You can read some wise thoughts on the issues involved here and elsewhere on that blog.)

We can undertake some pretty noble fire-breaking: like ensuring the state doesn’t become the source of a global universal identifier for you. And we will certainly see more developments around multiple personas: compartments of your life associated with particular tasks, contexts, or connections. I think we’ll have to. (The concept of federated identity helps here, but that’s too much to go into for this post. Read more thoughts from the team working up these concepts for government.)

And we’ll adjust. Society has seen some pretty dramatic upheavals. Often associated with a new technology, or philosophy. If we adjust our societal norms faster than the upheaval, we don’t notice. If we’re slower to change, it’s painful. For a bit.

But we get through. We adapt. And we change. Always.

The ultimate democratic data mash-up

So, online petitions are back with us again.

X-factor Whitehall is back for a new series. The power of the people is now only a few clicks away. Etc. etc.

Amidst all the flurry of any new service settling into being, the following thought occurred to me.

The big, obvious problem with an online service as an expression of democracy is that it’s not evenly representative. It’s skewed in all sorts of ways. The most apparent being that not all of the population have online access (or want it), and that distribution isn’t consistent across political or demographic groupings.

But others include people’s willingness to use the Internet for different tasks, the dominance of a few powerful voices with well-developed networks, “bandwagons and herds”, and the relatively small sample sizes involved.

On this latter point, compare the number of “signatures” required to put an issue up for Parliamentary consideration, with the number of votes required to turf someone out of Celebrity Big Brother. The influence of just one mention in the traditional, mass media can dramatically sway the standing of an issue.

But there is a way that we could do better. Or at least shine a different light on the petitions that we are seeing.

Data exists, within the big demographic and market analysis companies (CACI and Experian spring to mind) which does precisely some of this heavy-lifting. Adjusting research samples for all sorts of biases. Correcting for factors such as representativeness of a particular geographical area, preferences or capability to use certain channels, and so on.

I’m not an expert in this type of data comparison, but I do know that such adjustments won’t ever give us perfection. They will, however, add to our insight. In what might prove to be compelling ways.
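For a flavour of what such an adjustment might look like, here is a minimal sketch of one common technique, post-stratification weighting: each group of signers is weighted by its share of the population divided by its share of the sample. The age bands and all the figures are made up for illustration, and `poststratification_weights` is a hypothetical helper, not anything the companies mentioned above are known to use.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Return a weight per group: population share / sample share.

    Groups with no one in the sample are skipped, because there is
    nothing observed to reweight.
    """
    counts = Counter(sample_groups)
    total = len(sample_groups)
    weights = {}
    for group, population_share in population_shares.items():
        sample_share = counts.get(group, 0) / total
        if sample_share == 0:
            continue
        weights[group] = population_share / sample_share
    return weights


# Hypothetical petition: 100 signers, heavily skewed towards the young.
signers = ["under_35"] * 70 + ["35_to_64"] * 25 + ["65_plus"] * 5
# Hypothetical population shares for the same age bands.
population = {"under_35": 0.40, "35_to_64": 0.40, "65_plus": 0.20}

weights = poststratification_weights(signers, population)
for group, weight in weights.items():
    print(group, round(weight, 2))
```

A weight above 1 means the group is under-represented among signers; a weight below 1 means it is over-represented. It corrects only for the factors you choose to stratify on, so it adds insight rather than delivering perfection.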

I wonder who’ll be brave enough to attempt this, the ultimate democratic data mash-up?

Inconvenience

I’ve written before about something that would really set a rocket under the opening up of data: the vigorous pursuit of the useful stuff.

When we’ve been given access to transport data, wonderful things have happened. When we get real-time feeds, useful services follow hot on their heels. Let’s make those infrastructural building blocks of services available for free, unfettered use: the maps, the postcodes, the electoral roll, your personal health records.

(Ok, I didn’t mean the latter two. Or did I? It gets complicated. Still writing that post…)

Here’s a vision:

Roll forward to a time when the first priority of any service owner within the public sector is not “how shall I display the accounting information about the costs of this service” (or indeed “how shall I obfuscate the accounting information..?”).

No. Instead, it is: WHERE is the service? WHEN is the service? WHAT is the service? HOW DO I USE the service? (And maybe even: WHAT DO PEOPLE THINK about the service?)

Those basic, factual jigsaw pieces that allow any service to be found, understood, described and interacted with. From a map of where things can be found, to always-up-to-date information about their condition, and a nice set of APIs with which others can build ways in.

The genius of this type of thinking being that many of the operational headaches of current service delivery simply fall away. They are no longer a concern for the service owner. “Our content management system can’t show the information quite like that.” “We haven’t got the staff to go building a mapping interface.” “We’re not quite sure how we’d slot all that into our website’s information architecture.”

Pouf. No more. Gone. The primary concern becomes: is the data that describes this service accurate (or accurate enough–with some canny thinking about how it might then be written to and corrected), and available (using a broad definition of availability which considers things like interoperability standards).

Well, Paul. Nice. But what a load of flowery language, you theoretical arm-waver. Can’t you give a more practical example?

Well, reader. Yes I can.

Loos.

That’s right. Public conveniences. A universal need. A universal presence. But where are they? When are they open? And what about their special features? Disabled access? Disabled parking? Baby-changing?

There’s actually a bit more to think about (once you start to think hard) than just location and description. But not a whole lot more. The wonderful Gail Knight has been banging this drum for a while, and has made some good progress, especially on things like the specification for data you’d need to have to make a useful loo finder service.
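As a rough idea of how small such a specification could be, here is a minimal sketch of a loo record built only from the questions asked above: where, when, and which special features. It is not Gail Knight’s specification; the `PublicLoo` fields, the `find_loos` helper and the sample entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublicLoo:
    """One public convenience, described by a handful of basic facts."""
    name: str
    latitude: float
    longitude: float
    opening_hours: str      # free text here; a real spec would need a format
    disabled_access: bool
    disabled_parking: bool
    baby_changing: bool


def find_loos(loos, *, disabled_access=False, baby_changing=False):
    """Return the loos that offer every feature the caller asked for."""
    return [
        loo for loo in loos
        if (loo.disabled_access or not disabled_access)
        and (loo.baby_changing or not baby_changing)
    ]


# Made-up entries for illustration only.
loos = [
    PublicLoo("Market Square", 51.5000, -0.1200, "08:00-18:00", True, False, True),
    PublicLoo("Riverside Park", 51.5100, -0.1300, "24 hours", False, False, False),
]

matches = find_loos(loos, disabled_access=True, baby_changing=True)
print([loo.name for loo in matches])
```

Even this toy version leaves the harder questions open: who fills in the fields, who keeps them current, and who answers for it when they are wrong.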

Why’s this really interesting? Really, really interesting? Because having got a good idea of the usefulness of the data [tick] and a description of what good data looks like [tick] we then find all the other little gems that stand between A Great Idea, and a Service That Ordinary People Can Easily Use.

Who collects the data? Where does it get put? Who updates it? Who’s responsible if it’s wrong? How do people know they can trust it? Can people make money from it? (I could go on…)

Bear in mind that any additional burden of work on a local authority (who have some duties around the provision of public loos) probably isn’t going to fly too high in the current climate of cuts. Bear in mind also that anyone else who does a whole load of work like this is probably going to want something in return. Bear in mind also that “having a sensible standard” and “having a standard that everyone agrees is sensible” are two different things. Oh, and I need hardly add that much of this data will not currently be held in nice, accessible, extractable formats. If, indeed, it exists at all.

Two characters usually step forward at this point.

The first is the Big Stick Wielder (“well, they should just make councils publish this stuff. Send them a strong letter from the PM saying that this is now mandatory. That’s the standard. Get on with it. It’s only dumping a file from a database to somewhere on the Internet, innit?”) BSW may get a bit vague after this about precisely where on the Internet, and may, after a bit of mumbling, start talking about a national database, or “a portal”, or how Atos could probably knock one up for under a million… (and it’s usually at this point that some clever flipchart jockey will say “Why just loos? Let’s make a generic, EVERYTHING-finder! Let’s stretch out that scope until we’ve got something really unwieldy and massive on our hands”.) We know how this song goes, don’t we?

The second is the Cuddly Crowd-Sourcer (“forget all that heavy top-down stuff, man. We have the tools. We have some data to start from. Let’s crack on and start building! Use a wiki. Get people involved. Make it all open and free.”) CCS’s turn to go a bit vague happens when pushed on things like: will this project ever move beyond a proof-of-concept? how do we get critical mass? does it need any marketing? can people charge for apps that reuse the data and add value to it? how do we choose the right tools?

Both have some good points, of course. And some shakier ones. That’s why this is a debate. If it were clear-cut, we’d have sorted it by now, and all be looking at apps that find useful stuff for us. And it isn’t just a matter of WDTJ (Why don’t they just..?).

My suggestion? CCS is nearer the mark. Create a data collection tool which can take in and build on what already exists. Use Open Street Map as the destination for gathered data. Do get on with it.

Matthew Somerville’s excellent work to get an accurate data set of postbox locations and the Blue Plaque finder are obvious examples to draw inspiration from. Once in OSM, data can be got out again should the need arise. There will be a few wrinkles around the edges as app developers seek to make a return on what they build using the data. There may well be a case for publicly-funded development on top of the open data. But get the data there first. Make it a priority.

Because if, after years of trying to make real-world, practical, open, useful services based on data we continue as we are, with a pitiful selection of half-baked novelties and demonstrators of “what useful might look like, at some point in the indeterminate future” we’re badly letting ourselves down.

Basically, what I’m saying is: if we can’t get this right for something as well-defined and basic as loos, a lot of what we dream of in our hack-days and on our blogs about the potential of data will just go down the pan.

————

UPDATE:

OK, so it seems it already exists. Or at least a London version of it anyway. Don’t you love it when that happens? Would be good to see how it progresses, and what its business model looks like. I like the way that data descriptions have been used e.g. “Pseudo-public” for that class of loos which aren’t formally public conveniences, but can easily be accessed and used – e.g. those in libraries, and cooperative shops. The crowd-update function looks good too.

In a way, this also shows up another headache that arises when spontaneous services start to appear: there is only one set of loos in the real-world. But each representation of them in an app or online service must go through the same process of ensuring accuracy and extent of coverage. Distributed information is always tricky to manage. Should we hope that several competing services make it into production, with the market determining which succeeds? Will that be the one with the best data? Or is there scope for an underpinning data service that feeds them all? (But then we court the central, mega-project problems again…)

Answers on a postcard, please.

A question of trust

In seeking an antidote to the selfish ravings of Somalia-bound Liz Jones (I’m not linking. You’ll work it out, but I don’t suggest you try too hard), a kind soul pointed me towards the wise words of Barry Schwartz on society’s loss of wisdom. It’s a great piece: one of those tub-thumping, uplifting TED talks that gets you nodding and waving along with his thesis. Whooping, even.

Basically, he says we’ve dispensed with our humanity in our quest for efficiency and profit. The wrong things are being measured. What really counts in any public-facing service is an appreciation of the softer aspects of, well, human interaction. We’ve lost the wisdom that gives us sensible decision-making, discretion and the ability to “get” all this. Perhaps not “lost”, as much as “designed-out”, in order to please all sorts of other gods.

What’s not to like? How could he possibly be wrong?

There he is, pointing to the job description of the janitor who has a whole load of specified tasks to perform. Mop the floor. Straighten the curtains. Swab the sink. But nowhere, nowhere, does it say: “Be nice to people. Be human. Be flexible.” (In a really perverse way, Bonkers Liz was saying something similar. But from a position of ignorance and vacuous moral bankruptcy, so basically, she can fuck right off.)

And, one might argue, does a job description need to spell out the requirement to be nice? I don’t know, perhaps it would make some difference if it were written down? I’m not convinced.

In the murky world of measurability and management, what does it even mean, anyway? If you put your cleaning out to tender, and one company comes back with a price that’s 10% higher than their competitor, but they promise to smile a lot more at people, and leave a bit of cleaning until tomorrow if someone really just needs a nice chat instead…what then?

Because when you do start buying into this idea, and go down the road of rewarding the soft stuff like satisfaction and happiness, all sorts of strange things are going to happen.

Only last month I heard tales from a friend whose former employer was very keen for staff to “revisit” customer surveys that weren’t high enough, point out to the customer that their personal bonuses were connected to the score, emphasise that the survey wasn’t the place for all their woes with the company to be vented, and see if they couldn’t nudge it up a couple of points. Seriously.

You get what you measure, remember?

Or rather, you get the measurements that lead to a benefit for the person being measured.

And there’s a double-edged sword in all of this. Mr Schwartz and his cheering audience are doing a great TED-style job of assuming good intent. They’re thinking of all the upside that comes from freeing people up to be a bit nicer. Like that extra latitude to go and make a cup of tea for Mrs Jones through being given a bit of slack on the amount of loo-scrubbing they have to do.

They’re probably not thinking of the janitor who is a living misery to the people around him, but who, when challenged, points to the mopped floor, the straight curtains, the swabbed sink… Fancy taking on that performance review? Substituting the subjective judgements of whether someone “has the right attitude” for the hard measures of dustiness or shine? Subjectivity that puts feudalistic power back in the hands of managers who can bully or fire pretty much at will? Always a trade-off, isn’t there?

One person’s empowered janitor is another person’s slacker-in-waiting. One person’s disability benefit is another’s disempowering handout. One banker’s justified performance bonus is…ok, perhaps that’s too far.

But it’s just Red vs Blue. The eternal debate. Centralise, decentralise. Liberate, control. Trust, assure.

Reds are great at spending someone else’s money. Blues think that pain is a far better motivator.

Trust. Trust. It all really comes down to trust. And so much of trust is based on visibility.

What we decide, what we believe, is based on what we see. The stories we’re told. And here there is an asymmetry. Negative stories travel fast, and easily become powerful myths. If conservative forces don’t believe, deep down, in public service provision at all, that will drive the narrative.

Transparency means that we get a lot more narrative. Blue editors have no end of material, and mass-consumption platforms on which to put it, to propagate Schwartz’s death of wisdom. And when they also claim to be willing to wave aside protocol and contract to “do the right thing”, the dissonance can be shocking.

I’ll end this by mentioning a fantastic piece by Onora O’Neill, one of the most enlightened people it’s my pleasure to know. She thinks rather harder about these things than most. Join her.
