honestlyreal


Greenland Street

I’ve just done a bad thing.

As a photographer, I’ve got a great deal of respect for the work of others. When copyright is held by someone else, and I don’t have a licence to take it or tinker with it, I don’t.

Well, I just did. (And remorse is fairly low on my list of feelings, to be honest.)

I picked up a nice brief today for a shoot in a week or two. As I often do these days, I put the address into Google Street View, just to get a general sense of where the building lies, and what sort of street vistas might be possible.

And there’s this guy. Right in front of the camera.

Shielding his face from the camera, with not one, but both hands.

I kept coming back to the image, finding it a powerful visual metaphor for the evasion of surveillance: a small, bowed figure at the front of the frame, seeking not to be identified.

Did he know about the face-blurring they use? Did he trust it? Did he care?

(Yes, I think he cared.)

So I did the bad thing, and scraped the image, un-watermarked it (in a symbolic echo of de/anonymisation?), gave it a little help with colour and tone and composed it as an image that told a story. As I like to do.

You can see it in its full glory by clicking on the preview below. You can download it and use it for stuff if you so choose.

(If I get into trouble, I’ll let you know. If you do, let me know.)

P.S. Thanks to Michael Smethurst for setting the image in the context of this fabulous story from Cory Doctorow, which then made me think more about its symbolism.

P.P.S. You can see the original image here (until it’s replaced by a fresh camera shot, of course).

P.P.P.S. Yes, I am fully aware that I’m quite happy to use Google Street View to help me in my work but also have little frissons about some of its other “features”. But thank you for thinking it.

Just because you can…

An interesting piece appeared on the Guardian data blog on Friday. It describes a wealth of new data being released relating to court and conviction information.

The database shows sentencing in 322 magistrates and crown courts in England and Wales. Defendants’ names are excluded but details such as age, ethnicity, type of offence and sentence are not. Any computer user can analyse aspects such as how many white people were sent to jail for driving offences.

All good stuff. There’s definitely value to be gained from this type of analysis. It’s being released as a database (hopefully with a commitment to regular ongoing publication), and it brings consistency to often haphazard arrangements for making data available. These are positive moves, and should be welcomed.

But…

Transparency campaigner William Perrin, who advises the Ministry of Justice on opening up its data, says the release is a big step: “Publishing the details of each sentence handed down in each court is a great leap forward for transparency in the UK, for which MoJ should be warmly praised. Courts have to be accountable to the local populations they serve.” But he, like some campaigners, believes the MoJ should go further, releasing the names of defendants. “The data published is anonymised, flying in the face of hundreds of years of tradition of open courts and public justice.

“The MoJ need to have an open and public debate about the conflict between the central role in our society of open public courts where you can hear the name and details of offenders read out in public and crude misapplication of data protection.”

My concern lies with the consequences of releasing the names of individuals, as proposed here, in a completely accessible and reusable way.

William draws a parallel between the act of reading out names in public court and publishing them on the Internet. (Disclosure: William and I both sit on the Transparency Sector Panel in MoJ.)

Were it a simple parallel, with the same consequences, I’d be pretty comfortable with the principle of release, too. But I see one very big difference: raw content on the Internet is (almost always) indexed by search engines. And search engines have very, very long memories. The (only) two things that the Internet has fundamentally changed are the ease with which information can be found, and the duration and extent over which it persists–as I’ve banged on about on this blog before.

So, this proposal (if taken at face value) would lead to a couple of consequences which might not be wholly desirable: firstly, a name would quite feasibly, if entered into a search engine, throw up information about an offence and the consequent sentencing for an indefinite time. What implications does that have for rehabilitation of offenders? If your conviction has been spent, and your potential employer does a quick check and finds that the only thing you’ve ever been noted for on the Internet is… Well, would that feel just to you?

Ah, I hear you say–but look at court reporting now: those journalists that do manage to get intelligible information out of a clerk so they can write their pieces accurately end up with their content being indexed (paywalls permitting), and the Google ghosts will be there to do their haunting anyway. Yes. They will. But this is an issue of scale and ease, not principle. Journalists today, even those with perfect information, exercise some choice over what they print. Maybe this is just because of space constraints, maybe there are other factors at play. But the “release everything for reuse” stance would dramatically increase the scale of publication.

You may say that this is a good thing: along similar lines as “nothing to hide, nothing to fear”, this extra hangover from a criminal’s downfall may be a very positive thing for society. Another deterrent to criminality, maybe? I don’t know about that, but I do know that we then face a reappraisal about what we mean by rehabilitation as a direct consequence of data release.

And, as William says, that needs proper public debate.

But it’s not just a matter of scale. We find, when public data is released en masse, that new business opportunities spring up. Imagine the entrepreneur who gathers all data on convictions and charges for their own employee check service. They might adhere to principles of time limitation on their data. They might not. They might mash-up this data set with other information. They might not. They might put profit before principle.

We attempt to control such reuse of information with regulation, but on the Internet, it gets very much harder to make this stick in practice. Again, we risk changing the landscape of what it means to be convicted, by releasing data like this.

I’m fascinated by how even something like the current Data Protection Act relates to the indexing of personal information within search engines. Almost by definition, the end purpose of such indexing cannot be known, and therefore Principle 2 (Personal data shall be obtained only for one or more specified and lawful purposes, and shall not be further processed in any manner incompatible with that purpose or those purposes–source: ICO) must surely be creaking already?

So, I’m not so keen on making it indexable. Can this be avoided? Is there a middle ground which acknowledges the shambles that is the current practice in courts–with some prepared to supply information in machine-readable format, others insisting on hand-written notes being passed, and some seemingly actively obstructive in providing information?

I think there might be. There are some “government” datasets which, although they could be released for reuse, aren’t. For fairly good reasons. The database of car registrations, for example. I suspect we’d consider it a bad thing if a road rage incident could be easily followed up with some bricks through windows on the basis of typing in the offending registration plate when you got home.

Similarly, we have a curious set of “frictions” in place to allow us to have an electoral roll which is at the same time both “publicly viewable” (provided you go to a library) and searchable online only if you pay up a good chunk of cash. A big hmmm from me to that latter part, by the way, but you can read much more on electoral roll issues here.

And the way that this data is structured is also important: so that we can’t, for example, easily go online, type in an address down the road, get a full list of occupants’ names and pop round there with all sorts of social engineering stories designed to make trouble/extract money/dig for further info/groom/be very creepy. Again, I’d suggest we do this for good reasons, and we know how to build machinery to keep this equilibrium in our society.

We may solve the problem through choosing carefully the format for release, the means by which it’s referenced, and even to whom it’s released. Yes, I know, those wretched privileged accessors again (just like the Police, DVLA, local authorities, credit agencies etc etc etc.) Always a subject to warm the temperature in open data discussions!

But I’m not arguing for wilful obfuscation of this data, merely putting forward some of the alternative perspectives to “everything, raw, now”. We do need this public debate, and we need to be reasonably confident that we’re getting a net societal benefit from whatever action we take.

Let’s tread carefully here–just because you can, doesn’t always mean you should.

[I'd be commenting on the Guardian article if I could, but it doesn't seem to have comments open, so I've written this in response.]

The Accidental Data Controller

It happened a few months back.

Facebook (that hideous, grunt-cheering, dumb-arse cesspit of a privacy clusterfuck–but let me try and remain objective) started to put some rather strange suggestions for new “friends” up on the top right. People who weren’t unknown to me, exactly, but whose electronic link to me could only have been derived in one way.

From email addresses.

These were people who I may once, ever, have emailed. Or who had emailed me, maybe just the once.

And this latter angle got me worried.

Because I know I have never, ever pressed that “find my friends by pillaging my address book” button. Not in Facebook; not in any other service.

And anyway, some of those names weren’t in my address book. But mine must have been in someone else’s… And I started to whiff a potentially horrible thing. However, this being Facebook, and Facebook being full of horrible things, I tucked it into a mental back drawer and let it go. That time.

Then, last week, I got another email invite to some new whizzy networking service. The invite came from someone I’ve got a lot of time for, so I figured there’d be no harm in signing up and having a quick look around.

The first thing I was greeted with on entering the new service was the message: “Ah – it looks like you already know Rich D—-; why don’t you connect to him on here?”

And that, dear reader, brought the whole sorry mess tumbling out of that back drawer in my head.

This service was entirely greenfield territory to me. I had shared absolutely nothing with it, other than my name and email address (by virtue of using it as the basis of my registration).

So the only way this matching could have occurred would be if Rich had clicked on the “Pillage Me!” button, and passed his entire address book to the new service, there to be held in limbo until such time as happy little matches like me popped up to trigger this unwelcome welcome.
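For the technically curious, here is a minimal sketch of how that sort of matching might work. It is purely my guess at the mechanism, not anything the service has published: the class `ContactMatcher`, its method names and the example addresses are all hypothetical.

```python
from collections import defaultdict


class ContactMatcher:
    """Hypothetical store of uploaded address books, keyed by email address."""

    def __init__(self):
        # email address -> names of the users whose address books contained it
        self._uploaded_by = defaultdict(set)

    def upload_address_book(self, uploader, emails):
        """Record every address in one user's address book (the "Pillage Me!" click)."""
        for email in emails:
            self._uploaded_by[email.strip().lower()].add(uploader)

    def suggestions_for_new_user(self, email):
        """On sign-up, list existing users who had already uploaded this address."""
        return sorted(self._uploaded_by.get(email.strip().lower(), set()))


matcher = ContactMatcher()
matcher.upload_address_book("Rich", ["Paul@example.com", "someone.else@example.com"])
print(matcher.suggestions_for_new_user("paul@example.com"))
```

The point to notice is that the new user hands over nothing but their own address. Everything needed for the match was supplied earlier by someone else, and sits there waiting.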

I know I’ve agonised on this blog before about what makes personal data personal. About how uniqueness, utility and linkability all have a big bearing on just how “personal” a piece of data is (and how much we should therefore be bothered by its loss or misappropriation).

Just having one bit of data floating about would be concerning enough, but–and this is a big but: what if that address book pillaging also took not just the raw email address itself, but also the associated name (or indeed any other fields)?

Anon@freetibetbyforce.com may just be an address to a dead-drop online account, but if it’s ever been associated with a real name, manually entered, in someone’s address book…(you see where I’m going here?)…the consequences could be pretty horrendous. Obviously this is an extreme example–but it makes the point–third parties are sharing your email address and perhaps related personal data in vast quantities, without really realising they are doing so, with services that hold it…where? how securely? for how long? IN ORDER TO MATCH YOU UP ON SOME LAME SKILLS NETWORK SITE?

When companies first started this sort of indiscriminate hoarding and sharing of personal data, we created the Data Protection Act as a countermeasure. Clearly, it’s getting hopelessly out of date and was never designed for this sort of scenario.

But humour me, and assume we should still adhere to its principles.

That would mean that you, me, anyone with an address book, could (or should?) be required to register as a Data Controller–mindful of the fact that our own address books have powerful, valuable content and with one click we become complicit in a process that spreads it way beyond the bounds of any purpose we could sensibly be said to have consented to.

I think this is hugely important, as no matter how careful we are with our own information, we are entirely reliant on the caution of others not to compromise it.

It’s an interesting one. Exam question for the Information Commissioner’s Office then: how big does your address book have to be before you need to register it under the Data Protection Act?

On trolls and anonymity

Picture this.

You’re walking down the street one day and a strange figure blocks your path. They’re clad head-to-foot in a black sheet. They’ve got some strange sort of voice scrambler strapped to their mouth beneath, and you hear this grating mechanical voice emerging.

It’s low, sinister, and very, very unnerving. You’re told that you’re worthless, stupid, wrong, and that all manner of terrible tortures will now befall you. There are slurs on your gender, your age, your politics, your sexuality.

At first, you’re shocked. Terrified and horrified.

Then you take stock. This creature…this shambling figure who dare not show their face nor reveal their true voice. This creature, who you now see is wearing a little badly-spelled badge so that their “distinctive” ranting can be identified wherever they choose to spew it out.

And you’re there, unmasked, identifiably, proudly, you. And you think of the feedback you get–good and bad–from those who do show their faces, and who use names which you can check out at least roughly in twenty seconds on Google or Facebook.

And you also think of those who are generally helpful and positive to you, but go under a pseudonym that can’t be easily checked back to an identifiable person.

And you put these in order of importance in your head. And you look again at the grating, shrouded, cowardly figure, and you laugh. They’re at the bottom. Actually, they and their opinions are completely worthless. The out-and-proud are at the top. And the pseudonymous somewhere in the middle.

You begin to laugh at the creature. Not viciously, not gloatingly. Just in mild amusement that anyone, ever could think that this creature mattered. Others join you. A warm buzz of gentle ridicule washes over the creature. It slopes away.

And you walk on.

Now. That’s a twee little tale if ever there was one. A piece of blogger whimsy, and not a little patronising with it. Of course it is. (I hope to God it doesn’t come over as a piece of “mansplaining” by the way. Because it’s not aimed at any group or individual in particular.)

It’s an observation not on “how we stop anonymity”–if you read my stuff on identity on this blog you’ll understand that I don’t believe that’s possible. Instead it’s a sketch of what type of framing it might take to assign anonymous, negative comments such a low value that everyone–from direct recipient to disinterested observer–just goes “oh, yeah, right, ok, anonymous blah, where’s the valuable stuff?”

Idealistic. Yes. I know. And I’ve skirted around a few obvious issues, above.

That the shock and pain of these comments can be so blithely overcome, if at all. And yes, I’ve had some myself, and not done a very good job of prioritising them as unimportant. (By any stretch of the imagination.)

I’ve ignored the physical reality of intimidation–of attacks moving from the space at the bottom of the blog to a text on your phone or a knock at your door. I’m making some big assumptions that the machinery of our society’s protection of the individual, plus a diminishing urge on the part of trolls to convert their keyboard bile into further threats in riskier channels, combine to mean that actually personal safety isn’t endangered that much. But it is sometimes. I know that.

But the key message of this illustration is to suggest that it isn’t just the personal reframing of a recipient of anonymous hate speech that takes us nearer to a solution–if that worked, we’d have all done it a long time ago.

It’s that we might find the answer in the growth of a collective recognition–in our society and culture–that there is a pecking order of importance, with anonymous, negative right at the very bottom.

It’s obvious that there’s an asymmetry involved: for hate speech to be a problem the original author has to be identifiable to some degree, and the troll almost without exception anonymous. It would be wonderful if that asymmetry also became the foundation of a recognised hierarchy of weight-given-to-commentary. (No fancy technical mechanics here in the giving of points or +1s–I mean a completely socially pervasive, understood hierarchy.)

And that would extend not just to an author’s reaction to their troll, but to it becoming completely normal for other commentators to perform the online equivalent of shrugging, smiling slightly, and stepping around the shambling, cloaked figure. No quick fix, of course: but a cultural goal to aim for.

With thanks to Julia Hobsbawm who wrote about this tonight for making me think more about an issue that’s been bubbling away in my head for a while now. I saw other angles on the debate earlier today too, asking how technology might save us from the curse of the troll: a framing of the question, in my view, that will be very unlikely to lead to fruitful answers.

I guess my one-line summary is: the only viable solutions will come from a focus on how we all react, and not on how we police boundaries. Please let’s not get tangled up with more futile attempts at gatekeeping.

midata: revolution or enigma?

No technology contracts bigger than £100m.

Bye-bye proprietary software monopolies–hello Open alternatives.

An avalanche of government data to generate new business opportunities and pump billions into the economy.

Fast broadband for (almost) all.

Agility, everywhere–no more risk-averse, unchangeable systems–instead, a commitment to diversity and experimentation.

Reskilling in-house tech teams, reducing dependence on external suppliers with vested interests.

And after years of false dawns, services actually joined up around–and designed for–their users.

There’s not a lot not to like, really. Is there?

Just before the election we heard a torrent of such promises. Watching the gathered geeks and entrepreneurs around me at the launch of the Conservative Technology Manifesto last March I could see tongues virtually hanging out. We weren’t just being offered the keys to the sweetshop–Francis Maude and Jeremy Hunt were pretty much proposing ripping its doors off.

How many of these sweeties have actually been delivered post-election is a story for another day (ah, the shackles of that Coalition Agreement, I’m sure…).

But over recent weeks and months we’ve seen glimpses of another what’s-not-to-like initiative. And now it’s been launched.

Midata.

[Ok, try this link. I was making a dodgy CMS point with the first one, that Google (and BIS site search!) gave me...]

So here comes the grumpy blogger to get all picky with what on the face of it is a risk-free, consumer-enriching move willingly volunteered by industry, facilitated by government, to make real people’s lives easier at no cost. (Coz there’s loads of those.)

Well, not so much of the picky, really–just an interest in shining a light into some of the corners of this debate. Because corners and angles there most certainly are.

The first thing to get to grips with is that there seem to be two big agendas wrapped up together here.

Both can be connected to the words “me” and “data”. But they seem to be quite different in their nature and purpose. That’s always a recipe for confusion if not properly unpacked. So let’s see what we have.

Agenda 1: better information for consumers

We have a consumer empowerment angle here, clearly. “Giving people back their data” is billed as putting the customer back in control when forming or reviewing a relationship with a vendor. For some services, especially things like utilities and telecoms, the case is very tangibly made.

We generate a lot of data in consuming the service. Understanding our consumption patterns in detail would help us when making future choices about service provider, as we’d be able to match the terms that were on offer with what we actually needed.

So far so good.

This also extends to things like preference data: as we go about buying things (and even just looking at them) we generate a cloud of information about our preferences, choices, needs and their timing. This has a value–how much, nobody really knows, though there are some florid estimates–to marketeers, and could drive better deals and more targeted, less intrusive advertising.

Agenda 2: proving your identity online

The moment we started to move transactions away from being with someone you knew personally in your village, we increased the complexity of how you prove things: who you are, can you pay, entitlement-by-residence and so on. Online, it’s pretty horrible, and attempts at building something that’s simultaneously secure and usable by normal people have foundered.

(There is more elsewhere on this blog about these issues–otherwise this post would be very long.)

Suffice to say that the current approach (which actually looks pretty promising) is that of “federated identity assurance”. Not trying to create one massive database of people information against which things can be checked, but to use information sourced from a number of existing trusted relationships, in combination, to give sufficient assurance of identity.

Which means that both these agendas are the same, doesn’t it? They both involve consumers getting their hands on personal data that’s previously been locked up in companies.

Well, actually, I don’t think it does.

Why not?

A definition of “personal data” is harder to pin down than might seem initially apparent [more here]. Lots of things that don’t look that personal by themselves (points on a map, equipment serial numbers etc.) take on a whole new power when linked to an individual.

There’s the obvious “personal facts” stuff, of course: name, address, account number etc. which usually (but not always) identify an individual.

Then there’s operational data, made much of by midata: what we’ve used, what we’re interested in, what service choices we made etc.

Releasing structured chunks of this latter type could well meet Agenda 1’s objectives. And there are design choices to be made here which will have a big impact on risk and privacy.

Would it be sufficient to get a log of mobile calls by time band and number type, for example, rather than a detailed list of numbers actually called, and precisely when they were made? The former could well be enough to allow a better contract to be found: the latter would be a potential privacy nightmare, not just for the caller, but also for those they called, if it were mislaid.
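To make that design choice concrete, here is a rough sketch of the “time band and number type” option. Everything in it is an assumption for illustration: the record layout, the two made-up bands in `time_band`, and the sample numbers bear no relation to any real provider’s data or tariff.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw call records: (ISO timestamp, number dialled, number type).
RAW_CALLS = [
    ("2011-11-03T08:15:00", "07700900001", "mobile"),
    ("2011-11-03T13:40:00", "02079460002", "landline"),
    ("2011-11-03T19:05:00", "07700900001", "mobile"),
    ("2011-11-04T23:30:00", "07700900003", "mobile"),
]


def time_band(hour):
    """Map an hour of the day to a coarse, made-up tariff band."""
    if 7 <= hour < 19:
        return "daytime"
    return "evening"


def summarise(calls):
    """Count calls per (time band, number type), dropping numbers and exact times."""
    summary = Counter()
    for timestamp, _number, number_type in calls:
        hour = datetime.fromisoformat(timestamp).hour
        summary[(time_band(hour), number_type)] += 1
    return dict(summary)


print(summarise(RAW_CALLS))
```

The summary is probably all a comparison service would need to match usage against a tariff, and it contains neither the numbers dialled nor the exact times. The raw list above it is the thing you would not want to see mislaid.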

My point being that meeting a consumer empowerment agenda requires the “giving back” of information with certain characteristics–i.e. tailored to fit the way that consumer services are packaged.

But the giving back of information to help confirm an identity relationship–Agenda 2–seems to me to be a very different beast.

Because I thought the whole concept of using a number of different identity providers was that you asked them to pass confirmations of trust around–not the actual personal data itself? So one might ask a bank to confirm electronically that some submitted data matched a record that they held, but that’s not the same as handing the requestor (or indeed the individual) chunks of personal data.
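As I understand the idea, the difference looks something like the sketch below. This is my own illustration rather than any published design for identity assurance: the `BANK_RECORDS` store, the `confirm_match` function and the sample customer are all invented.

```python
# Hypothetical record held by one identity provider (say, a bank).
BANK_RECORDS = {
    "cust-001": {"name": "Alex Example", "postcode": "AB1 2CD"},
}


def confirm_match(customer_id, claimed):
    """Answer only yes or no: do the claimed details match the record held?"""
    record = BANK_RECORDS.get(customer_id)
    if record is None:
        return False
    # Every held field must match; the record itself is never returned.
    return all(claimed.get(field) == value for field, value in record.items())


print(confirm_match("cust-001", {"name": "Alex Example", "postcode": "AB1 2CD"}))
print(confirm_match("cust-001", {"name": "Alex Example", "postcode": "ZZ9 9ZZ"}))
```

The requestor learns one bit of information, match or no match. A midata-style “give the data back” function would instead return the record itself, which is a quite different thing to be passing around.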

So I fear that in an attempt “not to go into too much detail” we’ve got a conflation of two separate, interesting, important issues under the midata flag.

One can always argue that “it’s the principle that counts–we should establish that first, then let the clever people get on with the solutions”. Well, yes. Ok.

We did that with electronic patient records, with Post Office smartcards, with national identity cards and registers… At some point we do need a public airing of the underlying principles in a greater level of detail than the initial press release. And before a major delivery programme has been commissioned, I’d suggest.

Other than this “issue overlap” there are a few other points that strike me about midata. There is this underlying sentiment that consumers have a right to “their data”. But what is it that actually makes a particular piece of data “theirs”?

Information about usage is a hybrid of personal facts (e.g. who is the account holder?) and operational information as a consequence of service use. How far does it extend? Basic consumption patterns? Probably yes. Detailed, time-stamped records of every purchase and all parties involved? Hmm. Maybe. Serial numbers and last maintenance dates of the precise routers and masts that were used to deliver a phone call? Well, now you’re being silly, Paul.

Yes, I am, of course. But I’m trying to illustrate that the translation of this “right to data” into reality involves more than just signing a memorandum of understanding.

And then there’s the cost angle. Even if we assume that the addition of a simple bit of code will suddenly enable service providers to spit out raw chunks of data onto the Internet (aka the “it can’t be that hard to get their systems to…” fallacy argument) the midata announcement is already talking about a greater degree of sophistication: particularly the bit about “access, retrieve and store their data securely”. Who’s going to pay for that?

And do we have robust evidence that there is interest and demand for this type of data release, other than from the vociferous lobbyists with their eyes on constructing a wealth of new “personal data store” opportunities?

It’s great to see entrepreneurial spirit flourishing, but how much is this about solving real consumer problems, and how much about playing yet more variations on the “consumer as product” theme–you tell us about your interests, and we’ll give you better deals (but only as a share of what we’re really making by selling that information to other vendors).

The argument that better information increases customer choice, and therefore power, is of course another “what’s-not-to-like”. But if you take a step back, and look at the implied problem that “people don’t know which is the best deal as they’re all so complicated and people don’t really know what they use anyway…”

…would you put your energy into releasing chunks of data to help make a better match with a complicated tariff, or would you have another look at the issue of tariffs in general, and simplify them? Yes, both represent some form of intervention, and I can see the political attractiveness of the former, as (especially under a voluntary scheme like midata) it plays down the regulatory role in favour of cheerful vendors all quite happy to be a lot more transparent with their/your operational information. But one wonders just how sustainable this level of voluntary cooperation would actually be in the longer term in highly competitive markets…

That’s a bit like imagining a set of doors with fantastically complicated locks, and giving people the right to have equally complicated keys cut–rather than pushing for simpler locks in the first place.

So, a lot of questions remain. Conceptually, midata isn’t something that could or should be objected to. And this post is not written to criticise, but to suggest a few areas that need more detail and analysis.

When we see press releases that let fly with cool talk of data, empowerment and choice we should be getting a lot more eager to ask the next level of questions. What does this really mean? How will it work in practice? And what might some of the broader economic, competitive, social and privacy implications be?

Until we do, we’ll be dazzled by press releases and then a bit disappointed when delivery swings into action. And it’s usually too late by then to do much about it.

The Internet is amazing

This isn’t really a blogpost. Just a tiny anecdote about the power of the information at our fingertips, and how, in less than a minute, it can delight and surprise.

I do try and look at photography other than my own from time to time. I spotted this lovely piece just now: street photos of New York from the middle of the last century.

The photo at the end of that particular link, Zito’s bakery, caught my attention for whatever reason. (I think it was the idea of a “Sanitary Bakery” actually shouting that particular branding at the world.)

As you do, I wondered if Zito was still in business today. (And is he still sanitary?) A quick flick over to Google Maps, popping in the address: 259 Bleecker Street, New York, NY.

And there it is.


[Embedded map: Google Street View at 259 Bleecker Street, New York]

Immediately, perfectly, the streetview is located at the precise spot where that shop stands. The tiling around the cellar hatch is there; it looks like it’s been retiled, but it’s the same shop front, without a doubt. Now an Italian restaurant.

But hang on: scroll a little to the left (try it now on the embedded picture–it works) and you’ll see that 259 is the shop next door. 259 Unique Gifts & Souvenirs. Couldn’t be clearer.

So at some point, did Zito’s stop being 259, and the numbering get changed? Why?

A surprise, a delight, and a little mystery, all in a minute, all far away in Bleecker St, as viewed from this sofa deep beneath the West End of London. I like that.

(And now I have a Simon & Garfunkel earworm, of course.)

A time and a place for everything

The factual bits:

A charity announces its forthcoming annual balloon release.

A campaigner highlights the environmental consequences of balloon releases, and posts his objections–backed up with references–on the Facebook page of the charity.

The references look to have a sound scientific foundation.

The campaigner uses a civil and unemotive tone.

The charity supports those bereaved through the loss of a child.

The supporters of the charity express outrage and condemnation towards the campaigner, and some quickly adopt an abusive stance.

Accusations are flung around on Twitter, Facebook, publicly and privately. It all gets rather nasty.

And what to make of these facts?

Should the campaigner have done it?

This is a pretty good case of “someone is wrong on the Internet” (and indeed, in the environment). But is it all one-sided?

Clearly both sides in the dispute see the other as having crossed over important boundaries.

The charity (and its supporters) are guilty of environmental vandalism, according to the scientific evidence. But they are not interested in scientific evidence. This is their tribute ritual, and the emotions surrounding it are so high as to seemingly overshadow any attempt at rational engagement. That’s “wrong”. [Clarification: the environmental damage is "wrong". Emotions are emotions. Can't really call them right or wrong. Sentence structure could have been better there.]

The campaigner believes that his cause–the potential damage to wildlife and the ecosystem in general–justifies raising awareness in the way he has. But does that make his actions entirely “right”?

I found this case particularly interesting for two reasons: first, the suspension of rationality, self-justified by those doing it because of the very real grief and suffering they are experiencing; and second, what it tells us about the nature of online engagement spaces.

And ultimately, was the intervention effective? Did it “raise awareness”?

Might it stop this charity doing the same thing next year?

Probably not.

Might it have an impact on those involved in less sensitive matters who might have thought about releasing balloons at some point?

Very possibly.

And does that positive effect in other places justify what was undoubtedly a painful experience in this forum?

I suspect that the campaigner, who I know personally to be highly altruistic in general, acted with a wish to help, not harm. But I wonder if he misjudged to some extent the nature of the space in which he engaged?

That Facebook page might have been billed as the discussion forum for the charity–a place in which, for any generic organisation, one might reasonably expect to conduct debate about the organisation’s aims and objectives.

But in this case, the space clearly has a different purpose. A place of mourning, of solidarity, of remembrance.

The campaigner caused distress in there. It has to be a matter of judgement as to whether the wider awareness of this environmental hazard justifies that. On balance, I think it might have been possible to raise the issue, and create a dialogue with the organisers, in a space other than the “holy ground” of this particular community–perhaps on an environmental blog, or the campaigner’s own online estate. It might not have been as effective in spreading the message, of course.

But it’s very difficult to know. Judging the mood and purpose of an online space, separating its form from its function, is hard indeed. Just because something looks like a discussion forum doesn’t always mean that it actually has that characteristic.

What do you think?

About that Data Protection myth

If you follow me on Twitter you might have spotted a recent exchange of views over the last few days with Vodafone. They do a fair job, it has to be said, of engaging in that channel. I’m not sure how joined-up or consistent it is with their other channels, but at least it’s nice to be able to ask a question and get a sort-of-answer.

My question stemmed from a curious experience when trying to contact the Vodafons via their website. They’ve taken the “use our webform, not an email address” approach. And to use the webform, I have to be logged in to the Vodasite using what I consider to be fairly strong credentials: i.e. to register on the site in the first place I had to have the physical phone to hand so that an SMS could be received and a time-limited security code typed in (as well as account details and so on)–you get the picture, nice use of a reasonably secure channel to confirm who I am. [See update below: the same web form is available even if you're not logged in, going some way to explaining the subsequent requests for further information by email.]

I’m also required, during registration, to supply an email address. In this case, the same one as I then supplied on their webform for further contact.

So having duly completed and sent off my webform, I was surprised to receive the following email two days later [extract, verbatim]:

At Vodafone, we are very particular about the security of every customer’s account to ensure that account specific information is not being shared with a non-account holder.

For me to access your phone account and provide you the account information, please provide me below mentioned security details:

- First Line of Address with Postcode
- Date of Birth
- Payment method
- Account number

Now this seems like an awful lot of personal data to be supplying simply to “prove” that the email address which sits in my securely-registered account is actually mine. Doesn’t it? Is it just me?

And being a bit twitchy about personal data exchange, especially via a channel as insecure as unencrypted email, I take it up with them. And via Twitter, I get that old favourite answer for this odd request: “…because of Data Protection” — and later “…in order to pass Data Protection”.

It’s worth reminding ourselves at this point what the Data Protection Act actually says and does. It’s built around eight fundamental principles which are all fair and reasonable provisions like “you must have consent from someone for the purpose for which you want to hold and process their data”. That sort of thing.

Principle number seven is an interesting one: it requires the company holding personal information to have adequate measures in place to protect it.

And here’s where this particular Data Protection myth arises. A company will often say “Data Protection makes us…” when what they mean is: “in order to mitigate the risk of bad things happening with your data, we’ve decided to implement some internal procedures which we think do the job”.

See the difference?

Let’s just scrutinise what’s happening here: I am being asked to provide personal information via an insecure channel to validate identical information that’s held within an account already held by them, which was created in a more secure channel.

And the company have the brass neck to tell me that “Data Protection” is making them do this?

Frankly, how well or badly they choose to implement their own processes is up to them. Up until the point at which their customers think they’re just so awful that they move to another service provider. That’s the free market; and perhaps this sort of oddness isn’t so whingeworthy.

But what’s made this into a blog post, and something I will be following up with the Information Commissioner’s Office, is this lazy use of tired, old mythspeak to try and present a poorly-designed, internal attempt at risk mitigation as something that the nasty old government has forced them to do.

(I’ve asked for a contact in Vodafone’s Data Protection team to explore this further, but haven’t received one at the time of writing.)

UPDATE: 2100, 17 Oct

Well, Vodafone certainly got engaged (at an accelerated pace once I’d posted this, and it had had a bit of RT love). Tweets, the address for the Data Protection team, and finally a very friendly phone call. Nice work. So it turns out I made an inaccurate assumption in the post above, which puts a different cast on some of the story, but raises other questions. You don’t have to be logged in to the site to use the “contact us” web form. In fact, whether you’re logged in or not (I happened to be), the web form simply has the function of sending an email to Vodafone, to which they will then respond via “standard” email. One might ask why they don’t just provide an email address: I suppose they avoid some spam this way, but you also lose the benefit of being able to see what you reported in your sent items… Swings and roundabouts.

More serious though is that much is made of the web form being secure (https). A level of comfort which is then utterly undermined by the subsequent request for that personal information to be sent back to them in clear email. I offered some alternative approaches, including taking advantage of the ability to log in securely in order to establish a much smoother, and less risky, communication channel. And a few pointers on copywriting to ensure that users don’t get the sort of surprise I did at being asked to email a bunch of personal data back at them.

It makes a certain, convoluted sense that they then have to ask these personal information questions in order to satisfy their Principle Seven obligations, but only because they’ve paid insufficient attention to contact design in the first place. I noted that in all the online transactions I’ve used (and that’s quite a lot), some of them involving rather bigger lumps of money, or data of greater sensitivity, than a phone account, I’d never been asked to provide information in clear like this. And that by itself should be a clue that all was not as it should be. The combination of address, date of birth, and an account number provides a malefactor with a heck of a headstart in further social engineering, and there’s really no excuse for asking for it to be passed over like that.

We’ll see what changes.

On communal grief

We’re all entitled to our own reaction.

To catastrophe, to unexpected joy, to death.

When people get very involved in the death of someone they didn’t know, I am slightly puzzled. Of course it is their right. But all behaviours meet a need. And I’m a little baffled as to what need is actually being met in these cases.

Anyway, no sermon. Do what meets your needs, and respect that others may meet theirs by not feeling any urge to join you. And that’s fine too.

A small anecdote:

A dozen or so years ago, a close relative by marriage was hit by a speeding police car in Croydon town centre. She died a day or so later from her severe head injuries. A tall, beautiful girl, 17, with her place at Cambridge secured. Devastating.

We drove past the scene a couple of days after it happened (I’m not sure if it was by chance, or that sort of “by chance” that is actually quite intentional.)

A few tragic bundles of flowers were taped to the lamp post across the road from the library. Small, bedraggled cards from schoolfriends. Very moving.

A few days later, we passed by again. This time, a mountain of flowers was there. We never realised she was so popular. I stopped and got out to look at the first one.

“Dear Diana, you will forever be in our hearts”.

And so it went on, right down the pile.

It was early September, 1997. The good people of Croydon had clearly been struggling to know where the “official” flower-laying place was, until this little scattering of children’s tributes appeared.

Alice gave them that, at least.

Neither one thing nor the other

In which I look more closely at one particular, well-known data set: what makes it what it is, and what we might draw from the way it’s managed to help us with some other challenging questions about privacy and transparency.

Surely data is open, or it isn’t?

(I’m using “open” here as shorthand for the ability to be reached and reused, not with any particular commercial or licensing gloss. It’s a loaded term. But let’s not snag on it at the beginning, hey?)

Data is either out there, on the internet, without encryption or paywall, or it isn’t. And if it is, then that’s that. Anyone can reach it, rearrange it or republish it, restrained or hampered only by such man-made contrivances as copyright and data protection laws.

Maybe. Maybe not.

I’ve been involved in some interesting discussions recently about the tricky issues surrounding the publication of personal data. By that, I mean data which identifies individuals. To be specific: some of the information in the criminal justice sector about court hearings, convictions and the like.

You’ll have seen much in the press, especially following the riots, about a renewed political and societal interest in this type of publication.

Without making this post all about the detailed nuances of those questions, this broader issue about the implications of “open” publication seems to me to need a bit more exploration before we can sensibly make judgements about such cases.

And to do that I took a close look at one very well-known data set: the electoral register.

What is it? Well, it’s a register of those who’ve expressed their entitlement, being over 18 (or about to be) and otherwise eligible, to vote in local and national elections, through returning a form sent to them by their council each year. If you’re reading this, you’re probably on it. I am.

It’s therefore not: a complete list of people in the UK (or even of those entitled to vote); a citizenship register; a census; a single, master database of everyone; accurate; or a distillation of lots of big government systems holding personal information.

What’s it for? An interesting question. I suppose its primary existence is to support the validation of those entitled to vote, at and around election time. But you’ll know, if you have voted, that it’s more of an afterthought to the actual process; most people show up with polling cards in hand, and anyway, there’d be no possibility of any real form of authentication, as the register doesn’t contain signatures, photos, privileged information or any other usable method of assurance. It’s not even concealed from view. (More on that here.)

But it does some other things, doesn’t it? It provides a means for political candidates to be able to make contact for canvassing purposes with their electorate. And I suppose, for that reason, it has this interesting status as a “public document”. Which we’ll come back to in a moment.

And to complete the picture, a subset of it (the “edited register”) is also sold to commercial organisations for marketing purposes, enabling them, amongst other things, to compile pretty comprehensive databases of people.

…and as a byproduct of that it also forms an important part of credit-checking processes–with said commercial organisations able to offer services, at a price, to anyone who wants to run a check that at least someone claiming to have X name has at some point claimed to live at Y address. (Remember, it’s all pretty weak information really, self-asserted with no comprehensive checking process.) You can opt out of the edited register if you choose, but you’re included by default.

[Update 2 Oct: Matthew, below, comments that I'm not quite right here--the full register is also available to be used for credit checking]

There’s probably more, but let’s get stuck into some of this.

First off, I will happily add that the whole business of why it needs to be public at all seems highly questionable. And I don’t remember the public debate where we all thought that it was a great idea to try and make a few quid off the back of this potentially highly-sensitive data? Do you? How do you feel about that?

And the idea that the process of democracy would be terminally hampered were candidates, agents and parties not able to make checklists of who’d been canvassed? Really? Couldn’t they perhaps just knock on doors anyway? As a potential representative would I only be willing to learn from encountering those who had a vote? I suggest not.

So, moving on past those knotty questions about “why do we have it, and why do we sell it?”, we have in practice established some conventions about managing it as “a public document”.

Can I, as a member of the public, request a copy be sent to me? Certainly not. Ok, perhaps I can download it then? Nope. Search it online? Hell no.

I can go and see it in my local library.

So I did.

I heartily recommend you do the same. It is a real eye-opener in terms of the idea of data being “semi-public”.

I trotted up to the (soon-to-be-closed [boo hiss]) information desk at the library under Westminster City Hall.

–Can I see the electoral register please?

–Sure. We only have the edited version here: if you want the whole thing, you have to go through there and ask for Electoral Services.

(He pointed at a forbidding and not-at-all-public-looking door).

–You’re ok, I’ll just have a look at this one.

And out from the back window-ledge comes a battered green lever-arch file, containing bundles of papers.

–You know how to use this? he says.

I shake my head. It seems the top bundle of papers is a street index. The personal information (names grouped by cohabitation, basically) is listed by street, then house name/number within street. Not by names.

So, you can’t, easily, find someone you’re stalking. (Did I say that? I mean, “whose democratic participative standing you have a legitimate interest in establishing.”)

But you can if you’re patient. Or if their name, like that of one Mr Portillo, leaps off the page at you. I intentionally chose the register of the area immediately around the Houses of Parliament, for just this reason. Curiously, I couldn’t actually find the HoP itself listed, but Buckingham Palace does have over 50 registered voters (none of whom are called Windsor).

But back to the process: as I picked up the box to head towards an empty desk a finger came down on the lid: –you have to read it here, he says.

I look at the lid. Wow.

I ask the question about photocopying anyway, just to judge the reaction. Kitten-killer, his eyes say.

But I take it a few paces away anyway and have a closer look.

Fascinating. I see a bunch of well-known people from industry and politics, their home addresses, and who else lives with them.

I’m sure I’d go grey in chokey if I actually published unredacted screen shots in this post, but I’m pretty sure this one will be ok; if nothing else I think its historical interest justifies it… (RIP, Brian.)

Now, in all the fuss we make about child benefit claimant data being mislaid via CD, and in all the howling we make about anonymisation of health records and other sensitive data, and through all the fog that surrounds the commercialisation of public information and the Public Data Corporation etc., isn’t this the sort of information that we would normally expect to be the subject of an enormous public debate about even its very existence? And I’m walking off the street and making notes of it, and, and…

And I can see what’s happening here.

Yes, it’s “public”. Sort of. But so much friction has been thrown in the way of the process–from the shirty look as I have the temerity to request it, to the deliberate choice over structure that minimises me being able to quickly find my target–that I would strongly argue it to be “semi-public” rather than public.

There are some important lessons here perhaps when considering the mode, and the consequence, of publishing data online. Clearly, structure is highly relevant. If I am able to sort, and index it, that instantly creates a whole universe of permanent, additional consequences. Not all of which may be that desirable. “A perpetual, searchable, SEO-friendly database of all those ever summoned to court, convicted or not, you say? Certainly sir…coming right up.”
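To make that point about structure concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the entries are invented (not real register data), and the function names `find_by_scanning` and `build_name_index` are hypothetical. It simply contrasts the library experience, where finding a name means reading through street-ordered entries one by one, with what a single pass of re-indexing produces.

```python
from collections import defaultdict

# Invented sample entries in street order: (street, house number, name).
# These are placeholders, not real register data.
STREET_ORDERED = [
    ("Abbey Row", "1", "A. Example"),
    ("Abbey Row", "2", "B. Sample"),
    ("Birdcage Lane", "7", "C. Placeholder"),
    ("Birdcage Lane", "7", "D. Placeholder"),
]


def find_by_scanning(register, name):
    """Read entries in street order until the name turns up.

    Returns the address (or None) and how many entries had to be read.
    """
    entries_read = 0
    for street, number, resident in register:
        entries_read += 1
        if resident == name:
            return (street, number), entries_read
    return None, entries_read


def build_name_index(register):
    """One pass over the same data gives a reusable name -> addresses lookup."""
    index = defaultdict(list)
    for street, number, resident in register:
        index[resident].append((street, number))
    return index


# The slow way: cost grows with the size of the register.
address, cost = find_by_scanning(STREET_ORDERED, "C. Placeholder")

# The indexed way: after one pass, every later lookup is immediate.
name_index = build_name_index(STREET_ORDERED)
```

The data is identical in both cases; only the ordering has changed. That is the sense in which the library's street-ordered bundle acts as friction, and why a sortable online copy is a different kind of object.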

If I’m able to relate information–by association with others–I can also help the cause of those wishing to track someone or something down. Look at Facebook. It does a great job of finding people you search for, even those with very common names amongst its hundreds of millions of accounts, by this type of associative referencing. Powerful stuff.

And let’s not forget that ALL this information is pretty easily available online anyway. You just have to pay for it. The best-known provider that I’ve looked at, 192.com, has an interesting model. You’ll be giving them at least a tenner, and more like £30, to buy some credits to search their databases. And they have the ominous rider that their really sexy information (the historic registers) is only available at an entry-level price of £150 a year. For that reason, I haven’t actually given them a penny as yet. But it’s no obstacle to the serious stalker. I mean, researcher.

I’m sure there are all sorts of impediments, from download limits to penalties for misuse, that attempt to put further spokes in the wheel of it becoming a common commodity. But how long, really, before the whole register is available as a torrent on the Pirate Bay? Maybe it is already?

And we’re not bothered about this? It’s amazing, isn’t it? Yes, this whole industry is built on data that we’re required to submit to public authorities–and if we don’t, we’re disenfranchised.

This is a scandal, and one that urgently needs review.

But do take away the point that there is such a concept as “semi-public” – at least for now. It’s the ability to process, to restructure, to index, that makes online data different from those box files in the library.

The friction we throw into the system, whether it’s (intentionally?) releasing information via pdf, or slipping a local journalist a hand-written note of the names of those in court, is perhaps more than just dumb intransigence in the face of “information that wants to be free”. And it can serve some potentially legitimate social purposes.

Think how you’d feel if those frictions weren’t there around the electoral roll? Even the money that 192.com require for you to buy back the data you gave up in the first place?

Happy that every comment you made online under your own name, every mention in the press, could be traced back to your real address along with the names of your (18+) family? I think perhaps not.

So, a very big public debate is required on the consequences of any personal data being put online. But remember, stealthily or not, we’ve had experience of these issues for years. We just need to look on the library window-ledges to find it.