The Internet is amazing

This isn’t really a blog post. Just a tiny anecdote about the power of the information at our fingertips, and how, in less than a minute, it can delight and surprise.

I do try to look at photography other than my own from time to time. I spotted this lovely piece just now: street photos of New York from the middle of the last century.

The photo at the end of that particular link, Zito’s bakery, caught my attention for whatever reason. (I think it was the idea of a “Sanitary Bakery” actually shouting that particular branding at the world.)

As you do, I wondered if Zito was still in business today. (And is he still sanitary?) A quick flick over to Google Maps, popping in the address: 259 Bleecker Street, New York, NY.

And there it is.


Immediately, perfectly, the Street View is located at the precise spot where that shop stands. The tiling around the cellar hatch is there; it looks like it’s been retiled, but it’s the same shop front, without a doubt. Now an Italian restaurant.

But hang on: scroll a little to the left (try it now on the embedded map – it works) and you’ll see that 259 is the shop next door: 259 Unique Gifts & Souvenirs. Couldn’t be clearer.

So at some point, did Zito’s stop being 259, and the numbering get changed? Why?

A surprise, a delight, and a little mystery, all in a minute, all far away in Bleecker St, as viewed from this sofa deep beneath the West End of London. I like that.

(And now I have a Simon & Garfunkel earworm, of course.)

About that Data Protection myth

If you follow me on Twitter you might have spotted a recent exchange of views over the last few days with Vodafone. They do a fair job, it has to be said, of engaging in that channel. I’m not sure how joined-up or consistent it is with their other channels, but at least it’s nice to be able to ask a question and get a sort-of-answer.

My question stemmed from a curious experience when trying to contact the Vodafons via their website. They’ve taken the “use our webform, not an email address” approach. And to use the webform, I have to be logged in to the Vodasite using what I consider to be fairly strong credentials: to register on the site in the first place I had to have the physical phone to hand so that an SMS could be received and a time-limited security code typed in (as well as account details and so on). You get the picture: a nice use of a reasonably secure channel to confirm who I am. [See update below: the same web form is available even if you’re not logged in, which goes some way to explaining the subsequent requests for further information by email.]

I’m also required, during registration, to supply an email address. In this case, the same one as I then supplied on their webform for further contact.

So having duly completed and sent off my webform, I was surprised to receive the following email two days later [extract, verbatim]:

At Vodafone, we are very particular about the security of every customer’s account to ensure that account specific information is not being shared with a non-account holder.

For me to access your phone account and provide you the account information, please provide me below mentioned security details:

– First Line of Address with Postcode
– Date of Birth
– Payment method
– Account number

Now this seems like an awful lot of personal data to be supplying simply to “prove” that the email address which sits in my securely-registered account is actually mine. Doesn’t it? Is it just me?

And being a bit twitchy about personal data exchange, especially via a channel as insecure as unencrypted email, I take it up with them. And via Twitter, I get that old favourite answer for this odd request: “…because of Data Protection” — and later “…in order to pass Data Protection”.

It’s worth reminding ourselves at this point what the Data Protection Act actually says and does. It’s built around eight fundamental principles which are all fair and reasonable provisions like “you must have consent from someone for the purpose for which you want to hold and process their data”. That sort of thing.

Principle number seven is an interesting one: it requires the company holding personal information to have adequate measures in place to protect it.

And here’s where this particular Data Protection myth arises. A company will often say “Data Protection makes us…” when what they mean is: “in order to mitigate the risk of bad things happening with your data, we’ve decided to implement some internal procedures which we think do the job”.

See the difference?

Let’s just scrutinise what’s happening here: I am being asked to provide personal information via an insecure channel to validate identical information that already sits in an account they hold – an account created via a more secure channel.

And the company have the brass neck to tell me that “Data Protection” is making them do this?

Frankly, how well or badly they choose to implement their own processes is up to them. Up until the point at which their customers think they’re just so awful that they move to another service provider. That’s the free market; and perhaps this sort of oddness isn’t so whingeworthy.

But what’s made this into a blog post, and something I will be following up with the Information Commissioner’s Office, is this lazy use of tired, old mythspeak to try and present a poorly-designed, internal attempt at risk mitigation as something that the nasty old government has forced them to do.

(I’ve asked for a contact in Vodafone’s Data Protection team to explore this further, but haven’t received one at the time of writing.)

UPDATE: 2100, 17 Oct

Well, Vodafone certainly got engaged (at an accelerated pace once I’d posted this, and it had had a bit of RT love). Tweets, the address for the Data Protection team, and finally a very friendly phone call. Nice work. So it turns out I made an inaccurate assumption in the post above, which puts a different cast on some of the story, but raises other questions. You don’t have to be logged in to the site to use the “contact us” web form. In fact, whether you’re logged in or not (I happened to be), the web form simply has the function of sending an email to Vodafone, to which they will then respond via “standard” email. One might ask why they don’t just provide an email address: I suppose they avoid some spam this way, but you also lose the benefit of being able to see what you reported in your sent items… Swings and roundabouts.

More serious, though, is that much is made of the web form being secure (https) – a level of comfort utterly undermined by the subsequent request for that personal information to be sent back to them in clear email. I offered some alternative approaches, including taking advantage of the ability to log in securely in order to establish a much smoother, and less risky, communication channel. And a few pointers on copywriting, to ensure that users don’t get the sort of surprise I did at being asked to email a bunch of personal data back to them.

It makes a certain, convoluted sense that they then have to ask these personal-information questions in order to satisfy their Principle Seven obligations, but only because they’ve paid insufficient attention to contact design in the first place. I noted that in all the online transactions I’ve used (and that’s quite a lot), some of them involving rather bigger lumps of money, or data of greater sensitivity, than a phone account, I’d never been asked to provide information in clear like this. And that by itself should be a clue that all was not as it should be. The combination of address, date of birth and account number gives a malefactor a heck of a headstart in further social engineering, and there’s really no excuse for asking for it to be passed over like that.

We’ll see what changes.

On the shifting of control of personal data

If you’ve been locked in a cupboard for the last five (or more) years, you’re excused from observing this thematic shift:

In the longer term, data about people is more likely to be owned and controlled by them. Rather than having many instances of personal information scattered around organisations and agencies, to be confused, duplicated, corrupted and left on buses, simpler technologies have emerged to put the data owner, you, back in control.

We see this theme emerging with several different labels: from vendor relationship management, to volunteered personal information, to personal datastores, to a “control shift” in the concept of personal data.

I agree that this shift is inevitable, to a greater or lesser extent. Everyone wants it. What’s not to like? Less cost of processing, greater security, reinforcement of personal rights etc. etc.

We start to make the ideologically satisfying separation of identification and authentication/entitlement more of a reality. More of this in other posts.

I just have two snagging issues which I’d love to hear a response on from those who want to get us moving on this now:

The first is a transitional issue, but an important one. As the group of “personal data holders” grows, the infrastructure and operations required to support everyone else – those whose data stays with organisations and agencies – won’t change. There’ll be a double running of systems. Although this is inevitable with any system change, it puts an immediate disincentive on any service provider to explore this route. (But this is not my point here.)

My point is that strange things will start to happen in terms of operational continuity and completeness. There will be “gaps” in databases, where the personal data holders used to be. Instead of their information, there will be links and interfaces to the data they control for themselves. Will this create all sorts of headaches and risks just by itself? Enough to seriously dampen any service provider’s enthusiasm for adopting volunteered personal information?

The second will persist, and is perhaps more problematic. Because your personal information (whether it’s about your identity, other descriptive information about you, or about your authorisation to a particular service) is going to have to be assured by someone. This may not – and, in the case of identity, should not – be the exclusive province of government agencies, but someone is going to have to do it.

Some will do it well: banks, for example, are rather more incentivised (and skilled as a result) to be damn sure you are who you claim to be. But some won’t. And when we get down to the level of a patchwork of assurers, in any system, we start to get some problems. When things go wrong (and they will) – have a vision of a functional world by all means, but build for the real, dysfunctional one – might the untangling of liability consume more resource than enabling the shift of control ever saved?

Thoughts? I’d love to be convinced. I really would. But I’m a healthy sceptic at the moment.

A bit more about train information

If you were reading my outpourings a year ago you may remember a distinct preoccupation with train operating information. In the great range of public-facing datasets out there, the ones that offer the very highest utility, in my opinion, are those about real-time and real-world things: a picture of what’s happening right now and in the near future.

Transport information, weather, location, revised opening hours, where things are etc. etc. Sure, there may be treasures to dig out from the big dumps of auditable history in other datasets, but when it comes to actually building things people will find useful, there are some targets which are clearly more promising than others. (It’s probably no coincidence that data about timetables, postcodes, maps, operating information and the like are those which are also the most commercially tangled. Value breeds impediments, it would seem.)

I wrote about the problems of there being different versions of the truth about train operation. I wrote it at a time when ice and snow were crippling normal running. So, unsurprisingly, I’m back to revisit what’s happened since.

My idea a year ago – born of frustration with inaccurate data systems (one could get a different answer from the web, the train station office, the train platform sign, apps and feedback from other travellers via Twitter, for example) – was to rethink the way that trains are tracked and described in times of extreme disruption. I’m talking here about normal running disrupted to the point that existing timetables have become meaningless (and have been abandoned), where all trains are out of their normal positions, and the only meaningful data points that might relate to a particular physical train are its current physical location and its proposed calling points.

The notion of “Where’s my train?” was that if these basic data points were captured at the level of the train and made available as a feed, then in the event of utter chaos you would still be able to see the whereabouts of the next train going where you needed to go (even if you couldn’t do much about making it move with any predictability, or at all). Very much about the “where”, rather than the “when”.

This was a departure from information systems which relied on a forecast of train running (that abandoned timetable) or on a train having passed a particular point (for the monitoring of live running information). If trains had GPS tracking (which I heard they did) and the driver knew where the train would call (I was told this was generally the case) then a quantum of data existed which could drive such a feed.
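A minimal sketch of what such a feed record might look like – every name and figure here is hypothetical, an illustration of that “quantum of data” (position plus intended calling points) rather than any real operator’s API:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TrainPosition:
    """One 'Where's my train?' feed entry: where a physical train is now,
    and where it intends to call -- no timetable forecast involved."""
    unit_id: str              # identifies the physical train set, not a timetable slot
    lat: float
    lon: float
    calling_points: list[str]  # driver-entered intended stops; may change
    observed_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

def trains_calling_at(feed: list[TrainPosition], station: str) -> list[TrainPosition]:
    """The stranded passenger's question: which trains, anywhere up the
    line, still intend to call at my station?"""
    return [t for t in feed if station in t.calling_points]

# Toy feed of two trains somewhere in south London (invented data).
feed = [
    TrainPosition("375-614", 51.46, -0.11, ["Herne Hill", "Victoria"]),
    TrainPosition("375-622", 51.42, -0.09, ["Brixton", "Victoria"]),
]
print([t.unit_id for t in trains_calling_at(feed, "Victoria")])
```

Note that nothing in the record promises *when* a train will arrive – deliberately so: the sketch captures only the “where”, which is the one thing still knowable when the timetable has been abandoned.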

It didn’t get very far. I talked through the principles with operational staff in two train operating companies. One blockage was that such extreme disruptions were so rare that the usual contingency-planning situation would arise – just not a common enough occurrence to warrant the development of a specific response. In addition, that certainty of where the train was going didn’t seem so certain after all: drivers would punch in their intended stops, but right up to and even beyond the point of departure this could change. Better information, intended to give comfort to the stranded, might instead deliver false hope and ultimately do more harm than good. And the nagging thought I’d had originally remained: how much use was it really to know there was a train four miles up the track that was going to your stop, if it was quite possible that the points between you and it had no chance of being unfrozen?

So, no more of that for the moment. There’ve been other developments. The live running information now seems to be much more accurate. The release of that and other information, such as timetables, is now a political football – should it remain a commercial asset of the train operators to be resold, and controlled, under licence, or are there greater benefits in releasing it to all who may make use of it?

I’m fairly sure that the data will be freely available eventually. There is some sterling work going on from innovators who are banging on the door of the information holders to make better use of it (all Malcolm’s recent posts make fascinating reading). To my mind there is a big difference between the commercial position of providing a physical service, and the commercialisation of the information that describes and records the performance of that service. But we won’t have reached the end of this story until that information is dependable. And at the moment, it’s not. You can still see differences between web information, feed information via an app, and trackside information. All were different on my line a day or so ago.

Yet the teasing paradox is that there is only one ‘truth’ at a particular point in time about a train’s running, even if it may then vary over time. And sometimes, as I experienced last Tuesday evening in South London, that truth is “we don’t know where this train is going”. In extreme circumstances, when lines are blocked and points jammed, I might have known where my train was (I was sitting in it) but I had as much idea as any of the crew (i.e. none) where it was going.

Distributing and presenting this information is far from being a trivial task. I don’t know the details of the architecture behind train information systems. But I can postulate that there are many different models by which highly volatile information, in thousands of different places, could be brought together, indexed, shared, distributed and so on. And they’re all pretty complicated. Before you blithely say “well, they just need to update a single database”, think what that might mean.
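To see why “just update a single database” is not trivial, consider even the toy problem of merging conflicting reports about one train from different sources. A last-write-wins rule – sketched below with made-up data – already forces awkward decisions: whose clock is authoritative, and what if the freshest report comes from the least reliable source?

```python
from datetime import datetime

# Three sources reporting the destination of the same train, each stamped
# with when the report was made (all data invented for illustration).
reports = {
    "website":  {"destination": "Victoria",    "seen": datetime(2010, 12, 1, 8, 2)},
    "app_feed": {"destination": "Blackfriars", "seen": datetime(2010, 12, 1, 8, 5)},
    "platform": {"destination": "Victoria",    "seen": datetime(2010, 12, 1, 7, 58)},
}

def last_write_wins(reports: dict) -> tuple[str, str]:
    """Trivial merge rule: believe whichever source reported most recently."""
    source, report = max(reports.items(), key=lambda kv: kv[1]["seen"])
    return source, report["destination"]

print(last_write_wins(reports))
```

Here the rule picks the app feed’s “Blackfriars” purely because it is newest – two out of three sources disagree with it. Multiply that by thousands of trains and sources with skewed clocks, and the shape of the aggregation problem starts to show.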

The track record (sorry) of mega-aggregation isn’t great. Without doubt it’s been attempted before in this area. Perhaps linked data mark-ups of distributed information sources hold the answer? I’d be interested on any thoughts on this.

But I’m clear about this:

the sorry position in late 2010 that it is still a matter of detective work and guesswork to find out where a large, highly-tracked piece of public-service equipment is, when it’s coming, and where it’s going, cannot be allowed to continue.

I’m fairly forgiving of physical failures in cases of extreme disruption – there is much real-world complexity that lies behind simple-sounding services. But on information failures? We can, and must, do better.

Central or decentral?

Yes, nice easy question. Should be a short post.

One of the debates that stuck in my mind at UK GovCamp 10 came from a session hosted by Alastair Smith, ostensibly about the ‘UK snow’* and what that had meant for the likes of local authorities in delivering services and information. At least that’s what I think it was about. One can never quite tell with unconferences.

The difficult issue of managing information in disrupted conditions. One of my favourite subjects, be it weather, strikes, train disruptions or pandemics.

“How to tell people about school closures” is an excellent example.

Why’s it so difficult? Here’s a little list:

It’s a highly localised decision. It’s taken by the headteacher of a school, often at short notice. What if they’re stuck in snow, or can’t communicate their decision to anyone? We’re talking about disruption here, remember?

It’s highly time critical: if the information is to be useful it has to be delivered in the very tight window between decision and parents’ departure for school (or rearrangement of childcare, or whatever) and almost by definition this will be outside normal working hours.

There are no obligations or penalties associated with how well it’s done. (There may be a motivating issue about OFSTED reporting of absence, but I consider that secondary to the actual information process, so am discounting it from this analysis.)

There is no consistent, expected place to find the information. In some areas schools brief local authorities, in others local authorities brief local radio; there are numerous instances of online information, but little in the way of a standardised approach.

Kids are involved. Kids who may just have a conflict of interest were there to be any opportunity to game the information. Just possibly.

A variety of tools are used to try and get the message out: from notifications that are actively sent to parents (by SMS, email or phone) – so-called information ‘push’; to information made available for consumption (by web, radio or pinned to the school gates) – the ‘pull’ side. Some parents and schools have developed cascade networks, formal or informal, to pass on the message. Others haven’t.
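The push/pull split above can be caricatured in a few lines. Everything here is hypothetical – real systems sit behind SMS gateways, council websites and radio briefings – but it shows the two halves of the problem: one decision, two very different delivery modes.

```python
def notify_closure(school: str, push_subscribers: list[str],
                   notice_board: dict[str, str]) -> list[str]:
    """Fan one closure decision out: publish it for 'pull' consumers
    (the web page or school-gate notice equivalent) and compose one
    'push' message per subscribed parent (the SMS/email equivalent)."""
    notice_board[school] = "CLOSED"  # pull: anyone who knows where to look
    return [f"{school} is closed today" for _ in push_subscribers]  # push

# One headteacher's decision, reaching both channels (invented names).
board: dict[str, str] = {}
messages = notify_closure("Hilltop Primary", ["parent-a", "parent-b"], board)
```

Even in this toy form the asymmetry is visible: the pull side fails silently if nobody checks the board, while the push side fails loudly if the subscriber list is stale – and keeping the two in step is exactly the hard part.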

Do we have any plus sides? Well, the only one of note is that snow closure is usually predicted, to a greater or lesser extent. Something I suspect fuels even more ire when information management fails. Surely, we cry, they must have known this might happen? Why weren’t they prepared?

Accustomed behaviours are highly personal. Parents have become used to a particular information channel, be it the radio or the web, and any changes to that will cause even more confusion, at least at first.

All complex stuff – did someone say that public service information management was easy?

But where the GovCamp discussion got most interesting was when we tackled the nub of the problem – the overarching philosophy of whether it was worth trying to centralise information at all in such circumstances. Even at the highest level, opinion is divided between attempting to centralise so that information can all be consumed in one place, and ensuring that it is maintained as locally as possible to guarantee its speed and accuracy.

For there are classic trade-offs in this decision. There is no unequivocal ‘right’ answer.

Get it to a central point of consumption (or data feed that can be consumed elsewhere) by whatever communications protocols and brute force pressures you can: advantage – easy to find; disadvantage – very difficult to make foolproof, prone to error.

Or keep it distributed, and make it easier for people to get closer to the source of the decision to get the most accurate picture: advantage – saves money, fast-when-it-works, accurate; disadvantage – hit-and-miss, accessibility, findability.

The list of challenges above should make it clear why this is far from the trivial information management problem that some might assume. One chap in the GovCamp session maintained that all it would take would be a firm hand of authority to be laid on headteachers to comply (“or else their school would be assumed to be open”). I fear that view represents a hopelessly outdated approach to getting things done that actually work.

I’ll come off the fence. I think the answer to a problem like this doesn’t lie in ever more sophisticated linking and aggregation. Building big central solutions, even with a grass-roots crowdsourcing component, probably isn’t going to work.

Instead, my experience and my gut are combining to suggest that local is the place for this information. Ubiquitously local – on school sites, via SMS, on the radio, via local authorities. Keeping them in step is the challenge: but a challenge that’s more worthy of effort than building elaborate information pipelines and monumental repositories.

*if you’re wondering why this phrasing is used, there’s some background here – which might also show why I’m so interested in it.