
John Rumm wrote:

Yup just like now, only without the safeguards of multiple incompatible
distributed and non connected databases that limit the scope of an
error, and provide alternative routes to perform sanity checks and
consistency checks on the data when something goes wrong.

So, to expand this out for the hard-of-reading.

Currently, it's impractical to conduct mass surveillance of 'ordinary'
people. People have different identifiers in different databases - your
loyalty card number in one, your NHS number in another, your employee-ID
in a third, and so on. That limits the amount of information which any
single bent insider can gain - i.e., someone whose job gives them
authorised access to the information on one database, and who is willing to
take a peek at someone's data for 50-100 quid. You'd have to really
*want* the information to slip tens of people that sort of money, sort
out the false hits between them, and so on. The State may have the
resources to get the necessary access in 'extreme' cases, and most of us
would want it to; even the State doesn't have the resources to do it
routinely.

Behind John's two words 'not connected' are two deep, and distinct,
concepts. Firstly, they're not connected at the 'operational' level:
that means someone whose job gives them access to one database - the car
registration database, say - doesn't have access to medical information.
Nor do the computer systems which query or update the database have
access to those other databases.

Secondly, they're not connected at the 'logical' level, precisely
because there isn't a reliable, common identifier for the 'same' thing
across them. Where that 'thing' is a person, they've got a different
'unique within this database' identifier, as mentioned above
(loyalty-card number, NHS number, employee number). Their 'human
readable' name will vary: it may be Elizabeth R Windsor in one, Liz
Windsor in another, Elisabeth Winsor in the third. For reliable
'linking', a common identifier - as the National ID Register intends to
introduce - is all you need: it's then irrelevant whether you have one
big database or lots of little ones, as the common identifier allows the
information in each one to be reliably associated with the same person.
And it allows whoever's paying the bent insider to be sure they're
getting details on the right subject - not just someone with a similar name.
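
To make the 'logical' connection concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the three toy databases, the record keys, the `national_id` field and the two helper functions are assumptions, not anything taken from a real system. It only aims to show that matching on the human-readable name either misses records or drags in a similarly named stranger, while matching on a shared identifier pulls the same person out of every database.

```python
# Hypothetical toy databases. Each uses its own 'unique within this
# database' key; the national_id field is an assumed common identifier.
loyalty_db = {
    "LC-1001": {"name": "Liz Windsor", "national_id": "NID-1"},
    "LC-1002": {"name": "Liz Windsor", "national_id": "NID-2"},  # a different person
}
nhs_db = {"NHS-77": {"name": "Elizabeth R Windsor", "national_id": "NID-1"}}
employee_db = {"E-5": {"name": "Elisabeth Winsor", "national_id": "NID-1"}}


def link_by_name(name, *databases):
    """Return the keys of records whose name is spelled exactly like `name`."""
    return [key for db in databases
            for key, rec in db.items() if rec["name"] == name]


def link_by_common_id(national_id, *databases):
    """Return the keys of records that carry the shared identifier."""
    return [key for db in databases
            for key, rec in db.items() if rec["national_id"] == national_id]


all_dbs = (loyalty_db, nhs_db, employee_db)

# Name matching misses the other spellings entirely...
print(link_by_name("Elizabeth R Windsor", *all_dbs))  # ['NHS-77']
# ...or returns a false hit on someone with the same name.
print(link_by_name("Liz Windsor", *all_dbs))          # ['LC-1001', 'LC-1002']
# A common identifier links exactly one person across all three.
print(link_by_common_id("NID-1", *all_dbs))           # ['LC-1001', 'NHS-77', 'E-5']
```

Note that the linking function doesn't care whether the records sit in one big database or three small ones: the shared identifier does all the work.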

John then points out that having these 'multiple, incompatible
distributed and non connected databases' acts to 'limit the scope of an
error'. This is what he means: if there's information which is wrong on
one or two of them, it only affects the uses to which those one or two
databases are put. So, if your hospital, gas supplier, and
corgi-appreciation-society all show your address as being 'The Castle,
Windsor', your post reaches you at that correct address from all those
organisations. If the Dunkirk Veterans' Association database misrecords
your address as 'Castle Drive, Staines', only that bit of post goes
missing, and when you notice you haven't had the newsletter and
invitation to the annual dinner-dance, you convince just that one
organisation to fix their records; sometimes, showing them a copy of
your gas bill and corgi-soc post, showing the right address, can help.

Once you've a single point of change, an error affects *all* of your
interactions with *all* of the many organisations who decided it would
be Efficient to use that single point as the Right And Proper way of
getting your address. On the plus side, this means you notice errors
quicker, and have more incentive to keep it up to date; on the minus
side, the effects of an error are greater, and it can be harder to get
the bureaucracy to fix them. Within one organisation, it's worth having
a 'single', authoritative point of change - you'd want, say, Amazon to
not have different databases for their shipping department, their
billing department, and their mailing-out-special-offers department
(note, though, that you *do* want their one database to allow you to
specify a different address for a particular delivery (a gift to a
friend) or for your bills (usually home, please, but to an employer's or
client's address for a particular purchase)). Across all of your
dealings as a citizen or resident of the UK, though, it's a lot less
clear that the advantages of a single point of change outweigh the
risks: and that's one of the pieces of analysis which simply hasn't been
published, whether or not it's been done.
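
The difference in the 'scope of an error' can be sketched in a few lines of Python. This is a toy model under stated assumptions: the organisation names and addresses come from the example above, but the dictionaries, the `register` and the `misdirected` helper are hypothetical and stand in for whatever real systems would be involved.

```python
CORRECT = "The Castle, Windsor"
WRONG = "Castle Drive, Staines"


def misdirected(addresses):
    """Return the organisations whose post would go to the wrong address."""
    return [org for org, addr in addresses.items() if addr != CORRECT]


# Separate databases: each organisation keeps its own copy of the address,
# and only one of them has got it wrong.
separate = {
    "hospital": CORRECT,
    "gas supplier": CORRECT,
    "corgi society": CORRECT,
    "veterans' association": WRONG,
}
print(misdirected(separate))  # ["veterans' association"]

# Single point of change (assumed model): every organisation looks the
# address up in one shared register, so one bad entry hits all of them.
register = {"address": WRONG}
shared = {org: register["address"] for org in separate}
print(len(misdirected(shared)))  # 4

# The flip side: one correction to the register fixes everybody at once.
register["address"] = CORRECT
shared = {org: register["address"] for org in separate}
print(misdirected(shared))  # []
```

The sketch shows both halves of the trade-off: the shared register makes the error four times as wide, and the fix four times as effective, assuming you can get the register corrected at all.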

Once the databases are 'connected' - whether 'operationally' (the
computers that run them actively swapping information) or 'logically'
(one shared personal identifier across lots-n-lots-n-lots of databases),
the kind of 'mass surveillance' which is currently impractical becomes
practical. It becomes practical for the merely nosey, busybody,
vigilante, weirdo-stalker types, who can now feasibly (pay somebody to)
look up the details on the now-linked databases. And it becomes
practical for government departments to design ever more 'joined-up'
systems, which more and more tightly restrict what it is to be 'normal'.
The richer you are, the less this matters - you can opt out of many Govt
services, you can indulge your little privacy foibles; the more you're
an 'ordinary hardworking family', the more it's in your economic and
convenience-of-living interests to simply conform.

Moving by unexamined apathy into that sort of society upsets me: it
seems to me that you should (a) establish a strong, genuinely-informed
consensus that 'most of us' really do want to live that way; and (b)
make some genuine provision for the 'rest of them', who
don't. The tolerance for eccentricity, self-determination, and each
citizen having their own weird ways of *not* conforming - whether it's
Morris-dancing, thinking that what Chris de Burgh produces is music,
urban chicken-keeping, or building barbeques out of emptied propane
cylinders - is the single most attractive aspect of living in the UK.
(Note the crafty link to both uk and d-i-y there ;-)

Also don't forget the new scope for data mining exercises correlating
your innocent behaviour to that of known problem groups.

John's already explained what 'data mining' means - it's looking for
patterns in the data that's held about a Thing (a person, say, or a car)
to find Interesting New Patterns from which Interesting Conclusions can
be drawn. The uses of this technique are legion. For example, a
supermarket might find that people who often buy nice, hand-made pasta
also buy fancy olive oil (unsurprising) and travel magazines (less
obvious), and decide to put together some Targeted Promotion. Or your
credit card company finds that a long period of disuse followed by
repeated mid-value purchases indicates fraud - great if your card's
nicked and the unauthorised spending's brought to your attention early,
not so great if you've been holding off spending until having all the
grandchildren over for your 75th birthday for which you're buying them
each a pressie.

Because data mining produces only 'correlations', its 'predictions'
aren't 'certain'. This doesn't matter much if it makes your marketing
just 'rather' better, so 'only' 88% of your mailshots are ignored,
instead of 94%. It matters a bit more if your spending/activity patterns
match those of some rightly-suspect group (e.g., you're a foreign-named
keen d-i-y'er and planespotter who spends lots of time buying military
surplus gear and travelling to airports), and the resulting Enquiries
turn neighbours and colleagues against you...
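
A bit of arithmetic shows why 'not certain' matters so much when the group being looked for is small. Every number in this Python sketch is invented purely for illustration (the population size, the size of the suspect group, and both percentages); none of them comes from any real system or study.

```python
# All figures below are assumptions, chosen only to make the sums easy.
population = 1_000_000   # people whose records are mined
genuine = 100            # people who really belong to the suspect group
hit_rate_pct = 99        # % of genuine cases the pattern flags
false_alarm_pct = 1      # % of innocent people the pattern also flags

innocent = population - genuine
true_hits = genuine * hit_rate_pct // 100
false_hits = innocent * false_alarm_pct // 100
flagged = true_hits + false_hits

print(f"flagged: {flagged}, of whom genuinely suspect: {true_hits}")
print(f"share of flagged people who are innocent: {false_hits / flagged:.1%}")
```

With these made-up figures, a pattern that sounds '99% accurate' flags 10,098 people, of whom 99 are genuine: roughly 99 in every 100 people receiving the Enquiries have done nothing wrong.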

Hope that helps round out John's pithy comments... Stefek