Wolf!
I sporadically hear those who create software expressing exasperation at the
"stupidity" of users – and am generally inclined to stand up for the user.
The root cause of that inclination is that (having worked in the software
industry for a quarter century) I'm acutely aware that software (aside from
generally being much harder to use than its authors realize) routinely
expects users to think like programmers – whose mindset is, frankly, rare.
However, one particular kind of alleged "stupidity" brings forth another
reason for defending the much-maligned user: the software professional
wonders how anyone could be so stupid as to go ahead with some action despite
having been warned about the possibility of the horrible things that then
ensued. The cause of this is usually that software is like the little boy in
the fable, who habitually cried "Wolf!" for a lark and thus was ignored when
he came to report an actual wolf – software all too often pesters the user
with confirmation dialogs warning of dire consequences, which users get used
to dismissing without thought.
Software professionals (whether programmers or support staff) marvel at the
"stupidity" of dismissing dire warnings but fail to appreciate that nearly
every time the user dismisses such a warning, no dire consequences follow.
That is partly because modern purveyors of malware have learned to be less
obtrusive than the attention-seeking script kiddies of the 90s, so that bad
things happen without the user noticing (at least at the time); but it's also
because programmers commonly make a crucial mistake in deciding when to
hassle the user with a warning.
To take an example, many warnings relate to security issues; the innocent
user browsing the internet can all too easily be tricked into doing unwise
things, so the considerate programmer – on discovering a class of unwise
thing the user might be tricked into doing – endeavours to devise some way
for the program to test whether what the user is doing might fall into that
class of folly. If the test usually spots the instances when the user is
making a mistake and seldom hassles the user otherwise, the programmer uses
the test to decide when to warn the user. This is indeed the right thing to
do – provided that "seldom" is actually rare enough.
The problem is assessing how seldom is rare enough. If nine out of ten foolish acts trigger the warning and only one in ten non-foolish acts does so, all too often the programmer thinks that's rare enough – and this is how we have trained users to dismiss warnings without reading them, let alone thinking about them. It's not good enough for warnings to be rarer among the occasions that don't need them than among the occasions that do; unwarranted warnings must be rare among warnings – and the situations which don't warrant a warning are much more common than the ones that do.
Let's go back to our security warning example. There's some class of context in which the test is applied; every time the program hits such a context, it uses the test to decide whether to warn the user. Now suppose that one time in a thousand that such a context arises, the potential threat is really there; let's be optimistic and suppose that, whenever the threat is there, the test always causes a warning; but that, one time in twenty when there's no real problem, the test also causes a warning. So let's consider a thousand instances of the context in which the program runs the test: one of those really did need a warning, but (roughly) fifty of the others get warnings too. Now look at this from the user's point of view: it's almost certain that the first few times the user meets this warning it's misguided, so the user learns to ignore it. One time in fifty that's a mistake; but 49 times out of 50, ignoring it was the right thing to do.
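To make the arithmetic concrete, here's a small Python sketch of that tally; the rates are the ones assumed above for illustration, not measurements of any real system:

    # Illustrative rates from the example above (assumptions, not measurements).
    base_rate = 1 / 1000        # contexts in which the threat is really present
    sensitivity = 1.0           # the test always fires when the threat is real
    false_alarm_rate = 1 / 20   # the test also fires this often when it isn't

    contexts = 1000
    warranted = contexts * base_rate * sensitivity                # = 1
    unwarranted = contexts * (1 - base_rate) * false_alarm_rate   # roughly 50

    total = warranted + unwarranted
    print(f"warnings per {contexts} contexts: about {total:.0f}")
    print(f"fraction of warnings that were warranted: {warranted / total:.1%}")
    # About 51 warnings, of which roughly 2% were warranted: the user who
    # learns to ignore this warning is right about 49 times in 50.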
What this means is that, to assess whether the test is good enough to be used to trigger a warning, you need to assess in what proportion of the occasions on which the test will be invoked there really is a problem. When that proportion is small, the test's false warnings need to be rarer than its true warnings by a factor that's small even compared to that proportion. In the example above, if the truly scary situations are one in a thousand and the test always detects them, then it can't afford to be wrong about the safe (enough) situations even as often as one time in a thousand – that's the point at which, half the time the user sees the warning, it's irrelevant.
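Spelled out as a formula (the symbols here are mine, restating the argument above): if a fraction p of the contexts really are threats, the test catches a fraction s of those, and it falsely fires in a fraction f of the safe contexts, then warranted warnings turn up at rate p*s and unwarranted ones at rate (1 - p)*f; the break-even point, at which half the warnings the user sees are irrelevant, is f = p*s/(1 - p), which for small p is only a shade more than p*s. A quick check with the numbers above:

    # Largest false-alarm rate at which at least half the warnings the user
    # sees are still warranted, given base rate p and sensitivity s (a sketch).
    def break_even_false_alarm_rate(p, s=1.0):
        # at break-even, the warranted rate p*s equals the unwarranted rate (1 - p)*f
        return p * s / (1 - p)

    print(break_even_false_alarm_rate(1 / 1000))  # ~0.001, about one in a thousand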
Written by Eddy.