Monday, May 15, 2006

Spying is Hard Work, Especially When You Failed Math.

"We're not trading privacy for security; we're giving up privacy and getting no security in return." –Bruce Schneier

In March, Schneier provided an excellent overview of the major (implementation) problem with searching for a terrorist needle in a civilian haystack, as the government is apparently doing. The argument is especially pertinent now that we know how broad the scope of the data collection really is. I think it's worthwhile to restate his thesis using numbers more reflective of the situation as we currently understand it.

The UK apparently has about 700 members of Al Qaeda within its borders, out of a population of 61 million. Applying that rate (0.001148%) to the United States population of 300 million projects roughly 3,444 members of Al Qaeda in the U.S.

(But wait! According to the Terrorist Screening Center, there are 200,000 terrorists. All right. We'll use that number. Depressingly enough, it makes little difference in the end how many terrorists there actually are.)

Now, according to the Consumer Electronics Association, roughly 1 billion phone calls are placed each day in the United States, on 270 million landline and 200 million cellular telephones, for a total of 470 million phones. Since the NSA is monitoring not the content of the calls but just which telephone called which other telephone, a simple division tells us that roughly two calls, on average, are placed per day from each domestic phone.

Whew. So, the NSA intends to identify each of the roughly two daily calls placed by each of the 200,000 terrorists in the United States. Call it 400,000 calls, then, out of the one billion made daily. And this is where the base rate fallacy enters the picture.
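
(If you want to sanity-check those back-of-the-envelope figures, here is a quick sketch in Python. The inputs are just the estimates quoted above; nothing here is more authoritative than that.)

    # Back-of-the-envelope inputs, taken straight from the estimates above.
    uk_al_qaeda   = 700                        # reported Al Qaeda members in the UK
    uk_population = 61_000_000
    us_population = 300_000_000

    daily_calls = 1_000_000_000                # calls placed per day in the U.S. (CEA figure)
    phones      = 270_000_000 + 200_000_000    # landline plus cellular

    watch_list = 200_000                       # Terrorist Screening Center figure

    projected_terrorists = us_population * uk_al_qaeda / uk_population
    calls_per_phone      = daily_calls / phones
    terrorist_calls      = watch_list * 2      # rounding to two calls per phone per day

    print(round(projected_terrorists))   # 3443 (the 3,444 above comes from rounding the rate first)
    print(round(calls_per_phone, 2))     # 2.13, call it two
    print(terrorist_calls)               # 400000 calls per day to find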

The base rate fallacy is a great, official-sounding name for our tendency to ignore how rare an event actually is (its base rate) when judging how much a positive test result means. Put more simply, statistics is not intuitive, and people are bad at it. This matters because it tells us that any large-scale surveillance system is destined to fail when the thing it's trying to pick out is a very rare event, like a terrorist calling another one, among all of the phone calls placed in the United States.

The base rate of incidence in this situation is 400,000 calls out of 1 billion total calls, or 0.0004 (0.04%). That 0.04% of calls is what the NSA wants to catch (and it doesn't want to miss any – no false negatives), and presumably it wants to catch no others (no false positives). We'll now re-use Schneier's "optimistic" stats: the system is 99.9% accurate against false negatives (it misses only 0.1% of terrorist calls) and 99% accurate against false positives (it wrongly flags only 1% of innocent calls), even though systems of this sort tend to generate "oodles" of false positives, according to an expert social network analyst.

Working the numbers:

  1. 0.0004 * 0.999 = 0.0003996, or 399,600 terrorist calls properly classified. (success)
  2. 0.0004 * 0.001 = 0.0000004, or 400 terrorist calls improperly classified. (failure: false negative)
  3. 0.9996 * 0.99 = 0.989604, or 989,604,000 innocent calls properly classified. (success)
  4. 0.9996 * 0.01 = 0.009996, or 9,996,000 innocent calls improperly classified. (failure: false positive)
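
(If you'd like to check that arithmetic yourself, here is a minimal sketch in Python. The detection and false-alarm rates are the same assumed figures as above, not anything the NSA has published.)

    # Confusion-matrix arithmetic for the classification above.
    daily_calls     = 1_000_000_000
    terrorist_calls = 400_000                    # base rate: 0.04% of all calls
    innocent_calls  = daily_calls - terrorist_calls

    detection_rate   = 0.999   # assumed: misses only 0.1% of terrorist calls
    false_alarm_rate = 0.01    # assumed: wrongly flags 1% of innocent calls

    true_positives  = terrorist_calls * detection_rate         # 399,600
    false_negatives = terrorist_calls * (1 - detection_rate)   # 400
    false_positives = innocent_calls * false_alarm_rate        # 9,996,000
    true_negatives  = innocent_calls * (1 - false_alarm_rate)  # 989,604,000

    flagged   = true_positives + false_positives   # roughly 10.4 million calls flagged per day
    precision = true_positives / flagged           # roughly 0.038, i.e. about 4%

    print(f"calls flagged per day: {flagged:,.0f}")
    print(f"share of flagged calls that involve a terrorist: {precision:.1%}")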

First of all, this system flags nearly 10.4 million calls each day as involving a terrorist, but of these, only about 4% actually do. This is counterintuitive, but statistically true. Does this sound like a worthwhile surveillance system? How can this volume of leads possibly be handled on a daily basis?

Even worse, though, this means that nearly 10 million innocent calls are misidentified every single day. How are we supposed to accept those odds, especially when the system still lets 400 calls it should have flagged fall through the cracks each day? Can we really even consider trading privacy for security when the security is demonstrably not there? The math doesn't work: this system cannot succeed.

Thursday, May 11, 2006

We All Work With Computers.

Apple's Final Cut Pro interview with Walter Murch inadvertently offers a fascinating example of how switching to new, electronic tools can alter the creative process in subtle but harmful ways. According to Murch, sound and film editor extraordinaire, editing film through purely digital means has a particularly interesting effect: it alters his workflow subconsciously.

“How much detail I see around the eyes of the characters subconsciously determines my [shot] choices,” he says. “The lower the resolution, the more I tend to use close-ups. With higher resolution, I feel confident using a wider shot, or a longer shot, because you can clearly see what a character’s eyes are doing, which is to say what the character is thinking.”

Setting aside the remarkable astuteness of this observation, it is a great example of how digitization can change users' behavior in non-obvious ways. Even if the editor's analog workflow were painstakingly deconstructed, analyzed, and replicated digitally, I would argue that a competent analysis of the shot-selection process could still fail to bring this criterion to light. Things like this make me nervous, because common interaction-analysis methods tend to gloss over the professional's actual thought processes in favor of higher-level goal-based models and artifact-based assumptions. We already know that digitizing traditional workflows can be harmful; what we unfortunately don't yet know is how to quantify these sorts of subtle, harmful effects.

Sunday, May 07, 2006

Plus plus.

Maintenance release Vocabulicious 1.04 is finally out, and (among other things) the timer weirdness is fixed. Have at it.

(Does anyone who reads this play it?)