Monday, May 15, 2006

Spying is Hard Work, Especially When You Failed Math.

"We're not trading privacy for security; we're giving up privacy and getting no security in return." –Bruce Schneier

In March, Schneier gave an excellent overview of the major (implementation) problem with searching for a terrorist needle in a civilian haystack, as the government apparently is doing. His argument is all the more pertinent now that we know how broad the data collection is. It's worth restating his thesis with numbers that better reflect the situation as we currently understand it.

The UK apparently has about 700 members of Al Qaeda within its borders, out of a population of 61 million. Applying that rate (0.001148%) to the United States population of 300 million projects 3,444 members of Al Qaeda in the U.S.
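The projection is plain per-capita scaling. Here is a minimal Python sketch of the same arithmetic, assuming only the population figures quoted above (the variable names are illustrative):

```python
# Scale the UK's estimated Al Qaeda membership rate to the U.S. population.
UK_MEMBERS = 700
UK_POPULATION = 61_000_000
US_POPULATION = 300_000_000

rate = UK_MEMBERS / UK_POPULATION                    # about 0.001148%
projected_exact = rate * US_POPULATION               # using the unrounded rate
projected_rounded = 0.001148 / 100 * US_POPULATION   # using the rate rounded as in the text

print(f"rate: {rate:.6%}")
print(f"projected (unrounded rate): {projected_exact:,.0f}")
print(f"projected (rounded rate):   {projected_rounded:,.0f}")
```

The unrounded rate gives about 3,443. The 3,444 figure comes from rounding the rate to 0.001148% first.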

(But wait! According to the Terrorist Screening Center, there are 200,000 terrorists. All right. We'll use that number. Depressingly enough, it makes little difference in the end how many terrorists there actually are.)

Now, according to the Consumer Electronics Association, roughly 1 billion phone calls are placed each day in the United States, on 270 million landline and 200 million cellular telephones, for a total of 470 million phones. Because the NSA is monitoring not the content of the calls but just which telephone called which other telephone, the individual call is what matters, and we can simply divide to find that about 2 calls, on average, are placed per day from each domestic phone.
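The division is simple, but spelled out in Python (a sketch using only the CEA figures quoted above) it shows the "2 calls" is a rounded 2.13:

```python
# Average calls placed per domestic phone per day, from the CEA figures above.
CALLS_PER_DAY = 1_000_000_000
LANDLINES = 270_000_000
CELL_PHONES = 200_000_000

phones = LANDLINES + CELL_PHONES           # 470 million phones in total
calls_per_phone = CALLS_PER_DAY / phones   # roughly 2.1, rounded to 2 in the text

print(f"{phones:,} phones, {calls_per_phone:.2f} calls per phone per day")
```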

Whew. So the NSA intends to identify the roughly 2 daily calls placed by each of the 200,000 terrorists in the United States: call it 400,000 calls, out of the one billion made daily. And this is where the base rate fallacy enters the picture.

The base rate fallacy is a great, official-sounding name for our tendency to ignore how rare something is (its base rate) when judging what a positive result really tells us. Put more simply, statistics is not intuitive, and people are bad at it. This is important because it tells us that any large-scale surveillance system is destined to fail when what it's trying to pick out is a very rare event, like a terrorist calling another one, from among all of the phone calls that take place in the United States.
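In symbols, this is Bayes' theorem. Writing T for "the call involves a terrorist" and F for "the system flags the call" (labels chosen here just for illustration), the probability that a flagged call is genuine is:

```latex
P(T \mid F) = \frac{P(F \mid T)\,P(T)}{P(F \mid T)\,P(T) + P(F \mid \neg T)\,P(\neg T)}
```

When the base rate P(T) is tiny, P(¬T) is close to 1, so even a small false-positive rate P(F | ¬T) lets the second term in the denominator swamp the first, and most flagged calls turn out to be innocent.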

The base rate of incidence in this situation is 400,000 calls out of 1 billion total calls, or 0.0004 (0.04%). That 0.04% of calls is what the NSA wants to catch (and it doesn't want to miss any: no false negatives), and presumably it wants to catch no others (no false positives). We'll now reuse Schneier's "optimistic" figures: the system correctly flags 99.9% of terrorist calls (a 0.1% false-negative rate) and correctly clears 99% of innocent calls (a 1% false-positive rate), even though systems of this sort tend to generate "oodles" of false positives, according to an expert social network analyst.

Working the numbers:

  1. 0.0004 * 0.999 = 0.0003996 = 399,600 terrorist calls are properly classified. (success)
  2. 0.0004 * 0.001 = 0.0000004 = 400 terrorist calls are improperly classified. (failure: false negative)
  3. 0.9996 * 0.99 = 0.989604 = 989,604,000 innocent calls are properly classified. (success)
  4. 0.9996 * 0.01 = 0.009996 = 9,996,000 innocent calls are improperly classified. (failure: false positive)

First of all, this system concludes that almost 10.4 million calls each day involve a terrorist (399,600 + 9,996,000), but of these, fewer than 4% actually do. This is counterintuitive, but statistically true. Does this sound like a worthwhile surveillance system? How can this volume of leads possibly be handled on a daily basis?

Even worse, this means that almost 10 million innocent calls are misidentified every day. How are we supposed to accept these odds, especially when the system still lets 400 calls it should have flagged fall through the cracks each day? Can we really even consider trading privacy for security when the security is demonstrably not there? The math doesn't work: this system cannot succeed.
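The four outcomes worked out above form a standard confusion matrix, and the share of flagged calls that are genuine is its precision (also called positive predictive value). Below is a minimal Python sketch, assuming the post's figures (1 billion calls a day, 2 calls per terrorist, 99.9% detection, 1% false positives); the function name `daily_outcomes` is just an illustrative choice. Running it for both the 200,000 figure and the 3,444 projection also checks the earlier claim that the number of terrorists makes little difference:

```python
def daily_outcomes(terrorists, calls_each=2, total_calls=1_000_000_000,
                   detection_rate=0.999, false_positive_rate=0.01):
    """Expected daily confusion-matrix counts for the call-screening example."""
    terrorist_calls = terrorists * calls_each
    innocent_calls = total_calls - terrorist_calls
    true_pos = round(terrorist_calls * detection_rate)       # terrorist calls correctly flagged
    false_neg = terrorist_calls - true_pos                   # terrorist calls missed
    false_pos = round(innocent_calls * false_positive_rate)  # innocent calls wrongly flagged
    true_neg = innocent_calls - false_pos                    # innocent calls correctly cleared
    return true_pos, false_neg, false_pos, true_neg


for terrorists in (200_000, 3_444):
    tp, fn, fp, tn = daily_outcomes(terrorists)
    flagged = tp + fp
    precision = tp / flagged  # share of flagged calls that really involve a terrorist
    print(f"{terrorists:,} terrorists: {flagged:,} flagged, "
          f"{fp:,} innocent, {fn:,} missed, precision {precision:.2%}")
```

With 200,000 terrorists about 3.84% of flagged calls are genuine; with 3,444 it drops to roughly 0.07%. Either way the false positives sit near 10 million a day, because they are driven almost entirely by the volume of innocent calls.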

4 Comments:

At 7:44 AM, Blogger Mitch Krpata said...

Can't remember who said this, but the way I heard this program described was "Trying to find a needle in a haystack by making the haystack bigger."

 
At 7:18 PM, Anonymous said...

Interesting, I had never thought to work out the math. Very silly.

 
At 4:28 PM, Anonymous said...

... poast?

 
At 5:13 PM, Blogger avixe said...

Quiet, you.

 
