Were the 58-61,000 Internet Targets Part of NSA’s 73,000 Targets?
As I noted, Google, Yahoo, and Microsoft all released transparency reports today.
During the second half of 2012, Microsoft had FISA requests affecting 16,000 – 16,999 accounts, and Google had 12,000 – 12,999. We don’t have Yahoo’s numbers for that period, but for the following six-month period they had requests affecting 30,000 – 30,999 accounts; given that numbers for the other two providers dropped during this six-month period, it’s likely Yahoo’s did too, so the 30,000 is conservative for the earlier period. So the range for the big 3 email providers in that period is likely around 58,000 – 60,997. [Update: Adding Facebook would bring it to 62,000 – 64,996. h/t CNet]
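The combined range is just the sum of the low ends and the sum of the high ends. Here is a minimal sketch of that arithmetic in Python, using only the figures quoted above; the dictionary and function names are made up for illustration, and Yahoo's figure is the stand-in from the following six-month period.

```python
# Ranges of accounts affected by FISA requests, as quoted in the post.
# Yahoo's range comes from the following six-month period (a stand-in).
reported_ranges = {
    "Microsoft": (16_000, 16_999),
    "Google": (12_000, 12_999),
    "Yahoo": (30_000, 30_999),
}


def combined_range(ranges):
    """Add up the low ends and the high ends of each provider's range."""
    low = sum(lo for lo, _ in ranges.values())
    high = sum(hi for _, hi in ranges.values())
    return low, high


low, high = combined_range(reported_ranges)
print(f"{low:,} - {high:,}")  # 58,000 - 60,997
```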
I’d like to compare what they report with what this report on FISA Amendments Act compliance shows. I think pages 23 through 26 of the report show that NSA had an average of 73,103 selectors selected via NSA targeting on any given day during the period from June 1, 2012 to November 30, 2012. That’s because the notification delays from the period (212 — see page 26) should be .29% of the average daily selectors (see amount on 23 less amount without the notification delays on page 34).
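The 73,103 figure appears to come from dividing the count of notification delays by the share of average daily selectors they represent. A rough sketch of that calculation, taking both inputs from the paragraph above as given (the variable and function names are hypothetical, and the .29% is presumably rounded in the report):

```python
# Inputs as described in the post; neither is independently verified here.
notification_delays = 212  # notification delays during the reporting period
delay_share = 0.0029       # .29% of average daily selectors


def implied_daily_selectors(delays, share):
    """If `delays` is `share` of the daily average, the average is delays / share."""
    return delays / share


estimate = implied_daily_selectors(notification_delays, delay_share)
print(round(estimate))  # 73103
```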
But remember: these are not the same measurement. The government report number is based on average daily selectors, so it reflects the total of selectors tasked on any given day. The providers’ numbers, by contrast, are (I think) the total number of customer selectors affected across the entire 6-month period, and those selectors almost certainly weren’t all tasked for the entire 6 months (though some surely were).
There’s one possible (gigantic) flaw in this logic. The discussion of the FBI targeting is largely redacted in the government memo. And there have been hints, pretty significant ones, that the FBI takes the lead with the PRISM providers. If so, these numbers are totally unrelated.
Also remember, there are at least two other kinds of 702 targeting: the upstream collection that makes up about 9% of the volume of 702 collection, and phone collection, which is going up again.
This would sure be a lot easier if the government actually backed its claims to transparency.
Here’s Facebook’s release of its latest data about National Security Requests:
http://newsroom.fb.com/Content/Detail.aspx?ReleaseID=797
And here’s LinkedIn’s release of its latest data about National Security Requests:
http://help.linkedin.com/app/answers/detail/a_id/41878/h/c
@Snoopdido: Also note Facebook’s Global Government Requests Report for the 6 month period ending June 30 2013 at https://www.facebook.com/about/government_requests
In particular, note the numbers for the United States during that period:
Total requests: 11,000-12,000
User/Accounts requested: 20,000-21,000
2008 called. It wants its Hope and Change Optimism back.
Maybe I’ve become confused about the definition of the term “selector” as it’s been applied here.
I had been under the impression that a “selector” was a term — which could be a name, user ID, IP address, or just an abstract concept expressed in a string — that the NSA’s or other Eyes’ traffic filtering and database query systems swept for in all intercepted or collected data.
So if my name were a “selector”, I thought that meant that the systems would flag traffic from my Gmail address if the traffic were available in plaintext. But I assumed it also meant that queries would find traffic to/from AdamColligan.net, unencrypted instant messages and SMS messages where my English friends make fun of my silly accent, and e-mail content collected under NSLs or other FISA processes where anyone was mentioning me, regardless of whether my account was involved.
I thought that the 4- and 5-figure account data scales that the tech companies are now releasing was a separate concept: the number of accounts touched by orders that the companies actively responded to. I didn’t think that a selector automatically represented a FISA demand to a tech company for account data or metadata (especially since many selectors aren’t even people’s names, and some selectors that are names will be represented by multiple accounts).
Am I behind the curve on this?
@Adam Colligan:
This may help clarify some of what’s going on. This is how it explains “selectors”: “name, email address, telephone number, IP address, keywords, and even language or type of Internet browser.” But that link is about PRISM/XKeyScore, so I’m not sure how exactly that relates to s. 215 stuff.
@Anonsters: I think ‘selectors’ is one of those weasel words that has been used to confuse.
I cannot immediately identify where I saw it but I recollect seeing a claim by one of the NSA-types (Alexander, I think) that they have only ever searched on 300 ‘selectors’, or maybe 300 in a particular year. When I saw this claim I thought to myself, he is identifying categories of data, not specific values. ‘Colligan’ is a value within the ‘surname’ selector for example. So, what they were claiming is they only have 300 categories of information they use to select targets.
So, by simply selecting everyone with ‘surname’ = ‘colligan’ they are creating a single selection but the resulting set of values that is returned could be everyone with the same last name as yours.
That would explain how they can claim to only have 300 ‘selectors’ but, as we can see by the numbers above, are then returning result sets containing tens of thousands of ‘targets’.
When you think about the categories of information they may use to profile an individual I suspect it would be pretty easy to come up with 300 different features including phone#, email, surname, etc.
I also suspect that they can use partial matching in their selectors. So, by asking for ‘surname’ like ‘%olli%’ they can retrieve not only people with ‘Colligan’ as a surname but also ‘Holliday’ as both contain ‘olli’. As well as this partial match they also probably use phonetic matching so ‘surname’ soundslike ‘colligan’ might return ‘calligen’, ‘coligen’, etc.
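To make the idea concrete, here is a small Python sketch of both kinds of matching. It is purely illustrative: the table, the sample surnames, and the function names are invented, SQLite's `LIKE` stands in for whatever query language might be involved, and classic Soundex is swapped in as one well-known phonetic scheme. Nothing here describes how any real system works.

```python
import sqlite3

# Hypothetical sample data; none of this reflects a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (surname TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?)",
    [("Colligan",), ("Holliday",), ("Bean",), ("Calligen",)],
)


def surnames_like(pattern):
    """Partial match: '%' matches any run of characters on either side."""
    rows = conn.execute(
        "SELECT surname FROM people WHERE surname LIKE ? ORDER BY surname",
        (pattern,),
    )
    return [row[0] for row in rows]


def soundex(name):
    """Classic Soundex code: first letter plus three digits."""
    codes = {}
    groups = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
              ("L", "4"), ("MN", "5"), ("R", "6"))
    for letters, digit in groups:
        for ch in letters:
            codes[ch] = digit
    name = "".join(ch for ch in name.upper() if ch.isalpha())
    if not name:
        return ""
    result = name[0]
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        if ch not in "HW":  # H and W do not separate repeated codes
            prev = digit
    return (result + "000")[:4]


print(surnames_like("%olli%"))  # ['Colligan', 'Holliday']
print(soundex("Colligan"), soundex("Calligen"), soundex("Coligen"))  # C425 C425 C425
```

One pattern or one phonetic code is a single query, but the result set can hold many distinct people, which is the point being made above.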
In short, if their use of ‘selectors’ is a weasel word as I suspect, they are talking about categories of information rather than actual values. And by using any combination of the 300 categories of information they have used to profile everyone they can get vast result sets returned, maybe millions of targets.
If that is not the case, I’d be interested to know how ‘only 300 selectors’ now appears to have blown up into 70,000 targets.
Ah, found it. Alexander: “In 2012, less than 300 selectors were approved for reasonable, articulable suspicion with that database”
See here: http://icontherecord.tumblr.com/post/57804700833/general-keith-alexander-director-national and search for “300”.
@Greg Bean (@GregLBean):
I think this is more or less in line with my previous understanding (though I appreciate the texture of “selector” possibly being a scale leap from “term” or “selector term” or “selector string”).
And so what I’m curious about is how selector numbers can be matched up with numbers of accounts touched by requests that tech companies can see, which is what this post does. The possibilities to me seem to be:
1. It is a coincidence that the number of selector terms and the number of accounts touched at major tech firms is roughly similar, and this is just throwing people off.
2. Most selectors are in fact people’s names, most people have an account at a major tech firm, and the NSA has a habit of converting each name selector into a request set to email/social network providers. So the two numbers do in fact represent matching, just indirect matching.
3. The term “selector” here is being used in a new way, and represents a different set of terms, from where we have seen it before in PRISM/XKEYSCORE and cable-tapping operations. There is one set of selectors for that traffic and that is used to query databases of what is collected from providers after it is collected. But there is a wholly different population of “selectors” which are just people’s names or email account names, and this population is defined by the fact that they are the set of names/accounts that are provided to tech firms in the form of requests for either metadata or content.