NSA Should Have Addressed Its Upstream Problem in 2013

I Con the Record has released a slew of documents pertaining to last year’s problem with upstream searches, including the opinion ultimately approving new certifications. I’m doing a working thread and suspect I will have concerns about FISC oversight that I haven’t had on past such reviews.

But for now, I’m aghast at this paragraph and accompanying footnote, describing how NSA’s office of compliance and IG were trying to get a grasp on the problems.

In anticipation of the January 31 deadline, the government updated the Court on these querying issues in the January 3, 2017 Notice. That Notice indicated that the IG’s follow-on study (covering the first quarter of 2016) was still ongoing. A separate OCO review, limited in many of the same ways as the IG studies, and covering the periods of April through December 2015 and April through July of 2016, found that some [redacted] improper queries were conducted by [redacted] analysts during those periods.21 The January 3, 2017 Notice stated that “human error was the primary factor” in these incidents, but also suggested that system design issues contributed. For example, some systems that are used to query multiple datasets simultaneously required analysts to “opt-out” of querying Section 702 upstream Internet data rather than requiring an affirmative “opt-in,” which, in the Court’s view, would have been more conducive to compliance. See January 3, 2017 Notice at 5-6. It also appeared that NSA had not yet fully assessed the scope of the problem: the IG and OCO reviews “did not include systems through which queries are conducted of upstream data but that do not interface with NSA’s query audit system.” Id. at 3 n.6. Although NSD and ODNI undertook to work with NSA to identify other tools and systems in which NSA analysts were able to query upstream data, id., and the government proposed training and technical measures, it was clear to the Court that the issue was not yet fully scoped out.

21 NSA further reported that OCO reviewed queries involving a number of identifiers for known U.S. persons who were not targets under Sections 704 or 705(b) of the Act, and which were associated with “certain terrorism-related events that had occurred in the United States.” January 3, 2017 Notice at 6. NSA OCO found [redacted] such queries, [redacted] of which improperly ran against Section 702 upstream Internet data. [redacted] of the improper queries were run in a system called [redacted] which NSA analysts use to of a current or prospective target of NSA collection, including under Section 702. Id. at 6-7. [my emphasis]

This passage seems to reveal several things: that NSA was querying upstream content before identifying whether something could be used as a target (which I suspect means it involved a triage process). It reveals that not all queries are being audited!!!!

And it also reveals that one reason NSA analysts were collecting upstream data is that, over three years after DOJ and ODNI had figured out analysts were breaking the rules because they forgot to exclude upstream from their search, they were still doing so. Overseers noted this back in 2013!

NSA [redacted] incidents of non-compliance with this subsection of its minimization procedures, many of which involved analysts inadvertently searching upstream collection. For example, [redacted], the NSA analyst conducted approved querying with United States persons identifiers ([long redaction]), but inadvertently forgot to exclude Section 702-acquired upstream data from his query.

This problem should have been fixed in the first full period when they were doing upstream searches. But for some reason … NSA never did.

Update: This language seems to say that this problem existed for the entire time they were conducting upstream in the 2011 fashion.

In May and June 2016, NSA reported to oversight personnel in the ODNI and DOJ that, since approximately 2012, use of to query communications in had resulted in inadvertent violations of the above-described querying rules for Section 702 information. Id. The violations resulted from analysts not recognizing the need to avoid querying datasets for which querying requirements were not satisfied or not understanding how to formulate queries to exclude such datasets. Id. at 1-2.

Verizon Gets Out of the Upstream Surveillance Business

Even as the privacy world has been discussing how NSA got out of one kind of the upstream collection business on April 28, most people overlooked that someone else got out of the upstream collection business almost entirely just a few days later. That’s when Verizon finalized its sale of a big chunk of its data centers — including the ones used for Stormbrew collection — to Equinix. (h/t to SpaceLifeForm for reminding me)

When Equinix announced the $3.6B cash purchase in December, it emphasized the Miami data center, through which much of the traffic from Latin America passes on to the rest of the world, and the Culpeper site serving the National Security world.

  • The NAP (Network Access Point) of the Americas facility in Miami is a key interconnection point and will become a strategic hub and gateway for Equinix customer deployments servicing Latin America. Combined with the Verizon data centers in Bogotá and the NAP do Brasil in São Paulo, it will strategically position Equinix in the growing Latin American market.
  • The NAP of the Capital Region in Culpeper, VA is a highly secure campus focused on government agency customers, strengthening Equinix as a platform of choice for government services and service providers.

The purchase also expands Equinix’s presence in Silicon Valley.

Mind you, spying infrastructure has continued to evolve since Snowden documents elucidated where the Stormbrew collection points were and what they did. So maybe these data centers are no longer key “chokepoints” (as the NSA called them) of American spying.

But if they are, then Verizon is no longer the one sifting through your data.

The Webster Report Recommendations and FBI’s Federated Back Door Searches

Back in 2013, in the context of a discussion of back door searches, I noted William Webster’s reference, in his report on the Nidal Hasan investigation, to using FISA communications with key targets as tripwires for further investigation. The following spring, in response to Bob Litt’s proclamation that it would be “impracticable” to require the government to count back door searches, I returned to Webster’s recommendations on fixing FBI’s archaic database access to make it easier to match communications from the same user (starting at 140). I suggested that back door searches, particularly their expansion in 2011, might be a response to his recommendations.

To be fair, I suspect one of the issues is that after the Nidal Hasan attack (and this is just a very well educated guess), NSA rolled out a system whereby new communications between a targeted foreigner and an American automatically pulls up all previous communications involving that US person. That would count as a search, even though it would effectively feel like an automatic cross-referencing of all prior communications involving someone talking to a target, even if that is a US person.

Nevertheless, this means that NSA is conducting so many back door searches on US person data that it would be “impracticable” to actually give those searches some kind of review.

Not long after this hearing, we learned FBI was the agency for which it was impracticable to count back door searches, not NSA.

In the FISA court hearing on October 20, 2015 over whether FBI should provide individual justifications for back door searches, one of the government’s [redacted] lawyers explained that the way federated searches integrate back door searches indeed did come directly from the Webster Report recommendations.

To use an example more recent and even more on point, the Webster Commission’s report on the Fort Hood attack criticized the government’s queries of information in its possession. The people doing the assessment of Nidal Hasan did not identify several messages between Anwar Aulaqi and Nidal Hasan, and the commission deemed it essential that the FBI possess the ability to search all of its repositories and to do so without balkanizing those data sources.

And so these systems that do these federated queries that allow us to, yes, to query the 702 information, but all of these sources are in direct response to those findings, and they’re in direct response to our efforts over the last 15 years to bring down this artificial wall between the law enforcement mission of the FBI and its national security intelligence mission.

Reading this transcript reminded me that, back in 2014, I imagined all this would be automatic — not so much a search, but an interlinked search that would automatically pull up existing content.

There’s reason to believe that model, and the back door access at CIA and NSA to content (which was approved in 2011), was designed to work similarly.

One of the documents recently liberated by ACLU makes it clear that NSA’s metadata back door searches of 702 content are, in some way, automated, such that counts of such queries are counted using algorithms and business rules.

NSA will rely on an algorithm and/or a business rule to identify queries of communications metadata derived from the FAA 702 [redacted] and telephony collection that start with a United States person identifier. Neither method will identify those queries that start with a United States person identifier with 100 percent accuracy.

The I Con the Record report notes the back door content search number, which combined CIA and NSA, is also an estimate, which may suggest it is counted algorithmically as well (though these are reviewed more closely in compliance reviews). In any case, CIA’s switch from counting each query using a US person identifier to counting each US person identifier queried leads me to suspect it (and NSA) use more of a tasking model, where certain US person identifiers automatically trigger for the period they’re tasked; at the NSA, at least, the duration of approval to do back door searches is either tied to the underlying probable cause FISA order or to a deadline set by the approving authority.

Finally, a Snowden document dating to March 2012 (when NSA was still setting up back door searches) shows that an NSA triage program would first walk users through methods to prioritize communications based off metadata, then have links to access the content directly.

At the time, the sole authority listed was EO 12333, but as noted, this is precisely when they were implementing back door searches on 702 content.

None of this is all that surprising (but hey! Yay me for understanding precisely where back door searches came from three years ago).

But it suggests as we talk about “back door searches,” what we’re really talking about — at least when looking at access programs like the one above — is automatic notice that back door content exists, where content is just a click away.

CIA or NSA Warrantlessly Accessed the Content of More than 300 US Persons (Probably More than 1,300) Who Aren’t Terror Suspects

Because Circa did a really sloppy report on the I Con the Record Transparency Report and Rand Paul quoted it, there is a great deal of confusion about what back door searches are.

With the help of the NSA, the FBI collects information via traditional FISA orders. They got 1,559 of them last year, of which 1,477 were targeted at someone in the United States, and of which 336 were targeted at American citizens or permanent residents. All that data goes into a cloud server at the FBI and a separate one at NSA.

In addition, NSA collects information targeted at people overseas under Section 702. FBI can also ask NSA to collect on people they’ve come across in their investigations. Altogether, NSA collected on over 106,000 individual targets last year, via both upstream collection and by asking American providers (Google, Facebook, Yahoo, and the like) for any data they’ve got on those 106,000 targets. They’ll get both sides of targets’ conversations, stored documents and photos, calendar information, and other information.

After NSA gets that information, it will share the parts of it that are most relevant to the CIA and the FBI’s missions with them, in raw form. At the FBI, that data is stuck on the same cloud server as the domestic-focused FISA data. It is understood that FBI receives any terrorism, counterproliferation, or spying data that has a domestic component (such as Russian spies or ISIS recruiters trying to recruit Americans).

All three agencies — NSA, CIA, and FBI — can then search their own collections of FISA information using the identifier of a US person (a citizen or permanent resident). At NSA and CIA, the analyst has to have a foreign intelligence purpose, such as they think Russians are trying to recruit Mike Flynn. At FBI, an agent has to be looking for criminal information, national security information, or even doing an assessment (such as to figure out whether Carter Page would make a good informant on what the Trump campaign is doing). FBI does so many of these searches they can’t count them.

If there are conversations involving these people in the relevant databases, they appear to the analyst or agent in unmasked form. Yes, if CIA and NSA want to write reports to the White House about what they found, then the name might be masked (but in the vast majority of reports based off 702 reports involving US persons, perhaps 74%, the US person identities eventually get unmasked), but the FBI may dump that data into investigative files.

To understand how and whom this might impact in the United States, take this comment from Jim Comey the other day. When asked how many active terrorist investigations the FBI has, he said there were 1,000 investigations where the target was known to be talking to terrorists overseas, and 1,000 where the target embraced radicalism all by him or herself, without talking to an ISIS or any other overseas recruiter.

COMEY: Yes I do. If — we have about 1,000 home grown violent extremist investigations and we probably have another 1,000 or so that are — I should define my terms. Home grown violent extremists, we mean somebody — we have no indication that they’re in touch with any terrorists.

TILLIS: Any foreign touch. Right.

COMEY: Yes. Then we have another big group of people that we’re looking at who we see some contact with foreign terrorists. So you take that 2,000 plus cases, about 300 of them are people who came to the United States as refugees.

Let’s take the higher number, and say there are 2,000 people in the US the intelligence community thinks might be terrorists or susceptible to being convinced to become one.

Now let’s look at the back door search numbers. The NSA used the identifiers (say, their cell phone identifier or their email) of US persons and searched the metadata from their stash of 702 data 30,355 times last year. (The CIA and FBI refuse to count how many metadata searches they did.) That means that NSA tried to do a network analysis on over 28,000 Americans and permanent residents who are not the subject of investigations by the FBI for being terrorists.

Between CIA and NSA combined, they did 5,288 queries on US persons last year. Back in 2013, the CIA did far more searches than the NSA (on 1,400 selectors as compared to NSA’s 198); we don’t know how the split works now. But assume that at least one agency is doing at least 2,644 searches. At the NSA, all 336 traditional FISA targets can be (and I assume are) tasked for back door searches; presumably a chunk of the 336 people targeted under FISA are being investigated for terrorism, though that would also include people like (allegedly) Carter Page, people the FBI has gotten the FISA court to believe are agents of foreign powers. But even if we assume none of the people targeted under FISA are terrorists and all domestic terrorists are being back door searched at NSA, that leaves over 300 people (2,644 – 1,000 – 1,000 – 336) who are having their content accessed without a warrant by the NSA (to say nothing of the FBI, which does it so often it can’t count it). The number is probably higher, though, given that 1,000 of those terrorist suspects aren’t conversing with foreigners. The NSA (or CIA) is only going to access content if they know it exists from metadata, and Comey’s comment suggests there’s no metadata indicating such conversations. And at least some of those 336 targeted US persons are terror suspects.

Which means one agency — NSA or CIA — is likely accessing the raw content of 1,300 people who aren’t terrorist suspects.
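The arithmetic behind those estimates can be laid out explicitly. This is only a sketch of the reasoning above, using the figures quoted in this post; the variable names are mine, and it carries over the post's simplifying assumptions: that the busier of the two agencies accounts for at least half of the combined 5,288 content queries, and that each query can be treated as roughly one person.

```python
# Figures as quoted in the post; names and structure are illustrative only.
COMBINED_CONTENT_QUERIES = 5288  # combined back door content queries
HOMEGROWN_CASES = 1000           # Comey: no known contact with foreign terrorists
FOREIGN_CONTACT_CASES = 1000     # Comey: some contact with foreign terrorists
FISA_US_PERSON_TARGETS = 336     # traditional FISA targets who are US persons
NSA_METADATA_QUERIES = 30355     # NSA metadata queries on US person identifiers


def busier_agency_floor(combined: int) -> int:
    """With two agencies, the busier one does at least half, rounded up."""
    return -(-combined // 2)


floor_queries = busier_agency_floor(COMBINED_CONTENT_QUERIES)

# Conservative case: every terror suspect and every FISA target is subtracted.
lower_bound = (floor_queries - HOMEGROWN_CASES
               - FOREIGN_CONTACT_CASES - FISA_US_PERSON_TARGETS)

# Likelier case: the homegrown suspects have no foreign contacts to search on,
# so they should not have been subtracted in the first place.
likely = lower_bound + HOMEGROWN_CASES

# Metadata queries beyond the roughly 2,000 terror investigations.
metadata_non_suspects = NSA_METADATA_QUERIES - (HOMEGROWN_CASES + FOREIGN_CONTACT_CASES)

print(floor_queries, lower_bound, likely, metadata_non_suspects)
```

Changing any one assumption (say, a lopsided split between the agencies) only pushes the busier agency's number up, which is why 300 works as a floor rather than an estimate.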

That’s fine. There are other things they might be: suspected weapons proliferators, suspected Russian or Chinese spies, people the government is worried are being recruited by spies, suspected hackers, suspected leakers, Americans who’ve been kidnapped.

But the numbers make clear that the presumption that all of this spying is targeted at terrorists is simply wrong. There are at least 300 people — and probably more like 1,300 people — who even the NSA is accessing the content of without a warrant who are not terrorist suspects.

And the number at FBI is so high it can’t count it.

Hemisphere 2.0

As I note in an update to this post, Charlie Savage is very cross I did some math. On top of making a hilariously bad misreading of my original post — claiming I said a number was implausible even though I said it was plausible on at least five occasions, including the headline — and making a number of other errors about how the phone dragnet works, he bitches that I go through the effort of laying out what the 151 million call event might actually mean. (As always, Charlie doesn’t hold himself to the standards of correction he demands I do, either in the NYT or on posts like this.)

The reason you do that is to lay out assumptions.

And I’ve realized two things about how we’re counting numbers. First, one source of redundancy no one has considered is a SIM/handset redundancy.

One thing phone dragnets are designed to do is correlate identities: track the various identities a suspect and his associates are using, so as to ensure you’re tracking all their possible communications. With cell phones, one thing you want to track is whether someone is swapping out SIM cards. This collection starts with identifiers from EO 12333 collection, which we know is stored logically by IMEI/IMSI. It is possible that providers get both those identifiers as separate identifiers and provide two separate streams of data, especially if they don’t coincide.

If that were the standard practice, it would mean there’d often be a dual set of identical call records.

The more interesting issue is telecom retention. As I Con the Record notes, a request will return historical, current, and prospective call records. We’ve talked a lot about minimum retention (and the two year data handshake that Verizon and T-Mobile agreed to). But we haven’t talked about maximal retention.

As I noted, AT&T has call records going back decades, collected on any call that crossed its lines. We know that under the Hemisphere program, it usually could come up with call records for phones, whether or not they were AT&T customers. That means that the government could always submit requests to AT&T (again, whether or not the target used AT&T as a provider, because the target would surely have used AT&T’s backbone), and get years of records for the handset and SIM, if they existed, as well as for the two hops. This data would effectively create a mini-Hemisphere for the cluster around a given target, including call records for far more than the five years NSA used to be able to obtain data (though they might only retain that decades old data for 5 years).

I’m not saying I think they’re doing that — I don’t. In public testimony, NSA and other agency officials have conceded that data really is most valuable in the first two years, so obtaining 20 years of data would just load down NSA with false positives.

But it is a possibility — one that I hope Congress considers.

One Takeaway from the Five Takeaways from the Comey Hearing: Election 2016 Continues to Suffocate Oversight

The Senate Judiciary Committee had an oversight hearing with Jim Comey yesterday, which I live-tweeted in great depth. As you can imagine, most of the questions pertained either to Comey’s handling of the Hillary investigation and/or to the investigation into Russian interference in the election. So much so that The Hill, in its “Five Takeaways from Comey’s testimony,” described only things that had to do with the election:

  • Comey isn’t sorry (but he was “mildly nauseous” that his conduct may have affected the outcome)
  • Emotions over the election are still raw
  • Comey explains DOJ dynamic: “I hope someday you’ll understand”
  • The FBI may be investigating internal leaks
  • Trump, Clinton investigations are dominating FBI oversight

The Hill’s description of that third bullet doesn’t even include the “news” from Comey’s statement: that there is some still-classified detail, in addition to Loretta Lynch’s tarmac meeting with Bill Clinton and the intercepted Hillary aide email saying Lynch would make sure nothing happened with the investigation, that led Comey to believe he had to take the lead on the non-indictment in July.

I struggled as we got closer to the end of it with the — a number things had gone on, some of which I can’t talk about yet, that made me worry that the department leadership could not credibly complete the investigation and declined prosecution without grievous damage to the American people’s confidence in the — in the justice system.

As I said, it is true that most questions pertained to Hillary’s emails or Russia. Still, reports like this, read primarily by people on the Hill, have the effect of a self-fulfilling prophecy by obscuring what little real oversight happened. So here’s my list of five pieces of actual oversight that happened.

Neither Grassley nor Feinstein understands how FISA back door searches work

While they primarily focused on the import of reauthorizing Section 702 (and pretended that there were no interim options between clean reauthorization and a lapse), SJC Chair Chuck Grassley and SJC Ranking Member Dianne Feinstein both said things that made it clear they didn’t understand how FISA back door searches work.

At one point, in a discussion of the leaks about Mike Flynn’s conversation with Sergey Kislyak, Grassley tried to suggest that only a few people at FBI would have access to the unmasked identity in those intercepts.

There are several senior FBI officials who would’ve had access to the classified information that was leaked, including yourself and the deputy director.

He appeared unaware that as soon as the FBI started focusing on either Kislyak or Flynn, a back door search on the FISA content would return those conversations in unmasked form, which would mean a significant number of FBI Agents (and anyone else on that task force) would have access to the information that was leaked.

Likewise, at one point, when Feinstein was leading Comey through a discussion of why they needed to have easy back door access to communication content collected without a warrant (so we don’t stovepipe anything, Comey said), she said, “so you are not unmasking the data,” as if data obtained through a back door search would be masked, which genuinely (and rightly) confused Comey.

FEINSTEIN: So you are not masking the data — unmasking the data?

COMEY: I’m not sure what that means in this context.

It’s raw data. It would not be masked. That Feinstein, who has been a chief overseer of this program for the entire time back door searches were permitted, doesn’t know this, that she repeatedly led the effort to defeat efforts to close the back door loophole, and that she doesn’t know what it means that this is raw data is unbelievably damning.

Incidentally, as part of the exchange with Feinstein, Comey said the FISA data sits in a cloud type environment.

Comey claims the government doesn’t need the foreign government certificate except to target spies

Several hours into the hearing, Mike Lee asked some questions about surveillance. In particular, he asked if the targeting certificates for 702 ever targeted someone abroad for purposes unrelated to national security. Comey seemingly listed off the certificates we do have — foreign government, counterterrorism, and counterproliferation, noting that cyber gets worked into other ones.

LEE: Yes. Let’s talk about Section 702, for a minute. Section 702 of the Foreign Intelligence Surveillance Amendments Act authorizes the surveillance, the use of U.S. signals surveillance equipment to obtain foreign intelligence information.

The definition includes information that is directly related to national security, but it also includes quote, “information that is relevant to the foreign affairs of the United States,” close quote, regardless of whether that foreign affairs related information is relevant to a national security threat. To your knowledge, has the attorney general or has the DNI ever used Section 702 to target individuals abroad in a situation unrelated to a national security threat?

COMEY: Not that I’m aware of. I think — I could be wrong, but I don’t think so, I think it’s confined to counterterrorism to espionage, to counter proliferation. And — those — those are the buckets. I was going to say cyber but cyber is fits within…

He said they don’t need any FG information except that which targets diplomats and spies.

LEE: Right. So if Section 702 were narrowed to exclude such information, to exclude information that is relevant to foreign affairs, but not relevant to a national security threat, would that mean that the government would be able to obtain the information it needs in order to protect national security?

COMEY: Would seem so logically. I mean to me, the value of 702 is — is exactly that, where the rubber hits the road in the national security context, especially counterterrorism, counter proliferation.

I assume that Comey said this because the FBI doesn’t get all the other FG-collected stuff in raw form and so isn’t as aware that it exists. I assume that CIA and NSA, which presumably use this raw data far more than FBI, will find a way to push back on this claim.

But for now, we have the FBI Director stating that we could limit 702 collection to national security functions, a limitation that was defeated in 2008.

Comey says FBI only needs top level URLs for ECTR searches

In another exchange, Lee asked Comey about the FBI’s continued push to be able to get Electronic Communication Transaction Records. Specifically, he noted that being able to get URLs means being able to find out what someone was reading.

In response, Comey said he thought they could only get the top-level URL.

After some confusion that revealed Comey’s lie about the exclusion of ECTRs from NSLs being just a typo, Comey said FBI did not need any more than the top domain, and Lee answered that the current bill would permit more than that.

LEE: Yes. Based on the legislation that I’ve reviewed, it’s not my recollection that that is the case. Now, what — what I’ve been told is that — it would not necessarily be the policy of the government to use it, to go to that level of granularity. But that the language itself would allow it, is that inconsistent with your understanding?

COMEY: It is and my understanding is we — we’re not looking for that authority.

LEE: You don’t want that authority…

(CROSSTALK)

COMEY: That’s my understanding. What — what we’d like is, the functional equivalent of the dialing information, where you — the address you e-mailed to or the — or the webpage you went to, not where you went within it.

This exchange should be useful for limiting any ECTR provision that gets rushed through to what FBI claims it needs.

The publication of (US) intelligence information counts as intelligence porn and therefore not journalism

Ben Sasse asked Comey about the discussion of indicting Wikileaks. Comey’s first refusal to answer whether DOJ would indict Wikileaks led me to believe they already had.

I don’t want to confirm whether or not there are charges pending. He hasn’t been apprehended because he’s inside the Ecuadorian embassy in London.

But as part of that discussion, Comey explained that Wikileaks’ publication of loads of classified materials amounted to intelligence porn, which therefore (particularly since Wikileaks didn’t call the IC for comment first, even though they have in the past) meant they weren’t journalism.

COMEY: Yes and again, I want to be careful that I don’t prejudice any future proceeding. It’s an important question, because all of us care deeply about the First Amendment and the ability of a free press, to get information about our work and — and publish it.

To my mind, it crosses a line when it moves from being about trying to educate a public and instead just becomes about intelligence porn, frankly. Just pushing out information about sources and methods without regard to interest, without regard to the First Amendment values that normally underlie press reporting.

[snip]

[I]n my view, a huge portion of WikiLeaks’s activities has nothing to do with legitimate newsgathering, informing the public, commenting on important public controversies, but is simply about releasing classified information to damage the United States of America. And — and — and people sometimes get cynical about journalists.

American journalists do not do that. They will almost always call us before they publish classified information and say, is there anything about this that’s going to put lives in danger, that’s going to jeopardize government people, military people or — or innocent civilians anywhere in the world.

I’ll write about this more at length.

Relatedly (though technically a Russian investigation detail), Comey revealed that the investigation into Trump ties to Russia is being done at Main Justice and EDVA.

COMEY: Yes, well — two sets of prosecutors, the Main Justice the National Security Division and the Eastern District of Virginia U.S. Attorney’s Office.

That makes Dana Boente’s role, first as Acting Attorney General for the Russian investigation and now the Acting Assistant Attorney General for National Security, all the more interesting, as it means he is the person who can make key approvals related to the investigation.

I don’t have any problem with him being chosen for these acting roles. But I think it supremely unwise to effectively eliminate levels of oversight on these sensitive cases (Russia and Wikileaks) by making the US Attorney already overseeing them also the guy who oversees his own oversight of them.

The US is on its way to becoming the last haven of shell corporations

Okay, technically these were Sheldon Whitehouse and Amy Klobuchar comments about Russia. But as part of a (typically prosecutorial) line of questioning about things related to the Russian investigation, Whitehouse got Comey to acknowledge that as the EU tries to crack down on shell companies, that increasingly leaves the US as the remaining haven for shell companies that can hide who is paying for things like election hacks.

WHITEHOUSE: And lastly, the European Union is moving towards requiring transparency of incorporations so that shell corporations are harder to create. That risks leaving the United States as the last big haven for shell corporations. Is it true that shell corporations are often used as a device for criminal money laundering?

COMEY: Yes.

[snip]

WHITEHOUSE: What do you think the hazards are for the United States with respect to election interference of continuing to maintain a system in which shell corporations — that you never know who’s really behind them are common place?

COMEY: I suppose one risk is it makes it easier for illicit money to make its way into a political environment.

WHITEHOUSE: And that’s not a good thing.

COMEY: I don’t think it is.

And Klobuchar addressed the point specifically as it relates to high-end real estate (not mentioning that both Trump and Paul Manafort have been alleged to be involved in such transactions).

There have been recent concerns that organized criminals, including Russians, are using the luxury real estate market to launder money. The Treasury Department has noted a significant rise in the use of shell companies in real estate transactions, because foreign buyers use them as a way to hide their identity and find a safe haven for their money in the U.S. In fact, nearly half of all homes in the U.S. worth at least $5 million are purchased using shell companies.

Does the anonymity associated with the use of shell companies to buy real estate hurt the FBI’s ability to trace the flow of illicit money and fight organized crime? And do you support efforts by the Treasury Department to use its existing authority to require more transparency in these transactions?

COMEY: Yes and yes.

It’s a real problem, and not just because of the way it facilitates election hacks, and it’d be nice if Congress would fix it.

I Con the Record Transparency Bingo: Playing Card

In this post, I’ll cover the rest of the I Con the Record 2016 Transparency Report.

Title I, III, VII 703 and 704

As the report notes, these are the individually approved orders. To be assholes, ODNI includes Section 703, which is not used. I Con the Record reports 1,559 orders, which it does not break down.

For the same authorities (1805, 1824, 1805/1824, and 1881c), the FISA Court, which uses different and in most cases more informative counting metrics, reports 1,220 orders granted, 313 orders modified, and 26 orders denied in part (which add up to I Con the Record’s 1,559), plus 8 orders denied, which I Con the Record doesn’t mention.
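To make the reconciliation between the two reports explicit, here is a minimal sketch of the arithmetic. It uses only the figures quoted above; the dictionary keys and variable names are my own labels, not anything from either report.

```python
# FISC figures for the individually approved authorities, as quoted above.
fisc_counts = {
    "granted": 1220,
    "modified": 313,
    "denied_in_part": 26,
    "denied": 8,
}

# ICTR's total appears to cover every outcome except outright denials.
ictr_total = sum(n for outcome, n in fisc_counts.items() if outcome != "denied")

# FISC's own total includes the denials ICTR leaves out.
fisc_total = sum(fisc_counts.values())

print(ictr_total)  # 1559
print(fisc_total)  # 1567
```

The 8-order gap between the two totals is exactly the set of outright denials.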

As an improvement this year, I Con the Record has broken down how many of these targets are US persons or not, showing it to be 19.9%. That means the vast majority of targeted FISA orders are targeted at people like Sergey Kislyak, the Russian Ambassador all of Trump’s people talked to.

This is the target number for the original report, not the order number, and it is an estimate (which is curious). This means at least 28 orders target multiple people. Neither ICTR nor FISC reveals how many US persons were approved for 705b, meaning they were spied on when they went overseas.

Section 702

This is the authority that covers upstream and PRISM. After presenting its useless report that it had one certificate in 2016 (leftover from 2015), ICTR reports there were 106,469 knowably discrete 702 targets last year, an 11% increase off last year.

Note: one of the games played in the USA Freedom Act transparency procedures was that, once the other counts moved to a selector based count, this was removed from the required reports (which is why ICTR says they weren’t required by law to release it). They presumably did this to hide the likely fact that for every one of these 106,469 targets, there are multiple — possibly very many — selectors tasked, which would make the spying number look Yuge.

NSA and CIA provide the number of content queries they conducted. Since CIA has stopped double counting selectors it uses more than once, this represents more than the 12% increase in queries suggested by the numbers. So queries are increasing at a higher — potentially significantly higher — rate than targets.

Given the way the NSA’s querying process ties queries to deadlines (60 days, for example, or to the underlying authorization), it’s likely NSA just keeps these queries tasked throughout that period (which may mean CIA moved to do the same this year). If that’s right, it would effectively alert an analyst any time a new communication involving the US person came in.

This post talks about what one claim in the report really means: that just one query of FBI holdings designed to find criminal information had a positive hit on 702 information and was reviewed.

Meanwhile, NSA’s US person metadata queries have gone up much faster than content queries or target selectors, a 32% increase. As noted in this post, FBI doesn’t have to count their queries and CIA still does not do so.

Also note, this is an estimate. The underlying NSA document makes it clear this is done via algorithm or business rule to estimate these queries, which suggests they’re done automatically.

To put these queries into perspective, Jim Comey today said there were 1,000 Islamic extremists in the US who were communicating overseas. Even assuming they track the other 1,000 extremists not known to be communicating overseas, that’s just a tiny fraction of the Americans they’re tracking.

ICTR provided better information on unmasked US person identities this year than last, revealing how many USP identities got released.

As I said last year, ICTR is not doing itself any favors by failing to reveal what a tiny fraction of all 702 reports the 3,914 represent; it must be truly minuscule.

All that said, if you do get reported in one of those rare 702 reports that includes a USP identity, chances are very good you’ll be unmasked. In 30% of the reports with USP identities, last year, at least one USP identity was released in original form unmasked (as might happen, for example, if Carter Page or Mike Flynn’s identity was crucial to understanding the report). Of the remainder, though, 65% had at least one more US person identity unmasked. I believe that means that only roughly 26% of the names originally masked remained masked in the reports.

Pen Registers

See this post for an explanation of why we shouldn’t take too much from a seemingly significant decline in pen registers. Note, I didn’t mention that 43.9% of the 41 targets are estimated to be US persons; these are estimates, which is a bit nutty given the small numbers involved.

Note, of the 60 pen registers ICTR shows, FISC shows 10 were modified (perhaps to include minimization procedures).

Section 215

The section on “traditional” Section 215 shows that for each order (of which up to 4 had more than one target), there were almost 1,000 selectors sucked in.

Except!

Except the number is likely far, far higher, because this metric doesn’t track people sucked in via financial or travel or other Section 215 orders.

This post explains why the 151 million call session records sucked in via the new Section 215 phone dragnet may not actually be that much — but also likely represents edge cases.

Note, the FISC report shows 125 total Section 215 reports, with 108 approved, 16 modified, and 1 rejected (the last of which ICTR doesn’t mention). The approved and modified orders add up to the same 124 that ICTR shows. The modified orders likely include minimization procedures.

Here’s the number of queries of returned new phone dragnet data done by NSA and CIA (note, in the old dragnet, this data would not have been as readily available even within NSA, much less at CIA).

As always with meaningful metrics, FBI is exempt. I’ll return to this metric.

NSLs

I may come back to this as well, but for now, know that FBI requested fewer NSLs last year than in previous years.

I Con the Record Transparency Bingo (4): How 151 Million Call Events Can Look Reasonable But Is Besides the Point

Other entries in I Con the Record Transparency Bingo:

(1) Only One Positive Hit on a Criminal Search

(2): The Inexplicable Drop in PRTT Numbers

(3): CIA Continues to Hide Its US Person Network Analysis

If your understanding of the USA Freedom phone dragnet replacing the old phone dragnet came from the public claims of USA Freedom Act boosters or from this NYT article on the I Con the Record report, you might believe 42 terrorist suspects and their 3,150 friends each made 48,000 phone calls last year, which would work out to 130 calls a day … or maybe 24,000 perfectly duplicative calls, which works out to about 65 calls a day.

That’s the math suggested by these two entries in the I Con the Record Transparency Report — showing that the 42 targets of the new phone dragnet generated over 151 million “call detail records.” But as I’ll show, the impact of the 151 million [corrected] records collected last year is in some ways far lower than collecting 65 calls a day, which is a good thing! But it supports a claim that USAF has an entirely different function than boosters understood.

 

Here’s the math for assuming these are just phone calls. There were 42 targets approved for use in the new phone dragnet for some part of last year. Given the data showing just 40 orders, they might only be approved for six months of the year (each order lasts for 180 days), but we’ll just assume the NSA gets multiple targets approved with each order and that all 42 targets were tasked for the entirety of last year (for example, you could have just two orders getting 42 targets approved to cover all these people for a year).

In its report on the phone dragnet, PCLOB estimated that each target might have 75 total contacts. So a first round would collect on 42 targets, but with a second round you would be collecting on 3,192 people. That would mean each of those 3,192 people would be responsible for roughly 48,000 calls a year, every single one of which might represent a new totally innocent American sucked into NSA’s maw for the short term [update: that would be up to a total of 239,400 2nd-degree interlocutors]. The I Con the Record report says that, “the metric provided is over‐inclusive because the government counts each record separately even if the government receives the same record multiple times (whether from one provider or multiple providers).” If these were phone calls between just two people, then if our terrorist buddies only spoke to each other, each would be responsible for 24,000 calls a year, or 65 a day, which is certainly doable, but would mean our terrorist suspects and their friends all spent a lot of time calling each other.
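For anyone who wants to check the back-of-the-envelope numbers, here is a rough sketch of that math. It assumes the figures cited above (42 targets, PCLOB’s 75 contacts per person) and rounds the record count down to an even 151 million, which is my simplification; the variable names are purely illustrative.

```python
# Inputs as cited above; RECORDS is rounded down from "over 151 million".
TARGETS = 42
CONTACTS_PER_PERSON = 75
RECORDS = 151_000_000

first_hop = TARGETS * CONTACTS_PER_PERSON      # friends of the targets
people = TARGETS + first_hop                   # targets plus friends
second_hop = people * CONTACTS_PER_PERSON      # possible 2nd-degree interlocutors

records_per_person_year = RECORDS / people
records_per_person_day = records_per_person_year / 365

# If every record were received twice (once for each party to the call),
# the number of distinct calls per person per day would be halved.
deduplicated_per_day = records_per_person_day / 2

print(first_hop, people, second_hop)     # 3150 3192 239400
print(round(records_per_person_year))    # 47306
print(round(records_per_person_day))     # 130
print(round(deduplicated_per_day))       # 65
```

With the rounded input the per-person figure lands a little under 48,000 a year, but the daily numbers of roughly 130 and 65 hold up.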

The number becomes less surprising when you remember that even with traditional telephony call records can capture calls and texts. All of a sudden 65 becomes a lot more doable, and a lot more likely to have lots of perfectly duplicative records as terrorists and their buddies spend afternoons texting back and forth with each other.

Still, it may mean that 65 totally innocent people a day get sucked up by NSA.

All that said, there’s no reason to believe we’re dealing just with texts and calls.

As the report reminds us, we’re actually talking about session identifying information, which in the report I Con the Record pretends are “commonly referred to” as “call events.”

Call Detail Records (CDR) – commonly referred to as “call event metadata” – may be obtained from telecommunications providers pursuant to 50 U.S.C. §1861(b)(2)(C). A CDR is defined as session identifying information (including an originating or terminating telephone number, an International Mobile Subscriber Identity (IMSI) number, or an International Mobile Station Equipment Identity (IMEI) number), a telephone calling card number, or the time or duration of a call. See 50 U.S.C. §1861(k)(3)(A). CDRs do not include the content of any communication, the name, address, or financial information of a subscriber or customer, or cell site location or global positioning system information. See 50 U.S.C. §1861(k)(3)(B). CDRs are stored and queried by the service providers. See 50 U.S.C. §1861(c)(2).

Significantly, this parenthesis, “(including an originating or terminating telephone number, an International Mobile Subscriber Identity (IMSI) number, or an International Mobile Station Equipment Identity (IMEI) number),” suggests that so long as something returns a phone number, a SIM card number, or a handset number, that can be a “call event.” That is, a terrorist using his cell phone to access a site, generating a cookie, would have the requisite identifiers for his phone as well as a time associated with it. And I Con the Record’s transparency report says it is collecting these “call event” records from “telecommunications” firms, not phone companies, meaning a lot more kinds of things might be included: certainly iMessage and WhatsApp, possibly Signal. Indeed, that’s necessarily true given repeated efforts in Congress to get a list of all electronic communications service providers that don’t keep their “call records” 18 months and to track any changes in retention policies. It’s also necessarily true given Marco Rubio’s claim that we’re sending requests out to a “large and significant number of companies” under the new phone dragnet.

The fine print provides further elements that suggest both that the 151 million events collected last year are not that high. First, it suggests a significant number of CDRs fail validation at some point in the process.

This metric represents the number of records received from the provider(s) and stored in NSA repositories (records that fail at any of a variety of validation steps are not included in this number).

At one level, this means NSA’s requests resulted in well more than 151 million events collected. But it also means they may be getting junk. One thing that in the past might have represented a failed validation is if the target no longer uses the selector, though the apparent failure at multiple levels suggests there may be far more interesting technical reasons for failed validation.

In addition, the fine print notes that the 151 million call events include both historical events collected with the first order as well as the prospective events collected each day.

CDRs covered by § 501(b)(2)(C) include call detail records created before, on, or after the date of the application relating to an authorized investigation.

So these events weren’t all generated last year — if they’re from AT&T they could have been generated decades ago. Remember that Verizon and T-Mobile agreed to a handshake agreement to keep their call records two years as part of USAF, so for major providers providing just traditional telephony, a request will include at least two years of data, plus the prospective collection. That means our 3,192 targets and friends might only have had 48 calls or texts a day, without any duplication.

Finally, there’s one more thing that suggests this huge number isn’t that huge, but also that it may be a totally irrelevant measure of the privacy impact. In NSA’s document on implementing the program from last year, it described first querying the NSA Enterprise Architecture to find query results, and then sending out selectors for more data.

Once the one-hop results are retrieved from the NSA’s internal holdings, the list of FISC-approved specific selection terms, along with NSA’s internal one-hop results, are submitted to the provider(s).

In other words — and this is a point that was clear about the old phone dragnet but which most people simply refused to understand — this program is not only designed to interact seamlessly with EO 12333 collected data (NSA’s report says so explicitly, as did the USAF report), but many of the selectors involved are already in NSA’s maw.

Under the old phone dragnet, a great proportion of the phone records in question came from EO 12333. NSA preferred then — and I’m sure still prefers now — to rely on queries run on EO 12333 because they came with fewer limits on dissemination.

Which means we need to understand the 65 additional texts — or anything else available only in the US from a large number of electronic communications service providers that might be deemed a session identifier — a day from 42 terrorists and their 3150 buddies on top of the vast store of EO 12333 records that form the primary basis here.

Because (particularly as the rest of the report shows continually expanding metadata analysis and collection) this is literally just the tip of an enormous iceberg, 151 million edge cases to a vast sea of data.

Update: Charlie Savage, who has a really thin skin, wrote me an email trying to dispute this post. In the past, his emails have almost universally devolved into him being really defensive while insisting over and over that stuff I’ve written doesn’t count as reporting (he likes to do this, especially, with stuff he claims a scoop for three years after I’ve written about it). So I told him I would only engage publicly, which he does here.

Fundamentally, Charlie disputes whether Section 215 is getting anything that’s not traditional telephony (he says my texts point is “likely right,” apparently unaware that a document he obtained in FOIA shows an issue that almost certainly shows they were getting texts years ago). Fair enough: the law is written to define CDRs as session identifiers, not telephony calls; we’ll see whether the government is obtaining things that are session identifiers. The I Con the Record report is obviously misleading on other points, but Charlie relies on language from it rather than the actual law. Charlie ignores the larger point, that any discussion of this needs to engage with how Section 215 requests interact with EO 12333, which was always a problem with the reporting on the topic and remains a problem now.

So, perhaps I’m wrong that it is “necessarily” the case that they’re getting non-telephony calls. The law is written such that they can do so (though the bill report limits it to “phone companies,” which would make WhatsApp but not iMessage a stretch).

What’s remarkable about Charlie’s piece, though, is that he utterly and completely misreads this post, “About half” of which, he says, “is devoted to showing how the math to generate 151 million call events within a year is implausible.”

The title of this post says, “151 Million Call Events Can Look Reasonable.” I then say, “But as I’ll show, the impact of the 131 [sic, now corrected] million records collected last year is in some ways far lower than collecting 65 calls a day, which is a good thing!” I then say, “The number becomes less surprising when you remember that even with traditional telephony call records can capture calls and texts. All of a sudden 65 becomes a lot more doable, and a lot more likely to have lots of perfectly duplicative records as terrorists and their buddies spend afternoons texting back and forth with each other.” I go on to say, “The fine print provides further elements that suggest both that the 151 million events collected last year are not that high.” I then go on to say, “So these events weren’t all generated last year — if they’re from AT&T they could have been generated decades ago.”

That is, in the title, and at least four times after that, I point out that 151 million is not that high. Yet he claims that my post aims to show that the math is implausible, not totally plausible.  (He also seems to think I’ve not accounted for the duplicative nature of this, which is curious, since I quote that and incorporate it into my math.)

In my reply to his email, I noted that this post replied not just to him, but to others who were alarmed by the number. I said specifically with regard to the number, “yes, you were among the people I subtweeted there. But not the only one and some people did take this as just live calls. It’s not all about you, Charlie.”

Yet having been told that that part of the post was not a response to him, Charlie nevertheless persisted in completely misunderstanding the post.

I guess he still believed it was all about him.

Maybe Charlie should spend his time reading the documents he gets in FOIA more attentively rather than writing thin-skinned emails assuming everything is about him?

Update: Once I pointed out that Charlie totally misread this post he told me to go back on my meds.

Since he’s being such a douche, I’ll give you two more pieces of background. First, after I said that I knew CIA wasn’t tracking metadata (because it’s all over public records), Charlie suggested he knew better.

Here’s me twice pointing out that the number of call events was not (just) calls (as he had claimed in his story), a point he mostly concedes in his response.

Here’s the lead of his story:

I Con the Record Transparency Bingo (3): CIA Continues to Hide Its US Person Network Analysis

As I noted in this post on the single positive hit in a criminal back door 702 search and this post on the inexplicable drop in PRTT numbers, I’m going to clarify things I’m seeing confusion over in the I Con the Record Transparency Report, then do a full working thread.

This year’s report shows a steady increase in the number of metadata searches in raw Section 702 data, a roughly 28% (6,555 query) increase off last year.
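The size of that percentage depends on which year you treat as the base. Here is a minimal sketch using only the two figures in this post (the 30,355 total and the 6,555 increase); the prior-year total is derived by subtraction rather than quoted from the report.

```python
total_this_year = 30_355
increase = 6_555

# Derived, not quoted: last year's total implied by the two figures above.
total_last_year = total_this_year - increase

# Increase measured against last year's total (the usual "increase off last year").
pct_of_last_year = 100 * increase / total_last_year

# Increase measured against this year's total (the share of this year's queries that are new).
pct_of_this_year = 100 * increase / total_this_year

print(total_last_year)              # 23800
print(round(pct_of_last_year, 1))   # 27.5
print(round(pct_of_this_year, 1))   # 21.6
```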

The graphic admits that these 30,355 queries don’t include the FBI (because the transparency procedures passed by USA Freedom pretty much exempted FBI from everything important). But then further down in the written text, I Con the Record admits that one agency of the IC could not estimate its metadata queries.

As with last year’s transparency report, one IC element remains currently unable to provide the number of queries using U.S. person identifiers of unminimized Section 702 non-content information.

That Agency is the CIA, not the FBI (which isn’t required to count its queries).

We know this from a number of places, including James Clapper’s original report on back door searches to Ron Wyden and the PCLOB 702 report (page 58). PCLOB’s most recent Recommendations update noted that CIA hasn’t implemented the recommendation to track foreign intelligence purpose for queries because it has not yet updated its data management. Nor do ODNI and DOJ review it.

The status of the CIA metadata queries remains the same as reported in the Board’s Recommendations Assessment Report of January 2015, namely with respect to the CIA’s metadata queries using U.S. person identifiers, the CIA accepted and plans to implement this recommendation as it refines internal processes for data management. Thus, the CIA’s new minimization procedures do not reflect changes to implement this recommendation with regard to metadata queries.

[snip]

U.S. person queries by the NSA and CIA are already subject to rigorous executive branch oversight (with the exception of metadata queries at the CIA), supplying this additional information to the FISC could help guide the court by highlighting whether the minimization procedures are being followed and whether changes to those procedures are needed.

And a report on CIA’s back door searches recently liberated by ACLU also cites data management reasons for not documenting these searches.

CIA’s metadata-only repository does not have the capacity for documenting why the query is reasonably likely to provide foreign intelligence information. Upon opening the repository, however, users will be met with a pop-up reiterating the query standard and requiring their assent before they may proceed.

I officially bet a quarter that CIA will find a way to count this next year, as by then, many of these queries will have moved to EO 12333 querying, which does not get counted.

So the report on metadata searches only shows what NSA does. Since last year, we have confirmed that these metadata queries include upstream 702 data, which carry their own risks.

And we also now have a sense that those queries are automated. The I Con the Record report explains this is just a good faith effort.

The above is a good faith estimate of the number of queries concerning a known U.S. person that the government conducted of unminimized (i.e., raw) lawfully acquired Section 702 metadata.

That’s because this is done by algorithm and business rule, not by any kind of tracking (I’m guessing because of the way metadata is used to triage newly collected identifiers).

NSA will rely on an algorithm and/or a business rule to identify queries of communications metadata derived from the FAA 702 [redacted] and telephony collection that start with a United States person identifier. Neither method will identify those queries that start with a United States person identifier with 100 percent accuracy.

The privacy community made great celebration about shutting down a phone dragnet that was just used to query 200 or so selectors. Meanwhile, each year the NSA, alone, conducts thousands more such queries (and in a way that likely ties more closely to content searches). And 3 years after people started pressuring it to do so, CIA still doesn’t count how many queries it is doing.

Which likely means CIA is doing a whole bunch of network analysis on US persons that it doesn’t want us to know about.

I Con the Record Transparency Bingo (2): The Inexplicable Drop in PRTT Numbers

As noted in this post, I’m going to start my review of the new I Con the Record Transparency Report by addressing misconceptions I’m seeing; then I’ll do a complete working thread. In this post, I’m going to address what appears to be a drop in FISA PRTT searches.

The report does, indeed, show a drop, both in total orders (from 131 to 60 over the last 4 years) and an even bigger drop in targets (from 319 to 41).

Some had speculated that this drop arises from DOJ’s September 2015 loophole-ridden policy guidance on Stingrays, requiring a warrant for prospective Stingrays. But that policy should have already been in place on the FISC side (because FISC, on some issues, adopts the highest standard when jurisdictions start to deal with these issues). In March 2014, DOJ told Ron Wyden that it “elected” to use full content warrants for prospective location information (though as always with these things, there was plenty of room for squish, including on public safety usage).

As to the drop in targets: it’s unclear how meaningful that is for two reasons.

First, the ultimate number of unique identifiers collected has not gone down dramatically from last year.

Last year, the 134,987 identifiers represented 243 identifiers collected per target, or 1,500 per order. This year, the 125,378 identifiers represent a whopping 3,058 per target or 2,090 per order. So it appears that each order is just sucking up more records.

But something else may be going on here. As I pointed out consistently through debates about these transparency guidelines, the law ultimately excluded everything we knew to include big numbers. And the law excludes from PRTT identifier reporting any FBI obtained identifier that is not a phone number or email address, as well as anything delivered in hard copy or portable media.
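As a quick check on this year’s ratios, here is a sketch that divides the 125,378 identifiers by the 41 targets and 60 orders the report shows. The variable names are mine, and I leave last year’s ratios alone because the prior-year target and order counts aren’t quoted in this post.

```python
identifiers = 125_378
targets = 41
orders = 60

per_target = identifiers / targets
per_order = identifiers / orders

print(round(per_target))  # 3058
print(round(per_order))   # 2090
```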

For all we know, the number of unique identifiers implicated last year is 320 million, or billions, but measuring IP addresses or something else. [Update: Reminder that the FBI used a criminal PRTT in the Kelihos botnet case to obtain the IP addresses of up to 100,000 infected computers, but that’s the kind of thing they might use a FISA PRTT for.]

Alternately, it’s possible some portion of what had been done with PRTTs in 2015 moved to some other authority in 2016. A better candidate for that than Stingrays would be CISA voluntary compliance on things like data flow.

One final note. Unless I misunderstand the count, we’re still missing one amicus brief appointment from 2015. The FISC report from that year (covering just 7 months) said there were four appointments across three amici.

During the reporting period, on four occasions individuals were appointed to serve as amicus curiae under 50 U.S.C. § 1803(i). The names of the three individuals appointed to serve as amicus curiae are as follows:  Preston Burton, Kenneth T. Cuccinelli II  (with Freedom Works), and Amy Jeffress. All four appointments in 2015 were made pursuant to § 1803(i)(2)(B). Five findings were made that an amicus curiae appointment was not appropriate under 50 U.S.C. § 1803(i)(2)(A) (however, in three of those five instances, the court appointed an amicus curiae under 50 U.S.C. § 1803(i)(2)(B) in the same matter).

Burton dealt with the resolution of the Section 215 phone data, Ken Cuccinelli dealt with FreedomWorks’ challenge to the way USAF extended the phone dragnet, and Amy Jeffress dealt with the Section 702 certificates.

That leaves one appointment unaccounted for (and I’d bet money Jeffress dealt with that too). On June 18, 2015, FISC decided not to use an amicus with an individual PRTT order that was a novel interpretation of what counted as a selection term under USAF. It chose not to use an amicus because the PRTT had already expired and because there were no amici identified at that point to preside. If that issue recurred for a more permanent PRTT later in the year, it may have affected how ODNI counted PRTTs (or the still-hidden amicus use may be for another kind of individual order).

All of which is to say, the government appears to be obtaining fewer PRTT orders over the last two years. But it’s not yet clear whether that has any effect on privacy.