From time to time, there has been interest in metadata related to the DNC email hack (especially in relation to the exfiltration rate), but it’s not easy to locate details. Metadata for the DNC email hack are not the same as metadata for Guccifer 2 ngpvan documents and there has been much confusion on this point.
This article will collect some of my notes on two forms of metadata for each DNC email at Wikileaks (link):
its sent date and time;
the date and time of underlying eml. (The eml document is available at the View Source tab.)
My interest in this topic arose from my prior Climategate experience, where metadata of Climategate documents yielded solid information on the time period of the operation.
To my knowledge, no cybersecurity firm or news organization has done any reporting on DNC email metadata; the analysis has been done entirely in “this corner” of Twitter, especially Adam Carter (@with_integrity), Forensicator and @Tralfamadorenik.
sent_dates (2017)
The first quantitative analysis of DNC email metadata (to my knowledge) was my September 2, 2017 analysis (link) of sent dates. At the time, the emails were typically described as extending from January 2015 to May 2016, with the implication that the hack had been a long-term operation. Questioning of Crowdstrike by House Intelligence Committee members, for example, paid particular attention to whether earlier warning to the DNC from the FBI could have prevented the DNC email hack - the form of the question assuming that the hack had been a sort of long-term surveillance.
The sent_dates did extend from January 2015 to May 2016, but a count of emails per day shows that nearly all of the emails were sent between April 19, 2016 and May 25, 2016. Over 98% of the hacked emails were sent after the April 22, 2016 “staging” that Crowdstrike drew attention to (contradicting the implication that the DNC emails had themselves been staged). In fact, the majority of hacked DNC emails were sent after Crowdstrike installed their monitoring software, i.e. the hack occurred on Crowdstrike’s watch, not prior to their arrival (as they had implied). Crowdstrike not only failed to prevent the hack but, as it turned out, failed even to detect it. So how would earlier installation of their software have made a difference?
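The per-day count described above is easy to reproduce. The following is only a minimal sketch: the function names and the four sample timestamps are made up for illustration and are not taken from the Wikileaks archive or from the 2017 analysis.

```python
from collections import Counter
from datetime import date, datetime, timezone

def emails_per_day(sent_times):
    """Count emails by the calendar day of their sent time."""
    return Counter(t.date() for t in sent_times)

def share_after(sent_times, cutoff):
    """Fraction of emails sent strictly after the cutoff date."""
    if not sent_times:
        return 0.0
    later = sum(1 for t in sent_times if t.date() > cutoff)
    return later / len(sent_times)

# Hypothetical sample: one old email and three from spring 2016.
sample = [
    datetime(2015, 1, 10, 14, 0, tzinfo=timezone.utc),
    datetime(2016, 4, 25, 9, 30, tzinfo=timezone.utc),
    datetime(2016, 5, 20, 16, 45, tzinfo=timezone.utc),
    datetime(2016, 5, 20, 18, 5, tzinfo=timezone.utc),
]

counts = emails_per_day(sample)
for day, n in sorted(counts.items()):
    print(day, n)
print(share_after(sample, date(2016, 4, 22)))
```

Run against the full set of sent times, the same two functions would show both the long thin tail back to 2015 and the concentration of emails in the final weeks.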
Around the same time, wh1sks (see Github here) analysed sent times broken down by the 10 different targets in the Wikileaks archive. For each individual, he compiled both the latest sent time and the earliest sent time, and deduced that five targets (Comer, Allen, Parrish, Wright) had been exfiltrated on May 25, 2016; Kaplan on both May 23 and May 25, 2016; Crystal and Banfill on late May 22 and May 23 respectively; Miranda on May 19 and May 23, 2016; and Brinster the earliest on May 19, 2016.
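The per-target comparison can be sketched in the same way. This is an illustration only, not wh1sks' code: the function name, the mailbox labels and the timestamps below are all hypothetical.

```python
from datetime import datetime

def sent_range_by_target(records):
    """Map each target to its (earliest, latest) sent time.

    records: iterable of (target, sent_time) pairs.
    """
    ranges = {}
    for target, sent in records:
        if target in ranges:
            earliest, latest = ranges[target]
            ranges[target] = (min(earliest, sent), max(latest, sent))
        else:
            ranges[target] = (sent, sent)
    return ranges

# Hypothetical records for two made-up mailboxes.
records = [
    ("mailbox_a", datetime(2016, 4, 26, 10, 0)),
    ("mailbox_a", datetime(2016, 5, 25, 12, 15)),
    ("mailbox_b", datetime(2016, 4, 19, 8, 30)),
    ("mailbox_b", datetime(2016, 5, 19, 17, 40)),
]

ranges = sent_range_by_target(records)
for target, (earliest, latest) in sorted(ranges.items()):
    print(target, earliest, latest)
```

The latest sent time in a mailbox gives a lower bound on when that mailbox was copied, which is the basis for assigning an exfiltration date to each target.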
I am unaware of any relevant technical discussion of DNC email metadata in 2018. As discussed in my recent article (link), in his book published in June 2018 (link; Isikoff podcast around minute 24; Isikoff covering article), former DNI James Clapper placed the exfiltration in April 2016 - a date that even the most trivial examination of DNC email metadata shows to be impossible. The same is true of Shawn Henry’s assertion that the emails had been “staged” for exfiltration on April 22, 2016 (see link).
eml_dates (2019)
Carter 2019-01-27
On January 27, 2019 - 2.5 years after Wikileaks published the DNC emails - Adam Carter (@with_integrity) made the first report (to my knowledge) on the timestamp metadata of the Wikileaks DNC emails as eml documents (link; archive). In retrospect, it’s quite amazing that no one seems to have done this previously.
Associated with the webpage for each email in the Wikileaks DNC archive was an underlying document: the email in *.eml (Thunderbird) format. Carter collated the eml_date, eml_time and size for each of the 44,053 emails. Carter observed that the eml metadata for the Wikileaks DNC emails contained four dates: May 23, May 25, August 26 and September 21, 2016. The first tranche of DNC emails (published on July 22, 2016) had May 23 and May 25 eml_dates. The second and final tranche of DNC emails (which attracted little attention) was published on November 7, 2016 and had eml_dates of August 26, 2016 and September 21, 2016.
Carter also compiled eml metadata for the Podesta emails and observed that all but 2 Podesta emails had eml_dates of September 19, 2016 (with two anomalous eml_dates of November 6, 2017.)
Carter also observed that the first three batches of DNC emails (May 23, May 25 and August 26) were in FAT format. FAT format rounds the timestamp to the nearest even second, as opposed to the millisecond detail available in non-FAT formats. The FAT format timestamps were subsequently shown by Forensicator to have been in the local time of the computer; as discussed below, Forensicator deduced the local timezone from collateral information.
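The even-second signature is simple to test for. A minimal sketch follows, with a hypothetical function name and made-up timestamps; it is not Carter's code. If seconds were otherwise arbitrary, each timestamp would have roughly a one-in-two chance of landing on an even second, so the test only means something over a large batch.

```python
from datetime import datetime

def looks_fat_rounded(timestamps):
    """True if every timestamp has an even second and no sub-second part."""
    return all(t.second % 2 == 0 and t.microsecond == 0 for t in timestamps)

# Hypothetical batches: the first has only even whole seconds,
# the second carries sub-second detail and an odd second.
batch_even = [
    datetime(2016, 5, 23, 2, 10, 4),
    datetime(2016, 5, 23, 2, 10, 18),
    datetime(2016, 5, 23, 2, 11, 0),
]
batch_fine = [
    datetime(2016, 9, 21, 9, 0, 3, 250000),
    datetime(2016, 9, 21, 9, 0, 7),
]

print(looks_fat_rounded(batch_even))
print(looks_fat_rounded(batch_fine))
```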
FAT formats were commonplace in thumb drives, giving rise to Carter’s speculation that “2 of them [batches] could have ended up on a USB device around the dates of acquisition”. Bruce Leidl later observed that some zip programs also store timestamps in FAT format, so that FAT format did not prove use of a thumb drive.
Johnson and Binney, 2019-02-13
On February 13, 2019, Larry Johnson and William Binney (both of VIPS) published an article (link) that (apparently independently) reported FAT format in Wikileaks eml’s.
Analytically, the Johnson and Binney article (also see their followup here) regressed substantially from the Carter tweet. Their analysis was based on a sample of 500 emails, as opposed to the comprehensive census of over 44,000 emails that was carried out by Carter. They also failed to notice the tranche of emails datestamped on 2016-09-21 - which were not FAT format.
They also totally muddied the water through a calculation of copying rates implied by timestamps in the Guccifer 2 ngpvan.7z zipfile (49.1 megabytes per second) - a different dataset with a July 5, 2016 datestamp, six weeks later than the DNC email exfiltration. They observed that this copying rate was too rapid for a hack and postulated that the DNC emails had been exfiltrated on a thumb drive - a theory that has become widespread in “skeptic” circles.
Carter 2019-02-17
A few days later (February 17, 2019), Carter published an article (here) that made the first attempt to calculate the exfiltration rate of the DNC emails from eml timestamps. This was possible only for the emails in the first Wikileaks tranche (the emails with datestamps May 23, 2016 and May 25, 2016). Carter sorted the emails by timestamp and then plotted the cumulative size of the emails against the cumulative time of exfiltration, as shown in the diagram below.
Carter obtained an exfiltration rate of approximately 3 MB/second, an order of magnitude less than the 49 MB/second copying rate cited by Johnson and Binney in regard to the Guccifer 2 ngpvan documents. But while Carter’s procedure was directionally correct, the correct answer using his data was only about 400 KB/second, as Forensicator reported on April 10, 2019 (discussion follows below). This was two orders of magnitude less than the 49 MB/second speed discussed by Johnson and Binney.
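The procedure of sorting by timestamp, accumulating size and dividing by elapsed time can be sketched as follows. This is an illustration under stated assumptions, not Carter's or Forensicator's code: the function name and the three sample files are made up, and the sketch assumes each timestamp marks the moment a file finished being written.

```python
from datetime import datetime

def average_rate(files):
    """Average bytes per second over a batch of (timestamp, size_bytes) pairs.

    Assumes each timestamp marks the end of that file's write, so the first
    file is excluded: its transfer began before the first timestamp.
    """
    ordered = sorted(files, key=lambda f: f[0])
    elapsed = (ordered[-1][0] - ordered[0][0]).total_seconds()
    if elapsed <= 0:
        raise ValueError("need at least two distinct timestamps")
    transferred = sum(size for _, size in ordered[1:])
    return transferred / elapsed

# Hypothetical batch: three files written two seconds apart.
files = [
    (datetime(2016, 5, 25, 13, 0, 0), 100_000),
    (datetime(2016, 5, 25, 13, 0, 2), 800_000),
    (datetime(2016, 5, 25, 13, 0, 4), 800_000),
]

rate = average_rate(files)
print(rate)  # bytes per second for the made-up batch
```

One caveat on the design: an average over the whole span understates the transfer speed if the batch contains idle gaps, so a real analysis would want to split the data into continuous runs first.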
Forensicator, April 2019
On April 10, 2019, Forensicator substantially extended Carter’s analysis in the most comprehensive article on DNC metadata thus far (link), including consideration of other metadata additional to simple metrics of sent_time and eml_time.
Forensicator corrected Carter’s calculation of export speeds, obtaining a value of approximately 400 KB/sec. The validity of this correction can be easily seen: on May 25, 2016, just under 1 GB of data was exfiltrated over approximately 44 minutes. Thus, the issue is the opposite of that posed by Johnson and Binney: for the DNC emails, it’s not a question of explaining a very “high” copying speed (49 MB/second), but of a very low copying speed (400 KB/second), one that is entirely consistent with exfiltration.
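The back-of-envelope check can be written out explicitly. The sketch below uses only the round figures quoted above (1 GB, 44 minutes, 49 MB/second, 400 KB/second). The function name is hypothetical, and since the text does not say whether "GB" is decimal or binary, both readings are shown.

```python
def bytes_per_second(total_bytes, minutes):
    """Average transfer rate for a given volume and duration."""
    return total_bytes / (minutes * 60)

decimal_rate = bytes_per_second(10**9, 44)  # 1 GB read as 10^9 bytes
binary_rate = bytes_per_second(2**30, 44)   # 1 GB read as 2^30 bytes

print(round(decimal_rate / 1000))  # about 379 (thousands of bytes per second)
print(round(binary_rate / 1024))   # about 397 (units of 1024 bytes per second)

# Ratio of the ngpvan copying rate to the DNC email rate, both as quoted.
ratio = 49_000_000 / 400_000
print(ratio)  # 122.5, i.e. about two orders of magnitude
```

Either reading lands close to 400 KB/second, and because the volume was "just under" 1 GB, these figures are if anything slightly high.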
Forensicator deduced through a clever calculation that the (unknown) timezone of the eml timestamps was Pacific Time. For each of the six individuals with May 25, 2016 eml dates, he compared the sent time of the latest email (in GMT) to the eml timestamp (shown below). The eml timestamp, although necessarily written after the email was sent, read 6-7 hours earlier than the sent time. The logical conclusion is that the eml timestamps were in a local timezone at offset -07:00 from GMT, which in late May is Pacific Daylight Time. (Diagram below is from Forensicator April 10, 2019 article).
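The reasoning behind the timezone deduction can be sketched as a constraint: an eml file cannot be written before its newest email was sent, so the local timestamp minus the timezone offset must be no earlier than the sent time in GMT. The following is a minimal illustration, not Forensicator's code: the function name and the two sample pairs are made up, and only whole-hour offsets are considered.

```python
import math
from datetime import datetime

def max_utc_offset_hours(pairs):
    """Largest whole-hour UTC offset consistent with every pair.

    pairs: (last_sent_utc, eml_local) as naive datetimes. Each file must be
    written after its newest email was sent, which requires
    eml_local - offset >= last_sent_utc for every pair.
    """
    bound = min((eml - sent).total_seconds() / 3600 for sent, eml in pairs)
    return math.floor(bound)

# Hypothetical pairs: local timestamps read 6.5 and about 6.3 hours
# earlier than the latest sent times.
pairs = [
    (datetime(2016, 5, 25, 19, 30), datetime(2016, 5, 25, 13, 0)),
    (datetime(2016, 5, 25, 20, 10), datetime(2016, 5, 25, 13, 50)),
]

offset = max_utc_offset_hours(pairs)
print(offset)  # -7 for the made-up pairs
```

The constraint only gives an upper bound: offsets further west (such as -08:00) would also satisfy it, at the cost of a longer delay between the last email and the export. Settling on the bound itself assumes that the export followed shortly after the last email.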
From the above analysis, Forensicator then deduced that “the May 25 Emails Were Likely Acquired and Exported Simultaneously (and not Vetted)”. Once the eml_times were identified as Pacific timezone, the last email times and the beginning and end of export operations for each May 25 individual were organized into a tight and very convincing chronology:
The latest emails for each of the four individuals with Aug 26 and Sep 21 eml_dates were on May 23, 2016. Forensicator also re-examined the earliest emails for each individual, reporting that “99% of the DNC emails were within the 30 day retention”.
Of the May 23 individuals, only one (Jeremy Brinster) had emails before April 23, 2016: Brinster’s emails began (in volume) a few days earlier on April 19, 2016, leading to the surmise that exfiltration may have begun with him. Emails going back to 2015 (only a trickle) were limited to the May 25 individuals. Forensicator also extracted a variety of other metadata from the DNC emails: whether they were received or sent and related attributes. (I’ve recently done some re-analysis of these “too early” emails to see if there are any relevant patterns and will try to write it up.)
Conclusion
In the Climategate hack, where I had a front row seat (so to speak), eml timestamps on the emails themselves were “bleached”, but export timestamps on some documents were not, and these data yielded some insight into the Climategate hacker (when combined with other knowledge).
Here too, one way to find at least some solid ground is through metadata. The metadata reviewed here contradict the legend that the emails were “staged” in April 2016. They also clearly show that the exfiltration took place on Crowdstrike’s watch and should have been observed on May 23, 2016 and May 25, 2016.
They also contradict the curiously incorrect dates given by the Mueller investigation (“between May 25, 2016 and June 1, 2016”). While the Mueller dates are close to the correct dates, they are incorrect nonetheless, and the reason for the Mueller dates remains unexplained.
US agencies have rigorously redacted nearly everything in FOIA productions related to the DNC hack - see the Clevenger FOIA where over 900 pages were produced and over 900 pages were more or less totally redacted. It is hard to understand the continuing purpose of such redactions. An unfortunate by-product is that it permits suspicion to linger, even in circumstances where it could be dispelled.
I'm having trouble understanding what the time stamps represent since the DNC Wikileaks include emails with time stamps as late as September 21. Those late time stamps must be much later than the exfiltration since the DNC had their server rebuilt in June. The Podesta hack was in March. It seems the later time stamps had to be later handled (delivered?) tranches, especially the November Podesta emails. Only the May dates seem to represent their date of exfiltration. If so it looks like some more DNC emails came later mixed with the Podesta emails. If so, Assange would have had a second or third chance to identify the leaker. BTW, GMT plus 7 hours is mountain time or am I misunderstanding what +0700 means?
Thanks