Who Wrote It? Determining Authorship using Electronic Document Forensic Techniques

By Joe Bartling, Bartling Forensic LLC

On a recent case, a client, a non-profit organization, had a dilemma. An article had appeared in an online newspaper disparaging the organization’s newly installed president. It was clear from the article and its distribution that the author, and whoever was behind it, intended to get the president terminated or force him to resign.

The article included quotes from at least five emails sent confidentially on the corporate email system between some of the organization’s top leaders, including the president. The stated author of the article did not work for the organization and had no access of any kind to its email system.

So began our assignment. We framed the case around three questions:

  1. Who “really” wrote the article, if not the stated author?
  2. Who leaked the confidential emails to the author of the article?
  3. Who was involved in the plot to write the article?

The Board of Directors rightly saw that there was an issue of trust among the senior staff. The Board had unanimously appointed the new president to make needed changes to the organization, changes that several of the existing staff vocally resisted. Many of the staff didn’t want any change at all, but other stakeholders recognized that change was needed to keep the organization relevant and financially viable in the future. The new president was an “outsider”, highly qualified to implement organizational change but, in the view of many of the staff, not one of “them”.

In order to identify who wrote the article and identify the individuals who were behind its publication, we collected electronic evidence from the organization and hosted it in Bartling Forensic’s cloud-based review platform.

Collecting the First Batch of Email

Since we knew that the article contained exact quotes from confidential emails on the organization’s email system, we started by identifying which custodians were on those email chains and collecting their mailboxes. At first, we collected 6-8 email archive files from the corporate Microsoft Office 365 system for the first batch of custodians: staff members who were either on the original chains of the leaked emails or were suspected of having knowledge of the article and its publication.

Some of the staff members had simultaneously participated in a Facebook group calling for the ouster of the president. Some of them became “Facebook Friends” and used Facebook messaging to communicate with each other privately and avoid using corporate email. But because some of them had Facebook email notifications turned on, with a corporate address as the destination, our collection of corporate email included many of these “private” conversation threads.

Collection of PC Hard Drives

For a few of the custodians, the hard drives of their corporate computers were imaged and processed, and the resulting documents were added to the online document repository alongside the email for review.

Draft Copies of the Article on the Corporate Email System

As we reviewed the documents we had collected from the organization’s email system, we were somewhat surprised to find five different draft versions of the article in the corporate system, with different dates and some differing metadata, all dated prior to the publication of the article on an external website. A review of who sent and received each copy, and whether and when it was sent to private or organizational email accounts, showed how the article evolved over time. Because each distinct version has its own MD5 hash value, comparing hashes let us tell the versions apart, and combined with the email records this helped us determine which custodians made which changes, and when.
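To illustrate the hash comparison, here is a minimal Python sketch, not the tooling used on the case. It assumes the attachments have already been extracted from the email archives as tuples of custodian, sent timestamp, and file bytes; the function names and the tuple layout are hypothetical.

```python
import hashlib
from collections import defaultdict


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of a file's bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


def group_versions(attachments):
    """Group attachment sightings by content hash.

    attachments: iterable of (custodian, sent_at, payload_bytes) tuples
    (a hypothetical layout). sent_at should sort chronologically, for
    example an ISO 8601 string.

    Returns a list of (md5, sightings) pairs ordered by the earliest
    sighting of each version; sightings are (sent_at, custodian) tuples.
    """
    versions = defaultdict(list)
    for custodian, sent_at, payload in attachments:
        versions[md5_hex(payload)].append((sent_at, custodian))
    for sightings in versions.values():
        sightings.sort()
    return sorted(versions.items(), key=lambda item: item[1][0])
```

Identical hashes mean byte-identical copies, so a draft forwarded unchanged shows up as one version with several sightings, while any edit at all produces a new hash. The hash says only that something changed; working out what changed still requires comparing the text of the versions.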

The draft copies of the article also proved a valuable source of metadata. The earliest copy of the article had metadata indicating that it was created using Microsoft Word on a Macintosh computer with a generic Microsoft user account. The organization, however, used Microsoft Windows with Microsoft Office, including Microsoft Word (not Macintosh), and its staff signed in with login names, which were usually captured in the Word documents they created. This suggested that the earliest draft had not been created on one of the organization’s computers.
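As a rough illustration of where this kind of metadata lives, the following Python sketch reads the author, last-modified-by, date, and application fields from a Word document. It assumes the drafts are in the .docx (Office Open XML) format, which the case description does not state; older .doc files store the same kind of information differently and would need another tool. The function name `docx_metadata` is ours.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the document property parts of a .docx package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}


def _text(root, path):
    """Return the text of the first matching element, or None."""
    node = root.find(path, NS)
    return node.text if node is not None else None


def docx_metadata(source):
    """Read basic properties from a .docx file.

    source: a file path or a binary file-like object.
    Fields missing from the document come back as None or are omitted.
    """
    meta = {}
    with zipfile.ZipFile(source) as zf:
        names = set(zf.namelist())
        if "docProps/core.xml" in names:
            core = ET.fromstring(zf.read("docProps/core.xml"))
            meta["creator"] = _text(core, "dc:creator")
            meta["last_modified_by"] = _text(core, "cp:lastModifiedBy")
            meta["created"] = _text(core, "dcterms:created")
            meta["modified"] = _text(core, "dcterms:modified")
        if "docProps/app.xml" in names:
            app = ET.fromstring(zf.read("docProps/app.xml"))
            meta["application"] = _text(app, "ep:Application")
    return meta
```

Running this over each draft and laying the results side by side makes a generic user name or an unexpected application string stand out quickly. These fields are easy to edit, so they are best treated as leads to corroborate rather than as proof.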

Analyzing the Article for Authorship using Stylometry

Although the article, in at least one of its draft versions, had an author identified in the text, we could not be sure that the named author was its actual author. The named author had other articles attributed to him/her, and several of the other custodians had published articles as well. So we used stylometry to compare known articles written by some of the custodians to the subject article. Stylometry uses objective statistics about document content for comparison purposes. In a shocking literary discovery, stylometry was recently used to identify that Robert Galbraith, the “first-time” author of the critically acclaimed novel The Cuckoo’s Calling, was none other than the UK’s best-selling author ever, Harry Potter creator J.K. Rowling.

Some of the statistics we compare in stylometry are as follows:

  1. Word lengths
  2. Sentence lengths
  3. Paragraph lengths
  4. Use of letters
  5. Use of punctuation
  6. Use of function words
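
A few of the statistics listed above can be sketched in Python as follows. This is a toy illustration, not one of the stylometry tools used on the case: the short function-word list, the feature names, and the plain sum-of-differences distance are all simplifying assumptions. Real tools use far more features and typically standardize them across a reference corpus before comparing.

```python
import re
from collections import Counter

# A tiny, illustrative function-word list (an assumption, not a standard set).
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "it", "for")


def features(text):
    """Compute a small stylometric feature vector for one text."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)
    counts = Counter(words)
    vec = {
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "punct_per_word": len(re.findall(r"[,;:.!?]", text)) / n_words,
    }
    for fw in FUNCTION_WORDS:
        vec["fw_" + fw] = counts[fw] / n_words
    return vec


def distance(a, b):
    """Sum of absolute feature differences (crude and unscaled)."""
    return sum(abs(a[key] - b[key]) for key in a)


def rank_candidates(unknown_text, known_texts):
    """Rank candidate authors by closeness to the unknown text.

    known_texts: dict mapping candidate name -> sample of known writing.
    Returns (name, distance) pairs, closest first.
    """
    target = features(unknown_text)
    scored = [(name, distance(target, features(sample)))
              for name, sample in known_texts.items()]
    return sorted(scored, key=lambda pair: pair[1])
```

A ranking like this only says which candidate’s known writing is closest on these measures; turning that into a probability requires enough sample text per candidate and some form of validation against texts of known authorship.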

Using various stylometry tools, we were able to identify the likely genuine author of the article with up to 80% probability. The leaked emails quoted in the article, which were written by other people, diluted the author’s own style and prevented an even more accurate determination of the author’s identity.

Who Had Electronic Possession of the “Leaked” Emails?

We collected another dozen or so email archives for other custodians affiliated with our first set of custodians to help us determine who had possession of the “leaked” emails that ended up in the article. This exercise turned out to be extremely helpful in that we were also able to determine who DIDN’T have access to the emails in question. This led to the exoneration of a number of custodians who, although they advocated reform, including the president’s departure, had nothing to do with the article or with the “leaking” of the emails to the article’s author.

Who “Knew” and had “What Kind” of Relationship with the Article’s Author?

We reviewed documents in our Bartling Forensic cloud-based document repository using social network analysis and link analysis, which map out the relationships and contacts between custodians through emails, calendar appointments, contact lists, etc. We were able to determine which custodians had what type of contact with the article’s author, when those contacts were made, and in some instances, what those contacts were about. For example, we were surprised to find that one senior staff member, who had no reasonable need for a relationship with the article’s author, had put an entry in his Microsoft Outlook calendar dated one week before anyone else, as far as we were aware, had knowledge of the article. The entry, for an off-site meeting at 8:30 PM on a weekday, read “Meeting with [author name], re: Article”. Wow, the smoking gun! Right there in our Exchange server!
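
The core idea of link analysis can be sketched in a few lines of Python. This is an illustration only, not the review platform’s implementation; it assumes the emails, calendar entries, and contact records have already been reduced to hypothetical (date, kind, participants) tuples.

```python
from collections import defaultdict


def contact_graph(events):
    """Build an undirected contact graph from communication events.

    events: iterable of (date, kind, participants), where date sorts
    chronologically (for example "2016-01-15"), kind is a label such as
    "email" or "calendar", and participants is a list of names.

    Returns a dict mapping each sorted pair of people to the list of
    (date, kind) contacts between them.
    """
    edges = defaultdict(list)
    for date, kind, participants in events:
        people = sorted(set(participants))
        for i, first in enumerate(people):
            for second in people[i + 1:]:
                edges[(first, second)].append((date, kind))
    return edges


def first_contacts(edges, target):
    """List everyone linked to target with their earliest contact.

    Returns (person, (date, kind)) pairs, earliest contact first.
    """
    earliest = {}
    for (first, second), contacts in edges.items():
        if target in (first, second):
            other = second if first == target else first
            earliest[other] = min(contacts)
    return sorted(earliest.items(), key=lambda item: item[1])
```

Sorting each custodian’s earliest contact with the article’s author is what surfaces an outlier such as a calendar entry that predates everyone else’s known involvement.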

This particular custodian had not been on our radar as significant, and had been identified only in the second, wider collection of custodian emails. During the course of our investigation, he resigned from the organization and moved on to his next career post at another non-profit. Surely he knew that eventually his deeds would be discovered, and they were!


In our final forensic report we were able to explain, with electronic evidence and forensic analysis with some degree of likelihood, 1) who leaked the emails, 2) who wrote the article, 3) which staff members were involved, and more importantly, 4) which staff members were NOT involved in the leaking of emails and in the idea of publishing a disparaging article that reflected poorly on the organization, its Board of Directors, and its president.  Our client was very pleased that we could answer the important questions that they were asking.

Copyright 2016 Bartling Forensic LLC
