A Technical and Cultural Assessment of the Mueller Report PDF – PDF Association
About
Duff Johnson
PDF Association
It’s too bad that such important historical documents are nonetheless posted as such low-quality PDF files. We provide commentary.
Article
April 19, 2019
What can we learn about the Mueller Report from the PDF file released by the Department of Justice (DoJ) on April 18, 2019?
This article offers two things:
a brief, high-level technical assessment of the document, and
a question of culture: why everyone assumes it would be delivered as a PDF file, and would have been shocked otherwise.
The Mueller Report
A Technical and Cultural Assessment of the Mueller Report PDF
(this article)
Even with OCR, the Mueller Report PDF isn't fully searchable
DoJ reposts the Mueller Report!
Key take-aways
If Mueller delivered a “born digital” PDF to Justice, that file was printed and scanned back into a set of low-quality images for release; a disservice to all future users of the document, and also a violation of Section 508 regulations.
If Mueller delivered a paper document to the Department of Justice which was subsequently scanned, DoJ’s treatment of the document is more understandable, but still non-conforming with Section 508.
Irrespective of the evidence and conclusions about the Trump campaign, the Special Counsel’s report showcases the essential qualities of static, self-contained, reliable, sharable PDF in a world that increasingly runs on HTML.
What is “redaction”?
Redaction is the process of removing content from a document. There are various ways to achieve redaction in electronic documents, ranging from removal of content from an original source document to printing and re-scanning after redaction. Unfortunately, DoJ chose the latter approach, resulting in a pixelated, low-quality document that will make a poor showing in all subsequent uses.
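To make the distinction concrete, here is a minimal sketch of what "removing content" means, as opposed to merely covering it. The page model (`Box`, `TextRun`, `redact`) is hypothetical and invented for illustration; it is not how any particular redaction product or the PDF format itself represents text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates (hypothetical model)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def overlaps(self, other: "Box") -> bool:
        # Two rectangles overlap when they intersect on both axes.
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


@dataclass(frozen=True)
class TextRun:
    """A piece of text and the area it occupies on the page."""
    text: str
    box: Box


def redact(runs, region):
    """Drop every text run touching the region; return what remains.

    The region itself is returned so a renderer could paint it black.
    The key point: the redacted text is gone, not hidden under a shape.
    """
    kept = [run for run in runs if not run.box.overlaps(region)]
    return kept, region


page = [
    TextRun("Public sentence.", Box(72, 700, 300, 712)),
    TextRun("Sensitive name", Box(72, 680, 180, 692)),
]
remaining, black_box = redact(page, Box(70, 678, 182, 694))
print([run.text for run in remaining])
```

This sketch removes a whole run when any part of it touches the region. Handling runs that only partly overlap a region would need more care than shown here.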
“The PDF itself” - a technical assessment
We downloaded “report.pdf” (PDF, 139 MB) from the Special Counsel’s page on the US Department of Justice website.
The basic facts
From a PDF technology perspective the file uses PDF 1.6 technology and is of acceptable quality. It does not conform to ISO 19005 (PDF/A), the archival standard for PDF files. It is not digitally signed or encrypted for security.
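Two of these facts can be roughly checked from the raw bytes of a file. The sketch below is only a first-pass heuristic under stated assumptions: it reads the version from the `%PDF-x.y` header (a document catalog may declare a later version, which this ignores), and it treats the presence of the `/Encrypt` key as a hint of encryption. The function names and the sample bytes are ours, not taken from the report.

```python
import re


def pdf_header_version(data: bytes):
    """Return the version from a '%PDF-x.y' header, or None if absent.

    Only the first 1024 bytes are searched, since the header is
    expected at or near the start of the file.
    """
    match = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    return match.group(1).decode("ascii") if match else None


def mentions_encrypt(data: bytes) -> bool:
    """Crude hint: does the /Encrypt key appear anywhere in the bytes?

    A real check would parse the trailer or cross-reference stream;
    a raw byte search can produce false positives.
    """
    return b"/Encrypt" in data


# Tiny synthetic sample, NOT the contents of report.pdf.
sample = b"%PDF-1.6\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
print(pdf_header_version(sample), mentions_encrypt(sample))
```

To try it on a real file, read the bytes with `open(path, "rb").read()` and pass them to both functions. PDF/A conformance and digital signatures cannot be judged this way and need a proper validator.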
Based on its metadata, the PDF released by the Department of Justice was produced using Ricoh MP 6C502 software, probably a typical office network copier / printer. The file was produced on April 17 after 6:23 pm.
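PDF metadata stores timestamps as strings of the form `D:YYYYMMDDHHmmSS` followed by an optional time zone offset. As a rough sketch of how such a value turns into a readable date, the parser below handles the common cases. The sample string is illustrative only: its seconds and offset are assumptions, not values read from report.pdf.

```python
import re
from datetime import datetime, timedelta, timezone

# Year is required; every later field is optional in a PDF date string.
_PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+-])(\d{2})?'?(\d{2})?'?)?"
)


def parse_pdf_date(value: str) -> datetime:
    """Parse a PDF date string into a datetime (naive if no zone given)."""
    m = _PDF_DATE.match(value)
    if not m:
        raise ValueError(f"not a PDF date string: {value!r}")
    year = int(m.group(1))
    month = int(m.group(2) or 1)   # missing month/day default to 1
    day = int(m.group(3) or 1)
    hour, minute, second = (int(m.group(i) or 0) for i in (4, 5, 6))
    sign, off_h, off_m = m.group(7), m.group(8), m.group(9)
    if sign is None:
        tz = None
    elif sign in "Zz":
        tz = timezone.utc
    else:
        offset = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
        tz = timezone(offset if sign == "+" else -offset)
    return datetime(year, month, day, hour, minute, second, tzinfo=tz)


# Hypothetical value shaped like a scanner-written creation date.
stamp = parse_pdf_date("D:20190417182300-04'00'")
print(stamp.isoformat())  # 2019-04-17T18:23:00-04:00
```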
Scanning
Analysis:
DoJ’s choice to deliver an “images only” PDF results in a much larger file and the loss of searchable text. Effectively, this process “dumbed down” the PDF to a set of images – the same type of content that comes out of a scanner. Admittedly, it is also a crude but effective means of ensuring (beyond redaction) that nothing is released besides images of pages... but the redaction software available to DoJ (see below) is fully effective at redacting born-digital PDF files, so image conversion was unnecessary.
From the scanner artifacts left on the images (e.g. the horizontal yellow streak and the gray vertical streak on the right edge) and the voluminous compression artifacts, we assess that the document has certainly been scanned and compressed at least once and more probably twice.
Although DoJ did not OCR the report prior to its release, those downloading the file are free to use their own OCR. Results will not be ideal or identical since the source images are of relatively low quality. In particular, OCR errors will be more common adjacent to underlines and redactions.
Analysis:
We assess that the document was most likely scanned twice, with redactions being added to the first scanned document using software. This implies that the document may have been provided to DoJ on paper rather than as an electronic document. If it was provided by Mueller to DoJ electronically, then printing it just to scan it back into another, far larger and less capable PDF is difficult to understand.
The US Department of Justice has a clear policy of ensuring that public documents comply with Section 508 regulations, and are therefore accessible to users with disabilities. The Mueller Report PDF does not conform with these regulations.
If the Mueller report had been delivered to DoJ as a high-quality born-digital PDF, it would have been tagged from the outset. DoJ could have easily redacted it without resorting to printing the result and re-scanning the printed paper.
Analysis:
If Mueller had delivered a paper document instead of a PDF, then DoJ's process, while not best practice or even within the regulations, is more understandable due to time pressures. If Mueller had delivered a high-quality PDF, however, then it's exceptionally unfortunate that DoJ chose to “dumb it down” when processing and releasing it.
Bookmarks
Analysis:
This fact implies that DoJ broke up the Mueller Report into subsections matching the bookmarks to allow a team to collaborate over adding redaction annotations to these subsections, then reassembling the document and outputting a finished, redacted PDF file.
Redactions
Due to their consistency and regularity of form and application, it’s clear that the redactions were performed by software rather than manual methods (i.e., to a printed document). The redaction implementation (style, spacing, label) is completely consistent throughout the document, indicating expert use of professional-class redaction software.
Using high-quality redaction software allows organizations to collaborate effectively on such projects, ensuring that the type of redaction used, as well as the color-codes and other features, are consistent for all collaborators. It is to be expected that DoJ possesses and is expert in the use of such software.
Instead of delivering “native” redactions, however, it's obvious that DoJ printed and then scanned the document after it was redacted. We know this because on many pages a scanner artifact (the faint yellow line) crosses a redacted area. This deliberate and unnecessary act made the document substantially harder for anyone and everyone to use, forever.
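The reasoning is simple: a software-drawn redaction box is uniformly black, so any lighter line running straight through it must have been added after the box was drawn, i.e. by a scanner. A toy sketch of that check follows, using a hand-made grayscale grid rather than pixels from the actual report; the function name, the threshold and the data are all illustrative assumptions.

```python
def streak_rows(gray, box, threshold=40):
    """Return rows inside the box where every pixel is lighter than black.

    gray is a list of rows of 0-255 values (0 = black).
    box is (top, left, bottom, right), with bottom/right exclusive.
    """
    top, left, bottom, right = box
    return [
        y for y in range(top, bottom)
        if all(gray[y][x] > threshold for x in range(left, right))
    ]


# Toy 6x6 "page": white background with a black redaction box.
page = [[255] * 6 for _ in range(6)]
for y in range(1, 5):
    for x in range(1, 5):
        page[y][x] = 0

# Simulate a faint horizontal scanner streak across row 3.
for x in range(6):
    page[3][x] = max(page[3][x], 60)

print(streak_rows(page, (1, 1, 5, 5)))  # [3]
```

A row of lightened pixels inside the box is the signature of an artifact laid over the redaction, which is what the faint yellow line suggests.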
Analysis:
I asked Mark Gavin, CTO of Appligent Document Solutions, and the developer of the first PDF redaction tool, for his comments on the redaction method used in the Mueller Report. Mark said:
“Native PDF redaction has been available now for more than 20 years, yet this document is just images of redacted pages.  As such, there is no searchable text, the document will not reflow on different devices and most importantly this document is not Section 508 compliant.  The document cannot be read by a screen reader for people with visual disabilities and it cannot be analyzed using any text analysis tools. The Mueller Report as a redacted PDF document is really kind of sad.”
Technical assessment: conclusion
It's interesting - and deeply unfortunate - that DoJ clearly used advanced redaction software but nonetheless chose to deliver a paper-age “images only” PDF. In so doing they:
Dramatically increased the file’s size, probably by 8-10x.
Permanently and substantially reduced the visual text and image quality of a document of historical interest.
Permanently reduced text searchability (assuming they received a searchable PDF from Mueller).
Delivered a document that’s inherently inaccessible to users who require assistive technology (AT) in order to read, requiring substantial remediation efforts to recover any useful degree of accessibility, let alone full compliance with applicable regulations.
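For a sense of scale, the size estimate can be turned around: if the 139 MB release really is 8-10x larger than it needed to be, a born-digital equivalent would have been roughly 14 to 17 MB. The few lines below just do that division. The 8-10x factor is our estimate from above, not a measurement.

```python
RELEASED_MB = 139  # size of report.pdf as downloaded


def implied_size_mb(released_mb: float, inflation_factor: float) -> float:
    """Size a file would have had before being inflated by the given factor."""
    return released_mb / inflation_factor


for factor in (8, 10):
    print(f"{factor}x inflation implies about "
          f"{implied_size_mb(RELEASED_MB, factor):.1f} MB")
```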
“Just PDF it” – the cultural meaning of PDF
Everyone knew that the US Department of Justice and Attorney General Barr would release the Mueller Report as a PDF file.
In fact, it’s safe to say that AG Barr never considered delivering anything else. No one would have even suggested a Word file, or a set of TIFF images, or a website, or an XPS file, or EPUB, or plain text. It’s 2019, but it seems safe to say that they simply assumed they’d use PDF.
If you are like most people, you simply assumed it would be a PDF as well.
Why?
Authenticity
Once he was done writing and editing, Mueller needed to unambiguously “freeze” or fix his document for the purposes of submitting a report. PDF is the only mainstream document format offering this capability.
Why is the fixed nature (“rendering”) so important? A fixed rendering carries the clues humans use to judge authenticity, such as layout, formatting, dates, logos and signatures, along with many other, more subtle cues.
Everyone knows this, which is why people exchange contracts rather than simply share access to a wiki page. The need for a rendering made it easy to predict ahead of time that Barr would release the Mueller report as a PDF, and would never have considered converting its text to DOCX, or posting the text as HTML on a website.
In releasing the redacted PDF of the report to the public, Barr avoids suspicion that the document has been edited (changed) beyond straightforward redactions. PDF serves the need to unambiguously assure the press and the public that they are seeing Mueller’s actual report.
Redaction
PDF is the only electronic document format that fully supports redaction. The alternative is a printer + a pair of scissors, grease pen or opaque tape. There's really no model for redaction of HTML-based web content. Redaction of DOCX files, while possible, doesn't provide any assurance about the content prior to its redaction.
Text search
A PDF file, whether “born digital” or scanned from paper, can be made text-searchable and accessible. Barr chose to release scanned pages instead of searchable, accessible pages, but this may have been due to time pressures.
Ease of use
These days, PDF is supported – at least to some degree – by most browsers and on most platforms, in addition to the thousands of PDF technology-specific applications on the market.  Using PDF ensures that all of those diverse users will get a consistent, reliable PDF rendering (although, in the case of this report, sadly, no tags for accessibility, as required by law).
Unfortunately, the image-based PDF the Department of Justice delivered is the least easy-to-use of any option they could have chosen.
PDF: there is no substitute
PDF is the only document format capable of carrying the cultural and technical requirements for important communications in the modern age.