(Update: See the answers in this blog post)
Can you identify the authors of the four “redacted” documents below?
Do you understand how to effectively redact information from a PDF? Before you say yes, take a look at these PDFs and see which redactions you can “crack”. With some technical savvy, a careful reviewer should be able to identify the author’s name in all four documents.
(Note that none of these documents has actual confidential information! The two in the middle were created just for this post.)
Hint: If you get stumped, read the pointers below.
Has your office ever released confidential information that its staff thought had been redacted? You’re not alone.
Social Security numbers, health records, student records: it’s important that we keep such confidential data secure when releasing public records. In the past, redacting confidential information from paper records was fairly straightforward: make a copy, black it out, make another copy, and voilà, a redacted paper record! When it comes to digital files, though, stories of botched redactions abound. In 2006, the U.S. Attorney’s Office got in trouble for releasing confidential information related to the BALCO baseball doping scandal, information that its staff believed had been wiped from the documents. In 2008, AT&T accidentally publicized sensitive information related to the NSA’s warrantless wiretapping program. Mistakes like these often happen when information is covered up but not actually removed, leaving the original text or image accessible to anyone with a bit of technical know-how.
In a 2005 document titled “Redacting with Confidence: How to Safely Publish Sanitized Reports Converted From Word to PDF,” the NSA explains that “The key concept for understanding the issues that lead to the inadvertent exposure is that information hidden or covered in a computer document can almost always be recovered.” The NSA advises that “the way to avoid exposure is to ensure that sensitive information is not just visually hidden or made illegible, but is actually removed from the original document.”
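To make the difference between “covered” and “removed” concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a tool for real files: the page content stream is hand-written and uncompressed, the name in it is made up, and the helper functions shown_strings and fills_rectangle are hypothetical names invented for this example. Real PDFs usually compress their streams and encode text in more complicated ways, but the principle is the same: drawing a black rectangle adds drawing instructions without deleting the text instructions that came before it.

```python
import re

# Hypothetical, hand-written page content stream, left uncompressed for clarity.
# It shows a line of text (Tj) and then fills a black rectangle (re, f)
# over the same area of the page.
CONTENT_STREAM = b"""\
BT
/F1 12 Tf
72 720 Td
(Author: Jane Example) Tj
ET
0 0 0 rg
70 716 150 14 re
f
"""


def shown_strings(stream: bytes) -> list[str]:
    """Return the literal strings handed to the Tj (show text) operator."""
    # Simplification: handles only plain (...) literals with no escapes or nesting.
    matches = re.findall(rb"\(([^()\\]*)\)\s*Tj", stream)
    return [m.decode("latin-1") for m in matches]


def fills_rectangle(stream: bytes) -> bool:
    """Report whether the stream defines a rectangle (re) and then fills it (f)."""
    return re.search(rb"\bre\s+f\b", stream) is not None
```

Calling fills_rectangle(CONTENT_STREAM) confirms that the black box is there, yet shown_strings(CONTENT_STREAM) still returns the “hidden” author line, because the box was painted on top of the text rather than replacing it.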
A few good pointers when creating redacted PDFs
1. Don’t redact it if it is not confidential. If it’s a public record, then it must be open to public inspection unless you can point to a North Carolina or federal statute that requires access to be withheld. Without such a law, the information contained in the record isn’t confidential. If you’re not sure whether the information is confidential, speak to your legal counsel. You can also consult our Laws Relating to Confidential Records Held by North Carolina Government (updated in 2009) and David M. Lawrence’s very comprehensive Public Records Law for North Carolina Local Governments (the second edition, 2009, is the most recent).
2. Remove the information using the original authoring application. What do we mean by that? Redact the information in the program in which the file was originally created. For example, if a document was written in Word, delete the confidential information and then convert the file to PDF. If it’s a TIFF image, open it in Photoshop, black out the information, make sure the image is flattened (no layers), and convert to PDF. This is less risky than attempting to redact the information after it has already been converted to PDF.
3. If you have to redact directly from the PDF, use Acrobat’s built-in “Mark for Redaction” and “Remove Hidden Information” tools. If you are using Acrobat X, check out the NSA’s Redaction of PDF Files Using Adobe Acrobat Professional X. If you have Adobe Acrobat 9 or earlier, your version may not be able to redact thoroughly. Also keep in mind that Adobe Reader (the free version of Acrobat) does not redact at all.
4. Don’t forget the hidden metadata. There is the information you can see, but there is often also embedded metadata that the authoring application inserts into the document without your ever knowing it. Microsoft Word, for instance, embeds author name, company name, creation and modified dates, and other information. Word and other applications sometimes have a “prepare for sharing,” “check for issues,” or other option to alert you to the presence of such information and to remove it.
5. Don’t presume that if you can’t see the information, other people can’t see it. Covering information up with a black box does not remove the information. There are usually smarter, more persistent people in the world, and if the information exists in the file, they will find it.
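As an illustration of pointer 4, the following Python sketch lists some of the properties that a Word .docx file can carry. It assumes the file follows the usual Office Open XML layout, in which a .docx is a ZIP archive with a docProps/core.xml part; the function names read_core_properties and make_sample_docx are hypothetical, and the sample archive holds only that one part with a made-up name, so it is a stand-in and not a complete Word document. Other metadata, such as the company name, is stored elsewhere and is not covered here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part of an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Labels for this sketch mapped to the elements we look for.
FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(docx_file) -> dict:
    """Return the core properties found in a .docx path or file-like object."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    found = {}
    for label, tag in FIELDS.items():
        node = root.find(tag, NS)
        if node is not None and node.text:
            found[label] = node.text
    return found


def make_sample_docx() -> io.BytesIO:
    """Build an in-memory stand-in archive containing only docProps/core.xml."""
    core_xml = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        "<cp:coreProperties"
        ' xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"'
        ' xmlns:dc="http://purl.org/dc/elements/1.1/"'
        ' xmlns:dcterms="http://purl.org/dc/terms/">'
        "<dc:creator>Jane Example</dc:creator>"
        "<cp:lastModifiedBy>Jane Example</cp:lastModifiedBy>"
        "<dcterms:created>2012-01-01T00:00:00Z</dcterms:created>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", core_xml)
    buffer.seek(0)
    return buffer
```

Running read_core_properties(make_sample_docx()) returns the author, last editor, and creation date from the sample. Pointing the same function at a real .docx before converting it to PDF is one quick way to see what the authoring application has recorded, although Word’s own inspection options remain the safer way to actually remove it.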