Millions of PDF invisibly embedded with your internal disk paths

I found an interesting privacy issue while analyzing PDF files. It occurs when you use Internet Explorer to print locally saved web pages to PDF, and it affects all IE versions, including IE8. It does not matter which PDF generation software you use (Adobe Acrobat Professional, CutePDF, PrimoPDF, etc.), as long as you invoke it from the IE print function. On Windows, even when IE is not your default browser, right-clicking a file and selecting PRINT from the context menu invokes the IE print handler by default, so you will still see this issue in the generated PDF.

This bug is NOT ABOUT the local disk path appearing in the FOOTER of your PDF, which is clearly visible and already known to most people. That is easy enough to hide: go to File -> Page Setup and change the Footer value from “URL” to “-Empty-”. After doing that, you would not expect your internal disk path to be put anywhere else. However, that is not the case.

The privacy issue arises from the fact that your local disk path gets invisibly embedded inside your PDF in the title attribute. You will see it only when you open the file in an editor like Notepad. Currently, there is no option in IE to disable this. The only workaround is to manually blank out this value by editing the PDF file. Note that this problem does not occur with other browsers such as Firefox and Chrome. In fact, Chrome also handles the footer issue intelligently, showing your disk path as “…” rather than exposing it.
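
The manual workaround mentioned above can be scripted. What follows is only a rough Python sketch, not a vetted tool: it assumes the path is stored as plain, uncompressed single-byte text (as in the XMP metadata), and the names `PATH_RE`, `blank_local_paths`, and `scrub_pdf` are made up for illustration. It overwrites each path with spaces of the same length so that the byte offsets recorded elsewhere in the PDF stay valid.

```python
import re

# Assumption: the path is plain, uncompressed text. The match stops at "<"
# (end of an XMP value), ")" (end of a PDF literal string), or a line break.
PATH_RE = re.compile(rb'(?:mhtml:)?file://[^<)\r\n]*')


def blank_local_paths(data):
    """Return (cleaned_bytes, number_of_paths_blanked)."""
    # Replacing each match with the same number of spaces keeps the file
    # length and all cross-reference offsets unchanged.
    return PATH_RE.subn(lambda m: b' ' * len(m.group(0)), data)


def scrub_pdf(src, dst):
    """Copy src to dst with every embedded file:// path blanked out."""
    with open(src, 'rb') as f:
        data = f.read()
    cleaned, count = blank_local_paths(data)
    with open(dst, 'wb') as f:
        f.write(cleaned)
    return count
```

A path that itself contains “)” would only be blanked up to that character, and paths inside compressed or encrypted objects would be missed entirely. A digitally signed PDF would also stop verifying after such a change, so treat this as a starting point and re-check the output.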

Proof of Concept:

Steps to reproduce:
1. Pick a .HTM, .HTML, or .MHT file on your local computer.
2. Open this file in IE and press Ctrl-P,
   OR right-click the file in Explorer and select PRINT from the context menu.
3. Select any PDF writer as the printer, such as Adobe PDF, CutePDF, or PrimoPDF.
4. Click Print. When the PDF writer asks for a filename, provide any name.
5. Open the generated PDF in Notepad and search for “file://” (without quotes).
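
The check in the last step can also be run over many files at once. Here is a minimal Python sketch, assuming the path appears as plain single-byte text in the file; `LOCAL_PATH_RE` and `find_local_paths` are hypothetical names chosen for this example.

```python
import re
import sys

# Assumption: the path is stored uncompressed, as in the XMP metadata.
# Strings stored as UTF-16 or inside compressed streams will be missed.
LOCAL_PATH_RE = re.compile(rb'(?:mhtml:)?file://[^<)\r\n\x00]*')


def find_local_paths(data):
    """Return every file:// path found in the raw bytes of a PDF."""
    return [m.group(0).decode('latin-1') for m in LOCAL_PATH_RE.finditer(data)]


if __name__ == '__main__':
    # Usage: python find_paths.py one.pdf two.pdf ...
    for name in sys.argv[1:]:
        with open(name, 'rb') as f:
            for path in find_local_paths(f.read()):
                print(f'{name}: {path}')
```

An empty result does not prove a file is clean, for the reasons noted in the comments; it only means no plain-text path was found.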

To find affected files, run this query on your favorite search engine (Google/Bing):

filetype:pdf file c (htm OR html OR mhtml)

Google Search 1 (for drive C) – 4 million results
Google Search 2 (for drive D) – 13 million results
and so on… (I added up the results through drive letter J, and the total was more than 50 million.)

So, out of 280 million PDFs accessible on the internet, about 20% appear to be exposing internal disk paths, which is a huge number. I have contacted the Microsoft and Adobe security teams about this issue. Microsoft plans to fix this in IE9, while Adobe has opened a case but hasn’t set a timeline yet.

Examples:

http://www.eda.gov/PDF/EDA_vol1;%20Issue10.pdf

<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 4.0-c316 44.253921, Sun Oct 01 2006 17:14:39">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:dc="http://purl.org/dc/elements/1.1/">
         <dc:format>application/pdf</dc:format>
         <dc:creator>
            <rdf:Seq>
               <rdf:li>LewtasS</rdf:li>
            </rdf:Seq>
         </dc:creator>
         <dc:title>
            <rdf:Alt>
               <rdf:li xml:lang="x-default">file://C:\Documents and Settings\lewtass\Desktop\eda newsletter</rdf:li>
            </rdf:Alt>
         </dc:title>
      </rdf:Description>

http://www.oregon.gov/OMD/OEM/plans_train/grant_info/fy2009_hsgp_investment_justification.pdf

<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="3.1-701">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
         <pdf:Producer>Acrobat Distiller 7.0.5 (Windows)</pdf:Producer>
      </rdf:Description>
      <rdf:Description rdf:about=""
            xmlns:xap="http://ns.adobe.com/xap/1.0/">
         <xap:CreatorTool>PScript5.dll Version 5.2.2</xap:CreatorTool>
         <xap:ModifyDate>2009-03-18T15:07:10-07:00</xap:ModifyDate>
         <xap:CreateDate>2009-03-18T15:07:10-07:00</xap:CreateDate>
      </rdf:Description>
      <rdf:Description rdf:about=""
            xmlns:dc="http://purl.org/dc/elements/1.1/">
         <dc:format>application/pdf</dc:format>
         <dc:title>
            <rdf:Alt>
               <rdf:li xml:lang="x-default">mhtml:file://O:\fema\shsp_2009\draft ijs\fy 2009 investment jus</rdf:li>
            </rdf:Alt>
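
The `dc:title` entries in the excerpts above can also be read with an XML parser instead of by eye. The sketch below uses Python's standard `xml.etree.ElementTree` and assumes you have already extracted a complete XMP packet (from `<x:xmpmeta` to `</x:xmpmeta>`) as text. `xmp_titles` is a made-up helper name, and `SAMPLE` is a hypothetical packet modeled on the excerpts, not taken from a real file.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the excerpts above.
NS = {
    'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'dc': 'http://purl.org/dc/elements/1.1/',
}


def xmp_titles(packet):
    """Return the text of every dc:title entry in a complete XMP packet."""
    root = ET.fromstring(packet)
    return [li.text or ''
            for li in root.iterfind('.//dc:title/rdf:Alt/rdf:li', NS)]


# Hypothetical packet for illustration only.
SAMPLE = r'''<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">file://C:\Users\example\Desktop\report</rdf:li>
        </rdf:Alt>
      </dc:title>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>'''

# Keep only the titles that look like local disk paths.
leaks = [t for t in xmp_titles(SAMPLE) if 'file://' in t]
print(leaks)
```

Truncated excerpts like the two above are not well-formed XML and would make the parser raise an error, so this only works on a full packet.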


18 Responses to “Millions of PDF invisibly embedded with your internal disk paths”

  1. [...] on November 22, 2009 This privacy “flaw” was revealed by an article on the SecureThoughts.com blog. And it is fairly simple in [...]

  2. [...] This post was mentioned on Twitter by Inferno, Critical Security. Critical Security said: Millions of PDF files invisibly hold the disk path inside them http://bit.ly/8iD9fy [...]

  3. Steve says:

    I regularly print to PDF and am conscious about privacy and security related matters.

    I have followed your instructions to see whether any of my file location information was embedded in such files. I have found no files with such information.

    These are not files where I’ve used Acrobat to inspect the document then remove personal information, instead they’re ordinary print-to-PDF files created with Internet Explorer and Adobe Acrobat Professional 8.1.7.

    I suspect that this issue is more complex than you have described.

    Initially I was quite worried when I read about this, as I regularly combine my printed PDF documents with other PDF documents and then distribute these electronically. However, before any PDF leaves my local hard drive I always inspect the document for personal information, and remove any that is found.

    In a similar vein to what you have discovered, here is something that I’ve found (and previously reported to Acrobat via one of its blogs http://blogs.adobe.com/acrobat/2009/02/properly_removing_sensitive_in.html ):

    Open a Microsoft Word document and drag and drop an image into it. Word will automatically create alternate text for that image, and that alternate text is your local file location. If you print to PDF, that alternate text remains (as you would expect it to). However, when you then examine the document and ask Acrobat to remove metadata and so on, it leaves this information in there.

    This behaviour could expose names, project information or other confidential data.

  4. Inferno says:

    Hi Steve,

    Please note that this occurs when you are printing LOCALLY saved web pages from inside IE. If you are just printing web pages from the internet to PDF, this won’t happen. The file path gets embedded in every case for local files, so please try the reproduction steps again (I have even seen the latest Adobe Acrobat Professional 9 leave the path in there). Otherwise, drop me an email and I can help you out.

    Cheers,
    Inferno

  5. [...] as the researcher behind the SecureThoughts blog prefers to be called, explains that documents ‘printed’ to PDF from IE, using tools such as Adobe PDF, [...]

  6. Steve says:

    My apologies, indeed you are correct.

    However, if you use the privacy feature built-in to Acrobat Professional to remove the metadata, you will find that reference is removed.

    Thank goodness I have been doing that for any PDF that leaves my PC. Otherwise, smart recipients may have seen stuff like d:\data\projectrelated\XXXX\name\etc\etc… information they otherwise wouldn’t have been privy to.

    Your find, I think, is akin to what I found with the images and alternate text. The potential is there for the damage to be done and there’s nothing to alert a user that that information is embedded in the file.

    With the images issue, it’s a tad worse as the built-in privacy options don’t remove the path – you need to go in and manually edit the alternative text for each image – something which I do in the source (Word) file.

  7. rsc says:

    Didn’t work with PDFCreator either: tried printing a locally saved HTML file through explorer.exe’s context menu, from IE8, and from Firefox 3.5.5.

  8. Inferno says:

    @rsc, it looks like some PDF creators such as Bullzip and PDFCreator are not affected, as they might not use the title attribute passed by IE. However, the most widely used PDF printers, such as Acrobat Professional, CutePDF, and PrimoPDF, are vulnerable. @Steve also verified in the comment above that it works.

  9. Steve says:

    Have you taken this up with Adobe?

  10. Inferno says:

    @Steve, yes, I did talk to Adobe about this. They have a bug logged on their side as well; however, they were unable to confirm whether the fix can make it into the next quarterly patch cycle. Please note that this bug is hard to coordinate and get fixed on their side, since it would require getting every third-party PDF print driver fixed in addition to Acrobat Professional. On the IE side it is much easier, since they only need to pass the filename in the title attribute rather than the entire disk path.

  11. [...] Although its later versions are much more reliable, the obscure security firm Inferno revealed that all editions of IE are vulnerable to a flaw involving PDF files. “I found an interesting privacy issue, which occurs when Internet Explorer is used to print web pages saved on the hard disk as PDF and affects all versions, including IE8,” this unknown company explained on its blog. [...]

  12. David says:

    I found my userID in a PDF that I printed to the Adobe PDF printer from our corporate web site. This was not a html file on my local machine. It was embedded as myuserID

  13. Galameth says:

    I had forgotten how fun google hack searching could be till I did your suggested search with .gov, .mil, etc.

    filetype:pdf file c .gov
    filetype:pdf file c .mil

    and so on.

  14. blog index says:

    PDF Speedlinking—A Few Noteworthy Articles…

    It’s that time again.
    Every year, from around late November to the end of December, I usually start getting that “rushed” feeling where everything gets busier.   You’re trying to fit in some extra Christmas shopping on your lunch break, try…

  15. [...] Millions of PDF invisibly embedded with your internal disk paths | SecureThoughts.com – [...]

  16. [...] with references to internal disk locations, says security firm Inferno on the basis of its own research. According to Inferno, this poses a privacy risk. After all, disk locations often contain the names of [...]
