Tutorial: Analyzing Malicious PDFs


Principal Cybersecurity Architect
Staff member
Hi all! In this post we will explore malicious PDF files and how the bad guys leverage them to infect computer systems.

I'm sure a lot of people are familiar with receiving a strange email, often seemingly from someone they know, containing an attachment. You open it and, miraculously, your system is infected or your AV/AM product raises an alert about malicious activity on your system. In this tutorial we are going to deep dive into PDF attachments, and a bit into macro-enabled Office documents, and how they can be used to infect computer systems.

I'm sure most if not all of us have come across a PDF (Portable Document Format) file. PDFs are content rich and widely used for just about everything. When it comes to malware analysis, there are some very important PDF properties that we need to know about and learn to examine:
  • JavaScript
  • URIs
  • Embedded Files
  • AAs (Additional Actions)
  • Open Actions
While these can be found in legitimate PDFs, they are also used by malicious actors to infect computer systems. AAs and OpenActions are the most dangerous when combined with JavaScript. Both additional actions and open actions can execute simply because the PDF was opened, and in many infections they trigger JavaScript that acts as a downloader for a second-stage or multistage malware payload.
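A quick way to triage a PDF for the properties listed above is to count how often their name keywords occur in the raw file, similar in spirit to Didier's pdfid.py. The sketch below is a minimal illustration under a few assumptions: `count_pdf_keywords` is a hypothetical helper, `/JS` is included because it is the key that holds the script inside a JavaScript action, and PDF name escapes such as `/J#61vaScript` are undone first because attackers use them to dodge naive string matching. It is a triage aid, not a parser: compressed object streams are not inspected.

```python
import re
from collections import Counter

# Name keywords worth flagging during triage (hypothetical selection for this sketch).
KEYWORDS = ["/JavaScript", "/JS", "/URI", "/EmbeddedFile", "/AA", "/OpenAction"]

def _unescape_names(data: bytes) -> bytes:
    # PDF names may encode characters as #XX hex escapes, e.g. /J#61vaScript == /JavaScript.
    return re.sub(rb"#([0-9A-Fa-f]{2})", lambda m: bytes([int(m.group(1), 16)]), data)

def count_pdf_keywords(data: bytes) -> Counter:
    """Count occurrences of each keyword in the raw PDF bytes."""
    data = _unescape_names(data)
    counts = Counter()
    for kw in KEYWORDS:
        # The negative lookahead stops /AA from matching /AAB, /URI from matching /URIs, etc.
        pattern = re.escape(kw.encode()) + rb"(?![A-Za-z0-9#])"
        counts[kw] = len(re.findall(pattern, data))
    return counts

if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:  # e.g. a sample renamed to *.pdf.vir
        for kw, n in count_pdf_keywords(f.read()).items():
            print(f"{kw:15} {n}")
```

Any non-zero count for /OpenAction or /AA together with /JavaScript or /JS is a good reason to dig deeper with pdf-parser.py.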

Well, enough of an intro; let's get into the meat of it.

For this exercise/tutorial you will need the following.
  • Python 2.7 (this is reaching EOL but required to run the tools)
  • Once Python is installed, install the olefile module with the following command: pip install olefile
  • A PDF to examine (I won't provide the malicious PDF, but one is easy enough to find with a search)
  • Didier Stevens' PDF and OLE Python scripts (Didier's full software list is on his "My Software" page)
    • We need specifically pdf-parser.py and oledump.py

Alright, so we have a PDF we want to examine; in my example it's one that Didier created.
The first step in our examination is listing the objects in the PDF with the following command. (I created a PATH variable and renamed my Python 2.7 exe to python27, because I have both Python 2.7 and 3.x installed, each with its own PATH variable.)
*NOTE* I also rename malicious files with a .VIR extension, like Didier does, so that Windows shell execution will not open them.

In the above picture we see each object type followed by the numbers of the objects of that type (type : object #). From this list we are interested in the JavaScript, OpenAction, and EmbeddedFile objects.
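To make the "type : object #" view less of a black box, here is a minimal sketch of the idea: find every indirect object (`N G obj ... endobj`) and record which of the interesting keywords appear in its body. `index_objects` is a hypothetical helper and this is a simplification of what pdf-parser.py does; it does not look inside compressed object streams and can be fooled by `endobj` appearing inside stream data.

```python
import re
from collections import defaultdict

# "N G obj ... endobj"; non-greedy so each indirect object is captured separately.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.S)

def index_objects(data: bytes, keywords=("/JavaScript", "/OpenAction", "/EmbeddedFile")):
    """Map each keyword to the numbers of the objects whose body contains it."""
    index = defaultdict(list)
    for m in OBJ_RE.finditer(data):
        obj_num, body = int(m.group(1)), m.group(3)
        for kw in keywords:
            # Lookahead avoids matching longer names that merely start with the keyword.
            if re.search(re.escape(kw.encode()) + rb"(?![A-Za-z0-9])", body):
                index[kw].append(obj_num)
    return dict(index)
```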

We could go straight to the JavaScript or the embedded file, but let's look at the OpenAction first and see what it does (this is what happens simply by opening the PDF).
This part can get a bit confusing. The object that contains the OpenAction references several other objects (2, 3, 7 and 9), but from the screenshot we see that the OpenAction entry itself points to object 9.
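The confusion comes from the fact that an object body usually holds several indirect references (`N G R`), and only the one right after `/OpenAction` is the action. A minimal sketch that separates the two, assuming the OpenAction is given as an indirect reference (it can also be an inline dictionary or a destination array, which this sketch reports as `None`); both helper names are hypothetical:

```python
import re

# An indirect reference: object number, generation number, then the keyword R.
REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")
OPEN_ACTION_RE = re.compile(rb"/OpenAction\s+(\d+)\s+\d+\s+R\b")

def all_references(body: bytes) -> list:
    """Every object number referenced anywhere in this object body."""
    return [int(m.group(1)) for m in REF_RE.finditer(body)]

def open_action_target(body: bytes):
    """Object number the /OpenAction points to, or None if it is inline or absent."""
    m = OPEN_ACTION_RE.search(body)
    return int(m.group(1)) if m else None
```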

OK, let's take a look at object 9. Here we see plain JavaScript code, but sometimes it has to be decoded first (easy to do with pdf-parser.py). In this case it's simply extracting a Word document (object 8 from the first image) called eicar-dropper.doc.
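One common reason JavaScript "has to be decoded" is that it sits in a PDF literal string `( ... )`, where characters can be hidden behind escapes such as `\101` (octal for `A`) or `\(`. A minimal sketch of the literal-string escape rules from the PDF specification, assuming you pass in the text between the outer parentheses; `decode_pdf_literal` is a hypothetical name, and scripts stored in filtered streams or hex strings `< ... >` need different handling.

```python
def decode_pdf_literal(s: bytes) -> bytes:
    """Undo the backslash escapes inside a PDF literal string."""
    simple = {ord("n"): b"\n", ord("r"): b"\r", ord("t"): b"\t", ord("b"): b"\b",
              ord("f"): b"\f", ord("("): b"(", ord(")"): b")", ord("\\"): b"\\"}
    out = bytearray()
    i = 0
    while i < len(s):
        c = s[i]
        if c != 0x5C:                 # ordinary byte, copy as-is
            out.append(c)
            i += 1
            continue
        i += 1                        # skip the backslash
        if i >= len(s):
            break
        c = s[i]
        if c in simple:               # \n, \(, \\ and friends
            out += simple[c]
            i += 1
        elif 0x30 <= c <= 0x37:       # \ddd: one to three octal digits
            j = i
            while j < len(s) and j - i < 3 and 0x30 <= s[j] <= 0x37:
                j += 1
            out.append(int(s[i:j], 8) & 0xFF)
            i = j
        elif c == 0x0D:               # backslash + CR or CRLF: line continuation
            i += 2 if s[i + 1:i + 2] == b"\n" else 1
        elif c == 0x0A:               # backslash + LF: line continuation
            i += 1
        # any other character: the backslash is ignored and the character is
        # copied on the next pass through the loop
    return bytes(out)
```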

Well, that's neat; let's extract that Word document. Yay!
In the following we dump object 8 to C:\SCRATCH\bad.doc.vir. Remember, it's a good idea to change the extension so that the Explorer shell doesn't launch a registered program (Microsoft Word) to open it [that would infect the system].

One important thing to note: if the object has a filter, like /Filter /FlateDecode above, we need to decode it with -f; otherwise we use -c to dump the file content.
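Why the filter matters: with /FlateDecode the stream bytes are zlib-compressed, so dumping them raw gives you compressed garbage instead of the Word document. A minimal sketch of decode-then-dump, assuming a single /FlateDecode filter (filter chains such as [/FlateDecode /ASCIIHexDecode] are not handled); `extract_stream` and `dump_to_vir` are hypothetical helpers. A decompression object is used instead of `zlib.decompress` because it tolerates trailing bytes and returns what it could inflate from damaged streams, which are common in malicious samples.

```python
import re
import zlib

# The stream data sits between "stream<EOL>" and "<EOL>endstream", right after the dictionary.
STREAM_RE = re.compile(rb">>\s*stream\r?\n(.*?)\r?\nendstream", re.S)

def extract_stream(obj_body: bytes) -> bytes:
    """Return the stream data of one object, inflated if the dictionary says /FlateDecode."""
    m = STREAM_RE.search(obj_body)
    if m is None:
        raise ValueError("object has no stream")
    raw = m.group(1)
    if b"/FlateDecode" in obj_body[:m.start()]:   # only look at the dictionary
        return zlib.decompressobj().decompress(raw)
    return raw

def dump_to_vir(data: bytes, path: str) -> str:
    """Write the payload with a .vir suffix so the shell won't open it with Word."""
    if not path.lower().endswith(".vir"):
        path += ".vir"
    with open(path, "wb") as f:
        f.write(data)
    return path
```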

Cool, we now have the Word document that the PDF extracts from itself and opens, just because the PDF was opened!
Now we need to switch to oledump.py, the tool for OLE (Object Linking and Embedding) Compound File (CF) files, the container format used by legacy MS Office documents (.doc, .xls, .ppt) and others (see the ForensicsWiki page "OLE Compound File").
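Before reaching for oledump.py it can help to confirm what the dumped file really is, since the extension is just a label (and we renamed it anyway). A minimal sketch that checks well-known file signatures: OLE compound files start with D0 CF 11 E0 A1 B1 1A E1, ZIP containers (including newer .docx/.docm) with PK\x03\x04, and PDFs normally with %PDF. `identify` is a hypothetical helper; note that PDF readers tolerate junk before the %PDF header, so a PDF check at offset 0 can miss evasive samples.

```python
SIGNATURES = {
    bytes.fromhex("D0CF11E0A1B11AE1"): "ole",  # OLE compound file (.doc/.xls/.ppt)
    b"PK\x03\x04": "zip",                      # ZIP container, e.g. OOXML .docx/.docm
    b"%PDF": "pdf",                            # PDF header (usually at offset 0)
}

def identify(header: bytes) -> str:
    """Classify a file by its leading magic bytes."""
    for magic, kind in SIGNATURES.items():
        if header.startswith(magic):
            return kind
    return "unknown"

def identify_file(path: str) -> str:
    with open(path, "rb") as f:
        return identify(f.read(8))
```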

Now let's examine the .doc file with oledump.py with the following:

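Here we see the stream number on the left. An uppercase M marks a stream containing VBA macro code, and a lowercase m marks a macro stream that only contains Attribute statements; these are followed by the stream size and name.
Something interesting to note is that the VBA source inside the macro streams is stored compressed, but have no fear: Didier's tool can decompress it (oledump.py option -v).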
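The "encoding" here is the MS-OVBA compression scheme, a simple LZ77-style format. The sketch below implements that decompression as described in Microsoft's [MS-OVBA] specification, assuming you already have the bytes of the compressed container; finding where it starts inside a module stream requires parsing the (also compressed) dir stream, which oledump.py does for you and this sketch skips. `ovba_decompress` is a hypothetical name.

```python
def ovba_decompress(container: bytes) -> bytes:
    """Decompress an MS-OVBA CompressedContainer (signature byte 0x01, then chunks)."""
    if not container or container[0] != 0x01:
        raise ValueError("missing 0x01 signature byte")
    out = bytearray()
    pos = 1
    while pos < len(container):
        header = int.from_bytes(container[pos:pos + 2], "little")
        chunk_size = (header & 0x0FFF) + 3            # chunk size including the header
        if (header >> 12) & 0x7 != 0b011:
            raise ValueError("bad chunk signature at offset %d" % pos)
        compressed = bool(header & 0x8000)
        chunk_end = min(pos + chunk_size, len(container))
        pos += 2
        if not compressed:                            # raw chunk: 4096 bytes copied as-is
            out += container[pos:pos + 4096]
            pos += 4096
            continue
        chunk = bytearray()
        while pos < chunk_end:
            flags = container[pos]                    # one flag bit per following token
            pos += 1
            for bit in range(8):
                if pos >= chunk_end:
                    break
                if not (flags >> bit) & 1:            # literal token: one byte
                    chunk.append(container[pos])
                    pos += 1
                    continue
                token = int.from_bytes(container[pos:pos + 2], "little")
                pos += 2
                # The offset/length split depends on how much of this chunk is decoded.
                bit_count = max((len(chunk) - 1).bit_length(), 4)
                length = (token & (0xFFFF >> bit_count)) + 3
                offset = (token >> (16 - bit_count)) + 1
                for _ in range(length):               # byte by byte: ranges may overlap
                    chunk.append(chunk[-offset])
        out += chunk
    return bytes(out)
```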

In the first macro stream (7) we see some character data being written to a file.
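A pattern often seen in droppers (not confirmed for this particular sample) is character data spelled out as a run of Chr() calls so that the payload doesn't appear as a readable string. If the decompressed macro looks like that, a minimal sketch like this hypothetical `decode_chr_calls` helper rebuilds the string; it only handles decimal literals such as Chr(77), not arithmetic like Chr(70 + 7).

```python
import re

# Chr(n), ChrW(n), Chr$(n) and ChrW$(n) with a plain decimal argument.
CHR_RE = re.compile(r"\bChrW?\$?\(\s*(\d+)\s*\)", re.I)

def decode_chr_calls(vba_expr: str) -> str:
    """Concatenate the characters produced by every Chr() call in the expression."""
    return "".join(chr(int(n)) for n in CHR_RE.findall(vba_expr))

print(decode_chr_calls("Chr(77) & Chr(90)"))  # MZ
```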

And lastly, in stream 9 (kind of a mess) we see an AutoOpen macro, which Word runs automatically when the document is opened (if macros are enabled). It opens the file created by the first macro,


and from that our malware payload is executed.
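The AutoOpen trick generalizes: once the macro source is decompressed, a quick scan for auto-executing macro names plus a few risky statements tells you whether a document runs code on open and what that code touches. A minimal sketch under stated assumptions: `scan_vba` is a hypothetical helper, and both keyword lists are illustrative rather than exhaustive (tools like olevba ship much larger ones).

```python
import re

# Macro names Office runs automatically (when macros are enabled).
AUTO_EXEC = ["AutoOpen", "AutoExec", "AutoClose", "Document_Open", "Document_Close", "Workbook_Open"]

# A few behaviours worth flagging; illustrative only.
SUSPICIOUS = {
    "file_write": r"\bOpen\b.+\bFor\s+(?:Output|Binary|Append)\b",  # Open ... For Binary As #1
    "shell": r"\bShell\b",                                           # launches a program
    "com_object": r"\bCreateObject\b",                               # e.g. WScript.Shell, XMLHTTP
}

def scan_vba(source: str) -> dict:
    """Report auto-exec macro names and suspicious statements found in VBA source."""
    auto = [n for n in AUTO_EXEC if re.search(r"\b%s\b" % n, source, re.I)]
    hits = [k for k, p in SUSPICIOUS.items() if re.search(p, source, re.I)]
    return {"auto_exec": auto, "suspicious": hits}
```

An auto_exec hit combined with file_write and shell is exactly the chain described above: write a file, then run it when the document opens.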

Hopefully this was interesting and informative!!!!




Essential Member
Premium Supporter
Good one. It needs more publicity.
A while ago, we were hammered with cautions about downloading *.zip files and, in particular, torrent files. PDFs have taken their place. Only examine PDFs from known, official sources.


Well-Known Member
If you receive an attachment or link, especially if it's from someone you think you know (including organizations you do business with), contact the apparent source to verify that they sent it. Attachments and links often contain or lead to malware, as the OP rightly points out.
