Hi all, in this post we will be exploring malicious PDF files and how the bad guys leverage them to infect computer systems.
I'm sure a lot of people are familiar with receiving a strange email, oftentimes seemingly from someone they know, that contains an attachment. You open it and, miraculously, your system is infected or your AV/AM product raises an alert about malicious activity on your system. In this tutorial we are going to take a deep dive into PDF attachments, and briefly into macro-enabled Office documents, and how both can be used to infect computer systems.
I'm sure most, if not all, of us have come across a PDF (Portable Document Format) file. Needless to say, PDFs are content-rich and used for just about everything. When it comes to malware analysis, there are some important PDF features that we need to know about and learn to examine:
- JavaScript (/JavaScript, /JS)
- URIs (/URI)
- Embedded Files (/EmbeddedFile)
- AAs, or Additional Actions (/AA)
- Open Actions (/OpenAction)
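To make the list above concrete, here is a minimal triage sketch, assuming all you want is a count of how often these keywords appear as PDF names in a file's raw bytes. The keyword list and function names are hypothetical, and it undoes #xx name escapes because obfuscated samples sometimes write /JavaScript as /J#61vaScript. It also scans stream data blindly, so expect the odd false positive; Didier's pdfid.py does a far more thorough version of this triage.

```python
import re

# Hypothetical keyword list mirroring the PDF features listed above.
KEYWORDS = ["/JavaScript", "/JS", "/URI", "/EmbeddedFile", "/AA", "/OpenAction"]

# A PDF name runs from "/" up to whitespace or a delimiter character.
NAME_RE = re.compile(rb"/[^\s/<>\[\]()%{}]+")
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def unescape_name(name: bytes) -> bytes:
    """Undo #xx escapes, so /J#61vaScript becomes /JavaScript."""
    return HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), name)


def count_keywords(data: bytes) -> dict:
    """Count exact occurrences of each keyword among the PDF names in data."""
    counts = {kw: 0 for kw in KEYWORDS}
    for match in NAME_RE.finditer(data):
        name = unescape_name(match.group(0)).decode("latin-1")
        if name in counts:
            counts[name] += 1
    return counts
```

Usage would be reading the renamed sample in binary mode and passing its bytes to count_keywords; non-zero counts for /JavaScript, /OpenAction or /EmbeddedFile are the ones worth a closer look.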
Well, enough with the intro; let's get into the meat of it.
For this exercise/tutorial you will need the following.
- Python 2.7 (this is reaching EOL, but it's required to run the tools)
- Once Python is installed, install the olefile module with the following command: pip install olefile
- A PDF to examine (I won't provide the malicious PDF, but one is easy enough to find with a search)
- Didier Stevens' PDF and OLE Python scripts (Didier's full software list is here: My Software)
- Specifically, we need pdf-parser.py and oledump.py
Alright, so we have a PDF we want to examine; in my example it's one that Didier created.
The first step in our examination is listing the objects in the PDF with the following command. (I added Python 2.7 to my PATH and renamed its executable to python27, because I also have Python 3.x installed with its own PATH entry.)
*NOTE* Like Didier, I also rename malicious files with a .VIR extension so that the Windows shell will not open them.
In the picture above, each object type is listed with the numbers of the objects of that type (type : <object #>). Of these, we are interested in the JavaScript, OpenAction, and EmbeddedFile objects.
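If you'd rather reproduce that type : <object #> view in a script, here is a rough sketch, assuming an uncompressed PDF whose objects appear as plain `N G obj ... endobj` text. The function name and keyword list are hypothetical, and it will miss objects packed into compressed object streams (/ObjStm), which pdf-parser.py copes with and this sketch does not.

```python
import re

# Each uncompressed indirect object looks like "N G obj ... endobj".
OBJ_RE = re.compile(rb"(\d+)\s+\d+\s+obj\b(.*?)\bendobj", re.S)
KEYWORDS = ["/JavaScript", "/OpenAction", "/EmbeddedFile"]


def objects_by_keyword(data: bytes) -> dict:
    """Map each keyword to the numbers of the objects whose text contains it."""
    patterns = {
        # The lookahead stops /EmbeddedFile from also matching /EmbeddedFiles.
        kw: re.compile(re.escape(kw.encode()) + rb"(?![A-Za-z0-9])")
        for kw in KEYWORDS
    }
    result = {kw: [] for kw in KEYWORDS}
    for match in OBJ_RE.finditer(data):
        number, body = int(match.group(1)), match.group(2)
        for kw, pattern in patterns.items():
            if pattern.search(body):
                result[kw].append(number)
    return result
```

The word-boundary lookahead matters because the catalog's name tree uses /EmbeddedFiles, while the object holding the attachment itself is typed /EmbeddedFile.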
We could go straight to the JavaScript or the embedded file, but let's look at the OpenAction first and see what it does (this is what happens simply by opening the PDF).
This part can get a bit confusing. The object holding the OpenAction references several other objects (2, 3, 7 and 9), but from the screenshot we see that the OpenAction entry itself points to object 9.
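One way to untangle this is to separate "every object this dictionary references" from "the object /OpenAction points to". A minimal sketch, assuming /OpenAction holds an indirect reference (`N G R`); it can also be an inline action dictionary or a destination array, which this does not handle. Both function names are hypothetical.

```python
import re

# An indirect reference is "N G R": object number, generation number, R.
REF_RE = re.compile(rb"(\d+)\s+\d+\s+R\b")
OPEN_ACTION_RE = re.compile(rb"/OpenAction\s+(\d+)\s+\d+\s+R\b")


def references(dictionary: bytes) -> list:
    """Object numbers of every indirect reference in a dictionary, in order."""
    return [int(m.group(1)) for m in REF_RE.finditer(dictionary)]


def open_action_target(dictionary: bytes):
    """Object number the /OpenAction entry points to, or None if it isn't a reference."""
    match = OPEN_ACTION_RE.search(dictionary)
    return int(match.group(1)) if match else None
```

For example, on a made-up catalog `<< /Outlines 2 0 R /Pages 3 0 R /Names 7 0 R /OpenAction 9 0 R >>`, references() returns [2, 3, 7, 9] while open_action_target() returns 9.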
OK, let's take a look at object 9. Here we see plain JavaScript code; sometimes it has to be decoded first, which is easy to do with pdf-parser.py. In this case it's simply extracting a Word document (object 8 in the first image) called eicar-dropper.doc.
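Droppers like this commonly rely on the Acrobat JavaScript method exportDataObject, whose cName parameter names the embedded file and whose nLaunch parameter controls whether Reader launches it after saving. I'm assuming, not confirming, that object 9 uses it. A quick, hypothetical way to pull the attachment name out of unobfuscated JavaScript; obfuscated scripts need decoding before a regex like this will match.

```python
import re

# Matches the cName value in an exportDataObject({ ... cName: "..." ... }) call.
EXPORT_RE = re.compile(
    r"exportDataObject\s*\(\s*\{[^}]*?\bcName\s*:\s*([\"'])(.*?)\1", re.S
)


def exported_attachments(js: str) -> list:
    """Names of embedded files that the script exports (and may launch)."""
    return [m.group(2) for m in EXPORT_RE.finditer(js)]
```

Run on an illustrative call such as `this.exportDataObject({ cName: "eicar-dropper.doc", nLaunch: 2 });`, it returns ["eicar-dropper.doc"].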
Well, that's neat; let's extract that Word document. Yay!
In the following, we dump object 8 to C:\SCRATCH\bad.doc.vir. Remember, it's a good idea to change the extension so that the Explorer shell doesn't launch the associated program (Microsoft Word) to open it [that would infect the system].
One important thing to note: if the object has a filter, like /Filter /FlateDecode above, we need to decode it with -f; otherwise we can use -c to dump the file content.
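Conceptually, decoding a /FlateDecode stream just means zlib-inflating the bytes between the stream and endstream keywords. A minimal sketch, assuming the object's bytes are already isolated and its stream uses either no filter or /FlateDecode alone; filter chains, /DecodeParms predictors, and other filters such as /ASCIIHexDecode are not handled. stream_content and dump_stream are hypothetical helpers, not part of pdf-parser.py.

```python
import re
import zlib

# Stream data sits between "stream<EOL>" and "<EOL>endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.S)


def stream_content(obj: bytes) -> bytes:
    """Return one object's stream bytes, inflated if it uses /FlateDecode."""
    match = STREAM_RE.search(obj)
    if match is None:
        raise ValueError("object has no stream")
    raw = match.group(1)
    dictionary = obj[: match.start()]  # everything before the stream keyword
    if b"/FlateDecode" in dictionary:
        return zlib.decompress(raw)
    return raw


def dump_stream(obj: bytes, out_path: str) -> int:
    """Write the decoded stream to out_path and return the number of bytes written."""
    data = stream_content(obj)
    with open(out_path, "wb") as f:
        f.write(data)
    return len(data)
```

Following the same precaution as above, you would call it with a defanged name, e.g. dump_stream(obj8_bytes, r"C:\SCRATCH\bad.doc.vir"), where obj8_bytes is a hypothetical variable holding object 8's raw bytes.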
Cool, we now have the Word document that the PDF extracts from itself and opens, just by opening the PDF!
Now we need to switch to the OLE dump tool. OLE (Object Linking and Embedding) Compound File (CF) is the container format used by legacy Microsoft Office documents such as .doc, .xls and .ppt, among others (see OLE Compound File - ForensicsWiki).
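Every OLE Compound File starts with the same 8-byte signature, D0 CF 11 E0 A1 B1 1A E1, followed by a header that records, among other things, the major version and sector size. A minimal sketch for sanity-checking a dumped file before handing it to oledump.py, assuming the MS-CFB header layout (major version at offset 0x1A, byte-order mark 0xFFFE at 0x1C, sector shift at 0x1E). ole_header_info is a hypothetical name; for the basic check, the olefile module installed earlier also offers olefile.isOleFile().

```python
import struct

OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def ole_header_info(data: bytes):
    """Return (major_version, sector_size) for an OLE CF header, or None if not OLE."""
    if len(data) < 512 or data[:8] != OLE_MAGIC:
        return None
    # Major version, byte-order mark and sector shift are consecutive
    # little-endian uint16 fields starting at offset 0x1A.
    major, byte_order, sector_shift = struct.unpack_from("<HHH", data, 0x1A)
    if byte_order != 0xFFFE:
        return None
    return major, 1 << sector_shift
```

A version 3 file normally reports a 512-byte sector size and a version 4 file 4096 bytes; anything that returns None is not what the PDF claimed to drop.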
Now let's examine the doc file with oledump.py using the following:
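Here we see the object (stream) number on the left. An uppercase M marks a stream containing VBA macro code, and a lowercase m a stream with only VBA attribute statements; after that come the stream's size and name.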
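The indicator column is easy to filter programmatically if you capture oledump.py's output. A minimal sketch, assuming each listing line looks like `index: indicator size 'name'` with the indicator blank for streams without macros; the regex and function are hypothetical, and other oledump versions or options may print extra columns.

```python
import re

# Assumed line shape: "  3: M    1520 'Macros/VBA/Module1'" (indicator optional).
LINE_RE = re.compile(r"^\s*(\d+):\s+(?:([A-Za-z!])\s+)?(\d+)\s+'(.*)'\s*$")


def macro_streams(listing: str) -> list:
    """(index, indicator, name) for streams flagged M (VBA code) or m (attributes only)."""
    found = []
    for line in listing.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(2) in ("M", "m"):
            found.append((int(match.group(1)), match.group(2), match.group(4)))
    return found
```

Streams flagged M are the ones to decompress and read first; m streams usually hold little more than Attribute lines.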
Something interesting to note is that the macro source is stored compressed rather than as plain text, but have no fear: Didier's tool can decompress it.
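The compression is the MS-OVBA algorithm from Microsoft's VBA file format specification, which oledump.py undoes when you pass -v (--vbadecompress). To show what is going on, here is a minimal sketch of that decompression, assuming you have already located the compressed source inside a module stream (it starts at an offset recorded in the VBA project's dir stream, which this sketch does not parse). It is illustrative rather than hardened: there are no bounds checks for malformed input.

```python
def decompress_vba(data: bytes) -> bytes:
    """Decompress an MS-OVBA CompressedContainer (signature byte 0x01, then chunks)."""
    if not data or data[0] != 0x01:
        raise ValueError("not an MS-OVBA compressed container")
    out = bytearray()
    pos = 1
    while pos < len(data):
        # 2-byte little-endian chunk header: bits 0-11 = chunk size - 3,
        # bit 15 = 1 if the chunk is compressed.
        header = int.from_bytes(data[pos:pos + 2], "little")
        chunk_end = min(pos + (header & 0x0FFF) + 3, len(data))
        pos += 2
        chunk_start = len(out)  # copy tokens are relative to this chunk
        if not header & 0x8000:
            # Uncompressed chunk: 4096 literal bytes.
            out += data[pos:pos + 4096]
            pos += 4096
            continue
        while pos < chunk_end:
            flags = data[pos]  # one bit per following token, LSB first
            pos += 1
            for bit in range(8):
                if pos >= chunk_end:
                    break
                if not (flags >> bit) & 1:
                    out.append(data[pos])  # literal byte
                    pos += 1
                    continue
                token = int.from_bytes(data[pos:pos + 2], "little")
                pos += 2
                # The split between offset and length bits depends on how much
                # of this chunk is decompressed so far (minimum 4 offset bits).
                bit_count = max((len(out) - chunk_start - 1).bit_length(), 4)
                length = (token & (0xFFFF >> bit_count)) + 3
                offset = (token >> (16 - bit_count)) + 1
                for _ in range(length):
                    out.append(out[-offset])  # byte by byte: copies may overlap
    return bytes(out)
```

The byte-by-byte copy is deliberate: a copy token may reference bytes it is itself producing, which is how a short pattern like "abc" expands into a longer repetition.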
In the first macro object (7) we see some char data being written to a file.
Lastly, we see an AutoOpen in object 9 (kind of a mess) that opens the file created by the macro.
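With the macro source decompressed, a few regular expressions give a quick first triage: which routines Office runs automatically (such as AutoOpen), whether the code writes a file, and what any Chr(n) character data spells out. This is a hypothetical sketch that assumes the data is built from literal Chr() calls; real samples are obfuscated in many other ways, and olevba from the oletools project does this kind of keyword triage far more thoroughly.

```python
import re

# Hypothetical, non-exhaustive list of VBA routines Office runs automatically.
AUTO_EXEC_RE = re.compile(
    r"\b(?:Sub|Function)\s+(AutoOpen|Document_Open|AutoExec|AutoClose|Workbook_Open)\b",
    re.I,
)
# "Open <path> For Output/Binary/Append" creates or writes a file.
FILE_WRITE_RE = re.compile(r"^\s*Open\s+.+?\s+For\s+(?:Output|Binary|Append)\b", re.I | re.M)
# Chr(n), ChrW(n) and Chr$(n) calls with a literal decimal argument.
CHR_RE = re.compile(r"\bChrW?\$?\(\s*(\d+)\s*\)", re.I)


def auto_exec_entry_points(vba: str) -> list:
    """Names of auto-run routines defined in the macro source, in order."""
    return [m.group(1) for m in AUTO_EXEC_RE.finditer(vba)]


def writes_files(vba: str) -> bool:
    """True if the source contains an Open ... For Output/Binary/Append statement."""
    return FILE_WRITE_RE.search(vba) is not None


def decode_chr_calls(vba: str) -> str:
    """Concatenate every literal Chr(n) argument, in order, as text."""
    return "".join(chr(int(n)) for n in CHR_RE.findall(vba))
```

For example, decode_chr_calls("s = Chr(72) & ChrW(105) & Chr$(33)") returns "Hi!".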
From there, our malware payload is executed.
Hopefully this was interesting and informative!