First attempt at analyzing a “real” file

I collected a bunch of presumably malicious documents from my Hotmail account that has turned into a great repository of spam and malware over the years. I’ve had this email address since 1997 and I regularly get phishing email sent to me, containing both links and files (typically MS Office documents, .PDFs, and various compressed files). Since I just finished working through Practical Malware Analysis, I thought I’d try analyzing some of the crap that gets sent to me on a daily basis.

I created a shady Russian email address that I use for forwarding all my malicious email. I don’t really want to have the email account that I actually use open while I open or run any of these attachments, so I just forward the messages to this new address and then use that to download the attachments. I access the account from a VM whose guest OS runs a different platform than the host (in this case, Linux as the host and Win7 as the guest), on a burner laptop, so that any “containment” issues hopefully won’t become too much of a problem.

It’ll be interesting to go through these files to see what types of malware one tends to find here. In speaking with people at conferences and reading various documents over time, I have gotten the impression that downloaders/launchers are probably what I’m going to find in my inbox. You could think of these as the “first stages” that help to get malware into orbit. If a piece of malware is particularly large, the downloader/launcher can check the target system for certain conditions before downloading and launching the payload. For instance, a downloader might check for a particular vulnerability, or for indicators of an analysis environment, such as running within a virtual machine. Another reason to use a downloader/launcher is that the actual malware payload isn’t delivered until it seems “safe” to do so, which leaves an analyst with little to work with.

This particular email arrived on 28APR2016. The sender’s domain appears to have been registered about a month and a half ago. While it was registered in Malaysia, we are meant to believe that the owner is in the USA based on the whois information (though I can see that the name and email address associated with that person also appear to be associated with some other shady activities). This email also contains a link to download a purported PC Clean Up tool as well as a single link to an actual SANS webpage.



After downloading these files, I ran them through some basic static tools like Strings and KANAL, and then through various .PDF Python scripts such as those from Didier Stevens. The only interesting things I saw were a bunch of URI actions that linked to sketchy-sounding websites and files such as PcCleanUp.exe. PDFStreamDumper suggested that there might be something hidden in a few streams.
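As a rough illustration of the keyword triage described above, here is a minimal Python sketch that counts a few PDF names in the raw bytes and pulls out literal URI strings. It is not how Didier Stevens' tools are implemented; the keyword list, the function names, and the sample bytes are my own assumptions for illustration, and the sketch ignores compressed object streams and hex-encoded names.

```python
import re
from collections import Counter

# Illustrative subset of PDF names worth a look during triage (an assumption,
# not a complete or authoritative list).
KEYWORDS = [b"/URI", b"/JS", b"/JavaScript", b"/OpenAction", b"/Launch", b"/EmbeddedFile"]

# Matches only the literal-string form "/URI (http://...)". Hex strings and
# anything inside compressed streams are out of scope for this sketch.
URI_PATTERN = re.compile(rb"/URI\s*\(([^)]*)\)")


def count_keywords(data: bytes) -> Counter:
    """Count raw occurrences of each keyword in the PDF bytes."""
    counts = Counter()
    for keyword in KEYWORDS:
        # The lookahead stops a short name from matching a longer one.
        pattern = re.compile(re.escape(keyword) + rb"(?![A-Za-z0-9])")
        counts[keyword.decode("ascii")] = len(pattern.findall(data))
    return counts


def extract_uris(data: bytes) -> list:
    """Return the targets of literal-string /URI entries."""
    return [match.decode("latin-1") for match in URI_PATTERN.findall(data)]


if __name__ == "__main__":
    # Hypothetical fragment of a URI action; the host name is a placeholder.
    sample = b"<< /Type /Action /S /URI /URI (http://example.invalid/PcCleanUp.exe) >>"
    print(count_keywords(sample))
    print(extract_uris(sample))
```

Note that a URI action names /URI twice (once as the action type and once as the key holding the target), so raw counts run higher than the number of links.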

I started up a bunch of tools for dynamic analysis and then opened the document, but didn’t observe anything special happening. The .PDF itself is a very ugly looking advertisement for SANS cyber security training classes, with links to various files such as an .EXE, a .ZIP file, a WinACE compressed file, an .SCR file (possibly a screen saver or a Steam-associated file), and an .RTF file. I would have really liked to have gotten some of the hosted files to look at, but by the time I opened these attachments (about two weeks after receipt), the links were already dead.

I didn’t get much out of this file with some of the online malware sites. VirusTotal has only a 1/55 detection ratio (only Sophos detected what it thought to be a trojan). Malwr indicated that no hosts were contacted, but did detect some shellcode byte patterns and a SQLite 3.x database dropping.
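If you would rather look a sample up by hash than upload it, a SHA-256 digest is what these sites typically key on. This is a small sketch using only the Python standard library; the function name and chunk size are my own choices.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large samples need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Searching by hash also avoids handing a sample to a public service before you have decided whether you want to share it.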



I went back for another look through this sample to check out the SQLite db and the possible shellcode. Malwr says that the SQLite db dropped was in a file called SharedDataEvents, so I went and found that in an Adobe directory for the current user (see raw notes below for specifics). Some reading revealed that a SQLite db file will begin with the string “SQLite format 3\000”. It seems that any random binary could be given this string at the beginning and so produce a false positive, so I pulled up the SQLite file format documentation and took a look through the header.

The header seemed to check out. Everything seemed to be in the right place – the file size (3072 bytes) makes sense given the page size at offset 16 (1024 bytes) and size of DB in pages at offset 28 (3). Text encoding (offset 56) for the db is UTF-8. Opening this file in DB Browser shows two tables, pref_events and version_table. pref_events held no records while version_table held only a single record consisting of the integer “4”.
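The header checks above can be sketched in a few lines of Python. The offsets are the ones mentioned in the text (page size at 16, size in pages at 28, text encoding at 56), read as big-endian integers per the SQLite file format; the function name and the returned dictionary layout are my own. One caveat: the in-header page count is not always trustworthy for files written by older SQLite versions, so treat the size comparison as a sanity check rather than proof.

```python
import struct

SQLITE_MAGIC = b"SQLite format 3\x00"  # the 16-byte header string
ENCODINGS = {1: "UTF-8", 2: "UTF-16le", 3: "UTF-16be"}


def parse_sqlite_header(header: bytes) -> dict:
    """Pull the fields discussed above out of the 100-byte SQLite header."""
    if len(header) < 100:
        raise ValueError("a SQLite header is 100 bytes long")
    if header[:16] != SQLITE_MAGIC:
        raise ValueError("magic string not found")
    (page_size,) = struct.unpack(">H", header[16:18])
    if page_size == 1:
        page_size = 65536  # special case defined by the file format
    (page_count,) = struct.unpack(">I", header[28:32])
    (encoding,) = struct.unpack(">I", header[56:60])
    return {
        "page_size": page_size,
        "page_count": page_count,
        "encoding": ENCODINGS.get(encoding, "unknown"),
        "expected_size": page_size * page_count,
    }
```

For the file described here, a page size of 1024 and a page count of 3 give an expected size of 3072 bytes, which matches the size on disk.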

Looking on my main machine, though, this db just seems to be a part of Acrobat Reader’s normal operation. A little searching also indicated that you should see Acrobat making use of a SQLite db, so this seems to fit. I took a .PDF that I knew to be fine and ran that through the same sandbox, and got inconsistent results. With this clean .PDF, it didn’t report that it saw a SQLite db drop (though if you look in the files dropped area on the website, the db did drop, and in the same manner as with the malicious .PDF). The clean .PDF also doesn’t show any shellcode signatures detected (which is to be expected). As interesting as it is to go through how Acrobat and .PDF files work, I’m going to conclude at this point that there’s nothing nefarious about this or other files that were observed being dropped or written to.

Moving on to search for shellcode, I had looked through the file in a couple of areas where PDFStreamDumper indicated there might be something hidden, but didn’t see anything that looked like obvious shellcode. I went back through the file again, this time looking for any instances of the following opcodes which would indicate a call or various kinds of jumps or loops (thanks again to PMA):

0x70 through 0x7F (short conditional jumps)
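A byte scan like this is easy to script. The sketch below records the offset of every byte that could be a branch opcode. Only the 0x70 through 0x7F range comes from the list above; the call (0xE8), unconditional jump (0xE9, 0xEB) and loop (0xE0 through 0xE2) values are filled in from the standard x86 opcode map, and the function and category names are my own. Expect plenty of false positives, since these byte values turn up in ordinary data all the time.

```python
# Candidate x86 branch opcodes. Only the "jcc_short" range appears in the list
# above; the other entries are assumptions taken from the x86 opcode map.
BRANCH_OPCODES = {
    "call": {0xE8},
    "jmp": {0xE9, 0xEB},
    "loop": set(range(0xE0, 0xE3)),
    "jcc_short": set(range(0x70, 0x80)),
}


def find_branch_opcodes(data: bytes) -> dict:
    """Map each category to the offsets where a matching byte occurs."""
    hits = {name: [] for name in BRANCH_OPCODES}
    for offset, value in enumerate(data):
        for name, opcodes in BRANCH_OPCODES.items():
            if value in opcodes:
                hits[name].append(offset)
    return hits
```

The offsets are only starting points: each one still has to be turned into code in a disassembler to see whether the surrounding bytes make sense as instructions.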

I should note that disassembly and debugging are a bit of a challenge, as I am working with the free license of IDA Pro (5.0), which only supports 32-bit disassembly, and I have a similar situation with Olly and Immunity. At some point, budget permitting, I definitely plan to upgrade my license. I have a few different tools I can use for 64-bit, though, so I’ll rotate through them and see what I can get.

I noticed a few areas where there were bytes that could be used to make jumps, so I turned those into code in IDA to see what would happen. These led me to other areas where there might be code, but in looking at what IDA disassembled there in response to the earlier jumps, these sections don’t look like they actually do anything. I’m not seeing call/pop combinations or other things I would expect to see if this were shellcode. I see a few popa instructions but these also don’t appear to be used in ways that I would expect.
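The call/pop combination mentioned above is how position-independent shellcode commonly finds its own address: a call pushes the return address and a pop at the call target retrieves it. A rough way to hunt for it is to treat every 0xE8 byte as a near call, compute its target, and check whether the byte there is a pop of a general register (0x58 through 0x5F). This is a heuristic sketch with a made-up function name, not a disassembler, and it will miss other ways of getting the program counter.

```python
import struct


def find_call_pop(data: bytes) -> list:
    """Return (call_offset, target_offset) pairs where a near call lands on a pop."""
    hits = []
    for offset in range(len(data) - 4):
        if data[offset] != 0xE8:  # near call with a 32-bit relative operand
            continue
        (rel32,) = struct.unpack_from("<i", data, offset + 1)
        target = offset + 5 + rel32  # relative to the end of the 5-byte call
        if 0 <= target < len(data) and 0x58 <= data[target] <= 0x5F:
            hits.append((offset, target))
    return hits


if __name__ == "__main__":
    # Hand-built test bytes: nop; call $+5; pop ebx; ret
    print(find_call_pop(b"\x90\xE8\x00\x00\x00\x00\x5B\xC3"))
```

An empty result does not prove the absence of shellcode, but it is consistent with the disassembly not showing anything that behaves like it.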

Trying with Arkdasm (a 64-bit disassembler) revealed even fewer instructions that made sense. x64dbg can’t attach to any of the processes associated with this file. Copying all this stuff from the .PDF file into Online Disassembler and then setting it to 64-bit results in more garbage. I even tried attaching Olly to a new Acrobat parent and child process, but this just resulted in more junk (not surprisingly).

At this point I’m setting this file aside to move on to other samples that have come in more recently. I’m disappointed that I wasn’t able to get to the links before they were taken down, and I would have been particularly interested to see what the .EXE and .RTF files were doing. In the future I’m going to try to analyze these attachments as soon as I can after they arrive.

Findings and observations:
Sample is a .PDF file that was received in a Hotmail account that has been compromised multiple times over the years. This sample contains several links to download files from a few different sites; however, by the time I tried these links they had already been taken down. The sites linked to were generic file sharing sites, and file types included .EXE, .ACE, .RTF, .ZIP and .SCR files. The email was received from a domain currently registered in Malaysia and associated with a person purported to be living in Kansas, which could lead to further information about the source.

Try to examine samples within 72 hours of receipt next time. Upgrade licenses to enable better analysis of 64-bit code.

Missed the window to obtain additional files that would have been downloaded/launched by this document. Act faster on similar samples in the future. Online virus scanning sites may be ineffective in detecting malicious files such as these, and can also produce false positives and/or flag normal program behavior as suspicious.

20160526 MalPDF_001