This weekend I attended Hackers on Planet Earth (HOPE) XI in NYC. Saturday night there was a presentation from Robert Simmons of ThreatConnect on open source malware labs. All the talks are going to be placed online at some point, but I thought I’d highlight some of the material Robert presented in the meantime.
Robert spoke about the use of open source or otherwise free tools for analyzing malware. By “otherwise free” I mean tools that might require registration or some other step to obtain access. He attributed his overall analysis framework to some of Lenny Zeltser’s work, highlighting four main areas where the lab (open source or otherwise) can impact the organization:
1. Malware Research
2. Enhanced Threat Intelligence
3. Network Defense
4. Fun [IMO, it’s more fun once you figure the sample out]
Robert also spoke about some of the benefits of an automated malware analysis system. He described “hunt teams” that would use an automated analysis system and other tools to enhance threat intelligence. His vision of a hunt team is a sort of “proactive IR”: rather than reacting to an incident, this team would go searching for malware or other issues and then use the automated system as a force multiplier for analysis. For instance, say the hunt team locates what they believe to be an undetected piece of malware. The team would then use the automated system to build out host- and network-based signatures and to gather the pieces necessary for threat intelligence (resolved domains, IP addresses, and so on). That information would then feed back into defensive measures as updated or new signatures.
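To make that force-multiplier idea concrete, here’s a minimal sketch of the kind of IOC harvesting a hunt team might script against a sandbox report. It assumes a Cuckoo-style JSON report; the file path and the exact report layout are assumptions on my part, so adjust for your own setup:

```python
import json

def extract_network_iocs(report_path):
    """Pull basic network indicators out of a Cuckoo-style JSON report."""
    with open(report_path) as f:
        report = json.load(f)

    network = report.get("network", {})

    domains = set()
    for entry in network.get("domains", []):
        domains.add(entry.get("domain"))

    hosts = set()
    for host in network.get("hosts", []):
        # some Cuckoo versions store plain IP strings, others dicts
        hosts.add(host if isinstance(host, str) else host.get("ip"))

    return {"domains": sorted(d for d in domains if d),
            "hosts": sorted(h for h in hosts if h)}

if __name__ == "__main__":
    iocs = extract_network_iocs("report.json")  # hypothetical path
    for domain in iocs["domains"]:
        print("domain:", domain)
    for host in iocs["hosts"]:
        print("host:", host)
```

Domains and IPs pulled this way could then be fed straight into blocklists or IDS signatures.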
His top four entry points for malware are:
1. Files
2. URLs
3. PCAPs
4. Memory Images
My interpretation of this is not that malware literally comes from a memory image, but that these are the inputs into an automated lab, which would then be used to isolate and analyze samples. I think this makes sense because most users aren’t going to be looking through captured packets or memory dumps.
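Just to illustrate the idea, here’s a toy sketch of how a lab might route those four entry points to the right tool. The heuristics and tool labels are made up for illustration (a real lab would inspect file magic rather than trusting extensions):

```python
import os

def route(sample):
    """Map a lab input to the tool that should see it first."""
    if sample.startswith(("http://", "https://")):
        return "thug"        # URLs -> low-interaction honeyclient
    ext = os.path.splitext(sample)[1].lower()
    if ext == ".pcap":
        return "bro"         # packet captures -> traffic analysis
    if ext in (".raw", ".vmem", ".dmp"):
        return "volatility"  # memory images -> memory forensics
    return "cuckoo"          # anything else -> the sandbox as a file

for item in ["http://suspect.example/x", "capture.pcap",
             "infected.vmem", "dropper.exe"]:
    print(item, "->", route(item))
```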
The tools that he highlighted for the talk were:
– Cuckoo Sandbox (also the sandbox used by malwr.com)
– Thug (low interaction URL honeyclient)
– Bro (network monitoring tool)
– Volatility (memory forensics)
One really interesting thing he mentioned is that Bing can be used for passive-DNS-style searches. He demonstrated that searching Bing for an IP address returns domains that have resolved to that address. This could be really useful when you have a sample that doesn’t resolve a domain name but instead handles C2 via a hard-coded IP address. For example, BEAR resolved her.d0kbilo.com (37.59.118.41), and this IP is actually associated with a few other malicious domains (e.g., l.kokoke.net). If you look up an IP address that you feel is suspicious (say, from packets captured during dynamic analysis) and notice that it is associated with numerous malware-related domains, you can safely assume the traffic you found is suspect and should be looked at in more detail. I tried using Bing for this with the BEAR IP address and didn’t have much luck, but trying other IP addresses associated with botnets did reveal domains that currently or previously resolved to those addresses. That said, I got some similar results searching on Google, so it’s probably best to try several search engines for this type of research and compare what each one turns up.
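If you wanted to script this kind of lookup instead of doing it by hand, here’s a rough sketch using the Bing Web Search API with its ip: operator. The endpoint and key are assumptions on my part (this is the registration-required API, not whatever Robert demoed on stage), so treat it as a starting point:

```python
import requests
from urllib.parse import urlparse

API_KEY = "YOUR_BING_API_KEY"  # hypothetical; requires registration
ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"

def domains_for_ip(ip):
    """Search Bing with the ip: operator and collect result domains."""
    resp = requests.get(
        ENDPOINT,
        headers={"Ocp-Apim-Subscription-Key": API_KEY},
        params={"q": "ip:" + ip, "count": 50},
        timeout=10,
    )
    resp.raise_for_status()
    pages = resp.json().get("webPages", {}).get("value", [])
    return sorted({urlparse(page["url"]).netloc for page in pages})

print(domains_for_ip("37.59.118.41"))
```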
Anyway, going through the tools: Cuckoo Sandbox is a great tool that I actually use all the time via malwr.com. You can think of a sandbox as an automated container for malware that creates a fresh virtual environment each time a sample is analyzed. In the case of the malwr.com instance, it drops the sample into a Windows XP (or other) environment and lets the malware run while observing its behavior. A sandbox is nice because it quickly sorts out a lot of the basic info for you, such as generating hashes and extracting strings, which are actions you take with every malware sample you work on. You can also observe dynamic information such as network activity and dropped files in the sandbox. One thing to keep in mind is that you need an understanding of the “baseline behavior” of the environment and the type of sample you are analyzing, so that you can recognize normal activity that the sandbox might flag. For instance, if you are analyzing a malicious PDF, you should be aware of which files Acrobat Reader drops and which actions it takes as part of normal operation (such as dropping a SQLite database) versus what can be traced back specifically to the sample you are working on.
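If you run your own Cuckoo instance instead of using malwr.com, submissions can be scripted against its REST API. Here’s a minimal sketch assuming the API server is running locally on Cuckoo’s default port (8090); the sample path is hypothetical:

```python
import requests

API = "http://localhost:8090"  # Cuckoo's REST API, default port

def submit(path):
    """Submit a file for analysis; returns the Cuckoo task id."""
    with open(path, "rb") as sample:
        resp = requests.post(API + "/tasks/create/file",
                             files={"file": sample})
    resp.raise_for_status()
    return resp.json()["task_id"]

def report(task_id):
    """Fetch the JSON report once the analysis has finished."""
    resp = requests.get("%s/tasks/report/%d" % (API, task_id))
    resp.raise_for_status()
    return resp.json()

task_id = submit("sample.exe")  # hypothetical sample path
print("submitted as task", task_id)
```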
Thug is a tool that I hadn’t heard of before this talk, and from what I hear it isn’t that well known just yet. Thug complements tools like Dionaea: Dionaea and other honeypots sit out there and collect attacks directed at machines, while Thug is a honeyclient that goes out and initiates connections to suspect websites. It works by crawling sites while simulating various client-side software (e.g., Flash), seeing what the possibly malicious website triggers, and saving the results for later analysis.
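To give a feel for what a honeyclient does (this is a toy illustration, not Thug’s actual code or API), here’s a sketch that visits a suspect URL while claiming to be an old, vulnerable browser and records where the site sends you:

```python
import requests

# An old IE6 user-agent string, so the site thinks we're exploitable.
FAKE_BROWSER = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

def probe(url):
    """Visit a suspect URL as a fake browser and record what happens."""
    resp = requests.get(url, headers={"User-Agent": FAKE_BROWSER},
                        timeout=10, allow_redirects=True)
    return {
        "final_url": resp.url,  # where any redirects led
        "redirect_chain": [r.url for r in resp.history],
        "content_type": resp.headers.get("Content-Type"),
        "size": len(resp.content),
    }

print(probe("http://suspect.example/landing"))  # hypothetical URL
```

Thug goes much further than this, of course, emulating vulnerable plugins and DOM behavior, but the basic “pretend to be a victim browser and log everything” idea is the same.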
Bro Network Security Monitor could be described simply as an IDS, but that description doesn’t really do it justice. It’s more of a framework than a system, and it’s actually been around since the 1990s, so it has quite a bit of experience backing it up at this point. Though Bro’s focus is on network monitoring, you can also use it for traffic analysis, which is how Robert presented the tool during the talk. One interesting point he made was that Bro is very good at recognizing types of traffic; if you see captured traffic that Bro didn’t identify, chances are it’s something very shady, since at this point Bro will successfully identify pretty much anything legitimate. I think sometime I’m going to try using Bro instead of other tools (like Wireshark) and see what results I get.
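As a quick example of using that “Bro couldn’t identify it” heuristic, here’s a sketch that scans a Bro conn.log for connections with no identified service. It assumes the default tab-separated log format with a #fields header line:

```python
def unidentified_connections(log_path):
    """Yield conn.log rows whose service Bro could not identify."""
    fields = []
    with open(log_path) as log:
        for line in log:
            line = line.rstrip("\n")
            if line.startswith("#fields"):
                fields = line.split("\t")[1:]  # column names
            elif line.startswith("#") or not line:
                continue
            else:
                row = dict(zip(fields, line.split("\t")))
                if row.get("service") == "-":  # "-" means unidentified
                    yield (row["id.orig_h"], row["id.resp_h"],
                           row["id.resp_p"], row["proto"])

for orig, resp, port, proto in unidentified_connections("conn.log"):
    print("unidentified: %s -> %s:%s (%s)" % (orig, resp, port, proto))
```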
The last tool he mentioned was Volatility, which probably doesn’t need an introduction, but I’ll talk about it a bit anyway. Volatility is a framework for volatile memory analysis/forensics. There are a few reasons you might want to look into this type of memory. One is that you might get a better look at things that would otherwise be obfuscated during static or dynamic analysis. For example, when I analyzed BEAR, the domain and IP address of the C2 server didn’t show up in static analysis, but they did appear in the memory dump I took with one of the Sysinternals tools. Another thing you can do, which Robert highlighted in the talk, is compare two views of memory to see what has changed because of the malware: he took a sample of “normal” memory and then a sample of memory after running the suspicious file, then compared the two to help determine which new processes (and other indicators) were associated with the malware sample.
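Here’s a rough sketch of that before/after comparison, driving Volatility’s pslist plugin from Python and diffing the process names. The profile and image paths are assumptions, so substitute ones that match your analysis VM:

```python
import subprocess

PROFILE = "WinXPSP2x86"  # assumption: match this to your analysis VM

def process_names(image):
    """Run Volatility's pslist on an image and return process names."""
    out = subprocess.check_output(
        ["vol.py", "-f", image, "--profile=" + PROFILE, "pslist"])
    names = set()
    for line in out.decode("utf-8", "replace").splitlines():
        parts = line.split()
        # data rows start with a hex offset; the name is column two
        if len(parts) > 1 and parts[0].startswith("0x"):
            names.add(parts[1])
    return names

baseline = process_names("baseline.raw")  # hypothetical image paths
infected = process_names("infected.raw")
for name in sorted(infected - baseline):
    print("new process:", name)
```

A real comparison would diff more than process names (network connections, loaded DLLs, hooks), but this captures the basic idea Robert described.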
This was a great talk. A couple of other things stood out. One was that Robert had a really clean format for displaying analytical results in swimlanes: he set up a swimlane for each of the four tools and then placed the signatures obtained by each tool in its respective lane. I thought it was a great visualization, and I’d like to try it at some point with one of my future analyses. The other nice thing was that Robert’s high-level thought process around malware analysis lines up with what I learned, so it was a good validation of how I’ve taught myself and the materials that I used.
Robert has some interesting repositories up on his GitHub site and he’s also on Twitter if you’re interested in following his work.