Malware: It’s out there, lurking. Disguised within emails and web links, it searches for ways into our computers. The turmoil it causes depends on the intent of its creator: it can be as innocuous, if irritating, as your screen flashing rainbows, or as malevolent as your computer being stripped of sensitive data or commandeered by a remote user.
Antivirus companies and organizations struggle to keep up with the newest forms of malware, which include computer viruses, spyware, worms and trojan horses. But their job could soon be easier, thanks to Benjamin Fung, a professor in the School of Information Studies and McGill’s Canada Research Chair in Data Mining for Cyber Security.
“Computer malware is basically an executable file, a software program, but unlike ordinary software we rarely have its source code,” says Fung. One approach to understanding malware has involved running it inside a “virtual machine,” an emulation of a computer system that allows software engineers to examine the malware’s operation safely.
“Unfortunately, malware is now so complex that observing all its behavior is almost impossible, because it has different behaviors depending on user interactions or the computer environment,” says Fung. “It can even recognize when it is running in a virtual machine, so it decides ‘I’ll be good while I’m here.’” As a result, this approach, called the dynamic approach because researchers are trying to see the malware in action, is increasingly unsuccessful.
The “static” approach, however, is painstakingly slow. In this approach, the malware’s binary code is disassembled into assembly code, a human-readable rendering of the very basic but detailed instructions that tell a computer’s central processing unit what to do.
Working through malware at this level of detail is a huge task. A reverse engineer seeking to understand the malware could spend days reading assembly code line by line, trying to work out what a particular piece of code is doing: for instance, is it reading a file? Is it opening a network connection?
However, malware creators rarely build from scratch. Instead, they typically take an existing malware program and modify it for their own purposes, so the assembly code of different malware programs overlaps considerably. And the malware analysts and reverse engineers who combat malware already have a large repository of code that they have analyzed.
With that in mind, Fung and his team have developed Kam1n0, a unique anti-malware tool. Kam1n0 is essentially a search engine: it reads the malware’s assembly code and finds all the pieces that have been borrowed from earlier malware programs and have therefore already been identified and analyzed.
Fung compares Kam1n0’s function to that of Google. “If I think a document has been plagiarized from somewhere else, I can type its first paragraph into Google, and I have access to millions of documents online which will tell me if this paragraph appears elsewhere first,” he says. “Kam1n0 does this with code. We are helping reverse engineers find code that has already been used so they don’t have to read and analyze it again. They will already know what a particular piece of code does: if it is reading a file or opening a network port.”
Consequently, the reverse engineer can quickly identify and focus on the new functions in the malware, dramatically reducing the time and labour needed to understand and counter it.
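To make the search-engine idea concrete, here is a minimal sketch in Python of what “finding borrowed code” can look like. It is not Kam1n0’s own algorithm; it assumes a deliberately simple, swapped-in technique: normalize each instruction (registers and numeric constants become placeholders), slice a function into overlapping three-instruction windows, and score it against a repository of already-analyzed functions with Jaccard similarity. The repository contents, function labels and the 0.5 threshold are all hypothetical.

```python
import re

# Hypothetical repository of already-analyzed functions (label -> assembly).
# Real repositories hold vast numbers of functions; these two are invented.
KNOWN_FUNCTIONS = {
    "read_config_file": [
        "push ebp", "mov ebp, esp", "push 0", "push 0x80000000",
        "call CreateFileA", "mov esi, eax", "call ReadFile", "pop ebp", "ret",
    ],
    "open_network_socket": [
        "push 6", "push 1", "push 2", "call socket", "mov ebx, eax",
        "call connect", "test eax, eax", "ret",
    ],
}

REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def normalize(line):
    """Replace registers and numeric constants with placeholders, so that
    cosmetic edits (swapped registers, new constants) don't hide a clone."""
    mnemonic, _, operands = line.partition(" ")
    ops = re.split(r",\s*", operands) if operands else []
    tokens = []
    for op in ops:
        if op in REGISTERS:
            tokens.append("REG")
        elif re.fullmatch(r"0x[0-9a-fA-F]+|\d+", op):
            tokens.append("IMM")
        else:
            tokens.append(op)  # keep API names such as CreateFileA
    return " ".join([mnemonic] + tokens)


def ngrams(lines, n=3):
    """Set of n-instruction windows over the normalized code."""
    norm = [normalize(line) for line in lines]
    return {tuple(norm[i:i + n]) for i in range(len(norm) - n + 1)}


def jaccard(a, b):
    """Shared windows divided by all distinct windows."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def clone_search(func_lines, threshold=0.5):
    """Return (label, score) of the most similar known function, or
    (None, score) if nothing clears the threshold: a candidate new function."""
    query = ngrams(func_lines)
    best_label, best_score = None, 0.0
    for label, known in KNOWN_FUNCTIONS.items():
        score = jaccard(query, ngrams(known))
        if score > best_score:
            best_label, best_score = label, score
    return (best_label if best_score >= threshold else None), best_score


# A reused function with different registers and constants is still found...
reused = ["push ebp", "mov ebp, esp", "push 1", "push 0x40000000",
          "call CreateFileA", "mov edi, eax", "call ReadFile", "pop ebp", "ret"]
print(clone_search(reused))   # ('read_config_file', 1.0)

# ...while an unseen function is flagged as new and worth a closer look.
novel = ["push ebp", "mov ebp, esp", "call GetKeyState",
         "xor eax, eax", "pop ebp", "ret"]
print(clone_search(novel))    # (None, 0.0)
```

The split between matched and unmatched functions is the point: matches come with a label an analyst has already written, and only the unmatched remainder needs to be read line by line. A production system would need far more robust normalization and an index that avoids comparing against every stored function.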
While the research behind Kam1n0 has been funded primarily by Defence Research and Development Canada, Kam1n0 itself is a freely available, open-source application.
“From our perspective, there is no secrecy to Kam1n0,” says Fung, noting that his program has been picked up by such companies as Cisco Canada.
And it’s a multifunctional tool. “Kam1n0 allows reverse engineers to understand any software, not necessarily malware,” he says. “So if you want to compare the overlap between Chrome and Firefox web browsers, Kam1n0 can do that as well.” This means that Kam1n0 can also be used to identify copyright or patent infringement. “If someone lifts software I have developed and claims it as their own, I can demonstrate that they are copying my program.”
The software’s name is an homage to George Lucas’ Star Wars: Kamino is the planet where the clone troopers, forerunners of the stormtroopers, are created. Fung altered the name slightly, partly to avoid copyright hassles, but also as a nod to his search engine’s objective. Because Kam1n0 searches for digital clones in malware, the “i” and “o” of the planet’s name become a “1” and a “0,” the binary building blocks of digital code.
Kam1n0 isn’t Fung’s only foray into using innovative techniques to shed light on the murkier corners of the online world. Another prominent focus of his research lab is authorship analysis: Fung and his team have been developing a program that can identify the author of an anonymous text, given enough samples of the writer’s style.
“For instance, if I have enough emails from you, I can identify you as the author of your anonymous text.” This kind of attribution only works when there is a specific candidate whose writing style can be compared. Even without a clear candidate, however, Fung can build a profile of an anonymous author, identifying characteristics such as age, gender, educational level and even likely political orientation.
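The candidate-matching case can be illustrated with a classic stylometric baseline. The sketch below is not the method used in Fung’s lab; it assumes one common, simple technique: turn each text into a profile of character trigram counts (which pick up habits such as punctuation, abbreviations and spacing) and rank candidates by cosine similarity to the anonymous text. The candidate names and writing samples are invented for illustration.

```python
import math
from collections import Counter


def profile(text, n=3):
    """Character n-gram counts: captures habits like punctuation,
    abbreviations and spacing that tend to persist across a writer's texts."""
    t = " ".join(text.lower().split())  # normalize case and whitespace
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))


def cosine(p, q):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * q[gram] for gram, count in p.items())
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def attribute(anonymous_text, candidates):
    """Rank candidate authors by stylistic similarity to an anonymous text."""
    anon = profile(anonymous_text)
    scores = {name: cosine(anon, profile(" ".join(samples)))
              for name, samples in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Invented writing samples for two hypothetical candidates.
candidates = {
    "alice": ["thx!! let's sync tmrw re: the budget",
              "ok!! let's sync tmrw re: hiring, thx!!"],
    "bob": ["Dear colleagues, please find attached the quarterly report.",
            "Kind regards, and please find attached the minutes."],
}
ranking = attribute("thx!! let's sync tmrw re: the launch", candidates)
print(ranking[0][0])  # alice
```

With realistic data, far more text per candidate and richer features would be needed, and profiling traits such as age or gender without any candidate would require models trained on labeled examples rather than this one-to-one comparison.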
“We learn writing patterns from social media,” he explains. “For instance, we can see what characterizes high school students’ tweets, or blog entries by women between 30 and 49 years old.” This program, he notes, could provide important information in criminal investigations by helping to build a profile of an anonymous author involved in suspicious activities.