Decoupling coding habits from functionality for effective binary authorship attribution

Saed Alrabaee, Paria Shirani, Lingyu Wang, Mourad Debbabi, Aiman Hanna

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Binary authorship attribution is the process of identifying the author of an anonymous binary file based on stylistic characteristics. It aims to automate the laborious and error-prone reverse engineering task of discovering information about the author(s) of binary code. Existing works typically employ machine learning methods to extract features that are unique to each author and subsequently match them against a given binary to identify the author. However, most existing works share a critical limitation: they cannot distinguish between features representing program functionality and those representing authorship (e.g., authors' coding habits). This distinction is crucial for effective authorship attribution because what is unique in a particular binary may be attributable to the author, the compiler, or the function. In this study, we present BinAuthor, a system capable of decoupling program functionality from authors' coding habits in binary code. To capture coding habits, BinAuthor leverages a set of features based on collections of functionality-independent choices made by authors during coding. Our evaluation demonstrates that BinAuthor outperforms existing methods in several aspects. First, it successfully attributes a larger number of authors with significantly higher accuracy (around 90%) on large datasets extracted from selected open-source C++ projects on GitHub, Google Code Jam events, Planet Source Code contests, and several programming projects. Second, BinAuthor is more robust than previous methods; there is no significant drop in accuracy when the code is subjected to refactoring techniques or simple obfuscation, or is processed with different compilers. Finally, decoupling authorship from functionality allows us to apply BinAuthor to real malware binaries (Citadel, Zeus, Stuxnet, Flame, Bunny, and Babar) to automatically generate evidence of similar coding habits.

Original language: English
Pages (from-to): 613-648
Number of pages: 36
Journal: Journal of Computer Security
Volume: 27
Issue number: 6
Publication status: Published - 2019

Keywords

  • Binary authorship attribution
  • assembly instructions
  • coding habits
  • malware analysis

ASJC Scopus subject areas

  • Software
  • Safety, Risk, Reliability and Quality
  • Hardware and Architecture
  • Computer Networks and Communications
