Recon 2024

Unleashing AI: The Future of Reverse Engineering with Large Language Models
06-28, 14:00–14:30 (US/Eastern), Grand Salon

In this talk, we take a closer look at Large Language Models (LLMs) in reverse engineering, highlighting both their current uses and their future potential. We discuss the opportunities and challenges that LLMs present, from enhancing code analysis to dealing with inaccuracies and privacy concerns. To address these challenges, we introduce ReverserAI, a platform designed to explore and expand the capabilities of LLMs within this field. We further illustrate how local, privacy-focused LLM setups can overcome existing privacy limitations. Lastly, we showcase ways to significantly improve current LLM outputs by combining them with traditional static analysis techniques, for example in the context of malware analysis. Our discussion also covers the anticipated evolution of LLM technology and its promise to advance the field.

LLMs like ChatGPT have made significant strides, excelling in tasks from text and image generation to coding and expanding the horizons of "artificial intelligence". Despite these advancements, their application in reverse engineering remains underexplored. Existing efforts, such as enhancing decompiler output through better naming and interactive code explanations, have shown promise, as evidenced by a surge in supporting plugins. However, these approaches face challenges, including irrelevant suggestions, inaccuracies, and hallucinations caused by limited context or input size. Another significant downside is their service-oriented usage, which requires constant internet access, incurs ongoing costs, and raises privacy concerns when dealing with sensitive data.

Despite these challenges, we believe that we are just scratching the surface of how LLMs can support reverse engineering efforts. This talk delves into previously unexplored opportunities for LLMs that extend beyond current applications. For this purpose, we introduce ReverserAI, a plugin that serves as a testbed for new techniques to improve LLM support for reverse engineering tasks. We begin by showing how to run a local, privacy-preserving LLM setup and demonstrate that modern consumer-grade hardware suffices to achieve good performance. Subsequently, we explore the synergy between conventional code analysis techniques and LLMs to refine results, the preprocessing of binaries for malware analysis, and the identification and naming of API functions in statically linked executables. Moreover, we examine the current limitations of these methods and discuss areas that remain out of reach.
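
To make the idea of combining static analysis with an LLM more concrete, the following is a minimal sketch and not ReverserAI's actual implementation. It assumes that a static analysis pass has already collected a function's callees and referenced strings, and it places these short facts ahead of the decompiled code so that truncating the code to fit a limited context never drops them. The names `FunctionFacts` and `build_naming_prompt`, the prompt wording, and the character budget are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FunctionFacts:
    """Static-analysis facts about one function (hypothetical container)."""
    address: int
    decompiled: str
    callees: List[str] = field(default_factory=list)
    strings: List[str] = field(default_factory=list)


def build_naming_prompt(facts: FunctionFacts, max_chars: int = 2000) -> str:
    """Combine static-analysis facts and decompiled code into one prompt.

    The short, high-signal facts come first; only the decompiled code is
    truncated to respect the (assumed) character budget. If the facts alone
    exceed the budget, they are kept whole and no code is appended.
    """
    header = [
        "Suggest a descriptive name for the function below.",
        f"Address: {facts.address:#x}",
        "Called functions: " + (", ".join(facts.callees) or "none"),
        "Referenced strings: "
        + (", ".join(repr(s) for s in facts.strings) or "none"),
        "Decompiled code:",
    ]
    head = "\n".join(header) + "\n"
    budget = max(0, max_chars - len(head))
    return head + facts.decompiled[:budget]


if __name__ == "__main__":
    example = FunctionFacts(
        address=0x401000,
        decompiled="int sub_401000(char *a1) { return send_report(a1); }",
        callees=["socket", "connect", "send"],
        strings=["POST /gate.php"],
    )
    print(build_naming_prompt(example))
```

The resulting string could then be passed to a locally hosted model; how the model is loaded and queried is deliberately left out here, since it depends on the chosen runtime.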

Finally, we look ahead to future developments in LLM technology, such as improved handling of larger code bases and the rise of local LLM implementations, and explore how these could further advance reverse engineering practices as their capabilities expand.

Tim Blazytko is a well-known binary security researcher and co-founder of emproof. After working on novel methods for code deobfuscation, fuzzing and root cause analysis during his PhD, Tim now builds code obfuscation schemes tailored to embedded devices. Moreover, he gives trainings on reverse engineering & code deobfuscation, analyzes malware and performs security audits.

Moritz Schloegel is a binary security researcher at the CISPA Helmholtz Center for Information Security. He is currently in the last year of his PhD and focuses on automated finding, understanding, and exploitation of bugs. Furthermore, he possesses a deep passion for exploring the complexities of (de-)obfuscation, emphasizing automated deobfuscation attacks and their countermeasures.