Telegram Group & Telegram Channel
LLM4Decompile: Decompiling Binary Code with Large Language Models

8 Mar 2024 · Hanzhuo Tan, Qi Luo, Jing Li, Yuqun Zhang ·

Decompilation aims to convert binary code to high-level source code, but traditional tools like Ghidra often produce results that are difficult to read and execute. Motivated by the advancements in Large Language Models (LLMs), we propose LLM4Decompile, the first and largest open-source #LLM series (1.3B to 33B) trained to decompile binary code. We optimize the LLM training process and introduce the LLM4Decompile-End models to decompile binary directly. The resulting models significantly outperform GPT-4o and Ghidra on the HumanEval and ExeBench benchmarks by over 100% in terms of re-executability rate. Additionally, we improve the standard refinement approach to fine-tune the LLM4Decompile-Ref models, enabling them to effectively refine the decompiled code from Ghidra and achieve a further 16.2% improvement over the LLM4Decompile-End. LLM4Decompile demonstrates the potential of LLMs to revolutionize binary code decompilation, delivering remarkable improvements in readability and executability while complementing conventional tools for optimal results.

Paper: https://arxiv.org/pdf/2403.05286v3.pdf

Code: https://github.com/albertan017/LLM4Decompile



@Machine_learn



tg-me.com/Machine_learn/3408
Create:
Last Update:

LLM4Decompile: Decompiling Binary Code with Large Language Models

8 Mar 2024 · Hanzhuo Tan, Qi Luo, Jing Li, Yuqun Zhang ·

Decompilation aims to convert binary code to high-level source code, but traditional tools like Ghidra often produce results that are difficult to read and execute. Motivated by the advancements in Large Language Models (LLMs), we propose LLM4Decompile, the first and largest open-source #LLM series (1.3B to 33B) trained to decompile binary code. We optimize the LLM training process and introduce the LLM4Decompile-End models to decompile binary directly. The resulting models significantly outperform GPT-4o and Ghidra on the HumanEval and ExeBench benchmarks by over 100% in terms of re-executability rate. Additionally, we improve the standard refinement approach to fine-tune the LLM4Decompile-Ref models, enabling them to effectively refine the decompiled code from Ghidra and achieve a further 16.2% improvement over the LLM4Decompile-End. LLM4Decompile demonstrates the potential of LLMs to revolutionize binary code decompilation, delivering remarkable improvements in readability and executability while complementing conventional tools for optimal results.

Paper: https://arxiv.org/pdf/2403.05286v3.pdf

Code: https://github.com/albertan017/LLM4Decompile



@Machine_learn

BY Machine learning books and papers




Share with your friend now:
tg-me.com/Machine_learn/3408

View MORE
Open in Telegram


Machine learning books and papers Telegram | DID YOU KNOW?

Date: |

Unlimited members in Telegram group now

Telegram has made it easier for its users to communicate, as it has introduced a feature that allows more than 200,000 users in a group chat. However, if the users in a group chat move past 200,000, it changes into "Broadcast Group", but the feature comes with a restriction. Groups with close to 200k members can be converted to a Broadcast Group that allows unlimited members. Only admins can post in Broadcast Groups, but everyone can read along and participate in group Voice Chats," Telegram added.

The global forecast for the Asian markets is murky following recent volatility, with crude oil prices providing support in what has been an otherwise tough month. The European markets were down and the U.S. bourses were mixed and flat and the Asian markets figure to split the difference.The TSE finished modestly lower on Friday following losses from the financial shares and property stocks.For the day, the index sank 15.09 points or 0.49 percent to finish at 3,061.35 after trading between 3,057.84 and 3,089.78. Volume was 1.39 billion shares worth 1.30 billion Singapore dollars. There were 285 decliners and 184 gainers.

Machine learning books and papers from us


Telegram Machine learning books and papers
FROM USA