
Official implementation of **LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges (ACL 2025)**


LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges

1. Project Overview

This project focuses on the security of Large Language Models (LLMs) when confronted with malware-related tasks and jailbreak attacks. With the widespread adoption of LLMs, concerns have grown about their security, especially their vulnerability to jailbreak attacks that can elicit malicious outputs. While previous research has examined general LLM security, the susceptibility of code generation to jailbreak attacks has not been fully explored.

*(Overview figure)*

2. Key Objectives

  • Propose MalwareBench, a benchmark dataset designed to evaluate the robustness of LLMs against jailbreak attacks for malicious code generation.
  • Test 29 mainstream LLMs to analyze their security performance and influencing factors when faced with jailbreak attacks.
  • Reveal the security vulnerabilities of current LLMs and provide important directions for subsequent improvements in model security.

3. MalwareBench Dataset

  • Composition: MalwareBench contains 3,520 jailbreak prompts for malicious code generation, built by crossing 320 manually crafted malicious code-generation requirements with 11 jailbreak methods (320 × 11 = 3,520); the requirements span 29 code functionality categories. A construction sketch follows this list.
  • Categorization: When constructing the dataset, the authors referred to the Malimg dataset. Malicious problems are grouped into 6 primary classifications by user intent, with further secondary and tertiary classifications for some categories. For each leaf category, 5–20 malicious requirements were manually crafted, taking different operating systems into account and dividing requirements into rough and detailed variants.

*(MalwareBench category overview figure)*

  • Jailbreak Methods: 11 jailbreak methods spanning three types were selected; Qwen-Turbo was used to generate questions for some of them.
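
To make the composition concrete, here is a minimal sketch of the expansion step described above: crossing each base requirement with each jailbreak method. The record fields, the placeholder method names, and the `apply_jailbreak` helper are illustrative assumptions, not the repository's actual schema.

```python
# Illustrative sketch only: field names, method names, and apply_jailbreak
# are assumptions, not MalwareBench's actual schema.
from dataclasses import dataclass

@dataclass
class Prompt:
    requirement_id: int    # which of the 320 base requirements
    category: str          # one of the 29 code functionality categories
    detail_level: str      # "rough" or "detailed"
    method: str            # one of the 11 jailbreak methods
    text: str              # final prompt sent to the model

JAILBREAK_METHODS = [f"method_{i}" for i in range(11)]  # placeholder names

def apply_jailbreak(requirement: str, method: str) -> str:
    # Stub: a real implementation would rewrite the request using the
    # method's template (e.g., role-play framing, benign expression).
    return f"[{method}] {requirement}"

def expand(requirements: list[dict]) -> list[Prompt]:
    """Cross every base requirement with every jailbreak method."""
    return [
        Prompt(
            requirement_id=req["id"],
            category=req["category"],
            detail_level=req["detail_level"],
            method=method,
            text=apply_jailbreak(req["text"], method),
        )
        for req in requirements
        for method in JAILBREAK_METHODS
    ]
# 320 requirements x 11 methods = 3,520 prompts
```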

4. Experimental Results

  • Rejection Rates: On raw malicious requests, code-generation models refused 70.56% of the time and generic large models 51.19%; once jailbreak methods were applied, these rates dropped to 51.50% and 41.47% respectively. The overall average rejection rate for malicious content is 60.93%, falling to 39.92% when combined with jailbreak attack algorithms (a sketch of this metric follows the list).
  • Model Performance:
    • There is a negative correlation between response scores and refusal rates.
    • About 50.35% of jailbreak attempts yield malicious content. Models like OpenAI-o1 and CodeLlama-70B-Instruct show strong security.
    • LLMs are more defensive against detailed requirements. Small-parameter models tend to give irrelevant outputs, while large models are more prone to generating malicious pseudo-code, often drawing on their knowledge bases.
    • Advanced reasoning models such as OpenAI-o1 and DeepSeek-R1 can handle malicious requests, but security alignment is crucial.
    • Different models have varied sensitivities to attack algorithms, with Benign Expression being a highly effective jailbreaking method.
    • Regarding requirement types, LLMs show a consistent trend: low scores for Denial of Service and Download&Propagation, and high scores for Information Theft, possibly due to training data or model mechanisms.
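
As a quick illustration of the metric behind these numbers, the sketch below computes a rejection rate from per-response refusal labels. The label lists are synthetic, sized only to mirror the reported overall percentages, and the data layout is an assumption rather than the repository's evaluation code.

```python
# Hypothetical illustration of the rejection-rate metric; the label lists
# below are synthetic and only mirror the reported overall percentages.
def rejection_rate(refused: list[bool]) -> float:
    """Percentage of responses the model refused."""
    return 100.0 * sum(refused) / len(refused)

raw = [True] * 6093 + [False] * 3907         # ~60.93% rejection, no jailbreak
jailbroken = [True] * 3992 + [False] * 6008  # ~39.92% with jailbreak methods
print(f"{rejection_rate(raw):.2f}% -> {rejection_rate(jailbroken):.2f}%")
```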

*(Figures: results heat map; model output examples)*

5. Resources

The test results of the evaluated LLMs are located in the `test_result` folder.
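
A minimal sketch for loading those results is below; it assumes one JSON file per model, each containing a list of records, which is a guess about the folder's layout rather than a documented format.

```python
# Assumed layout: test_result/<model_name>.json, each holding a list of
# prompt/response records. Adjust to the actual files in the folder.
import json
from pathlib import Path

results = {}
for path in sorted(Path("test_result").glob("*.json")):
    with path.open(encoding="utf-8") as f:
        results[path.stem] = json.load(f)

for model, records in results.items():
    print(f"{model}: {len(records)} records")
```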

6. BibTeX Citation

@inproceedings{*****,
  title={LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges},
  author={Li, Haoyang and Gao, Huan and Zhao, Zhiyuan and Lin, Zhiyu and Gao, Junyu and Li, Xuelong},
  booktitle={Proceedings of the ACL 2025 Conference},
  year={2025}
}
