LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges

1. Project Overview

This project focuses on the security issues of Large Language Models (LLMs) in the context of malware-related tasks and jailbreak attacks. With the widespread adoption of LLMs, concerns about their security, especially vulnerability to jailbreak attacks that can lead to malicious outputs, have grown. While previous research has looked at general LLM security, the specific susceptibility to jailbreak attacks in code generation has not been fully explored.

2. Key Objectives

Propose MalwareBench, a benchmark dataset designed to evaluate the robustness of LLMs against jailbreak attacks for malicious code generation.
Test 29 mainstream LLMs to analyze their security performance and influencing factors when faced with jailbreak attacks.
Reveal the security vulnerabilities of current LLMs and provide important directions for subsequent improvements in model security.

3. MalwareBench Dataset

Composition: MalwareBench contains 3,520 jailbreaking prompts for malicious code generation. It is based on 320 manually crafted malicious code generation requirements, covering 11 jailbreak methods and 29 code functionality categories.
Categorization: When constructing the dataset, researchers referred to the malimg dataset. Malicious problems are categorized into 6 primary classifications based on user intent, with further secondary and tertiary classifications for some categories. For each detailed category, 5 - 20 malicious requirements were manually crafted, considering different operating systems and dividing requirements into rough and detailed types.

Jailbreak Methods: 11 jailbreak methods of three types were selected, with Qwen-Turbo used in some for question generation.

4. Experimental Results

Rejection Rates: Initially, code-generation models had a 70.56% rejection rate and generic large models had 51.19% for malicious content. However, when jailbreak methods were applied, these rates dropped to 51.50% and 41.47% respectively. The overall average rejection rate for malicious content is 60.93%, which drops to 39.92% when combined with jailbreak attack algorithms.
Model Performance:
- There is a negative correlation between response scores and refusal rates.
- About 50.35% of jailbreak attempts yield malicious content. Models like OpenAI-o1 and CodeLlama-70B-Instruct show strong security.
- LLMs are more defensive against detailed requirements. Small parameter models tend to give irrelevant outputs, and large models are more prone to generating malicious pseudo-code, often relying on their knowledge bases.
- Advanced reasoning models such as OpenAI-o1 and DeepSeek-R1 can handle malicious requests, but security alignment is crucial.
- Different models have varied sensitivities to attack algorithms, with Benign Expression being a highly effective jailbreaking method.
- Regarding requirement types, LLMs show a consistent trend, with low scores for Denial Service and Download&Propagation, and high scores for Information Theft, possibly due to training data or model mechanisms.

5. Resources

Code: MAIL-Tele-AI/MalwareBench
Huggingface: Tele-AI-MAIL/MalwareBench

6. Dataset

Original Dataset: original_questions.xlsx
Attack Prompts: attack_prompts.xlsx

The test results of LLMs are located in the test_result folder.

7. BibTeX Citation

@article{*****,
  title={LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges},
  author={Li, Haoyang and Gao, Huan and Zhao, Zhiyuan and Lin, Zhiyu and Gao, Junyu and Li, Xuelong},
  conference={Proceedings of the ACL 2025 Conference},
  year={2025}
}

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
dataset		dataset
static/images		static/images
test_result		test_result
Readme.md		Readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges

1. Project Overview

2. Key Objectives

3. MalwareBench Dataset

4. Experimental Results

5. Resources

6. Dataset

7. BibTeX Citation

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

MAIL-Tele-AI/MalwareBench

Folders and files

Latest commit

History

Repository files navigation

LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges

1. Project Overview

2. Key Objectives

3. MalwareBench Dataset

4. Experimental Results

5. Resources

6. Dataset

7. BibTeX Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Packages