
Navigating AI/ML Bug Bounty Hunting: Lessons from Hunting Pickle Deserialization Vulnerabilities

Written by Madison Vorbrich | Dec 3, 2024 11:45:00 AM
Introduction

You know what’s better than just using AI/ML systems? Breaking them—ethically, of course. Peng Zhou (aka zpbrent), one of our huntrs, did exactly that when he explored a popular AI hub and uncovered a handful of vulnerabilities that weren’t on anyone’s radar. His blog gives you the lowdown on how he did it.

Curious how you can up your own bug-hunting game? We’ve got you. Here are five key lessons from Peng’s journey that can help you in your own vulnerability research.

Key Lessons from Peng Zhou’s AI/ML Bug Bounty Journey

  1. Understand Deserialization Risks

    • Peng’s discovery centered on a Python pickle deserialization vulnerability. The pickle module enables object serialization, but using pickle.loads() for deserialization can lead to serious consequences, including Remote Code Execution (RCE). Deserializing untrusted data can allow malicious actors to inject and execute arbitrary code.

      In the context of AI/ML systems, where model loading and data exchange are frequent, this vulnerability can be exploited if the system doesn’t properly validate or sanitize inputs. Auditing every instance of functions like pickle.loads() is essential to prevent such exploits, especially when handling external or untrusted data sources. (A minimal illustration of the risk follows this list.)

  2. Challenging Security Norms

    • Many AI/ML libraries do not classify local pickle deserialization as a critical vulnerability. However, Peng’s research challenged this assumption. By showing how a locally stored pickle file could be manipulated to load remotely, he uncovered a significant security flaw.

      This teaches us a vital lesson: just because a security practice is widely accepted doesn’t mean it’s safe. Common practices, especially in emerging fields like AI/ML, should be questioned and tested rigorously. Always dig deeper into established methods, as these can often contain overlooked vulnerabilities.

  3. Think in Attack Chains

    • A major breakthrough in Peng’s research was exploiting the interaction between seemingly unrelated components. He crafted an attack by linking two repositories—one benign and the other malicious. When the from_pretrained() API was used to load a model from the benign repo, it triggered malicious code from the second repo, bypassing the Pickle Scanning feature.

      When hunting for vulnerabilities, think beyond individual flaws and consider how components can work together to form an attack chain. What may seem like minor issues when isolated can, in combination, create a serious security breach.

  4. Persistence Pays Off

    • Peng’s initial report was rejected as "informative," meaning it wasn’t initially considered a significant vulnerability. However, he persisted, refining his attack vector and providing detailed proof-of-concept (PoC) demonstrations. His persistence ultimately led to the vulnerability being acknowledged as critical, earning him two CVEs and significant bug bounty rewards.

      For security researchers, rejection is often part of the process. The key is to remain persistent, provide clear documentation, and refine your PoC to better demonstrate the impact. A well-documented vulnerability paired with an effective PoC can often turn initial skepticism into recognition.

  5. Securing Model Loading APIs

    • Peng’s research also showed how AI/ML model-loading APIs can be prime targets for exploitation. In one instance, a widely used from_pretrained() API was vulnerable to an attack vector that could lead to remote code execution by loading a compromised model. By manipulating the model-loading process and bypassing the Pickle Scanning mechanism, Peng demonstrated how easy it was to introduce malicious code into a supposedly secure environment.

      To secure similar systems, it’s crucial to scrutinize the APIs handling external data or models. Ensure that external models are verified before being loaded, and enforce strict security protocols. Regular testing of model-loading APIs for deserialization risks and other vulnerabilities is necessary to prevent such attacks from succeeding. (A defensive sketch follows this list.)
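
To make lesson 1 concrete, here is a minimal, self-contained sketch of why deserializing untrusted data with pickle.loads() amounts to handing over code execution. This is a generic illustration, not Peng’s actual exploit: the MaliciousPayload class and the harmless id command are stand-ins for attacker-controlled content.

    import os
    import pickle


    class MaliciousPayload:
        """Illustrative only: pickle records the callable returned by
        __reduce__, and the unpickler invokes it during deserialization."""

        def __reduce__(self):
            # On load, pickle will call os.system("id") -- a harmless
            # stand-in for arbitrary attacker-controlled code.
            return (os.system, ("id",))


    # The "attacker" produces a pickle blob...
    blob = pickle.dumps(MaliciousPayload())

    # ...and the moment a victim deserializes it, the command runs.
    pickle.loads(blob)  # never do this with untrusted data

No flaw in pickle itself is being exploited here; this is simply how the format is designed to work, which is exactly why deserializing untrusted pickle data is so dangerous.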
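
On the defensive side of lesson 5, one widely documented mitigation is to restrict which globals the unpickler is allowed to resolve, so a model file cannot smuggle in callables like os.system. The sketch below follows the "Restricting Globals" pattern from the Python pickle documentation; the SAFE_CLASSES allow-list is hypothetical, and a real deployment would enumerate exactly the classes its model format legitimately needs (or avoid pickle entirely in favor of non-executable formats such as safetensors).

    import io
    import pickle


    # Hypothetical allow-list; tailor this to the classes your model
    # format actually requires.
    SAFE_CLASSES = {
        ("builtins", "dict"),
        ("builtins", "list"),
        ("collections", "OrderedDict"),
    }


    class RestrictedUnpickler(pickle.Unpickler):
        """Refuse to resolve any global that is not explicitly allow-listed."""

        def find_class(self, module, name):
            if (module, name) in SAFE_CLASSES:
                return super().find_class(module, name)
            raise pickle.UnpicklingError(
                f"blocked unpickling of {module}.{name}"
            )


    def restricted_loads(data: bytes):
        """Drop-in replacement for pickle.loads() with an allow-list."""
        return RestrictedUnpickler(io.BytesIO(data)).load()

Restricting globals raises the bar considerably, but it isn’t a silver bullet: verifying where a model comes from and scanning it before loading, as Peng’s research underscores, still matter.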

The Future of AI/ML Bug Bounties

Peng Zhou's journey proves one thing: AI/ML bug bounty hunting is still figuring itself out—kind of like the rest of us. As the landscape grows, there will be more opportunities for hackers to secure the systems that power the future of AI. But let’s be real, it’s not for the lazy. It takes grit, creativity, and a stubborn refusal to give up when the going gets tough (and it will).

If you’re hunting for vulnerabilities in AI/ML applications, stay curious, keep experimenting, and don’t be afraid to challenge the status quo. 

Think you've got what it takes to hunt down vulnerabilities in AI/ML model files? Prove it. Start submitting your model format vulnerabilities and put your skills to the test. This is your chance to take home up to $3k, huntrs. Let's go!