MXNet Unsafe Pointer Usage

Apr 12, 2024 By Dan McInerney

Hacking AI/ML: MXNet Unsafe Pointer Usage

Note from Protect AI

Security researcher Sierra Haex, in collaboration with huntr's Threat Research team, discovered an interesting bug in MXNet, a popular library for creating machine learning models. Mishandling of memory in a core, commonly used MXNet function leads to arbitrary code execution. In this guest blog post, Sierra explains how this issue can lead to remote code execution if MXNet ingests remote user input, such as through a web application.

What is MXNet?

Apache MXNet was designed as a flexible and efficient library for deep learning, with features like distributed training, 8 language bindings, a thriving ecosystem, and a hybrid front end. The current version of the project is 1.9.1, which is available via public package repositories such as PyPI. The source code of the project can be obtained from GitHub:

git clone --recursive https://github.com/apache/mxnet

The release version of the library for Python 3 can be installed via pip:

pip3 install mxnet

Vulnerability

MXNet contains a vulnerability in a commonly used function that takes user input, which allows attackers to execute arbitrary code. This is particularly dangerous when MXNet is used in an API or web application, as remote user input then leads directly to remote code execution and system takeover.

Discovering the Vulnerability

MXNet exposes a number of API functions to Python via `src/c_api/c_api.cc`, which are then wrapped in user-friendly functions in `python/mxnet/ndarray/ndarray.py`. Many of these functions pass raw pointers to and from the application instead of properly abstracting them and tracking them in the C++ code. This lets an attacker with access to the API cause memory corruption merely by modifying an object pointer (such as `NDArray.handle`) or by calling the library functions directly with malicious pointers.

After some manual source code review, the function `MXNDArrayGetStorageType` in the C API was found to be vulnerable. The Python bindings can be found in `python/mxnet/ndarray/ndarray.py`:

The Bug

The Python bindings map directly to the MXNet Library API in `src/c_api/c_api.cc`. The `MXNDArrayGetStorageType` function contains the following code:

There are three things to note in this function:

- The handle is not validated: it is immediately cast to a pointer (`arr`) and used. The same is true of the `int *out_storage_type` pointer.
- `is_none()` is an inline function defined earlier in `ndarray.h` that simply checks the value at `[*arr + 0]` and compares it to zero. We can easily bypass this check by pointing the handle into a byte array buffer we control and making that value non-zero.
- `storage_type()` is also an inline function defined earlier in `ndarray.h`; its assembly is just two instructions:

Number 2 is extremely powerful: it gives an attacker the ability not only to write anywhere in memory, but also to read any memory!
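To make the three observations above concrete, here is a toy Python model of the function's logic. It is a sketch under stated assumptions, not MXNet code: a `bytearray` stands in for process memory, plain integers stand in for pointers, `is_none()` is assumed to test a pointer-sized (8-byte) value at `handle + 0`, and `storage_type()` is assumed to read a 4-byte `int` at `handle + 0x50` (the offset given in the next section). `ToyMemory`, `toy_get_storage_type` and the constants are hypothetical names.

```python
import struct

# Toy model, not MXNet code: a bytearray stands in for process memory and
# plain integers stand in for pointers. Offsets are assumptions taken from
# the write-up (is_none() at handle + 0, storage_type() at handle + 0x50).
STORAGE_TYPE_OFFSET = 0x50
UNDEFINED_STORAGE = -1  # stored as 0xffffffff when is_none() is true


class ToyMemory:
    def __init__(self, size: int) -> None:
        self.buf = bytearray(size)

    def read_u64(self, addr: int) -> int:
        return struct.unpack_from("<Q", self.buf, addr)[0]

    def read_i32(self, addr: int) -> int:
        return struct.unpack_from("<i", self.buf, addr)[0]

    def write_i32(self, addr: int, value: int) -> None:
        struct.pack_into("<i", self.buf, addr, value)


def toy_get_storage_type(mem: ToyMemory, handle: int, out_storage_type: int) -> None:
    # Neither pointer is validated; the handle is used as an address as-is.
    arr = handle
    # is_none(): any non-zero 8-byte value at arr + 0 passes the check.
    if mem.read_u64(arr) != 0:
        # storage_type(): a 4-byte read from an address derived from the handle.
        value = mem.read_i32(arr + STORAGE_TYPE_OFFSET)
    else:
        value = UNDEFINED_STORAGE
    # The result is written to whatever address the caller supplied.
    mem.write_i32(out_storage_type, value)


# Usage: a fake "NDArray" at 0x100 whose storage-type field holds 0x41414141.
mem = ToyMemory(0x1000)
struct.pack_into("<Q", mem.buf, 0x100, 1)  # non-zero, so is_none() is false
mem.write_i32(0x100 + STORAGE_TYPE_OFFSET, 0x41414141)
toy_get_storage_type(mem, handle=0x100, out_storage_type=0x800)
print(hex(mem.read_i32(0x800)))  # 0x41414141, written to a caller-chosen address
```

In the model, the caller chooses both where the value is read from (via `handle`) and where it is written to (via `out_storage_type`), which is exactly why one function yields both primitives.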

Reading All Memory

By supplying a handle that points into a bytes buffer, we can bypass the check at 1 and control which memory address gets read back out in 2. Note that we have almost full memory read access: to read memory address `XYZ`, we have to set `handle` to `XYZ - 0x50`, and the value stored at `XYZ - 0x50` must be non-zero, or the function will return `0xffffffff` instead. Another constraint, due to the `int` data type of `out_storage_type`, is that each read returns only 4 bytes. That's OK, as we can just read twice to get a full 8-byte QWORD:
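The address arithmetic behind the read primitive can be sketched as follows. This is a hedged illustration, not the exploit from Appendix A: `read_dword` is a hypothetical callback standing in for one call to the vulnerable function, and a `bytearray` plays the role of memory. The only facts taken from the text are the `0x50` offset and the 4-byte read size; the little-endian split is the standard x86-64 layout. The `& 0xFFFFFFFF` mask turns a signed `int` result such as `-1` into its unsigned form. Keep in mind that a half that reads back as `0xffffffff` is ambiguous, since it may be real data or a failed `is_none()` check.

```python
import struct
from typing import Callable

STORAGE_TYPE_OFFSET = 0x50  # storage_type() reads 4 bytes at handle + 0x50


def handle_for(addr: int) -> int:
    """Handle value that makes storage_type() read from `addr`."""
    return addr - STORAGE_TYPE_OFFSET


def read_qword(read_dword: Callable[[int], int], addr: int) -> int:
    """Build an 8-byte value from two 4-byte reads (x86-64 is little-endian).

    `read_dword(handle)` is a hypothetical stand-in for one call to the
    vulnerable function; it returns the 32-bit result for that handle.
    """
    low = read_dword(handle_for(addr)) & 0xFFFFFFFF
    high = read_dword(handle_for(addr + 4)) & 0xFFFFFFFF
    return low | (high << 32)


# Usage against a bytearray standing in for memory.
memory = bytearray(0x200)
struct.pack_into("<Q", memory, 0x100, 0x1122334455667788)


def fake_read_dword(handle: int) -> int:
    return struct.unpack_from("<I", memory, handle + STORAGE_TYPE_OFFSET)[0]


print(hex(read_qword(fake_read_dword, 0x100)))  # 0x1122334455667788
```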

Writing All Memory

Because we also control the address of `out_storage_type` in number 2, we can write anywhere in memory without the read constraint above, since the handle itself can point into a buffer we control. However, because `out_storage_type` is an `int`, we can only write 4 bytes at a time. That's OK! We just write twice, 4 bytes apart, to put a full QWORD into memory:
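Splitting a QWORD into two 4-byte writes looks like this in a minimal sketch. `write_dword` is a hypothetical callback standing in for one call to the vulnerable function with `out_storage_type` pointing at the target address, and it is assumed to accept an unsigned 32-bit value; the `bytearray` is only a stand-in for memory.

```python
import struct
from typing import Callable


def write_qword(write_dword: Callable[[int, int], None], addr: int, value: int) -> None:
    """Write a 64-bit value as two 4-byte writes (x86-64 is little-endian).

    `write_dword(addr, dword)` is a hypothetical stand-in for one call to the
    vulnerable function with `out_storage_type` pointing at `addr`.
    """
    value &= 0xFFFFFFFFFFFFFFFF
    write_dword(addr, value & 0xFFFFFFFF)              # low 4 bytes first
    write_dword(addr + 4, (value >> 32) & 0xFFFFFFFF)  # then the high 4 bytes


# Usage against a bytearray standing in for memory.
memory = bytearray(16)


def fake_write_dword(addr: int, dword: int) -> None:
    struct.pack_into("<I", memory, addr, dword)


write_qword(fake_write_dword, 0, 0x4142434445464748)
print(memory[:8].hex())  # 4847464544434241
```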

RIP Control

Now that we have read and write primitives, we can get code execution by overwriting a built-in Python function. `id()` (which returns an object's address) is a great target for this:

And once that has been overwritten, we can cause `RIP` to jump to that address by calling `id()` on an object:

id(1) # this will cause a crash with RIP=0x4142434445464748

Local Exploit Development

Because we can read and write memory, ASLR (Address Space Layout Randomization) is trivially defeated by calling Python built-in functions like `id()`. This tells us where all the executables and libraries are in memory, although our code execution primitive only gives us control over the following:

- `RIP`

- `[RSI]+0x18` (object parameter passed as argument to `id()`)

- `[RSP-0x48]+0x20` (object parameter passed as argument to `id()`)
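The ASLR point rests on a CPython implementation detail: `id()` returns the object's address in memory. The harmless snippet below, using the standard `ctypes` module, confirms this by casting the integer back to a `py_object` and checking that the same object comes back. This is CPython behavior only; the language specification just guarantees that `id()` is unique for live objects.

```python
import ctypes

obj = ["any", "python", "object"]
addr = id(obj)  # in CPython this is the object's address in memory

# Reinterpreting the integer as a PyObject* yields the very same object,
# showing that id() hands out a real heap pointer (CPython only).
recovered = ctypes.cast(addr, ctypes.py_object).value
print(hex(addr), recovered is obj)
```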

At this point, we could construct a ROP (Return-Oriented Programming) chain, but that would require version-specific gadgets for every version of Python, libc, and libffi. That is great for a single target, but we can do better!

Inspecting the memory-mapped regions reveals a very interesting region associated with `/usr/lib/x86_64-linux-gnu/libffi.so.8.1.0`:

This page lets us write shellcode into it (the `w` permission) as well as run code within it (the `x` permission)! For ease and portability, we will read the map from `/proc/self/maps`. However, it is entirely possible to obtain a reference to this memory region in a non-portable way (offsets from each library would need to be computed per Python, libffi, and libc version):
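Finding such a page in `/proc/self/maps` amounts to parsing each line and filtering on permissions. The following is a minimal sketch that assumes the standard Linux line format (`start-end perms offset dev inode [pathname]`). `Region`, `parse_maps` and `rwx_regions` are illustrative names, not part of the exploit in Appendix A. Since it only reports mappings that are readable, writable and executable at once, the same code also works as a defensive check for RWX pages in a process.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed line format of /proc/<pid>/maps on Linux:
#   start-end perms offset dev inode [pathname]
MAPS_LINE = re.compile(
    r"^(?P<start>[0-9a-f]+)-(?P<end>[0-9a-f]+)\s+(?P<perms>\S{4})\s+"
    r"\S+\s+\S+\s+\d+\s*(?P<path>.*)$"
)


@dataclass
class Region:
    start: int
    end: int
    perms: str
    path: str  # empty for anonymous mappings


def parse_maps(text: str) -> List[Region]:
    """Parse the text of a maps file into Region records, skipping odd lines."""
    regions = []
    for line in text.splitlines():
        m = MAPS_LINE.match(line)
        if m:
            regions.append(
                Region(int(m["start"], 16), int(m["end"], 16), m["perms"], m["path"].strip())
            )
    return regions


def rwx_regions(regions: List[Region]) -> List[Region]:
    """Mappings that are readable, writable and executable at the same time."""
    return [r for r in regions if r.perms.startswith("rwx")]
```

On Linux, passing the contents of `/proc/self/maps` to `parse_maps()` and filtering the result with `rwx_regions()` lists such mappings for the current process.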

We can write our shellcode to that address and use our RIP control primitive to jump to it:

The source code for this exploit is included in Appendix A.

Remote Exploitation Against a Vulnerable Application

To demonstrate the impact of the vulnerability, we will exploit a custom vulnerable application, make it run our shellcode, and gain access to the remote system through an interactive reverse shell.

We built a simple, vulnerable Python Flask application:

The source code for this application is included in Appendix B.

Converting the local exploit into a remote one is just a matter of ensuring that the following are executed server-side:

- `MXNDArrayGetStorageType`

- `id`

Because the vulnerable application interacts over HTTP and responds in JSON, we can convert those local calls to remote ones by using Python `requests`:

Local (old)

Remote (new)

Local (old)

id(1)

Remote (new)

With some modifications to the shellcode, we can simply re-run the exploit, and it now works remotely:

The source code for this exploit is included in Appendix C.

Recommendations

The fundamental cause of this vulnerability is that MXNet requires its API consumers to manage MXNet's internal state (the `handle` property on many objects). To avoid this, MXNet should manage its own state. One way to do this is an internal table within MXNet that maps tokens to object references; API users then refer only to the token when calling library functions, allowing MXNet to properly handle, and retain control over, its memory-critical operations.
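The token-table idea can be sketched in a few lines. This is a hedged Python illustration of the design, not MXNet code and not a proposed patch: `HandleTable`, `ToyNDArray` and `get_storage_type` are hypothetical names, and a real fix would live in MXNet's C++ layer and would also need to manage object lifetimes across the language boundary.

```python
import itertools
import threading
from dataclasses import dataclass
from typing import Any, Dict


class HandleTable:
    """Maps opaque integer tokens to objects owned by the library.

    Callers only ever hold a token; every lookup is checked against the
    table, so a forged or stale token raises instead of being dereferenced.
    """

    def __init__(self) -> None:
        self._objects: Dict[int, Any] = {}
        self._counter = itertools.count(1)
        self._lock = threading.Lock()

    def register(self, obj: Any) -> int:
        with self._lock:
            token = next(self._counter)
            self._objects[token] = obj
            return token

    def resolve(self, token: int) -> Any:
        with self._lock:
            if token not in self._objects:
                raise ValueError(f"invalid handle token: {token!r}")
            return self._objects[token]

    def release(self, token: int) -> None:
        with self._lock:
            if token not in self._objects:
                raise ValueError(f"invalid handle token: {token!r}")
            del self._objects[token]


@dataclass
class ToyNDArray:
    """Hypothetical stand-in for an array object owned by the library."""
    storage_type: int


def get_storage_type(table: HandleTable, token: int) -> int:
    # Hypothetical token-based counterpart of MXNDArrayGetStorageType:
    # the caller never supplies a raw address, only a token to validate.
    return table.resolve(token).storage_type


table = HandleTable()
token = table.register(ToyNDArray(storage_type=1))
print(get_storage_type(table, token))  # 1
```

Because tokens come from a monotonically increasing counter and are never reused, a stale token fails the lookup instead of silently aliasing a newer object.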

Appendix A - Local Exploit (Tested on Python 3.10.6 & MXNet 1.9.1)

Shellcode Payload

Appendix B - Vulnerable Application (Tested on Python 3.10.6 & MXNet 1.9.1)

This app requires `Flask` and was tested against version `2.3.2`. See Appendix D for a full `requirements.txt`. After saving the Python code as `app.py`, you can run it like so:

$ flask run --host=0.0.0.0

App.py

Appendix C - Remote Exploit (Tested on Python 3.10.6 & MXNet 1.9.1)

This exploit uses pwntools, which can be obtained here: https://github.com/Gallopsled/pwntools. See Appendix D for a full `requirements.txt`.

Shellcode Payload

This shellcode was obtained from https://shell-storm.org/shellcode/files/shellcode-857.html

Appendix D - Requirements.txt

Included here is a `requirements.txt`, which contains the relevant versions and packages used:

Flask==2.3.2
mxnet==1.9.1
mxnet_mkl==1.6.0
pwntools==4.9.0
Requests==2.30.0

This can be installed via:

pip3 install -r requirements.txt