Guide: C++ Instrumentation with Memory Sanitizer

sanitizers MSan C++

Introduction

The LLVM project provides a collection of dynamic analyzers called sanitizers to detect - at runtime - various correctness issues in applications. For example, Thread Sanitizer (TSan) detects data races while Memory Sanitizer (MSan) detects uninitialized memory usage. Using these tools typically requires compiling and linking the binary to be analyzed with special flags. For example, a C++ application is usually instrumented with TSan as follows,

$ clang++ -O0 -g -fsanitize=thread racy.cpp

To reduce false positives, all dependencies of an application must also be built with the appropriate flags. Instrumenting a C++ application with MSan is particularly challenging because this requirement extends to the standard C++ library itself . LLVM recommends using their own implementation of the standard library, libc++.

This guide will demonstrate how to correctly build libc++ and libc++abi from source and then use them to instrument your application. The code samples in this post are further developed here for usage in a CMake-based project.

Motivating Example: False Positives

Suppose we’d like to check the following application for memory usage errors:

 1 #include <set>
 2
 3 int main() {
 4     std::set<int> magic_numbers;
 5     int a;
 6
 7     if (a > 5) {
 8         magic_numbers.insert(a);
 9     }
10
11     return 0;
12 }

We expect MSan to complain about the if statement on line 7 because it reads from the uninitialized variable a. Let’s build the application with MSan instrumentation and see what happens:

$ clang++ -O0 -g -fsanitize=memory -fsanitize-recover=memory -o umr umr.cpp

Upon running umr, MSan reports the expected error on line 7:

==1810118==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x555555620401 in main /workspace/umr.cpp:7:9
    #1 0x7ffff782a1c9 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #2 0x7ffff782a28a in __libc_start_main csu/../csu/libc-start.c:360:3
    #3 0x555555586364 in _start (/workspace/umr+0x32364) (BuildId: 15beac31df9e85a66355b96ad655c1776ac1fc91)

SUMMARY: MemorySanitizer: use-of-uninitialized-value /workspace/umr.cpp:7:9 in main

However, it also reports two additional errors stemming from the std::set destructor:

==1810118==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x555555621229 in std::_Rb_tree<int, int, std::_Identity<int>, std::less<int>, std::allocator<int>>::_S_right(std::_Rb_tree_node_base*) /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/stl_tree.h:790:9
    #1 0x555555620ec6 in std::_Rb_tree<int, int, std::_Identity<int>, std::less<int>, std::allocator<int>>::_M_erase(std::_Rb_tree_node<int>*) /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/stl_tree.h:1936:13
    #2 0x555555620d3a in std::_Rb_tree<int, int, std::_Identity<int>, std::less<int>, std::allocator<int>>::~_Rb_tree() /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/stl_tree.h:986:9
    #3 0x555555620797 in std::set<int, std::less<int>, std::allocator<int>>::~set() /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/stl_set.h:283:22
    #4 0x5555556204fb in main /workspace/umr.cpp:12:1
    #5 0x7ffff782a1c9 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #6 0x7ffff782a28a in __libc_start_main csu/../csu/libc-start.c:360:3
    #7 0x555555586364 in _start (/workspace/umr+0x32364) (BuildId: 15beac31df9e85a66355b96ad655c1776ac1fc91)

SUMMARY: MemorySanitizer: use-of-uninitialized-value /workspace/umr.cpp:12:1 in main
==1810118==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x5555556212e9 in std::_Rb_tree<int, int, std::_Identity<int>, std::less<int>, std::allocator<int>>::_S_left(std::_Rb_tree_node_base*) /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/stl_tree.h:782:9
    #1 0x555555620f3e in std::_Rb_tree<int, int, std::_Identity<int>, std::less<int>, std::allocator<int>>::_M_erase(std::_Rb_tree_node<int>*) /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/stl_tree.h:1937:21
    #2 0x555555620d3a in std::_Rb_tree<int, int, std::_Identity<int>, std::less<int>, std::allocator<int>>::~_Rb_tree() /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/stl_tree.h:986:9
    #3 0x555555620797 in std::set<int, std::less<int>, std::allocator<int>>::~set() /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/stl_set.h:283:22
    #4 0x5555556204fb in main /workspace/umr.cpp:12:1
    #5 0x7ffff782a1c9 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #6 0x7ffff782a28a in __libc_start_main csu/../csu/libc-start.c:360:3
    #7 0x555555586364 in _start (/workspace/umr+0x32364) (BuildId: 15beac31df9e85a66355b96ad655c1776ac1fc91)

SUMMARY: MemorySanitizer: use-of-uninitialized-value /workspace/umr.cpp:12:1 in main
MemorySanitizer: 3 warnings reported.

Unless there is a bug in GCC’s standard C++ library, these two warnings are false positives. They occur because that library is not instrumented, so MSan loses track of memory state internal to it. To eliminate them, we must build and link against an MSan-instrumented version of libc++.

Note that we compiled the application with -O0 to eliminate any compiler optimization and ensure all code is executed. On my system, clang will elide line 8 when the code is compiled with -O1. Using -O2 or higher, the whole application is optimized out so MSan reports nothing.

Building an Instrumented libc++

Ensure you have cmake, clang and make installed. Instrumenting libc++ means building it from source with the appropriate flags using CMake, which is LLVM’s supported build system. The recommended compiler for this task is clang whose version must be “close enough” to that of libc++. We’ll be using version 18.1.3.

$ clang --version
Ubuntu clang version 18.1.3 (1ubuntu1)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

Let’s clone LLVM 18.1.3:

$ git clone --depth 1 https://github.com/llvm/llvm-project.git -b llvmorg-18.1.3

We’ll start by configuring our build system and then build libc++ and libc++abi. The crucial CMake option is -DLLVM_USE_SANITIZER=MemoryWithOrigins. We’re building a Debug version for better error messages. We’ll stick to libunwind for backtraces instead of using LLVM’s unwinder. Lastly, we’ll install the built libraries at a known location /usr/libcxx-msan for ease of integration with our code:

$ cmake \
      -S llvm-project/runtimes \
      -B build \
      -DCMAKE_C_COMPILER=clang \
      -DCMAKE_CXX_COMPILER=clang++ \
      -DCMAKE_BUILD_TYPE=Debug \
      -DCMAKE_INSTALL_PREFIX=/usr/libcxx-msan \
      -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi" \
      -DLLVM_USE_SANITIZER=MemoryWithOrigins \
      -DLIBCXXABI_USE_LLVM_UNWINDER=OFF \
      -DLIBCXX_INCLUDE_TESTS=OFF \
      -DLIBCXX_INCLUDE_BENCHMARKS=OFF

Once done, CMake will output something like:

-- Configuring done (5.7s)
-- Generating done (0.0s)
-- Build files have been written to: /workspace/build

Now build the libraries:

$ cmake --build build -j $(nproc) --target cxx cxxabi

CMake will output the build’s progress:

[  0%] Copying CXXABI header cxxabi.h
[  0%] Copying CXXABI header __cxxabi_config.h
...
[100%] Built target cxxabi_shared
[100%] Built target cxxabi

Now install the instrumented libraries:

$ sudo cmake --build build -j $(nproc) --target install-cxx install-cxxabi

Upon success, CMake will show where it installed the libraries:

[  0%] Built target generate-cxxabi-headers
[ 87%] Built target generate-cxx-headers
-- Install configuration: "Debug"
-- Installing: /usr/libcxx-msan/include/c++/v1/__algorithm/adjacent_find.h
-- Installing: /usr/libcxx-msan/include/c++/v1/__algorithm/all_of.h
...
-- Installing: /usr/libcxx-msan/lib/libc++abi.so
-- Installing: /usr/libcxx-msan/lib/libc++abi.a
[100%] Built target install-cxxabi

Instrumenting the Application

It’s now time to compile and link the sample application against the instrumented libc++ we just built. The compilation and linking step can be combined but are shown below separately for clarity.

We tell clang to not use the system-wide standard C++ library headers and instead use the one we built earlier. We’ll also ask MSan to track allocations using -fsanitize-memory-track-origins so that we know the origin of uninitialized reads.

$ clang++ \
      -O0 \
      -g \
      -nostdinc++ \
      -I /usr/libcxx-msan/include/c++/v1 \
      -fsanitize=memory \
      -fsanitize-recover=memory \
      -fsanitize-memory-track-origins \
      -c umr.cpp

When linking the binary, we tell clang to not use the system-wide standard C++ library and instead use the ones we built. We will opt for static linking to avoid dealing with non-standard library paths:

$ clang++ \
      -nostdlib++ \
      -fsanitize=memory \
      -o umr \
      umr.o \
      /usr/libcxx-msan/lib/libc++.a \
      /usr/libcxx-msan/lib/libc++abi.a

Now let’s run umr. As expected, we have a single warning reported and we’re told where the variable involved in the uninitialized read is allocated.

==1854756==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x58f66bc21511 in main /workspace/umr.cpp:7:9
    #1 0x7ae489c2a1c9 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #2 0x7ae489c2a28a in __libc_start_main csu/../csu/libc-start.c:360:3
    #3 0x58f66bb873b4 in _start (/workspace/umr+0x3c3b4) (BuildId: 272b661108c2d36eb7b35abae825e22f4dc875d8)

  Uninitialized value was created by an allocation of 'a' in the stack frame
    #0 0x58f66bc214c9 in main /workspace/umr.cpp:5:5

SUMMARY: MemorySanitizer: use-of-uninitialized-value /workspace/umr.cpp:7:9 in main
MemorySanitizer: 1 warnings reported.

Conclusion

Effectively using MSan means the reliable elimination of false positives. This can only be achieved if all of an application’s dependencies including the standard C++ library are correctly instrumented.

© 2025    •  Theme  Moonwalk