6  Debugging R with C++/C code

Although debugging R code is easy, the same doesn’t apply to compiled code in R [1]. This chapter shows a few ways to debug your R + C++ code. You will need the GNU Debugger (GDB) and Valgrind.

Before we start, note that we will not cover the good old-fashioned Rprintf("Your code is working up to here") approach. Printing messages while your program runs can be very informative, but, in my humble opinion, Valgrind and GDB are faster: most of the time, they will scream at you and point to the location of your problem.

Note

The manual Writing R Extensions (R Core Team 2023) has a fair amount of information about debugging compiled code in R. Dirk Eddelbuettel (lead author of Rcpp) has an excellent post on Stack Overflow and recommends a tutorial hosted on Bioconductor (Rue-Albrecht et al. n.d.).

6.1 Debugging with Valgrind

As a starting point, we will use Valgrind. Valgrind provides a mature framework for memory debugging and profiling. To use a debugger with R, we must launch R from the command line. To launch R with Valgrind, we use the following:

$ R --debugger=valgrind

This produces something like the following:

==31245== Memcheck, a memory error detector
==31245== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==31245== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==31245== Command: /usr/lib/R/bin/exec/R
==31245== 

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> 

Once R is running under Valgrind, Memcheck tracks the memory used by your C++/C code and reports any leaks when R exits. The following is a faulty Rcpp program that allocates a NumericVector with new and “forgets” to delete it.

#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
NumericVector faulty_program(int n) {

    // Here is the faulty line
    NumericVector * x_ptr = new NumericVector(n);

    return *x_ptr;

}

/***R
# Calling the faulty program
faulty_program(10)
*/

We can use the -e flag of the R command to compile and run the Rcpp script with sourceCpp:

$ R --debugger=valgrind -e 'Rcpp::sourceCpp("rcpp-debugging-faulty.cpp")'
==51618== Memcheck, a memory error detector
==51618== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==51618== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==51618== Command: /usr/lib/R/bin/exec/R -e Rcpp::sourceCpp("rcpp-debugging-faulty.cpp")
==51618== 

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> Rcpp::sourceCpp("rcpp-debugging-faulty.cpp")

> faulty_program(10)
 [1] 0 0 0 0 0 0 0 0 0 0
> 
> 
==51618== 
==51618== HEAP SUMMARY:
==51618==     in use at exit: 55,450,061 bytes in 10,732 blocks
==51618==   total heap usage: 31,980 allocs, 21,248 frees, 95,390,564 bytes allocated
==51618== 
==51618== LEAK SUMMARY:
==51618==    definitely lost: 24 bytes in 1 blocks
==51618==    indirectly lost: 0 bytes in 0 blocks
==51618==      possibly lost: 0 bytes in 0 blocks
==51618==    still reachable: 55,450,037 bytes in 10,731 blocks
==51618==                       of which reachable via heuristic:
==51618==                         newarray           : 4,264 bytes in 1 blocks
==51618==         suppressed: 0 bytes in 0 blocks
==51618== Rerun with --leak-check=full to see details of leaked memory
==51618== 
==51618== For lists of detected and suppressed errors, rerun with: -s
==51618== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Near the end of the output, in the LEAK SUMMARY section, we see definitely lost: 24 bytes in 1 blocks, i.e., a memory leak. If we change the program so that it deletes the pointer before returning, the leak is fixed:

New program:

    NumericVector res = *x_ptr;
    delete x_ptr;

    return res;

Old program:

    return *x_ptr;

Re-running R with Valgrind returns the following (only the last few lines):

$ R --debugger=valgrind -e 'Rcpp::sourceCpp("rcpp-debugging-faulty-fixed.cpp")'
==50287== HEAP SUMMARY:
==50287==     in use at exit: 55,450,707 bytes in 10,731 blocks
==50287==   total heap usage: 32,017 allocs, 21,286 frees, 95,411,007 bytes allocated
==50287== 
==50287== LEAK SUMMARY:
==50287==    definitely lost: 0 bytes in 0 blocks
==50287==    indirectly lost: 0 bytes in 0 blocks
==50287==      possibly lost: 0 bytes in 0 blocks
==50287==    still reachable: 55,450,707 bytes in 10,731 blocks
==50287==                       of which reachable via heuristic:
==50287==                         newarray           : 4,264 bytes in 1 blocks
==50287==         suppressed: 0 bytes in 0 blocks
==50287== Rerun with --leak-check=full to see details of leaked memory
==50287== 
==50287== For lists of detected and suppressed errors, rerun with: -s
==50287== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

No more memory leaks.
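
A manual delete works, but it is easy to forget again. As a minimal sketch (not part of the original example), the code below contrasts the manual fix with two alternatives that cannot leak: holding the allocation in a std::unique_ptr, or skipping new altogether. So that it compiles without R, std::vector<double> stands in for NumericVector, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for Rcpp::NumericVector so the sketch builds without R (assumption).
using Vec = std::vector<double>;

// Manual fix from the text: copy the value, then delete the pointer.
Vec fixed_with_delete(int n) {
    assert(n >= 0);
    Vec* x_ptr = new Vec(static_cast<std::size_t>(n));
    Vec res = *x_ptr;
    delete x_ptr;
    return res;
}

// Alternative 1: std::unique_ptr releases the allocation automatically
// when x_ptr goes out of scope, even if an exception is thrown.
Vec fixed_with_unique_ptr(int n) {
    assert(n >= 0);
    auto x_ptr = std::make_unique<Vec>(static_cast<std::size_t>(n));
    return *x_ptr; // the value is copied out before x_ptr is destroyed
}

// Alternative 2: no pointer at all, so there is nothing to delete.
Vec fixed_without_new(int n) {
    assert(n >= 0);
    return Vec(static_cast<std::size_t>(n));
}
```

In an Rcpp function, the last form corresponds to declaring NumericVector x(n); directly and returning x.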

6.2 Using GDB

Sometimes, we need to go further and inspect what’s going on inside the program. GDB is excellent for that. With GDB, we can set breakpoints that allow us to inspect the program while it runs.

The following Rcpp code triggers a segmentation fault with cause 'memory not mapped':

#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
NumericVector faulty_program(int n) {

    // Here is the faulty line
    NumericVector * x_ptr;
        
    return *x_ptr;

}

/***R
# Calling the faulty program
faulty_program(10)
*/

In it, we dereference a pointer to a NumericVector that was declared but never assigned, so it does not point to allocated memory. Running it with R --debugger=valgrind produces the following output:

==54537== Memcheck, a memory error detector
==54537== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==54537== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==54537== Command: /usr/lib/R/bin/exec/R -e Rcpp::sourceCpp("rcpp-debugging-not-mapped.cpp")
==54537== 

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> Rcpp::sourceCpp("rcpp-debugging-not-mapped.cpp")

> faulty_program(10)
==54537== Invalid read of size 8
==54537==    at 0x1222229A: get__ (PreserveStorage.h:52)
==54537==    by 0x1222229A: copy__<Rcpp::Vector<14, Rcpp::PreserveStorage> > (PreserveStorage.h:66)
==54537==    by 0x1222229A: Vector (Vector.h:64)
==54537==    by 0x1222229A: faulty_program(int) (rcpp-debugging-not-mapped.cpp:11)
==54537==    by 0x12222468: sourceCpp_1_faulty_program (rcpp-debugging-not-mapped.cpp:33)
==54537==    by 0x495391D: ??? (in /usr/lib/R/lib/libR.so)
==54537==    by 0x4953E9C: ??? (in /usr/lib/R/lib/libR.so)
==54537==    by 0x49ABD77: Rf_eval (in /usr/lib/R/lib/libR.so)
==54537==    by 0x49AD2AE: ??? (in /usr/lib/R/lib/libR.so)
==54537==    by 0x49AE0F4: Rf_applyClosure (in /usr/lib/R/lib/libR.so)
==54537==    by 0x49AB84B: Rf_eval (in /usr/lib/R/lib/libR.so)
==54537==    by 0x49B120B: ??? (in /usr/lib/R/lib/libR.so)
==54537==    by 0x498E694: ??? (in /usr/lib/R/lib/libR.so)
==54537==    by 0x49AB71F: Rf_eval (in /usr/lib/R/lib/libR.so)
==54537==    by 0x49AD2AE: ??? (in /usr/lib/R/lib/libR.so)
==54537==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==54537== 

 *** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
 1: .Call(<pointer: 0x122223f0>, n)
 2: faulty_program(10)
 3: eval(ei, envir)
 4: eval(ei, envir)
 5: withVisible(eval(ei, envir))
 6: source(file = srcConn, local = env, echo = echo)
 7: Rcpp::sourceCpp("rcpp-debugging-not-mapped.cpp")
An irrecoverable exception occurred. R is aborting now ...
==54537== 
==54537== Process terminating with default action of signal 11 (SIGSEGV)
==54537==    at 0x4DB4A7C: __pthread_kill_implementation (pthread_kill.c:44)
==54537==    by 0x4DB4A7C: __pthread_kill_internal (pthread_kill.c:78)
==54537==    by 0x4DB4A7C: pthread_kill@@GLIBC_2.34 (pthread_kill.c:89)
==54537==    by 0x4D60475: raise (raise.c:26)
==54537==    by 0x4D6051F: ??? (in /usr/lib/x86_64-linux-gnu/libc.so.6)
==54537==    by 0x12222299: copy__<Rcpp::Vector<14, Rcpp::PreserveStorage> > (PreserveStorage.h:65)
==54537==    by 0x12222299: Vector (Vector.h:64)
==54537==    by 0x12222299: faulty_program(int) (rcpp-debugging-not-mapped.cpp:11)
==54537== 
==54537== HEAP SUMMARY:
==54537==     in use at exit: 55,590,287 bytes in 10,969 blocks
==54537==   total heap usage: 31,943 allocs, 20,974 frees, 95,324,688 bytes allocated
==54537== 
==54537== LEAK SUMMARY:
==54537==    definitely lost: 0 bytes in 0 blocks
==54537==    indirectly lost: 0 bytes in 0 blocks
==54537==      possibly lost: 5,007 bytes in 15 blocks
==54537==    still reachable: 55,585,280 bytes in 10,954 blocks
==54537==                       of which reachable via heuristic:
==54537==                         newarray           : 4,264 bytes in 1 blocks
==54537==         suppressed: 0 bytes in 0 blocks
==54537== Rerun with --leak-check=full to see details of leaked memory
==54537== 
==54537== For lists of detected and suppressed errors, rerun with: -s
==54537== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

To inspect an error with GDB, we have to follow these steps:

  1. Run R with gdb as debugger: R --debugger=gdb. R won’t start immediately, so we have time to add breakpoints.

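  2. Set a breakpoint on the function with break faulty_program. Because the function has not been compiled or loaded yet, GDB will most likely warn you that there’s no symbol for it and ask whether to make the breakpoint pending on a future shared library load. Answer y, and GDB will grab the function on the fly once it is loaded.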

  3. Run R using the run command in gdb.

  4. Source the program using Rcpp::sourceCpp, and wait for gdb to pause the program once it reaches the breakpoint.

  5. Once the program has paused, we can inspect the context.

    Because of the number of options it has, using GDB can be overwhelming. Here is the list of commands I use the most:

    help        # Get help
    info locals # List the local variables (scope)
    info args   # List the arguments passed to the function
    list        # Show the source code around the current line
    continue    # Continue running the program
    next        # Execute the next line, stepping over function calls
    bt          # Show the entire call stack (backtrace)
    up          # Go up one level in the call stack
    down        # Go down one level in the call stack
    print       # Print/display an expression

    Right after the breakpoint is hit, info locals, info args, list, and print are usually the first ones to try.

  6. Finally, to exit GDB, type quit (similar to q() in R); recent GDB versions also accept exit.
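
Once GDB has pointed at the faulty line, the repair is usually small. The sketch below shows one hedged way to make this kind of bug fail loudly instead of crashing: keep the pointer in a smart pointer that starts out as nullptr, and check it before dereferencing. To build without R, std::vector<double> stands in for NumericVector; the function name safe_program and its allocate flag are hypothetical, and in Rcpp code Rcpp::stop() would typically replace the plain C++ exception.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Stand-in for Rcpp::NumericVector so the sketch builds without R (assumption).
using Vec = std::vector<double>;

// Hypothetical variant of faulty_program: the pointer is never left
// uninitialized, and it is checked before it is dereferenced.
Vec safe_program(int n, bool allocate) {
    assert(n >= 0);
    std::unique_ptr<Vec> x_ptr; // default-constructed as nullptr, never garbage
    if (allocate) {
        x_ptr = std::make_unique<Vec>(static_cast<std::size_t>(n));
    }
    if (!x_ptr) {
        // Report the problem instead of reading from an invalid address.
        throw std::runtime_error("x_ptr was never assigned");
    }
    return *x_ptr;
}
```

With allocate set to false, the caller gets an ordinary exception rather than a segmentation fault, which is much easier to trace from the R side.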


  1. Debugging only C++/C code is easy, though. If you already work with compiled code, you are probably aware of VS Code and the many other tools out there for debugging C++/C code.