A Not Too Big, Not Too Small Trap – GMP’s C++ Binding

WARNING

This article was translated from Chinese by ChatGPT and may contain some errors.

C++ Binding of the GNU MP Library

The GNU MP library is a library for large integer and multi-precision floating-point arithmetic. It is written in C, but also provides C++ bindings. When writing programs in C++, unless you are a masochist or a fanatical manual compiler transformation enthusiast, using the C++ binding is undoubtedly a better choice.

This is because the C language version of the binding encapsulates all operations into instructions similar to those in assembly language. For example, if you want to calculate a large integer version of 1+2, you should write it like this:

1mpz_t a, b, c;
2mpz_init_set_ui(a, 1);
3mpz_init_set_ui(b, 2);
4mpz_add(c, a, b);
5mpz_clear(a);
6mpz_clear(b);
7mpz_clear(c);

Honestly, this is not as concise as writing assembly directly:

1mov $1, %eax
2mov $2, %ebx
3add %ebx, %eax
4mov %ecx, %eax

The fundamental reason for this situation is that there is no convenient means in C to manage memory resources and construct structures quickly (of course, some have been added in the new standards), which makes C able to implement the expression evaluation model, but it is not convenient to implement the custom type expression evaluation model.

The transformation between the expression evaluation model and the register machine model is the essence of compilation. Writing code in C in this way is equivalent to performing some transformations that the compiler does (such as ANF). Therefore, people who like to write code in this way either like to write assembly, or they like to manually perform compiler transformations.

The C++ binding is a savior at this time. In situations where C/C++ must be used (such as the Modern Cryptography Experiment course in our school), using the C++ binding can avoid this embarrassment:

1const mpz_class a {1}, b {2};
2const auto c = a + b;

This code can perform exactly the same behavior as the above C language code. This is because C++ has several excellent features compared to C, the best of which is the so-called RAII, that is, Resource Acquisition Is Initialization. Here, in conjunction with the runtime stack, in simple terms, it is the cooperation of the constructor and destructor, so that objects on the stack acquire resources when constructed (manually introducing a binding), and release resources when destructed (automatically destructed when exiting the current scope). For resources like memory, it seems that we are using a language with GC, and we dont need to worry about any memory issues.

A natural question arises, can stack-based RAII really replace GC? Through the following explanation of GMP, readers should be able to come up with their own answers.

A Strange Problem

As mentioned before, my main purpose of using the GNU MP library is to conduct cryptographic experiments. In our cryptographic experiments, there is a problem of calculating DLP (Discrete Logarithm Problem), which is very large in scale, and the running speed is a relatively important factor. Therefore, I must use the GNU MP library, which guarantees speed. However, in the experiment, I encountered a very strange problem, that is, sometimes some codes often produce incorrect results, and I cant find the source of the problem even after repeated checks. Whats worse, these problems are like ghosts, sometimes they appear and sometimes they dont, and the results when they appear are sometimes different.

My first reaction to this was that there might be some memory issues. But I immediately denied this idea. The GNU MP library, which is used by many people, generally will not have such a malignant problem. But in my code, there are only simple calculations, similar to:

1/*
2 * Pohlig-Hellman algorithm for Group of prime power order
3 */
4mpz_class
5pohligHellmanP(const mpz_class& g, const mpz_class& h,
6               const mpz_class& pn, const mpz_class& en,
7               const mpz_class& p) {
8    const auto y = fastPow(g, Pow(pn, en - 1), p);
9    assert (fastPow(y, pn, p) == 1);
10    mpz_class x{0};
11    for (auto i = 0; i < en; ++i) {
12        auto hi = fastPow(Inverse(fastPow(g, x, p), p) * h,
13                          Pow(pn, en - 1 - i), p);
14        auto di = pDlp(y, hi, pn, p);
15        x = x + Pow(pn, i) * di;
16    }
17    return x;
18}

After a difficult exploration, I determined a minimum problem structure. The minimum problem structure means the simplest and least line of code that triggers this problem. It is:

1mpz_class nothing() {
2  const auto a = mpz_class { 1 } + mpz_class { 2 };
3  std::cout << a << std::endl;
4  return a;
5}
6
7int main() {
8  std::cout << nothing();
9}

On my computer, this code will give a very surprising result:

1➜  gmp_error git:(master) ✗ g++ test.cpp -o a -g -lgmp -O0 -lgmpxx
2➜  gmp_error git:(master) ✗ ./a                                   
394361021124304
494361021124336%

1+2=943610211243041+2=94361021124304 or 1+2=943610211243361+2=94361021124336?

Such simple code has produced such a weird error, its really strange!

The Design of the GNU MP Library

To crack this mystery, we should start from another strange phenomenon, that is:

1mpz_class nothing() {
2  const mpz_class a = mpz_class { 1 } + mpz_class { 2 };
3  std::cout << a << std::endl;
4  return a;
5}
6
7int main() {
8  std::cout << nothing();
9}

This code has no problem at all? Readers may find it hard to believe this fact, but it is the reality:

1➜  gmp_error git:(master) ✗ g++ test.cpp -o a -g -lgmp -O0 -lgmpxx
2➜  gmp_error git:(master) ✗ ./a                                   
33
43%

This makes the problem very clear. What type will the keyword auto deduce a to be? Using IDE or c++filt to check, the answer is even more confusing:

1const __gmp_expr<mpz_t, __gmp_binary_expr<mpz_class, mpz_class, __gmp_binary_plus>> a

What type is this? It seems that we must find the answer in the gmpxx.h file.

In gmpxx.h, we will see that mpz_class is actually mpz_expr<mpz_t, mpz_t>:

1/**************** mpz_class -- wrapper for mpz_t ****************/
2
3template <> // line 1572
4class __gmp_expr<mpz_t, mpz_t>{ ... }; 
5
6typedef __gmp_expr<mpz_t, mpz_t> mpz_class; // line 1756

So, this __gmp_expr high-order type (theoretically, this is indeed equivalent to a high-order type) may have some other specializations. As expected, this file also defines many specializations of __gmp_expr. For example, the a we saw earlier, the actual type is:

1template <class T, class Op>
2class __gmp_expr
3<T, __gmp_binary_expr<__gmp_expr<T, T>, __gmp_expr<T, T>, Op> >

Lets observe the constructor of this class:

1__gmp_expr(const val1_type &val1, const val2_type &val2)
2    : expr(val1, val2) { }

expr is a member variable of the class, it is declared as:

1__gmp_binary_expr<val1_type, val2_type, Op> expr;

What is this __gmp_binary_expr? It is defined as follows:

1template <class T, class U, class Op>
2struct __gmp_binary_expr
3{
4  typename __gmp_resolve_ref<T>::ref_type val1;
5  typename __gmp_resolve_ref<U>::ref_type val2;
6
7  __gmp_binary_expr(const T &v1, const U &v2) : val1(v1), val2(v2) { }
8private:
9  __gmp_binary_expr();
10};

This is a bit confusing, defining such a class with only a constructor seems to have no special meaning, we need to find the function that uses it. As mentioned earlier, if the type of the right value is mpz_class, there will be no problem. From mpz_expr<..> to mpz_class, a type conversion must have occurred. Where is this type conversion function? Lets go back to the definition of mpz_class:

1template <class T>
2__gmp_expr(const __gmp_expr<mpz_t, T> &expr)
3{ mpz_init(mp); __gmp_set_expr(mp, expr); }
4template <class T, class U>
5explicit __gmp_expr(const __gmp_expr<T, U> &expr)
6{ mpz_init(mp); __gmp_set_expr(mp, expr); }

This function is undoubtedly converting __gmp_expr<...> to mpz_class. So what is __gmp_set_expr doing?

Lets look at its definition:

1template <class T>
2inline void __gmp_set_expr(mpz_ptr z, const __gmp_expr<mpz_t, T> &expr)
3{
4  expr.eval(z);
5}

Hmm? This eval function seems to be defined in __gmp_expr<T ...>, lets look at the definition we just saw:

1void eval(typename __gmp_resolve_expr<T>::ptr_type p) const
2{ Op::eval(p, expr.val1.__get_mp(), expr.val2.__get_mp()); }

It is forwarded to the Op::eval function. The previous type of Op was __gmp_binary_plus, how is its eval function defined?

1struct __gmp_binary_plus
2{
3  static void eval(mpz_ptr z, mpz_srcptr w, mpz_srcptr v)
4  { mpz_add(z, w, v); }

This is really familiar, we finally figured out what this combination of punches is doing.

First of all, __gmp_expr< ... > is like a syntax tree, it records all operation information. When the value of this type is converted to mpz_class, it is evaluated, and the evaluated value is put into the binding after conversion.

But what is the point of this? In my opinion, such code does not simplify any logic. The C++ compiler can fully guarantee that no extra copies are generated. In fact, such a complex construction and simply writing a class and overloading the operator have almost the same effect.

The only advantage is that when the variable uses auto instead of mpz_class, the variable itself is a syntax tree rather than a value, and it is only evaluated when the value of this expression is needed (that is, during type conversion). This is the so-called lazy evaluation.

I find it hard to understand what the benefits of lazy evaluation are in numerical computation tasks. The biggest advantage of lazy evaluation is that it does not calculate unnecessary values, such as:

1(define (f) (f))
2(define (g t1 t2) (t2))
3
4(g (f) 1) ;In scheme, infinite loop
1f = f
2g t1 t2 = t2
3g f 1 --In Haskell, this will get 1

However, in such numerical computation tasks, we generally do not perform any redundant calculations. Lazy evaluation itself cannot simplify necessary calculations, and there is no advantage in terms of performance.

Moreover, this design will cause serious errors as just mentioned. This is because each __gmp_binary_expr actually saves the const reference of two variables, and fundamentally, const references cannot capture a right value. Calling

1__gmp_binary_expr(const T &v1, const U &v2) : val1(v1), val2(v2) { }

will only assign the pointer pointing to v1 to val1, and the pointer pointing to v2 to val2.

Looking back at this line of code:

1const auto a = mpz_class { 1 } + mpz_class { 2 };
2...

It will actually become:

1mpz_class temp1 {1}, temp2 {2};
2a = temp1 + temp2;
3~temp1(); ~temp2();
4...

After the destructor is executed, the targets pointed to by the nodes in the syntax tree a have been completely destructed, and all code accessing these objects is wrong. In other words, the legal time of a only exists in the moment after the current statement is executed and before the next statement is executed.

Solving the problem

There are two ways to solve the problem:

  • Modify gmpxx.h.
  • All declarations use mpz_class instead of auto.

However, even if this file is modified, the problem that const& cannot capture right values is still unsolvable.

How about changing __gmp_binary_expr to value semantics? In other words, we let val1 and val2 be not const &T and const &U but real T and U.

This can painlessly solve the problem of const auto a = mpz_class { 1 } + mpz_class { 2 };. Because mpz_class{1} and mpz_class{2} are both rvalues, or X values, the method of rvalue reference can painlessly transfer resources. In fact, if we just want to solve the addition problem, we only need to modify a few places:

1/* 修改这个宏,使得加法有右值引用的版本 */
2#define __GMPP_DEFINE_BINARY_FUNCTION(fun, eval_fun)                   \
3                                                                       \
4template <class T, class U, class V, class W>                          \
5inline __gmp_expr<typename __gmp_resolve_expr<T, V>::value_type,       \
6__gmp_binary_expr<__gmp_expr<T, U>, __gmp_expr<V, W>, eval_fun> >      \
7fun(const __gmp_expr<T, U> &expr1, const __gmp_expr<V, W> &expr2)      \
8{                                                                      \
9  return __gmp_expr<typename __gmp_resolve_expr<T, V>::value_type,     \
10     __gmp_binary_expr<__gmp_expr<T, U>, __gmp_expr<V, W>, eval_fun> > \
11    (expr1, expr2);                                                    \
12}                                                                      \
13template <class T, class U, class V, class W>                          \
14inline __gmp_expr<typename __gmp_resolve_expr<T, V>::value_type,       \
15__gmp_binary_expr<__gmp_expr<T, U>, __gmp_expr<V, W>, eval_fun> >      \
16fun(__gmp_expr<T, U> &&expr1, __gmp_expr<V, W> &&expr2)                \
17{                                                                      \
18  return __gmp_expr<typename __gmp_resolve_expr<T, V>::value_type,     \
19     __gmp_binary_expr<__gmp_expr<T, U>, __gmp_expr<V, W>, eval_fun> > \
20    (std::move(expr1), std::move(expr2));                              \
21}
1/* 修改这个类,使得构造函数有右值引用的版本 */
2template <class T, class Op>
3class __gmp_expr
4<T, __gmp_binary_expr<__gmp_expr<T, T>, __gmp_expr<T, T>, Op> >
5{
6private:
7  typedef __gmp_expr<T, T> val1_type;
8  typedef __gmp_expr<T, T> val2_type;
9
10  __gmp_binary_expr<val1_type, val2_type, Op> expr;
11public:
12  __gmp_expr(const val1_type &val1, const val2_type &val2)
13    : expr(val1, val2) { } 
14  __gmp_expr(val1_type &&val1, val2_type &&val2) // 新加入的构造函数
15    : expr(std::move(val1), std::move(val2)) { }
1template <class Op>
2struct __gmp_binary_expr<mpz_class, mpz_class, Op>
3{
4  mpz_class val1;
5  mpz_class val2;
6  __gmp_binary_expr(const mpz_class &v1, const mpz_class &v2) 
7    : val1(v1), val2(v2) { }
8  __gmp_binary_expr(mpz_class &&v1, mpz_class &&v2) 
9    : val1(std::move(v1)), val2(std::move(v2)) { }
10private:
11  __gmp_binary_expr();
12};

Customize a specialization of __gmp_binary_expr to handle the situation where both are mpz_class.

This way, the above code can get the correct 3.

However, lets not talk about the workload of all corrections, such a correction will inevitably encounter a problem: If the input is a lvalue, it cannot be painlessly moved, and copying is required, which is not conducive to performance.

How to solve this problem? The answer is that (at least I) cant solve it.

GC and RAII

The above problem is not a problem in languages with GC. Even in a language like python, constantly getting a binding of a resource will not produce any copy:

1a = [1, 2, 3, 4]
2b = a
3c = b

Of course, this is because a, b, and c are actually pointing to the same object, similar to const &.

However, const & in languages with GC can perfectly solve the problem of cannot capture rvalues:

1class A:
2    def __init__(self, arr):
3        self.arr = arr
4        
5a = A([1,2,3,4])

Fundamentally speaking, RAII cannot allow the same object on the stack to be owned by two bindings at the same time. The rule of eliminating objects on the stack is a strict scope rule, and there can be no situation of borrowing objects from the stack. In languages with GC, because both objects and resources owned by objects are on the heap, or even integrated, this problem will not occur.

In this way, RAII cannot replace GC. Of course, languages like Rust may be able to solve this problem through some other methods. However, we can draw this conclusion: In C++, the ability of RAII is ultimately limited.