Posted to issues@arrow.apache.org by "Phillip Cloud (Jira)" <ji...@apache.org> on 2021/08/11 13:58:00 UTC

[jira] [Created] (ARROW-13608) R symbol initialization appears to be depending on undefined behavior

Phillip Cloud created ARROW-13608:
-------------------------------------

             Summary: R symbol initialization appears to be depending on undefined behavior
                 Key: ARROW-13608
                 URL: https://issues.apache.org/jira/browse/ARROW-13608
             Project: Apache Arrow
          Issue Type: Bug
          Components: R
         Environment: x86_64, linux
            Reporter: Phillip Cloud


The R bindings for arrow are triggering a segfault when running {{library(arrow)}}.

After a large amount of investigation by [~jonkeane], [~npr], [~bkietz], [~apitrou] and myself, we narrowed the problem down to what appears to be dependence on the order of static initialization.

The order of dynamic initialization of static objects across translation units in C++ is indeterminate ([https://en.cppreference.com/w/cpp/language/initialization], see the "Dynamic Initialization" section). This implies that if a {{static A}} depends on a {{static B}} defined and initialized in another translation unit, it is perfectly legal for the implementation to initialize {{A}} _before_ {{B}}, which triggers undefined behavior because {{A}} uses a {{B}} whose initializer has not yet run.

This is manifesting as a segmentation fault.
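
As a rough illustration of the mechanism, the sketch below reproduces the "observed before initialized" effect inside a single file, where the order is fixed by definition order rather than indeterminate. It is not the arrow or cpp11 code: {{Probe}}, {{early_probe}}, {{late_probe}} and {{check_probes}} are hypothetical names, and {{list_}} only borrows the name of the pointer mentioned in this report. The pointer is zero-initialized before any dynamic initialization runs, so an object constructed earlier sees it as null.

```cpp
#include <cassert>
#include <vector>

// Declared here, defined further down. It has static storage duration,
// so it is zero-initialized (null) before any dynamic initialization.
extern std::vector<int>* list_;

// Hypothetical probe that records whether list_ was still null
// at the moment the probe was constructed.
struct Probe {
  Probe() : saw_null(list_ == nullptr) {}
  bool saw_null;
};

// Within one translation unit, dynamic initialization follows
// definition order: early_probe, then list_, then late_probe.
Probe early_probe;
std::vector<int>* list_ = new std::vector<int>();  // intentionally never freed
Probe late_probe;

// States the expectation: the early probe ran before list_ was set up.
void check_probes() {
  assert(early_probe.saw_null);
  assert(!late_probe.saw_null);
}
```

Across translation units the same thing can happen, except that which object plays the role of {{early_probe}} is not something the source code controls.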

A "prose-level" trace is as follows:

1. The R bindings construct symbols in [https://github.com/apache/arrow/blob/master/r/src/symbols.cpp#L79].
2. Those bindings initialize a number of {{r_vector}}s with this overload: [https://github.com/r-lib/cpp11/blob/master/inst/include/cpp11/r_vector.hpp#L363-L369]
3. The overload references the static variable {{preserved}} and calls its {{insert}} method.
4. {{insert}} dereferences a null pointer ({{list_}} specifically) here: [https://github.com/r-lib/cpp11/blob/master/inst/include/cpp11/protect.hpp#L316]

I think the fix belongs in {{cpp11}}: initialize {{preserved}} with the [Construct on First Use idiom|https://isocpp.org/wiki/faq/ctors#static-init-order-on-first-use] instead of the {{static struct}} it uses now ([https://github.com/r-lib/cpp11/blob/master/inst/include/cpp11/protect.hpp#L301]).
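
A minimal sketch of what Construct on First Use could look like, under stated assumptions: {{PreserveList}}, {{Symbol}}, {{first_symbol}}, {{second_symbol}} and {{check_symbols}} are hypothetical stand-ins, and {{preserved()}} as a function is my assumption about the shape of a fix, not the actual cpp11 API. The point is only that a function-local static is initialized the first time the function is called, so any static object that reaches it during its own initialization finds it fully constructed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the protection list; not the real cpp11 type.
class PreserveList {
 public:
  // Records a value and returns the number of entries held so far.
  std::size_t insert(int value) {
    items_.push_back(value);
    return items_.size();
  }
  std::size_t size() const { return items_.size(); }

 private:
  std::vector<int> items_;
};

// Construct on First Use: the local static is initialized the first
// time control reaches its declaration (thread-safe since C++11).
PreserveList& preserved() {
  static PreserveList instance;
  return instance;
}

// Hypothetical static object that uses the list during its own
// dynamic initialization, loosely like the symbols in symbols.cpp.
struct Symbol {
  explicit Symbol(int id) : token(preserved().insert(id)) {}
  std::size_t token;
};

Symbol first_symbol(1);
Symbol second_symbol(2);

// Both symbols registered successfully, whichever ran first.
void check_symbols() {
  assert(first_symbol.token >= 1);
  assert(second_symbol.token >= 1);
  assert(preserved().size() == 2);
}
```

The trade-off is that every access goes through a function call, and the local static is destroyed at program exit in reverse order of construction, so code that touches it from other static destructors would need separate care.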



--
This message was sent by Atlassian Jira
(v8.3.4#803005)