You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@thrift.apache.org by "Lukas Barth (Jira)" <ji...@apache.org> on 2023/02/10 12:34:00 UTC

[jira] [Updated] (THRIFT-5682) UB in generated C++ code, stops compiling with C++20

     [ https://issues.apache.org/jira/browse/THRIFT-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lukas Barth updated THRIFT-5682:
--------------------------------
    Description: 
The C++ compiler of Thrift 0.17.0 produces C++ code with undefined behavior, which stops compiling on Clang 15 in C++20 mode.

For a generated class, both the default constructor and {{operator==}} implementations reference certain member functions of the class' members. As an example, the default constructor references (i.e., "uses") the default constructors of its members.

If a class contains a {{{}std::vector<Foo>{}}}, and {{Foo}} has only been _forward_ declared (which happens often in Thrift-generated code), this creates undefined behavior: The {{std::vector}} specification states that as long as {{Foo}} is an incomplete type, it is fine to reference {{{}std::vector<Foo>{}}}, but not any members (such as its default constructor).

Thus, we must defer our default constructor's implementation (which references the default constructor of std::vector) to a point where Foo is a complete type. That is the case in the .cpp file.

The same holds for operator==.Example

{{Take this very simple Thrift file:}}

{{struct StructA {}}
{{  1:required list<StructB> myBs}}
{{}}}

{{struct StructB}}
{
{{  1:required string someString}}
{{}}}

{{If I compile this using thrift --gen cpp:no_skeleton -o out ./file.thrift I get a header file that contains the following:}}

{{class StructA;}}
{{class StructB;}}

class StructA : …

{{public:}}
{{  …}}
{{{}  StructA() noexcept {{}}}}
{{  …}}
{{  std::vector<StructB> myBs;}}
{{  …}}
{{};}}

{{…}}

{{class StructB : …}}

{ … };

 

{{In this case, the default constructor for StructA references the default constructor of std::vector<StructB> while StructB is still an incomplete type. *This is undefined behavior.* It did apparently compile with all big compilers until recently, but with C++20, Clang 15 stops accepting this kind of construct, as you can see here at CompilerExplorer: [https://godbolt.org/z/xcc4av6cb]}}
{{ }}
h2. {{Proposed Solution}}

{{I propose to move the definitions of the default constructor and operator== from the foobar_types.h into the foobar_types.cpp file. That way, when the definition is seen (and therefore the methods of std::vector<Foo> referenced), Foo already is a complete type and everything works out.}}

{{The only alternative Solution (and which only works if Thrift does not allow circular dependencies - does it?) that I can see would be to compute a DAG of dependencies between the structures and then output them in topological order. That would not be a minor change.}}

I guess that the reason for the definitions being in the header file is that this allows better inlining of these two methods. Nowadays, most compilers/linkers perform link-time optimization, which allows for inlining to happen at link time, so my feeling is that moving the definitions into their own translation unit inside the CPP file should not cause significant performance loss.

 

I have created a Pull Request for my proposed solution here: [https://github.com/apache/thrift/pull/2755]

 

  was:
The C++ compiler of Thrift 0.17.0 produces C++ code with undefined behavior, which stops compiling on Clang 15 in C++20 mode.

For a generated class, both the default constructor and {{operator==}} implementations reference certain member functions of the class' members. As an example, the default constructor references (i.e., "uses") the default constructors of its members.

If a class contains a {{{}std::vector<Foo>{}}}, and {{Foo}} has only been _forward_ declared (which happens often in Thrift-generated code), this creates undefined behavior: The {{std::vector}} specification states that as long as {{Foo}} is an incomplete type, it is fine to reference {{{}std::vector<Foo>{}}}, but not any members (such as its default constructor).

Thus, we must defer our default constructor's implementation (which references the default constructor of std::vector) to a point where Foo is a complete type. That is the case in the .cpp file.

The same holds for operator==.Example



{{Take this very simple Thrift file:}}

{{struct StructA {}}
{{  1:required list<StructB> myBs}}
{{{}}{}}}{{{}{}}}

{{struct StructB}}
{{{}}
{{  1:required string someString}}
{{}}}

{{If I compile this using thrift --gen cpp:no_skeleton -o out ./file.thrift I get a header file that contains the following:}}


{{class StructA;}}
{{{}class StructB;{}}}{{{}{}}}{{{}{}}}{{{}{}}}

{{class StructA : … }}

{{{}}
{{public:}}
{{  …}}
{{  StructA() noexcept {}}}
{{  …}}
{{  std::vector<StructB> myBs;}}
{{  …}}
{{};}}

{{…}}

{{class StructB : …}}

{{{ … };}}

{{In this case, the default constructor for StructA references the default constructor of std::vector<StructB> while StructB is still an incomplete type. *This is undefined behavior.* It did apparently compile with all big compilers until recently, but with C++20, Clang 15 stops accepting this kind of construct, as you can see here at CompilerExplorer: [https://godbolt.org/z/xcc4av6cb]}}
{{ }}
h2. {{Proposed Solution}}

{{I propose to move the definitions of the default constructor and operator== from the foobar_types.h into the foobar_types.cpp file. That way, when the definition is seen (and therefore the methods of std::vector<Foo> referenced), Foo already is a complete type and everything works out.}}

{{The only alternative Solution (and which only works if Thrift does not allow circular dependencies - does it?) that I can see would be to compute a DAG of dependencies between the structures and then output them in topological order. That would not be a minor change.}}

I guess that the reason for the definitions being in the header file is that this allows better inlining of these two methods. Nowadays, most compilers/linkers perform link-time optimization, which allows for inlining to happen at link time, so my feeling is that moving the definitions into their own translation unit inside the CPP file should not cause significant performance loss.

 

I have created a Pull Request for my proposed solution here: https://github.com/apache/thrift/pull/2755

 


> UB in generated C++ code, stops compiling with C++20
> ----------------------------------------------------
>
>                 Key: THRIFT-5682
>                 URL: https://issues.apache.org/jira/browse/THRIFT-5682
>             Project: Thrift
>          Issue Type: Bug
>          Components: C++ - Compiler
>    Affects Versions: 0.17.0
>            Reporter: Lukas Barth
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The C++ compiler of Thrift 0.17.0 produces C++ code with undefined behavior, which stops compiling on Clang 15 in C++20 mode.
> For a generated class, both the default constructor and {{operator==}} implementations reference certain member functions of the class' members. As an example, the default constructor references (i.e., "uses") the default constructors of its members.
> If a class contains a {{{}std::vector<Foo>{}}}, and {{Foo}} has only been _forward_ declared (which happens often in Thrift-generated code), this creates undefined behavior: The {{std::vector}} specification states that as long as {{Foo}} is an incomplete type, it is fine to reference {{{}std::vector<Foo>{}}}, but not any members (such as its default constructor).
> Thus, we must defer our default constructor's implementation (which references the default constructor of std::vector) to a point where Foo is a complete type. That is the case in the .cpp file.
> The same holds for operator==.Example
> {{Take this very simple Thrift file:}}
> {{struct StructA {}}
> {{  1:required list<StructB> myBs}}
> {{}}}
> {{struct StructB}}
> {
> {{  1:required string someString}}
> {{}}}
> {{If I compile this using thrift --gen cpp:no_skeleton -o out ./file.thrift I get a header file that contains the following:}}
> {{class StructA;}}
> {{class StructB;}}
> class StructA : …
> {{public:}}
> {{  …}}
> {{{}  StructA() noexcept {{}}}}
> {{  …}}
> {{  std::vector<StructB> myBs;}}
> {{  …}}
> {{};}}
> {{…}}
> {{class StructB : …}}
> { … };
>  
> {{In this case, the default constructor for StructA references the default constructor of std::vector<StructB> while StructB is still an incomplete type. *This is undefined behavior.* It did apparently compile with all big compilers until recently, but with C++20, Clang 15 stops accepting this kind of construct, as you can see here at CompilerExplorer: [https://godbolt.org/z/xcc4av6cb]}}
> {{ }}
> h2. {{Proposed Solution}}
> {{I propose to move the definitions of the default constructor and operator== from the foobar_types.h into the foobar_types.cpp file. That way, when the definition is seen (and therefore the methods of std::vector<Foo> referenced), Foo already is a complete type and everything works out.}}
> {{The only alternative Solution (and which only works if Thrift does not allow circular dependencies - does it?) that I can see would be to compute a DAG of dependencies between the structures and then output them in topological order. That would not be a minor change.}}
> I guess that the reason for the definitions being in the header file is that this allows better inlining of these two methods. Nowadays, most compilers/linkers perform link-time optimization, which allows for inlining to happen at link time, so my feeling is that moving the definitions into their own translation unit inside the CPP file should not cause significant performance loss.
>  
> I have created a Pull Request for my proposed solution here: [https://github.com/apache/thrift/pull/2755]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)