You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2019/07/27 19:04:11 UTC

[C++] Private implementations and virtual interfaces

hi folks,

In some places I have favored using an abstract class interface with
exclusively pure virtual functions over the PIMPL pattern. I think it
would merit some discussion of why we should use one over the other.

First, here's a quick example of the difference

https://gist.github.com/wesm/aea2a95000dd12372e0f1a6163dfd2cc

Obviously a lot of details have been omitted.

Here's an example of such a class which may get reworked as such

https://github.com/apache/arrow/blob/apache-arrow-0.14.1/cpp/src/parquet/arrow/reader.h#L289

The abstract/all-virtual base has some benefits:

* No need to implement "forwarding" methods to the private implementation
* Do not have to declare "friend" classes in the header for some cases
where other classes need to access the methods of a private
implementation
* Implementation symbols do not need to be exported in DLLs with an
*_EXPORT macro

There are some drawbacks, or cases where this method cannot be applied, though:

* An implementation of some other abstract interface which needs to
appear in a public header may not be able to use this approach.
* My understanding is that the PIMPL pattern will perform better for
non-virtual functions that are called a lot. It'd be helpful to know
the magnitude of the performance difference
* Complex inheritance patterns may require use of virtual inheritance,
which can create a burden for downstream users (e.g. they may have to
use dynamic_cast to convert between types in the class hierarchy)

I'm far from a C++ expert so I'm always learning as time goes on, but
hopefully other C++ developers have thoughts about this and we can
keep an eye out in code reviews for cases where a simpler
implementation may suffice

- Wes

Re: [C++] Private implementations and virtual interfaces

Posted by Antoine Pitrou <an...@python.org>.
Hi,

Le 27/07/2019 à 21:04, Wes McKinney a écrit :
> * My understanding is that the PIMPL pattern will perform better for
> non-virtual functions that are called a lot. It'd be helpful to know
> the magnitude of the performance difference

Modern CPUs have indirect branch predictors.  If an indirect branch
(e.g. virtual function call) always resolves to the same target address,
it should have very good performance.

My main issue with the Pimpl pattern is the amount of boilerplate it
requires.  One workaround is to use a "half-pimpl", i.e. keep public API
implementations in the main class, but put private helpers in the pimpl.
See e.g. MockFileSystem vs. MockFileSystem::Impl here:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/mockfs.cc#L217

That's not always convenient though.

Regards

Antoine.


> * Complex inheritance patterns may require use of virtual inheritance,
> which can create a burden for downstream users (e.g. they may have to
> use dynamic_cast to convert between types in the class hierarchy)
> 
> I'm far from a C++ expert so I'm always learning as time goes on, but
> hopefully other C++ developers have thoughts about this and we can
> keep an eye out in code reviews for cases where a simpler
> implementation may suffice
> 
> - Wes
> 

Re: [C++] Private implementations and virtual interfaces

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
Building CI that detects ABI breakage is not hard. There is a closed PR in the arrow repo from me that does exactly this using abi-complicance-checker.

I understand that we will not be able to provide ABI stability on all Arrow subprojects but having it for a core would be really great. This would allow easy upgrading of adaptors that connect to Arrow but don't actually work with the resulting structures (this can be database connectors and file format implementations).

Uwe

On Sun, Jul 28, 2019, at 10:43 AM, Antoine Pitrou wrote:
> 
> Le 28/07/2019 à 01:49, Wes McKinney a écrit :
> > On Sat, Jul 27, 2019 at 4:38 PM Uwe L. Korn <uw...@xhochy.com> wrote:
> >>
> >> The PIMPL is a thing I would trade a bit of performance as it brings ABI stability. This is something that will help us making Arrow usage in thirdparty code much simpler.
> >>
> > 
> > I question whether ABI stability (at the level of the shared library
> > ABI version) will ever be practical in this project. In the case of
> > NumPy, there are very few C data structures that could actually
> > change. In this library we have a great deal more data structures, I
> > would guess 100s of distinct objects taking into account each C++
> > class in the library. It's one thing to be API forward compatible from
> > major version to major version (with warnings to allow for graceful
> > deprecations) but such forward compatibility strategies are not so
> > readily available when talking about the composition of C++ classes
> > (considering that virtual function tables are part of the ABI).
> > 
> > In any case, as a result of this, I'm not comfortable basing technical
> > decisions on the basis of ABI stability considerations.
> 
> We could at some point define a "stable ABI subset" (for example the
> core classes Array, ArrayData, etc.).  But I'm not sure whether that
> would be sufficient for users.
> 
> Also in any case it would need someone to maintain the regular toolkit
> and continuous integration hooks to prevent ABI breakage.  Otherwise
> it's pointless promising something that we can't efficiently monitor.
> 
> Regards
> 
> Antoine.
>

Re: [C++] Private implementations and virtual interfaces

Posted by Antoine Pitrou <an...@python.org>.
Le 28/07/2019 à 01:49, Wes McKinney a écrit :
> On Sat, Jul 27, 2019 at 4:38 PM Uwe L. Korn <uw...@xhochy.com> wrote:
>>
>> The PIMPL is a thing I would trade a bit of performance as it brings ABI stability. This is something that will help us making Arrow usage in thirdparty code much simpler.
>>
> 
> I question whether ABI stability (at the level of the shared library
> ABI version) will ever be practical in this project. In the case of
> NumPy, there are very few C data structures that could actually
> change. In this library we have a great deal more data structures, I
> would guess 100s of distinct objects taking into account each C++
> class in the library. It's one thing to be API forward compatible from
> major version to major version (with warnings to allow for graceful
> deprecations) but such forward compatibility strategies are not so
> readily available when talking about the composition of C++ classes
> (considering that virtual function tables are part of the ABI).
> 
> In any case, as a result of this, I'm not comfortable basing technical
> decisions on the basis of ABI stability considerations.

We could at some point define a "stable ABI subset" (for example the
core classes Array, ArrayData, etc.).  But I'm not sure whether that
would be sufficient for users.

Also in any case it would need someone to maintain the regular toolkit
and continuous integration hooks to prevent ABI breakage.  Otherwise
it's pointless promising something that we can't efficiently monitor.

Regards

Antoine.

Re: [C++] Private implementations and virtual interfaces

Posted by Wes McKinney <we...@gmail.com>.
On Sat, Jul 27, 2019 at 4:38 PM Uwe L. Korn <uw...@xhochy.com> wrote:
>
> The PIMPL is a thing I would trade a bit of performance as it brings ABI stability. This is something that will help us making Arrow usage in thirdparty code much simpler.
>

I question whether ABI stability (at the level of the shared library
ABI version) will ever be practical in this project. In the case of
NumPy, there are very few C data structures that could actually
change. In this library we have a great deal more data structures, I
would guess 100s of distinct objects taking into account each C++
class in the library. It's one thing to be API forward compatible from
major version to major version (with warnings to allow for graceful
deprecations) but such forward compatibility strategies are not so
readily available when talking about the composition of C++ classes
(considering that virtual function tables are part of the ABI).

In any case, as a result of this, I'm not comfortable basing technical
decisions on the basis of ABI stability considerations.

> Simple updates when an API was only extended but the ABI is intact is a great ease on the Arrow consumer side. I know that this is a bit more hassle on the developer side but it's something I really love with NumPy. It's so much simpler to do a version upgrade than with an ABI breaking library such as Arrow.
>
> Uwe
>
> > Am 27.07.2019 um 22:57 schrieb Jed Brown <je...@jedbrown.org>:
> >
> > Wes McKinney <we...@gmail.com> writes:
> >
> >> The abstract/all-virtual base has some benefits:
> >>
> >> * No need to implement "forwarding" methods to the private implementation
> >> * Do not have to declare "friend" classes in the header for some cases
> >> where other classes need to access the methods of a private
> >> implementation
> >> * Implementation symbols do not need to be exported in DLLs with an
> >> *_EXPORT macro
> >>
> >> There are some drawbacks, or cases where this method cannot be applied, though:
> >>
> >> * An implementation of some other abstract interface which needs to
> >> appear in a public header may not be able to use this approach.
> >> * My understanding is that the PIMPL pattern will perform better for
> >> non-virtual functions that are called a lot. It'd be helpful to know
> >> the magnitude of the performance difference
> >> * Complex inheritance patterns may require use of virtual inheritance,
> >> which can create a burden for downstream users (e.g. they may have to
> >> use dynamic_cast to convert between types in the class hierarchy)
> >
> > I would add these two points, which may or may not be a significant
> > concern to you:
> >
> > * When you add new methods to the abstract virtual model, you change the
> >  ABI [1] and therefore need to recompile client code.  This has many
> >  consequences for distribution.
> >
> > * PIMPL gives a well-defined place for input validation and setting
> >  debugger breakpoints even when you don't know which implementation
> >  will be used.
> >
> >
> > [1] The ABI changes because code to index into the vtable is inlined at
> > the call site.  Adding to your example
> >
> >  void foo(VirtualType &obj) {
> >    obj.Method1();
> >    // any other line to suppress tail call
> >  }
> >
> > produces assembly like
> >
> >  mov    rax,QWORD PTR [rdi]     ; load vtable for obj
> >  call   QWORD PTR [rax+0x10]    ; 0x10 is offset into vtable
> >
> > A different method will use a different offset.  If you add a method,
> > offsets of existing methods may change.  With PIMPL, the equivalent
> > indexing code resides in your library instead of client code, and yields
> > a static (or PLT) call resolved by the linker:
> >
> >  call    _ZN9PIMPLType7Method1Ev@PLT
>

Re: [C++] Private implementations and virtual interfaces

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
The PIMPL is a thing I would trade a bit of performance as it brings ABI stability. This is something that will help us making Arrow usage in thirdparty code much simpler.

Simple updates when an API was only extended but the ABI is intact is a great ease on the Arrow consumer side. I know that this is a bit more hassle on the developer side but it's something I really love with NumPy. It's so much simpler to do a version upgrade than with an ABI breaking library such as Arrow.

Uwe

> Am 27.07.2019 um 22:57 schrieb Jed Brown <je...@jedbrown.org>:
> 
> Wes McKinney <we...@gmail.com> writes:
> 
>> The abstract/all-virtual base has some benefits:
>> 
>> * No need to implement "forwarding" methods to the private implementation
>> * Do not have to declare "friend" classes in the header for some cases
>> where other classes need to access the methods of a private
>> implementation
>> * Implementation symbols do not need to be exported in DLLs with an
>> *_EXPORT macro
>> 
>> There are some drawbacks, or cases where this method cannot be applied, though:
>> 
>> * An implementation of some other abstract interface which needs to
>> appear in a public header may not be able to use this approach.
>> * My understanding is that the PIMPL pattern will perform better for
>> non-virtual functions that are called a lot. It'd be helpful to know
>> the magnitude of the performance difference
>> * Complex inheritance patterns may require use of virtual inheritance,
>> which can create a burden for downstream users (e.g. they may have to
>> use dynamic_cast to convert between types in the class hierarchy)
> 
> I would add these two points, which may or may not be a significant
> concern to you:
> 
> * When you add new methods to the abstract virtual model, you change the
>  ABI [1] and therefore need to recompile client code.  This has many
>  consequences for distribution.
> 
> * PIMPL gives a well-defined place for input validation and setting
>  debugger breakpoints even when you don't know which implementation
>  will be used.
> 
> 
> [1] The ABI changes because code to index into the vtable is inlined at
> the call site.  Adding to your example
> 
>  void foo(VirtualType &obj) {
>    obj.Method1();
>    // any other line to suppress tail call
>  }
> 
> produces assembly like
> 
>  mov    rax,QWORD PTR [rdi]     ; load vtable for obj
>  call   QWORD PTR [rax+0x10]    ; 0x10 is offset into vtable
> 
> A different method will use a different offset.  If you add a method,
> offsets of existing methods may change.  With PIMPL, the equivalent
> indexing code resides in your library instead of client code, and yields
> a static (or PLT) call resolved by the linker:
> 
>  call    _ZN9PIMPLType7Method1Ev@PLT


Re: [C++] Private implementations and virtual interfaces

Posted by Jed Brown <je...@jedbrown.org>.
Wes McKinney <we...@gmail.com> writes:

> The abstract/all-virtual base has some benefits:
>
> * No need to implement "forwarding" methods to the private implementation
> * Do not have to declare "friend" classes in the header for some cases
> where other classes need to access the methods of a private
> implementation
> * Implementation symbols do not need to be exported in DLLs with an
> *_EXPORT macro
>
> There are some drawbacks, or cases where this method cannot be applied, though:
>
> * An implementation of some other abstract interface which needs to
> appear in a public header may not be able to use this approach.
> * My understanding is that the PIMPL pattern will perform better for
> non-virtual functions that are called a lot. It'd be helpful to know
> the magnitude of the performance difference
> * Complex inheritance patterns may require use of virtual inheritance,
> which can create a burden for downstream users (e.g. they may have to
> use dynamic_cast to convert between types in the class hierarchy)

I would add these two points, which may or may not be a significant
concern to you:

* When you add new methods to the abstract virtual model, you change the
  ABI [1] and therefore need to recompile client code.  This has many
  consequences for distribution.

* PIMPL gives a well-defined place for input validation and setting
  debugger breakpoints even when you don't know which implementation
  will be used.


[1] The ABI changes because code to index into the vtable is inlined at
the call site.  Adding to your example

  void foo(VirtualType &obj) {
    obj.Method1();
    // any other line to suppress tail call
  }

produces assembly like

  mov    rax,QWORD PTR [rdi]     ; load vtable for obj
  call   QWORD PTR [rax+0x10]    ; 0x10 is offset into vtable

A different method will use a different offset.  If you add a method,
offsets of existing methods may change.  With PIMPL, the equivalent
indexing code resides in your library instead of client code, and yields
a static (or PLT) call resolved by the linker:

  call    _ZN9PIMPLType7Method1Ev@PLT