You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@arrow.apache.org by Micah Kornfield <em...@gmail.com> on 2021/02/08 17:02:41 UTC

[C++] Conventions around C++ shared_ptr in the code base?

I'm not sure how consistent we are with how shared_ptr is used as a
parameter to methods and as a return type.  In reviewing and writing code
I've been using these guidelines for myself and I was wondering if they
align with others:

1.  If a copy of a shared_ptr is not intended to be made by the method then
use a const ref to underlying type.  i.e. void Foo(const Array& array) is
preferable to void Foo(const shared_ptr<Array>& array) [1].

2.  If a copy is always going to be made pass by value.  i.e. void
Foo(std::shared_ptr<Array>) and to std::move within the method.  The last
time I did research on this allowed for eliminating shared_ptr overhead if
the caller also can std::move() the parameter.

3.  If a copy might be made pass the shared_ptr by const reference.  i.e. void
Foo(const shared_ptr<T>& array) The exception to this if the contents of
the shared_ptr a reference can effectively be copied cheaply without as is
the case with Array via ArrayData in which case #1 applies.

4. For accessor methods prefer returning by const ref or underlying ref to
underlying when appropriate.     i.e. const std::shared_ptr<Array>& foo()  or
const Array& Foo().

5. For factory like methods return a copy i.e. std::shared_ptr<Array>
MakeFoo();

Is this other people's mental model?  I'd like to update our style guide so
we can hopefully drive consistency over time.

Thanks,
Micah

[1] Array is somewhat of a special case because one can have essentially
the same shared_ptr copy semantics by copying the underlying ArrayData
object.

Re: [C++] Conventions around C++ shared_ptr in the code base?

Posted by Wes McKinney <we...@gmail.com>.
On Fri, Feb 19, 2021 at 10:30 AM Micah Kornfield <em...@gmail.com> wrote:
>
> >
> > Agreed on Array::data(), in fact I made this change in #9490 (ARROW-9196)
>
> Thank you!
>
>
> > One convention I'd like to append: we mostly avoid convenience typedefs,
> > but an
> > exception is the common case of `vector<shared_ptr<{class name}>>`, for
> > which we
> > allow and encourage the use of `{class name}Vector` typedefs. (Conversely,
> > nothing
> > should be named /^\w+Vector$/ which isn't a vector-of-shared_ptr typedef.)
>
> I'm -0 on this.  I think it makes the code base slightly less readable for
> new-comers.
>
> People who need abstract tabular data capabilities can still implement
> > Table (though I wonder if that ability has ever been used productively).
>
> I think there is an open PR proposing a dataframe like wrapper around table
> or record batch (I would need to go back and look).  I would have to go
> back and look to see if it made use of either of these.

On this I would caution against preemptively closing doors — this
codebase still has a lot of room to grow, and given how early we still
are on the analytics / query processing / data frames side of the
project I would suggest waiting until there has been an opportunity
for progress to happen here and we see whether these abstract
interfaces are actually needed. For example, if a Table / RecordBatch
façade is provided for lazy/unevaluated operations, that could be
really valuable in practice.

I regret that haven't been able to do more myself to move that forward
but I trust we will get there eventually.

> -Micah
>
>
>
> On Fri, Feb 19, 2021 at 8:25 AM Antoine Pitrou <an...@python.org> wrote:
>
> >
> > I don't think there's any benefit in keeping RecordBatch abstract.
> > Making it concrete would probably reduce overhead slightly as well.
> >
> > People who need abstract tabular data capabilities can still implement
> > Table (though I wonder if that ability has ever been used productively).
> >
> > Regards
> >
> > Antoine.
> >
> >
> > Le 19/02/2021 à 17:17, Benjamin Kietzman a écrit :
> > > Thanks for looking into this, Micah.
> > >
> > > One convention I'd like to append: we mostly avoid convenience typedefs,
> > > but an
> > > exception is the common case of `vector<shared_ptr<{class name}>>`, for
> > > which we
> > > allow and encourage the use of `{class name}Vector` typedefs.
> > (Conversely,
> > > nothing
> > > should be named /^\w+Vector$/ which isn't a vector-of-shared_ptr
> > typedef.)
> > >
> > > 1. Agreed on Array::data(), in fact I made this change in #9490
> > (ARROW-9196)
> > >
> > > 2. Is it worthwhile to keep RecordBatch abstract? I am only aware of a
> > > single concrete
> > >    subclass (SimpleRecordBatch). I agree that RecordBatch's assessors
> > > should avoid
> > >    unnecessary construction of shared_ptrs, and demoting it to a concrete
> > > class would
> > >    make it clear that this is safe.
> > >
> > > Ben
> > >
> > > On Mon, Feb 15, 2021 at 11:55 PM Micah Kornfield <em...@gmail.com>
> > > wrote:
> > >
> > >> (Apologies if this is a double send)
> > >>
> > >> I'll open a PR on this soon. To update the dev guide.
> > >>
> > >> Given this standard there are few accessor methods that I think we
> > should
> > >> either convert or create a new accessor that does the correct thing with
> > >> respect to return type.  Given how core these methods are I think the
> > >> latter might be a better approach (but I don't feel too strongly if
> > others
> > >> have a good rationale one way or another):
> > >> 1. Array::Data() [1] - In looking at some CPU profiles it seems like
> > most
> > >> of the time spent in Validate is due to shared_ptr
> > >> construction/destruction.  In auditing the code this method appears to
> > be
> > >> the only one returning copies.
> > >>
> > >> 2. RecordBatch::Column* [2] - These are more questionable since they are
> > >> virtual methods, it is not clear if dynamic Record batches where the
> > >> intention behind this design. So it might not be worth it.  Anecdotally,
> > >> I've known people who have written some naive iteration code use these
> > >> methods where shared_ptr construction/destruction contributed 10%
> > overhead.
> > >>
> > >> Thanks,
> > >> Micah
> > >>
> > >>
> > >>
> > >> [1]
> > >>
> > >>
> > https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/array_base.h#L163
> > >> [2]
> > >>
> > >>
> > https://github.com/apache/arrow/blob/master/cpp/src/arrow/record_batch.h#L98
> > >>
> > >> On Mon, Feb 8, 2021 at 10:09 AM Wes McKinney <we...@gmail.com>
> > wrote:
> > >>
> > >>> Agreed. We should probably document this in the C++ developer docs.
> > >>>
> > >>> On Mon, Feb 8, 2021 at 12:04 PM Antoine Pitrou <an...@python.org>
> > >> wrote:
> > >>>>
> > >>>>
> > >>>> Hi Micah,
> > >>>>
> > >>>> That's roughly my mental model as well.
> > >>>>
> > >>>> However, for 4) I would say that return a const ref to shared_ptr if
> > >>>> preferable because the caller will often need the ownership
> > (especially
> > >>>> with Array, ArrayData, DataType, etc.).
> > >>>>
> > >>>> Regards
> > >>>>
> > >>>> Antoine.
> > >>>>
> > >>>>
> > >>>> Le 08/02/2021 à 18:02, Micah Kornfield a écrit :
> > >>>>> I'm not sure how consistent we are with how shared_ptr is used as a
> > >>>>> parameter to methods and as a return type.  In reviewing and writing
> > >>> code
> > >>>>> I've been using these guidelines for myself and I was wondering if
> > >> they
> > >>>>> align with others:
> > >>>>>
> > >>>>> 1.  If a copy of a shared_ptr is not intended to be made by the
> > >> method
> > >>> then
> > >>>>> use a const ref to underlying type.  i.e. void Foo(const Array&
> > >> array)
> > >>> is
> > >>>>> preferable to void Foo(const shared_ptr<Array>& array) [1].
> > >>>>>
> > >>>>> 2.  If a copy is always going to be made pass by value.  i.e. void
> > >>>>> Foo(std::shared_ptr<Array>) and to std::move within the method.  The
> > >>> last
> > >>>>> time I did research on this allowed for eliminating shared_ptr
> > >>> overhead if
> > >>>>> the caller also can std::move() the parameter.
> > >>>>>
> > >>>>> 3.  If a copy might be made pass the shared_ptr by const reference.
> > >>> i.e. void
> > >>>>> Foo(const shared_ptr<T>& array) The exception to this if the contents
> > >>> of
> > >>>>> the shared_ptr a reference can effectively be copied cheaply without
> > >>> as is
> > >>>>> the case with Array via ArrayData in which case #1 applies.
> > >>>>>
> > >>>>> 4. For accessor methods prefer returning by const ref or underlying
> > >>> ref to
> > >>>>> underlying when appropriate.     i.e. const std::shared_ptr<Array>&
> > >>> foo()  or
> > >>>>> const Array& Foo().
> > >>>>>
> > >>>>> 5. For factory like methods return a copy i.e. std::shared_ptr<Array>
> > >>>>> MakeFoo();
> > >>>>>
> > >>>>> Is this other people's mental model?  I'd like to update our style
> > >>> guide so
> > >>>>> we can hopefully drive consistency over time.
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Micah
> > >>>>>
> > >>>>> [1] Array is somewhat of a special case because one can have
> > >>> essentially
> > >>>>> the same shared_ptr copy semantics by copying the underlying
> > >> ArrayData
> > >>>>> object.
> > >>>>>
> > >>>
> > >>
> > >
> >

Re: [C++] Conventions around C++ shared_ptr in the code base?

Posted by Micah Kornfield <em...@gmail.com>.
>
> Agreed on Array::data(), in fact I made this change in #9490 (ARROW-9196)

Thank you!


> One convention I'd like to append: we mostly avoid convenience typedefs,
> but an
> exception is the common case of `vector<shared_ptr<{class name}>>`, for
> which we
> allow and encourage the use of `{class name}Vector` typedefs. (Conversely,
> nothing
> should be named /^\w+Vector$/ which isn't a vector-of-shared_ptr typedef.)

I'm -0 on this.  I think it makes the code base slightly less readable for
new-comers.

People who need abstract tabular data capabilities can still implement
> Table (though I wonder if that ability has ever been used productively).

I think there is an open PR proposing a dataframe like wrapper around table
or record batch (I would need to go back and look).  I would have to go
back and look to see if it made use of either of these.

-Micah



On Fri, Feb 19, 2021 at 8:25 AM Antoine Pitrou <an...@python.org> wrote:

>
> I don't think there's any benefit in keeping RecordBatch abstract.
> Making it concrete would probably reduce overhead slightly as well.
>
> People who need abstract tabular data capabilities can still implement
> Table (though I wonder if that ability has ever been used productively).
>
> Regards
>
> Antoine.
>
>
> Le 19/02/2021 à 17:17, Benjamin Kietzman a écrit :
> > Thanks for looking into this, Micah.
> >
> > One convention I'd like to append: we mostly avoid convenience typedefs,
> > but an
> > exception is the common case of `vector<shared_ptr<{class name}>>`, for
> > which we
> > allow and encourage the use of `{class name}Vector` typedefs.
> (Conversely,
> > nothing
> > should be named /^\w+Vector$/ which isn't a vector-of-shared_ptr
> typedef.)
> >
> > 1. Agreed on Array::data(), in fact I made this change in #9490
> (ARROW-9196)
> >
> > 2. Is it worthwhile to keep RecordBatch abstract? I am only aware of a
> > single concrete
> >    subclass (SimpleRecordBatch). I agree that RecordBatch's assessors
> > should avoid
> >    unnecessary construction of shared_ptrs, and demoting it to a concrete
> > class would
> >    make it clear that this is safe.
> >
> > Ben
> >
> > On Mon, Feb 15, 2021 at 11:55 PM Micah Kornfield <em...@gmail.com>
> > wrote:
> >
> >> (Apologies if this is a double send)
> >>
> >> I'll open a PR on this soon. To update the dev guide.
> >>
> >> Given this standard there are few accessor methods that I think we
> should
> >> either convert or create a new accessor that does the correct thing with
> >> respect to return type.  Given how core these methods are I think the
> >> latter might be a better approach (but I don't feel too strongly if
> others
> >> have a good rationale one way or another):
> >> 1. Array::Data() [1] - In looking at some CPU profiles it seems like
> most
> >> of the time spent in Validate is due to shared_ptr
> >> construction/destruction.  In auditing the code this method appears to
> be
> >> the only one returning copies.
> >>
> >> 2. RecordBatch::Column* [2] - These are more questionable since they are
> >> virtual methods, it is not clear if dynamic Record batches where the
> >> intention behind this design. So it might not be worth it.  Anecdotally,
> >> I've known people who have written some naive iteration code use these
> >> methods where shared_ptr construction/destruction contributed 10%
> overhead.
> >>
> >> Thanks,
> >> Micah
> >>
> >>
> >>
> >> [1]
> >>
> >>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/array_base.h#L163
> >> [2]
> >>
> >>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/record_batch.h#L98
> >>
> >> On Mon, Feb 8, 2021 at 10:09 AM Wes McKinney <we...@gmail.com>
> wrote:
> >>
> >>> Agreed. We should probably document this in the C++ developer docs.
> >>>
> >>> On Mon, Feb 8, 2021 at 12:04 PM Antoine Pitrou <an...@python.org>
> >> wrote:
> >>>>
> >>>>
> >>>> Hi Micah,
> >>>>
> >>>> That's roughly my mental model as well.
> >>>>
> >>>> However, for 4) I would say that return a const ref to shared_ptr if
> >>>> preferable because the caller will often need the ownership
> (especially
> >>>> with Array, ArrayData, DataType, etc.).
> >>>>
> >>>> Regards
> >>>>
> >>>> Antoine.
> >>>>
> >>>>
> >>>> Le 08/02/2021 à 18:02, Micah Kornfield a écrit :
> >>>>> I'm not sure how consistent we are with how shared_ptr is used as a
> >>>>> parameter to methods and as a return type.  In reviewing and writing
> >>> code
> >>>>> I've been using these guidelines for myself and I was wondering if
> >> they
> >>>>> align with others:
> >>>>>
> >>>>> 1.  If a copy of a shared_ptr is not intended to be made by the
> >> method
> >>> then
> >>>>> use a const ref to underlying type.  i.e. void Foo(const Array&
> >> array)
> >>> is
> >>>>> preferable to void Foo(const shared_ptr<Array>& array) [1].
> >>>>>
> >>>>> 2.  If a copy is always going to be made pass by value.  i.e. void
> >>>>> Foo(std::shared_ptr<Array>) and to std::move within the method.  The
> >>> last
> >>>>> time I did research on this allowed for eliminating shared_ptr
> >>> overhead if
> >>>>> the caller also can std::move() the parameter.
> >>>>>
> >>>>> 3.  If a copy might be made pass the shared_ptr by const reference.
> >>> i.e. void
> >>>>> Foo(const shared_ptr<T>& array) The exception to this if the contents
> >>> of
> >>>>> the shared_ptr a reference can effectively be copied cheaply without
> >>> as is
> >>>>> the case with Array via ArrayData in which case #1 applies.
> >>>>>
> >>>>> 4. For accessor methods prefer returning by const ref or underlying
> >>> ref to
> >>>>> underlying when appropriate.     i.e. const std::shared_ptr<Array>&
> >>> foo()  or
> >>>>> const Array& Foo().
> >>>>>
> >>>>> 5. For factory like methods return a copy i.e. std::shared_ptr<Array>
> >>>>> MakeFoo();
> >>>>>
> >>>>> Is this other people's mental model?  I'd like to update our style
> >>> guide so
> >>>>> we can hopefully drive consistency over time.
> >>>>>
> >>>>> Thanks,
> >>>>> Micah
> >>>>>
> >>>>> [1] Array is somewhat of a special case because one can have
> >>> essentially
> >>>>> the same shared_ptr copy semantics by copying the underlying
> >> ArrayData
> >>>>> object.
> >>>>>
> >>>
> >>
> >
>

Re: [C++] Conventions around C++ shared_ptr in the code base?

Posted by Antoine Pitrou <an...@python.org>.
I don't think there's any benefit in keeping RecordBatch abstract.
Making it concrete would probably reduce overhead slightly as well.

People who need abstract tabular data capabilities can still implement
Table (though I wonder if that ability has ever been used productively).

Regards

Antoine.


Le 19/02/2021 à 17:17, Benjamin Kietzman a écrit :
> Thanks for looking into this, Micah.
> 
> One convention I'd like to append: we mostly avoid convenience typedefs,
> but an
> exception is the common case of `vector<shared_ptr<{class name}>>`, for
> which we
> allow and encourage the use of `{class name}Vector` typedefs. (Conversely,
> nothing
> should be named /^\w+Vector$/ which isn't a vector-of-shared_ptr typedef.)
> 
> 1. Agreed on Array::data(), in fact I made this change in #9490 (ARROW-9196)
> 
> 2. Is it worthwhile to keep RecordBatch abstract? I am only aware of a
> single concrete
>    subclass (SimpleRecordBatch). I agree that RecordBatch's assessors
> should avoid
>    unnecessary construction of shared_ptrs, and demoting it to a concrete
> class would
>    make it clear that this is safe.
> 
> Ben
> 
> On Mon, Feb 15, 2021 at 11:55 PM Micah Kornfield <em...@gmail.com>
> wrote:
> 
>> (Apologies if this is a double send)
>>
>> I'll open a PR on this soon. To update the dev guide.
>>
>> Given this standard there are few accessor methods that I think we should
>> either convert or create a new accessor that does the correct thing with
>> respect to return type.  Given how core these methods are I think the
>> latter might be a better approach (but I don't feel too strongly if others
>> have a good rationale one way or another):
>> 1. Array::Data() [1] - In looking at some CPU profiles it seems like most
>> of the time spent in Validate is due to shared_ptr
>> construction/destruction.  In auditing the code this method appears to be
>> the only one returning copies.
>>
>> 2. RecordBatch::Column* [2] - These are more questionable since they are
>> virtual methods, it is not clear if dynamic Record batches where the
>> intention behind this design. So it might not be worth it.  Anecdotally,
>> I've known people who have written some naive iteration code use these
>> methods where shared_ptr construction/destruction contributed 10% overhead.
>>
>> Thanks,
>> Micah
>>
>>
>>
>> [1]
>>
>> https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/array_base.h#L163
>> [2]
>>
>> https://github.com/apache/arrow/blob/master/cpp/src/arrow/record_batch.h#L98
>>
>> On Mon, Feb 8, 2021 at 10:09 AM Wes McKinney <we...@gmail.com> wrote:
>>
>>> Agreed. We should probably document this in the C++ developer docs.
>>>
>>> On Mon, Feb 8, 2021 at 12:04 PM Antoine Pitrou <an...@python.org>
>> wrote:
>>>>
>>>>
>>>> Hi Micah,
>>>>
>>>> That's roughly my mental model as well.
>>>>
>>>> However, for 4) I would say that return a const ref to shared_ptr if
>>>> preferable because the caller will often need the ownership (especially
>>>> with Array, ArrayData, DataType, etc.).
>>>>
>>>> Regards
>>>>
>>>> Antoine.
>>>>
>>>>
>>>> Le 08/02/2021 à 18:02, Micah Kornfield a écrit :
>>>>> I'm not sure how consistent we are with how shared_ptr is used as a
>>>>> parameter to methods and as a return type.  In reviewing and writing
>>> code
>>>>> I've been using these guidelines for myself and I was wondering if
>> they
>>>>> align with others:
>>>>>
>>>>> 1.  If a copy of a shared_ptr is not intended to be made by the
>> method
>>> then
>>>>> use a const ref to underlying type.  i.e. void Foo(const Array&
>> array)
>>> is
>>>>> preferable to void Foo(const shared_ptr<Array>& array) [1].
>>>>>
>>>>> 2.  If a copy is always going to be made pass by value.  i.e. void
>>>>> Foo(std::shared_ptr<Array>) and to std::move within the method.  The
>>> last
>>>>> time I did research on this allowed for eliminating shared_ptr
>>> overhead if
>>>>> the caller also can std::move() the parameter.
>>>>>
>>>>> 3.  If a copy might be made pass the shared_ptr by const reference.
>>> i.e. void
>>>>> Foo(const shared_ptr<T>& array) The exception to this if the contents
>>> of
>>>>> the shared_ptr a reference can effectively be copied cheaply without
>>> as is
>>>>> the case with Array via ArrayData in which case #1 applies.
>>>>>
>>>>> 4. For accessor methods prefer returning by const ref or underlying
>>> ref to
>>>>> underlying when appropriate.     i.e. const std::shared_ptr<Array>&
>>> foo()  or
>>>>> const Array& Foo().
>>>>>
>>>>> 5. For factory like methods return a copy i.e. std::shared_ptr<Array>
>>>>> MakeFoo();
>>>>>
>>>>> Is this other people's mental model?  I'd like to update our style
>>> guide so
>>>>> we can hopefully drive consistency over time.
>>>>>
>>>>> Thanks,
>>>>> Micah
>>>>>
>>>>> [1] Array is somewhat of a special case because one can have
>>> essentially
>>>>> the same shared_ptr copy semantics by copying the underlying
>> ArrayData
>>>>> object.
>>>>>
>>>
>>
> 

Re: [C++] Conventions around C++ shared_ptr in the code base?

Posted by Benjamin Kietzman <be...@gmail.com>.
Thanks for looking into this, Micah.

One convention I'd like to append: we mostly avoid convenience typedefs,
but an
exception is the common case of `vector<shared_ptr<{class name}>>`, for
which we
allow and encourage the use of `{class name}Vector` typedefs. (Conversely,
nothing
should be named /^\w+Vector$/ which isn't a vector-of-shared_ptr typedef.)

1. Agreed on Array::data(), in fact I made this change in #9490 (ARROW-9196)

2. Is it worthwhile to keep RecordBatch abstract? I am only aware of a
single concrete
   subclass (SimpleRecordBatch). I agree that RecordBatch's assessors
should avoid
   unnecessary construction of shared_ptrs, and demoting it to a concrete
class would
   make it clear that this is safe.

Ben

On Mon, Feb 15, 2021 at 11:55 PM Micah Kornfield <em...@gmail.com>
wrote:

> (Apologies if this is a double send)
>
> I'll open a PR on this soon. To update the dev guide.
>
> Given this standard there are few accessor methods that I think we should
> either convert or create a new accessor that does the correct thing with
> respect to return type.  Given how core these methods are I think the
> latter might be a better approach (but I don't feel too strongly if others
> have a good rationale one way or another):
> 1. Array::Data() [1] - In looking at some CPU profiles it seems like most
> of the time spent in Validate is due to shared_ptr
> construction/destruction.  In auditing the code this method appears to be
> the only one returning copies.
>
> 2. RecordBatch::Column* [2] - These are more questionable since they are
> virtual methods, it is not clear if dynamic Record batches where the
> intention behind this design. So it might not be worth it.  Anecdotally,
> I've known people who have written some naive iteration code use these
> methods where shared_ptr construction/destruction contributed 10% overhead.
>
> Thanks,
> Micah
>
>
>
> [1]
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/array_base.h#L163
> [2]
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/record_batch.h#L98
>
> On Mon, Feb 8, 2021 at 10:09 AM Wes McKinney <we...@gmail.com> wrote:
>
> > Agreed. We should probably document this in the C++ developer docs.
> >
> > On Mon, Feb 8, 2021 at 12:04 PM Antoine Pitrou <an...@python.org>
> wrote:
> > >
> > >
> > > Hi Micah,
> > >
> > > That's roughly my mental model as well.
> > >
> > > However, for 4) I would say that return a const ref to shared_ptr if
> > > preferable because the caller will often need the ownership (especially
> > > with Array, ArrayData, DataType, etc.).
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > Le 08/02/2021 à 18:02, Micah Kornfield a écrit :
> > > > I'm not sure how consistent we are with how shared_ptr is used as a
> > > > parameter to methods and as a return type.  In reviewing and writing
> > code
> > > > I've been using these guidelines for myself and I was wondering if
> they
> > > > align with others:
> > > >
> > > > 1.  If a copy of a shared_ptr is not intended to be made by the
> method
> > then
> > > > use a const ref to underlying type.  i.e. void Foo(const Array&
> array)
> > is
> > > > preferable to void Foo(const shared_ptr<Array>& array) [1].
> > > >
> > > > 2.  If a copy is always going to be made pass by value.  i.e. void
> > > > Foo(std::shared_ptr<Array>) and to std::move within the method.  The
> > last
> > > > time I did research on this allowed for eliminating shared_ptr
> > overhead if
> > > > the caller also can std::move() the parameter.
> > > >
> > > > 3.  If a copy might be made pass the shared_ptr by const reference.
> > i.e. void
> > > > Foo(const shared_ptr<T>& array) The exception to this if the contents
> > of
> > > > the shared_ptr a reference can effectively be copied cheaply without
> > as is
> > > > the case with Array via ArrayData in which case #1 applies.
> > > >
> > > > 4. For accessor methods prefer returning by const ref or underlying
> > ref to
> > > > underlying when appropriate.     i.e. const std::shared_ptr<Array>&
> > foo()  or
> > > > const Array& Foo().
> > > >
> > > > 5. For factory like methods return a copy i.e. std::shared_ptr<Array>
> > > > MakeFoo();
> > > >
> > > > Is this other people's mental model?  I'd like to update our style
> > guide so
> > > > we can hopefully drive consistency over time.
> > > >
> > > > Thanks,
> > > > Micah
> > > >
> > > > [1] Array is somewhat of a special case because one can have
> > essentially
> > > > the same shared_ptr copy semantics by copying the underlying
> ArrayData
> > > > object.
> > > >
> >
>

Re: [C++] Conventions around C++ shared_ptr in the code base?

Posted by Micah Kornfield <em...@gmail.com>.
(Apologies if this is a double send)

I'll open a PR on this soon. To update the dev guide.

Given this standard there are few accessor methods that I think we should
either convert or create a new accessor that does the correct thing with
respect to return type.  Given how core these methods are I think the
latter might be a better approach (but I don't feel too strongly if others
have a good rationale one way or another):
1. Array::Data() [1] - In looking at some CPU profiles it seems like most
of the time spent in Validate is due to shared_ptr
construction/destruction.  In auditing the code this method appears to be
the only one returning copies.

2. RecordBatch::Column* [2] - These are more questionable since they are
virtual methods, it is not clear if dynamic Record batches where the
intention behind this design. So it might not be worth it.  Anecdotally,
I've known people who have written some naive iteration code use these
methods where shared_ptr construction/destruction contributed 10% overhead.

Thanks,
Micah



[1]
https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/array_base.h#L163
[2]
https://github.com/apache/arrow/blob/master/cpp/src/arrow/record_batch.h#L98

On Mon, Feb 8, 2021 at 10:09 AM Wes McKinney <we...@gmail.com> wrote:

> Agreed. We should probably document this in the C++ developer docs.
>
> On Mon, Feb 8, 2021 at 12:04 PM Antoine Pitrou <an...@python.org> wrote:
> >
> >
> > Hi Micah,
> >
> > That's roughly my mental model as well.
> >
> > However, for 4) I would say that return a const ref to shared_ptr if
> > preferable because the caller will often need the ownership (especially
> > with Array, ArrayData, DataType, etc.).
> >
> > Regards
> >
> > Antoine.
> >
> >
> > Le 08/02/2021 à 18:02, Micah Kornfield a écrit :
> > > I'm not sure how consistent we are with how shared_ptr is used as a
> > > parameter to methods and as a return type.  In reviewing and writing
> code
> > > I've been using these guidelines for myself and I was wondering if they
> > > align with others:
> > >
> > > 1.  If a copy of a shared_ptr is not intended to be made by the method
> then
> > > use a const ref to underlying type.  i.e. void Foo(const Array& array)
> is
> > > preferable to void Foo(const shared_ptr<Array>& array) [1].
> > >
> > > 2.  If a copy is always going to be made pass by value.  i.e. void
> > > Foo(std::shared_ptr<Array>) and to std::move within the method.  The
> last
> > > time I did research on this allowed for eliminating shared_ptr
> overhead if
> > > the caller also can std::move() the parameter.
> > >
> > > 3.  If a copy might be made pass the shared_ptr by const reference.
> i.e. void
> > > Foo(const shared_ptr<T>& array) The exception to this if the contents
> of
> > > the shared_ptr a reference can effectively be copied cheaply without
> as is
> > > the case with Array via ArrayData in which case #1 applies.
> > >
> > > 4. For accessor methods prefer returning by const ref or underlying
> ref to
> > > underlying when appropriate.     i.e. const std::shared_ptr<Array>&
> foo()  or
> > > const Array& Foo().
> > >
> > > 5. For factory like methods return a copy i.e. std::shared_ptr<Array>
> > > MakeFoo();
> > >
> > > Is this other people's mental model?  I'd like to update our style
> guide so
> > > we can hopefully drive consistency over time.
> > >
> > > Thanks,
> > > Micah
> > >
> > > [1] Array is somewhat of a special case because one can have
> essentially
> > > the same shared_ptr copy semantics by copying the underlying ArrayData
> > > object.
> > >
>

Re: [C++] Conventions around C++ shared_ptr in the code base?

Posted by Wes McKinney <we...@gmail.com>.
Agreed. We should probably document this in the C++ developer docs.

On Mon, Feb 8, 2021 at 12:04 PM Antoine Pitrou <an...@python.org> wrote:
>
>
> Hi Micah,
>
> That's roughly my mental model as well.
>
> However, for 4) I would say that return a const ref to shared_ptr if
> preferable because the caller will often need the ownership (especially
> with Array, ArrayData, DataType, etc.).
>
> Regards
>
> Antoine.
>
>
> Le 08/02/2021 à 18:02, Micah Kornfield a écrit :
> > I'm not sure how consistent we are with how shared_ptr is used as a
> > parameter to methods and as a return type.  In reviewing and writing code
> > I've been using these guidelines for myself and I was wondering if they
> > align with others:
> >
> > 1.  If a copy of a shared_ptr is not intended to be made by the method then
> > use a const ref to underlying type.  i.e. void Foo(const Array& array) is
> > preferable to void Foo(const shared_ptr<Array>& array) [1].
> >
> > 2.  If a copy is always going to be made pass by value.  i.e. void
> > Foo(std::shared_ptr<Array>) and to std::move within the method.  The last
> > time I did research on this allowed for eliminating shared_ptr overhead if
> > the caller also can std::move() the parameter.
> >
> > 3.  If a copy might be made pass the shared_ptr by const reference.  i.e. void
> > Foo(const shared_ptr<T>& array) The exception to this if the contents of
> > the shared_ptr a reference can effectively be copied cheaply without as is
> > the case with Array via ArrayData in which case #1 applies.
> >
> > 4. For accessor methods prefer returning by const ref or underlying ref to
> > underlying when appropriate.     i.e. const std::shared_ptr<Array>& foo()  or
> > const Array& Foo().
> >
> > 5. For factory like methods return a copy i.e. std::shared_ptr<Array>
> > MakeFoo();
> >
> > Is this other people's mental model?  I'd like to update our style guide so
> > we can hopefully drive consistency over time.
> >
> > Thanks,
> > Micah
> >
> > [1] Array is somewhat of a special case because one can have essentially
> > the same shared_ptr copy semantics by copying the underlying ArrayData
> > object.
> >

Re: [C++] Conventions around C++ shared_ptr in the code base?

Posted by Antoine Pitrou <an...@python.org>.
Hi Micah,

That's roughly my mental model as well.

However, for 4) I would say that return a const ref to shared_ptr if
preferable because the caller will often need the ownership (especially
with Array, ArrayData, DataType, etc.).

Regards

Antoine.


Le 08/02/2021 à 18:02, Micah Kornfield a écrit :
> I'm not sure how consistent we are with how shared_ptr is used as a
> parameter to methods and as a return type.  In reviewing and writing code
> I've been using these guidelines for myself and I was wondering if they
> align with others:
> 
> 1.  If a copy of a shared_ptr is not intended to be made by the method then
> use a const ref to underlying type.  i.e. void Foo(const Array& array) is
> preferable to void Foo(const shared_ptr<Array>& array) [1].
> 
> 2.  If a copy is always going to be made pass by value.  i.e. void
> Foo(std::shared_ptr<Array>) and to std::move within the method.  The last
> time I did research on this allowed for eliminating shared_ptr overhead if
> the caller also can std::move() the parameter.
> 
> 3.  If a copy might be made pass the shared_ptr by const reference.  i.e. void
> Foo(const shared_ptr<T>& array) The exception to this if the contents of
> the shared_ptr a reference can effectively be copied cheaply without as is
> the case with Array via ArrayData in which case #1 applies.
> 
> 4. For accessor methods prefer returning by const ref or underlying ref to
> underlying when appropriate.     i.e. const std::shared_ptr<Array>& foo()  or
> const Array& Foo().
> 
> 5. For factory like methods return a copy i.e. std::shared_ptr<Array>
> MakeFoo();
> 
> Is this other people's mental model?  I'd like to update our style guide so
> we can hopefully drive consistency over time.
> 
> Thanks,
> Micah
> 
> [1] Array is somewhat of a special case because one can have essentially
> the same shared_ptr copy semantics by copying the underlying ArrayData
> object.
>