You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@arrow.apache.org by Micah Kornfield <em...@gmail.com> on 2019/11/21 04:50:12 UTC

[DISCUSS][C++] Pointer name aliasing

A recent PR for datasets  [1] seems to have introduced the convention of
aliasing "std::shared_ptr<Type>" with "TypePtr" for some type.  I think in
the past we had decided not to use a convention like this but I can't find
the thread.  IMO, I think this generally makes the code less understandable
but this is a matter of taste.

Before the pattern gets too ingrained in the code base I just wanted to
make sure we discussed using this on the mailing list and made sure there
was a consensus on when to use the pattern.

Thanks,
Micah

[1] https://github.com/apache/arrow/pull/5857/files

Re: [DISCUSS][C++] Pointer name aliasing

Posted by Wes McKinney <we...@gmail.com>.
Hi Liya,

In general I think when a development guideline promotes readability
it is a good thing. Issues like these may be addressed on a case by
case basis, though.

best
Wes

On Sun, Dec 22, 2019 at 8:54 PM Fan Liya <li...@gmail.com> wrote:
>
> IMO, this question relates to something general and fundamental.
>
> Generally, name alias leads to two results:
> 1) It makes writing code easier
> 2) It makes reading code more difficult
>
> Personally, I prefer readability to writability.
> However, I am wrondering if we have some general principles regarding this?
>
> Best,
> Liya Fan
>
>
> On Fri, Dec 20, 2019 at 12:17 AM Francois Saint-Jacques <
> fsaintjacques@gmail.com> wrote:
>
> > I created the following ticket (and sub-tasks) [1]  to track
> >
> > François
> >
> > [1] https://jira.apache.org/jira/browse/ARROW-7438
> >
> > On Tue, Nov 26, 2019 at 12:09 AM Micah Kornfield <em...@gmail.com>
> > wrote:
> > >
> > > I would need to look at the other instances as well.  I will try to so by
> > > next week, but I think we can probably take an incremental approach of:
> > > 1.  Eliminate *Ptr in src/arrow code (discuss similar changes in
> > > parquet/gandiva).
> > > 2.  Decide on the Iterator/Vector.
> > >
> > > On Fri, Nov 22, 2019 at 10:47 AM Wes McKinney <we...@gmail.com>
> > wrote:
> > >
> > > > hi Francois
> > > >
> > > > On Fri, Nov 22, 2019 at 11:17 AM Francois Saint-Jacques
> > > > <fs...@gmail.com> wrote:
> > > > >
> > > > > I'll revert, some questions:
> > > > >
> > > > > 1. Should we revert only the pointer aliases, or also the
> > > > Vector/Iterator.
> > > >
> > > > Could you clarify what Vector/Iterator aliases you are referring to,
> > > > like RecordBatchIterator?
> > > >
> > > > I think we may need to distinguish between endogenous aliases versus
> > > > aliases involving STL types.
> > > >
> > > > IMHO
> > > >
> > > > using RecordBatchIterator = Iterator<shared_ptr<RecordBatch>>;
> > > >
> > > > is less problematic from a readability standpoint than
> > > >
> > > > using RecordBatchPtr = shared_ptr<RecordBatch>;
> > > >
> > > > > 2. Should we revert all modules, i.e. gandiva and compute.
> > > >
> > > > I would say one step at a time -- in the case of Gandiva I would say
> > > > that we should open a JIRA issue and discuss further to ensure that we
> > > > do not cause disruption for public API consumers. Since this software
> > > > has already been released multiple times, a different approach may be
> > > > needed
> > > >
> > > > >
> > > > > François
> > > >
> >

Re: [DISCUSS][C++] Pointer name aliasing

Posted by Fan Liya <li...@gmail.com>.
IMO, this question relates to something general and fundamental.

Generally, name alias leads to two results:
1) It makes writing code easier
2) It makes reading code more difficult

Personally, I prefer readability to writability.
However, I am wrondering if we have some general principles regarding this?

Best,
Liya Fan


On Fri, Dec 20, 2019 at 12:17 AM Francois Saint-Jacques <
fsaintjacques@gmail.com> wrote:

> I created the following ticket (and sub-tasks) [1]  to track
>
> François
>
> [1] https://jira.apache.org/jira/browse/ARROW-7438
>
> On Tue, Nov 26, 2019 at 12:09 AM Micah Kornfield <em...@gmail.com>
> wrote:
> >
> > I would need to look at the other instances as well.  I will try to so by
> > next week, but I think we can probably take an incremental approach of:
> > 1.  Eliminate *Ptr in src/arrow code (discuss similar changes in
> > parquet/gandiva).
> > 2.  Decide on the Iterator/Vector.
> >
> > On Fri, Nov 22, 2019 at 10:47 AM Wes McKinney <we...@gmail.com>
> wrote:
> >
> > > hi Francois
> > >
> > > On Fri, Nov 22, 2019 at 11:17 AM Francois Saint-Jacques
> > > <fs...@gmail.com> wrote:
> > > >
> > > > I'll revert, some questions:
> > > >
> > > > 1. Should we revert only the pointer aliases, or also the
> > > Vector/Iterator.
> > >
> > > Could you clarify what Vector/Iterator aliases you are referring to,
> > > like RecordBatchIterator?
> > >
> > > I think we may need to distinguish between endogenous aliases versus
> > > aliases involving STL types.
> > >
> > > IMHO
> > >
> > > using RecordBatchIterator = Iterator<shared_ptr<RecordBatch>>;
> > >
> > > is less problematic from a readability standpoint than
> > >
> > > using RecordBatchPtr = shared_ptr<RecordBatch>;
> > >
> > > > 2. Should we revert all modules, i.e. gandiva and compute.
> > >
> > > I would say one step at a time -- in the case of Gandiva I would say
> > > that we should open a JIRA issue and discuss further to ensure that we
> > > do not cause disruption for public API consumers. Since this software
> > > has already been released multiple times, a different approach may be
> > > needed
> > >
> > > >
> > > > François
> > >
>

Re: [DISCUSS][C++] Pointer name aliasing

Posted by Francois Saint-Jacques <fs...@gmail.com>.
I created the following ticket (and sub-tasks) [1]  to track

François

[1] https://jira.apache.org/jira/browse/ARROW-7438

On Tue, Nov 26, 2019 at 12:09 AM Micah Kornfield <em...@gmail.com> wrote:
>
> I would need to look at the other instances as well.  I will try to so by
> next week, but I think we can probably take an incremental approach of:
> 1.  Eliminate *Ptr in src/arrow code (discuss similar changes in
> parquet/gandiva).
> 2.  Decide on the Iterator/Vector.
>
> On Fri, Nov 22, 2019 at 10:47 AM Wes McKinney <we...@gmail.com> wrote:
>
> > hi Francois
> >
> > On Fri, Nov 22, 2019 at 11:17 AM Francois Saint-Jacques
> > <fs...@gmail.com> wrote:
> > >
> > > I'll revert, some questions:
> > >
> > > 1. Should we revert only the pointer aliases, or also the
> > Vector/Iterator.
> >
> > Could you clarify what Vector/Iterator aliases you are referring to,
> > like RecordBatchIterator?
> >
> > I think we may need to distinguish between endogenous aliases versus
> > aliases involving STL types.
> >
> > IMHO
> >
> > using RecordBatchIterator = Iterator<shared_ptr<RecordBatch>>;
> >
> > is less problematic from a readability standpoint than
> >
> > using RecordBatchPtr = shared_ptr<RecordBatch>;
> >
> > > 2. Should we revert all modules, i.e. gandiva and compute.
> >
> > I would say one step at a time -- in the case of Gandiva I would say
> > that we should open a JIRA issue and discuss further to ensure that we
> > do not cause disruption for public API consumers. Since this software
> > has already been released multiple times, a different approach may be
> > needed
> >
> > >
> > > François
> >

Re: [DISCUSS][C++] Pointer name aliasing

Posted by Micah Kornfield <em...@gmail.com>.
I would need to look at the other instances as well.  I will try to so by
next week, but I think we can probably take an incremental approach of:
1.  Eliminate *Ptr in src/arrow code (discuss similar changes in
parquet/gandiva).
2.  Decide on the Iterator/Vector.

On Fri, Nov 22, 2019 at 10:47 AM Wes McKinney <we...@gmail.com> wrote:

> hi Francois
>
> On Fri, Nov 22, 2019 at 11:17 AM Francois Saint-Jacques
> <fs...@gmail.com> wrote:
> >
> > I'll revert, some questions:
> >
> > 1. Should we revert only the pointer aliases, or also the
> Vector/Iterator.
>
> Could you clarify what Vector/Iterator aliases you are referring to,
> like RecordBatchIterator?
>
> I think we may need to distinguish between endogenous aliases versus
> aliases involving STL types.
>
> IMHO
>
> using RecordBatchIterator = Iterator<shared_ptr<RecordBatch>>;
>
> is less problematic from a readability standpoint than
>
> using RecordBatchPtr = shared_ptr<RecordBatch>;
>
> > 2. Should we revert all modules, i.e. gandiva and compute.
>
> I would say one step at a time -- in the case of Gandiva I would say
> that we should open a JIRA issue and discuss further to ensure that we
> do not cause disruption for public API consumers. Since this software
> has already been released multiple times, a different approach may be
> needed
>
> >
> > François
>

Re: [DISCUSS][C++] Pointer name aliasing

Posted by Wes McKinney <we...@gmail.com>.
hi Francois

On Fri, Nov 22, 2019 at 11:17 AM Francois Saint-Jacques
<fs...@gmail.com> wrote:
>
> I'll revert, some questions:
>
> 1. Should we revert only the pointer aliases, or also the Vector/Iterator.

Could you clarify what Vector/Iterator aliases you are referring to,
like RecordBatchIterator?

I think we may need to distinguish between endogenous aliases versus
aliases involving STL types.

IMHO

using RecordBatchIterator = Iterator<shared_ptr<RecordBatch>>;

is less problematic from a readability standpoint than

using RecordBatchPtr = shared_ptr<RecordBatch>;

> 2. Should we revert all modules, i.e. gandiva and compute.

I would say one step at a time -- in the case of Gandiva I would say
that we should open a JIRA issue and discuss further to ensure that we
do not cause disruption for public API consumers. Since this software
has already been released multiple times, a different approach may be
needed

>
> François

Re: [DISCUSS][C++] Pointer name aliasing

Posted by Francois Saint-Jacques <fs...@gmail.com>.
I'll revert, some questions:

1. Should we revert only the pointer aliases, or also the Vector/Iterator.
2. Should we revert all modules, i.e. gandiva and compute.

François

Re: [DISCUSS][C++] Pointer name aliasing

Posted by Wes McKinney <we...@gmail.com>.
On Thu, Nov 21, 2019 at 11:27 PM Micah Kornfield <em...@gmail.com> wrote:
>
> >
> > I think we should mostly be careful about public APIs. With public
> > APIs we should write out the types and avoid aliases. With
> > implementation details and private/protected class members, I think it
> > is fine to use aliases.
>
> My concern with this is that in general if the types are in the header
> files they have a way of leaking out (whether intentional or not).

Yes, after looking at the PR that motivated this e-mail thread, I am
concerned about having aliases in public header files

>
>
> On Thu, Nov 21, 2019 at 12:06 PM Wes McKinney <we...@gmail.com> wrote:
>
> > I think we should mostly be careful about public APIs. With public
> > APIs we should write out the types and avoid aliases. With
> > implementation details and private/protected class members, I think it
> > is fine to use aliases.
> >
> > On Thu, Nov 21, 2019 at 11:06 AM Antoine Pitrou <so...@pitrou.net>
> > wrote:
> > >
> > > On Thu, 21 Nov 2019 08:40:10 -0500
> > > Francois Saint-Jacques <fs...@gmail.com> wrote:
> > > > This notation is already used in some parts of the codebase [1]. I
> > > > think it was introduced when absorbing gandiva and then in a draft of
> > > > the logical operations in the compute module. I have no strong opinion
> > > > for/against. I find it convenient to reduce typing, but the style
> > > > guide argue against this.
> > > >
> > > > What about other aliases (Vector & Iterator)? If we revert this
> > > > change, we should do it uniformly, e.g. in gandiva and compute.
> > >
> > > Vector and Iterator sound ok to me (though Iterator could yield some
> > > confusion with STL iterators, and Iterator<T> isn't really longer to
> > > type than TIterator).
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> >

Re: [DISCUSS][C++] Pointer name aliasing

Posted by Micah Kornfield <em...@gmail.com>.
>
> I think we should mostly be careful about public APIs. With public
> APIs we should write out the types and avoid aliases. With
> implementation details and private/protected class members, I think it
> is fine to use aliases.

My concern with this is that in general if the types are in the header
files they have a way of leaking out (whether intentional or not).



On Thu, Nov 21, 2019 at 12:06 PM Wes McKinney <we...@gmail.com> wrote:

> I think we should mostly be careful about public APIs. With public
> APIs we should write out the types and avoid aliases. With
> implementation details and private/protected class members, I think it
> is fine to use aliases.
>
> On Thu, Nov 21, 2019 at 11:06 AM Antoine Pitrou <so...@pitrou.net>
> wrote:
> >
> > On Thu, 21 Nov 2019 08:40:10 -0500
> > Francois Saint-Jacques <fs...@gmail.com> wrote:
> > > This notation is already used in some parts of the codebase [1]. I
> > > think it was introduced when absorbing gandiva and then in a draft of
> > > the logical operations in the compute module. I have no strong opinion
> > > for/against. I find it convenient to reduce typing, but the style
> > > guide argue against this.
> > >
> > > What about other aliases (Vector & Iterator)? If we revert this
> > > change, we should do it uniformly, e.g. in gandiva and compute.
> >
> > Vector and Iterator sound ok to me (though Iterator could yield some
> > confusion with STL iterators, and Iterator<T> isn't really longer to
> > type than TIterator).
> >
> > Regards
> >
> > Antoine.
> >
> >
>

Re: [DISCUSS][C++] Pointer name aliasing

Posted by Wes McKinney <we...@gmail.com>.
I think we should mostly be careful about public APIs. With public
APIs we should write out the types and avoid aliases. With
implementation details and private/protected class members, I think it
is fine to use aliases.

On Thu, Nov 21, 2019 at 11:06 AM Antoine Pitrou <so...@pitrou.net> wrote:
>
> On Thu, 21 Nov 2019 08:40:10 -0500
> Francois Saint-Jacques <fs...@gmail.com> wrote:
> > This notation is already used in some parts of the codebase [1]. I
> > think it was introduced when absorbing gandiva and then in a draft of
> > the logical operations in the compute module. I have no strong opinion
> > for/against. I find it convenient to reduce typing, but the style
> > guide argue against this.
> >
> > What about other aliases (Vector & Iterator)? If we revert this
> > change, we should do it uniformly, e.g. in gandiva and compute.
>
> Vector and Iterator sound ok to me (though Iterator could yield some
> confusion with STL iterators, and Iterator<T> isn't really longer to
> type than TIterator).
>
> Regards
>
> Antoine.
>
>

Re: [DISCUSS][C++] Pointer name aliasing

Posted by Antoine Pitrou <so...@pitrou.net>.
On Thu, 21 Nov 2019 08:40:10 -0500
Francois Saint-Jacques <fs...@gmail.com> wrote:
> This notation is already used in some parts of the codebase [1]. I
> think it was introduced when absorbing gandiva and then in a draft of
> the logical operations in the compute module. I have no strong opinion
> for/against. I find it convenient to reduce typing, but the style
> guide argue against this.
> 
> What about other aliases (Vector & Iterator)? If we revert this
> change, we should do it uniformly, e.g. in gandiva and compute.

Vector and Iterator sound ok to me (though Iterator could yield some
confusion with STL iterators, and Iterator<T> isn't really longer to
type than TIterator).

Regards

Antoine.



Re: [DISCUSS][C++] Pointer name aliasing

Posted by Francois Saint-Jacques <fs...@gmail.com>.
This notation is already used in some parts of the codebase [1]. I
think it was introduced when absorbing gandiva and then in a draft of
the logical operations in the compute module. I have no strong opinion
for/against. I find it convenient to reduce typing, but the style
guide argue against this.

What about other aliases (Vector & Iterator)? If we revert this
change, we should do it uniformly, e.g. in gandiva and compute.

François

[1] https://gist.github.com/fsaintjacques/18720eebd9de3bb7770586ed8ec0ef6f

On Thu, Nov 21, 2019 at 6:10 AM Antoine Pitrou <so...@pitrou.net> wrote:
>
> On Wed, 20 Nov 2019 20:50:12 -0800
> Micah Kornfield <em...@gmail.com> wrote:
> > A recent PR for datasets  [1] seems to have introduced the convention of
> > aliasing "std::shared_ptr<Type>" with "TypePtr" for some type.  I think in
> > the past we had decided not to use a convention like this but I can't find
> > the thread.  IMO, I think this generally makes the code less understandable
> > but this is a matter of taste.
>
> I agree this introduces ambiguity for casual readers of the code.  The
> question is whether the savings in typing are worth it.  Personally,
> I've become used to writing "shared_ptr" a lot :-)
>
> Regards
>
> Antoine.
>
>

Re: [DISCUSS][C++] Pointer name aliasing

Posted by Antoine Pitrou <so...@pitrou.net>.
On Wed, 20 Nov 2019 20:50:12 -0800
Micah Kornfield <em...@gmail.com> wrote:
> A recent PR for datasets  [1] seems to have introduced the convention of
> aliasing "std::shared_ptr<Type>" with "TypePtr" for some type.  I think in
> the past we had decided not to use a convention like this but I can't find
> the thread.  IMO, I think this generally makes the code less understandable
> but this is a matter of taste.

I agree this introduces ambiguity for casual readers of the code.  The
question is whether the savings in typing are worth it.  Personally,
I've become used to writing "shared_ptr" a lot :-)

Regards

Antoine.