Posted to dev@arrow.apache.org by Micah Kornfield <em...@gmail.com> on 2023/01/06 17:58:41 UTC

[DISCUSS] Updating what are considered reference implementations?

I'm having trouble finding it, but I think we've previously agreed that new
features needed implementations in 2 reference implementations before
approval (I had thought the community agreed on Java and C++ as the two
implementations but I can't find the vote thread on it).  The recent
addition of RLE arrays [1] used Go and C++ as the reference implementations.

Given current maintainer bandwidth it seems reasonable to me to no longer
consider Java as a canonical reference implementation, but I think it would
be good to standardize on which language bindings are considered canonical
reference implementations, and to require spec changes be implemented in
them.

Simply based on perceived community sizes and completeness it seems that
maybe C++ and Rust should be the new standard?

Thanks,
Micah

[1] https://lists.apache.org/thread/j474q1dq9j11fz563vtztmzf8vjozbfz

Re: [DISCUSS] Updating what are considered reference implementations?

Posted by Raphael Taylor-Davies <r....@googlemail.com.INVALID>.
I agree that requiring addition to a "complete" implementation would be 
unfortunate, if only because a cursory glance at [1] shows that there 
aren't any that implement the entire specification anyway. I don't think 
this should preclude adding new array types, although it might give us 
cause to pause before adding many more...

I personally think requiring at least two native implementations, 
accompanying integration tests, and a formal vote should be sufficient. 
This would serve to bring visibility to the proposal, ensure it is
tractable, and confirm that there are no objections to it. Ultimately
I think it is fine for implementations to differ in the feature set they 
implement, provided it is clearly communicated, as I think [1] does very 
well. Users are then able to make an informed judgement call as to 
whether they wish to use a given feature based on its adoption.

[1]: https://arrow.apache.org/docs/status.html

On 11/01/2023 21:11, Brian Hulette wrote:
> I think this [1] is the thread where the policy was proposed, but it
> doesn't look like we ever settled on "Java and C++" vs. "any two
> implementations", or had a vote.
>
> I worry that requiring maintainers to add new format features to two
> "complete" implementations will just lead to fragmentation. People might
> opt to maintain a fork rather than unblock themselves by implementing a
> backlog of features they don't need.
>
> [1] https://lists.apache.org/thread/9t0pglrvxjhrt4r4xcsc1zmgmbtr8pxj

Re: [DISCUSS] Updating what are considered reference implementations?

Posted by Brian Hulette <bh...@apache.org>.
I think this [1] is the thread where the policy was proposed, but it
doesn't look like we ever settled on "Java and C++" vs. "any two
implementations", or had a vote.

I worry that requiring maintainers to add new format features to two
"complete" implementations will just lead to fragmentation. People might
opt to maintain a fork rather than unblock themselves by implementing a
backlog of features they don't need.

[1] https://lists.apache.org/thread/9t0pglrvxjhrt4r4xcsc1zmgmbtr8pxj

On Fri, Jan 6, 2023 at 12:33 PM Weston Pace <we...@gmail.com> wrote:

> I think it would be reasonable to state that a reference
> implementation must be a complete implementation (i.e. supports all
> existing types) that is not derived from another implementation (e.g.
> you can't pick pyarrow and arrow-c++).  If an implementation does not
> plan on ever supporting a new array type then maintainers of that
> implementation should be empowered to vote against it.  Given that, it
> seems like a reasonable burden to ask maintainers to catch up first
> before expanding in new directions.

Re: [DISCUSS] Updating what are considered reference implementations?

Posted by Weston Pace <we...@gmail.com>.
I think it would be reasonable to state that a reference
implementation must be a complete implementation (i.e. supports all
existing types) that is not derived from another implementation (e.g.
you can't pick pyarrow and arrow-c++).  If an implementation does not
plan on ever supporting a new array type then maintainers of that
implementation should be empowered to vote against it.  Given that, it
seems like a reasonable burden to ask maintainers to catch up first
before expanding in new directions.


On Fri, Jan 6, 2023 at 10:20 AM Micah Kornfield <em...@gmail.com> wrote:
>
> >
> > Note this wording talks about "two reference implementations" not "*the*
> > two reference implementations". So there can be more than two reference
> > implementations.
>
>
> Maybe reference implementation is the wrong wording here.  My main concern
> is that we try to maintain two "feature complete" implementations at all
> times.  I worry that a pick-2-from-N approach to reference implementations
> could lead to fragmentation more quickly.  But maybe this is
> premature?
>
> Cheers,
> Micah

Re: [DISCUSS] Updating what are considered reference implementations?

Posted by Micah Kornfield <em...@gmail.com>.
>
> Note this wording talks about "two reference implementations" not "*the*
> two reference implementations". So there can be more than two reference
> implementations.


Maybe reference implementation is the wrong wording here.  My main concern
is that we try to maintain two "feature complete" implementations at all
times.  I worry that a pick-2-from-N approach to reference implementations
could lead to fragmentation more quickly.  But maybe this is
premature?

Cheers,
Micah


On Fri, Jan 6, 2023 at 10:02 AM Antoine Pitrou <an...@python.org> wrote:

>
> On 06/01/2023 at 18:58, Micah Kornfield wrote:
> > I'm having trouble finding it, but I think we've previously agreed that
> new
> > features needed implementations in 2 reference implementations before
> > approval (I had thought the community agreed on Java and C++ as the two
> > implementations but I can't find the vote thread on it).
>
> Note this wording talks about "two reference implementations" not "*the*
> two reference implementations". So there can be more than two reference
> implementations.
>
> Regards
>
> Antoine.
>

Re: ADLS C++ support in next release (version 11)

Posted by Micah Kornfield <em...@gmail.com>.
It looks like there is an open PR:
https://github.com/apache/arrow/pull/12914 for this but no recent
activity.  It's not clear how much work remains, but it seems like
timing might be getting tight.  If you need this functionality, consider
coordinating with the author to see if you can help move it
forward.

Cheers,
Micah

On Fri, Jan 6, 2023 at 10:08 AM Jerry Adair <Je...@sas.com.invalid>
wrote:

> I am curious to know if the ADLS support for the Parquet C++ library will
> be included in the version 11 release that is scheduled for mid-January (at
> last check).  Does anyone have feedback?  We are in need of that capability.
>
> Thanks!
> Jerry
>
>

ADLS C++ support in next release (version 11)

Posted by Jerry Adair <Je...@sas.com.INVALID>.
I am curious to know if the ADLS support for the Parquet C++ library will be included in the version 11 release that is scheduled for mid-January (at last check).  Does anyone have feedback?  We are in need of that capability.

Thanks!
Jerry


Re: [DISCUSS] Updating what are considered reference implementations?

Posted by Antoine Pitrou <an...@python.org>.
On 06/01/2023 at 18:58, Micah Kornfield wrote:
> I'm having trouble finding it, but I think we've previously agreed that new
> features needed implementations in 2 reference implementations before
> approval (I had thought the community agreed on Java and C++ as the two
> implementations but I can't find the vote thread on it).

Note this wording talks about "two reference implementations" not "*the* 
two reference implementations". So there can be more than two reference 
implementations.

Regards

Antoine.