Posted to dev@parquet.apache.org by Will Jones <wi...@gmail.com> on 2023/02/01 19:27:22 UTC

[C++] Parquet and Arrow overlap

Hello,

A while back, the Parquet C++ implementation was merged into the Apache
Arrow monorepo [1]. As I understand it, this helped the development process
immensely. However, I am noticing some governance issues because of it.

First, it's not obvious where issues are supposed to be opened: in the
Parquet Jira or in Arrow GitHub issues. Looking back at some of the
original discussion [2], it looks like the intention was to:

> * use PARQUET-XXX for issues relating to Parquet core
> * use ARROW-XXX for issues relation to Arrow's consumption of Parquet
>   core (e.g. changes that are in parquet/arrow right now)

The README for the old parquet-cpp repo [3] instead states in its
migration note:

 JIRA issues should continue to be opened in the PARQUET JIRA project.


Either way, it doesn't seem like this process is obvious to people. Perhaps
we could clarify this and add notices to Arrow's GitHub issues template?

Second, committer status is a little unclear. I am a committer on Arrow,
but not on Parquet right now. Does that mean I should only merge Parquet
C++ PRs for code changes in parquet/arrow? Or that I shouldn't merge
Parquet changes at all?

Also, are the contributions to Arrow C++ Parquet being actively reviewed
for potential new committers?

Best,

Will Jones

[1] https://lists.apache.org/thread/76wzx2lsbwjl363bg066g8kdsocd03rw
[2] https://lists.apache.org/thread/dkh6vjomcfyjlvoy83qdk9j5jgxk7n4j
[3] https://github.com/apache/parquet-cpp

Re: [C++] Parquet and Arrow overlap

Posted by Micah Kornfield <em...@gmail.com>.
>
> I am a committer on Arrow,
> but not on Parquet right now. Does that mean I should only merge Parquet
> C++ PRs for code changes in parquet/arrow?

FWIW, this was the mode I was operating under.

My preference would be to continue operating in this mode from a
governance perspective. As it is, the current Parquet PMC [1] doesn't
seem to have many active C++ contributors, so it might be harder to
continue growing the C++ committer base.

Thanks,
Micah


[1] https://projects.apache.org/committee.html?parquet

Re: [C++] Parquet and Arrow overlap

Posted by Will Jones <wi...@gmail.com>.
Day to day, I think having parquet-cpp under the Apache Arrow project could
make sense. However, I worry about two risks:

1. Would that make governance of the format itself primarily the
responsibility of the Parquet-MR developers?
2. Would C++ developers interested in working with Parquet outside of Arrow
recognize it as a relevant library?

Re: [C++] Parquet and Arrow overlap

Posted by Neal Richardson <ne...@gmail.com>.
Would it make sense to transfer all governance of the parquet-cpp
implementation to Apache Arrow? It seems like that's where we de facto are
already, so that would resolve these ambiguities and put it in line with
the Rust implementation.

Would the Parquet PMC be opposed to formalizing this change?

Neal

Re: [C++] Parquet and Arrow overlap

Posted by Raphael Taylor-Davies <r....@googlemail.com.INVALID>.
Hi,

> Does the parquet rust implementation have a similar issue?

Similar to the C++ implementation, the Rust implementation lives under
the Apache Arrow umbrella and, as far as I am aware, has no direct
affiliation with the Apache Parquet project beyond using the same
format specification. However, as almost all of the users and
contributions concern the Arrow interfaces rather than the parquet
record APIs, there perhaps isn't the same ambiguity as with the C++
implementation. I would expect all issues to be raised in the arrow-rs
repository, with a PARQUET Jira raised only if there is some issue or
ambiguity pertaining to the format itself, likely by me or whoever is
triaging the issue.

Kind Regards,

Raphael

Re: [C++] Parquet and Arrow overlap

Posted by Gang Wu <ga...@apache.org>.
Hi Will,

AFAIK, since the donation to Apache Arrow, the Apache Parquet community no
longer considers contributions to parquet-cpp when promoting new committers.

It would be a dilemma for parquet-cpp contributors if neither the Apache
Arrow community nor the Apache Parquet community recognizes their work.

Does the parquet rust implementation have a similar issue?

Best,
Gang

Re: [C++] Parquet and Arrow overlap

Posted by Gang Wu <ga...@apache.org>.
Hi Will,

AFAIK, the Apache Parquet community no longer considers contribution to
parquet-cpp when promoting new committers after the donation to Apache
Arrow.

It would be a dilemma for the parquet-cpp contributors if none of the
Apache Arrow community or Apache Parquet community recognizes their work.

Does the parquet rust implementation have a similar issue?

Best,
Gang

Re: Fwd: [C++] Parquet and Arrow overlap

Posted by Raúl Cumplido <ra...@apache.org>.
For context, only a single issue from the Parquet JIRA was merged for 15.0.0
and one for 16.0.0. All the other Parquet issues merged during those
releases were already tracked on GitHub, so in practice we are already
doing this. Being able to get rid of all the JIRA integrations in our merge
and release scripts would be great.

Thanks,
Raúl

Re: Fwd: [C++] Parquet and Arrow overlap

Posted by Gang Wu <us...@gmail.com>.
I know we have some non-Java committers and PMC members. But since the
parquet-cpp donation, it seems that no one working on Parquet in Arrow (C++,
Rust, Go, etc.) or in other projects has been promoted to Parquet committer.
This makes it inconvenient for non-Java Parquet developers to work with the
apache/parquet-format and apache/parquet-testing repositories. Furthermore,
votes from these developers are not binding for format changes on the
mailing list.

Best,
Gang

Re: Fwd: [C++] Parquet and Arrow overlap

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
> Should we consider
> Parquet developers from other projects than parquet-mr as Parquet committers?

We are doing this (speaking as a Parquet PMC member who worked on parquet-cpp rather than parquet-mr).

Best
Uwe

Re: Fwd: [C++] Parquet and Arrow overlap

Posted by Gang Wu <us...@gmail.com>.
+1 for moving parquet-cpp issues from Apache Jira to Arrow's GitHub issues.

In addition, I want to echo Will's question in the thread. Should we consider
Parquet developers from projects other than parquet-mr as Parquet committers?
Currently, the apache/parquet-format and apache/parquet-testing repositories
are governed solely by the Apache Parquet PMC. It would be better for the entire
Parquet community if developers with sufficient contributions to open source
Parquet projects (including but not limited to parquet-cpp, arrow-rs, cudf,
etc.) could be considered for Parquet committer and PMC membership.

Best,
Gang

Re: Fwd: [C++] Parquet and Arrow overlap

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
I would be very supportive of this move. Parquet C++ development has been under the umbrella of the Arrow repository for more than five (six?) years now, so its issues should also be aligned with the Arrow project.

Uwe

Re: Fwd: [C++] Parquet and Arrow overlap

Posted by Rok Mihevc <ro...@gmail.com>.
Bumping this thread again to see if there is will to call for a vote and
move parquet-cpp issues from Apache Jira to Arrow's GitHub issues, as was
done for Arrow.
I'm willing to do the migration, as I already did it for Arrow.

Rok

Re: Fwd: [C++] Parquet and Arrow overlap

Posted by Micah Kornfield <em...@apache.org>.
Bumping this thread again to see if any Parquet PMC members can chime in and maybe start a formal vote to move governance of Parquet-CPP under the Arrow umbrella.

-Micah

Fwd: [C++] Parquet and Arrow overlap

Posted by Antoine Pitrou <an...@python.org>.

Hi Will,

On 01/02/2023 at 20:27, Will Jones wrote:
> 
> First, it's not obvious where issues are supposed to be opened: in Parquet
> Jira or Arrow GitHub issues. Looking back at some of the original
> discussion, it looks like the intention was
> 
>> * use PARQUET-XXX for issues relating to Parquet core
>> * use ARROW-XXX for issues relating to Arrow's consumption of Parquet
>> core (e.g. changes that are in parquet/arrow right now)
>>
> The README for the old parquet-cpp repo [3] instead states in its
> migration note:
> 
>   JIRA issues should continue to be opened in the PARQUET JIRA project.
> 
> Either way, it doesn't seem like this process is obvious to people. Perhaps
> we could clarify this and add notices to Arrow's GitHub issues template?

I agree we should clarify this. I have no personal preference, but I will note
that GitHub issues decrease friction, as a GitHub account is already necessary
for submitting PRs.

> Second, committer status is a little unclear. I am a committer on Arrow,
> but not on Parquet right now. Does that mean I should only merge Parquet
> C++ PRs for code changes in parquet/arrow? Or that I shouldn't merge
> Parquet changes at all?

Since Parquet C++ is part of Arrow C++, you are allowed to merge Parquet C++
changes. As always, you should ensure you have sufficient understanding of the
contribution, and that it follows established practices:
https://arrow.apache.org/docs/dev/developers/reviewing.html

> Also, are the contributions to Arrow C++ Parquet being actively reviewed
> for potential new committers?

I certainly would.

Regards

Antoine.


Re: [C++] Parquet and Arrow overlap

Posted by Raúl Cumplido <ra...@gmail.com>.
Hi,

I just wanted to add that, with the recent migration of Arrow to GitHub
issues, we have updated our development tools (merge script, archery release
tasks, ...) to work with GitHub, but we haven't been able to drop JIRA
support because we still have to handle Parquet issues. This means we have
to support two issue trackers at the moment. For context, 6 issues in the
11.0.0 release were tracked in the Parquet JIRA.

Thanks,
Raúl



Re: [C++] Parquet and Arrow overlap

Posted by Antoine Pitrou <an...@python.org>.
Hi Will,

On 01/02/2023 at 20:27, Will Jones wrote:
> 
> First, it's not obvious where issues are supposed to be opened: in Parquet
> Jira or Arrow GitHub issues. Looking back at some of the original
> discussion, it looks like the intention was
> 
>> * use PARQUET-XXX for issues relating to Parquet core
>> * use ARROW-XXX for issues relating to Arrow's consumption of Parquet
>> core (e.g. changes that are in parquet/arrow right now)
>>
> The README for the old parquet-cpp repo [3] instead states in its
> migration note:
> 
>   JIRA issues should continue to be opened in the PARQUET JIRA project.
> 
> Either way, it doesn't seem like this process is obvious to people. Perhaps
> we could clarify this and add notices to Arrow's GitHub issues template?

I agree we should clarify this. I have no personal preference, but I
will note that GitHub issues decrease friction, as a GitHub account is
already necessary for submitting PRs.

> Second, committer status is a little unclear. I am a committer on Arrow,
> but not on Parquet right now. Does that mean I should only merge Parquet
> C++ PRs for code changes in parquet/arrow? Or that I shouldn't merge
> Parquet changes at all?

Since Parquet C++ is part of Arrow C++, you are allowed to merge Parquet 
C++ changes. As always, you should ensure you have sufficient
understanding of the contribution, and that it follows established 
practices:
https://arrow.apache.org/docs/dev/developers/reviewing.html

> Also, are the contributions to Arrow C++ Parquet being actively reviewed
> for potential new committers?

I certainly would.

Regards

Antoine.