Posted to dev@parquet.apache.org by Wes McKinney <we...@gmail.com> on 2018/02/25 00:07:58 UTC

Rekindling the Apache Parquet monorepo discussion

hi folks,

in a past sync we discussed the prospect of combining all of the
Parquet subprojects into a single code repo. Since there are some
other programming languages which may join the fold (Rust, C#.NET), it
would be beneficial to combine everything into a single repository to
assist with integration and compatibility testing.

Subprojects (C++, Java, etc.) could still have their own versioned releases.

Is this still of interest to the community? I would be willing to
assist with this effort. In theory the repo merge could be performed
without loss of git history in the respective projects.

- Wes

Re: Rekindling the Apache Parquet monorepo discussion

Posted by Wes McKinney <we...@gmail.com>.
The downside of separate repositories, I would say, is that it's harder
to get visibility into whether a particular patch breaks integration
tests. We've been using the monorepo approach in Apache Arrow, and having
the peace of mind that the whole matrix of languages always works
together is quite nice. I agree that stale / inactive codebases are an
issue.

On the C++ side, I would say the main thorn in my side is having to
maintain multiple interconnected build systems. The Google-style
monorepo approach (a single unified build system for C++ projects)
makes things much simpler, but this is a separate discussion from
whether the C++ and Java Parquet libraries should live in the same
place.

- Wes

On Tue, Feb 27, 2018 at 12:29 PM, Ryan Blue <rb...@netflix.com.invalid> wrote:
> I don't really like the idea of a single repo for all the languages. That's
> what we have in Avro and I tried (and failed) to separate everything out.
> The build is more complicated, tests take longer and duplicate work, and we
> end up needing to maintain old code that no one works on.
>
> I'd rather see separate repositories and a central test repository that
> uses released versions from the others. Release candidates could be added
> with PRs. Isn't it better, for example, to test a new CPP version against
> the latest Java release instead of unreleased branches of both?
>
> That said, I realize that this community isn't like Avro and the projects
> are more active, so if there's consensus the other direction I'll go along
> with it.
>
> rb
>
> --
> Ryan Blue
> Software Engineer
> Netflix

Re: Rekindling the Apache Parquet monorepo discussion

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
I don't really like the idea of a single repo for all the languages. That's
what we have in Avro and I tried (and failed) to separate everything out.
The build is more complicated, tests take longer and duplicate work, and we
end up needing to maintain old code that no one works on.

I'd rather see separate repositories and a central test repository that
uses released versions from the others. Release candidates could be added
with PRs. Isn't it better, for example, to test a new CPP version against
the latest Java release instead of unreleased branches of both?
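
To make the central test repository idea a bit more concrete, here is a
minimal sketch of how such a repo might enumerate the writer/reader pairs
to check. Everything in it is an assumption for illustration: the function
name, the language keys, and the version strings are hypothetical and are
not real release numbers or an existing tool.

```python
from itertools import permutations


def compatibility_pairs(released, candidate=None):
    """List (writer, reader) pairs to test across implementations.

    released:  dict mapping implementation name -> released version string.
    candidate: optional (implementation, version) tuple for a release
               candidate; when given, only pairs involving that
               implementation are returned, with its version swapped in.
    """
    versions = dict(released)
    if candidate is not None:
        impl, version = candidate
        versions[impl] = version

    pairs = []
    # Every ordered pair: files written by one implementation are read
    # back by another.
    for writer, reader in permutations(sorted(versions), 2):
        if candidate is not None and candidate[0] not in (writer, reader):
            continue
        pairs.append(((writer, versions[writer]), (reader, versions[reader])))
    return pairs


if __name__ == "__main__":
    # Hypothetical version numbers, for illustration only.
    released = {"cpp": "1.0.0", "java": "2.0.0"}
    for writer, reader in compatibility_pairs(released, ("cpp", "1.1.0-rc1")):
        print(f"write with {writer[0]} {writer[1]} -> read with {reader[0]} {reader[1]}")
```

A release candidate would then be tested only against the released
versions of the other implementations, rather than against their
unreleased branches.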

That said, I realize that this community isn't like Avro and the projects
are more active, so if there's consensus the other direction I'll go along
with it.

rb

On Tue, Feb 27, 2018 at 12:26 AM, Renato Marroquín Mogrovejo <
renatoj.marroquin@gmail.com> wrote:

> +1 for having a single repo for different programming language
> implementations, and one for the spec



-- 
Ryan Blue
Software Engineer
Netflix

Re: Rekindling the Apache Parquet monorepo discussion

Posted by Renato Marroquín Mogrovejo <re...@gmail.com>.
+1 for having a single repo for different programming language
implementations, and one for the spec

2018-02-27 5:19 GMT+01:00 Wes McKinney <we...@gmail.com>:

> In the interest of continuity, we could rename apache/parquet-mr to
> apache/parquet, move some directories around, and then merge in the
> parquet-format and parquet-cpp repos. There are probably some other
> approaches that would be fine, too.

Re: Rekindling the Apache Parquet monorepo discussion

Posted by Wes McKinney <we...@gmail.com>.
In the interest of continuity, we could rename apache/parquet-mr to
apache/parquet, move some directories around, and then merge in the
parquet-format and parquet-cpp repos. There are probably some other
approaches that would be fine, too.

On Mon, Feb 26, 2018 at 9:40 PM, Nong Li <no...@gmail.com> wrote:
> Are you thinking of merging all 3 into one or two at a time?

Re: Rekindling the Apache Parquet monorepo discussion

Posted by Nong Li <no...@gmail.com>.
Are you thinking of merging all 3 into one or two at a time?

On Mon, Feb 26, 2018 at 7:09 AM, Wes McKinney <we...@gmail.com> wrote:

> What I would suggest is to create a script that forms the merged
> repository (so we can verify that "git blame" will still work as
> intended). If there is consensus about doing this generally, we can
> then debate the structure of the repo and other details. For example,
> it could be similar to Apache Thrift's repo structure.

Re: Rekindling the Apache Parquet monorepo discussion

Posted by Wes McKinney <we...@gmail.com>.
What I would suggest is to create a script that forms the merged
repository (so we can verify that "git blame" will still work as
intended). If there is consensus about doing this generally, we can
then debate the structure of the repo and other details. For example,
it could be similar to Apache Thrift's repo structure.
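
As a rough illustration of what such a script could look like, the sketch
below builds the git commands for a subtree-style merge of each source
repository into its own subdirectory, following the approach described in
git's "subtree merge" howto. This is only one possible approach under
stated assumptions: the local clone paths, the "master" branch name, and
the target directory names ("format", "cpp") are hypothetical, and
--allow-unrelated-histories requires git 2.9 or later.

```python
import subprocess


def subtree_merge_plan(name, url, branch, prefix):
    """Return the git commands that merge one repository under `prefix/`.

    The source history is joined to the target history by a merge commit,
    so the original commits remain reachable.
    """
    ref = f"{name}/{branch}"
    return [
        # Register the source repository and fetch it immediately (-f).
        ["git", "remote", "add", "-f", name, url],
        # Record a merge without touching the working tree yet.
        ["git", "merge", "-s", "ours", "--no-commit",
         "--allow-unrelated-histories", ref],
        # Read the source tree into the index and working tree under prefix/.
        ["git", "read-tree", f"--prefix={prefix}/", "-u", ref],
        # Conclude the merge.
        ["git", "commit", "-m", f"Merge {name} into {prefix}/"],
    ]


def run_plan(plan, cwd):
    """Execute a plan inside the target repository at `cwd`."""
    for cmd in plan:
        subprocess.run(cmd, cwd=cwd, check=True)


# Hypothetical inputs: local clone paths, branch, and directory layout.
SOURCES = [
    ("parquet-format", "/path/to/parquet-format", "master", "format"),
    ("parquet-cpp", "/path/to/parquet-cpp", "master", "cpp"),
]

if __name__ == "__main__":
    # Print the plan instead of running it, so it can be reviewed first.
    for source in SOURCES:
        for cmd in subtree_merge_plan(*source):
            print(" ".join(cmd))
```

After running the plan in a scratch clone, "git blame" on a few files
under the new subdirectories would show whether line history is still
attributed to the original commits. That check is the point of scripting
the merge, and I have not assumed its outcome here.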


On Mon, Feb 26, 2018 at 9:32 AM, Deepak Majeti <ma...@gmail.com> wrote:
> +1. The compatibility benefits will be worth it.
>
> --
> regards,
> Deepak Majeti

Re: Rekindling the Apache Parquet monorepo discussion

Posted by Deepak Majeti <ma...@gmail.com>.
+1. The compatibility benefits will be worth it.

On Mon, Feb 26, 2018 at 12:48 AM, Nong Li <no...@gmail.com> wrote:

> I think this is a great idea. Let me know if I can help.



-- 
regards,
Deepak Majeti

Re: Rekindling the Apache Parquet monorepo discussion

Posted by Nong Li <no...@gmail.com>.
I think this is a great idea. Let me know if I can help.

On Sat, Feb 24, 2018 at 4:07 PM, Wes McKinney <we...@gmail.com> wrote:

> hi folks,
>
> in a past sync we discussed the prospect of combining all of the
> Parquet subprojects into a single code repo. Since there are some
> other programming languages which may join the fold (Rust, C#.NET), it
> would be beneficial to combine everything into a single repository to
> assist with integration and compatibility testing.
>
> Subprojects (C++, Java, etc.) could still have their own versioned
> releases.
>
> Is this still of interest to the community? I would be willing to
> assist with this effort. In theory the repo merge could be performed
> without loss of git history in the respective projects.
>
> - Wes
>