Posted to dev@parquet.apache.org by Wes McKinney <we...@gmail.com> on 2018/03/02 21:49:16 UTC

Re: Rekindling the Apache Parquet monorepo discussion

The downside I would say is that it's harder to get visibility into
whether a particular patch breaks integration tests. We've been using
the monorepo approach in Apache Arrow, and having the peace of mind
that all languages in the matrix always work together is quite
nice. I agree that stale / inactive codebases are an issue.

On the C++ side, I would say the main thorn in my side is having to
maintain multiple interconnected build systems. The Google-style
monorepo approach / single unified build system for C++ projects
makes things much simpler, but this is a separate discussion from
whether the C++ and Java Parquet libraries should live in the same
place.

- Wes
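
For reference, the script proposed further down the thread (one that forms
the merged repository while keeping each project's git history) could look
roughly like the sketch below. It only builds and prints the git commands for
a subtree-style merge; it does not run them. The remote URLs, the "master"
branch name, and the "format"/"cpp" directory names are assumptions for
illustration, not an agreed layout, and whether "git blame" behaves as
intended would still need to be checked by hand on the result.

```python
import shlex

# Hypothetical layout: (remote name, remote URL, target subdirectory).
# The URLs and directory names are assumptions, not a decided structure.
REPOS = [
    ("parquet-format", "https://github.com/apache/parquet-format.git", "format"),
    ("parquet-cpp", "https://github.com/apache/parquet-cpp.git", "cpp"),
]


def plan_subtree_merge(name, url, prefix, branch="master"):
    """Return the git commands that graft `url` under `prefix/` with its history."""
    ref = f"{name}/{branch}"
    return [
        ["git", "remote", "add", name, url],
        ["git", "fetch", name],
        # Start a merge with the other history but leave the work tree untouched.
        ["git", "merge", "-s", "ours", "--no-commit",
         "--allow-unrelated-histories", ref],
        # Read the other repository's tree into the chosen subdirectory.
        ["git", "read-tree", f"--prefix={prefix}/", "-u", ref],
        ["git", "commit", "-m", f"Merge {name} into {prefix}/"],
    ]


def plan_all(repos=REPOS):
    """Concatenate the per-repository plans in order."""
    commands = []
    for name, url, prefix in repos:
        commands.extend(plan_subtree_merge(name, url, prefix))
    return commands


if __name__ == "__main__":
    # Print the plan so it can be reviewed before anything is executed.
    for cmd in plan_all():
        print(shlex.join(cmd))
```

The commands are meant to be run from a clone of the target repository (for
example a renamed parquet-mr). Printing the plan first makes it easy to
review and to rerun against a scratch clone.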

On Tue, Feb 27, 2018 at 12:29 PM, Ryan Blue <rb...@netflix.com.invalid> wrote:
> I don't really like the idea of a single repo for all the languages. That's
> what we have in Avro and I tried (and failed) to separate everything out.
> The build is more complicated, tests take longer and duplicate work, and we
> end up needing to maintain old code that no one works on.
>
> I'd rather see separate repositories and a central test repository that
> uses released versions from the others. Release candidates could be added
> with PRs. Isn't it better, for example, to test a new CPP version against
> the latest Java release instead of unreleased branches of both?
>
> That said, I realize that this community isn't like Avro and the projects
> are more active, so if there's consensus the other direction I'll go along
> with it.
>
> rb
>
> On Tue, Feb 27, 2018 at 12:26 AM, Renato Marroquín Mogrovejo <
> renatoj.marroquin@gmail.com> wrote:
>
>> +1 for having a single repo for different programming language
>> implementations, and one for the spec
>>
>> 2018-02-27 5:19 GMT+01:00 Wes McKinney <we...@gmail.com>:
>>
>> > In the interest of continuity, we could rename apache/parquet-mr to
>> > apache/parquet, move some directories around, and then merge in the
>> > parquet-format and parquet-cpp repos. There are probably some other
>> > approaches that would be fine, too.
>> >
>> > On Mon, Feb 26, 2018 at 9:40 PM, Nong Li <no...@gmail.com> wrote:
>> > > Are you thinking of merging all 3 into one or two at a time?
>> > >
>> > > On Mon, Feb 26, 2018 at 7:09 AM, Wes McKinney <we...@gmail.com> wrote:
>> > >
>> > >> What I would suggest is to create a script that forms the merged
>> > >> repository (so we can verify that "git blame" will still work as
>> > >> intended). If there is consensus about doing this generally, we can
>> > >> then debate the structure of the repo and other details. For example,
>> > >> it could be similar to Apache Thrift's repo structure.
>> > >>
>> > >>
>> > >> On Mon, Feb 26, 2018 at 9:32 AM, Deepak Majeti <majeti.deepak@gmail.com> wrote:
>> > >> > +1. The compatibility benefits will be worth it.
>> > >> >
>> > >> > On Mon, Feb 26, 2018 at 12:48 AM, Nong Li <no...@gmail.com> wrote:
>> > >> >
>> > >> >> I think this is a great idea. Let me know if I can help.
>> > >> >>
>> > >> >> On Sat, Feb 24, 2018 at 4:07 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
>> > >> >>
>> > >> >> > hi folks,
>> > >> >> >
>> > >> >> > in a past sync we discussed the prospect of combining all of the
>> > >> >> > Parquet subprojects into a single code repo. Since there are some
>> > >> >> > other programming languages which may join the fold (Rust,
>> > >> >> > C#.NET), it would be beneficial to combine everything into a
>> > >> >> > single repository to assist with integration and compatibility
>> > >> >> > testing.
>> > >> >> >
>> > >> >> > Subprojects (C++, Java, etc.) could still have their own versioned
>> > >> >> > releases.
>> > >> >> >
>> > >> >> > Is this still of interest to the community? I would be willing to
>> > >> >> > assist with this effort. In theory the repo merge could be
>> > >> >> > performed without loss of git history in the respective projects.
>> > >> >> >
>> > >> >> > - Wes
>> > >> >> >
>> > >> >>
>> > >> >
>> > >> >
>> > >> >
>> > >> > --
>> > >> > regards,
>> > >> > Deepak Majeti
>> > >>
>> >
>>
>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix