Posted to dev@parquet.apache.org by Wes McKinney <we...@gmail.com> on 2018/09/04 16:27:12 UTC

Re: [RESULT] [VOTE] Moving Apache Parquet C++ development process to a monorepo structure with Apache Arrow C++

Dear all,

The repo merge is nearly ready to go, modulo some fixes to CI. There
will be a number of follow-up issues to re-establish the various
(untested) build procedures in parquet-cpp.

https://github.com/apache/arrow/pull/2453

I would like to merge this by EOD Wednesday 9/5, or Thursday at
latest, so we can get the patches from apache/parquet-cpp moved over
and avoid any disruption to development process. If there are any
comments please let me know

- Wes
On Tue, Aug 21, 2018 at 12:23 PM Wes McKinney <we...@gmail.com> wrote:
>
> hi all,
>
> with 3 binding +1 votes, the vote carries. We will discuss with Apache
> Arrow about how to specifically proceed
>
> I have already done the preparatory work to undertake the merge
>
> https://github.com/apache/arrow/pull/2453
>
> thanks
> Wes
>
> On Tue, Aug 21, 2018 at 10:41 AM, Wes McKinney <we...@gmail.com> wrote:
> > Yes, feel free to have a look at
> >
> > https://github.com/apache/arrow/pull/2453
> >
> > I'm not very in favor of having a commingled non-linear history that
> > makes git bisect difficult. We will have to discuss on the Arrow ML
> >
> > Here's an example from Apache Spark where a similar merge took place
> >
> > https://github.com/apache/spark/commit/2fe0a1aaeebbf7f60bd4130847d738c29f1e3d53
> >
> > It would be my preference to have a single squashed commit whose
> > message attributes the developers of the code and provides links back
> > to the original commit history in the commit message
> >
> > - Wes
> >
> >
> > On Tue, Aug 21, 2018 at 9:52 AM, Uwe L. Korn <uw...@xhochy.com> wrote:
> >> I have a very strong preference to keep the git history. I will have a look tomorrow to find the correct git magic to get a linear history. For me a single merge commit would be ok but I'm fine to spend an additional hour on this if you care strongly about linear history.
> >>
> >> Uwe
> >>
> >> On Sun, Aug 19, 2018, at 7:36 PM, Wes McKinney wrote:
> >>> OK. I'm a bit -0 on doing anything that results in Arrow having a
> >>> nonlinear git history (and rebasing is not really an option) but we
> >>> can discuss that more later
> >>>
> >>> On Sun, Aug 19, 2018 at 8:50 AM, Uwe L. Korn <uw...@xhochy.com> wrote:
> >>> > +1 on this but also see my comments in the mail on the discussions.
> >>> >
> >>> > We should also keep the git history of parquet-cpp, that should not be hard with git and there is probably a StackOverflow answer out there that gives you the commands to do the merge.
> >>> >
> >>> > Uwe
> >>> >
> >>> > On Fri, Aug 17, 2018, at 12:57 AM, Wes McKinney wrote:
> >>> >> In case any are interested: I estimate the work involved in the
> >>> >> migration to be about a full day in total, possibly less. As soon
> >>> >> as the migration plan is decided upon I intend to execute ASAP so that
> >>> >> ongoing development efforts are not disrupted.
> >>> >>
> >>> >> Additionally, in flight patches do not all need to be merged. Patches
> >>> >> can be easily edited to apply against the modified repository
> >>> >> structure
> >>> >>
> >>> >> On Wed, Aug 15, 2018 at 6:04 PM, Wes McKinney <we...@gmail.com> wrote:
> >>> >> > hi all,
> >>> >> >
> >>> >> > As discussed on the mailing list [1] I am proposing to undertake a
> >>> >> > restructuring of the development process for parquet-cpp and its
> >>> >> > consumption in the Arrow ecosystem to benefit the developers and users
> >>> >> > of both communities.
> >>> >> >
> >>> >> > The specific actions we would take would be:
> >>> >> >
> >>> >> > 1) Move the source code currently located at src/ in the
> >>> >> > apache/parquet-cpp repository [2] to the cpp/src/ directory located in
> >>> >> > apache/arrow [3]
> >>> >> >
> >>> >> > 2) The parquet code tree would remain separate from the Arrow code
> >>> >> > tree, though the two projects will continue to share code as they do
> >>> >> > now
> >>> >> >
> >>> >> > 3) The build system in apache/parquet-cpp would be effectively
> >>> >> > deprecated and can be mostly discarded, as it is largely redundant and
> >>> >> > duplicated from the build system in apache/arrow
> >>> >> >
> >>> >> > 4) The Parquet and Arrow C++ communities will collaborate to provide
> >>> >> > development workflows to enable contributors working exclusively on
> >>> >> > the Parquet core functionality to be able to work unencumbered with
> >>> >> > unnecessary build or test dependencies from the rest of the Arrow
> >>> >> > codebase. Note that parquet-cpp already builds a significant portion
> >>> >> > of Apache Arrow en route to creating its libraries
> >>> >> >
> >>> >> > 5) The Parquet community can create scripts to "cut" Parquet C++
> >>> >> > releases by packaging up the appropriate components and ensuring that
> >>> >> > they can be built and installed independently as now
> >>> >> >
> >>> >> > 6) The CI processes would be merged -- since we already build the
> >>> >> > Parquet libraries in Arrow's CI workflow, this would amount to
> >>> >> > building the Parquet unit tests and running them.
> >>> >> >
> >>> >> > 7) Patches contributed that do not involve Arrow-related functionality
> >>> >> > could use the PARQUET-XXXX marking, though some ARROW-XXXX patches may
> >>> >> > span both codebases
> >>> >> >
> >>> >> > 8) Parquet C++ committers can be given push rights on apache/arrow
> >>> >> > subject to ongoing good citizenry (e.g. not merging patches that break
> >>> >> > builds). The Arrow PMC may need to vote on the procedure for offering
> >>> >> > pass-through commit rights to anyone who has been invited to be a
> >>> >> > committer for Apache Parquet
> >>> >> >
> >>> >> > 9) The contributors who work on both Arrow and Parquet will work in
> >>> >> > good faith to ensure that the needs of Parquet-only developers (i.e.
> >>> >> > who consume Parquet files in some way unrelated to the Arrow columnar
> >>> >> > standard) are accommodated
> >>> >> >
> >>> >> > There are a number of particular details we will need to discuss
> >>> >> > further (such as the specific logistics of the codebase surgery; e.g.
> >>> >> > how to manage the commit history in apache/parquet-cpp -- do we care
> >>> >> > about git blame?)
> >>> >> >
> >>> >> > This vote is to determine if the Parquet PMC is in favor of working in
> >>> >> > good faith to execute on the above plan. I will inquire with the Arrow
> >>> >> > PMC to see if we need to have a corresponding vote there, and also how
> >>> >> > to handle the management of commit rights.
> >>> >> >
> >>> >> > [ ] +1: In favor of implementing the proposed monorepo plan
> >>> >> > [ ] +0: . . .
> >>> >> > [ ] -1: Not in favor because . . .
> >>> >> >
> >>> >> > Here is my vote: +1.
> >>> >> >
> >>> >> > Thank you,
> >>> >> > Wes
> >>> >> >
> >>> >> > [1]: https://lists.apache.org/thread.html/4bc135b4e933b959602df48bc3d5978ab7a4299d83d4295da9f498ac@%3Cdev.parquet.apache.org%3E
> >>> >> > [2]: https://github.com/apache/parquet-cpp/tree/master/src/parquet
> >>> >> > [3]: https://github.com/apache/arrow/tree/master/cpp/src
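
The remark above that in-flight patches "can be easily edited to apply
against the modified repository structure" can be sketched roughly as
follows. This is only an illustration, not the procedure used for the
actual migration. It assumes nothing beyond the src/ to cpp/src/ move
described in item 1 of the proposal, and the names rewrite_patch_paths,
OLD_PREFIX and NEW_PREFIX are hypothetical.

```python
# Hypothetical helper: rewrite file paths in a unified diff made against
# apache/parquet-cpp so that it targets the apache/arrow layout.
# Assumption (item 1 of the proposal): src/ moves to cpp/src/.
OLD_PREFIX = "src/"
NEW_PREFIX = "cpp/src/"


def _fix(token):
    """Rewrite one 'a/...' or 'b/...' path token if it is under OLD_PREFIX."""
    for side in ("a/", "b/"):
        old = side + OLD_PREFIX
        if token.startswith(old):
            return side + NEW_PREFIX + token[len(old):]
    return token


def rewrite_patch_paths(patch_text):
    """Return patch_text with header paths moved from src/ to cpp/src/.

    Lines starting with 'diff --git ', '--- ' or '+++ ' are treated as
    headers. Paths containing spaces and rename headers are not handled.
    """
    out = []
    for line in patch_text.splitlines(keepends=True):
        if line.startswith("diff --git "):
            body = line.rstrip("\n")
            ending = line[len(body):]
            tokens = body.split(" ")
            line = " ".join(tokens[:2] + [_fix(t) for t in tokens[2:]]) + ending
        elif line.startswith(("--- ", "+++ ")):
            line = line[:4] + _fix(line[4:])
        out.append(line)
    return "".join(out)
```

Because this particular move only adds a leading cpp/ to each path, the
--directory option of git apply may achieve the same result without
rewriting the patch text at all.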

Re: [RESULT] [VOTE] Moving Apache Parquet C++ development process to a monorepo structure with Apache Arrow C++

Posted by Wes McKinney <we...@gmail.com>.
That is fine with me.
On Tue, Oct 16, 2018 at 2:32 AM Uwe L. Korn <uw...@xhochy.com> wrote:
>
> On Tue, Oct 16, 2018, at 1:05 AM, Julien Le Dem wrote:
> > What does archiving the master branch look like? Are we renaming master and
> > leaving a readme pointing to the new repo?
>
> That would be my preferred option. Any objections?

Re: [RESULT] [VOTE] Moving Apache Parquet C++ development process to a monorepo structure with Apache Arrow C++

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
On Tue, Oct 16, 2018, at 1:05 AM, Julien Le Dem wrote:
> What does archiving the master branch look like? Are we renaming master and
> leaving a readme pointing to the new repo?

That would be my preferred option. Any objections?

Re: [RESULT] [VOTE] Moving Apache Parquet C++ development process to a monorepo structure with Apache Arrow C++

Posted by Julien Le Dem <ju...@wework.com.INVALID>.
What does archiving the master branch look like? Are we renaming master and
leaving a readme pointing to the new repo?


On Thu, Sep 20, 2018 at 3:30 PM Wes McKinney <we...@gmail.com> wrote:

> OK. There is still some code (examples, CLI tools) that needs to be
> moved over. Once that's done and all the outstanding PRs are
> moved/closed, I will do that
> On Thu, Sep 20, 2018 at 8:45 AM Uwe L. Korn <uw...@xhochy.com> wrote:
> >
> > Hello Wes,
> >
> > I'm definitely +1 on archiving the master branch. I'm not sure what you
> > mean exactly with this. I would have simply added a final commit that
> > deletes all code and adds a message to the README that the repository has
> > moved into another repo.
> >
> > Cheers
> > Uwe
> >
> > On Thu, Sep 13, 2018, at 10:47 PM, Wes McKinney wrote:
> > > hi folks,
> > >
> > > Could I get some feedback about the follow-up items? There are still
> > > some parts of the codebase that need to be migrated. Additionally, I'm
> > > proposing to archive the master branch so that people with build
> > > toolchains running against parquet-cpp master will be forced to
> > > migrate. The hard part is over; I would like to get things closed out
> > > on apache/parquet-cpp and move development forward.
> > >
> > > Thanks,
> > > Wes
> > > > On Sun, Sep 9, 2018 at 8:45 PM Wes McKinney <we...@gmail.com> wrote:
> > > >
> > > > > Might make sense to archive the master branch so that people's
> > > > > now-outdated build toolchains (where they may be cloning
> > > > > apache/parquet-cpp) will fail fast. We are already starting to get bug
> > > > > reports along these lines.
> > > >
> > > > Thoughts?
> > > > > On Sat, Sep 8, 2018 at 10:43 AM Wes McKinney <we...@gmail.com> wrote:
> > > > >
> > > > > > We should probably also write a blog post on the Apache Arrow website
> > > > > > to increase visibility of this move to the broader community.
> > > > >
> > > > > > On Sat, Sep 8, 2018 at 10:42 AM Wes McKinney <we...@gmail.com> wrote:
> > > > > >
> > > > > > Dear all -- the merge has been completed, thank you! 318 patches
> > > > > > (after the filter-branch grafting procedure) were merged to
> > > > > > apache/arrow
> > > > > >
> > > > > > We have some follow up work to do:
> > > > > >
> > > > > > * Move patches from apache/parquet-cpp to apache/arrow
> > > > > > > * Add CONTRIBUTING.md and note to README that patches are no longer
> > > > > > > accepted at the old location
> > > > > > > * Migrate CLI utilities and other small items that did not survive the
> > > > > > > merge: tools/, benchmarks/, and examples/
> > > > > > * Develop new release procedure for Apache Parquet
> > > > > >
> > > > > > > On this third point, we can also import their git history if desired.
> > > > > > > Incorporating them into the build will be comparatively easy to the
> > > > > > > library integration.
> > > > > >
> > > > > > There are already some JIRA issues open for some of these, but
> > > > > > anything else please create issues so we can keep track.
> > > > > >
> > > > > > I'm already quite excited to get busy with some refactoring and
> > > > > > internals improvements that I had avoided because of the painful
> > > > > > development procedure.
> > > > > >
> > > > > > Thanks,
> > > > > > Wes
>

Re: [RESULT] [VOTE] Moving Apache Parquet C++ development process to a monorepo structure with Apache Arrow C++

Posted by Wes McKinney <we...@gmail.com>.
OK. There is still some code (examples, CLI tools) that needs to be
moved over. Once that's done and all the outstanding PRs are
moved/closed, I will do that
On Thu, Sep 20, 2018 at 8:45 AM Uwe L. Korn <uw...@xhochy.com> wrote:
>
> Hello Wes,
>
> I'm definitely +1 on archiving the master branch. I'm not sure what you mean exactly with this. I would have simply added a final commit that deletes all code and adds a message to the README that the repository has moved into another repo.
>
> Cheers
> Uwe
>
> On Thu, Sep 13, 2018, at 10:47 PM, Wes McKinney wrote:
> > hi folks,
> >
> > Could I get some feedback about the follow-up items? There are still
> > some parts of the codebase that need to be migrated. Additionally, I'm
> > proposing to archive the master branch so that people with build
> > toolchains running against parquet-cpp master will be forced to
> > migrate. The hard part is over; I would like to get things closed out
> > on apache/parquet-cpp and move development forward.
> >
> > Thanks,
> > Wes
> > On Sun, Sep 9, 2018 at 8:45 PM Wes McKinney <we...@gmail.com> wrote:
> > >
> > > Might make sense to archive the master branch so that people's
> > > now-outdated build toolchains (where they may be cloning
> > > apache/parquet-cpp) will fail fast. We are already starting to get bug
> > > reports along these lines.
> > >
> > > Thoughts?
> > > On Sat, Sep 8, 2018 at 10:43 AM Wes McKinney <we...@gmail.com> wrote:
> > > >
> > > > We should probably also write a blog post on the Apache Arrow website
> > > > to increase visibility of this move to the broader community.
> > > >
> > > > On Sat, Sep 8, 2018 at 10:42 AM Wes McKinney <we...@gmail.com> wrote:
> > > > >
> > > > > Dear all -- the merge has been completed, thank you! 318 patches
> > > > > (after the filter-branch grafting procedure) were merged to
> > > > > apache/arrow
> > > > >
> > > > > We have some follow up work to do:
> > > > >
> > > > > * Move patches from apache/parquet-cpp to apache/arrow
> > > > > * Add CONTRIBUTING.md and note to README that patches are no longer
> > > > > accepted at the old location
> > > > > > * Migrate CLI utilities and other small items that did not survive the
> > > > > merge: tools/, benchmarks/, and examples/
> > > > > * Develop new release procedure for Apache Parquet
> > > > >
> > > > > On this third point, we can also import their git history if desired.
> > > > > Incorporating them into the build will be comparatively easy to the
> > > > > library integration.
> > > > >
> > > > > There are already some JIRA issues open for some of these, but
> > > > > anything else please create issues so we can keep track.
> > > > >
> > > > > I'm already quite excited to get busy with some refactoring and
> > > > > internals improvements that I had avoided because of the painful
> > > > > development procedure.
> > > > >
> > > > > Thanks,
> > > > > Wes

Re: [RESULT] [VOTE] Moving Apache Parquet C++ development process to a monorepo structure with Apache Arrow C++

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
Hello Wes,

I'm definitely +1 on archiving the master branch. I'm not sure what you mean exactly with this. I would have simply added a final commit that deletes all code and adds a message to the README that the repository has moved into another repo.

Cheers
Uwe

On Thu, Sep 13, 2018, at 10:47 PM, Wes McKinney wrote:
> hi folks,
> 
> Could I get some feedback about the follow-up items? There are still
> some parts of the codebase that need to be migrated. Additionally, I'm
> proposing to archive the master branch so that people with build
> toolchains running against parquet-cpp master will be forced to
> migrate. The hard part is over; I would like to get things closed out
> on apache/parquet-cpp and move development forward.
> 
> Thanks,
> Wes
> On Sun, Sep 9, 2018 at 8:45 PM Wes McKinney <we...@gmail.com> wrote:
> >
> > Might make sense to archive the master branch so that people's
> > now-outdated build toolchains (where they may be cloning
> > apache/parquet-cpp) will fail fast. We are already starting to get bug
> > reports along these lines.
> >
> > Thoughts?
> > On Sat, Sep 8, 2018 at 10:43 AM Wes McKinney <we...@gmail.com> wrote:
> > >
> > > We should probably also write a blog post on the Apache Arrow website
> > > to increase visibility of this move to the broader community.
> > >
> > > On Sat, Sep 8, 2018 at 10:42 AM Wes McKinney <we...@gmail.com> wrote:
> > > >
> > > > Dear all -- the merge has been completed, thank you! 318 patches
> > > > (after the filter-branch grafting procedure) were merged to
> > > > apache/arrow
> > > >
> > > > We have some follow up work to do:
> > > >
> > > > * Move patches from apache/parquet-cpp to apache/arrow
> > > > * Add CONTRIBUTING.md and note to README that patches are no longer
> > > > accepted at the old location
> > > > * Migrate CLI utilities and other small items that did not survive the
> > > > merge: tools/, benchmarks/, and examples/
> > > > * Develop new release procedure for Apache Parquet
> > > >
> > > > On this third point, we can also import their git history if desired.
> > > > Incorporating them into the build will be comparatively easy to the
> > > > library integration.
> > > >
> > > > There are already some JIRA issues open for some of these, but
> > > > anything else please create issues so we can keep track.
> > > >
> > > > I'm already quite excited to get busy with some refactoring and
> > > > internals improvements that I had avoided because of the painful
> > > > development procedure.
> > > >
> > > > Thanks,
> > > > Wes

Re: [RESULT] [VOTE] Moving Apache Parquet C++ development process to a monorepo structure with Apache Arrow C++

Posted by Wes McKinney <we...@gmail.com>.
hi folks,

Could I get some feedback about the follow-up items? There are still
some parts of the codebase that need to be migrated. Additionally, I'm
proposing to archive the master branch so that people with build
toolchains running against parquet-cpp master will be forced to
migrate. The hard part is over; I would like to get things closed out
on apache/parquet-cpp and move development forward.

Thanks,
Wes
On Sun, Sep 9, 2018 at 8:45 PM Wes McKinney <we...@gmail.com> wrote:
>
> Might make sense to archive the master branch so that people's
> now-outdated build toolchains (where they may be cloning
> apache/parquet-cpp) will fail fast. We are already starting to get bug
> reports along these lines.
>
> Thoughts?
> On Sat, Sep 8, 2018 at 10:43 AM Wes McKinney <we...@gmail.com> wrote:
> >
> > We should probably also write a blog post on the Apache Arrow website
> > to increase visibility of this move to the broader community.
> >
> > On Sat, Sep 8, 2018 at 10:42 AM Wes McKinney <we...@gmail.com> wrote:
> > >
> > > Dear all -- the merge has been completed, thank you! 318 patches
> > > (after the filter-branch grafting procedure) were merged to
> > > apache/arrow
> > >
> > > We have some follow up work to do:
> > >
> > > * Move patches from apache/parquet-cpp to apache/arrow
> > > * Add CONTRIBUTING.md and note to README that patches are no longer
> > > accepted at the old location
> > > * Migrate CLI utilities and other small items that did not survive the
> > > merge: tools/, benchmarks/, and examples/
> > > * Develop new release procedure for Apache Parquet
> > >
> > > On this third point, we can also import their git history if desired.
> > > Incorporating them into the build will be comparatively easy to the
> > > library integration.
> > >
> > > There are already some JIRA issues open for some of these, but
> > > anything else please create issues so we can keep track.
> > >
> > > I'm already quite excited to get busy with some refactoring and
> > > internals improvements that I had avoided because of the painful
> > > development procedure.
> > >
> > > Thanks,
> > > Wes

Re: [RESULT] [VOTE] Moving Apache Parquet C++ development process to a monorepo structure with Apache Arrow C++

Posted by Wes McKinney <we...@gmail.com>.
Might make sense to archive the master branch so that people's
now-outdated build toolchains (where they may be cloning
apache/parquet-cpp) will fail fast. We are already starting to get bug
reports along these lines.

Thoughts?
On Sat, Sep 8, 2018 at 10:43 AM Wes McKinney <we...@gmail.com> wrote:
>
> We should probably also write a blog post on the Apache Arrow website
> to increase visibility of this move to the broader community.
>
> On Sat, Sep 8, 2018 at 10:42 AM Wes McKinney <we...@gmail.com> wrote:
> >
> > Dear all -- the merge has been completed, thank you! 318 patches
> > (after the filter-branch grafting procedure) were merged to
> > apache/arrow
> >
> > We have some follow up work to do:
> >
> > * Move patches from apache/parquet-cpp to apache/arrow
> > * Add CONTRIBUTING.md and note to README that patches are no longer
> > accepted at the old location
> > * Migrate CLI utilities and other small items that did not survive the
> > merge: tools/, benchmarks/, and examples/
> > * Develop new release procedure for Apache Parquet
> >
> > On this third point, we can also import their git history if desired.
> > Incorporating them into the build will be comparatively easy to the
> > library integration.
> >
> > There are already some JIRA issues open for some of these, but
> > anything else please create issues so we can keep track.
> >
> > I'm already quite excited to get busy with some refactoring and
> > internals improvements that I had avoided because of the painful
> > development procedure.
> >
> > Thanks,
> > Wes

Re: [RESULT] [VOTE] Moving Apache Parquet C++ development process to a monorepo structure with Apache Arrow C++

Posted by Wes McKinney <we...@gmail.com>.
We should probably also write a blog post on the Apache Arrow website
to increase visibility of this move to the broader community.

On Sat, Sep 8, 2018 at 10:42 AM Wes McKinney <we...@gmail.com> wrote:
>
> Dear all -- the merge has been completed, thank you! 318 patches
> (after the filter-branch grafting procedure) were merged to
> apache/arrow
>
> We have some follow up work to do:
>
> * Move patches from apache/parquet-cpp to apache/arrow
> * Add CONTRIBUTING.md and note to README that patches are no longer
> accepted at the old location
> * Migrate CLI utilities and other small items that did not survive the
> merge: tools/, benchmarks/, and examples/
> * Develop new release procedure for Apache Parquet
>
> On this third point, we can also import their git history if desired.
> Incorporating them into the build will be comparatively easy to the
> library integration.
>
> There are already some JIRA issues open for some of these, but
> anything else please create issues so we can keep track.
>
> I'm already quite excited to get busy with some refactoring and
> internals improvements that I had avoided because of the painful
> development procedure.
>
> Thanks,
> Wes

Re: [RESULT] [VOTE] Moving Apache Parquet C++ development process to a monorepo structure with Apache Arrow C++

Posted by Wes McKinney <we...@gmail.com>.
Dear all -- the merge has been completed, thank you! 318 patches
(after the filter-branch grafting procedure) were merged to
apache/arrow

We have some follow up work to do:

* Move patches from apache/parquet-cpp to apache/arrow
* Add CONTRIBUTING.md and note to README that patches are no longer
accepted at the old location
* Migrate CLI utilities and other small items that did not survive the
merge: tools/, benchmarks/, and examples/
* Develop new release procedure for Apache Parquet

On this third point, we can also import their git history if desired.
Incorporating them into the build will be easy compared to the
library integration.

There are already some JIRA issues open for some of these, but
anything else please create issues so we can keep track.

I'm already quite excited to get busy with some refactoring and
internals improvements that I had avoided because of the painful
development procedure.

Thanks,
Wes
On Fri, Sep 7, 2018 at 11:18 AM Wes McKinney <we...@gmail.com> wrote:
>
> After a lot of time beating my head against Windows toolchain issues
> (I now know a _lot_ about this topic!) I have a green build at
>
> https://github.com/apache/arrow/pull/2453
>
> I'd like to merge this before much more time passes (i.e. today if
> possible) and work on getting the outstanding patches migrated.
>
> The only code that isn't a straight-copy is
>
> https://github.com/apache/arrow/pull/2453/commits/fe5d435c9c58af42df4a37e7c97e37f33ae1857d
>
> This contains all the modifications to the build system and CI to get
> things fully working.
>
> I will have to rebase (preserving the author and committer for each
> patch) and then merge --ff-only to get this in
>
> - Wes
> On Tue, Sep 4, 2018 at 2:22 PM Wes McKinney <we...@gmail.com> wrote:
> >
> > Great. It is definitely going to require some follow up patches to fix
> > up the various packaging tasks, but at least the Linux Python wheels
> > will still be working to start
> > On Tue, Sep 4, 2018 at 2:04 PM Uwe L. Korn <uw...@xhochy.com> wrote:
> > >
> > > Hello Wes,
> > >
> > > I have not much time this week but I hope to squeeze in some minutes tomorrow afternoon to review the code. As this is a very big merge, I want to be extra careful to not break anything really badly. Hopefully more eyes will help.
> > >
> > > Thank you for all the work in pushing this forward in the last days!
> > >
> > > Uwe
> > >
> > > On Tue, Sep 4, 2018, at 6:27 PM, Wes McKinney wrote:
> > > > Dear all,
> > > >
> > > > The repo merge is nearly ready to go modulo some fixes to CI. There
> > > > will be a number of follow up issues to re-establish the various
> > > > (untested) build procedures in parquet-cpp
> > > >
> > > > https://github.com/apache/arrow/pull/2453
> > > >
> > > > I would like to merge this by EOD Wednesday 9/5, or Thursday at
> > > > latest, so we can get the patches from apache/parquet-cpp moved over
> > > > and avoid any disruption to development process. If there are any
> > > > comments please let me know
> > > >
> > > > - Wes
> > > > On Tue, Aug 21, 2018 at 12:23 PM Wes McKinney <we...@gmail.com> wrote:
> > > > >
> > > > > hi all,
> > > > >
> > > > > with 3 binding +1 votes, the vote carries. We will discuss with Apache
> > > > > Arrow about how to specifically proceed
> > > > >
> > > > > I have already done the preparatory work to undertake the merge
> > > > >
> > > > > https://github.com/apache/arrow/pull/2453
> > > > >
> > > > > thanks
> > > > > Wes
> > > > >
> > > > > On Tue, Aug 21, 2018 at 10:41 AM, Wes McKinney <we...@gmail.com> wrote:
> > > > > > Yes, feel free to have a look at
> > > > > >
> > > > > > https://github.com/apache/arrow/pull/2453
> > > > > >
> > > > > > I'm not very in favor of having a commingled non-linear history that
> > > > > > makes git bisect difficult. We will have to discuss on the Arrow ML
> > > > > >
> > > > > > Here's an example from Apache Spark where a similar merge took place
> > > > > >
> > > > > > https://github.com/apache/spark/commit/2fe0a1aaeebbf7f60bd4130847d738c29f1e3d53
> > > > > >
> > > > > > It would be my preference to have a single squashed commit whose
> > > > > > message attributes the developers of the code and provides links back
> > > > > > to the original commit history in the commit message
> > > > > >
> > > > > > - Wes
> > > > > >
> > > > > >
> > > > > > On Tue, Aug 21, 2018 at 9:52 AM, Uwe L. Korn <uw...@xhochy.com> wrote:
> > > > > >> I have a very strong preference to keep the git history. I will have a look tomorrow to find the correct git magic to get a linear history. For me a single merge commit would be ok but I'm fine to spend an additional hour on this if you care strongly about linear history.
> > > > > >>
> > > > > >> Uwe
> > > > > >>
> > > > > >> On Sun, Aug 19, 2018, at 7:36 PM, Wes McKinney wrote:
> > > > > >>> OK. I'm a bit -0 on doing anything that results in Arrow having a
> > > > > >>> nonlinear git history (and rebasing is not really an option) but we
> > > > > >>> can discuss that more later
> > > > > >>>
> > > > > >>> On Sun, Aug 19, 2018 at 8:50 AM, Uwe L. Korn <uw...@xhochy.com> wrote:
> > > > > >>> > +1 on this but also see my comments in the mail on the discussions.
> > > > > >>> >
> > > > > >>> > We should also keep the git history of parquet-cpp, that should not be hard with git and there is probably a StackOverflow answer out there that gives you the commands to do the merge.
> > > > > >>> >
> > > > > >>> > Uwe
> > > > > >>> >
> > > > > >>> > On Fri, Aug 17, 2018, at 12:57 AM, Wes McKinney wrote:
> > > > > >>> >> In case any are interested: I estimate the work involved in the
> > > > > >>> >> migration to be about a full day in total, possibly less. As soon
> > > > > >>> >> as the migration plan is decided upon I intend to execute ASAP so that
> > > > > >>> >> ongoing development efforts are not disrupted.
> > > > > >>> >>
> > > > > >>> >> Additionally, in flight patches do not all need to be merged. Patches
> > > > > >>> >> can be easily edited to apply against the modified repository
> > > > > >>> >> structure
> > > > > >>> >>
> > > > > >>> >> On Wed, Aug 15, 2018 at 6:04 PM, Wes McKinney <we...@gmail.com> wrote:
> > > > > >>> >> > hi all,
> > > > > >>> >> >
> > > > > >>> >> > As discussed on the mailing list [1] I am proposing to undertake a
> > > > > >>> >> > restructuring of the development process for parquet-cpp and its
> > > > > >>> >> > consumption in the Arrow ecosystem to benefit the developers and users
> > > > > >>> >> > of both communities.
> > > > > >>> >> >
> > > > > >>> >> > The specific actions we would take would be:
> > > > > >>> >> >
> > > > > >>> >> > 1) Move the source code currently located at src/ in the
> > > > > >>> >> > apache/parquet-cpp repository [2] to the cpp/src/ directory located in
> > > > > >>> >> > apache/arrow [3]
> > > > > >>> >> >
> > > > > >>> >> > 2) The parquet code tree would remain separate from the Arrow code
> > > > > >>> >> > tree, though the two projects will continue to share code as they do
> > > > > >>> >> > now
> > > > > >>> >> >
> > > > > >>> >> > 3) The build system in apache/parquet-cpp would be effectively
> > > > > >>> >> > deprecated and can be mostly discarded, as it is largely redundant and
> > > > > >>> >> > duplicated from the build system in apache/arrow
> > > > > >>> >> >
> > > > > >>> >> > 4) The Parquet and Arrow C++ communities will collaborate to provide
> > > > > >>> >> > development workflows to enable contributors working exclusively on
> > > > > >>> >> > the Parquet core functionality to be able to work unencumbered with
> > > > > >>> >> > unnecessary build or test dependencies from the rest of the Arrow
> > > > > >>> >> > codebase. Note that parquet-cpp already builds a significant portion
> > > > > >>> >> > of Apache Arrow en route to creating its libraries
> > > > > >>> >> >
> > > > > >>> >> > 5) The Parquet community can create scripts to "cut" Parquet C++
> > > > > >>> >> > releases by packaging up the appropriate components and ensuring that
> > > > > >>> >> > they can be built and installed independently as now
> > > > > >>> >> >
> > > > > >>> >> > 6) The CI processes would be merged -- since we already build the
> > > > > >>> >> > Parquet libraries in Arrow's CI workflow, this would amount to
> > > > > >>> >> > building the Parquet unit tests and running them.
> > > > > >>> >> >
> > > > > >>> >> > 7) Patches contributed that do not involve Arrow-related functionality
> > > > > >>> >> > could use the PARQUET-XXXX marking, though some ARROW-XXXX patches may
> > > > > >>> >> > span both codebases
> > > > > >>> >> >
> > > > > >>> >> > 8) Parquet C++ committers can be given push rights on apache/arrow
> > > > > >>> >> > subject to ongoing good citizenry (e.g. not merging patches that break
> > > > > >>> >> > builds). The Arrow PMC may need to vote on the procedure for offering
> > > > > >>> >> > pass-through commit rights to anyone who has been invited to be a
> > > > > >>> >> > committer for Apache Parquet
> > > > > >>> >> >
> > > > > >>> >> > 9) The contributors who work on both Arrow and Parquet will work in
> > > > > >>> >> > good faith to ensure that the needs of Parquet-only developers (i.e.
> > > > > >>> >> > who consume Parquet files in some way unrelated to the Arrow columnar
> > > > > >>> >> > standard) are accommodated
> > > > > >>> >> >
> > > > > >>> >> > There are a number of particular details we will need to discuss
> > > > > >>> >> > further (such as the specific logistics of the codebase surgery; e.g.
> > > > > >>> >> > how to manage the commit history in apache/parquet-cpp -- do we care
> > > > > >>> >> > about git blame?)
> > > > > >>> >> >
> > > > > >>> >> > This vote is to determine if the Parquet PMC is in favor of working in
> > > > > >>> >> > good faith to execute on the above plan. I will inquire with the Arrow
> > > > > >>> >> > PMC to see if we need to have a corresponding vote there, and also how
> > > > > >>> >> > to handle the management of commit rights.
> > > > > >>> >> >
> > > > > >>> >> > [ ] +1: In favor of implementing the proposed monorepo plan
> > > > > >>> >> > [ ] +0: . . .
> > > > > >>> >> > [ ] -1: Not in favor because . . .
> > > > > >>> >> >
> > > > > >>> >> > Here is my vote: +1.
> > > > > >>> >> >
> > > > > >>> >> > Thank you,
> > > > > >>> >> > Wes
> > > > > >>> >> >
> > > > > >>> >> > [1]: https://lists.apache.org/thread.html/4bc135b4e933b959602df48bc3d5978ab7a4299d83d4295da9f498ac@%3Cdev.parquet.apache.org%3E
> > > > > >>> >> > [2]: https://github.com/apache/parquet-cpp/tree/master/src/parquet
> > > > > >>> >> > [3]: https://github.com/apache/arrow/tree/master/cpp/src

Re: [RESULT] [VOTE] Moving Apache Parquet C++ development process to a monorepo structure with Apache Arrow C++

Posted by Wes McKinney <we...@gmail.com>.
After a lot of time beating my head against Windows toolchain issues
(I now know a _lot_ about this topic!), I have a green build at

https://github.com/apache/arrow/pull/2453

I'd like to merge this before much more time passes (i.e. today if
possible) and work on getting the outstanding patches migrated.

The only code that isn't a straight copy is in this commit:

https://github.com/apache/arrow/pull/2453/commits/fe5d435c9c58af42df4a37e7c97e37f33ae1857d

This contains all the modifications to the build system and CI to get
things fully working.

I will have to rebase (preserving the author and committer of each
patch) and then merge with --ff-only to get this in.
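
For readers unfamiliar with the mechanics, one way a "replay, then fast-forward" step could look is sketched here. It is an illustration only, not the procedure actually used for the PR. A plain `git rebase` keeps each author but records whoever runs it as the committer, so this sketch swaps in `git cherry-pick` together with the `GIT_COMMITTER_*` environment variables. The branch name `topic`, the helper functions, and the `example.invalid` identities are all hypothetical.

```shell
#!/bin/sh
# Sketch only: replay a branch onto an updated base while keeping each
# commit's author AND committer, then fast-forward the base to the result.
set -eu

replay_preserving_identity() {
  upstream=$1   # branch to replay onto
  topic=$2      # branch whose commits are replayed
  commits=$(git rev-list --reverse --no-merges "$upstream..$topic")
  git checkout --quiet --detach "$upstream"
  for c in $commits; do
    # cherry-pick keeps the original author; the GIT_COMMITTER_* variables
    # carry over the original committer instead of the person replaying.
    GIT_COMMITTER_NAME=$(git log -1 --format=%cn "$c") \
    GIT_COMMITTER_EMAIL=$(git log -1 --format=%ce "$c") \
    GIT_COMMITTER_DATE=$(git log -1 --format=%cD "$c") \
      git cherry-pick "$c" >/dev/null
  done
  git checkout --quiet -B "$topic"   # point the topic branch at the replayed tip
}

# Hypothetical helper: commit a one-line file under a given identity.
commit_as() {
  echo "$1" > "$2"
  git add "$2"
  GIT_AUTHOR_NAME=$1 GIT_AUTHOR_EMAIL="$1@example.invalid" \
  GIT_COMMITTER_NAME=$1 GIT_COMMITTER_EMAIL="$1@example.invalid" \
    git commit --quiet -m "add $2"
}

# Throwaway repository so the sketch can be run end to end.
demo=$(mktemp -d)
cd "$demo"
git init --quiet .
git config user.name merger                   # identity of the person replaying
git config user.email merger@example.invalid
base=$(git symbolic-ref --short HEAD)         # whatever the default branch is called

commit_as base base.txt
git checkout --quiet -b topic
commit_as alice a.txt
commit_as bob b.txt
git checkout --quiet "$base"
commit_as carol c.txt                         # base moves on; topic has diverged

replay_preserving_identity "$base" topic
git checkout --quiet "$base"
git merge --quiet --ff-only topic             # refuses to create a merge commit
git log --format='%an / %cn  %s'
```

Because the merge is restricted to `--ff-only`, the base branch stays linear, which is what keeps `git bisect` straightforward. If the replay worked as intended, the hypothetical `merger` identity should not appear as author or committer on any listed commit.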

- Wes
On Tue, Sep 4, 2018 at 2:22 PM Wes McKinney <we...@gmail.com> wrote:
>
> Great. It is definitely going to require some follow up patches to fix
> up the various packaging tasks, but at least the Linux Python wheels
> will still be working to start

Re: [RESULT] [VOTE] Moving Apache Parquet C++ development process to a monorepo structure with Apache Arrow C++

Posted by Wes McKinney <we...@gmail.com>.
Great. It will definitely require some follow-up patches to fix
up the various packaging tasks, but at least the Linux Python wheels
will still be working to start with.
On Tue, Sep 4, 2018 at 2:04 PM Uwe L. Korn <uw...@xhochy.com> wrote:
>
> Hello Wes,
>
> I have not much time this week but I hope to squeeze in some minutes tomorrow afternoon to review the code. As this is a very big merge, I want to be extra careful to not break anything really badly. Hopefully more eyes will help.
>
> Thank you for all the work in pushing this forward in the last days!
>
> Uwe

Re: [RESULT] [VOTE] Moving Apache Parquet C++ development process to a monorepo structure with Apache Arrow C++

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
Hello Wes,

I don't have much time this week, but I hope to squeeze in a few minutes tomorrow afternoon to review the code. As this is a very big merge, I want to be extra careful not to break anything badly. Hopefully more eyes will help.

Thank you for all the work in pushing this forward over the last few days!

Uwe

On Tue, Sep 4, 2018, at 6:27 PM, Wes McKinney wrote:
> Dear all,
> 
> The repo merge is nearly ready to go modulo some fixes to CI. There
> will be a number of follow up issues to re-establish the various
> (untested) build procedures in parquet-cpp
> 
> https://github.com/apache/arrow/pull/2453
> 
> I would like to merge this by EOD Wednesday 9/5, or Thursday at
> latest, so we can get the patches from apache/parquet-cpp moved over
> and avoid any disruption to development process. If there are any
> comments please let me know
> 
> - Wes