Posted to dev@parquet.apache.org by Nong Li <no...@gmail.com> on 2017/08/02 17:26:06 UTC

[Proposal] Merge parquet-mr and parquet-format repos

Hi,

I'd like to propose retiring the parquet-format repo and moving its code
into parquet-mr. Having split repos causes unnecessary complexity and
doesn't seem to offer much benefit. For example:
   1. Changes that require both format and implementation changes are
      split across repos, so things go out of sync.
   2. More release version/release process management.
   3. More things to do and understand when getting started.

I don't recall why it was originally split; probably an artifact of how it
was born. If this makes sense, we can consider merging parquet-cpp as well.

The specific proposal is to add a commit to parquet-format indicating that
it has moved and been merged into parquet-mr, and to move the current
parquet-format files into parquet-mr. The next release of parquet-mr would
release both, with the same version.

Thoughts?
Nong

Re: [Proposal] Merge parquet-mr and parquet-format repos

Posted by Wes McKinney <we...@gmail.com>.
To Uwe's point I think we might wait 6-12 months before merging C++
with the main Parquet repo until we've reached functional feature
completeness in our Arrow reader/writer. Until then we will have quite
frequent releases with incremental new functionality and possibly API
changes.

On Thu, Aug 3, 2017 at 12:03 PM, Julien Le Dem <ju...@gmail.com> wrote:
> +1 on merging the repos assuming we find a sane way of doing so that
> somewhat preserves history.
> big +1 on more frequent releases. Reducing friction for releases is a big
> win.
> I'm fine with doing this in 2 steps (mr + format then cpp) or 1 (mr +
> format + cpp).

Re: [Proposal] Merge parquet-mr and parquet-format repos

Posted by Julien Le Dem <ju...@gmail.com>.
+1 on merging the repos assuming we find a sane way of doing so that
somewhat preserves history.
big +1 on more frequent releases. Reducing friction for releases is a big
win.
I'm fine with doing this in 2 steps (mr + format then cpp) or 1 (mr +
format + cpp).


On Thu, Aug 3, 2017 at 5:18 AM, Uwe L. Korn <uw...@xhochy.com> wrote:

> I'm in favour of merging parquet-format and parquet-mr, but at the
> moment I would not merge MR and CPP: development speeds and release
> cycles differ, so it would be more of an inconvenience to have them in
> the same repo.
>
> Uwe

Re: [Proposal] Merge parquet-mr and parquet-format repos

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
I'm in favour of merging parquet-format and parquet-mr, but at the
moment I would not merge MR and CPP: development speeds and release
cycles differ, so it would be more of an inconvenience to have them in
the same repo.

Uwe

On Thu, Aug 3, 2017, at 02:37 AM, Deepak Majeti wrote:
> +1. I like the idea of a common repository as well. This will ease the Java
> and C++ interoperability. Currently, Java treats parquet files written by
> C++ differently.

Re: [Proposal] Merge parquet-mr and parquet-format repos

Posted by Deepak Majeti <ma...@gmail.com>.
+1. I like the idea of a common repository as well. This will ease Java
and C++ interoperability. Currently, Java treats Parquet files written by
C++ differently.

On Wed, Aug 2, 2017 at 7:59 PM, Wes McKinney <we...@gmail.com> wrote:

> +1. In doing so we may want to rename the repository to apache/parquet
> to reflect the expanded scope.
>
> We could also discuss merging in the C++ implementation, though the
> main reservation I would have would be version numbers as we will
> likely be releasing parquet-cpp more frequently than parquet-java has
> been releasing since the implementation continues to evolve. If the
> Java folks are comfortable with more frequent releases (and we would
> want to add a document explaining the respective API stability of each
> component, e.g. C++ will be a bit less stable for a while) then this
> seems OK to me.



-- 
regards,
Deepak Majeti

Re: [Proposal] Merge parquet-mr and parquet-format repos

Posted by Wes McKinney <we...@gmail.com>.
+1. In doing so we may want to rename the repository to apache/parquet
to reflect the expanded scope.

We could also discuss merging in the C++ implementation, though my
main reservation would be version numbers: we will likely release
parquet-cpp more frequently than parquet-java has been released, since
the C++ implementation continues to evolve. If the Java folks are
comfortable with more frequent releases (and we would want to add a
document explaining the respective API stability of each component,
e.g. C++ will be a bit less stable for a while), then this seems OK
to me.
