Posted to dev@arrow.apache.org by Neal Richardson <ne...@gmail.com> on 2019/10/09 18:57:00 UTC

Looking ahead to 1.0

Congratulations everyone on 0.15! I know a lot of hard work went into
it, not only in the software itself but also in the build and release
process.

Once you've caught your breath from the release, we should start
thinking about what's in scope for our next release, the big 1.0. To
get us started (or restarted, since we did discuss 1.0 before the
flatbuffer alignment issue came up), I've created
https://cwiki.apache.org/confluence/display/ARROW/Arrow+1.0.0+Release
based on our past release wiki pages.

A good place to begin would be to list, either in "blocker" Jiras or
bullet points on the document, the key features and tasks we must
resolve before 1.0. For example, I get the sense that we need to
overhaul the documentation, but that should be expressed in a more
concrete, actionable way.

Neal

Re: Looking ahead to 1.0

Posted by John Muehlhausen <jg...@jgm.org>.
ARROW-6837 (which, er, includes ARROW-6836) and ARROW-5916 have PRs.

Would appreciate some feedback.  I will finish the Python part of 6837 when
I know I'm on the right track.

Thanks,
John

On Thu, Oct 10, 2019 at 9:54 AM John Muehlhausen <jg...@jgm.org> wrote:

> The format change is ARROW-6836 ... add a custom_metadata:[KeyValue] field
> to the Footer table in File.fbs
>
> The other change (slicing a recordbatch to honor RecordBatch.length rather
> than array length if the former is smaller) will hopefully not affect the
> format.

Re: Looking ahead to 1.0

Posted by John Muehlhausen <jg...@jgm.org>.
The format change is ARROW-6836 ... add a custom_metadata:[KeyValue] field
to the Footer table in File.fbs
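
For context, an Arrow IPC file ends with the Footer, then a little-endian int32 giving the Footer's size, then the magic string "ARROW1". A reader finds the Footer by seeking backwards from the end of the file, so a field placed there, rather than in the Schema at the front, can be replaced without touching the start of the file. Below is a minimal pure-Python sketch of that lookup. It assumes the Footer is encoded as JSON with a hypothetical `custom_metadata` list of key/value pairs, standing in for the real FlatBuffers `Footer` table. `make_file` and `read_footer` are illustrative names, not Arrow APIs.

```python
import json
import struct

MAGIC = b"ARROW1"


def make_file(body: bytes, footer: dict) -> bytes:
    """Build a toy file: magic, padding, body, footer, footer size, magic.

    The footer is JSON purely for illustration; real Arrow files store a
    FlatBuffers Footer table here.
    """
    footer_bytes = json.dumps(footer).encode("utf-8")
    return (
        MAGIC
        + b"\x00\x00"  # pad the 6-byte magic to an 8-byte boundary
        + body
        + footer_bytes
        + struct.pack("<i", len(footer_bytes))  # little-endian int32 size
        + MAGIC
    )


def read_footer(data: bytes) -> dict:
    """Locate and decode the footer by reading backwards from the end."""
    if data[:6] != MAGIC or data[-6:] != MAGIC:
        raise ValueError("not an Arrow-style file")
    (size,) = struct.unpack("<i", data[-10:-6])  # 4 bytes before trailing magic
    start = len(data) - 10 - size
    return json.loads(data[start : len(data) - 10].decode("utf-8"))


toy = make_file(b"<record batches>", {"custom_metadata": [{"key": "rows", "value": "3"}]})
print(read_footer(toy)["custom_metadata"])  # [{'key': 'rows', 'value': '3'}]
```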

The other change (slicing a recordbatch to honor RecordBatch.length rather
than array length if the former is smaller) will hopefully not affect the
format.
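
The reader-side rule for ARROW-5916 can be sketched as follows: when a batch's declared length is smaller than its arrays' length, readers expose only the first `length` rows, so a writer can pre-allocate slots and fill them later. This is a hypothetical toy model (`ToyRecordBatch` is not an Arrow class). It uses Python list slicing, which copies, whereas a real implementation would slice zero-copy.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyRecordBatch:
    """Hypothetical stand-in for an Arrow record batch.

    `length` is the number of rows the writer has committed; each column may
    be longer because space was pre-allocated for rows not yet written.
    """

    length: int
    columns: Dict[str, List[int]]

    def visible(self) -> Dict[str, List[int]]:
        # Honor the batch length when it is smaller than a column's length,
        # so readers never see the unpopulated tail.
        return {
            name: values[: min(self.length, len(values))]
            for name, values in self.columns.items()
        }


# Writer pre-allocates 5 slots per column but has only filled 3 rows so far.
batch = ToyRecordBatch(length=3, columns={"x": [10, 20, 30, 0, 0], "y": [1, 2, 3, 0, 0]})
print(batch.visible())  # {'x': [10, 20, 30], 'y': [1, 2, 3]}

# Later the writer fills row 4 in place and then bumps the length.
batch.columns["x"][3] = 40
batch.columns["y"][3] = 4
batch.length = 4
print(batch.visible()["x"])  # [10, 20, 30, 40]
```

Bumping `length` only after the row is written is what keeps a concurrent reader from observing a half-filled row in this sketch.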


On Wed, Oct 9, 2019 at 11:55 PM Wes McKinney <we...@gmail.com> wrote:

> Hi John,
>
> Since the 1.0.0 release is focused on Format stability, probably the
> only real "blockers" will be ensuring that we have hardened multiple
> implementations (in particular C++ and Java) of the columnar format as
> specified with integration tests to prove it. The issues you listed
> sound more like C++ library changes to me?
>
> If you want to propose Format-related changes, that would need to
> happen right away; otherwise the ship will sail on that.
>
> - Wes

Re: Looking ahead to 1.0

Posted by Wes McKinney <we...@gmail.com>.
Hi John,

Since the 1.0.0 release is focused on Format stability, probably the
only real "blockers" will be ensuring that we have hardened multiple
implementations (in particular C++ and Java) of the columnar format as
specified with integration tests to prove it. The issues you listed
sound more like C++ library changes to me?

If you want to propose Format-related changes, that would need to
happen right away; otherwise the ship will sail on that.

- Wes

On Wed, Oct 9, 2019 at 9:08 PM John Muehlhausen <jg...@jgm.org> wrote:
>
> ARROW-5916
> ARROW-6836/6837
>
> These are of particular interest to me because they enable recordbatch
> "incrementalism", which is useful for streaming applications:
>
> ARROW-5916 allows a recordbatch to pre-allocate space for future records
> that have not yet been populated, making it safe for readers to consume the
> partial batch.
>
> ARROW-6836/6837 allows a file of record batches to be extended at the end,
> without rewriting the beginning, while allowing the custom_metadata to
> change with each update.  (custom_metadata in the
> Schema is not a good candidate because Schema also appears at the beginning
> of the file.)
>
> While these are not blockers for me quite yet, they soon will be!  If I
> wanted to ensure that these are in 1.0, what is my deadline for
> implementation and test cases?  Can such a note be made on the wiki?
> Should I change the priority in Jira?
>
> Thanks,
> John

Re: Looking ahead to 1.0

Posted by John Muehlhausen <jg...@jgm.org>.
ARROW-5916
ARROW-6836/6837

These are of particular interest to me because they enable recordbatch
"incrementalism", which is useful for streaming applications:

ARROW-5916 allows a recordbatch to pre-allocate space for future records
that have not yet been populated, making it safe for readers to consume the
partial batch.

ARROW-6836/6837 allows a file of record batches to be extended at the end,
without rewriting the beginning, while allowing the custom_metadata to
change with each update.  (custom_metadata in the
Schema is not a good candidate because Schema also appears at the beginning
of the file.)
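
The append mechanism behind ARROW-6836/6837 can be sketched like this. Readers locate the Footer from the end of the file, so a writer can truncate the old Footer and trailer, append new record batches, and write a fresh Footer whose custom_metadata reflects the update, leaving every byte before the old Footer untouched. This is a toy pure-Python model, not Arrow's implementation: batches are opaque byte strings, the Footer is JSON rather than a FlatBuffers table, it records only batch offsets (the real Footer stores Block entries with offsets and lengths), and `write_file` and `append_batches` are hypothetical helpers.

```python
import json
import struct

MAGIC = b"ARROW1"
TRAILER = 10  # int32 footer size + 6-byte trailing magic


def _write_footer(out: bytearray, offsets, metadata) -> None:
    # Toy footer: batch offsets plus custom_metadata, encoded as JSON.
    footer = json.dumps({"batches": offsets, "custom_metadata": metadata}).encode()
    out += footer + struct.pack("<i", len(footer)) + MAGIC


def _read_footer(data: bytearray):
    # Return (footer start offset, decoded footer).
    (size,) = struct.unpack("<i", data[-10:-6])
    start = len(data) - TRAILER - size
    return start, json.loads(bytes(data[start : len(data) - TRAILER]))


def write_file(batches, metadata) -> bytearray:
    out = bytearray(MAGIC + b"\x00\x00")  # magic padded to 8 bytes
    offsets = []
    for b in batches:
        offsets.append(len(out))
        out += b
    _write_footer(out, offsets, metadata)
    return out


def append_batches(data: bytearray, new_batches, metadata) -> bytearray:
    """Extend the file in place: drop the old footer, add batches, new footer."""
    start, footer = _read_footer(data)
    del data[start:]  # remove old footer, size and trailing magic only
    offsets = footer["batches"]
    for b in new_batches:
        offsets.append(len(data))
        data += b
    _write_footer(data, offsets, metadata)
    return data


f = write_file([b"batch0", b"batch1"], [{"key": "version", "value": "1"}])
prefix = bytes(f[:20])  # header plus the two original batches
append_batches(f, [b"batch2"], [{"key": "version", "value": "2"}])
_, footer = _read_footer(f)
print(footer["batches"])          # [8, 14, 20]
print(footer["custom_metadata"])  # [{'key': 'version', 'value': '2'}]
print(bytes(f[:20]) == prefix)    # True
```

The sketch also shows why the Schema is a poor home for per-update metadata: it sits in the unchanged prefix, and rewriting it would mean rewriting the whole file.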

While these are not blockers for me quite yet, they soon will be!  If I
wanted to ensure that these are in 1.0, what is my deadline for
implementation and test cases?  Can such a note be made on the wiki?
Should I change the priority in Jira?

Thanks,
John

On Wed, Oct 9, 2019 at 2:57 PM Neal Richardson <ne...@gmail.com>
wrote:

> Congratulations everyone on 0.15! I know a lot of hard work went into
> it, not only in the software itself but also in the build and release
> process.
>
> Once you've caught your breath from the release, we should start
> thinking about what's in scope for our next release, the big 1.0. To
> get us started (or restarted, since we did discuss 1.0 before the
> flatbuffer alignment issue came up), I've created
> https://cwiki.apache.org/confluence/display/ARROW/Arrow+1.0.0+Release
> based on our past release wiki pages.
>
> A good place to begin would be to list, either in "blocker" Jiras or
> bullet points on the document, the key features and tasks we must
> resolve before 1.0. For example, I get the sense that we need to
> overhaul the documentation, but that should be expressed in a more
> concrete, actionable way.
>
> Neal
>