You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@arrow.apache.org by Bryan Cutler <cu...@gmail.com> on 2020/02/25 01:13:59 UTC

[DISCUSS] Flight testing inconsistency for empty batches

While looking into Null type testing for ARROW-7899, a couple small issues
came up regarding Flight integration testing with empty batches (row count
== 0) that could be worked out with a quick discussion. It seems there is a
small difference between the C++ and Java Flight servers when there are
empty record batches at the end of a stream, more details in PR
https://github.com/apache/arrow/pull/6476.

The Java server sends all record batches, even the empty ones, and the test
client verifies each of these batches matches the batches read from a JSON
file. The C++ servers seems to recognize if the end of the stream is only
empty batches (please correct me if I'm wrong) and will not serve them.
This seems reasonable, as there is no more actual data left in the stream.
The C++ test client reads all batches into a table, does the same for the
JSON file, and compares final Tables. I also noticed that empty batches in
the middle of the stream will be served.  My questions are:

1) What is the expected behavior of a Flight server for empty record
batches, can they be ignored and not sent to the Client?

2) Is it good enough to test against a final concatenation of all batches
in the stream or should each batch be verified individually to ensure the
server is sending out correctly batched data?

Thanks,
Bryan

Re: [DISCUSS] Flight testing inconsistency for empty batches

Posted by Bryan Cutler <cu...@gmail.com>.
Thanks all, I agree with validating each record batch independently. I made
https://issues.apache.org/jira/browse/ARROW-7966 to ensure this, and that
will hopefully iron out any kinks in the different implementations.

Thanks,
Bryan

On Wed, Feb 26, 2020 at 3:13 PM Wes McKinney <we...@gmail.com> wrote:

> I agree with independent validation.
>
> On Tue, Feb 25, 2020 at 2:55 PM David Li <li...@gmail.com> wrote:
> >
> > Hey Bryan,
> >
> > Thanks for looking into this issue. I would vote that we should
> > validate each batch independently, so we can catch issues related to
> > the structure of the data and not just the content. C++ doesn't do any
> > detection of empty batches per se, but on both ends it reads all the
> > data into a table, which would eliminate any empty batches.
> >
> > It also wouldn't be reasonable to stop sending batches that are empty,
> > because Flight lets you attach metadata to batches, and so an empty
> > batch might still have metadata that the client or server wants.
> >
> > Best,
> > David
> >
> > On 2/24/20, Bryan Cutler <cu...@gmail.com> wrote:
> > > While looking into Null type testing for ARROW-7899, a couple small
> issues
> > > came up regarding Flight integration testing with empty batches (row
> count
> > > == 0) that could be worked out with a quick discussion. It seems there
> is a
> > > small difference between the C++ and Java Flight servers when there are
> > > empty record batches at the end of a stream, more details in PR
> > > https://github.com/apache/arrow/pull/6476.
> > >
> > > The Java server sends all record batches, even the empty ones, and the
> test
> > > client verifies each of these batches matches the batches read from a
> JSON
> > > file. The C++ servers seems to recognize if the end of the stream is
> only
> > > empty batches (please correct me if I'm wrong) and will not serve them.
> > > This seems reasonable, as there is no more actual data left in the
> stream.
> > > The C++ test client reads all batches into a table, does the same for
> the
> > > JSON file, and compares final Tables. I also noticed that empty
> batches in
> > > the middle of the stream will be served.  My questions are:
> > >
> > > 1) What is the expected behavior of a Flight server for empty record
> > > batches, can they be ignored and not sent to the Client?
> > >
> > > 2) Is it good enough to test against a final concatenation of all
> batches
> > > in the stream or should each batch be verified individually to ensure
> the
> > > server is sending out correctly batched data?
> > >
> > > Thanks,
> > > Bryan
> > >
>

Re: [DISCUSS] Flight testing inconsistency for empty batches

Posted by Wes McKinney <we...@gmail.com>.
I agree with independent validation.

On Tue, Feb 25, 2020 at 2:55 PM David Li <li...@gmail.com> wrote:
>
> Hey Bryan,
>
> Thanks for looking into this issue. I would vote that we should
> validate each batch independently, so we can catch issues related to
> the structure of the data and not just the content. C++ doesn't do any
> detection of empty batches per se, but on both ends it reads all the
> data into a table, which would eliminate any empty batches.
>
> It also wouldn't be reasonable to stop sending batches that are empty,
> because Flight lets you attach metadata to batches, and so an empty
> batch might still have metadata that the client or server wants.
>
> Best,
> David
>
> On 2/24/20, Bryan Cutler <cu...@gmail.com> wrote:
> > While looking into Null type testing for ARROW-7899, a couple small issues
> > came up regarding Flight integration testing with empty batches (row count
> > == 0) that could be worked out with a quick discussion. It seems there is a
> > small difference between the C++ and Java Flight servers when there are
> > empty record batches at the end of a stream, more details in PR
> > https://github.com/apache/arrow/pull/6476.
> >
> > The Java server sends all record batches, even the empty ones, and the test
> > client verifies each of these batches matches the batches read from a JSON
> > file. The C++ servers seems to recognize if the end of the stream is only
> > empty batches (please correct me if I'm wrong) and will not serve them.
> > This seems reasonable, as there is no more actual data left in the stream.
> > The C++ test client reads all batches into a table, does the same for the
> > JSON file, and compares final Tables. I also noticed that empty batches in
> > the middle of the stream will be served.  My questions are:
> >
> > 1) What is the expected behavior of a Flight server for empty record
> > batches, can they be ignored and not sent to the Client?
> >
> > 2) Is it good enough to test against a final concatenation of all batches
> > in the stream or should each batch be verified individually to ensure the
> > server is sending out correctly batched data?
> >
> > Thanks,
> > Bryan
> >

Re: [DISCUSS] Flight testing inconsistency for empty batches

Posted by David Li <li...@gmail.com>.
Hey Bryan,

Thanks for looking into this issue. I would vote that we should
validate each batch independently, so we can catch issues related to
the structure of the data and not just the content. C++ doesn't do any
detection of empty batches per se, but on both ends it reads all the
data into a table, which would eliminate any empty batches.

It also wouldn't be reasonable to stop sending batches that are empty,
because Flight lets you attach metadata to batches, and so an empty
batch might still have metadata that the client or server wants.

Best,
David

On 2/24/20, Bryan Cutler <cu...@gmail.com> wrote:
> While looking into Null type testing for ARROW-7899, a couple small issues
> came up regarding Flight integration testing with empty batches (row count
> == 0) that could be worked out with a quick discussion. It seems there is a
> small difference between the C++ and Java Flight servers when there are
> empty record batches at the end of a stream, more details in PR
> https://github.com/apache/arrow/pull/6476.
>
> The Java server sends all record batches, even the empty ones, and the test
> client verifies each of these batches matches the batches read from a JSON
> file. The C++ servers seems to recognize if the end of the stream is only
> empty batches (please correct me if I'm wrong) and will not serve them.
> This seems reasonable, as there is no more actual data left in the stream.
> The C++ test client reads all batches into a table, does the same for the
> JSON file, and compares final Tables. I also noticed that empty batches in
> the middle of the stream will be served.  My questions are:
>
> 1) What is the expected behavior of a Flight server for empty record
> batches, can they be ignored and not sent to the Client?
>
> 2) Is it good enough to test against a final concatenation of all batches
> in the stream or should each batch be verified individually to ensure the
> server is sending out correctly batched data?
>
> Thanks,
> Bryan
>