Posted to user@arrow.apache.org by James Duong <ja...@bitquilltech.com> on 2022/06/04 01:11:28 UTC

Re: [C++] Arrow Flight Client and concurrency

Going back to this, the problem also happens with Dremio's Java Flight
Client test application when VectorUnloader is used to move the data from
the FlightStream to a local VectorSchemaRoot.
https://github.com/dremio-hub/arrow-flight-client-examples/blob/2811c650ab66651804a7dcbd78a907510c460f11/java/src/main/java/com/adhoc/flight/client/AdhocFlightClient.java#L349

However, the problem does _not_ happen when using Dremio's Python Flight
Client test application.
https://github.com/dremio-hub/arrow-flight-client-examples/blob/2811c650ab66651804a7dcbd78a907510c460f11/python/example.py#L249

So this leads me to think there's something I'm not understanding about how
to iterate over record batches correctly, since the PyArrow Flight driver is
built on top of the C++ driver.

On Fri, May 27, 2022 at 7:40 AM David Li <li...@apache.org> wrote:

> So if the batch length is wrong then I wouldn't expect it to be affected
> by Flight or concurrency, since Flight is just using the IPC machinery and
> these values are immutable. But to double check this I would ValidateFull()
> the batches on both sides to start with just to isolate where the problem
> starts - if the data being written is bad or if it's only bad after being
> read. It does sound suspicious this is happening in a different location
> each time.
>
> On Fri, May 27, 2022, at 10:10, James Duong wrote:
>
> Attached is a screenshot with a partial stacktrace (obscured for
> confidentiality reasons):
> [image: image.png]
> The calling code is calling Slice and trying to get one row at offset
> 3418, but the length of the array is 3417.
>
> The reported batch size is 3468 I believe. This problem is not
> happening consistently at the same offset.
>
> On Fri, May 27, 2022 at 3:36 AM David Li <li...@apache.org> wrote:
>
> A reproduction or a stack trace would help, if you have one. In
> general I'm not aware of any issues around concurrency. There are maybe two
> things to be aware of: gRPC does not play well with forking, if you're
> using that for multiple processes; and the C++ implementation doesn't
> validate that batch schemas match the stream schema - this can cause an
> error on the client if bad data gets in via an application bug.
>
> On Thu, May 26, 2022, at 21:08, James Duong wrote:
>
> I've been trying to use the C++ Arrow Flight Client in an application and
> been hitting a crash when querying a non-trivial dataset (about 360K rows).
> The dataset is a mix of data types (strings, integers, doubles, lists, and
> structs).
>
> The application is a combination of native code and .NET code. The
> application spawns multiple processes and uses multiple threads, though I
> don't think the FlightClient gets shared across threads.
>
> I see crashes when querying this dataset. Occasionally the number of
> elements in one of the returned StringArrays does not match the number of
> rows reported in a FlightStreamChunkIterator. This doesn't appear to happen
> consistently in the same place but always happens when querying this
> dataset.
>
> Are there any concurrency issues to be aware of when working with
> FlightClient in C++?
>
> --
> *James Duong*
> Lead Software Developer
> Bit Quill Technologies Inc.
> Direct: +1.604.562.6082 | jamesd@bitquilltech.com
> https://www.bitquilltech.com
>
>
> This email message is for the sole use of the intended recipient(s) and
> may contain confidential and privileged information.  Any unauthorized
> review, use, disclosure, or distribution is prohibited.  If you are not the
> intended recipient, please contact the sender by reply email and destroy
> all copies of the original message.  Thank you.


Re: [C++] Arrow Flight Client and concurrency

Posted by David Li <li...@apache.org>.
Ah, thanks for reporting back. Glad you figured it out.

On Mon, Jun 6, 2022, at 18:02, James Duong wrote:
> This turned out to be a bug in the C++ client application code. We were caching an array from a previous chunk and, in some scenarios, not updating it when moving to the next chunk.

Re: [C++] Arrow Flight Client and concurrency

Posted by James Duong <ja...@bitquilltech.com>.
This turned out to be a bug in the C++ client application code. We were
caching an array from a previous chunk and, in some scenarios, not updating
it when moving to the next chunk.
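
For anyone hitting a similar symptom, the bug class described above can be
sketched without Arrow at all. This is only an illustration, not the actual
application code: Column, Chunk, RowCursor, MakeChunk and ReadsAllRows are
hypothetical names, and the chunk sizes 3417 and 3468 are borrowed from the
numbers reported earlier in the thread purely as an example. The shared_ptr
mirrors how an old array can stay alive and valid while having the wrong
length for the current chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one string column of a record batch.
using Column = std::vector<std::string>;

// Hypothetical stand-in for one chunk read from a stream.
struct Chunk {
  std::shared_ptr<const Column> column;
  std::size_t num_rows() const { return column->size(); }
};

// Walks rows chunk by chunk while caching the current column.
// With refresh_cache == false the cache is never updated after the first
// chunk, which reproduces the stale-array mistake.
class RowCursor {
 public:
  RowCursor(std::vector<Chunk> chunks, bool refresh_cache)
      : chunks_(std::move(chunks)), refresh_cache_(refresh_cache) {
    assert(!chunks_.empty());
    cached_ = chunks_[0].column;
  }

  // Moves to the chunk at `index`. The fix is to re-fetch the column here.
  void MoveToChunk(std::size_t index) {
    current_ = index;
    if (refresh_cache_) cached_ = chunks_.at(index).column;
  }

  // Row count as reported by the current chunk, not by the cached column.
  std::size_t num_rows() const { return chunks_.at(current_).num_rows(); }

  // Returns the cached value at `row`; throws if the cache is too short.
  const std::string& Get(std::size_t row) const {
    if (row >= cached_->size()) {
      throw std::out_of_range("row is beyond the cached column length");
    }
    return (*cached_)[row];
  }

 private:
  std::vector<Chunk> chunks_;
  bool refresh_cache_;
  std::shared_ptr<const Column> cached_;
  std::size_t current_ = 0;
};

// Builds a chunk with `rows` placeholder values.
Chunk MakeChunk(std::size_t rows) {
  return Chunk{std::make_shared<Column>(rows, std::string("x"))};
}

// Reads every row of two chunks; returns false if a read goes out of range.
bool ReadsAllRows(bool refresh_cache) {
  std::vector<Chunk> chunks = {MakeChunk(3417), MakeChunk(3468)};
  const std::size_t num_chunks = chunks.size();
  RowCursor cursor(std::move(chunks), refresh_cache);
  try {
    for (std::size_t c = 0; c < num_chunks; ++c) {
      cursor.MoveToChunk(c);
      for (std::size_t r = 0; r < cursor.num_rows(); ++r) cursor.Get(r);
    }
  } catch (const std::out_of_range&) {
    return false;
  }
  return true;
}
```

Calling ReadsAllRows(false) fails partway through the second chunk because
the row count comes from the new chunk while the data still comes from the
old one; ReadsAllRows(true) reads everything. In a sketch like this the
mismatch shows up only when a later chunk is longer than the cached one,
which may be why such a bug appears at a different place from run to run.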


Re: [C++] Arrow Flight Client and concurrency

Posted by David Li <li...@apache.org>.
Hmm, interesting. It's unclear what'd be different here. 

In Java, the zero-copy write optimization can cause weird issues like this (as the application must make sure it does not re-use buffers), but as a server/writer-side option that should apply equally to all clients (and this is not enabled by default). 

I'm not sure why C++/Python would/could differ, unless the reader itself is getting shared across threads in C++.
