Posted to jira@arrow.apache.org by "Andrew Lamb (Jira)" <ji...@apache.org> on 2021/04/26 12:51:02 UTC

[jira] [Closed] (ARROW-12265) flight_data_from_arrow_batch sends too much data

     [ https://issues.apache.org/jira/browse/ARROW-12265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Lamb closed ARROW-12265.
-------------------------------
    Resolution: Invalid

> flight_data_from_arrow_batch sends too much data
> ------------------------------------------------
>
>                 Key: ARROW-12265
>                 URL: https://issues.apache.org/jira/browse/ARROW-12265
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: FlightRPC, Rust
>    Affects Versions: 4.0.0
>            Reporter: Marko Mikulicic
>            Priority: Major
>
> Arrow arrays can share the same backing store, even if the array is just a "view" of a slice of another array.
> Yet, when `flight_data_from_arrow_batch` encodes the arrays into a FlightData, it blindly copies the entire buffer ready to be sent over the wire.
> Thus, for example, when DataFusion uses the `arrow::compute::limit` operator to return a few elements of an array, we still end up with the full (potentially large) array being sent over the wire.
>  
> Since encoding the array in a FlightData involves copying the data anyway, perhaps it would be beneficial to take the Array length into consideration and copy only the parts of the buffer that contain actual data.
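
To illustrate the difference described above, here is a minimal, self-contained sketch. It does not use the arrow crate: `ArrayView`, `encode_whole_buffer` and `encode_visible` are hypothetical names that only model an array as an (offset, length) window over a shared backing buffer, and they are not the actual `flight_data_from_arrow_batch` implementation.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow array of i32 values: an
/// (offset, len) window over a reference-counted backing buffer.
#[derive(Clone)]
struct ArrayView {
    buffer: Arc<Vec<i32>>,
    offset: usize,
    len: usize,
}

impl ArrayView {
    fn new(values: Vec<i32>) -> Self {
        let len = values.len();
        ArrayView { buffer: Arc::new(values), offset: 0, len }
    }

    /// Zero-copy slice: shares the backing buffer and only adjusts
    /// the offset and length of the view.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        ArrayView {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len,
        }
    }

    /// Naive encoding: copies the whole backing buffer, ignoring the view.
    fn encode_whole_buffer(&self) -> Vec<u8> {
        self.buffer.iter().flat_map(|v| v.to_le_bytes()).collect()
    }

    /// Length-aware encoding: copies only the elements the view covers.
    fn encode_visible(&self) -> Vec<u8> {
        self.buffer[self.offset..self.offset + self.len]
            .iter()
            .flat_map(|v| v.to_le_bytes())
            .collect()
    }
}

fn main() {
    // A 1000-element array, then a "limit 5" style view over it.
    let full = ArrayView::new((0..1_000).collect());
    let limited = full.slice(0, 5);

    println!("whole buffer: {} bytes", limited.encode_whole_buffer().len());
    println!("visible only: {} bytes", limited.encode_visible().len());
}
```

In this model the 5-element view encodes to 4000 bytes when the whole backing buffer is copied, and to 20 bytes when only the visible range is copied.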



--
This message was sent by Atlassian Jira
(v8.3.4#803005)