Posted to issues@arrow.apache.org by "David Lee (Jira)" <ji...@apache.org> on 2022/05/23 14:56:00 UTC

[jira] [Created] (ARROW-16629) Apache Arrow Flight transport speed improvement for list structures

David Lee created ARROW-16629:
---------------------------------

             Summary: Apache Arrow Flight transport speed improvement for list structures
                 Key: ARROW-16629
                 URL: https://issues.apache.org/jira/browse/ARROW-16629
             Project: Apache Arrow
          Issue Type: Improvement
          Components: FlightRPC
    Affects Versions: 8.0.0
            Reporter: David Lee


I just started testing Arrow Flight as a way to send results from a GraphQL server, with FlightServer() running on it.

GraphQL defines a schema for your data output, which can be mapped to an Arrow schema, so I thought it would make sense to try using Arrow Flight to transport results instead of REST-style JSON records.

Arrow Flight was at least 66% faster in all cases, but its advantage didn't scale as the number of child records increased. I suspect that serializing structs or lists needs some improvement.

Here is the discussion I opened, including links to the test scripts:

[https://github.com/mirumee/ariadne/discussions/867]

With 10 records, it was 0.049 seconds faster, or 80% faster.
With 10,000 records, it was 0.109 seconds faster, or 66% faster.
With 10 million records, it was 54 seconds faster, or 66% faster.
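
To make the figures above easier to compare, here is a rough sketch that back-computes the absolute timings they imply. It assumes "X% faster" means the fraction of the REST wall-clock time that was saved; that interpretation is an assumption, not something stated in this issue, and the function name implied_timings is hypothetical.

```python
# Assumption: "X% faster" = (rest_seconds - flight_seconds) / rest_seconds.
# The issue does not define the percentage, so treat the results as estimates.

def implied_timings(seconds_saved, fraction_faster):
    """Return (rest_seconds, flight_seconds) implied by a reported saving."""
    rest_seconds = seconds_saved / fraction_faster
    flight_seconds = rest_seconds - seconds_saved
    return rest_seconds, flight_seconds

# (record count, seconds saved, fraction faster) as reported in the issue
reported = [
    (10, 0.049, 0.80),
    (10_000, 0.109, 0.66),
    (10_000_000, 54.0, 0.66),
]

for records, saved, fraction in reported:
    rest, flight = implied_timings(saved, fraction)
    print(f"{records:>12,} records: REST ~{rest:.3f}s, Flight ~{flight:.3f}s")
```

Under that assumption, the relative saving is flat at 66% between 10,000 and 10 million records, which is the behaviour described above as not scaling.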

Also, here is the data structure that is sent across the wire:

pyarrow.Table
data: struct<test_lists: struct<float_list: list<item: double>, int_list: list<item: int64>, length: int64, string_list: list<item: string>, time_spent: double>>
  child 0, test_lists: struct<float_list: list<item: double>, int_list: list<item: int64>, length: int64, string_list: list<item: string>, time_spent: double>
      child 0, float_list: list<item: double>
          child 0, item: double
      child 1, int_list: list<item: int64>
          child 0, item: int64
      child 2, length: int64
      child 3, string_list: list<item: string>
          child 0, item: string
      child 4, time_spent: double

data: [
-- is_valid: all not null
-- child 0 type: struct<float_list: list<item: double>, int_list: list<item: int64>, length: int64, string_list: list<item: string>, time_spent: double>
-- is_valid: all not null
-- child 0 type: list<item: double>
[[13.500371672273381,17.747395152140353,28.973205439157457,1.361443415643098,19.029191125636135,14.62284718057391,18.44333922481529,7.906278860251386,14.402464768126993,5.826040531772251]]
-- child 1 type: list<item: int64>
[[23,3,21,15,20,4,10,16,23,25]]
-- child 2 type: int64
[10]
-- child 3 type: list<item: string>
[["qypsupwtxy","vrxptpspyt","qpvruwsuqq","ywwpyxrvrt","wswutpxxqv","tsyypstxvv","ytprpqsxsx","wtwsxvprvu","suwtrvqvwp","wtsrwywwty"]]
-- child 4 type: double
[0]]


