Posted to jira@arrow.apache.org by "David Li (Jira)" <ji...@apache.org> on 2021/05/03 13:23:00 UTC

[jira] [Commented] (ARROW-10351) [C++][Flight] See if reading/writing to gRPC get/put streams asynchronously helps performance

    [ https://issues.apache.org/jira/browse/ARROW-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338366#comment-17338366 ] 

David Li commented on ARROW-10351:
----------------------------------

[~yibocai] I rebased the benchmark ([https://github.com/lidavidm/arrow/tree/flight-poc]) and ran with real data (the NYC taxi dataset, for the month of 2009/01: [https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2009-01.csv]).

The command in all cases was {{env OMP_NUM_THREADS=4 ./release/arrow-flight-benchmark -test_put -num_perf_runs=4 -num_streams=4 -num_threads=1 -data_file data.feather}}.

Master (no compression):
{noformat}
Testing method: DoPut
Using spawned TCP server
Server running with pid 20909
Server host: localhost
Server port: 31337
Server host: localhost
Server port: 31337
Number of perf runs: 4
Number of concurrent gets/puts: 1
Batch size: 5265782
Batches written: 3456
Bytes written: 18198543232
Nanos: 8007363952
Speed: 2167.44 MB/s
Throughput: 431.603 batches/s
Latency mean: 1309 us
Latency quantile=0.5: 1156 us
Latency quantile=0.95: 2135 us
Latency quantile=0.99: 2783 us
Latency max: 3876 us
{noformat}
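(As a sanity check on how to read this output — this is my own arithmetic, not part of the benchmark: the derived figures follow from the raw counters, with Speed reported in MiB/s, i.e. bytes over 2^20, and Throughput in batches per second.)

```python
# Recompute the derived metrics from the raw counters in the run above
# (master, no compression). "Speed" is MiB/s; "Throughput" is batches/s.
batches_written = 3456
bytes_written = 18198543232
nanos = 8007363952

seconds = nanos / 1e9
speed_mib_s = bytes_written / seconds / (1 << 20)
throughput = batches_written / seconds

print(f"Speed: {speed_mib_s:.2f} MB/s")           # 2167.44, matching the report
print(f"Throughput: {throughput:.3f} batches/s")  # 431.603, matching the report
```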
flight-poc, with compression, without asynchronous compression:
{noformat}
Testing method: DoPut
Using spawned TCP server
Server running with pid 13773
Server host: localhost
Server port: 31337
Server host: localhost
Server port: 31337
Number of perf runs: 4
Number of concurrent gets/puts: 1
Batch size: 5265782
Batches written: 3456
Bytes written: 18198543232
Nanos: 23333072829
Speed: 743.815 MB/s
Throughput: 148.116 batches/s
Latency mean: 5666 us
Latency quantile=0.5: 5544 us
Latency quantile=0.95: 6460 us
Latency quantile=0.99: 6831 us
Latency max: 8569 us
{noformat}
flight-poc, with compression, with async compression:
{noformat}
Testing method: DoPut
Using spawned TCP server
Server running with pid 13689
Server host: localhost
Server port: 31337
Server host: localhost
Server port: 31337
Number of perf runs: 4
Number of concurrent gets/puts: 1
Batch size: 5265782
Batches written: 3456
Bytes written: 18198543232
Nanos: 22178585229
Speed: 782.533 MB/s
Throughput: 155.826 batches/s
Latency mean: 5320 us
Latency quantile=0.5: 5183 us
Latency quantile=0.95: 6227 us
Latency quantile=0.99: 6840 us
Latency max: 9255 us
{noformat}
So it seems that with real data, things get even worse: async compression beats synchronous compression, but neither is in the ballpark of simply not compressing at all. Of course, this is all over localhost, which is likely unfair to compression, so maybe I should try it over EC2 next (~600MiB/s max bandwidth).
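For reference, the "async compression" variant amounts to pipelining: compress batch N+1 on a worker thread while batch N is being written to the stream. A minimal sketch of that pattern — plain Python with zlib standing in for the IPC/compression machinery; none of these names come from the Flight code:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress(batch: bytes) -> bytes:
    # Stand-in for IPC serialization plus buffer compression.
    return zlib.compress(batch)

def write_pipelined(batches, write):
    """Write compressed batches, overlapping compression of batch N+1
    with the write of batch N (the 'actor-type' approach)."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        prev = None  # Future for the most recently submitted batch
        for batch in batches:
            fut = pool.submit(compress, batch)  # kick off compression
            if prev is not None:
                # Writing the previous batch overlaps with the worker
                # compressing the current one.
                write(prev.result())
            prev = fut
        if prev is not None:
            write(prev.result())  # flush the final batch

# Tiny usage example: collect compressed chunks into a list.
out = []
write_pipelined([b"a" * 1000, b"b" * 1000, b"c" * 1000], out.append)
assert [zlib.decompress(c) for c in out] == [b"a" * 1000, b"b" * 1000, b"c" * 1000]
```

With only one producer thread and synchronous gRPC writes, the best case is hiding the cheaper of the two costs (compression vs. write) behind the other, which is consistent with the modest ~5% gain above.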

> [C++][Flight] See if reading/writing to gRPC get/put streams asynchronously helps performance
> ---------------------------------------------------------------------------------------------
>
>                 Key: ARROW-10351
>                 URL: https://issues.apache.org/jira/browse/ARROW-10351
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, FlightRPC
>            Reporter: Wes McKinney
>            Priority: Major
>
> We don't use any asynchronous concepts in the way Flight is implemented now; i.e., IPC deconstruction/reconstruction (which may include compression!) is not performed concurrently with moving FlightData objects through the gRPC machinery, which may yield suboptimal performance.
> It might be better to apply an actor-type approach where a dedicated thread retrieves and prepares the next raw IPC message (within a Future) while the current IPC message is being processed -- that way, reading/writing to/from the gRPC stream is not blocked on the IPC code doing its work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)