Posted to jira@arrow.apache.org by "Yibo Cai (Jira)" <ji...@apache.org> on 2020/09/03 06:44:00 UTC

[jira] [Commented] (ARROW-9905) [C++][Flight] Evaluate FlightRPC performance

    [ https://issues.apache.org/jira/browse/ARROW-9905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17189875#comment-17189875 ] 

Yibo Cai commented on ARROW-9905:
---------------------------------

h1. Evaluate Flight performance on *single* server

Benchmark Flight throughput, latency and scalability on a 64-CPU Skylake server.
h2. Hardware
 - Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
 - 64 CPUs (2 sockets x 16 cores x 2 threads)
 - 128G RAM

h2. Software
 - records-per-batch = 4096 (batch size = 128K)
 - increasing num-threads = 1,2,4,8,16,24
 - num-streams = num-threads
 - arrow source code version: git commit 7b2307f8a
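
The 128K batch size above follows from the record count if each record occupies 32 bytes. That width is an assumption inferred from 128K / 4096; the schema is not stated here. A minimal sketch of the arithmetic, using a hypothetical helper name:

```shell
# Hypothetical helper: bytes per batch = records per batch * bytes per record.
# The 32-byte record width is an assumption inferred from 128K / 4096.
batch_bytes() {
  echo $(( $1 * $2 ))
}

batch_bytes 4096 32   # prints 131072, i.e. 128 KiB
```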

h2. Benchmark steps

The test server has two NUMA nodes, each with 32 CPUs and 64G of memory.
To avoid cross-node memory access latency, the Flight server is bound to node 1 and the Flight client is bound to node 0.
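
Before binding processes, it can help to confirm which CPUs belong to which node. The sketch below reads the Linux sysfs layout. The function name and its optional directory argument are illustrative, not part of any Arrow tooling.

```shell
# List each NUMA node with its CPU list, reading the Linux sysfs layout.
# The directory argument is an illustrative parameter (it defaults to the
# standard sysfs path) so the function can be pointed at a copy for testing.
list_numa_nodes() {
  base=${1:-/sys/devices/system/node}
  for node in "$base"/node[0-9]*; do
    # Skip entries without a readable cpulist (e.g. no NUMA info exposed).
    [ -r "$node/cpulist" ] || continue
    echo "$(basename "$node"): cpus $(cat "$node/cpulist")"
  done
}

list_numa_nodes
```

{{numactl --hardware}} reports similar information, along with per-node memory sizes.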
h3. Start flight server
{code:bash}
HOST=localhost
numactl --membind=1 --cpunodebind=1 ./arrow-flight-perf-server --server-host ${HOST}
{code}
h3. Run flight benchmark

Run the benchmark with an increasing number of threads/streams.
{code:bash}
HOST=localhost

# Number of threads/streams
T="1 2 4 8 16 24"

echo TEST GET
for THREADS in ${T}; do
  echo ==================================================
  echo threads = ${THREADS}
  numactl --membind=0 --cpunodebind=0 ./arrow-flight-benchmark --num-streams ${THREADS} --num-threads ${THREADS} --server-host ${HOST} --records-per-batch 4096 --records-per-stream 345676543
done

echo TEST PUT
for THREADS in ${T}; do
  echo ==================================================
  echo threads = ${THREADS}
  numactl --membind=0 --cpunodebind=0 ./arrow-flight-benchmark --num-streams ${THREADS} --num-threads ${THREADS} --server-host ${HOST} --records-per-batch 4096 --records-per-stream 345676543 --test-put
done
{code}
h2. Benchmark result
As the graph below shows, Flight scales well as the number of threads/streams increases.
!flight-skylake.png|width=912,height=310! 
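
One way to put a number on scalability is parallel efficiency relative to the single-thread run. The helper below is a hypothetical sketch, not part of the benchmark tool. It expects throughput values read from your own benchmark output, both in the same unit.

```shell
# Hypothetical helper: parallel efficiency relative to the single-thread run.
# efficiency = throughput_n / (n * throughput_1); 1.00 means linear scaling.
# Arguments: thread count, throughput at that thread count, and
# single-thread throughput.
scaling_efficiency() {
  awk -v n="$1" -v tn="$2" -v t1="$3" \
    'BEGIN { printf "%.2f\n", tn / (n * t1) }'
}
```

Call it as {{scaling_efficiency <threads> <throughput> <single-thread throughput>}}. Values close to 1.00 indicate near-linear scaling.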

> [C++][Flight] Evaluate FlightRPC performance
> --------------------------------------------
>
>                 Key: ARROW-9905
>                 URL: https://issues.apache.org/jira/browse/ARROW-9905
>             Project: Apache Arrow
>          Issue Type: Task
>          Components: C++, FlightRPC
>            Reporter: Yibo Cai
>            Assignee: Yibo Cai
>            Priority: Major
>         Attachments: flight-skylake.png
>
>
> We ran some benchmark tests of Flight throughput, latency and scalability, and would like to share our test results and steps. Comments are welcome.


