Posted to dev@arrow.apache.org by "Li, Jiajia" <ji...@intel.com> on 2020/04/24 01:16:42 UTC

Question regarding Arrow Flight Throughput

Hi all,

I have some doubts about Arrow Flight throughput. This article (https://www.dremio.com/understanding-apache-arrow-flight/) says: "High efficiency. Flight is designed to work without any serialization or deserialization of records, and with zero memory copies, achieving over 20 Gbps per core."  Another article (https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/) says: "As far as absolute speed, in our C++ data throughput benchmarks, we are seeing end-to-end TCP throughput in excess of 2-3GB/s on localhost without TLS enabled. This benchmark shows a transfer of ~12 gigabytes of data in about 4 seconds:"

Here 20 Gbps / 8 = 2.5 GB/s. Does that mean that if we run the benchmark on a server with two cores, the throughput will be 5 GB/s? However, I have run arrow-flight-benchmark on a server with 40 cores, and the result is "Speed: 2420.82 MB/s".
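
As a quick unit check, written out with only the figures quoted above (a sketch, nothing more):

# Unit check using only the figures quoted above.
claimed_gbps_per_core = 20            # "over 20 Gbps per core"
print(claimed_gbps_per_core / 8)      # -> 2.5 GB/s per core

measured_mb_per_s = 2420.82           # arrow-flight-benchmark result above
print(measured_mb_per_s * 8 / 1000)   # -> ~19.4 Gbps, close to the quoted per-core figure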

So what should I do to increase the throughput? Please correct me if I am wrong. Thank you in advance!

Thanks,
Jiajia




Re: Question regarding Arrow Flight Throughput

Posted by Wes McKinney <we...@gmail.com>.
gRPC breaks large buffers into smaller pieces that have to be
reassembled after receipt -- this does add some overhead. I would
guess that circumventing gRPC for the transfer of each IPC message
would be the route to throughput beyond the 20-40 Gbps that we're able
to achieve now.


Re: Question regarding Arrow Flight Throughput

Posted by Antoine Pitrou <an...@python.org>.
I'm not sure a new transport for gRPC would change anything.  gRPC
currently uses HTTP (HTTP2 I believe), and there's no reason for HTTP to
be the culprit here.

Regards

Antoine.



Re: Question regarding Arrow Flight Throughput

Posted by Micah Kornfield <em...@gmail.com>.
A couple of questions:
1.  For same-node transport, would doing something with Plasma be a
reasonable approach?
2.  What are the advantages/disadvantages of creating a new transport for
gRPC [1] vs. building an entirely new backend for Flight?

Thanks,
Micah

[1] https://github.com/grpc/grpc/issues/7931


Re: Question regarding Arrow Flight Throughput

Posted by David Li <li...@gmail.com>.
Having alternative backends for Flight has been a goal from the start,
hence why gRPC is wrapped and generally not exposed to the user. I
would be interested in collaborating on an HTTP/1 backend that is
accessible from the browser (or via an alternative transport meeting
the same requirements, e.g. WebSockets).

In terms of tuning gRPC, taking a performance profile would be useful.
I remember there are some TODOs on the C++ side about copies that
sometimes occur due to gRPC that we don't quite understand yet. I
spent quite a bit of time a while ago trying to tune gRPC, but like
Antoine, couldn't find any easy wins.

Best,
David


Re: Question regarding Arrow Flight Throughput

Posted by Antoine Pitrou <an...@python.org>.
Hi Jiajia,

I see.  I think there are two possible avenues to try and improve this:

* better use gRPC in the hope of achieving higher performance.  This
doesn't seem to be easy, though.  I've already tried to change some of
the parameters listed here, but didn't get any benefits:
https://grpc.github.io/grpc/cpp/group__grpc__arg__keys.html

(perhaps there are other, lower-level APIs that we should use? I don't know)

* take the time to design and start implementing another I/O backend for
Flight.  gRPC is just one possible backend, but the Flight remote API is
simple enough that we could envision other backends (for example an HTTP
REST-like API).  If you opt for this, I would strongly suggest starting the
discussion on the mailing-list in order to coordinate with other developers.
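
As a rough illustration of what setting such channel arguments looks like, here is a sketch using grpcio directly in Python -- the particular keys and values are examples only, not what Flight uses internally:

# Illustrative only: a plain gRPC channel with a few of the channel
# arguments from the list above; the keys and values are examples.
import grpc

options = [
    ("grpc.max_send_message_length", -1),       # no limit on outgoing messages
    ("grpc.max_receive_message_length", -1),    # no limit on incoming messages
    ("grpc.http2.write_buffer_size", 1 << 20),  # larger HTTP/2 write buffer
]
channel = grpc.insecure_channel("localhost:5005", options=options)
# Service stubs would then be created from this channel as usual.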

Best regards

Antoine.



RE: Question regarding Arrow Flight Throughput

Posted by "Li, Jiajia" <ji...@intel.com>.
Hi Antoine,

>The question, though, is: do you *need* those higher speeds on localhost?  In which context are you considering Flight?

We want to send large data (held in a cache) to a data analytics application
running on the same machine.
Thanks,
Jiajia


Re: Question regarding Arrow Flight Throughput

Posted by Antoine Pitrou <an...@python.org>.
Hi Jiajia,

It's true one should be able to reach higher speeds.  For example, I can
reach more than 7 GB/s on a simple TCP connection, in pure Python, using
only two threads:
https://gist.github.com/pitrou/6cdf7bf6ce7a35f4073a7820a891f78e
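
For reference, here is a minimal sketch in the same spirit -- not the gist itself; the port, chunk size and total volume below are made up:

# Minimal sketch of a two-thread localhost TCP throughput test.
# Not the gist above; port, chunk size and total volume are made up.
import socket
import threading
import time

PORT = 50051
CHUNK = 1 << 20        # 1 MiB per send
TOTAL = 8 << 30        # 8 GiB overall

def sender():
    with socket.create_connection(("127.0.0.1", PORT)) as sock:
        payload = b"\x00" * CHUNK
        sent = 0
        while sent < TOTAL:
            sock.sendall(payload)
            sent += len(payload)

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", PORT))
listener.listen(1)
threading.Thread(target=sender, daemon=True).start()
conn, _ = listener.accept()

received = 0
start = time.time()
while received < TOTAL:
    data = conn.recv(CHUNK)
    if not data:
        break
    received += len(data)
elapsed = time.time() - start
print(f"{received / elapsed / 2**30:.2f} GB/s")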

The question, though, is: do you *need* those higher speeds on
localhost?  In which context are you considering Flight?

Regards

Antoine.



RE: Question regarding Arrow Flight Throughput

Posted by "Li, Jiajia" <ji...@intel.com>.
Hi Antoine,

I think the 5 GB/s figure here is on localhost. Since localhost does not depend on network speed, and I've checked that the CPU is not the bottleneck when running the benchmark, I think Flight should be able to reach a higher throughput.

Thanks,
Jiajia


Re: Question regarding Arrow Flight Throughput

Posted by Antoine Pitrou <an...@python.org>.
The problem with gRPC is that it was designed with relatively small
requests and payloads in mind.  We're using it for a large data
application which it wasn't optimized for.  Also, its threading model is
inscrutable (yielding those weird benchmark results).

However, 5 GB/s is indeed very good if between different machines.

Regards

Antoine.



RE: Question regarding Arrow Flight Throughput

Posted by "Li, Jiajia" <ji...@intel.com>.
Hi Wes,

Thanks for your reply! 

Thanks,
Jiajia


Re: Question regarding Arrow Flight Throughput

Posted by Wes McKinney <we...@gmail.com>.
On Thu, Apr 23, 2020 at 10:02 PM Wes McKinney <we...@gmail.com> wrote:
>
> hi Jiajia,
>
> See my TODO here
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/flight/flight_benchmark.cc#L182
>
> My guess is that if you want to get faster throughput with multiple
> cores, you need to run more than one server and serve on different
> ports rather than having all threads go to the same server through the
> same port. I don't think we've made any manycore scalability claims,
> though.
>
> I tried to run this myself but I can't get the benchmark executable to
> run on my machine right now -- this seems to be a regression.
>
> https://issues.apache.org/jira/browse/ARROW-8578

This turned out to be a false alarm and went away after a reboot.

On my laptop a single thread is faster than multiple threads making
requests to a sole server, so this supports the hypothesis that
concurrent requests on the same port do not increase throughput.

$ ./release/arrow-flight-benchmark -num_threads 1
Speed: 5131.73 MB/s

$ ./release/arrow-flight-benchmark -num_threads 16
Speed: 4258.58 MB/s

I'd suggest improving the benchmark executable to spawn multiple
servers as the next step in studying multicore throughput. That said,
with the above already at ~40 Gbps, it's unclear how much higher the
throughput can realistically go.



Re: Question regarding Arrow Flight Throughput

Posted by Wes McKinney <we...@gmail.com>.
hi Jiajia,

See my TODO here

https://github.com/apache/arrow/blob/master/cpp/src/arrow/flight/flight_benchmark.cc#L182

My guess is that if you want to get faster throughput with multiple
cores, you need to run more than one server and serve on different
ports rather than having all threads go to the same server through the
same port. I don't think we've made any manycore scalability claims,
though.
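
To make that concrete, here is a minimal sketch of the "one server per port" idea using pyarrow.flight -- this is not the actual arrow-flight-benchmark, and the ports, table contents and thread structure below are only illustrative:

# Minimal sketch only: several Flight servers on distinct ports, one client
# thread per server. Ports, table size and thread counts are illustrative.
import threading
import time

import pyarrow as pa
import pyarrow.flight as flight

table = pa.table({"x": pa.array(range(2_000_000), type=pa.int64())})

class PerfServer(flight.FlightServerBase):
    def do_get(self, context, ticket):
        # Stream the same in-memory table for every request.
        return flight.RecordBatchStream(table)

def fetch(port, sizes):
    client = flight.connect(f"grpc://localhost:{port}")
    reader = client.do_get(flight.Ticket(b"perf"))
    sizes.append(reader.read_all().nbytes)

ports = [5005, 5006, 5007, 5008]  # one server (and port) per core
servers = [PerfServer(f"grpc://0.0.0.0:{port}") for port in ports]
for server in servers:
    threading.Thread(target=server.serve, daemon=True).start()

sizes = []
start = time.time()
clients = [threading.Thread(target=fetch, args=(port, sizes)) for port in ports]
for t in clients:
    t.start()
for t in clients:
    t.join()
elapsed = time.time() - start
print(f"{sum(sizes) / elapsed / 2**20:.1f} MB/s total across {len(ports)} servers")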

I tried to run this myself but I can't get the benchmark executable to
run on my machine right now -- this seems to be a regression.

https://issues.apache.org/jira/browse/ARROW-8578

- Wes
