Posted to user@arrow.apache.org by Xiaozhen Liu <ja...@seu.edu.cn> on 2020/07/22 12:59:23 UTC

Does Arrow Flight use memory-mapped files for IPC within the same host?

Hi everyone,

Lately, I’ve been experimenting with Arrow Flight. So far I think it is really great, especially since I’m not planning to build my own IPC framework (as I mentioned earlier, I’m trying to use Arrow to communicate between Java and Python processes). The data transfer speed is very satisfactory, although I haven’t tried it with very large data yet.
However, I’m wondering: when I use Arrow Flight for IPC within the same machine, is there any kind of optimization? By optimization I mean: will Flight internally use something like memory-mapped files to transfer data? Even though Flight is optimized for speed, if it still transfers data over the wire it cannot be faster than shared memory (or a memory-mapped file), right?
I know this may seem strange, since Arrow Flight is an RPC framework and is probably better suited to communication between different hosts. But it provides an RPC protocol that saves me the trouble of building my own IPC framework, which is why I chose Flight for IPC (currently still on the same host).
I know that KNIME Analytics Platform also uses Arrow for IPC, and it uses temporary Arrow files to transfer data. I could do the same within Arrow Flight by simply passing the location of the temp files in the messages. But first I want to check whether this is already implemented by Flight internally.
I’ve looked through the Flight source code and haven’t found anything that looks like what I’m describing. Am I missing something, or is it the case that Flight doesn’t (and doesn’t plan to) use files for IPC within the same host?

Thank you.

Best,
Xiaozhen Liu


Re: Does Arrow Flight use memory-mapped files for IPC within the same host?

Posted by Wes McKinney <we...@gmail.com>.
It would be interesting to create a high-level interface over Flight and
Plasma (or something like Plasma) that chooses IPC / shared memory over
RPC when the client and server are on the same machine. This would
require some development, though.
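
As a rough illustration of that idea (not an existing Arrow API), the selection step might start with a best-effort check of whether a Flight-style location URI points at the local machine. The function names `is_same_host` and `choose_transport` and the transport labels are hypothetical, and the check deliberately avoids DNS lookups, so an unrecognized hostname is treated as remote.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_same_host(location_uri: str) -> bool:
    """Best-effort guess whether a location URI refers to this machine."""
    parsed = urlparse(location_uri)
    if parsed.scheme == "grpc+unix":
        return True  # Unix domain sockets are local by construction.
    host = parsed.hostname
    if host is None:
        return False
    if host in ("localhost", socket.gethostname().lower()):
        return True
    try:
        # Literal IP addresses: accept only loopback ones (127.0.0.0/8, ::1).
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # Some other hostname: assume remote rather than guess.


def choose_transport(location_uri: str) -> str:
    """Return a hypothetical transport label for the given location."""
    return "shared-memory" if is_same_host(location_uri) else "rpc"


print(choose_transport("grpc+unix:///tmp/flight.sock"))
print(choose_transport("grpc://127.0.0.1:8815"))
print(choose_transport("grpc://203.0.113.7:8815"))
```

A real implementation would also need a fallback to RPC when the shared-memory path is unavailable; that part is not shown.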

On Thu, Jul 23, 2020 at 3:23 AM Xiaozhen Liu <ja...@seu.edu.cn> wrote:
>
> Hi Ryan,
>
> Thank you! These are really great suggestions. I’ll definitely try them.
>
> Best,
> Xiaozhen
>

RE: Does Arrow Flight use memory-mapped files for IPC within the same host?

Posted by Xiaozhen Liu <ja...@seu.edu.cn>.
Hi Ryan,

Thank you! These are really great suggestions. I’ll definitely try them.

Best,
Xiaozhen

From: Ryan Murray
Sent: Thursday, July 23, 2020 3:58 PM
To: user@arrow.apache.org
Subject: Re: Does Arrow Flight use memory-mapped files for IPC within the same host?

Hey Xiaozhen,

There are no plans (that I am aware of) to support memory-mapped files as you described.
As I see it you have a few options:
* bind Flight to the loopback interface (i.e. 127.0.0.1). The loopback device typically skips parts of the network stack and the two processes will talk directly to each other
* use a Unix socket. I believe gRPC can bind to a Unix socket rather than a port, which will also be faster than going through the network stack
* Flight is based on gRPC, but it isn't coupled to it. You could theoretically replace gRPC with a protocol based on memory-mapped files
* design your own IPC with memory-mapped files

Hope that helps!

Best,

Ryan



Re: Does Arrow Flight use memory-mapped files for IPC within the same host?

Posted by Ryan Murray <ry...@dremio.com>.
Hey Xiaozhen,

There are no plans (that I am aware of) to support memory-mapped files as
you described.
As I see it you have a few options:
* bind Flight to the loopback interface (i.e. 127.0.0.1). The loopback
device typically skips parts of the network stack and the two processes
will talk directly to each other
* use a Unix socket. I believe gRPC can bind to a Unix socket rather than a
port, which will also be faster than going through the network stack
* Flight is based on gRPC, but it isn't coupled to it. You could
theoretically replace gRPC with a protocol based on memory-mapped files
* design your own IPC with memory-mapped files

Hope that helps!

Best,

Ryan
