Posted to dev@arrow.apache.org by John Muehlhausen <jg...@jgm.org> on 2019/10/03 21:21:30 UTC

arrow::io::MemoryMappedFile from fd rather than path

I have a situation where multiple processes need to access a memory mapped
file.

However, between the time the first process maps the file and the time a
subsequent process in the group maps the file, the file may have been
removed from the filesystem (i.e., it no longer has a path). Coordinating the
cache pruner (which removes the file) so that it does not break the overall
"atomicity" of the process group would be a real chore.

Therefore I need to communicate and use the file descriptor rather than the
path name when subsequent processes map the file. (Passing it with SCM_RIGHTS
over a Unix domain socket, or going through /proc/.../fd, are a couple of ways
that come to mind. The fd cannot simply be inherited, since the parent process
is often the late joiner.)
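A minimal sketch of the SCM_RIGHTS route, assuming Python 3.9+ on a POSIX system (for `socket.send_fds` / `socket.recv_fds`). Both ends live in one process over `socket.socketpair()` only to keep it self-contained; in practice the late joiner would connect to a named Unix domain socket. `send_fd`, `recv_fd` and `demo` are illustrative names, not Arrow APIs.

```python
import mmap
import os
import socket
import tempfile


def send_fd(sock: socket.socket, fd: int) -> None:
    # SCM_RIGHTS ancillary data: the kernel installs a duplicate of `fd`
    # in the receiving process. One byte of ordinary data is sent alongside.
    socket.send_fds(sock, [b"x"], [fd])


def recv_fd(sock: socket.socket) -> int:
    # Read the one data byte plus at most one descriptor.
    _data, fds, _flags, _addr = socket.recv_fds(sock, 1, 1)
    if not fds:
        raise RuntimeError("no file descriptor received")
    return fds[0]


def demo() -> bytes:
    # Create a file, write to it, then unlink it: it no longer has a path,
    # but the open descriptor keeps its contents alive.
    fd, path = tempfile.mkstemp()
    os.write(fd, b"arrow ipc bytes")
    os.unlink(path)

    # Both ends in one process for the demo (assumption); in practice the
    # late joiner would connect to a named Unix domain socket.
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        send_fd(sender, fd)
        received = recv_fd(receiver)
    finally:
        sender.close()
        receiver.close()
    os.close(fd)  # the received descriptor stays valid on its own

    # Map the file through the received descriptor, not a path.
    try:
        with mmap.mmap(received, 0, access=mmap.ACCESS_READ) as mm:
            return bytes(mm)
    finally:
        os.close(received)


if __name__ == "__main__":
    print(demo())
```

The unlink happens before the descriptor is even sent, which mirrors the cache-pruner race: the receiver never needs the path.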

Would we just make a variant of Open() that takes a fd rather than a path?

Related to this, we would also need a way to discover the fd of a mapped file,
and we need these APIs in Python as well.
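To make the proposal concrete, here is a hypothetical sketch using only the standard-library `mmap` module (not pyarrow) of a read-only mapped file with both a path-based and an fd-based constructor, plus a `fileno()` accessor for discovering the descriptor. `FdMappedFile`, `open_path`, `open_fd` and `read_all` are invented names; duplicating the descriptor on open is one possible ownership choice, not something decided in this thread.

```python
import mmap
import os
import tempfile


class FdMappedFile:
    """Hypothetical read-only mapped file that keeps its file descriptor."""

    def __init__(self, fd: int) -> None:
        # Duplicate the caller's descriptor so this object owns its own copy;
        # the caller may close or forward the original independently.
        self._fd = os.dup(fd)
        try:
            # Length 0 maps the whole file (an empty file raises ValueError).
            self._map = mmap.mmap(self._fd, 0, access=mmap.ACCESS_READ)
        except BaseException:
            os.close(self._fd)
            raise

    @classmethod
    def open_path(cls, path: str) -> "FdMappedFile":
        # The path-based open becomes a thin layer over the fd-based one.
        fd = os.open(path, os.O_RDONLY)
        try:
            return cls(fd)
        finally:
            os.close(fd)

    @classmethod
    def open_fd(cls, fd: int) -> "FdMappedFile":
        return cls(fd)

    def fileno(self) -> int:
        # Lets callers discover the descriptor, e.g. to send it elsewhere.
        return self._fd

    def read_all(self) -> bytes:
        return bytes(self._map)

    def close(self) -> None:
        self._map.close()
        os.close(self._fd)


def demo() -> bytes:
    fd, path = tempfile.mkstemp()
    os.write(fd, b"record batch bytes")
    os.unlink(path)  # the file is now pathless
    first = FdMappedFile.open_fd(fd)
    os.close(fd)  # `first` still holds its own duplicate
    # A second mapping opened from the descriptor that `first` exposes.
    second = FdMappedFile.open_fd(first.fileno())
    try:
        return second.read_all()
    finally:
        second.close()
        first.close()
```

Duplicating keeps the caller's descriptor and the mapping's lifetime independent; the alternative of taking ownership of the caller's fd saves a descriptor but makes the closing rules easier to get wrong.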

Would this API have any analogy on Windows?  Do we have platform-specific
functionality?

Thoughts?

-John

Re: arrow::io::MemoryMappedFile from fd rather than path

Posted by Antoine Pitrou <an...@python.org>.
On 04/10/2019 at 00:31, John Muehlhausen wrote:
> http://lackingrhoticity.blogspot.com/2015/05/passing-fds-handles-between-processes.html
> 
> If I'm reading this correctly, it doesn't affect our Open(fd) API on
> Windows, but only how descriptors are communicated between processes that
> want to make use of it.

Yeah, well, that part will be completely different :-)  But it's not
part of Arrow currently (Plasma has it, but it's POSIX-only precisely for
that reason).

Regards

Antoine.

Re: arrow::io::MemoryMappedFile from fd rather than path

Posted by John Muehlhausen <jg...@jgm.org>.
http://lackingrhoticity.blogspot.com/2015/05/passing-fds-handles-between-processes.html

If I'm reading this correctly, it doesn't affect our Open(fd) API on
Windows, but only how descriptors are communicated between processes that
want to make use of it.

Re: arrow::io::MemoryMappedFile from fd rather than path

Posted by Antoine Pitrou <an...@python.org>.
On 03/10/2019 at 23:21, John Muehlhausen wrote:
> 
> Would we just make a variant of Open() that takes a fd rather than a path?

That sounds like a good idea.  Would you like to open a JIRA and a PR?

> Would this API have any analogy on Windows?  Do we have platform-specific
> functionality?

File descriptors exist on Windows, so it should be fine there as well.

Regards

Antoine.