Posted to dev@parquet.apache.org by Brian Bowman <Br...@sas.com> on 2019/03/14 18:46:59 UTC

Passing File Descriptors in the Low-Level API

The ReadableFile class (arrow/io/file.cc) has utility methods where a file descriptor is either passed in or returned, but I don’t see how this surfaces through the API.

Is there a way for application code to control the open lifetime of mmap()’d Parquet files by passing an already-open file descriptor to the Parquet low-level API’s open/close methods?

Thanks,

Brian


Re: Passing File Descriptors in the Low-Level API

Posted by Brian Bowman <Br...@sas.com>.
Thanks Wes!

I'm working on integrating and testing the necessary changes in our dev environment. I'll submit a PR once things are working.

Best,

Brian 

On 3/16/19, 4:24 PM, "Wes McKinney" <we...@gmail.com> wrote:

    hi Brian,
    
    Please feel free to submit a PR to add the requisite APIs that you
    need for your application. Antoine or I or others should be able to
    give prompt feedback since we know this code pretty well.
    
    Thanks
    Wes
    
    On Sat, Mar 16, 2019 at 11:40 AM Brian Bowman <Br...@sas.com> wrote:
    >
    > Hi Wes,
    >
    > Thanks for the quick reply! To be clear, the usage I'm working on needs to own both the open file descriptor and the corresponding mapped memory. In other words:
    >
    > The SAS component does both open() and mmap(), which could be for READ or WRITE.
    >
    > -> It calls the low-level Parquet APIs to read an existing file or write a new one. The open() and mmap() flags are guaranteed to be correct.
    >
    > At some later point, the SAS component does munmap() and close().
    >
    > -Brian
    >
    >
    > On 3/14/19, 3:42 PM, "Wes McKinney" <we...@gmail.com> wrote:
    >
    >     hi Brian,
    >
    >     This is mostly an Arrow platform question, so I'm copying the Arrow mailing list.
    >
    >     You can open a file from an existing file descriptor using ReadableFile::Open
    >
    >     https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.h#L145
    >
    >     The documentation for this function says:
    >
    >     "The file descriptor becomes owned by the ReadableFile, and will be
    >     closed on Close() or destruction."
    >
    >     If you want to do the equivalent thing, but using memory mapping, I
    >     think you'll need to add a corresponding API to MemoryMappedFile. This
    >     is more perilous because of the API requirements of mmap -- you need
    >     to pass the right flags and they may need to be the same flags that
    >     were passed when opening the file descriptor, see
    >
    >     https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L378
    >
    >     and
    >
    >     https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L476
    >
    >     - Wes
    >
    >     On Thu, Mar 14, 2019 at 1:47 PM Brian Bowman <Br...@sas.com> wrote:
    >     >
    >     > The ReadableFile class (arrow/io/file.cc) has utility methods where a file descriptor is either passed in or returned, but I don’t see how this surfaces through the API.
    >     >
    >     > Is there a way for application code to control the open lifetime of mmap()’d Parquet files by passing an already-open file descriptor to the Parquet low-level API’s open/close methods?
    >     >
    >     > Thanks,
    >     >
    >     > Brian
    >     >
    >
    >
    >
    


Re: Passing File Descriptors in the Low-Level API

Posted by Wes McKinney <we...@gmail.com>.
hi Brian,

Please feel free to submit a PR to add the requisite APIs that you
need for your application. Antoine or I or others should be able to
give prompt feedback since we know this code pretty well.

Thanks
Wes

On Sat, Mar 16, 2019 at 11:40 AM Brian Bowman <Br...@sas.com> wrote:
>
> Hi Wes,
>
> Thanks for the quick reply! To be clear, the usage I'm working on needs to own both the open file descriptor and the corresponding mapped memory. In other words:
>
> The SAS component does both open() and mmap(), which could be for READ or WRITE.
>
> -> It calls the low-level Parquet APIs to read an existing file or write a new one. The open() and mmap() flags are guaranteed to be correct.
>
> At some later point, the SAS component does munmap() and close().
>
> -Brian
>
>
> On 3/14/19, 3:42 PM, "Wes McKinney" <we...@gmail.com> wrote:
>
>     hi Brian,
>
>     This is mostly an Arrow platform question, so I'm copying the Arrow mailing list.
>
>     You can open a file from an existing file descriptor using ReadableFile::Open
>
>     https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.h#L145
>
>     The documentation for this function says:
>
>     "The file descriptor becomes owned by the ReadableFile, and will be
>     closed on Close() or destruction."
>
>     If you want to do the equivalent thing, but using memory mapping, I
>     think you'll need to add a corresponding API to MemoryMappedFile. This
>     is more perilous because of the API requirements of mmap -- you need
>     to pass the right flags and they may need to be the same flags that
>     were passed when opening the file descriptor, see
>
>     https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L378
>
>     and
>
>     https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L476
>
>     - Wes
>
>     On Thu, Mar 14, 2019 at 1:47 PM Brian Bowman <Br...@sas.com> wrote:
>     >
>     > The ReadableFile class (arrow/io/file.cc) has utility methods where a file descriptor is either passed in or returned, but I don’t see how this surfaces through the API.
>     >
>     > Is there a way for application code to control the open lifetime of mmap()’d Parquet files by passing an already-open file descriptor to the Parquet low-level API’s open/close methods?
>     >
>     > Thanks,
>     >
>     > Brian
>     >
>
>
>

Re: Passing File Descriptors in the Low-Level API

Posted by Brian Bowman <Br...@sas.com>.
Hi Wes,

Thanks for the quick reply! To be clear, the usage I'm working on needs to own both the open file descriptor and the corresponding mapped memory. In other words:

The SAS component does both open() and mmap(), which could be for READ or WRITE.

-> It calls the low-level Parquet APIs to read an existing file or write a new one. The open() and mmap() flags are guaranteed to be correct.

At some later point, the SAS component does munmap() and close().
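A minimal POSIX sketch of the lifecycle described above, assuming a read-only file on Linux or macOS: the application performs open() and mmap() itself, hands out only a non-owning view of the mapped bytes, and later calls munmap() and close() when it chooses. `AppMappedFile` and `ByteView` are hypothetical names for illustration, not Arrow or Parquet APIs. On the Arrow side, one route that might avoid a new MemoryMappedFile API for reads (to be verified against the Arrow version in use) is to wrap the view in a non-owning `arrow::Buffer` and read it through `arrow::io::BufferReader`, neither of which frees the memory it points to.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cstddef>
#include <cstdint>

// Non-owning view of the mapped bytes: this is all a reader receives.
// The reader must never munmap() or close() anything behind it.
struct ByteView {
  const std::uint8_t* data;
  std::size_t size;
};

// Hypothetical application-side owner (not an Arrow or Parquet API).
// It does open() + mmap() up front and munmap() + close() only when the
// application calls Release() or destroys the object.
class AppMappedFile {
 public:
  AppMappedFile() = default;
  ~AppMappedFile() { Release(); }
  AppMappedFile(const AppMappedFile&) = delete;
  AppMappedFile& operator=(const AppMappedFile&) = delete;

  // Opens `path` read-only and maps the whole file with PROT_READ, which
  // matches the O_RDONLY access mode of the descriptor.
  bool OpenForRead(const char* path) {
    Release();  // drop any previous mapping and descriptor first
    fd_ = ::open(path, O_RDONLY);
    if (fd_ < 0) return false;
    struct stat st;
    // mmap() rejects zero-length mappings, so an empty file is an error here.
    if (::fstat(fd_, &st) != 0 || st.st_size <= 0) {
      Release();
      return false;
    }
    size_ = static_cast<std::size_t>(st.st_size);
    void* p = ::mmap(nullptr, size_, PROT_READ, MAP_SHARED, fd_, 0);
    if (p == MAP_FAILED) {
      Release();
      return false;
    }
    addr_ = p;
    return true;
  }

  ByteView view() const {
    return ByteView{static_cast<const std::uint8_t*>(addr_), size_};
  }

  bool is_open() const { return fd_ >= 0; }

  // munmap() first, then close(); safe to call more than once.
  void Release() {
    if (addr_ != nullptr) {
      ::munmap(addr_, size_);
      addr_ = nullptr;
    }
    if (fd_ >= 0) {
      ::close(fd_);
      fd_ = -1;
    }
    size_ = 0;
  }

 private:
  int fd_ = -1;
  void* addr_ = nullptr;
  std::size_t size_ = 0;
};
```

Because a reader only ever sees a `ByteView`, nothing on the Parquet side can end the mapping early; the flip side is that the application must keep the `AppMappedFile` alive until every reader holding the view has finished.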

-Brian


On 3/14/19, 3:42 PM, "Wes McKinney" <we...@gmail.com> wrote:

    hi Brian,
    
    This is mostly an Arrow platform question, so I'm copying the Arrow mailing list.
    
    You can open a file from an existing file descriptor using ReadableFile::Open
    
    https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.h#L145
    
    The documentation for this function says:
    
    "The file descriptor becomes owned by the ReadableFile, and will be
    closed on Close() or destruction."
    
    If you want to do the equivalent thing, but using memory mapping, I
    think you'll need to add a corresponding API to MemoryMappedFile. This
    is more perilous because of the API requirements of mmap -- you need
    to pass the right flags and they may need to be the same flags that
    were passed when opening the file descriptor, see
    
    https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L378
    
    and
    
    https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L476
    
    - Wes
    
    On Thu, Mar 14, 2019 at 1:47 PM Brian Bowman <Br...@sas.com> wrote:
    >
    > The ReadableFile class (arrow/io/file.cc) has utility methods where a file descriptor is either passed in or returned, but I don’t see how this surfaces through the API.
    >
    > Is there a way for application code to control the open lifetime of mmap()’d Parquet files by passing an already-open file descriptor to the Parquet low-level API’s open/close methods?
    >
    > Thanks,
    >
    > Brian
    >
    
    


Re: Passing File Descriptors in the Low-Level API

Posted by Wes McKinney <we...@gmail.com>.
hi Brian,

This is mostly an Arrow platform question, so I'm copying the Arrow mailing list.

You can open a file from an existing file descriptor using ReadableFile::Open

https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.h#L145

The documentation for this function says:

"The file descriptor becomes owned by the ReadableFile, and will be
closed on Close() or destruction."
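For comparison with that ownership rule, here is a minimal POSIX sketch of one common workaround when the application wants to keep its own descriptor: hand the owning object a dup() of it. `OwningFileHandle` is a hypothetical stand-in that mimics the documented close-on-Close()-or-destruction behavior; it is not Arrow code. A duplicated descriptor shares the file offset and status flags with the original, so positional reads such as pread() are the safer choice if both descriptors are used.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical stand-in (not Arrow code) for an object that, like the
// documented ReadableFile::Open(fd), takes ownership of a descriptor and
// closes it on Close() or destruction.
class OwningFileHandle {
 public:
  explicit OwningFileHandle(int fd) : fd_(fd) {}
  ~OwningFileHandle() { Close(); }
  OwningFileHandle(const OwningFileHandle&) = delete;
  OwningFileHandle& operator=(const OwningFileHandle&) = delete;

  void Close() {
    if (fd_ >= 0) {
      ::close(fd_);
      fd_ = -1;
    }
  }
  int fd() const { return fd_; }

 private:
  int fd_;
};

// Returns true if `fd` currently refers to an open file description.
inline bool IsFdOpen(int fd) { return ::fcntl(fd, F_GETFD) != -1; }

// Gives a dup() of app_fd to the owning object, so that when the object
// closes "its" descriptor, app_fd stays open for the application.
// Returns true if app_fd is still open after the owner is destroyed.
inline bool HandOffDuplicate(int app_fd) {
  const int handed_fd = ::dup(app_fd);
  if (handed_fd < 0) return false;
  {
    OwningFileHandle handle(handed_fd);
    // ... reads through handle.fd() would happen here ...
  }  // handle destroyed: handed_fd is closed, app_fd is untouched
  return IsFdOpen(app_fd);
}
```

Closing the duplicate releases only that descriptor number; the underlying open file stays alive for as long as `app_fd` is open.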

If you want to do the equivalent thing, but using memory mapping, I
think you'll need to add a corresponding API to MemoryMappedFile. This
is more perilous because of the API requirements of mmap -- you need
to pass the right flags and they may need to be the same flags that
were passed when opening the file descriptor, see

https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L378

and

https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L476
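A minimal sketch of the flag-matching concern, assuming POSIX: derive the mmap() protection bits from the access mode the descriptor was actually opened with, as reported by fcntl(F_GETFL). `ProtFlagsForFd` and `MapWithMatchingFlags` are hypothetical helpers, not Arrow functions. The rules encoded are POSIX ones: mmap() needs a descriptor open for reading, and a MAP_SHARED mapping with PROT_WRITE needs one opened O_RDWR.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#include <cerrno>
#include <cstddef>

// Hypothetical helper (not Arrow code): returns the mmap() protection bits
// that match how `fd` was opened, or -1 with errno set on failure.
inline int ProtFlagsForFd(int fd) {
  const int fl = ::fcntl(fd, F_GETFL);
  if (fl == -1) return -1;  // errno already set by fcntl(), e.g. EBADF
  switch (fl & O_ACCMODE) {
    case O_RDONLY:
      return PROT_READ;
    case O_RDWR:
      return PROT_READ | PROT_WRITE;
    default:
      // O_WRONLY (or any other mode): mmap() requires read access.
      errno = EACCES;
      return -1;
  }
}

// Hypothetical helper: maps the first `length` bytes of `fd` as a shared
// mapping whose protection matches the descriptor's access mode.
// Returns MAP_FAILED on error; the caller must munmap() a successful result.
inline void* MapWithMatchingFlags(int fd, std::size_t length) {
  const int prot = ProtFlagsForFd(fd);
  if (prot == -1) return MAP_FAILED;
  return ::mmap(nullptr, length, prot, MAP_SHARED, fd, 0);
}
```

A MAP_PRIVATE mapping could add PROT_WRITE even on a read-only descriptor (writes become copy-on-write and never reach the file), which is one reason the open() and mmap() flags have to be chosen together.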

- Wes

On Thu, Mar 14, 2019 at 1:47 PM Brian Bowman <Br...@sas.com> wrote:
>
> The ReadableFile class (arrow/io/file.cc) has utility methods where a file descriptor is either passed in or returned, but I don’t see how this surfaces through the API.
>
> Is there a way for application code to control the open lifetime of mmap()’d Parquet files by passing an already-open file descriptor to the Parquet low-level API’s open/close methods?
>
> Thanks,
>
> Brian
>
