Posted to dev@parquet.apache.org by Brian Bowman <Br...@sas.com> on 2019/03/14 18:46:59 UTC
Passing File Descriptors in the Low-Level API
The ReadableFile class (arrow/io/file.cc) has utility methods where a FileDescriptor is either passed in or returned, but I don’t see how this surfaces through the API.
Is there a way for application code to control the open lifetime of mmap()’d Parquet files by passing an already open FileDescriptor to Parquet low-level API open/close methods?
Thanks,
Brian
Re: Passing File Descriptors in the Low-Level API
Posted by Brian Bowman <Br...@sas.com>.
Thanks Wes!
I'm working on integrating and testing the necessary changes in our dev environment. I'll submit a PR once things are working.
Best,
Brian
On 3/16/19, 4:24 PM, "Wes McKinney" <we...@gmail.com> wrote:
hi Brian,
Please feel free to submit a PR to add the requisite APIs that you
need for your application. Antoine or I or others should be able to
give prompt feedback since we know this code pretty well.
Thanks
Wes
On Sat, Mar 16, 2019 at 11:40 AM Brian Bowman <Br...@sas.com> wrote:
>
> Hi Wes,
>
> Thanks for the quick reply! To be clear, the usage I'm working on needs to own both the Open FileDescriptor and corresponding mapped memory. In other words ...
>
> SAS component does both open() and mmap() which could be for READ or WRITE.
>
> -> Calls low-level Parquet APIs to read an existing file or write a new one. The open() and mmap() flags are guaranteed to be correct.
>
> At some later point SAS component does a munmap() and close().
>
> -Brian
>
>
> On 3/14/19, 3:42 PM, "Wes McKinney" <we...@gmail.com> wrote:
>
> hi Brian,
>
> This is mostly an Arrow platform question so I'm copying the Arrow mailing list.
>
> You can open a file using an existing file descriptor using ReadableFile::Open
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.h#L145
>
> The documentation for this function says:
>
> "The file descriptor becomes owned by the ReadableFile, and will be
> closed on Close() or destruction."
>
> If you want to do the equivalent thing, but using memory mapping, I
> think you'll need to add a corresponding API to MemoryMappedFile. This
> is more perilous because of the API requirements of mmap -- you need
> to pass the right flags and they may need to be the same flags that
> were passed when opening the file descriptor, see
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L378
>
> and
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L476
>
> - Wes
>
> On Thu, Mar 14, 2019 at 1:47 PM Brian Bowman <Br...@sas.com> wrote:
> >
> > The ReadableFile class (arrow/io/file.cc) has utility methods where a FileDescriptor is either passed in or returned, but I don’t see how this surfaces through the API.
> >
> > Is there a way for application code to control the open lifetime of mmap()’d Parquet files by passing an already open FileDescriptor to Parquet low-level API open/close methods?
> >
> > Thanks,
> >
> > Brian
> >
>
>
>
Re: Passing File Descriptors in the Low-Level API
Posted by Wes McKinney <we...@gmail.com>.
hi Brian,
Please feel free to submit a PR to add the requisite APIs that you
need for your application. Antoine or I or others should be able to
give prompt feedback since we know this code pretty well.
Thanks
Wes
On Sat, Mar 16, 2019 at 11:40 AM Brian Bowman <Br...@sas.com> wrote:
>
> Hi Wes,
>
> Thanks for the quick reply! To be clear, the usage I'm working on needs to own both the Open FileDescriptor and corresponding mapped memory. In other words ...
>
> SAS component does both open() and mmap() which could be for READ or WRITE.
>
> -> Calls low-level Parquet APIs to read an existing file or write a new one. The open() and mmap() flags are guaranteed to be correct.
>
> At some later point SAS component does a munmap() and close().
>
> -Brian
>
>
> On 3/14/19, 3:42 PM, "Wes McKinney" <we...@gmail.com> wrote:
>
> hi Brian,
>
> This is mostly an Arrow platform question so I'm copying the Arrow mailing list.
>
> You can open a file using an existing file descriptor using ReadableFile::Open
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.h#L145
>
> The documentation for this function says:
>
> "The file descriptor becomes owned by the ReadableFile, and will be
> closed on Close() or destruction."
>
> If you want to do the equivalent thing, but using memory mapping, I
> think you'll need to add a corresponding API to MemoryMappedFile. This
> is more perilous because of the API requirements of mmap -- you need
> to pass the right flags and they may need to be the same flags that
> were passed when opening the file descriptor, see
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L378
>
> and
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L476
>
> - Wes
>
> On Thu, Mar 14, 2019 at 1:47 PM Brian Bowman <Br...@sas.com> wrote:
> >
> > The ReadableFile class (arrow/io/file.cc) has utility methods where a FileDescriptor is either passed in or returned, but I don’t see how this surfaces through the API.
> >
> > Is there a way for application code to control the open lifetime of mmap()’d Parquet files by passing an already open FileDescriptor to Parquet low-level API open/close methods?
> >
> > Thanks,
> >
> > Brian
> >
>
>
>
Re: Passing File Descriptors in the Low-Level API
Posted by Brian Bowman <Br...@sas.com>.
Hi Wes,
Thanks for the quick reply! To be clear, the usage I'm working on needs to own both the Open FileDescriptor and corresponding mapped memory. In other words ...
SAS component does both open() and mmap() which could be for READ or WRITE.
-> Calls low-level Parquet APIs to read an existing file or write a new one. The open() and mmap() flags are guaranteed to be correct.
At some later point SAS component does a munmap() and close().
-Brian
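The lifecycle described in this message (the application calls open() and mmap(), uses the mapping, and later calls munmap() and close()) can be sketched with plain POSIX calls. This is a minimal sketch under stated assumptions, not Arrow or Parquet code: OwnedMapping, MapForRead, and Unmap are hypothetical names, and only the read-only case is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper type: the caller owns both the descriptor and the mapping.
struct OwnedMapping {
  int fd = -1;
  void* addr = MAP_FAILED;
  std::size_t size = 0;
};

// open() + fstat() + mmap() for reading. Returns false on any failure and
// leaves no descriptor or mapping behind in that case.
bool MapForRead(const std::string& path, OwnedMapping* out) {
  assert(out != nullptr);
  int fd = open(path.c_str(), O_RDONLY);
  if (fd < 0) return false;

  struct stat st;
  if (fstat(fd, &st) != 0 || st.st_size <= 0) {  // mmap rejects a zero length
    close(fd);
    return false;
  }

  std::size_t size = static_cast<std::size_t>(st.st_size);
  // PROT_READ is compatible with a descriptor opened O_RDONLY.
  void* addr = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
  if (addr == MAP_FAILED) {
    close(fd);
    return false;
  }

  out->fd = fd;
  out->addr = addr;
  out->size = size;
  return true;
}

// munmap() then close(), at a time the caller chooses. Resets `m` afterwards.
bool Unmap(OwnedMapping* m) {
  assert(m != nullptr);
  bool ok = true;
  if (m->addr != MAP_FAILED) ok = (munmap(m->addr, m->size) == 0) && ok;
  if (m->fd >= 0) ok = (close(m->fd) == 0) && ok;
  *m = OwnedMapping{};
  return ok;
}
```

In this arrangement a reader would only be handed the addr/size pair, so the application alone decides when the mapping and the descriptor go away.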
On 3/14/19, 3:42 PM, "Wes McKinney" <we...@gmail.com> wrote:
hi Brian,
This is mostly an Arrow platform question so I'm copying the Arrow mailing list.
You can open a file using an existing file descriptor using ReadableFile::Open
https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.h#L145
The documentation for this function says:
"The file descriptor becomes owned by the ReadableFile, and will be
closed on Close() or destruction."
If you want to do the equivalent thing, but using memory mapping, I
think you'll need to add a corresponding API to MemoryMappedFile. This
is more perilous because of the API requirements of mmap -- you need
to pass the right flags and they may need to be the same flags that
were passed when opening the file descriptor, see
https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L378
and
https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L476
- Wes
On Thu, Mar 14, 2019 at 1:47 PM Brian Bowman <Br...@sas.com> wrote:
>
> The ReadableFile class (arrow/io/file.cc) has utility methods where a FileDescriptor is either passed in or returned, but I don’t see how this surfaces through the API.
>
> Is there a way for application code to control the open lifetime of mmap()’d Parquet files by passing an already open FileDescriptor to Parquet low-level API open/close methods?
>
> Thanks,
>
> Brian
>
Re: Passing File Descriptors in the Low-Level API
Posted by Wes McKinney <we...@gmail.com>.
hi Brian,
This is mostly an Arrow platform question so I'm copying the Arrow mailing list.
You can open a file from an existing file descriptor with ReadableFile::Open:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.h#L145
The documentation for this function says:
"The file descriptor becomes owned by the ReadableFile, and will be
closed on Close() or destruction."
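Because the descriptor is closed by its new owner, an application that wants to keep its own descriptor open could hand over a duplicate obtained with dup() instead. The sketch below illustrates that idea with a hypothetical stand-in class, OwningReader, which is not Arrow's ReadableFile; IsOpen and AppFdSurvivesOwner are likewise illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical stand-in for a reader that takes ownership of a descriptor
// and closes it on destruction. This is NOT Arrow's ReadableFile.
class OwningReader {
 public:
  explicit OwningReader(int fd) : fd_(fd) { assert(fd >= 0); }
  ~OwningReader() {
    if (fd_ >= 0) close(fd_);
  }
  OwningReader(const OwningReader&) = delete;
  OwningReader& operator=(const OwningReader&) = delete;

  // Positional read: duplicated descriptors share one file offset, and
  // pread() does not move it.
  ssize_t ReadAt(off_t pos, void* buf, std::size_t n) const {
    return pread(fd_, buf, n, pos);
  }

 private:
  int fd_;
};

// Returns true if `fd` still refers to an open file.
bool IsOpen(int fd) { return fcntl(fd, F_GETFD) != -1; }

// Hands a duplicate of `app_fd` to a short-lived owner and reports whether
// the application's own descriptor is still open afterwards.
bool AppFdSurvivesOwner(int app_fd) {
  int handed_over = dup(app_fd);
  if (handed_over < 0) return false;
  {
    OwningReader reader(handed_over);  // closes handed_over at scope exit
    char byte;
    (void)reader.ReadAt(0, &byte, 1);
  }
  return IsOpen(app_fd);
}
```

Closing the duplicate leaves the original descriptor usable, so the application keeps control of when the file is really closed.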
If you want to do the equivalent thing, but using memory mapping, I
think you'll need to add a corresponding API to MemoryMappedFile. This
is more perilous because of the API requirements of mmap -- you need
to pass the right flags and they may need to be the same flags that
were passed when opening the file descriptor, see
https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L378
and
https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L476
- Wes
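The flag requirement mentioned in this message can be checked up front. POSIX specifies that mmap() fails with EACCES when the descriptor is not open for reading, or when PROT_WRITE is requested for a MAP_SHARED mapping on a descriptor that is not open for writing. A minimal sketch of such a check follows; MmapFlagsMatchFd is a hypothetical helper name, not an Arrow API, and it covers only these access-mode rules.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/mman.h>

// Returns true if a file mapping with `prot` and `flags` is permitted by the
// access mode `fd` was opened with, following the POSIX EACCES rules for mmap.
bool MmapFlagsMatchFd(int fd, int prot, int flags) {
  int status = fcntl(fd, F_GETFL);
  if (status == -1) return false;  // not a valid open descriptor

  int access = status & O_ACCMODE;
  bool readable = (access == O_RDONLY || access == O_RDWR);
  if (!readable) return false;  // file mappings always need read access

  // Writes through a shared mapping reach the file, so the descriptor must
  // be open read/write. A private (copy-on-write) mapping needs only read.
  bool shared_write = (flags & MAP_SHARED) && (prot & PROT_WRITE);
  if (shared_write && access != O_RDWR) return false;

  return true;
}
```

An API that accepts a caller-supplied descriptor could run a check like this before calling mmap(), and report a clear error instead of surfacing a bare EACCES.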
On Thu, Mar 14, 2019 at 1:47 PM Brian Bowman <Br...@sas.com> wrote:
>
> The ReadableFile class (arrow/io/file.cc) has utility methods where a FileDescriptor is either passed in or returned, but I don’t see how this surfaces through the API.
>
> Is there a way for application code to control the open lifetime of mmap()’d Parquet files by passing an already open FileDescriptor to Parquet low-level API open/close methods?
>
> Thanks,
>
> Brian
>