Posted to dev@arrow.apache.org by Pearu Peterson <pe...@quansight.com> on 2018/09/07 07:51:26 UTC

Buffer writers and seek method, NativeFile.is_seekable proposal

Hi,

In Arrow C++, various buffer writers define a Seek method, while in
pyarrow seek is defined only for buffer readers (for instance,
NativeFile.seek references only rd_file).

So pyarrow ties the 'seekable' file property strictly to 'readable', although
'seekable' also makes sense when a file is 'writable'. Non-seekable
files would be sockets or pipes, but memory buffers like CudaBuffer can be
seekable.
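
As an illustration of the distinction, here is a minimal sketch that uses only
Python's standard io and os modules (not pyarrow). It assumes an in-memory
buffer and an OS pipe are fair stand-ins for the two kinds of files mentioned
above, and shows that 'seekable' is reported independently of 'readable' and
'writable':

```python
import io
import os

# In-memory buffer: readable, writable and seekable at the same time.
buf = io.BytesIO()
buf_flags = (buf.readable(), buf.writable(), buf.seekable())

# Write end of a pipe: writable, but neither readable nor seekable.
r_fd, w_fd = os.pipe()
with os.fdopen(r_fd, "rb") as pipe_reader, os.fdopen(w_fd, "wb") as pipe_writer:
    pipe_flags = (
        pipe_writer.readable(),
        pipe_writer.writable(),
        pipe_writer.seekable(),
    )

print("BytesIO (readable, writable, seekable):", buf_flags)
print("pipe writer (readable, writable, seekable):", pipe_flags)
```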

Is there any reason for relating 'seekable' to 'readable-only' within
pyarrow?

I propose introducing an is_seekable attribute to NativeFile in order to untie
the 'seekable' property from the 'readable' and 'writable' properties. What do you
think?
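
A rough sketch of what this could look like, in plain Python. ToyNativeFile
and its constructor arguments are hypothetical names made up for this
illustration; this is not pyarrow code and says nothing about how NativeFile
is actually implemented. The only point is that seek consults is_seekable and
nothing else:

```python
import io


class ToyNativeFile:
    """Hypothetical stand-in for NativeFile (illustration only, not pyarrow)."""

    def __init__(self, raw, readable, writable, seekable):
        self._raw = raw
        self.is_readable = readable
        self.is_writable = writable
        self.is_seekable = seekable  # proposed flag, independent of the other two

    def seek(self, position, whence=io.SEEK_SET):
        # Only the seekable flag is checked, not the read/write mode.
        if not self.is_seekable:
            raise io.UnsupportedOperation("file is not seekable")
        return self._raw.seek(position, whence)

    def write(self, data):
        if not self.is_writable:
            raise io.UnsupportedOperation("file is not writable")
        return self._raw.write(data)


# A write-only but seekable file: seek back and overwrite part of the data.
backing = io.BytesIO()
writer = ToyNativeFile(backing, readable=False, writable=True, seekable=True)
writer.write(b"hello world")
writer.seek(6)
writer.write(b"arrow")
result = backing.getvalue()

# A write-only, non-seekable file (think socket or pipe) rejects seek.
stream = ToyNativeFile(io.BytesIO(), readable=False, writable=True, seekable=False)
try:
    stream.seek(0)
    seek_rejected = False
except io.UnsupportedOperation:
    seek_rejected = True
```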

Best regards,
Pearu

Re: Buffer writers and seek method, NativeFile.is_seekable proposal

Posted by Wes McKinney <we...@gmail.com>.
hi Paul,

We aren't talking about columnar data structures, but file interfaces,
i.e. the C++ classes in
https://github.com/apache/arrow/tree/master/cpp/src/arrow/io

- Wes
On Fri, Sep 7, 2018 at 2:56 PM Paul Rogers <pa...@yahoo.com.invalid> wrote:
>
> Hi Wes,
>
> Interesting. Random-access writes are easy for fixed-width vectors. I'm curious how they might be done for variable-width vectors (VARCHAR, or arrays) given the structure of the offset vectors. Is the structure of the offset vector changing (to include, say, the start and length of each value)? This always seemed the stumbling block in prior discussions of this topic.
>
> Thanks,
> - Paul
>
>
>
>     On Friday, September 7, 2018, 11:40:07 AM PDT, Wes McKinney <we...@gmail.com> wrote:
>
>  hi Pearu,
>
> Sounds good to me. I'd always intended to add support for random
> access writes but have not done it yet.
>
> Thanks,
> Wes

Re: Buffer writers and seek method, NativeFile.is_seekable proposal

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Hi Wes,

Interesting. Random-access writes are easy for fixed-width vectors. I'm curious how they might be done for variable-width vectors (VARCHAR, or arrays) given the structure of the offset vectors. Is the structure of the offset vector changing (to include, say, the start and length of each value)? This always seemed the stumbling block in prior discussions of this topic.
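
To make the concern concrete, here is a small sketch in plain Python. It
assumes a variable-width column stored as a list of n+1 offsets plus one
contiguous bytes buffer; replace_value is a hypothetical helper written for
this example, not an Arrow API. Replacing a value with one of a different
length shifts every later byte and every later offset:

```python
def replace_value(offsets, data, i, new_value):
    """Return new (offsets, data) with value i replaced by new_value."""
    start, end = offsets[i], offsets[i + 1]
    delta = len(new_value) - (end - start)
    # All bytes after the replaced value move by delta...
    new_data = data[:start] + new_value + data[end:]
    # ...and so does every offset after index i.
    new_offsets = offsets[: i + 1] + [o + delta for o in offsets[i + 1:]]
    return new_offsets, new_data


# Three values: b"ab", b"cde", b"fghi"
offsets = [0, 2, 5, 9]
data = b"abcdefghi"

new_offsets, new_data = replace_value(offsets, data, 1, b"X")
values = [new_data[new_offsets[k]:new_offsets[k + 1]] for k in range(3)]
```

With fixed-width values, delta is always zero, so the same write touches only
the bytes of the one value.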

Thanks,
- Paul

 

    On Friday, September 7, 2018, 11:40:07 AM PDT, Wes McKinney <we...@gmail.com> wrote:  
 
 hi Pearu,

Sounds good to me. I'd always intended to add support for random
access writes but have not done it yet.

Thanks,
Wes

Re: Buffer writers and seek method, NativeFile.is_seekable proposal

Posted by Wes McKinney <we...@gmail.com>.
I just created https://issues.apache.org/jira/browse/ARROW-3189
On Fri, Sep 7, 2018 at 2:39 PM Wes McKinney <we...@gmail.com> wrote:
>
> hi Pearu,
>
> Sounds good to me. I'd always intended to add support for random
> access writes but have not done it yet.
>
> Thanks,
> Wes

Re: Buffer writers and seek method, NativeFile.is_seekable proposal

Posted by Wes McKinney <we...@gmail.com>.
hi Pearu,

Sounds good to me. I'd always intended to add support for random
access writes but have not done it yet.
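
For context, one common use of random access writes is backfilling a header
once the payload size is known. The sketch below uses Python's standard
io.BytesIO as a stand-in for a seekable, writable sink; write_length_prefixed
is a hypothetical helper for this illustration, not part of Arrow or pyarrow:

```python
import io
import struct


def write_length_prefixed(sink, payload_chunks):
    """Write a 4-byte little-endian length, then the payload, to a seekable sink."""
    start = sink.tell()
    sink.write(struct.pack("<I", 0))  # placeholder, patched below
    total = 0
    for chunk in payload_chunks:
        total += sink.write(chunk)
    end = sink.tell()
    sink.seek(start)  # random-access write: go back to the header
    sink.write(struct.pack("<I", total))
    sink.seek(end)  # restore the position for subsequent writes
    return total


sink = io.BytesIO()
n = write_length_prefixed(sink, [b"abc", b"defg"])
framed = sink.getvalue()
```

On a non-seekable sink such as a pipe, the length would have to be known
before the first byte is written.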

Thanks,
Wes