Posted to dev@arrow.apache.org by paddy horan <pa...@hotmail.com> on 2018/12/07 02:19:53 UTC

[RUST] [DISCUSS] Changing type of array lengths

All,

As part of the PR for ARROW-3347 there was a discussion regarding the type that should be used for anything that measures the length of an array, i.e. len and capacity.

The result of this discussion was that the Rust implementation should switch to using usize as the type for representing len and capacity. This would mean supporting a way to split larger arrays into smaller arrays when passing data from one implementation to another. The exact size of these smaller arrays would depend on the implementation you are passing data to. C++ supports array lengths up to the maximum i64 value, but **all** implementations support lengths up to the maximum i32 value, as specified by the spec. The full discussion is here:
https://github.com/apache/arrow/pull/2858
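
A minimal sketch of the splitting idea described above, assuming the receiving implementation advertises some maximum array length. The helper names split_for_receiver and len_as_i32 are hypothetical and are not part of the Arrow Rust API; the limit of 4 elements is only for illustration.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: split `values` into slices whose lengths all fit
/// within `max_len`, the largest length the receiver is assumed to support.
fn split_for_receiver<T>(values: &[T], max_len: usize) -> Vec<&[T]> {
    assert!(max_len > 0, "max_len must be non-zero");
    values.chunks(max_len).collect()
}

/// Convert a Rust-side `usize` length to an `i32` length.
/// `None` means the length does not fit and the array must be split first.
fn len_as_i32(len: usize) -> Option<i32> {
    i32::try_from(len).ok()
}

fn main() {
    let data: Vec<u8> = (0..10).collect();

    // Pretend the receiver only supports arrays of up to 4 elements.
    let parts = split_for_receiver(&data, 4);
    let lens: Vec<usize> = parts.iter().map(|p| p.len()).collect();
    println!("chunk lengths: {:?}", lens);

    // Every chunk length can now be expressed as an i32.
    for part in &parts {
        assert!(len_as_i32(part.len()).is_some());
    }
}
```

In a real implementation the limit would more plausibly be i32::MAX or i64::MAX depending on the receiver.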

This is not a major change, so I’ll push it to 0.13, but I wanted to open up the discussion before making the change because the previous debate was hidden in a PR. In particular, Andy and Chao, are you in favor of this change?

Paddy

Re: [RUST] [DISCUSS] Changing type of array lengths

Posted by paddy horan <pa...@hotmail.com>.
Thanks All,

I didn't hear any strong opinions against this change so the PR is here:
https://github.com/apache/arrow/pull/3142

Thanks,
Paddy
________________________________
From: Marco Neumann <ma...@crepererum.net.INVALID>
Sent: Friday, December 7, 2018 12:35 PM
To: dev@arrow.apache.org
Subject: Re: [RUST] [DISCUSS] Changing type of array lengths

On Windows it depends on whether it's a 32-bit or 64-bit binary, as on every other system.

usize is what Rust containers usually use for indexing (see, for example, Vec in the standard library), and I personally find it very annoying when libraries break that rule, because in Rust you have to be explicit about integer conversions. You don't have implicit narrowing or widening conversions like in C/C++, so you end up casting back and forth hundreds of times just for a single library you use.

Re: [RUST] [DISCUSS] Changing type of array lengths

Posted by Marco Neumann <ma...@crepererum.net.INVALID>.
On Windows it depends on whether it's a 32-bit or 64-bit binary, as on every other system.

usize is what Rust containers usually use for indexing (see, for example, Vec in the standard library), and I personally find it very annoying when libraries break that rule, because in Rust you have to be explicit about integer conversions. You don't have implicit narrowing or widening conversions like in C/C++, so you end up casting back and forth hundreds of times just for a single library you use.
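
To make the casting overhead concrete, here is a small sketch. I64LenArray is a hypothetical type invented for this example (it is not the Arrow Rust array type); it reports its length as i64, so every interaction with a standard container needs an explicit conversion.

```rust
use std::convert::TryFrom;

/// Hypothetical array type that exposes its length and indices as `i64`
/// instead of `usize`.
struct I64LenArray {
    values: Vec<f64>,
}

impl I64LenArray {
    fn len(&self) -> i64 {
        // Vec::len returns usize, so the library converts on the way out...
        self.values.len() as i64
    }

    fn value(&self, i: i64) -> f64 {
        // ...and converts back to usize on the way in.
        self.values[usize::try_from(i).expect("index must be non-negative")]
    }
}

/// Dot product of the array with a plain slice.
fn dot(arr: &I64LenArray, other: &[f64]) -> f64 {
    let mut total = 0.0;
    for i in 0..arr.len() {
        // `other[i]` would not compile: slices are indexed by usize, not i64.
        total += arr.value(i) * other[i as usize];
    }
    total
}

fn main() {
    let arr = I64LenArray { values: vec![1.0, 2.0, 3.0] };
    println!("{}", dot(&arr, &[10.0, 20.0, 30.0]));
}
```

With a usize-based len and value, all three conversions in this example would disappear.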

On December 7, 2018 6:18:42 PM GMT+01:00, Wes McKinney <we...@gmail.com> wrote:
>What would be the argument for using usize over i64/u64? Is usize 64
>bits in Rust when compiling on Windows?
>On Fri, Dec 7, 2018 at 9:48 AM Andy Grove <an...@gmail.com> wrote:
>>
>> I am in favor of using usize.
>>
>> Thanks.

Re: [RUST] [DISCUSS] Changing type of array lengths

Posted by Wes McKinney <we...@gmail.com>.
What would be the argument for using usize over i64/u64? Is usize 64
bits in Rust when compiling on Windows?
On Fri, Dec 7, 2018 at 9:48 AM Andy Grove <an...@gmail.com> wrote:
>
> I am in favor of using usize.
>
> Thanks.

Re: [RUST] [DISCUSS] Changing type of array lengths

Posted by Andy Grove <an...@gmail.com>.
I am in favor of using usize.

Thanks.

Re: [RUST] [DISCUSS] Changing type of array lengths

Posted by Marco Neumann <ma...@crepererum.net.INVALID>.
One question here is: do we want to support datasets with more than 4G entries on 32-bit systems? If so, how would this even be possible, since you cannot fit that much data in any addressable chunk of memory in Rust?

So I would say: usize is idiomatic and supports large enough datasets on the system in question. You get the equivalent of u64 on 64-bit systems and u32 on 32-bit systems.
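
A quick way to check this on a given target is sketched below. The function names usize_bits and max_len_for_target are made up for the example; size_of and the target_pointer_width cfg are standard Rust.

```rust
use std::mem::size_of;

/// Width of `usize` in bits on the compilation target.
fn usize_bits() -> usize {
    size_of::<usize>() * 8
}

/// Largest length a `usize` can represent on this target, widened to u64.
fn max_len_for_target() -> u64 {
    usize::MAX as u64
}

fn main() {
    println!("usize is {} bits wide", usize_bits());
    println!("maximum length: {}", max_len_for_target());

    // `cfg!` evaluates to a compile-time boolean for the current target.
    if cfg!(target_pointer_width = "64") {
        assert_eq!(max_len_for_target(), u64::MAX);
    } else if cfg!(target_pointer_width = "32") {
        assert_eq!(max_len_for_target(), u32::MAX as u64);
    }
}
```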

On December 7, 2018 4:05:34 PM GMT+01:00, Wes McKinney <we...@gmail.com> wrote:
>Thanks for raising the issue, Paddy. In C++/Python/R we often work
>with very large contiguous datasets, so having support for 64-bit
>lengths is important. If supporting this in Rust is not a hardship, I
>think it's a good idea.
>
>For IPC (shared memory) or RPC (Flight / gRPC), in many cases it would
>make sense to break things into smaller chunks. We have an interface
>to slice a table (which may be either contiguous or chunked
>internally) into chunks of a desired size (like 64K or similar)
>
>https://github.com/apache/arrow/blob/master/cpp/src/arrow/table.h#L266
>
>- Wes

Re: [RUST] [DISCUSS] Changing type of array lengths

Posted by Wes McKinney <we...@gmail.com>.
Thanks for raising the issue, Paddy. In C++/Python/R we often work
with very large contiguous datasets, so having support for 64-bit
lengths is important. If supporting this in Rust is not a hardship, I
think it's a good idea.

For IPC (shared memory) or RPC (Flight / gRPC), in many cases it would
make sense to break things into smaller chunks. We have an interface
to slice a table (which may be either contiguous or chunked
internally) into chunks of a desired size (like 64K or similar):

https://github.com/apache/arrow/blob/master/cpp/src/arrow/table.h#L266
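
As a rough illustration of slicing chunked data into pieces of a desired size, here is a minimal Rust sketch. It is not the C++ interface linked above: slice_plan and its tuple layout are hypothetical, and the choice never to let a slice span two underlying chunks is an assumption made for the example.

```rust
/// Hypothetical planner for re-slicing a chunked column.
///
/// `chunk_lens` holds the length of each underlying contiguous chunk.
/// Returns `(chunk_index, offset, length)` triples, each at most
/// `chunk_size` long and never spanning two underlying chunks.
fn slice_plan(chunk_lens: &[usize], chunk_size: usize) -> Vec<(usize, usize, usize)> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut plan = Vec::new();
    for (idx, &len) in chunk_lens.iter().enumerate() {
        let mut offset = 0;
        while offset < len {
            // Take a full slice, or whatever is left of this chunk.
            let take = chunk_size.min(len - offset);
            plan.push((idx, offset, take));
            offset += take;
        }
    }
    plan
}

fn main() {
    // A chunk of 5 rows followed by a chunk of 3 rows,
    // re-sliced into pieces of at most 2 rows.
    for (chunk, offset, len) in slice_plan(&[5, 3], 2) {
        println!("chunk {} offset {} len {}", chunk, offset, len);
    }
}
```

A fully contiguous table is just the special case of a single entry in chunk_lens.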

- Wes