You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@arrow.apache.org by Jörn Horstmann <jo...@signavio.com> on 2020/08/03 12:37:53 UTC

Compute kernels and alignment of buffer slices

While investigating an issue regarding offset handling in the rust
arithmetic kernels (https://issues.apache.org/jira/browse/ARROW-9583),
I started to wonder how the other implementations are handling compute
on buffer slices.

The rust implementation currently allows creating slices of arrays
starting at arbitrary aligned offsets. This becomes a problem with
boolean arrays and with the null bitmaps, since operations on those
are currently working with whole bytes as the smallest unit. There
could be several options to solve this, all adding additional
complexity or having other downsides:

- calculate null bitmaps bit by bit if not properly aligned, leading
to a big performance drop
- calculate null bitmaps on whole bytes and then try to rotate the
resulting buffer by a certain number of bits. quite complex code and
also some performance overhead
- disallow compute kernels on non-aligned buffers, at least if null
bitmaps are involved

I'm leaning towards the last option, a draft PR is at
https://github.com/apache/arrow/pull/7854

Another issue with offsets is that, at least in the rust
implementation, some simd kernels currently assume the whole buffer to
be aligned to 64 bytes. As soon as there is an offset that is not a
multiple of 64, this could lead to unsafe out of bounds reads and
writes of memory.

I'm very interested in how the C++ and Java implementations handle those issues.

-- 
Jörn Horstmann | Senior Backend Engineer

www.signavio.com
Kurfürstenstraße 111, 10787 Berlin, Germany

Re: Compute kernels and alignment of buffer slices

Posted by Micah Kornfield <em...@gmail.com>.
Also for alignment requirements in C++ we sometimes process unaligned
leading/trailing data to reach the required alignment.


On Monday, August 3, 2020, Wes McKinney <we...@gmail.com> wrote:

> We handle arbitrary slices in C++ and it hasn't seemed especially
> burdensome. We have some utilities (e.g. see
> arrow/util/bit_block_counter.h) to facilitate efficiently iterating
> through bitmaps 64 bits at a time (even when slices on an unaligned
> offset) and associated bit-by-bit iterators (e.g. BitmapReader,
> BitmapWriter)
>
> On Mon, Aug 3, 2020 at 7:38 AM Jörn Horstmann
> <jo...@signavio.com> wrote:
> >
> > While investigating an issue regarding offset handling in the rust
> > arithmetic kernels (https://issues.apache.org/jira/browse/ARROW-9583),
> > I started to wonder how the other implementations are handling compute
> > on buffer slices.
> >
> > The rust implementation currently allows creating slices of arrays
> > starting at arbitrary aligned offsets. This becomes a problem with
> > boolean arrays and with the null bitmaps, since operations on those
> > are currently working with whole bytes as the smallest unit. There
> > could be several options to solve this, all adding additional
> > complexity or having other downsides:
> >
> > - calculate null bitmaps bit by bit if not properly aligned, leading
> > to a big performance drop
> > - calculate null bitmaps on whole bytes and then try to rotate the
> > resulting buffer by a certain number of bits. quite complex code and
> > also some performance overhead
> > - disallow compute kernels on non-aligned buffers, at least if null
> > bitmaps are involved
> >
> > I'm leaning towards the last option, a draft PR is at
> > https://github.com/apache/arrow/pull/7854
> >
> > Another issue with offsets is that, at least in the rust
> > implementation, some simd kernels currently assume the whole buffer to
> > be aligned to 64 bytes. As soon as there is an offset that is not a
> > multiple of 64, this could lead to unsafe out of bounds reads and
> > writes of memory.
> >
> > I'm very interested in how the C++ and Java implementations handle those
> issues.
> >
> > --
> > Jörn Horstmann | Senior Backend Engineer
> >
> > www.signavio.com
> > Kurfürstenstraße 111, 10787 Berlin, Germany
>

Re: Compute kernels and alignment of buffer slices

Posted by Wes McKinney <we...@gmail.com>.
We handle arbitrary slices in C++ and it hasn't seemed especially
burdensome. We have some utilities (e.g. see
arrow/util/bit_block_counter.h) to facilitate efficiently iterating
through bitmaps 64 bits at a time (even when slices on an unaligned
offset) and associated bit-by-bit iterators (e.g. BitmapReader,
BitmapWriter)

On Mon, Aug 3, 2020 at 7:38 AM Jörn Horstmann
<jo...@signavio.com> wrote:
>
> While investigating an issue regarding offset handling in the rust
> arithmetic kernels (https://issues.apache.org/jira/browse/ARROW-9583),
> I started to wonder how the other implementations are handling compute
> on buffer slices.
>
> The rust implementation currently allows creating slices of arrays
> starting at arbitrary aligned offsets. This becomes a problem with
> boolean arrays and with the null bitmaps, since operations on those
> are currently working with whole bytes as the smallest unit. There
> could be several options to solve this, all adding additional
> complexity or having other downsides:
>
> - calculate null bitmaps bit by bit if not properly aligned, leading
> to a big performance drop
> - calculate null bitmaps on whole bytes and then try to rotate the
> resulting buffer by a certain number of bits. quite complex code and
> also some performance overhead
> - disallow compute kernels on non-aligned buffers, at least if null
> bitmaps are involved
>
> I'm leaning towards the last option, a draft PR is at
> https://github.com/apache/arrow/pull/7854
>
> Another issue with offsets is that, at least in the rust
> implementation, some simd kernels currently assume the whole buffer to
> be aligned to 64 bytes. As soon as there is an offset that is not a
> multiple of 64, this could lead to unsafe out of bounds reads and
> writes of memory.
>
> I'm very interested in how the C++ and Java implementations handle those issues.
>
> --
> Jörn Horstmann | Senior Backend Engineer
>
> www.signavio.com
> Kurfürstenstraße 111, 10787 Berlin, Germany