Posted to dev@arrow.apache.org by Jorge Cardoso Leitão <jo...@gmail.com> on 2021/01/20 05:22:51 UTC

[Rust] Recent backward incompatible changes in master

Hi,

Just a heads up: 4 backward-incompatible changes have landed in master that
may affect your code, and I would like to bring them to your attention.

1. `memory::allocate_aligned` no longer initializes memory regions with
zeros. Use `memory::allocate_aligned_zeroed` for this.

Rationale: up to 3.0, we allocated a zero-initialized memory region and then
wrote new data to it. This is 25-30% slower than allocating an uninitialized
memory region and writing data to it directly. We are working towards
offering safe APIs that let people grow a buffer from an uninitialized
region, just like `Vec` does. We opted to change `allocate_aligned` and
introduce `allocate_aligned_zeroed` because this mirrors Rust's
`std::alloc::alloc` and `std::alloc::alloc_zeroed`.
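For readers less familiar with the standard-library pair, here is a minimal sketch of the distinction using `std::alloc` directly. It does not call the arrow crate; the function names, the `free_aligned` helper, and the 64-byte `ALIGNMENT` are hypothetical choices for illustration.

```rust
use std::alloc::{alloc, alloc_zeroed, dealloc, handle_alloc_error, Layout};

// Hypothetical alignment for illustration (one 64-byte cache line); not
// necessarily the constant the arrow crate uses.
const ALIGNMENT: usize = 64;

fn layout_for(size: usize) -> Layout {
    // Calling `alloc` with a zero-sized layout is undefined behavior, so reject it.
    assert!(size > 0, "this sketch does not support zero-sized allocations");
    Layout::from_size_align(size, ALIGNMENT).expect("size too large for this alignment")
}

/// Mirrors `std::alloc::alloc`: the bytes are uninitialized and must be
/// written before they are read.
fn allocate_aligned_uninit(size: usize) -> *mut u8 {
    let layout = layout_for(size);
    // SAFETY: `layout` has a non-zero size.
    let ptr = unsafe { alloc(layout) };
    if ptr.is_null() {
        handle_alloc_error(layout);
    }
    ptr
}

/// Mirrors `std::alloc::alloc_zeroed`: every byte is guaranteed to be 0.
fn allocate_aligned_zeroed_sketch(size: usize) -> *mut u8 {
    let layout = layout_for(size);
    // SAFETY: `layout` has a non-zero size.
    let ptr = unsafe { alloc_zeroed(layout) };
    if ptr.is_null() {
        handle_alloc_error(layout);
    }
    ptr
}

/// Frees a region returned by either function above.
///
/// # Safety
/// `ptr` must come from one of the functions above, called with the same
/// `size`, and must not be used afterwards.
unsafe fn free_aligned(ptr: *mut u8, size: usize) {
    unsafe { dealloc(ptr, layout_for(size)) }
}
```

With the uninitialized variant, every byte must be written (e.g. with `std::ptr::write_bytes` or by copying data in) before it is read; the zeroed variant can be read immediately but pays for zeroing up front.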

2. `MutableBuffer::reserve()` signature was changed from
`MutableBuffer::reserve(capacity)` to `MutableBuffer::reserve(additional)`.

Rationale: we are trying to make the `MutableBuffer` experience as similar to
`Vec<T>` as possible (with Arrow-specific allocation rules, cache lines,
etc.), so that people can easily use it to build buffers without having to
reason much about it. `std::vec::Vec::reserve()` takes the number of
additional elements, mostly because this lets it handle overflows itself
(i.e. when `len + additional > usize::MAX`). In some instances, our own code
was already calling `reserve(additional)` by mistake. This change removes the
dissonance between `Vec::reserve(additional)` and
`MutableBuffer::reserve(capacity)`. Confusing the two is a mistake that is
difficult to discover, as it either silently hurts performance
(over-reserving) or leads to undefined behavior (under-reserving).
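To illustrate the pitfall, here is a sketch that uses `Vec<u8>` as a stand-in for `MutableBuffer`; `reserve_additional` and `reserve_to_capacity` are hypothetical helpers, not arrow APIs. Passing a target capacity of 1024 to an `additional`-style call on a 1000-byte buffer reserves room for at least 2024 bytes (over-reserving), whereas passing an `additional` count to a `capacity`-style call reserves too little.

```rust
use std::collections::TryReserveError;

/// New-style call, mirroring `Vec::reserve(additional)`: `additional` is the
/// number of extra bytes the caller intends to append. The callee computes
/// `len + additional` itself, so an overflow is detected in one place
/// (`try_reserve` returns an error instead of wrapping around).
fn reserve_additional(buf: &mut Vec<u8>, additional: usize) -> Result<(), TryReserveError> {
    buf.try_reserve(additional)
}

/// Hypothetical adapter for call sites written against the old
/// `reserve(capacity)` meaning: turn the target capacity into an
/// `additional` count before calling the new-style function.
fn reserve_to_capacity(buf: &mut Vec<u8>, capacity: usize) -> Result<(), TryReserveError> {
    let additional = capacity.saturating_sub(buf.len());
    reserve_additional(buf, additional)
}
```

Migrating old call sites then amounts to either passing `capacity - len` or going through an adapter like `reserve_to_capacity`.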

3. `MutableBuffer::resize()` signature was changed from
`MutableBuffer::resize(new_len)` to
`MutableBuffer::resize(new_len, value: u8)`.

Rationale: same as 2; `Vec::resize()` takes a second parameter, `value`,
which is used to fill any newly added elements.
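As an example of the two-argument form, a common Arrow task is building a validity bitmap with every slot marked valid. The sketch below uses `Vec::resize(new_len, value)` as a stand-in for `MutableBuffer::resize(new_len, value)`; `all_valid_bitmap` is a hypothetical helper. Bits are numbered least-significant first, as in the Arrow format, and this sketch chooses to clear the unused bits of the last byte.

```rust
/// Hypothetical helper: a validity bitmap in which the first `len` slots are
/// valid (bit set), built with the two-argument `resize`.
fn all_valid_bitmap(len: usize) -> Vec<u8> {
    // One bit per slot, rounded up to whole bytes.
    let num_bytes = (len + 7) / 8;
    let mut bitmap = Vec::new();
    // Every newly added byte is filled with 0xFF, i.e. all bits set.
    bitmap.resize(num_bytes, 0xFF);
    // Clear the bits past `len` in the last byte (LSB-first bit order).
    let rem = len % 8;
    if rem != 0 {
        if let Some(last) = bitmap.last_mut() {
            *last &= (1u8 << rem) - 1;
        }
    }
    bitmap
}
```

Shrinking works the same way as with `Vec`: a smaller `new_len` truncates, and `value` is ignored.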

4. `PrimitiveArray<T>::value(i)` will be marked as `unsafe`.

Rationale: the function does not perform bounds checks and is thus unsafe.
Use `values()` to get all values as a (bounds-checked) slice.
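The distinction can be sketched with a toy type. `Int32ArraySketch` is hypothetical and much simpler than arrow's `PrimitiveArray<T>`: the unchecked accessor is an `unsafe fn` whose contract the caller must uphold, while the slice returned by `values()` gives bounds-checked indexing, `get`, and iteration for free.

```rust
/// Hypothetical, simplified stand-in for a primitive array.
struct Int32ArraySketch {
    values: Vec<i32>,
}

impl Int32ArraySketch {
    /// All values as a slice; indexing it is bounds-checked.
    fn values(&self) -> &[i32] {
        &self.values
    }

    /// Unchecked access, illustrating why a `value(i)` without bounds
    /// checks has to be `unsafe`.
    ///
    /// # Safety
    /// The caller must guarantee `i < self.values().len()`.
    unsafe fn value(&self, i: usize) -> i32 {
        unsafe { *self.values.get_unchecked(i) }
    }
}
```

Callers that already know `i` is in range (for example, inside a loop over `0..len`) can keep using the unchecked accessor inside an `unsafe` block; everyone else can index the slice.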

All of these changes are, of course, open to revisiting. They seem
reasonable to some of us, but there is still plenty of time until 4.0.0 :)

Best,
Jorge