You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@arrow.apache.org by paddy horan <pa...@hotmail.com> on 2020/01/10 02:13:22 UTC

[Discuss][Rust] Policy regarding "unsafe"

Hi All,

This time last year there was a brief discussion on the usage of unsafe in Rust (a user on github raised the issue and I created the JIRA). [1]

So far we mostly avoid unsafe in the public API's.  The thinking here is that Arrow is a "development platform", i.e. lower level that most libraries, and library builders will want to avoid any performance hit of bounds checking, etc.

This is not typical in the Rust community where unsafe is a clear signal that care is needed.  Although it might clutter the API a little more I would be in favor of having safe and unsafe variants of methods as needed.  For instance, "value" for array access would be changed to "value" and "value_unchecked" where the latter is unsafe and does not perform bounds checks.

We don't have a huge number of libraries building on top of Arrow in Rust at the moment so it seems like a good time, before 1.0, to decide on this to avoid breaking changes to the public API in post 1.0.

Thoughts?

Paddy

[1] https://issues.apache.org/jira/browse/ARROW-3776?filter=12343557


Re: [Discuss][Rust] Policy regarding "unsafe"

Posted by Neville Dipale <ne...@gmail.com>.
Hi Paddy,

On Fri, 10 Jan 2020 at 04:13, paddy horan <pa...@hotmail.com> wrote:

> Hi All,
>
> This time last year there was a brief discussion on the usage of unsafe in
> Rust (a user on github raised the issue and I created the JIRA). [1]
>
> So far we mostly avoid unsafe in the public API's.  The thinking here is
> that Arrow is a "development platform", i.e. lower level that most
> libraries, and library builders will want to avoid any performance hit of
> bounds checking, etc.
>
> This is not typical in the Rust community where unsafe is a clear signal
> that care is needed.  Although it might clutter the API a little more I
> would be in favor of having safe and unsafe variants of methods as needed.
> For instance, "value" for array access would be changed to "value" and
> "value_unchecked" where the latter is unsafe and does not perform bounds
> checks.
>
> We don't have a huge number of libraries building on top of Arrow in Rust
> at the moment so it seems like a good time, before 1.0, to decide on this
> to avoid breaking changes to the public API in post 1.0.
>
> Thoughts?
>
> Paddy
>
> [1] https://issues.apache.org/jira/browse/ARROW-3776?filter=12343557
>
>

Re: [Discuss][Rust] Policy regarding "unsafe"

Posted by paddy horan <pa...@hotmail.com>.
I think we are all broadly thinking along the same lines.

I would mention that I don't see "unsafe" as a problem that needs to be removed per-se, it has it's place especially in libraries that are lower level like Arrow.

I do have a problem with the fact that "value" (which is safe) has a comment in the docstring that says "Note this doesn't do any bounds checking for performance reasons", I would prefer to see this marked as unsafe.

I'll open some JIRA's and PR's and we can decide on a case by case basis.

Thanks,
Paddy

________________________________
From: Andy Grove <an...@gmail.com>
Sent: Friday, January 17, 2020 1:44 PM
To: dev <de...@arrow.apache.org>
Subject: Re: [Discuss][Rust] Policy regarding "unsafe"

I agree that we need to audit use of unsafe and think carefully about how
we use it.

This blog post is somewhat tangential since it is primarily about the
actix-web situation with the author quitting open source, but "unsafe"
played a large role in it, and I think this is worth reading.

 https://words.steveklabnik.com/a-sad-day-for-rust

Andy.

On Fri, Jan 17, 2020 at 11:15 AM Neville Dipale <ne...@gmail.com>
wrote:

> Hi Paddy, Arrow Developers,
>
> I've given this some thought, and I preliminarily think that perhaps we can
> audit our use of unsafe and evaluate where we can remove it, propagate it
> upwards (and provide safe alternatives) or provide some safety to callers.
>
> Looking at the 3 options that Paul Kernfeld raised in the linked JIRA:
>
>
>    1. Add in bounds checking so that we don't need to deal with unsafe at
>    all.
>    2. Propagate the unsafes up through the code.
>    3. Maintain a safe and unsafe version of each function that is currently
>    unsafe.
>
>
> I think bounds checking would hurt performance, an example being the
> changes introduced in https://issues.apache.org/jira/browse/ARROW-4670. In
> ARROW-4670, I believe we were able to get the compiler to auto-vectorise
> due to Array::value() avoiding bounds checks. In the case of compute, we
> are in control of the array length, and so we know that it's safe to skip
> bounds checking. I presume this would largely be the case in tabular-data
> use-cases (because we assert that arrows in a record batch meet certain
> criteria).
>
> From a cursory glance, if we do find that we don't need explicit SIMD
> (still immature in Rust, I've found it difficult to implement in some
> cases), we could potentially reduce our unsafe count by around 20%. The
> flatbuffers generated files also introduce a lot of unsafe (~26%), so we'd
> need to maybe adopt option 2 from Paul on IPC once we're done with the
> basics.
>
> We'd then mainly be left with bit manipulation and `Buffer` (which as as
> much unsafe as the fbs generated files). I think the API around buffer
> would depend on whether we're expecting (based on what can be done with
> buffers) this to be exposed to users beyond those using Arrow as a
> development platform.
>
> The above are some of my thoughts, but important's that I don't have a lot
> of experience with Rust, especially `unsafe` and the other dark corners of
> the language.
>
> Regards
> Neville
>
> On Fri, 10 Jan 2020 at 04:13, paddy horan <pa...@hotmail.com> wrote:
>
> > Hi All,
> >
> > This time last year there was a brief discussion on the usage of unsafe
> in
> > Rust (a user on github raised the issue and I created the JIRA). [1]
> >
> > So far we mostly avoid unsafe in the public API's.  The thinking here is
> > that Arrow is a "development platform", i.e. lower level that most
> > libraries, and library builders will want to avoid any performance hit of
> > bounds checking, etc.
> >
> > This is not typical in the Rust community where unsafe is a clear signal
> > that care is needed.  Although it might clutter the API a little more I
> > would be in favor of having safe and unsafe variants of methods as
> needed.
> > For instance, "value" for array access would be changed to "value" and
> > "value_unchecked" where the latter is unsafe and does not perform bounds
> > checks.
> >
> > We don't have a huge number of libraries building on top of Arrow in Rust
> > at the moment so it seems like a good time, before 1.0, to decide on this
> > to avoid breaking changes to the public API in post 1.0.
> >
> > Thoughts?
> >
> > Paddy
> >
> > [1] https://issues.apache.org/jira/browse/ARROW-3776?filter=12343557
> >
> >
>

Re: [Discuss][Rust] Policy regarding "unsafe"

Posted by Andy Grove <an...@gmail.com>.
I agree that we need to audit use of unsafe and think carefully about how
we use it.

This blog post is somewhat tangential since it is primarily about the
actix-web situation with the author quitting open source, but "unsafe"
played a large role in it, and I think this is worth reading.

 https://words.steveklabnik.com/a-sad-day-for-rust

Andy.

On Fri, Jan 17, 2020 at 11:15 AM Neville Dipale <ne...@gmail.com>
wrote:

> Hi Paddy, Arrow Developers,
>
> I've given this some thought, and I preliminarily think that perhaps we can
> audit our use of unsafe and evaluate where we can remove it, propagate it
> upwards (and provide safe alternatives) or provide some safety to callers.
>
> Looking at the 3 options that Paul Kernfeld raised in the linked JIRA:
>
>
>    1. Add in bounds checking so that we don't need to deal with unsafe at
>    all.
>    2. Propagate the unsafes up through the code.
>    3. Maintain a safe and unsafe version of each function that is currently
>    unsafe.
>
>
> I think bounds checking would hurt performance, an example being the
> changes introduced in https://issues.apache.org/jira/browse/ARROW-4670. In
> ARROW-4670, I believe we were able to get the compiler to auto-vectorise
> due to Array::value() avoiding bounds checks. In the case of compute, we
> are in control of the array length, and so we know that it's safe to skip
> bounds checking. I presume this would largely be the case in tabular-data
> use-cases (because we assert that arrows in a record batch meet certain
> criteria).
>
> From a cursory glance, if we do find that we don't need explicit SIMD
> (still immature in Rust, I've found it difficult to implement in some
> cases), we could potentially reduce our unsafe count by around 20%. The
> flatbuffers generated files also introduce a lot of unsafe (~26%), so we'd
> need to maybe adopt option 2 from Paul on IPC once we're done with the
> basics.
>
> We'd then mainly be left with bit manipulation and `Buffer` (which as as
> much unsafe as the fbs generated files). I think the API around buffer
> would depend on whether we're expecting (based on what can be done with
> buffers) this to be exposed to users beyond those using Arrow as a
> development platform.
>
> The above are some of my thoughts, but important's that I don't have a lot
> of experience with Rust, especially `unsafe` and the other dark corners of
> the language.
>
> Regards
> Neville
>
> On Fri, 10 Jan 2020 at 04:13, paddy horan <pa...@hotmail.com> wrote:
>
> > Hi All,
> >
> > This time last year there was a brief discussion on the usage of unsafe
> in
> > Rust (a user on github raised the issue and I created the JIRA). [1]
> >
> > So far we mostly avoid unsafe in the public API's.  The thinking here is
> > that Arrow is a "development platform", i.e. lower level that most
> > libraries, and library builders will want to avoid any performance hit of
> > bounds checking, etc.
> >
> > This is not typical in the Rust community where unsafe is a clear signal
> > that care is needed.  Although it might clutter the API a little more I
> > would be in favor of having safe and unsafe variants of methods as
> needed.
> > For instance, "value" for array access would be changed to "value" and
> > "value_unchecked" where the latter is unsafe and does not perform bounds
> > checks.
> >
> > We don't have a huge number of libraries building on top of Arrow in Rust
> > at the moment so it seems like a good time, before 1.0, to decide on this
> > to avoid breaking changes to the public API in post 1.0.
> >
> > Thoughts?
> >
> > Paddy
> >
> > [1] https://issues.apache.org/jira/browse/ARROW-3776?filter=12343557
> >
> >
>

Re: [Discuss][Rust] Policy regarding "unsafe"

Posted by Neville Dipale <ne...@gmail.com>.
Hi Paddy, Arrow Developers,

I've given this some thought, and I preliminarily think that perhaps we can
audit our use of unsafe and evaluate where we can remove it, propagate it
upwards (and provide safe alternatives) or provide some safety to callers.

Looking at the 3 options that Paul Kernfeld raised in the linked JIRA:


   1. Add in bounds checking so that we don't need to deal with unsafe at
   all.
   2. Propagate the unsafes up through the code.
   3. Maintain a safe and unsafe version of each function that is currently
   unsafe.


I think bounds checking would hurt performance, an example being the
changes introduced in https://issues.apache.org/jira/browse/ARROW-4670. In
ARROW-4670, I believe we were able to get the compiler to auto-vectorise
due to Array::value() avoiding bounds checks. In the case of compute, we
are in control of the array length, and so we know that it's safe to skip
bounds checking. I presume this would largely be the case in tabular-data
use-cases (because we assert that arrows in a record batch meet certain
criteria).

From a cursory glance, if we do find that we don't need explicit SIMD
(still immature in Rust, I've found it difficult to implement in some
cases), we could potentially reduce our unsafe count by around 20%. The
flatbuffers generated files also introduce a lot of unsafe (~26%), so we'd
need to maybe adopt option 2 from Paul on IPC once we're done with the
basics.

We'd then mainly be left with bit manipulation and `Buffer` (which as as
much unsafe as the fbs generated files). I think the API around buffer
would depend on whether we're expecting (based on what can be done with
buffers) this to be exposed to users beyond those using Arrow as a
development platform.

The above are some of my thoughts, but important's that I don't have a lot
of experience with Rust, especially `unsafe` and the other dark corners of
the language.

Regards
Neville

On Fri, 10 Jan 2020 at 04:13, paddy horan <pa...@hotmail.com> wrote:

> Hi All,
>
> This time last year there was a brief discussion on the usage of unsafe in
> Rust (a user on github raised the issue and I created the JIRA). [1]
>
> So far we mostly avoid unsafe in the public API's.  The thinking here is
> that Arrow is a "development platform", i.e. lower level that most
> libraries, and library builders will want to avoid any performance hit of
> bounds checking, etc.
>
> This is not typical in the Rust community where unsafe is a clear signal
> that care is needed.  Although it might clutter the API a little more I
> would be in favor of having safe and unsafe variants of methods as needed.
> For instance, "value" for array access would be changed to "value" and
> "value_unchecked" where the latter is unsafe and does not perform bounds
> checks.
>
> We don't have a huge number of libraries building on top of Arrow in Rust
> at the moment so it seems like a good time, before 1.0, to decide on this
> to avoid breaking changes to the public API in post 1.0.
>
> Thoughts?
>
> Paddy
>
> [1] https://issues.apache.org/jira/browse/ARROW-3776?filter=12343557
>
>