You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@arrow.apache.org by Andrew Lamb <al...@influxdata.com> on 2021/09/29 19:11:40 UTC

[Rust] Heads up: RUSTSEC security advisory against arrow-rs

Dear Rust Developers,

As a heads up, several pre-existing security tickets filed against arrow-rs
have been added[1][2] to the RUSTSEC database[1][2] a few hours ago. The
author says he plans to file additional RUSTSEC entries for the other
security tickets [3].

The criteria used for adding the arrow issues to the RUSTSEC database is
not clear to me,  but given widely used tools such as `cargo audit` report
such issues, it is likely that this will become an visible issue for our
users soon.

Given this, I will likely start looking into existing security issues [4]
reported against arrow-rs and any help would be appreciated.

Andrew

[1] https://github.com/rustsec/advisory-db/pull/1057
[2] https://github.com/rustsec/advisory-db/pull/1059
[3] https://github.com/rustsec/advisory-db/pull/1057#issuecomment-930455127
[4] https://github.com/apache/arrow-rs/labels/security

Re: [Rust] Heads up: RUSTSEC security advisory against arrow-rs

Posted by Jacques Nadeau <ja...@apache.org>.

In the past I was dealing with something similar. My experience was when
data was accepted at the edge, the cost of validating that the first offset
is zero, the last is within the data bounds and that all others are equal
or increasing was a reasonable overhead associated with validating offsets
from unsafe sources.

On Thu, Sep 30, 2021 at 1:29 PM Andrew Lamb <al...@influxdata.com> wrote:

> I understand the need to avoid sacrificing performance as much as possible.
> I have begun looking into adding validation into ArrayData::new as you
> suggest.
>
> I am making progress, but haven't fully figured out the nested types yet.
> Hope to have a PR up in the next day or two.
>
> Should we come up with something that adds unacceptable performance
> overhead, we can also feature flag it (so that the check is done by
> default, but can be disabled if needed)
>
> Andrew
>
> On Thu, Sep 30, 2021 at 5:33 AM Jörn Horstmann <
> joern.horstmann@signavio.com>
> wrote:
>
> > Most of these issues seem to originate from creating arrays from
> ArrayData.
> > While we could validate buffers in the XArray::from implementations, that
> > would have some performance overhead and also some edge cases for nested
> > data. Thinking about List<List<Utf8>>>, we recursively would need to
> > validate and know which buffers are used for offsets or utf8 data. Since
> > the root cause is always ArrayData::new (and possibly
> > ArrayDataBuilder::finish) I would suggest marking those functions as
> > unsafe.
> >
> > Adding checks for the top-level buffer length would still be useful, I'm
> > just a bit opposed to having to iterate over whole buffers for
> validation,
> > turning these operations from O(1) to O(n).
> >
> > Slightly related, GenericListArray::try_new_from_array_data currently
> > validates that the first offset is 0. I think this check could actually
> be
> > relaxed and the offset only needs to be in bounds for the child array.
> >
> > Jörn
> >
> >
> > On Wed, Sep 29, 2021 at 9:12 PM Andrew Lamb <al...@influxdata.com>
> wrote:
> >
> > > Dear Rust Developers,
> > >
> > > As a heads up, several pre-existing security tickets filed against
> > arrow-rs
> > > have been added[1][2] to the RUSTSEC database[1][2] a few hours ago.
> The
> > > author says he plans to file additional RUSTSEC entries for the other
> > > security tickets [3].
> > >
> > > The criteria used for adding the arrow issues to the RUSTSEC database
> is
> > > not clear to me,  but given widely used tools such as `cargo audit`
> > report
> > > such issues, it is likely that this will become an visible issue for
> our
> > > users soon.
> > >
> > > Given this, I will likely start looking into existing security issues
> [4]
> > > reported against arrow-rs and any help would be appreciated.
> > >
> > > Andrew
> > >
> > > [1] https://github.com/rustsec/advisory-db/pull/1057
> > > [2] https://github.com/rustsec/advisory-db/pull/1059
> > > [3]
> > >
> https://github.com/rustsec/advisory-db/pull/1057#issuecomment-930455127
> > > [4] https://github.com/apache/arrow-rs/labels/security
> > >
> >
> >
> > --
> > *Jörn Horstmann* | Senior Backend Engineer
> >
> > www.signavio.com
> > Kurfürstenstraße 111, 10787 Berlin, Germany
> >
> > Work with us! <https://hubs.ly/H0wwzcr0>
> >
> >
> >
> > <https://www.linkedin.com/company/signavio/>
> > <https://www.twitter.com/signavio>   <https://www.facebook.com/signavio>
> > <https://www.youtube.com/user/signavio>
> > <https://www.xing.com/companies/signaviogmbh>
> >
> > [image: BPI Forum] <
> https://t-eu.xink.io/Tracking/Index/sSMAALBtAAAnu0MA0>
> >
> > HRB 121584 B Charlottenburg District Court, VAT ID: DE265675123
> > Managing Directors: Dr. Gero Decker, Rouven Morato Adam
> >
>

Re: [Rust] Heads up: RUSTSEC security advisory against arrow-rs

Posted by Andrew Lamb <al...@influxdata.com>.

To finish up this thread, the rustsec advisory has been updated:
https://github.com/rustsec/advisory-db/pull/1131

On Tue, Nov 30, 2021 at 5:10 PM Andrew Lamb <al...@influxdata.com> wrote:

> Well, I finally have a PR[1]  that adds full ArrayData validation ready
> for review  Any thoughts or review comments would be most helpful.
>
> Thank you,
> Andrew
>
> [1] https://github.com/apache/arrow-rs/pull/921
>
> On Fri, Oct 29, 2021 at 6:23 AM Andrew Lamb <al...@influxdata.com> wrote:
>
>> There is more discussion about the RUSTSEC process here:
>> https://github.com/rustsec/advisory-db/issues/1092
>>
>> On Wed, Oct 6, 2021 at 10:52 AM Andrew Lamb <al...@influxdata.com> wrote:
>>
>>> I have incorporated feedback into a proposal [1] of how to handle
>>> validation of arguments to ArrayData::new, and would appreciate further
>>> review
>>>
>>> [1] https://github.com/apache/arrow-rs/issues/817
>>>
>>> On Fri, Oct 1, 2021 at 6:44 AM Andrew Lamb <al...@influxdata.com> wrote:
>>>
>>>> Thank you Antoine,
>>>>
>>>> The C++ validation routine is super helpful, especially with respect to
>>>> validating nested structures. I will follow its lead
>>>>
>>>> Andrew
>>>>
>>>> On Fri, Oct 1, 2021 at 3:25 AM Antoine Pitrou <an...@python.org>
>>>> wrote:
>>>>
>>>>>
>>>>> In C++ we have dedicated validation routines that can be used against
>>>>> untrusted input. (*)  We also have fuzzing set up using OSS-Fuzz to
>>>>> validate that invalid input cannot crash the IPC reader.
>>>>>
>>>>> (*)
>>>>>
>>>>> https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/validate.h
>>>>>
>>>>>
>>>>> Le 01/10/2021 à 00:13, Andrew Lamb a écrit :
>>>>> > I have created a WIP PR for initial feedback on the approach of
>>>>> validating
>>>>> > ArrayData upon creation[1]. If there are no objections to the
>>>>> approach I
>>>>> > will complete the implementation over the next few days
>>>>> >
>>>>> > The approach that Sergey describes of `get` and `unsafe
>>>>> get_unchecked`
>>>>> > sounds like a good one to me if performance testing shows we need a
>>>>> bypass.
>>>>> >
>>>>> > Andrew
>>>>> >
>>>>> > [1] https://github.com/apache/arrow-rs/pull/810
>>>>>
>>>>

Re: [Rust] Heads up: RUSTSEC security advisory against arrow-rs

Posted by Andrew Lamb <al...@influxdata.com>.

Well, I finally have a PR[1]  that adds full ArrayData validation ready for
review  Any thoughts or review comments would be most helpful.

Thank you,
Andrew

[1] https://github.com/apache/arrow-rs/pull/921

On Fri, Oct 29, 2021 at 6:23 AM Andrew Lamb <al...@influxdata.com> wrote:

> There is more discussion about the RUSTSEC process here:
> https://github.com/rustsec/advisory-db/issues/1092
>
> On Wed, Oct 6, 2021 at 10:52 AM Andrew Lamb <al...@influxdata.com> wrote:
>
>> I have incorporated feedback into a proposal [1] of how to handle
>> validation of arguments to ArrayData::new, and would appreciate further
>> review
>>
>> [1] https://github.com/apache/arrow-rs/issues/817
>>
>> On Fri, Oct 1, 2021 at 6:44 AM Andrew Lamb <al...@influxdata.com> wrote:
>>
>>> Thank you Antoine,
>>>
>>> The C++ validation routine is super helpful, especially with respect to
>>> validating nested structures. I will follow its lead
>>>
>>> Andrew
>>>
>>> On Fri, Oct 1, 2021 at 3:25 AM Antoine Pitrou <an...@python.org>
>>> wrote:
>>>
>>>>
>>>> In C++ we have dedicated validation routines that can be used against
>>>> untrusted input. (*)  We also have fuzzing set up using OSS-Fuzz to
>>>> validate that invalid input cannot crash the IPC reader.
>>>>
>>>> (*)
>>>>
>>>> https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/validate.h
>>>>
>>>>
>>>> Le 01/10/2021 à 00:13, Andrew Lamb a écrit :
>>>> > I have created a WIP PR for initial feedback on the approach of
>>>> validating
>>>> > ArrayData upon creation[1]. If there are no objections to the
>>>> approach I
>>>> > will complete the implementation over the next few days
>>>> >
>>>> > The approach that Sergey describes of `get` and `unsafe get_unchecked`
>>>> > sounds like a good one to me if performance testing shows we need a
>>>> bypass.
>>>> >
>>>> > Andrew
>>>> >
>>>> > [1] https://github.com/apache/arrow-rs/pull/810
>>>>
>>>

Re: [Rust] Heads up: RUSTSEC security advisory against arrow-rs

Posted by Andrew Lamb <al...@influxdata.com>.

There is more discussion about the RUSTSEC process here:
https://github.com/rustsec/advisory-db/issues/1092

On Wed, Oct 6, 2021 at 10:52 AM Andrew Lamb <al...@influxdata.com> wrote:

> I have incorporated feedback into a proposal [1] of how to handle
> validation of arguments to ArrayData::new, and would appreciate further
> review
>
> [1] https://github.com/apache/arrow-rs/issues/817
>
> On Fri, Oct 1, 2021 at 6:44 AM Andrew Lamb <al...@influxdata.com> wrote:
>
>> Thank you Antoine,
>>
>> The C++ validation routine is super helpful, especially with respect to
>> validating nested structures. I will follow its lead
>>
>> Andrew
>>
>> On Fri, Oct 1, 2021 at 3:25 AM Antoine Pitrou <an...@python.org> wrote:
>>
>>>
>>> In C++ we have dedicated validation routines that can be used against
>>> untrusted input. (*)  We also have fuzzing set up using OSS-Fuzz to
>>> validate that invalid input cannot crash the IPC reader.
>>>
>>> (*)
>>>
>>> https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/validate.h
>>>
>>>
>>> Le 01/10/2021 à 00:13, Andrew Lamb a écrit :
>>> > I have created a WIP PR for initial feedback on the approach of
>>> validating
>>> > ArrayData upon creation[1]. If there are no objections to the approach
>>> I
>>> > will complete the implementation over the next few days
>>> >
>>> > The approach that Sergey describes of `get` and `unsafe get_unchecked`
>>> > sounds like a good one to me if performance testing shows we need a
>>> bypass.
>>> >
>>> > Andrew
>>> >
>>> > [1] https://github.com/apache/arrow-rs/pull/810
>>>
>>

Re: [Rust] Heads up: RUSTSEC security advisory against arrow-rs

Posted by Andrew Lamb <al...@influxdata.com>.

I have incorporated feedback into a proposal [1] of how to handle
validation of arguments to ArrayData::new, and would appreciate further
review

[1] https://github.com/apache/arrow-rs/issues/817

On Fri, Oct 1, 2021 at 6:44 AM Andrew Lamb <al...@influxdata.com> wrote:

> Thank you Antoine,
>
> The C++ validation routine is super helpful, especially with respect to
> validating nested structures. I will follow its lead
>
> Andrew
>
> On Fri, Oct 1, 2021 at 3:25 AM Antoine Pitrou <an...@python.org> wrote:
>
>>
>> In C++ we have dedicated validation routines that can be used against
>> untrusted input. (*)  We also have fuzzing set up using OSS-Fuzz to
>> validate that invalid input cannot crash the IPC reader.
>>
>> (*)
>> https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/validate.h
>>
>>
>> Le 01/10/2021 à 00:13, Andrew Lamb a écrit :
>> > I have created a WIP PR for initial feedback on the approach of
>> validating
>> > ArrayData upon creation[1]. If there are no objections to the approach I
>> > will complete the implementation over the next few days
>> >
>> > The approach that Sergey describes of `get` and `unsafe get_unchecked`
>> > sounds like a good one to me if performance testing shows we need a
>> bypass.
>> >
>> > Andrew
>> >
>> > [1] https://github.com/apache/arrow-rs/pull/810
>>
>

Re: [Rust] Heads up: RUSTSEC security advisory against arrow-rs

Posted by Andrew Lamb <al...@influxdata.com>.

Thank you Antoine,

The C++ validation routine is super helpful, especially with respect to
validating nested structures. I will follow its lead

Andrew

On Fri, Oct 1, 2021 at 3:25 AM Antoine Pitrou <an...@python.org> wrote:

>
> In C++ we have dedicated validation routines that can be used against
> untrusted input. (*)  We also have fuzzing set up using OSS-Fuzz to
> validate that invalid input cannot crash the IPC reader.
>
> (*)
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/validate.h
>
>
> Le 01/10/2021 à 00:13, Andrew Lamb a écrit :
> > I have created a WIP PR for initial feedback on the approach of
> validating
> > ArrayData upon creation[1]. If there are no objections to the approach I
> > will complete the implementation over the next few days
> >
> > The approach that Sergey describes of `get` and `unsafe get_unchecked`
> > sounds like a good one to me if performance testing shows we need a
> bypass.
> >
> > Andrew
> >
> > [1] https://github.com/apache/arrow-rs/pull/810
>

Re: [Rust] Heads up: RUSTSEC security advisory against arrow-rs

Posted by Antoine Pitrou <an...@python.org>.

In C++ we have dedicated validation routines that can be used against 
untrusted input. (*)  We also have fuzzing set up using OSS-Fuzz to 
validate that invalid input cannot crash the IPC reader.

(*) 
https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/validate.h


Le 01/10/2021 à 00:13, Andrew Lamb a écrit :
> I have created a WIP PR for initial feedback on the approach of validating
> ArrayData upon creation[1]. If there are no objections to the approach I
> will complete the implementation over the next few days
> 
> The approach that Sergey describes of `get` and `unsafe get_unchecked`
> sounds like a good one to me if performance testing shows we need a bypass.
> 
> Andrew
> 
> [1] https://github.com/apache/arrow-rs/pull/810

Re: [Rust] Heads up: RUSTSEC security advisory against arrow-rs

Posted by Andrew Lamb <al...@influxdata.com>.

I have created a WIP PR for initial feedback on the approach of validating
ArrayData upon creation[1]. If there are no objections to the approach I
will complete the implementation over the next few days

The approach that Sergey describes of `get` and `unsafe get_unchecked`
sounds like a good one to me if performance testing shows we need a bypass.

Andrew

[1] https://github.com/apache/arrow-rs/pull/810

On Thu, Sep 30, 2021 at 5:28 PM Sergey Davidoff <sh...@gmail.com> wrote:

> I believe feature flags are not the right choice here. The problem is that
> feature flags are toggled globally, so the behavior of your crate can be
> affected by the behavior of some other crate that toggles the feature.
>
> A better approach is to provide two methods, one safe and the other
> unsafe, see for example:
> https://doc.rust-lang.org/stable/std/primitive.slice.html#method.get
>
> https://doc.rust-lang.org/stable/std/primitive.slice.html#method.get_unchecked
>
> Speaking of O(1) vs O(n), a correct program would perform these checks at
> least once anyway, and so the complexity is unavoidable. However, you can
> avoid re-checking bounds for the same buffer over and over if the state of
> the check is encoded in the type system - for example, the `&str` standard
> library type is just a slice that's already been checked and found to be
> valid UTF-8.
>
> Although I struggle to imagine bounds checks being a performance
> bottleneck when done once on object creation, given that removing them from
> hot loops usually wins you only a couple % in performance.
>
> чт, 30 сент. 2021 г. в 22:28, Andrew Lamb <al...@influxdata.com>:
>
>> I understand the need to avoid sacrificing performance as much as
>> possible. I have begun looking into adding validation into ArrayData::new
>> as you suggest.
>>
>> I am making progress, but haven't fully figured out the nested types yet.
>> Hope to have a PR up in the next day or two.
>>
>> Should we come up with something that adds unacceptable performance
>> overhead, we can also feature flag it (so that the check is done by
>> default, but can be disabled if needed)
>>
>> Andrew
>>
>> On Thu, Sep 30, 2021 at 5:33 AM Jörn Horstmann <
>> joern.horstmann@signavio.com> wrote:
>>
>>> Most of these issues seem to originate from creating arrays from
>>> ArrayData.
>>> While we could validate buffers in the XArray::from implementations, that
>>> would have some performance overhead and also some edge cases for nested
>>> data. Thinking about List<List<Utf8>>>, we recursively would need to
>>> validate and know which buffers are used for offsets or utf8 data. Since
>>> the root cause is always ArrayData::new (and possibly
>>> ArrayDataBuilder::finish) I would suggest marking those functions as
>>> unsafe.
>>>
>>> Adding checks for the top-level buffer length would still be useful, I'm
>>> just a bit opposed to having to iterate over whole buffers for
>>> validation,
>>> turning these operations from O(1) to O(n).
>>>
>>> Slightly related, GenericListArray::try_new_from_array_data currently
>>> validates that the first offset is 0. I think this check could actually
>>> be
>>> relaxed and the offset only needs to be in bounds for the child array.
>>>
>>> Jörn
>>>
>>>
>>> On Wed, Sep 29, 2021 at 9:12 PM Andrew Lamb <al...@influxdata.com>
>>> wrote:
>>>
>>> > Dear Rust Developers,
>>> >
>>> > As a heads up, several pre-existing security tickets filed against
>>> arrow-rs
>>> > have been added[1][2] to the RUSTSEC database[1][2] a few hours ago.
>>> The
>>> > author says he plans to file additional RUSTSEC entries for the other
>>> > security tickets [3].
>>> >
>>> > The criteria used for adding the arrow issues to the RUSTSEC database
>>> is
>>> > not clear to me,  but given widely used tools such as `cargo audit`
>>> report
>>> > such issues, it is likely that this will become an visible issue for
>>> our
>>> > users soon.
>>> >
>>> > Given this, I will likely start looking into existing security issues
>>> [4]
>>> > reported against arrow-rs and any help would be appreciated.
>>> >
>>> > Andrew
>>> >
>>> > [1] https://github.com/rustsec/advisory-db/pull/1057
>>> > [2] https://github.com/rustsec/advisory-db/pull/1059
>>> > [3]
>>> >
>>> https://github.com/rustsec/advisory-db/pull/1057#issuecomment-930455127
>>> > [4] https://github.com/apache/arrow-rs/labels/security
>>> >
>>>
>>>
>>> --
>>> *Jörn Horstmann* | Senior Backend Engineer
>>>
>>> www.signavio.com
>>> Kurfürstenstraße 111, 10787 Berlin, Germany
>>>
>>> Work with us! <https://hubs.ly/H0wwzcr0>
>>>
>>>
>>>
>>> <https://www.linkedin.com/company/signavio/>
>>> <https://www.twitter.com/signavio>   <https://www.facebook.com/signavio>
>>> <https://www.youtube.com/user/signavio>
>>> <https://www.xing.com/companies/signaviogmbh>
>>>
>>> [image: BPI Forum] <
>>> https://t-eu.xink.io/Tracking/Index/sSMAALBtAAAnu0MA0>
>>>
>>> HRB 121584 B Charlottenburg District Court, VAT ID: DE265675123
>>> Managing Directors: Dr. Gero Decker, Rouven Morato Adam
>>>
>>

Re: [Rust] Heads up: RUSTSEC security advisory against arrow-rs

Posted by Sergey Davidoff <sh...@gmail.com>.

I believe feature flags are not the right choice here. The problem is that
feature flags are toggled globally, so the behavior of your crate can be
affected by the behavior of some other crate that toggles the feature.

A better approach is to provide two methods, one safe and the other unsafe,
see for example:
https://doc.rust-lang.org/stable/std/primitive.slice.html#method.get
https://doc.rust-lang.org/stable/std/primitive.slice.html#method.get_unchecked

Speaking of O(1) vs O(n), a correct program would perform these checks at
least once anyway, and so the complexity is unavoidable. However, you can
avoid re-checking bounds for the same buffer over and over if the state of
the check is encoded in the type system - for example, the `&str` standard
library type is just a slice that's already been checked and found to be
valid UTF-8.

Although I struggle to imagine bounds checks being a performance bottleneck
when done once on object creation, given that removing them from hot loops
usually wins you only a couple % in performance.

чт, 30 сент. 2021 г. в 22:28, Andrew Lamb <al...@influxdata.com>:

> I understand the need to avoid sacrificing performance as much as
> possible. I have begun looking into adding validation into ArrayData::new
> as you suggest.
>
> I am making progress, but haven't fully figured out the nested types yet.
> Hope to have a PR up in the next day or two.
>
> Should we come up with something that adds unacceptable performance
> overhead, we can also feature flag it (so that the check is done by
> default, but can be disabled if needed)
>
> Andrew
>
> On Thu, Sep 30, 2021 at 5:33 AM Jörn Horstmann <
> joern.horstmann@signavio.com> wrote:
>
>> Most of these issues seem to originate from creating arrays from
>> ArrayData.
>> While we could validate buffers in the XArray::from implementations, that
>> would have some performance overhead and also some edge cases for nested
>> data. Thinking about List<List<Utf8>>>, we recursively would need to
>> validate and know which buffers are used for offsets or utf8 data. Since
>> the root cause is always ArrayData::new (and possibly
>> ArrayDataBuilder::finish) I would suggest marking those functions as
>> unsafe.
>>
>> Adding checks for the top-level buffer length would still be useful, I'm
>> just a bit opposed to having to iterate over whole buffers for validation,
>> turning these operations from O(1) to O(n).
>>
>> Slightly related, GenericListArray::try_new_from_array_data currently
>> validates that the first offset is 0. I think this check could actually be
>> relaxed and the offset only needs to be in bounds for the child array.
>>
>> Jörn
>>
>>
>> On Wed, Sep 29, 2021 at 9:12 PM Andrew Lamb <al...@influxdata.com> wrote:
>>
>> > Dear Rust Developers,
>> >
>> > As a heads up, several pre-existing security tickets filed against
>> arrow-rs
>> > have been added[1][2] to the RUSTSEC database[1][2] a few hours ago. The
>> > author says he plans to file additional RUSTSEC entries for the other
>> > security tickets [3].
>> >
>> > The criteria used for adding the arrow issues to the RUSTSEC database is
>> > not clear to me,  but given widely used tools such as `cargo audit`
>> report
>> > such issues, it is likely that this will become an visible issue for our
>> > users soon.
>> >
>> > Given this, I will likely start looking into existing security issues
>> [4]
>> > reported against arrow-rs and any help would be appreciated.
>> >
>> > Andrew
>> >
>> > [1] https://github.com/rustsec/advisory-db/pull/1057
>> > [2] https://github.com/rustsec/advisory-db/pull/1059
>> > [3]
>> > https://github.com/rustsec/advisory-db/pull/1057#issuecomment-930455127
>> > [4] https://github.com/apache/arrow-rs/labels/security
>> >
>>
>>
>> --
>> *Jörn Horstmann* | Senior Backend Engineer
>>
>> www.signavio.com
>> Kurfürstenstraße 111, 10787 Berlin, Germany
>>
>> Work with us! <https://hubs.ly/H0wwzcr0>
>>
>>
>>
>> <https://www.linkedin.com/company/signavio/>
>> <https://www.twitter.com/signavio>   <https://www.facebook.com/signavio>
>> <https://www.youtube.com/user/signavio>
>> <https://www.xing.com/companies/signaviogmbh>
>>
>> [image: BPI Forum] <https://t-eu.xink.io/Tracking/Index/sSMAALBtAAAnu0MA0
>> >
>>
>> HRB 121584 B Charlottenburg District Court, VAT ID: DE265675123
>> Managing Directors: Dr. Gero Decker, Rouven Morato Adam
>>
>

Re: [Rust] Heads up: RUSTSEC security advisory against arrow-rs

Posted by Andrew Lamb <al...@influxdata.com>.

I understand the need to avoid sacrificing performance as much as possible.
I have begun looking into adding validation into ArrayData::new as you
suggest.

I am making progress, but haven't fully figured out the nested types yet.
Hope to have a PR up in the next day or two.

Should we come up with something that adds unacceptable performance
overhead, we can also feature flag it (so that the check is done by
default, but can be disabled if needed)

Andrew

On Thu, Sep 30, 2021 at 5:33 AM Jörn Horstmann <jo...@signavio.com>
wrote:

> Most of these issues seem to originate from creating arrays from ArrayData.
> While we could validate buffers in the XArray::from implementations, that
> would have some performance overhead and also some edge cases for nested
> data. Thinking about List<List<Utf8>>>, we recursively would need to
> validate and know which buffers are used for offsets or utf8 data. Since
> the root cause is always ArrayData::new (and possibly
> ArrayDataBuilder::finish) I would suggest marking those functions as
> unsafe.
>
> Adding checks for the top-level buffer length would still be useful, I'm
> just a bit opposed to having to iterate over whole buffers for validation,
> turning these operations from O(1) to O(n).
>
> Slightly related, GenericListArray::try_new_from_array_data currently
> validates that the first offset is 0. I think this check could actually be
> relaxed and the offset only needs to be in bounds for the child array.
>
> Jörn
>
>
> On Wed, Sep 29, 2021 at 9:12 PM Andrew Lamb <al...@influxdata.com> wrote:
>
> > Dear Rust Developers,
> >
> > As a heads up, several pre-existing security tickets filed against
> arrow-rs
> > have been added[1][2] to the RUSTSEC database[1][2] a few hours ago. The
> > author says he plans to file additional RUSTSEC entries for the other
> > security tickets [3].
> >
> > The criteria used for adding the arrow issues to the RUSTSEC database is
> > not clear to me,  but given widely used tools such as `cargo audit`
> report
> > such issues, it is likely that this will become an visible issue for our
> > users soon.
> >
> > Given this, I will likely start looking into existing security issues [4]
> > reported against arrow-rs and any help would be appreciated.
> >
> > Andrew
> >
> > [1] https://github.com/rustsec/advisory-db/pull/1057
> > [2] https://github.com/rustsec/advisory-db/pull/1059
> > [3]
> > https://github.com/rustsec/advisory-db/pull/1057#issuecomment-930455127
> > [4] https://github.com/apache/arrow-rs/labels/security
> >
>
>
> --
> *Jörn Horstmann* | Senior Backend Engineer
>
> www.signavio.com
> Kurfürstenstraße 111, 10787 Berlin, Germany
>
> Work with us! <https://hubs.ly/H0wwzcr0>
>
>
>
> <https://www.linkedin.com/company/signavio/>
> <https://www.twitter.com/signavio>   <https://www.facebook.com/signavio>
> <https://www.youtube.com/user/signavio>
> <https://www.xing.com/companies/signaviogmbh>
>
> [image: BPI Forum] <https://t-eu.xink.io/Tracking/Index/sSMAALBtAAAnu0MA0>
>
> HRB 121584 B Charlottenburg District Court, VAT ID: DE265675123
> Managing Directors: Dr. Gero Decker, Rouven Morato Adam
>

Re: [Rust] Heads up: RUSTSEC security advisory against arrow-rs

Posted by Jörn Horstmann <jo...@signavio.com>.

Most of these issues seem to originate from creating arrays from ArrayData.
While we could validate buffers in the XArray::from implementations, that
would have some performance overhead and also some edge cases for nested
data. Thinking about List<List<Utf8>>>, we recursively would need to
validate and know which buffers are used for offsets or utf8 data. Since
the root cause is always ArrayData::new (and possibly
ArrayDataBuilder::finish) I would suggest marking those functions as unsafe.

Adding checks for the top-level buffer length would still be useful, I'm
just a bit opposed to having to iterate over whole buffers for validation,
turning these operations from O(1) to O(n).

Slightly related, GenericListArray::try_new_from_array_data currently
validates that the first offset is 0. I think this check could actually be
relaxed and the offset only needs to be in bounds for the child array.

Jörn

On Wed, Sep 29, 2021 at 9:12 PM Andrew Lamb <al...@influxdata.com> wrote:

> Dear Rust Developers,
>
> As a heads up, several pre-existing security tickets filed against arrow-rs
> have been added[1][2] to the RUSTSEC database[1][2] a few hours ago. The
> author says he plans to file additional RUSTSEC entries for the other
> security tickets [3].
>
> The criteria used for adding the arrow issues to the RUSTSEC database is
> not clear to me,  but given widely used tools such as `cargo audit` report
> such issues, it is likely that this will become an visible issue for our
> users soon.
>
> Given this, I will likely start looking into existing security issues [4]
> reported against arrow-rs and any help would be appreciated.
>
> Andrew
>
> [1] https://github.com/rustsec/advisory-db/pull/1057
> [2] https://github.com/rustsec/advisory-db/pull/1059
> [3]
> https://github.com/rustsec/advisory-db/pull/1057#issuecomment-930455127
> [4] https://github.com/apache/arrow-rs/labels/security
>

-- 
*Jörn Horstmann* | Senior Backend Engineer

www.signavio.com
Kurfürstenstraße 111, 10787 Berlin, Germany

Work with us! <https://hubs.ly/H0wwzcr0>

<https://www.linkedin.com/company/signavio/>
<https://www.twitter.com/signavio>   <https://www.facebook.com/signavio>
<https://www.youtube.com/user/signavio>
<https://www.xing.com/companies/signaviogmbh>

[image: BPI Forum] <https://t-eu.xink.io/Tracking/Index/sSMAALBtAAAnu0MA0>

HRB 121584 B Charlottenburg District Court, VAT ID: DE265675123
Managing Directors: Dr. Gero Decker, Rouven Morato Adam