You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@arrow.apache.org by Dimitri Vorona <al...@googlemail.com.INVALID> on 2018/07/03 11:11:33 UTC

Uninitialized buffer memory leads to buffer warnings

Hi all,

currently, running json-integration-test with valgrind leads to the
following warning: "Syscall param write(buf) points to uninitialised
byte(s)". This is caused by PrimitiveBufferBuilder not initializing its
data memory. Note: we initialize null_bitmap_data_ by zeroing, i.e. setting
all values to not-null by default.

Since having tests pass valgrind might be desirable for the CI, I think we
should fix this warning. There are a couple of possibilities:

1. Add suppression. The specs doesn't require padding to have a specific
value, so we might consider it to be false positive
2. Add initialization of the padding bytes to ArrayBuilder::FinishIntenal
implementations.
3. Generally zero-initialize memory in PoolBuffer. Might be too expensive.

Or course there could be number of other options, includeing "do nothing".
If we settle on a best option, I can make a PR.

Cheers,
Dimitri.

Re: Uninitialized buffer memory leads to buffer warnings

Posted by Wes McKinney <we...@gmail.com>.
hi Dimitri,

On Mon, Jul 9, 2018 at 4:12 AM, Dimitri Vorona
<al...@googlemail.com.invalid> wrote:
> Hi all,
>
> now that PR 2216 has been merged, many of the mentioned problems have been
> alleviated. To prevent similar problems, I'll add, as Philipp suggested, a
> debug-build check to the buffer builders, which ensures that the finish
> method has been called before the builder destructs.

Great, this sounds useful. The downside to this is that it would cause
debug builds to terminate if the builder is destructed for some other
reason. It might be better to just have logging output in debug builds
rather than a DCHECK

>
> Unfortunately, not every buffer is built with the builders. Since the
> buffers are allocated uninitialized, it's really easy to forget about
> padding. The Copy method of the buffer, for example, leaves the padding of
> the target buffer uninitialized, except in the rare case where size_ ==
> capacity_. Basically, before a newly allocated buffer is used, you have to
> make sure it's zero padded.

Thank you for catching this; I agree we should always zero the bytes
between size_ and capacity_. We could go even further by putting this
logic in the buffer implementations (Resize, etc.)

>
> I experimented with changing Buffer::data method's semantics to check if
> the buffer is zero padded in the debug builds. It turned up a number of
> problems (the ray one including), but also a great deal of false positives,
> so I'm not sure if it's the right way forward.
>
> I think we should at least add a AllocateBufferZeroPadded method and make
> explicit, that you should use it, if you don't plan on resizing the buffer
> after allocation. Also, a Buffer::ZeroPadding method which memsets
> everything between size_ and capacity_ to 0 might be useful, since it would
> catch programmers attention to the need. Possible checks can then be added
> as needed.

Sounds good to me. Thank you for looking into this.

>
> Cheers,
> Dimitri.

Re: Uninitialized buffer memory leads to buffer warnings

Posted by Dimitri Vorona <al...@googlemail.com.INVALID>.
Hi all,

now that PR 2216 has been merged, many of the mentioned problems have been
alleviated. To prevent similar problems, I'll add, as Philipp suggested, a
debug-build check to the buffer builders, which ensures that the finish
method has been called before the builder destructs.

Unfortunately, not every buffer is built with the builders. Since the
buffers are allocated uninitialized, it's really easy to forget about
padding. The Copy method of the buffer, for example, leaves the padding of
the target buffer uninitialized, except in the rare case where size_ ==
capacity_. Basically, before a newly allocated buffer is used, you have to
make sure it's zero padded.

I experimented with changing Buffer::data method's semantics to check if
the buffer is zero padded in the debug builds. It turned up a number of
problems (the ray one including), but also a great deal of false positives,
so I'm not sure if it's the right way forward.

I think we should at least add a AllocateBufferZeroPadded method and make
explicit, that you should use it, if you don't plan on resizing the buffer
after allocation. Also, a Buffer::ZeroPadding method which memsets
everything between size_ and capacity_ to 0 might be useful, since it would
catch programmers attention to the need. Possible checks can then be added
as needed.

Cheers,
Dimitri.

On Tue, Jul 3, 2018 at 4:53 PM Wes McKinney <we...@gmail.com> wrote:

> hi Dimitri,
>
> Zeroing the padding is probably a good idea; I'd be interested to look
> at the diff to see how many code paths are impacted. We already have a
> number of places where we are zeroing buffers but as you have found it
> is not 100% consistent.
>
> - Wes
>
> On Tue, Jul 3, 2018 at 10:13 AM, Dimitri Vorona
> <al...@googlemail.com.invalid> wrote:
> > Hi,
> >
> > I misunderstood the issue: the zeroing of the padding is correctly
> handled
> > by the writers, as long as they are careful not to read beyond the
> > specified size of the buffers (as opposed to its capacity).
> >
> > The valgrind warning came from the way AppendNull(s) is implemented in
> some
> > of the builders: it just sets the nullmap and increments lengths, but
> > doesn't set the data, which leads to uninitialized memory chunks, where
> > nulls where inserted. I'll send a PR which addresses it.
> >
> > I'm still not sure if we should add zero the padding anyway, since the
> > buffers can be shared in any number of ways. Zeroing the last couple of
> > bytes shouldn't have any performance impact and could save a headache in
> > the future.
> >
> > Cheers.
> >
> > On Tue, Jul 3, 2018 at 1:15 PM Antoine Pitrou <an...@python.org>
> wrote:
> >
> >>
> >> Hi,
> >>
> >> Writing uninitialized memory risks leaking private data.  This might
> >> lead to security issues.
> >>
> >> I'd go for option 2.  Option 3 sounds much more costly (we would be
> >> zero-initializing large memory areas instead of small padding areas).
> >>
> >> Regards
> >>
> >> Antoine.
> >>
> >>
> >> Le 03/07/2018 à 13:11, Dimitri Vorona a écrit :
> >> > Hi all,
> >> >
> >> > currently, running json-integration-test with valgrind leads to the
> >> > following warning: "Syscall param write(buf) points to uninitialised
> >> > byte(s)". This is caused by PrimitiveBufferBuilder not initializing
> its
> >> > data memory. Note: we initialize null_bitmap_data_ by zeroing, i.e.
> >> setting
> >> > all values to not-null by default.
> >> >
> >> > Since having tests pass valgrind might be desirable for the CI, I
> think
> >> we
> >> > should fix this warning. There are a couple of possibilities:
> >> >
> >> > 1. Add suppression. The specs doesn't require padding to have a
> specific
> >> > value, so we might consider it to be false positive
> >> > 2. Add initialization of the padding bytes to
> ArrayBuilder::FinishIntenal
> >> > implementations.
> >> > 3. Generally zero-initialize memory in PoolBuffer. Might be too
> >> expensive.
> >> >
> >> > Or course there could be number of other options, includeing "do
> >> nothing".
> >> > If we settle on a best option, I can make a PR.
> >> >
> >> > Cheers,
> >> > Dimitri.
> >> >
> >>
>

Re: Uninitialized buffer memory leads to buffer warnings

Posted by Wes McKinney <we...@gmail.com>.
hi Dimitri,

Zeroing the padding is probably a good idea; I'd be interested to look
at the diff to see how many code paths are impacted. We already have a
number of places where we are zeroing buffers but as you have found it
is not 100% consistent.

- Wes

On Tue, Jul 3, 2018 at 10:13 AM, Dimitri Vorona
<al...@googlemail.com.invalid> wrote:
> Hi,
>
> I misunderstood the issue: the zeroing of the padding is correctly handled
> by the writers, as long as they are careful not to read beyond the
> specified size of the buffers (as opposed to its capacity).
>
> The valgrind warning came from the way AppendNull(s) is implemented in some
> of the builders: it just sets the nullmap and increments lengths, but
> doesn't set the data, which leads to uninitialized memory chunks, where
> nulls where inserted. I'll send a PR which addresses it.
>
> I'm still not sure if we should add zero the padding anyway, since the
> buffers can be shared in any number of ways. Zeroing the last couple of
> bytes shouldn't have any performance impact and could save a headache in
> the future.
>
> Cheers.
>
> On Tue, Jul 3, 2018 at 1:15 PM Antoine Pitrou <an...@python.org> wrote:
>
>>
>> Hi,
>>
>> Writing uninitialized memory risks leaking private data.  This might
>> lead to security issues.
>>
>> I'd go for option 2.  Option 3 sounds much more costly (we would be
>> zero-initializing large memory areas instead of small padding areas).
>>
>> Regards
>>
>> Antoine.
>>
>>
>> Le 03/07/2018 à 13:11, Dimitri Vorona a écrit :
>> > Hi all,
>> >
>> > currently, running json-integration-test with valgrind leads to the
>> > following warning: "Syscall param write(buf) points to uninitialised
>> > byte(s)". This is caused by PrimitiveBufferBuilder not initializing its
>> > data memory. Note: we initialize null_bitmap_data_ by zeroing, i.e.
>> setting
>> > all values to not-null by default.
>> >
>> > Since having tests pass valgrind might be desirable for the CI, I think
>> we
>> > should fix this warning. There are a couple of possibilities:
>> >
>> > 1. Add suppression. The specs doesn't require padding to have a specific
>> > value, so we might consider it to be false positive
>> > 2. Add initialization of the padding bytes to ArrayBuilder::FinishIntenal
>> > implementations.
>> > 3. Generally zero-initialize memory in PoolBuffer. Might be too
>> expensive.
>> >
>> > Or course there could be number of other options, includeing "do
>> nothing".
>> > If we settle on a best option, I can make a PR.
>> >
>> > Cheers,
>> > Dimitri.
>> >
>>

Re: Uninitialized buffer memory leads to buffer warnings

Posted by Dimitri Vorona <al...@googlemail.com.INVALID>.
Hi,

I misunderstood the issue: the zeroing of the padding is correctly handled
by the writers, as long as they are careful not to read beyond the
specified size of the buffers (as opposed to its capacity).

The valgrind warning came from the way AppendNull(s) is implemented in some
of the builders: it just sets the nullmap and increments lengths, but
doesn't set the data, which leads to uninitialized memory chunks, where
nulls where inserted. I'll send a PR which addresses it.

I'm still not sure if we should add zero the padding anyway, since the
buffers can be shared in any number of ways. Zeroing the last couple of
bytes shouldn't have any performance impact and could save a headache in
the future.

Cheers.

On Tue, Jul 3, 2018 at 1:15 PM Antoine Pitrou <an...@python.org> wrote:

>
> Hi,
>
> Writing uninitialized memory risks leaking private data.  This might
> lead to security issues.
>
> I'd go for option 2.  Option 3 sounds much more costly (we would be
> zero-initializing large memory areas instead of small padding areas).
>
> Regards
>
> Antoine.
>
>
> Le 03/07/2018 à 13:11, Dimitri Vorona a écrit :
> > Hi all,
> >
> > currently, running json-integration-test with valgrind leads to the
> > following warning: "Syscall param write(buf) points to uninitialised
> > byte(s)". This is caused by PrimitiveBufferBuilder not initializing its
> > data memory. Note: we initialize null_bitmap_data_ by zeroing, i.e.
> setting
> > all values to not-null by default.
> >
> > Since having tests pass valgrind might be desirable for the CI, I think
> we
> > should fix this warning. There are a couple of possibilities:
> >
> > 1. Add suppression. The specs doesn't require padding to have a specific
> > value, so we might consider it to be false positive
> > 2. Add initialization of the padding bytes to ArrayBuilder::FinishIntenal
> > implementations.
> > 3. Generally zero-initialize memory in PoolBuffer. Might be too
> expensive.
> >
> > Or course there could be number of other options, includeing "do
> nothing".
> > If we settle on a best option, I can make a PR.
> >
> > Cheers,
> > Dimitri.
> >
>

Re: Uninitialized buffer memory leads to buffer warnings

Posted by Antoine Pitrou <an...@python.org>.
Hi,

Writing uninitialized memory risks leaking private data.  This might
lead to security issues.

I'd go for option 2.  Option 3 sounds much more costly (we would be
zero-initializing large memory areas instead of small padding areas).

Regards

Antoine.


Le 03/07/2018 à 13:11, Dimitri Vorona a écrit :
> Hi all,
> 
> currently, running json-integration-test with valgrind leads to the
> following warning: "Syscall param write(buf) points to uninitialised
> byte(s)". This is caused by PrimitiveBufferBuilder not initializing its
> data memory. Note: we initialize null_bitmap_data_ by zeroing, i.e. setting
> all values to not-null by default.
> 
> Since having tests pass valgrind might be desirable for the CI, I think we
> should fix this warning. There are a couple of possibilities:
> 
> 1. Add suppression. The specs doesn't require padding to have a specific
> value, so we might consider it to be false positive
> 2. Add initialization of the padding bytes to ArrayBuilder::FinishIntenal
> implementations.
> 3. Generally zero-initialize memory in PoolBuffer. Might be too expensive.
> 
> Or course there could be number of other options, includeing "do nothing".
> If we settle on a best option, I can make a PR.
> 
> Cheers,
> Dimitri.
>