You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Oleg Zinoviev (Jira)" <ji...@apache.org> on 2020/02/20 14:52:00 UTC

[jira] [Commented] (DRILL-5530) IntVectors are sometimes not zero-filled on allocation

    [ https://issues.apache.org/jira/browse/DRILL-5530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041050#comment-17041050 ] 

Oleg Zinoviev commented on DRILL-5530:
--------------------------------------

[~paul-rogers]
I accidentally discovered a similar behavior when sequentially allocating and freeing a buffer. Here is a test that demonstrates this behavior.


{code:java}
 try (OperatorFixture operatorFixture = new OperatorFixture.Builder(baseDirTestWatcher).build()) {
      DrillBuf buffer = operatorFixture.allocator().buffer(8);
      buffer.setByte(0, 1);
      long addr = buffer.memoryAddress();

      buffer.release();
      buffer = operatorFixture.allocator().buffer(8);

      Assert.assertEquals(addr, buffer.memoryAddress());        // same address
      Assert.assertEquals(0, buffer.getByte(0)); // <== fail, since buffer.getByte(0) return 1
    }
{code}

In my case, a non-zero buffer is returned from PooledByteBufAllocatorL :: threadCache () :: directArena :: allocate (...)

> IntVectors are sometimes not zero-filled on allocation
> ------------------------------------------------------
>
>                 Key: DRILL-5530
>                 URL: https://issues.apache.org/jira/browse/DRILL-5530
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Priority: Major
>
> Much of Drill's vector code is based on the assumption that vectors are initialized to 0s.
> For example, consider the "bit" (is-set) vector used for nullable types. The vector is presumed initialized to 0 so that all fields are set to not-set (e.g. null) by default. For example, to allocate a new vector the code (slightly edited) is:
> {code}
>   private void allocateBytes(final long size) {
>     final int curSize = (int) size;
>     clear();
>     data = allocator.buffer(curSize);
>     data.setZero(0, data.capacity()); // Zero whole vector
>     allocationSizeInBytes = curSize;
>   }
> {code}
> The code to reallocate (grow) the vector zeros the new slots:
> {code}
>   public void reAlloc() {
>     final long newAllocationSize = allocationSizeInBytes * 2L;
>     final int curSize = (int)newAllocationSize;
>     final DrillBuf newBuf = allocator.buffer(curSize);
>     newBuf.setZero(0, newBuf.capacity());
>     newBuf.setBytes(0, data, 0, data.capacity()); // Zero new part
>     data.release();
>     data = newBuf;
>     allocationSizeInBytes = curSize;
>   }
> {code}
> However, it turns out that other vectors omit the initial zero step. Consider the {{IntVector}}:
> {code}
>   private void allocateBytes(final long size) {
>     final int curSize = (int)size;
>     clear();
>     data = allocator.buffer(curSize);
>     // Notice no call here to zero the buffer
>     data.readerIndex(0);
>     allocationSizeInBytes = curSize;
>   }
> {code}
> Now things start to get strange. If the buffer is newly allocated by Netty from the OS, it will contain zeros. Thus, most test cases will get a zero buffer. But, [Netty does *not* zero the buffer|http://netty.io/wiki/using-as-a-generic-library.html#performance] if the buffer is recycled from Netty's free list.
> Further, the reallocation for ints *does* zero the new half of the buffer:
> {code}
>   public void reAlloc() {
>     final long newAllocationSize = allocationSizeInBytes * 2L;
>     final DrillBuf newBuf = allocator.buffer((int)newAllocationSize);
>     newBuf.setBytes(0, data, 0, data.capacity());
>     final int halfNewCapacity = newBuf.capacity() / 2;
>     newBuf.setZero(halfNewCapacity, halfNewCapacity); // Zero new bytes
>     newBuf.writerIndex(data.writerIndex());
>     data.release(1);
>     data = newBuf;
>     allocationSizeInBytes = (int)newAllocationSize;
>   }
> {code}
> The result is that, most times, the bytes of the vector will be zero. But, the first allocation may be non-zero if Netty reuses an existing block from Netty's free list.
> If any code assumes that values default to zero, that code will fail -- but only for the first few bytes and only if run long enough for the buffer to be a "recycled" "dirty" one.
> This was found by a test that filled vectors with runs of 'X' values for one test, followed by another test that allocated an int vector and inspected its contents.
> This asymmetry is just asking for bugs to appear. To make clear that only bit vectors are zero filled:
> * Remove the {{setZero()}} call when reallocating vectors.
> * When assertions are on, fill the buffer with dummy data (such as 0b1010_1010) to flush out any code that happens to depend on zero values.
> * Code that needs to "back-fill" values (such as when columns are missing in some rows in a reader) should do the back-filling explicitly rather than assuming zeros.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)