You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Paul Rogers (Jira)" <ji...@apache.org> on 2020/02/20 19:22:00 UTC

[jira] [Closed] (DRILL-5530) IntVectors are sometimes not zero-filled on allocation

     [ https://issues.apache.org/jira/browse/DRILL-5530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers closed DRILL-5530.
------------------------------
    Resolution: Not A Problem

As explained in a note above, the described behavior is by design. The reason this bug was filed is that some bits of code assumed (hoped?) that all buffers were zero filled. We have since recognized we do not want to pay the cost of zero-fill unnecessarily, and code has been hardened to work with dirty buffers.

> IntVectors are sometimes not zero-filled on allocation
> ------------------------------------------------------
>
>                 Key: DRILL-5530
>                 URL: https://issues.apache.org/jira/browse/DRILL-5530
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>
> Much of Drill's vector code is based on the assumption that vectors are initialized to 0s.
> For example, consider the "bit" (is-set) vector used for nullable types. The vector is presumed initialized to 0 so that all fields are set to not-set (e.g. null) by default. For example, to allocate a new vector the code (slightly edited) is:
> {code}
>   private void allocateBytes(final long size) {
>     final int curSize = (int) size;
>     clear();
>     data = allocator.buffer(curSize);
>     data.setZero(0, data.capacity()); // Zero whole vector
>     allocationSizeInBytes = curSize;
>   }
> {code}
> The code to reallocate (grow) the vector zeros the new slots:
> {code}
>   public void reAlloc() {
>     final long newAllocationSize = allocationSizeInBytes * 2L;
>     final int curSize = (int)newAllocationSize;
>     final DrillBuf newBuf = allocator.buffer(curSize);
>     newBuf.setZero(0, newBuf.capacity());
>     newBuf.setBytes(0, data, 0, data.capacity()); // Zero new part
>     data.release();
>     data = newBuf;
>     allocationSizeInBytes = curSize;
>   }
> {code}
> However, it turns out that other vectors omit the initial zero step. Consider the {{IntVector}}:
> {code}
>   private void allocateBytes(final long size) {
>     final int curSize = (int)size;
>     clear();
>     data = allocator.buffer(curSize);
>     // Notice no call here to zero the buffer
>     data.readerIndex(0);
>     allocationSizeInBytes = curSize;
>   }
> {code}
> Now things start to get strange. If the buffer is newly allocated by Netty from the OS, it will contain zeros. Thus, most test cases will get a zero buffer. But, [Netty does *not* zero the buffer|http://netty.io/wiki/using-as-a-generic-library.html#performance] if the buffer is recycled from Netty's free list.
> Further, the reallocation for ints *does* zero the new half of the buffer:
> {code}
>   public void reAlloc() {
>     final long newAllocationSize = allocationSizeInBytes * 2L;
>     final DrillBuf newBuf = allocator.buffer((int)newAllocationSize);
>     newBuf.setBytes(0, data, 0, data.capacity());
>     final int halfNewCapacity = newBuf.capacity() / 2;
>     newBuf.setZero(halfNewCapacity, halfNewCapacity); // Zero new bytes
>     newBuf.writerIndex(data.writerIndex());
>     data.release(1);
>     data = newBuf;
>     allocationSizeInBytes = (int)newAllocationSize;
>   }
> {code}
> The result is that, most times, the bytes of the vector will be zero. But, the first allocation may be non-zero if Netty reuses an existing block from Netty's free list.
> If any code assumes that values default to zero, that code will fail -- but only for the first few bytes and only if run long enough for the buffer to be a "recycled" "dirty" one.
> This was found by a test that filled vectors with runs of 'X' values for one test, followed by another test that allocated an int vector and inspected its contents.
> This asymmetry is just asking for bugs to appear. To make clear that only bit vectors are zero filled:
> * Remove the {{setZero()}} call when reallocating vectors.
> * When assertions are on, fill the buffer with dummy data (such as 0b1010_1010) to flush out any code that happens to depend on zero values.
> * Code that needs to "back-fill" values (such as when columns are missing in some rows in a reader) should do the back-filling explicitly rather than assuming zeros.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)