You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/05/21 18:48:04 UTC

[jira] [Created] (DRILL-5530) IntVectors are sometimes no zero-filled on allocation

Paul Rogers created DRILL-5530:
----------------------------------

             Summary: IntVectors are sometimes no zero-filled on allocation
                 Key: DRILL-5530
                 URL: https://issues.apache.org/jira/browse/DRILL-5530
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.8.0
            Reporter: Paul Rogers


Much of Drill's vector code is based on the assumption that vectors are initialized to 0s.

For example, consider the "bit" (is-set) vector used for nullable types. The vector is presumed initialized to 0 so that all fields are set to not-set (e.g. null) by default. For example, to allocate a new vector the code (slightly edited) is:

{code}
  private void allocateBytes(final long size) {
    final int curSize = (int) size;
    clear();
    data = allocator.buffer(curSize);
    data.setZero(0, data.capacity()); // Zero whole vector
    allocationSizeInBytes = curSize;
  }
{code}

The code to reallocate (grow) the vector zeros the new slots:

{code}
  public void reAlloc() {
    final long newAllocationSize = allocationSizeInBytes * 2L;
    final int curSize = (int)newAllocationSize;
    final DrillBuf newBuf = allocator.buffer(curSize);
    newBuf.setZero(0, newBuf.capacity());
    newBuf.setBytes(0, data, 0, data.capacity()); // Zero new part
    data.release();
    data = newBuf;
    allocationSizeInBytes = curSize;
  }
{code}

However, it turns out that other vectors omit the initial zero step. Consider the {{IntVector}}:

{code}
  private void allocateBytes(final long size) {
    final int curSize = (int)size;
    clear();
    data = allocator.buffer(curSize);
    // Notice no call here to zero the buffer
    data.readerIndex(0);
    allocationSizeInBytes = curSize;
  }
{code}

Now things start to get strange. If the buffer is newly allocated by Netty from the OS, it will contain zeros. Thus, most test cases will get a zero buffer.

Further, the reallocation for ints *does* zero the new half of the buffer:

{code}
  public void reAlloc() {
    final long newAllocationSize = allocationSizeInBytes * 2L;
    final DrillBuf newBuf = allocator.buffer((int)newAllocationSize);
    newBuf.setBytes(0, data, 0, data.capacity());
    final int halfNewCapacity = newBuf.capacity() / 2;
    newBuf.setZero(halfNewCapacity, halfNewCapacity); // Zero new bytes
    newBuf.writerIndex(data.writerIndex());
    data.release(1);
    data = newBuf;
    allocationSizeInBytes = (int)newAllocationSize;
  }
{code}

The result is that, most times, the bytes of the vector will be zero. But, the first allocation may be non-zero if Netty reuses an existing block from Netty's free list.

If any code assumes that values default to zero, that code will fail -- but only for the first few bytes and only if run long enough for the buffer to be a "recycled" "dirty" one.

This was found by a test that filled vectors with runs of 'X' values for one test, followed by another test that allocated an int vector and inspected its contents.

This asymmetry is just asking for bugs to appear. To make clear that only bit vectors are zero filled:

* Remove the {{setZero()}} call when reallocating vectors.
* When assertions are on, fill the buffer with dummy data (such as 0b1010_1010) to flush out any code that happens to depend on zero values.
* Code that needs to "back-fill" values (such as when columns are missing in some rows in a reader) should do the back-filling explicitly rather than assuming zeros.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)