You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Oleg Zinoviev (Jira)" <ji...@apache.org> on 2020/02/20 14:52:00 UTC
[jira] [Commented] (DRILL-5530) IntVectors are sometimes not
zero-filled on allocation
[ https://issues.apache.org/jira/browse/DRILL-5530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041050#comment-17041050 ]
Oleg Zinoviev commented on DRILL-5530:
--------------------------------------
[~paul-rogers]
I accidentally discovered a similar behavior when sequentially allocating and freeing a buffer. Here is a test that demonstrates this behavior.
{code:java}
try (OperatorFixture operatorFixture = new OperatorFixture.Builder(baseDirTestWatcher).build()) {
DrillBuf buffer = operatorFixture.allocator().buffer(8);
buffer.setByte(0, 1);
long addr = buffer.memoryAddress();
buffer.release();
buffer = operatorFixture.allocator().buffer(8);
Assert.assertEquals(addr, buffer.memoryAddress()); // same address
Assert.assertEquals(0, buffer.getByte(0)); // <== fail, since buffer.getByte(0) return 1
}
{code}
In my case, a non-zero buffer is returned from PooledByteBufAllocatorL :: threadCache () :: directArena :: allocate (...)
> IntVectors are sometimes not zero-filled on allocation
> ------------------------------------------------------
>
> Key: DRILL-5530
> URL: https://issues.apache.org/jira/browse/DRILL-5530
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.8.0
> Reporter: Paul Rogers
> Priority: Major
>
> Much of Drill's vector code is based on the assumption that vectors are initialized to 0s.
> For example, consider the "bit" (is-set) vector used for nullable types. The vector is presumed initialized to 0 so that all fields are set to not-set (e.g. null) by default. For example, to allocate a new vector the code (slightly edited) is:
> {code}
> private void allocateBytes(final long size) {
> final int curSize = (int) size;
> clear();
> data = allocator.buffer(curSize);
> data.setZero(0, data.capacity()); // Zero whole vector
> allocationSizeInBytes = curSize;
> }
> {code}
> The code to reallocate (grow) the vector zeros the new slots:
> {code}
> public void reAlloc() {
> final long newAllocationSize = allocationSizeInBytes * 2L;
> final int curSize = (int)newAllocationSize;
> final DrillBuf newBuf = allocator.buffer(curSize);
> newBuf.setZero(0, newBuf.capacity());
> newBuf.setBytes(0, data, 0, data.capacity()); // Zero new part
> data.release();
> data = newBuf;
> allocationSizeInBytes = curSize;
> }
> {code}
> However, it turns out that other vectors omit the initial zero step. Consider the {{IntVector}}:
> {code}
> private void allocateBytes(final long size) {
> final int curSize = (int)size;
> clear();
> data = allocator.buffer(curSize);
> // Notice no call here to zero the buffer
> data.readerIndex(0);
> allocationSizeInBytes = curSize;
> }
> {code}
> Now things start to get strange. If the buffer is newly allocated by Netty from the OS, it will contain zeros. Thus, most test cases will get a zero buffer. But, [Netty does *not* zero the buffer|http://netty.io/wiki/using-as-a-generic-library.html#performance] if the buffer is recycled from Netty's free list.
> Further, the reallocation for ints *does* zero the new half of the buffer:
> {code}
> public void reAlloc() {
> final long newAllocationSize = allocationSizeInBytes * 2L;
> final DrillBuf newBuf = allocator.buffer((int)newAllocationSize);
> newBuf.setBytes(0, data, 0, data.capacity());
> final int halfNewCapacity = newBuf.capacity() / 2;
> newBuf.setZero(halfNewCapacity, halfNewCapacity); // Zero new bytes
> newBuf.writerIndex(data.writerIndex());
> data.release(1);
> data = newBuf;
> allocationSizeInBytes = (int)newAllocationSize;
> }
> {code}
> The result is that, most times, the bytes of the vector will be zero. But, the first allocation may be non-zero if Netty reuses an existing block from Netty's free list.
> If any code assumes that values default to zero, that code will fail -- but only for the first few bytes and only if run long enough for the buffer to be a "recycled" "dirty" one.
> This was found by a test that filled vectors with runs of 'X' values for one test, followed by another test that allocated an int vector and inspected its contents.
> This asymmetry is just asking for bugs to appear. To make clear that only bit vectors are zero filled:
> * Remove the {{setZero()}} call when reallocating vectors.
> * When assertions are on, fill the buffer with dummy data (such as 0b1010_1010) to flush out any code that happens to depend on zero values.
> * Code that needs to "back-fill" values (such as when columns are missing in some rows in a reader) should do the back-filling explicitly rather than assuming zeros.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)