Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/06/22 04:21:03 UTC
[jira] [Commented] (DRILL-5602) Vector corruption when allocating a
repeated, variable-width vector
[ https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058700#comment-16058700 ]
Paul Rogers commented on DRILL-5602:
------------------------------------
It appears that other vectors have the same issue.
* Repeated map vector (discussed above)
* Variable-width vector (see below)
* All repeated value vectors (see below)
The {{ListVector}} does not have the problem, but only because it lacks the {{allocateNew(int valueCount)}} method. That omission is a separate bug.
The following is code from the {{VarCharVector}}:
{code}
@Override
public void allocateNew(int totalBytes, int valueCount) {
...
offsetVector.allocateNew(valueCount + 1);
...
data.readerIndex(0);
allocationSizeInBytes = totalBytes;
offsetVector.zeroVector();
}
{code}
Notice that the above does not set the initial offset to zero.
Typical repeated vector code (from {{RepeatedIntVector}}):
{code}
@Override
public void allocateNew(int valueCount, int innerValueCount) {
...
offsets.allocateNew(valueCount + 1);
values.allocateNew(innerValueCount);
...
offsets.zeroVector();
mutator.reset();
}
{code}
For {{RepeatedListVector}}:
{code}
@Override
public void allocateNew(int valueCount, int innerValueCount) {
clear();
getOffsetVector().allocateNew(valueCount + 1);
getMutator().reset();
}
{code}
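To make the failure mode concrete, here is a minimal, self-contained sketch of the write sequence described in the issue. It is not Drill code: {{OffsetVectorSketch}}, {{RecyclingAllocator}}, {{allocateOffsets}} and {{write}} are hypothetical names invented for illustration, and the toy allocator only imitates an allocator that hands back recycled memory without clearing it. Under those assumptions, it shows how a stale value at offset 0 moves the first write, and how zeroing that one entry avoids it.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

/** Hypothetical stand-in for an offset vector; none of these names are Drill APIs. */
final class OffsetVectorSketch {

    /** Toy allocator (an assumption): recycles released buffers without clearing them. */
    static final class RecyclingAllocator {
        private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();

        ByteBuffer buffer(int size) {
            ByteBuffer recycled = free.peek();
            if (recycled != null && recycled.capacity() >= size) {
                return free.pop();                    // stale contents are kept as-is
            }
            return ByteBuffer.allocateDirect(size);   // fresh direct memory is zero-filled
        }

        void release(ByteBuffer b) {
            free.push(b);
        }
    }

    /** Allocates (valueCount + 1) four-byte offsets and optionally zeroes entry 0. */
    static ByteBuffer allocateOffsets(RecyclingAllocator alloc, int valueCount, boolean zeroFirst) {
        ByteBuffer offsets = alloc.buffer((valueCount + 1) * 4);
        if (zeroFirst) {
            offsets.putInt(0, 0);                     // the initialization the issue says is missing
        }
        return offsets;
    }

    /** Records a value of the given length at index; returns where its data would start. */
    static int write(ByteBuffer offsets, int index, int length) {
        int start = offsets.getInt(index * 4);            // read offset[index]
        offsets.putInt((index + 1) * 4, start + length);  // offset[index + 1] = start + length
        return start;
    }

    /** Dirties a buffer, recycles it, then performs the first write on a "new" offset vector. */
    static int firstWritePosition(boolean zeroFirst) {
        RecyclingAllocator alloc = new RecyclingAllocator();
        ByteBuffer dirty = alloc.buffer(64);
        dirty.putInt(0, 16_000_000);                  // stale data left by an earlier user
        alloc.release(dirty);

        ByteBuffer offsets = allocateOffsets(alloc, 4, zeroFirst);
        return write(offsets, 0, 5);
    }

    public static void main(String[] args) {
        System.out.println(firstWritePosition(false)); // prints 16000000
        System.out.println(firstWritePosition(true));  // prints 0
    }
}
```

The sketch zeroes only the first entry because each write fills in the next offset from the previous one. Whether a real fix should zero just that entry or the whole offset buffer is a design choice this example does not settle.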
> Vector corruption when allocating a repeated, variable-width vector
> -------------------------------------------------------------------
>
> Key: DRILL-5602
> URL: https://issues.apache.org/jira/browse/DRILL-5602
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> The query in DRILL-5513 highlighted a problem described in DRILL-5594: that the external sort did not properly allocate its spill batch vectors, and instead allowed them to grow by doubling. While fixing that issue, a new issue became clear.
> The method to allocate a repeated map vector, however, has a serious bug, as described in DRILL-5530: value vectors do not zero-fill the first allocation for a vector (though subsequent reallocs are zero-filled).
> If the code worked correctly, here is the behavior when writing to the first element of the list:
> * Access the offset vector at offset 0. Should be 0.
> * Write the new value at that offset. Since the first offset is 0, the first value is written at 0 in the value vector.
> * Write into offset 1 the value at offset 0 plus the length of the new value.
> But the offset vector is not initialized to zero. Instead, offset 0 contains the value 16 million. Now:
> * Access the offset vector at offset 0. Value is 16 million.
> * Write the new value at that offset. Write at position 16 million. This requires growing the value vector from its present size to 16 MB.
> The problem is here in {{RepeatedMapVector}}:
> {code}
> public void allocateOffsetsNew(int groupCount) {
> offsets.allocateNew(groupCount + 1);
> }
> {code}
> Notice that there is no code to set the value at offset 0.
> Then, in the {{UInt4Vector}}:
> {code}
> public void allocateNew(final int valueCount) {
> allocateBytes(valueCount * 4);
> }
> private void allocateBytes(final long size) {
> ...
> data = allocator.buffer(curSize);
> ...
> {code}
> The above eventually calls the Netty memory allocator, which explicitly states that, for performance reasons, it does not zero-fill its buffers.
> The code works in small tests because the new buffer comes from Java direct memory, which *does* zero-fill the buffer.