You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/05/09 02:42:04 UTC
[jira] [Created] (DRILL-5490) RepeatedVarCharOutput.recordStart
becomes corrupted on realloc
Paul Rogers created DRILL-5490:
----------------------------------
Summary: RepeatedVarCharOutput.recordStart becomes corrupted on realloc
Key: DRILL-5490
URL: https://issues.apache.org/jira/browse/DRILL-5490
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Paul Rogers
Priority: Minor
The class {{RepeatedVarCharOutput}}, used to read text data into the {{columns}} array, has a member called {{recordStart}} which is supposed to track the memory offset of the start of the record data in the {{columns}} VarChar array. However, the logic for the member is very wrong. First, it is initialized based on an uninitialized variable:
{{code}}
public void startBatch() {
this.recordStart = characterDataOriginal;
...
loadVarCharDataAddress();
}
private void loadVarCharDataAddress(){
...
this.characterDataOriginal = buf.memoryAddress();
...
}
{{code}}
Second, the class keeps track of actual memory addresses for the VarChar data. When data would exceed the buffer, memory is reallocated using the following method. But, this method *does not* adjust the {{recordStart}} member, which now points to a bogus memory address:
{code}
private void expandVarCharData(){
vector.getDataVector().reAlloc();
long diff = characterData - characterDataOriginal;
loadVarCharDataAddress();
characterData += diff;
}
{code}
Fortunately, like so many bugs in this class, these bugs are offset by the fact that the variable is never used in a way that cause obvious problems (else these issues would have been found sooner.) The only real use is:
{code}
@Override
public boolean rowHasData() {
return recordStart < characterData;
}
{code}
Which, it turns out, is used only at EOF in {{TextReader.parseRecord()}}:
{code}
} catch(StreamFinishedPseudoException e) {
// if we've written part of a field or all of a field, we should send this row.
if (fieldsWritten == 0 && !output.rowHasData()) {
throw e;
}
}
{code}
Thus, the bug will cause a failure only if:
* A batch is resized, and
* The new address is lower than the original address, and
* It is the last batch in the file.
Because of the low probability of hitting the bug, the priority is set to Minor.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)