Posted to issues@drill.apache.org by "Daniel Barclay (Drill) (JIRA)" <ji...@apache.org> on 2015/11/01 22:59:27 UTC
[jira] [Comment Edited] (DRILL-2288) ScanBatch violates IterOutcome
protocol for zero-row sources [was: missing JDBC metadata (schema) for
0-row results...]
[ https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983410#comment-14983410 ]
Daniel Barclay (Drill) edited comment on DRILL-2288 at 11/1/15 9:59 PM:
------------------------------------------------------------------------
Chain of bugs and problems encountered and (partially) addressed:
1. {{ScanBatch.next()}} returned {{NONE}} without ever returning {{OK_NEW_SCHEMA}} for a source having zero rows (so downstream operators didn't get its schema, even for static-schema sources, or even get a trigger to update their own schemas).
2. {{RecordBatch.IterOutcome}}, especially the allowed sequence of values, was not documented clearly (so developers didn't know exactly what to expect or provide).
3. {{IteratorValidatorBatchIterator}} didn't validate the sequence of {{IterOutcome}} values (so developers weren't notified about incorrect results).
4. {{UnionAllRecordBatch}} did not interpret {{NONE}} and {{OK_NEW_SCHEMA}} correctly (so it reported spurious/incorrect schema-change and/or empty-/non-empty-input exceptions).
5. {{ScanBatch.Mutator.isNewSchema()}} didn't handle a short-circuit OR ("{{||}}") correctly in calling {{SchemaChangeCallBack.getSchemaChange()}} (so it didn't reset nested schema-change state, and so caused spurious {{OK_NEW_SCHEMA}} notifications and downstream exceptions).
6. {{JsonRecordReader.ensureAtLeastOneField()}} didn't check whether any field already existed in the batch (so in that case it forcibly changed the type to {{NullableIntVector}}, causing schema changes and downstream exceptions). \[Note: DRILL-2288 does not address other problems with {{NullableIntVector}} dummy columns from {{JsonRecordReader}}.]
7. HBase tests used only one table region, ignoring known problems with multi-region HBase tables (so latent {{HBaseRecordReader}} problems were left undetected and unresolved). \[Note: DRILL-2288 addresses only one test table; increasing the number of regions on the other test tables exposes at least one other problem.]
8. {{HBaseRecordReader}} didn't create a {{MapVector}} for every column family (so {{NullableIntVector}} dummy columns got created, causing spurious schema changes and downstream exceptions).
9. Some {{RecordBatch}} classes didn't reset their record counts to zero ({{OrderedPartitionRecordBatch.recordCount}}, {{ProjectRecordBatch.recordCount}}, and/or {{TopNBatch.recordCount}}) (so downstream code tried to access elements of (correctly) empty vectors, yielding {{IndexOutOfBoundsException}} (with ~"... {{range (0, 0)}}")).
10. {{RecordBatchLoader}}'s record count was not reset to zero by {{UnorderedReceiverBatch}} (so, again, downstream code tried to access elements of (correctly) empty vectors, yielding {{IndexOutOfBoundsException}} (with ~"... {{range (0, 0)}}")).
11. {{MapVector.load(...)}} left some existing vectors empty, not matching the returned length and the length of sibling vectors (so {{MapVector.getObject(int)}} got {{IndexOutOfBoundsException}} (with ~"... {{range (0, 0)}}")). \[Note: DRILL-2288 does not address the root problem.]
12. {{BaseTestQuery.printResult(...)}} skipped deallocation calls in the case of a zero-record record batch (so when it read a zero-row record batch, it caused a memory leak reported at Drillbit shutdown time).
13. {{TestHBaseProjectPushDown.testRowKeyAndColumnPushDown()}} used delimited identifiers of a form (with a period) that Drill can't handle (so the test failed when run with multiple fragments).
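The ordering problem behind items 1 and 3 can be illustrated with a small sequence checker. This is a minimal sketch, not Drill's {{IteratorValidatorBatchIterator}}: the {{Outcome}} enum is a hypothetical, reduced stand-in for {{RecordBatch.IterOutcome}}, and the three rules it enforces (no {{OK}} before the first {{OK_NEW_SCHEMA}}, no {{NONE}} before the first {{OK_NEW_SCHEMA}}, nothing after {{NONE}}) are assumptions made for illustration.

```java
import java.util.List;

/** Hypothetical, reduced stand-in for Drill's RecordBatch.IterOutcome. */
enum Outcome { OK_NEW_SCHEMA, OK, NONE }

/** Checks a sequence of outcomes against the ordering rules assumed above. */
final class OutcomeSequenceValidator {
    private boolean schemaSeen = false; // has OK_NEW_SCHEMA been returned yet?
    private boolean finished = false;   // has NONE been returned yet?

    /** Records one outcome; throws IllegalStateException if it breaks a rule. */
    void accept(Outcome outcome) {
        if (finished) {
            throw new IllegalStateException("got " + outcome + " after NONE");
        }
        switch (outcome) {
            case OK_NEW_SCHEMA:
                schemaSeen = true;
                break;
            case OK:
                if (!schemaSeen) {
                    throw new IllegalStateException("OK before any OK_NEW_SCHEMA");
                }
                break;
            case NONE:
                if (!schemaSeen) {
                    // The zero-row case from item 1: the schema was never announced.
                    throw new IllegalStateException("NONE before any OK_NEW_SCHEMA");
                }
                finished = true;
                break;
        }
    }

    /** Returns true if the whole sequence obeys the assumed rules. */
    static boolean isValid(List<Outcome> outcomes) {
        OutcomeSequenceValidator validator = new OutcomeSequenceValidator();
        try {
            for (Outcome outcome : outcomes) {
                validator.accept(outcome);
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

Under these assumptions, the sequence from item 1 (a lone {{NONE}}) is rejected, while {{OK_NEW_SCHEMA}} followed by {{NONE}} is accepted for a zero-row source.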
> ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: DRILL-2288
> URL: https://issues.apache.org/jira/browse/DRILL-2288
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Information Schema
> Reporter: Daniel Barclay (Drill)
> Assignee: Daniel Barclay (Drill)
> Fix For: 1.3.0
>
> Attachments: Drill2288NoResultSetMetadataWhenZeroRowsTest.java
>
>
> The ResultSetMetaData object returned by getMetaData() on a ResultSet is not set up for a result set with zero rows (getColumnCount() returns zero, and trying to access any other metadata throws IndexOutOfBoundsException), at least for a result set from DatabaseMetaData.getColumns(...).
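The symptom in the description can be probed with a small helper. This is a minimal sketch and not the attached test: the class name MetadataProbe and its columnLabels methods are hypothetical, and only standard java.sql calls are used. How the ResultSet is obtained (for example, from DatabaseMetaData.getColumns(...) over a Drill JDBC connection) is left to the caller.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Drill) for inspecting result-set metadata. */
final class MetadataProbe {
    private MetadataProbe() {}

    /** Returns the column labels reported by the given metadata, in column order. */
    static List<String> columnLabels(ResultSetMetaData metaData) throws SQLException {
        List<String> labels = new ArrayList<>();
        // JDBC column indexes are 1-based.
        for (int i = 1; i <= metaData.getColumnCount(); i++) {
            labels.add(metaData.getColumnLabel(i));
        }
        return labels;
    }

    /** Convenience overload: reads the metadata without advancing the cursor. */
    static List<String> columnLabels(ResultSet resultSet) throws SQLException {
        return columnLabels(resultSet.getMetaData());
    }
}
```

If the helper returns an empty list for a zero-row result set whose query nevertheless defines columns, that matches the behavior reported here; a non-empty list of labels is what a caller would normally expect regardless of the row count.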