Posted to issues@drill.apache.org by "Daniel Barclay (Drill) (JIRA)" <ji...@apache.org> on 2015/11/01 22:59:27 UTC
[jira] [Comment Edited] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]

    [ https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983410#comment-14983410 ] 

Daniel Barclay (Drill) edited comment on DRILL-2288 at 11/1/15 9:59 PM:
------------------------------------------------------------------------


Chain of bugs and problems encountered and (partially) addressed:

1.  {{ScanBatch.next()}} returned {{NONE}} without ever returning {{OK_NEW_SCHEMA}} for a source having zero rows (so downstream operators didn't get its schema, even for static-schema sources, and didn't even get a trigger to update their own schemas).

2.  {{RecordBatch.IterOutcome}}, especially the allowed sequence of values, was not documented clearly (so developers didn't know exactly what to expect or provide).

3.  {{IteratorValidatorBatchIterator}} didn't validate the sequence of {{IterOutcome}} values (so developers weren't notified about incorrect results).

4.  {{UnionAllRecordBatch}} did not interpret {{NONE}} and {{OK_NEW_SCHEMA}} correctly (so it reported spurious/incorrect schema-change and/or empty-/non-empty-input exceptions).

5.  {{ScanBatch.Mutator.isNewSchema()}} didn't handle a short-circuit OR ("{{||}}") correctly in calling {{SchemaChangeCallBack.getSchemaChange()}} (so it didn't reset nested schema-change state, and so caused spurious {{OK_NEW_SCHEMA}} notifications and downstream exceptions).

6.  {{JsonRecordReader.ensureAtLeastOneField()}} didn't check whether any field already existed in the batch (so in that case it forcibly changed the type to {{NullableIntVector}}, causing schema changes and downstream exceptions). \[Note:  DRILL-2288 does not address other problems with {{NullableIntVector}} dummy columns from {{JsonRecordReader}}.]

7.  HBase tests used only one table region, ignoring known problems with multi-region HBase tables (so latent {{HBaseRecordReader}} problems were left undetected and unresolved).  \[Note: DRILL-2288 addresses only one test table; increasing the number of regions on the other test tables exposes at least one other problem.]

8.  {{HBaseRecordReader}} didn't create a {{MapVector}} for every column family (so {{NullableIntVector}} dummy columns got created, causing spurious schema changes and downstream exceptions).

9.  Some {{RecordBatch}} classes didn't reset their record counts to zero ({{OrderedPartitionRecordBatch.recordCount}}, {{ProjectRecordBatch.recordCount}}, and/or {{TopNBatch.recordCount}}) (so downstream code tried to access elements of (correctly) empty vectors, yielding {{IndexOutOfBoundsException}} (with ~"... {{range (0, 0)}}")).

10.  {{RecordBatchLoader}}'s record count was not reset to zero by {{UnorderedReceiverBatch}} (so, again, downstream code tried to access elements of (correctly) empty vectors, yielding {{IndexOutOfBoundsException}} (with ~"... {{range (0, 0)}}")).

11.  {{MapVector.load(...)}} left some existing vectors empty, not matching the returned length and the length of sibling vectors (so {{MapVector.getObject(int)}} got {{IndexOutOfBoundsException}} (with ~"... {{range (0, 0)}}")).  \[Note: DRILL-2288 does not address the root problem.]

12. {{BaseTestQuery.printResult(...)}} skipped deallocation calls in the case of a zero-record record batch (so when it read a zero-row record batch, it caused a memory leak reported at Drillbit shutdown time).

13. {{TestHBaseProjectPushDown.testRowKeyAndColumnPushDown()}} used delimited identifiers of a form (with a period) that Drill can't handle (so the test failed when it ran with multiple fragments).
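
To make the ordering rule behind items 1 to 3 concrete, here is a minimal sketch of a sequence checker. It is not Drill's {{IteratorValidatorBatchIterator}}: the {{Outcome}} enum is a hypothetical stand-in that models only three of the outcome values, and the rule that {{OK}} must also follow an {{OK_NEW_SCHEMA}} is an assumption added for illustration. The rule that {{NONE}} must be preceded by {{OK_NEW_SCHEMA}} is the one item 1 describes.

```java
import java.util.List;

// Hypothetical stand-in for RecordBatch.IterOutcome; only three values are modeled.
enum Outcome { OK_NEW_SCHEMA, OK, NONE }

final class OutcomeSequenceChecker {
    private boolean sawSchema = false;  // true once OK_NEW_SCHEMA has been seen
    private boolean finished = false;   // true once NONE has been seen

    /** Records one outcome; throws IllegalStateException if it breaks the assumed ordering. */
    void accept(Outcome outcome) {
        if (finished) {
            throw new IllegalStateException(outcome + " returned after NONE");
        }
        switch (outcome) {
            case OK_NEW_SCHEMA:
                sawSchema = true;
                break;
            case OK:
                // Assumption for this sketch: data batches need a schema first.
                if (!sawSchema) {
                    throw new IllegalStateException("OK returned before any OK_NEW_SCHEMA");
                }
                break;
            case NONE:
                // The violation from item 1: end of data with no schema ever announced.
                if (!sawSchema) {
                    throw new IllegalStateException("NONE returned before any OK_NEW_SCHEMA");
                }
                finished = true;
                break;
        }
    }

    /** Returns true if no outcome in the given (possibly incomplete) sequence breaks the ordering. */
    static boolean isValid(List<Outcome> outcomes) {
        OutcomeSequenceChecker checker = new OutcomeSequenceChecker();
        try {
            for (Outcome outcome : outcomes) {
                checker.accept(outcome);
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

Under these assumptions, a zero-row source that returns {{OK_NEW_SCHEMA}} and then {{NONE}} passes, while one that returns only {{NONE}} is rejected.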


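The short-circuit problem in item 5 can be reproduced in isolation. The sketch below is not the {{ScanBatch.Mutator}} code: {{ChangeFlag}} and both method names are hypothetical, and it assumes (as the description of item 5 implies) that the getter clears the flag it reports. It shows how {{||}} can skip that side effect, and one possible way to avoid it.

```java
// Hypothetical stand-in for a callback whose getter also clears the flag it reports.
final class ChangeFlag {
    private boolean changed;

    void mark() { changed = true; }

    /** Returns the current flag value and resets it to false. */
    boolean getAndReset() {
        boolean was = changed;
        changed = false;
        return was;
    }

    boolean isSet() { return changed; }
}

final class ShortCircuitDemo {
    // Problematic shape: when topLevelChanged is true, || never evaluates
    // getAndReset(), so the nested flag stays set and is reported again later.
    static boolean isNewSchemaShortCircuit(boolean topLevelChanged, ChangeFlag nested) {
        return topLevelChanged || nested.getAndReset();
    }

    // One possible repair: evaluate the side-effecting call unconditionally,
    // then combine the two results.
    static boolean isNewSchemaAlwaysReset(boolean topLevelChanged, ChangeFlag nested) {
        boolean nestedChanged = nested.getAndReset();
        return topLevelChanged || nestedChanged;
    }
}
```

Both methods return the same value for the same inputs; they differ only in whether the nested flag is cleared when the top-level flag is already true.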




> ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-2288
>                 URL: https://issues.apache.org/jira/browse/DRILL-2288
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Information Schema
>            Reporter: Daniel Barclay (Drill)
>            Assignee: Daniel Barclay (Drill)
>             Fix For: 1.3.0
>
>         Attachments: Drill2288NoResultSetMetadataWhenZeroRowsTest.java
>
>
> The ResultSetMetaData object from getMetaData() of a ResultSet is not set up (getColumnCount() returns zero, and trying to access any other metadata throws IndexOutOfBoundsException) for a result set with zero rows, at least for one from DatabaseMetaData.getColumns(...).
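
The symptom in the description above can be checked from client code with a small helper like the following. It is only a sketch: the class and method names are hypothetical, it uses standard {{java.sql}} calls only, and it is not the attached test. It reads column names from the metadata without ever calling {{next()}}, so it does not depend on the result set having rows.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Drill or of the attached test.
final class ZeroRowMetadataCheck {
    /** Returns the column names reported by the result set's metadata (JDBC columns are 1-based). */
    static List<String> columnNames(ResultSet resultSet) throws SQLException {
        ResultSetMetaData metadata = resultSet.getMetaData();
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= metadata.getColumnCount(); i++) {
            names.add(metadata.getColumnName(i));
        }
        return names;
    }
}
```

With the behavior described above, where getColumnCount() returns zero, this helper would return an empty list for a zero-row result set, which is the condition a regression test could assert against.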


