You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by "Paul Rogers (Jira)" <ji...@apache.org> on 2021/04/25 21:41:00 UTC

[jira] [Resolved] (DRILL-7325) Many operators do not set container record count

     [ https://issues.apache.org/jira/browse/DRILL-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers resolved DRILL-7325.
--------------------------------
    Resolution: Fixed

A number of individual commits fixed problems found in each operator. This overall task is now complete.

> Many operators do not set container record count
> ------------------------------------------------
>
>                 Key: DRILL-7325
>                 URL: https://issues.apache.org/jira/browse/DRILL-7325
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.16.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>             Fix For: 1.19.0
>
>
> See DRILL-7324. The following are problems found because some operators fail to set the record count for their containers.
> h4. Scan
> TestComplexTypeReader, on cluster setup, using the PojoRecordReader:
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from ScanBatch
> ScanBatch: Container record count not set
> Reason: ScanBatch never sets the record count of its container (this is a generic issue, not specific to the PojoRecordReader).
> h4. Filter
> {{TestComplexTypeReader.testNonExistentFieldConverting()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
> FilterRecordBatch: Container record count not set
> {noformat}
> h4. Hash Join
> {{TestComplexTypeReader.test_array()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from HashJoinBatch
> HashJoinBatch: Container record count not set
> {noformat}
> Occurs on the first batch in which the hash join returns {{OK_NEW_SCHEMA}} with no records.
> h4. Project
> TestCsvWithHeaders.testEmptyFile()}} (when the text reader returned empty, schema-only batches):
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from ProjectRecordBatch
> ProjectRecordBatch: Container record count not set
> {noformat}
> Occurs in {{ProjectRecordBatch.handleNullInput()}}: it sets up the schema but does not set the value count to 0.
> h4. Unordered Receiver
> {{TestCsvWithSchema.testMultiFileSchema()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from UnorderedReceiverBatch
> UnorderedReceiverBatch: Container record count not set
> {noformat}
> The problem is that {{RecordBatchLoader.load()}} does not set the container record count.
> h4. Streaming Aggregate
> {{TestJsonReader.testSumWithTypeCase()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from StreamingAggBatch
> StreamingAggBatch: Container record count not set
> {noformat}
> The problem is that {{StreamingAggBatch.buildSchema()}} does not set the container record count to 0.
> h4. Limit
> {{TestJsonReader.testDrill_1419()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from LimitRecordBatch
> LimitRecordBatch: Container record count not set
> {noformat}
> None of the paths in {{LimitRecordBatch.innerNext()}} set the container record count.
> h4. Union All
> {{TestJsonReader.testKvgenWithUnionAll()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from UnionAllRecordBatch
> UnionAllRecordBatch: Container record count not set
> {noformat}
> When {{UnionAllRecordBatch}} calls {{VectorAccessibleUtilities.setValueCount()}}, it did not also set the container count.
> h4. Hash Aggregate
> {{TestJsonReader.drill_4479()}}:
> {noformat}
> ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from HashAggBatch
> HashAggBatch: Container record count not set
> {noformat}
> Problem is that {{HashAggBatch.buildSchema()}} does not set the container record count to 0 for the first, empty, batch sent for {{OK_NEW_SCHEMA.}}
> h4. And Many More
> I turns out that most operators fail to set one of the many row count variables somewhere in their code path: maybe in the schema setup path, maybe when building a batch along one of the many paths that operators follow. Further, we have multiple row counts that must be set:
> * Values in each vector ({{setValueCount()}},
> * Row count in the container ({{setRecordCount()}}), which must be the same as the vector value count.
> * Row count in the operator (batch), which is the (possibly filtered) count of records presented to downstream operators. It must be less than or equal to the container row count (except for an SV4.)
> * The SV2 record count, which is the number of entries in the SV2 and must be the same as the batch row count (and less or equal to the container row count.)
> * The SV2 actual bactch record count, which must be the same as the container row count.
> * The SV4 record count, which must be the same as the batch record count. With an SV4, the batch consists of multiple containers, each of which must have an accurate container record count.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)