Posted to dev@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/09/29 22:34:00 UTC

[jira] [Created] (DRILL-5828) RecordBatchLoader permutes column order

Paul Rogers created DRILL-5828:
----------------------------------

             Summary: RecordBatchLoader permutes column order
                 Key: DRILL-5828
                 URL: https://issues.apache.org/jira/browse/DRILL-5828
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.11.0
            Reporter: Paul Rogers
            Priority: Minor


The {{RecordBatchLoader}} class deserializes batches and checks for schema changes. As part of investigating DRILL-5826, it seems that {{RecordBatchLoader}} detects schema changes as follows:

* If two batches have the same columns in the same order, no schema change occurs. (Fine)
* If batch A has schema (a, b) while batch B has (b, a), then no schema change occurs. (Fine)

But in the case of permuted columns (the second case above), {{RecordBatchLoader}} returns the column order of the second batch, even though it reports that no schema change has occurred.

That is, {{RecordBatchLoader}} says that the schema has not changed, but the actual schema has changed (the column order is different).
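
To make the mismatch concrete, here is a minimal sketch of the two comparisons involved. It is not Drill's code: the class {{SchemaCheckSketch}} and both methods are hypothetical, and a schema is modeled as a plain list of column names (real Drill schemas also carry types, which are ignored here).

```java
import java.util.HashSet;
import java.util.List;

// Hypothetical model for illustration only; not Drill's RecordBatchLoader.
final class SchemaCheckSketch {

    // Order-insensitive check: reports a change only if the set of
    // column names differs. This mirrors the behavior described above.
    static boolean isSchemaChange(List<String> previous, List<String> incoming) {
        return !new HashSet<>(previous).equals(new HashSet<>(incoming));
    }

    // Order-sensitive check: reports a change if the columns differ
    // in name or in position.
    static boolean isOrderChange(List<String> previous, List<String> incoming) {
        return !previous.equals(incoming);
    }
}
```

For batch A with (a, b) and batch B with (b, a), {{isSchemaChange}} returns false while {{isOrderChange}} returns true, which is the gap this issue describes.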

This is a potential problem: if a downstream batch relies on the column order staying the same, that assumption is violated by the behavior described above.

Correct behavior would be to coerce the second batch to match the schema of the first batch, if the {{RecordBatchLoader}} indicates that no schema change occurred.
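
One way to picture the proposed coercion is the sketch below. It is an assumption-laden illustration, not a patch: {{ColumnOrderCoercionSketch}} and {{coerceToOrder}} are hypothetical names, and a batch is modeled as a map from column name to some per-column value rather than Drill's actual vector containers.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration only; not Drill's API.
final class ColumnOrderCoercionSketch {

    // Returns the incoming columns rearranged into the order of the
    // first batch. Throws if the column sets differ, since that would
    // be a genuine schema change rather than a permutation.
    static <V> Map<String, V> coerceToOrder(List<String> referenceOrder,
                                            Map<String, V> incoming) {
        if (incoming.size() != referenceOrder.size()
                || !incoming.keySet().containsAll(referenceOrder)) {
            throw new IllegalArgumentException(
                "Column sets differ; this is a real schema change");
        }
        // LinkedHashMap preserves insertion order, so iterating the
        // result yields columns in the reference order.
        Map<String, V> reordered = new LinkedHashMap<>();
        for (String name : referenceOrder) {
            reordered.put(name, incoming.get(name));
        }
        return reordered;
    }
}
```

With a reference order of (a, b) and an incoming batch ordered (b, a), the result iterates as (a, b), so downstream consumers see a stable column order whenever no schema change is reported.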


