Posted to dev@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/09/29 22:34:00 UTC
[jira] [Created] (DRILL-5828) RecordBatchLoader permutes column order
Paul Rogers created DRILL-5828:
----------------------------------
Summary: RecordBatchLoader permutes column order
Key: DRILL-5828
URL: https://issues.apache.org/jira/browse/DRILL-5828
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.11.0
Reporter: Paul Rogers
Priority: Minor
The {{RecordBatchLoader}} class deserializes batches and checks for schema changes. While investigating DRILL-5826, it appeared that {{RecordBatchLoader}} detects schema changes as follows:
* If two batches have the same columns in the same order, no schema change occurs. (Fine)
* If batch A has schema (a, b) while batch B has (b, a), then no schema change occurs. (Fine)
But in the case of permuted columns (the second case above), {{RecordBatchLoader}} returns the column order of the second batch, even though it reports that no schema change has occurred.
That is, {{RecordBatchLoader}} says that the schema has not changed, but the actual schema has changed (the column order is different).
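The mismatch can be illustrated with a minimal, self-contained sketch. This is not Drill's actual code: the class {{PermutedSchemaDemo}} and the method {{isSchemaChange}} are hypothetical stand-ins, schemas are modeled as plain lists of column names, and column names are assumed to be unique.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical stand-in for the behavior described above; not Drill's code.
class PermutedSchemaDemo {

  // Order-insensitive comparison: reports a change only if the sets of
  // column names differ. Assumes column names are unique within a schema.
  static boolean isSchemaChange(List<String> previous, List<String> incoming) {
    return previous.size() != incoming.size()
        || !new HashSet<>(previous).equals(new HashSet<>(incoming));
  }

  public static void main(String[] args) {
    List<String> batchA = Arrays.asList("a", "b");
    List<String> batchB = Arrays.asList("b", "a");

    // The comparison treats the two schemas as identical ...
    System.out.println("schema change: " + isSchemaChange(batchA, batchB));
    // ... yet position 0 holds a different column in each batch.
    System.out.println("column 0: " + batchA.get(0) + " vs " + batchB.get(0));
  }
}
```

Any consumer that addresses columns by position would read column {{b}} where it previously read column {{a}}, without having been told that anything changed.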
This is a potential problem: if a downstream batch counts on the same column order, then that assumption is violated by the behavior described above.
Correct behavior would be to coerce the second batch to match the schema of the first batch, if the {{RecordBatchLoader}} indicates that no schema change occurred.
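One way to picture the proposed coercion is the sketch below. It is only an illustration under stated assumptions, not a patch for {{RecordBatchLoader}}: the class {{ColumnOrderCoercion}} and the method {{coerce}} are hypothetical, a batch is modeled as an ordered map from column name to an {{int[]}} of values, and column names are assumed to be unique.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the proposed fix; not Drill's code.
class ColumnOrderCoercion {

  // Rebuilds the incoming batch so that its columns appear in the order
  // established by the first batch. Rejects batches whose column set
  // differs, since that would be a genuine schema change.
  static Map<String, int[]> coerce(List<String> establishedOrder,
                                   Map<String, int[]> incoming) {
    if (incoming.size() != establishedOrder.size()
        || !incoming.keySet().containsAll(establishedOrder)) {
      throw new IllegalArgumentException("column sets differ: real schema change");
    }
    // LinkedHashMap preserves insertion order, so iterating the result
    // yields the columns in the established order.
    Map<String, int[]> reordered = new LinkedHashMap<>();
    for (String name : establishedOrder) {
      reordered.put(name, incoming.get(name));
    }
    return reordered;
  }

  public static void main(String[] args) {
    Map<String, int[]> batchB = new LinkedHashMap<>();
    batchB.put("b", new int[] {10, 20});
    batchB.put("a", new int[] {1, 2});

    // Batch B arrives as (b, a) but is handed downstream as (a, b).
    System.out.println(coerce(Arrays.asList("a", "b"), batchB).keySet());
  }
}
```

The column data itself is not copied in this sketch; only the order in which the columns are exposed changes, which keeps the "no schema change" result consistent with what downstream code sees.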
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)