Posted to issues@drill.apache.org by "Vitalii Diravka (Jira)" <ji...@apache.org> on 2021/11/18 22:36:00 UTC

[jira] [Assigned] (DRILL-5612) Random failure in TestMergeJoinWithSchemaChanges

     [ https://issues.apache.org/jira/browse/DRILL-5612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vitalii Diravka reassigned DRILL-5612:
--------------------------------------

    Assignee: Vitalii Diravka

> Random failure in TestMergeJoinWithSchemaChanges
> ------------------------------------------------
>
>                 Key: DRILL-5612
>                 URL: https://issues.apache.org/jira/browse/DRILL-5612
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.11.0
>            Reporter: Paul Rogers
>            Assignee: Vitalii Diravka
>            Priority: Major
>         Attachments: image-2021-11-16-02-35-25-690.png
>
>
> The unit test {{org.apache.drill.exec.physical.impl.join.TestMergeJoinWithSchemaChanges#testMissingAndNewColumns}} is subject to random failures, perhaps due to changes in file order in readers.
> The test builds a number of input files, then executes queries against them. On most runs, the output is fine:
> {code}
> Running org.apache.drill.exec.physical.impl.join.TestMergeJoinWithSchemaChanges#testMissingAndNewColumns
> /home/.../target/1498606483211-0/mergejoin-schemachanges-left
> /home/.../target/1498606483211-1/mergejoin-schemachanges-right
> {code}
> But, on occasion, the query fails:
> {code}
> org.apache.drill.exec.physical.impl.join.TestMergeJoinWithSchemaChanges
> testMissingAndNewColumns(org.apache.drill.exec.physical.impl.join.TestMergeJoinWithSchemaChanges)  Time elapsed: 0.569 sec  <<< ERROR!
> ...: UNSUPPORTED_OPERATION ERROR: Sort doesn't currently support sorts with changing schemas
> Fragment 0:0
>   (org.apache.drill.exec.exception.SchemaChangeException) Sort currently only supports a single schema.
>     org.apache.drill.exec.physical.impl.sort.SortRecordBatchBuilder.build():152
>     org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext():476
> ...
> {code}
> The exception is thrown by this check in {{SortRecordBatchBuilder.build()}}:
> {code}
>   public void build(VectorContainer outputContainer) throws SchemaChangeException {
>     outputContainer.clear();
>     if (batches.keySet().size() > 1) {
>       throw new SchemaChangeException("Sort currently only supports a single schema.");
>     }
> {code}
> The above code has not changed in quite some time. The failure is in the "legacy" external sort.
> Although the external sort does support schema changes, it only does so in the form of a union vector, which must be enabled. (Other tests validate that schema changes work.)
> What is likely happening is this. On some runs a single sort sees both files, and therefore two differing schemas. On other runs the scan is split across threads, so each sort sees only one file. This speculation can be verified by checking a log file (not available for the test run that failed) to see whether the scan under the sort read more than one file.
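The speculation above can be modeled with a small Java sketch, assuming Java 16+ for the record syntax. This is not Drill code. `InputFile`, `sortFails` and `queryFails` are hypothetical names, and a "schema" is reduced to a set of column names. The check mirrors the `batches.keySet().size() > 1` condition quoted above. Under this model, success or failure depends only on how files are assigned to sort instances.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not Drill code: a "schema" is just a set of column names.
class SortSchemaCheckSketch {

  // Stand-in for one JSON input file and the columns a scan would infer from it.
  record InputFile(String name, Set<String> columns) {}

  // Mirrors the SortRecordBatchBuilder.build() check: more than one distinct schema fails.
  static boolean sortFails(List<InputFile> filesSeenBySort) {
    Set<Set<String>> distinctSchemas = new HashSet<>();
    for (InputFile f : filesSeenBySort) {
      distinctSchemas.add(f.columns());
    }
    return distinctSchemas.size() > 1;
  }

  // The query fails if any sort instance sees more than one schema.
  static boolean queryFails(List<List<InputFile>> filesPerSort) {
    return filesPerSort.stream().anyMatch(SortSchemaCheckSketch::sortFails);
  }

  public static void main(String[] args) {
    InputFile left = new InputFile("left.json", Set.of("id", "a"));
    InputFile right = new InputFile("right.json", Set.of("id", "b"));

    // One fragment scans both files: a single sort sees two schemas.
    System.out.println(queryFails(List.of(List.of(left, right)))); // true
    // Two fragments, one file each: every sort sees a single schema.
    System.out.println(queryFails(List.of(List.of(left), List.of(right)))); // false
  }
}
```

In this model, the test is flaky exactly when the assignment of files to fragments varies between runs. A log of the scan would confirm or refute that.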
> Alternatively, the order of the JSON files may matter. File order may vary across machines, since directory enumeration on Linux (readdir) does not guarantee any particular order.
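If file order does turn out to matter, a common way to make a test independent of it is to sort directory listings explicitly. The JDK gives no ordering guarantee either: `File.list()` and `DirectoryStream` (which backs `Files.list`) are documented as returning entries in no particular order. The following is a generic sketch, assuming Java 11+. `listSorted` is a hypothetical helper, and the sketch does not claim to reflect how Drill's scan enumerates files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DeterministicListingSketch {

  // Hypothetical helper: Files.list() returns entries in no guaranteed order,
  // so sort them by Path's natural ordering before using them.
  static List<Path> listSorted(Path dir) throws IOException {
    try (Stream<Path> entries = Files.list(dir)) {
      return entries.sorted().collect(Collectors.toList());
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("listing-sketch");
    // Create the files out of alphabetical order.
    for (String name : List.of("2.json", "0.json", "1.json")) {
      Files.writeString(dir.resolve(name), "{}\n");
    }
    // Prints 0.json, 1.json, 2.json regardless of the file system's order.
    for (Path p : listSorted(dir)) {
      System.out.println(p.getFileName());
    }
  }
}
```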



--
This message was sent by Atlassian Jira
(v8.20.1#820001)