Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2016/12/18 23:55:58 UTC

[jira] [Created] (DRILL-5134) TestMergeJoinWithSchemaChanges throws exception with paged SV4

Paul Rogers created DRILL-5134:
----------------------------------

             Summary: TestMergeJoinWithSchemaChanges throws exception with paged SV4
                 Key: DRILL-5134
                 URL: https://issues.apache.org/jira/browse/DRILL-5134
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.9.0
            Reporter: Paul Rogers
            Priority: Minor


The {{TestMergeJoinWithSchemaChanges}} test exercises the in-memory merge sort with union vectors. (Note that union vectors are not fully supported.)

The merge sort creates an SV4 (a four-byte selection vector) to hold an index into the sorted results. An SV4 can page those results to the downstream operator as a series of batches.
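The paging idea can be sketched as follows. This is a minimal illustration, not Drill's actual {{SelectionVector4}} code: the class name {{PagedIndexSketch}} and its methods are hypothetical, and the sketch assumes that each four-byte entry packs a batch number into the upper 16 bits and a row offset into the lower 16 bits.

```java
/**
 * Hypothetical sketch of a paged four-byte selection index.
 * Not Drill's SelectionVector4; names and layout are assumptions.
 */
public class PagedIndexSketch {
    private final int[] entries;  // one packed entry per sorted row
    private final int pageSize;   // maximum number of rows exposed per page
    private int start = 0;        // position of the current page's first entry
    private int length = 0;       // number of entries in the current page

    public PagedIndexSketch(int[] entries, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.entries = entries;
        this.pageSize = pageSize;
    }

    /** Packs a batch number and a row offset into one entry (assumed layout). */
    public static int pack(int batch, int record) {
        return (batch << 16) | (record & 0xFFFF);
    }

    /** Extracts the batch number from the upper 16 bits. */
    public static int batchOf(int entry) {
        return entry >>> 16;
    }

    /** Extracts the row offset from the lower 16 bits. */
    public static int recordOf(int entry) {
        return entry & 0xFFFF;
    }

    /** Advances to the next page; returns false once all entries are consumed. */
    public boolean next() {
        start += length;
        if (start >= entries.length) {
            length = 0;
            return false;
        }
        length = Math.min(pageSize, entries.length - start);
        return true;
    }

    /** Number of rows in the current page. */
    public int count() {
        return length;
    }

    /** Returns the i-th entry of the current page, bounds-checked against the page. */
    public int get(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i + ", page length " + length);
        }
        return entries[start + i];
    }
}
```

A consumer calls {{next()}} repeatedly and reads at most {{count()}} entries per page. Code that assumes every row is reachable from a single page would index past {{count()}}, which is one plausible way to hit an out-of-range error of the kind reported here; the ticket does not identify the actual cause.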

When {{TestMergeJoinWithSchemaChanges}} is run using the "managed" external sort and union vectors, a downstream operator throws an index out of range exception. However, when run with the "classic" external sort, no such exception is thrown.

The difference is that the classic version returns all rows in a single batch, while the managed version attempts to return rows in batches of a specified size.

The paging approach works for tests that do not include union vectors, but fails for those that do include them.

Modifying the managed version to return all results in a single batch avoids the exception.

The problem with this workaround is that sorted results can grow too large to return in a single batch, at which point paging becomes necessary. The sort buffer can, for example, be set to 10G, which is too large for a single batch. Likewise, the sort can process more than 64K rows, which is also more than a single batch can hold. In those scenarios, union vectors with SV4 will fail.
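The row-count limit can be made concrete with a little arithmetic. The sketch below is illustrative only: {{BatchPagingMath}} and {{pagesNeeded}} are hypothetical names, and the 65,536-row cap is taken from the "64K rows" figure above.

```java
/** Hypothetical helper showing when a sorted result no longer fits in one batch. */
public class BatchPagingMath {
    /** Assumed per-batch row cap, taken from the 64K figure in this ticket. */
    public static final int MAX_ROWS_PER_BATCH = 64 * 1024;

    /** Number of output batches needed to return rowCount rows (ceiling division). */
    public static long pagesNeeded(long rowCount, int maxRowsPerBatch) {
        if (rowCount < 0 || maxRowsPerBatch <= 0) {
            throw new IllegalArgumentException("invalid row count or batch size");
        }
        return (rowCount + maxRowsPerBatch - 1) / maxRowsPerBatch;
    }

    /** True when the single-batch workaround is no longer possible. */
    public static boolean requiresPaging(long rowCount) {
        return pagesNeeded(rowCount, MAX_ROWS_PER_BATCH) > 1;
    }
}
```

Under these assumptions, a sort of 100,000 rows needs two output batches, so the single-batch workaround cannot apply to it.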

Since union vectors are not fully supported, the workaround described above is used to get the test to pass. This ticket records the issue for a future effort to support union vectors.


