Posted to dev@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/12/18 20:05:00 UTC

[jira] [Created] (DRILL-6037) List vector can lose data when "promoting" to union

Paul Rogers created DRILL-6037:
----------------------------------

             Summary: List vector can lose data when "promoting" to union
                 Key: DRILL-6037
                 URL: https://issues.apache.org/jira/browse/DRILL-6037
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.10.0
            Reporter: Paul Rogers


Drill provides a little-known {{ListVector}}, used in the JSON reader, as an alternative to the {{REPEATED}} data mode. Unlike {{REPEATED}}, the list vector allows an array value itself to be null. That is, the list vector allows the following:

{noformat}
{a: [10, 20]} {a: null}
{noformat}
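To make the extra null state concrete, here is a toy Java model. The class and method names are hypothetical and are not Drill's {{ListVector}} API. It pairs an offsets array with a per-row "is set" flag, one plausible layout that lets a nullable list tell a null array apart from an empty one. A {{REPEATED}} column, which has only offsets, has nowhere to record that the second row is null.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of a nullable list column (not Drill's ListVector API). */
public class ToyListColumn {
    // offsets[i]..offsets[i+1] delimit row i's elements in the data list
    final List<Integer> offsets = new ArrayList<>();
    // isSet.get(i) == false means the whole array for row i is null
    final List<Boolean> isSet = new ArrayList<>();
    final List<Long> data = new ArrayList<>();

    ToyListColumn() {
        offsets.add(0); // start offset of row 0
    }

    void writeArray(long... values) {
        for (long v : values) data.add(v);
        offsets.add(data.size());
        isSet.add(true);
    }

    void writeNull() {
        offsets.add(data.size()); // a null row occupies zero elements
        isSet.add(false);         // ...but is flagged as null, not empty
    }

    String render(int row) {
        if (!isSet.get(row)) return "null";
        StringBuilder sb = new StringBuilder("[");
        for (int i = offsets.get(row); i < offsets.get(row + 1); i++) {
            if (i > offsets.get(row)) sb.append(", ");
            sb.append(data.get(i));
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        ToyListColumn a = new ToyListColumn();
        a.writeArray(10, 20); // {a: [10, 20]}
        a.writeNull();        // {a: null}
        System.out.println(a.render(0) + " " + a.render(1)); // [10, 20] null
    }
}
```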

(It is unclear if the rest of Drill can handle this extra null state, however.)

The list vector has another form of magic. It can be "promoted" to a list of (barely supported) unions. Promotion to union allows the following:

{noformat}
{a: [10, "twenty"]}
{noformat}
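A union element can be pictured as a per-value type tag (the union's "type vector") plus storage for each member type. The toy Java sketch below uses hypothetical names, not Drill's union vector classes. It shows why the type vector matters: a reader consults the tag to decide which member to read, so a value whose tag is wrong or missing is effectively lost.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of a union column: a per-value type tag plus one slot per member type. */
public class ToyUnionColumn {
    enum Type { NULL, BIGINT, VARCHAR }

    final List<Type> types = new ArrayList<>();      // "type vector": which member holds value i
    final List<Long> bigints = new ArrayList<>();    // BIGINT member (slot unused unless type is BIGINT)
    final List<String> varchars = new ArrayList<>(); // VARCHAR member (slot unused unless type is VARCHAR)

    void writeBigInt(long v) {
        types.add(Type.BIGINT);
        bigints.add(v);
        varchars.add(null);
    }

    void writeVarChar(String s) {
        types.add(Type.VARCHAR);
        bigints.add(0L);
        varchars.add(s);
    }

    /** Reads value i by dispatching on its type tag. */
    Object get(int i) {
        switch (types.get(i)) {
            case BIGINT: return bigints.get(i);
            case VARCHAR: return varchars.get(i);
            default: return null; // NULL tag
        }
    }

    public static void main(String[] args) {
        ToyUnionColumn u = new ToyUnionColumn();
        u.writeBigInt(10);        // element 0 of {a: [10, "twenty"]}
        u.writeVarChar("twenty"); // element 1
        System.out.println(u.get(0) + ", " + u.get(1)); // 10, twenty
    }
}
```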

Promotion to union is done via a call to {{ListVector.promoteToUnion()}} which appears to be called only from {{PromotableWriter.promoteToUnion()}}.

The {{ListVector.promoteToUnion()}} call itself transforms the list from a list of some type into a list of unions, with that type as the first union member. However, *it does not* go back and update the union's type vector with the type of the values written before the promotion.

That work is done in {{PromotableWriter.promoteToUnion()}}, meaning that other callers (such as the size-aware writers) must duplicate that functionality or risk losing the values written before the promotion. The code should be in the vector itself so that {{ListVector.promoteToUnion()}} "does the right thing" without clients needing to fill in part of the work.

Another feature of lists is that, unlike {{REPEATED}} types, lists allow nulls as list values. That is, a list can support the following:

{code}
{a: [10, null, 20]}
{code}

The code in {{PromotableWriter.promoteToUnion()}} is wrong: it sets every union entry to the prior type (such as BIGINT in the example above) without checking whether the value is null. As a result, after promotion to union, the above list will be:

{code}
{a: [10, 0, 20]}
{code}
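The failure can be reproduced in a few lines of toy Java. All names are hypothetical and this is not Drill code; it also assumes, as the result above suggests, that a null slot's data buffer holds 0. The faulty backfill tags every prior element with the prior type, so the reader, trusting the tag, reads the placeholder 0 instead of null.

```java
import java.util.Arrays;

/**
 * Toy reproduction (hypothetical names, not Drill code) of the promotion bug: the
 * backfill marks every prior element as BIGINT, ignoring the element's null flag.
 */
public class BuggyPromotion {
    enum Type { NULL, BIGINT }

    // Prior BIGINT elements of {a: [10, null, 20]}; assume a null slot holds 0 in the data buffer.
    static final long[] VALUES = {10, 0, 20};
    // Per-element null flags; deliberately ignored by the faulty backfill below.
    static final boolean[] IS_SET = {true, false, true};

    /** Mimics the faulty backfill: every prior slot gets the prior type. */
    static Type[] backfillIgnoringNulls(int count) {
        Type[] types = new Type[count];
        Arrays.fill(types, Type.BIGINT);
        return types;
    }

    /** Reads element i through the union's type vector, as a union reader would. */
    static String read(Type[] types, int i) {
        return types[i] == Type.NULL ? "null" : Long.toString(VALUES[i]);
    }

    public static void main(String[] args) {
        Type[] types = backfillIgnoringNulls(VALUES.length);
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < types.length; i++) {
            sb.append(i == 0 ? "" : ", ").append(read(types, i));
        }
        System.out.println(sb.append("]")); // [10, 0, 20]
    }
}
```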

The code should check the null flag on each value: if the value is null, set the union's type vector entry to the null marker; otherwise, set it to the type of the prior vector.
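A minimal sketch of the null-aware backfill follows, in toy Java with hypothetical names rather than Drill's {{ListVector}} API. The loop lives inside the vector's own promotion method, so no caller has to repeat it. Each prior element whose null flag is clear gets the null marker, and every other element keeps the prior type.

```java
/**
 * Sketch of a null-aware backfill (hypothetical names, not Drill's ListVector API).
 * It lives in the vector's own promotion method, so callers need not repeat it.
 */
public class NullAwarePromotion {
    enum Type { NULL, BIGINT }

    static final class ToyList {
        final long[] values;   // prior BIGINT data buffer
        final boolean[] isSet; // per-element null flags of the prior vector
        Type[] unionTypes;     // union type vector, created on promotion

        ToyList(long[] values, boolean[] isSet) {
            this.values = values;
            this.isSet = isSet;
        }

        /** Promotes to union and backfills the type vector for existing elements. */
        void promoteToUnion(Type priorType) {
            unionTypes = new Type[values.length];
            for (int i = 0; i < values.length; i++) {
                // Null elements get the null marker; all others keep the prior type.
                unionTypes[i] = isSet[i] ? priorType : Type.NULL;
            }
        }

        String render() {
            StringBuilder sb = new StringBuilder("[");
            for (int i = 0; i < values.length; i++) {
                if (i > 0) sb.append(", ");
                sb.append(unionTypes[i] == Type.NULL ? "null" : Long.toString(values[i]));
            }
            return sb.append("]").toString();
        }
    }

    public static void main(String[] args) {
        ToyList a = new ToyList(new long[] {10, 0, 20}, new boolean[] {true, false, true});
        a.promoteToUnion(Type.BIGINT);
        System.out.println(a.render()); // [10, null, 20]
    }
}
```

Putting the loop in the vector rather than the writer means any client that triggers promotion gets correct type tags for the values written before the promotion.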

Note: a new version, {{ListVector.convertToUnion()}}, was created for use in the new size-aware writers. The old version should be fixed or deprecated to avoid data corruption.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)