Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2019/05/16 20:12:00 UTC

[jira] [Commented] (DRILL-7257) [Text V3 Reader] dir0 is empty if a column filter returns all lines.

    [ https://issues.apache.org/jira/browse/DRILL-7257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841689#comment-16841689 ] 

Paul Rogers commented on DRILL-7257:
------------------------------------

Turns out this is due to a subtle issue with variable-width nullable vectors. Such vectors have a {{lastSet}} attribute in the {{Mutator}} class. When "transfer pairs" are used to move values, the code somehow decides to zero-fill every position from the {{lastSet}} value up to the record count. The row set framework did not set this value, so the {{RemovingRecordBatch}} zero-filled the {{dir0}} column whenever it chose transfer pairs instead of copying values. Transfer pairs are used when every row in a batch passes the filter upstream of the removing record batch.

Modified the nullable vector writer to properly set the {{lastSet}} value at the end of each batch.
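To illustrate the failure mode and the fix, here is a minimal, self-contained Java sketch. It is a toy model, not Drill's code: {{ToyNullableVarCharVector}}, {{setRaw}}, {{writeBatchBuggy}}, {{writeBatchFixed}} and {{transferAll}} are hypothetical names, and the assumption that a transfer carries {{lastSet}} along with the data is a simplification. It only shows why a stale {{lastSet}} wipes values that were actually written.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Toy model only (hypothetical names, not Drill's classes): a nullable
// variable-width vector whose mutator tracks lastSet and, when the value
// count is set, zero-fills every slot after lastSet.
public class LastSetDemo {

  static class ToyNullableVarCharVector {
    final List<String> values = new ArrayList<>();
    int lastSet = -1; // last index the writer reported as set

    // Stores a value without touching lastSet.
    void setRaw(int index, String value) {
      while (values.size() <= index) {
        values.add("");
      }
      values.set(index, value);
    }

    // Zero-fill: slots lastSet + 1 .. count - 1 become empty strings.
    void setValueCount(int count) {
      for (int i = lastSet + 1; i < count; i++) {
        setRaw(i, "");
      }
      lastSet = count - 1;
    }

    String get(int index) {
      return values.get(index);
    }
  }

  // Buggy writer: writes every row but never records lastSet.
  static void writeBatchBuggy(ToyNullableVarCharVector v, String value, int rows) {
    for (int i = 0; i < rows; i++) {
      v.setRaw(i, value);
    }
  }

  // Fixed writer: writes every row, then records lastSet at the end of the batch.
  static void writeBatchFixed(ToyNullableVarCharVector v, String value, int rows) {
    for (int i = 0; i < rows; i++) {
      v.setRaw(i, value);
    }
    v.lastSet = rows - 1;
  }

  // Stand-in for the transfer-pair path: the data moves as-is (lastSet
  // included, by assumption), then the downstream operator sets the count.
  static ToyNullableVarCharVector transferAll(ToyNullableVarCharVector src, int rows) {
    ToyNullableVarCharVector dest = new ToyNullableVarCharVector();
    dest.values.addAll(src.values);
    dest.lastSet = src.lastSet;
    dest.setValueCount(rows);
    return dest;
  }

  public static void main(String[] args) {
    ToyNullableVarCharVector buggy = new ToyNullableVarCharVector();
    writeBatchBuggy(buggy, "1994", 3);
    System.out.println(transferAll(buggy, 3).get(0)); // empty line: value wiped

    ToyNullableVarCharVector fixed = new ToyNullableVarCharVector();
    writeBatchFixed(fixed, "1994", 3);
    System.out.println(transferAll(fixed, 3).get(0)); // 1994
  }
}
{code}

Running {{main}} prints an empty line for the buggy writer and {{1994}} for the fixed one, mirroring the empty {{dir0}} column in the report.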

One could argue that the semantics of {{lastSet}} are wrong: no operator other than a scan should ever zero-fill, and certainly no operator should zero-fill batches provided by a child operator. Fixing that issue is more complex and is left for another time.


> [Text V3 Reader] dir0 is empty if a column filter returns all lines.
> --------------------------------------------------------------------
>
>                 Key: DRILL-7257
>                 URL: https://issues.apache.org/jira/browse/DRILL-7257
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.16.0
>            Reporter: Anton Gozhiy
>            Priority: Major
>         Attachments: lineitempart.zip
>
>
> *Data:*
> Unzip the attached archive: lineitempart.zip.
> *Steps:*
> # Set exec.storage.enable_v3_text_reader=true
> # Run the following query:
> {code:sql}
> select columns[0], dir0 from dfs.tmp.`/drill/data/lineitempart` where dir0=1994 and columns[0]>29766 order by columns[0] limit 1;
> {code}
> *Expected result:*
> {noformat}
> +--------+------+
> | EXPR$0 | dir0 |
> +--------+------+
> | 29767  | 1994 |
> +--------+------+
> {noformat}
> *Actual result:*
> {noformat}
> +--------+------+
> | EXPR$0 | dir0 |
> +--------+------+
> | 29767  |      |
> +--------+------+
> {noformat}
> *Note:* If the filter is changed slightly so that it does not return all lines, the result is correct:
> {noformat}
> apache drill> select columns[0], dir0 from dfs.tmp.`/drill/data/lineitempart` where dir0=1994 and columns[0]>29767 order by columns[0] limit 1;
> +--------+------+
> | EXPR$0 | dir0 |
> +--------+------+
> | 29792  | 1994 |
> +--------+------+
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)