Posted to dev@drill.apache.org by "Victoria Markman (JIRA)" <ji...@apache.org> on 2015/02/28 00:25:05 UTC

[jira] [Created] (DRILL-2342) Nullability property of the view created from parquet file is not correct

Victoria Markman created DRILL-2342:
---------------------------------------

             Summary: Nullability property of the view created from parquet file is not correct
                 Key: DRILL-2342
                 URL: https://issues.apache.org/jira/browse/DRILL-2342
             Project: Apache Drill
          Issue Type: Bug
          Components: Metadata
    Affects Versions: 0.8.0
            Reporter: Victoria Markman
            Assignee: Steven Phillips


Here is my t1 table definition:
{code}
message root {
  optional int32 a1;
  optional binary b1 (UTF8);
  optional int32 c1 (DATE);
}
{code}

I created a view on top of it:
{code}
0: jdbc:drill:schema=dfs> create view v1 as select cast(a1 as int), cast(b1 as varchar(10)), cast(c1 as date) from t1;
+------------+------------+
|     ok     |  summary   |
+------------+------------+
| true       | View 'v1' created successfully in 'dfs.aggregation' schema |
+------------+------------+
1 row selected (0.096 seconds)
{code}

DESCRIBE reports IS_NULLABLE as 'NO' for every column, which is incorrect: all three source columns are optional.
{code}
0: jdbc:drill:schema=dfs> describe v1;
+-------------+------------+-------------+
| COLUMN_NAME | DATA_TYPE  | IS_NULLABLE |
+-------------+------------+-------------+
| EXPR$0      | INTEGER    | NO          |
| EXPR$1      | VARCHAR    | NO          |
| EXPR$2      | DATE       | NO          |
+-------------+------------+-------------+
3 rows selected (0.067 seconds)
{code}

This is potentially dangerous: if Calcite later adds an optimization that drops an "is null" predicate on a column declared not nullable, a query such as "select * from v1 where x is null" would return incorrect results.

{code}
0: jdbc:drill:schema=dfs> explain plan for select * from v1 where z is null;
+------------+------------+
|    text    |    json    |
+------------+------------+
| 00-00    Screen
00-01      Project(x=[$0], y=[$1], z=[$2])
00-02        SelectionVectorRemover
00-03          Filter(condition=[IS NULL($2)])
00-04            Project(x=[CAST($2):ANY NOT NULL], y=[CAST($1):ANY NOT NULL], z=[CAST($0):ANY NOT NULL])
00-05              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/t1]], selectionRoot=/aggregation/t1, numFiles=1, columns=[`a1`, `b1`, `c1`]]])
{code}
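
To illustrate the risk described above, here is a minimal sketch in Python. It is a toy model and not Drill or Calcite code: the function plan_is_null and the sample rows are hypothetical, and the "optimization" is only the one imagined in this report, not something the planner is known to do today.

```python
# Toy model of the hypothetical optimization; not Drill or Calcite code.
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[int]]


def plan_is_null(column: str, declared_nullable: bool) -> Callable[[Row], bool]:
    """Return the predicate a simplifying planner would run for `column IS NULL`."""
    if not declared_nullable:
        # Metadata claims NOT NULL, so the planner folds `IS NULL` to constant FALSE.
        return lambda row: False
    # Metadata says nullable, so the predicate is evaluated against the data.
    return lambda row: row[column] is None


# The underlying data really does contain a NULL, as an optional parquet column may.
rows: List[Row] = [{"z": 1}, {"z": None}, {"z": 3}]

with_correct_metadata = [r for r in rows if plan_is_null("z", declared_nullable=True)(r)]
with_wrong_metadata = [r for r in rows if plan_is_null("z", declared_nullable=False)(r)]

print(with_correct_metadata)  # the row holding the NULL is returned
print(with_wrong_metadata)    # the same row is silently lost
```

With nullable metadata the filter finds the NULL row; with the incorrect NOT NULL metadata the same query returns nothing, which is the incorrect result this report is concerned about.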

It seems to me that columns in views should always be reported as nullable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)