You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Khurram Faraaz (JIRA)" <ji...@apache.org> on 2016/03/16 08:28:33 UTC
[jira] [Updated] (DRILL-4513) Query returns null when star in project and filter over non null values

     [ https://issues.apache.org/jira/browse/DRILL-4513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Khurram Faraaz updated DRILL-4513:
----------------------------------
    Attachment: tooManyNulls.json

Attached JSON file here.

Query plans for the two cases where we see null as result.

{noformat}
0: jdbc:drill:schema=dfs.tmp> explain plan for select * from `tooManyNulls.json` t where t.c1 IS NOT NULL;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(*=[$0])
00-02        Project(T8¦¦*=[$0])
00-03          SelectionVectorRemover
00-04            Filter(condition=[IS NOT NULL($1)])
00-05              Project(T8¦¦*=[$0], c1=[$1])
00-06                Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/tooManyNulls.json, numFiles=1, columns=[`*`], files=[maprfs:///tmp/tooManyNulls.json]]])
{noformat}

{noformat}
0: jdbc:drill:schema=dfs.tmp> explain plan for select * from `tooManyNulls.json` t where t.c1='Hello World';
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(*=[$0])
00-02        Project(T9¦¦*=[$0])
00-03          SelectionVectorRemover
00-04            Filter(condition=[=($1, 'Hello World')])
00-05              Project(T9¦¦*=[$0], c1=[$1])
00-06                Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/tooManyNulls.json, numFiles=1, columns=[`*`], files=[maprfs:///tmp/tooManyNulls.json]]])
{noformat}

> Query returns null when star in project and filter over non null values
> -----------------------------------------------------------------------
>
>                 Key: DRILL-4513
>                 URL: https://issues.apache.org/jira/browse/DRILL-4513
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.4.0
>         Environment: 4 node cluster CentOS
>            Reporter: Khurram Faraaz
>         Attachments: tooManyNulls.json
>
>
> When there is a star in project and there are many nulls followed by a non null value in the column c1 in JSON file, query does not return correct results.
> json file with 4096 records that have a field
> {c1: null}
> followed by the 4097th record that has
> {c1: "Hello World"}
> Drill version
> http://yum.qa.lab/drill-opensource.ebf/mapr-drill-1.4.0.201603151147-1.noarch.rpm
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select * from sys.version;
> +---------+-----------+----------------+-------------+-------------+------------+
> | version | commit_id | commit_message | commit_time | build_email | build_time |
> +---------+-----------+----------------+-------------+-------------+------------+
> | 1.4.0 | 99ec15919cd73f5e7b976f6193af45c0e46ffd7a | DRILL-4479: For empty fields under all_text_mode enabled (a) use varchar for the default columns and (b) ensure we create fields corresponding to all columns. | 14.03.2016 @ 20:52:13 UTC | Unknown | 15.03.2016 @ 18:47:50 UTC |
> +---------+-----------+----------------+-------------+-------------+------------+
> 1 row selected (0.375 seconds)
> 0: jdbc:drill:schema=dfs.tmp> alter system set `store.json.all_text_mode`=true;
> +-------+------------------------------------+
> |  ok   |              summary               |
> +-------+------------------------------------+
> | true  | store.json.all_text_mode updated.  |
> +-------+------------------------------------+
> 1 row selected (0.136 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select * from `tooManyNulls.json` t where t.c1 = 'Hello World';
> Error: SYSTEM ERROR: NumberFormatException: Hello World
> Fragment 0:0
> [Error Id: f9022f22-12cd-46c7-b471-470a50506ba2 on centos-04.qa.lab:31010] (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> select * from `tooManyNulls.json` t where t.c1 IS NOT NULL;
> +-------+
> |   *   |
> +-------+
> | null  |
> +-------+
> 1 row selected (0.239 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select t.c1 from `tooManyNulls.json` t where t.c1 = 'Hello World';
> +--------------+
> |      c1      |
> +--------------+
> | Hello World  |
> +--------------+
> 1 row selected (0.282 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select t.c1 from `tooManyNulls.json` t where t.c1 IS NOT NULL;
> +--------------+
> |      c1      |
> +--------------+
> | Hello World  |
> +--------------+
> 1 row selected (0.259 seconds)
> {noformat}
> Another interesting observation to note, the below query returns null. However, when the same query is used as a sub-query, we see the query returns correct results.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select * from `tooManyNulls.json` t where t.c1 IS NOT NULL;
> +-------+
> |   *   |
> +-------+
> | null  |
> +-------+
> 1 row selected (0.203 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select tmp.c1 from (select * from `tooManyNulls.json` t where t.c1 IS NOT NULL) tmp;
> +--------------+
> |      c1      |
> +--------------+
> | Hello World  |
> +--------------+
> 1 row selected (0.25 seconds)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)