You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2019/03/19 16:08:00 UTC
[jira] [Updated] (DRILL-7113) Issue with filtering null values from MapRDB-JSON

     [ https://issues.apache.org/jira/browse/DRILL-7113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Sinha updated DRILL-7113:
------------------------------
    Fix Version/s:     (was: 1.17.0)

> Issue with filtering null values from MapRDB-JSON
> -------------------------------------------------
>
>                 Key: DRILL-7113
>                 URL: https://issues.apache.org/jira/browse/DRILL-7113
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning &amp; Optimization
>    Affects Versions: 1.15.0
>            Reporter: Hanumath Rao Maduri
>            Assignee: Aman Sinha
>            Priority: Major
>             Fix For: 1.16.0
>
>
> When the Drill is querying documents from MapRDBJSON that contain fields with null value, it returns the wrong result.
>  The issue is locally reproduced.
> Please find the repro steps:
>  [1] Create a MaprDBJSON table. Say '/tmp/dmdb2/'.
> [2] Insert the following sample records to table:
> {code:java}
> insert --table /tmp/dmdb2/ --value '{"_id": "1", "label": "person", "confidence": 0.24}'
> insert --table /tmp/dmdb2/ --value '{"_id": "2", "label": "person2"}'
> insert --table /tmp/dmdb2/ --value '{"_id": "3", "label": "person3", "confidence": 0.54}'
> insert --table /tmp/dmdb2/ --value '{"_id": "4", "label": "person4", "confidence": null}'
> {code}
> We can see that for field 'confidence' document 1 has value 0.24, document 3 has value 0.54, document 2 does not have the field and document 4 has the field with value null.
> [3] Query the table from DRILL.
>  *Query 1:*
> {code:java}
> 0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2;
> +----------+-------------+
> |  label   | confidence  |
> +----------+-------------+
> | person   | 0.24        |
> | person2  | null        |
> | person3  | 0.54        |
> | person4  | null        |
> +----------+-------------+
> 4 rows selected (0.2 seconds)
> {code}
> *Query 2:*
> {code:java}
> 0: jdbc:drill:> select * from dfs.tmp.dmdb2;
> +------+-------------+----------+
> | _id  | confidence  |  label   |
> +------+-------------+----------+
> | 1    | 0.24        | person   |
> | 2    | null        | person2  |
> | 3    | 0.54        | person3  |
> | 4    | null        | person4  |
> +------+-------------+----------+
> 4 rows selected (0.174 seconds)
> {code}
> *Query 3:*
> {code:java}
> 0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2 where confidence is not null;
> +----------+-------------+
> |  label   | confidence  |
> +----------+-------------+
> | person   | 0.24        |
> | person3  | 0.54        |
> | person4  | null        |
> +----------+-------------+
> 3 rows selected (0.192 seconds)
> {code}
> *Query 4:*
> {code:java}
> 0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2 where confidence is  null;
> +----------+-------------+
> |  label   | confidence  |
> +----------+-------------+
> | person2  | null        |
> +----------+-------------+
> 1 row selected (0.262 seconds)
> {code}
> As you can see, Query 3 which queries for all documents with confidence value 'is not null', returns a document with null value.
> *Other observation:*
>  Querying the same data using DRILL without MapRDB provides the correct result.
>  For example, create 4 different JSON files with following data:
> {"label": "person", "confidence": 0.24} \{"label": "person2"} \{"label": "person3", "confidence": 0.54} \{"label": "person4", "confidence": null}
> Query it directly using DRILL:
> *Query 5:*
> {code:java}
> 0: jdbc:drill:> select label,confidence from dfs.tmp.t2;
> +----------+-------------+
> |  label   | confidence  |
> +----------+-------------+
> | person4  | null        |
> | person3  | 0.54        |
> | person2  | null        |
> | person   | 0.24        |
> +----------+-------------+
> 4 rows selected (0.203 seconds)
> {code}
> *Query 6:*
> {code:java}
> 0: jdbc:drill:> select label,confidence from dfs.tmp.t2 where confidence is null;
> +----------+-------------+
> |  label   | confidence  |
> +----------+-------------+
> | person4  | null        |
> | person2  | null        |
> +----------+-------------+
> 2 rows selected (0.352 seconds)
> {code}
> *Query 7:*
> {code:java}
> 0: jdbc:drill:> select label,confidence from dfs.tmp.t2 where confidence is not null;
> +----------+-------------+
> |  label   | confidence  |
> +----------+-------------+
> | person3  | 0.54        |
> | person   | 0.24        |
> +----------+-------------+
> 2 rows selected (0.265 seconds)
> {code}
> As seen in query 6 & 7, it returns the correct result.
> I believe the issue is at the MapRDB layer where it is fetching the results.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)