You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Jacques Nadeau (JIRA)" <ji...@apache.org> on 2015/01/05 00:58:15 UTC

[jira] [Updated] (DRILL-1434) count of a nullable column in tpcds gives incorrect results

     [ https://issues.apache.org/jira/browse/DRILL-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau updated DRILL-1434:
----------------------------------
    Assignee: Aman Sinha

> count of a nullable column in tpcds gives incorrect results
> -----------------------------------------------------------
>
>                 Key: DRILL-1434
>                 URL: https://issues.apache.org/jira/browse/DRILL-1434
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 0.6.0
>            Reporter: Chun Chang
>            Assignee: Aman Sinha
>             Fix For: 0.8.0
>
>         Attachments: 0001-DRILL-1434-In-ParquetGroupScan-compute-the-non-null-.patch
>
>
> code base 
> #Fri Sep 12 14:08:02 PDT 2014
> git.commit.id.abbrev=9e16466
> I have a parquet file (tpcds data) which contains null value on a column. The total count of the column:
> 0: jdbc:drill:schema=dfs> select count(ss_quantity) from `tpcds/p1/store_sales.parquet`;
> +------------+
> |   EXPR$0   |
> +------------+
> | 2880404    |
> +------------+
> The count without considering null is:
> 0: jdbc:drill:schema=dfs> select count(ss_quantity) from `tpcds/p1/store_sales.parquet` where ss_quantity is not null;
> +------------+
> |   EXPR$0   |
> +------------+
> | 2750408    |
> +------------+
> But the count for null value is zero:
> 0: jdbc:drill:schema=dfs> select count(ss_quantity) from `tpcds/p1/store_sales.parquet` where ss_quantity is null;
> +------------+
> |   EXPR$0   |
> +------------+
> | 0          |
> +------------+
> Here is the physical plan look like for this query:
> 0: jdbc:drill:schema=dfs> explain plan for select count(ss_quantity) from `tpcds/p1/store_sales.parquet` where ss_quantity is null;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      StreamAgg(group=[{}], EXPR$0=[COUNT($0)])
> 00-02        Filter(condition=[IS NULL($0)])
> 00-03          ProducerConsumer
> 00-04            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/user/root/mondrian/tpcds/p1/store_sales.parquet]], selectionRoot=/user/root/mondrian/tpcds/p1/store_sales.parquet, columns=[SchemaPath [`ss_quantity`]]]])
>  | {
>   "head" : {
>     "version" : 1,
>     "generator" : {
>       "type" : "ExplainHandler",
>       "info" : ""
>     },
>     "type" : "APACHE_DRILL_PHYSICAL",
>     "options" : [ ],
>     "queue" : 0,
>     "resultMode" : "EXEC"
>   },
>   "graph" : [ {
>     "pop" : "parquet-scan",
>     "@id" : 4,
>     "entries" : [ {
>       "path" : "maprfs:/user/root/mondrian/tpcds/p1/store_sales.parquet"
>     } ],
>     "storage" : {
>       "type" : "file",
>       "enabled" : true,
>       "connection" : "maprfs:///",
>       "workspaces" : {
>         "default" : {
>           "location" : "/user/root/mondrian/",
>           "writable" : true,
>           "storageformat" : null
>         },
>         "home" : {
>           "location" : "/",
>           "writable" : false,
>           "storageformat" : null
>         },
>         "root" : {
>           "location" : "/",
>           "writable" : false,
>           "storageformat" : null
>         },
>         "abhi" : {
>           "location" : "/tables",
>           "writable" : true,
>           "storageformat" : "csv"
>         },
>         "chun" : {
>           "location" : "/drill/testdata/chun/",
>           "writable" : false,
>           "storageformat" : null
>         },
>         "tmp" : {
>           "location" : "/tmp",
>           "writable" : true,
>           "storageformat" : "csv"
>         }
>       },
>       "formats" : {
>         "psv" : {
>           "type" : "text",
>           "extensions" : [ "tbl" ],
>           "delimiter" : "|"
>         },
>         "csv" : {
>           "type" : "text",
>           "extensions" : [ "csv" ],
>           "delimiter" : ","
>         },
>         "tsv" : {
>           "type" : "text",
>           "extensions" : [ "tsv" ],
>           "delimiter" : "\t"
>         },
>         "parquet" : {
>           "type" : "parquet"
>         },
>         "json" : {
>           "type" : "json"
>         }
>       }
>     },
>     "format" : {
>       "type" : "parquet"
>     },
>     "columns" : [ "`ss_quantity`" ],
>     "selectionRoot" : "/user/root/mondrian/tpcds/p1/store_sales.parquet",
>     "cost" : 2880404.0
>   }, {
>     "pop" : "producer-consumer",
>     "@id" : 3,
>     "child" : 4,
>     "size" : 10,
>     "initialAllocation" : 1000000,
>     "maxAllocation" : 10000000000,
>     "cost" : 2880404.0
>   }, {
>     "pop" : "filter",
>     "@id" : 2,
>     "child" : 3,
>     "expr" : "isnull(`ss_quantity`) ",
>     "initialAllocation" : 1000000,
>     "maxAllocation" : 10000000000,
>     "cost" : 720101.0
>   }, {
>     "pop" : "streaming-aggregate",
>     "@id" : 1,
>     "child" : 2,
>     "keys" : [ ],
>     "exprs" : [ {
>       "ref" : "`EXPR$0`",
>       "expr" : "count(`ss_quantity`) "
>     } ],
>     "initialAllocation" : 1000000,
>     "maxAllocation" : 10000000000,
>     "cost" : 1.0
>   }, {
>     "pop" : "screen",
>     "@id" : 0,
>     "child" : 1,
>     "initialAllocation" : 1000000,
>     "maxAllocation" : 10000000000,
>     "cost" : 72010.1
>   } ]
> } |
> +------------+------------+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)