You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Jacques Nadeau (JIRA)" <ji...@apache.org> on 2016/03/04 20:57:40 UTC

[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)

    [ https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180448#comment-15180448 ] 

Jacques Nadeau commented on DRILL-4474:
---------------------------------------

I've confirmed that this is still a problem in 1.5.

Drill's ConvertCountToDirectScan seems to incorrectly detect the nature of a this count(case) as non-nullable or something else and thus converts the query into a direct scan count.

> Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
> ----------------------------------------------------------------------
>
>                 Key: DRILL-4474
>                 URL: https://issues.apache.org/jira/browse/DRILL-4474
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.5.0
>         Environment: m3.xlarge AWS instances ( 3 nodes)
> CentOS6.5 x64
>            Reporter: Shankar
>            Priority: Blocker
>
> {quote}
> * We are using drill to retrieve the business data from game analytic. 
> * We are running below queries on table of size 50GB (parquet)
> * We have found some major inconsistency in data when we use COUNT function.
> * Below is the case by case queries and their output. {color:blue}*Please analyse it carefully, to for clear understanding of behaviour. *{color}
> * Please let me know how to resolve this ? (or any earlier JIRA has been already created). 
> * Hope this may be fixed in later versions. If not please do the needful.
> {quote}
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> CASE-1 (Wrong result)
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +-----------+
> |   count   |
> +-----------+
> | 27645752  |
> +-----------+
> 1 row selected (0.281 seconds)
> {noformat}
> {quote}
> {color}
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> CASE-2 (Wrong result)
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(sessionid), 
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +-----------+-------+
> |  EXPR$0   |  cnt  |
> +-----------+-------+
> | 37772844  | 2108  |
> +-----------+-------+
> 1 row selected (12.597 seconds)
> {noformat}
> {quote}
> {color}
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> CASE-3 (Wrong result, only first count is correct)
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(distinct sessionid), 
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +---------+-----------+
> | EXPR$0  |    cnt    |
> +---------+-----------+
> | 201941  | 37772844  |
> +---------+-----------+
> 1 row selected (8.259 seconds)
> {noformat}
> {quote}
> {color}
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> CASE-4 (Correct result)
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> {color:green}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(distinct case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +------+
> | cnt  |
> +------+
> | 525  |
> +------+
> 1 row selected (14.318 seconds)
> {noformat}
> {quote}
> {color}
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> CASE-5 (Correct result)
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> {color:green}
> {quote}
> {noformat}
> 0: jdbc:drill:> select  
> . . . . . . . > count(sessionid),
> . . . . . . . > count(distinct sessionid)
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > where ( t.id = '/confirmDrop/btnYes/' and t.event = 'Click')
> . . . . . . . > ;
> +---------+---------+
> | EXPR$0  | EXPR$1  |
> +---------+---------+
> | 2108    | 525     |
> +---------+---------+
> 1 row selected (19.355 seconds)
> {noformat}
> {quote}
> {color}
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> CASE-6
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> {quote}
> {noformat}
> 0: jdbc:drill:> explain plan for 
> . . . . . . . > 
> . . . . . . . > select  
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@73ff10e1])
>  | {
>   "head" : {
>     "version" : 1,
>     "generator" : {
>       "type" : "ExplainHandler",
>       "info" : ""
>     },
>     "type" : "APACHE_DRILL_PHYSICAL",
>     "options" : [ ],
>     "queue" : 0,
>     "resultMode" : "EXEC"
>   },
>   "graph" : [ {
>     "pop" : "DirectGroupScan",
>     "@id" : 1,
>     "cost" : 20.0
>   }, {
>     "pop" : "screen",
>     "@id" : 0,
>     "child" : 1,
>     "initialAllocation" : 1000000,
>     "maxAllocation" : 10000000000,
>     "cost" : 20.0
>   } ]
> } |
> +------+------+
> 1 row selected (0.276 seconds)
> {noformat}
> {quote}
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> CASE-7
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> {quote}
> {noformat}
> 0: jdbc:drill:> 
> 0: jdbc:drill:> explain plan for 
> . . . . . . . > 
> . . . . . . . > select  
> . . . . . . . > count(distinct sessionid), 
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ; 
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0], cnt=[$1])
> 00-02        Project(EXPR$0=[$1], cnt=[$0])
> 00-03          NestedLoopJoin(condition=[true], joinType=[inner])
> 00-05            Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@59b7cda9])
> 00-04            StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
> 00-06              UnionExchange
> 01-01                StreamAgg(group=[{}], EXPR$0=[COUNT($0)])
> 01-02                  HashAgg(group=[{0}])
> 01-03                    Project(sessionid=[$0])
> 01-04                      HashToRandomExchange(dist0=[[$0]])
> 02-01                        UnorderedMuxExchange
> 03-01                          Project(sessionid=[$0], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
> 03-02                            HashAgg(group=[{0}])
> 03-03                              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=hdfs://namenode:9000/tmp/a_games_log_visit_base]], selectionRoot=hdfs://namenode:9000/tmp/a_games_log_visit_base, numFiles=1, usedMetadataFile=false, columns=[`sessionid`]]])
>  | {
>   "head" : {
>     "version" : 1,
>     "generator" : {
>       "type" : "ExplainHandler",
>       "info" : ""
>     },
>     "type" : "APACHE_DRILL_PHYSICAL",
>     "options" : [ ],
>     "queue" : 0,
>     "resultMode" : "EXEC"
>   },
>   "graph" : [ {
>     "pop" : "DirectGroupScan",
>     "@id" : 5,
>     "cost" : 20.0
>   }, {
>     "pop" : "parquet-scan",
>     "@id" : 196611,
>     "userName" : "hadoop",
>     "entries" : [ {
>       "path" : "hdfs://namenode:9000/tmp/a_games_log_visit_base"
>     } ],
>     "storage" : {
>       "type" : "file",
>       "enabled" : true,
>       "connection" : "hdfs://namenode:9000",
>       "workspaces" : {
>         "root" : {
>           "location" : "/tmp/",
>           "writable" : true,
>           "defaultInputFormat" : null
>         },
>         "tmp" : {
>           "location" : "/tmp",
>           "writable" : true,
>           "defaultInputFormat" : null
>         }
>       },
>       "formats" : {
>         "psv" : {
>           "type" : "text",
>           "extensions" : [ "tbl" ],
>           "delimiter" : "|"
>         },
>         "csv" : {
>           "type" : "text",
>           "extensions" : [ "csv" ],
>           "delimiter" : ","
>         },
>         "tsv" : {
>           "type" : "text",
>           "extensions" : [ "tsv" ],
>           "delimiter" : "\t"
>         },
>         "parquet" : {
>           "type" : "parquet"
>         },
>         "json" : {
>           "type" : "json"
>         },
>         "avro" : {
>           "type" : "avro"
>         }
>       }
>     },
>     "format" : {
>       "type" : "parquet"
>     },
>     "columns" : [ "`sessionid`" ],
>     "selectionRoot" : "hdfs://namenode:9000/tmp/a_games_log_visit_base",
>     "fileSet" : [ "/tmp/a_games_log_visit_base/1_6_15.parquet", "/tmp/a_games_log_visit_base/1_1_9.parquet", "/tmp/a_games_log_visit_base/1_7_2.parquet", "/tmp/a_games_log_visit_base/1_5_7.parquet", "/tmp/a_games_log_visit_base/1_2_1.parquet", "/tmp/a_games_log_visit_base/1_4_23.parquet", "/tmp/a_games_log_visit_base/1_8_22.parquet", "/tmp/a_games_log_visit_base/1_3_11.parquet", "/tmp/a_games_log_visit_base/1_6_30.parquet", "/tmp/a_games_log_visit_base/1_7_10.parquet", "/tmp/a_games_log_visit_base/1_1_10.parquet", "/tmp/a_games_log_visit_base/1_0_24.parquet", "/tmp/a_games_log_visit_base/1_8_39.parquet", "/tmp/a_games_log_visit_base/1_2_16.parquet", "/tmp/a_games_log_visit_base/1_7_27.parquet", "/tmp/a_games_log_visit_base/1_3_4.parquet", "/tmp/a_games_log_visit_base/1_8_5.parquet", "/tmp/a_games_log_visit_base/1_8_13.parquet", "/tmp/a_games_log_visit_base/1_4_7.parquet", "/tmp/a_games_log_visit_base/1_5_12.parquet", "/tmp/a_games_log_visit_base/1_0_9.parquet", "/tmp/a_games_log_visit_base/1_4_14.parquet", "/tmp/a_games_log_visit_base/1_2_13.parquet", "/tmp/a_games_log_visit_base/1_0_15.parquet", "/tmp/a_games_log_visit_base/1_2_4.parquet", "/tmp/a_games_log_visit_base/1_6_24.parquet", "/tmp/a_games_log_visit_base/1_7_5.parquet", "/tmp/a_games_log_visit_base/1_6_2.parquet", "/tmp/a_games_log_visit_base/1_1_13.parquet", "/tmp/a_games_log_visit_base/1_1_1.parquet", "/tmp/a_games_log_visit_base/1_3_7.parquet", "/tmp/a_games_log_visit_base/1_0_12.parquet", "/tmp/a_games_log_visit_base/1_0_3.parquet", "/tmp/a_games_log_visit_base/1_0_29.parquet", "/tmp/a_games_log_visit_base/1_1_24.parquet", "/tmp/a_games_log_visit_base/1_6_18.parquet", "/tmp/a_games_log_visit_base/1_2_19.parquet", "/tmp/a_games_log_visit_base/1_3_16.parquet", "/tmp/a_games_log_visit_base/1_6_27.parquet", "/tmp/a_games_log_visit_base/1_4_11.parquet", "/tmp/a_games_log_visit_base/1_7_15.parquet", "/tmp/a_games_log_visit_base/1_5_2.parquet", "/tmp/a_games_log_visit_base/1_8_10.parquet", "/tmp/a_games_log_visit_base/1_6_7.parquet", "/tmp/a_games_log_visit_base/1_8_2.parquet", "/tmp/a_games_log_visit_base/1_2_9.parquet", "/tmp/a_games_log_visit_base/1_7_18.parquet", "/tmp/a_games_log_visit_base/1_3_20.parquet", "/tmp/a_games_log_visit_base/1_7_31.parquet", "/tmp/a_games_log_visit_base/1_1_16.parquet", "/tmp/a_games_log_visit_base/1_6_35.parquet", "/tmp/a_games_log_visit_base/1_1_4.parquet", "/tmp/a_games_log_visit_base/1_4_19.parquet", "/tmp/a_games_log_visit_base/1_0_6.parquet", "/tmp/a_games_log_visit_base/1_8_18.parquet", "/tmp/a_games_log_visit_base/1_7_8.parquet", "/tmp/a_games_log_visit_base/1_5_15.parquet", "/tmp/a_games_log_visit_base/1_4_20.parquet", "/tmp/a_games_log_visit_base/1_0_21.parquet", "/tmp/a_games_log_visit_base/1_8_31.parquet", "/tmp/a_games_log_visit_base/1_4_4.parquet", "/tmp/a_games_log_visit_base/1_3_19.parquet", "/tmp/a_games_log_visit_base/1_1_21.parquet", "/tmp/a_games_log_visit_base/1_2_18.parquet", "/tmp/a_games_log_visit_base/1_8_20.parquet", "/tmp/a_games_log_visit_base/1_8_37.parquet", "/tmp/a_games_log_visit_base/1_0_13.parquet", "/tmp/a_games_log_visit_base/1_0_4.parquet", "/tmp/a_games_log_visit_base/1_6_13.parquet", "/tmp/a_games_log_visit_base/1_0_22.parquet", "/tmp/a_games_log_visit_base/1_1_19.parquet", "/tmp/a_games_log_visit_base/1_3_2.parquet", "/tmp/a_games_log_visit_base/1_7_29.parquet", "/tmp/a_games_log_visit_base/1_5_1.parquet", "/tmp/a_games_log_visit_base/1_8_11.parquet", "/tmp/a_games_log_visit_base/1_4_25.parquet", "/tmp/a_games_log_visit_base/1_6_32.parquet", "/tmp/a_games_log_visit_base/1_7_12.parquet", "/tmp/a_games_log_visit_base/1_7_0.parquet", "/tmp/a_games_log_visit_base/1_6_4.parquet", "/tmp/a_games_log_visit_base/1_6_22.parquet", "/tmp/a_games_log_visit_base/1_4_5.parquet", "/tmp/a_games_log_visit_base/1_8_3.parquet", "/tmp/a_games_log_visit_base/1_1_11.parquet", "/tmp/a_games_log_visit_base/1_1_3.parquet", "/tmp/a_games_log_visit_base/1_8_34.parquet", "/tmp/a_games_log_visit_base/1_2_6.parquet", "/tmp/a_games_log_visit_base/1_5_14.parquet", "/tmp/a_games_log_visit_base/1_0_7.parquet", "/tmp/a_games_log_visit_base/1_4_16.parquet", "/tmp/a_games_log_visit_base/1_6_10.parquet", "/tmp/a_games_log_visit_base/1_7_3.parquet", "/tmp/a_games_log_visit_base/1_0_27.parquet", "/tmp/a_games_log_visit_base/1_0_1.parquet", "/tmp/a_games_log_visit_base/1_0_10.parquet", "/tmp/a_games_log_visit_base/1_5_4.parquet", "/tmp/a_games_log_visit_base/1_8_19.parquet", "/tmp/a_games_log_visit_base/1_4_22.parquet", "/tmp/a_games_log_visit_base/1_7_24.parquet", "/tmp/a_games_log_visit_base/1_8_8.parquet", "/tmp/a_games_log_visit_base/1_8_25.parquet", "/tmp/a_games_log_visit_base/1_6_9.parquet", "/tmp/a_games_log_visit_base/1_5_17.parquet", "/tmp/a_games_log_visit_base/1_3_5.parquet", "/tmp/a_games_log_visit_base/1_7_30.parquet", "/tmp/a_games_log_visit_base/1_3_14.parquet", "/tmp/a_games_log_visit_base/1_1_6.parquet", "/tmp/a_games_log_visit_base/1_6_16.parquet", "/tmp/a_games_log_visit_base/1_2_0.parquet", "/tmp/a_games_log_visit_base/1_5_21.parquet", "/tmp/a_games_log_visit_base/1_1_14.parquet", "/tmp/a_games_log_visit_base/1_7_33.parquet", "/tmp/a_games_log_visit_base/1_8_28.parquet", "/tmp/a_games_log_visit_base/1_0_18.parquet", "/tmp/a_games_log_visit_base/1_4_13.parquet", "/tmp/a_games_log_visit_base/1_7_21.parquet", "/tmp/a_games_log_visit_base/1_2_3.parquet", "/tmp/a_games_log_visit_base/1_3_17.parquet", "/tmp/a_games_log_visit_base/1_4_2.parquet", "/tmp/a_games_log_visit_base/1_8_16.parquet", "/tmp/a_games_log_visit_base/1_6_25.parquet", "/tmp/a_games_log_visit_base/1_8_0.parquet", "/tmp/a_games_log_visit_base/1_2_10.parquet", "/tmp/a_games_log_visit_base/1_6_1.parquet", "/tmp/a_games_log_visit_base/1_7_6.parquet", "/tmp/a_games_log_visit_base/1_3_8.parquet", "/tmp/a_games_log_visit_base/1_8_9.parquet", "/tmp/a_games_log_visit_base/1_3_15.parquet", "/tmp/a_games_log_visit_base/1_7_23.parquet", "/tmp/a_games_log_visit_base/1_6_28.parquet", "/tmp/a_games_log_visit_base/1_4_0.parquet", "/tmp/a_games_log_visit_base/1_4_27.parquet", "/tmp/a_games_log_visit_base/1_8_35.parquet", "/tmp/a_games_log_visit_base/1_6_19.parquet", "/tmp/a_games_log_visit_base/1_0_28.parquet", "/tmp/a_games_log_visit_base/1_5_16.parquet", "/tmp/a_games_log_visit_base/1_7_14.parquet", "/tmp/a_games_log_visit_base/1_5_22.parquet", "/tmp/a_games_log_visit_base/1_8_26.parquet", "/tmp/a_games_log_visit_base/1_6_34.parquet", "/tmp/a_games_log_visit_base/1_1_5.parquet", "/tmp/a_games_log_visit_base/1_1_23.parquet", "/tmp/a_games_log_visit_base/1_6_6.parquet", "/tmp/a_games_log_visit_base/1_5_3.parquet", "/tmp/a_games_log_visit_base/1_0_11.parquet", "/tmp/a_games_log_visit_base/1_0_2.parquet", "/tmp/a_games_log_visit_base/1_4_10.parquet", "/tmp/a_games_log_visit_base/1_2_8.parquet", "/tmp/a_games_log_visit_base/1_0_19.parquet", "/tmp/a_games_log_visit_base/1_7_32.parquet", "/tmp/a_games_log_visit_base/1_8_29.parquet", "/tmp/a_games_log_visit_base/1_8_1.parquet", "/tmp/a_games_log_visit_base/1_6_20.parquet", "/tmp/a_games_log_visit_base/1_7_17.parquet", "/tmp/a_games_log_visit_base/1_3_0.parquet", "/tmp/a_games_log_visit_base/1_1_17.parquet", "/tmp/a_games_log_visit_base/1_4_18.parquet", "/tmp/a_games_log_visit_base/1_7_9.parquet", "/tmp/a_games_log_visit_base/1_8_32.parquet", "/tmp/a_games_log_visit_base/1_7_20.parquet", "/tmp/a_ |
> +------+------+
> 1 row selected (0.503 seconds)
> {noformat}
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)