You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/08/17 15:14:01 UTC
[jira] [Commented] (IMPALA-110) Add support for multiple distinct operators in the same query block

    [ https://issues.apache.org/jira/browse/IMPALA-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909743#comment-16909743 ] 

ASF subversion and git services commented on IMPALA-110:
--------------------------------------------------------

Commit c665fc1e06d53c7b70611ad3993c712c7fe35cb2 in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c665fc1 ]

IMPALA-8790: fix referencing wrong grouping exprs of MultiAggregateInfo

When creating the single node plan for analytic functions (see
SingleNodePlanner#createQueryPlan), if the query block contains
aggregations, the grouping exprs are carried on for optimizations (see
AnalyticPlanner#computeInputPartitionExprs). This patch fixes a
regression bug due to IMPALA-110 in this phase.

The pre-IMPALA-110 behavior is carrying the groupingExprs of the
AggregateInfo (if has). Those exprs are already substituted in
AggregationNode#init calling from SingleNodePlanner#createSelectPlan.
Now the behavior is carrying the groupingExprs of the MultiAggregateInfo
(if has). Those exprs are not substituted in AggregationNode#init.
Instead, MultiAggregateInfo creates a substituted grouping exprs in this
step. They are what we actually need. The original grouping exprs may
use non-materialized slots, which should not be referenced in exchanges.

Also add some useful TRACE logs for future debugging.

Tests
  - Add planner tests to cover the regression bug.
  - Run CORE tests

Change-Id: I11a80bd6d73ea00ad8c644469558a1885706f596
Reviewed-on: http://gerrit.cloudera.org:8080/14063
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Add support for multiple distinct operators in the same query block
> -------------------------------------------------------------------
>
>                 Key: IMPALA-110
>                 URL: https://issues.apache.org/jira/browse/IMPALA-110
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend, Frontend
>    Affects Versions: Impala 0.5, Impala 1.4, Impala 2.0, Impala 2.2, Impala 2.3.0
>            Reporter: Greg Rahn
>            Assignee: Thomas Tauber-Marshall
>            Priority: Major
>              Labels: sql-language
>             Fix For: Impala 3.1.0
>
>
> Impala only allows a single (DISTINCT columns) expression in each query.
> {color:red}Note:
> If you do not need precise accuracy, you can produce an estimate of the distinct values for a column by specifying NDV(column); a query can contain multiple instances of NDV(column). To make Impala automatically rewrite COUNT(DISTINCT) expressions to NDV(), enable the APPX_COUNT_DISTINCT query option.
> {color}
> {code}
> [impala:21000] > select count(distinct i_class_id) from item;
> Query: select count(distinct i_class_id) from item
> Query finished, fetching results ...
> 16
> Returned 1 row(s) in 1.51s
> {code}
> {code}
> [impala:21000] > select count(distinct i_class_id), count(distinct i_brand_id) from item;
> Query: select count(distinct i_class_id), count(distinct i_brand_id) from item
> ERROR: com.cloudera.impala.common.AnalysisException: Analysis exception (in select count(distinct i_class_id), count(distinct i_brand_id) from item)
> 	at com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:133)
> 	at com.cloudera.impala.service.Frontend.createExecRequest(Frontend.java:221)
> 	at com.cloudera.impala.service.JniFrontend.createExecRequest(JniFrontend.java:89)
> Caused by: com.cloudera.impala.common.AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters as COUNT(DISTINCT i_class_id); deviating function: COUNT(DISTINCT i_brand_id)
> 	at com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo(AggregateInfo.java:196)
> 	at com.cloudera.impala.analysis.AggregateInfo.create(AggregateInfo.java:143)
> 	at com.cloudera.impala.analysis.SelectStmt.createAggInfo(SelectStmt.java:466)
> 	at com.cloudera.impala.analysis.SelectStmt.analyzeAggregation(SelectStmt.java:347)
> 	at com.cloudera.impala.analysis.SelectStmt.analyze(SelectStmt.java:155)
> 	at com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:130)
> 	... 2 more
> {code}
> Hive supports this:
> {code}
> $ hive -e "select count(distinct i_class_id), count(distinct i_brand_id) from item;"
> Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
> Hive history file=/tmp/grahn/hive_job_log_grahn_201303052234_1625576708.txt
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Starting Job = job_201302081514_0073, Tracking URL = http://impala:50030/jobdetails.jsp?jobid=job_201302081514_0073
> Kill Command = /usr/lib/hadoop/bin/hadoop job  -Dmapred.job.tracker=m0525.mtv.cloudera.com:8021 -kill job_201302081514_0073
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
> 2013-03-05 22:34:43,255 Stage-1 map = 0%,  reduce = 0%
> 2013-03-05 22:34:49,323 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 sec
> 2013-03-05 22:34:50,337 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 sec
> 2013-03-05 22:34:51,351 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 sec
> 2013-03-05 22:34:52,360 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 sec
> 2013-03-05 22:34:53,370 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 sec
> 2013-03-05 22:34:54,379 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 sec
> 2013-03-05 22:34:55,389 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.58 sec
> 2013-03-05 22:34:56,402 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.58 sec
> 2013-03-05 22:34:57,413 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.58 sec
> 2013-03-05 22:34:58,424 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.58 sec
> MapReduce Total cumulative CPU time: 8 seconds 580 msec
> Ended Job = job_201302081514_0073
> MapReduce Jobs Launched: 
> Job 0: Map: 1  Reduce: 1   Cumulative CPU: 8.58 sec   HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 8 seconds 580 msec
> OK
> 16	952
> Time taken: 25.666 seconds
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org