Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/04/04 17:10:02 UTC

[jira] [Commented] (IMPALA-110) Add support for multiple distinct operators in the same query block

    [ https://issues.apache.org/jira/browse/IMPALA-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810088#comment-16810088 ] 

ASF subversion and git services commented on IMPALA-110:
--------------------------------------------------------

Commit fdd6db524c9c97f0baebfde0119fce19d62eaec3 in impala's branch refs/heads/2.x from Thomas Tauber-Marshall
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=fdd6db5 ]

IMPALA-7251: Fix QueryMaintenance calls in Aggregators

A recent change, IMPALA-110 (part 2), refactored
PartitionedAggregationNode into several classes, including a new type
'Aggregator'. During this refactor, code that makes local allocations
while evaluating exprs was moved from the ExecNode (now
AggregationNode/StreamingAggregationNode) into the Aggregators, but
code related to cleaning these allocations up (i.e. QueryMaintenance())
was not, resulting in some queries using an excessive amount of
memory.

This patch removes all calls to QueryMaintenance() from the exec nodes
and moves them into the Aggregators.

Testing:
- Added new test cases that run with a mem limit and fail if the expr
  allocations aren't released in a timely manner.
- Passed a full exhaustive run.

Change-Id: I4dac2bb0a15cdd7315ee15608bae409c125c82f5
Reviewed-on: http://gerrit.cloudera.org:8080/10871
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
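
To illustrate the cleanup pattern the commit message describes, here is a toy Python model. It is not Impala's C++ code. The class ExprResultsPool, the methods add_batch() and process(), and the byte accounting are all hypothetical names and assumptions made for this sketch. The only idea taken from the commit message is that the component that makes the expr allocations (the Aggregator) is also the one that releases them, while the exec node makes no maintenance calls of its own.

```python
class ExprResultsPool:
    """Toy stand-in for memory allocated while evaluating exprs (hypothetical)."""

    def __init__(self):
        self.bytes_allocated = 0
        self.peak_bytes = 0

    def allocate(self, nbytes):
        self.bytes_allocated += nbytes
        self.peak_bytes = max(self.peak_bytes, self.bytes_allocated)

    def clear(self):
        self.bytes_allocated = 0


class Aggregator:
    """Evaluates exprs per batch and owns the cleanup of what it allocated."""

    def __init__(self):
        self.pool = ExprResultsPool()
        self.total = 0

    def add_batch(self, batch):
        for row in batch:
            # Evaluating an expr makes a local allocation (modelled as len(row) bytes).
            self.pool.allocate(len(row))
            self.total += len(row)
        # Cleanup sits next to the allocation, so it cannot be forgotten by the caller.
        self.query_maintenance()

    def query_maintenance(self):
        self.pool.clear()


class AggregationNode:
    """Exec node: forwards batches only and makes no maintenance calls itself."""

    def __init__(self, aggregators):
        self.aggregators = aggregators

    def process(self, batches):
        for batch in batches:
            for agg in self.aggregators:
                agg.add_batch(batch)


aggs = [Aggregator(), Aggregator()]
node = AggregationNode(aggs)
node.process([["ab", "cde"], ["f", "gh"]])

for agg in aggs:
    print(agg.total, agg.pool.peak_bytes, agg.pool.bytes_allocated)
```

In this model the peak allocation of each aggregator is bounded by the largest single batch, because the pool is cleared after every batch. If the clear() call were dropped, the allocation count would grow with the total input, which is the kind of excessive memory use the commit message refers to.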


> Add support for multiple distinct operators in the same query block
> -------------------------------------------------------------------
>
>                 Key: IMPALA-110
>                 URL: https://issues.apache.org/jira/browse/IMPALA-110
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend, Frontend
>    Affects Versions: Impala 0.5, Impala 1.4, Impala 2.0, Impala 2.2, Impala 2.3.0
>            Reporter: Greg Rahn
>            Assignee: Thomas Tauber-Marshall
>            Priority: Major
>              Labels: sql-language
>             Fix For: Impala 3.1.0
>
>
> Impala only allows a single (DISTINCT columns) expression in each query.
> {color:red}Note:
> If you do not need precise accuracy, you can produce an estimate of the distinct values for a column by specifying NDV(column); a query can contain multiple instances of NDV(column). To make Impala automatically rewrite COUNT(DISTINCT) expressions to NDV(), enable the APPX_COUNT_DISTINCT query option.
> {color}
> {code}
> [impala:21000] > select count(distinct i_class_id) from item;
> Query: select count(distinct i_class_id) from item
> Query finished, fetching results ...
> 16
> Returned 1 row(s) in 1.51s
> {code}
> {code}
> [impala:21000] > select count(distinct i_class_id), count(distinct i_brand_id) from item;
> Query: select count(distinct i_class_id), count(distinct i_brand_id) from item
> ERROR: com.cloudera.impala.common.AnalysisException: Analysis exception (in select count(distinct i_class_id), count(distinct i_brand_id) from item)
> 	at com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:133)
> 	at com.cloudera.impala.service.Frontend.createExecRequest(Frontend.java:221)
> 	at com.cloudera.impala.service.JniFrontend.createExecRequest(JniFrontend.java:89)
> Caused by: com.cloudera.impala.common.AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters as COUNT(DISTINCT i_class_id); deviating function: COUNT(DISTINCT i_brand_id)
> 	at com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo(AggregateInfo.java:196)
> 	at com.cloudera.impala.analysis.AggregateInfo.create(AggregateInfo.java:143)
> 	at com.cloudera.impala.analysis.SelectStmt.createAggInfo(SelectStmt.java:466)
> 	at com.cloudera.impala.analysis.SelectStmt.analyzeAggregation(SelectStmt.java:347)
> 	at com.cloudera.impala.analysis.SelectStmt.analyze(SelectStmt.java:155)
> 	at com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:130)
> 	... 2 more
> {code}
> Hive supports this:
> {code}
> $ hive -e "select count(distinct i_class_id), count(distinct i_brand_id) from item;"
> Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
> Hive history file=/tmp/grahn/hive_job_log_grahn_201303052234_1625576708.txt
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Starting Job = job_201302081514_0073, Tracking URL = http://impala:50030/jobdetails.jsp?jobid=job_201302081514_0073
> Kill Command = /usr/lib/hadoop/bin/hadoop job  -Dmapred.job.tracker=m0525.mtv.cloudera.com:8021 -kill job_201302081514_0073
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
> 2013-03-05 22:34:43,255 Stage-1 map = 0%,  reduce = 0%
> 2013-03-05 22:34:49,323 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 sec
> 2013-03-05 22:34:50,337 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 sec
> 2013-03-05 22:34:51,351 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 sec
> 2013-03-05 22:34:52,360 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 sec
> 2013-03-05 22:34:53,370 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 sec
> 2013-03-05 22:34:54,379 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 sec
> 2013-03-05 22:34:55,389 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.58 sec
> 2013-03-05 22:34:56,402 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.58 sec
> 2013-03-05 22:34:57,413 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.58 sec
> 2013-03-05 22:34:58,424 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.58 sec
> MapReduce Total cumulative CPU time: 8 seconds 580 msec
> Ended Job = job_201302081514_0073
> MapReduce Jobs Launched: 
> Job 0: Map: 1  Reduce: 1   Cumulative CPU: 8.58 sec   HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 8 seconds 580 msec
> OK
> 16	952
> Time taken: 25.666 seconds
> {code}
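
For reference, the result the issue asks for can be sketched in a few lines of Python. This only shows the expected semantics of several COUNT(DISTINCT ...) aggregates in one query block. It says nothing about how Impala plans or executes such a query. The function name count_distincts and the sample rows are made up for illustration, and the column names are borrowed from the query in the issue description.

```python
def count_distincts(rows, columns):
    """Return the number of distinct non-NULL values for each column in one pass."""
    seen = {col: set() for col in columns}
    for row in rows:
        for col in columns:
            value = row[col]
            if value is not None:  # COUNT(DISTINCT ...) ignores NULLs
                seen[col].add(value)
    return {col: len(values) for col, values in seen.items()}


# Illustrative rows only, not data from the issue.
items = [
    {"i_class_id": 1, "i_brand_id": 10},
    {"i_class_id": 1, "i_brand_id": 11},
    {"i_class_id": 2, "i_brand_id": 11},
    {"i_class_id": None, "i_brand_id": 12},
]

result = count_distincts(items, ["i_class_id", "i_brand_id"])
print(result)
```

Each column keeps its own set of seen values, so the two counts are independent of each other. That independence is what a single shared DISTINCT parameter list cannot express.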


