Posted to notifications@asterixdb.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/03/29 15:10:00 UTC
[jira] [Commented] (ASTERIXDB-2483) Out of Memory error doing aggregation - need a rewrite

    [ https://issues.apache.org/jira/browse/ASTERIXDB-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16805077#comment-16805077 ] 

ASF subversion and git services commented on ASTERIXDB-2483:
------------------------------------------------------------

Commit 0cca97d57d047427c4ed27e3817870fd6325437e in asterixdb's branch refs/heads/master from Dmitry Lychagin
[ https://gitbox.apache.org/repos/asf?p=asterixdb.git;h=0cca97d ]

[ASTERIXDB-2483][COMP][FUN] Eliminate listify for distinct aggregates

- user model changes: no
- storage format changes: no
- interface changes: no

Details:
- Move distinct aggregate rewriting from SqlppQueryRewriter
  to RewriteDistinctAggregateRule in the optimizer
- Add runtime for scalar distinct aggregates
- Fix ExtractCommonOperatorsRule handling of binary operators
- Additional tests for distinct aggregates

Change-Id: If13ea2696e9e0a8a639db684656e5642991c1f99
Reviewed-on: https://asterix-gerrit.ics.uci.edu/3293
Reviewed-by: Ali Alsuliman <al...@gmail.com>
Tested-by: Dmitry Lychagin <dm...@couchbase.com>


> Out of Memory error doing aggregation - need a rewrite
> ------------------------------------------------------
>
>                 Key: ASTERIXDB-2483
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2483
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: COMP - Compiler, RT - Runtime, SQL - Translator SQL++
>    Affects Versions: 0.9.5
>         Environment: Linux
>            Reporter: Michael J. Carey
>            Assignee: Dmitry Lychagin
>            Priority: Critical
>
> This is the schema:
> {noformat}
> CREATE TYPE Test AS open { unique2: int64 };
> CREATE DATASET wisconsin_5gb(Test) PRIMARY KEY unique2;
> {noformat}
> This is the query:
> {noformat}
> SELECT
>     min(t.oddOnePercent) as min, 
>     max(t.oddOnePercent) as max, 
>     count(distinct t.oddOnePercent) as cnt
> FROM wisconsin_5gb t;
> {noformat}
> The plan for this query:
> {noformat}
> distribute result [$$46]
> -- DISTRIBUTE_RESULT  |UNPARTITIONED|
>   exchange
>   -- ONE_TO_ONE_EXCHANGE  |UNPARTITIONED|
>     project ([$$46])
>     -- STREAM_PROJECT  |UNPARTITIONED|
>       assign [$$46] <- [{"min": $$48, "max": $$49, "cnt": $$50}]
>       -- ASSIGN  |UNPARTITIONED|
>         project ([$$48, $$49, $$50])
>         -- STREAM_PROJECT  |UNPARTITIONED|
>           subplan {
>                     aggregate [$$50] <- [agg-sql-sum($$53)]
>                     -- AGGREGATE  |LOCAL|
>                       aggregate [$$53] <- [agg-sql-count($$43)]
>                       -- AGGREGATE  |LOCAL|
>                         distinct ([$$43])
>                         -- MICRO_PRE_SORTED_DISTINCT_BY  |LOCAL|
>                           order (ASC, $$43) 
>                           -- IN_MEMORY_STABLE_SORT [$$43(ASC)]  |LOCAL|
>                             assign [$$43] <- [$$52.getField("oddOnePercent")]
>                             -- ASSIGN  |UNPARTITIONED|
>                               assign [$$52] <- [$#4.getField(0)]
>                               -- ASSIGN  |UNPARTITIONED|
>                                 unnest $#4 <- scan-collection($$28)
>                                 -- UNNEST  |UNPARTITIONED|
>                                   nested tuple source
>                                   -- NESTED_TUPLE_SOURCE  |UNPARTITIONED|
>                  }
>           -- SUBPLAN  |UNPARTITIONED|
>             aggregate [$$28, $$48, $$49] <- [listify($$27), agg-sql-min($$33), agg-sql-max($$33)]
>             -- AGGREGATE  |UNPARTITIONED|
>               exchange
>               -- RANDOM_MERGE_EXCHANGE  |PARTITIONED|
>                 project ([$$27, $$33])
>                 -- STREAM_PROJECT  |PARTITIONED|
>                   assign [$$33, $$27] <- [$$t.getField("oddOnePercent"), {"t": $$t}]
>                   -- ASSIGN  |PARTITIONED|
>                     project ([$$t])
>                     -- STREAM_PROJECT  |PARTITIONED|
>                       exchange
>                       -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                         data-scan []<-[$$47, $$t] <- Default.wisconsin_5gb
>                         -- DATASOURCE_SCAN  |PARTITIONED|
>                           exchange
>                           -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                             empty-tuple-source
>                             -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
> {noformat}
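
In the plan quoted above, the unpartitioned AGGREGATE operator computes listify($$27), where $$27 is {"t": $$t}, so every scanned record is collected into a single list before the subplan unnests it, sorts it in memory, and counts the distinct values. That materialization is the likely source of the out-of-memory error on a 5 GB dataset. The following is only a minimal Python sketch of the contrast, not AsterixDB code: the function names, the record layout, and the choice to skip missing values are assumptions made for illustration, and the actual runtime added by the commit may work differently.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def count_distinct_via_listify(records: Iterable[Record], field: str) -> int:
    """Hypothetical stand-in for the plan's shape: listify, then sort, distinct, count."""
    # listify: every whole record is wrapped and retained in one list,
    # so memory use grows with the size of the entire dataset.
    listified: List[Record] = [{"t": r} for r in records]

    # The subplan: unnest the list, extract the field, sort in memory.
    values = sorted(
        item["t"][field] for item in listified if item["t"].get(field) is not None
    )

    # Pre-sorted distinct followed by count.
    count = 0
    previous = object()  # sentinel that equals no real value
    for value in values:
        if value != previous:
            count += 1
            previous = value
    return count


def count_distinct_streaming(records: Iterable[Record], field: str) -> int:
    """Hypothetical scalar distinct aggregate: only the distinct field values are kept."""
    seen = set()
    for record in records:  # one pass; whole records are never retained
        value = record.get(field)
        if value is not None:  # assumption: missing/null values are not counted
            seen.add(value)
    return len(seen)
```

Both functions return the same count; the difference is that the second one holds only the distinct values of the field, so its memory use depends on the number of distinct values rather than on the number and size of the records.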


