You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Andrei Sereda (Jira)" <ji...@apache.org> on 2020/06/17 22:25:00 UTC

[jira] [Comment Edited] (CALCITE-4069) Elasticsearch the result of count(*) will be added to the result list twice when use multiple aggregations without group by

    [ https://issues.apache.org/jira/browse/CALCITE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138874#comment-17138874 ] 

Andrei Sereda edited comment on CALCITE-4069 at 6/17/20, 10:24 PM:
-------------------------------------------------------------------

What would happen in the case below ? 

{code:sql}
select count(*), sum(field1),  min(field2), count(*) from foo
{code}

Assume {{aggregations.size() > 1}} ? 


was (Author: asereda):
What would happen in the case below ? 

{code:sql}
select count(*), sum(field1),  min(field2), count(*) from foo
{code}

> Elasticsearch the result of count(*) will be added to the result list twice when use multiple aggregations without group by
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-4069
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4069
>             Project: Calcite
>          Issue Type: Improvement
>          Components: elasticsearch-adapter
>            Reporter: yeyonghao
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> when you write a sql:
> {code:java}
> select count(*) from mytable
> {code}
> in:
> {code:java}
> org.apache.calcite.adapter.elasticsearch.ElasticsearchTable#aggregate
> {code}
> As an optimization, totalHit will be the result of the query for count *:
> {code:java}
> // elastic exposes total number of documents matching a query in "/hits/total" path
> // this can be used for simple "select count(*) from table"
> final long total = res.searchHits().total().value();
> {code}
> It is then added to the result set:
> {code:java}
> if (groupBy.isEmpty()) {
>   // put totals automatically for count(*) expression(s), unless they contain group by
>   for (String expr : countAll) {
>     result.forEach(m -> m.put(expr, total));
>   }
> }
> {code}
> This is fine if there is only one count * aggregate function in SQL.
> But when you write multiple aggregate functions in your SQL:
> {code:java}
> select count(*),sum(field1) from mytable{code}
> You'll notice that the result of count * is repeatedly added to the result set (it doesn't affect the final result, but I think it can be optimized).
> The result of count * is first added to the result set along with other aggregate function result values in the following code:
> {code:java}
> final List<Map<String, Object>> result = new ArrayList<>();
> if (res.aggregations() != null) {
>   // collect values
>   ElasticsearchJson.visitValueNodes(res.aggregations(), m -> {
>     // using 'Collectors.toMap' will trigger Java 8 bug here
>     Map<String, Object> newMap = new LinkedHashMap<>();
>     for (String key: m.keySet()) {
>       newMap.put(fieldMap.getOrDefault(key, key), m.get(key));
>     }
>     result.add(newMap);
>   });
> } else {
>   // probably no group by. add single result
>   result.add(new LinkedHashMap<>());
> }
> {code}
> The second time is added again in the following code:
> {code:java}
> // elastic exposes total number of documents matching a query in "/hits/total" path
> // this can be used for simple "select count(*) from table"
> final long total = res.searchHits().total().value();
> if (groupBy.isEmpty()) {
>   // put totals automatically for count(*) expression(s), unless they contain group by
>   for (String expr : countAll) {
>     result.forEach(m -> m.put(expr, total));
>   }
> }
> {code}
> Although such operations have no effect on the final result when there are multiple aggregate functions, it is obvious that they can be optimized.TotalHit is added to the result set only when groupby. isEmpty() and aggregation. size() == 1 simultaneously satisfy:
> {code:java}
> if (groupBy.isEmpty() && aggregations.size() == 1) {
>    ...
> }
> {code}
> This avoids repeatedly adding count * to the result set.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)