You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by "yeyonghao (Jira)" <ji...@apache.org> on 2020/06/17 19:50:00 UTC
[jira] [Created] (CALCITE-4069) Elasticsearch the result of
count(*) will be added to the result list twice when use multiple
aggregations without group by
yeyonghao created CALCITE-4069:
----------------------------------
Summary: Elasticsearch the result of count(*) will be added to the result list twice when use multiple aggregations without group by
Key: CALCITE-4069
URL: https://issues.apache.org/jira/browse/CALCITE-4069
Project: Calcite
Issue Type: Improvement
Components: elasticsearch-adapter
Reporter: yeyonghao
when you write a sql:
{code:java}
select count(*) from mytable
{code}
in:
{code:java}
org.apache.calcite.adapter.elasticsearch.ElasticsearchTable#aggregate
{code}
As an optimization, totalHit will be the result of the query for count(*):
{code:java}
// elastic exposes total number of documents matching a query in "/hits/total" path
// this can be used for simple "select count(*) from table"
final long total = res.searchHits().total().value();
{code}
It is then added to the result set:
{code:java}
if (groupBy.isEmpty()) {
// put totals automatically for count(*) expression(s), unless they contain group by
for (String expr : countAll) {
result.forEach(m -> m.put(expr, total));
}
}
{code}
This is fine if there is only one count(*) aggregate function in SQL.
But when you write multiple aggregate functions in youe SQL:
{code:java}
select count(*),sum(field1) from mytable{code}
You'll notice that the result of count(*) is repeatedly added to the result set (it doesn't affect the final result, but I think it can be optimized).
The result of count(*) is first added to the result set along with other aggregate function result values in the following code:
{code:java}
final List<Map<String, Object>> result = new ArrayList<>();
if (res.aggregations() != null) {
// collect values
ElasticsearchJson.visitValueNodes(res.aggregations(), m -> {
// using 'Collectors.toMap' will trigger Java 8 bug here
Map<String, Object> newMap = new LinkedHashMap<>();
for (String key: m.keySet()) {
newMap.put(fieldMap.getOrDefault(key, key), m.get(key));
}
result.add(newMap);
});
} else {
// probably no group by. add single result
result.add(new LinkedHashMap<>());
}
{code}
The second time is added again in the following code:
{code:java}
// elastic exposes total number of documents matching a query in "/hits/total" path
// this can be used for simple "select count(*) from table"
final long total = res.searchHits().total().value();
if (groupBy.isEmpty()) {
// put totals automatically for count(*) expression(s), unless they contain group by
for (String expr : countAll) {
result.forEach(m -> m.put(expr, total));
}
}
{code}
Although such operations have no effect on the final result when there are multiple aggregate functions, it is obvious that they can be optimized.TotalHit is added to the result set only when groupby. isEmpty() and aggregation. size() == 1 simultaneously satisfy:
{code:java}
if (groupBy.isEmpty() && aggregations.size() == 1) {
...
}
{code}
This avoids repeatedly adding count(*) to the result set.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)