Posted to dev@phoenix.apache.org by "James Taylor (JIRA)" <ji...@apache.org> on 2016/06/06 20:01:21 UTC

[jira] [Reopened] (PHOENIX-2965) Use DistinctPrefixFilter logic for COUNT(DISTINCT ...) and COUNT(...) GROUP BY

     [ https://issues.apache.org/jira/browse/PHOENIX-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Taylor reopened PHOENIX-2965:
-----------------------------------

It does seem like a subset of COUNT(DISTINCT) queries could use the DistinctPrefixFilter, but only when COUNT(DISTINCT) is the sole aggregation in the query, since we run a single scan and calculate all the aggregates for each row.
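
A toy Python sketch may help show why skipping rows only works when COUNT(DISTINCT) is the sole aggregate. It is not Phoenix code: the table, its (foo, bar) row key, the non-key column v, and both functions are hypothetical, and the seek to the next prefix is modeled with bisect over a sorted list rather than a real region-server seek. Because one scan feeds every aggregate, skipping rows keeps COUNT(DISTINCT foo) correct but breaks SUM(v):

{code:python}
import bisect
from typing import List, Tuple

# Hypothetical toy table t, sorted by row key (foo, bar); v is a non-key column.
Row = Tuple[str, int, int]  # (foo, bar, v)

ROWS: List[Row] = [
    ("a", 1, 10), ("a", 2, 20), ("a", 3, 30),
    ("b", 1, 5),
    ("c", 1, 7), ("c", 2, 8),
]


def full_scan(rows: List[Row]) -> Tuple[int, int, int]:
    """Read every row once, updating all aggregates per row.
    Returns (COUNT(DISTINCT foo), SUM(v), rows_read)."""
    seen = set()
    total = rows_read = 0
    for foo, _bar, v in rows:
        rows_read += 1
        seen.add(foo)
        total += v
    return len(seen), total, rows_read


def distinct_prefix_scan(rows: List[Row]) -> Tuple[int, int, int]:
    """Read only the first row of each distinct foo prefix and seek past
    the rest (a rough model of a distinct-prefix filter), while updating
    the same aggregates. Returns (COUNT(DISTINCT foo), SUM(v), rows_read)."""
    foos = [r[0] for r in rows]  # leading key column, already sorted
    count = total = rows_read = 0
    i = 0
    while i < len(rows):
        foo, _bar, v = rows[i]
        rows_read += 1
        count += 1  # every row read here starts a new prefix
        total += v  # sees only one v per prefix, so SUM(v) comes out wrong
        i = bisect.bisect_right(foos, foo, i)  # seek to the next prefix
    return count, total, rows_read


print(full_scan(ROWS))             # (3, 80, 6)
print(distinct_prefix_scan(ROWS))  # (3, 22, 3)
{code}

The skipping scan reads half the rows and still gets the distinct count right, but any other aggregate computed in the same pass silently loses the skipped rows.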

Not sure if Calcite has a rule like this already, but if it did, we'd get this optimization for free, as Calcite would cost both plans and choose the better one. For example, the following type of query:
{code}
SELECT COUNT(DISTINCT foo) FROM t;
{code}
could be rewritten as:
{code}
SELECT COUNT(1) FROM (SELECT DISTINCT foo FROM t);
{code}
The latter would use the DistinctPrefixFilter and be more efficient, the only extra cost being that the distinct values would come back to the client instead of being counted on the server. A separate JIRA for this rewrite would be good, as it's also the only missing piece for the COUNT(...) GROUP BY case.
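
The rewrite can be sketched the same way, again as a hypothetical Python model rather than Phoenix or Calcite code (the function names are made up for illustration). The inner SELECT DISTINCT seeks from one distinct key to the next, and the outer COUNT(1) simply counts the values that come back, which is the extra client-side cost mentioned above:

{code:python}
import bisect
from typing import Iterator, List


def select_distinct(sorted_keys: List[str]) -> Iterator[str]:
    """Models SELECT DISTINCT foo FROM t on the server: emit each distinct
    leading key once, seeking past duplicates instead of reading them."""
    i = 0
    while i < len(sorted_keys):
        key = sorted_keys[i]
        yield key  # one value per distinct key is shipped to the client
        i = bisect.bisect_right(sorted_keys, key, i)


def count_over_distinct(sorted_keys: List[str]) -> int:
    """Models the rewritten SELECT COUNT(1) FROM (SELECT DISTINCT foo FROM t):
    the client counts the values that come back."""
    return sum(1 for _ in select_distinct(sorted_keys))


def count_distinct(sorted_keys: List[str]) -> int:
    """Reference semantics of SELECT COUNT(DISTINCT foo) FROM t."""
    return len(set(sorted_keys))


keys = ["a", "a", "a", "b", "c", "c"]
print(list(select_distinct(keys)))                      # ['a', 'b', 'c']
print(count_over_distinct(keys), count_distinct(keys))  # 3 3
{code}

Both forms return the same answer; the rewritten one touches one row per distinct value instead of every row, at the price of sending those values to the client.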

Does such a rule make sense, [~julianhyde], and if so does it already exist? 


> Use DistinctPrefixFilter logic for COUNT(DISTINCT ...) and COUNT(...) GROUP BY
> ------------------------------------------------------------------------------
>
>                 Key: PHOENIX-2965
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2965
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Lars Hofhansl
>             Fix For: 4.8.0
>
>
> The parent issue uses skip scanning to optimize DISTINCT and certain GROUP BY operations along the row key.
> COUNT queries are optimized differently and could be sped up significantly as well.
> [~giacomotaylor], I might need some help finding where COUNT(DISTINCT) queries are planned and optimized.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)