You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@phoenix.apache.org by "Dheeren Beborrtha (JIRA)" <ji...@apache.org> on 2018/10/01 17:26:00 UTC
[jira] [Commented] (PHOENIX-3005) Fixes for COUNT(DISTINCT...) with DistinctPrefixFilter and indexes

    [ https://issues.apache.org/jira/browse/PHOENIX-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16634360#comment-16634360 ] 

Dheeren Beborrtha commented on PHOENIX-3005:
--------------------------------------------

Looks like there is still some issue. Should we reopen the issue or create a new Jira ? 

`select count (distinct c1) from tab1 limit 10` query returns correct result in (phoenix 4.7.0.2.5.3.0-37, hbase 1.1.2.2.5) and incorrect result in (  phoenix-4.12.0.2.4.2.0-258, hbase 1.1.2.2.4 ):

*On phoenix-4.12.0.2.4.2.0-258, hbase 1.1.2.2.4*

Without limit:
0: jdbc:phoenix:thin:url=[http://XXXXXXX-|http://halhb-bdcsce-/]> select count (distinct oidadi) from ADI_DL_DATA ;
+------------------------------------------+
|DISTINCT_COUNT(OIDADI)|

+------------------------------------------+
|1985|

+------------------------------------------+
1 row selected (1.421 seconds)

with Limits:
0: jdbc:phoenix:thin:url=[http://XXXXXX-|http://halhb-bdcsce-/]> select count (distinct oidadi) from ADI_DL_DATA limit 10;
+------------------------------------------+
|DISTINCT_COUNT(OIDADI)|

+------------------------------------------+
|97|

+------------------------------------------+

*On phoenix 4.7.0.2.5.3.0-37, hbase 1.1.2.2.5*

Without limit:
0: jdbc:phoenix:thin:url=[http://XXXXXXX-|http://halhb-bdcsce-/]> select count (distinct oidadi) from ADI_DL_DATA ;
+------------------------------------------+
|DISTINCT_COUNT(OIDADI)|

+------------------------------------------+
|1985|

+------------------------------------------+
1 row selected (1.421 seconds)

with Limits:
0: jdbc:phoenix:thin:url=[http://XXXXXX-|http://halhb-bdcsce-/]> select count (distinct oidadi) from ADI_DL_DATA limit 10;
+------------------------------------------+
|DISTINCT_COUNT(OIDADI)|

+------------------------------------------+
1985
+------------------------------------------+

======================================

With explain : 
0: jdbc:phoenix:thin:url=[http://XXXXXXX-|http://halhb-bdcsce-/]> explain select count (distinct oidadi) from ADI_DL_DATA limit 10;
+-------------------------------------------+-----------------------------------------++------------------------------------------------------------------------------------+
|PLAN|EST_BYTES_READ|EST_ROWS_READ|EST_INFO_TS|

+-------------------------------------------+-----------------------------------------++------------------------------------------------------------------------------------+
|CLIENT 128-CHUNK 26176389 ROWS 35232158993 BYTES PARALLEL 16-WAY FULL SCAN OVER ADI_DL_DATA|35232158993|26176389|
|SERVER FILTER BY FIRST KEY ONLY|35232158993|26176389|1537332401319|
|SERVER DISTINCT PREFIX FILTER OVER [OIDADI] \| 35232158993 \| 26176389 \| 1537332401319 \||
|SERVER 10 ROW LIMIT|35232158993|26176389|1537332401319|
|SERVER AGGREGATE INTO SINGLE ROW|35232158993|26176389|1537332401319|
|CLIENT 10 ROW LIMIT|35232158993|26176389|1537332401319|

+-------------------------------------------+-----------------------------------------++------------------------------------------------------------------------------------+
6 rows selected (1.573 seconds)

CC: [~lhofhansl] [~jamestaylor]

> Fixes for COUNT(DISTINCT...) with DistinctPrefixFilter and indexes
> ------------------------------------------------------------------
>
>                 Key: PHOENIX-3005
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3005
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Major
>             Fix For: 4.8.0
>
>         Attachments: 3005-v2.txt, 3005-wip-v1.txt, PHOENIX-3005_v1.patch
>
>
> It turns out that PHOENIX-2965 has some bugs with indexes:
> # COUNT(DISTINCT <indexed column>) does not use the DistinctPrefixFilter
> # Once an index is created COUNT(DISTINCT <pk-prefix>) is no longer using the DistinctPrefixFilter
> This jira fixes both issues.
> Was:
> Currently the optimization in PHOENIX-258 is not used for DISTINCT index scans. We should add that as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)