Posted to dev@phoenix.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2017/09/06 05:07:00 UTC

[jira] [Created] (PHOENIX-4164) APPROX_COUNT_DISTINCT becomes imprecise at 20m unique values.

Lars Hofhansl created PHOENIX-4164:
--------------------------------------

             Summary: APPROX_COUNT_DISTINCT becomes imprecise at 20m unique values.
                 Key: PHOENIX-4164
                 URL: https://issues.apache.org/jira/browse/PHOENIX-4164
             Project: Phoenix
          Issue Type: Bug
            Reporter: Lars Hofhansl


{code}
0: jdbc:phoenix:localhost> select count(*) from test;
+-----------+
| COUNT(1)  |
+-----------+
| 26931816  |
+-----------+
1 row selected (14.604 seconds)
0: jdbc:phoenix:localhost> select approx_count_distinct(v1) from test;
+----------------------------+
| APPROX_COUNT_DISTINCT(V1)  |
+----------------------------+
| 17221394                   |
+----------------------------+
1 row selected (21.619 seconds)
{code}
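A little arithmetic shows how large the gap is. The sketch below treats COUNT(*) as the reference value. That is reasonable only because v1 is close to unique, as the next paragraph notes. It then compares the result with the textbook HyperLogLog standard error of 1.04/√m, where m = 2^p registers. It assumes APPROX_COUNT_DISTINCT is HyperLogLog-based. The precisions p tried here are hypothetical examples, not Phoenix's actual configuration.

{code:python}
import math

# Figures copied from the sqlline session above.
row_count = 26_931_816   # SELECT COUNT(*) FROM test
approx = 17_221_394      # SELECT APPROX_COUNT_DISTINCT(v1) FROM test


def relative_error(estimate: int, truth: int) -> float:
    """Signed relative error of an estimate against a reference value."""
    return (estimate - truth) / truth


def hll_standard_error(precision: int) -> float:
    """Textbook HyperLogLog standard error 1.04 / sqrt(m), m = 2**precision registers."""
    return 1.04 / math.sqrt(2 ** precision)


err = relative_error(approx, row_count)
print(f"relative error vs. row count: {err:.1%}")
# Hypothetical precisions, for comparison only.
for p in (10, 14, 16):
    print(f"p={p:2d}  m={2 ** p:6d}  expected std error ~{hll_standard_error(p):.2%}")
{code}

With the row count as the reference, the estimate comes out about 36% low. That is more than ten times the expected standard error even at p = 10 (about 3.3%), so ordinary approximation error does not explain the result.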

The table is generated from random numbers, and the cardinality of v1 is close to the number of rows, so the estimate is roughly 36% below the true distinct count.
(I cannot run a COUNT(DISTINCT(v1)) for comparison, as it uses up all the memory on my machine and eventually kills the regionserver. That's another story and another JIRA.)
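For context on how an approximate distinct count normally behaves, here is a minimal textbook HyperLogLog in Python. It illustrates the general technique under stated assumptions: 64-bit hashes derived from SHA-1, precision 14, and only the small-range (linear counting) correction. It is not Phoenix's APPROX_COUNT_DISTINCT implementation, whose hash function, precision, and corrections may differ. An estimator like this stays within about 1% of the true count (standard error 1.04/√16384 ≈ 0.8%) as cardinality grows. That is why an error of roughly a third at ~27M rows looks like a bug rather than expected approximation error.

{code:python}
import hashlib
import math


class HyperLogLog:
    """Minimal textbook HyperLogLog; illustrative only, not Phoenix's implementation."""

    def __init__(self, precision: int = 14):
        self.p = precision
        self.m = 1 << precision          # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant alpha_m (valid for m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value) -> None:
        # Assumption: a 64-bit hash taken from SHA-1, so runs are deterministic.
        h = int.from_bytes(hashlib.sha1(str(value).encode()).digest()[:8], "big")
        width = 64 - self.p
        idx = h >> width                 # top p bits choose the register
        rest = h & ((1 << width) - 1)    # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    hll = HyperLogLog(precision=14)
    n = 200_000
    for i in range(n):
        hll.add(i)
    print(f"estimate={hll.estimate():.0f} true={n}")
{code}

Re-adding a value never changes a register, which is what makes this a distinct count. Memory stays fixed at m small registers regardless of input size, unlike an exact COUNT(DISTINCT) that must remember every value it has seen.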

[~aertoria]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)