Posted to dev@kylin.apache.org by Abhilash L L <ab...@infoworks.io> on 2016/02/01 07:41:13 UTC

Re: Exact distinct count support

Sorry for the delayed response.

What's the cardinality of the dimension whose distinct values you want to
count?
--> We may come across a range of cardinalities for the measure. Unsigned
int capacity should cover almost all cases, but there might be some cases
it misses.


For example, if you want to count distinct users, use the numeric user_id,
instead of email address;
--> We will see if we can come up with a mapping function and use that for
distinct count
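
One possible shape for such a mapping function is a dictionary that hands
out a dense int id the first time it sees each value. This is only a sketch:
IdDictionary and idOf are hypothetical names, not Kylin classes, and the map
lives in memory, so it assumes the set of distinct values fits there.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Kylin): maps each distinct string,
// such as an email address, to a dense non-negative int id.
final class IdDictionary {
    private final Map<String, Integer> ids = new HashMap<>();

    // Returns the id already assigned to value, or assigns the next free one.
    int idOf(String value) {
        Integer existing = ids.get(value);
        if (existing != null) {
            return existing;
        }
        int next = ids.size(); // ids are handed out as 0, 1, 2, ...
        ids.put(value, next);
        return next;
    }

    // Number of distinct values seen so far.
    int size() {
        return ids.size();
    }

    public static void main(String[] args) {
        IdDictionary dict = new IdDictionary();
        System.out.println(dict.idOf("a@example.com")); // 0
        System.out.println(dict.idOf("b@example.com")); // 1
        System.out.println(dict.idOf("a@example.com")); // 0 again, same value
    }
}
```

Because the ids are dense, the largest id is always one less than the number
of distinct values, which keeps it inside int range for as long as possible.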


Casting Long to Int may cause precision loss
--> I remember seeing something like that. Good to know it has been removed
and will be reintroduced after the fix.
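
For reference, the precision loss in question is plain Java narrowing: a
cast from long to int keeps only the low 32 bits. The sketch below is
generic Java, not Kylin code, and NarrowingDemo is a hypothetical name.

```java
// Hypothetical demo class (not Kylin code) showing long-to-int narrowing.
final class NarrowingDemo {
    // Plain cast: keeps only the low 32 bits, so distinct long ids can
    // silently collide or turn negative.
    static int lossyCast(long id) {
        return (int) id;
    }

    // Math.toIntExact throws ArithmeticException instead of truncating.
    static int checkedCast(long id) {
        return Math.toIntExact(id);
    }

    public static void main(String[] args) {
        long big = 4294967297L; // 2^32 + 1
        System.out.println(lossyCast(big)); // 1, collides with the id 1
        System.out.println(lossyCast(1L));  // 1
        try {
            checkedCast(big);
        } catch (ArithmeticException e) {
            System.out.println("out of int range: " + big);
        }
    }
}
```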


Regards,
Abhilash

On Fri, Jan 29, 2016 at 4:51 PM, Sarnath <st...@gmail.com> wrote:

> Yes. I was just hinting at practically faster compute using a Bloom filter.
> Will need a way to handle probabilistic answers
>

Re: Exact distinct count support

Posted by Abhilash L L <ab...@infoworks.io>.
Hello,

   I need clarification on one point. From what I understand, the int value
for the bitmap is per cell?

   As long as the maximum distinct count for one cell (a given value for
each dimension in the particular cuboid) does not exceed the int range, we
should be okay?
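
To make the question concrete, here is the model I have in mind, as a
minimal sketch using java.util.BitSet with one bitmap per cell. The class
name, method names, and cell-key format are hypothetical, and whether
Kylin's bitmap measure works this way is exactly what I am asking. Note that
in this sketch the int bound applies to the id values that index the bitmap,
which in turn bounds the count per cell.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Kylin code): one bitmap per cell, where a cell
// is identified by one value for each dimension of the cuboid.
final class PerCellDistinct {
    private final Map<String, BitSet> cells = new HashMap<>();

    // Records that the int id was seen in the given cell. BitSet indexes
    // must be non-negative ints, so ids above Integer.MAX_VALUE cannot be set.
    void add(String cellKey, int id) {
        cells.computeIfAbsent(cellKey, k -> new BitSet()).set(id);
    }

    // Exact distinct count for one cell: the number of set bits.
    int distinctCount(String cellKey) {
        BitSet bits = cells.get(cellKey);
        return bits == null ? 0 : bits.cardinality();
    }

    public static void main(String[] args) {
        PerCellDistinct measure = new PerCellDistinct();
        measure.add("US|2016-01", 7);
        measure.add("US|2016-01", 7); // duplicate id, counted once
        measure.add("US|2016-01", 9);
        measure.add("IN|2016-01", 7); // same id in another cell
        System.out.println(measure.distinctCount("US|2016-01")); // 2
        System.out.println(measure.distinctCount("IN|2016-01")); // 1
    }
}
```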



Regards,
Abhilash
