Posted to user@kylin.apache.org by Chetan Dixit <Ch...@symantec.com> on 2015/10/27 05:45:48 UTC

Timestamp related issues

Hello Kylin Team,

We are facing the following issues while using Kylin. Could you please help?


1.      Is there any known issue with Timestamp/Date values?
               Queries using "WHERE columnname = timestamp '2015-07-23 10:30:00'" do not return any results.
               Queries using "WHERE columnname = '2015-07-23 10:30:00'" return an ERROR.
               If we use a timestamp column in the projection list, the time part is truncated, i.e. 2015-07-23 10:30:00 becomes 2015-07-23 00:00:00.


2.      Measures with distinct count use approximations with certain error rates, the lowest of which is <1.22%. Does this guarantee that counts will be accurate?
               For an expected count of 1000, we have seen results such as 982, 1000, etc.

Thanks,
Chetan


Fwd: Timestamp related issues

Posted by Luke Han <lu...@gmail.com>.
Forwarding this to the dev list as well.

Thanks.

Best Regards!
---------------------

Luke Han

---------- Forwarded message ----------
From: Li Yang <li...@apache.org>
Date: Thu, Oct 29, 2015 at 6:08 PM
Subject: Re: Timestamp related issues
To: user@kylin.incubator.apache.org


> 1.      Is there any issue with Timestamp/Date values ?

Timestamp testing is very limited on the 1.x branch. All use cases I know
about use date instead of timestamp.
The 2.x branch has much better timestamp support.
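
As a rough illustration of the reported symptoms (not a confirmed diagnosis
of Kylin 1.x), the sketch below assumes the stored values have been truncated
to day granularity, as the projection output in the question suggests. Under
that assumption an equality filter on a time of day can never match. The
helper names truncate_to_day and select_where_equals are hypothetical and are
not part of Kylin.

```python
from datetime import datetime


def truncate_to_day(ts: datetime) -> datetime:
    """Drop the time-of-day part, keeping only the calendar date at midnight."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def select_where_equals(rows: list[datetime], literal: datetime) -> list[datetime]:
    """Toy stand-in for `WHERE columnname = <literal>` over a single column."""
    return [row for row in rows if row == literal]


# Assumption: the value was loaded as 10:30:00 but stored at day granularity.
original = datetime(2015, 7, 23, 10, 30, 0)
stored_rows = [truncate_to_day(original)]

# Filtering on the original time of day finds nothing ...
no_match = select_where_equals(stored_rows, original)
# ... while filtering on midnight of the same day finds the row.
day_match = select_where_equals(stored_rows, datetime(2015, 7, 23))

print("stored value:", stored_rows[0].isoformat(sep=" "))
print("rows matching 10:30:00:", len(no_match))
print("rows matching 00:00:00:", len(day_match))
```

If this is what is happening, filtering on the date alone would be a way to
check it on 1.x.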

> 2.      For measures with distinct count, it uses approximations with
> certain error rates, lowest of which is <1.22%. Does this guarantee that
> counts would be accurate ?

The short answer is that there is no 100% guarantee. The count distinct
algorithm behind this is HyperLogLog [1]. Its error follows a normal
distribution. The "<1.22%" is shorthand for saying that, in theory, the error
is <1.22% for 99.7% of all results. The remaining 0.3% of results could go
beyond that error.

[1] https://en.wikipedia.org/wiki/HyperLogLog
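
To make the 99.7% statement concrete, here is a minimal sketch of the
arithmetic. It uses the textbook HyperLogLog relative standard error of
1.04 / sqrt(m) for m registers, and treats 99.7% as the three-sigma band of a
normal distribution. The choice of 2^16 registers is an assumption that
happens to reproduce the 1.22% figure; the thread itself does not state the
register count. The function names are illustrative only.

```python
import math


def hll_standard_error(precision_bits: int) -> float:
    """Theoretical relative standard error for 2**precision_bits registers."""
    registers = 2 ** precision_bits
    return 1.04 / math.sqrt(registers)


def three_sigma_bound(precision_bits: int) -> float:
    """Error band covering about 99.7% of results under a normal distribution."""
    return 3 * hll_standard_error(precision_bits)


def relative_error(estimate: float, exact: float) -> float:
    """Absolute relative deviation of an estimate from the exact count."""
    return abs(estimate - exact) / exact


# Assumption: 16 precision bits, i.e. 65536 registers.
bound = three_sigma_bound(16)
observed = relative_error(982, 1000)  # the 982-for-1000 case from the question

print(f"three-sigma bound: {bound:.2%}")
print(f"observed error:    {observed:.2%}")
print("within bound:", observed <= bound)
```

Under these assumptions the 982 result from the question deviates by 1.8%,
which is outside the 1.22% band. The bound is a statistical statement about
most results, not a hard limit on each one.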

On Tue, Oct 27, 2015 at 12:45 PM, Chetan Dixit <Ch...@symantec.com>
wrote:

> Hello Kylin Team,
>
>
>
> We are facing following issues while using Kylin could you please help.
>
>
>
> 1.      Is there any issue with Timestamp/Date values ?
>
>                We see issues in queries using “WHERE columnname =
> timestamp ‘2015-07-23 10:30:00’ “ it does not return any results.
>
>                If we use “WHERE columnname = ‘2015-07-23 10:30:00’ “ it
> returns ERROR
>
>                If use timestamp column in projection list, it truncates
> the timestamp part i.e. 2015-07-23 10:30:00 to 2015-07-23 00:00:00
>
>
>
> 2.      For measures with distinct count, it uses approximations with
> certain error rates, lowest of which is <1.22%. Does this guarantee that
> counts would be accurate ?
>
>                We have seen for a count of 1000 results as 982, 1000 etc.
>
>
>
> Thanks,
>
> Chetan
>
>
>

Re: Timestamp related issues

Posted by Li Yang <li...@apache.org>.
> 1.      Is there any issue with Timestamp/Date values ?

Timestamp testing is very limited on the 1.x branch. All use cases I know
about use date instead of timestamp.
The 2.x branch has much better timestamp support.

> 2.      For measures with distinct count, it uses approximations with
> certain error rates, lowest of which is <1.22%. Does this guarantee that
> counts would be accurate ?

The short answer is that there is no 100% guarantee. The count distinct
algorithm behind this is HyperLogLog [1]. Its error follows a normal
distribution. The "<1.22%" is shorthand for saying that, in theory, the error
is <1.22% for 99.7% of all results. The remaining 0.3% of results could go
beyond that error.

[1] https://en.wikipedia.org/wiki/HyperLogLog

On Tue, Oct 27, 2015 at 12:45 PM, Chetan Dixit <Ch...@symantec.com>
wrote:

> Hello Kylin Team,
>
>
>
> We are facing following issues while using Kylin could you please help.
>
>
>
> 1.      Is there any issue with Timestamp/Date values ?
>
>                We see issues in queries using “WHERE columnname =
> timestamp ‘2015-07-23 10:30:00’ “ it does not return any results.
>
>                If we use “WHERE columnname = ‘2015-07-23 10:30:00’ “ it
> returns ERROR
>
>                If use timestamp column in projection list, it truncates
> the timestamp part i.e. 2015-07-23 10:30:00 to 2015-07-23 00:00:00
>
>
>
> 2.      For measures with distinct count, it uses approximations with
> certain error rates, lowest of which is <1.22%. Does this guarantee that
> counts would be accurate ?
>
>                We have seen for a count of 1000 results as 982, 1000 etc.
>
>
>
> Thanks,
>
> Chetan
>
>
>