
Posted to dev@kylin.apache.org by Vadim Semenov <_...@databuryat.com> on 2015/10/27 05:03:10 UTC

KYLIN-747 bad query performance when IN clause contains a value doesn't exist in the dictionary

Hi everyone,

I'm trying to find the commit where this issue was fixed (https://issues.apache.org/jira/browse/KYLIN-747).
Could you point me to it?

We have a cube that is partitioned by date, and when we run a query such as SELECT * FROM table WHERE dt = '2015-10-01', I see this in the logs:
"Can't translate value 2015-10-01 to dictionary ID, roundingFlag 0. Using default value \xFF",
which translates into a huge scan.

Re: KYLIN-747 bad query performance when IN clause contains a value doesn't exist in the dictionary

Posted by Vadim Semenov <_...@databuryat.com>.

Thank you!

I just tested the changes with queries like:
SELECT dt, SUM(metric) FROM table WHERE dt IN ('2015-10-01', '2015-10-02') GROUP BY dt;
SELECT dt, SUM(metric) FROM table WHERE dt BETWEEN '2015-10-01' AND '2015-10-07' GROUP BY dt;

Before the change, every partition was scanned; after it, only the relevant partitions are scanned.

Very useful performance change, thanks again.

On October 28, 2015 at 4:02:19 AM, Li Yang (liyang@apache.org) wrote:

Hmm, I searched the commits for KYLIN-747 and found nothing either. I thought
it had been covered by a fix for some other JIRA.

Anyway, I put together some test cases that query non-existent values and made
some further optimizations. Now a scan range whose condition is always false can be pruned.

https://github.com/apache/incubator-kylin/commit/f96600e89a5a2c0e533cea86ad6a73b9451bcddc 

On Tue, Oct 27, 2015 at 12:03 PM, Vadim Semenov <_...@databuryat.com> wrote: 

> Hi everyone, 
> 
> I'm trying to find the commit where this issue was fixed ( 
> https://issues.apache.org/jira/browse/KYLIN-747). 
> Could you point me? 
> 
> We have a cube that is partitioned by date, and when we query using SELECT 
> * FROM table WHERE dt = '2015-10-01', I see in the logs: 
> "Can't translate value 2015-10-01 to dictionary ID, roundingFlag 0. Using 
> default value \xFF", 
> which translates into a huge scan.

Re: KYLIN-747 bad query performance when IN clause contains a value doesn't exist in the dictionary

Posted by Li Yang <li...@apache.org>.

Hmm, I searched the commits for KYLIN-747 and found nothing either. I thought
it had been covered by a fix for some other JIRA.

Anyway, I put together some test cases that query non-existent values and made
some further optimizations. Now a scan range whose condition is always false can be pruned.

https://github.com/apache/incubator-kylin/commit/f96600e89a5a2c0e533cea86ad6a73b9451bcddc
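The pruning idea can be illustrated with a small Python sketch. It rests on an assumption the thread does not state: that each date partition (segment) has its own dictionary, so a date such as 2015-10-01 only translates in the segment that actually holds it. Segment, translate and plan_scans are hypothetical names, not Kylin's API (Kylin is written in Java, and the real change is in the commit linked above). Instead of substituting a catch-all default such as \xFF for a value that cannot be translated, the sketch drops the value, and a segment with nothing left to match is skipped entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class Segment:
    """Hypothetical stand-in for one date partition (segment) of a cube.

    Assumption: each segment has its own dictionary that maps only the
    dimension values present in that segment to integer IDs.
    """
    name: str
    dictionary: Dict[str, int] = field(default_factory=dict)

    def translate(self, value: str) -> Optional[int]:
        # Exact-match lookup (no rounding); None means the value is absent.
        return self.dictionary.get(value)


def plan_scans(segments: Sequence[Segment],
               in_values: Sequence[str]) -> Dict[str, List[int]]:
    """Return the dictionary IDs to scan per segment for `dim IN (in_values)`.

    A value missing from a segment's dictionary cannot match any row of
    that segment, so it is dropped rather than replaced by a catch-all
    default. A segment with no remaining IDs is always false and is
    pruned, i.e. left out of the plan entirely.
    """
    plan: Dict[str, List[int]] = {}
    for seg in segments:
        ids = set()
        for value in in_values:
            value_id = seg.translate(value)
            if value_id is not None:
                ids.add(value_id)
        if ids:  # at least one value can match in this segment
            plan[seg.name] = sorted(ids)
        # otherwise the condition is always false here: no scan at all
    return plan


if __name__ == "__main__":
    # One segment per day, each with a single-entry dictionary.
    segments = [
        Segment("2015-10-01", {"2015-10-01": 0}),
        Segment("2015-10-02", {"2015-10-02": 0}),
        Segment("2015-10-03", {"2015-10-03": 0}),
    ]
    print(plan_scans(segments, ["2015-10-01", "2015-10-02"]))
    # {'2015-10-01': [0], '2015-10-02': [0]}
    print(plan_scans(segments, ["2016-01-01"]))
    # {}
```

Under this assumption, dropping untranslatable values per segment is one plausible way a date-partitioned cube ends up scanning only the partitions that can contain the requested dates, instead of every partition.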

On Tue, Oct 27, 2015 at 12:03 PM, Vadim Semenov <_...@databuryat.com> wrote:

> Hi everyone,
>
> I'm trying to find the commit where this issue was fixed (
> https://issues.apache.org/jira/browse/KYLIN-747).
> Could you point me?
>
> We have a cube that is partitioned by date, and when we query using SELECT
> * FROM table WHERE dt = '2015-10-01', I see in the logs:
> "Can't translate value 2015-10-01 to dictionary ID, roundingFlag 0. Using
> default value \xFF",
> which translates into a huge scan.

Re: KYLIN-747 bad query performance when IN clause contains a value doesn't exist in the dictionary

Posted by Luke Han <lu...@gmail.com>.

The fix version is v1.1. Please try the latest release.

Thanks.


Best Regards!
---------------------

Luke Han

On Tue, Oct 27, 2015 at 5:14 PM, hongbin ma <ma...@apache.org> wrote:

> @liyang

Re: KYLIN-747 bad query performance when IN clause contains a value doesn't exist in the dictionary

Posted by hongbin ma <ma...@apache.org>.

@liyang

On Tue, Oct 27, 2015 at 12:03 PM, Vadim Semenov <_...@databuryat.com> wrote:

> Hi everyone,
>
> I'm trying to find the commit where this issue was fixed (
> https://issues.apache.org/jira/browse/KYLIN-747).
> Could you point me?
>
> We have a cube that is partitioned by date, and when we query using SELECT
> * FROM table WHERE dt = '2015-10-01', I see in the logs:
> "Can't translate value 2015-10-01 to dictionary ID, roundingFlag 0. Using
> default value \xFF",
> which translates into a huge scan.




-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone