Posted to dev@kylin.apache.org by Luca Costabello <lu...@gmail.com> on 2015/06/16 13:16:08 UTC

Cache misses

Hello all,

I am running Kylin 0.7.1-incubating, installed from the release binary.

Currently, repeated executions of certain queries result in cache hits (as
I expected).
Example:
If I execute this query twice, the results are served from cache the second
time ("hitCache": true):

SELECT column_1, column_2
FROM FACT_TABLE
INNER JOIN DIMENSION_TABLE on FACT_TABLE.id = DIMENSION_TABLE.id
WHERE date_ < date'2014-08-31'
GROUP BY column_1, column_2

On the other hand, repeated executions of the query below never lead to
cache hits (i.e. I always end up with "hitCache": false).

SELECT count(*) as total_count
FROM FACT_TABLE
WHERE date_ < date'2014-08-31'

My use case would benefit greatly from extensive caching, since I have a few
heavy queries that I issue to the system repeatedly.
I am not familiar with Kylin's current cache strategy, and I was wondering if
someone could give me some hints.
Besides, I have not found any cache-related parameters in the property files,
aside from kylin.query.cache.enabled in conf/kylin.properties
(kylin.query.cache.enabled is set to true in my case).

Thanks

luca

Re: Cache misses

Posted by Luca Costabello <lu...@gmail.com>.
Thanks for the heads-up!

As I am running Kylin in sandbox mode (kylin.sandbox=true), I also had to
increase the size of the ehcache caches: by default all "maxBytesLocalHeap"
values were set to 1M in ehcache-test.xml (which is the ehcache config file
used in sandbox mode), so queries that led to bigger payloads always
skipped the cache.
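
For anyone hitting the same limit, the change amounts to raising the
"maxBytesLocalHeap" attribute on the cache entries in ehcache-test.xml. The
fragment below is only a sketch: the cache name is a placeholder (use the
names already present in your file), the 100M value is an arbitrary example,
and any other attributes on the existing entries should stay as they are.

```xml
<!-- Placeholder cache name; keep the names from your own ehcache-test.xml. -->
<!-- 100M is an arbitrary example value, raised from the 1M sandbox default. -->
<cache name="ExampleQueryCache"
       maxBytesLocalHeap="100M"/>
```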

Cheers,

luca


On Wed, Jun 17, 2015 at 3:09 AM, hongbin ma <ma...@apache.org> wrote:

> The default values for these two parameters are defined here:
>
> https://github.com/KylinOLAP/Kylin/blob/124121764a3eb0652032c8add97f02708aa5fd3a/common/src/main/java/org/apache/kylin/common/KylinConfig.java#L445
>
> On Wed, Jun 17, 2015 at 10:08 AM, hongbin ma <ma...@apache.org> wrote:
>
> > hi Luca,
> >
> > Kylin selectively caches queries that
> > 1. take a long time to execute, or
> > 2. scan a lot of HBase rows.
> > The logic is in
> > https://github.com/KylinOLAP/Kylin/blob/0.7.1/server/src/main/java/org/apache/kylin/rest/controller/QueryController.java#L209
> >
> > In other words, Kylin only caches expensive queries. Caching all query
> > results regardless of their cost is not worth the effort. You can adjust
> > the two parameters kylin.query.cache.threshold.duration
> > and kylin.query.cache.threshold.scancount in kylin.properties to change
> > this behavior.

Re: Cache misses

Posted by hongbin ma <ma...@apache.org>.
The default values for these two parameters are defined here:
https://github.com/KylinOLAP/Kylin/blob/124121764a3eb0652032c8add97f02708aa5fd3a/common/src/main/java/org/apache/kylin/common/KylinConfig.java#L445
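
For illustration, overriding the two thresholds in conf/kylin.properties
could look like the fragment below. The values are made up for the example
and are not the defaults from the linked source; the units noted in the
comments are assumptions that should be checked against KylinConfig.java.

```properties
# Illustrative values only, not Kylin's defaults.
# Assumed unit: milliseconds of query execution time.
kylin.query.cache.threshold.duration=500
# Assumed unit: number of scanned HBase rows.
kylin.query.cache.threshold.scancount=1000
```

Lower thresholds let cheaper queries qualify for caching.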

On Wed, Jun 17, 2015 at 10:08 AM, hongbin ma <ma...@apache.org> wrote:

> hi Luca,
>
> Kylin selectively caches queries that
> 1. take a long time to execute, or
> 2. scan a lot of HBase rows.
> The logic is in
> https://github.com/KylinOLAP/Kylin/blob/0.7.1/server/src/main/java/org/apache/kylin/rest/controller/QueryController.java#L209
>
> In other words, Kylin only caches expensive queries. Caching all query
> results regardless of their cost is not worth the effort. You can adjust
> the two parameters kylin.query.cache.threshold.duration
> and kylin.query.cache.threshold.scancount in kylin.properties to change
> this behavior.



-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone

Re: Cache misses

Posted by hongbin ma <ma...@apache.org>.
hi Luca,

Kylin selectively caches queries that
1. take a long time to execute, or
2. scan a lot of HBase rows.
The logic is in
https://github.com/KylinOLAP/Kylin/blob/0.7.1/server/src/main/java/org/apache/kylin/rest/controller/QueryController.java#L209

In other words, Kylin only caches expensive queries. Caching all query
results regardless of their cost is not worth the effort. You can adjust
the two parameters kylin.query.cache.threshold.duration
and kylin.query.cache.threshold.scancount in kylin.properties to change
this behavior.
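
The rule described above can be sketched roughly as follows. This is not the
code from QueryController.java: the class name, method name, and threshold
values are hypothetical, and whether the real comparison is inclusive is an
assumption. It only illustrates the "slow OR scan-heavy" decision, which
would be consistent with a quick count(*) query never being cached if it
stays below both thresholds.

```java
// Sketch of a threshold-based caching decision.
// All names and values here are hypothetical, not Kylin's actual code.
final class CachePolicySketch {
    private final long durationThresholdMs; // stands in for kylin.query.cache.threshold.duration
    private final long scanCountThreshold;  // stands in for kylin.query.cache.threshold.scancount

    CachePolicySketch(long durationThresholdMs, long scanCountThreshold) {
        this.durationThresholdMs = durationThresholdMs;
        this.scanCountThreshold = scanCountThreshold;
    }

    // A result qualifies for caching when the query was slow
    // OR scanned many rows; either condition alone is enough.
    boolean shouldCache(long durationMs, long scannedRows) {
        return durationMs >= durationThresholdMs
                || scannedRows >= scanCountThreshold;
    }

    public static void main(String[] args) {
        // Example thresholds only; not the defaults from KylinConfig.java.
        CachePolicySketch policy = new CachePolicySketch(2000L, 10000L);

        System.out.println(policy.shouldCache(150L, 500L));   // false: fast and few rows
        System.out.println(policy.shouldCache(3500L, 500L));  // true: slow
        System.out.println(policy.shouldCache(150L, 50000L)); // true: many rows scanned
    }
}
```

With a policy of this shape, lowering either threshold makes more queries
eligible for the cache.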





-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone