Posted to dev@kylin.apache.org by sh...@yahoo.in.INVALID on 2017/03/09 17:42:53 UTC

Question Regarding Cube Query Time

Hello,
I am doing a POC on Kylin cubes. I have built a cube on TPC-DS data (~40 GB). The build succeeded, but I am having trouble with queries. Simple aggregation queries return results in under a second, but queries with ORDER BY/GROUP BY take far too long. At first, these queries failed with a timeout error because they hit the record scan threshold, so I increased "kylin.query.scan.threshold" in kylin.properties. That fixed the threshold error, but the queries now take around 200 seconds. This is not acceptable, because Hive returns the result of the same query in 10 seconds. Here is one of the queries I am trying to run (standard TPC-DS query q3):
    SELECT date_dim.d_year, item.i_brand_id, item.i_brand,
           SUM(facttable.ss_ext_discount_amt) sum_agg
    FROM store_sales facttable
    INNER JOIN date_dim date_dim ON (facttable.ss_sold_date_sk = date_dim.d_date_sk)
    INNER JOIN item item ON (facttable.ss_item_sk = item.i_item_sk)
    WHERE item.i_manufact_id = 783 AND date_dim.d_moy = 11
    GROUP BY date_dim.d_year, item.i_brand, item.i_brand_id
    ORDER BY date_dim.d_year, sum_agg DESC, item.i_brand_id
    LIMIT 100;
My cluster has 10 nodes (each with 32 cores and 64 GB RAM) running HDP 2.5 with HBase 1.1.2.2.5.3.0-37 in fully distributed mode.

To investigate, I checked the region server logs on all nodes and found that during query execution only one region server was doing all the work while the others were idle. My cube's HBase table also showed a region count of 1, so I tried changing the following properties, but with no luck:
    kylin.hbase.hfile.size.gb=1
    kylin.hbase.region.count.min=8
Please let me know if any other configuration is needed to bring the query time down.
Thanks

Re: Question Regarding Cube Query Time

Posted by shailesh prajapati <sh...@yahoo.in.INVALID>.

Hi All,
Thanks for replying. The problem has been resolved; queries now run in under a second. It turned out I had defined the dimensions as "derived" instead of "normal".
Thanks 
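One plausible reason the change mattered can be sketched in Python under simplifying assumptions: with derived dimensions the cube keeps only the fact table's foreign keys (ss_sold_date_sk, ss_item_sk), so a query on d_year and i_brand has to read rows at that much finer grain and re-aggregate them, while normal dimensions let it read a cuboid already grouped by the queried columns. The lookup tables, fact rows, and helper names (brand_key, cuboid) are made up for illustration; this is not Kylin's code, and it ignores the WHERE filters.

```python
from collections import defaultdict

# Made-up lookup tables: surrogate key -> attributes.
date_dim = {sk: {"d_year": 2000 + sk % 3} for sk in range(30)}
item = {sk: {"i_brand_id": sk % 4, "i_brand": f"brand#{sk % 4}"} for sk in range(50)}

# Made-up fact rows: (ss_sold_date_sk, ss_item_sk, ss_ext_discount_amt).
facts = [(d, i, 1.0) for d in date_dim for i in item]


def brand_key(d, i):
    """Hypothetical helper: map foreign keys to the columns the query groups by."""
    return (date_dim[d]["d_year"], item[i]["i_brand"], item[i]["i_brand_id"])


def cuboid(rows, key_fn):
    """Hypothetical helper: pre-aggregate SUM(amount) by key_fn, like one cuboid."""
    agg = defaultdict(float)
    for d, i, amt in rows:
        agg[key_fn(d, i)] += amt
    return agg


# Derived dimensions (simplified): stored at foreign-key grain.
fk_cuboid = cuboid(facts, lambda d, i: (d, i))
# Normal dimensions (simplified): stored at the grain the query asks for.
normal_cuboid = cuboid(facts, brand_key)

# With derived dimensions, every FK-level row is read and re-aggregated at query time.
post_agg = defaultdict(float)
for (d, i), amt in fk_cuboid.items():
    post_agg[brand_key(d, i)] += amt

print("rows read, derived:", len(fk_cuboid))  # 30 dates x 50 items
print("rows read, normal: ", len(normal_cuboid))
print("same answer:", post_agg == normal_cuboid)
```

Both paths give the same answer; the difference in this toy model is how many stored rows have to be read to get there.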

    On Sunday, 19 March 2017 12:29 AM, Alberto Ramón <a....@gmail.com> wrote:

Re: Question Regarding Cube Query Time

Posted by Alberto Ramón <a....@gmail.com>.

Hi

Can you try rebuilding the cube with a new TopN measure?
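For readers unfamiliar with it, a TopN measure conceptually precomputes, for each combination of the cuboid's dimensions, the N entries with the largest SUM, so the ranking does not have to be done over all rows at query time. The sketch below is an exact, simplified illustration using made-up rows and a hypothetical top_n_measure helper; Kylin's real TopN measure is approximate and is configured in the cube design, and this code models neither.

```python
import heapq
from collections import defaultdict

# Made-up rows: (d_year, i_brand, ss_ext_discount_amt).
rows = [
    (2000, "brand#1", 5.0), (2000, "brand#2", 9.0), (2000, "brand#3", 1.0),
    (2000, "brand#1", 7.0), (2001, "brand#2", 3.0), (2001, "brand#3", 8.0),
]


def top_n_measure(rows, n):
    """Hypothetical helper: per d_year, keep the n brands with the largest SUM."""
    sums = defaultdict(lambda: defaultdict(float))
    for year, brand, amt in rows:
        sums[year][brand] += amt
    return {
        year: heapq.nlargest(n, by_brand.items(), key=lambda kv: kv[1])
        for year, by_brand in sums.items()
    }


# Computed once at build time; a query then just reads the stored top list.
print(top_n_measure(rows, 2))
```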

2017-03-17 17:58 GMT+00:00 Li Yang <li...@apache.org>:

Re: Question Regarding Cube Query Time

Posted by Li Yang <li...@apache.org>.

You didn't mention your Kylin version. Judging from the configuration
properties, it seems to be 1.6.

The properties that control the number of regions are (note that the names
are slightly different in 1.6):
    kylin.storage.hbase.region-cut-gb=5
    kylin.storage.hbase.min-region-count=1
    kylin.storage.hbase.max-region-count=500
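As a rough illustration of how these three properties interact, the sketch below assumes (a simplification, not taken from Kylin's source) that the region count is the estimated cube size divided by region-cut-gb, rounded, then clamped between min-region-count and max-region-count. The function name and the cube sizes are hypothetical. Under this model, a cube that is small relative to region-cut-gb stays in a single region unless the minimum is raised, which would fit the single busy region server described in the original post.

```python
def estimate_region_count(cube_size_gb: float,
                          region_cut_gb: float = 5.0,
                          min_region_count: int = 1,
                          max_region_count: int = 500) -> int:
    """Simplified model (an assumption, not Kylin's code) of the HBase region count.

    Defaults mirror kylin.storage.hbase.region-cut-gb, min-region-count and
    max-region-count as quoted above.
    """
    n = round(cube_size_gb / region_cut_gb)
    # Clamp into [min_region_count, max_region_count].
    return max(min_region_count, min(max_region_count, n))


for size in (2, 12, 40):
    print(size, "GB ->", estimate_region_count(size), "region(s)")
# Raising the minimum forces more regions even for a small cube.
print(2, "GB, min 8 ->", estimate_region_count(2, min_region_count=8))
```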

As for the query, it is a simple OLAP query and should be lightning fast if
you have the right cube and model. This talk on Apache Kylin 2.0 touches
briefly on running TPC-H on Kylin, which may give you some ideas.

The rowkey order also has an impact, because HBase has no secondary index.
For the best performance on this query, you want "d_moy" and
"i_manufact_id" to be at (or near) the head of the rowkey.
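Why rowkey order matters can be sketched with a sorted Python list standing in for an HBase table: rows are kept sorted by key, so a filter that fixes the leading key columns becomes one short range scan (located here with bisect), while a filter on trailing columns gives no usable prefix. The three-column tuple keys, the dimension value ranges, and the helper names (build_table, prefix_scan) are made up for illustration; real Kylin rowkeys are encoded bytes with extra prefix fields, and this ignores any other scan optimizations.

```python
import bisect
from itertools import product

# Made-up cuboid contents: every combination of these dimension values.
MOYS, MANUFACTS, BRANDS = range(1, 13), range(700, 800), range(20)


def build_table(order):
    """Return rowkeys as tuples with columns in `order`, sorted like HBase rows."""
    rows = []
    for moy, manu, brand in product(MOYS, MANUFACTS, BRANDS):
        cols = {"d_moy": moy, "i_manufact_id": manu, "i_brand_id": brand}
        rows.append(tuple(cols[c] for c in order))
    return sorted(rows)


def prefix_scan(table, prefix):
    """Rows returned by a range scan over all keys starting with `prefix`."""
    lo = bisect.bisect_left(table, prefix)
    hi = bisect.bisect_right(table, prefix + (float("inf"),))
    return table[lo:hi]


# Filter columns at the head: the filter becomes one short contiguous range.
head = build_table(["d_moy", "i_manufact_id", "i_brand_id"])
head_hits = prefix_scan(head, (11, 783))

# Filter columns at the tail: no usable prefix, so this sketch reads every row.
tail = build_table(["i_brand_id", "d_moy", "i_manufact_id"])
tail_hits = [r for r in tail if r[1] == 11 and r[2] == 783]

print("head of rowkey: read", len(head_hits), "rows")
print("tail of rowkey: read", len(tail), "rows to find", len(tail_hits))
```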

If you still have problems, there are some online tuning tools for Kylin
that you can try.

Cheers
Yang


On Fri, Mar 10, 2017 at 1:42 AM, <sh...@yahoo.in.invalid> wrote: