Posted to user@kylin.apache.org by Francis Liang <so...@hotmail.com> on 2018/09/12 07:51:21 UTC

Cube segment targeting process

Hi:

I have run into an issue with how Kylin locates the correct segment of a cube. The details are below:

I have a cube built daily on the partition column “DAY”, so the cube has the following segments:
20180101000000_20180102000000
20180102000000_20180103000000
…
20180929000000_20180930000000
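To make the naming scheme above concrete, here is a minimal Python sketch. It assumes each name is `<start>_<end>` with both parts in `yyyyMMddHHmmss` form, and that a segment covers the half-open range [start, end); the end-exclusive reading is an assumption. `parse_segment` and `covers_day` are hypothetical helpers, not Kylin APIs.

```python
from datetime import datetime
from typing import Tuple

# Assumed format: "<start>_<end>", both yyyyMMddHHmmss, covering [start, end).
SEG_FMT = "%Y%m%d%H%M%S"


def parse_segment(name: str) -> Tuple[datetime, datetime]:
    """Split a name such as '20180101000000_20180102000000' into (start, end)."""
    start, end = name.split("_")
    return datetime.strptime(start, SEG_FMT), datetime.strptime(end, SEG_FMT)


def covers_day(name: str, day: str) -> bool:
    """True if the segment's half-open range [start, end) contains the yyyyMMdd day."""
    start, end = parse_segment(name)
    t = datetime.strptime(day, "%Y%m%d")
    return start <= t < end


print(covers_day("20180101000000_20180102000000", "20180101"))  # True
print(covers_day("20180102000000_20180103000000", "20180101"))  # False
```

Under this reading, exactly one daily segment matches a filter such as DAY = 20180101.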

When I submit a query with the filter where “DAY” = 20180101, Kylin starts looking for the matching segment from the very first one, and after it finds the right segment it keeps checking the remaining segments until it reaches the last one. The result is correct, but this significantly degrades query performance. The following timeline is summarized from the log:
13:54:00,951 start
13:54:00,975 finish cube selection (24ms)
13:54:01,004 query storage
13:54:01,019 target the right segment (44ms)

13:54:01,021 looking for other segments
13:54:01,423 finish looking for other segments (402ms)
13:54:01,431 return the result (8ms)

total: 478ms
looking for other segments (402ms)
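The durations in this timeline can be checked with a few lines of Python that convert each `HH:MM:SS,mmm` timestamp to milliseconds. The four listed phases add up to 478ms, while the wall-clock span from start (13:54:00,951) to result (13:54:01,431) is 480ms; the 2ms between 13:54:01,019 and 13:54:01,021 is not assigned to any phase. Either way, the 402ms spent on other segments is roughly 84% of the query. `to_ms` and `span` are helper names introduced here only for illustration.

```python
# Timestamps copied from the log timeline above.
TIMELINE = {
    "start": "13:54:00,951",
    "finish cube selection": "13:54:00,975",
    "target the right segment": "13:54:01,019",
    "looking for other segments": "13:54:01,021",
    "finish looking for other segments": "13:54:01,423",
    "return the result": "13:54:01,431",
}


def to_ms(ts: str) -> int:
    """Convert 'HH:MM:SS,mmm' into integer milliseconds since midnight."""
    hms, millis = ts.split(",")
    h, m, s = (int(x) for x in hms.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + int(millis)


def span(a: str, b: str) -> int:
    """Elapsed milliseconds between two named events."""
    return to_ms(TIMELINE[b]) - to_ms(TIMELINE[a])


phases = {
    "cube selection": span("start", "finish cube selection"),
    "target the right segment": span("finish cube selection", "target the right segment"),
    "looking for other segments": span("looking for other segments", "finish looking for other segments"),
    "return the result": span("finish looking for other segments", "return the result"),
}
print(sum(phases.values()))                 # 478
print(span("start", "return the result"))  # 480
```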

So most of the query time, 402ms, was spent checking segments that are not needed. I have a few questions about this:

1. Once it has found the right segment, why doesn’t the segment lookup stop?

2. Even if it did stop after finding the right segment, performance would still be poor when the needed segment is near the end of the segment list.

3. Since every segment is stored as a separate HBase table, why can’t Kylin go straight to the right HBase table using a mapping from segments to table names?
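Questions 1 and 2 can be made concrete with a small Python sketch. It is not Kylin's code: it assumes daily segments that are contiguous, non-overlapping and sorted by start time, and it only handles a single-day equality filter. Under those assumptions a full scan costs O(n) comparisons, while a binary search over the start times (Python's `bisect` module) needs O(log n). The `table_of` dictionary and the `KYLIN_HTABLE_nnnn` names are hypothetical, standing in for the segment-to-HBase-table mapping that question 3 refers to. Note that a general filter (a range, or an OR across several days) can match more than one segment, which is one plausible reason a generic pruner checks every segment instead of stopping at the first hit.

```python
import bisect
from datetime import date, timedelta

# Hypothetical daily segments mirroring the list above: (start, end) strings in
# yyyyMMddHHmmss, contiguous and sorted by start, from 2018-01-01 to 2018-09-29.
segments = []
d = date(2018, 1, 1)
while d <= date(2018, 9, 29):
    nxt = d + timedelta(days=1)
    segments.append((d.strftime("%Y%m%d") + "000000", nxt.strftime("%Y%m%d") + "000000"))
    d = nxt

starts = [s for s, _ in segments]  # already sorted, so it can be binary-searched

# Hypothetical segment name -> HBase table name mapping (names are made up).
table_of = {f"{s}_{e}": f"KYLIN_HTABLE_{i:04d}" for i, (s, e) in enumerate(segments)}


def linear_lookup(day: str) -> list:
    """Check every segment against DAY = day: O(n) comparisons."""
    t = day + "000000"  # fixed-width digit strings compare in time order
    return [f"{s}_{e}" for s, e in segments if s <= t < e]


def bisect_lookup(day: str) -> list:
    """Binary-search the sorted start times: O(log n) comparisons."""
    t = day + "000000"
    i = bisect.bisect_right(starts, t) - 1  # last segment starting at or before t
    if i >= 0 and t < segments[i][1]:
        s, e = segments[i]
        return [f"{s}_{e}"]
    return []


hit = bisect_lookup("20180929")
print(hit, table_of[hit[0]])  # ['20180929000000_20180930000000'] KYLIN_HTABLE_0271
```

Both lookups return the same segment for any day in range; they differ only in how many segments they compare, and once the segment is known, the dictionary gives its table in constant time.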

Many thanks for the reply and I really appreciate your help!

Best, Feng.

Sent from Mail for Windows 10 <https://go.microsoft.com/fwlink/?LinkId=550986>


Re: Cube segment targeting process

Posted by Francis Liang <so...@hotmail.com>.
Hi Shaofeng, many thanks for your quick reply! My Kylin version is 2.4.0-bin-cdh57. Is there a timeline for the 2.5 release? Thanks again! Best, Feng.


________________________________
From: ShaoFeng Shi <sh...@apache.org>
Sent: Wednesday, September 12, 2018 5:43:39 PM
To: user
Subject: Re: Cube segment targeting process

Hi Feng,

Which Kylin version are you using? There was a bug in segment pruning; I'm not sure whether your version is affected.

The upcoming v2.5 will include an enhancement (KYLIN-3370) that prunes segments using each dimension's min/max values, not just the partition column's. You can try it once 2.5 is released.




--
Best regards,

Shaofeng Shi 史少锋


Re: Cube segment targeting process

Posted by ShaoFeng Shi <sh...@apache.org>.
Hi Feng,

Which Kylin version are you using? There was a bug in segment pruning; I'm
not sure whether your version is affected.

The upcoming v2.5 will include an enhancement (KYLIN-3370) that prunes
segments using each dimension's min/max values, not just the partition
column's. You can try it once 2.5 is released.
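The min/max pruning idea described above can be illustrated with a minimal Python sketch. It is not the KYLIN-3370 implementation: the `Segment` class, the `REGION` dimension and its values are hypothetical, values are compared as plain strings, and only equality filters are handled. A segment is skipped when, for any filtered dimension that has statistics, the filter value falls outside that segment's recorded [min, max]; a dimension without statistics cannot rule a segment out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Segment:
    name: str
    # Hypothetical per-dimension statistics: dimension -> (min value, max value).
    min_max: Dict[str, Tuple[str, str]] = field(default_factory=dict)


def may_match(seg: Segment, filters: Dict[str, str]) -> bool:
    """Keep the segment unless some equality filter lies outside [min, max]."""
    for dim, value in filters.items():
        if dim in seg.min_max:  # no statistics -> this dimension cannot prune
            lo, hi = seg.min_max[dim]
            if not (lo <= value <= hi):
                return False
    return True


def prune(segments: List[Segment], filters: Dict[str, str]) -> List[str]:
    """Names of the segments that may contain rows matching all filters."""
    return [s.name for s in segments if may_match(s, filters)]


segs = [
    Segment("20180101000000_20180102000000",
            {"DAY": ("20180101", "20180101"), "REGION": ("EAST", "NORTH")}),
    Segment("20180102000000_20180103000000",
            {"DAY": ("20180102", "20180102"), "REGION": ("SOUTH", "WEST")}),
]
print(prune(segs, {"REGION": "WEST"}))  # ['20180102000000_20180103000000']
```

The point of the enhancement, as described, is that a filter on a non-partition dimension (here the hypothetical `REGION`) can also exclude segments, not only a filter on the partition column.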



-- 
Best regards,

Shaofeng Shi 史少锋