Posted to dev@kylin.apache.org by ShaoFeng Shi <sh...@apache.org> on 2018/04/01 01:57:23 UTC

Re: [Discuss] Lookup table improvement: support global/big lookup table

Thank you, Ma Gang. This is a good proposal. Externalizing the lookup
snapshots will reduce the burden on Kylin query servers.

My only comment is: does this implementation support extension? As you know,
since Kylin 1.5, Kylin has had a plug-in architecture, and HBase is only one
implementation of the storage. Kylin core modules no longer depend directly
on HBase, so please take this into consideration when you implement it.

2018-03-30 17:57 GMT+08:00 magang <mg...@163.com>:

> Hi all,
>
> There are two limitations in the current lookup table design:
>
> 1. Lookup table size is limited, because the table snapshot needs to be
> cached in the Kylin server. A snapshot table that is too large will break
> the Kylin server, and building the snapshot may take a very long time.
> 2. Each segment has its own lookup table snapshot, but some users may need
> a global snapshot table, so that when the global table is updated, queries
> on all segments reflect the change.
>
> To resolve the above limitations, I have created a ticket,
> https://issues.apache.org/jira/browse/KYLIN-3221 , and put an initial
> design doc there. Any comments and suggestions are welcome.
>
> --
> Sent from: http://apache-kylin.74782.x6.nabble.com/



-- 
Best regards,

Shaofeng Shi 史少锋

Re: Re: Re: [Discuss] Lookup table improvement: support global/big lookup table

Posted by Billy Liu <bi...@apache.org>.
Thank you for the updates. Great.

With Warm regards

Billy Liu


2018-04-05 11:06 GMT+08:00 Ma Gang <mg...@163.com>:
> Thanks Billy, below are my answers to your questions:
> Is there a configuration item on Cube to tell which snapshot would be used?
>>> When the user designs the cube, there will be an option in the 'Advanced Setting' tab to choose whether to use the global snapshot. The user cannot change the snapshot reference after the build (maybe we can add that in the future; it is easy to implement, but I am not sure whether it makes sense).
>
> Could the segment level snapshot be refreshed independently?
>>> Very good question. I think we can add a new "refresh lookup table" button at the cube level, so that the user can refresh snapshots for a cube or its segments independently.
>
> Could we show the snapshot size, build time, storage path on the GUI?
>>> Yes, this is in the "Management" section of the design. The user can view all snapshots for each lookup table, the details of each snapshot, and which cubes/segments use it.

Re:Re: Re: [Discuss] Lookup table improvement: support global/big lookup table

Posted by Ma Gang <mg...@163.com>.
Thanks Billy, below are my answers to your questions:
Is there a configuration item on Cube to tell which snapshot would be used?
>> When the user designs the cube, there will be an option in the 'Advanced Setting' tab to choose whether to use the global snapshot. The user cannot change the snapshot reference after the build (maybe we can add that in the future; it is easy to implement, but I am not sure whether it makes sense).


Could the segment level snapshot be refreshed independently?
>> Very good question. I think we can add a new "refresh lookup table" button at the cube level, so that the user can refresh snapshots for a cube or its segments independently.


Could we show the snapshot size, build time, storage path on the GUI?
>> Yes, this is in the "Management" section of the design. The user can view all snapshots for each lookup table, the details of each snapshot, and which cubes/segments use it.
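
The difference between the two snapshot modes discussed above can be sketched roughly as follows. This is only an illustration under assumptions: SnapshotScope, SnapshotResolver, and the snapshot paths are hypothetical names and are not taken from the Kylin code base or from the KYLIN-3221 design doc.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; none of these types come from Kylin itself.
enum SnapshotScope { SEGMENT, GLOBAL }

final class SnapshotResolver {
    // Latest snapshot path per lookup table, shared by all segments.
    private final Map<String, String> globalSnapshots = new HashMap<>();
    // Snapshot path captured when a segment was built, keyed by "segment/table".
    private final Map<String, String> segmentSnapshots = new HashMap<>();

    // Records the snapshot that a segment captured at build time.
    void buildSegment(String segment, String table, String snapshotPath) {
        segmentSnapshots.put(segment + "/" + table, snapshotPath);
    }

    // Replaces the global snapshot of a lookup table.
    void refreshGlobal(String table, String snapshotPath) {
        globalSnapshots.put(table, snapshotPath);
    }

    // Returns the snapshot a query on the given segment should read,
    // or null if no snapshot has been recorded for that scope.
    String resolve(SnapshotScope scope, String segment, String table) {
        return scope == SnapshotScope.GLOBAL
                ? globalSnapshots.get(table)
                : segmentSnapshots.get(segment + "/" + table);
    }
}
```

With SnapshotScope.SEGMENT, each segment keeps reading the snapshot it was built with. With SnapshotScope.GLOBAL, a single call to refreshGlobal changes what queries on every segment see, which is the behavior the proposal asks for in item 2.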






At 2018-04-01 23:12:37, "Billy Liu" <bi...@apache.org> wrote:
>Hi Magang,
>
>Very good proposal. I have a few questions here. In item 2, you
>mentioned global level snapshot and segment level snapshot. This is a
>similar scenario to handling slowly changing dimensions. Is there a
>configuration item on Cube to tell which snapshot would be used? Could
>the segment level snapshot be refreshed independently? Could we show
>the snapshot size, build time, storage path on the GUI?
>
>With Warm regards
>
>Billy Liu

Re: Re: [Discuss] Lookup table improvement: support global/big lookup table

Posted by Billy Liu <bi...@apache.org>.
Hi Magang,

Very good proposal. I have a few questions here. In item 2, you
mentioned global level snapshot and segment level snapshot. This is a
similar scenario to handling slowly changing dimensions. Is there a
configuration item on Cube to tell which snapshot would be used? Could
the segment level snapshot be refreshed independently? Could we show
the snapshot size, build time, storage path on the GUI?

With Warm regards

Billy Liu


2018-04-01 18:19 GMT+08:00 ShaoFeng Shi <sh...@apache.org>:
> Thank you, Gang! Looking forward to seeing this feature in Kylin.

Re: Re: [Discuss] Lookup table improvement: support global/big lookup table

Posted by ShaoFeng Shi <sh...@apache.org>.
Thank you, Gang! Looking forward to seeing this feature in Kylin.

2018-04-01 17:35 GMT+08:00 Ma Gang <mg...@163.com>:

> Sure, ShaoFeng. I will add a storageType field to SnapshotDesc to support
> different materialized storages.



-- 
Best regards,

Shaofeng Shi 史少锋

Re:Re: [Discuss] Lookup table improvement: support global/big lookup table

Posted by Ma Gang <mg...@163.com>.
Sure, ShaoFeng. I will add a storageType field to SnapshotDesc to support different materialized storages.
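
One way a storageType field could keep core code independent of any particular backend is sketched below. This is a rough illustration under assumptions, not the actual design: only the names SnapshotDesc and storageType appear in this thread. LookupStorage, InMemoryLookupStorage, LookupStorageRegistry, and the storage type values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical storage abstraction; core code would depend only on this interface.
interface LookupStorage {
    String get(String key);
}

// Stand-in backend that keeps rows in memory. An HBase-backed class
// would implement the same interface in a separate storage module.
final class InMemoryLookupStorage implements LookupStorage {
    private final Map<String, String> rows = new HashMap<>();

    InMemoryLookupStorage put(String key, String value) {
        rows.put(key, value);
        return this;
    }

    @Override
    public String get(String key) {
        return rows.get(key);
    }
}

// Simplified descriptor; the real SnapshotDesc may carry many more fields.
final class SnapshotDesc {
    final String tableName;
    final String storageType; // e.g. "memory" or "hbase"; these values are assumptions

    SnapshotDesc(String tableName, String storageType) {
        this.tableName = tableName;
        this.storageType = storageType;
    }
}

final class LookupStorageRegistry {
    private final Map<String, LookupStorage> backends = new HashMap<>();

    // Each storage plug-in registers itself under its own type name.
    void register(String storageType, LookupStorage storage) {
        backends.put(storageType, storage);
    }

    // Core code calls this and never references a concrete backend class.
    LookupStorage forSnapshot(SnapshotDesc desc) {
        LookupStorage storage = backends.get(desc.storageType);
        if (storage == null) {
            throw new IllegalArgumentException(
                    "No storage registered for type: " + desc.storageType);
        }
        return storage;
    }
}
```

In this sketch, supporting another materialized storage means registering one more LookupStorage implementation under a new type name; code that resolves a SnapshotDesc stays unchanged.
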
At 2018-04-01 09:57:23, "ShaoFeng Shi" <sh...@apache.org> wrote:
>Thank you, Ma Gang. This is a good proposal. Externalizing the lookup
>snapshots will reduce the burden on Kylin query servers.
>
>My only comment is: does this implementation support extension? As you know,
>since Kylin 1.5, Kylin has had a plug-in architecture, and HBase is only one
>implementation of the storage. Kylin core modules no longer depend directly
>on HBase, so please take this into consideration when you implement it.