Posted to dev@kylin.apache.org by Dayue Gao <da...@163.com> on 2015/08/24 23:12:33 UTC

about CubeController.updateCubeDesc

Hi developers,

While working on https://issues.apache.org/jira/browse/KYLIN-958, I found it difficult to implement CubeController.updateCubeDesc. The problems are:

1. CubeDesc.calculateSignature only includes the fact table name and the partition desc as data model information

This means that if a user changes the lookup tables or the filter condition, the cube desc signature won't change and Kylin will not clear the already built cube segments. BTW, why do we store the signature in metadata rather than calculate it on demand? I know it may be an optimization to avoid recalculating the signature every time; however, changing a desc shouldn't be a regular operation, so persisting the signature won't give us much benefit. What's more, once it has been recorded in metadata, it becomes difficult for us to change the computing logic.
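
A minimal sketch of calculating the signature on demand, assuming hypothetical names (ModelInfo, SignatureSketch and their fields are illustrations, not Kylin's actual classes). It hashes the lookup tables and the filter condition together with the fact table and partition desc, so a change to any of them changes the signature:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the data model fields discussed above; not Kylin's real class.
final class ModelInfo {
    final String factTable;
    final String partitionDesc;
    final List<String> lookupTables;
    final String filterCondition;

    ModelInfo(String factTable, String partitionDesc, List<String> lookupTables, String filterCondition) {
        this.factTable = factTable;
        this.partitionDesc = partitionDesc;
        this.lookupTables = new ArrayList<>(lookupTables);
        this.filterCondition = filterCondition;
    }
}

final class SignatureSketch {
    // Computes the signature from the current model state instead of reading a stored value.
    static String calculateSignature(ModelInfo m) {
        List<String> lookups = new ArrayList<>(m.lookupTables);
        Collections.sort(lookups); // the order of lookup tables should not affect the result
        String input = String.join("|", m.factTable, m.partitionDesc,
                String.join(",", lookups), m.filterCondition);
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Because nothing is persisted here, changing what goes into the hash later only requires changing this one method.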

2. Maintaining metadata consistency

This is a more general problem. We have separated metadata into different files (cube, cube_desc, model_desc, project, etc.), and maintaining consistency across these files is not an easy task in either FileResourceStore or HBaseResourceStore, so IMO we'd better avoid operations that change multiple metadata files as much as possible. "CubeController.updateCubeDesc" is a notable counter-example. To complete this operation, a sequence of metadata updates (model_desc -> cube -> cube_desc -> cube -> project) is performed. Making sure "CubeController.updateCubeDesc" won't leave metadata in a half-successful state is not easy.
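
To illustrate the difficulty, here is a minimal sketch in which an in-memory map stands in for the resource store (InMemoryStore and MultiStepUpdate are hypothetical names, not FileResourceStore or HBaseResourceStore). Without transactions, the caller has to remember the old content of each resource and restore it by hand when a later step fails:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a metadata store; it has no transactions.
final class InMemoryStore {
    private final Map<String, String> resources = new HashMap<>();
    private final String failOnPath; // simulates a write failure on one path (may be null)

    InMemoryStore(String failOnPath) {
        this.failOnPath = failOnPath;
    }

    String get(String path) {
        return resources.get(path);
    }

    void put(String path, String content) {
        if (path.equals(failOnPath)) {
            throw new IllegalStateException("simulated write failure: " + path);
        }
        resources.put(path, content);
    }

    // Puts back the previous content, or removes the entry if there was none.
    void restore(String path, String old) {
        if (old == null) {
            resources.remove(path);
        } else {
            resources.put(path, old);
        }
    }
}

final class MultiStepUpdate {
    // Applies the writes in order; on failure, undoes the already applied ones in reverse order.
    static boolean apply(InMemoryStore store, LinkedHashMap<String, String> writes) {
        Deque<String[]> undoLog = new ArrayDeque<>(); // each entry: {path, old content}
        try {
            for (Map.Entry<String, String> w : writes.entrySet()) {
                String old = store.get(w.getKey());
                store.put(w.getKey(), w.getValue());
                undoLog.push(new String[] { w.getKey(), old });
            }
            return true;
        } catch (IllegalStateException e) {
            while (!undoLog.isEmpty()) {
                String[] entry = undoLog.pop();
                store.restore(entry[0], entry[1]);
            }
            return false;
        }
    }
}
```

Even this hand-written rollback only covers failures that surface as exceptions. If the process dies between two writes, the undo log is lost and the metadata stays half updated, which is the case that makes the multi-file update hard to get right.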

Given all these difficulties, do we really need to allow users to change the data model? Can we just make the data model immutable and only allow users to change the cube desc? Immutable or versioned metadata has always worked well in my experience, so a further question is: can we also make the key parts of the cube desc immutable (the properties that define how the cube was built, excluding description and notify_list, for example) and just add a shortcut in the front-end to let users create a new cube desc based on an existing one?
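
A rough sketch of what an immutable desc with a "create from existing" shortcut could look like. ImmutableCubeDesc, its fields and copyAs are hypothetical and only illustrate the idea:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable cube desc; the field names are illustrative only.
final class ImmutableCubeDesc {
    private final String name;
    private final List<String> dimensions; // key part: defines how the cube is built
    private final String description;      // non-key part

    ImmutableCubeDesc(String name, List<String> dimensions, String description) {
        this.name = name;
        // Defensive copy wrapped as read-only, so callers cannot change it afterwards.
        this.dimensions = Collections.unmodifiableList(new ArrayList<>(dimensions));
        this.description = description;
    }

    String getName() {
        return name;
    }

    List<String> getDimensions() {
        return dimensions;
    }

    String getDescription() {
        return description;
    }

    // The "shortcut": derive a new desc under a new name instead of mutating this one.
    ImmutableCubeDesc copyAs(String newName, List<String> newDimensions) {
        return new ImmutableCubeDesc(newName, newDimensions, description);
    }
}
```

With this shape, segments built from the old desc stay valid because the old desc never changes; the new desc starts with no segments of its own.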

Best,
Dayue

Re: about CubeController.updateCubeDesc

Posted by Dayue Gao <da...@163.com>.
Cool!

Had a nice talk with Shaofeng and Li Yang this afternoon. I will fix this JIRA soon.

Best,
Dayue


> On Aug 25, 2015, at 2:46 PM, Luke Han <lu...@gmail.com> wrote:



Re: about CubeController.updateCubeDesc

Posted by Luke Han <lu...@gmail.com>.
Hi Dayue,
    You are right, metadata is the key part of a system.
    For KYLIN-958, you could apply any workaround for the short term. For the
long term, we will go through the current implementation and try to fix it
with the right approach to avoid conflicts.

    The underlying storage is not an issue; actually, we migrated from MySQL
to HBase in the early 0.6 version to remove one more dependency. I think the
metadata storage has already been extracted as an interface, so it should be
easy to add another storage again if necessary.
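
As an illustration of storage behind an interface, a minimal sketch follows. MetadataStore, InMemoryMetadataStore, CubeMetadataClient and the path layout are hypothetical; Kylin's actual abstraction may look different:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical storage abstraction; not Kylin's real interface.
interface MetadataStore {
    String read(String path);

    void write(String path, String content);
}

// One possible backend. Another one (for example backed by HBase or MySQL)
// would implement the same interface.
final class InMemoryMetadataStore implements MetadataStore {
    private final Map<String, String> data = new HashMap<>();

    @Override
    public String read(String path) {
        return data.get(path);
    }

    @Override
    public void write(String path, String content) {
        data.put(path, content);
    }
}

// Callers depend only on the interface, so the backend can be swapped.
final class CubeMetadataClient {
    private final MetadataStore store;

    CubeMetadataClient(MetadataStore store) {
        this.store = store;
    }

    // The path layout below is an assumption made for this sketch.
    void saveCubeDesc(String name, String json) {
        store.write("/cube_desc/" + name + ".json", json);
    }

    String loadCubeDesc(String name) {
        return store.read("/cube_desc/" + name + ".json");
    }
}
```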

    Thanks.

Best Regards!
---------------------

Luke Han

On Tue, Aug 25, 2015 at 11:35 AM, Dayue Gao <da...@163.com> wrote:


Re: about CubeController.updateCubeDesc

Posted by Dayue Gao <da...@163.com>.
Metadata consistency is one of the most crucial things for many systems.

So in the short run, to fix KYLIN-958, I suggest disallowing users from updating the data model. Even so, users can still create a new data model to fulfill their needs.

In the long run, I'd suggest migrating metadata persistence from a NoSQL store like HBase to a transactional database like MySQL. Although a lot of work needs to be done, it will make keeping metadata consistent a lot easier.

What do you think?

Best,
Dayue

> On Aug 25, 2015, at 11:11 AM, Li Yang <li...@apache.org> wrote:



Re: about CubeController.updateCubeDesc

Posted by Li Yang <li...@apache.org>.
Dayue has a good point. Although updating multiple resources in one request
is doable, the complexity is not worth the effort.

Making model desc and cube desc immutable is a good idea. And we can still
implement "update" by first deleting the old model and cube, then creating new
ones with the same names. So from the user's point of view, it looks like an
update. This workaround should do well on the 0.7 branch, where model and cube
are strictly 1-1.
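
A minimal sketch of "update" as delete-then-create, using a hypothetical DescRegistry that maps names to definitions (this is not Kylin code, and it ignores the cleanup of already built segments):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry of immutable descs keyed by name.
final class DescRegistry {
    private final Map<String, String> descs = new HashMap<>();

    // Creation fails if the name is taken, so an existing desc is never overwritten.
    void create(String name, String definition) {
        if (descs.putIfAbsent(name, definition) != null) {
            throw new IllegalStateException("already exists: " + name);
        }
    }

    void delete(String name) {
        descs.remove(name);
    }

    String get(String name) {
        return descs.get(name);
    }

    // "Update" as seen by the user: drop the old entry, then create a new one
    // under the same name.
    void replace(String name, String newDefinition) {
        delete(name);
        create(name, newDefinition);
    }
}
```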

The reason model and cube are separate resources is that in the 0.8 branch
they have a 1-m relationship. A user can create a model and then create
multiple cubes on it.

On Tue, Aug 25, 2015 at 10:31 AM, hongbin ma <ma...@apache.org> wrote:


Re: about CubeController.updateCubeDesc

Posted by hongbin ma <ma...@apache.org>.
Hi Dayue,

I agree with you. The current cube desc/model desc design is the result of
multiple rounds of redesign, and it may have failed to take maintenance
convenience into proper consideration. And to be honest it's quite complex
now, especially when cube/model updates are involved.

Making cubes/models immutable looks appealing to me. However, we might need
some more front-end work to reduce the cube/model re-creation overhead for
users.

@liyang and @luke, will you please comment on this?

On Tue, Aug 25, 2015 at 5:12 AM, Dayue Gao <da...@163.com> wrote:





-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone

Re: about CubeController.updateCubeDesc

Posted by ShaoFeng Shi <sh...@gmail.com>.
Hi Dayue, here are some comments from my side:

1. The signature covers not only the fact table name and partition info, but
also dimensions and measures. The DimensionDesc object also contains the table
name, join condition, columns, etc., which relate to the lookup tables.
So once there is a change in the data model, this signature will also
change.

2. The old signature is persisted so that it can be compared with the new
signature after it is returned from the front-end; see:
https://github.com/apache/incubator-kylin/blob/0.7-staging/server/src/main/java/org/apache/kylin/rest/service/CubeService.java#L239

3. About metadata consistency: in 0.7 it was a temporary solution that was
not well implemented. From 0.8, the Kylin UI has changed a lot;
creating/updating the data model is a separate step from creating/updating
the cube, which will be easier to control.
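
The comparison described in points 1 and 2 can be sketched roughly as follows. DescWithSignature and SignatureCheck are hypothetical names, and the hash is only a stand-in for the real calculation in CubeDesc:

```java
import java.util.Objects;

// Hypothetical holder for a cube desc together with its persisted signature.
final class DescWithSignature {
    final String dimensions;      // simplified: dimension and measure definitions as text
    final String storedSignature; // signature persisted when the desc was last saved

    DescWithSignature(String dimensions, String storedSignature) {
        this.dimensions = dimensions;
        this.storedSignature = storedSignature;
    }

    // Stand-in for the real signature calculation.
    String calculateSignature() {
        return Integer.toHexString(Objects.hash(dimensions));
    }
}

final class SignatureCheck {
    // Returns true when the desc coming back from the front-end no longer matches
    // the persisted signature, i.e. segments built from the old desc would be stale.
    static boolean signatureChanged(DescWithSignature fromFrontEnd) {
        return !fromFrontEnd.calculateSignature().equals(fromFrontEnd.storedSignature);
    }
}
```

A caller would typically refuse the update or clear the built segments when signatureChanged returns true; which of the two applies is a policy decision outside this sketch.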

2015-08-25 5:12 GMT+08:00 Dayue Gao <da...@163.com>:
