Posted to user@kylin.apache.org by Alberto Ramón <a....@gmail.com> on 2016/12/01 09:07:45 UTC

Re: Consulting "EXTENDED_COLUMN"

Hello,
I was preparing an email with related questions.

Sometimes we have derived dimensions with a 1:1 relation, for example:
WeekDayID & WeekDayTxt
MonthID & MonthTxt

SOL1: Derived. ID as the host column and Txt as the extended column.
Problem: you can't filter or group by Txt.

SOL2: Joint. Define tuples of ID & Txt.
Any problems or limitations? (I still need to test this option.)
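
To make the two options concrete, here is a minimal Python sketch (not Kylin code). The rows, column names, and helper functions are hypothetical. The only behavior taken from this thread is that an extended column is looked up through its host column and cannot be filtered or grouped on by itself, while a joint dimension keeps both columns as real dimensions.

```python
from collections import defaultdict

# Hypothetical fact rows: (WeekDayID, WeekDayTxt, amount).
ROWS = [
    (1, "Monday", 10),
    (2, "Tuesday", 5),
    (1, "Monday", 7),
    (3, "Wednesday", 4),
]


def build_with_extended_column(rows):
    """SOL1: aggregate on the host column only; keep the text as an id -> text map."""
    totals = defaultdict(int)
    extended = {}
    for day_id, day_txt, amount in rows:
        totals[day_id] += amount
        extended[day_id] = day_txt  # 1:1 relation, so one text per id
    return dict(totals), extended


def build_with_joint_dimension(rows):
    """SOL2: both columns are dimensions and always appear together in the key."""
    totals = defaultdict(int)
    for day_id, day_txt, amount in rows:
        totals[(day_id, day_txt)] += amount
    return dict(totals)


totals, extended = build_with_extended_column(ROWS)
# Grouping happens on the host column; the text is attached for display only.
report = {extended[day_id]: total for day_id, total in totals.items()}

joint = build_with_joint_dimension(ROWS)
# With the joint layout the text is part of the key, so it can be filtered on.
mondays = {key: total for key, total in joint.items() if key[1] == "Monday"}
```

In the SOL1 layout a filter on the text would first have to be translated into a filter on the ID, which is the limitation described above.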

2016-12-01 0:35 GMT+01:00 Billy(Yiming) Liu <li...@gmail.com>:

> Thanks, Alberto. The explanation is accurate. EXTENDED_COLUMN is only used
> for display, not for filtering or grouping, which are done on the
> HOST_COLUMN. So EXTENDED_COLUMN is not a dimension; it works like a
> key/value map keyed by the HOST_COLUMN.
>
> If the values in EXTENDED_COLUMN are not long, you could just define two
> dimensions with the joint dimension setting. It has almost the same
> performance impact as EXTENDED_COLUMN, which removes one dimension, but is
> easier to understand.
>
> 2016-11-30 19:00 GMT+08:00 Alberto Ramón <a....@gmail.com>:
>
>> This will help you:
>> http://kylin.apache.org/docs/howto/howto_optimize_cubes.html
>>
>> The idea is always: how can I reduce the number of dimensions?
>> If you reduce the dimensions, the time and resources needed to build the
>> cube, and its final size, decrease, which is good.
>>
>> An example is DIM_Persons: Id_Person, Name, Surname, Address, ...
>>    Id_Person can be the host column,
>>    and the other columns, which can be looked up from the ID, are
>>    extended columns.
>>
>> 2016-11-30 11:35 GMT+01:00 仇同心 <qi...@jd.com>:
>>
>> > Hi all,
>> > I don’t understand the usage scenarios of EXTENDED_COLUMN, although I
>> > have read “https://issues.apache.org/jira/browse/KYLIN-1313”.
>> > What do the “Host Column” and “Extended Column” parameters mean?
>> > Why use this expression, and what does it optimize?
>> > Could you explain it with a SQL statement?
>> >
>> > Thanks~
>> >
>>
>
>
>
> --
> With Warm regards
>
> Yiming Liu (刘一鸣)
>

Re: Consulting "EXTENDED_COLUMN"

Posted by ShaoFeng Shi <sh...@apache.org>.
Kylin encodes the dimension values with a dictionary (the default encoding)
or another encoding method when composing the rowkey, so in most cases the
overhead will be smaller.
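
As a rough illustration of why encoding keeps the rowkey overhead down, the Python sketch below maps each distinct value to a small integer ID and compares the widths. This is not Kylin's actual dictionary implementation; the function names and the fixed-width comparison are assumptions made only for this example.

```python
def build_dictionary(values):
    """Map each distinct value to a small integer id (sorted for determinism)."""
    return {value: idx for idx, value in enumerate(sorted(set(values)))}


def id_width_bytes(cardinality):
    """Whole bytes needed to store ids in the range 0..cardinality-1."""
    bits = max(1, (cardinality - 1).bit_length())
    return (bits + 7) // 8


WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

dictionary = build_dictionary(WEEKDAYS)

# Width of the longest raw value versus the width of an encoded id.
raw_width = max(len(value.encode("utf-8")) for value in WEEKDAYS)
encoded_width = id_width_bytes(len(dictionary))
```

For a low-cardinality text column such as a weekday name, the encoded ID is much narrower than the raw text, so adding the text column to the rowkey costs little.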

2016-12-02 17:59 GMT+08:00 Alberto Ramón <a....@gmail.com>:

> Yes, I will accept this overhead in the rowkey.

-- 
Best regards,

Shaofeng Shi 史少锋

Re: Consulting "EXTENDED_COLUMN"

Posted by Alberto Ramón <a....@gmail.com>.
Yes, I will accept this overhead in the rowkey.

2016-12-02 9:58 GMT+01:00 Billy(Yiming) Liu <li...@gmail.com>:

> Using Joint Dimension for your 1:1 relation is the right design.

Re: Consulting "EXTENDED_COLUMN"

Posted by "Billy(Yiming) Liu" <li...@gmail.com>.
Using Joint Dimension for your 1:1 relation is the right design.
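
A small Python sketch of why a joint dimension is cheap for a 1:1 pair: dimensions in a joint group are treated as a unit, so a cuboid contains either all of them or none, and the pair adds no more combinations than a single dimension would. The function and the dimension list are hypothetical, and the count includes the empty combination for simplicity; this is a counting exercise, not Kylin's cuboid scheduler.

```python
from itertools import combinations


def count_cuboids(dimensions, joint_groups=()):
    """Count dimension subsets where each joint group is fully in or fully out."""
    count = 0
    for size in range(len(dimensions) + 1):
        for subset in combinations(dimensions, size):
            chosen = set(subset)
            if all(chosen.isdisjoint(group) or chosen.issuperset(group)
                   for group in joint_groups):
                count += 1
    return count


DIMS = ["WeekDayID", "WeekDayTxt", "MonthID", "MonthTxt"]

# Four independent dimensions: every subset is a possible cuboid.
without_joint = count_cuboids(DIMS)

# Two joint pairs behave like two dimensions.
with_joint = count_cuboids(
    DIMS,
    joint_groups=[{"WeekDayID", "WeekDayTxt"}, {"MonthID", "MonthTxt"}],
)
```

Here the four columns combine like two dimensions once each ID/Txt pair is declared joint, while both columns stay available for filtering and grouping.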

2016-12-02 0:21 GMT+08:00 Alberto Ramón <a....@gmail.com>:

> Nice, Liu.
>
> We have some cases like:
> DayWeekTXT, DayWeekID
> MonthTXT, MonthID
>
> A small proposal:
> It would be interesting to be able to create a derived dimension with a
> 1:1 relation that supports filters and GROUP BY.

-- 
With Warm regards

Yiming Liu (刘一鸣)

Re: Consulting "EXTENDED_COLUMN"

Posted by Alberto Ramón <a....@gmail.com>.
Nice, Liu.

We have some cases like:
DayWeekTXT, DayWeekID
MonthTXT, MonthID

A small proposal:
It would be interesting to be able to create a derived dimension with a 1:1
relation that supports filters and GROUP BY.

2016-12-01 11:55 GMT+01:00 Billy(Yiming) Liu <li...@gmail.com>:

> The cost of a joint dimension compared with an extended column is that you
> have more columns in the HBase rowkey, which may harm query performance.
> But most of the time the joint dimension is still recommended, since a
> normal dimension column supports many more functions than an extended
> column, such as count(*).

Re: Consulting "EXTENDED_COLUMN"

Posted by "Billy(Yiming) Liu" <li...@gmail.com>.
The cost of a joint dimension compared with an extended column is that you
have more columns in the HBase rowkey, which may harm query performance.
But most of the time the joint dimension is still recommended, since a
normal dimension column supports many more functions than an extended
column, such as count(*).

2016-12-01 17:07 GMT+08:00 Alberto Ramón <a....@gmail.com>:

> Hello,
> I was preparing an email with related questions.
>
> Sometimes we have derived dimensions with a 1:1 relation, for example:
> WeekDayID & WeekDayTxt
> MonthID & MonthTxt
>
> SOL1: Derived. ID as the host column and Txt as the extended column.
> Problem: you can't filter or group by Txt.
>
> SOL2: Joint. Define tuples of ID & Txt.
> Any problems or limitations? (I still need to test this option.)

-- 
With Warm regards

Yiming Liu (刘一鸣)