Posted to user@kylin.apache.org by hongbin ma <ma...@apache.org> on 2016/12/10 14:04:17 UTC

Re: IN_THRESHOLD

FYI, the value of IN_THRESHOLD is configurable as of
https://issues.apache.org/jira/browse/KYLIN-2193
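
To make the decision described in the quoted explanation below concrete, here is
a minimal sketch in Java. It is not Kylin's actual code: the class name, the
translate() helper, the string output, and the "<=" comparison are all
assumptions made for illustration. Only the idea (a short IN list, otherwise a
min/max range) comes from the thread.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class DerivedFilterSketch {

    // Hypothetical helper, not part of Kylin: builds a filter on the PK column
    // from the PK values whose derived column matched the original filter.
    public static String translate(String pkColumn, List<Long> matchingIds, int inThreshold) {
        if (matchingIds.isEmpty()) {
            return "1 = 0"; // nothing matched, so no row can qualify
        }
        // Assumption: the list is kept as an IN condition while its size
        // is at or below the threshold.
        if (matchingIds.size() <= inThreshold) {
            String values = matchingIds.stream()
                    .map(String::valueOf)
                    .collect(Collectors.joining(", "));
            return pkColumn + " IN (" + values + ")";
        }
        // Too many values: fall back to a single range covering all of them.
        long min = Collections.min(matchingIds);
        long max = Collections.max(matchingIds);
        return pkColumn + " BETWEEN " + min + " AND " + max;
    }

    public static void main(String[] args) {
        List<Long> ids = Arrays.asList(1L, 211L, 382L);
        System.out.println(translate("USER_ID", ids, 5)); // USER_ID IN (1, 211, 382)
        System.out.println(translate("USER_ID", ids, 2)); // USER_ID BETWEEN 1 AND 382
    }
}
```

Note that the range form is looser than the IN form: USER_ID between 1 and 382
also covers IDs whose name is not 'liyang', so in a sketch like this the
original condition would still have to be checked on the rows that come back.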

On Mon, Nov 21, 2016 at 4:35 PM, Alberto Ramón <a....@gmail.com>
wrote:

> Very, very clear,
> thanks!
>
> 2016-11-18 4:16 GMT+01:00 Li Yang <li...@apache.org>:
>
>> A filter on a derived column has to be translated into a filter on its PK.
>>
>> E.g. say USER_NAME is a derived column (not on the cube) and USER_ID is its
>> PK (on the cube). When the filter USER_NAME='liyang' comes in, it needs to be
>> translated into USER_ID in (1,211,382), where IDs 1, 211, and 382 are the
>> three users whose name is 'liyang'.
>>
>> Now suppose 'liyang' is so common a name that there are thousands of
>> 'liyang's. Then the IN clause becomes very long and can cause performance
>> problems during storage scanning. In that case, the filter can be translated
>> into a range filter instead, like USER_ID between 1 and 382.
>>
>> The threshold is used to decide whether the translation returns an IN
>> condition or a range condition.
>>
>> Cheers
>> Yang
>>
>> On Wed, Nov 16, 2016 at 12:35 AM, Alberto Ramón <
>> a.ramonportoles@gmail.com> wrote:
>>
>>> About KYLIN-2193:
>>> What is the purpose of
>>> org.apache.kylin.storage.translate.DerivedFilterTranslator#IN_THRESHOLD? :)
>>> (When is it used?)
>>>
>>
>>
>


-- 
Regards,

*Bin Mahone | 马洪宾*