You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@kylin.apache.org by Da Tong <dt...@thoughtworks.com> on 2016/12/21 05:32:27 UTC

How to workaround with columns with NULL value?

Hi, all

I am using kylin 1.6.0. I have met three problem:

1. in one of my Metrics, some of the values are NULL, when I tried to
calculate the average of the column, the COUNT function will not filter out
NULL value, which means the average result is biased. One solution I found
is using another column to mark whether the value is NULL or not, but there
are hundreds of columns like this. I don't think adding another hundreds of
mark column as dimensions is a good way. Any suggestions about this
situation?

2. I need to do filter using WHERE clause in some metrics columns, such as
count rows that having value of one field over 100. It seems that I have to
add new columns such as A_FIELD_OVER_100 to achieve this. But what if the
*100* is a variable? User of our system need to filter out result based on
metrics value, should I add metrics into dimensions? Is this requirement an
uncommon case?

3. It seems that querying all-null columns issue is fixed in this issue
<https://issues.apache.org/jira/browse/KYLIN-1527> (Kylin 1527). But I
still got NullPointerError from RawMesureType.valueOf method. I just want
to make sure that Kylin support columns with all null values, right?

Any suggestion is welcome. Thank you.
-- 
TONG, Da / 佟达

Re: How to workaround with columns with NULL value?

Posted by ShaoFeng Shi <sh...@apache.org>.

Thanks Alberto on question 1, that's a known issue, we can prioritize it if
more people comment on it.

About 2, if the cardinality of that column isn't high, you can add it to
dimension to get this capacity. If it is high, suggest you do a conversion,
e.g, 0, 100, 200,...; you can do this easily with hive view;

About 3, could you please open a JIRA with the error trace? Thanks;

2016-12-21 15:26 GMT+08:00 Alberto Ramón <a....@gmail.com>:

> about 1º point: In Kylin 2049
> <https://issues.apache.org/jira/browse/KYLIN-2049> there is a commet
> <https://issues.apache.org/jira/browse/KYLIN-2049?focusedCommentId=15631755&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15631755>
> of Shaofeng SHI
>
> 2016-12-21 6:32 GMT+01:00 Da Tong <dt...@thoughtworks.com>:
>
>> Hi, all
>>
>> I am using kylin 1.6.0. I have met three problem:
>>
>> 1. in one of my Metrics, some of the values are NULL, when I tried to
>> calculate the average of the column, the COUNT function will not filter out
>> NULL value, which means the average result is biased. One solution I found
>> is using another column to mark whether the value is NULL or not, but there
>> are hundreds of columns like this. I don't think adding another hundreds of
>> mark column as dimensions is a good way. Any suggestions about this
>> situation?
>>
>> 2. I need to do filter using WHERE clause in some metrics columns, such
>> as count rows that having value of one field over 100. It seems that I have
>> to add new columns such as A_FIELD_OVER_100 to achieve this. But what if
>> the *100* is a variable? User of our system need to filter out result based
>> on metrics value, should I add metrics into dimensions? Is this requirement
>> an uncommon case?
>>
>> 3. It seems that querying all-null columns issue is fixed in this issue
>> <https://issues.apache.org/jira/browse/KYLIN-1527> (Kylin 1527). But I
>> still got NullPointerError from RawMesureType.valueOf method. I just want
>> to make sure that Kylin support columns with all null values, right?
>>
>> Any suggestion is welcome. Thank you.
>> --
>> TONG, Da / 佟达
>>
>
>


-- 
Best regards,

Shaofeng Shi 史少锋

Re: How to workaround with columns with NULL value?

Posted by Alberto Ramón <a....@gmail.com>.

about 1º point: In Kylin 2049
<https://issues.apache.org/jira/browse/KYLIN-2049> there is a commet
<https://issues.apache.org/jira/browse/KYLIN-2049?focusedCommentId=15631755&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15631755>
of Shaofeng SHI

2016-12-21 6:32 GMT+01:00 Da Tong <dt...@thoughtworks.com>:

> Hi, all
>
> I am using kylin 1.6.0. I have met three problem:
>
> 1. in one of my Metrics, some of the values are NULL, when I tried to
> calculate the average of the column, the COUNT function will not filter out
> NULL value, which means the average result is biased. One solution I found
> is using another column to mark whether the value is NULL or not, but there
> are hundreds of columns like this. I don't think adding another hundreds of
> mark column as dimensions is a good way. Any suggestions about this
> situation?
>
> 2. I need to do filter using WHERE clause in some metrics columns, such as
> count rows that having value of one field over 100. It seems that I have to
> add new columns such as A_FIELD_OVER_100 to achieve this. But what if the
> *100* is a variable? User of our system need to filter out result based on
> metrics value, should I add metrics into dimensions? Is this requirement an
> uncommon case?
>
> 3. It seems that querying all-null columns issue is fixed in this issue
> <https://issues.apache.org/jira/browse/KYLIN-1527> (Kylin 1527). But I
> still got NullPointerError from RawMesureType.valueOf method. I just want
> to make sure that Kylin support columns with all null values, right?
>
> Any suggestion is welcome. Thank you.
> --
> TONG, Da / 佟达
>