Posted to dev@kylin.apache.org by Kiriti Sai <ki...@gmail.com> on 2016/01/08 08:04:08 UTC

Only retrieving high cardinality dimension

Hi,
I have a description column in my data that is mainly a set of key-value
pairs. It will never be searched, only retrieved. Due to the nature of the
logs, the cardinality of this dimension is going to be nearly equal to the
number of rows. Hence I have to disable the dictionary for this dimension,
and the cube size is increasing several times over. The length of this
column would be around 80-100 characters. This field cannot be split up, as
it is designed to provide flexibility for collecting measurements.
So, is there any provision for specifying that a dimension will only be
retrieved but never queried? Would such a thing speed up the cube building
process?
Please let me know if there is any workaround that you guys can think of.

Thank you,
Sai Kiriti B.
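
[Editor's note] The size problem described above can be illustrated with a
minimal sketch. It is not Kylin code: it assumes a toy model in which a
cuboid holds one row per distinct combination of its dimension values, and
the column names (country, device, description) and the helper functions
are hypothetical.

```python
import random


def make_rows(n, seed=0):
    """Generate toy log rows; 'description' is unique per row (assumption)."""
    rng = random.Random(seed)
    return [
        {
            "country": rng.choice(["JP", "US", "IN"]),
            "device": rng.choice(["web", "ios", "android"]),
            "description": f"k1=v{i};k2=v{i}",  # near-unique key-value string
        }
        for i in range(n)
    ]


def cuboid_rows(rows, dims):
    """Count distinct dimension-value combinations, i.e. rows in one toy cuboid."""
    return len({tuple(row[d] for d in dims) for row in rows})


rows = make_rows(10_000)
without_desc = cuboid_rows(rows, ["country", "device"])
with_desc = cuboid_rows(rows, ["country", "device", "description"])

print("cuboid rows without description:", without_desc)
print("cuboid rows with description:   ", with_desc)
```

In this toy model, any cuboid that includes the near-unique column has as
many rows as the raw data, so pre-aggregation saves nothing for it.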

Re: Only retrieving high cardinality dimension

Posted by Luke Han <lu...@gmail.com>.

Well, if you do not put this dimension in the GROUP BY, how could you get it
from SQL? If you want to fetch data from the fact table without any GROUP BY
(just WHERE), it will not return exact results so far (Kylin does not
support queries on raw data yet).
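
[Editor's note] A minimal sketch of why a filter-only query cannot return
raw rows from pre-aggregated data. This is a simplified model, not Kylin's
storage or query engine; the column names and the build_cuboid helper are
hypothetical.

```python
from collections import defaultdict

# Toy fact table: two raw rows share the same (country, device) pair.
raw = [
    {"country": "JP", "device": "web", "bytes": 10},
    {"country": "JP", "device": "web", "bytes": 20},
    {"country": "US", "device": "ios", "bytes": 5},
]


def build_cuboid(rows, dims, measure):
    """Pre-aggregate: sum the measure per distinct dimension combination."""
    agg = defaultdict(int)
    for row in rows:
        agg[tuple(row[d] for d in dims)] += row[measure]
    return dict(agg)


cuboid = build_cuboid(raw, ["country", "device"], "bytes")

# A "WHERE country = 'JP'" lookup answered from the cuboid sees one
# aggregated row, not the two raw rows that produced it.
jp_from_cuboid = [(key, total) for key, total in cuboid.items() if key[0] == "JP"]
jp_from_raw = [row for row in raw if row["country"] == "JP"]

print(jp_from_cuboid)
print(len(jp_from_raw))
```

Once rows are merged by their dimension values, the individual records are
gone, which is why a column that is never grouped on has nowhere to live in
this model.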

Would you mind describing your data structure and a sample query in a little
more detail?

Thanks.

Best Regards!
---------------------

Luke Han

On Fri, Jan 8, 2016 at 6:26 PM, Kiriti Sai <ki...@gmail.com> wrote:

> I don't want to group by or apply any filters on that dimension. I just
> want to do all those query-related things with the other dimensions and
> get the corresponding description column value.
> As I mentioned before, I'm not sure if such an optimization exists in any
> OLAP system, but I just wanted to clarify.
>
> Thank you.
> On Jan 8, 2016 7:19 PM, "hongbin ma" <ma...@apache.org> wrote:
>
> > How would you use such a dimension if, hypothetically, we had
> > implemented it?
> > You can't apply GROUP BY or filters on it, and it's obviously not a
> > measure either.
> >
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> > Apache Kylin: http://kylin.io
> > Github: https://github.com/binmahone
> >
>

Re: Only retrieving high cardinality dimension

Posted by Kiriti Sai <ki...@gmail.com>.

I don't want to group by or apply any filters on that dimension. I just
want to do all those query-related things with the other dimensions and
get the corresponding description column value.
As I mentioned before, I'm not sure if such an optimization exists in any
OLAP system, but I just wanted to clarify.

Thank you.
On Jan 8, 2016 7:19 PM, "hongbin ma" <ma...@apache.org> wrote:

> How would you use such a dimension if, hypothetically, we had implemented
> it?
> You can't apply GROUP BY or filters on it, and it's obviously not a
> measure either.
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>

Re: Only retrieving high cardinality dimension

Posted by hongbin ma <ma...@apache.org>.

How would you use such a dimension if, hypothetically, we had implemented it?
You can't apply GROUP BY or filters on it, and it's obviously not a measure
either.


-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone