Posted to dev@kylin.apache.org by Huang Hua <hu...@mininglamp.com> on 2015/08/10 12:17:23 UTC

RE: Querying raw data / lowest granularity with Kylin

I haven't used the "InvertedIndex" feature, but I think it is still at an early stage in terms of functionality and stability.

Back when we were using kylin-0.6, we had a very similar use case: drilling down to the lowest granularity of the data.
What we did was define the filter columns as dimensions (almost all of them defined as mandatory, to avoid cube expansion) and all other result columns as measures.

You can think of our case as using Kylin to build a query index in HBase, in order to support queries like "fetch all transactions for a given user, or for several user ids, user names, or other filters".
However, we ultimately realized that Kylin may not be the best option for such queries, because Kylin is very good at rollup queries with pre-computed measures and a limited number of filters. Perhaps with enhancements to "InvertedIndex" we will see more possibilities from Kylin for lowest-granularity queries.

Best,
Hua	
> -----Original Message-----
> From: dev-return-3593-huanghua=mininglamp.com@kylin.incubator.apache.org
> [mailto:dev-return-3593-huanghua=mininglamp.com@kylin.incubator.apache.org]
> On Behalf Of alex schufo
> Sent: August 10, 2015 17:24
> To: dev@kylin.incubator.apache.org
> Subject: Querying raw data / lowest granularity with Kylin
> 
> I have some scenarios where I would like to drill down to the lowest
> granularity of my table. Does Kylin handle this?
>
> If I am not mistaken, at least one "group by" should always be used.
>
> So I tried to query by grouping by all my dimensions at the same time:
> "select dim1, dim2, ..., dimN, sum(measure1), ..., sum(measureN) from ...
> where ... group by dim1, dim2, ..., dimN". This gives me the expected results.
> Is this the correct way to do it?
>
> Although this seems to work, with several dimensions it would mean building
> a lot of cubes and using a lot of space, even though in this case they would
> not necessarily be used. I know that aggregation groups can be used to
> reduce this. With the same example I created 1 aggregation group for each
> dimension and the expansion rate is 200%, but I tested only on 5 dimensions.
> Again, is this the correct way to do it?
> 
> Relative to this topic, I saw:
> 
> v0.7.x: InvertedIndex (HybridOLAP)
> Goal:
> Introduce InvertedIndex to optimise queries on raw data and low level
> aggregation
> 
> on https://issues.apache.org/jira/browse/KYLIN-577
> 
> Is this something that is currently available in 0.7.2? This ticket dates back
> to the beginning of 2015, so I am not sure whether it reflects Kylin's current plan.
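
The "group by all dimensions" query quoted above can be tried out on a toy table. The following is only a minimal sketch: it uses Python's built-in sqlite3 module instead of Kylin, and the table name `fact`, the columns `dim1`, `dim2`, `measure1`, and the sample rows are all hypothetical. It shows that grouping by every dimension returns the finest granularity an aggregate query can give, but that rows sharing the same values for all dimensions are still merged, so the result is not guaranteed to be the raw data.

```python
import sqlite3

# Hypothetical fact table: two dimensions (dim1, dim2) and one measure (measure1).
# SQLite stands in for Kylin here purely to illustrate the query shape.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (dim1 TEXT, dim2 TEXT, measure1 INTEGER)")
conn.executemany(
    "INSERT INTO fact VALUES (?, ?, ?)",
    [("a", "x", 1), ("a", "x", 2), ("a", "y", 5), ("b", "x", 7)],
)

# Group by every dimension: the lowest granularity an aggregate query can return.
rows = conn.execute(
    "SELECT dim1, dim2, SUM(measure1) FROM fact "
    "GROUP BY dim1, dim2 ORDER BY dim1, dim2"
).fetchall()

# The two raw rows ("a", "x", 1) and ("a", "x", 2) come back as one row.
for row in rows:
    print(row)
```

Four raw rows produce three result rows here, because two of them have identical dimension values. Whether that matters depends on whether the dimensions uniquely identify a record.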

Re: RE: Querying raw data / lowest granularity with Kylin

Posted by Luke Han <lu...@apache.org>.

TopN can't serve detail/raw data queries.
Suppose there is transaction data, and the requirement is
to query all transactions in the last 7 days for a specified accountID.
The store has to include all detail-level data so that those rows can be
retrieved with the right filter condition.

Also, in the real world, detail query patterns are very specific, with a
where clause and even a limited set of columns.

So I'm thinking of offering an option for the modeler to declare certain
query patterns, so that Kylin only optimizes those queries.
That could be easier than storing and optimizing all detail data.
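
The detail query described above (all transactions in the last 7 days for one account) might look like the following. This is a plain Python sketch, not Kylin code: the row layout `(account_id, transaction_date, amount)`, the function name `recent_transactions`, and the sample data are assumptions made for illustration. The point is that the answer is a set of individual rows, which a pre-aggregated sum cannot reproduce.

```python
from datetime import date, timedelta

# Hypothetical transaction rows: (account_id, transaction_date, amount).
transactions = [
    ("acct-1", date(2015, 8, 11), 20.0),
    ("acct-1", date(2015, 8, 1), 35.0),
    ("acct-2", date(2015, 8, 10), 12.5),
    ("acct-1", date(2015, 8, 6), 8.0),
]


def recent_transactions(rows, account_id, today, days=7):
    """Return every detail row for one account within the last `days` days."""
    cutoff = today - timedelta(days=days)
    return [r for r in rows if r[0] == account_id and cutoff <= r[1] <= today]


detail = recent_transactions(transactions, "acct-1", today=date(2015, 8, 12))
for row in detail:
    print(row)
```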

On Wed, Aug 12, 2015 at 1:01 PM, Li Yang <li...@apache.org> wrote:

> Thanks for sharing the use cases. I can see the demand for seeing raw data
> in Kylin.
>
> I think the TopN feature may satisfy this need to a large extent.  Say, for
> every aggregated number, the user can see the top 10000 records that contribute
> to the sum.  Would this be enough?  I guess yes, because anything beyond the
> top 10000 is a minority and of less interest.  And in case the breakdown has
> fewer than 10000 rows, the user will see the full population.

Re: RE: Querying raw data / lowest granularity with Kylin

Posted by Li Yang <li...@apache.org>.

Thanks for sharing the use cases. I can see the demand for seeing raw data
in Kylin.

I think the TopN feature may satisfy this need to a large extent.  Say, for
every aggregated number, the user can see the top 10000 records that contribute
to the sum.  Would this be enough?  I guess yes, because anything beyond the
top 10000 is a minority and of less interest.  And in case the breakdown has
fewer than 10000 rows, the user will see the full population.
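
The idea of showing the top contributors behind each aggregated number can be sketched as follows. This is not Kylin's TopN implementation, only an illustration in plain Python; the function name `top_n_contributors`, the `(key, amount)` record shape, and the sample data are hypothetical. When a group has fewer than `n` records, all of them are returned, which corresponds to the "full population" case mentioned above.

```python
import heapq
from collections import defaultdict


def top_n_contributors(records, n):
    """Group (key, amount) records by key and return, for each key,
    the sum of its amounts together with its n largest amounts."""
    groups = defaultdict(list)
    for key, amount in records:
        groups[key].append(amount)
    return {
        key: (sum(values), heapq.nlargest(n, values))
        for key, values in groups.items()
    }


records = [("store-1", 5), ("store-1", 40), ("store-1", 12), ("store-2", 7)]
result = top_n_contributors(records, n=2)
for key, (total, top) in sorted(result.items()):
    print(key, total, top)
```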

On Wed, Aug 12, 2015 at 1:35 AM, alex schufo <al...@gmail.com> wrote:

> Thanks for those details.
>
> I read about mandatory dimensions in the presentation, but how does one
> make a dimension mandatory in the Cube Builder UI?
>
> In terms of use cases, I can see the following:
>
>    - Drill down from hierarchies (aggregations) to the lowest
>    granularity (raw data). For example, imagine you have book stores
>    everywhere in the US. The user would pick a date range and see the number
>    of sales per US state, then click on one state and see the sales per city
>    for that state, then click on one city and see the sales per book store
>    for that city, and finally, when clicking on one store, see the actual
>    transactions that add up to those sales totals.
>    - Use Kylin as a single fast access point to Hadoop data: build cubes for
>    regular OLAP processing, but also be able to query other Hive tables that
>    do not specifically require aggregations, only dimensional filtering on
>    raw data, while benefiting from Kylin's SQL interface and fast HBase
>    queries.
>
> These are not as strong requirements as what Kylin provides (OLAP), but
> having them would be very nice in my view, if it fits the project.

Re: RE: Querying raw data / lowest granularity with Kylin

Posted by alex schufo <al...@gmail.com>.

Thanks for those details.

I read about mandatory dimensions in the presentation, but how does one
make a dimension mandatory in the Cube Builder UI?

In terms of use cases, I can see the following:

   - Drill down from hierarchies (aggregations) to the lowest
   granularity (raw data). For example, imagine you have book stores everywhere
   in the US. The user would pick a date range and see the number of sales per
   US state, then click on one state and see the sales per city for that state,
   then click on one city and see the sales per book store for that city, and
   finally, when clicking on one store, see the actual transactions
   that add up to those sales totals.
   - Use Kylin as a single fast access point to Hadoop data: build cubes for
   regular OLAP processing, but also be able to query other Hive tables that do
   not specifically require aggregations, only dimensional filtering on raw
   data, while benefiting from Kylin's SQL interface and fast HBase queries.

These are not as strong requirements as what Kylin provides (OLAP), but
having them would be very nice in my view, if it fits the project.

On Tue, Aug 11, 2015 at 10:00 AM, Li Yang <li...@apache.org> wrote:

> > ... at least one "group by" should always be used.
>
> This is correct. So the lowest granularity Kylin provides is reached by
> grouping by all dimensions, which is what Alex has tried, if I understand
> correctly.  We believe this can satisfy 90% of analysis requirements.
>
> > ... using a lot of space whereas in this case it would not necessarily be
> > used.
>
> You can set dimensions to be "mandatory" so that fewer dimension
> combinations are calculated.  See more at
> http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin
>
> > "InvertedIndex" feature ... is still in early stage in terms of
> > functionality and stability.
>
> Very true.  We experimented with "inverted-index" to address two
> requirements: 1) Near Real Time data readiness in Kylin;  2) Querying raw
> data.  Later, 1) was solved by another feature called Stream Cubing, so the
> priority of "inverted-index" dropped considerably, since the need for raw
> record analysis does not seem strong.
>
>
> Do you (or anyone) see raw record query as a must-have feature?  We'd like to
> hear your use case.
>
> Cheers
> Yang

Re: RE: Querying raw data / lowest granularity with Kylin

Posted by Li Yang <li...@apache.org>.

> ... at least one "group by" should always be used.

This is correct. So the lowest granularity Kylin provides is reached by grouping
by all dimensions, which is what Alex has tried, if I understand correctly.  We
believe this can satisfy 90% of analysis requirements.

> ... using a lot of space whereas in this case it would not necessarily be
> used.

You can set dimensions to be "mandatory" so that fewer dimension
combinations are calculated.  See more at
http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin
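
To see why mandatory dimensions reduce the number of combinations, here is a back-of-the-envelope sketch in Python. It assumes a deliberately simple model in which every subset of the non-mandatory dimensions (including the empty subset) is one combination, and it ignores aggregation groups, hierarchies, and any other pruning Kylin may apply. The function name `cuboid_count` is hypothetical.

```python
def cuboid_count(total_dims: int, mandatory_dims: int = 0) -> int:
    """Count dimension combinations when `mandatory_dims` of the `total_dims`
    dimensions must appear in every combination (simple model, see above)."""
    if not 0 <= mandatory_dims <= total_dims:
        raise ValueError("mandatory_dims must be between 0 and total_dims")
    optional = total_dims - mandatory_dims
    # Each optional dimension is either present or absent in a combination.
    return 2 ** optional


print(cuboid_count(5))     # all 5 dimensions optional
print(cuboid_count(5, 2))  # 2 of the 5 dimensions mandatory
```

Under this model, each dimension marked as mandatory halves the number of combinations, which is why it helps against cube expansion.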

> "InvertedIndex" feature ... is still in early stage in terms of
> functionality and stability.

Very true.  We experimented with "inverted-index" to address two
requirements: 1) Near Real Time data readiness in Kylin;  2) Querying raw
data.  Later, 1) was solved by another feature called Stream Cubing, so the
priority of "inverted-index" dropped considerably, since the need for raw
record analysis does not seem strong.


Do you (or anyone) see raw record query as a must-have feature?  We'd like to
hear your use case.

Cheers
Yang

On Tue, Aug 11, 2015 at 8:30 AM, Luke Han <lu...@gmail.com> wrote:

> Currently, Kylin does not support detail/raw data queries; that is why, as
> you already know, you have to add at least one "group by" to your query.
>
> As the demand for this feature is growing, we are actually evaluating it
> and will share our ideas here soon.
>
> The roadmap has changed a little due to some shifts in priority.
> I'm drafting a new one for the coming release.
>
> Please let us know if there is any feature, function, or anything
> else that is missing but that your use cases really need.
>
> Thanks.
>
>
>
>
> Best Regards!
> ---------------------
>
> Luke Han

Re: RE: Querying raw data / lowest granularity with Kylin

Posted by Luke Han <lu...@gmail.com>.

Currently, Kylin does not support detail/raw data queries; that is why, as
you already know, you have to add at least one "group by" to your query.

As the demand for this feature is growing, we are actually evaluating it
and will share our ideas here soon.

The roadmap has changed a little due to some shifts in priority.
I'm drafting a new one for the coming release.

Please let us know if there is any feature, function, or anything
else that is missing but that your use cases really need.

Thanks.




Best Regards!
---------------------

Luke Han

On Mon, Aug 10, 2015 at 6:17 PM, Huang Hua <hu...@mininglamp.com> wrote:

> I haven't used the "InvertedIndex" feature, but I think it is still at an
> early stage in terms of functionality and stability.
>
> Back when we were using kylin-0.6, we had a very similar use case: drilling
> down to the lowest granularity of the data.
> What we did was define the filter columns as dimensions (almost all of them
> defined as mandatory, to avoid cube expansion) and all other result columns
> as measures.
>
> You can think of our case as using Kylin to build a query index in HBase, in
> order to support queries like "fetch all transactions for a given user, or
> for several user ids, user names, or other filters".
> However, we ultimately realized that Kylin may not be the best option for
> such queries, because Kylin is very good at rollup queries with pre-computed
> measures and a limited number of filters. Perhaps with enhancements to
> "InvertedIndex" we will see more possibilities from Kylin for
> lowest-granularity queries.
>
> Best,
> Hua