Posted to dev@tajo.apache.org by "Jihoon Son (JIRA)" <ji...@apache.org> on 2013/10/16 14:41:41 UTC

[jira] [Comment Edited] (TAJO-256) Support data cube (Umbrella)

    [ https://issues.apache.org/jira/browse/TAJO-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796731#comment-13796731 ] 

Jihoon Son edited comment on TAJO-256 at 10/16/13 12:40 PM:
------------------------------------------------------------

Group-by extension queries incur significant overhead.
Query optimization, especially of the distributed plan, is therefore very important.

Statistics such as histograms are very useful for query optimization.
Unfortunately, Tajo currently does not store any statistics for raw tables.

In this case, sample-based cost estimation is a good solution.
In sample-based cost estimation, the aggregation query is first executed on a sampled table, before the query is executed on the original table.
Statistics of the sampled data are collected during this execution.
The collected statistics then make it possible to plan a better-optimized query for the original table.
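
The flow described above could be sketched roughly as follows. This is only an illustration in Python, not Tajo code: the function names (sample_rows, collect_group_stats, choose_num_partitions), the Bernoulli sampling, and the "groups per partition" planning rule are all assumptions made for the example. Note that the number of distinct groups seen in a sample is only a lower bound on the number in the full table.

```python
import math
import random


def sample_rows(rows, fraction, seed=0):
    """Bernoulli sample: keep each row independently with probability `fraction`."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


def collect_group_stats(sample, grouping_sets):
    """Run the grouping on the sample and record, for each grouping set,
    how many distinct groups were observed (hypothetical statistic)."""
    stats = {}
    for cols in grouping_sets:
        groups = {tuple(row[c] for c in cols) for row in sample}
        stats[cols] = len(groups)
    return stats


def choose_num_partitions(observed_groups, groups_per_partition=8):
    """Toy planning rule (an assumption): one partition per
    `groups_per_partition` observed groups, and at least one partition."""
    return max(1, math.ceil(observed_groups / groups_per_partition))


if __name__ == "__main__":
    # Synthetic table: 4 regions, 10 products.
    table = [{"region": i % 4, "product": i % 10} for i in range(1000)]
    grouping_sets = [("region",), ("product",), ("region", "product"), ()]

    # Step 1: run the aggregation on a small sample and collect statistics.
    stats = collect_group_stats(sample_rows(table, 0.1), grouping_sets)

    # Step 2: use the statistics to plan the query on the original table.
    for cols, observed in stats.items():
        print(cols, observed, choose_num_partitions(observed))
```

A real planner would also need to scale the sampled statistics up to the full table; the sketch deliberately leaves that step out.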

So, I added sample-based cost estimation to this issue.



> Support data cube (Umbrella)
> ----------------------------
>
>                 Key: TAJO-256
>                 URL: https://issues.apache.org/jira/browse/TAJO-256
>             Project: Tajo
>          Issue Type: New Feature
>          Components: catalog, distributed query plan, parser
>            Reporter: Jihoon Son
>            Assignee: Jihoon Son
>             Fix For: 0.3-incubating
>
>
> This issue includes the following sub-issues:
> * SQL support of group by extensions (GROUPING SETS, CUBE, ROLLUP)
> * Query execution of group by extensions
> * GROUPING() function
> * Data cube materialization process
> * Cube schema maintenance
> * Sample-based cost estimation
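
For reference, the group by extensions listed above can all be reduced to lists of grouping sets under standard SQL semantics: CUBE over n columns yields all 2^n subsets, and ROLLUP yields the n+1 prefixes. The Python sketch below only illustrates that expansion; the helper names cube and rollup are hypothetical and do not reflect Tajo's parser or planner.

```python
from itertools import combinations


def cube(columns):
    """Expand CUBE(columns) into grouping sets: every subset of the columns."""
    cols = tuple(columns)
    return [
        subset
        for size in range(len(cols), -1, -1)
        for subset in combinations(cols, size)
    ]


def rollup(columns):
    """Expand ROLLUP(columns) into grouping sets: every prefix of the columns."""
    cols = tuple(columns)
    return [cols[:size] for size in range(len(cols), -1, -1)]


if __name__ == "__main__":
    print(cube(["a", "b"]))         # [('a', 'b'), ('a',), ('b',), ()]
    print(rollup(["a", "b", "c"]))  # [('a', 'b', 'c'), ('a', 'b'), ('a',), ()]
```

The exponential growth of CUBE is one reason these queries are expensive to execute.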


