Posted to dev@hive.apache.org by "Anja Gruenheid (JIRA)" <ji...@apache.org> on 2011/02/02 02:57:29 UTC

[jira] Commented: (HIVE-1940) Query Optimization Using Column Metadata and Histograms

    [ https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989494#comment-12989494 ] 

Anja Gruenheid commented on HIVE-1940:
--------------------------------------

As a first step, I would like to take a closer look at collecting metadata at the column level. HIVE-33 proposes five statistics as column metadata: number of distinct values, number of null values, the 3 minimum values, the 3 maximum values, and the average size of the column. As a reference, I would use the existing implementation of the table/partition metadata collection.
As far as I can tell, deriving histograms is somewhat more complex than obtaining these column statistics, which is why I want to start with the column statistics.
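
To make the five statistics concrete, here is a rough single-pass sketch in Python. It is only an illustration and not Hive code: the names `ColumnStats` and `collect_column_stats` are hypothetical, values are assumed to be strings, "size" is assumed to mean length in characters averaged over non-null values, and distinct values are counted exactly with an in-memory set, which a real implementation over large tables would likely replace with an estimator.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ColumnStats:
    """Hypothetical container for the five column statistics from HIVE-33."""
    num_distinct: int
    num_nulls: int
    min_values: List[str]   # up to k smallest distinct values, ascending
    max_values: List[str]   # up to k largest distinct values, descending
    avg_size: float         # assumption: mean length of non-null values


def collect_column_stats(values: Iterable[Optional[str]], k: int = 3) -> ColumnStats:
    """Scan one column once and derive the five statistics."""
    distinct = set()        # exact distinct tracking; assumes the set fits in memory
    num_nulls = 0
    total_size = 0
    non_null = 0

    for value in values:
        if value is None:
            num_nulls += 1
            continue
        distinct.add(value)
        total_size += len(value)
        non_null += 1

    return ColumnStats(
        num_distinct=len(distinct),
        num_nulls=num_nulls,
        min_values=heapq.nsmallest(k, distinct),
        max_values=heapq.nlargest(k, distinct),
        avg_size=total_size / non_null if non_null else 0.0,
    )


if __name__ == "__main__":
    # Toy column with two nulls and one duplicate value.
    print(collect_column_stats(["b", None, "a", "ccc", "a", None]))
```

Counters such as the null count and the total size could in principle be computed per partition and combined afterwards, whereas the distinct count and the min/max lists need a merge step, which may matter for how they are stored in the MetaStore.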

Is there an up-to-date MetaStore DDL script or an E/R model?

> Query Optimization Using Column Metadata and Histograms
> -------------------------------------------------------
>
>                 Key: HIVE-1940
>                 URL: https://issues.apache.org/jira/browse/HIVE-1940
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor
>            Reporter: Anja Gruenheid
>
> The current basis for cost-based query optimization in Hive is information gathered on tables and partitions. To enable further improvements in query optimization, the next step is to design and implement ways to gather information on columns, as discussed in issue HIVE-33. After that, histograms are a possible option for collecting and using run-time statistics. Besides the actual implementation of these features, it is also necessary to develop a consistent storage model for the MetaStore.
