Posted to dev@hive.apache.org by "Ning Zhang (JIRA)" <ji...@apache.org> on 2010/05/22 01:18:16 UTC
[jira] Created: (HIVE-1362) column level statistics
column level statistics
-----------------------
Key: HIVE-1362
URL: https://issues.apache.org/jira/browse/HIVE-1362
Project: Hadoop Hive
Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ahmed M Aly
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1362) column level statistics
Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870229#action_12870229 ]
Ning Zhang commented on HIVE-1362:
----------------------------------
This is the 2nd subtask of HIVE-33 (stats in Hive tables).
We will gather column-level stats at the user's request. This subtask also depends on HIVE-1361, because the metastore API must support storing and retrieving stats.
The major milestones for this subtask are:
1) add a new HiveQL command to gather column level stats. Please see HIVE-33 for the syntax.
2) add new UDFs/UDAFs to compute these statistics.
The proposed statistics are:
- number of distinct values
- number of NULL values
- min/max k values, where k can be given by the user
- histograms: frequency and height-balanced
- average size of the column
- avg/sum of all values in the column if its type is numeric
- percentiles of the values
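
As a rough illustration of what the statistics listed above mean, here is a minimal single-machine sketch in Python. It is not the proposed Hive implementation (which would be UDFs/UDAFs running over distributed data); the function name `column_stats`, its parameters, and the output keys are all hypothetical. It also assumes that "average size" means the UTF-8 byte length of each value's string form, that percentiles use the nearest-rank method, and that all non-NULL values in the column are of one comparable type.

```python
import heapq
import math
from collections import Counter


def column_stats(values, k=2, num_buckets=2, percentiles=(50,)):
    """Illustrative column-level statistics; None stands for SQL NULL."""
    non_null = [v for v in values if v is not None]
    ordered = sorted(non_null)
    n = len(ordered)

    stats = {
        "num_nulls": len(values) - n,
        "num_distinct": len(set(non_null)),
        # k smallest and k largest values, k chosen by the caller.
        "min_k": heapq.nsmallest(k, non_null),
        "max_k": heapq.nlargest(k, non_null),
        # Frequency histogram: exact row count per distinct value.
        "frequency_histogram": dict(Counter(non_null)),
    }
    if n == 0:
        return stats

    # Assumption: "size" is the UTF-8 byte length of str(value).
    stats["avg_size"] = sum(len(str(v).encode("utf-8")) for v in non_null) / n

    # Height-balanced histogram: upper bound of each bucket, with
    # boundaries picked so buckets hold roughly equal numbers of rows.
    stats["height_balanced_bounds"] = [
        ordered[(i * n + num_buckets - 1) // num_buckets - 1]
        for i in range(1, num_buckets + 1)
    ]

    # Nearest-rank percentiles over the sorted non-NULL values.
    stats["percentiles"] = {
        p: ordered[max(0, math.ceil(p * n / 100) - 1)] for p in percentiles
    }

    # Sum and average only make sense for numeric columns.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null):
        stats["sum"] = sum(non_null)
        stats["avg"] = stats["sum"] / n

    return stats


if __name__ == "__main__":
    print(column_stats([5, 3, None, 8, 3, 1, None, 9]))
```

This sketch sorts the whole column in memory for simplicity; a real implementation over large tables would more likely rely on mergeable partial aggregates or approximations.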