Posted to dev@hive.apache.org by "Carl Steinbach (Updated) (JIRA)" <ji...@apache.org> on 2012/04/03 00:05:23 UTC

[jira] [Updated] (HIVE-2185) extend table statistics to store the size of uncompressed data (+extend interfaces for collecting other types of statistics)

     [ https://issues.apache.org/jira/browse/HIVE-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-2185:
---------------------------------

    Release Note: This patch adds getSerDeStats() methods to the Serializer and Deserializer interfaces. Consequently, any SerDe compiled against the old interfaces must be recompiled against the new interfaces to work with Hive 0.8.0.
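
To illustrate what the release note implies for SerDe authors, here is a minimal, self-contained sketch. It does not use the real Hive classes: Stats and the nested Deserializer interface are simplified stand-ins loosely modeled on Hive's SerDeStats and Deserializer, and Utf8LineSerDe is a hypothetical SerDe invented for this example. The real interfaces have more methods and different signatures; the only point shown is that an implementation must now also provide getSerDeStats().

```java
import java.nio.charset.StandardCharsets;

public class SerDeStatsSketch {

    /** Simplified stand-in for a statistics holder; only the raw (uncompressed) size is modeled. */
    static final class Stats {
        private long rawDataSize;

        long getRawDataSize() {
            return rawDataSize;
        }

        void setRawDataSize(long rawDataSize) {
            this.rawDataSize = rawDataSize;
        }
    }

    /** Simplified stand-in for a deserializer interface that has gained a statistics method. */
    interface Deserializer {
        Object deserialize(byte[] blob);

        // The kind of method the patch adds; implementations that lack it no longer satisfy the interface.
        Stats getSerDeStats();
    }

    /** Hypothetical SerDe that treats each record as UTF-8 text. */
    static final class Utf8LineSerDe implements Deserializer {
        private final Stats stats = new Stats();
        private long lastRawSize;

        @Override
        public Object deserialize(byte[] blob) {
            // Remember the uncompressed size of the record just read.
            lastRawSize = blob.length;
            return new String(blob, StandardCharsets.UTF_8);
        }

        @Override
        public Stats getSerDeStats() {
            // Report the size of the most recently deserialized record.
            stats.setRawDataSize(lastRawSize);
            return stats;
        }
    }

    public static void main(String[] args) {
        Utf8LineSerDe serde = new Utf8LineSerDe();
        serde.deserialize("hello,world".getBytes(StandardCharsets.UTF_8));
        System.out.println(serde.getSerDeStats().getRawDataSize()); // prints 11
    }
}
```

A SerDe built before the method existed would fail to load against the extended interface, which is why recompilation (and, for custom SerDes, adding an implementation of the new method) is needed.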
    
> extend table statistics to store the size of uncompressed data (+extend interfaces for collecting other types of statistics)
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-2185
>                 URL: https://issues.apache.org/jira/browse/HIVE-2185
>             Project: Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers, Statistics
>            Reporter: Tomasz Nykiel
>            Assignee: Tomasz Nykiel
>             Fix For: 0.8.0
>
>         Attachments: HIVE-2185.1.patch, HIVE-2185.2.patch, HIVE-2185.patch
>
>
> Currently, when executing INSERT OVERWRITE and ANALYZE TABLE commands, we collect statistics about the number of rows per partition/table. Other statistics (e.g., total table/partition size) are derived from the file system.
> Here, we want to collect the size of the uncompressed data, so that we can determine how effective compression is.
> Currently, a large part of the statistics collection mechanism is hardcoded and not easily extensible to other statistics.
> In addition to adding the new statistic, it would be desirable to extend the collection mechanism so that new statistics can be added easily.
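
The two goals in the description can be sketched as follows. This is an illustration only, not the design used in the patch: the class StatsAggregatorSketch, the key names "rowCount" and "rawDataSize", and the compressionRatio helper are all assumptions made for this example. The idea is that keying statistics by name, rather than hardcoding a row counter, lets a new statistic be added without changing the collector, and that the uncompressed size divided by the on-disk size gives a simple measure of compression efficiency.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StatsAggregatorSketch {

    // Hypothetical statistic names; any new statistic is just another key.
    static final String ROW_COUNT = "rowCount";
    static final String RAW_DATA_SIZE = "rawDataSize";

    private final Map<String, Long> totals = new LinkedHashMap<>();

    /** Adds delta to the running total for the named statistic. */
    void add(String statName, long delta) {
        totals.merge(statName, delta, Long::sum);
    }

    /** Returns the running total, or 0 if the statistic was never reported. */
    long get(String statName) {
        return totals.getOrDefault(statName, 0L);
    }

    /** Uncompressed bytes per on-disk byte; larger means better compression. */
    static double compressionRatio(long rawDataSize, long onDiskSize) {
        if (onDiskSize <= 0) {
            throw new IllegalArgumentException("onDiskSize must be positive");
        }
        return (double) rawDataSize / onDiskSize;
    }

    public static void main(String[] args) {
        StatsAggregatorSketch agg = new StatsAggregatorSketch();

        // Two rows with uncompressed sizes of 400 and 600 bytes.
        agg.add(ROW_COUNT, 1);
        agg.add(RAW_DATA_SIZE, 400);
        agg.add(ROW_COUNT, 1);
        agg.add(RAW_DATA_SIZE, 600);

        // Assume the file system reports 250 bytes on disk for the partition.
        long onDiskSize = 250;
        System.out.println(compressionRatio(agg.get(RAW_DATA_SIZE), onDiskSize)); // prints 4.0
    }
}
```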
