Posted to dev@hive.apache.org by "Tomasz Nykiel (JIRA)" <ji...@apache.org> on 2011/06/02 22:39:49 UTC
[jira] [Updated] (HIVE-2185) extend table statistics to store the size of uncompressed data (+extend interfaces for collecting other types of statistics)
[ https://issues.apache.org/jira/browse/HIVE-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tomasz Nykiel updated HIVE-2185:
--------------------------------
Attachment: HIVE-2185.2.patch
Fixed some minor issues.
Renamed the metric to rawDataSize.
> extend table statistics to store the size of uncompressed data (+extend interfaces for collecting other types of statistics)
> ----------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-2185
> URL: https://issues.apache.org/jira/browse/HIVE-2185
> Project: Hive
> Issue Type: New Feature
> Components: Serializers/Deserializers, Statistics
> Reporter: Tomasz Nykiel
> Assignee: Tomasz Nykiel
> Attachments: HIVE-2185.1.patch, HIVE-2185.2.patch, HIVE-2185.patch
>
>
> Currently, when executing INSERT OVERWRITE and ANALYZE TABLE commands, we collect statistics about the number of rows per partition/table. Other statistics (e.g., total table/partition size) are derived from the file system.
> Here, we want to collect the size of the uncompressed data, so that the efficiency of compression can be determined.
> Currently, a large part of the statistics collection mechanism is hardcoded and not easily extensible to other statistics.
> In addition to adding the new statistic, it would be desirable to extend the collection mechanism so that further statistics can be added easily.
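
To make the two goals in the description concrete, here is a minimal Java sketch. It is not taken from any of the attached patches. The class PartitionStats, its methods, and the statistic names numRows and totalSize are hypothetical; only rawDataSize comes from this thread. It shows statistics keyed by name, so that adding one needs no new fields, and a compression ratio computed from the uncompressed size and the on-disk size.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: these names are illustrative and are not Hive's API.
final class PartitionStats {
    // Statistic name -> accumulated value. A new statistic needs no new field.
    private final Map<String, Long> values = new LinkedHashMap<>();

    // Adds delta to the named statistic, creating it on first use.
    void add(String name, long delta) {
        values.merge(name, delta, Long::sum);
    }

    // Returns the accumulated value, or 0 if the statistic was never recorded.
    long get(String name) {
        return values.getOrDefault(name, 0L);
    }

    // Uncompressed bytes per on-disk byte; a larger value means the data
    // compressed better. Returns NaN when the on-disk size is not positive.
    static double compressionRatio(long rawDataSize, long totalSize) {
        return totalSize > 0 ? (double) rawDataSize / totalSize : Double.NaN;
    }
}
```

Under these assumptions, a writer would call add("numRows", 1) and add("rawDataSize", n) for each row it serializes, where n is that row's uncompressed size, and the ratio would be computed once the on-disk size is known from the file system.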
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira