You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Gunther Hagleitner (JIRA)" <ji...@apache.org> on 2017/06/23 23:17:00 UTC
[jira] [Commented] (HIVE-16938) INFORMATION_SCHEMA usability:
difficult to access # of table records
[ https://issues.apache.org/jira/browse/HIVE-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061627#comment-16061627 ]
Gunther Hagleitner commented on HIVE-16938:
-------------------------------------------
[~cartershanklin] I've followed your lead and created views for table and partition stats. Wondering if we should add another view that's a union of the two and does something like: "<db name>, <table name>, <partition value or null if flat table>, <stats columns>", otherwise you will still have to join with tbls and partitions to get your stats.
> INFORMATION_SCHEMA usability: difficult to access # of table records
> --------------------------------------------------------------------
>
> Key: HIVE-16938
> URL: https://issues.apache.org/jira/browse/HIVE-16938
> Project: Hive
> Issue Type: Bug
> Reporter: Carter Shanklin
> Assignee: Gunther Hagleitner
> Attachments: HIVE-16938.1.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to expose some non-standard but valuable things like statistics in a SYS table.
> One common thing users want to know is the number of rows in tables, system wide.
> This information is in the table_params table but the structure of this table makes it quite inconvenient to access since it is essentially a table of key-value pairs. More table stats are likely to be added over time, especially because of ACID. It would be a lot better if this were a first class table.
> For what it's worth I deal with the current table by pivoting it into something easier to deal with as follows:
> {code}
> create view table_stats as
> select
> tbl_id,
> max(case param_key when 'COLUMN_STATS_ACCURATE' then param_value end) as COLUMN_STATS_ACCURATE,
> max(case param_key when 'numFiles' then param_value end) as numFiles,
> max(case param_key when 'numRows' then param_value end) as numRows,
> max(case param_key when 'rawDataSize' then param_value end) as rawDataSize,
> max(case param_key when 'totalSize' then param_value end) as totalSize,
> max(case param_key when 'transient_lastDdlTime' then param_value end) as transient_lastDdlTime
> from table_params group by tbl_id;
> {code}
> It would be better to not have users provide workarounds and make table stats first-class like column stats currently are.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)