Posted to issues@hive.apache.org by "Alexander Behm (JIRA)" <ji...@apache.org> on 2017/10/25 00:15:06 UTC

[jira] [Commented] (HIVE-15670) column_stats_accurate may not fit in PARTITION_PARAMS.VALUE

    [ https://issues.apache.org/jira/browse/HIVE-15670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16217912#comment-16217912 ] 

Alexander Behm commented on HIVE-15670:
---------------------------------------

May I ask what the purpose of storing this JSON in the table properties is? It seems pretty expensive to me. If you want to keep track of the accuracy of column stats, why not populate a "last updated" timestamp in the appropriate column statistic?
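
To make the suggestion concrete, here is a minimal sketch of what a per-statistic "last updated" timestamp could look like. It is not Hive code: the ColumnStat class, its fields, and the comparison against a data modification time are all hypothetical, and the rule "stats are accurate if computed at or after the last data change" is an assumption about how such a timestamp would be consumed.

```python
from dataclasses import dataclass


@dataclass
class ColumnStat:
    """Hypothetical per-column statistic record (not a Hive class)."""
    column: str
    num_nulls: int
    last_updated: int  # assumed: epoch seconds when this statistic was computed


def is_accurate(stat: ColumnStat, data_last_modified: int) -> bool:
    """Assumed rule: a statistic is accurate if it was computed
    at or after the last modification of the underlying data."""
    return stat.last_updated >= data_last_modified


stat = ColumnStat(column="price", num_nulls=0, last_updated=1_508_890_000)
print(is_accurate(stat, data_last_modified=1_508_880_000))
```

With this layout the accuracy information lives next to each statistic, so nothing per-column has to be serialized into the table or partition parameters.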

> column_stats_accurate may not fit in PARTITION_PARAMS.VALUE
> -----------------------------------------------------------
>
>                 Key: HIVE-15670
>                 URL: https://issues.apache.org/jira/browse/HIVE-15670
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>
> The JSON can grow too big for tables with many columns (see the setColumnStatsState method).
> We can make the JSON more compact by storing only the list of columns with true values. Or we can even store a bitmask in a dedicated column and adjust it when altering the table (rare enough). Or we can just change the VALUE column to a text blob (this might be a painful change with respect to upgrade scripts and to supporting all the DBs' varied blob implementations, especially in directsql).
> Storing denormalized flags in a separate table will probably be slow by comparison.
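
The first two options in the description can be compared with a small sketch. It is illustrative only and not taken from Hive: the function names are hypothetical, and the map-style JSON shape (a BASIC_STATS flag plus a COLUMN_STATS map with one "true" entry per column) is an assumption about the current encoding.

```python
import json


def verbose_state(columns, accurate):
    """Assumed current shape: one "true" entry per accurate column."""
    state = {
        "BASIC_STATS": "true",
        "COLUMN_STATS": {c: "true" for c in columns if c in accurate},
    }
    return json.dumps(state, separators=(",", ":"))


def list_state(columns, accurate):
    """Compact variant: store only the names of the accurate columns."""
    state = {
        "BASIC_STATS": "true",
        "COLUMN_STATS": [c for c in columns if c in accurate],
    }
    return json.dumps(state, separators=(",", ":"))


def bitmask_state(columns, accurate):
    """Bitmask variant: bit i is set when the i-th column's stats are accurate."""
    mask = 0
    for i, c in enumerate(columns):
        if c in accurate:
            mask |= 1 << i
    return mask


def decode_bitmask(columns, mask):
    """Recover the set of accurate column names from a bitmask."""
    return {c for i, c in enumerate(columns) if (mask >> i) & 1}


columns = [f"col_{i}" for i in range(1000)]
accurate = set(columns)
print("map JSON chars: ", len(verbose_state(columns, accurate)))
print("list JSON chars:", len(list_state(columns, accurate)))
print("bitmask bits:   ", bitmask_state(columns, accurate).bit_length())
```

The list form only drops the repeated "true" values, so it still grows with the length of the column names. The bitmask needs one bit per column, but it is tied to column positions, which is why it would have to be adjusted whenever the table is altered.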



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)