Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/08/13 21:50:00 UTC

[jira] [Commented] (IMPALA-8839) Impala writing data to tables should not lead to incorrect results in Hive

    [ https://issues.apache.org/jira/browse/IMPALA-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906645#comment-16906645 ] 

ASF subversion and git services commented on IMPALA-8839:
---------------------------------------------------------

Commit dfae1aea540edf75061fb5ba8b44a49b6cb93590 in impala's branch refs/heads/master from Yongzhi Chen
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=dfae1ae ]

IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties

Hive depends on the COLUMN_STATS_ACCURATE property to tell whether
the stored statistics are accurate. After Impala inserts data, it
does not bring the statistics values (for example numRows) up to date.
Impala should unset COLUMN_STATS_ACCURATE to tell Hive that the
stored stats are no longer accurate.
The patch implements the following:
After Impala inserts data:
Remove COLUMN_STATS_ACCURATE from table properties if it exists
Remove COLUMN_STATS_ACCURATE from partition params if it exists
Add helper methods to handle alter table/partition for acid
tables.

Implements the stats changes above for both acid/non-acid tables.
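
The removal steps described in the commit message can be sketched roughly as
follows. This is a minimal illustration in Python, not the code from the patch:
the function names (unset_stats_accurate, mark_stats_stale) are hypothetical,
and table properties and partition params are modeled as plain string
dictionaries rather than metastore objects.

```python
from typing import Dict, List, Tuple

# Property name taken from the commit message.
STATS_KEY = "COLUMN_STATS_ACCURATE"


def unset_stats_accurate(params: Dict[str, str]) -> bool:
    """Remove COLUMN_STATS_ACCURATE from a parameter map if it exists.

    Returns True when the map was modified, so the caller knows whether
    an alter table/partition call would be needed at all.
    """
    return params.pop(STATS_KEY, None) is not None


def mark_stats_stale(
    table_params: Dict[str, str],
    partition_params: List[Dict[str, str]],
) -> Tuple[bool, List[int]]:
    """Hypothetical post-insert hook.

    Unsets the property on the table and on each partition that received
    data. Returns whether the table changed and the indexes of the
    partitions that changed.
    """
    table_changed = unset_stats_accurate(table_params)
    changed = [
        i for i, params in enumerate(partition_params)
        if unset_stats_accurate(params)
    ]
    return table_changed, changed
```

Returning what changed lets a caller skip the metastore update when the
property was never set, which matches the "if it exists" wording above.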

Tests:
Manual tests.
Run core tests.
Add ee tests to test interop with Hive for acid/external tables.

Change-Id: I13f4a77022a7112e10a07314359f927eae083deb
Reviewed-on: http://gerrit.cloudera.org:8080/14037
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Impala writing data to tables should not lead to incorrect results in Hive
> --------------------------------------------------------------------------
>
>                 Key: IMPALA-8839
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8839
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 3.3.0
>            Reporter: Yongzhi Chen
>            Assignee: Yongzhi Chen
>            Priority: Critical
>
> This includes both partitioned and unpartitioned tables:
> For an unpartitioned table, the proposed solution is that when Impala writes data to the table, it should update the 'COLUMN_STATS_ACCURATE' JSON structure in the table properties by removing its 'COLUMN_STATS' nested field (this ends up in the TABLE_PARAMS table in HMS).
> For a partitioned table, the proposed solution is that when Impala writes data to the table, it should update the 'COLUMN_STATS_ACCURATE' JSON structure by removing its 'COLUMN_STATS' nested field in the properties of the partitions where data was inserted (the PARTITION_PARAMS table in HMS).
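
The nested-field removal proposed in the issue description could look roughly
like the sketch below. It is an illustration only: the function name
drop_column_stats is hypothetical, and the sample JSON layout (a "BASIC_STATS"
entry next to a "COLUMN_STATS" object) is an assumption about how the property
value is shaped. Note that the commit quoted above removes the whole
COLUMN_STATS_ACCURATE property rather than editing its JSON.

```python
import json
from typing import Dict

STATS_KEY = "COLUMN_STATS_ACCURATE"


def drop_column_stats(params: Dict[str, str]) -> bool:
    """Remove the nested COLUMN_STATS field from the JSON property value.

    Returns True if the parameter map was changed. Values that are
    missing, not valid JSON, or lack the nested field are left untouched.
    """
    raw = params.get(STATS_KEY)
    if raw is None:
        return False
    try:
        state = json.loads(raw)
    except ValueError:
        return False
    if not isinstance(state, dict) or "COLUMN_STATS" not in state:
        return False
    del state["COLUMN_STATS"]
    # Write the remaining fields back in compact form.
    params[STATS_KEY] = json.dumps(state, separators=(",", ":"))
    return True


# Assumed sample value, for illustration only.
params = {STATS_KEY: '{"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}'}
drop_column_stats(params)
```

The same function would apply to a table's properties for an unpartitioned
table and to each affected partition's params for a partitioned one.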


