Posted to issues@impala.apache.org by "Antoni Ivanov (JIRA)" <ji...@apache.org> on 2018/01/04 16:53:00 UTC
[jira] [Created] (IMPALA-6367) Compute stats do not update statistics for big tables
Antoni Ivanov created IMPALA-6367:
-------------------------------------
Summary: Compute stats do not update statistics for big tables
Key: IMPALA-6367
URL: https://issues.apache.org/jira/browse/IMPALA-6367
Project: IMPALA
Issue Type: Bug
Components: Backend, Catalog
Affects Versions: Impala 2.8.0
Environment: Impala - v2.8.0-cdh5.11.1
We are using the embedded Hive Metastore database provided by Cloudera.
It is PostgreSQL 8.4.20.
OS: CentOS
Reporter: Antoni Ivanov
The problem occurs on a table with at least 10000 partitions and 100 columns.
The table is partitioned by day (bigint) and by a string column (the cardinality of the string partition column is no greater than 100).
Executing compute incremental stats without dynamic partitioning takes about 1 hour.
So we use partitioning:
compute incremental stats table stats partition (some-condition)
As the condition I tried (day = X), (day = X, string_part = Y), and (day < X and day > X - 3 days).
The statement finishes successfully, but when I run show table stats, every partition in the range shows the following:
day         string_part  #Rows  Incremental stats
1409529600  foo1         0      false
1409529600  foo2         0      false
#Rows is 0 (although the partition is not empty) and the "Incremental stats" column is set to false.
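For reference, the sequence described above can be sketched roughly as follows. This is only an illustration: the table name my_db.big_table and the literal partition values are placeholders I am assuming here, not the real names or dates from our cluster.

```sql
-- Hypothetical table name and partition values, for illustration only.

-- Variant 1: a single day.
COMPUTE INCREMENTAL STATS my_db.big_table PARTITION (day = 1409529600);

-- Variant 2: a single (day, string_part) partition.
COMPUTE INCREMENTAL STATS my_db.big_table
  PARTITION (day = 1409529600, string_part = 'foo1');

-- Variant 3: a range of days (day < X and day > X - 3 days).
COMPUTE INCREMENTAL STATS my_db.big_table
  PARTITION (day < 1409616000 AND day > 1409356800);

-- Check the result: #Rows and "Incremental stats" per partition.
SHOW TABLE STATS my_db.big_table;
```

After any of the three variants, the check at the end is where the affected partitions show #Rows as 0 and "Incremental stats" as false.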
Another case:
If I execute compute incremental stats table stats partition and then show table stats, I get:
day         string_part  #Rows  Incremental stats
1409529600  foo1         13     false
1409529600  foo2         13     false
#Rows is updated, but "Incremental stats" remains false.
This usually happens for smaller tables.
Note that the same happens if I do not use a partition clause.
Note also that I ran compute stats (without incremental) only for the big table (on our test server), and it had the same effect.
Note that on production it happens intermittently (not always) for small tables (#Rows is 0 after compute stats),
but for the biggest tables it always happens.
In Impala there are 2500 tables with almost 900,000 partitions (across all tables) and an average of 20 columns per table (or 90,000 columns across all tables). The biggest table has about 35000 partitions.
We are using the PostgreSQL database provided by Cloudera as the Hive Metastore backend.
I am able to reproduce the issue in our testing setup. It has fewer than 100 tables and only one of them is big: 35000 partitions (which I copied from prod).