You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Tony Hill (JIRA)" <ji...@apache.org> on 2019/03/21 11:02:00 UTC
[jira] [Created] (IMPALA-8331) Provide method for listing tables
without statistics
Tony Hill created IMPALA-8331:
---------------------------------
Summary: Provide method for listing tables without statistics
Key: IMPALA-8331
URL: https://issues.apache.org/jira/browse/IMPALA-8331
Project: IMPALA
Issue Type: New Feature
Components: Catalog
Reporter: Tony Hill
Lack of statistics can cause performance issues with Impala making "wrong" choices during the planning stage. It would be useful to provide a mechanism to list tables with without statistics, or with corrupt statistics.
It can be done indirectly with queries on the metastore DB:
Tables without columns stats:
SELECT DBS.NAME, TBL_NAME FROM DBS INNER JOIN TBLS ON TBLS.DB_ID=DBS.DB_ID WHERE NOT EXISTS (SELECT * FROM TABLE_PARAMS WHERE TABLE_PARAMS.TBL_ID=TBLS.TBL_ID AND PARAM_KEY='COLUMN_STATS_ACCURATE' AND PARAM_VALUE='true');
Tables without row stats
SELECT DBS.NAME, TBL_NAME FROM DBS INNER JOIN TBLS ON TBLS.DB_ID=DBS.DB_ID WHERE NOT EXISTS (SELECT * FROM TABLE_PARAMS WHERE TABLE_PARAMS.TBL_ID=TBLS.TBL_ID AND PARAM_KEY='numRows');
Impala shows messages about corrupt statistics where size >0 but row=0, i.e.
SELECT DBS.NAME, TBL_NAME FROM DBS INNER JOIN TBLS ON TBLS.DB_ID=DBS.DB_ID WHERE EXISTS (SELECT * FROM TABLE_PARAMS WHERE TABLE_PARAMS.TBL_ID=TBLS.TBL_ID AND PARAM_KEY='numRows' AND PARAM_VALUE=0 ) AND EXISTS (SELECT * FROM TABLE_PARAMS WHERE TABLE_PARAMS.TBL_ID=TBLS.TBL_ID AND PARAM_KEY='totalSize' AND PARAM_VALUE>0 );
(There are almost certainly more efficient queries !)
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org