Posted to issues@impala.apache.org by "Dimitris Tsirogiannis (JIRA)" <ji...@apache.org> on 2017/09/12 19:18:00 UTC
[jira] [Resolved] (IMPALA-4799) Long running metadata load for
large tables blocks queries/loading all other tables for long time
[ https://issues.apache.org/jira/browse/IMPALA-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dimitris Tsirogiannis resolved IMPALA-4799.
-------------------------------------------
Resolution: Duplicate
> Long running metadata load for large tables blocks queries/loading all other tables for long time
> -------------------------------------------------------------------------------------------------
>
> Key: IMPALA-4799
> URL: https://issues.apache.org/jira/browse/IMPALA-4799
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Affects Versions: Impala 2.7.0
> Environment: Version: 2.7.0-cdh5.9.0
> OS: Centos.6.6
> Reporter: Antoni
> Assignee: Dimitris Tsirogiannis
> Attachments: catalogd-drop-test-table.log, catalogd-invalidate-metadata-example-logs.log, impalad-drop-test-table-logs.log, test-catalogd.threads.1, test-catalogd.threads.2, test-catalogd.threads.3, test-catalogd.threads.4, test-catalogd.threads.5, test-catalogd.threads.6, test-catalogd.threads.7, test-catalogd.threads.baseline, test-catalogd.threads.post1, test-catalogd.threads.post2
>
>
> If you have a big table such as history.big_table_with_many_partitions, a refresh of it may take a long time.
> But this seems to block metadata loading for all other tables, which shouldn't be the case.
> (I am guessing it has something to do with ensuring the correct version of the catalog topic update?)
> For example, the following sequence can be used to reproduce the issue:
> impala-shell -k -i hdp-dn01 -q "invalidate metadata history.empty_table;" # invalidate metadata to force a PrioritizeLoad of the table the next time it is queried
> impala-shell -k -i hdp-dn02 -q "refresh history.big_table_with_many_partitions;" # from different node
> impala-shell -k -i hdp-dn01 -q "select count(1) from history.empty_table;" # started a second or so after the refresh, this always finishes a second or two after the refresh of history.big... finishes, which may be minutes later, even if the table is empty
> See the attached logs (/tmp/catalogd-invalidate-metadata-example-logs.log). In that case, empty_table is "history.sa_issue_rating" and big_table_with_many_partitions is "history.bundle".
> You may notice in the logs multiple Publish updates after the refresh (ResetMetadata for history.bundle, i.e. big_table_with_many_partitions) finishes.
> Exactly the same thing happens with a drop table statement:
> impala-shell -k -i hdp-dn02 -q "refresh history.big_table_with_many_partitions;" # from different node
> impala-shell -k -i hdp-dn01 -q "drop table default.test_table;" # started a second or so after the refresh, this always finishes a second or two after the refresh of history.big... finishes, which may be minutes later, even if the table is empty
> Please see impalad-drop-test-table-logs.log and /tmp/catalogd-drop-test-table.log for logs about this.
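
The reproduction sequence quoted above could be scripted roughly as follows. This is only a sketch and not part of the original report: the function names timed_stmt and repro and the IMPALA_SHELL override variable are hypothetical helpers introduced here, and the hosts and table names are the placeholders used by the reporter. It starts the slow statement in the background, issues the statement that should be fast about a second later, and prints how long each one took, so the blocking shows up as a "fast" time close to the "slow" time.

```shell
#!/usr/bin/env bash
# Sketch only. IMPALA_SHELL, timed_stmt and repro are hypothetical names,
# not part of Impala; only the -k, -i and -q flags come from the report.
IMPALA_SHELL="${IMPALA_SHELL:-impala-shell}"   # overridable for a dry run

# Run one statement against one coordinator and print the elapsed seconds.
timed_stmt() {
  local host="$1" sql="$2" start end
  start=$(date +%s)
  "$IMPALA_SHELL" -k -i "$host" -q "$sql" >/dev/null 2>&1
  end=$(date +%s)
  echo $(( end - start ))
}

# Start the slow statement in the background, wait about a second, then time
# the statement that should be fast. Prints "slow=<seconds> fast=<seconds>".
repro() {
  local slow_host="$1" slow_sql="$2" fast_host="$3" fast_sql="$4"
  local tmp slow fast slow_pid
  tmp=$(mktemp)
  timed_stmt "$slow_host" "$slow_sql" > "$tmp" &
  slow_pid=$!
  sleep 1
  fast=$(timed_stmt "$fast_host" "$fast_sql")
  wait "$slow_pid"
  slow=$(cat "$tmp")
  rm -f "$tmp"
  echo "slow=${slow} fast=${fast}"
}
```

With the placeholders from the report, the first scenario would be run after the invalidate metadata step as: repro hdp-dn02 "refresh history.big_table_with_many_partitions;" hdp-dn01 "select count(1) from history.empty_table;"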
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)