Posted to issues@impala.apache.org by "Dimitris Tsirogiannis (JIRA)" <ji...@apache.org> on 2017/09/12 19:18:00 UTC

[jira] [Resolved] (IMPALA-4799) Long running metadata load for large tables blocks queries/loading all other tables for long time

     [ https://issues.apache.org/jira/browse/IMPALA-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dimitris Tsirogiannis resolved IMPALA-4799.
-------------------------------------------
    Resolution: Duplicate

> Long running metadata load for large tables blocks queries/loading all other tables for long time
> -------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-4799
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4799
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 2.7.0
>         Environment: Version: 2.7.0-cdh5.9.0
> OS: Centos.6.6
>            Reporter: Antoni
>            Assignee: Dimitris Tsirogiannis
>         Attachments: catalogd-drop-test-table.log, catalogd-invalidate-metadata-example-logs.log, impalad-drop-test-table-logs.log, test-catalogd.threads.1, test-catalogd.threads.2, test-catalogd.threads.3, test-catalogd.threads.4, test-catalogd.threads.5, test-catalogd.threads.6, test-catalogd.threads.7, test-catalogd.threads.baseline, test-catalogd.threads.post1, test-catalogd.threads.post2
>
>
> If you have a big table such as history.big_table_with_many_partitions and run a refresh on it, the refresh may take a long time.
> But this seems to block loading metadata for all other tables. That shouldn't be the case.
> (I am guessing it has something to do with ensuring the correct version of the catalog topic update?)
> For example, the following sequence can be used to reproduce the issue:
> impala-shell -k -i hdp-dn01 -q "invalidate metadata history.empty_table;" # invalidate metadata to force a PrioritizeLoad of the table the next time it is queried
> impala-shell -k -i hdp-dn02 -q "refresh history.big_table_with_many_partitions;" # from different node
> impala-shell -k -i hdp-dn01 -q "select count(1) from history.empty_table;" # started a second or so after the previous command; it always finishes a second or two after refresh history.big... finishes, which may be minutes later, even though the table is empty
> See the attached logs (/tmp/catalogd-invalidate-metadata-example-logs.log): in that case, empty_table is "history.sa_issue_rating" and big_table_with_many_partitions is "history.bundle".
> You may notice in the logs multiple Publish updates after the refresh (ResetMetadata for history.bundle, i.e. big_table_with_many_partitions) finishes.
> Exactly the same thing happens with a drop table statement:
> impala-shell -k -i hdp-dn02 -q "refresh history.big_table_with_many_partitions;" # from different node
> impala-shell -k -i hdp-dn01 -q "drop table default.test_table;" # started a second or so after the previous command; it always finishes a second or two after refresh history.big... finishes, which may be minutes later, even though the table is empty
> Please see impalad-drop-test-table-logs.log and /tmp/catalogd-drop-test-table.log for logs about this.
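
The reproduction sequence in the description could be scripted roughly as follows. This is only a sketch: the host and table names come from the report, while the function names, the IMPALA_SHELL override, and the one-second delay are assumptions made for illustration. It runs the refresh in the background and prints how long each statement took, so the delay on the small table is visible.

```shell
#!/bin/sh
# Sketch of the reproduction sequence from the description.
# Hosts, tables and impala-shell flags are copied from the report;
# run_timed, repro_blocked_load and IMPALA_SHELL are hypothetical names.

IMPALA_SHELL="${IMPALA_SHELL:-impala-shell}"

# Run one statement against a given impalad and print the elapsed seconds.
run_timed() {
  rt_host="$1"
  rt_query="$2"
  rt_start=$(date +%s)
  "$IMPALA_SHELL" -k -i "$rt_host" -q "$rt_query"
  rt_end=$(date +%s)
  echo "elapsed_seconds=$((rt_end - rt_start)) host=$rt_host query=$rt_query"
}

repro_blocked_load() {
  # Invalidate the small table so its metadata is loaded on next access.
  run_timed hdp-dn01 "invalidate metadata history.empty_table;"
  # Start the long refresh from a different node, in the background.
  run_timed hdp-dn02 "refresh history.big_table_with_many_partitions;" &
  refresh_pid=$!
  # Assumed delay: the report starts the next query "a second or so" later.
  sleep 1
  # Per the report, this returns only after the refresh has finished.
  run_timed hdp-dn01 "select count(1) from history.empty_table;"
  wait "$refresh_pid"
}
```

Calling repro_blocked_load on a cluster that shows the problem should print an elapsed time for the select that is close to that of the refresh, even though the queried table is empty.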



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)