Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2021/03/16 01:18:00 UTC

[jira] [Commented] (IMPALA-10420) Impala slow performance and crash

    [ https://issues.apache.org/jira/browse/IMPALA-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302139#comment-17302139 ] 

Quanlong Huang commented on IMPALA-10420:
-----------------------------------------

[~manirajv06@gmail.com] Thanks for reporting this issue! I have some questions:

Which catalog mode are you using? If you are using the legacy catalog mode, could you try the local catalog mode? (A sketch of the relevant startup flags follows the links below.)
 Apache Ref: [https://impala.apache.org/docs/build/html/topics/impala_metadata.html]
 CDH Ref: [https://docs.cloudera.com/best-practices/latest/impala-performance/topics/bp-impala-enable-on-demand-metadata-fetch.html]
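For reference, a rough sketch of the startup flags that switch a cluster to local catalog (on-demand metadata) mode, per the docs above. This is only a sketch; how you set these depends on your deployment (flag files vs. Cloudera Manager configuration), and both daemons need a restart for the change to take effect:

  # catalogd startup flags
  --catalog_topic_mode=minimal
  # impalad startup flags
  --use_local_catalog=true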

What's the scale of your warehouse, e.g. the number of tables, and the number of partitions and files of the largest table? This helps us understand the size of the metadata footprint. A few statements for gathering these numbers are sketched below.
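For example, something like the following can give those numbers (my_db and my_largest_table are just placeholder names, not anything from your environment):

  SHOW DATABASES;
  SHOW TABLES IN my_db;
  -- per-partition #Rows, #Files and Size for the suspected largest table
  SHOW TABLE STATS my_db.my_largest_table;
  SHOW PARTITIONS my_db.my_largest_table;

The SHOW TABLE STATS / SHOW PARTITIONS output on the few largest tables is usually enough to estimate the metadata footprint.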

> Impala slow performance and crash
> ---------------------------------
>
>                 Key: IMPALA-10420
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10420
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Manikandan R
>            Priority: Major
>
> At times, the Impala daemon performs very badly for a while (slightly more than 1 hour) and then crashes with OOM errors.
> Stack trace:
> I1229 18:58:30.675091 108919 Frontend.java:874] Waiting for local catalog to be initialized, attempt: 41
> I1229 18:58:32.675457 108919 Frontend.java:874] Waiting for local catalog to be initialized, attempt: 42
> I1229 19:24:11.218081 108919 Frontend.java:874] Waiting for local catalog to be initialized, attempt: 99
> I1229 19:24:11.632340 109479 jni-util.cc:211] java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at java.util.Arrays.copyOfRange(Arrays.java:3664)
>         at java.lang.String.<init>(String.java:207)
>         at java.lang.StringBuilder.toString(StringBuilder.java:407)
>         at org.apache.hadoop.hive.common.FileUtils.escapePathName(FileUtils.java:287)
>         at org.apache.hadoop.hive.common.FileUtils.makePartName(FileUtils.java:153)
> During this roughly 1-hour period, we see a lot of retry attempts in the logs related to catalog initialization, heap space issues, catalog updates, etc.
> I1229 18:27:21.157497 80168 status.cc:125] OutOfMemoryError: Java heap space
>   @      0x95b479 impala::Status::Status()
>   @      0xca3f22 impala::JniUtil::GetJniExceptionMsg()
>   @      0xba3be8 impala::Frontend::UpdateCatalogCache()
>   @      0xbc1589 impala::ImpalaServer::CatalogUpdateCallback()
>   @      0xc62c73 impala::StatestoreSubscriber::UpdateState()
>   @      0xc68963 impala::StatestoreSubscriberThriftIf::UpdateState()
>   @     0x10f0fc8 impala::StatestoreSubscriberProcessor::process_UpdateState()
>   @     0x10f0204 impala::StatestoreSubscriberProcessor::dispatchCall()
>   @      0x92bb4c apache::thrift::TDispatchProcessor::process()
>   @      0xafc6df apache::thrift::server::TAcceptQueueServer::Task::run()
>   @      0xaf6fd5 impala::ThriftThread::RunRunnable()
>   @      0xaf7db2 boost::detail::function::void_function_obj_invoker0<>::invoke()
>   @      0xd16c83 impala::Thread::SuperviseThread()
>   @      0xd173c4 boost::detail::thread_data<>::run()
>   @     0x128fada (unknown)
>   @   0x7f4328a89ea5 start_thread
>   @   0x7f43287b28dd __clone
> E1229 18:27:21.157521 80168 impala-server.cc:1454] There was an error processing the impalad catalog update. Requesting a full topic update to recover: OutOfMemoryError: Java heap space
> I1229 17:06:27.922144 93138 status.cc:125] OutOfMemoryError: GC overhead limit exceeded
>   @      0x95b479 impala::Status::Status()
>   @      0xca3f22 impala::JniUtil::GetJniExceptionMsg()
>   @      0xba3be8 impala::Frontend::UpdateCatalogCache()
>   @      0xbc1589 impala::ImpalaServer::CatalogUpdateCallback()
>   @      0xc62c73 impala::StatestoreSubscriber::UpdateState()
>   @      0xc68963 impala::StatestoreSubscriberThriftIf::UpdateState()
>   @     0x10f0fc8 impala::StatestoreSubscriberProcessor::process_UpdateState()
>   @     0x10f0204 impala::StatestoreSubscriberProcessor::dispatchCall()
>   @      0x92bb4c apache::thrift::TDispatchProcessor::process()
>   @      0xafc6df apache::thrift::server::TAcceptQueueServer::Task::run()
>   @      0xaf6fd5 impala::ThriftThread::RunRunnable()
>   @      0xaf7db2 boost::detail::function::void_function_obj_invoker0<>::invoke()
>   @      0xd16c83 impala::Thread::SuperviseThread()
>   @      0xd173c4 boost::detail::thread_data<>::run()
>   @     0x128fada (unknown)
>   @   0x7f4328a89ea5 start_thread
>   @   0x7f43287b28dd __clone
> E1229 17:06:27.922240 93138 impala-server.cc:1454] There was an error processing the impalad catalog update. Requesting a full topic update to recover: OutOfMemoryError: GC overhead limit exceeded
> When we restart it, it is able to come up again, so we have gotten used to restarting the catalog first and then all Impala daemons one by one to bring the Impala cluster back to a stable state. Since this particular Impala daemon has gone bad, the whole cluster's service degrades (evident from query completion times), because some fragments of queries are still being processed through backend connections by this daemon. The reason for restarting the catalog first and the Impala daemons later is that we suspect that, for some reason, loading catalog metadata into the Impala daemon’s own cache is creating memory pressure on that daemon.
> On the second occurrence, we tried running the “invalidate metadata” command on some other Impala daemon and restarted the bad Impala daemon. This approach also helped us.
> Net-net, our observation is that the suspicious place is the loading of catalog metadata into the Impala daemon’s own cache. During this roughly 1-hour period, we don’t see any abnormality in the Cloudera Manager dashboard metrics, especially mem_rss, tcmalloc metrics, etc. Since there is no sign of health issues, other Impala daemons keep forwarding fragments to this problematic daemon, and the Impala load balancer also keeps forwarding client requests to it, which degrades the whole service.
> Also, we had come across this:
> https://issues.apache.org/jira/browse/IMPALA-5459



