Posted to issues@ozone.apache.org by "Aravindan Vijayan (Jira)" <ji...@apache.org> on 2021/10/18 19:02:00 UTC

[jira] [Updated] (HDDS-5861) Recon container report processing can slow down when there are a lot of new containers to consume

     [ https://issues.apache.org/jira/browse/HDDS-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aravindan Vijayan updated HDDS-5861:
------------------------------------
    Description: 
Recon checks with SCM and adds a container whenever it sees it for the first time. When Recon has been down for a long time and there are a lot of new containers to consume, this report processing can hang on the RPC call to SCM, or, even worse, cause further bottlenecks if SCM itself is down.

{code}
EventQueue-ContainerReportForReconContainerReportHandler
PRIORITY : 5
THREAD ID : 0X00007F2A6DDC3000
NATIVE ID : 0XD324
NATIVE ID (DECIMAL) : 54052
STATE : BLOCKED

stackTrace:
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.hadoop.ipc.Client$Connection.addCall(Client.java:521)
- waiting to lock <0x00007f1a70482730> (a org.apache.hadoop.ipc.Client$Connection)
at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:413)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1623)
at org.apache.hadoop.ipc.Client.call(Client.java:1452)
at org.apache.hadoop.ipc.Client.call(Client.java:1405)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at com.sun.proxy.$Proxy41.submitRequest(Unknown Source)
at jdk.internal.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.5/DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(java.base@11.0.5/Method.java:566)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
- locked <0x00007f1a6ca20ad8> (a org.apache.hadoop.io.retry.RetryInvocationHandler$Call)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
at com.sun.proxy.$Proxy41.submitRequest(Unknown Source)
at org.apache.hadoop.hdds.scm.protocolPB.StorageContainerLocationProtocolClientSideTranslatorPB.submitRpcRequest(StorageContainerLocationProtocolClientSideTranslatorPB.java:154)
at org.apache.hadoop.hdds.scm.protocolPB.StorageContainerLocationProtocolClientSideTranslatorPB.submitRequest(StorageContainerLocationProtocolClientSideTranslatorPB.java:144)
at org.apache.hadoop.hdds.scm.protocolPB.StorageContainerLocationProtocolClientSideTranslatorPB.getContainerWithPipeline(StorageContainerLocationProtocolClientSideTranslatorPB.java:230)
at org.apache.hadoop.ozone.recon.spi.impl.StorageContainerServiceProviderImpl.getContainerWithPipeline(StorageContainerServiceProviderImpl.java:63)
at org.apache.hadoop.ozone.recon.scm.ReconContainerManager.checkAndAddNewContainer(ReconContainerManager.java:122)
at org.apache.hadoop.ozone.recon.scm.ReconContainerReportHandler.onMessage(ReconContainerReportHandler.java:62)
at org.apache.hadoop.ozone.recon.scm.ReconContainerReportHandler.onMessage(ReconContainerReportHandler.java:38)
at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
at org.apache.hadoop.hdds.server.events.SingleThreadExecutor$$Lambda$405/0x00007f19c2857d08.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.5/ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.5/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base@11.0.5/Thread.java:834)
{code}
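To make the failure mode above concrete, here is a minimal, self-contained Java sketch of the pattern the thread dump shows: a single-threaded event executor issuing one blocking RPC per previously-unseen container. All names here (ReportHandlerSketch, ScmClient, onContainerReportNonBlocking) are illustrative stand-ins rather than the actual Ozone classes, and the non-blocking variant at the end is only one possible direction, not necessarily the fix adopted for this ticket.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ReportHandlerSketch {

  /** Stand-in for the SCM RPC client; each call is a blocking round trip. */
  interface ScmClient {
    void getContainerWithPipeline(long containerId) throws Exception;
  }

  private final ScmClient scm;
  // Concurrent set so the variant below can update it off the event thread.
  private final Set<Long> knownContainers = ConcurrentHashMap.newKeySet();
  // One worker thread, mirroring SingleThreadExecutor in the trace above:
  // while one report is being processed, every queued event waits behind it.
  private final ExecutorService eventQueue = Executors.newSingleThreadExecutor();

  ReportHandlerSketch(ScmClient scm) {
    this.scm = scm;
  }

  /** The problematic shape: N unseen containers means N sequential RPCs on
      the event thread; if SCM is down, the client's retry policy multiplies
      the stall and the whole event queue backs up behind this report. */
  void onContainerReport(List<Long> reportedIds) {
    eventQueue.submit(() -> {
      for (long id : reportedIds) {
        if (knownContainers.contains(id)) {
          continue;                         // already known, nothing to do
        }
        try {
          scm.getContainerWithPipeline(id); // blocking RPC per new container
          knownContainers.add(id);
        } catch (Exception e) {
          // unresolved; a later report will retry in this sketch
        }
      }
    });
  }

  /** One possible mitigation (illustrative only): filter on the event
      thread, then resolve the unseen IDs on a separate pool so report
      processing never blocks on SCM. */
  void onContainerReportNonBlocking(List<Long> reportedIds,
                                    ExecutorService resolver) {
    eventQueue.submit(() -> {
      List<Long> unseen = new ArrayList<>();
      for (long id : reportedIds) {
        if (!knownContainers.contains(id)) {
          unseen.add(id);
        }
      }
      // Hand the slow lookups to a separate pool; the event thread
      // returns immediately and can keep draining the queue.
      resolver.submit(() -> {
        for (long id : unseen) {
          try {
            scm.getContainerWithPipeline(id);
            knownContainers.add(id);
          } catch (Exception e) {
            // leave unknown; retried on the next report
          }
        }
      });
    });
  }
}
{code}

A batched lookup (one RPC carrying many container IDs) or a one-time bulk sync of SCM's container list when Recon starts up would cut the round-trip count further; both are sketched assumptions about the design space, not claims about the committed change.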


  was:
Recon checks with SCM and adds a container whenever it sees it for the first time. When Recon has been down for a long time and there are a lot of new containers to consume, this report processing can hang on the RPC call to SCM, or, even worse, cause further bottlenecks if SCM itself is down.

{code}
at org.apache.hadoop.hdds.scm.protocolPB.StorageContainerLocationProtocolClientSideTranslatorPB.submitRpcRequest(StorageContainerLocationProtocolClientSideTranslatorPB.java:154)
at org.apache.hadoop.hdds.scm.protocolPB.StorageContainerLocationProtocolClientSideTranslatorPB.submitRequest(StorageContainerLocationProtocolClientSideTranslatorPB.java:144)
at org.apache.hadoop.hdds.scm.protocolPB.StorageContainerLocationProtocolClientSideTranslatorPB.getContainerWithPipeline(StorageContainerLocationProtocolClientSideTranslatorPB.java:230)
at org.apache.hadoop.ozone.recon.spi.impl.StorageContainerServiceProviderImpl.getContainerWithPipeline(StorageContainerServiceProviderImpl.java:63)
at org.apache.hadoop.ozone.recon.scm.ReconContainerManager.checkAndAddNewContainer(ReconContainerManager.java:122)
at org.apache.hadoop.ozone.recon.scm.ReconContainerReportHandler.onMessage(ReconContainerReportHandler.java:62)
at org.apache.hadoop.ozone.recon.scm.ReconContainerReportHandler.onMessage(ReconContainerReportHandler.java:38)
at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
{code}



> Recon container report processing can slow down when there are a lot of new containers to consume
> -------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-5861
>                 URL: https://issues.apache.org/jira/browse/HDDS-5861
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Recon
>    Affects Versions: 1.2.0
>            Reporter: Aravindan Vijayan
>            Priority: Major
>             Fix For: 1.3.0
>
>
> Recon checks with SCM and adds a container whenever it sees it for the first time. When Recon has been down for a long time and there are a lot of new containers to consume, this report processing can hang on the RPC call to SCM, or, even worse, cause further bottlenecks if SCM itself is down.
> {code}
> EventQueue-ContainerReportForReconContainerReportHandler
> PRIORITY : 5
> THREAD ID : 0X00007F2A6DDC3000
> NATIVE ID : 0XD324
> NATIVE ID (DECIMAL) : 54052
> STATE : BLOCKED
> stackTrace:
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.apache.hadoop.ipc.Client$Connection.addCall(Client.java:521)
> - waiting to lock <0x00007f1a70482730> (a org.apache.hadoop.ipc.Client$Connection)
> at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:413)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1623)
> at org.apache.hadoop.ipc.Client.call(Client.java:1452)
> at org.apache.hadoop.ipc.Client.call(Client.java:1405)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
> at com.sun.proxy.$Proxy41.submitRequest(Unknown Source)
> at jdk.internal.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
> at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.5/DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(java.base@11.0.5/Method.java:566)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431)
> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
> - locked <0x00007f1a6ca20ad8> (a org.apache.hadoop.io.retry.RetryInvocationHandler$Call)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
> at com.sun.proxy.$Proxy41.submitRequest(Unknown Source)
> at org.apache.hadoop.hdds.scm.protocolPB.StorageContainerLocationProtocolClientSideTranslatorPB.submitRpcRequest(StorageContainerLocationProtocolClientSideTranslatorPB.java:154)
> at org.apache.hadoop.hdds.scm.protocolPB.StorageContainerLocationProtocolClientSideTranslatorPB.submitRequest(StorageContainerLocationProtocolClientSideTranslatorPB.java:144)
> at org.apache.hadoop.hdds.scm.protocolPB.StorageContainerLocationProtocolClientSideTranslatorPB.getContainerWithPipeline(StorageContainerLocationProtocolClientSideTranslatorPB.java:230)
> at org.apache.hadoop.ozone.recon.spi.impl.StorageContainerServiceProviderImpl.getContainerWithPipeline(StorageContainerServiceProviderImpl.java:63)
> at org.apache.hadoop.ozone.recon.scm.ReconContainerManager.checkAndAddNewContainer(ReconContainerManager.java:122)
> at org.apache.hadoop.ozone.recon.scm.ReconContainerReportHandler.onMessage(ReconContainerReportHandler.java:62)
> at org.apache.hadoop.ozone.recon.scm.ReconContainerReportHandler.onMessage(ReconContainerReportHandler.java:38)
> at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
> at org.apache.hadoop.hdds.server.events.SingleThreadExecutor$$Lambda$405/0x00007f19c2857d08.run(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.5/ThreadPoolExecutor.java:1128)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.5/ThreadPoolExecutor.java:628)
> at java.lang.Thread.run(java.base@11.0.5/Thread.java:834)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org