Posted to issues@ozone.apache.org by "Ethan Rose (Jira)" <ji...@apache.org> on 2021/10/20 20:37:12 UTC

[jira] [Updated] (HDDS-3669) SCM Infinite loop in BlockManagerImpl.allocateBlock

     [ https://issues.apache.org/jira/browse/HDDS-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Rose updated HDDS-3669:
-----------------------------
    Target Version/s: 1.3.0  (was: 1.2.0)

I am managing the 1.2.0 release and we currently have more than 600 issues targeted for 1.2.0. I am moving the target field to 1.3.0.

If you are actively working on this Jira and believe it should be targeted for the 1.2.0 release, please reach out to me via Apache email or Slack.

> SCM Infinite loop in BlockManagerImpl.allocateBlock
> ---------------------------------------------------
>
>                 Key: HDDS-3669
>                 URL: https://issues.apache.org/jira/browse/HDDS-3669
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM
>    Affects Versions: 1.0.0
>            Reporter: Baolong Mao
>            Assignee: Baolong Mao
>            Priority: Major
>              Labels: Triaged
>
> The following steps reproduce this issue:
> - Start a new Ozone cluster with only a factor-three pipeline.
> - Put a large file (1 GB) into the cluster and, while the put is in progress, kill the leader datanode of this pipeline.
> The put command hangs, and the following warning fills the SCM log file:
> 2020-05-27 17:32:46,988 [IPC Server handler 23 on default port 9863] WARN org.apache.hadoop.hdds.scm.container.SCMContainerManager: Container allocation failed for pipeline=Pipeline[ Id: bf7dd356-2d97-4b2a-8a81-e2ddd25bc5a1, Nodes: e859cad9-c7f6-451a-a039-af06103aa978{ip: 127.0.0.1, host: localhost, networkLocation: /default-rack, certSerialId: null}1cd2bf20-a791-42a0-b4cd-b26d995cb8eb{ip: 127.0.0.1, host: localhost, networkLocation: /default-rack, certSerialId: null}0827f3bb-0d94-435a-a157-4db2c84cdedf{ip: 127.0.0.1, host: localhost, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:3, State:OPEN, leaderId:0827f3bb-0d94-435a-a157-4db2c84cdedf, CreationTimestamp2020-05-27T08:05:36.590Z] requiredSize=268435456 {}
> org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: PipelineID=bf7dd356-2d97-4b2a-8a81-e2ddd25bc5a1 not found
>         at org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getContainers(PipelineStateMap.java:301)
>         at org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.getContainers(PipelineStateManager.java:95)
>         at org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.getContainersInPipeline(SCMPipelineManager.java:360)
>         at org.apache.hadoop.hdds.scm.container.SCMContainerManager.getContainersForOwner(SCMContainerManager.java:507)
>         at org.apache.hadoop.hdds.scm.container.SCMContainerManager.getMatchingContainer(SCMContainerManager.java:428)
>         at org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:230)
>         at org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:190)
>         at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:167)
>         at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.processMessage(ScmBlockLocationProtocolServerSideTranslatorPB.java:119)
>         at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:74)
>         at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:100)
>         at org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13303)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
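
The trace above is consistent with an allocation loop that keeps retrying against a pipeline SCM no longer tracks. The following is a minimal sketch of that pattern and one way to bound it. Every name here (BoundedAllocationSketch, livePipelines, findContainer, allocate, maxAttempts) is hypothetical; this is not Ozone's code, nor the fix adopted for HDDS-3669.

```java
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class BoundedAllocationSketch {
    // Hypothetical stand-in for the set of pipelines SCM still tracks.
    static final Set<String> livePipelines = ConcurrentHashMap.newKeySet();

    // Hypothetical lookup: returns empty when the pipeline is gone,
    // mirroring a "log a warning and carry on" failure path.
    static Optional<String> findContainer(String pipelineId) {
        if (!livePipelines.contains(pipelineId)) {
            return Optional.empty();
        }
        return Optional.of("container-on-" + pipelineId);
    }

    // Tries at most maxAttempts times, then gives up instead of spinning.
    // An unbounded "while (true)" here would never exit for a removed pipeline.
    static Optional<String> allocate(String pipelineId, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<String> container = findContainer(pipelineId);
            if (container.isPresent()) {
                return container;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        livePipelines.add("p1");
        System.out.println(allocate("p1", 3).isPresent());      // true
        System.out.println(allocate("missing", 3).isPresent()); // false
    }
}
```

With a bound like this, a request against a removed pipeline returns an empty result that the caller can turn into an error for the client, rather than hanging the put command.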



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
