Posted to issues@ozone.apache.org by "Arun Sarin (Jira)" <ji...@apache.org> on 2023/02/06 16:30:00 UTC

[jira] [Created] (HDDS-7909) When DN is offline Read of EC data is failing [Failed to execute command GetBlock on the Pipeline]

Arun Sarin created HDDS-7909:
--------------------------------

             Summary: When DN is offline Read of EC data is failing [Failed to execute command GetBlock on the Pipeline]
                 Key: HDDS-7909
                 URL: https://issues.apache.org/jira/browse/HDDS-7909
             Project: Apache Ozone
          Issue Type: Bug
          Components: SCM
            Reporter: Arun Sarin


When a DN is offline, reading EC data fails.

The read fails with the following error message:
{code:java}
GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|There are insufficient datanodes to read the EC block {code}
Stack Trace:
{code:java}
2023-02-03 14:05:31,610|INFO|MainThread|machine.py:188 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|RUNNING: /opt/cloudera/parcels/CDH/bin/ozone sh key get o3://ozone1/vol-x20w7/enc-buck-3yp31/decom_1675432802 /tmp/Get_file1675433131
2023-02-03 14:05:35,968|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:35 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-xceiverclientmetrics.properties,hadoop-metrics2.properties
2023-02-03 14:05:36,040|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:36 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2023-02-03 14:05:36,041|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:36 INFO impl.MetricsSystemImpl: XceiverClientMetrics metrics system started
2023-02-03 14:05:36,937|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:36 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: 4b386868-0719-4e2f-bd3b-bda45c921f97, Nodes: 0a7dfbbc-9bd4-482a-81ed-9b213ab2bf63(quasar-tgmmij-1.quasar-tgmmij.root.hwx.site/172.27.204.197), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:36.904Z[Etc/UTC]].
2023-02-03 14:05:36,938|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:36 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=4b386868-0719-4e2f-bd3b-bda45c921f97: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:36,980|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:36 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: 34a3c677-ed98-428f-a0d9-a19f73f93116, Nodes: 0a7dfbbc-9bd4-482a-81ed-9b213ab2bf63(quasar-tgmmij-1.quasar-tgmmij.root.hwx.site/172.27.204.197), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:36.970Z[Etc/UTC]].
2023-02-03 14:05:36,981|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:36 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=34a3c677-ed98-428f-a0d9-a19f73f93116: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,014|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: aee4853d-cc99-43c8-a682-2dc4ad322242, Nodes: 0a7dfbbc-9bd4-482a-81ed-9b213ab2bf63(quasar-tgmmij-1.quasar-tgmmij.root.hwx.site/172.27.204.197), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.003Z[Etc/UTC]].
2023-02-03 14:05:37,016|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=aee4853d-cc99-43c8-a682-2dc4ad322242: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,039|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|2023-02-03 14:05:37,034 [main] WARN io.ECBlockInputStreamProxy (ECBlockInputStreamProxy.java:read(180)) - Failing over to reconstruction read due to an error in ECBlockReader. Exception Class: org.apache.hadoop.ozone.client.io.BadDataLocationException , Exception Message: java.io.IOException: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,040|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 WARN io.ECBlockInputStreamProxy: Failing over to reconstruction read due to an error in ECBlockReader. Exception Class: org.apache.hadoop.ozone.client.io.BadDataLocationException , Exception Message: java.io.IOException: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,057|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 WARN erasurecode.ErasureCodeNative: Loading ISA-L failed: Failed to load libisal.so.2 (libisal.so.2: cannot open shared object file: No such file or directory)
2023-02-03 14:05:37,058|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 WARN erasurecode.ErasureCodeNative: ISA-L support is not available in your platform... using builtin-java codec where applicable
2023-02-03 14:05:37,185|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: b3a482a5-33c9-40dd-8614-bfc136ec4479, Nodes: 8fea5559-5799-4c17-8d34-17aa6672b87a(quasar-tgmmij-2.quasar-tgmmij.root.hwx.site/172.27.183.130), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.163Z[Etc/UTC]].
2023-02-03 14:05:37,187|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=b3a482a5-33c9-40dd-8614-bfc136ec4479: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,229|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: d18620b2-70cb-4f07-95b7-45d69980f100, Nodes: 8fea5559-5799-4c17-8d34-17aa6672b87a(quasar-tgmmij-2.quasar-tgmmij.root.hwx.site/172.27.183.130), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.220Z[Etc/UTC]].
2023-02-03 14:05:37,230|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=d18620b2-70cb-4f07-95b7-45d69980f100: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,260|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: 08dd8828-a81e-44ac-8757-aa1b66df2c72, Nodes: 8fea5559-5799-4c17-8d34-17aa6672b87a(quasar-tgmmij-2.quasar-tgmmij.root.hwx.site/172.27.183.130), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.250Z[Etc/UTC]].
2023-02-03 14:05:37,261|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=08dd8828-a81e-44ac-8757-aa1b66df2c72: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,282|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|2023-02-03 14:05:37,279 [main] WARN io.ECBlockReconstructedStripeInputStream (ECBlockReconstructedStripeInputStream.java:loadDataBuffersFromStream(590)) - Failed to read from block conID: 5007 locID: 111677748019205007 bcsId: 0 EC index 5. Excluding the block Exception: java.io.IOException Exception Message: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,284|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 WARN io.ECBlockReconstructedStripeInputStream: Failed to read from block conID: 5007 locID: 111677748019205007 bcsId: 0 EC index 5. Excluding the block Exception: java.io.IOException Exception Message: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,331|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: 59920096-eac8-40bd-86c6-4a2fb44edfc7, Nodes: 4e84413f-bf98-4159-914d-5d4eaae5070d(quasar-tgmmij-3.quasar-tgmmij.root.hwx.site/172.27.202.202), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.290Z[Etc/UTC]].
2023-02-03 14:05:37,333|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=59920096-eac8-40bd-86c6-4a2fb44edfc7: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,362|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: d8b18f5b-1fbe-4493-b370-08e22eb0e64d, Nodes: 4e84413f-bf98-4159-914d-5d4eaae5070d(quasar-tgmmij-3.quasar-tgmmij.root.hwx.site/172.27.202.202), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.351Z[Etc/UTC]].
2023-02-03 14:05:37,364|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=d8b18f5b-1fbe-4493-b370-08e22eb0e64d: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,390|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: 78e3a9ff-df9d-4cbf-a584-b73254e06ce8, Nodes: 4e84413f-bf98-4159-914d-5d4eaae5070d(quasar-tgmmij-3.quasar-tgmmij.root.hwx.site/172.27.202.202), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.380Z[Etc/UTC]].
2023-02-03 14:05:37,392|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=78e3a9ff-df9d-4cbf-a584-b73254e06ce8: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,411|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|2023-02-03 14:05:37,409 [main] WARN io.ECBlockReconstructedStripeInputStream (ECBlockReconstructedStripeInputStream.java:loadDataBuffersFromStream(590)) - Failed to read from block conID: 5007 locID: 111677748019205007 bcsId: 0 EC index 4. Excluding the block Exception: java.io.IOException Exception Message: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,413|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 WARN io.ECBlockReconstructedStripeInputStream: Failed to read from block conID: 5007 locID: 111677748019205007 bcsId: 0 EC index 4. Excluding the block Exception: java.io.IOException Exception Message: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,442|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|There are insufficient datanodes to read the EC block
{code}
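When triaging a client log like the one above, it can help to tally the failed GetBlock calls per datanode and to collect the EC indexes that the reconstruction reader excluded. The following is a minimal Python sketch that relies only on the message formats visible in this log; `summarize_ec_read_failures` is a hypothetical helper, not part of Ozone.

```python
import re
from collections import Counter

# Matches the XceiverClientGrpc "Failed to execute command GetBlock" errors and
# captures the datanode hostname from "Nodes: <uuid>(<hostname>/<ip>)".
GETBLOCK_FAILURE = re.compile(
    r"Failed to execute command GetBlock on the pipeline Pipeline\[.*?"
    r"Nodes: [0-9a-f-]+\(([^/)]+)/"
)

# Matches the reconstruction reader's "EC index N. Excluding the block" warnings.
# \s+ tolerates a line break between the two halves of the message.
EXCLUDED_INDEX = re.compile(r"EC index (\d+)\.\s+Excluding the block")


def summarize_ec_read_failures(log_text):
    """Return (failed GetBlock calls per host, sorted excluded EC indexes).

    Each exclusion warning is logged twice (two logger formats), so the
    indexes are de-duplicated through a set before sorting.
    """
    per_host = Counter(m.group(1) for m in GETBLOCK_FAILURE.finditer(log_text))
    excluded = sorted({int(m.group(1)) for m in EXCLUDED_INDEX.finditer(log_text)})
    return per_host, excluded
```

Applied to the full log in this report, the sketch would count three failed GetBlock calls each against quasar-tgmmij-1, quasar-tgmmij-2 and quasar-tgmmij-3, and return the excluded EC indexes `[4, 5]`.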
Further debugging (RCA) found that a sufficient number of DNs were available at the time of the key get operation. Details:
 
EC requires 7 DNs, and 7 were available.
Ratis requires 3 DNs, and that requirement was met.
EC datanodes (7):
hostname-1.hostname.root.hwx.site, hostname-7.hostname.root.hwx.site, hostname-2.hostname.root.hwx.site, hostname-6.hostname.root.hwx.site, hostname-3.hostname.root.hwx.site, hostname-5.hostname.root.hwx.site, hostname-8.hostname.root.hwx.site
Ratis datanodes available at this point (5):
hostname-2.hostname.root.hwx.site, hostname-3.hostname.root.hwx.site, hostname-1.hostname.root.hwx.site, hostname-7.hostname.root.hwx.site, hostname-6.hostname.root.hwx.site
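The counts above describe datanodes in service, whereas a reconstruction read depends on how many replicas of this particular block group the client can reach. With Reed-Solomon EC using d data and p parity replicas, any d reachable replicas are enough to decode the stripe. The Python sketch below encodes that check; `can_reconstruct` is a hypothetical helper, not an Ozone API, the 1-based replica indexing is an assumption, and the EC scheme of the affected bucket is not stated in this report, so the example parameters are illustrative only.

```python
from typing import Iterable


def can_reconstruct(data: int, parity: int, unavailable_indexes: Iterable[int]) -> bool:
    """Return True if an RS(data, parity) block group can still be read.

    Assumptions (not taken from Ozone's code): replica indexes are 1-based
    (1..data+parity), and any `data` reachable replicas suffice to decode,
    which is the defining property of Reed-Solomon coding.
    """
    if data <= 0 or parity < 0:
        raise ValueError("data must be positive and parity non-negative")
    total = data + parity
    # Ignore out-of-range indexes and count each unreachable replica once.
    lost = {i for i in unavailable_indexes if 1 <= i <= total}
    return total - len(lost) >= data
```

For instance, under a hypothetical rs-3-2 scheme, `can_reconstruct(3, 2, {4, 5})` returns `True`, while `can_reconstruct(3, 2, {1, 4, 5})` returns `False`. Checking the replica indexes the client actually failed to reach against the bucket's real EC scheme could help show whether the "insufficient datanodes" error is expected or points to a client-side problem.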

The log files are attached.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
