Posted to issues@ozone.apache.org by "Stephen O'Donnell (Jira)" <ji...@apache.org> on 2022/02/07 16:28:00 UTC

[jira] [Commented] (HDDS-6258) EC: Read with stopped but not dead nodes gives IllegalStateException rather than InsufficientNodesException

    [ https://issues.apache.org/jira/browse/HDDS-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488234#comment-17488234 ] 

Stephen O'Donnell commented on HDDS-6258:
-----------------------------------------

This issue is caused by a bug in the sufficientLocations check. For a small key (under one chunk), there may be many other large keys in the same container, so locations get reported for all of the data and parity blocks even though only the first data block matters for this key. However, the sufficientLocations method counts all the available data block locations and then also counts the "padding only" blocks as available, double counting any padding-only blocks that were already reported. As a result the check passes when it should fail.
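To make the double count concrete, here is a minimal standalone sketch. Everything below (class name, method signatures, the boolean-array representation of reported locations) is an illustrative assumption, not the actual Ozone code, which lives around ECBlockReconstructedStripeInputStream:

```java
public class SufficientLocationsSketch {

  // Block indexes run 1..data+parity. For a key smaller than one chunk only
  // index 1 carries real data; indexes 2..data are padding-only (known zeros).
  // Because other large keys share the container, locations can be reported
  // for ALL indexes even though only index 1 matters for this key.

  // Buggy shape of the check: every reported location counts as available,
  // and then the padding-only indexes are counted AGAIN, so any padding-only
  // location that was reported is double counted.
  static boolean buggyCheck(boolean[] present, int data, int parity,
      int expectedDataBlocks) {
    int available = 0;
    for (int i = 1; i <= data + parity; i++) {
      if (present[i]) {
        available++;            // includes padding-only indexes if reported
      }
    }
    int paddingOnly = data - expectedDataBlocks;
    return available + paddingOnly >= data;   // padding counted a second time
  }

  // Fixed shape: padding-only indexes contribute exactly once, as free zeros;
  // only locations carrying real information count toward 'available'.
  static boolean fixedCheck(boolean[] present, int data, int parity,
      int expectedDataBlocks) {
    int available = 0;
    for (int i = 1; i <= data + parity; i++) {
      boolean paddingOnlyIndex = i > expectedDataBlocks && i <= data;
      if (present[i] && !paddingOnlyIndex) {
        available++;
      }
    }
    int paddingOnly = data - expectedDataBlocks;
    return available + paddingOnly >= data;
  }

  public static void main(String[] args) {
    // RS 3+2, small key (1 real data block), 3 of 5 nodes stopped so that
    // only the padding-only replicas at indexes 2 and 3 are still reachable.
    boolean[] present = new boolean[6];
    present[2] = true;
    present[3] = true;
    System.out.println("buggy: " + buggyCheck(present, 3, 2, 1));  // true
    System.out.println("fixed: " + fixedCheck(present, 3, 2, 1));  // false
  }
}
```

With the buggy shape the check passes, the read proceeds into reconstruction, and only later trips the Preconditions.assertTrue in selectParityIndexes; with the fixed shape the client could fail fast with the insufficient-locations error instead.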

> EC: Read with stopped but not dead nodes gives IllegalStateException rather than InsufficientNodesException
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-6258
>                 URL: https://issues.apache.org/jira/browse/HDDS-6258
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>
> Attempting to read a key of less than 1 chunk, with 3 of the 5 nodes stopped (both before and after the nodes are marked stale), the read hangs for some time and then fails with:
> {code}
> $ ozone sh key get /vol1/bucket/ec1 /tmp/3_down
> java.lang.IllegalStateException
>     at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:33)
>     at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.selectParityIndexes(ECBlockReconstructedStripeInputStream.java:432)
>     at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.init(ECBlockReconstructedStripeInputStream.java:179)
>     at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readStripe(ECBlockReconstructedStripeInputStream.java:285)
>     at org.apache.hadoop.ozone.client.io.ECBlockReconstructedInputStream.readStripe(ECBlockReconstructedInputStream.java:192)
>     at org.apache.hadoop.ozone.client.io.ECBlockReconstructedInputStream.selectNextBuffer(ECBlockReconstructedInputStream.java:109)
>     at org.apache.hadoop.ozone.client.io.ECBlockReconstructedInputStream.read(ECBlockReconstructedInputStream.java:83)
>     at org.apache.hadoop.ozone.client.io.ECBlockInputStreamProxy.read(ECBlockInputStreamProxy.java:156)
>     at org.apache.hadoop.ozone.client.io.ECBlockInputStreamProxy.read(ECBlockInputStreamProxy.java:171)
>     at org.apache.hadoop.ozone.client.io.ECBlockInputStreamProxy.read(ECBlockInputStreamProxy.java:141)
>     at org.apache.hadoop.hdds.scm.storage.ByteArrayReader.readFromBlock(ByteArrayReader.java:57)
>     at org.apache.hadoop.ozone.client.io.KeyInputStream.readWithStrategy(KeyInputStream.java:268)
>     at org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:235)
>     at org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:56)
>     at java.base/java.io.InputStream.read(InputStream.java:205)
>     at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:94)
>     at org.apache.hadoop.ozone.shell.keys.GetKeyHandler.execute(GetKeyHandler.java:88)
>     at org.apache.hadoop.ozone.shell.Handler.call(Handler.java:98)
>     at org.apache.hadoop.ozone.shell.Handler.call(Handler.java:44)
>     at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
>     at picocli.CommandLine.access$1300(CommandLine.java:145)
>     at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
>     at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
>     at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
>     at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2172)
>     at picocli.CommandLine.parseWithHandlers(CommandLine.java:2550)
>     at picocli.CommandLine.parseWithHandler(CommandLine.java:2485)
>     at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:96)
>     at org.apache.hadoop.ozone.shell.OzoneShell.lambda$execute$0(OzoneShell.java:55)
>     at org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:159)
>     at org.apache.hadoop.ozone.shell.OzoneShell.execute(OzoneShell.java:53)
>     at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:87)
>     at org.apache.hadoop.ozone.shell.OzoneShell.main(OzoneShell.java:47)
> {code}
> After the nodes are marked dead and the replicas are no longer present in SCM, we get the expected error immediately:
> {code}
> ozone sh key get /vol1/bucket/ec1 /tmp/3_down_dead
> There are insufficient datanodes to read the EC block
> {code}
> We should fail quickly with the same clear insufficient-datanodes error here, rather than hanging and then throwing IllegalStateException.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org