Posted to issues@ozone.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/02/17 09:13:00 UTC

[jira] [Updated] (HDDS-6342) EC: Fix large write with multiple stripes upon stripe failure.

     [ https://issues.apache.org/jira/browse/HDDS-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-6342:
---------------------------------
    Labels: pull-request-available  (was: )

> EC: Fix large write with multiple stripes upon stripe failure.
> --------------------------------------------------------------
>
>                 Key: HDDS-6342
>                 URL: https://issues.apache.org/jira/browse/HDDS-6342
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Mark Gui
>            Assignee: Mark Gui
>            Priority: Major
>              Labels: pull-request-available
>
> Test with ockg
> ./bin/ozone freon ockg -p test -n 50 -t 8 -s $((500*1024*1024)) --type=EC --replication=rs-10-4-1024k
> {code:java}
> 2022-02-15 12:43:11,295 [pool-2-thread-7] ERROR freon.BaseFreonGenerator: Error on executing task 46
> java.lang.IllegalArgumentException
>         at com.google.common.base.Preconditions.checkArgument(Preconditions.java:130)
>         at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.commitKey(BlockOutputStreamEntryPool.java:327)
>         at org.apache.hadoop.ozone.client.io.ECKeyOutputStream.close(ECKeyOutputStream.java:536)
>         at org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:61)
>         at org.apache.hadoop.ozone.freon.OzoneClientKeyGenerator.lambda$createKey$36(OzoneClientKeyGenerator.java:150)
>         at com.codahale.metrics.Timer.time(Timer.java:101)
>         at org.apache.hadoop.ozone.freon.OzoneClientKeyGenerator.createKey(OzoneClientKeyGenerator.java:142)
>         at org.apache.hadoop.ozone.freon.BaseFreonGenerator.tryNextTask(BaseFreonGenerator.java:183)
>         at org.apache.hadoop.ozone.freon.BaseFreonGenerator.taskLoop(BaseFreonGenerator.java:163)
>         at org.apache.hadoop.ozone.freon.BaseFreonGenerator.lambda$startTaskRunners$1(BaseFreonGenerator.java:146)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748) {code}
> This happens only when a write fails during the parity write and more than one stripe has already been written to the current block group.
> When this happens, a new block group is picked to retry the current stripe write, and the current block group should roll back its current position. The bug lies in the calculation of the acked length of the block group.
> Code references:
> {code:java}
> if (handleParityWrites(ecChunkSize, allocateBlockIfFull,
>     shouldClose) == StripeWriteStatus.FAILED) {
>   handleStripeFailure(numDataBlks * ecChunkSize, allocateBlockIfFull,
>       shouldClose);
> } else {
>   // At this stage stripe write is successful.
>   currentStreamEntry.updateBlockGroupToAckedPosition(
>       currentStreamEntry.getCurrentPosition());
> } {code}
> {code:java}
> private StripeWriteStatus rewriteStripeToNewBlockGroup(
>     int failedStripeDataSize, boolean allocateBlockIfFull, boolean close)
>     throws IOException {
>   long[] failedDataStripeChunkLens = new long[numDataBlks];
>   long[] failedParityStripeChunkLens = new long[numParityBlks];
>   final ByteBuffer[] dataBuffers = ecChunkBufferCache.getDataBuffers();
>   for (int i = 0; i < numDataBlks; i++) {
>     failedDataStripeChunkLens[i] = dataBuffers[i].limit();
>   }
>   final ByteBuffer[] parityBuffers = ecChunkBufferCache.getParityBuffers();
>   for (int i = 0; i < numParityBlks; i++) {
>     failedParityStripeChunkLens[i] = parityBuffers[i].limit();
>   }
>   blockOutputStreamEntryPool.getCurrentStreamEntry().resetToFirstEntry();
>   // Rollback the length/offset updated as part of this failed stripe write.
>   offset -= failedStripeDataSize;
>   blockOutputStreamEntryPool.getCurrentStreamEntry()
>       .resetToAckedPosition();                         <-- wrong position detected
> ...
> } {code}
>  
>  
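The rollback described in the issue can be pictured with a toy model. The following is a minimal sketch, assuming a block group tracks a current position and an acked position; the class BlockGroupPositionSketch and its methods write and ackStripe are hypothetical, and only the name resetToAckedPosition is borrowed from the snippet above. It is not Ozone's implementation and not the fix for this issue. It only illustrates the expected bookkeeping when two stripes succeed and a third fails during its parity write, using the rs-10-4-1024k layout from the freon command.

```java
// Toy model only: hypothetical names, not Ozone's classes.
final class BlockGroupPositionSketch {
  // Data bytes written to this block group, including the in-flight stripe.
  private long currentPosition;
  // Data bytes covered by stripes whose data and parity writes both succeeded.
  private long ackedPosition;

  /** Records data bytes written for the in-flight stripe. */
  void write(long len) {
    currentPosition += len;
  }

  /** Marks everything written so far as acked (stripe fully succeeded). */
  void ackStripe() {
    ackedPosition = currentPosition;
  }

  /** Rolls back to the last acked position and returns the bytes discarded. */
  long resetToAckedPosition() {
    long rolledBack = currentPosition - ackedPosition;
    currentPosition = ackedPosition;
    return rolledBack;
  }

  long getCurrentPosition() {
    return currentPosition;
  }

  long getAckedPosition() {
    return ackedPosition;
  }

  public static void main(String[] args) {
    final int numDataBlks = 10;              // data blocks in rs-10-4-1024k
    final long ecChunkSize = 1024L * 1024L;  // 1024k chunk size
    final long stripeDataSize = numDataBlks * ecChunkSize;

    BlockGroupPositionSketch group = new BlockGroupPositionSketch();

    // Two stripes are written and acked.
    for (int i = 0; i < 2; i++) {
      group.write(stripeDataSize);
      group.ackStripe();
    }

    // The third stripe writes its data, then the parity write fails.
    group.write(stripeDataSize);
    long rolledBack = group.resetToAckedPosition();

    System.out.println("rolled back bytes: " + rolledBack);
    System.out.println("position after rollback: " + group.getCurrentPosition());
  }
}
```

In this model the position after rollback equals the data size of the two acked stripes, and exactly one stripe's worth of data is discarded and left for the retry on the new block group.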



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
