Posted to issues@ozone.apache.org by "sodonnel (via GitHub)" <gi...@apache.org> on 2024/04/16 15:35:30 UTC

[PR] HDDS-10704. Do not fail read of EC block if the last chunk is empty [ozone]

sodonnel opened a new pull request, #6540:
URL: https://github.com/apache/ozone/pull/6540

   ## What changes were proposed in this pull request?
   
   Due to [HDDS-10682](https://issues.apache.org/jira/browse/HDDS-10682) some EC blocks in a cluster could have an empty final chunk. These blocks will fail to read and could cause data to become unavailable, even though it is still present on disk.
   
   If the last chunk is empty, this should not stop the block from being read.
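
The intended behaviour can be sketched roughly as below. This is a minimal illustration in Python, not the code from this PR: the `Chunk` type and the `readable_chunks` function are hypothetical names. The only detail taken from the PR is that an empty final chunk whose offset equals the block length is tolerated with a warning. Treating any other empty chunk as an error is an assumption made for the example.

```python
import logging
from dataclasses import dataclass
from typing import List

log = logging.getLogger("block_input_stream_sketch")


@dataclass
class Chunk:
    """Hypothetical stand-in for a chunk's metadata."""
    offset: int  # byte offset of the chunk within the block
    length: int  # number of bytes in the chunk


def readable_chunks(chunks: List[Chunk], block_length: int) -> List[Chunk]:
    """Return the chunks to read, skipping a harmless empty final chunk."""
    result = []
    for i, chunk in enumerate(chunks):
        if chunk.length > 0:
            result.append(chunk)
            continue
        is_last = i == len(chunks) - 1
        if is_last and chunk.offset == block_length:
            # Tolerated case: empty chunk sitting exactly at the end of the block.
            log.warning("The last chunk is empty with an offset of the "
                        "block length; ignoring it.")
            continue
        # Assumption for this sketch: any other empty chunk is still an error.
        raise ValueError(f"Unexpected empty chunk at index {i}")
    return result
```

With a 1 MiB block described by `[Chunk(0, 1048576), Chunk(1048576, 0)]`, the function would return only the first chunk and log a warning instead of failing.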
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-10704
   
   ## How was this patch tested?
   
   Writing a unit test for this issue is not practical in a reasonable amount of time.
   
   I tested this manually in a Docker cluster.
   
   First, I created a block with the problem using a build without the fix for HDDS-10682.
   
   I then attempted to read the block in a docker-compose cluster and validated that the log message was produced. The key was created as follows:
   
   ```
   bash-4.2$ dd if=/dev/random of=4mb bs=1024 count=4096
   4096+0 records in
   4096+0 records out
   4194304 bytes (4.2 MB) copied, 0.401662 s, 10.4 MB/s
   bash-4.2$ 
   bash-4.2$ ozone sh volume create /vol1
   bash-4.2$ ozone sh bucket create /vol1/bucket1
   bash-4.2$ 
   bash-4.2$ ozone sh key put --type=EC --replication=rs-3-2-1024k /vol1/bucket1/4mb 4mb
   bash-4.2$ 
   bash-4.2$ ozone admin container close 1
   bash-4.2$ 
   bash-4.2$ ozone admin container info 1
   Container id: 1
   Pipeline id: 4a6259e9-b22b-422f-9e53-f43df1f4596c
   Write PipelineId: 32e2dd3a-d3fe-44bd-918c-78efd1e7afab
   Write Pipeline State: OPEN
   Container State: CLOSED
   Datanodes: [6d9b61b2-60f1-47e0-b33c-31e5fb82c0f9/ozone-datanode-4.ozone_default,
   5bf013b0-8bf9-49c5-bec1-a69c702c6764/ozone-datanode-2.ozone_default,
   f11ff464-6007-4bac-b79c-33b784e53cc3/ozone-datanode-5.ozone_default,
   2261af6f-ecd4-4a18-8eb1-7988155cfc63/ozone-datanode-1.ozone_default,
   ae9b7d62-2eb2-43c3-ad68-aa67d8deab70/ozone-datanode-3.ozone_default]
   Replicas: [State: CLOSED; ReplicaIndex: 1; Origin: 6d9b61b2-60f1-47e0-b33c-31e5fb82c0f9; Location: 6d9b61b2-60f1-47e0-b33c-31e5fb82c0f9/ozone-datanode-4.ozone_default,
   State: CLOSED; ReplicaIndex: 2; Origin: ae9b7d62-2eb2-43c3-ad68-aa67d8deab70; Location: ae9b7d62-2eb2-43c3-ad68-aa67d8deab70/ozone-datanode-3.ozone_default,
   State: CLOSED; ReplicaIndex: 3; Origin: f11ff464-6007-4bac-b79c-33b784e53cc3; Location: f11ff464-6007-4bac-b79c-33b784e53cc3/ozone-datanode-5.ozone_default,
   State: CLOSED; ReplicaIndex: 4; Origin: 2261af6f-ecd4-4a18-8eb1-7988155cfc63; Location: 2261af6f-ecd4-4a18-8eb1-7988155cfc63/ozone-datanode-1.ozone_default,
   State: CLOSED; ReplicaIndex: 5; Origin: 5bf013b0-8bf9-49c5-bec1-a69c702c6764; Location: 5bf013b0-8bf9-49c5-bec1-a69c702c6764/ozone-datanode-2.ozone_default]
   ```
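
To show how the 4 MiB key in this session is spread over the five replicas, here is a rough sketch that computes the bytes held by each replica index for `rs-3-2-1024k`. It assumes the usual striped layout: data cells are filled in order across the data replicas, and each parity cell is as long as the longest data cell in its stripe. The function name and these layout assumptions are illustrative and are not taken from Ozone's code.

```python
def ec_replica_lengths(key_size, data=3, parity=2, cell=1024 * 1024):
    """Bytes stored per replica index (1-based) for one EC block group."""
    lengths = [0] * (data + parity)
    remaining = key_size
    while remaining > 0:
        # Bytes of user data that go into this stripe.
        stripe = min(remaining, data * cell)
        largest = 0
        for i in range(data):
            # Data cell i receives whatever is left of the stripe, up to one cell.
            cell_len = max(0, min(cell, stripe - i * cell))
            lengths[i] += cell_len
            largest = max(largest, cell_len)
        for p in range(data, data + parity):
            # Assumption: a parity cell matches the longest data cell.
            lengths[p] += largest
        remaining -= stripe
    return {index + 1: n for index, n in enumerate(lengths)}


MIB = 1024 * 1024
print({idx: n // MIB for idx, n in ec_replica_lengths(4 * MIB).items()})
# prints {1: 2, 2: 1, 3: 1, 4: 2, 5: 2}
```

Under these assumptions the second stripe is partial: replica indexes 2 and 3 hold one cell fewer than the others. In the listing above, datanode-5 holds replica index 3, which is presumably where a zero-length final chunk would show up after reconstruction.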
   
   I then removed the Docker containers for datanode-2 and datanode-5 and allowed reconstruction to happen (this creates the zero-length final chunk).
   
   Then I read the block. Note that the added log message is produced:
   
   ```
   bash-4.2$ export OZONE_ROOT_LOGGER=INFO,console
   bash-4.2$ ozone sh key get /vol1/bucket1/4mb 4mb_copy5
   2024-04-16 15:23:14,426 [main] INFO protocolPB.OmTransportFactory: Loading OM transport implementation org.apache.hadoop.ozone.om.protocolPB.Hadoop3OmTransportFactory as specified by configuration.
   2024-04-16 15:23:15,055 [main] INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
   2024-04-16 15:23:15,099 [main] INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
   2024-04-16 15:23:15,100 [main] INFO impl.MetricsSystemImpl: XceiverClientMetrics metrics system started
   2024-04-16 15:23:15,481 [main] WARN storage.BlockInputStream: The last chunk is empty for container/block 1/113750153625600001 with an offset of the block length. Likely due to HDDS-10682. This is safe to ignore.
   2024-04-16 15:23:15,502 [main] WARN storage.BlockInputStream: The last chunk is empty for container/block 1/113750153625600001 with an offset of the block length. Likely due to HDDS-10682. This is safe to ignore.
   ```
   
   Finally, I removed the Docker containers for datanode-1 and datanode-3 to force reconstruction using the blocks with zero-length chunks. Previously, reconstruction would have failed indefinitely in this situation. The containers were reconstructed and the expected log message was present on the datanodes:
   
   ```
   ozone % docker-compose logs | grep "last chunk is empty for"
   datanode-9   | 2024-04-16 15:24:52,181 [b434f715-8b74-4b12-a6c5-ee583a27a087-ec-reconstruct-reader-TID-2] WARN storage.BlockInputStream: The last chunk is empty for container/block 1/113750153625600001 with an offset of the block length. Likely due to HDDS-10682. This is safe to ignore.
   datanode-9   | 2024-04-16 15:24:52,181 [b434f715-8b74-4b12-a6c5-ee583a27a087-ec-reconstruct-reader-TID-1] WARN storage.BlockInputStream: The last chunk is empty for container/block 1/113750153625600001 with an offset of the block length. Likely due to HDDS-10682. This is safe to ignore.
   ```
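
On a larger cluster it may be useful to know which blocks triggered the warning. The sketch below pulls the distinct container/block pairs out of log text, assuming the message format shown in the output above. The function name `affected_blocks` is hypothetical.

```python
import re

# Matches the warning text shown in the logs above and captures both IDs.
PATTERN = re.compile(
    r"The last chunk is empty for container/block (\d+)/(\d+)")


def affected_blocks(log_text):
    """Return the sorted, de-duplicated (container_id, block_id) pairs."""
    return sorted({(int(c), int(b)) for c, b in PATTERN.findall(log_text)})
```

Given the two datanode log lines above, `affected_blocks` would return a single pair, `(1, 113750153625600001)`, because both lines refer to the same block.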


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10704. Do not fail read of EC block if the last chunk is empty [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel merged PR #6540:
URL: https://github.com/apache/ozone/pull/6540




Re: [PR] HDDS-10704. Do not fail read of EC block if the last chunk is empty [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on PR #6540:
URL: https://github.com/apache/ozone/pull/6540#issuecomment-2060690494

   > I assume this should be: "... should not stop the block from being read".
   
   @adoroszlai Yes, you are correct. I have updated the PR description to fix that typo. Thanks for the review!
   

