Posted to issues@ozone.apache.org by "Duong (Jira)" <ji...@apache.org> on 2023/01/19 00:34:00 UTC

[jira] [Commented] (HDDS-7755) S3G cannot acquire VOLUME_LOCK lock while holding [BUCKET_LOCK]

    [ https://issues.apache.org/jira/browse/HDDS-7755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17678439#comment-17678439 ] 

Duong commented on HDDS-7755:
-----------------------------

This is caused by faulty lock handling: locks are not released in edge cases such as validation failures. For example, consider the following code in [KeyManagerImpl|https://github.com/apache/ozone/blob/master/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java#L351-L351]:
{code:java}
private OmKeyInfo readKeyInfo(OmKeyArgs args) throws IOException {
  String volumeName = args.getVolumeName();
  String bucketName = args.getBucketName();
  String keyName = args.getKeyName();

  metadataManager.getLock().acquireReadLock(BUCKET_LOCK, volumeName,
      bucketName);

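  // Problem: this call can throw (e.g. "Bucket not found") after the lock is
  // acquired but before the try block is entered, so the finally never runs.
  BucketLayout bucketLayout = getBucketLayout(metadataManager, args.getVolumeName(), args.getBucketName());

  OmKeyInfo value = null;
  try {
    // ... read key info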
  } finally {
    metadataManager.getLock().releaseReadLock(BUCKET_LOCK, volumeName,
        bucketName);
  } 
...{code}
When an exception is thrown while getting the bucket layout, e.g. the "Bucket not found" error seen in HDDS-7801, OM does not release the acquired BUCKET_LOCK, because the call sits outside the try/finally block. As a result, the record of the held lock is never cleaned up from the thread context (ThreadLocal).
The next time an operation runs on the same thread (IPC handler threads are very likely to be reused in a busy environment), it fails to acquire VOLUME_LOCK because the thread still appears to hold BUCKET_LOCK.

 

We have to ensure that acquired locks are always released.
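
To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use Ozone's real classes: ToyLockManager, getBucketLayout(boolean), readKeyLeaky, readKeySafe and runFailingRead are all hypothetical stand-ins that only mimic per-thread lock tracking and the "no VOLUME_LOCK while holding BUCKET_LOCK" rule. The only difference between the two read methods is whether the throwing call sits before or inside the try block.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of the leak; hypothetical names, not the Ozone implementation. */
final class LockLeakSketch {

  /** Simplified lock manager that records held lock names per thread. */
  static final class ToyLockManager {
    private final ThreadLocal<Deque<String>> held =
        ThreadLocal.withInitial(ArrayDeque::new);

    void acquireReadLock(String name) {
      Deque<String> locks = held.get();
      // Mimic the ordering rule from the issue title.
      if (name.equals("VOLUME_LOCK") && locks.contains("BUCKET_LOCK")) {
        throw new IllegalStateException(
            "cannot acquire VOLUME_LOCK while holding [BUCKET_LOCK]");
      }
      locks.push(name);
    }

    void releaseReadLock(String name) {
      held.get().remove(name); // drops the first matching entry, if any
    }

    int heldCount() {
      return held.get().size();
    }
  }

  /** Stand-in for the bucket layout lookup; throws when the bucket is missing. */
  static String getBucketLayout(boolean bucketExists) throws IOException {
    if (!bucketExists) {
      throw new IOException("Bucket not found");
    }
    return "OBJECT_STORE";
  }

  /** Leaky shape: a call that can throw sits between acquire and try. */
  static void readKeyLeaky(ToyLockManager lock, boolean bucketExists)
      throws IOException {
    lock.acquireReadLock("BUCKET_LOCK");
    String layout = getBucketLayout(bucketExists); // throws: lock stays held
    try {
      // ... read key info using layout
    } finally {
      lock.releaseReadLock("BUCKET_LOCK");
    }
  }

  /** Safe shape: everything after acquire is covered by try/finally. */
  static void readKeySafe(ToyLockManager lock, boolean bucketExists)
      throws IOException {
    lock.acquireReadLock("BUCKET_LOCK");
    try {
      String layout = getBucketLayout(bucketExists);
      // ... read key info using layout
    } finally {
      lock.releaseReadLock("BUCKET_LOCK");
    }
  }

  /** Runs one failing read (missing bucket) and returns the lock manager. */
  static ToyLockManager runFailingRead(boolean leaky) {
    ToyLockManager lock = new ToyLockManager();
    try {
      if (leaky) {
        readKeyLeaky(lock, false);
      } else {
        readKeySafe(lock, false);
      }
    } catch (IOException expected) {
      // "Bucket not found" is the validation failure under discussion.
    }
    return lock;
  }

  public static void main(String[] args) {
    System.out.println("leaky held=" + runFailingRead(true).heldCount());
    System.out.println("safe  held=" + runFailingRead(false).heldCount());
  }
}
```

In this toy model the leaky variant leaves one lock recorded for the thread after the failed read, so a later acquireReadLock("VOLUME_LOCK") on the same thread is rejected. The safe variant leaves nothing behind. The actual fix in OM may look different, but the principle is the same: nothing that can throw should run between acquiring a lock and entering the try block whose finally releases it.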

> S3G cannot acquire VOLUME_LOCK lock while holding [BUCKET_LOCK]
> ---------------------------------------------------------------
>
>                 Key: HDDS-7755
>                 URL: https://issues.apache.org/jira/browse/HDDS-7755
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM
>            Reporter: Arun Sarin
>            Assignee: Ethan Rose
>            Priority: Major
>
> S3G cannot acquire VOLUME_LOCK lock while holding [BUCKET_LOCK].
> Log from internal tests:
> {code:java}
> 2022-12-09 12:57:32,982|INFO|MainThread|machine.py:230 - run()||GUID=39760bfb-bb7f-434e-8fa8-368701270fa3|Exit Code: 0
> 2022-12-09 12:57:32,982|INFO|MainThread|machine.py:188 - run()||GUID=c638945f-3903-4cec-81a9-cb15951608df|RUNNING: aws s3api --endpoint https://<clustername>:9879/ --ca-bundle=/usr/local/share/ca-certificates/ca.crt get-object --bucket erycnbhu --key file1 /tmp/getObjectFile1670590652
> 2022-12-09 12:57:33,475|INFO|MainThread|machine.py:203 - run()||GUID=c638945f-3903-4cec-81a9-cb15951608df|
> 2022-12-09 12:57:33,476|INFO|MainThread|machine.py:203 - run()||GUID=c638945f-3903-4cec-81a9-cb15951608df|An error occurred (NoSuchBucket) when calling the GetObject operation: The specified bucket does not exist
> 2022-12-09 12:57:33,547|INFO|MainThread|machine.py:232 - run()||GUID=c638945f-3903-4cec-81a9-cb15951608df|Exit Code: 255
> {code}
> According to the lock priority described in the documentation
> [https://ozone.apache.org/docs/1.0.0/design/locks.html],
> a higher-priority lock (here VOLUME_LOCK) can't be acquired while a lower-priority lock (here BUCKET_LOCK) is held.
>  