Posted to issues@ozone.apache.org by "UENISHI Kota (Jira)" <ji...@apache.org> on 2022/02/15 08:44:00 UTC
[jira] [Updated] (HDDS-6321) Avoid refresh pipeline for key lookup in checkAcls

     [ https://issues.apache.org/jira/browse/HDDS-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

UENISHI Kota updated HDDS-6321:
-------------------------------
    Description: 
Every ACL check under the native Ozone authorizer calls [keyManager.checkAccess|#L162]. KeyManagerImpl#checkAccess in turn [calls getFileStatus()|https://github.com/apache/ozone/blob/76aa27e7c05196ae00cba540efce4bb7529e5d15/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java#L1804], which eventually [calls pipeline refresh()|#L2011]. The pipeline refresh is not needed here because the ACL check only reads the key's ACLs and does not need block information. It causes an additional external RPC call to SCM, which is unnecessary overhead on every object get.

We observed this issue in our production cluster as a 50% increase in latency, estimated from a wall-clock profile:

!Screenshot_2022-02-15_17-35-18.png|width=739,height=452!

Also, our monitoring shows twice as many key lookups on OM, which increases the number of GetContainerWithPipeline calls to SCM.

!29843180-8924-11ec-8ad5-5b5a8342f2d3.png|width=797,height=245!
!2b4df500-8924-11ec-927a-de3d8adc6fe0.png|width=798,height=239!

 

I'm not sure how to fix this issue in light of {color:#6e7781}HDDS-3658{color}. The cleanest way would be to reuse the refreshPipeline flag again, but it'd be a hassle to consider every case that uses getFileStatus(). HDDS-5450 may give us some hints.
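
To make the flag idea concrete, here is a minimal, self-contained sketch. It is not the Ozone code: AclCheckSketch, FakeScmClient, KeyInfo and FakeKeyManager are hypothetical stand-ins, and the two-argument getFileStatus() is an assumed signature. Only the idea of a refreshPipeline flag comes from the description above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these types are the real Ozone classes. */
final class AclCheckSketch {

  /** Stand-in for the SCM client; counts simulated RPCs. */
  static final class FakeScmClient {
    int calls = 0;

    String getContainerWithPipeline(long containerId) {
      calls++;
      return "pipeline-" + containerId;
    }
  }

  /** Minimal key record: who may read it and which container holds it. */
  static final class KeyInfo {
    final List<String> allowedUsers;
    final long containerId;
    String pipeline; // set only when a refresh was requested

    KeyInfo(List<String> allowedUsers, long containerId) {
      this.allowedUsers = allowedUsers;
      this.containerId = containerId;
    }
  }

  /** Stand-in for the key manager, with an explicit refreshPipeline flag. */
  static final class FakeKeyManager {
    private final Map<String, KeyInfo> keyTable = new HashMap<>();
    private final FakeScmClient scm;

    FakeKeyManager(FakeScmClient scm) {
      this.scm = scm;
    }

    void put(String name, KeyInfo info) {
      keyTable.put(name, info);
    }

    /** Contacts SCM only when the caller asks for pipeline information. */
    KeyInfo getFileStatus(String name, boolean refreshPipeline) {
      KeyInfo info = keyTable.get(name);
      if (info != null && refreshPipeline) {
        info.pipeline = scm.getContainerWithPipeline(info.containerId);
      }
      return info;
    }

    /** The ACL check needs only ACLs, so it skips the pipeline refresh. */
    boolean checkAccess(String name, String user) {
      KeyInfo info = getFileStatus(name, false);
      return info != null && info.allowedUsers.contains(user);
    }
  }

  /** Simulates one object get and returns how many SCM calls it made. */
  static int scmCallsForOneGet(String user) {
    FakeScmClient scm = new FakeScmClient();
    FakeKeyManager km = new FakeKeyManager(scm);
    km.put("vol/bucket/key", new KeyInfo(Arrays.asList("alice"), 1L));
    if (km.checkAccess("vol/bucket/key", user)) {
      km.getFileStatus("vol/bucket/key", true); // the real lookup needs blocks
    }
    return scm.calls;
  }
}
```

In this sketch an authorized get makes one simulated SCM call (it would be two if checkAccess also refreshed), and a denied get makes none. The hard part noted above remains: every existing caller of getFileStatus() would need to choose the right flag value.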


> Avoid refresh pipeline for key lookup in checkAcls
> --------------------------------------------------
>
>                 Key: HDDS-6321
>                 URL: https://issues.apache.org/jira/browse/HDDS-6321
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Manager
>    Affects Versions: 1.2.0
>         Environment: OM setup with Native Ozone Authorizer
>            Reporter: UENISHI Kota
>            Priority: Major
>         Attachments: 29843180-8924-11ec-8ad5-5b5a8342f2d3.png, 2b4df500-8924-11ec-927a-de3d8adc6fe0.png, Screenshot_2022-02-15_17-35-18.png


