Posted to common-issues@hadoop.apache.org by "Bilahari T H (Jira)" <ji...@apache.org> on 2020/07/06 11:32:00 UTC

[jira] [Assigned] (HADOOP-17092) ABFS: Long waits and unintended retries when multiple threads try to fetch token using ClientCreds

     [ https://issues.apache.org/jira/browse/HADOOP-17092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bilahari T H reassigned HADOOP-17092:
-------------------------------------

    Assignee: Bilahari T H  (was: Sneha Vijayarajan)

> ABFS: Long waits and unintended retries when multiple threads try to fetch token using ClientCreds
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-17092
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17092
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>            Reporter: Sneha Vijayarajan
>            Assignee: Bilahari T H
>            Priority: Major
>             Fix For: 3.4.0
>
>
> Issue reported by DB:
> We recently experienced some problems with the ABFS driver that highlighted a possible issue with long hangs following synchronized retries when using the _ClientCredsTokenProvider_ and calling _AbfsClient.getAccessToken_. We have seen [https://github.com/apache/hadoop/pull/1923], but it does not directly apply, since we are not using a custom token provider but instead _ClientCredsTokenProvider_, which ultimately relies on _AzureADAuthenticator_.
>  
> The problem was that the critical section of getAccessToken, combined with a possibly redundant retry policy, caused jobs to hang for a very long time: only one thread at a time could make progress, and that progress amounted to retrying on a failing connection for 30-60 minutes.
>  
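
The effect described above can be reproduced in miniature. The following is a minimal sketch, not the ABFS code: FailingTokenProvider, getToken, fetchOnce and runCallers are hypothetical names, and the retry count and backoff are made-up values. It assumes a token fetch that always fails and a retry loop that runs entirely inside a synchronized method, so each waiting thread has to sit through the previous thread's full retry sequence before starting its own.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

class SerializedTokenRetrySketch {

    /** Hypothetical stand-in for a token provider; not the real ABFS class. */
    static class FailingTokenProvider {
        private final int maxRetries;
        private final long backoffMillis;
        final AtomicInteger attempts = new AtomicInteger();

        FailingTokenProvider(int maxRetries, long backoffMillis) {
            this.maxRetries = maxRetries;
            this.backoffMillis = backoffMillis;
        }

        // The whole retry loop runs while holding the monitor, so only one
        // caller at a time can make "progress" (here: failing repeatedly).
        synchronized String getToken() throws IOException, InterruptedException {
            IOException last = null;
            for (int attempt = 0; attempt <= maxRetries; attempt++) {
                try {
                    return fetchOnce();
                } catch (IOException e) {
                    last = e;
                    Thread.sleep(backoffMillis); // back off while still holding the lock
                }
            }
            throw last;
        }

        // Simulates an endpoint that is unreachable for every call.
        private String fetchOnce() throws IOException {
            attempts.incrementAndGet();
            throw new IOException("simulated connection failure");
        }
    }

    /** Returns { total fetch attempts, elapsed wall-clock milliseconds }. */
    static long[] runCallers(int threads, int maxRetries, long backoffMillis)
            throws InterruptedException {
        FailingTokenProvider provider = new FailingTokenProvider(maxRetries, backoffMillis);
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    provider.getToken();
                } catch (IOException e) {
                    // Expected in this sketch: every fetch fails.
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new long[] { provider.attempts.get(), elapsedMillis };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] result = runCallers(4, 2, 20L);
        System.out.println("attempts=" + result[0] + ", elapsedMillis=" + result[1]);
    }
}
```

In this sketch the total wait grows roughly as threads x (maxRetries + 1) x backoff, because the callers fail one after another rather than in parallel. With real-world retry counts and timeouts, the same multiplication is one plausible way to arrive at waits of the magnitude reported above.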



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org