Posted to commits@hudi.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/10/31 10:11:00 UTC

[jira] [Updated] (HUDI-5103) Does not work with Azure Data lake Gen2

     [ https://issues.apache.org/jira/browse/HUDI-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-5103:
---------------------------------
    Labels: features pull-request-available  (was: features)

> Does not work with Azure Data lake Gen2
> ---------------------------------------
>
>                 Key: HUDI-5103
>                 URL: https://issues.apache.org/jira/browse/HUDI-5103
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Dheeraj Panangat
>            Priority: Major
>              Labels: features, pull-request-available
>
> Unable to use Hudi with Flink against Azure Data Lake Storage Gen2.
> Initialization fails while looking up the storage account key:
> {code:java}
> Caused by: Configuration property <datalakeaccount>.dfs.core.windows.net not found.
>     at org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:372)
>     at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:1133)
>     at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:174) {code}
> The following properties are specified when running the Flink job:
> {code:java}
> "fs.azure.account.auth.type": "OAuth",
>        "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
>        "fs.azure.account.oauth2.client.id": "<appId>",
>        "fs.azure.account.oauth2.client.secret": "<clientSecret>",
>        "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant>/oauth2/token",
>        "fs.azure.createRemoteFileSystemDuringInitialization": "true" {code}
> as per the documentation: [Microsoft Azure Spark to Data Lake Gen2|https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-use-databricks-spark]
>
> On AWS this works because the credentials are picked up from the environment, but on Azure they must come from the configuration, which never reaches the point where the FileSystem is initialized.
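> Hadoop's ABFS driver also accepts account-qualified property names (suffixed with the account's {{dfs.core.windows.net}} host), which it checks before the unqualified form. A hedged sketch of the same settings written that way, with {{<datalakeaccount>}} as a placeholder for the actual storage account name; whether this bypasses the reported lookup failure is untested here:
> {code:java}
> "fs.azure.account.auth.type.<datalakeaccount>.dfs.core.windows.net": "OAuth",
> "fs.azure.account.oauth.provider.type.<datalakeaccount>.dfs.core.windows.net": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
> "fs.azure.account.oauth2.client.id.<datalakeaccount>.dfs.core.windows.net": "<appId>",
> "fs.azure.account.oauth2.client.secret.<datalakeaccount>.dfs.core.windows.net": "<clientSecret>",
> "fs.azure.account.oauth2.client.endpoint.<datalakeaccount>.dfs.core.windows.net": "https://login.microsoftonline.com/<tenant>/oauth2/token" {code}
> For these properties to reach the ABFS client, they generally need to be present in the Hadoop configuration that Flink hands to the filesystem (e.g. via {{core-site.xml}} on the Hadoop classpath), not only in job-level options.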



--
This message was sent by Atlassian Jira
(v8.20.10#820010)