Posted to user@hadoop.apache.org by Phillip Henry <lo...@gmail.com> on 2020/03/18 13:25:08 UTC

Re: hadoop-azure: "StorageException: The specified Rest Version is Unsupported"

Solved my own problem.

For future reference, here's what I did. Use OAuth, like so:

spark.conf.set("fs.azure.account.auth.type", "OAuth")
spark.conf.set("fs.azure.account.oauth2.client.secret", SECRET)
spark.conf.set("fs.azure.account.oauth2.client.id", APP_ID)
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "true")
spark.conf.set("fs.azure.account.oauth.provider.type", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.endpoint", "https://login.microsoftonline.com/" + TENANT + "/oauth2/token")
spark.conf.set("fs.azure.account.auth.type." + accountName + ".dfs.core.windows.net", "SharedKey")
spark.conf.set("fs.azure.account.key." + accountName + ".dfs.core.windows.net", accountKey)

where:
TENANT is the ID of our Active Directory in Azure.
APP_ID (also known as the client ID) is the ID of the service principal.
SECRET is something you create under the service principal and use to authenticate (i.e. a password).
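
In case it helps anyone, the same settings can be gathered into one place. This is only a sketch in Python, assuming PySpark: the helper names abfs_conf and apply_conf are my own inventions, not part of Spark or hadoop-azure, and the keys and values are exactly the ones listed above.

```python
def abfs_conf(tenant, app_id, secret, account_name, account_key):
    """Build the settings from this message as a plain dict.

    Hypothetical helper: it only assembles strings and does not
    contact Azure or need Spark to be installed.
    """
    host = account_name + ".dfs.core.windows.net"
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth2.client.secret": secret,
        "fs.azure.account.oauth2.client.id": app_id,
        "fs.azure.createRemoteFileSystemDuringInitialization": "true",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.endpoint":
            "https://login.microsoftonline.com/" + tenant + "/oauth2/token",
        # Per-account settings, exactly as in the calls above.
        "fs.azure.account.auth.type." + host: "SharedKey",
        "fs.azure.account.key." + host: account_key,
    }


def apply_conf(spark, settings):
    """Copy every key/value pair onto an existing SparkSession."""
    for key, value in settings.items():
        spark.conf.set(key, value)
```

With a live session, this would be used as apply_conf(spark, abfs_conf(TENANT, APP_ID, SECRET, accountName, accountKey)). Keeping the dict separate from the session makes it easy to print and check the keys before touching Spark.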

Phillip



On Thu, Feb 27, 2020 at 11:50 AM Phillip Henry <lo...@gmail.com>
wrote:

> I've built Spark 3.0.0-preview2 with the -Phadoop-3.2 profile switch and
> deployed it via Kubernetes.
>
> I launch Spark with a switch to pull in the relevant Hadoop/Azure
> dependencies:
>
>  --packages org.apache.hadoop:hadoop-azure:3.2.0,org.apache.hadoop:hadoop-azure-datalake:3.2.0
>
> and see that com.microsoft.azure#azure-storage;7.0.0 is indeed pulled in.
>
> I can see files using a blob.core.windows.net URL but the
> dfs.core.windows.net throws an Exception saying "The specified Rest
> Version is Unsupported".
>
> I use tcpdump and see that my client is indeed using:
>
> x-ms-version: 2017-07-29
>
> in its HTTP headers.
>
> If I upgrade to azure-storage:8.6.0, I see in the HTTP headers:
>
> x-ms-version: 2019-02-02
>
> and the job gets slightly further but reading the Parquet file now fails
> with "Incorrect Blob type, please use the correct Blob type to access a
> blob on the server. Expected BLOCK_BLOB, actual UNSPECIFIED".
>
> This is not overly surprising as I am shoe-horning in a binary that Hadoop
> was unprepared for. I just did this to demonstrate that this version of the
> library seems to talk to Azure as its version is more recent.
>
> Does anybody have any ideas on how I can talk to Azure?
>
> [Note: for various non-technical reasons, I can use neither HDInsight nor
> Databricks.]
>
> Kind regards,
>
> Phillip
>
>