Posted to oak-issues@jackrabbit.apache.org by "Matt Ryan (Jira)" <ji...@apache.org> on 2021/01/06 00:44:00 UTC

[jira] [Comment Edited] (OAK-9304) Filename with special characters in direct download URI Content-Disposition are causing HTTP 400 errors from Azure

    [ https://issues.apache.org/jira/browse/OAK-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258433#comment-17258433 ] 

Matt Ryan edited comment on OAK-9304 at 1/6/21, 12:43 AM:
----------------------------------------------------------

Sure thing [~reschke].  Sorry, I've been on holidays :)

Previously, in regard to the example in the description above, you said:  "The first of the two entries looks perfectly ok to me."  The issue here is that the first one does not work with Azure blob storage service - it rejects the request as having an invalid character in the URI.  So this is less an issue of whether the URI is correct per RFCs, and more an issue that the URI does not properly work with Azure.

More details follow.

PRIOR TO THIS FIX:  When Oak generated a direct binary access URI for a filename with characters outside the ISO-8859-1 character set, the resulting URI was rejected by Azure with a 400-level error.  The cause was that Oak did not properly encode the filename in the "filename" portion of the Content-Disposition header specification.

(As background, remember that Oak declares to the cloud storage the value that should be used in the Content-Disposition header for requests to the generated direct binary access URI.  In Oak we specify both the content disposition type and filenames for this.  See [0] and [1] for more info.)

Example:  Suppose the filename is "umläut.jpg".  Oak would specify a Content-Disposition header value of:
{noformat}
inline; filename="umläut.jpg"; filename*=UTF-8''umla%CC%88ut.jpg{noformat}
This value is then specified in a query parameter of the direct access URI, so it gets URL-encoded.  It is probably this encoding change that Azure does not expect.  Since this portion of the URI is signed, the signature doesn't match and the request fails.
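
To make the encoding step concrete, here is a minimal sketch of what happens to the header value when it is percent-encoded as a query parameter value.  It only illustrates the encoding; it does not reproduce Azure's signature check or the way Oak and the Azure SDK actually assemble the URI.  The class and method names are hypothetical, and the use of the "rscd" query parameter (the Azure SAS parameter for the response Content-Disposition) is an assumption about where the value ends up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QueryEncodingSketch {

    // Percent-encode a value for use in a URI query string
    // (form-style encoding: a space becomes '+').
    static String encodeQueryValue(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The filename is written as "a" followed by U+0308 (combining diaeresis),
        // which is what the %CC%88 in the filename* part of the example corresponds to.
        String header =
            "inline; filename=\"umla\u0308ut.jpg\"; filename*=UTF-8''umla%CC%88ut.jpg";

        // Assumption: the value travels in the "rscd" query parameter of the signed URI.
        System.out.println("rscd=" + encodeQueryValue(header));
    }
}
```

In this sketch the raw combining character inside the quoted "filename" value turns into the bytes %CC%88, while the already percent-encoded "filename*" value has its "%" signs encoded a second time (as %25).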

WITH THIS FIX:  A basic ISO-8859-1 encoding is applied to the "filename" value of the header, so any character that ISO-8859-1 cannot represent is replaced with "?".  This change was based on RFC 6266 Section 4.3, which seems to suggest that only ISO-8859-1 characters are allowed for that value.

Thus the header now looks like this:
{noformat}
inline; filename="umla?ut.jpg"; filename*=UTF-8''umla%CC%88ut.jpg{noformat}
This header encodes and validates properly with Azure.  In testing, modern clients prefer the "filename*" portion, which results in the proper filename being used.
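
As a rough illustration of how such a header value can be produced, here is a minimal sketch using only the JDK.  It is not the actual Oak code: the class and method names are hypothetical, the percent-encoding follows the attr-char set from RFC 5987, and escaping of quotes or backslashes inside the quoted "filename" value is left out.

```java
import java.nio.charset.StandardCharsets;

class ContentDispositionSketch {

    // Round-trip through ISO-8859-1: String.getBytes(Charset) replaces every
    // character the charset cannot represent with its replacement byte, '?'.
    static String toLatin1(String fileName) {
        byte[] latin1 = fileName.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    // Percent-encode the UTF-8 bytes of the name, keeping only the
    // RFC 5987 attr-char set unescaped.
    static String encodeRfc5987(String value) {
        StringBuilder sb = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            char c = (char) (b & 0xFF);
            boolean attrChar = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || "!#$&+-.^_`|~".indexOf(c) >= 0;
            if (attrChar) {
                sb.append(c);
            } else {
                sb.append(String.format("%%%02X", b & 0xFF));
            }
        }
        return sb.toString();
    }

    // Build "type; filename=...; filename*=..." with the ISO-8859-1 fallback.
    // Simplification: quotes and backslashes in the name are not escaped.
    static String contentDisposition(String type, String fileName) {
        return type + "; filename=\"" + toLatin1(fileName) + "\""
                + "; filename*=UTF-8''" + encodeRfc5987(fileName);
    }

    public static void main(String[] args) {
        // "a" followed by U+0308 (combining diaeresis), as in the example above.
        System.out.println(contentDisposition("inline", "umla\u0308ut.jpg"));
    }
}
```

Note that a precomposed "ä" (U+00E4) is itself an ISO-8859-1 character and would pass through the fallback unchanged; it is the decomposed form (the %CC%88 sequence in the example) that ends up as "?".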

Please let me know if this is still unclear.  If it's clear now, let me know whether you'd like me to update the bug description accordingly or just let it go :).

 

[0] - [https://jackrabbit.apache.org/oak/docs/features/direct-binary-access.html]

[1] - [https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/api/binary/BinaryDownloadOptions.html]


> Filename with special characters in direct download URI Content-Disposition are causing HTTP 400 errors from Azure
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: OAK-9304
>                 URL: https://issues.apache.org/jira/browse/OAK-9304
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: blob-cloud, blob-cloud-azure, blob-plugins
>    Affects Versions: 1.36.0
>            Reporter: Matt Ryan
>            Assignee: Matt Ryan
>            Priority: Major
>
> When generating a direct download URI for a filename with certain non-standard characters in the name, it can cause the resulting signed URI to be considered invalid by some blob storage services (Azure in particular).  This can lead to blob storage services being unable to service the URI request.
> For example, a filename of "Ausländische.jpg" currently requests a Content-Disposition header that looks like:
> {noformat}
> inline; filename="Ausländische.jpg"; filename*=UTF-8''Ausla%CC%88ndische.jpg {noformat}
> Azure blob storage service fails trying to parse a URI with that Content-Disposition header specification in the query string.  It instead should look like:
> {noformat}
> inline; filename="Ausla?ndische.jpg"; filename*=UTF-8''Ausla%CC%88ndische.jpg {noformat}
>  
> The "filename" portion of the Content-Disposition needs to consist of ISO-8859-1 characters, per [https://tools.ietf.org/html/rfc6266#section-4.3] in this paragraph:
> {quote}The parameters "filename" and "filename*" differ only in that "filename*" uses the encoding defined in RFC5987, allowing the use of characters not present in the ISO-8859-1 character set (ISO-8859-1).
> {quote}
> Note that the purpose of this ticket is to address compatibility issues with blob storage services, not to ensure ISO-8859-1 compatibility.  However, by encoding the "filename" portion using standard Java character set encoding conversion (e.g. {{Charsets.ISO_8859_1.encode(fileName)}}), we can generate a URI that works with Azure, delivers the proper Content-Disposition header in responses, and generates the proper client result (meaning, the correct name for the downloaded file).


