Posted to users@nifi.apache.org by Mike Thomsen <mi...@gmail.com> on 2019/03/07 16:04:00 UTC

PutS3Object failing when using non-Latin characters in filename

I kept the default for the object key, which is ${filename}, and some of our
files have non-Latin characters. The error from AWS is:

> The request signature we calculated does not match the signature you
provided. Check your key and signing method. (Service: Amazon S3; Status
Code: 403; Error Code: SignatureDoesNotMatch; Request ID: <REQ_ID>; S3
Extended Request ID: <SOME_UUID>)

There are no obvious encoding issues on the NiFi end; it renders the
characters just fine in the flowfile viewer. Could UTF-8 characters be the
problem here? Any mitigation suggestions?

Thanks,

Mike

Re: PutS3Object failing when using non-Latin characters in filename

Posted by Andy LoPresto <al...@apache.org>.
Thanks Peter, this is very helpful. 

Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Mar 14, 2019, at 10:37 PM, Peter Turcsanyi <tu...@cloudera.com> wrote:
> 
> PR is available with the fix: https://github.com/apache/nifi/pull/3373


Re: PutS3Object failing when using non-Latin characters in filename

Posted by Peter Turcsanyi <tu...@cloudera.com>.
PR is available with the fix: https://github.com/apache/nifi/pull/3373

On Thu, Mar 7, 2019 at 11:47 PM Peter Turcsanyi <tu...@cloudera.com>
wrote:

> I managed to reproduce the S3 Put error, but FetchFile works fine in my
> local dev env, so I think the two issues are unrelated.
>
> I looked into the code of PutS3Object and also checked the HTTP request
> sent to S3:
> The filename is being set in the Content-Disposition HTTP header and if it
> contains national characters, then the encoding will be wrong. It seems it
> needs to be URL encoded (related RFC: RFC 6266
> <https://tools.ietf.org/html/rfc6266>, however I didn't dig into it in
> detail). I've checked it with S3 and it works fine.
> You can find my proposed fix here
> <https://github.com/apache/nifi/compare/master...turcsanyip:s3_put_i18n?expand=1>.
> If there are no objections, I'll file an issue / open a pull request
> tomorrow.
>
> Regards,
> Peter

Re: PutS3Object failing when using non-Latin characters in filename

Posted by Peter Turcsanyi <tu...@cloudera.com>.
I managed to reproduce the S3 Put error, but FetchFile works fine in my
local dev env, so I think the two issues are unrelated.

I looked into the PutS3Object code and also checked the HTTP request sent
to S3:
The filename is set in the Content-Disposition HTTP header, and if it
contains non-ASCII characters, the encoding will be wrong. It seems the
value needs to be URL-encoded (related RFC: RFC 6266
<https://tools.ietf.org/html/rfc6266>, although I didn't dig into it in
detail). I've checked the URL-encoded version with S3 and it works fine.
You can find my proposed fix here
<https://github.com/apache/nifi/compare/master...turcsanyip:s3_put_i18n?expand=1>.
If there are no objections, I'll file an issue / open a pull request
tomorrow.

Regards,
Peter
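
As a rough illustration of the URL-encoding idea described above, here is a
minimal Java sketch. It is not the code from the proposed fix; the class and
method names are hypothetical, and the "filename*" form is only one reading
of RFC 6266 that this thread does not confirm PutS3Object uses. The point is
simply that the header value becomes pure ASCII, so the signed bytes and the
bytes on the wire cannot disagree about encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical sketch, not the NiFi implementation.
class ContentDispositionSketch {

    // Percent-encode a filename as UTF-8 so the result contains only ASCII.
    public static String percentEncode(String filename) {
        try {
            // URLEncoder targets HTML form encoding: it emits '+' for a space
            // and leaves '*' untouched, so both are patched to percent escapes.
            return URLEncoder.encode(filename, "UTF-8")
                    .replace("+", "%20")
                    .replace("*", "%2A");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Assumption: the extended "filename*" parameter form defined by RFC 6266.
    public static String extendedDisposition(String filename) {
        return "attachment; filename*=UTF-8''" + percentEncode(filename);
    }

    public static void main(String[] args) {
        String name = "r\u00e9sum\u00e9 2019.txt"; // "résumé 2019.txt"
        System.out.println(percentEncode(name));
        System.out.println(extendedDisposition(name));
    }
}
```

For the sample name, the first line printed is r%C3%A9sum%C3%A9%202019.txt.
Whether to send the bare encoded value or the "filename*" form depends on
what the consumer of the object expects.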

On Thu, Mar 7, 2019 at 7:47 PM Andy LoPresto <al...@apache.org> wrote:

> The fact that the signatures don’t match may indicate some kind of
> character normalization or encoding difference with the way AWS handles the
> input. There is an existing Jira for handling filenames with orthographic
> marks in FetchFile [1].
>
> [1] https://issues.apache.org/jira/browse/NIFI-6051

Re: PutS3Object failing when using non-Latin characters in filename

Posted by Andy LoPresto <al...@apache.org>.
The fact that the signatures don’t match may indicate some kind of character normalization or encoding difference with the way AWS handles the input. There is an existing Jira for handling filenames with orthographic marks in FetchFile [1]. 

[1] https://issues.apache.org/jira/browse/NIFI-6051


Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
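
To make the normalization point above concrete, the following Java sketch
shows how one visible filename can map to two different UTF-8 byte sequences
depending on its Unicode normalization form. It is only an illustration of
the general mechanism, not a diagnosis of this failure; the class name and
sample filename are made up. Any signature computed over one byte sequence
will not match a signature computed over the other.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.Arrays;

// Hypothetical sketch: same rendered text, different bytes.
class NormalizationSketch {

    // UTF-8 bytes of the string after applying the given normalization form.
    public static byte[] utf8(String s, Normalizer.Form form) {
        return Normalizer.normalize(s, form).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "caf\u00e9.txt"; // "café.txt"
        byte[] nfc = utf8(name, Normalizer.Form.NFC); // precomposed U+00E9
        byte[] nfd = utf8(name, Normalizer.Form.NFD); // 'e' plus combining U+0301
        System.out.println("NFC length: " + nfc.length);
        System.out.println("NFD length: " + nfd.length);
        System.out.println("same bytes: " + Arrays.equals(nfc, nfd));
    }
}
```

The NFC form is 9 bytes and the NFD form is 10 bytes, so the two arrays are
not equal even though both render as the same name.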
