Posted to user@flume.apache.org by "Guyle M. Taber" <gu...@gmtech.net> on 2019/04/25 21:32:52 UTC

flume to s3 - renaming .tmp files fails.

I’m using a new flume sink to S3 that doesn’t seem to successfully close out .tmp files created in S3 buckets. So I’m essentially getting a whole lot of unclosed .tmp files.

The IAM role being used has full S3 permissions to this bucket.

Here’s the flume error when trying to rename and close the file (under the hood, a cp & delete):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
25 Apr 2019 21:20:01,522 ERROR [hdfs-S3Sink-call-runner-7] (org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObjects:1151)  - button/qa1-event1/: "AccessDenied" - Access Denied
25 Apr 2019 21:20:01,675 WARN  [hdfs-S3Sink-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter.close:427)  - failed to rename() file (s3a://my-bucket-name/button/qa1-event1/FlumeData.1556226600899.tmp). Exception follows.
java.nio.file.AccessDeniedException: s3a://my-bucket-name/button/qa1-event1/FlumeData.1556226600899.tmp: getFileStatus on s3a://my-bucket-name/button/qa1-event1./FlumeData.1556226600899.tmp: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 68D5110FD4C0C1DA), S3 Extended Request ID: xk9gb+hY0NUrqAQS9NQW6dDZL35p0I4SpO57b/o9YZucaVtuk1igtPfYaQZTgEfPrHepyxm6+q8=
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:177)
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:120)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:1886)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:1855)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1799)
	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1418)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:2529)
	at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:654)
	at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:651)
	at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:701)
	at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
	at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:698)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
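
If I understand the s3a connector right, that rename is really a copy plus a delete at the S3 level, so the equivalent manual operation would be roughly the following (AWS CLI, reusing the key names from the log above, purely for illustration):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Copy the .tmp object to its final key...
aws s3api copy-object --bucket my-bucket-name \
    --copy-source my-bucket-name/button/qa1-event1/FlumeData.1556226600899.tmp \
    --key button/qa1-event1/FlumeData.1556226600899

# ...then delete the original .tmp object.
aws s3api delete-object --bucket my-bucket-name \
    --key button/qa1-event1/FlumeData.1556226600899.tmp
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~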

Here’s my S3 sink.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
agent.sinks.S3Sink.type = hdfs
agent.sinks.S3Sink.hdfs.path = s3a://my-bucket-name/
agent.sinks.S3Sink.channel = S3Channel
agent.sinks.S3Sink.hdfs.fileType = DataStream
agent.sinks.S3Sink.hdfs.writeFormat = Text
agent.sinks.S3Sink.hdfs.rollCount = 0
agent.sinks.S3Sink.hdfs.rollSize = 0
agent.sinks.S3Sink.hdfs.batchSize = 10000
agent.sinks.S3Sink.hdfs.rollInterval = 600
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Re: flume to s3 - renaming .tmp files fails.

Posted by iain wright <ia...@gmail.com>.
Ah, that's a bummer. I was suspicious of s3a, as I had the same problem years ago (and the same solution!), but figured it had matured or been resolved by now.

Thanks for reporting back for the benefit of the community!

Cheers,
-- 
Iain Wright


Re: flume to s3 - renaming .tmp files fails.

Posted by "Guyle M. Taber" <gu...@gmtech.net>.
Well I think I have this figured out.

I had to change the sink to use “s3n” instead of “s3a”, and add the AWS access key and secret key to core-site.xml to make “s3n” work properly, along with a change to the bucket policy to allow the IAM user (i.e. the keys) full perms on that bucket. I’m no longer getting .tmp files; the files in the S3 bucket are now fully closed.

I really wanted to leverage IAM server roles for this, but s3a wasn’t closing the files in the bucket. s3n closes them properly, but requires static AWS keys.
Kind of a bummer, but it works.
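
For anyone who hits the same thing later, the core-site.xml additions were along these lines (these are the old s3n credential property names; the values here are placeholders):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<!-- Static credentials for the s3n connector (placeholder values). -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The sink path then changes from s3a:// to s3n:// to match.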


Re: flume to s3 - renaming .tmp files fails.

Posted by iain wright <ia...@gmail.com>.
Thanks, the bucket policy looks good...

Are any Deny statements present in the policies attached to event-server-s3-role?

Are you able to run aws s3 mv s3://my-bucket-name/file.tmp s3://my-bucket-name/file from the instance? Not sure if that's a valid test for what flume/aws-sdk are doing underneath, but it might reveal something.
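
Something like this from the instance, assuming the AWS CLI is installed there and picks up the instance role (file names made up):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Upload a scratch object, then attempt the same rename flume performs.
aws s3 cp /tmp/probe.txt s3://my-bucket-name/button/qa1-event1/probe.tmp
aws s3 mv s3://my-bucket-name/button/qa1-event1/probe.tmp \
          s3://my-bucket-name/button/qa1-event1/probe.txt
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~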



-- 
Iain Wright


Re: flume to s3 - renaming .tmp files fails.

Posted by "Guyle M. Taber" <gu...@gmtech.net>.
Here you go. Names changed to protect the innocent. :-)

{
    "Version": "2012-10-17",
    "Id": "Policy1527067401408",
    "Statement": [
        {
            "Sid": "AccessForEventServerRole",
            "Effect": "Allow",
            "Principal": {
                "AWS":   "arn:aws:iam::XXXXXXXXXXXX:role/event-server-s3-role"
            },
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::my-bucket-name",
                "arn:aws:s3:::my-bucket-name/*"
            ]
        }
    ]
}



Re: flume to s3 - renaming .tmp files fails.

Posted by iain wright <ia...@gmail.com>.
Could you please share the IAM policy attached to the role granting permission to the bucket, as well as the bucket policy, if one is present?

Please remove or obfuscate bucket names, account number, etc.

The policy on the role or bucket is almost certainly missing a permission; rename requires a few odd actions in addition to the usual ones, i.e. (a full statement is sketched below):

"s3:GetObjectVersion", "s3:DeleteObjectVersion",
"s3:PutObjectAcl", 
"s3:GetObjectAcl"
 

Sent from my iPhone
