Posted to issues@flume.apache.org by "Tomas Zulberti (Jira)" <ji...@apache.org> on 2020/05/21 21:53:00 UTC
[jira] [Created] (FLUME-3369) Corrupt S3 File
Tomas Zulberti created FLUME-3369:
-------------------------------------
Summary: Corrupt S3 File
Key: FLUME-3369
URL: https://issues.apache.org/jira/browse/FLUME-3369
Project: Flume
Issue Type: Bug
Affects Versions: 1.9.0
Reporter: Tomas Zulberti
We are using Flume to read from Kinesis and upload the files to S3. The problem is that the generated gzip files are corrupt. A corrupt file is one of the following:
- an empty file
- a file that is not a valid gzip file.
I checked FLUME-2967, and we are already using native libraries. The stack trace I have is as follows:
{code}
21 May 2020 01:09:27,342 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:246) - Creating s3a://mycompany/foobar/year=2020/month=05/day=21/hour=01/172_17_5_220_bids4.1590023192733.gz
21 May 2020 01:09:27,393 INFO [hdfs-bids4-call-runner-19] (org.apache.flume.sink.hdfs.AbstractHDFSWriter.reflectGetNumCurrentReplicas:190) - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 not available; fsOut=org.apache.hadoop.fs.s3a.S3ABlockOutputStream; err=java.lang.NoSuchMethodException: org.apache.hadoop.fs.s3a.S3ABlockOutputStream.getNumCurrentReplicas()
21 May 2020 01:09:27,396 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.getRefIsClosed:197) - isFileClosed() is not available in the version of the distributed filesystem being used. Flume will not attempt to re-close files if the close fails on the first attempt
21 May 2020 01:09:27,614 WARN [hdfs-bids4-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter$CloseHandler.close:348) - Closing file: s3a://mycompany/foobar/year=2020/month=05/day=21/hour=01/172_17_5_220_foobar.1590022801143.gz failed. Will retry again in 180 seconds.
java.io.IOException: Filesystem {bucket=dw.jampp.com, key='foobar/year=2020/month=05/day=21/hour=01/172_17_5_220_foobar.1590022801143.gz'} closed
at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.checkOpen(S3ABlockOutputStream.java:224)
at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.write(S3ABlockOutputStream.java:270)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:83)
at org.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:92)
at org.apache.flume.sink.hdfs.HDFSCompressedDataStream.close(HDFSCompressedDataStream.java:149)
at org.apache.flume.sink.hdfs.BucketWriter$3.call(BucketWriter.java:319)
at org.apache.flume.sink.hdfs.BucketWriter$3.call(BucketWriter.java:316)
at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:727)
at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:724)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
21 May 2020 01:09:27,656 WARN [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.append:613) - Caught IOException writing to HDFSWriter (write beyond end of stream). Closing file (s3a://mycompany/foobar/year=2020/month=05/day=21/hour=01/172_17_5_220_foobar.1590023192733.gz) and rethrowing exception.
21 May 2020 01:09:27,658 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink$1.run:393) - Writer callback called.
21 May 2020 01:09:27,658 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.doClose:438) - Closing s3a://mycompany/foobar/year=2020/month=05/day=21/hour=01/172_17_5_220_foobar.1590023192733.gz
{code}
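To tell the two failure modes described above apart (empty file versus invalid gzip), something like the following could be run against a local copy of an affected object. This is only a minimal sketch: the class name `GzipCheck` and the `Result` enum are hypothetical, and it assumes the object has already been downloaded from S3. It uses only `java.util.zip` from the JDK and reads the stream to the end, so a truncated file should also be reported as invalid.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of Flume or Hadoop.
final class GzipCheck {
    enum Result { EMPTY, INVALID, VALID }

    // Classifies raw file contents as empty, invalid gzip, or valid gzip.
    static Result check(byte[] data) {
        if (data.length == 0) {
            return Result.EMPTY;
        }
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[8192];
            // Drain the stream so the trailer (CRC32 and size) is verified too.
            while (in.read(buf) != -1) {
                // discard decompressed bytes
            }
            return Result.VALID;
        } catch (IOException e) {
            // Covers a bad header (ZipException) and a truncated stream (EOFException).
            return Result.INVALID;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java GzipCheck <file.gz> [more files...]");
            return;
        }
        for (String arg : args) {
            byte[] data = Files.readAllBytes(Paths.get(arg));
            System.out.println(arg + ": " + check(data));
        }
    }
}
```

Passing the paths of downloaded `.gz` files as arguments prints one classification per file, which makes it easier to correlate corrupt objects with the close failures in the log.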