You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flume.apache.org by "Rob Slifka (JIRA)" <ji...@apache.org> on 2014/07/16 22:21:05 UTC

[jira] [Comment Edited] (FLUME-1228) flume-ng fails while writing to S3 sink

    [ https://issues.apache.org/jira/browse/FLUME-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064029#comment-14064029 ] 

Rob Slifka edited comment on FLUME-1228 at 7/16/14 8:20 PM:
------------------------------------------------------------

This bit us pretty hard today.  We're on Flume 1.4 via CDH 4.4.  Events began backing up on two of our three nodes with the following logs:

{noformat}
16 Jul 2014 19:04:18,366 WARN  [hdfs-s3sink-beacons-cold-call-runner-8] (org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest:393)  - Response '/2014-07-16%2F1900%2Fnginx-ip-10-80-54-224.ec2.internal.1405537200166' - Unexpected response code 404, expected 200
16 Jul 2014 19:04:18,377 WARN  [hdfs-s3sink-beacons-cold-call-runner-8] (org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest:393)  - Response '/2014-07-16%2F1900%2Fnginx-ip-10-80-54-224.ec2.internal.1405537200166_%24folder%24' - Unexpected response code 404, expected 200
16 Jul 2014 19:04:18,508 WARN  [hdfs-s3sink-beacons-cold-call-runner-8] (org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest:393)  - Response '/2014-07-16%2F1900' - Unexpected response code 404, expected 200
16 Jul 2014 19:04:18,518 WARN  [hdfs-s3sink-beacons-cold-call-runner-8] (org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest:393)  - Response '/2014-07-16%2F1900_%24folder%24' - Unexpected response code 404, expected 200
16 Jul 2014 19:04:21,632 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:219)  - Creating s3n://ACCESS_KEY:SECRET@PRODUCTION_BUCKET/2014-07-16/1900/nginx-ip-10-80-54-224.ec2.internal.1405537200167.tmp
16 Jul 2014 19:04:21,734 WARN  [hdfs-s3sink-beacons-cold-call-runner-0] (org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest:393)  - Response '/2014-07-16%2F1900%2Fnginx-ip-10-80-54-224.ec2.internal.1405537200167.tmp' - Unexpected response code 404, expected 200
16 Jul 2014 19:04:21,745 WARN  [hdfs-s3sink-beacons-cold-call-runner-0] (org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest:393)  - Response '/2014-07-16%2F1900%2Fnginx-ip-10-80-54-224.ec2.internal.1405537200167.tmp_%24folder%24' - Unexpected response code 404, expected 200
16 Jul 2014 19:04:21,761 INFO  [hdfs-s3sink-beacons-cold-call-runner-0] (org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.<init>:182)  - OutputStream for key '2014-07-16/1900/nginx-ip-10-80-54-224.ec2.internal.1405537200167.tmp' writing to tempfile '/tmp/hadoop-root/s3/output-2622317072117705911.tmp'
16 Jul 2014 19:04:21,761 INFO  [hdfs-s3sink-beacons-cold-call-runner-0] (org.apache.flume.sink.hdfs.AbstractHDFSWriter.reflectGetNumCurrentReplicas:185)  - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 not available; fsOut=org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream; err=java.lang.NoSuchMethodException: org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.getNumCurrentReplicas()
16 Jul 2014 19:04:39,254 INFO  [Log-BackgroundWorker-fileChannel] (org.apache.flume.channel.file.EventQueueBackingStoreFile.beginCheckpoint:214)  - Start checkpoint for /mnt/flume/checkpoint/checkpoint, elements to sync = 41091
16 Jul 2014 19:04:39,261 INFO  [Log-BackgroundWorker-fileChannel] (org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint:239)  - Updating checkpoint metadata: logWriteOrderID: 1408437260518, queueSize: 600, queueHead: 705519
16 Jul 2014 19:04:39,272 INFO  [Log-BackgroundWorker-fileChannel] (org.apache.flume.channel.file.Log.writeCheckpoint:1019)  - Updated checkpoint for file: /mnt/flume/data/log-648 position: 1599320032 logWriteOrderID: 1408437260518
16 Jul 2014 19:04:39,347 INFO  [Log-BackgroundWorker-hotFileChannel] (org.apache.flume.channel.file.EventQueueBackingStoreFile.beginCheckpoint:214)  - Start checkpoint for /mnt/flume/checkpoint-hot/checkpoint, elements to sync = 40628
16 Jul 2014 19:04:39,358 INFO  [Log-BackgroundWorker-hotFileChannel] (org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint:239)  - Updating checkpoint metadata: logWriteOrderID: 1408437261136, queueSize: 999972, queueHead: 783009
16 Jul 2014 19:04:39,368 INFO  [Log-BackgroundWorker-hotFileChannel] (org.apache.flume.channel.file.Log.writeCheckpoint:1019)  - Updated checkpoint for file: /mnt/flume/data-hot/log-668 position: 1621862329 logWriteOrderID: 1408437261136
16 Jul 2014 19:04:39,988 INFO  [pool-5-thread-1] (org.apache.flume.channel.file.Log.roll:944)  - Roll start /mnt/flume/data-hot
16 Jul 2014 19:04:39,989 INFO  [pool-5-thread-1] (org.apache.flume.channel.file.LogFile$Writer.<init>:188)  - Opened /mnt/flume/data-hot/log-669
16 Jul 2014 19:04:39,993 INFO  [pool-5-thread-1] (org.apache.flume.channel.file.LogFile$Writer.close:327)  - Closing /mnt/flume/data-hot/log-668
16 Jul 2014 19:04:39,993 INFO  [pool-5-thread-1] (org.apache.flume.channel.file.Log.roll:959)  - Roll end
16 Jul 2014 19:04:52,026 INFO  [pool-5-thread-1] (org.apache.flume.channel.file.Log.roll:944)  - Roll start /mnt/flume/data
16 Jul 2014 19:04:52,026 INFO  [pool-5-thread-1] (org.apache.flume.channel.file.LogFile$Writer.<init>:188)  - Opened /mnt/flume/data/log-649
16 Jul 2014 19:04:52,030 INFO  [pool-5-thread-1] (org.apache.flume.channel.file.LogFile$Writer.close:327)  - Closing /mnt/flume/data/log-648
16 Jul 2014 19:04:52,030 INFO  [pool-5-thread-1] (org.apache.flume.channel.file.Log.roll:959)  - Roll end
16 Jul 2014 19:04:57,917 INFO  [hdfs-s3sink-beacons-cold-call-runner-0] (org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.close:217)  - OutputStream for key '2014-07-16/1900/nginx-ip-10-80-54-224.ec2.internal.1405537200167.tmp' closed. Now beginning upload
{noformat}

When we restarted Flume, things began draining (slowly) though we're still seeing these errors.  If a restart solves it or if folks like Dennis have figured it out some other way, I suspect they may not be discussing it.  A definitive position or fix would be awesome!  We're happy to help if you need something verified.


was (Author: rslifka):
How are you evaluating whether or not people use Flume to sync to S3?  Is there usage information being collected?  Usage surveys, etc.?  If you're not hearing much then I would suggest it's either not used or working well (enough) :)

With respect to this particular issue, it bit us pretty hard today.  We're on Flume 1.4 via CDH 4.4.  Events began backing up on two of our three nodes with the following logs:

{noformat}
16 Jul 2014 19:04:18,366 WARN  [hdfs-s3sink-beacons-cold-call-runner-8] (org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest:393)  - Response '/2014-07-16%2F1900%2Fnginx-ip-10-80-54-224.ec2.internal.1405537200166' - Unexpected response code 404, expected 200
16 Jul 2014 19:04:18,377 WARN  [hdfs-s3sink-beacons-cold-call-runner-8] (org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest:393)  - Response '/2014-07-16%2F1900%2Fnginx-ip-10-80-54-224.ec2.internal.1405537200166_%24folder%24' - Unexpected response code 404, expected 200
16 Jul 2014 19:04:18,508 WARN  [hdfs-s3sink-beacons-cold-call-runner-8] (org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest:393)  - Response '/2014-07-16%2F1900' - Unexpected response code 404, expected 200
16 Jul 2014 19:04:18,518 WARN  [hdfs-s3sink-beacons-cold-call-runner-8] (org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest:393)  - Response '/2014-07-16%2F1900_%24folder%24' - Unexpected response code 404, expected 200
16 Jul 2014 19:04:21,632 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:219)  - Creating s3n://ACCESS_KEY:SECRET@PRODUCTION_BUCKET/2014-07-16/1900/nginx-ip-10-80-54-224.ec2.internal.1405537200167.tmp
16 Jul 2014 19:04:21,734 WARN  [hdfs-s3sink-beacons-cold-call-runner-0] (org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest:393)  - Response '/2014-07-16%2F1900%2Fnginx-ip-10-80-54-224.ec2.internal.1405537200167.tmp' - Unexpected response code 404, expected 200
16 Jul 2014 19:04:21,745 WARN  [hdfs-s3sink-beacons-cold-call-runner-0] (org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest:393)  - Response '/2014-07-16%2F1900%2Fnginx-ip-10-80-54-224.ec2.internal.1405537200167.tmp_%24folder%24' - Unexpected response code 404, expected 200
16 Jul 2014 19:04:21,761 INFO  [hdfs-s3sink-beacons-cold-call-runner-0] (org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.<init>:182)  - OutputStream for key '2014-07-16/1900/nginx-ip-10-80-54-224.ec2.internal.1405537200167.tmp' writing to tempfile '/tmp/hadoop-root/s3/output-2622317072117705911.tmp'
16 Jul 2014 19:04:21,761 INFO  [hdfs-s3sink-beacons-cold-call-runner-0] (org.apache.flume.sink.hdfs.AbstractHDFSWriter.reflectGetNumCurrentReplicas:185)  - FileSystem's output stream doesn't support getNumCurrentReplicas; --HDFS-826 not available; fsOut=org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream; err=java.lang.NoSuchMethodException: org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.getNumCurrentReplicas()
16 Jul 2014 19:04:39,254 INFO  [Log-BackgroundWorker-fileChannel] (org.apache.flume.channel.file.EventQueueBackingStoreFile.beginCheckpoint:214)  - Start checkpoint for /mnt/flume/checkpoint/checkpoint, elements to sync = 41091
16 Jul 2014 19:04:39,261 INFO  [Log-BackgroundWorker-fileChannel] (org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint:239)  - Updating checkpoint metadata: logWriteOrderID: 1408437260518, queueSize: 600, queueHead: 705519
16 Jul 2014 19:04:39,272 INFO  [Log-BackgroundWorker-fileChannel] (org.apache.flume.channel.file.Log.writeCheckpoint:1019)  - Updated checkpoint for file: /mnt/flume/data/log-648 position: 1599320032 logWriteOrderID: 1408437260518
16 Jul 2014 19:04:39,347 INFO  [Log-BackgroundWorker-hotFileChannel] (org.apache.flume.channel.file.EventQueueBackingStoreFile.beginCheckpoint:214)  - Start checkpoint for /mnt/flume/checkpoint-hot/checkpoint, elements to sync = 40628
16 Jul 2014 19:04:39,358 INFO  [Log-BackgroundWorker-hotFileChannel] (org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint:239)  - Updating checkpoint metadata: logWriteOrderID: 1408437261136, queueSize: 999972, queueHead: 783009
16 Jul 2014 19:04:39,368 INFO  [Log-BackgroundWorker-hotFileChannel] (org.apache.flume.channel.file.Log.writeCheckpoint:1019)  - Updated checkpoint for file: /mnt/flume/data-hot/log-668 position: 1621862329 logWriteOrderID: 1408437261136
16 Jul 2014 19:04:39,988 INFO  [pool-5-thread-1] (org.apache.flume.channel.file.Log.roll:944)  - Roll start /mnt/flume/data-hot
16 Jul 2014 19:04:39,989 INFO  [pool-5-thread-1] (org.apache.flume.channel.file.LogFile$Writer.<init>:188)  - Opened /mnt/flume/data-hot/log-669
16 Jul 2014 19:04:39,993 INFO  [pool-5-thread-1] (org.apache.flume.channel.file.LogFile$Writer.close:327)  - Closing /mnt/flume/data-hot/log-668
16 Jul 2014 19:04:39,993 INFO  [pool-5-thread-1] (org.apache.flume.channel.file.Log.roll:959)  - Roll end
16 Jul 2014 19:04:52,026 INFO  [pool-5-thread-1] (org.apache.flume.channel.file.Log.roll:944)  - Roll start /mnt/flume/data
16 Jul 2014 19:04:52,026 INFO  [pool-5-thread-1] (org.apache.flume.channel.file.LogFile$Writer.<init>:188)  - Opened /mnt/flume/data/log-649
16 Jul 2014 19:04:52,030 INFO  [pool-5-thread-1] (org.apache.flume.channel.file.LogFile$Writer.close:327)  - Closing /mnt/flume/data/log-648
16 Jul 2014 19:04:52,030 INFO  [pool-5-thread-1] (org.apache.flume.channel.file.Log.roll:959)  - Roll end
16 Jul 2014 19:04:57,917 INFO  [hdfs-s3sink-beacons-cold-call-runner-0] (org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.close:217)  - OutputStream for key '2014-07-16/1900/nginx-ip-10-80-54-224.ec2.internal.1405537200167.tmp' closed. Now beginning upload
{noformat}


> flume-ng fails while writing to S3 sink
> ---------------------------------------
>
>                 Key: FLUME-1228
>                 URL: https://issues.apache.org/jira/browse/FLUME-1228
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.2.0, v1.4.0
>            Reporter: Prashanth Jonnalagadda
>            Assignee: Ashish Paliwal
>            Priority: Critical
>
> flume-ng (version 1.2.0) fails while writing to S3 sink since it gets back 404 response code. The files with data is created on S3 though.
> Hadoop version used is 0.20.2-cdh3u4
> Followed all the steps documented in the jira - https://issues.cloudera.org/browse/FLUME-66
> and also I tried swapping out hadoop-core.jar that comes with CDH, with emr-hadoop-core-0.20.jar that comes with EC2 hadoop cluster instance as suggested in the following blog post - http://eric.lubow.org/2011/system-administration/distributed-flume-setup-with-an-s3-sink/ but the issue still remains.
> Following errors are seen in the log:
> 2012-05-25 05:04:28,889 WARN httpclient.RestS3Service: Response '/flumedata%2FFlumeData.122585423857995.tmp_%24folder%24' - Unexpected response code 404, expected 200
> 2012-05-25 05:04:28,964 INFO s3native.NativeS3FileSystem: OutputStream for key 'flumedata/FlumeData.122585423857995.tmp' writing to tempfile '/tmp/hadoop-root/s3/output-8042215269186280519.tmp'
> 2012-05-25 05:04:28,972 INFO s3native.NativeS3FileSystem: OutputStream for key 'flumedata/FlumeData.122585423857995.tmp' closed. Now beginning upload
> 2012-05-25 05:04:29,044 INFO s3native.NativeS3FileSystem: OutputStream for key 'flumedata/FlumeData.122585423857995.tmp' upload complete
> 2012-05-25 05:04:29,074 INFO hdfs.BucketWriter: Renaming s3n://flume-ng/flumedata/FlumeData.122585423857995.tmp to s3n://flume-ng/flumedata/FlumeData.122585423857995
> 2012-05-25 05:04:29,097 WARN httpclient.RestS3Service: Response '/flumedata%2FFlumeData.122585423857995' - Unexpected response code 404, expected 200
> 2012-05-25 05:04:29,120 WARN httpclient.RestS3Service: Response '/flumedata%2FFlumeData.122585423857995_%24folder%24' - Unexpected response code 404, expected 200
> 2012-05-25 05:04:29,203 WARN httpclient.RestS3Service: Response '/flumedata' - Unexpected response code 404, expected 200
> 2012-05-25 05:04:29,224 WARN httpclient.RestS3Service: Response '/flumedata_%24folder%24' - Unexpected response code 404, expected 200
> 2012-05-25 05:04:29,608 INFO hdfs.BucketWriter: Creating s3n://flume-ng/flumedata/FlumeData.122585423857996.tmp
> 2012-05-25 05:04:29,720 WARN httpclient.RestS3Service: Response '/flumedata%2FFlumeData.122585423857996.tmp' - Unexpected response code 404, expected 200
> 2012-05-25 05:04:29,748 WARN httpclient.RestS3Service: Response '/flumedata%2FFlumeData.122585423857996.tmp_%24folder%24' - Unexpected response code 404, expected 200
> 2012-05-25 05:04:29,791 INFO s3native.NativeS3FileSystem: OutputStream for key 'flumedata/FlumeData.122585423857996.tmp' writing to tempfile '/tmp/hadoop-root/s3/output-2477068572058013384.tmp'
> 2012-05-25 05:04:29,793 INFO s3native.NativeS3FileSystem: OutputStream for key 'flumedata/FlumeData.122585423857996.tmp' closed. Now beginning upload
> 2012-05-25 05:04:29,828 INFO s3native.NativeS3FileSystem: OutputStream for key 'flumedata/FlumeData.122585423857996.tmp' upload complete



--
This message was sent by Atlassian JIRA
(v6.2#6252)