Posted to user@flume.apache.org by Denis Lowe <de...@gmail.com> on 2013/03/02 00:57:34 UTC

process failed - java.lang.OutOfMemoryError

We observed the following error:
01 Mar 2013 21:37:24,807 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:460)  - process failed
java.lang.OutOfMemoryError
        at org.apache.hadoop.io.compress.zlib.ZlibCompressor.init(Native Method)
        at org.apache.hadoop.io.compress.zlib.ZlibCompressor.<init>(ZlibCompressor.java:222)
        at org.apache.hadoop.io.compress.GzipCodec$GzipZlibCompressor.<init>(GzipCodec.java:159)
        at org.apache.hadoop.io.compress.GzipCodec.createCompressor(GzipCodec.java:109)
        at org.apache.hadoop.io.compress.GzipCodec.createOutputStream(GzipCodec.java:92)
        at org.apache.flume.sink.hdfs.HDFSCompressedDataStream.open(HDFSCompressedDataStream.java:70)
        at org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:216)
        at org.apache.flume.sink.hdfs.BucketWriter.access$000(BucketWriter.java:53)
        at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:172)
        at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:170)
        at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:143)
        at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:170)
        at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:364)
        at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
        at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)

Unfortunately the error does not state whether the failure was due to lack
of heap, PermGen, or direct memory.

Looking at the system memory we could see that we were using 3GB of 7GB
(i.e. less than half of the physical memory was in use).

Using the VisualVM profiler we could see that we had not maxed out the
heap: 75MB used of 131MB allocated.
PermGen was also fine: 16MB used of 27MB allocated.

Buffer usage is as follows:
Direct memory: < 50MB (this gets freed after each GC)
Mapped memory: count 9, 144MB (always stays constant)
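
For what it's worth, the same numbers can be cross-checked from the command
line (assuming a full JDK is installed on the box; <flume-pid> below is just
a placeholder):

# heap and PermGen utilisation, sampled every 5 seconds
# (S0/S1/E/O columns are heap spaces, P is PermGen)
jstat -gcutil <flume-pid> 5000

# current heap configuration and usage of the running agent
jmap -heap <flume-pid>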

I'm assuming -XX:MaxDirectMemorySize limits direct buffer memory usage,
NOT mapped buffer memory?

The other thing we noticed was that after a restart the Flume process "RES"
size starts at around 200MB and then, over a period of about a week, grows
to 3GB, after which we observed the above error.
Unfortunately we cannot see where this 3GB of memory is being used when
profiling with VisualVM and JConsole (max heap size is set to 256MB) - there
definitely appears to be a slow memory leak.
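
To confirm the growth is happening outside the Java heap, one rough check is
to record the resident set size and native memory mappings over time
(<flume-pid> below is a placeholder):

# resident set size (KB) of the Flume agent
ps -o rss= -p <flume-pid>

# full mapping breakdown; the 'total' line at the bottom tracks overall
# resident memory, and growth outside the heap mapping would suggest a
# native leak rather than a heap leak
pmap -x <flume-pid>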

Flume is the only process running on this server:
64-bit CentOS
java version "1.6.0_27" (64-bit)

The Flume collector is configured with 8 file channels writing to S3 using
the HDFS sink (8 upstream servers are pushing events to 2 downstream
collectors).

Each of the 8 channels/sinks is configured as follows:
## impression source
agent.sources.impressions.type = avro
agent.sources.impressions.bind = 0.0.0.0
agent.sources.impressions.port = 5001
agent.sources.impressions.channels = impressions-s3-channel
## impression  channel
agent.channels.impressions-s3-channel.type = file
agent.channels.impressions-s3-channel.checkpointDir = /mnt/flume-ng/checkpoint/impressions-s3-channel
agent.channels.impressions-s3-channel.dataDirs = /mnt/flume-ng/data1/impressions-s3-channel,/mnt/flume-ng/data2/impressions-s3-channel
agent.channels.impressions-s3-channel.maxFileSize = 210000000
agent.channels.impressions-s3-channel.capacity = 2000000
agent.channels.impressions-s3-channel.checkpointInterval = 300000
agent.channels.impressions-s3-channel.transactionCapacity = 10000
# impression s3 sink
agent.sinks.impressions-s3-sink.type = hdfs
agent.sinks.impressions-s3-sink.channel = impressions-s3-channel
agent.sinks.impressions-s3-sink.hdfs.path = s3n://KEY:SECRET_KEY@S3-PATH
agent.sinks.impressions-s3-sink.hdfs.filePrefix = impressions-%{collector-host}
agent.sinks.impressions-s3-sink.hdfs.callTimeout = 0
agent.sinks.impressions-s3-sink.hdfs.rollInterval = 3600
agent.sinks.impressions-s3-sink.hdfs.rollSize = 450000000
agent.sinks.impressions-s3-sink.hdfs.rollCount = 0
agent.sinks.impressions-s3-sink.hdfs.codeC = gzip
agent.sinks.impressions-s3-sink.hdfs.fileType = CompressedStream
agent.sinks.impressions-s3-sink.hdfs.batchSize = 100

I am using flume-ng 1.3.1 with the following parameters:
JAVA_OPTS="-Xms64m -Xmx256m -Xss128k -XX:MaxDirectMemorySize=256m
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -verbose:gc -Xloggc:/mnt/logs/flume-ng/gc.log"
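
Since GC logging is already enabled, the gc.log should also show whether the
heap was under pressure leading up to the failure, e.g.:

# count of full collections, plus the last few entries before the OOME
grep -c "Full GC" /mnt/logs/flume-ng/gc.log
tail -n 20 /mnt/logs/flume-ng/gc.log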

We have 2 collectors running and they both fail at pretty much the same
time.

So from what I can see there appears to be a slow memory leak in the HDFS
sink, but I have no idea how to track this down or what alternative
configuration I could use to prevent this from happening again.

Any ideas would be greatly appreciated.

Re: process failed - java.lang.OutOfMemoryError

Posted by Brock Noland <br...@cloudera.com>.
Try turning on HeapDumpOnOutOfMemoryError so we can peek at the heap dump.  
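
A minimal sketch of how that might look in flume-env.sh (the dump path is
only an example, point it at a volume with enough free space):

JAVA_OPTS="$JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/mnt/logs/flume-ng/heapdump.hprof"

# or grab a dump from the running agent before it falls over
# (<flume-pid> is a placeholder)
jmap -dump:format=b,file=/mnt/logs/flume-ng/heapdump.hprof <flume-pid>

If the dump comes back looking healthy, that would suggest the failing
allocation in ZlibCompressor.init (a native method) is running out of native
memory rather than Java heap.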

-- 
Brock Noland
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

