You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flume.apache.org by "thinker0 (JIRA)" <ji...@apache.org> on 2012/07/06 16:03:34 UTC

[jira] [Updated] (FLUME-1326) OutOfMemoryError in HDFSSink

     [ https://issues.apache.org/jira/browse/FLUME-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

thinker0 updated FLUME-1326:
----------------------------

    Comment: was deleted

(was: {panel}
JAVA_OPTS="-server -Xms100m -Xmx200m"
JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
{panel}

{code}

2012-07-06T22:27:00.413+0900: [GC [1 CMS-initial-mark: 136575K(136576K)] 198009K(198016K), 0.0350390 secs] [Times: user=0.03 sys=0.00, real=0.04 secs] 
2012-07-06T22:27:00.449+0900: [Full GC [CMS2012-07-06T22:27:00.608+0900: [CMS-concurrent-mark: 0.159/0.160 secs] [Times: user=0.16 sys=0.00, real=0.16 secs] 
 (concurrent mode failure): 136575K->136575K(136576K), 0.5808260 secs] 198014K->197895K(198016K), [CMS Perm : 20375K->20375K(34192K)] icms_dc=100 , 0.5809580 secs] [Times: user=0.58 sys=0.00, real=0.58 secs] 
2012-07-06T22:27:01.030+0900: [GC [1 CMS-initial-mark: 136575K(136576K)] 197897K(198016K), 0.0354160 secs] [Times: user=0.04 sys=0.00, real=0.03 secs] 
Heap
 par new generation   total 61440K, used 61435K [0x00000000ee600000, 0x00000000f28a0000, 0x00000000f28a0000)
  eden space 54656K, 100% used [0x00000000ee600000, 0x00000000f1b60000, 0x00000000f1b60000)
  from space 6784K,  99% used [0x00000000f2200000, 0x00000000f289efa8, 0x00000000f28a0000)
  to   space 6784K,   0% used [0x00000000f1b60000, 0x00000000f1b60000, 0x00000000f2200000)
 concurrent mark-sweep generation total 136576K, used 136575K [0x00000000f28a0000, 0x00000000fae00000, 0x00000000fae00000)
 concurrent-mark-sweep perm gen total 34192K, used 20375K [0x00000000fae00000, 0x00000000fcf64000, 0x0000000100000000)

2012-07-06T22:27:01.069+0900: [Full GC [CMS2012-07-06T22:27:01.226+0900: [CMS-concurrent-mark: 0.159/0.160 secs] [Times: user=0.16 sys=0.00, real=0.17 secs] 
 (concurrent mode failure): 136575K->136575K(136576K), 0.5796240 secs] 198013K->197886K(198016K), [CMS Perm : 20375K->20375K(34192K)] icms_dc=100 , 0.5797590 secs] [Times: user=0.58 sys=0.00, real=0.58 secs] 
2012-07-06T22:27:01.650+0900: [GC [1 CMS-initial-mark: 136575K(136576K)] 197967K(198016K), 0.0351150 secs] [Times: user=0.04 sys=0.00, real=0.03 secs] 
2012-07-06T22:27:01.685+0900: [Full GC [CMS2012-07-06T22:27:01.845+0900: [CMS-concurrent-mark: 0.159/0.160 secs] [Times: user=0.16 sys=0.00, real=0.16 secs] 
 (concurrent mode failure): 136575K->136575K(136576K), 0.6167080 secs] 198008K->197894K(198016K), [CMS Perm : 20375K->20375K(34192K)] icms_dc=100 , 0.6170540 secs] [Times: user=0.62 sys=0.00, real=0.62 secs] 
2012-07-06T22:27:02.303+0900: [GC [1 CMS-initial-mark: 136575K(136576K)] 197918K(198016K), 0.0352080 secs] [Times: user=0.03 sys=0.00, real=0.04 secs] 
2012-07-06T22:27:02.339+0900: [Full GC [CMS2012-07-06T22:27:02.498+0900: [CMS-concurrent-mark: 0.159/0.160 secs] [Times: user=0.16 sys=0.00, real=0.16 secs] 
 (concurrent mode failure): 136575K->136575K(136576K), 0.5811630 secs] 198014K->197913K(198016K), [CMS Perm : 20375K->20375K(34192K)] icms_dc=100 , 0.5812980 secs] [Times: user=0.58 sys=0.00, real=0.58 secs] 

....



2012-07-06T22:45:21.899+0900: [GC [1 CMS-initial-mark: 175551K(175552K)] 198883K(201920K), 0.0150690 secs] [Times: user=0.01 sys=0.00, real=0.02 secs] 
Heap
 par new generation   total 26368K, used 25104K [0x00000000ee600000, 0x00000000f0290000, 0x00000000f0290000)
  eden space 23488K, 100% used [0x00000000ee600000, 0x00000000efcf0000, 0x00000000efcf0000)
  from space 2880K,  56% used [0x00000000effc0000, 0x00000000f01543d0, 0x00000000f0290000)
  to   space 2880K,   0% used [0x00000000efcf0000, 0x00000000efcf0000, 0x00000000effc0000)
 concurrent mark-sweep generation total 175552K, used 175551K [0x00000000f0290000, 0x00000000fae00000, 0x00000000fae00000)
 concurrent-mark-sweep perm gen total 34104K, used 20352K [0x00000000fae00000, 0x00000000fcf4e000, 0x0000000100000000)

2012-07-06T22:45:21.923+0900: [Full GC [CMS2012-07-06T22:45:22.120+0900: [CMS-concurrent-mark: 0.205/0.206 secs] [Times: user=0.22 sys=0.01, real=0.20 secs] 
 (concurrent mode failure): 175551K->175551K(175552K), 0.7612190 secs] 201913K->198933K(201920K), [CMS Perm : 20352K->20352K(34104K)] icms_dc=100 , 0.7613540 secs] [Times: user=0.76 sys=0.00, real=0.77 secs] 
2012-07-06T22:45:22.685+0900: [GC [1 CMS-initial-mark: 175551K(175552K)] 199035K(201920K), 0.0151000 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2012-07-06T22:45:22.708+0900: [Full GC [CMS2012-07-06T22:45:22.906+0900: [CMS-concurrent-mark: 0.204/0.205 secs] [Times: user=0.23 sys=0.00, real=0.21 secs] 
 (concurrent mode failure): 175551K->175551K(175552K), 0.7694170 secs] 201919K->199083K(201920K), [CMS Perm : 20352K->20352K(34104K)] icms_dc=100 , 0.7695470 secs] [Times: user=0.77 sys=0.00, real=0.77 secs] 
2012-07-06T22:45:23.483+0900: [GC [1 CMS-initial-mark: 175551K(175552K)] 200691K(201920K), 0.0157280 secs] [Times: user=0.02 sys=0.00, real=0.02 secs] 
2012-07-06T22:45:23.502+0900: [Full GC [CMS2012-07-06T22:45:23.716+0900: [CMS-concurrent-mark: 0.216/0.217 secs] [Times: user=0.22 sys=0.00, real=0.22 secs] 
 (concurrent mode failure): 175552K->175551K(175552K), 0.7632440 secs] 201920K->199203K(201920K), [CMS Perm : 20352K->20352K(34104K)] icms_dc=100 , 0.7633760 secs] [Times: user=0.76 sys=0.00, real=0.77 secs] 
2012-07-06T22:45:24.268+0900: [GC [1 CMS-initial-mark: 175551K(175552K)] 199625K(201920K), 0.0156730 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
2012-07-06T22:45:24.295+0900: [Full GC [CMS2012-07-06T22:45:24.490+0900: [CMS-concurrent-mark: 0.205/0.206 secs] [Times: user=0.22 sys=0.00, real=0.21 secs] 
 (concurrent mode failure): 175551K->175551K(175552K), 0.7598450 secs] 201919K->199192K(201920K), [CMS Perm : 20352K->20352K(34104K)] icms_dc=100 , 0.7599740 secs] [Times: user=0.76 sys=0.00, real=0.76 secs] 
2012-07-06T22:45:25.058+0900: [GC [1 CMS-initial-mark: 175551K(175552K)] 199986K(201920K), 0.0155300 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2012-07-06T22:45:25.080+0900: [Full GC [CMS2012-07-06T22:45:25.279+0900: [CMS-concurrent-mark: 0.205/0.205 secs] [Times: user=0.22 sys=0.00, real=0.21 secs] 
 (concurrent mode failure): 175551K->175551K(175552K), 0.7627070 secs] 201913K->199371K(201920K), [CMS Perm : 20352K->20352K(34104K)] icms_dc=100 , 0.7628390 secs] [Times: user=0.77 sys=0.00, real=0.76 secs] 
2012-07-06T22:45:25.844+0900: [GC [1 CMS-initial-mark: 175551K(175552K)] 199571K(201920K), 0.0154090 secs] [Times: user=0.01 sys=0.00, real=0.02 secs] 
2012-07-06T22:45:25.868+0900: [Full GC [CMS2012-07-06T22:45:26.065+0900: [CMS-concurrent-mark: 0.205/0.205 secs] [Times: user=0.22 sys=0.01, real=0.21 secs] 
 (concurrent mode failure): 175551K->175551K(175552K), 0.7624890 secs] 201912K->199463K(201920K), [CMS Perm : 20352K->20352K(34104K)] icms_dc=100 , 0.7626150 secs] [Times: user=0.77 sys=0.00, real=0.76 secs] 
{code})
    
> OutOfMemoryError in HDFSSink
> ----------------------------
>
>                 Key: FLUME-1326
>                 URL: https://issues.apache.org/jira/browse/FLUME-1326
>             Project: Flume
>          Issue Type: Bug
>    Affects Versions: v1.2.0
>            Reporter: Juhani Connolly
>            Priority: Critical
>
> We run a 3 node/1 collector test cluster pushing about 350events/sec per node... Not really high stress, but just something to evaluate flume with.
> Consistently our collector has been dying because of an OOMError killing the SinkRunner after running for about 30-40 hours(seems pretty consistent as we've had it 3 times now). 
> Suspected cause would be a memory leak somewhere in HdfsSink. The feeder nodes which run AvroSink instead of HdfsSink have been up and running for about a week without restarts.
> flume-load/act-wap02/2012-06-26-17.1340697637324.tmp, packetSize=65557, chunksPerPacket=127, bytesCurBlock=29731328
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 2012-06-26 17:12:56,080 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:411)] process failed
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at java.util.Arrays.copyOfRange(Arrays.java:3209)
>         at java.lang.String.<init>(String.java:215)
>         at java.lang.StringBuilder.toString(StringBuilder.java:430)
>         at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:306)
>         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:367)
>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:619)
> Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at java.util.Arrays.copyOfRange(Arrays.java:3209)
>         at java.lang.String.<init>(String.java:215)
>         at java.lang.StringBuilder.toString(StringBuilder.java:430)
>         at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:306)
>         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:367)
>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira