Posted to user@flume.apache.org by "Anantharaman, Srinatha (Contractor)" <Sr...@comcast.com> on 2017/07/26 15:00:31 UTC

Flume consumes all memory - { OutOfMemoryError: GC overhead limit exceeded }

Hi All,

Although I have set the -Xms and -Xmx values, Flume is consuming all available memory and eventually failing.

I have tried adding the above parameters on the command line, as below:


a.     /usr/hdp/current/flume-server/bin/flume-ng agent -c /etc/flume/conf -f /etc/flume/conf/flumeSolr.conf -n agent -Dproperty="-Xms1024m -Xmx4048m"

b.    /usr/hdp/current/flume-server/bin/flume-ng agent -c /etc/flume/conf -f /etc/flume/conf/flumeSolr.conf -n agent -Xms1024m -Xmx4048m

I have also tried setting them in the flume-env.sh file, as below:

export JAVA_OPTS="-Xms2048m -Xmx4048m -Dcom.sun.management.jmxremote -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"
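
As a sanity check, flume-ng only sources flume-env.sh from the conf directory passed with -c/--conf, so it is worth confirming the file actually lives there (the paths below are taken from the commands above):

# flume-env.sh must sit in the directory given via -c/--conf for
# JAVA_OPTS to take effect; confirm it is there and readable
ls -l /etc/flume/conf/flume-env.sh

# then start the agent against that same conf dir
/usr/hdp/current/flume-server/bin/flume-ng agent \
    -c /etc/flume/conf -f /etc/flume/conf/flumeSolr.conf -n agent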

I am using HDP 2.5 and Flume 1.5.2.2.5.

Kindly let me know how to resolve this issue.

Regards,
~Sri

RE: Flume consumes all memory - { OutOfMemoryError: GC overhead limit exceeded }

Posted by "Anantharaman, Srinatha (Contractor)" <Sr...@comcast.com>.
Iain,

I found the solution to my problem.
The issue was with writing to HDFS: while ingesting files into Solr, I was also copying each file to HDFS.
Now, as a separate process, I consolidate all the ingested files and store them on HDFS.
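
A sketch of what such a consolidation step might look like (illustrative only; the paths are taken from the config quoted below, and .COMPLETED is the spooldir source's default fileSuffix):

# Consolidate already-ingested files into one archive and push it to HDFS,
# instead of having Flume write every small file individually
SRC=/app/home/eventsvc/source/processed_emails2
OUT=/tmp/emails-$(date +%Y%m%d).tar.gz
find "$SRC" -name '*.COMPLETED' -print0 | tar -czf "$OUT" --null -T -
hdfs dfs -put "$OUT" /user/solr/emails/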

Thanks a ton for your help!!

Regards,
~Sri



From: iain wright [mailto:iainwrig@gmail.com]
Sent: Thursday, July 27, 2017 4:22 PM
To: user@flume.apache.org
Subject: Re: Flume consumes all memory - { OutOfMemoryError: GC overhead limit exceeded }

Definitely strange. I didn't see that you already had those set in your flume-env.sh; my mistake.

If you have room and you double the Xmx, do you still get the OOM/GC overhead error?

Might be interesting to see the metrics endpoint output over time compared with jvisualvm graph of the heap over the same period.
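
A simple capture loop makes that comparison easy, for example (assuming the HTTP monitoring port mentioned below; the interval is arbitrary):

# Sample the metrics endpoint once a minute with timestamps, so the output
# can be lined up against a jvisualvm heap graph over the same period
while true; do
  echo "=== $(date) ===" >> flume-metrics.log
  curl -s http://localhost:34548/metrics >> flume-metrics.log
  echo "" >> flume-metrics.log
  sleep 60
done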

How long does it take for the OOM to occur?

Are events making it through both sinks to their destinations?

Anything else interesting in the logs before the OOM?

Best,



--
Iain Wright

This email message is confidential, intended only for the recipient(s) named above and may contain information that is privileged, exempt from disclosure under applicable law. If you are not the intended recipient, do not disclose or disseminate the message to anyone except the intended recipient. If you have received this message in error, or are not the named recipient(s), please immediately notify the sender by return email, and delete all copies of this message.

On Wed, Jul 26, 2017 at 5:27 PM, Anantharaman, Srinatha (Contractor) <Sr...@comcast.com> wrote:
Iain,

Yes, I see the Java process with my Xms/Xmx sizes, like below:

//bin/java -Xms2048m -Xmx4048m -Dcom.sun.management.jmxremote -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xms1024m -Xmx4048m -cp /etc/flume/conf:/usr/hdp/2.3.4.0-3485/flume

Let me work on your other suggestions; I will keep you posted.

I appreciate your valuable time

Regards,
~Sri

From: iain wright [mailto:iainwrig@gmail.com]
Sent: Wednesday, July 26, 2017 5:24 PM

To: user@flume.apache.org
Subject: Re: Flume consumes all memory - { OutOfMemoryError: GC overhead limit exceeded }

Config seems sane.

If you run ps auxww | grep -i flume, do you see the java process started with your Xms/Xmx flags?

I increased the heap & added jmx by adding this to flume-env.sh in the flume conf dir:
JAVA_OPTS="-Xms2048m -Xmx3072m -Dcom.sun.management.jmxremote"

If you enable JMX you can get some more info about the heap allocation, and can also do a heap dump, with jvisualvm. It seems most likely those flags aren't getting to the JVM.
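
On a headless box, the JDK's jmap can stand in for jvisualvm (<pid> below is the agent's process id from the ps output):

# Histogram of live objects: a quick look at what is filling the heap
jmap -histo:live <pid> | head -30

# Full heap dump for offline analysis in jvisualvm or Eclipse MAT
jmap -dump:live,format=b,file=/tmp/flume-heap.hprof <pid>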

Regarding monitoring, you can add -Dflume.monitoring.type=HTTP -Dflume.monitoring.port=34548 to expose the metrics endpoint at http://<host>:34548/metrics
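
Put together, the start command with monitoring enabled would look something like this (the port is arbitrary; flume-ng passes -D options through to the JVM):

/usr/hdp/current/flume-server/bin/flume-ng agent \
    -c /etc/flume/conf -f /etc/flume/conf/flumeSolr.conf -n agent \
    -Dflume.monitoring.type=HTTP -Dflume.monitoring.port=34548

# then fetch and pretty-print the JSON metrics from another shell
curl -s http://localhost:34548/metrics | python -m json.tool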




--
Iain Wright


On Wed, Jul 26, 2017 at 1:24 PM, Anantharaman, Srinatha (Contractor) <Sr...@comcast.com> wrote:
Iain,

I am using a file channel. The source is spoolDir and the sinks are Solr and HDFS.
Please find my config below:

#Flume Configuration Starts

agent.sources = SpoolDirSrc
agent.channels = Channel1 Channel2
agent.sinks = SolrSink HDFSsink

# Configure Source

agent.sources.SpoolDirSrc.channels = Channel1 Channel2
agent.sources.SpoolDirSrc.type = spooldir
#agent.sources.SpoolDirSrc.spoolDir = /app/home/solr/sources_tmp2
#agent.sources.SpoolDirSrc.spoolDir = /app/home/eventsvc/source/processed_emails/
agent.sources.SpoolDirSrc.spoolDir = /app/home/eventsvc/source/processed_emails2/
agent.sources.SpoolDirSrc.basenameHeader = true
agent.sources.SpoolDirSrc.selector.type = replicating
#agent.sources.SpoolDirSrc.batchSize = 100000

agent.sources.SpoolDirSrc.fileHeader = true
#agent.sources.src1.fileSuffix = .COMPLETED
agent.sources.SpoolDirSrc.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder


# Use a channel that buffers events in file
#
agent.channels.Channel1.type = file
agent.channels.Channel2.type = file
agent.channels.Channel1.capacity = 5000
agent.channels.Channel2.capacity = 5000
agent.channels.Channel1.transactionCapacity = 5000
agent.channels.Channel2.transactionCapacity = 5000
agent.channels.Channel1.checkpointDir = /app/home/flume/.flume/file-channel/checkpoint1
agent.channels.Channel2.checkpointDir = /app/home/flume/.flume/file-channel/checkpoint2
agent.channels.Channel1.dataDirs = /app/home/flume/.flume/file-channel/data1
agent.channels.Channel2.dataDirs = /app/home/flume/.flume/file-channel/data2


#agent.channels.Channel.transactionCapacity = 10000


# Configure Solr Sink

agent.sinks.SolrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.SolrSink.morphlineFile = /etc/flume/conf/morphline.conf
agent.sinks.SolrSink.batchsize = 10
agent.sinks.SolrSink.batchDurationMillis = 10
agent.sinks.SolrSink.channel = Channel1
agent.sinks.SolrSink.morphlineId = morphline1
agent.sinks.SolrSink.tika.config = tikaConfig.xml
#agent.sinks.SolrSink.fileType = DataStream
#agent.sinks.SolrSink.hdfs.batchsize = 5
agent.sinks.SolrSink.rollCount = 0
agent.sinks.SolrSink.rollInterval = 0
#agent.sinks.SolrSink.rollsize = 100000000
agent.sinks.SolrSink.idleTimeout = 0
#agent.sinks.SolrSink.txnEventMax = 5000

# Configure HDFS Sink

agent.sinks.HDFSsink.channel = Channel2
agent.sinks.HDFSsink.type = hdfs
#agent.sinks.HDFSsink.hdfs.path = hdfs://codehdplak-po-r10p.sys.comcast.net:8020/user/solr/emails
agent.sinks.HDFSsink.hdfs.path = hdfs://codehann/user/solr/emails
#agent.sinks.HDFSsink.hdfs.fileType = DataStream
agent.sinks.HDFSsink.hdfs.fileType = CompressedStream
agent.sinks.HDFSsink.hdfs.batchsize = 1000
agent.sinks.HDFSsink.hdfs.rollCount = 0
agent.sinks.HDFSsink.hdfs.rollInterval = 0
agent.sinks.HDFSsink.hdfs.rollsize = 10485760
agent.sinks.HDFSsink.hdfs.idleTimeout = 0
agent.sinks.HDFSsink.hdfs.maxOpenFiles = 1
agent.sinks.HDFSsink.hdfs.filePrefix = %{basename}
agent.sinks.HDFSsink.hdfs.codeC = gzip


agent.sources.SpoolDirSrc.channels = Channel1 Channel2
agent.sinks.SolrSink.channel = Channel1
agent.sinks.HDFSsink.channel = Channel2
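
One note on the source settings above: BlobDeserializer buffers each spooled file in memory as a single event, and the replicating selector queues a copy on both channels, so heap use grows with the size of the largest files. If some emails are large, capping the blob size is worth considering (a sketch; maxBlobLength is the deserializer's per-event byte limit, 100 MB by default):

# Append a blob-size cap to the agent config, then restart the agent
cat >> /etc/flume/conf/flumeSolr.conf <<'EOF'
# cap each blob event at ~50 MB so one oversized file cannot exhaust the heap
agent.sources.SpoolDirSrc.deserializer.maxBlobLength = 50000000
EOF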

Morphline config:


solrLocator: {

collection : esearch

#zkHost : "127.0.0.1:9983"

#zkHost : "codesolr-as-r1p.sys.comcast.net:2181,codesolr-as-r2p.sys.comcast.net:2182"
#zkHost : "codesolr-as-r2p:2181"
zkHost : "codesolr-wc-r1p.sys.comcast.net:2181,codesolr-wc-r2p.sys.comcast.net:2181,codesolr-wc-r3p.sys.comcast.net:2181"

}

morphlines :
[

  {

    id : morphline1

    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]

    commands :
    [

      { detectMimeType { includeDefaultMimeTypes : true } }

      {

        solrCell {

          solrLocator : ${solrLocator}

          captureAttr : true

          lowernames : true

          capture : [_attachment_body, _attachment_mimetype, basename, content, content_encoding, content_type, file, meta,text]

          parsers : [ # { parser : org.apache.tika.parser.txt.TXTParser }

                    # { parser : org.apache.tika.parser.AutoDetectParser }
                      #{ parser : org.apache.tika.parser.asm.ClassParser }
                      #{ parser : org.gagravarr.tika.FlacParser }
                      #{ parser : org.apache.tika.parser.executable.ExecutableParser }
                      #{ parser : org.apache.tika.parser.font.TrueTypeParser }
                      #{ parser : org.apache.tika.parser.xml.XMLParser }
                      #{ parser : org.apache.tika.parser.html.HtmlParser }
                      #{ parser : org.apache.tika.parser.image.TiffParser }
                      # { parser : org.apache.tika.parser.mail.RFC822Parser }
                      #{ parser : org.apache.tika.parser.mbox.MboxParser, additionalSupportedMimeTypes : [message/x-emlx] }
                      #{ parser : org.apache.tika.parser.microsoft.OfficeParser }
                      #{ parser : org.apache.tika.parser.hdf.HDFParser }
                      #{ parser : org.apache.tika.parser.odf.OpenDocumentParser }
                      #{ parser : org.apache.tika.parser.pdf.PDFParser }
                      #{ parser : org.apache.tika.parser.rtf.RTFParser }
                      { parser : org.apache.tika.parser.txt.TXTParser }
                      #{ parser : org.apache.tika.parser.chm.ChmParser }
                    ]

         fmap : { content : text }
         }

      }
      { generateUUID { field : id } }

      { sanitizeUnknownSolrFields { solrLocator : ${solrLocator} } }


      { logDebug { format : "output record: {}", args : ["@{}"] } }

      { loadSolr: { solrLocator : ${solrLocator} } }

    ]

  }

]
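
Note that the logDebug command above only produces output when the morphline logger runs at DEBUG level; enabling it is a one-line change (assuming the stock log4j.properties in the conf dir):

# Raise the Kite morphline logger to DEBUG so logDebug output shows up
echo 'log4j.logger.org.kitesdk.morphline=DEBUG' >> /etc/flume/conf/log4j.properties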

I am not sure how I can get the Flume metrics.
Thank you for looking into it.

Regards,
~Sri

From: iain wright [mailto:iainwrig@gmail.com]
Sent: Wednesday, July 26, 2017 2:37 PM
To: user@flume.apache.org
Subject: Re: Flume consumes all memory - { OutOfMemoryError: GC overhead limit exceeded }

Hi Sri,

Are you using a memory channel? What source/sink?

Can you please paste/link your obfuscated config?

What does the metrics endpoint say in terms of channel size, sinkdrainsuccess, etc., for the period leading up to the OOM?

Best,
Iain

Sent from my iPhone

On Jul 26, 2017, at 8:00 AM, Anantharaman, Srinatha (Contractor) <Sr...@comcast.com> wrote:
Hi All,

Although I have set the -Xms and -Xmx values, Flume is consuming all available memory and eventually failing.

I have tried adding the above parameters on the command line, as below:


a.       /usr/hdp/current/flume-server/bin/flume-ng agent -c /etc/flume/conf -f /etc/flume/conf/flumeSolr.conf -n agent -Dproperty="-Xms1024m -Xmx4048m"

b.      /usr/hdp/current/flume-server/bin/flume-ng agent -c /etc/flume/conf -f /etc/flume/conf/flumeSolr.conf -n agent -Xms1024m -Xmx4048m

I have also tried setting them in the flume-env.sh file, as below:

export JAVA_OPTS="-Xms2048m -Xmx4048m -Dcom.sun.management.jmxremote -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"

I am using HDP 2.5 and Flume 1.5.2.2.5.

Kindly let me know how to resolve this issue.

Regards,
~Sri


