You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flume.apache.org by "Will McQueen (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2012/03/22 06:58:22 UTC

[jira] [Issue Comment Edited] (FLUME-1037) NETCAT handler theads terminate under stress test

    [ https://issues.apache.org/jira/browse/FLUME-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235369#comment-13235369 ] 

Will McQueen edited comment on FLUME-1037 at 3/22/12 5:57 AM:
--------------------------------------------------------------

I now understand what's going on. NetcatSource is not creating one event per newline-separated text. Instead, it's creating one event per connection (the event body ends when the connection closes).

The NetcatSource javadoc says:
"A netcat-like source that listens on a given port and turns each line of text into an event....The expectation is that the supplied data is newline separated text. Each line of text is turned into a Flume event and sent via the connected channel."

The handler code (in NetcatSource.NetcatSocketHandler.run) that reads from the socket is here:
        while (reader.read(buffer) != -1) {
          buffer.flip();

          logger.debug("read {} characters", buffer.remaining());

          counterGroup.addAndGet("characters.received",
              Long.valueOf(buffer.limit()));

          builder.append(buffer.array(), buffer.position(), buffer.length());
        }

As you can see, the buffer (which is 512 bytes in size) is continuously used to read from the server port until the connection is EOL'd by the client. That means that the StringBuilder continues to grow without bound (assuming that you have a continuous input source such as with "yes | nc localhost 1473"). So no matter what you specify for -Xmx in flume-env.sh, you're guaranteed to eventually go Out Of Memory (which will sever the connection, and cause the source to restart due to the LifecyleSupervisor's AlwaysRestartPolicy, allowing a new connection over the same port).

In case you're interested in my approach, I started with a very large heap size and I used YourKit Java Profiler (there are other profilers out there too like JProfiler) to take a snapshot of memory, identify the objects with the largest retained sizes (one of which turned out to be a single instance of class StringBuilder), then filtered-out "unreachable but not yet collected" objects, sorted the list by "Retained Size", and looked at top 3 hotspots: char[], Thread, and StringBuilder. There was only 1 object of class StringBuilder, and it had a retained size of over 500MB! So I drilled-down into that object (highlighting it and choosing "Selected Objects"), which showed the value of the StringBuilder object to be a single char[] of over 250 million 'y' characters. Going to <Paths from GC Roots> tab and clicking "start calculation" showed that the huge char[] was held by a local variable called "builder" in a thread named "netcat-handler-0"... which brought me to the while loop I pasted earlier.

I initially used NULL for the sink, but if you replace that with LOGGER then you'll see that only a single event is output... and it's output only after the client closes the connection to the netcat source. I tried the LOGGER sink when Hari showed me that when he used netcat source on his Mac, only a single event was output to the logger after closing the netcat connection.
                
      was (Author: will@cloudera.com):
    I now understand what's going on. NetcatSource is not creating one event per newline-separated text. Instead, it's creating one event per connection (the event body ends when the connection closes).

The NetcatSource javadoc says:
"A netcat-like source that listens on a given port and turns each line of text into an event....The expectation is that the supplied data is newline separated text. Each line of text is turned into a Flume event and sent via the connected channel."

The handler code (in NetcatSource.NetcatSocketHandler.run) that reads from the socket is here:
        while (reader.read(buffer) != -1) {
          buffer.flip();

          logger.debug("read {} characters", buffer.remaining());

          counterGroup.addAndGet("characters.received",
              Long.valueOf(buffer.limit()));

          builder.append(buffer.array(), buffer.position(), buffer.length());
        }

As you can see, the buffer (which is 512 bytes in size) is continuously used to read from the server port until the connection is EOL'd by the client. That means that the StringBuilder continues to grow without bound (assuming that you have a continuous input source such as with "yes | nc localhost 1473"). So no matter what you specify for -Xmx in flume-env.sh, you're guaranteed to eventually go Out Of Memory (which will sever the connection, and cause the source to restart due to the LifecyleSupervisor's AlwaysRestartPolicy, allowing a new connection over the same port).

In case you're interested in my approach, I started with a very large heap size and I used YourKit Java Profiler (there are other profilers out there too like JProfiler) to take a snapshot of memory, identify the objects with the largest retained sizes (one of which turned out to be a single instance of class StringBuilder), then filtered-out "unreachable but not yet collected" objects, sorted the list by "Retained Size", and looked at top 3 hotspots: char[], Thread, and StringBuilder. There was only 1 object of class StringBuilder, and it had a retained size of over 500MB! So I drilled-down into that object (highlighting it and choosing "Selected Objects"), which showed the value of the StringBuilder object to be a single char[] of over 250 million 'y' characters. Going to <Paths from GC Roots> tab and clicking "start calculation" showed that the huge char[] was held by a local variable called "builder" in a thread named "netcat-handler-0"... which brought me to the while loop I pasted earlier.

I initially used NULL for the sink, but if you replace that with LOGGER then you'll see that only a single event is output... and it's output only after the client closes the connection to the netcat source.
                  
> NETCAT handler theads terminate under stress test
> -------------------------------------------------
>
>                 Key: FLUME-1037
>                 URL: https://issues.apache.org/jira/browse/FLUME-1037
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.2.0
>         Environment: [CentOS 6.2 64-bit]
>            Reporter: Will McQueen
>
> Steps:
> 1. Use a props file such as the following:
> \# a = agent
> \# r = source
> \# c = channel
> \# k = sink
> a1.sources = r1
> a1.channels = c1
> a1.sinks = k1
> \# ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> \# ===CHANNELS===
> a1.channels.c1.type = MEMORY
> \# ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> 2. Set the FLUME_CONF_DIR to point to your conf dir
> [will@localhost flume-1.2.0-incubating-SNAPSHOT]$ export FLUME_CONF_DIR=/home/will/git/apache/flume/flume-1.2.0-incubating-SNAPSHOT/conf
> 3. Create a flume-env.sh file
> [will@localhost flume-1.2.0-incubating-SNAPSHOT]$ cp conf/flume-env.sh.template conf/flume-env.sh
> 4. Adjust the memory size within flume-env.sh (this file will be automatically sourced when calling bin/flume-ng, but only if you've specified the FLUME_CONF_DIR env var)
> (here, I went to the extreme and I set the min and max heap to 1GB. I also specified a YourKit profiler agent)
> Sample contents of flume-env.sh:
> export JAVA_OPTS="-Xms1024m -Xmx1024m -agentpath:/home/will/tools/yjp-10.0.6/bin/linux-x86-64/libyjpagent.so=tracing,noj2ee"
> 5. Run the flume NG agent:
> bin/flume-ng node --conf conf --conf-file conf/a1.properties --name a1
> 6. Open-up 10 terminal windows (on the same host) to connect to the netcat server port. Sent continuous output in each terminal. I chose to use the command:
>      yes | nc localhost 1473
> The "yes" unix cmd will continuously output 'y' char, followed by newline char. If you use YourKit and go into the Threads view, you'll see that after a while (possibly need to wait up to 10 mins) after a netcat handler thread has continuously been alternating between Runnable and Blocked states (blocking due to org.apache.klog4j.Category.log(..), but that's beside the point), the netcat handler thread enters a continuous wait state for exactly 1 minute, and then terminates (while its associated 'yes | nc localhost 1473' command is still running).
> I haven't done further analysis. My first thought was a thread safety issue. Note that there are no property file reconfigurations done during this test -- I leave the props file alone.
> I welcome your ideas/comments. I initially ran this test with the default -Xmx20m but it ran out of memory. For a future test I might lower the Xmx/Xms from 1GB to maybe 128MB.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira