You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by "Masanobu Horiyama (JIRA)" <ji...@apache.org> on 2015/07/07 22:10:04 UTC
[jira] [Created] (AVRO-1696) Handshake request is not handled
causing an OutOfMemoryError
Masanobu Horiyama created AVRO-1696:
---------------------------------------
Summary: Handshake request is not handled causing an OutOfMemoryError
Key: AVRO-1696
URL: https://issues.apache.org/jira/browse/AVRO-1696
Project: Avro
Issue Type: Bug
Components: java
Affects Versions: 1.7.4, 1.8.0
Environment: Flume Agent : 1.6.0
OS : Mac OS X 10.7.5 and CentOS release 6.6 2.6.32-504.1.3.el6.x86_64
avro : 1.7.4
avro-ipc : 1.7.4
JDK: 1.6.0_65 and 1.7.0-45
Flume Client - NettyAvroRpcClient
flume-ng-sdk : 1.6.0
OS : Mac OS X 10.7.5
avro : 1.7.4
avro-ipc : 1.7.4
JDK: 1.6.0_65
The agent config:
{noformat}
# Define a memory channel called ch1 on agent1
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000
# Define an Avro source called avro-source1 on agent1 and tell it
# to bind to 0.0.0.0:41414. Connect it to channel ch1.
agent1.sources.avro-source1.channels = ch1
agent1.sources.avro-source1.type = avro
agent1.sources.avro-source1.bind = 0.0.0.0
agent1.sources.avro-source1.port = 41414
# Define a logger sink that simply logs all events it receives
# and connect it to the other end of the same channel.
agent1.sinks.log-sink1.channel = ch1
#agent1.sinks.log-sink1.type = logger
agent1.sinks.log-sink1.type = null
agent1.sinks.log-sink1.batchSize = 10
# Finally, now that we've defined all of our components, tell
# agent1 which ones we want to activate.
agent1.channels = ch1
agent1.sources = avro-source1
agent1.sinks = log-sink1
{noformat}
The client config:
batch-size = 1
connect-timeout = 10s
request-timout = 10s
Reporter: Masanobu Horiyama
A handshake request is not read because connection.isConnected() == true here:
https://github.com/apache/avro/blob/trunk/lang/java/ipc/src/main/java/org/apache/avro/ipc/Responder.java#L208
This should be an invalid state, but we see this happening in production which causes the agent to have OOM errors. Seems to happen when there are rapid client disconnects and connects or connection reset by peer and broken pipe errors.
Here is the base64 encoded form of buffers.get(0).array() when this occurs:
{noformat}
hqra4sRUdMD+k//Q8jUKZQCGqtrixFR0wP6T/9DyNQplAgAADGFwcGVuZA==
{noformat}
which appears to be the serialized handshake info.
The handshake is not read because this code is skipped:
https://github.com/apache/avro/blob/trunk/lang/java/ipc/src/main/java/org/apache/avro/ipc/Responder.java#L210
When not read by the handshakeReader, the BinaryDecoder's position in the internal byte array is not advanced (to 19).
Causing this code:
https://github.com/apache/avro/blob/trunk/lang/java/ipc/src/main/java/org/apache/avro/ipc/Responder.java#L124
to use index 0 of the handshake byte array as the size of the map it is expecting to deserialize.
At this point:
https://github.com/apache/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java#L308
The size of the hash map is determined to be 1452339317379, or 640371331 when cast to an int here:
https://github.com/apache/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java#L311
and we get an OOM error:
{noformat}
2015-06-29 15:30:24,590 (New I/O worker #4) [WARN - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.exceptionCaught(NettyServer.java:201)] Unexpected exception from downstream.
java.lang.OutOfMemoryError: Java heap space
at java.util.HashMap.<init>(HashMap.java:187)
at java.util.HashMap.<init>(HashMap.java:199)
at org.apache.avro.generic.GenericDatumReader.newMap(GenericDatumReader.java:330)
at org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:239)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
at org.apache.avro.ipc.Responder.respond(Responder.java:124)
at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.messageReceived(NettyServer.java:188)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:173)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:695)
{noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)