You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Flavio Pompermaier <po...@okkam.it> on 2017/10/11 10:48:38 UTC

Flink 1.3.2 Netty Exception

Hi to all,
we wrote a small JUnit test to reproduce a memory issue we have in a Flink
job (that seems related to Netty) . At some point, usually around the 28th
loop, the job fails with the following exception (actually we never faced
that in production but maybe is related to the memory issue somehow...):

Caused by: java.lang.IllegalAccessError:
org/apache/flink/runtime/io/network/netty/NettyMessage
at
io.netty.util.internal.__matchers__.org.apache.flink.runtime.io.network.netty.NettyMessageMatcher.match(NoOpTypeParameterMatcher.java)
at
io.netty.channel.SimpleChannelInboundHandler.acceptInboundMessage(SimpleChannelInboundHandler.java:95)
at
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:102)
... 16 more

The github project is https://github.com/okkam-it/flink-memory-leak and the
JUnit test is contained in the MemoryLeakTest class (within src/main/test).

Thanks in advance for any support,
Flavio

Re: Flink 1.3.2 Netty Exception

Posted by Flavio Pompermaier <po...@okkam.it>.
Any update on this? Do you want me to create a JIRA issue for this bug?

On 11 Oct 2017 17:14, "Ufuk Celebi" <uc...@apache.org> wrote:

@Chesnay: Recycling of network resources happens after the tasks go
into state FINISHED. Since we are submitting new jobs in a local loop
here it can easily happen that the new job is submitted before enough
buffers are available again. At least, previously that was the case.

I'm CC'ing Nico who refactored the network buffer distribution
recently and who might have more details about this specific error
message.

@Nico: another question is why there seem to be more buffers available
but we don't assign them. I'm referring to this part of the error
message "5691 of 32768 bytes...".

On Wed, Oct 11, 2017 at 2:54 PM, Chesnay Schepler <ch...@apache.org>
wrote:
> I can confirm that the issue is reproducible with the given test, from the
> command-line and IDE.
>
> While cutting down the test case, by replacing the outputformat with a
> DiscardingOutputFormat and the JDBCInputFormat with a simple collection, i
> stumbled onto a new Exception after ~200 iterations:
>
> org.apache.flink.runtime.client.JobExecutionException: Job execution
failed.
>
> Caused by: java.io.IOException: Insufficient number of network buffers:
> required 4, but only 1 available. The total number of network buffers is
> currently set to 5691 of 32768 bytes each. You can increase this number by
> setting the configuration keys 'taskmanager.network.memory.fraction',
> 'taskmanager.network.memory.min', and 'taskmanager.network.memory.max'.
>       at
> org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.
createBufferPool(NetworkBufferPool.java:195)
>       at
> org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(
NetworkEnvironment.java:186)
>       at org.apache.flink.runtime.taskmanager.Task.run(Task.java:602)
>       at java.lang.Thread.run(Thread.java:745)
>
>
> On 11.10.2017 12:48, Flavio Pompermaier wrote:
>
> Hi to all,
> we wrote a small JUnit test to reproduce a memory issue we have in a Flink
> job (that seems related to Netty) . At some point, usually around the 28th
> loop, the job fails with the following exception (actually we never faced
> that in production but maybe is related to the memory issue somehow...):
>
> Caused by: java.lang.IllegalAccessError:
> org/apache/flink/runtime/io/network/netty/NettyMessage
> at
> io.netty.util.internal.__matchers__.org.apache.flink.
runtime.io.network.netty.NettyMessageMatcher.match(
NoOpTypeParameterMatcher.java)
> at
> io.netty.channel.SimpleChannelInboundHandler.acceptInboundMessage(
SimpleChannelInboundHandler.java:95)
> at
> io.netty.channel.SimpleChannelInboundHandler.channelRead(
SimpleChannelInboundHandler.java:102)
> ... 16 more
>
> The github project is https://github.com/okkam-it/flink-memory-leak and
the
> JUnit test is contained in the MemoryLeakTest class (within
src/main/test).
>
> Thanks in advance for any support,
> Flavio
>
>

Re: Flink 1.3.2 Netty Exception

Posted by Ufuk Celebi <uc...@apache.org>.
@Chesnay: Recycling of network resources happens after the tasks go
into state FINISHED. Since we are submitting new jobs in a local loop
here it can easily happen that the new job is submitted before enough
buffers are available again. At least, previously that was the case.

I'm CC'ing Nico who refactored the network buffer distribution
recently and who might have more details about this specific error
message.

@Nico: another question is why there seem to be more buffers available
but we don't assign them. I'm referring to this part of the error
message "5691 of 32768 bytes...".

On Wed, Oct 11, 2017 at 2:54 PM, Chesnay Schepler <ch...@apache.org> wrote:
> I can confirm that the issue is reproducible with the given test, from the
> command-line and IDE.
>
> While cutting down the test case, by replacing the outputformat with a
> DiscardingOutputFormat and the JDBCInputFormat with a simple collection, i
> stumbled onto a new Exception after ~200 iterations:
>
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>
> Caused by: java.io.IOException: Insufficient number of network buffers:
> required 4, but only 1 available. The total number of network buffers is
> currently set to 5691 of 32768 bytes each. You can increase this number by
> setting the configuration keys 'taskmanager.network.memory.fraction',
> 'taskmanager.network.memory.min', and 'taskmanager.network.memory.max'.
> 	at
> org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:195)
> 	at
> org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:186)
> 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:602)
> 	at java.lang.Thread.run(Thread.java:745)
>
>
> On 11.10.2017 12:48, Flavio Pompermaier wrote:
>
> Hi to all,
> we wrote a small JUnit test to reproduce a memory issue we have in a Flink
> job (that seems related to Netty) . At some point, usually around the 28th
> loop, the job fails with the following exception (actually we never faced
> that in production but maybe is related to the memory issue somehow...):
>
> Caused by: java.lang.IllegalAccessError:
> org/apache/flink/runtime/io/network/netty/NettyMessage
> at
> io.netty.util.internal.__matchers__.org.apache.flink.runtime.io.network.netty.NettyMessageMatcher.match(NoOpTypeParameterMatcher.java)
> at
> io.netty.channel.SimpleChannelInboundHandler.acceptInboundMessage(SimpleChannelInboundHandler.java:95)
> at
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:102)
> ... 16 more
>
> The github project is https://github.com/okkam-it/flink-memory-leak and the
> JUnit test is contained in the MemoryLeakTest class (within src/main/test).
>
> Thanks in advance for any support,
> Flavio
>
>

Re: Flink 1.3.2 Netty Exception

Posted by Chesnay Schepler <ch...@apache.org>.
I can confirm that the issue is reproducible with the given test, from 
the command-line and IDE.

While cutting down the test case, by replacing the outputformat with a 
DiscardingOutputFormat and the JDBCInputFormat with a simple collection, 
i stumbled onto a new Exception after ~200 iterations:

org.apache.flink.runtime.client.JobExecutionException: Job execution failed.

Caused by: java.io.IOException: Insufficient number of network buffers: required 4, but only 1 available. The total number of network buffers is currently set to 5691 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.network.memory.fraction', 'taskmanager.network.memory.min', and 'taskmanager.network.memory.max'.
	at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:195)
	at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:186)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:602)
	at java.lang.Thread.run(Thread.java:745)


On 11.10.2017 12:48, Flavio Pompermaier wrote:
> Hi to all,
> we wrote a small JUnit test to reproduce a memory issue we have in a 
> Flink job (that seems related to Netty) . At some point, usually 
> around the 28th loop, the job fails with the following exception 
> (actually we never faced that in production but maybe is related to 
> the memory issue somehow...):
>
> Caused by: java.lang.IllegalAccessError: 
> org/apache/flink/runtime/io/network/netty/NettyMessage
> at 
> io.netty.util.internal.__matchers__.org.apache.flink.runtime.io.network.netty.NettyMessageMatcher.match(NoOpTypeParameterMatcher.java)
> at 
> io.netty.channel.SimpleChannelInboundHandler.acceptInboundMessage(SimpleChannelInboundHandler.java:95)
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:102)
> ... 16 more
>
> The github project is https://github.com/okkam-it/flink-memory-leak 
> and the JUnit test is contained in the MemoryLeakTest class (within 
> src/main/test).
>
> Thanks in advance for any support,
> Flavio