Posted to issues@flink.apache.org by "zhijiang (JIRA)" <ji...@apache.org> on 2019/07/08 15:29:00 UTC

[jira] [Comment Edited] (FLINK-13100) Fix the unexpected IOException during FileBufferReader#nextBuffer

    [ https://issues.apache.org/jira/browse/FLINK-13100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880465#comment-16880465 ] 

zhijiang edited comment on FLINK-13100 at 7/8/19 3:28 PM:
----------------------------------------------------------

Yes, the previous SpilledSubpartitionView really had a deadlock issue based on two buffers. Now we have a similar issue, but it throws an IOException instead.

Your understanding of the above two parts is right, but there is another case:
 * When the view is created, it is enqueued into the netty thread loop for the first time, because at that point it has both data and credit, as you mentioned.
 * Then a buffer is fetched from the view and writeAndFlush() is called. If the view is still in the available queue, it must have credit, because the next buffer is always read ahead and therefore available for reading. This case is not a problem, because we only trigger the next writeAndFlush() after the previous one is done, as long as the current queue is not empty.
 * But if the view is not available (no credit) when writeAndFlush() is called, the queue in the netty thread loop becomes empty. Before the future of that writeAndFlush() is done, the netty thread can still process an AddCredit message, which makes the view available again and re-enqueues it, triggering the next writeAndFlush(). That means the next write is triggered while the previous buffer has not yet been recycled, which causes the problem. The sequence looks like this: first writeAndFlush() pending -> AddCredit (triggers second writeAndFlush()) -> first writeAndFlush() finishes and recycles the buffer.
 * I think this was introduced by the improvement of reusing the flink buffer in the netty stack since release-1.5. writeAndFlush() can be broken into two steps, write and flush. Before that change, once the write step finished, the flink buffer had already been copied into netty's internal buffer and could be recycled immediately, so triggering the second writeAndFlush() before the first one was done caused no problem. Now the write step keeps a reference to the flink buffer in the netty stack until the flush is done.
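The event order above can be replayed with a much-simplified, hypothetical simulation (these are not the real Flink or netty classes; all names and the two-buffer pool size are stand-ins for illustration only):

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

public class WriteFlushRaceDemo {
    // A much-simplified stand-in for FileBufferReader with two read buffers:
    // nextBuffer() hands out a free buffer and throws once none is left,
    // mirroring the unexpected IOException from FLINK-13100.
    static final class FileBufferReader {
        private final Deque<String> freeBuffers = new ArrayDeque<>();

        FileBufferReader() {
            freeBuffers.add("buffer-1");
            freeBuffers.add("buffer-2");
        }

        String nextBuffer() throws IOException {
            if (freeBuffers.isEmpty()) {
                throw new IOException("no free buffer: previous buffers not recycled yet");
            }
            return freeBuffers.poll();
        }

        void recycle(String buffer) {
            freeBuffers.add(buffer);
        }
    }

    // Replays the event order from the comment: the writeAndFlush() futures
    // stay pending (nothing recycled) while AddCredit keeps triggering writes.
    static String runScenario() {
        FileBufferReader reader = new FileBufferReader();
        try {
            reader.nextBuffer(); // first writeAndFlush(), future still pending
            reader.nextBuffer(); // AddCredit -> second writeAndFlush(), also pending
            reader.nextBuffer(); // AddCredit -> third write, nothing recycled yet
            return "no race";
        } catch (IOException e) {
            return "IOException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(runScenario());
    }
}
```

Because no recycle() happens between the calls, the simulation fails on the write for which no free buffer remains, which is exactly the window the future listener normally closes.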

I will try to mock this process in the relevant unit tests to verify it, and I might submit the test tomorrow to make it easier to understand.



> Fix the unexpected IOException during FileBufferReader#nextBuffer
> -----------------------------------------------------------------
>
>                 Key: FLINK-13100
>                 URL: https://issues.apache.org/jira/browse/FLINK-13100
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Network
>            Reporter: zhijiang
>            Priority: Blocker
>
> In the implementation of FileBufferReader#nextBuffer, we expect the next memory segment to always be available, based on the assumption that nextBuffer can only be called after the previous buffer has been recycled. Otherwise the current implementation throws an IOException.
> In fact, this assumption does not hold with the credit-based flow control and zero-copy features in the network stack. The detailed process is as follows:
>  * The netty thread finishes calling channel.writeAndFlush() in PartitionRequestQueue and adds a listener to handle the ChannelFuture later. Before the future is done, the corresponding buffer is not recycled, because of the zero-copy improvement.
>  * Before the previous future is done, the netty thread can trigger the next writeAndFlush() while processing an AddCredit message; FileBufferReader#nextBuffer then throws the exception because the previous buffer has not been recycled.
> We thought of several ways to solve this potential bug:
>  * Do not trigger the next writeAndFlush() before the previous future is done. To do so we would have to maintain the future state and check it in the relevant actions. I worry this might cause a performance regression in network throughput and would add extra state management.
>  * Adjust the implementation of the current FileBufferReader. We used to regard the blocking partition view as always available, based on the next buffer being read ahead, so it would always be added into the available queue in PartitionRequestQueue. Actually, this read-ahead only simplifies the process of BoundedBlockingSubpartitionReader#notifyDataAvailable. The view's availability could instead be judged by the number of available buffers in FileBufferReader. When a buffer is recycled back into FileBufferReader after writeAndFlush() is done, it could call notifyDataAvailable to add the view into the available queue in PartitionRequestQueue.
> I prefer the second way because it does not bring any negative impact.
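The second option described above can be sketched as follows. This is a hypothetical outline, not the actual Flink code: the listener interface, method names, and the pool size of two are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AvailabilityOnRecycleSketch {
    interface AvailabilityListener {
        void notifyDataAvailable();
    }

    // Hypothetical reader: availability is judged by the number of free
    // buffers instead of the read-ahead buffer, and recycling a buffer
    // after writeAndFlush() completes re-notifies availability.
    static final class FileBufferReader {
        private final AtomicInteger freeBuffers = new AtomicInteger(2);
        private final AvailabilityListener listener;

        FileBufferReader(AvailabilityListener listener) {
            this.listener = listener;
        }

        boolean isAvailable() {
            return freeBuffers.get() > 0;
        }

        // Never throws: when no buffer is free, the view simply reports
        // nothing available and stays out of the available queue.
        String nextBuffer() {
            if (freeBuffers.getAndDecrement() <= 0) {
                freeBuffers.incrementAndGet();
                return null;
            }
            return "data";
        }

        // Called from the writeAndFlush() future once the buffer is released;
        // going from zero to one free buffer re-enqueues the view.
        void recycle() {
            if (freeBuffers.getAndIncrement() == 0) {
                listener.notifyDataAvailable();
            }
        }
    }
}
```

With this shape, an early AddCredit can no longer provoke an exception: at worst the view is momentarily unavailable, and the recycle callback after the flush completes brings it back into the queue.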



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)