Posted to issues@tez.apache.org by "László Bodor (Jira)" <ji...@apache.org> on 2020/05/05 13:26:00 UTC

[jira] [Comment Edited] (TEZ-4157) ShuffleHandler: upgrade to netty4

    [ https://issues.apache.org/jira/browse/TEZ-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098937#comment-17098937 ] 

László Bodor edited comment on TEZ-4157 at 5/5/20, 1:25 PM:
------------------------------------------------------------

 [^TEZ-4157.02.patch] is the first successful refactor to netty4: most of the unit tests pass, except testKeepAlive, which has started to drive me crazy, but I'll give it another chance.

[~jeagles]: do you have any pointers regarding testKeepAlive? Maybe you're familiar with that test case. I'm 99% sure that my netty upgrade in [^TEZ-4157.02.patch] is correct, and all of the test cases pass except testKeepAlive. In testKeepAlive, there are 2 consecutive keepalive connections from the client, and the [second|https://github.com/apache/tez/blob/master/tez-plugins/tez-aux-services/src/test/java/org/apache/tez/auxservices/TestShuffleHandler.java#L474] fails with an invalid HTTP response after my patch.
Could you please clarify the expected behavior of this test case, [regarding broken pipe|https://github.com/apache/tez/blob/master/tez-plugins/tez-aux-services/src/test/java/org/apache/tez/auxservices/TestShuffleHandler.java#L403]? I've spent more than 8-10 hours on this test case, but I haven't been able to solve it. Basically:
1. If I insert a Thread.sleep(1000) before the second getInputStream, the connection is successful, but then the test fails because the second socket address is not the same as the first, so I think it's not a keepalive connection anymore.
2. Without the sleep, I get an invalid HTTP response no matter how I change the payload from the fake shuffle handler.
What exactly is the point of this very [long cycle and big payload|https://github.com/apache/tez/blob/master/tez-plugins/tez-aux-services/src/test/java/org/apache/tez/auxservices/TestShuffleHandler.java#L410]? Do we expect the buffer fill cycle itself to take longer than the keepalive timeout?
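
The socket-reuse check from the two points above can be approximated outside of Tez. The following is only a minimal sketch, not the actual TestShuffleHandler code: it assumes the JDK's built-in com.sun.net.httpserver.HttpServer as a stand-in for the fake shuffle handler, and the names KeepAliveSketch and clientPortsSeenByServer are made up for illustration. It sends sequential requests with HttpURLConnection and records the client port the server sees for each one; if both requests arrive from the same port, the socket was most likely reused.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-alone sketch; not part of Tez or TestShuffleHandler.
public class KeepAliveSketch {

  /** Sends sequential GET requests and returns the client port the server saw for each. */
  static List<Integer> clientPortsSeenByServer(int requests) throws IOException {
    List<Integer> ports = Collections.synchronizedList(new ArrayList<>());

    // Stand-in for the fake shuffle handler: a tiny JDK HTTP server on an ephemeral port.
    HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
    server.createContext("/", exchange -> {
      ports.add(exchange.getRemoteAddress().getPort());
      byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
      // A fixed content length lets the client detect the end of the body without a close.
      exchange.sendResponseHeaders(200, body.length);
      try (OutputStream out = exchange.getResponseBody()) {
        out.write(body);
      }
    });
    server.start();

    try {
      URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/").toURL();
      for (int i = 0; i < requests; i++) {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
        try (InputStream in = conn.getInputStream()) {
          byte[] buf = new byte[1024];
          while (in.read(buf) != -1) {
            // Drain the body completely so the socket can return to the keep-alive cache.
          }
        }
      }
    } finally {
      server.stop(0);
    }
    return new ArrayList<>(ports);
  }

  public static void main(String[] args) throws IOException {
    List<Integer> ports = clientPortsSeenByServer(2);
    System.out.println("client ports seen by server: " + ports);
    System.out.println("socket reused: " + ports.get(0).equals(ports.get(1)));
  }
}
```

HttpURLConnection generally returns a socket to its keep-alive cache only after the response body has been read to the end, which is why the loop drains the stream before issuing the next request. A server-side keepalive timeout shorter than the pause between two requests would show up here as two different ports.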

cc: [~rizhang]


was (Author: abstractdog):
 [^TEZ-4157.02.patch]  is about the first successful refactor to netty4, most of the unit tests pass, except testKeepAlive, which has started to drive me crazy, but I'll give another chance to it

> ShuffleHandler: upgrade to netty4
> ---------------------------------
>
>                 Key: TEZ-4157
>                 URL: https://issues.apache.org/jira/browse/TEZ-4157
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>         Attachments: TEZ-4157.01.patch, TEZ-4157.02.patch
>
>
> -In the dependency tree, there are 2 occurrences of compile scope direct netty dependencies, however, they're not used at all. I compiled locally successfully without them. E.g. when investigating blackduck alerts (complaining about netty deps for current 3.10.5.Final), it would be cleaner to start from a dependency tree where Tez doesn't depend on netty directly in order to eliminate its responsibility (and move the focus to underlying hadoop for instance).-
> Tez depends on netty3 almost exclusively in ShuffleHandler and some related classes. We can eliminate netty3 by upgrading to netty4, but this effort might involve some testing due to fundamental [changes from netty3->netty4|https://netty.io/wiki/new-and-noteworthy-in-4.0.html], and we don't have a reference yet, as [hadoop's ShuffleHandler|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java] is still on netty3.
> As per the netty documentation, we can also expect some performance improvement (e.g. Pooled buffers).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)