You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/11/16 11:10:00 UTC

[GitHub] [flink] xintongsong commented on a diff in pull request #21331: [FLINK-29639] Print resourceId of remote taskmanager when encounter transport exception.

xintongsong commented on code in PR #21331:
URL: https://github.com/apache/flink/pull/21331#discussion_r1023845581


##########
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/ConnectionID.java:
##########
@@ -43,18 +46,36 @@ public class ConnectionID implements Serializable {
 
     private final int connectionIndex;
 
+    /**
+     * Resource ID can be known from shuffleDescriptor, to avoid rpc overhead,
+     * PartitionConnectionInfo will carry a connectionId with null resourceID.

Review Comment:
   Why `PartitionConnectionInfo` has a null resourceId? It seems `NetworkPartitionConnectionInfo` is always created in `NettyShuffleDescriptor#fromProducerDescriptor`, where you can get the resourceId from `producerDescriptor#getProducerLocation`.



##########
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/exception/TransportException.java:
##########
@@ -27,17 +29,25 @@ public abstract class TransportException extends IOException {
 
     private final SocketAddress address;
 
-    public TransportException(String message, SocketAddress address) {
-        this(message, address, null);
+    private final ResourceID resourceID;
+
+    public TransportException(String message, SocketAddress address, ResourceID resourceID) {
+        this(message, address, resourceID, null);
     }
 
-    public TransportException(String message, SocketAddress address, Throwable cause) {
+    public TransportException(
+            String message, SocketAddress address, ResourceID resourceID, Throwable cause) {
         super(message, cause);
 
+        this.resourceID = resourceID;
         this.address = address;
     }
 
     public SocketAddress getAddress() {
         return address;
     }
+
+    public ResourceID getResourceID() {
+        return resourceID;
+    }

Review Comment:
   It seems this is never called. The resourceId is already in the error message.



##########
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/exception/LocalTransportException.java:
##########
@@ -18,17 +18,20 @@
 
 package org.apache.flink.runtime.io.network.netty.exception;
 
+import org.apache.flink.runtime.clusterframework.types.ResourceID;
+
 import java.net.SocketAddress;
 
 public class LocalTransportException extends TransportException {
 
     private static final long serialVersionUID = 2366708881288640674L;
 
-    public LocalTransportException(String message, SocketAddress address) {
-        super(message, address);
+    public LocalTransportException(String message, SocketAddress address, ResourceID resourceID) {

Review Comment:
   Why does a `LocalTransportException` need a resourceId? I think the purpose of printing resource id is to help identify the remote TM that the broken connection connected to. For a local exception, there's no remote TM.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org