You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/04/05 21:55:51 UTC

[GitHub] [tvm] csullivan opened a new pull request, #10909: [Hexagon] Less aggressive adb state clean up

csullivan opened a new pull request, #10909:
URL: https://github.com/apache/tvm/pull/10909

   Changes:
   - Issue SIGINT before SIGKILL to allow the server time to clean up and unbind the port on Android
   - Only remove forward (and reversed) ports set during a given session rather than globally applying remove all.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] Lunderberg commented on a diff in pull request #10909: [Hexagon] Less aggressive adb state clean up

Posted by GitBox <gi...@apache.org>.
Lunderberg commented on code in PR #10909:
URL: https://github.com/apache/tvm/pull/10909#discussion_r849650057


##########
python/tvm/contrib/hexagon/build.py:
##########
@@ -356,26 +358,46 @@ def _copy_binaries(self):
         for item in self.ANDROID_HEXAGON_RPC_FILES:
             self._copy_to_remote(lib_dir / item, self._workspace / item)
 
+    def _process_forwarded_ports(self):
+        forwarded_ports = subprocess.check_output(self._adb_device_sub_cmd + ["forward", "--list"])
+        existing_forwards = []
+        for forward in str(forwarded_ports).split("\\n"):
+            entry = forward.split()
+            if len(entry) == 3:
+                _, local, _ = entry
+                existing_forwards.append(int(local.strip("tcp:")))
+        return existing_forwards
+
+    def _forward_ports(self, rpc_server_port, existing_forwards):
+        # Enable port forward for RPC server. We forward the first ten open ports
+        # starting from the rpc_server_port
+        port = rpc_server_port
+        while len(self.forwarded_ports_) < 10:

Review Comment:
   I believe the total number of ports isn't important, instead trying to match the port range that may be attempted a server, when it searches for an available port to listen on.  How often is the port already forwarded or in use, and would be be better to throw an error in those cases?



##########
python/tvm/contrib/hexagon/build.py:
##########
@@ -356,26 +358,46 @@ def _copy_binaries(self):
         for item in self.ANDROID_HEXAGON_RPC_FILES:
             self._copy_to_remote(lib_dir / item, self._workspace / item)
 
+    def _process_forwarded_ports(self):

Review Comment:
   Nit: Rename to `_existing_forwarded_ports` or `_get_forwarded_ports`.  The current name reads as if this is performing some processing on each forwarded port.



##########
python/tvm/contrib/hexagon/build.py:
##########
@@ -356,26 +358,46 @@ def _copy_binaries(self):
         for item in self.ANDROID_HEXAGON_RPC_FILES:
             self._copy_to_remote(lib_dir / item, self._workspace / item)
 
+    def _process_forwarded_ports(self):
+        forwarded_ports = subprocess.check_output(self._adb_device_sub_cmd + ["forward", "--list"])
+        existing_forwards = []
+        for forward in str(forwarded_ports).split("\\n"):
+            entry = forward.split()
+            if len(entry) == 3:
+                _, local, _ = entry
+                existing_forwards.append(int(local.strip("tcp:")))
+        return existing_forwards
+
+    def _forward_ports(self, rpc_server_port, existing_forwards):
+        # Enable port forward for RPC server. We forward the first ten open ports
+        # starting from the rpc_server_port
+        port = rpc_server_port
+        while len(self.forwarded_ports_) < 10:
+            if port not in existing_forwards and not _is_port_in_use(port):
+                subprocess.check_call(
+                    self._adb_device_sub_cmd + ["forward", f"tcp:{port}", f"tcp:{port}"]
+                )
+                self.forwarded_ports_.append(port)
+            port += 1
+
+    def _reverse_ports(self, rpc_tracker_port):
+        subprocess.check_call(
+            self._adb_device_sub_cmd
+            + ["reverse", f"tcp:{rpc_tracker_port}", f"tcp:{rpc_tracker_port}"]
+        )
+
     def _run_server_script(self):
         """Setup the ADB connection and execute the server script."""
 
-        # Removed pre-defined forward/reverse rules
-        subprocess.check_call(self._adb_device_sub_cmd + ["forward", "--remove-all"])
-        subprocess.check_call(self._adb_device_sub_cmd + ["reverse", "--remove-all"])
-
+        # Collect any existing adb port forwarding to avoid duplication
+        # with another running process
+        existing_forwards = self._process_forwarded_ports()
         # Enable port reverse for RPC tracker
         rpc_tracker_port = self._rpc_info["rpc_tracker_port"]
         rpc_server_port = self._rpc_info["rpc_server_port"]
-        subprocess.check_call(
-            self._adb_device_sub_cmd
-            + ["reverse", f"tcp:{rpc_tracker_port}", f"tcp:{rpc_tracker_port}"]
-        )
-        # Enable port forward for RPC server. We forward 9 ports after the rpc_server_port.
-        for i in range(0, 10):
-            subprocess.check_call(
-                self._adb_device_sub_cmd
-                + ["forward", f"tcp:{rpc_server_port+i}", f"tcp:{rpc_server_port+i}"]
-            )
+
+        self._reverse_ports(rpc_tracker_port)
+        self._forward_ports(rpc_server_port, existing_forwards)

Review Comment:
   Nit: Since `existing_forwards` isn't used elsewhere, may be cleaner to generated it inside `_forward_ports`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] csullivan commented on a diff in pull request #10909: [Hexagon] Less aggressive adb state clean up

Posted by GitBox <gi...@apache.org>.
csullivan commented on code in PR #10909:
URL: https://github.com/apache/tvm/pull/10909#discussion_r848956577


##########
python/tvm/contrib/hexagon/build.py:
##########
@@ -373,9 +370,10 @@ def _run_server_script(self):
         )
         # Enable port forward for RPC server. We forward 9 ports after the rpc_server_port.
         for i in range(0, 10):
+            port = rpc_server_port + i
+            self.forwarded_ports_.append(port)
             subprocess.check_call(
-                self._adb_device_sub_cmd
-                + ["forward", f"tcp:{rpc_server_port+i}", f"tcp:{rpc_server_port+i}"]
+                self._adb_device_sub_cmd + ["forward", f"tcp:{port}", f"tcp:{port}"]

Review Comment:
   Thank you for the though on this, and no adb doesn't complain about multiple attempts to forward the same port. I decided to be more rigorous with port checking and only forward ports that are not already forwarded by adb or bound by another process. This seems to have greatly improved the stability. PTAL at https://github.com/apache/tvm/pull/10909/commits/ed610c4de5d2208ba599c922f990a979127593aa.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] csullivan commented on a diff in pull request #10909: [Hexagon] Less aggressive adb state clean up

Posted by GitBox <gi...@apache.org>.
csullivan commented on code in PR #10909:
URL: https://github.com/apache/tvm/pull/10909#discussion_r848956577


##########
python/tvm/contrib/hexagon/build.py:
##########
@@ -373,9 +370,10 @@ def _run_server_script(self):
         )
         # Enable port forward for RPC server. We forward 9 ports after the rpc_server_port.
         for i in range(0, 10):
+            port = rpc_server_port + i
+            self.forwarded_ports_.append(port)
             subprocess.check_call(
-                self._adb_device_sub_cmd
-                + ["forward", f"tcp:{rpc_server_port+i}", f"tcp:{rpc_server_port+i}"]
+                self._adb_device_sub_cmd + ["forward", f"tcp:{port}", f"tcp:{port}"]

Review Comment:
   Thank you for the thought on this, and no adb doesn't complain about multiple attempts to forward the same port. I decided to be more rigorous with port checking and only forward ports that are not already forwarded by adb or bound by another process. This seems to have greatly improved the stability. PTAL at https://github.com/apache/tvm/pull/10909/commits/ed610c4de5d2208ba599c922f990a979127593aa.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] Lunderberg commented on a diff in pull request #10909: [Hexagon] Less aggressive adb state clean up

Posted by GitBox <gi...@apache.org>.
Lunderberg commented on code in PR #10909:
URL: https://github.com/apache/tvm/pull/10909#discussion_r849762545


##########
python/tvm/contrib/hexagon/build.py:
##########
@@ -356,26 +358,46 @@ def _copy_binaries(self):
         for item in self.ANDROID_HEXAGON_RPC_FILES:
             self._copy_to_remote(lib_dir / item, self._workspace / item)
 
+    def _process_forwarded_ports(self):
+        forwarded_ports = subprocess.check_output(self._adb_device_sub_cmd + ["forward", "--list"])
+        existing_forwards = []
+        for forward in str(forwarded_ports).split("\\n"):
+            entry = forward.split()
+            if len(entry) == 3:
+                _, local, _ = entry
+                existing_forwards.append(int(local.strip("tcp:")))
+        return existing_forwards
+
+    def _forward_ports(self, rpc_server_port, existing_forwards):
+        # Enable port forward for RPC server. We forward the first ten open ports
+        # starting from the rpc_server_port
+        port = rpc_server_port
+        while len(self.forwarded_ports_) < 10:

Review Comment:
   Good point, and that would make a lot of sense to me.  I like the semantics of "bind here or error" much more than "bind here or maybe somewhere else or error".  That makes sense for it to be a later change, and not something needed at the moment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] mehrdadh merged pull request #10909: [Hexagon] Less aggressive adb state clean up

Posted by GitBox <gi...@apache.org>.
mehrdadh merged PR #10909:
URL: https://github.com/apache/tvm/pull/10909


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] csullivan commented on a diff in pull request #10909: [Hexagon] Less aggressive adb state clean up

Posted by GitBox <gi...@apache.org>.
csullivan commented on code in PR #10909:
URL: https://github.com/apache/tvm/pull/10909#discussion_r849665063


##########
python/tvm/contrib/hexagon/build.py:
##########
@@ -356,26 +358,46 @@ def _copy_binaries(self):
         for item in self.ANDROID_HEXAGON_RPC_FILES:
             self._copy_to_remote(lib_dir / item, self._workspace / item)
 
+    def _process_forwarded_ports(self):
+        forwarded_ports = subprocess.check_output(self._adb_device_sub_cmd + ["forward", "--list"])
+        existing_forwards = []
+        for forward in str(forwarded_ports).split("\\n"):
+            entry = forward.split()
+            if len(entry) == 3:
+                _, local, _ = entry
+                existing_forwards.append(int(local.strip("tcp:")))
+        return existing_forwards
+
+    def _forward_ports(self, rpc_server_port, existing_forwards):
+        # Enable port forward for RPC server. We forward the first ten open ports
+        # starting from the rpc_server_port
+        port = rpc_server_port
+        while len(self.forwarded_ports_) < 10:

Review Comment:
   Not sure throwing an error in these cases makes sense given that the user only cares about the functionality of their tests on hardware and doesn't want to see test failures based on bad port configurations. 
   
   If we change the RPC server to fail rather than search for a new port as it does now, then we can revisit the launcher code and have it try until successfully binding to a single port and remove the port range nonsense. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] Lunderberg commented on a diff in pull request #10909: [Hexagon] Less aggressive adb state clean up

Posted by GitBox <gi...@apache.org>.
Lunderberg commented on code in PR #10909:
URL: https://github.com/apache/tvm/pull/10909#discussion_r843993790


##########
python/tvm/contrib/hexagon/build.py:
##########
@@ -360,10 +361,6 @@ def _copy_binaries(self):
     def _run_server_script(self):
         """Setup the ADB connection and execute the server script."""
 
-        # Removed pre-defined forward/reverse rules
-        subprocess.check_call(self._adb_device_sub_cmd + ["forward", "--remove-all"])

Review Comment:
   Do we want to clean up port forwards that have been opened for more than N days?  The `.stop_server()` method may not be called if a segfault occurs, and that may leave ports open unnecessarily.



##########
python/tvm/contrib/hexagon/build.py:
##########
@@ -386,21 +384,46 @@ def _run_server_script(self):
             stderr=subprocess.PIPE,
         )
 
+    def _cleanup_port_forwarding(self):
+        # Removed pre-defined forward/reverse rules
+        rpc_tracker_port = self._rpc_info["rpc_tracker_port"]
+        subprocess.check_call(
+            self._adb_device_sub_cmd + ["reverse", "--remove", f"tcp:{rpc_tracker_port}"]
+        )
+        for port in self.forwarded_ports_:
+            subprocess.check_call(self._adb_device_sub_cmd + ["forward", "--remove", f"tcp:{port}"])
+
+    def _terminate_remote(self):
+        # Send interupt to main and child processes
+        SIGINT = 2
+        subprocess.Popen(
+            self._adb_device_sub_cmd
+            + ["shell", f"pkill -l{SIGINT} -P `cat {self._workspace}/rpc_pid.txt`"]
+        )
+        subprocess.Popen(
+            self._adb_device_sub_cmd
+            + ["shell", f"kill -s sigint `cat {self._workspace}/rpc_pid.txt`"]
+        )
+        subprocess.Popen(
+            self._adb_device_sub_cmd
+            + ["shell", f"kill -s sigint `cat {self._workspace}/rpc_pid.txt`"]
+        )
+        # Wait for processes to destruct cleanly after receiving the intrupt
+        subprocess.Popen(self._adb_device_sub_cmd + ["shell", "sleep", "0.1s"])
+        # Kill process children
+        self._adb_device_sub_cmd + ["shell", f"pkill -P `cat {self._workspace}/rpc_pid.txt`"]

Review Comment:
   I wonder if we could set the process group for the subprocesses launched by `PopenWorker`.  Those are the ones that tend to be left-over.  The `pkill -P` command would send signals to the direct children of the main process, but not to any grandchildren (e.g. main process starts a worker subprocess, which starts a compilation subprocess), or to any processes started after the child process is killed but before the parent process is killed (e.g. main process interprets the closing child as a failed test case, starts a new test case).  It [looks like](https://pymotw.com/2/subprocess/#process-groups-sessions) it would be relatively straightfoward to have `Popen` calls started as the same process group, so that they would all receive the signal simultaneously.



##########
python/tvm/contrib/hexagon/build.py:
##########
@@ -386,21 +384,46 @@ def _run_server_script(self):
             stderr=subprocess.PIPE,
         )
 
+    def _cleanup_port_forwarding(self):
+        # Removed pre-defined forward/reverse rules
+        rpc_tracker_port = self._rpc_info["rpc_tracker_port"]
+        subprocess.check_call(
+            self._adb_device_sub_cmd + ["reverse", "--remove", f"tcp:{rpc_tracker_port}"]
+        )
+        for port in self.forwarded_ports_:
+            subprocess.check_call(self._adb_device_sub_cmd + ["forward", "--remove", f"tcp:{port}"])
+
+    def _terminate_remote(self):
+        # Send interupt to main and child processes
+        SIGINT = 2
+        subprocess.Popen(
+            self._adb_device_sub_cmd
+            + ["shell", f"pkill -l{SIGINT} -P `cat {self._workspace}/rpc_pid.txt`"]

Review Comment:
   I think pkill also supports named signals, so we can pass `-l SIGINT` instead of needing the string format.



##########
python/tvm/contrib/hexagon/build.py:
##########
@@ -373,9 +370,10 @@ def _run_server_script(self):
         )
         # Enable port forward for RPC server. We forward 9 ports after the rpc_server_port.
         for i in range(0, 10):
+            port = rpc_server_port + i
+            self.forwarded_ports_.append(port)
             subprocess.check_call(
-                self._adb_device_sub_cmd
-                + ["forward", f"tcp:{rpc_server_port+i}", f"tcp:{rpc_server_port+i}"]
+                self._adb_device_sub_cmd + ["forward", f"tcp:{port}", f"tcp:{port}"]

Review Comment:
   If a port forward already exists, does `adb` exit with a non-zero return code?  If the `adb` call exits with a zero return code in that case, then there are two processes trying to start port forwards on overlapping ranges would result in double clean-up occurring.



##########
python/tvm/contrib/hexagon/build.py:
##########
@@ -373,9 +370,10 @@ def _run_server_script(self):
         )
         # Enable port forward for RPC server. We forward 9 ports after the rpc_server_port.
         for i in range(0, 10):
+            port = rpc_server_port + i
+            self.forwarded_ports_.append(port)
             subprocess.check_call(
-                self._adb_device_sub_cmd
-                + ["forward", f"tcp:{rpc_server_port+i}", f"tcp:{rpc_server_port+i}"]
+                self._adb_device_sub_cmd + ["forward", f"tcp:{port}", f"tcp:{port}"]

Review Comment:
   Also, I think we should move the `self.forwarded_ports_.append(port)` to be after the subprocess call.  That way, a cleanup that occurs as the stack unwinds (e.g. [the `hexagon_launcher` fixture](https://github.com/apache/tvm/blob/main/tests/python/contrib/test_hexagon/conftest.py#L184)) wouldn't try to clean up the failed port forward.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] csullivan commented on pull request #10909: [Hexagon] Less aggressive adb state clean up

Posted by GitBox <gi...@apache.org>.
csullivan commented on PR #10909:
URL: https://github.com/apache/tvm/pull/10909#issuecomment-1089411791

   cc @mehrdadh @Lunderberg this one was an attempt to stabilize the adb issues we've seen. I'm not sure I see much improvement, but would like your feedback. Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] csullivan commented on a diff in pull request #10909: [Hexagon] Less aggressive adb state clean up

Posted by GitBox <gi...@apache.org>.
csullivan commented on code in PR #10909:
URL: https://github.com/apache/tvm/pull/10909#discussion_r848956577


##########
python/tvm/contrib/hexagon/build.py:
##########
@@ -373,9 +370,10 @@ def _run_server_script(self):
         )
         # Enable port forward for RPC server. We forward 9 ports after the rpc_server_port.
         for i in range(0, 10):
+            port = rpc_server_port + i
+            self.forwarded_ports_.append(port)
             subprocess.check_call(
-                self._adb_device_sub_cmd
-                + ["forward", f"tcp:{rpc_server_port+i}", f"tcp:{rpc_server_port+i}"]
+                self._adb_device_sub_cmd + ["forward", f"tcp:{port}", f"tcp:{port}"]

Review Comment:
   Thank you for the though on this, and no adb doesn't complain about multiple attempts to forward the same port. I decided to be more rigorous with port checking and only forward ports that are not already forwarded by adb or bound by another process. This seems to have greatly improved the stability. PTAL



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org