You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@subversion.apache.org by Philip Martin <ph...@codematters.co.uk> on 2007/09/09 20:34:54 UTC

[PATCH] Add mutex to parallel testsuite

I'm a bit out of touch with Subversion development.  I see that a
parallel mode has been added to the testsuite (great), but it doesn't
seem to work on my computer (boo).  It usually hangs in getopt_test,
the first parallel set, or occasionally in basic_tests, the second
parallel set.  When interrupted I typically see something like:

  File "threading.py", line 442, in __bootstrap
    self.run()
  File "/home/pm/sw/subversion/subversion/tests/cmdline/svntest/main.py", line 879, in run
    self.result, self.stdout_lines, self.stderr_lines =\
  File "/home/pm/sw/subversion/subversion/tests/cmdline/svntest/main.py", line 328, in spawn_process
    pid, wait_code = os.wait()
OSError: [Errno 10] No child processes

Looking at the python code there is a list that is modified in
multiple threads without locking.  I don't know much about python
threading, but the following patch makes it "work for me":

The tests list needs to be protected by a mutex when running tests in
parallel.

* subversion/tests/cmdline/svntest/main.py
  (SpawnTest.__init__): Add lock parameter.
  (SpawnTest.run): Use lock when modifying list.
  (run_one_test): Add lock parameter.
  (_internal_run_tests): Create lock, use lock when acessing list.
  
Index: subversion/tests/cmdline/svntest/main.py
===================================================================
--- subversion/tests/cmdline/svntest/main.py	(revision 26499)
+++ subversion/tests/cmdline/svntest/main.py	(working copy)
@@ -850,10 +850,11 @@
   """Encapsulate a single test case, run it in a separate child process.
   Instead of waiting till the process is finished, add this class to a
   list of active tests for follow up in the parent process."""
-  def __init__(self, index, tests = None):
+  def __init__(self, index, tests = None, tests_lock = None):
     threading.Thread.__init__(self)
     self.index = index
     self.tests = tests
+    self.tests_lock = tests_lock
     self.result = None
     self.stdout_lines = None
     self.stderr_lines = None
@@ -882,7 +883,9 @@
     if filter(lambda x: x.startswith('FAIL: ') or x.startswith('XPASS: '),
               self.stdout_lines):
       self.result = 1
+    self.tests_lock.acquire()
     self.tests.append(self)
+    self.tests_lock.release()
     sys.stdout.write('.')
 
 class TestRunner:
@@ -980,7 +983,8 @@
 # it can be displayed by the 'list' command.)
 
 # Func to run one test in the list.
-def run_one_test(n, test_list, parallel = 0, finished_tests = None):
+def run_one_test(n, test_list, parallel = 0,
+                 finished_tests = None, finished_tests_lock = None):
   """Run the Nth client test in TEST_LIST, return the result.
 
   If we're running the tests in parallel spawn the test in a new process.
@@ -992,7 +996,7 @@
 
   # Run the test.
   if parallel:
-    st = SpawnTest(n, finished_tests)
+    st = SpawnTest(n, finished_tests, finished_tests_lock)
     st.start()
     return 0
   else:
@@ -1009,6 +1013,7 @@
 
   exit_code = 0
   finished_tests = []
+  finished_tests_lock = threading.Lock()
   tests_started = 0
 
   if not parallel:
@@ -1018,14 +1023,23 @@
   else:
     for testnum in testnums:
       # wait till there's a free spot.
+      finished_tests_lock.acquire()
       while tests_started - len(finished_tests) > parallel:
+        finished_tests_lock.release()
         time.sleep(0.2)
-      run_one_test(testnum, test_list, parallel, finished_tests)
+        finished_tests_lock.acquire()
+      run_one_test(testnum, test_list, parallel, finished_tests,
+                   finished_tests_lock)
       tests_started += 1
+      finished_tests_lock.release()
 
     # wait for all tests to finish
+    finished_tests_lock.acquire()
     while len(finished_tests) < len(testnums):
+      finished_tests_lock.release()
       time.sleep(0.2)
+      finished_tests_lock.acquire()
+    finished_tests_lock.release()
 
     # Sort test results list by test nr.
     deco = [(test.index, test) for test in finished_tests]

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Add mutex to parallel testsuite

Posted by Ben Collins-Sussman <su...@red-bean.com>.

Welcome back Philip!  We missed you!  :-)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Add mutex to parallel testsuite

Posted by Philip Martin <ph...@codematters.co.uk>.

Philip Martin <ph...@codematters.co.uk> writes:

> Looking at the python code there is a list that is modified in
> multiple threads without locking.  I don't know much about python
> threading, but the following patch makes it "work for me":
>
> The tests list needs to be protected by a mutex when running tests in
> parallel.
>
> * subversion/tests/cmdline/svntest/main.py
>   (SpawnTest.__init__): Add lock parameter.
>   (SpawnTest.run): Use lock when modifying list.
>   (run_one_test): Add lock parameter.
>   (_internal_run_tests): Create lock, use lock when acessing list.

Having looked at the testsuite code more closely I don't see how the
parallel code can ever work reliably.  The problem is that the code
uses spawn_process to run each test in a separate process, and
spawn_process uses a open3/wait sequence.  That might work in a single
threaded program[1] but it won't work in a multithreaded program that
starts multiple processes simultaneously as there is no guarantee that
the status obtained by wait corresponds to the process started by
popen3.  The python documentation states that the process exit status
is not available when using popen3 and that the Popen3 class should be
used, although I think Popen3 is not available on Windows.

My "works for me" patch doesn't fix the problem, I have realised that
it doesn't even "work for me" all the time.  I'm not sure why the
program is losing track of child processes, but since it is attempting
to do something explicitly not supported by Python I suppose we can't
really complain.

I suppose the current code might work if it was switched to use Popen3
on those platforms that support it.  Another, perhaps better, approach
would be to dispense with the subprocess and just run the test
directly in the thread, although this might require some additional
code to handle the stdout/stderr output.

Does anyone use --parallel or PARALLEL=1?

1. It could fail in a single threaded program if there is a subprocess
   running when the popen3/wait combination is attempted, say if the
   program had earlier called popen3 without calling wait.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org