You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@kudu.apache.org by ad...@apache.org on 2019/02/15 18:07:29 UTC

[kudu] 01/03: KUDU-2686 python: remove multiprocessing

This is an automated email from the ASF dual-hosted git repository.

adar pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git

commit d76422c080fe5ded3acec3dab8f16148396c7468
Author: Andrew Wong <aw...@cloudera.com>
AuthorDate: Thu Feb 14 14:25:32 2019 -0800

    KUDU-2686 python: remove multiprocessing
    
    The python multiprocessing library doesn't play very nicely when
    creating Pools after initializing complex state (e.g. the KuduClient).
    Because of how it forks, the library may even copy lock states, and lead
    to odd situations, like multiple threads waiting on a lock that isn't
    held by any thread in the process.
    
    This was the case in KUDU-2686, which resulted in a hang in
    test_scantoken.py. Upon inspection[1], there appeared to be multiple
    threads in a process waiting on the same sys_futex, but none of them
    held it. [2] tips some pieces to make it more reproducible (tested on
    Ubuntu 14.04, where the issue was first reported).
    
    Starting with python3.4, there is supposedly a way around this, which is
    to use a different process-startup pattern, though this isn't available
    in python2.
    
    This patch removes usage of multiprocessing entirely. While it can be
    argued that multiprocessing provides extra test coverage, given that
    multiprocessing is known to have such issues[3][4], that this isn't the
    first time we've been bitten by this forking issue, that it's test-only,
    and that scan tokens are tested in the C++ client as well, "dumbing"
    down the test doesn't seem unreasonable.
    
    [1] https://gist.github.com/andrwng/d2d21c551362ddd564926c2a4ec406ae
    [2] https://gist.github.com/andrwng/cc6c211c62b1235cc58944d513ba6655
    [3] https://github.com/pytest-dev/pytest/issues/958
    [4] https://codewithoutrules.com/2018/09/04/python-multiprocessing/
    
    Change-Id: Ia9aa91191d54801731da27e5f132b3c96af0efa1
    Reviewed-on: http://gerrit.cloudera.org:8080/12494
    Tested-by: Kudu Jenkins
    Reviewed-by: Jordan Birdsell <jt...@apache.org>
    Reviewed-by: Alexey Serbin <as...@cloudera.com>
---
 python/kudu/tests/test_scantoken.py | 17 ++++-------------
 python/setup.py                     |  5 -----
 2 files changed, 4 insertions(+), 18 deletions(-)

diff --git a/python/kudu/tests/test_scantoken.py b/python/kudu/tests/test_scantoken.py
index c453659..37b273d 100644
--- a/python/kudu/tests/test_scantoken.py
+++ b/python/kudu/tests/test_scantoken.py
@@ -20,7 +20,6 @@ from kudu.compat import unittest
 from kudu.tests.util import TestScanBase
 from kudu.tests.common import KuduTestBase
 import kudu
-from multiprocessing import Pool
 import datetime
 import time
 
@@ -44,21 +43,13 @@ class TestScanToken(TestScanBase):
 
     def _subtest_serialize_thread_and_verify(self, tokens, expected_tuples, count_only=False):
         """
-        Given the input serialized tokens, spawn new threads,
-        execute them and validate the results
+        Given the input serialized tokens, hydrate the scanners, execute the
+        scans, and validate the results
         """
-        input =  [(token.serialize(), self.master_hosts, self.master_ports)
-                for token in tokens]
-
-        # Begin process pool
-        pool = Pool(len(input))
-        try:
-            results = pool.map(_get_scan_token_results, input)
-        finally:
-            pool.close()
-            pool.join()
+        input = [(token.serialize(), self.master_hosts, self.master_ports) for token in tokens]
 
         # Validate results
+        results = [_get_scan_token_results(i) for i in input]
         actual_tuples = []
         for result in results:
             actual_tuples += result
diff --git a/python/setup.py b/python/setup.py
index 87872bc..a09f7a9 100644
--- a/python/setup.py
+++ b/python/setup.py
@@ -30,11 +30,6 @@ import os
 import re
 import subprocess
 
-# Workaround a Python bug in which multiprocessing's atexit handler doesn't
-# play well with pytest. See http://bugs.python.org/issue15881 for details
-# and this suggested workaround (comment msg170215 in the thread).
-import multiprocessing
-
 if Cython.__version__ < '0.21.0':
     raise Exception('Please upgrade to Cython 0.21.0 or newer')