You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@thrift.apache.org by "Hudson (JIRA)" <ji...@apache.org> on 2015/07/25 17:02:04 UTC

[jira] [Commented] (THRIFT-2918) Race condition in Python TProcessPoolServer test

    [ https://issues.apache.org/jira/browse/THRIFT-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641648#comment-14641648 ] 

Hudson commented on THRIFT-2918:
--------------------------------

SUCCESS: Integrated in Thrift #1622 (See [https://builds.apache.org/job/Thrift/1622/])
THRIFT-2918 Race condition in Python TProcessPoolServer test (jensg: rev f7d327ae96d451ff76b80d7e952f9cc21a1c4464)
* test/py/TestServer.py


> Race condition in Python TProcessPoolServer test
> ------------------------------------------------
>
>                 Key: THRIFT-2918
>                 URL: https://issues.apache.org/jira/browse/THRIFT-2918
>             Project: Thrift
>          Issue Type: Bug
>          Components: Python - Library
>         Environment: openSUSE 13.2
> Python 2.7.8 (default, Sep 30 2014, 15:34:38) [GCC] on linux2
>            Reporter: Jens Geyer
>            Assignee: Jens Geyer
>             Fix For: 0.9.3
>
>         Attachments: THRIFT-2918-python-test-timeouts.patch
>
>
> {{make check}} gets stuck very reproducible in my VM at Test run #218:
> {code}
> Test run #217:  (includes gen-py-default) Server=TProcessPoolServer,  Proto=accel,  zlib=False,  SSL=False
> Testing server TProcessPoolServer: /usr/bin/python ./TestServer.py --genpydir=gen-py-default --protocol=accel --port=9090 TProcessPoolServer
> Testing client: /usr/bin/python ./TestClient.py --genpydir=gen-py-default --protocol=accel --port=9090 --transport=buffered
> ...testException(Safe)
> testException(Xception)
> testException(throw_undeclared)
> ...............
> ----------------------------------------------------------------------
> Ran 18 tests in 0.563s
> OK
> Giving TProcessPoolServer (proto=accel,zlib=False,ssl=False) an extra 3 seconds for childprocesses to terminate via alarm
> Terminating worker: <Process(Process-1, started daemon)>
> Terminating worker: <Process(Process-2, started daemon)>
> Terminating worker: <Process(Process-3, started daemon)>
> Terminating worker: <Process(Process-4, started daemon)>
> Terminating worker: <Process(Process-5, started daemon)>
> Requesting server to stop()
> OK: Finished (includes gen-py-default)  TProcessPoolServer / accel proto / zlib=False / SSL=False.   217 combinations tested.
> Test run #218:  (includes gen-py-default) Server=TProcessPoolServer,  Proto=accel,  zlib=False,  SSL=True
> Testing server TProcessPoolServer: /usr/bin/python ./TestServer.py --genpydir=gen-py-default --protocol=accel --port=9090 --ssl TProcessPoolServer
> Testing client: /usr/bin/python ./TestClient.py --genpydir=gen-py-default --protocol=accel --port=9090 --ssl --transport=buffered
> ...testException(Safe)
> testException(Xception)
> testException(throw_undeclared)
> ..........Terminating worker: <Process(Process-1, started daemon)>
> Terminating worker: <Process(Process-2, started daemon)>
> Terminating worker: <Process(Process-3, started daemon)>
> Terminating worker: <Process(Process-4, started daemon)>
> Terminating worker: <Process(Process-5, started daemon)>
> Requesting server to stop()
> {code}
> After fiddling a bit around with it I got it to work by increasing the alarm() timeout from 2 seconds to 4 seconds. 
> I'm not a Python expert, but the code looks somewhat interesting to me: 
> - The server code starts the workers, but some piece of code outside of the server is responsible for terminating them. Is that really idiomatic in Python or just bad design? 
> - The Condition() object used in TProcessPoolsServer.py should probably be replaced by an Event() object. Especially, as Condition.wait() [seems to have it's own perils|http://stackoverflow.com/questions/24137480/threading-condition-waittimeout-ignores-threading-condition-notify] and Event is much easier to use.
> - Calling Condition.aquire() without a matching release() within a {{while True:}} loop looks also not very convincing to me. AFAIK the second call to aquire() will block, if that ever happens (it did not in my tests).
> The bad news is, that neither of the proposed changes above had any effect on the race conditions, except increasing the timeout - but that is merely a workaround, not a solution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)