You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@thrift.apache.org by jeking3 <gi...@git.apache.org> on 2016/04/06 16:40:04 UTC

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

GitHub user jeking3 opened a pull request:

    https://github.com/apache/thrift/pull/980

    THRIFT-3768: ensure TThreadedServer guarantees the lifetime of the client

    TThreadedServer now guarantees the lifetime of the TConnectedClient.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jeking3/thrift defect/THRIFT-3768-alternate

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/thrift/pull/980.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #980
    
----
commit 7132ec16086daf2b2808e3a91f78ecfbca6a5dab
Author: Jim King <ji...@simplivity.com>
Date:   2016-04-05T16:17:51Z

    THRIFT-3768: ensure TThreadedServer guarantees the lifetime of the client

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on the pull request:

    https://github.com/apache/thrift/pull/980#issuecomment-208013575
  
    I ran the test that failed in the apache build in a loop and finally ended up in a deadlock situation, so I am looking at that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r59306037
  
    --- Diff: lib/cpp/src/thrift/server/TThreadedServer.h ---
    @@ -83,12 +88,56 @@ class TThreadedServer : public TServerFramework {
       virtual void serve();
     
     protected:
    +  /**
    +   * Drain recently connected clients by joining their threads - this is done lazily because
    +   * we cannot do it inside the thread context that is disconnecting.
    +   */
    +  virtual void drainDeadClients();
    +
    +  /**
    +   * Implementation of TServerFramework::onClientConnected
    +   */
       virtual void onClientConnected(const boost::shared_ptr<TConnectedClient>& pClient) /* override */;
    -  virtual void onClientDisconnected(TConnectedClient* pClient) /* override */;
    +
    +  /**
    +   * Implementation of TServerFramework::onClientDisconnected
    +   */
    +  virtual void onClientDisconnected(TConnectedClient *pClient) /* override */;
     
       boost::shared_ptr<apache::thrift::concurrency::ThreadFactory> threadFactory_;
    -  apache::thrift::concurrency::Monitor clientsMonitor_;
    +
    +  /**
    +   * A helper wrapper used to wrap the client in something we can use to maintain
    +   * the lifetime of the connected client within a detached thread.
    --- End diff --
    
    You are correct, we can just store the thread instead of this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r58851243
  
    --- Diff: lib/cpp/src/thrift/server/TThreadedServer.cpp ---
    @@ -92,29 +97,42 @@ TThreadedServer::~TThreadedServer() {
     void TThreadedServer::serve() {
       TServerFramework::serve();
     
    -  // Drain all clients - no more will arrive
    -  try {
    -    Synchronized s(clientsMonitor_);
    -    while (getConcurrentClientCount() > 0) {
    -      clientsMonitor_.wait();
    -    }
    -  } catch (TException& tx) {
    -    string errStr = string("TThreadedServer: Exception joining workers: ") + tx.what();
    -    GlobalOutput(errStr.c_str());
    +  // Ensure post-condition of no active clients
    +  Synchronized s(clientMonitor_);
    +  while (!clientMap_.empty()) {
    +    clientMonitor_.wait();
       }
     }
     
     void TThreadedServer::onClientConnected(const shared_ptr<TConnectedClient>& pClient) {
    -  threadFactory_->newThread(pClient)->start();
    +  Synchronized sync(clientMonitor_);
    +  clientMap_.insert(ClientMap::value_type(pClient.get(), boost::make_shared<TConnectedClientTracker>(pClient)));
    +
    +  // We do not track the threads themselves
    +  ClientMap::const_iterator it = clientMap_.find(pClient.get());
    +  threadFactory_->newThread(it->second)->start();
     }
     
     void TThreadedServer::onClientDisconnected(TConnectedClient* pClient) {
    -  THRIFT_UNUSED_VARIABLE(pClient);
    -  Synchronized s(clientsMonitor_);
    -  if (getConcurrentClientCount() == 0) {
    -    clientsMonitor_.notify();
    +  Synchronized sync(clientMonitor_);
    +  clientMap_.erase(pClient);
    +  if (clientMap_.empty()) {
    --- End diff --
    
    Understood.  I'm working on that now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r59306085
  
    --- Diff: lib/cpp/src/thrift/server/TThreadedServer.cpp ---
    @@ -90,31 +95,71 @@ TThreadedServer::~TThreadedServer() {
     }
     
     void TThreadedServer::serve() {
    +  threadFactory_->setDetached(false);
       TServerFramework::serve();
     
    -  // Drain all clients - no more will arrive
    -  try {
    -    Synchronized s(clientsMonitor_);
    -    while (getConcurrentClientCount() > 0) {
    -      clientsMonitor_.wait();
    -    }
    -  } catch (TException& tx) {
    -    string errStr = string("TThreadedServer: Exception joining workers: ") + tx.what();
    -    GlobalOutput(errStr.c_str());
    +  // Ensure post-condition of no active clients
    +  Synchronized s(clientMonitor_);
    +  while (!activeClientMap_.empty()) {
    +    clientMonitor_.wait();
    +  }
    +
    +  drainDeadClients();
    +}
    +
    +void TThreadedServer::drainDeadClients() {
    +  // we're in a monitor here
    +  while (!deadClientMap_.empty()) {
    +    ClientMap::iterator it = deadClientMap_.begin();
    +    it->second->join();
    +    deadClientMap_.erase(it);
       }
     }
     
     void TThreadedServer::onClientConnected(const shared_ptr<TConnectedClient>& pClient) {
    -  threadFactory_->newThread(pClient)->start();
    +  Synchronized sync(clientMonitor_);
    +  drainDeadClients();
    --- End diff --
    
    Sure we could do it that way.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r60590542
  
    --- Diff: lib/cpp/src/thrift/concurrency/ThreadManager.cpp ---
    @@ -421,7 +416,7 @@ void ThreadManager::Impl::removeWorker(size_t value) {
       {
         Synchronized s(workerMonitor_);
     
    -    while (workerCount_ != workerMaxCount_) {
    +    while (workerCount_ > goalCount) {
    --- End diff --
    
    This routine blocks until there are fewer actual workers than the number of workers when it was called less the value passed in.  On a busy system where workers are being added this means it might be in here for a while.  I did not change this behavior; simply I ensured that the effects of multiple threads calling removeWorker() at the same time don't clobber each-other's desired goal of how many to remove.
    
    It would be better to have two calls; one to trim the maximum number of workers but it doesn't block; another being a barrier you can use to get to that number.  Callers may want to trim without blocking, for example.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r59390673
  
    --- Diff: lib/cpp/src/thrift/server/TThreadedServer.h ---
    @@ -83,12 +88,56 @@ class TThreadedServer : public TServerFramework {
       virtual void serve();
     
     protected:
    +  /**
    +   * Drain recently connected clients by joining their threads - this is done lazily because
    +   * we cannot do it inside the thread context that is disconnecting.
    +   */
    +  virtual void drainDeadClients();
    +
    +  /**
    +   * Implementation of TServerFramework::onClientConnected
    +   */
       virtual void onClientConnected(const boost::shared_ptr<TConnectedClient>& pClient) /* override */;
    -  virtual void onClientDisconnected(TConnectedClient* pClient) /* override */;
    +
    +  /**
    +   * Implementation of TServerFramework::onClientDisconnected
    +   */
    +  virtual void onClientDisconnected(TConnectedClient *pClient) /* override */;
     
       boost::shared_ptr<apache::thrift::concurrency::ThreadFactory> threadFactory_;
    -  apache::thrift::concurrency::Monitor clientsMonitor_;
    +
    +  /**
    +   * A helper wrapper used to wrap the client in something we can use to maintain
    +   * the lifetime of the connected client within a detached thread.
    --- End diff --
    
    I made this change and it will only work if I modify each of the thread implementations to release the smart_ptr to runnable in the thread class after the thread runs.  I assume this is acceptable; once the runnable runs we don't need the thread smart_ptr holding on to it any more; TServerFramework requires the TConnectedClient (the runnable) to be destroyed when the client disconnects to function properly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by tpcwang <gi...@git.apache.org>.
Github user tpcwang commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r60614631
  
    --- Diff: lib/cpp/src/thrift/concurrency/ThreadManager.cpp ---
    @@ -421,7 +416,7 @@ void ThreadManager::Impl::removeWorker(size_t value) {
       {
         Synchronized s(workerMonitor_);
     
    -    while (workerCount_ != workerMaxCount_) {
    +    while (workerCount_ > goalCount) {
    --- End diff --
    
    In that particular scenario, the second removeWorker should fail since we should be synchronizing correctly when it checks if (value > workerMaxCount_), but I agree that ThreadManager probably needs some TLC.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on the pull request:

    https://github.com/apache/thrift/pull/980#issuecomment-214391825
  
    Looking for an advocate to consider merging this.  It passed all three builds with the exception of something that happened on the apache jenkins where a cppcheck file was not created:
    
    > [Cppcheck] Starting the cppcheck analysis.
    > [Cppcheck] No cppcheck test report file(s) were found with the pattern 'cppcheck-result.xml' relative to '/home/jenkins/jenkins-slave/workspace/Thrift-precommit'.  Did you enter a pattern relative to the correct directory?  Did you generate the XML report(s) for Cppcheck?
    > [Cppcheck] Parsing throws exceptions. No cppcheck test report file(s) were found with the pattern 'cppcheck-result.xml' relative to '/home/jenkins/jenkins-slave/workspace/Thrift-precommit'.  Did you enter a pattern relative to the correct directory?  Did you generate the XML report(s) for Cppcheck?
    > Build step 'Publish Cppcheck results' changed build result to FAILURE



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r61612686
  
    --- Diff: lib/cpp/src/thrift/concurrency/ThreadManager.cpp ---
    @@ -421,7 +416,7 @@ void ThreadManager::Impl::removeWorker(size_t value) {
       {
         Synchronized s(workerMonitor_);
     
    -    while (workerCount_ != workerMaxCount_) {
    +    while (workerCount_ > goalCount) {
    --- End diff --
    
    I reverted all changes to this file.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on the pull request:

    https://github.com/apache/thrift/pull/980#issuecomment-208513794
  
    I'm starting to believe that the TThreadServerPool implementation may have the same issue.  The cores I have been able to produce through repeated testing are showing a pthread running when nothing else is, and it usually cores because it is trying to lock a mutex that has been destroyed.  I'm looking into this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by tpcwang <gi...@git.apache.org>.
Github user tpcwang commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r59305327
  
    --- Diff: lib/cpp/src/thrift/server/TThreadedServer.cpp ---
    @@ -90,31 +95,71 @@ TThreadedServer::~TThreadedServer() {
     }
     
     void TThreadedServer::serve() {
    +  threadFactory_->setDetached(false);
       TServerFramework::serve();
     
    -  // Drain all clients - no more will arrive
    -  try {
    -    Synchronized s(clientsMonitor_);
    -    while (getConcurrentClientCount() > 0) {
    -      clientsMonitor_.wait();
    -    }
    -  } catch (TException& tx) {
    -    string errStr = string("TThreadedServer: Exception joining workers: ") + tx.what();
    -    GlobalOutput(errStr.c_str());
    +  // Ensure post-condition of no active clients
    +  Synchronized s(clientMonitor_);
    +  while (!activeClientMap_.empty()) {
    +    clientMonitor_.wait();
    +  }
    +
    +  drainDeadClients();
    +}
    +
    +void TThreadedServer::drainDeadClients() {
    +  // we're in a monitor here
    +  while (!deadClientMap_.empty()) {
    +    ClientMap::iterator it = deadClientMap_.begin();
    +    it->second->join();
    +    deadClientMap_.erase(it);
       }
     }
     
     void TThreadedServer::onClientConnected(const shared_ptr<TConnectedClient>& pClient) {
    -  threadFactory_->newThread(pClient)->start();
    +  Synchronized sync(clientMonitor_);
    +  drainDeadClients();
    --- End diff --
    
    What do you think about draining in onClientDisconnected instead of onClientConnected (note to take care of draining before adding the thread that initiated onClientDisconnected to the list of dead clients)? I think it might be better to make onClientConnected as fast as possible because the serve thread is the caller?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by tpcwang <gi...@git.apache.org>.
Github user tpcwang commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r60612338
  
    --- Diff: lib/cpp/src/thrift/concurrency/ThreadManager.cpp ---
    @@ -421,7 +416,7 @@ void ThreadManager::Impl::removeWorker(size_t value) {
       {
         Synchronized s(workerMonitor_);
     
    -    while (workerCount_ != workerMaxCount_) {
    +    while (workerCount_ > goalCount) {
    --- End diff --
    
    I'm not sure I understand what the problem is. Why would calling removeWorker() at the same time clobber each other's desire goal? They have a shared goal of reducing workerMaxCount_ and both will eventually get there, right?
    
    However, I think the new implementation does have problems with addWorker and removeWorker. Say I start with 1 worker:
    (1) Thread1 calls removeWorker(1) - goalCount is 0, workerCount_ is 1
    (2) Thread2 calls addWorker(2) - goalCount is 0, workerCount_ is 3
    (3) Thread3 worker is reaped - goalCount is 0, workerCount_ is 2 <- this will stay at 2 forever
    
    In any case, we may want to discuss this in a separate change if there is a problem with the current implementation since this pull request no longer uses the thread manager.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by tpcwang <gi...@git.apache.org>.
Github user tpcwang commented on the pull request:

    https://github.com/apache/thrift/pull/980#issuecomment-211137935
  
    I think this change looks good for the most part, maybe we can get one of the maintainers to look at it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request #980: THRIFT-3768: ensure TThreadedServer guarantees the...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/thrift/pull/980


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by tpcwang <gi...@git.apache.org>.
Github user tpcwang commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r58814042
  
    --- Diff: lib/cpp/src/thrift/server/TThreadedServer.cpp ---
    @@ -92,29 +97,42 @@ TThreadedServer::~TThreadedServer() {
     void TThreadedServer::serve() {
       TServerFramework::serve();
     
    -  // Drain all clients - no more will arrive
    -  try {
    -    Synchronized s(clientsMonitor_);
    -    while (getConcurrentClientCount() > 0) {
    -      clientsMonitor_.wait();
    -    }
    -  } catch (TException& tx) {
    -    string errStr = string("TThreadedServer: Exception joining workers: ") + tx.what();
    -    GlobalOutput(errStr.c_str());
    +  // Ensure post-condition of no active clients
    +  Synchronized s(clientMonitor_);
    +  while (!clientMap_.empty()) {
    +    clientMonitor_.wait();
       }
     }
     
     void TThreadedServer::onClientConnected(const shared_ptr<TConnectedClient>& pClient) {
    -  threadFactory_->newThread(pClient)->start();
    +  Synchronized sync(clientMonitor_);
    +  clientMap_.insert(ClientMap::value_type(pClient.get(), boost::make_shared<TConnectedClientTracker>(pClient)));
    +
    +  // We do not track the threads themselves
    +  ClientMap::const_iterator it = clientMap_.find(pClient.get());
    +  threadFactory_->newThread(it->second)->start();
     }
     
     void TThreadedServer::onClientDisconnected(TConnectedClient* pClient) {
    -  THRIFT_UNUSED_VARIABLE(pClient);
    -  Synchronized s(clientsMonitor_);
    -  if (getConcurrentClientCount() == 0) {
    -    clientsMonitor_.notify();
    +  Synchronized sync(clientMonitor_);
    +  clientMap_.erase(pClient);
    +  if (clientMap_.empty()) {
    --- End diff --
    
    This doesn't fix the issue because as soon as you notify clientMonitor_, TThreadedServer::serve() will exit and clients are free to release TThreadedServer, but the thread that initiated onClientDisconnected is still in the middle of disposeConnectedClient, which it should not be executing any more because the object is already destroyed.
    
    Try adding one second sleep after the delete in disposeConnectedClient and run the tests. I see the same segfault in TServerIntegrationTest. I am not sure we can get away with not joining the threads.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on the pull request:

    https://github.com/apache/thrift/pull/980#issuecomment-207675424
  
    I did not touch "go" or "py" implementations and that was the reason for the failure in #2467.2.  #2467.1 failed in three tests including go-cpp/json however given other go tests are failing it looks like the issue is likely in "go".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by tpcwang <gi...@git.apache.org>.
Github user tpcwang commented on the pull request:

    https://github.com/apache/thrift/pull/980#issuecomment-215848959
  
    LGTM. Thanks for fixing this issue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r61558381
  
    --- Diff: lib/cpp/src/thrift/concurrency/ThreadManager.cpp ---
    @@ -421,7 +416,7 @@ void ThreadManager::Impl::removeWorker(size_t value) {
       {
         Synchronized s(workerMonitor_);
     
    -    while (workerCount_ != workerMaxCount_) {
    +    while (workerCount_ > goalCount) {
    --- End diff --
    
    Old line 424 was accessing ``workerMaxCount_`` outside of a lock so its behavior was unpredictable when multiple threads called ``removeWorker()`` simultaneously.  Saving the value ti ``goalCount`` removes this unpredictability and therefore is an improvement.  Also see THRIFT-3233 for another fix submitted by someone else.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r60591056
  
    --- Diff: lib/cpp/src/thrift/server/TThreadedServer.cpp ---
    @@ -90,31 +95,71 @@ TThreadedServer::~TThreadedServer() {
     }
     
     void TThreadedServer::serve() {
    +  threadFactory_->setDetached(false);
    --- End diff --
    
    I don't see an easy way around this.  The threaded server must be able to join.  The caller could pass a non-detached ThreadManager into the ctor and then setDetached(true) before or while serve() is running - there are a variety of things we cannot control.  The default behavior is to use a detached ThreadManager, so if I changed this to a check with an invalid_argument or logic_error exception, there would have been more work to do in the default simple thread manager static factory and related calls.
    
    On line 38 of the header this behavior is documented, at least.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by tpcwang <gi...@git.apache.org>.
Github user tpcwang commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r59996185
  
    --- Diff: lib/cpp/src/thrift/server/TThreadedServer.cpp ---
    @@ -90,31 +95,69 @@ TThreadedServer::~TThreadedServer() {
     }
     
     void TThreadedServer::serve() {
    +  threadFactory_->setDetached(false);
       TServerFramework::serve();
     
    -  // Drain all clients - no more will arrive
    -  try {
    -    Synchronized s(clientsMonitor_);
    -    while (getConcurrentClientCount() > 0) {
    -      clientsMonitor_.wait();
    -    }
    -  } catch (TException& tx) {
    -    string errStr = string("TThreadedServer: Exception joining workers: ") + tx.what();
    -    GlobalOutput(errStr.c_str());
    +  // Ensure post-condition of no active clients
    +  Synchronized s(clientMonitor_);
    +  while (!activeClientMap_.empty()) {
    +    clientMonitor_.wait();
    +  }
    +
    +  drainDeadClients();
    +}
    +
    +void TThreadedServer::drainDeadClients() {
    +  // we're in a monitor here
    +  while (!deadClientMap_.empty()) {
    +    ClientMap::iterator it = deadClientMap_.begin();
    +    it->second->join();
    +    deadClientMap_.erase(it);
       }
     }
     
     void TThreadedServer::onClientConnected(const shared_ptr<TConnectedClient>& pClient) {
    -  threadFactory_->newThread(pClient)->start();
    +  Synchronized sync(clientMonitor_);
    +  activeClientMap_.insert(ClientMap::value_type(pClient.get(), boost::make_shared<TConnectedClientRunner>(pClient)));
    +  ClientMap::iterator it = activeClientMap_.find(pClient.get());
    --- End diff --
    
    You can be slightly more efficient if you use the pair<iterator,bool> returned by insert.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by tpcwang <gi...@git.apache.org>.
Github user tpcwang commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r59995963
  
    --- Diff: lib/cpp/src/thrift/concurrency/ThreadManager.cpp ---
    @@ -421,7 +416,7 @@ void ThreadManager::Impl::removeWorker(size_t value) {
       {
         Synchronized s(workerMonitor_);
     
    -    while (workerCount_ != workerMaxCount_) {
    +    while (workerCount_ > goalCount) {
    --- End diff --
    
    What if an addWorker call was made in between this and when goalCount was set? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on the pull request:

    https://github.com/apache/thrift/pull/980#issuecomment-215502049
  
    Looking for an advocate who would consider merging this fix despite the busted Apache Jenkins environment result.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on the pull request:

    https://github.com/apache/thrift/pull/980#issuecomment-209039437
  
    @jfarrell Could you look into why the Jenkins build for apache thinks that a c_glib file needs to be merged manually and/or has an open pull request on the apache build system?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by tpcwang <gi...@git.apache.org>.
Github user tpcwang commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r59304879
  
    --- Diff: lib/cpp/src/thrift/server/TThreadedServer.h ---
    @@ -83,12 +88,56 @@ class TThreadedServer : public TServerFramework {
       virtual void serve();
     
     protected:
    +  /**
    +   * Drain recently connected clients by joining their threads - this is done lazily because
    +   * we cannot do it inside the thread context that is disconnecting.
    +   */
    +  virtual void drainDeadClients();
    +
    +  /**
    +   * Implementation of TServerFramework::onClientConnected
    +   */
       virtual void onClientConnected(const boost::shared_ptr<TConnectedClient>& pClient) /* override */;
    -  virtual void onClientDisconnected(TConnectedClient* pClient) /* override */;
    +
    +  /**
    +   * Implementation of TServerFramework::onClientDisconnected
    +   */
    +  virtual void onClientDisconnected(TConnectedClient *pClient) /* override */;
     
       boost::shared_ptr<apache::thrift::concurrency::ThreadFactory> threadFactory_;
    -  apache::thrift::concurrency::Monitor clientsMonitor_;
    +
    +  /**
    +   * A helper wrapper used to wrap the client in something we can use to maintain
    +   * the lifetime of the connected client within a detached thread.
    --- End diff --
    
    It's not a detached thread anymore, but why is this needed? I think ClientMap be a map from TConnectedClient* to shared_ptr<apache:thrift::concurrency::Thread> because Thread itself holds on to the TConnectedClient and maintains its lifetime since it needs to run it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r61589086
  
    --- Diff: lib/cpp/src/thrift/concurrency/ThreadManager.cpp ---
    @@ -421,7 +416,7 @@ void ThreadManager::Impl::removeWorker(size_t value) {
       {
         Synchronized s(workerMonitor_);
     
    -    while (workerCount_ != workerMaxCount_) {
    +    while (workerCount_ > goalCount) {
    --- End diff --
    
    Old line 399 modifies it under the monitor ``monitor_`` and then old line 424 acquired it under a different monitor ``workerMonitor_``.  One cannot use different monitors to control access to the same variable and expect any synchronization.  This is why I squirrel away the new value into goalCount and use it later on.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by tpcwang <gi...@git.apache.org>.
Github user tpcwang commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r61534321
  
    --- Diff: lib/cpp/src/thrift/concurrency/ThreadManager.cpp ---
    @@ -421,7 +416,7 @@ void ThreadManager::Impl::removeWorker(size_t value) {
       {
         Synchronized s(workerMonitor_);
     
    -    while (workerCount_ != workerMaxCount_) {
    +    while (workerCount_ > goalCount) {
    --- End diff --
    
    I would be happy to give my vote for this to be merged in, if you can address my concern on the change in this file. I still don't think this new implementation correctly handles concurrent addWorker/removeWorker calls. I also think the existing implementation actually handles concurrent removeWorker calls correctly, so it is not clear to me why this change is necessary. Sorry if I miss anything obvious.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r60227644
  
    --- Diff: lib/cpp/src/thrift/server/TThreadedServer.cpp ---
    @@ -90,31 +95,69 @@ TThreadedServer::~TThreadedServer() {
     }
     
     void TThreadedServer::serve() {
    +  threadFactory_->setDetached(false);
       TServerFramework::serve();
     
    -  // Drain all clients - no more will arrive
    -  try {
    -    Synchronized s(clientsMonitor_);
    -    while (getConcurrentClientCount() > 0) {
    -      clientsMonitor_.wait();
    -    }
    -  } catch (TException& tx) {
    -    string errStr = string("TThreadedServer: Exception joining workers: ") + tx.what();
    -    GlobalOutput(errStr.c_str());
    +  // Ensure post-condition of no active clients
    +  Synchronized s(clientMonitor_);
    +  while (!activeClientMap_.empty()) {
    +    clientMonitor_.wait();
    +  }
    +
    +  drainDeadClients();
    +}
    +
    +void TThreadedServer::drainDeadClients() {
    +  // we're in a monitor here
    +  while (!deadClientMap_.empty()) {
    +    ClientMap::iterator it = deadClientMap_.begin();
    +    it->second->join();
    +    deadClientMap_.erase(it);
       }
     }
     
     void TThreadedServer::onClientConnected(const shared_ptr<TConnectedClient>& pClient) {
    -  threadFactory_->newThread(pClient)->start();
    +  Synchronized sync(clientMonitor_);
    +  activeClientMap_.insert(ClientMap::value_type(pClient.get(), boost::make_shared<TConnectedClientRunner>(pClient)));
    +  ClientMap::iterator it = activeClientMap_.find(pClient.get());
    --- End diff --
    
    I implemented this suggestion.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r59410629
  
    --- Diff: lib/cpp/src/thrift/server/TThreadedServer.h ---
    @@ -83,12 +88,56 @@ class TThreadedServer : public TServerFramework {
       virtual void serve();
     
     protected:
    +  /**
    +   * Drain recently connected clients by joining their threads - this is done lazily because
    +   * we cannot do it inside the thread context that is disconnecting.
    +   */
    +  virtual void drainDeadClients();
    +
    +  /**
    +   * Implementation of TServerFramework::onClientConnected
    +   */
       virtual void onClientConnected(const boost::shared_ptr<TConnectedClient>& pClient) /* override */;
    -  virtual void onClientDisconnected(TConnectedClient* pClient) /* override */;
    +
    +  /**
    +   * Implementation of TServerFramework::onClientDisconnected
    +   */
    +  virtual void onClientDisconnected(TConnectedClient *pClient) /* override */;
     
       boost::shared_ptr<apache::thrift::concurrency::ThreadFactory> threadFactory_;
    -  apache::thrift::concurrency::Monitor clientsMonitor_;
    +
    +  /**
    +   * A helper wrapper used to wrap the client in something we can use to maintain
    +   * the lifetime of the connected client within a detached thread.
    --- End diff --
    
    This busted StressTest and StressTestNonBlocking because they are coded to assume that the thread shared pointer will hold on to the runnable after the thread is gone.  The TConnectedClientTracker is therefore necessary to avoid breaking existing clients that might depend on this behavior of Thread, so I am not going to simplify to a map of TConnectedClient* to shared_ptr<Thread>.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by tpcwang <gi...@git.apache.org>.
Github user tpcwang commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r59996235
  
    --- Diff: lib/cpp/test/TServerIntegrationTest.cpp ---
    @@ -152,7 +155,10 @@ class TServerIntegrationTestFixture {
                                       new TServerSocket("localhost", 0)),
                                   boost::shared_ptr<TTransportFactory>(new TTransportFactory),
                                   boost::shared_ptr<TProtocolFactory>(new TBinaryProtocolFactory))),
    -      pEventHandler(boost::shared_ptr<TServerReadyEventHandler>(new TServerReadyEventHandler)) {
    +      pEventHandler(boost::shared_ptr<TServerReadyEventHandler>(new TServerReadyEventHandler)),
    +	  bStressDone(false),
    --- End diff --
    
    Looks like the tabs are not aligned correctly in this document.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r59390310
  
    --- Diff: lib/cpp/src/thrift/server/TThreadedServer.cpp ---
    @@ -90,31 +95,71 @@ TThreadedServer::~TThreadedServer() {
     }
     
     void TThreadedServer::serve() {
    +  threadFactory_->setDetached(false);
    --- End diff --
    
    I could require it to be detached in the constructor and throw an invalid argument if it is not, I suppose, rather than mutating the thread factory in serve.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r60614134
  
    --- Diff: lib/cpp/src/thrift/concurrency/ThreadManager.cpp ---
    @@ -421,7 +416,7 @@ void ThreadManager::Impl::removeWorker(size_t value) {
       {
         Synchronized s(workerMonitor_);
     
    -    while (workerCount_ != workerMaxCount_) {
    +    while (workerCount_ > goalCount) {
    --- End diff --
    
    ThreadManager should probably be tossed out; removeWorker is inherently dangerous for a variety of reasons just like this.  In particular I can see a scenario where multiple workers simultaneously try to reduce the worker count by the same amount and fail badly because they both thought there were 5 workers and they both asked to removeWorker(3).  The API would be much better as specifying what the new limit should be and the reduction of intention vs setting a barrier until you get there should be separate operations.  It seems pretty fragile as-is and I think the whole thing needs a rethink.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by tpcwang <gi...@git.apache.org>.
Github user tpcwang commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r61581424
  
    --- Diff: lib/cpp/src/thrift/concurrency/ThreadManager.cpp ---
    @@ -421,7 +416,7 @@ void ThreadManager::Impl::removeWorker(size_t value) {
       {
         Synchronized s(workerMonitor_);
     
    -    while (workerCount_ != workerMaxCount_) {
    +    while (workerCount_ > goalCount) {
    --- End diff --
    
    It is not accessing `workerMaxCount_` outside of a lock. See line 424 where it uses the `Synchronized` object which grabs the mutex that the monitor owns. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by tpcwang <gi...@git.apache.org>.
Github user tpcwang commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r59304884
  
    --- Diff: lib/cpp/src/thrift/server/TThreadedServer.cpp ---
    @@ -90,31 +95,71 @@ TThreadedServer::~TThreadedServer() {
     }
     
     void TThreadedServer::serve() {
    +  threadFactory_->setDetached(false);
    --- End diff --
    
    This is a functional change that is subtle and may break existing users of TThreadedServer. I don't have a better solution at the moment, but I do want to call this out to see if others have a better solution.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by tpcwang <gi...@git.apache.org>.
Github user tpcwang commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r61601611
  
    --- Diff: lib/cpp/src/thrift/concurrency/ThreadManager.cpp ---
    @@ -421,7 +416,7 @@ void ThreadManager::Impl::removeWorker(size_t value) {
       {
         Synchronized s(workerMonitor_);
     
    -    while (workerCount_ != workerMaxCount_) {
    +    while (workerCount_ > goalCount) {
    --- End diff --
    
    I had missed that it is using different monitors and thus different mutexes. Thanks for clarifying.
    
    I would still argue that this new implementation is worse than the existing implementation. With the existing implementation, concurrent **writes** to `workerMaxCount_` and `workerCount_` is synchronized, but the reading of those values is not synchronized. Reading an aligned integer is most likely atomic on most architectures, so this might not even be that unsafe (?). Getting either the old value or the new value will not invalidate what it is trying to do.
    
    On the other hand, the new implementation has a problem with concurrent removeWorker and addWorker as far as I can tell. A removeWorker can be stuck forever if a addWorker call comes in after the removeWorker sets the local `goalCount` because it will never hit that goal with new workers that are produced by addWorker.
    
    Hopefully I didn't miss another detail. Thanks for working with me to address these concerns.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request: THRIFT-3768: ensure TThreadedServer guarantee...

Posted by tpcwang <gi...@git.apache.org>.
Github user tpcwang commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/980#discussion_r59996087
  
    --- Diff: lib/cpp/src/thrift/server/TThreadedServer.cpp ---
    @@ -90,31 +95,71 @@ TThreadedServer::~TThreadedServer() {
     }
     
     void TThreadedServer::serve() {
    +  threadFactory_->setDetached(false);
    --- End diff --
    
    I am *slightly* concerned about user reusing the thrift factory in other part of the code, so changing this can change the function of their other code. I'm not too sure what the maintainers usually do when there are some minor functional changes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---