You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@thrift.apache.org by jeking3 <gi...@git.apache.org> on 2016/10/01 05:30:34 UTC

[GitHub] thrift pull request #1103: THRIFT-3932: fixed ThreadManager concurrency issu...

GitHub user jeking3 opened a pull request:

    https://github.com/apache/thrift/pull/1103

    THRIFT-3932: fixed ThreadManager concurrency issues, added more tests in that area, did a little refactoring and prettying up along the way

    See the THRIFT-3932 ticket for more details.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jeking3/thrift defect/THRIFT-3932

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/thrift/pull/1103.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1103
    
----
commit 3691db3af28d02e348b79d2c76e527432448b3c6
Author: James E. King, III <ji...@simplivity.com>
Date:   2016-09-30T22:41:17Z

    THRIFT-3932: fixed ThreadManager concurrency issues, added more tests in that area, did a little refactoring and prettying up along the way

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request #1103: THRIFT-3932: fixed ThreadManager concurrency issu...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/1103#discussion_r81470203
  
    --- Diff: lib/cpp/src/thrift/concurrency/ThreadManager.cpp ---
    @@ -339,103 +357,88 @@ void ThreadManager::Impl::addWorker(size_t value) {
         idMap_.insert(std::pair<const Thread::id_t, shared_ptr<Thread> >((*ix)->getId(), *ix));
       }
     
    -  {
    -    Synchronized s(workerMonitor_);
    -    while (workerCount_ != workerMaxCount_) {
    -      workerMonitor_.wait();
    -    }
    +  while (workerCount_ != workerMaxCount_) {
    +    workerMonitor_.wait();
       }
     }
     
     void ThreadManager::Impl::start() {
    -
    +  Guard g(mutex_);
       if (state_ == ThreadManager::STOPPED) {
         return;
       }
     
    -  {
    -    Synchronized s(monitor_);
    -    if (state_ == ThreadManager::UNINITIALIZED) {
    -      if (!threadFactory_) {
    -        throw InvalidArgumentException();
    -      }
    -      state_ = ThreadManager::STARTED;
    -      monitor_.notifyAll();
    +  if (state_ == ThreadManager::UNINITIALIZED) {
    +    if (!threadFactory_) {
    +      throw InvalidArgumentException();
         }
    +    state_ = ThreadManager::STARTED;
    +    monitor_.notifyAll();
    +  }
     
    -    while (state_ == STARTING) {
    -      monitor_.wait();
    -    }
    +  while (state_ == STARTING) {
    +    monitor_.wait();
       }
     }
     
    -void ThreadManager::Impl::stopImpl(bool join) {
    +void ThreadManager::Impl::stop() {
    +  Guard g(mutex_);
       bool doStop = false;
    -  if (state_ == ThreadManager::STOPPED) {
    -    return;
    -  }
     
    -  {
    -    Synchronized s(monitor_);
    -    if (state_ != ThreadManager::STOPPING && state_ != ThreadManager::JOINING
    -        && state_ != ThreadManager::STOPPED) {
    -      doStop = true;
    -      state_ = join ? ThreadManager::JOINING : ThreadManager::STOPPING;
    -    }
    +  if (state_ != ThreadManager::STOPPING && state_ != ThreadManager::JOINING
    +      && state_ != ThreadManager::STOPPED) {
    +    doStop = true;
    +    state_ = ThreadManager::JOINING;
       }
     
       if (doStop) {
    -    removeWorker(workerCount_);
    +    removeWorkersUnderLock(workerCount_);
       }
     
    -  // XXX
    -  // should be able to block here for transition to STOPPED since we're no
    -  // using shared_ptrs
    -
    -  {
    -    Synchronized s(monitor_);
    -    state_ = ThreadManager::STOPPED;
    -  }
    +  state_ = ThreadManager::STOPPED;
     }
     
     void ThreadManager::Impl::removeWorker(size_t value) {
    -  std::set<shared_ptr<Thread> > removedThreads;
    -  {
    -    Synchronized s(monitor_);
    -    if (value > workerMaxCount_) {
    -      throw InvalidArgumentException();
    -    }
    +  Guard g(mutex_);
    +  removeWorkersUnderLock(value);
    +}
     
    -    workerMaxCount_ -= value;
    +void ThreadManager::Impl::removeWorkersUnderLock(size_t value) {
    +  if (value > workerMaxCount_) {
    +    throw InvalidArgumentException();
    +  }
     
    -    if (idleCount_ < value) {
    -      for (size_t ix = 0; ix < idleCount_; ix++) {
    -        monitor_.notify();
    -      }
    -    } else {
    -      monitor_.notifyAll();
    +  workerMaxCount_ -= value;
    +
    +  if (idleCount_ < value) {
    +    for (size_t ix = 0; ix < idleCount_; ix++) {
    --- End diff --
    
    I just pushed a fix for this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift issue #1103: THRIFT-3932: fixed ThreadManager concurrency issues, add...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on the issue:

    https://github.com/apache/thrift/pull/1103
  
    Okay disregard that comment about boost... it turns out the blockTest was using a std::set to store the tasks it put into the thread manager so they were getting reordered.  I changed it to a vector and it resolved that issue so I am going to re-enable the concurrency test on windows builds.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request #1103: THRIFT-3932: fixed ThreadManager concurrency issu...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/1103#discussion_r81470201
  
    --- Diff: lib/cpp/src/thrift/concurrency/ThreadManager.cpp ---
    @@ -292,28 +310,30 @@ class ThreadManager::Worker : public Runnable {
                 GlobalOutput.printf("[ERROR] task->run() raised an unknown exception");
               }
             }
    +        else if (task->state_ == ThreadManager::Task::TIMEDOUT && manager_->expireCallback_) {
    +          manager_->expireCallback_(task->getRunnable());
    +          manager_->expiredCount_++;
    +        }
    --- End diff --
    
    I just pushed a fix for this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift issue #1103: THRIFT-3932: fixed ThreadManager concurrency issues, add...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on the issue:

    https://github.com/apache/thrift/pull/1103
  
    By removing the THRIFT_SLEEP_USEC(1) calls in the test it gets much faster, which is great, however as part of this effort I have discovered our boost implementation is broken in some way.  I haven't gotten to the root cause yet, but I can easily get concurrency_test to hang on Linux with boost 1.54 and Windows with boost 1.58 after fixing the logic in ThreadManagerTests.h / blockTest to be correct.  The symptom is that notifyAll() isn't waking up all the threads it is supposed to.  It seems to wake up about 64 threads before it gives up.  If you run blockTest by itself (comment out the other thread-manager tests in Tests.cpp) it will wait 1 out of 2 times, and if you look at things in the debugger you will see a bunch of threads blocking on blockMonitor_ waiting for the notification from the test thread.  They are properly synchronized but not all of them wake up all the time.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift issue #1103: THRIFT-3932: fixed ThreadManager concurrency issues, add...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on the issue:

    https://github.com/apache/thrift/pull/1103
  
    I pushed a fix for the failing monitor timing test as described above; this resolves the appveyor issue with concurrency_test that was previously discovered.  I intend to leave concurrency_test enabled in appveyor so if it breaks again I'll see what's up.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request #1103: THRIFT-3932: fixed ThreadManager concurrency issu...

Posted by nsuke <gi...@git.apache.org>.
Github user nsuke commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/1103#discussion_r81467563
  
    --- Diff: lib/cpp/src/thrift/concurrency/ThreadManager.cpp ---
    @@ -339,103 +357,88 @@ void ThreadManager::Impl::addWorker(size_t value) {
         idMap_.insert(std::pair<const Thread::id_t, shared_ptr<Thread> >((*ix)->getId(), *ix));
    --- End diff --
    
    `getId` always returns invalid ID for detached boost/std threads.
    It only affects `add` method called from inside worker threads, which does not seem to be happening in thrift servers' code (not completely sure).
    A warning or an exception when passed such thread factories would be nice.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request #1103: THRIFT-3932: fixed ThreadManager concurrency issu...

Posted by nsuke <gi...@git.apache.org>.
Github user nsuke commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/1103#discussion_r81470710
  
    --- Diff: lib/cpp/src/thrift/concurrency/PosixThreadFactory.h ---
    @@ -39,7 +39,14 @@ class PosixThreadFactory : public ThreadFactory {
       /**
        * POSIX Thread scheduler policies
        */
    -  enum POLICY { OTHER, FIFO, ROUND_ROBIN };
    +  class Policy {
    +  public:
    +    enum value {
    +      OTHER       = 0,
    +      FIFO        = 1,
    +      ROUND_ROBIN = 2
    +    };
    --- End diff --
    
    I understand your point, although I wouldn't personally do this as it feels fighting against language.
    If we do this, IMO we should introduce implicit conversion from inner enum to outer class to remove the `::value` in method signatures.
    But I think it's completely separate issue and should be handled separately from this patch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request #1103: THRIFT-3932: fixed ThreadManager concurrency issu...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/thrift/pull/1103


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request #1103: THRIFT-3932: fixed ThreadManager concurrency issu...

Posted by nsuke <gi...@git.apache.org>.
Github user nsuke commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/1103#discussion_r81472348
  
    --- Diff: lib/cpp/src/thrift/concurrency/ThreadManager.cpp ---
    @@ -339,103 +357,88 @@ void ThreadManager::Impl::addWorker(size_t value) {
         idMap_.insert(std::pair<const Thread::id_t, shared_ptr<Thread> >((*ix)->getId(), *ix));
    --- End diff --
    
    Removing seems quite safe to me too.
    But if you intend this for 0.10.0, it may not be the best timing.
    Thought ?
    Otherwise it looks good to me, thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift issue #1103: THRIFT-3932: fixed ThreadManager concurrency issues, add...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on the issue:

    https://github.com/apache/thrift/pull/1103
  
    It looks like the processor_test is failing now. ;(


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift issue #1103: THRIFT-3932: fixed ThreadManager concurrency issues, add...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on the issue:

    https://github.com/apache/thrift/pull/1103
  
    @ben-craig changing the "detached" behavior of an active thread manager seems like a bug to me which is why I removed it.  What is the use case for it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request #1103: THRIFT-3932: fixed ThreadManager concurrency issu...

Posted by nsuke <gi...@git.apache.org>.
Github user nsuke commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/1103#discussion_r81467452
  
    --- Diff: lib/cpp/src/thrift/concurrency/ThreadManager.cpp ---
    @@ -339,103 +357,88 @@ void ThreadManager::Impl::addWorker(size_t value) {
         idMap_.insert(std::pair<const Thread::id_t, shared_ptr<Thread> >((*ix)->getId(), *ix));
       }
     
    -  {
    -    Synchronized s(workerMonitor_);
    -    while (workerCount_ != workerMaxCount_) {
    -      workerMonitor_.wait();
    -    }
    +  while (workerCount_ != workerMaxCount_) {
    +    workerMonitor_.wait();
       }
     }
     
     void ThreadManager::Impl::start() {
    -
    +  Guard g(mutex_);
       if (state_ == ThreadManager::STOPPED) {
         return;
       }
     
    -  {
    -    Synchronized s(monitor_);
    -    if (state_ == ThreadManager::UNINITIALIZED) {
    -      if (!threadFactory_) {
    -        throw InvalidArgumentException();
    -      }
    -      state_ = ThreadManager::STARTED;
    -      monitor_.notifyAll();
    +  if (state_ == ThreadManager::UNINITIALIZED) {
    +    if (!threadFactory_) {
    +      throw InvalidArgumentException();
         }
    +    state_ = ThreadManager::STARTED;
    +    monitor_.notifyAll();
    +  }
     
    -    while (state_ == STARTING) {
    -      monitor_.wait();
    -    }
    +  while (state_ == STARTING) {
    +    monitor_.wait();
       }
     }
     
    -void ThreadManager::Impl::stopImpl(bool join) {
    +void ThreadManager::Impl::stop() {
    +  Guard g(mutex_);
       bool doStop = false;
    -  if (state_ == ThreadManager::STOPPED) {
    -    return;
    -  }
     
    -  {
    -    Synchronized s(monitor_);
    -    if (state_ != ThreadManager::STOPPING && state_ != ThreadManager::JOINING
    -        && state_ != ThreadManager::STOPPED) {
    -      doStop = true;
    -      state_ = join ? ThreadManager::JOINING : ThreadManager::STOPPING;
    -    }
    +  if (state_ != ThreadManager::STOPPING && state_ != ThreadManager::JOINING
    +      && state_ != ThreadManager::STOPPED) {
    +    doStop = true;
    +    state_ = ThreadManager::JOINING;
       }
     
       if (doStop) {
    -    removeWorker(workerCount_);
    +    removeWorkersUnderLock(workerCount_);
       }
     
    -  // XXX
    -  // should be able to block here for transition to STOPPED since we're no
    -  // using shared_ptrs
    -
    -  {
    -    Synchronized s(monitor_);
    -    state_ = ThreadManager::STOPPED;
    -  }
    +  state_ = ThreadManager::STOPPED;
     }
     
     void ThreadManager::Impl::removeWorker(size_t value) {
    -  std::set<shared_ptr<Thread> > removedThreads;
    -  {
    -    Synchronized s(monitor_);
    -    if (value > workerMaxCount_) {
    -      throw InvalidArgumentException();
    -    }
    +  Guard g(mutex_);
    +  removeWorkersUnderLock(value);
    +}
     
    -    workerMaxCount_ -= value;
    +void ThreadManager::Impl::removeWorkersUnderLock(size_t value) {
    +  if (value > workerMaxCount_) {
    +    throw InvalidArgumentException();
    +  }
     
    -    if (idleCount_ < value) {
    -      for (size_t ix = 0; ix < idleCount_; ix++) {
    -        monitor_.notify();
    -      }
    -    } else {
    -      monitor_.notifyAll();
    +  workerMaxCount_ -= value;
    +
    +  if (idleCount_ < value) {
    +    for (size_t ix = 0; ix < idleCount_; ix++) {
    --- End diff --
    
    I know it's not your code but shouldn't it be
    
        if (idleCount_ > value) {
            for( ...; ix < value; ...)
    
    ?
    If insufficient or equal number of people are sleeping, let's wake them up all, otherwise do it one by one.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request #1103: THRIFT-3932: fixed ThreadManager concurrency issu...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/1103#discussion_r81470853
  
    --- Diff: lib/cpp/src/thrift/concurrency/PosixThreadFactory.h ---
    @@ -39,7 +39,14 @@ class PosixThreadFactory : public ThreadFactory {
       /**
        * POSIX Thread scheduler policies
        */
    -  enum POLICY { OTHER, FIFO, ROUND_ROBIN };
    +  class Policy {
    +  public:
    +    enum value {
    +      OTHER       = 0,
    +      FIFO        = 1,
    +      ROUND_ROBIN = 2
    +    };
    --- End diff --
    
    I'll revert the enum changes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift issue #1103: THRIFT-3932: fixed ThreadManager concurrency issues, add...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on the issue:

    https://github.com/apache/thrift/pull/1103
  
    I decided to leave setDetached and not deprecate it.  I don't see a reason for someone to change from detachable to joinable (or the reverse) while running but I believe all of the servers will properly support it because all three join implementations check if the thread is detached_.  Therefore I am doing a force push on this one and rebased it on master as of earlier today.  Please check it and let me know if there are any more concerns; since we're getting into the beginning of 0.11.0 development I want to get it in as early as possible to get as much soak time as possible, given the area of work.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request #1103: THRIFT-3932: fixed ThreadManager concurrency issu...

Posted by nsuke <gi...@git.apache.org>.
Github user nsuke commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/1103#discussion_r81464488
  
    --- Diff: lib/cpp/src/thrift/concurrency/PosixThreadFactory.h ---
    @@ -39,7 +39,14 @@ class PosixThreadFactory : public ThreadFactory {
       /**
        * POSIX Thread scheduler policies
        */
    -  enum POLICY { OTHER, FIFO, ROUND_ROBIN };
    +  class Policy {
    +  public:
    +    enum value {
    +      OTHER       = 0,
    +      FIFO        = 1,
    +      ROUND_ROBIN = 2
    +    };
    --- End diff --
    
    What is the purpose here ?
    At first glance I thought you were emulating C++11 enum class, but it does not provide type safety, and method signatures get noisy with ::value.
    There's certainly a way to avoid ::value and provide limited type-safety, but not sure if it's worth changing public API.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift issue #1103: THRIFT-3932: fixed ThreadManager concurrency issues, add...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on the issue:

    https://github.com/apache/thrift/pull/1103
  
    I re-enabled these tests and they have all passed in Travis and in Appveyor.  I also brought the time on TInterruptTest down by 90 seconds.  The entire suite runs in about 60 seconds on Windows without nonblocking tests; definitely runs in just a few minutes with all the tests enabled.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift issue #1103: THRIFT-3932: fixed ThreadManager concurrency issues, add...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on the issue:

    https://github.com/apache/thrift/pull/1103
  
    It looks like we have a number of tests disabled on Appveyor that were not behaving well... I'm going to re-enable as many as I can since they seem to be pretty stable for me locally, starting with concurrency_test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift issue #1103: THRIFT-3932: fixed ThreadManager concurrency issues, add...

Posted by nsuke <gi...@git.apache.org>.
Github user nsuke commented on the issue:

    https://github.com/apache/thrift/pull/1103
  
    It looks succeeding to me :tada:


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift issue #1103: THRIFT-3932: fixed ThreadManager concurrency issues, add...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on the issue:

    https://github.com/apache/thrift/pull/1103
  
    How will a decision be made as to whether this gets into 0.10.0 or not?  Obviously it is not a trivial change but I think it solidifies parts of the C++ engine that have been neglected; adds much better test coverage; improves the reliability and decreases the runtime of many unit tests; it also re-enables some tests that have been disabled in CI for a long time.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request #1103: THRIFT-3932: fixed ThreadManager concurrency issu...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/1103#discussion_r81469729
  
    --- Diff: lib/cpp/src/thrift/concurrency/ThreadManager.cpp ---
    @@ -339,103 +357,88 @@ void ThreadManager::Impl::addWorker(size_t value) {
         idMap_.insert(std::pair<const Thread::id_t, shared_ptr<Thread> >((*ix)->getId(), *ix));
       }
     
    -  {
    -    Synchronized s(workerMonitor_);
    -    while (workerCount_ != workerMaxCount_) {
    -      workerMonitor_.wait();
    -    }
    +  while (workerCount_ != workerMaxCount_) {
    +    workerMonitor_.wait();
       }
     }
     
     void ThreadManager::Impl::start() {
    -
    +  Guard g(mutex_);
       if (state_ == ThreadManager::STOPPED) {
         return;
       }
     
    -  {
    -    Synchronized s(monitor_);
    -    if (state_ == ThreadManager::UNINITIALIZED) {
    -      if (!threadFactory_) {
    -        throw InvalidArgumentException();
    -      }
    -      state_ = ThreadManager::STARTED;
    -      monitor_.notifyAll();
    +  if (state_ == ThreadManager::UNINITIALIZED) {
    +    if (!threadFactory_) {
    +      throw InvalidArgumentException();
         }
    +    state_ = ThreadManager::STARTED;
    +    monitor_.notifyAll();
    +  }
     
    -    while (state_ == STARTING) {
    -      monitor_.wait();
    -    }
    +  while (state_ == STARTING) {
    +    monitor_.wait();
       }
     }
     
    -void ThreadManager::Impl::stopImpl(bool join) {
    +void ThreadManager::Impl::stop() {
    +  Guard g(mutex_);
       bool doStop = false;
    -  if (state_ == ThreadManager::STOPPED) {
    -    return;
    -  }
     
    -  {
    -    Synchronized s(monitor_);
    -    if (state_ != ThreadManager::STOPPING && state_ != ThreadManager::JOINING
    -        && state_ != ThreadManager::STOPPED) {
    -      doStop = true;
    -      state_ = join ? ThreadManager::JOINING : ThreadManager::STOPPING;
    -    }
    +  if (state_ != ThreadManager::STOPPING && state_ != ThreadManager::JOINING
    +      && state_ != ThreadManager::STOPPED) {
    +    doStop = true;
    +    state_ = ThreadManager::JOINING;
       }
     
       if (doStop) {
    -    removeWorker(workerCount_);
    +    removeWorkersUnderLock(workerCount_);
       }
     
    -  // XXX
    -  // should be able to block here for transition to STOPPED since we're no
    -  // using shared_ptrs
    -
    -  {
    -    Synchronized s(monitor_);
    -    state_ = ThreadManager::STOPPED;
    -  }
    +  state_ = ThreadManager::STOPPED;
     }
     
     void ThreadManager::Impl::removeWorker(size_t value) {
    -  std::set<shared_ptr<Thread> > removedThreads;
    -  {
    -    Synchronized s(monitor_);
    -    if (value > workerMaxCount_) {
    -      throw InvalidArgumentException();
    -    }
    +  Guard g(mutex_);
    +  removeWorkersUnderLock(value);
    +}
     
    -    workerMaxCount_ -= value;
    +void ThreadManager::Impl::removeWorkersUnderLock(size_t value) {
    +  if (value > workerMaxCount_) {
    +    throw InvalidArgumentException();
    +  }
     
    -    if (idleCount_ < value) {
    -      for (size_t ix = 0; ix < idleCount_; ix++) {
    -        monitor_.notify();
    -      }
    -    } else {
    -      monitor_.notifyAll();
    +  workerMaxCount_ -= value;
    +
    +  if (idleCount_ < value) {
    +    for (size_t ix = 0; ix < idleCount_; ix++) {
    --- End diff --
    
    I will look at this more closely.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift issue #1103: THRIFT-3932: fixed ThreadManager concurrency issues, add...

Posted by nsuke <gi...@git.apache.org>.
Github user nsuke commented on the issue:

    https://github.com/apache/thrift/pull/1103
  
    Oh, good catch, concurrency_test definitely needs to be enabled for Linux cmake CI and autotools.
    
    Somehow thought that you were referring to concurrency_test in the last comment,
    but `processor_test` worked without your patch for you ?
    For me it didn't, actually I don't remember a single instance it did.
    According to THRIFT-1336, it was disabled since the day 1.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request #1103: THRIFT-3932: fixed ThreadManager concurrency issu...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/1103#discussion_r81525843
  
    --- Diff: lib/cpp/src/thrift/concurrency/ThreadManager.cpp ---
    @@ -339,103 +357,88 @@ void ThreadManager::Impl::addWorker(size_t value) {
         idMap_.insert(std::pair<const Thread::id_t, shared_ptr<Thread> >((*ix)->getId(), *ix));
    --- End diff --
    
    In addition, if you look at the logic around canSleep and the use of a timeout, it isn't even correct.  It checks to see if a timeout was requested then proceeds into an infinite loop waiting for a slot to add the task.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request #1103: THRIFT-3932: fixed ThreadManager concurrency issu...

Posted by nsuke <gi...@git.apache.org>.
Github user nsuke commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/1103#discussion_r81466362
  
    --- Diff: lib/cpp/src/thrift/concurrency/ThreadManager.cpp ---
    @@ -292,28 +310,30 @@ class ThreadManager::Worker : public Runnable {
                 GlobalOutput.printf("[ERROR] task->run() raised an unknown exception");
               }
             }
    +        else if (task->state_ == ThreadManager::Task::TIMEDOUT && manager_->expireCallback_) {
    +          manager_->expireCallback_(task->getRunnable());
    +          manager_->expiredCount_++;
    +        }
    --- End diff --
    
    Looks problematic after `mutex_.unlock()` above.
    I guess you intend to unlock inside `if`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift issue #1103: THRIFT-3932: fixed ThreadManager concurrency issues, add...

Posted by ben-craig <gi...@git.apache.org>.
Github user ben-craig commented on the issue:

    https://github.com/apache/thrift/pull/1103
  
    Removing setDetached and the old thread factory ctors breaks client code loudly.  It doesn't fix the underlying problem, and it exchanges one problem (don't forget to set the detached mode!) for another (what does the 'false' in that parameter list mean?).
    
    I'm great with adding ctors that would make the code less buggy.  I'm also okay with the minor breakage of making isDetached and setDetached non-virtual.  But removing setDetached just breaks too much code without a strong benefit.
    
    If your change ripples out to completely unrelated tests, you need a strong motivation for it.  Please separate out the refactors from the fixes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift issue #1103: THRIFT-3932: fixed ThreadManager concurrency issues, add...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on the issue:

    https://github.com/apache/thrift/pull/1103
  
    I reverted the PosixThreadManager enum changes per @nsuke 's request.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift issue #1103: THRIFT-3932: fixed ThreadManager concurrency issues, add...

Posted by ben-craig <gi...@git.apache.org>.
Github user ben-craig commented on the issue:

    https://github.com/apache/thrift/pull/1103
  
    PlatformThreadFactory f;
    f.setDetached(false);
    
    Prior to this change, that was the only way to portably change the detached state.  The PosixThreadFactory didn't have a ctor that would permit that.  The tests were even using this pattern.
    
    I see that you have added a PosixThreadFactory ctor that lets you set the detached state in a portable way, and that's great.  It doesn't change that this pattern is already out there in a lot of production code.  It even broke tests which made no use of ThreadManager.
    
    If you need to fix the tests after making a change, chances are high that you have introduced a source breaking change that needs to be strongly considered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift issue #1103: THRIFT-3932: fixed ThreadManager concurrency issues, add...

Posted by ben-craig <gi...@git.apache.org>.
Github user ben-craig commented on the issue:

    https://github.com/apache/thrift/pull/1103
  
    I'm fine with you adding back setDetached and marking it deprecated.  If you really want, you could even put in some logic so that you assert or throw an error if someone calls setDetached after a thread is created by the factory.  Just don't break the two-stage initialization that is in existing code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request #1103: THRIFT-3932: fixed ThreadManager concurrency issu...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/1103#discussion_r81470274
  
    --- Diff: lib/cpp/src/thrift/concurrency/ThreadManager.cpp ---
    @@ -339,103 +357,88 @@ void ThreadManager::Impl::addWorker(size_t value) {
         idMap_.insert(std::pair<const Thread::id_t, shared_ptr<Thread> >((*ix)->getId(), *ix));
    --- End diff --
    
    This means canSleep() will always return true, which is correct; when adding a task without a timeout you will get TooManyPendingTasksException.  When adding a task with a timeout you may get a TimedOutException.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request #1103: THRIFT-3932: fixed ThreadManager concurrency issu...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/1103#discussion_r81469719
  
    --- Diff: lib/cpp/src/thrift/concurrency/ThreadManager.cpp ---
    @@ -292,28 +310,30 @@ class ThreadManager::Worker : public Runnable {
                 GlobalOutput.printf("[ERROR] task->run() raised an unknown exception");
               }
             }
    +        else if (task->state_ == ThreadManager::Task::TIMEDOUT && manager_->expireCallback_) {
    +          manager_->expireCallback_(task->getRunnable());
    +          manager_->expiredCount_++;
    +        }
    --- End diff --
    
    Yes, that needs to re-lock.  I will fix that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request #1103: THRIFT-3932: fixed ThreadManager concurrency issu...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/1103#discussion_r81525567
  
    --- Diff: lib/cpp/src/thrift/concurrency/ThreadManager.cpp ---
    @@ -339,103 +357,88 @@ void ThreadManager::Impl::addWorker(size_t value) {
         idMap_.insert(std::pair<const Thread::id_t, shared_ptr<Thread> >((*ix)->getId(), *ix));
    --- End diff --
    
    I feel the changes in here are substantial and quite useful, in that I have added tests and re-enabled a number of tests, and ctest on windows runs in about a minute now.  As for getting rid of idMap_ and canSleep, I see no reason to keep them.  As for 0.10.0, all of these changes are pretty substantial and I don't know if they make it in or not.  I think they are substantial improvements in correctness and stability, and should be considered at least.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift issue #1103: THRIFT-3932: fixed ThreadManager concurrency issues, add...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on the issue:

    https://github.com/apache/thrift/pull/1103
  
    What you described is a failing in the original set of thread factories to provide common functionality (detached, policy, priority) in the base class.  Only PosixThreadFactory has policy and priority, and yet it's quite likely that all of them would benefit from these.  I moved the detached information into the base class as a first step.  I could certainly add back setDetached into the base and mark it deprecated, however that breaks the contract that the servers require - that the detached disposition cannot change while the server is running, so it really doesn't belong... I still have no valid use case for changing detached disposition of a thread manager while it is in use.
    
    Now that said, I believe that Thread knows if it is detached or not; if Thread had a "joinIfJoinable()" method then that server requirement would go away since the TConnectedClient could just call that and it would do the right thing, and setDetached() could be added back in avoiding that particular breaking change.
    
    I don't find breaking changes to be a problem when they remove behavior that has no use case to support it, however.  There's a lot of technical debt in the concurrency area that still could be addressed - this was a first pass as trying to shore it up and ensure functions provided are actually tested and work, and that tests don't take 100 seconds each.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift issue #1103: THRIFT-3932: fixed ThreadManager concurrency issues, add...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on the issue:

    https://github.com/apache/thrift/pull/1103
  
    https://travis-ci.org/apache/thrift/builds/164436835
    Job 3025.17
    https://travis-ci.org/apache/thrift/jobs/164436856 has in it:
    ctest -VV -E "(concurrency_test|processor_test)"
    Are we sure that processor_test runs in the Travis CI environment?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift issue #1103: THRIFT-3932: fixed ThreadManager concurrency issues, add...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on the issue:

    https://github.com/apache/thrift/pull/1103
  
    The latest AppVeyor build failed because the thread factory monitor test assumes that a monitor wait(milliseconds) will wake up at "milliseconds" however the only guarantee is that it will sleep "at least milliseconds".  As we see in a busy Appveyor build environment, there are significant delays, but it meets the criteria of sleeping "at least milliseconds" which is the only guarantee, so I am going to fix that test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift issue #1103: THRIFT-3932: fixed ThreadManager concurrency issues, add...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on the issue:

    https://github.com/apache/thrift/pull/1103
  
    I think I found the reason for the poor test performance on Windows - the minimum granularity of a sleep call on Windows as defined by thrift_usleep() is 1 millisecond.  So in a test where we create 100,000 things there will be 100 seconds of delay.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request #1103: THRIFT-3932: fixed ThreadManager concurrency issu...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/1103#discussion_r81527972
  
    --- Diff: lib/cpp/src/thrift/concurrency/ThreadManager.cpp ---
    @@ -339,103 +357,88 @@ void ThreadManager::Impl::addWorker(size_t value) {
         idMap_.insert(std::pair<const Thread::id_t, shared_ptr<Thread> >((*ix)->getId(), *ix));
    --- End diff --
    
    Okay actually that comment was wrong.  maxMonitor_ will wake under two conditions - one is that there's enough room to add a task.  The other was not coded and that is during stop.  If there was a thread in add() and someone called stop() it needs to signal maxMonitor_ and if the thread manager is no longer started we need to throw the illegal state exception since it's bieng stopped.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift issue #1103: THRIFT-3932: fixed ThreadManager concurrency issues, add...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on the issue:

    https://github.com/apache/thrift/pull/1103
  
    It is pretty frustrating when you build, test, repeat many many times and everything looks good, and then the CI build fails, but that's why it's there right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift issue #1103: THRIFT-3932: fixed ThreadManager concurrency issues, add...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on the issue:

    https://github.com/apache/thrift/pull/1103
  
    I made this single change:
    
    > +++ b/lib/cpp/test/processor/ProcessorTest.cpp
    >  uint32_t checkNewConnEvents(const boost::shared_ptr<EventLog>& log) {
    >    // Check for an ET_CONN_CREATED event
    >   Event event = log->waitForEvent(2000);                                // <<< I added "2000"
    >    BOOST_CHECK_EQUAL(EventLog::ET_CONN_CREATED, event.type);
    
    After making that change I can run processor_test:
    
    > jking@jking-dvm51-buildenv-2:~/thrift-build$ ctest -R processor_test
    > Test project /home/jking/thrift-build
    >     Start 19: processor_test
    > 1/1 Test #19: processor_test ...................   Passed   33.84 sec
    > 
    > 100% tests passed, 0 tests failed out of 1
    > 
    > Total Test time (real) =  33.88 sec
    
    Looks like it's time to re-enable these tests!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] thrift pull request #1103: THRIFT-3932: fixed ThreadManager concurrency issu...

Posted by jeking3 <gi...@git.apache.org>.
Github user jeking3 commented on a diff in the pull request:

    https://github.com/apache/thrift/pull/1103#discussion_r81469715
  
    --- Diff: lib/cpp/src/thrift/concurrency/PosixThreadFactory.h ---
    @@ -39,7 +39,14 @@ class PosixThreadFactory : public ThreadFactory {
       /**
        * POSIX Thread scheduler policies
        */
    -  enum POLICY { OTHER, FIFO, ROUND_ROBIN };
    +  class Policy {
    +  public:
    +    enum value {
    +      OTHER       = 0,
    +      FIFO        = 1,
    +      ROUND_ROBIN = 2
    +    };
    --- End diff --
    
    Intention was to make it more readable.
    
    In code, OTHER, FIFO, ROUND_ROBIN mixed with NORMAL, LOWER, HIGHER... this adds a scope to them, i.e. "Policy::FIFO" and "Priority::NORMAL".
    
    I can revert it if needed, I just wanted to make the code more maintainable.  I suspect these options are not used that often anyway.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---