You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Joris Van Remoortere <jo...@gmail.com> on 2014/10/15 23:55:31 UTC

Review Request 26783: Fix race condition in Latch

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26783/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman and Niklas Nielsen.


Repository: mesos-git


Description
-------

There is a race condition in Latch. The same pid can be terminated by 2 seperate threads simultaneously.


Diffs
-----

  3rdparty/libprocess/src/latch.cpp 89185ec 

Diff: https://reviews.apache.org/r/26783/diff/


Testing
-------

make check

The following test could segfault before: (after this patch it does not)

```
auto do_launch = []() {
  std::mutex mut;
  std::condition_variable cond;
  bool ready = true;
  Latch *latch = new Latch();
  const size_t num_iter = 10000;
  std::thread t1([&]() {
    for (size_t i = 0; i < num_iter; ++i) {
      {
        std::unique_lock<std::mutex> lock(mut);
        while (!ready) {
          cond.wait(lock);
        }
        ready = false;
      }
      latch->trigger();
    }
  });
  std::thread t2([&]() {
    for (size_t i = 0; i < num_iter; ++i) {
      latch->await();
      delete latch;
      latch = new Latch();
      std::lock_guard<std::mutex> lock(mut);
      ready = true;
      cond.notify_one();
    }
  });
  t1.join();
  t2.join();
};
const size_t nthread = 16;
std::vector<std::thread> tvec;
for (size_t i = 0; i < nthread; ++i) {
  tvec.emplace_back(do_launch);
}
for (auto& t : tvec) {
  t.join();
}
```


Thanks,

Joris Van Remoortere


Re: Review Request 26783: Fix race condition in Latch

Posted by Jie Yu <yu...@gmail.com>.

> On Oct. 17, 2014, 6:46 p.m., Niklas Nielsen wrote:
> > Hey Joris,
> > 
> > Can you file a JIRA ticket on the issue?
> > 
> > Also, do you know why multiple threads where calling the destructor?
> 
> Joris Van Remoortere wrote:
>     It is not the case that multiple threads are calling the destructor. It is the case that a thread joining (await) on the latch could then destroy the latch after it has joined. This condition can happen before the terminate(pid) in trigger() is fully completed. In this case terminate(pid) is called twice which can cause a segfault. The extra atomic check helps prevent the 2nd terminate.

I am curious why calling terminte() twice will lead to a segfault?


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26783/#review57188
-----------------------------------------------------------


On Oct. 22, 2014, 6:55 p.m., Joris Van Remoortere wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26783/
> -----------------------------------------------------------
> 
> (Updated Oct. 22, 2014, 6:55 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Niklas Nielsen.
> 
> 
> Bugs: MESOS-1968
>     https://issues.apache.org/jira/browse/MESOS-1968
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> There is a race condition in Latch. The same pid can be terminated by 2 seperate threads simultaneously.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/latch.cpp 89185ec 
> 
> Diff: https://reviews.apache.org/r/26783/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> The following test could segfault before: (after this patch it does not)
> 
> ```
> auto do_launch = []() {
>   std::mutex mut;
>   std::condition_variable cond;
>   bool ready = true;
>   Latch *latch = new Latch();
>   const size_t num_iter = 10000;
>   std::thread t1([&]() {
>     for (size_t i = 0; i < num_iter; ++i) {
>       {
>         std::unique_lock<std::mutex> lock(mut);
>         while (!ready) {
>           cond.wait(lock);
>         }
>         ready = false;
>       }
>       latch->trigger();
>     }
>   });
>   std::thread t2([&]() {
>     for (size_t i = 0; i < num_iter; ++i) {
>       latch->await();
>       delete latch;
>       latch = new Latch();
>       std::lock_guard<std::mutex> lock(mut);
>       ready = true;
>       cond.notify_one();
>     }
>   });
>   t1.join();
>   t2.join();
> };
> const size_t nthread = 16;
> std::vector<std::thread> tvec;
> for (size_t i = 0; i < nthread; ++i) {
>   tvec.emplace_back(do_launch);
> }
> for (auto& t : tvec) {
>   t.join();
> }
> ```
> 
> 
> Thanks,
> 
> Joris Van Remoortere
> 
>


Re: Review Request 26783: Fix race condition in Latch

Posted by Joris Van Remoortere <jo...@gmail.com>.

> On Oct. 17, 2014, 6:46 p.m., Niklas Nielsen wrote:
> > Hey Joris,
> > 
> > Can you file a JIRA ticket on the issue?
> > 
> > Also, do you know why multiple threads where calling the destructor?

It is not the case that multiple threads are calling the destructor. It is the case that a thread joining (await) on the latch could then destroy the latch after it has joined. This condition can happen before the terminate(pid) in trigger() is fully completed. In this case terminate(pid) is called twice which can cause a segfault. The extra atomic check helps prevent the 2nd terminate.


- Joris


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26783/#review57188
-----------------------------------------------------------


On Oct. 22, 2014, 6:55 p.m., Joris Van Remoortere wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26783/
> -----------------------------------------------------------
> 
> (Updated Oct. 22, 2014, 6:55 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Niklas Nielsen.
> 
> 
> Bugs: MESOS-1968
>     https://issues.apache.org/jira/browse/MESOS-1968
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> There is a race condition in Latch. The same pid can be terminated by 2 seperate threads simultaneously.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/latch.cpp 89185ec 
> 
> Diff: https://reviews.apache.org/r/26783/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> The following test could segfault before: (after this patch it does not)
> 
> ```
> auto do_launch = []() {
>   std::mutex mut;
>   std::condition_variable cond;
>   bool ready = true;
>   Latch *latch = new Latch();
>   const size_t num_iter = 10000;
>   std::thread t1([&]() {
>     for (size_t i = 0; i < num_iter; ++i) {
>       {
>         std::unique_lock<std::mutex> lock(mut);
>         while (!ready) {
>           cond.wait(lock);
>         }
>         ready = false;
>       }
>       latch->trigger();
>     }
>   });
>   std::thread t2([&]() {
>     for (size_t i = 0; i < num_iter; ++i) {
>       latch->await();
>       delete latch;
>       latch = new Latch();
>       std::lock_guard<std::mutex> lock(mut);
>       ready = true;
>       cond.notify_one();
>     }
>   });
>   t1.join();
>   t2.join();
> };
> const size_t nthread = 16;
> std::vector<std::thread> tvec;
> for (size_t i = 0; i < nthread; ++i) {
>   tvec.emplace_back(do_launch);
> }
> for (auto& t : tvec) {
>   t.join();
> }
> ```
> 
> 
> Thanks,
> 
> Joris Van Remoortere
> 
>


Re: Review Request 26783: Fix race condition in Latch

Posted by Niklas Nielsen <ni...@qni.dk>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26783/#review57188
-----------------------------------------------------------


Hey Joris,

Can you file a JIRA ticket on the issue?

Also, do you know why multiple threads where calling the destructor?

- Niklas Nielsen


On Oct. 15, 2014, 2:55 p.m., Joris Van Remoortere wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26783/
> -----------------------------------------------------------
> 
> (Updated Oct. 15, 2014, 2:55 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Niklas Nielsen.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> There is a race condition in Latch. The same pid can be terminated by 2 seperate threads simultaneously.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/latch.cpp 89185ec 
> 
> Diff: https://reviews.apache.org/r/26783/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> The following test could segfault before: (after this patch it does not)
> 
> ```
> auto do_launch = []() {
>   std::mutex mut;
>   std::condition_variable cond;
>   bool ready = true;
>   Latch *latch = new Latch();
>   const size_t num_iter = 10000;
>   std::thread t1([&]() {
>     for (size_t i = 0; i < num_iter; ++i) {
>       {
>         std::unique_lock<std::mutex> lock(mut);
>         while (!ready) {
>           cond.wait(lock);
>         }
>         ready = false;
>       }
>       latch->trigger();
>     }
>   });
>   std::thread t2([&]() {
>     for (size_t i = 0; i < num_iter; ++i) {
>       latch->await();
>       delete latch;
>       latch = new Latch();
>       std::lock_guard<std::mutex> lock(mut);
>       ready = true;
>       cond.notify_one();
>     }
>   });
>   t1.join();
>   t2.join();
> };
> const size_t nthread = 16;
> std::vector<std::thread> tvec;
> for (size_t i = 0; i < nthread; ++i) {
>   tvec.emplace_back(do_launch);
> }
> for (auto& t : tvec) {
>   t.join();
> }
> ```
> 
> 
> Thanks,
> 
> Joris Van Remoortere
> 
>


Re: Review Request 26783: Fix race condition in Latch

Posted by Mesos ReviewBot <de...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26783/#review56858
-----------------------------------------------------------


Patch looks great!

Reviews applied: [26783]

All tests passed.

- Mesos ReviewBot


On Oct. 15, 2014, 9:55 p.m., Joris Van Remoortere wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26783/
> -----------------------------------------------------------
> 
> (Updated Oct. 15, 2014, 9:55 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Niklas Nielsen.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> There is a race condition in Latch. The same pid can be terminated by 2 seperate threads simultaneously.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/latch.cpp 89185ec 
> 
> Diff: https://reviews.apache.org/r/26783/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> The following test could segfault before: (after this patch it does not)
> 
> ```
> auto do_launch = []() {
>   std::mutex mut;
>   std::condition_variable cond;
>   bool ready = true;
>   Latch *latch = new Latch();
>   const size_t num_iter = 10000;
>   std::thread t1([&]() {
>     for (size_t i = 0; i < num_iter; ++i) {
>       {
>         std::unique_lock<std::mutex> lock(mut);
>         while (!ready) {
>           cond.wait(lock);
>         }
>         ready = false;
>       }
>       latch->trigger();
>     }
>   });
>   std::thread t2([&]() {
>     for (size_t i = 0; i < num_iter; ++i) {
>       latch->await();
>       delete latch;
>       latch = new Latch();
>       std::lock_guard<std::mutex> lock(mut);
>       ready = true;
>       cond.notify_one();
>     }
>   });
>   t1.join();
>   t2.join();
> };
> const size_t nthread = 16;
> std::vector<std::thread> tvec;
> for (size_t i = 0; i < nthread; ++i) {
>   tvec.emplace_back(do_launch);
> }
> for (auto& t : tvec) {
>   t.join();
> }
> ```
> 
> 
> Thanks,
> 
> Joris Van Remoortere
> 
>


Re: Review Request 26783: Fix race condition in Latch

Posted by Niklas Nielsen <ni...@qni.dk>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26783/#review57879
-----------------------------------------------------------

Ship it!


Ship It!

- Niklas Nielsen


On Oct. 22, 2014, 11:55 a.m., Joris Van Remoortere wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26783/
> -----------------------------------------------------------
> 
> (Updated Oct. 22, 2014, 11:55 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Niklas Nielsen.
> 
> 
> Bugs: MESOS-1968
>     https://issues.apache.org/jira/browse/MESOS-1968
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> There is a race condition in Latch. The same pid can be terminated by 2 seperate threads simultaneously.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/latch.cpp 89185ec 
> 
> Diff: https://reviews.apache.org/r/26783/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> The following test could segfault before: (after this patch it does not)
> 
> ```
> auto do_launch = []() {
>   std::mutex mut;
>   std::condition_variable cond;
>   bool ready = true;
>   Latch *latch = new Latch();
>   const size_t num_iter = 10000;
>   std::thread t1([&]() {
>     for (size_t i = 0; i < num_iter; ++i) {
>       {
>         std::unique_lock<std::mutex> lock(mut);
>         while (!ready) {
>           cond.wait(lock);
>         }
>         ready = false;
>       }
>       latch->trigger();
>     }
>   });
>   std::thread t2([&]() {
>     for (size_t i = 0; i < num_iter; ++i) {
>       latch->await();
>       delete latch;
>       latch = new Latch();
>       std::lock_guard<std::mutex> lock(mut);
>       ready = true;
>       cond.notify_one();
>     }
>   });
>   t1.join();
>   t2.join();
> };
> const size_t nthread = 16;
> std::vector<std::thread> tvec;
> for (size_t i = 0; i < nthread; ++i) {
>   tvec.emplace_back(do_launch);
> }
> for (auto& t : tvec) {
>   t.join();
> }
> ```
> 
> 
> Thanks,
> 
> Joris Van Remoortere
> 
>


Re: Review Request 26783: Fix race condition in Latch

Posted by Joris Van Remoortere <jo...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26783/
-----------------------------------------------------------

(Updated Oct. 22, 2014, 6:55 p.m.)


Review request for mesos, Benjamin Hindman and Niklas Nielsen.


Changes
-------

Created JIRA


Bugs: MESOS-1968
    https://issues.apache.org/jira/browse/MESOS-1968


Repository: mesos-git


Description
-------

There is a race condition in Latch. The same pid can be terminated by 2 seperate threads simultaneously.


Diffs
-----

  3rdparty/libprocess/src/latch.cpp 89185ec 

Diff: https://reviews.apache.org/r/26783/diff/


Testing
-------

make check

The following test could segfault before: (after this patch it does not)

```
auto do_launch = []() {
  std::mutex mut;
  std::condition_variable cond;
  bool ready = true;
  Latch *latch = new Latch();
  const size_t num_iter = 10000;
  std::thread t1([&]() {
    for (size_t i = 0; i < num_iter; ++i) {
      {
        std::unique_lock<std::mutex> lock(mut);
        while (!ready) {
          cond.wait(lock);
        }
        ready = false;
      }
      latch->trigger();
    }
  });
  std::thread t2([&]() {
    for (size_t i = 0; i < num_iter; ++i) {
      latch->await();
      delete latch;
      latch = new Latch();
      std::lock_guard<std::mutex> lock(mut);
      ready = true;
      cond.notify_one();
    }
  });
  t1.join();
  t2.join();
};
const size_t nthread = 16;
std::vector<std::thread> tvec;
for (size_t i = 0; i < nthread; ++i) {
  tvec.emplace_back(do_launch);
}
for (auto& t : tvec) {
  t.join();
}
```


Thanks,

Joris Van Remoortere