You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Marta Kuczora <ku...@cloudera.com> on 2016/10/07 14:00:50 UTC

Review Request 52631: HIVE-14839: Improve the stability of TestSessionManagerMetrics

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52631/
-----------------------------------------------------------

Review request for hive, Aihua Xu, Miklos Csanady, Peter Vary, Szehon Ho, and Barna Zsombor Klara.


Bugs: HIVE-14839
    https://issues.apache.org/jira/browse/HIVE-14839


Repository: hive-git


Description
-------

The TestSessionManagerMetrics fails occasionally with the following error: 
org.junit.ComparisonFailure: expected:<[0]> but was:<[1]>
	at org.apache.hive.service.cli.session.TestSessionManagerMetrics.testThreadPoolMetrics(TestSessionManagerMetrics.java:98)

Changed the synchronization between the tasks a bit to avoid these failures.
With this patch the test would work do the following steps:
- Submit four tasks.
- Wait with the metrics verification, until the first two tasks are running.
  This is done by invoking await on the "ready" barrier.
  If, for some reason, the tasks are not started within a timeout period, make the test fail.
- Make the tasks wait until the metrics are checked.
  This is done by invoking await on the "completed" barrier.
- Verify the metrics.
- Let the first two tasks complete by breaking the "complete" barrier.
  When the first two tasks completed, the remaining two tasks will be removed from the queue and started.
- Wait until the remaining tasks are running by using the "ready" barrier again.
  Do the metrics check only if they are started to avoid the failures when the queue size was not 0.
  If, for some reason, the tasks are not started within a timeout period, make the test fail.
- Verify the metrics.
- Let the remaining tasks complete by breaking the "completed" barrier.


Diffs
-----

  service/src/test/org/apache/hive/service/cli/session/TestSessionManagerMetrics.java 5511c54 

Diff: https://reviews.apache.org/r/52631/diff/


Testing
-------

The change effects only a unit test.
Ran the test many times locally.
Added random sleeps to the tasks to simulate the delay when they start or complete and ran the test many times to see if the synch between the tasks is working correctly.


Thanks,

Marta Kuczora


Re: Review Request 52631: HIVE-14839: Improve the stability of TestSessionManagerMetrics

Posted by Aihua Xu <ax...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52631/#review151971
-----------------------------------------------------------




service/src/test/org/apache/hive/service/cli/session/TestSessionManagerMetrics.java (line 106)
<https://reviews.apache.org/r/52631/#comment220646>

    OK. That makes sense.


- Aihua Xu


On Oct. 7, 2016, 2 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52631/
> -----------------------------------------------------------
> 
> (Updated Oct. 7, 2016, 2 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Miklos Csanady, Peter Vary, Szehon Ho, and Barna Zsombor Klara.
> 
> 
> Bugs: HIVE-14839
>     https://issues.apache.org/jira/browse/HIVE-14839
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The TestSessionManagerMetrics fails occasionally with the following error: 
> org.junit.ComparisonFailure: expected:<[0]> but was:<[1]>
> 	at org.apache.hive.service.cli.session.TestSessionManagerMetrics.testThreadPoolMetrics(TestSessionManagerMetrics.java:98)
> 
> Changed the synchronization between the tasks a bit to avoid these failures.
> With this patch the test would work do the following steps:
> - Submit four tasks.
> - Wait with the metrics verification, until the first two tasks are running.
>   This is done by invoking await on the "ready" barrier.
>   If, for some reason, the tasks are not started within a timeout period, make the test fail.
> - Make the tasks wait until the metrics are checked.
>   This is done by invoking await on the "completed" barrier.
> - Verify the metrics.
> - Let the first two tasks complete by breaking the "complete" barrier.
>   When the first two tasks completed, the remaining two tasks will be removed from the queue and started.
> - Wait until the remaining tasks are running by using the "ready" barrier again.
>   Do the metrics check only if they are started to avoid the failures when the queue size was not 0.
>   If, for some reason, the tasks are not started within a timeout period, make the test fail.
> - Verify the metrics.
> - Let the remaining tasks complete by breaking the "completed" barrier.
> 
> 
> Diffs
> -----
> 
>   service/src/test/org/apache/hive/service/cli/session/TestSessionManagerMetrics.java 5511c54 
> 
> Diff: https://reviews.apache.org/r/52631/diff/
> 
> 
> Testing
> -------
> 
> The change effects only a unit test.
> Ran the test many times locally.
> Added random sleeps to the tasks to simulate the delay when they start or complete and ran the test many times to see if the synch between the tasks is working correctly.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>


Re: Review Request 52631: HIVE-14839: Improve the stability of TestSessionManagerMetrics

Posted by Aihua Xu <ax...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52631/#review151972
-----------------------------------------------------------


Ship it!




Ship It!

- Aihua Xu


On Oct. 7, 2016, 2 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52631/
> -----------------------------------------------------------
> 
> (Updated Oct. 7, 2016, 2 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Miklos Csanady, Peter Vary, Szehon Ho, and Barna Zsombor Klara.
> 
> 
> Bugs: HIVE-14839
>     https://issues.apache.org/jira/browse/HIVE-14839
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The TestSessionManagerMetrics fails occasionally with the following error: 
> org.junit.ComparisonFailure: expected:<[0]> but was:<[1]>
> 	at org.apache.hive.service.cli.session.TestSessionManagerMetrics.testThreadPoolMetrics(TestSessionManagerMetrics.java:98)
> 
> Changed the synchronization between the tasks a bit to avoid these failures.
> With this patch the test would work do the following steps:
> - Submit four tasks.
> - Wait with the metrics verification, until the first two tasks are running.
>   This is done by invoking await on the "ready" barrier.
>   If, for some reason, the tasks are not started within a timeout period, make the test fail.
> - Make the tasks wait until the metrics are checked.
>   This is done by invoking await on the "completed" barrier.
> - Verify the metrics.
> - Let the first two tasks complete by breaking the "complete" barrier.
>   When the first two tasks completed, the remaining two tasks will be removed from the queue and started.
> - Wait until the remaining tasks are running by using the "ready" barrier again.
>   Do the metrics check only if they are started to avoid the failures when the queue size was not 0.
>   If, for some reason, the tasks are not started within a timeout period, make the test fail.
> - Verify the metrics.
> - Let the remaining tasks complete by breaking the "completed" barrier.
> 
> 
> Diffs
> -----
> 
>   service/src/test/org/apache/hive/service/cli/session/TestSessionManagerMetrics.java 5511c54 
> 
> Diff: https://reviews.apache.org/r/52631/diff/
> 
> 
> Testing
> -------
> 
> The change effects only a unit test.
> Ran the test many times locally.
> Added random sleeps to the tasks to simulate the delay when they start or complete and ran the test many times to see if the synch between the tasks is working correctly.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>


Re: Review Request 52631: HIVE-14839: Improve the stability of TestSessionManagerMetrics

Posted by Marta Kuczora <ku...@cloudera.com>.

> On Oct. 7, 2016, 2:26 p.m., Aihua Xu wrote:
> > service/src/test/org/apache/hive/service/cli/session/TestSessionManagerMetrics.java, line 113
> > <https://reviews.apache.org/r/52631/diff/1/?file=1526447#file1526447line113>
> >
> >     Seems you could run into the following case which will lead to a unwanted failure.
> >     
> >     Assume 3 or 4 backgroundOperation threads call ready.await() first, then the call in the main thread (line 112) will have 4 or 5 threads merge. Then the main threads get to line 124, seems it will timeout since no threads will call ready.await() any more?
> >     
> >     Just checked CyclicBarrier doc and not very sure how it should be used actually. Maybe I'm wrong.

As far as I understand, in this test only two tasks will run at the same time. The corePoolSize and maximumPoolSize of the ThreadPoolExecutor is the value of the hive.server2.async.exec.threads parameter. It is set to 2 in this test. So when the 4 tasks are submitted, only two of them will be assigned to a thread and get started. The other two will wait in a queue. The run method of the two tasks in the queue won't get called until they are staying in the queue, so I think the case when all 4 tasks call await on the same barrier won't happen.


- Marta


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52631/#review151799
-----------------------------------------------------------


On Oct. 7, 2016, 2 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52631/
> -----------------------------------------------------------
> 
> (Updated Oct. 7, 2016, 2 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Miklos Csanady, Peter Vary, Szehon Ho, and Barna Zsombor Klara.
> 
> 
> Bugs: HIVE-14839
>     https://issues.apache.org/jira/browse/HIVE-14839
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The TestSessionManagerMetrics fails occasionally with the following error: 
> org.junit.ComparisonFailure: expected:<[0]> but was:<[1]>
> 	at org.apache.hive.service.cli.session.TestSessionManagerMetrics.testThreadPoolMetrics(TestSessionManagerMetrics.java:98)
> 
> Changed the synchronization between the tasks a bit to avoid these failures.
> With this patch the test would work do the following steps:
> - Submit four tasks.
> - Wait with the metrics verification, until the first two tasks are running.
>   This is done by invoking await on the "ready" barrier.
>   If, for some reason, the tasks are not started within a timeout period, make the test fail.
> - Make the tasks wait until the metrics are checked.
>   This is done by invoking await on the "completed" barrier.
> - Verify the metrics.
> - Let the first two tasks complete by breaking the "complete" barrier.
>   When the first two tasks completed, the remaining two tasks will be removed from the queue and started.
> - Wait until the remaining tasks are running by using the "ready" barrier again.
>   Do the metrics check only if they are started to avoid the failures when the queue size was not 0.
>   If, for some reason, the tasks are not started within a timeout period, make the test fail.
> - Verify the metrics.
> - Let the remaining tasks complete by breaking the "completed" barrier.
> 
> 
> Diffs
> -----
> 
>   service/src/test/org/apache/hive/service/cli/session/TestSessionManagerMetrics.java 5511c54 
> 
> Diff: https://reviews.apache.org/r/52631/diff/
> 
> 
> Testing
> -------
> 
> The change effects only a unit test.
> Ran the test many times locally.
> Added random sleeps to the tasks to simulate the delay when they start or complete and ran the test many times to see if the synch between the tasks is working correctly.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>


Re: Review Request 52631: HIVE-14839: Improve the stability of TestSessionManagerMetrics

Posted by Aihua Xu <ax...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52631/#review151799
-----------------------------------------------------------




service/src/test/org/apache/hive/service/cli/session/TestSessionManagerMetrics.java (line 106)
<https://reviews.apache.org/r/52631/#comment220330>

    Seems you could run into the following case which will lead to a unwanted failure.
    
    Assume 3 or 4 backgroundOperation threads call ready.await() first, then the call in the main thread (line 112) will have 4 or 5 threads merge. Then the main threads get to line 124, seems it will timeout since no threads will call ready.await() any more?
    
    Just checked CyclicBarrier doc and not very sure how it should be used actually. Maybe I'm wrong.


- Aihua Xu


On Oct. 7, 2016, 2 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52631/
> -----------------------------------------------------------
> 
> (Updated Oct. 7, 2016, 2 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Miklos Csanady, Peter Vary, Szehon Ho, and Barna Zsombor Klara.
> 
> 
> Bugs: HIVE-14839
>     https://issues.apache.org/jira/browse/HIVE-14839
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The TestSessionManagerMetrics fails occasionally with the following error: 
> org.junit.ComparisonFailure: expected:<[0]> but was:<[1]>
> 	at org.apache.hive.service.cli.session.TestSessionManagerMetrics.testThreadPoolMetrics(TestSessionManagerMetrics.java:98)
> 
> Changed the synchronization between the tasks a bit to avoid these failures.
> With this patch the test would work do the following steps:
> - Submit four tasks.
> - Wait with the metrics verification, until the first two tasks are running.
>   This is done by invoking await on the "ready" barrier.
>   If, for some reason, the tasks are not started within a timeout period, make the test fail.
> - Make the tasks wait until the metrics are checked.
>   This is done by invoking await on the "completed" barrier.
> - Verify the metrics.
> - Let the first two tasks complete by breaking the "complete" barrier.
>   When the first two tasks completed, the remaining two tasks will be removed from the queue and started.
> - Wait until the remaining tasks are running by using the "ready" barrier again.
>   Do the metrics check only if they are started to avoid the failures when the queue size was not 0.
>   If, for some reason, the tasks are not started within a timeout period, make the test fail.
> - Verify the metrics.
> - Let the remaining tasks complete by breaking the "completed" barrier.
> 
> 
> Diffs
> -----
> 
>   service/src/test/org/apache/hive/service/cli/session/TestSessionManagerMetrics.java 5511c54 
> 
> Diff: https://reviews.apache.org/r/52631/diff/
> 
> 
> Testing
> -------
> 
> The change effects only a unit test.
> Ran the test many times locally.
> Added random sleeps to the tasks to simulate the delay when they start or complete and ran the test many times to see if the synch between the tasks is working correctly.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>


Re: Review Request 52631: HIVE-14839: Improve the stability of TestSessionManagerMetrics

Posted by Barna Zsombor Klara <zs...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52631/#review151796
-----------------------------------------------------------


Ship it!




+1. Thanks for the patch.

- Barna Zsombor Klara


On Oct. 7, 2016, 2 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52631/
> -----------------------------------------------------------
> 
> (Updated Oct. 7, 2016, 2 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Miklos Csanady, Peter Vary, Szehon Ho, and Barna Zsombor Klara.
> 
> 
> Bugs: HIVE-14839
>     https://issues.apache.org/jira/browse/HIVE-14839
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The TestSessionManagerMetrics fails occasionally with the following error: 
> org.junit.ComparisonFailure: expected:<[0]> but was:<[1]>
> 	at org.apache.hive.service.cli.session.TestSessionManagerMetrics.testThreadPoolMetrics(TestSessionManagerMetrics.java:98)
> 
> Changed the synchronization between the tasks a bit to avoid these failures.
> With this patch the test would work do the following steps:
> - Submit four tasks.
> - Wait with the metrics verification, until the first two tasks are running.
>   This is done by invoking await on the "ready" barrier.
>   If, for some reason, the tasks are not started within a timeout period, make the test fail.
> - Make the tasks wait until the metrics are checked.
>   This is done by invoking await on the "completed" barrier.
> - Verify the metrics.
> - Let the first two tasks complete by breaking the "complete" barrier.
>   When the first two tasks completed, the remaining two tasks will be removed from the queue and started.
> - Wait until the remaining tasks are running by using the "ready" barrier again.
>   Do the metrics check only if they are started to avoid the failures when the queue size was not 0.
>   If, for some reason, the tasks are not started within a timeout period, make the test fail.
> - Verify the metrics.
> - Let the remaining tasks complete by breaking the "completed" barrier.
> 
> 
> Diffs
> -----
> 
>   service/src/test/org/apache/hive/service/cli/session/TestSessionManagerMetrics.java 5511c54 
> 
> Diff: https://reviews.apache.org/r/52631/diff/
> 
> 
> Testing
> -------
> 
> The change effects only a unit test.
> Ran the test many times locally.
> Added random sleeps to the tasks to simulate the delay when they start or complete and ran the test many times to see if the synch between the tasks is working correctly.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>