You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@oozie.apache.org by Satish Saley <sa...@gmail.com> on 2017/11/10 20:19:33 UTC

Review Request 63739: OOZIE-3113 Retry for ZK lock release

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63739/
-----------------------------------------------------------

Review request for oozie.


Bugs: OOZIE-3113
    https://issues.apache.org/jira/browse/OOZIE-3113


Repository: oozie-git


Description
-------

OOZIE-3113 Retry for ZK lock release


Diffs
-----

  core/src/main/java/org/apache/oozie/service/ZKLocksService.java 2c71c001f 
  core/src/main/resources/oozie-default.xml 8285df0a7 
  core/src/test/java/org/apache/oozie/service/TestZKLocksService.java d04f04e80 


Diff: https://reviews.apache.org/r/63739/diff/1/


Testing
-------

Tested locally


Thanks,

Satish Saley


Re: Review Request 63739: OOZIE-3113 Retry for ZK lock release

Posted by András Piros via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63739/#review194623
-----------------------------------------------------------


Ship it!




Ship It!

- András Piros


On Dec. 27, 2017, 10:13 p.m., Satish Saley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63739/
> -----------------------------------------------------------
> 
> (Updated Dec. 27, 2017, 10:13 p.m.)
> 
> 
> Review request for oozie.
> 
> 
> Bugs: OOZIE-3113
>     https://issues.apache.org/jira/browse/OOZIE-3113
> 
> 
> Repository: oozie-git
> 
> 
> Description
> -------
> 
> OOZIE-3113 Retry for ZK lock release
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/oozie/service/ZKLocksService.java 2c71c001f 
>   core/src/main/resources/oozie-default.xml 8285df0a7 
>   core/src/test/java/org/apache/oozie/service/TestZKLocksService.java d04f04e80 
> 
> 
> Diff: https://reviews.apache.org/r/63739/diff/4/
> 
> 
> Testing
> -------
> 
> Tested locally
> 
> 
> Thanks,
> 
> Satish Saley
> 
>


Re: Review Request 63739: OOZIE-3113 Retry for ZK lock release

Posted by Rohini Palaniswamy <ro...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63739/#review194554
-----------------------------------------------------------


Ship it!




Please commit after you get +1 from András as well

- Rohini Palaniswamy


On Dec. 27, 2017, 10:13 p.m., Satish Saley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63739/
> -----------------------------------------------------------
> 
> (Updated Dec. 27, 2017, 10:13 p.m.)
> 
> 
> Review request for oozie.
> 
> 
> Bugs: OOZIE-3113
>     https://issues.apache.org/jira/browse/OOZIE-3113
> 
> 
> Repository: oozie-git
> 
> 
> Description
> -------
> 
> OOZIE-3113 Retry for ZK lock release
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/oozie/service/ZKLocksService.java 2c71c001f 
>   core/src/main/resources/oozie-default.xml 8285df0a7 
>   core/src/test/java/org/apache/oozie/service/TestZKLocksService.java d04f04e80 
> 
> 
> Diff: https://reviews.apache.org/r/63739/diff/4/
> 
> 
> Testing
> -------
> 
> Tested locally
> 
> 
> Thanks,
> 
> Satish Saley
> 
>


Re: Review Request 63739: OOZIE-3113 Retry for ZK lock release

Posted by Satish Saley <sa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63739/
-----------------------------------------------------------

(Updated Dec. 27, 2017, 2:13 p.m.)


Review request for oozie.


Bugs: OOZIE-3113
    https://issues.apache.org/jira/browse/OOZIE-3113


Repository: oozie-git


Description
-------

OOZIE-3113 Retry for ZK lock release


Diffs (updated)
-----

  core/src/main/java/org/apache/oozie/service/ZKLocksService.java 2c71c001f 
  core/src/main/resources/oozie-default.xml 8285df0a7 
  core/src/test/java/org/apache/oozie/service/TestZKLocksService.java d04f04e80 


Diff: https://reviews.apache.org/r/63739/diff/4/

Changes: https://reviews.apache.org/r/63739/diff/3-4/


Testing
-------

Tested locally


Thanks,

Satish Saley


Re: Review Request 63739: OOZIE-3113 Retry for ZK lock release

Posted by Rohini Palaniswamy <ro...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63739/#review194551
-----------------------------------------------------------




core/src/test/java/org/apache/oozie/service/TestZKLocksService.java
Lines 603 (patched)
<https://reviews.apache.org/r/63739/#comment273382>

    Why is this needed?


- Rohini Palaniswamy


On Dec. 23, 2017, 12:43 a.m., Satish Saley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63739/
> -----------------------------------------------------------
> 
> (Updated Dec. 23, 2017, 12:43 a.m.)
> 
> 
> Review request for oozie.
> 
> 
> Bugs: OOZIE-3113
>     https://issues.apache.org/jira/browse/OOZIE-3113
> 
> 
> Repository: oozie-git
> 
> 
> Description
> -------
> 
> OOZIE-3113 Retry for ZK lock release
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/oozie/service/ZKLocksService.java 2c71c001f 
>   core/src/main/resources/oozie-default.xml 8285df0a7 
>   core/src/test/java/org/apache/oozie/service/TestZKLocksService.java d04f04e80 
> 
> 
> Diff: https://reviews.apache.org/r/63739/diff/3/
> 
> 
> Testing
> -------
> 
> Tested locally
> 
> 
> Thanks,
> 
> Satish Saley
> 
>


Re: Review Request 63739: OOZIE-3113 Retry for ZK lock release

Posted by Satish Saley <sa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63739/
-----------------------------------------------------------

(Updated Dec. 22, 2017, 4:43 p.m.)


Review request for oozie.


Bugs: OOZIE-3113
    https://issues.apache.org/jira/browse/OOZIE-3113


Repository: oozie-git


Description
-------

OOZIE-3113 Retry for ZK lock release


Diffs (updated)
-----

  core/src/main/java/org/apache/oozie/service/ZKLocksService.java 2c71c001f 
  core/src/main/resources/oozie-default.xml 8285df0a7 
  core/src/test/java/org/apache/oozie/service/TestZKLocksService.java d04f04e80 


Diff: https://reviews.apache.org/r/63739/diff/3/

Changes: https://reviews.apache.org/r/63739/diff/2-3/


Testing
-------

Tested locally


Thanks,

Satish Saley


Re: Review Request 63739: OOZIE-3113 Retry for ZK lock release

Posted by Rohini Palaniswamy <ro...@gmail.com>.

> On Nov. 30, 2017, 5:33 p.m., András Piros wrote:
> > core/src/main/java/org/apache/oozie/service/ZKLocksService.java
> > Lines 239 (patched)
> > <https://reviews.apache.org/r/63739/diff/2/?file=1901904#file1901904line239>
> >
> >     Yep, blocking a thread for 30-ish minutes doesn't seem like a good idea. E.g. ZooKeeper outage lasts for only a few seconds under heavy load - it can happen that Oozie JVM ends up blocking lots of `Thread`s in `TIMED_WAITING` state.
> >     
> >     Can you think of some, more elaborate solution? Maybe having an `ExecutorService` just to try to unlock, and `Callable` instances that would try unlocking. In that case, the number of `Thread`s blocked in `TIMED_WAITING` state would be:
> >     * configurable
> >     * limited to a manageable extent

There are couple of cases where ZK outages require Ops team restarting the quorum. From finding the issue to having it restarted, 30 mins would easily pass. Instead of exponential retry, the retry should be every 10 seconds. Retrying after 2,4,8 and 16 minutes is waste of time and blocking thread unnecessarily as you mentioned if ZK is back up. Otherwise it does not matter if retry continues for 30 mins or 1 hour (NN, RM retry time limit is also about the same). Everything in Oozie is going to be totally blocked if ZK is down as you can't acquire locks for any command. Having higher retry time avoids having to restart Oozie.


> On Nov. 30, 2017, 5:33 p.m., András Piros wrote:
> > core/src/test/java/org/apache/oozie/service/TestZKLocksService.java
> > Lines 579-598 (patched)
> > <https://reviews.apache.org/r/63739/diff/2/?file=1901906#file1901906line579>
> >
> >     Extract nested class w/ a catchy name.

Prefer to disagree on this one. I think it is a overkill to extract into one method and one class for 20 lines of test code. Unless the method is being reused or it is a big chunk of code, it does not make sense to extract into methods and that too in a test class.

More methods + more classes != Clean code. It kills readability as well when one has to unnecessarily jump to multiple places to finish reading logic of one method.


- Rohini


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63739/#review192309
-----------------------------------------------------------


On Nov. 27, 2017, 7:22 p.m., Satish Saley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63739/
> -----------------------------------------------------------
> 
> (Updated Nov. 27, 2017, 7:22 p.m.)
> 
> 
> Review request for oozie.
> 
> 
> Bugs: OOZIE-3113
>     https://issues.apache.org/jira/browse/OOZIE-3113
> 
> 
> Repository: oozie-git
> 
> 
> Description
> -------
> 
> OOZIE-3113 Retry for ZK lock release
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/oozie/service/ZKLocksService.java 2c71c001f 
>   core/src/main/resources/oozie-default.xml 8285df0a7 
>   core/src/test/java/org/apache/oozie/service/TestZKLocksService.java d04f04e80 
> 
> 
> Diff: https://reviews.apache.org/r/63739/diff/2/
> 
> 
> Testing
> -------
> 
> Tested locally
> 
> 
> Thanks,
> 
> Satish Saley
> 
>


Re: Review Request 63739: OOZIE-3113 Retry for ZK lock release

Posted by Satish Saley <sa...@gmail.com>.

> On Nov. 30, 2017, 9:33 a.m., András Piros wrote:
> > core/src/main/java/org/apache/oozie/service/ZKLocksService.java
> > Lines 239 (patched)
> > <https://reviews.apache.org/r/63739/diff/2/?file=1901904#file1901904line239>
> >
> >     Yep, blocking a thread for 30-ish minutes doesn't seem like a good idea. E.g. ZooKeeper outage lasts for only a few seconds under heavy load - it can happen that Oozie JVM ends up blocking lots of `Thread`s in `TIMED_WAITING` state.
> >     
> >     Can you think of some, more elaborate solution? Maybe having an `ExecutorService` just to try to unlock, and `Callable` instances that would try unlocking. In that case, the number of `Thread`s blocked in `TIMED_WAITING` state would be:
> >     * configurable
> >     * limited to a manageable extent
> 
> Rohini Palaniswamy wrote:
>     There are couple of cases where ZK outages require Ops team restarting the quorum. From finding the issue to having it restarted, 30 mins would easily pass. Instead of exponential retry, the retry should be every 10 seconds. Retrying after 2,4,8 and 16 minutes is waste of time and blocking thread unnecessarily as you mentioned if ZK is back up. Otherwise it does not matter if retry continues for 30 mins or 1 hour (NN, RM retry time limit is also about the same). Everything in Oozie is going to be totally blocked if ZK is down as you can't acquire locks for any command. Having higher retry time avoids having to restart Oozie.

If ZooKeeper outages lasts for only a few seconds then the thread won't be blocked for 30 minutes. When it wakes up from sleep, it will release the lock successfully.


> On Nov. 30, 2017, 9:33 a.m., András Piros wrote:
> > core/src/test/java/org/apache/oozie/service/TestZKLocksService.java
> > Lines 563 (patched)
> > <https://reviews.apache.org/r/63739/diff/2/?file=1901906#file1901906line563>
> >
> >     What about `Services.get().get(ZKLocksService.class)`?

Similar to other test cases in this class because we are not initializing ZKLocksService in setup method.


> On Nov. 30, 2017, 9:33 a.m., András Piros wrote:
> > core/src/test/java/org/apache/oozie/service/TestZKLocksService.java
> > Lines 567-574 (patched)
> > <https://reviews.apache.org/r/63739/diff/2/?file=1901906#file1901906line567>
> >
> >     Extract method `setupSilentLock()`.

Overkill, dropping as Rohini suggested below.


- Satish


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63739/#review192309
-----------------------------------------------------------


On Nov. 27, 2017, 11:22 a.m., Satish Saley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63739/
> -----------------------------------------------------------
> 
> (Updated Nov. 27, 2017, 11:22 a.m.)
> 
> 
> Review request for oozie.
> 
> 
> Bugs: OOZIE-3113
>     https://issues.apache.org/jira/browse/OOZIE-3113
> 
> 
> Repository: oozie-git
> 
> 
> Description
> -------
> 
> OOZIE-3113 Retry for ZK lock release
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/oozie/service/ZKLocksService.java 2c71c001f 
>   core/src/main/resources/oozie-default.xml 8285df0a7 
>   core/src/test/java/org/apache/oozie/service/TestZKLocksService.java d04f04e80 
> 
> 
> Diff: https://reviews.apache.org/r/63739/diff/2/
> 
> 
> Testing
> -------
> 
> Tested locally
> 
> 
> Thanks,
> 
> Satish Saley
> 
>


Re: Review Request 63739: OOZIE-3113 Retry for ZK lock release

Posted by András Piros via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63739/#review192309
-----------------------------------------------------------




core/src/main/java/org/apache/oozie/service/ZKLocksService.java
Lines 59 (patched)
<https://reviews.apache.org/r/63739/#comment270389>

    You need `CONF_PREFIX` here.



core/src/main/java/org/apache/oozie/service/ZKLocksService.java
Lines 209-211 (patched)
<https://reviews.apache.org/r/63739/#comment270392>

    While I understand this `Exception` handling is about to mimic previous behavior, I'd go on and investigate whether we can `catch` separate `Exception` subclasses, and whether it's OK to swallow these.



core/src/main/java/org/apache/oozie/service/ZKLocksService.java
Lines 222 (patched)
<https://reviews.apache.org/r/63739/#comment270394>

    I find having this variable as seconds would be more flexible for the user. Currently no means of waiting e.g. 5 seconds between ZooKeeper outages.



core/src/main/java/org/apache/oozie/service/ZKLocksService.java
Line 215 (original), 236 (patched)
<https://reviews.apache.org/r/63739/#comment270395>

    I'd only retry when there is a ZooKeeper connectivity issue, not on other types of `Exception`s.



core/src/main/java/org/apache/oozie/service/ZKLocksService.java
Lines 239 (patched)
<https://reviews.apache.org/r/63739/#comment270393>

    Yep, blocking a thread for 30-ish minutes doesn't seem like a good idea. E.g. ZooKeeper outage lasts for only a few seconds under heavy load - it can happen that Oozie JVM ends up blocking lots of `Thread`s in `TIMED_WAITING` state.
    
    Can you think of some, more elaborate solution? Maybe having an `ExecutorService` just to try to unlock, and `Callable` instances that would try unlocking. In that case, the number of `Thread`s blocked in `TIMED_WAITING` state would be:
    * configurable
    * limited to a manageable extent



core/src/test/java/org/apache/oozie/service/TestZKLocksService.java
Lines 563 (patched)
<https://reviews.apache.org/r/63739/#comment270396>

    What about `Services.get().get(ZKLocksService.class)`?



core/src/test/java/org/apache/oozie/service/TestZKLocksService.java
Lines 567-574 (patched)
<https://reviews.apache.org/r/63739/#comment270397>

    Extract method `setupSilentLock()`.



core/src/test/java/org/apache/oozie/service/TestZKLocksService.java
Lines 579-598 (patched)
<https://reviews.apache.org/r/63739/#comment270398>

    Extract nested class w/ a catchy name.



core/src/test/java/org/apache/oozie/service/TestZKLocksService.java
Lines 594 (patched)
<https://reviews.apache.org/r/63739/#comment270399>

    `LOG.error(e)`.


- András Piros


On Nov. 27, 2017, 7:22 p.m., Satish Saley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63739/
> -----------------------------------------------------------
> 
> (Updated Nov. 27, 2017, 7:22 p.m.)
> 
> 
> Review request for oozie.
> 
> 
> Bugs: OOZIE-3113
>     https://issues.apache.org/jira/browse/OOZIE-3113
> 
> 
> Repository: oozie-git
> 
> 
> Description
> -------
> 
> OOZIE-3113 Retry for ZK lock release
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/oozie/service/ZKLocksService.java 2c71c001f 
>   core/src/main/resources/oozie-default.xml 8285df0a7 
>   core/src/test/java/org/apache/oozie/service/TestZKLocksService.java d04f04e80 
> 
> 
> Diff: https://reviews.apache.org/r/63739/diff/2/
> 
> 
> Testing
> -------
> 
> Tested locally
> 
> 
> Thanks,
> 
> Satish Saley
> 
>


Re: Review Request 63739: OOZIE-3113 Retry for ZK lock release

Posted by Rohini Palaniswamy <ro...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63739/#review194265
-----------------------------------------------------------




core/src/main/java/org/apache/oozie/service/ZKLocksService.java
Lines 240 (patched)
<https://reviews.apache.org/r/63739/#comment272993>

    You should log the retry number as well



core/src/test/java/org/apache/oozie/service/TestZKLocksService.java
Lines 584 (patched)
<https://reviews.apache.org/r/63739/#comment272994>

    Try to keep runtime of every test under 30 sec - 1 min.



core/src/test/java/org/apache/oozie/service/TestZKLocksService.java
Lines 601 (patched)
<https://reviews.apache.org/r/63739/#comment272995>

    You should have a catch block and fail the test if there is any exception


- Rohini Palaniswamy


On Nov. 27, 2017, 7:22 p.m., Satish Saley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63739/
> -----------------------------------------------------------
> 
> (Updated Nov. 27, 2017, 7:22 p.m.)
> 
> 
> Review request for oozie.
> 
> 
> Bugs: OOZIE-3113
>     https://issues.apache.org/jira/browse/OOZIE-3113
> 
> 
> Repository: oozie-git
> 
> 
> Description
> -------
> 
> OOZIE-3113 Retry for ZK lock release
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/oozie/service/ZKLocksService.java 2c71c001f 
>   core/src/main/resources/oozie-default.xml 8285df0a7 
>   core/src/test/java/org/apache/oozie/service/TestZKLocksService.java d04f04e80 
> 
> 
> Diff: https://reviews.apache.org/r/63739/diff/2/
> 
> 
> Testing
> -------
> 
> Tested locally
> 
> 
> Thanks,
> 
> Satish Saley
> 
>


Re: Review Request 63739: OOZIE-3113 Retry for ZK lock release

Posted by Satish Saley <sa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63739/
-----------------------------------------------------------

(Updated Nov. 27, 2017, 11:22 a.m.)


Review request for oozie.


Bugs: OOZIE-3113
    https://issues.apache.org/jira/browse/OOZIE-3113


Repository: oozie-git


Description
-------

OOZIE-3113 Retry for ZK lock release


Diffs (updated)
-----

  core/src/main/java/org/apache/oozie/service/ZKLocksService.java 2c71c001f 
  core/src/main/resources/oozie-default.xml 8285df0a7 
  core/src/test/java/org/apache/oozie/service/TestZKLocksService.java d04f04e80 


Diff: https://reviews.apache.org/r/63739/diff/2/

Changes: https://reviews.apache.org/r/63739/diff/1-2/


Testing
-------

Tested locally


Thanks,

Satish Saley