You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@oozie.apache.org by Purshotam Shah <pu...@yahoo-inc.com> on 2014/04/08 02:44:51 UTC

Re: Review Request 18762: OOZIE-1709 CoordELFunctions.getCurrentInstance() is expensive

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18762/#review39621
-----------------------------------------------------------



core/src/main/java/org/apache/oozie/coord/CoordELFunctions.java
<https://reviews.apache.org/r/18762/#comment72074>

    Can we put a check, we are logging same message repetitively..



core/src/main/java/org/apache/oozie/coord/CoordELFunctions.java
<https://reviews.apache.org/r/18762/#comment72365>

    instanceCount, is not this same as coord.lastActionNumber?
    
    At least in case of nominal time, this should same as coord.lastActionNumber. We can use that to optimize query.



core/src/main/java/org/apache/oozie/coord/CoordELFunctions.java
<https://reviews.apache.org/r/18762/#comment72364>

    I haven't tested this. Will this work for different timezone?


- Purshotam Shah


On March 26, 2014, 10:09 a.m., shwethags wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18762/
> -----------------------------------------------------------
> 
> (Updated March 26, 2014, 10:09 a.m.)
> 
> 
> Review request for oozie.
> 
> 
> Bugs: OOZIE-1709
>     https://issues.apache.org/jira/browse/OOZIE-1709
> 
> 
> Repository: oozie-git
> 
> 
> Description
> -------
> 
> CoordELFunctions.getCurrentInstance() has this code:
>     while (current.compareTo(calEffectiveTime) <= 0) {
>             current = (Calendar) origCurrent.clone();
>             instanceCount[0]++;
>             current.add(dsTimeUnit.getCalendarUnit(), instanceCount[0] * dsFreq);
>         }
> 
> For coords with smaller frequency and start time in very past, this is very expensive. On prod, we have seen materialisation of each instance taking few mins sometimes for coords with 1 min frequency
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/oozie/coord/CoordELFunctions.java d73bc7d 
>   core/src/test/java/org/apache/oozie/coord/TestCoordELFunctions.java 34a428f 
> 
> Diff: https://reviews.apache.org/r/18762/diff/
> 
> 
> Testing
> -------
> 
> UT
> 
> 
> Thanks,
> 
> shwethags
> 
>


Re: Review Request 18762: OOZIE-1709 CoordELFunctions.getCurrentInstance() is expensive

Posted by sh...@inmobi.com.

> On April 8, 2014, 12:44 a.m., Purshotam Shah wrote:
> > core/src/main/java/org/apache/oozie/coord/CoordELFunctions.java, line 1314
> > <https://reviews.apache.org/r/18762/diff/2/?file=537328#file537328line1314>
> >
> >     instanceCount, is not this same as coord.lastActionNumber?
> >     
> >     At least in case of nominal time, this should same as coord.lastActionNumber. We can use that to optimize query.
> 
> shwethags wrote:
>     instanceCount is the data instance count with respect to the coord nominal time. coord.lastActionNumber is the last materialised action. They are not related and can't be re-used
> 
> Purshotam Shah wrote:
>     I missed the part that coord.lastActionNumber is offset from start time,  where as instanceCount is offset from initial instance.
>     
>     We are still looping to get instanceCount, if we can get get rid of that then we will get much better performance. For now we are good.

get instance count is the major bottle neck and I have changed that too. Check getCurrentInstance()


- shwethags


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18762/#review39621
-----------------------------------------------------------


On April 8, 2014, 1:03 a.m., shwethags wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18762/
> -----------------------------------------------------------
> 
> (Updated April 8, 2014, 1:03 a.m.)
> 
> 
> Review request for oozie.
> 
> 
> Bugs: OOZIE-1709
>     https://issues.apache.org/jira/browse/OOZIE-1709
> 
> 
> Repository: oozie-git
> 
> 
> Description
> -------
> 
> CoordELFunctions.getCurrentInstance() has this code:
>     while (current.compareTo(calEffectiveTime) <= 0) {
>             current = (Calendar) origCurrent.clone();
>             instanceCount[0]++;
>             current.add(dsTimeUnit.getCalendarUnit(), instanceCount[0] * dsFreq);
>         }
> 
> For coords with smaller frequency and start time in very past, this is very expensive. On prod, we have seen materialisation of each instance taking few mins sometimes for coords with 1 min frequency
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/oozie/coord/CoordELFunctions.java d73bc7d 
>   core/src/test/java/org/apache/oozie/coord/TestCoordELFunctions.java be35ce4 
> 
> Diff: https://reviews.apache.org/r/18762/diff/
> 
> 
> Testing
> -------
> 
> UT
> 
> 
> Thanks,
> 
> shwethags
> 
>


Re: Review Request 18762: OOZIE-1709 CoordELFunctions.getCurrentInstance() is expensive

Posted by Purshotam Shah <pu...@yahoo-inc.com>.

> On April 8, 2014, 12:44 a.m., Purshotam Shah wrote:
> > core/src/main/java/org/apache/oozie/coord/CoordELFunctions.java, line 1314
> > <https://reviews.apache.org/r/18762/diff/2/?file=537328#file537328line1314>
> >
> >     instanceCount, is not this same as coord.lastActionNumber?
> >     
> >     At least in case of nominal time, this should same as coord.lastActionNumber. We can use that to optimize query.
> 
> shwethags wrote:
>     instanceCount is the data instance count with respect to the coord nominal time. coord.lastActionNumber is the last materialised action. They are not related and can't be re-used

I missed the part that coord.lastActionNumber is offset from start time,  where as instanceCount is offset from initial instance.

We are still looping to get instanceCount, if we can get get rid of that then we will get much better performance. For now we are good.


- Purshotam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18762/#review39621
-----------------------------------------------------------


On April 8, 2014, 1:03 a.m., shwethags wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18762/
> -----------------------------------------------------------
> 
> (Updated April 8, 2014, 1:03 a.m.)
> 
> 
> Review request for oozie.
> 
> 
> Bugs: OOZIE-1709
>     https://issues.apache.org/jira/browse/OOZIE-1709
> 
> 
> Repository: oozie-git
> 
> 
> Description
> -------
> 
> CoordELFunctions.getCurrentInstance() has this code:
>     while (current.compareTo(calEffectiveTime) <= 0) {
>             current = (Calendar) origCurrent.clone();
>             instanceCount[0]++;
>             current.add(dsTimeUnit.getCalendarUnit(), instanceCount[0] * dsFreq);
>         }
> 
> For coords with smaller frequency and start time in very past, this is very expensive. On prod, we have seen materialisation of each instance taking few mins sometimes for coords with 1 min frequency
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/oozie/coord/CoordELFunctions.java d73bc7d 
>   core/src/test/java/org/apache/oozie/coord/TestCoordELFunctions.java be35ce4 
> 
> Diff: https://reviews.apache.org/r/18762/diff/
> 
> 
> Testing
> -------
> 
> UT
> 
> 
> Thanks,
> 
> shwethags
> 
>


Re: Review Request 18762: OOZIE-1709 CoordELFunctions.getCurrentInstance() is expensive

Posted by sh...@inmobi.com.

> On April 8, 2014, 12:44 a.m., Purshotam Shah wrote:
> > core/src/main/java/org/apache/oozie/coord/CoordELFunctions.java, line 960
> > <https://reviews.apache.org/r/18762/diff/2/?file=537328#file537328line960>
> >
> >     Can we put a check, we are logging same message repetitively..

will update the message


> On April 8, 2014, 12:44 a.m., Purshotam Shah wrote:
> > core/src/main/java/org/apache/oozie/coord/CoordELFunctions.java, line 1314
> > <https://reviews.apache.org/r/18762/diff/2/?file=537328#file537328line1314>
> >
> >     instanceCount, is not this same as coord.lastActionNumber?
> >     
> >     At least in case of nominal time, this should same as coord.lastActionNumber. We can use that to optimize query.

instanceCount is the data instance count with respect to the coord nominal time. coord.lastActionNumber is the last materialised action. They are not related and can't be re-used


> On April 8, 2014, 12:44 a.m., Purshotam Shah wrote:
> > core/src/main/java/org/apache/oozie/coord/CoordELFunctions.java, line 1339
> > <https://reviews.apache.org/r/18762/diff/2/?file=537328#file537328line1339>
> >
> >     I haven't tested this. Will this work for different timezone?

yes, it should work with different timezones and DST as Calendar is used.

There are tests in TestCoordELFunctions which test with different timezones and DST. I haven't changed the tests and all tests pass. So, the patch shouldn't add any regression


- shwethags


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18762/#review39621
-----------------------------------------------------------


On March 26, 2014, 10:09 a.m., shwethags wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18762/
> -----------------------------------------------------------
> 
> (Updated March 26, 2014, 10:09 a.m.)
> 
> 
> Review request for oozie.
> 
> 
> Bugs: OOZIE-1709
>     https://issues.apache.org/jira/browse/OOZIE-1709
> 
> 
> Repository: oozie-git
> 
> 
> Description
> -------
> 
> CoordELFunctions.getCurrentInstance() has this code:
>     while (current.compareTo(calEffectiveTime) <= 0) {
>             current = (Calendar) origCurrent.clone();
>             instanceCount[0]++;
>             current.add(dsTimeUnit.getCalendarUnit(), instanceCount[0] * dsFreq);
>         }
> 
> For coords with smaller frequency and start time in very past, this is very expensive. On prod, we have seen materialisation of each instance taking few mins sometimes for coords with 1 min frequency
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/oozie/coord/CoordELFunctions.java d73bc7d 
>   core/src/test/java/org/apache/oozie/coord/TestCoordELFunctions.java 34a428f 
> 
> Diff: https://reviews.apache.org/r/18762/diff/
> 
> 
> Testing
> -------
> 
> UT
> 
> 
> Thanks,
> 
> shwethags
> 
>