Posted to dev@oozie.apache.org by Robert Kanter <rk...@cloudera.com> on 2016/09/13 21:35:29 UTC

Re: Review Request 43970: Oozie log parsing regex consume more than 90% cpu

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43970/#review148786
-----------------------------------------------------------



LGTM +1 after this.


core/src/main/java/org/apache/oozie/util/TimestampedMessageParser.java (line 93)
<https://reviews.apache.org/r/43970/#comment216298>

    When you rebase the patch, this is where you'll see a trivial conflict.



core/src/main/java/org/apache/oozie/util/XLogFilter.java (line 213)
<https://reviews.apache.org/r/43970/#comment216293>

    Shouldn't this be true?


- Robert Kanter


On June 14, 2016, 11:15 p.m., Satish Saley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43970/
> -----------------------------------------------------------
> 
> (Updated June 14, 2016, 11:15 p.m.)
> 
> 
> Review request for oozie.
> 
> 
> Bugs: https://issues.apache.org/jira/browse/OOZIE-2417
> 
> 
> Repository: oozie-git
> 
> 
> Description
> -------
> 
> This patch tries to avoid parsing the same log line multiple times with different regexes.
> It also caches the log parts once we have determined what they are, so the log line does not have to be re-parsed each time those parts are needed.
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/oozie/util/LogLine.java PRE-CREATION 
>   core/src/main/java/org/apache/oozie/util/SimpleTimestampedMessageParser.java 78cb042 
>   core/src/main/java/org/apache/oozie/util/TimestampedMessageParser.java f691409 
>   core/src/main/java/org/apache/oozie/util/XLogFilter.java 3b49f77 
> 
> Diff: https://reviews.apache.org/r/43970/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Satish Saley
> 
>


Re: Review Request 43970: Oozie log parsing regex consume more than 90% cpu

Posted by Satish Saley <sa...@gmail.com>.

> On Sept. 13, 2016, 2:35 p.m., Robert Kanter wrote:
> > core/src/main/java/org/apache/oozie/util/XLogFilter.java, line 213
> > <https://reviews.apache.org/r/43970/diff/3/?file=1419293#file1419293line213>
> >
> >     Shouldn't this be true?

I revisited this logic. I have renamed isSplit to isMatched because it makes more sense. We set isMatched to true only if the log line matches the log filter pattern; otherwise we set it to false. If the log line matches the log filter pattern, we cut the message into three parts and record those parts in a list to avoid further regex matching later in the code.
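
As a rough illustration of the idea (this is not the code from the patch), the sketch below runs the regex once per line, stores the result in an isMatched flag, and caches the three captured parts in a list. The class name CachedLogLineSketch, the pattern, and the choice of timestamp, level, and message as the three parts are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class, for illustration only; not part of Oozie.
class CachedLogLineSketch {
    // Assumed line layout: "<timestamp> <level> <rest of message>".
    private static final Pattern LINE = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})\\s+(\\w+)\\s+(.*)$");

    private final boolean isMatched;
    private final List<String> parts;

    CachedLogLineSketch(String line) {
        // The regex runs exactly once, when the line is constructed.
        Matcher m = LINE.matcher(line);
        isMatched = m.matches();
        if (isMatched) {
            // Cache the three parts so later callers never re-run the regex.
            parts = Collections.unmodifiableList(
                    Arrays.asList(m.group(1), m.group(2), m.group(3)));
        } else {
            parts = Collections.emptyList();
        }
    }

    boolean isMatched() {
        return isMatched;
    }

    // Returns the cached parts; empty if the line did not match.
    List<String> getParts() {
        return parts;
    }
}
```

Callers that need the timestamp or level later read them from getParts() instead of matching the line again.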


> On Sept. 13, 2016, 2:35 p.m., Robert Kanter wrote:
> > core/src/main/java/org/apache/oozie/util/TimestampedMessageParser.java, line 94
> > <https://reviews.apache.org/r/43970/diff/3/?file=1419292#file1419292line94>
> >
> >     When you rebase the patch, this is where you'll see a trivial conflict.

I saw this. I will take care of it while committing.


- Satish




Re: Review Request 43970: [OOZIE-2457] Oozie log parsing regex consume more than 90% cpu

Posted by Satish Saley <sa...@gmail.com>.

> On Sept. 13, 2016, 2:35 p.m., Robert Kanter wrote:
> > core/src/main/java/org/apache/oozie/util/XLogFilter.java, line 213
> > <https://reviews.apache.org/r/43970/diff/3/?file=1419293#file1419293line213>
> >
> >     Shouldn't this be true?
> 
> Satish Saley wrote:
>     I revisited this logic. I have renamed isSplit to isMatched because it makes more sense. We set isMatched to true only if the log line matches the log filter pattern; otherwise we set it to false. If the log line matches the log filter pattern, we cut the message into three parts and record those parts in a list to avoid further regex matching later in the code.

In the latest patch, I replaced this with an enum that has 3 possible values. The enum is needed to distinguish a line that does not match the pattern from a line that does not match the pattern but should still be included in the log (such as part of a stack trace).
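
A minimal sketch of why a boolean is not enough here (again, not the patch itself): the three states below separate a matching line, a non-matching line that belongs to the previously included entry, and a non-matching line that is dropped. The names LineStatusSketch and Status, the enum constants, and the rule that a line without a leading timestamp continues the previous entry are all assumptions made for the example.

```java
import java.util.regex.Pattern;

// Hypothetical classifier, for illustration only; not part of Oozie.
class LineStatusSketch {
    // Assumed names for the three possible values.
    enum Status { MATCHED, CONTINUATION, NOT_MATCHED }

    // Assumption: a new log entry always starts with a timestamp.
    private static final Pattern TIMESTAMP =
            Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}");

    private final Pattern filter;
    // Remembers whether the most recent timestamped entry passed the filter.
    private boolean previousIncluded = false;

    LineStatusSketch(String filterRegex) {
        this.filter = Pattern.compile(filterRegex);
    }

    Status classify(String line) {
        if (TIMESTAMP.matcher(line).lookingAt()) {
            // Start of a new entry: apply the filter to it.
            previousIncluded = filter.matcher(line).find();
            return previousIncluded ? Status.MATCHED : Status.NOT_MATCHED;
        }
        // No timestamp: e.g. a stack-trace line. Include it only if the
        // entry it belongs to was included.
        return previousIncluded ? Status.CONTINUATION : Status.NOT_MATCHED;
    }
}
```

With a filter of "ERROR", a stack-trace line that follows an ERROR entry is classified as CONTINUATION, while the same line after a filtered-out INFO entry is NOT_MATCHED.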


- Satish



On Feb. 27, 2017, 11:42 a.m., Satish Saley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43970/
> -----------------------------------------------------------
> 
> (Updated Feb. 27, 2017, 11:42 a.m.)
> 
> 
> Review request for oozie.
> 
> 
> Bugs: https://issues.apache.org/jira/browse/OOZIE-2417
> 
> 
> Repository: oozie-git
> 
> 
> Description
> -------
> 
> This patch tries to avoid parsing the same log line multiple times with different regexes.
> It also caches the log parts once we have determined what they are, so the log line does not have to be re-parsed each time those parts are needed.
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/oozie/util/LogLine.java PRE-CREATION 
>   core/src/main/java/org/apache/oozie/util/SimpleTimestampedMessageParser.java 78cb042 
>   core/src/main/java/org/apache/oozie/util/TimestampedMessageParser.java a676f4d 
>   core/src/main/java/org/apache/oozie/util/XLogFilter.java 3b49f77 
> 
> Diff: https://reviews.apache.org/r/43970/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Satish Saley
> 
>