Posted to dev@oozie.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2017/02/24 20:12:44 UTC

[jira] [Commented] (OOZIE-2457) Oozie log parsing regex consume more than 90% cpu

    [ https://issues.apache.org/jira/browse/OOZIE-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883438#comment-15883438 ] 

Hadoop QA commented on OOZIE-2457:
----------------------------------

Testing JIRA OOZIE-2457

Cleaning local git workspace

----------------------------

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.    {color:green}+1{color} the patch does not introduce any @author tags
.    {color:green}+1{color} the patch does not introduce any tabs
.    {color:red}-1{color} the patch contains 1 line(s) with trailing spaces
.    {color:green}+1{color} the patch does not introduce any line longer than 132
.    {color:red}-1{color} the patch does not add/modify any testcase
{color:green}+1 RAT{color}
.    {color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.    {color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.    {color:green}+1{color} HEAD compiles
.    {color:green}+1{color} patch compiles
.    {color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1{color} There are no new bugs found in total.
.    {color:green}+1{color} There are no new bugs found in [server].
.    {color:green}+1{color} There are no new bugs found in [client].
.    {color:green}+1{color} There are no new bugs found in [core].
.    {color:green}+1{color} There are no new bugs found in [docs].
.    {color:green}+1{color} There are no new bugs found in [hadooplibs/hadoop-utils-2].
.    {color:green}+1{color} There are no new bugs found in [tools].
.    {color:green}+1{color} There are no new bugs found in [examples].
.    {color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.    {color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.    {color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.    {color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.    {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.    {color:green}+1{color} There are no new bugs found in [sharelib/hive].
.    {color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.    {color:green}+1{color} There are no new bugs found in [sharelib/pig].
.    {color:green}+1{color} There are no new bugs found in [sharelib/spark].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.    {color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.    {color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.    Tests run: 1879
.    Tests failed: 4
.    Tests errors: 0

.    The patch failed the following testcases:

.      testProcessRemainingLog(org.apache.oozie.util.TestTimestampedMessageParser)
.      testFsFailover(org.apache.oozie.action.TestActionFailover)
.      testloglevel_Error(org.apache.oozie.util.TestXLogUserFilterParam)
.      testConnectionDrop(org.apache.oozie.jms.TestJMSJobEventListener)

.    Tests failing with errors:
.      

{color:green}+1 DISTRO{color}
.    {color:green}+1{color} distro tarball builds with the patch 

----------------------------
{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/3659/

> Oozie log parsing regex consume more than 90% cpu
> -------------------------------------------------
>
>                 Key: OOZIE-2457
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2457
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Satish Subhashrao Saley
>            Assignee: Satish Subhashrao Saley
>            Priority: Blocker
>             Fix For: 5.0.0
>
>         Attachments: OOZIE-2457-1.patch, OOZIE-2457-2.patch, OOZIE-2457-3.patch, OOZIE-2457-4.patch
>
>
> http-0.0.0.0-4080-26  TID=62215  STATE=RUNNABLE  CPU_TIME=1992 (92.59%)  USER_TIME=1990 (92.46%) Allocted: 269156584
>     java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
>     java.util.regex.Pattern$Curly.match(Pattern.java:4132)
>     java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
>     java.util.regex.Matcher.match(Matcher.java:1221)
>     java.util.regex.Matcher.matches(Matcher.java:559)
>     org.apache.oozie.util.XLogFilter.matches(XLogFilter.java:136)
>     org.apache.oozie.util.TimestampedMessageParser.parseNextLine(TimestampedMessageParser.java:145)
>     org.apache.oozie.util.TimestampedMessageParser.increment(TimestampedMessageParser.java:92)
> The regex being matched:
> {code}
> (.* USER\[[^\]]*\] GROUP\[[^\]]*\] TOKEN\[[^\]]*\] APP\[[^\]]*\] JOB\[0000000-150625114739728-oozie-puru-W\] ACTION\[[^\]]*\] .*)
> {code}
> To parse a single line we use two regexes.
> 1. 
> {code}
> public ArrayList<String> splitLogMessage(String logLine) {
>         Matcher splitter = SPLITTER_PATTERN.matcher(logLine);
>         if (splitter.matches()) {
>             ArrayList<String> logParts = new ArrayList<String>();
>             logParts.add(splitter.group(1));// timestamp
>             logParts.add(splitter.group(2));// log level
>             logParts.add(splitter.group(3));// Log Message
>             return logParts;
>         }
>         else {
>             return null;
>         }
>     }
> {code}
> 2.
> {code}
>  public boolean matches(ArrayList<String> logParts) {
>         if (getStartDate() != null) {
>             if (logParts.get(0).substring(0, 19).compareTo(getFormattedStartDate()) < 0) {
>                 return false;
>             }
>         }
>         String logLevel = logParts.get(1);
>         String logMessage = logParts.get(2);
>         if (this.logLevels == null || this.logLevels.containsKey(logLevel.toUpperCase())) {
>             Matcher logMatcher = filterPattern.matcher(logMessage);
>             return logMatcher.matches();
>         }
>         else {
>             return false;
>         }
>     }
> {code}
> Also, the same log message is parsed repeatedly in
> {code}
> private String parseTimestamp(String line) {
>         String timestamp = null;
>         ArrayList<String> logParts = filter.splitLogMessage(line);
>         if (logParts != null) {
>             timestamp = logParts.get(0);
>         }
>         return timestamp;
>     }
> {code}
> where the {{line}} has already been parsed using the regex and we already know the {{logParts}}, if any.
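
One way to read the description above is that each line should be split once and the resulting parts reused for both the timestamp and the filter check. The following is only a rough sketch of that idea, not the attached patch: the class name, the splitter pattern, the sample log line, and the helper names {{timestampOf}} and {{matchesJob}} are assumptions made for illustration. It also swaps the filter regex for a plain substring test on the job id, which is a weaker check than the original pattern because it does not verify the surrounding USER/GROUP/TOKEN/APP/ACTION fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleParseSketch {
    // Hypothetical splitter (timestamp, log level, message); NOT Oozie's actual SPLITTER_PATTERN.
    private static final Pattern SPLITTER_PATTERN =
            Pattern.compile("^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})\\s+(\\w+)\\s+(.*)$");

    // Split the line exactly once; callers keep the returned parts and reuse them.
    static List<String> splitLogMessage(String logLine) {
        Matcher splitter = SPLITTER_PATTERN.matcher(logLine);
        if (!splitter.matches()) {
            return null;
        }
        List<String> logParts = new ArrayList<>();
        logParts.add(splitter.group(1)); // timestamp
        logParts.add(splitter.group(2)); // log level
        logParts.add(splitter.group(3)); // log message
        return logParts;
    }

    // Reads the timestamp from parts that were already split, instead of re-running the regex.
    static String timestampOf(List<String> logParts) {
        return logParts == null ? null : logParts.get(0);
    }

    // Assumption: a literal substring test stands in for the filter regex for a single job id.
    static boolean matchesJob(List<String> logParts, String jobId) {
        return logParts != null && logParts.get(2).contains("JOB[" + jobId + "]");
    }

    public static void main(String[] args) {
        // Made-up log line in the general shape the filter regex expects.
        String line = "2015-06-25 11:47:39,728  INFO ActionStartXCommand:520 - USER[puru] GROUP[-] TOKEN[] "
                + "APP[demo] JOB[0000000-150625114739728-oozie-puru-W] ACTION[-] Start action";
        List<String> parts = splitLogMessage(line);
        System.out.println(timestampOf(parts));
        System.out.println(matchesJob(parts, "0000000-150625114739728-oozie-puru-W"));
    }
}
```

Whether a substring test is acceptable depends on how strictly the filter has to validate the line layout; if the full pattern is still needed, the same single-split structure applies and only {{matchesJob}} would change.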


