Posted to common-dev@hadoop.apache.org by "Jerome Boulon (JIRA)" <ji...@apache.org> on 2009/01/21 00:57:59 UTC

[jira] Created: (HADOOP-5087) Regex for Cmd parsing contains an error

Regex for Cmd parsing contains an error
---------------------------------------

                 Key: HADOOP-5087
                 URL: https://issues.apache.org/jira/browse/HADOOP-5087
             Project: Hadoop Core
          Issue Type: Bug
          Components: contrib/chukwa
         Environment: HADOOP-4947 uses a regex to parse Chukwa commands, but the regex contains an error.
The current regex is:
Pattern addCmdPattern = Pattern.compile("[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\S)?\\s*(\\d+)\\s*");
It does not correctly parse this valid checkpoint entry:
"ADD org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped Syslog 0 /var/log/messages 114027"
Parsing result:
adaptorName org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped
dataType Syslog
params 0 /var/log/messages 11402
offset 7

Instead of:
adaptorName org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped
dataType Syslog
params 0 /var/log/messages 
offset 114027

The correct regex is: "[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\s)?\\s*(\\d+)\\s*"
Example of parsing: "ADD org.apache.hadoop.chukwa.datacollection.adaptor.MySpecificAdaptor Syslog 0 my param1 param2 /var/log/messages 114027";
Parsing result:
adaptorName org.apache.hadoop.chukwa.datacollection.adaptor.MySpecificAdaptor
dataType Syslog
params 0 my param1 param2 /var/log/messages 
offset 114027


            Reporter: Jerome Boulon
            Assignee: Jerome Boulon
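
The difference between the two patterns in the report can be reproduced with a small standalone program. This is only a sketch: the class name AddCmdRegexDemo and the parse helper are made up for illustration, and the use of Matcher.matches() (a full-string match) is an assumption about how the agent applies the pattern. With the original pattern the greedy (.*\\S)? group may end on a digit, so it swallows all but the last digit of the offset. The proposed pattern ends the group on whitespace, so the whole trailing number is left for (\\d+).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Chukwa.
class AddCmdRegexDemo {
    // Pattern quoted in the report as the current (buggy) one.
    static final Pattern BUGGY = Pattern.compile(
        "[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\S)?\\s*(\\d+)\\s*");

    // Pattern proposed in the report as the fix.
    static final Pattern FIXED = Pattern.compile(
        "[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\s)?\\s*(\\d+)\\s*");

    // Returns {adaptorName, dataType, params, offset}, or null when the
    // command does not match. Assumes a full-string match via matches().
    static String[] parse(Pattern p, String cmd) {
        Matcher m = p.matcher(cmd);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String cmd = "ADD org.apache.hadoop.chukwa.datacollection.adaptor.filetailer."
            + "CharFileTailingAdaptorUTF8NewLineEscaped Syslog 0 /var/log/messages 114027";
        for (Pattern p : new Pattern[] { BUGGY, FIXED }) {
            String[] r = parse(p, cmd);
            // Brackets make the trailing space in params visible.
            System.out.println("params [" + r[2] + "] offset [" + r[3] + "]");
        }
    }
}
```

Note that the proposed pattern keeps the trailing space inside the params group, exactly as in the expected result listed above.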




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerome Boulon updated HADOOP-5087:
----------------------------------

    Attachment: HADOOP-5087-2.patch

>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: HADOOP-5087-2.patch, HADOOP-5087.patch


[jira] Updated: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ari Rabkin updated HADOOP-5087:
-------------------------------

    Attachment: fixedregex.patch

This patch removes trailing spaces from adaptor parameters; I'm pretty sure this is the Right Thing.
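
One way to keep trailing whitespace out of the params group is to move the separator outside the capture. The pattern below is only an illustrative sketch and is not taken from fixedregex.patch, whose contents are not shown in this thread; the class name TrimmedParamsDemo and the parse helper are hypothetical, and a full-string match via matches() is assumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Chukwa and not the attached patch.
class TrimmedParamsDemo {
    // The params group must end on a non-space character, and the
    // whitespace that separates it from the offset sits outside the capture.
    static final Pattern TRIMMED = Pattern.compile(
        "[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(?:(.*\\S)\\s+)?(\\d+)\\s*");

    // Returns {adaptorName, dataType, params, offset}, or null on no match.
    // params is null when the command carries no parameters.
    static String[] parse(String cmd) {
        Matcher m = TRIMMED.matcher(cmd);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] r = parse("ADD org.apache.hadoop.chukwa.datacollection.adaptor."
            + "MySpecificAdaptor Syslog 0 my param1 param2 /var/log/messages 114027");
        // Brackets show that params carries no trailing space here.
        System.out.println("params [" + r[2] + "] offset [" + r[3] + "]");
    }
}
```

Whether the parser or the adaptor should be responsible for trimming is a design question that this sketch does not settle.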

>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: fixedregex.patch, HADOOP-5087-2.patch, HADOOP-5087.patch


[jira] Commented: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668143#action_12668143 ] 

Jerome Boulon commented on HADOOP-5087:
---------------------------------------

The definition for the ADD command is: 

      // words should contain (space delimited):
      // 0) command ("add")
      // 1) AdaptorClassname
      // 2) dataType (e.g. "hadoop_log")
      // 3) params <optional>
      // (e.g. for files, this is filename,
      // but can be arbitrarily many space
      // delimited agent specific params )
      // 4) offset

How can you remove trailing spaces from adaptor parameters? This is adaptor-specific: the adaptor should take care of it, and it should not be done automatically by processCommand. HADOOP-5087-2.patch is doing that.

Current test cases are failing for two reasons:
-> There is a space in the filename; the adaptor should be fixed.
-> A test case sends some chunks to the queue but does not clean up after itself, and the shutdown method on the agent does not do any cleanup, since in the real world the agent calls System.exit(0). The solution is to move that test into a separate test case. Since we are forking, it will be fine.




>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: fixedregex.patch, HADOOP-5087-2.patch, HADOOP-5087.patch


[jira] Updated: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ari Rabkin updated HADOOP-5087:
-------------------------------

    Attachment: fixedregex.patch

I think this is the right one -- I fiddled for a while and checked a bunch of the test cases.

>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: HADOOP-5087-2.patch, HADOOP-5087.patch


[jira] Commented: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Mac Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668566#action_12668566 ] 

Mac Yang commented on HADOOP-5087:
----------------------------------

another +1 for "foo"

>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: fixedregex.patch, HADOOP-5087-2.patch, HADOOP-5087.patch


[jira] Commented: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665934#action_12665934 ] 

Ari Rabkin commented on HADOOP-5087:
------------------------------------

Awesome.  +1

>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: HADOOP-5087-2.patch, HADOOP-5087.patch


[jira] Updated: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerome Boulon updated HADOOP-5087:
----------------------------------

    Attachment: HADOOP-5087.patch

- Fix the regex to correctly parse the ADD command
- Add a test case to validate the ADD command

>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: HADOOP-5087.patch


[jira] Updated: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang updated HADOOP-5087:
------------------------------

      Resolution: Fixed
    Release Note: 
What is new in HADOOP-5087:

- Fixed the regex to correctly parse the chukwa agent ADD command.
- Added a test case to validate the chukwa agent ADD command.


    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: fixedregex.patch, HADOOP-5087-2.patch, HADOOP-5087.patch, reluctantregex.patch


[jira] Updated: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ari Rabkin updated HADOOP-5087:
-------------------------------

    Attachment:     (was: fixedregex.patch)

>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: HADOOP-5087-2.patch, HADOOP-5087.patch


[jira] Updated: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ari Rabkin updated HADOOP-5087:
-------------------------------

    Status: Patch Available  (was: Reopened)

Existing tests cover this case; none added.

>            Reporter: Jerome Boulon
>            Assignee: Ari Rabkin
>         Attachments: fixedregex.patch, fixftaregex.patch, HADOOP-5087-2.patch, HADOOP-5087.patch, reluctantregex.patch


[jira] Updated: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ari Rabkin updated HADOOP-5087:
-------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

>            Reporter: Jerome Boulon
>            Assignee: Ari Rabkin
>         Attachments: fixedregex.patch, fixftaregex.patch, HADOOP-5087-2.patch, HADOOP-5087.patch, reluctantregex.patch


[jira] Reopened: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ari Rabkin reopened HADOOP-5087:
--------------------------------

      Assignee: Ari Rabkin  (was: Jerome Boulon)

FTA doesn't parse its argument correctly.

>            Reporter: Jerome Boulon
>            Assignee: Ari Rabkin
>         Attachments: fixedregex.patch, HADOOP-5087-2.patch, HADOOP-5087.patch, reluctantregex.patch


[jira] Commented: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665929#action_12665929 ] 

Jerome Boulon commented on HADOOP-5087:
---------------------------------------

Yes, HADOOP-5087-2.patch now contains 2 additional test cases:
- "ADD org.apache.hadoop.chukwa.datacollection.adaptor.ChukwaTestAdaptor  chukwaTestAdaptorType 0 114027"
- "ADD org.apache.hadoop.chukwa.datacollection.adaptor.ChukwaTestAdaptor  chukwaTestAdaptorType 114027"
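
These two commands exercise the optional params group: the first has a single parameter and the second has none. A rough sketch of how the corrected pattern from the issue description handles them follows. The class name OptionalParamsDemo and the parse helper are invented for illustration, this is not the test code from the patch, and a full-string match via matches() is assumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not the test case from HADOOP-5087-2.patch.
class OptionalParamsDemo {
    // Corrected pattern as quoted in the issue description.
    static final Pattern ADD_CMD = Pattern.compile(
        "[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\s)?\\s*(\\d+)\\s*");

    // Returns {adaptorName, dataType, params, offset}, or null on no match.
    // params is null when the optional group did not take part in the match.
    static String[] parse(String cmd) {
        Matcher m = ADD_CMD.matcher(cmd);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String prefix = "ADD org.apache.hadoop.chukwa.datacollection.adaptor."
            + "ChukwaTestAdaptor  chukwaTestAdaptorType ";
        String[] withParam = parse(prefix + "0 114027");
        String[] noParam = parse(prefix + "114027");
        System.out.println("params [" + withParam[2] + "] offset " + withParam[3]);
        System.out.println("params [" + noParam[2] + "] offset " + noParam[3]);
    }
}
```

In the second command the params group is skipped entirely, so Matcher.group(3) returns null rather than an empty string; callers would need to handle that case.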


>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: HADOOP-5087-2.patch, HADOOP-5087.patch


[jira] Commented: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665904#action_12665904 ] 

Jerome Boulon commented on HADOOP-5087:
---------------------------------------

PS: The ChukwaTestAdaptor could be useful for testing the AdaptorManager implementation:
https://issues.apache.org/jira/browse/HADOOP-4893?focusedCommentId=12663916#action_12663916

> Regex for Cmd parsing contains an error
> ---------------------------------------
>
>                 Key: HADOOP-5087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5087
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/chukwa
>         Environment: HADOOP-4947 use regex to parse chukwa commands but there's an error in the regex
>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: HADOOP-5087.patch
>
>




[jira] Commented: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Mac Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668547#action_12668547 ] 

Mac Yang commented on HADOOP-5087:
----------------------------------

Regex is very powerful and could provide an elegant solution to the right problem. However, it's not the easiest thing to read and maintain.

A typical answer to the regex maintainability issue is to put detailed comments on the regex. O'Reilly has an article on how to maintain regexes that I found quite useful (http://www.perl.com/pub/a/2004/01/16/regexps.html). I think we should do something like that if we want to take the regex approach.
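
As a rough sketch of what a commented regex could look like in Java: the Pattern.COMMENTS flag makes the engine ignore whitespace in the pattern and treat # as the start of a comment that runs to the end of the line. The pattern below is the one proposed in the issue description, only spread over several lines; the class name CommentedAddCmdPattern and the parse helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a commented form of the regex from the issue description.
final class CommentedAddCmdPattern {
  // With Pattern.COMMENTS, literal whitespace in the pattern is ignored and
  // everything from '#' to the end of the line is a comment.
  static final Pattern ADD_CMD = Pattern.compile(
      "[aA][dD][dD] \\s+   # the ADD keyword, in any case\n"
      + "(\\S+) \\s+       # group 1: adaptor class name\n"
      + "(\\S+) \\s+       # group 2: data type\n"
      + "(.*\\s)?          # group 3: optional params, ending in whitespace\n"
      + "\\s* (\\d+) \\s*  # group 4: numeric offset\n",
      Pattern.COMMENTS);

  // Returns {adaptorName, dataType, params, offset}, or null if there is no match.
  static String[] parse(String line) {
    Matcher m = ADD_CMD.matcher(line);
    if (!m.matches()) {
      return null;
    }
    return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
  }

  public static void main(String[] args) {
    String[] r = parse("ADD a.B Syslog 0 /var/log/messages 114027");
    System.out.println("params=[" + r[2] + "] offset=" + r[3]);
  }
}
```

The trade-off is that any whitespace the pattern must match literally has to be written as \s or escaped, since plain spaces are dropped in this mode.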



> Regex for Cmd parsing contains an error
> ---------------------------------------
>
>                 Key: HADOOP-5087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5087
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/chukwa
>         Environment: HADOOP-4947 use regex to parse chukwa commands but there's an error in the regex
>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: fixedregex.patch, HADOOP-5087-2.patch, HADOOP-5087.patch
>
>




[jira] Commented: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668561#action_12668561 ] 

Eric Yang commented on HADOOP-5087:
-----------------------------------

+1 on "foo".

> Regex for Cmd parsing contains an error
> ---------------------------------------
>
>                 Key: HADOOP-5087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5087
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/chukwa
>         Environment: HADOOP-4947 use regex to parse chukwa commands but there's an error in the regex
>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: fixedregex.patch, HADOOP-5087-2.patch, HADOOP-5087.patch
>
>




[jira] Updated: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerome Boulon updated HADOOP-5087:
----------------------------------

    Status: Patch Available  (was: Open)

> Regex for Cmd parsing contains an error
> ---------------------------------------
>
>                 Key: HADOOP-5087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5087
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/chukwa
>         Environment: HADOOP-4947 use regex to parse chukwa commands but there's an error in the regex
>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: HADOOP-5087.patch
>
>




[jira] Commented: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681802#action_12681802 ] 

Hudson commented on HADOOP-5087:
--------------------------------

Integrated in Hadoop-trunk #778 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/778/])
    

> Regex for Cmd parsing contains an error
> ---------------------------------------
>
>                 Key: HADOOP-5087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5087
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/chukwa
>         Environment: HADOOP-4947 use regex to parse chukwa commands but there's an error in the regex
>            Reporter: Jerome Boulon
>            Assignee: Ari Rabkin
>         Attachments: fixedregex.patch, fixftaregex.patch, HADOOP-5087-2.patch, HADOOP-5087.patch, reluctantregex.patch
>
>




[jira] Commented: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665919#action_12665919 ] 

Ari Rabkin commented on HADOOP-5087:
------------------------------------

Does this regex work if an adaptor has no parameters?  Do we have a test case to cover this?

> Regex for Cmd parsing contains an error
> ---------------------------------------
>
>                 Key: HADOOP-5087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5087
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/chukwa
>         Environment: HADOOP-4947 use regex to parse chukwa commands but there's an error in the regex
>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: HADOOP-5087.patch
>
>




[jira] Commented: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668529#action_12668529 ] 

Jerome Boulon commented on HADOOP-5087:
---------------------------------------

The idea of HADOOP-4947 was to have more flexible parsing for chukwa commands.
Moving to a regex was a good idea, but the current regex, which only reproduces the previous parsing (6-7 simple statements), seems very complicated and will be difficult to extend in the future.

So, to keep it simple, shouldn't we revert to something similar to the initial parsing?





> Regex for Cmd parsing contains an error
> ---------------------------------------
>
>                 Key: HADOOP-5087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5087
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/chukwa
>         Environment: HADOOP-4947 use regex to parse chukwa commands but there's an error in the regex
>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: fixedregex.patch, HADOOP-5087-2.patch, HADOOP-5087.patch
>
>




[jira] Commented: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678190#action_12678190 ] 

Eric Yang commented on HADOOP-5087:
-----------------------------------

+1 on fixftaregex.patch

> Regex for Cmd parsing contains an error
> ---------------------------------------
>
>                 Key: HADOOP-5087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5087
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/chukwa
>         Environment: HADOOP-4947 use regex to parse chukwa commands but there's an error in the regex
>            Reporter: Jerome Boulon
>            Assignee: Ari Rabkin
>         Attachments: fixedregex.patch, fixftaregex.patch, HADOOP-5087-2.patch, HADOOP-5087.patch, reluctantregex.patch
>
>




[jira] Commented: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668145#action_12668145 ] 

Ari Rabkin commented on HADOOP-5087:
------------------------------------

Currently, no adaptors assume that their parameters can end with spaces, so we can change that part of the spec without breaking things. I think allowing trailing spaces is generally more confusing than useful; if an adaptor needs its parameters to end with spaces, it can quote them.

> Regex for Cmd parsing contains an error
> ---------------------------------------
>
>                 Key: HADOOP-5087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5087
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/chukwa
>         Environment: HADOOP-4947 use regex to parse chukwa commands but there's an error in the regex
>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: fixedregex.patch, HADOOP-5087-2.patch, HADOOP-5087.patch
>
>




[jira] Commented: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668530#action_12668530 ] 

Ari Rabkin commented on HADOOP-5087:
------------------------------------

I wouldn't revert.  The previous code was very complex and difficult to extend, too.  It also had a number of quirks, or bugs, depending on what you think the proper behavior was.  I think the regex is actually simpler.  I vote to keep it.

> Regex for Cmd parsing contains an error
> ---------------------------------------
>
>                 Key: HADOOP-5087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5087
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/chukwa
>         Environment: HADOOP-4947 use regex to parse chukwa commands but there's an error in the regex
>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: fixedregex.patch, HADOOP-5087-2.patch, HADOOP-5087.patch
>
>




Re: [jira] Commented: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by Nigel Daley <nd...@yahoo-inc.com>.
Ari, "Cancel patch" and then "Submit patch" to have Hudson retest.

Nige

On Feb 25, 2009, at 5:11 PM, Ari Rabkin (JIRA) wrote:

>
>    [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676832 
> #action_12676832 ]
>
> Ari Rabkin commented on HADOOP-5087:
> ------------------------------------
>
> This has been languishing for a while and I'd like to resolve it; I  
> don't want to commit anything without a little more consensus, though.
>
> Thoughts on the most recent patch?  [Is there a way to trigger  
> hudson to re-check a patch?]
>
>> Regex for Cmd parsing contains an error
>> ---------------------------------------
>>
>>                Key: HADOOP-5087
>>                URL: https://issues.apache.org/jira/browse/HADOOP-5087
>>            Project: Hadoop Core
>>         Issue Type: Bug
>>         Components: contrib/chukwa
>>        Environment: HADOOP-4947 use regex to parse chukwa commands  
>> but there's an error in the regex
>>           Reporter: Jerome Boulon
>>           Assignee: Jerome Boulon
>>        Attachments: fixedregex.patch, HADOOP-5087-2.patch,  
>> HADOOP-5087.patch, reluctantregex.patch
>>
>>
>
>
>


[jira] Commented: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676832#action_12676832 ] 

Ari Rabkin commented on HADOOP-5087:
------------------------------------

This has been languishing for a while and I'd like to resolve it; I don't want to commit anything without a little more consensus, though.

Thoughts on the most recent patch?  [Is there a way to trigger hudson to re-check a patch?]

> Regex for Cmd parsing contains an error
> ---------------------------------------
>
>                 Key: HADOOP-5087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5087
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/chukwa
>         Environment: HADOOP-4947 use regex to parse chukwa commands but there's an error in the regex
>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: fixedregex.patch, HADOOP-5087-2.patch, HADOOP-5087.patch, reluctantregex.patch
>
>




[jira] Commented: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678435#action_12678435 ] 

Hadoop QA commented on HADOOP-5087:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12401259/fixftaregex.patch
  against trunk revision 749318.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/39/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/39/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/39/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/39/console

This message is automatically generated.

> Regex for Cmd parsing contains an error
> ---------------------------------------
>
>                 Key: HADOOP-5087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5087
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/chukwa
>         Environment: HADOOP-4947 use regex to parse chukwa commands but there's an error in the regex
>            Reporter: Jerome Boulon
>            Assignee: Ari Rabkin
>         Attachments: fixedregex.patch, fixftaregex.patch, HADOOP-5087-2.patch, HADOOP-5087.patch, reluctantregex.patch
>
>




[jira] Updated: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ari Rabkin updated HADOOP-5087:
-------------------------------

    Attachment: reluctantregex.patch

You asked for it, you got it.  Now with comments, and reluctant matching.
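
For readers without the attachment at hand, here is a minimal sketch of what a commented, reluctant version of the pattern could look like. It is not the contents of reluctantregex.patch: the class name, the constant name, and the parse helper are all hypothetical, and the only assumption carried over from the issue is the four-field layout (adaptor name, data type, params, offset).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the contents of reluctantregex.patch.
final class ReluctantAddCommand {
    // The pattern is assembled from commented pieces so each field is easy to spot.
    static final Pattern ADD_CMD = Pattern.compile(
        "[aA][dD][dD]\\s+"   // the ADD keyword, any case
      + "(\\S+)\\s+"         // group 1: adaptor class name
      + "(\\S+)\\s+"         // group 2: data type
      + "(.*?)\\s*"          // group 3: params, reluctant, so trailing spaces are left out
      + "(\\d+)\\s*");       // group 4: starting offset

    // Returns {adaptorName, dataType, params, offset}, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = ADD_CMD.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] f = parse("ADD org.apache.hadoop.chukwa.datacollection.adaptor.filetailer."
            + "CharFileTailingAdaptorUTF8NewLineEscaped Syslog 0 /var/log/messages 114027");
        System.out.println("params=[" + f[2] + "] offset=" + f[3]);
    }
}
```

Because matches() requires the whole line to match, the reluctant group grows only until the remainder is optional whitespace followed by digits, so the full offset stays in group 4.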

> Regex for Cmd parsing contains an error
> ---------------------------------------
>
>                 Key: HADOOP-5087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5087
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/chukwa
>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: fixedregex.patch, HADOOP-5087-2.patch, HADOOP-5087.patch, reluctantregex.patch
>
>




[jira] Commented: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668557#action_12668557 ] 

Ari Rabkin commented on HADOOP-5087:
------------------------------------

Comments are good.  It should be easy to split the regex into pieces with comments, and I'm happy to do it.  But we should first decide exactly what behavior we want when there are multiple spaces between an Adaptor's parameters and the starting offset.  Which spaces belong to the parameters, and which are discarded?
That is, suppose we have something that looks like:
       add ...FileTailingAdaptor... foo    10
Is the filename "foo" or "foo   " or something else?

This is basically a matter of taste.  I vote for the former; I think Jerome prefers the latter.  Other opinions?
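
To make the choice concrete, the sketch below runs the two patterns from the issue description against such a line, with a reluctant variant alongside. The class and method names and the shortened adaptor name are hypothetical, and the reluctant pattern is an assumed illustration rather than the text of any attached patch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison harness; only the first two patterns come from the issue text.
final class TrailingSpaceComparison {
    static final String PREFIX = "[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+";
    static final String SUFFIX = "\\s*(\\d+)\\s*";

    static final Pattern ORIGINAL  = Pattern.compile(PREFIX + "(.*\\S)?" + SUFFIX);
    static final Pattern PROPOSED  = Pattern.compile(PREFIX + "(.*\\s)?" + SUFFIX);
    static final Pattern RELUCTANT = Pattern.compile(PREFIX + "(.*?)" + SUFFIX); // assumed variant

    // Returns "[params]|offset" so that whitespace captured in params is visible.
    static String split(Pattern p, String line) {
        Matcher m = p.matcher(line);
        if (!m.matches()) {
            return "no match";
        }
        String params = m.group(3) == null ? "" : m.group(3);
        return "[" + params + "]|" + m.group(4);
    }

    public static void main(String[] args) {
        // Four spaces separate the parameter from the offset.
        String line = "add SomeFileTailingAdaptor Syslog foo    10";
        System.out.println("original:  " + split(ORIGINAL, line));
        System.out.println("proposed:  " + split(PROPOSED, line));
        System.out.println("reluctant: " + split(RELUCTANT, line));
    }
}
```

On this input the original pattern splits the offset (params "foo    1", offset 0), the proposed pattern keeps all four spaces in params with offset 10, and the reluctant variant yields plain "foo" with offset 10.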

> Regex for Cmd parsing contains an error
> ---------------------------------------
>
>                 Key: HADOOP-5087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5087
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/chukwa
>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: fixedregex.patch, HADOOP-5087-2.patch, HADOOP-5087.patch
>
>




[jira] Updated: (HADOOP-5087) Regex for Cmd parsing contains an error

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ari Rabkin updated HADOOP-5087:
-------------------------------

    Attachment: fixftaregex.patch

This patch should only affect the file tailing adaptor.

> Regex for Cmd parsing contains an error
> ---------------------------------------
>
>                 Key: HADOOP-5087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5087
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/chukwa
>            Reporter: Jerome Boulon
>            Assignee: Ari Rabkin
>         Attachments: fixedregex.patch, fixftaregex.patch, HADOOP-5087-2.patch, HADOOP-5087.patch, reluctantregex.patch
>
>

