Posted to issues@commons.apache.org by "Michael Knapp (JIRA)" <ji...@apache.org> on 2012/11/25 00:28:59 UTC

[jira] [Comment Edited] (LANG-860) String split with an escape pattern

    [ https://issues.apache.org/jira/browse/LANG-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503436#comment-13503436 ] 

Michael Knapp edited comment on LANG-860 at 11/24/12 11:28 PM:
---------------------------------------------------------------

I beg to differ.  Commons-csv assumes there can be an escape character; my code assumes there can be an escape pattern.  My code handles a much broader range of problems than CSV.  For example, what if you want to get all the parenthesized text out of a document?  Commons-csv cannot do that because '(' and ')' are different characters.  Commons-csv also offers no way to retain the delimiters that you split on; my code does.  Let's say you split on a pattern that matches open and closed parentheses: no existing split function in commons-lang, and no function in commons-csv, is able to retain the text that matched your regular expression delimiter, but my code can.  The code I wrote does not replace commons-csv, nor does it try to.  Commons-csv handles comments, empty lines, trimming text, and a whole lot more that is outside the scope of my code.  Also, if you expect anybody to use commons-csv, you should really publish it to the Maven Central repository and document it a little more.
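
To make the parentheses case concrete, here is a minimal sketch of splitting on a regular expression while keeping the matched delimiter text. It is only an illustration built on java.util.regex; the class name KeepDelimiterSketch and the method splitKeepingMatches are hypothetical and are not taken from the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeepDelimiterSketch {

    // Hypothetical helper (not the patch's API): splits input on a regex
    // and returns the matched delimiter text as tokens of its own.
    static List<String> splitKeepingMatches(String input, Pattern delimiter) {
        List<String> tokens = new ArrayList<>();
        Matcher m = delimiter.matcher(input);
        int last = 0; // end of the previous match
        while (m.find()) {
            if (m.end() == m.start()) {
                continue; // ignore empty matches
            }
            if (m.start() > last) {
                tokens.add(input.substring(last, m.start())); // text before the match
            }
            tokens.add(m.group()); // the matched delimiter itself
            last = m.end();
        }
        if (last < input.length()) {
            tokens.add(input.substring(last)); // trailing text
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumed pattern: one non-nested parenthesized group.
        Pattern parens = Pattern.compile("\\([^()]*\\)");
        System.out.println(splitKeepingMatches("see (a) and (b) here", parens));
        // prints [see , (a),  and , (b),  here]
    }
}
```

The pattern above does not handle nested parentheses; that would need a different matching strategy.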
                
      was (Author: msknapp):
    I beg to differ.  Commons-csv assumes there can be an escape character; my code assumes there can be an escape pattern.  My code handles a much broader range of problems than CSV.  For example, what if you want to get all the parenthesized text out of a document?  Commons-csv cannot do that because '(' and ')' are different characters.  Commons-csv also offers no way to retain the delimiters that you split on; my code does.  Let's say you split on a pattern that matches open and closed parentheses: no existing split function in commons-lang, and no function in commons-csv, is able to retain the text that matched your delimiter, but my code can.  The code I wrote does not replace commons-csv, nor does it try to.  Commons-csv handles comments, empty lines, trimming text, and a whole lot more that is outside the scope of my code.  Also, if you expect anybody to use commons-csv, you should really publish it to the Maven Central repository and document it a little more.
                  
> String split with an escape pattern
> -----------------------------------
>
>                 Key: LANG-860
>                 URL: https://issues.apache.org/jira/browse/LANG-860
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.*
>            Reporter: Michael Knapp
>            Priority: Minor
>              Labels: patch, split
>         Attachments: StringUtilsSplitEscapingly.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Oftentimes strings are delimited, but certain patterns can escape the delimiter.  For example, quotes are used in CSV to escape a comma delimiter.  I have written a couple of methods for StringUtils that split strings while allowing for an escape pattern.  For example, when given "a,\"b,c\",c", it will produce {"a","\"b,c\"","c"}.  In my code, the delimiter can be a string, and it can be escaped by any regular expression pattern.  Unit tests are already written and passing.
> I plan to attach the patch for this once the ticket is created.  I just need a committer to review the patch, approve it, and commit it for me.
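
For readers without the patch at hand, the behaviour described in the issue can be sketched roughly as follows. This is an assumption-laden illustration, not the code in StringUtilsSplitEscapingly.patch: the class EscapedSplitSketch, the method splitEscaped, and its signature are all hypothetical. The idea is to find every region matched by the escape pattern first and then ignore any delimiter that starts inside one of those regions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapedSplitSketch {

    // Hypothetical helper (not the patch's API): splits on a literal
    // delimiter, skipping delimiters that start inside an escaped region.
    static List<String> splitEscaped(String input, String delimiter, Pattern escape) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        // Record the [start, end) span of every non-empty escaped region.
        List<int[]> escaped = new ArrayList<>();
        Matcher m = escape.matcher(input);
        while (m.find()) {
            if (m.end() > m.start()) {
                escaped.add(new int[] {m.start(), m.end()});
            }
        }
        List<String> parts = new ArrayList<>();
        int tokenStart = 0; // where the current token begins
        int from = 0;       // where the next delimiter search begins
        while (true) {
            int hit = input.indexOf(delimiter, from);
            if (hit < 0) {
                break;
            }
            int skipTo = -1;
            for (int[] span : escaped) {
                if (hit >= span[0] && hit < span[1]) {
                    skipTo = span[1]; // delimiter is escaped, resume after the region
                    break;
                }
            }
            if (skipTo >= 0) {
                from = skipTo;
            } else {
                parts.add(input.substring(tokenStart, hit));
                tokenStart = hit + delimiter.length();
                from = tokenStart;
            }
        }
        parts.add(input.substring(tokenStart)); // last token, possibly empty
        return parts;
    }

    public static void main(String[] args) {
        // Assumed escape pattern: a double-quoted run with no embedded quotes.
        Pattern quoted = Pattern.compile("\"[^\"]*\"");
        System.out.println(splitEscaped("a,\"b,c\",c", ",", quoted));
        // prints [a, "b,c", c]
    }
}
```

The sketch keeps the quotes in the output, matching the example in the issue description. How the actual patch treats empty tokens or a delimiter that straddles the edge of an escaped region is not stated in the ticket, so those choices here are arbitrary.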
