Posted to issues@commons.apache.org by "Emmanuel Bourg (JIRA)" <ji...@apache.org> on 2008/04/15 14:47:05 UTC

[jira] Created: (LANG-426) String splitting with escaped delimiter

String splitting with escaped delimiter
---------------------------------------

                 Key: LANG-426
                 URL: https://issues.apache.org/jira/browse/LANG-426
             Project: Commons Lang
          Issue Type: New Feature
    Affects Versions: 2.4
            Reporter: Emmanuel Bourg
            Priority: Minor
             Fix For: 3.0


In Commons Configuration we use a custom split method that supports escaped delimiters. It would be nice if this were available in Commons Lang, either as a method in StringUtils or as a setting in StrTokenizer.

Example:

{code}
a,b\,c,d    ->    ["a", "b,c", "d"]
{code}

Here is the code of the method:

{code:java}
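import java.util.ArrayList;
import java.util.List;

public class PropertySplitter // hypothetical enclosing class for this snippet
{
    /** Escape character for list delimiters (assumed: a backslash, as in the example above). */
    static final char LIST_ESC_CHAR = '\\';

    /**
     * Splits s at each unescaped delimiter and trims the resulting tokens.
     * LIST_ESC_CHAR before the delimiter or before itself escapes that character;
     * before any other character it is kept as is.
     */
    public static List<String> split(String s, char delimiter)
    {
        if (s == null)
        {
            return new ArrayList<String>();
        }

        List<String> list = new ArrayList<String>();

        StringBuilder token = new StringBuilder();
        int begin = 0;
        boolean inEscape = false;

        while (begin < s.length())
        {
            char c = s.charAt(begin);
            if (inEscape)
            {
                // last character was the escape marker
                // can current character be escaped?
                if (c != delimiter && c != LIST_ESC_CHAR)
                {
                    // no, also add escape character
                    token.append(LIST_ESC_CHAR);
                }
                token.append(c);
                inEscape = false;
            }
            else
            {
                if (c == delimiter)
                {
                    // found a list delimiter -> add token and reset buffer
                    list.add(token.toString().trim());
                    token = new StringBuilder();
                }
                else if (c == LIST_ESC_CHAR)
                {
                    // possibly escape the next character
                    inEscape = true;
                }
                else
                {
                    token.append(c);
                }
            }

            begin++;
        }

        // Trailing escape character? Keep it literally.
        if (inEscape)
        {
            token.append(LIST_ESC_CHAR);
        }
        // Add last token
        list.add(token.toString().trim());

        return list;
    }
}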
{code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (LANG-426) String splitting with escaped delimiter

Posted by "Henri Yandell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LANG-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12595882#action_12595882 ] 

Henri Yandell commented on LANG-426:
------------------------------------

Need to write a unit test.
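A minimal sketch of what such a test could cover. It is self-contained and runs without JUnit: the split below is a compact rewrite (a one-character lookahead instead of the inEscape flag) that is intended to behave like the method in the description, with LIST_ESC_CHAR assumed to be a backslash. The class name EscapedSplitCheck is hypothetical; a real test would call the actual method rather than this copy.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, self-contained test sketch for splitting with an escaped delimiter.
public class EscapedSplitCheck
{
    // Assumed escape character (a backslash, as in the issue's example).
    static final char LIST_ESC_CHAR = '\\';

    // Compact rewrite of the posted split(): looks ahead one character
    // instead of keeping an inEscape flag; intended to behave identically.
    public static List<String> split(String s, char delimiter)
    {
        List<String> list = new ArrayList<String>();
        if (s == null)
        {
            return list;
        }
        StringBuilder token = new StringBuilder();
        for (int i = 0; i < s.length(); i++)
        {
            char c = s.charAt(i);
            if (c == LIST_ESC_CHAR && i + 1 < s.length()
                    && (s.charAt(i + 1) == delimiter || s.charAt(i + 1) == LIST_ESC_CHAR))
            {
                token.append(s.charAt(++i)); // escaped delimiter or escaped backslash
            }
            else if (c == delimiter)
            {
                list.add(token.toString().trim());
                token.setLength(0);
            }
            else
            {
                token.append(c); // includes a lone or trailing backslash
            }
        }
        list.add(token.toString().trim());
        return list;
    }

    // Fails with the input and actual result if split() disagrees with expected.
    static void check(String input, String... expected)
    {
        List<String> actual = split(input, ',');
        if (!actual.equals(Arrays.asList(expected)))
        {
            throw new AssertionError(input + " -> " + actual);
        }
    }

    public static void main(String[] args)
    {
        check("a,b\\,c,d", "a", "b,c", "d"); // the example from the description
        check("a\\\\,b", "a\\", "b");        // escaped backslash before a delimiter
        check("a\\b", "a\\b");               // backslash before an ordinary char is kept
        check("a\\", "a\\");                 // trailing backslash is kept
        check(" a , b ", "a", "b");          // tokens are trimmed
        check("", "");                       // empty input gives one empty token
        check("a,", "a", "");                // trailing delimiter gives an empty token
        if (!split(null, ',').isEmpty())
        {
            throw new AssertionError("null input should give an empty list");
        }
        System.out.println("all checks passed");
    }
}
{code}

The cases worth pinning down are the less obvious ones: a backslash before an ordinary character or at the very end is kept, and empty tokens are not dropped.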



[jira] Updated: (LANG-426) String splitting with escaped delimiter

Posted by "Henri Yandell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LANG-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Yandell updated LANG-426:
-------------------------------


Moving to 3.x. I don't think this change would be backwards incompatible, so it can be done later.

> String splitting with escaped delimiter
> ---------------------------------------
>
>                 Key: LANG-426
>                 URL: https://issues.apache.org/jira/browse/LANG-426
>             Project: Commons Lang
>          Issue Type: New Feature
>          Components: lang.text.*
>    Affects Versions: 2.4
>            Reporter: Emmanuel Bourg
>            Priority: Minor
>             Fix For: 3.x


[jira] Commented: (LANG-426) String splitting with escaped delimiter

Posted by "Henri Yandell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LANG-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778079#action_12778079 ] 

Henri Yandell commented on LANG-426:
------------------------------------

Regarding StringUtils: my biggest concern is that this balloons the API. There are currently 4 split methods, plus 10 other splitByXyz-type methods. I think StrTokenizer is the better place to pursue this.
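A minimal sketch of how the escape could look as a tokenizer setting rather than as another StringUtils method. EscapingTokenizer and its withDelimiter/withEscape/withTrim methods are hypothetical names for illustration only, not StrTokenizer API; the escape rules are assumed to match the Commons Configuration method in the description (the escape character only escapes the delimiter or itself and is kept literally otherwise).

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: escape handling exposed as a tokenizer setting.
// EscapingTokenizer and its with* methods are illustrative only; they are
// not part of StrTokenizer or of any Commons Lang release.
public class EscapingTokenizer
{
    private char delimiter = ',';
    private char escape = '\\';
    private boolean trim = true;

    public EscapingTokenizer withDelimiter(char delimiter)
    {
        this.delimiter = delimiter;
        return this;
    }

    public EscapingTokenizer withEscape(char escape)
    {
        this.escape = escape;
        return this;
    }

    public EscapingTokenizer withTrim(boolean trim)
    {
        this.trim = trim;
        return this;
    }

    // Splits on the delimiter; an escape char followed by the delimiter or by
    // itself yields that character literally, any other escape char is kept.
    public List<String> tokenize(String s)
    {
        List<String> tokens = new ArrayList<String>();
        if (s == null)
        {
            return tokens;
        }
        StringBuilder token = new StringBuilder();
        for (int i = 0; i < s.length(); i++)
        {
            char c = s.charAt(i);
            boolean hasNext = i + 1 < s.length();
            if (c == escape && hasNext
                    && (s.charAt(i + 1) == delimiter || s.charAt(i + 1) == escape))
            {
                token.append(s.charAt(++i));
            }
            else if (c == delimiter)
            {
                tokens.add(finish(token));
            }
            else
            {
                token.append(c);
            }
        }
        tokens.add(finish(token));
        return tokens;
    }

    // Returns the current token (trimmed if configured) and clears the buffer.
    private String finish(StringBuilder token)
    {
        String t = trim ? token.toString().trim() : token.toString();
        token.setLength(0);
        return t;
    }
}
{code}

For example, new EscapingTokenizer().withDelimiter(';').withEscape('^').tokenize("a; b^;c") gives [a, b;c]. Keeping the delimiter, escape character and trimming as settings on one object is what would let this feature live in a tokenizer without adding more split variants to StringUtils.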



[jira] Updated: (LANG-426) String splitting with escaped delimiter

Posted by "Henri Yandell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LANG-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Yandell updated LANG-426:
-------------------------------

    Fix Version/s:     (was: 3.0)
                   3.x
