Posted to issues@commons.apache.org by "Kai Gülzau (JIRA)" <ji...@apache.org> on 2010/03/10 09:31:27 UTC

[jira] Created: (LANG-604) Optimize isBlank() for untrimmed strings

Optimize isBlank() for untrimmed strings
----------------------------------------

                 Key: LANG-604
                 URL: https://issues.apache.org/jira/browse/LANG-604
             Project: Commons Lang
          Issue Type: Improvement
          Components: lang.*
    Affects Versions: 3.0
            Reporter: Kai Gülzau
            Priority: Minor


Change isBlank() to start iterating in the middle of the string.
This gives better performance for untrimmed strings such as "   dummy   ",
where the middle character is already non-whitespace.

Here is my proposal:

public static boolean isBlank(CharSequence cs) {
  int strLen;
  if (cs == null || (strLen = cs.length()) == 0) {
    return true;
  }
  int mid = strLen / 2, i = mid;
  for (; i < strLen; i++) {
    if (!Character.isWhitespace(cs.charAt(i))) {
      return false;
    }
  }
  for (i = 0; i < mid; i++) {
    if (!Character.isWhitespace(cs.charAt(i))) {
      return false;
    }
  }
  return true;
}
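One way to check that the two-pass proposal above still examines every character is to compare it with a plain left-to-right scan on all short inputs. The following is a minimal sketch of such a harness, assuming a two-letter alphabet of ' ' and 'x' and strings of up to 8 characters; the class `MiddleOutCheck` and the helpers `isBlankLinear`, `allStrings` and `countMismatches` are hypothetical names, not part of Commons Lang.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical harness: compares the proposed middle-out isBlank()
// with a plain linear scan on every short string over {' ', 'x'}.
class MiddleOutCheck {

    // The LANG-604 proposal: scan [mid, strLen) first, then [0, mid).
    static boolean isBlankMiddleOut(CharSequence cs) {
        int strLen;
        if (cs == null || (strLen = cs.length()) == 0) {
            return true;
        }
        int mid = strLen / 2;
        for (int i = mid; i < strLen; i++) {
            if (!Character.isWhitespace(cs.charAt(i))) {
                return false;
            }
        }
        for (int i = 0; i < mid; i++) {
            if (!Character.isWhitespace(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Reference implementation: a straightforward left-to-right scan.
    static boolean isBlankLinear(CharSequence cs) {
        if (cs == null) {
            return true;
        }
        for (int i = 0; i < cs.length(); i++) {
            if (!Character.isWhitespace(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Builds every string of length 0..maxLen over the alphabet {' ', 'x'}.
    static List<String> allStrings(int maxLen) {
        List<String> out = new ArrayList<>();
        out.add("");
        for (int len = 1; len <= maxLen; len++) {
            for (int bits = 0; bits < (1 << len); bits++) {
                StringBuilder sb = new StringBuilder(len);
                for (int k = 0; k < len; k++) {
                    sb.append(((bits >> k) & 1) == 0 ? ' ' : 'x');
                }
                out.add(sb.toString());
            }
        }
        return out;
    }

    // Returns the number of inputs on which the two versions disagree.
    static int countMismatches(int maxLen) {
        int mismatches = 0;
        for (String s : allStrings(maxLen)) {
            if (isBlankMiddleOut(s) != isBlankLinear(s)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(8));
    }
}
```

Running `main` should print `mismatches: 0`. The harness only checks that both versions agree; it says nothing about speed.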

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (LANG-604) Optimize isBlank() for untrimmed strings

Posted by "Kai Gülzau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LANG-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845578#action_12845578 ] 

Kai Gülzau commented on LANG-604:
---------------------------------

It's a heuristic for guessing where a non-whitespace character is likely to be.
For example, the algorithm is ~50% faster for the
string " 123" and ~100% faster for the string "  12".

I believe most of the strings checked by isBlank() that
contain whitespace are untrimmed strings,
so this works for me and should work in general.


You lose ~1-5% performance on blank or random strings
due to computing the middle position of the string.

To limit the performance loss to 50% for strings like "12  ",
I've merged both loops into one. For blank strings with
an odd length, the middle character is checked twice.

public static boolean isBlank(CharSequence cs) {
  int strLen;
  if (cs == null || (strLen = cs.length()) == 0) {
    return true;
  }
  // m walks the second half while i walks the first half in lockstep
  for (int m = strLen / 2, i = 0; m < strLen; m++, i++) {
    if (!Character.isWhitespace(cs.charAt(m)) || !Character.isWhitespace(cs.charAt(i))) {
      return false;
    }
  }
  return true;
}

regards,

Kai Gülzau
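The gains and losses discussed above come down to how many characters each variant reads before it finds a non-whitespace character. The sketch below counts `charAt` calls rather than measuring time, for a plain linear scan, the original two-loop proposal and the single interleaved loop. The `CountingSequence` wrapper and the method names are hypothetical, null handling is left out for brevity, and call counts are only a rough proxy for actual runtime on a JVM.

```java
// Hypothetical sketch: counts charAt() calls made by three isBlank() variants.
class ProbeCount {

    // Wraps a String and counts how often charAt() is called.
    static final class CountingSequence implements CharSequence {
        private final String s;
        int probes;

        CountingSequence(String s) { this.s = s; }

        @Override public int length() { return s.length(); }
        @Override public char charAt(int index) { probes++; return s.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) { return s.subSequence(start, end); }
        @Override public String toString() { return s; }
    }

    // Plain left-to-right scan.
    static boolean linear(CharSequence cs) {
        for (int i = 0; i < cs.length(); i++) {
            if (!Character.isWhitespace(cs.charAt(i))) return false;
        }
        return true;
    }

    // Original proposal: scan [mid, strLen) first, then [0, mid).
    static boolean twoLoops(CharSequence cs) {
        int strLen = cs.length(), mid = strLen / 2;
        for (int i = mid; i < strLen; i++) {
            if (!Character.isWhitespace(cs.charAt(i))) return false;
        }
        for (int i = 0; i < mid; i++) {
            if (!Character.isWhitespace(cs.charAt(i))) return false;
        }
        return true;
    }

    // Single loop: alternates between the second half (m) and the first half (i).
    static boolean interleaved(CharSequence cs) {
        int strLen = cs.length();
        for (int m = strLen / 2, i = 0; m < strLen; m++, i++) {
            if (!Character.isWhitespace(cs.charAt(m)) || !Character.isWhitespace(cs.charAt(i))) return false;
        }
        return true;
    }

    // Returns {linear, twoLoops, interleaved} charAt() counts for one input.
    static int[] probes(String s) {
        CountingSequence a = new CountingSequence(s);
        CountingSequence b = new CountingSequence(s);
        CountingSequence c = new CountingSequence(s);
        linear(a);
        twoLoops(b);
        interleaved(c);
        return new int[] {a.probes, b.probes, c.probes};
    }

    public static void main(String[] args) {
        for (String s : new String[] {" 123", "  12", "12  ", "   dummy   ", "     "}) {
            int[] p = probes(s);
            System.out.printf("\"%s\": linear=%d twoLoops=%d interleaved=%d%n", s, p[0], p[1], p[2]);
        }
    }
}
```

With these inputs, " 123" costs 2 reads with a linear scan versus 1 with either middle-out variant, and "  12" costs 3 versus 1. For "12  " the two-loop version needs 3 reads where the linear scan needs 1, while the interleaved loop needs 2, which roughly matches the 50% limit described above. For the five-space blank string the interleaved loop makes 6 reads instead of 5 because the middle character is read twice.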





[jira] Commented: (LANG-604) Optimize isBlank() for untrimmed strings

Posted by "Henri Yandell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LANG-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845245#action_12845245 ] 

Henri Yandell commented on LANG-604:
------------------------------------

Two questions:

1) What kind of improvement has this given you?
2) What's the cost? What gets worse?




[jira] Commented: (LANG-604) Optimize isBlank() for untrimmed strings

Posted by "Kai Gülzau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LANG-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848141#action_12848141 ] 

Kai Gülzau commented on LANG-604:
---------------------------------

OK, good point. This way you don't lose performance on trimmed strings.

But the loop no longer checks every character:

isBlank("   X  ") returns true!

There are two fixes:
a)
...
for (int m = strLen / 2, i = 1; m < strLen; m++, i++) {
...

Here one character (at strLen / 2) is checked twice for even-length blank strings.

b)
...
int mid = strLen / 2;
for (int i = 1, m = mid; m < strLen; m++, i++) {
  if (!Character.isWhitespace(cs.charAt(m)) || (i < mid && !Character.isWhitespace(cs.charAt(i)))) {
...

Here we lose a little performance on very large blank strings due
to the extra check in the loop.


I opt for b).

Kai
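Fragment b) only shows the changed loop. The sketch below fills it out into a complete method, assuming it keeps the null/empty guard and the up-front first-character check from Henri's patch; the class name `IsBlankFixB` is hypothetical and this is not presented as the code that was eventually committed to Commons Lang.

```java
// Hypothetical completion of fix b) for LANG-604, assuming Henri's
// null/empty guard and first-character check are kept.
class IsBlankFixB {

    static boolean isBlank(CharSequence cs) {
        int strLen;
        if (cs == null || (strLen = cs.length()) == 0) {
            return true;
        }
        // Check the first character up front, as in Henri's patch.
        if (!Character.isWhitespace(cs.charAt(0))) {
            return false;
        }
        int mid = strLen / 2;
        // m covers [mid, strLen) and i covers [1, mid); the i < mid test
        // stops i from reading characters that m is responsible for.
        for (int i = 1, m = mid; m < strLen; m++, i++) {
            if (!Character.isWhitespace(cs.charAt(m))
                    || (i < mid && !Character.isWhitespace(cs.charAt(i)))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] inputs = {"", " ", "   ", "   X  ", "X     ", "     X", "  12"};
        for (String s : inputs) {
            System.out.println("\"" + s + "\" -> " + isBlank(s));
        }
    }
}
```

Under the same assumptions, fragment a) would call `charAt(1)` on a one-character blank string such as " ", because `m` starts at 0 while `i` starts at 1; with b) the `i < mid` test prevents that read.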






[jira] Commented: (LANG-604) Optimize isBlank() for untrimmed strings

Posted by "Henri Yandell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LANG-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846475#action_12846475 ] 

Henri Yandell commented on LANG-604:
------------------------------------

I'm not sure I buy that most of the strings checked are untrimmed.

I'd expect most to be either a) blank or b) normal user input. If the first character is whitespace, then I could believe that it's untrimmed.

So:

+    public static boolean isBlank(CharSequence cs) { 
+        int strLen;
+        if (cs == null || (strLen = cs.length()) == 0) {
+            return true;
+        }
+        // Optimized - check first character
+        if (!Character.isWhitespace(cs.charAt(0))) {
+            return false;
+        }
+        // Optimized - starts in the middle and works out with the assumption that 
+        // most input starting with whitespace are untrimmed strings
+        for (int m = 1 + strLen / 2, i = 1; m < strLen; m++, i++) {
+            if (!Character.isWhitespace(cs.charAt(m)) || !Character.isWhitespace(cs.charAt(i))) {
+                return false;
+            }
+        }
+        return true;
+    }

Any thoughts?
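To see which characters the patch above actually reads, the hypothetical helper below replays its index arithmetic (the first-character check, then `m` from `1 + strLen / 2` and `i` from `1`) for a blank string, where no check short-circuits, and records every index passed to `charAt`. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: replays the index arithmetic of the patch above on a
// blank string of the given length and records every index passed to charAt().
class PatchIndexTrace {

    static List<Integer> indicesRead(int strLen) {
        List<Integer> read = new ArrayList<>();
        if (strLen == 0) {
            return read;             // the patch returns true before reading anything
        }
        read.add(0);                 // the up-front first-character check
        for (int m = 1 + strLen / 2, i = 1; m < strLen; m++, i++) {
            read.add(m);             // back cursor
            read.add(i);             // front cursor, reached because charAt(m) was whitespace
        }
        return read;
    }

    // Lists the indices in [0, strLen) that the patch never reads.
    static List<Integer> skipped(int strLen) {
        List<Integer> read = indicesRead(strLen);
        List<Integer> missing = new ArrayList<>();
        for (int k = 0; k < strLen; k++) {
            if (!read.contains(k)) {
                missing.add(k);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        for (int len = 1; len <= 7; len++) {
            System.out.println("strLen=" + len + " read=" + indicesRead(len) + " skipped=" + skipped(len));
        }
    }
}
```

For even lengths the trace never reads index strLen / 2: index 3 for strLen = 6, which is why isBlank("   X  ") returns true as Kai reported, and index 1 for strLen = 2, so an input such as " X" would be misreported as well. Odd lengths are fully covered.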

