Posted to issues@commons.apache.org by "Kai Gülzau (JIRA)" <ji...@apache.org> on 2010/03/10 09:31:27 UTC
[jira] Created: (LANG-604) Optimize isBlank() for untrimmed strings
Optimize isBlank() for untrimmed strings
----------------------------------------
Key: LANG-604
URL: https://issues.apache.org/jira/browse/LANG-604
Project: Commons Lang
Issue Type: Improvement
Components: lang.*
Affects Versions: 3.0
Reporter: Kai Gülzau
Priority: Minor
Change isBlank() to start iteration in the middle of the String.
So you get better performance for untrimmed Strings like " dummy ".
Here is my proposal:
public static boolean isBlank(CharSequence cs) {
    int strLen;
    if (cs == null || (strLen = cs.length()) == 0) {
        return true;
    }
    int mid = strLen / 2, i = mid;
    for (; i < strLen; i++) {
        if (!Character.isWhitespace(cs.charAt(i))) {
            return false;
        }
    }
    for (i = 0; i < mid; i++) {
        if (!Character.isWhitespace(cs.charAt(i))) {
            return false;
        }
    }
    return true;
}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (LANG-604) Optimize isBlank() for untrimmed strings
Posted by "Kai Gülzau (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LANG-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845578#action_12845578 ]
Kai Gülzau commented on LANG-604:
---------------------------------
It's a heuristic to guess the position of a non-whitespace character.
For example, the algorithm is ~50% faster for the
string " 123" and ~100% for the string " 12".
I believe most of the strings checked by isBlank() that
contain whitespace are untrimmed strings.
So this works for me and should work in general.
You lose ~1-5% performance on blank or random strings
due to computing the middle position of the string.
To limit the performance loss to 50% for strings like "12 ",
I've put both loops into one loop. For blank strings with
an odd length, the middle char is checked twice.
public static boolean isBlank(CharSequence cs) {
    int strLen;
    if (cs == null || (strLen = cs.length()) == 0)
        return true;
    // m scans the second half while i scans the first half
    for (int m = strLen / 2, i = 0; m < strLen; m++, i++)
        if (!Character.isWhitespace(cs.charAt(m)) || !Character.isWhitespace(cs.charAt(i)))
            return false;
    return true;
}
regards,
Kai Gülzau
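The cost/benefit trade-off described in this comment can be made concrete by counting charAt() calls instead of measuring time. The following is only a sketch under stated assumptions: the class IsBlankProbeCount, the CountingSequence wrapper, and the readsLinear/readsMid helpers are hypothetical names introduced for illustration, and isBlankLinear is a plain left-to-right scan assumed as the baseline. It says nothing about wall-clock performance.

```java
class IsBlankProbeCount {

    /** Hypothetical helper: wraps a CharSequence and counts charAt() calls. */
    static final class CountingSequence implements CharSequence {
        private final CharSequence delegate;
        int reads = 0;

        CountingSequence(CharSequence delegate) {
            this.delegate = delegate;
        }

        @Override public int length() { return delegate.length(); }
        @Override public char charAt(int index) { reads++; return delegate.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return delegate.subSequence(start, end);
        }
        @Override public String toString() { return delegate.toString(); }
    }

    /** Assumed baseline: plain left-to-right scan. */
    static boolean isBlankLinear(CharSequence cs) {
        int strLen;
        if (cs == null || (strLen = cs.length()) == 0) {
            return true;
        }
        for (int i = 0; i < strLen; i++) {
            if (!Character.isWhitespace(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** Single-loop variant from the comment above, with consistent variable names. */
    static boolean isBlankMid(CharSequence cs) {
        int strLen;
        if (cs == null || (strLen = cs.length()) == 0) {
            return true;
        }
        for (int m = strLen / 2, i = 0; m < strLen; m++, i++) {
            if (!Character.isWhitespace(cs.charAt(m)) || !Character.isWhitespace(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** Number of characters the baseline reads before it decides. */
    static int readsLinear(String s) {
        CountingSequence c = new CountingSequence(s);
        isBlankLinear(c);
        return c.reads;
    }

    /** Number of characters the single-loop variant reads before it decides. */
    static int readsMid(String s) {
        CountingSequence c = new CountingSequence(s);
        isBlankMid(c);
        return c.reads;
    }

    public static void main(String[] args) {
        String[] samples = { "   dummy   ", "dummy", "12   ", "     " };
        for (String s : samples) {
            System.out.println("\"" + s + "\": linear=" + readsLinear(s) + ", mid=" + readsMid(s));
        }
    }
}
```

For the padded sample "   dummy   " the baseline reads 4 characters and the single-loop variant reads 1. For "12   " the counts are 1 and 2, and for a five-space string they are 5 and 6, which matches the remark that the middle char of an odd-length blank string is checked twice.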
[jira] Commented: (LANG-604) Optimize isBlank() for untrimmed strings
Posted by "Henri Yandell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LANG-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845245#action_12845245 ]
Henri Yandell commented on LANG-604:
------------------------------------
Two questions:
1) What kind of improvement has this given you?
2) What's the cost - what gets worse?
[jira] Commented: (LANG-604) Optimize isBlank() for untrimmed strings
Posted by "Kai Gülzau (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LANG-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848141#action_12848141 ]
Kai Gülzau commented on LANG-604:
---------------------------------
Ok, good point. This way you don't lose performance for trimmed strings.
But the loop doesn't check all chars now:
isBlank(" X ") returns true!
There are two fixes:
a)
...
for (int m = strLen / 2, i = 1; m < strLen; m++, i++) {
...
Here one char (at strLen / 2) is checked twice for even-length blank strings.
b)
...
int mid = strLen / 2;
for (int i = 1, m = mid; m < strLen; m++, i++) {
    if (!Character.isWhitespace(cs.charAt(m)) || (i < mid && !Character.isWhitespace(cs.charAt(i)))) {
...
Here we lose some minor performance on very large blank strings due
to the extra check in the loop.
I opt for b).
Kai
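Option b) is only given as a fragment. A minimal sketch of a complete method, assuming the fragment is combined with the null/empty guard and the first-character check from Henri Yandell's patch in this thread, might look as follows. The class name IsBlankOptionB, the reference method isBlankLinear, and the helper countMismatches are hypothetical and exist only to compare the result against a plain left-to-right scan on short inputs.

```java
class IsBlankOptionB {

    /** Assumed reference: plain left-to-right scan. */
    static boolean isBlankLinear(CharSequence cs) {
        int strLen;
        if (cs == null || (strLen = cs.length()) == 0) {
            return true;
        }
        for (int i = 0; i < strLen; i++) {
            if (!Character.isWhitespace(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** Option b) assembled into a full method (an assumption, see lead-in). */
    static boolean isBlankOptionB(CharSequence cs) {
        int strLen;
        if (cs == null || (strLen = cs.length()) == 0) {
            return true;
        }
        // First-character check: trimmed, non-blank input returns immediately.
        if (!Character.isWhitespace(cs.charAt(0))) {
            return false;
        }
        int mid = strLen / 2;
        // m covers mid..strLen-1, i covers 1..mid-1; the guard stops i at mid.
        for (int i = 1, m = mid; m < strLen; m++, i++) {
            if (!Character.isWhitespace(cs.charAt(m))
                    || (i < mid && !Character.isWhitespace(cs.charAt(i)))) {
                return false;
            }
        }
        return true;
    }

    /**
     * Compares both methods on every string of length 0..maxLen built
     * from ' ' and 'X', and returns how many results differ.
     */
    static int countMismatches(int maxLen) {
        int mismatches = 0;
        for (int len = 0; len <= maxLen; len++) {
            for (int bits = 0; bits < (1 << len); bits++) {
                StringBuilder sb = new StringBuilder(len);
                for (int k = 0; k < len; k++) {
                    sb.append(((bits >> k) & 1) == 1 ? 'X' : ' ');
                }
                if (isBlankLinear(sb) != isBlankOptionB(sb)) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(12));
    }
}
```

The exhaustive comparison only covers short strings over a two-character alphabet, so it is a sanity check on the index arithmetic rather than a proof or a benchmark.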
[jira] Commented: (LANG-604) Optimize isBlank() for untrimmed strings
Posted by "Henri Yandell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LANG-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846475#action_12846475 ]
Henri Yandell commented on LANG-604:
------------------------------------
I'm not sure I buy that most of the strings checked are untrimmed.
I'd expect most to be either a) blank or b) normal user input. If the first character is whitespace, then I could believe that it's untrimmed.
So:
+ public static boolean isBlank(CharSequence cs) {
+     int strLen;
+     if (cs == null || (strLen = cs.length()) == 0) {
+         return true;
+     }
+     // Optimized - check first character
+     if (!Character.isWhitespace(cs.charAt(0))) {
+         return false;
+     }
+     // Optimized - starts in the middle and works out with the assumption that
+     // most input starting with whitespace are untrimmed strings
+     for (int m = 1 + strLen / 2, i = 1; m < strLen; m++, i++) {
+         if (!Character.isWhitespace(cs.charAt(m)) || !Character.isWhitespace(cs.charAt(i))) {
+             return false;
+         }
+     }
+     return true;
+ }
Any thoughts?
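As noted elsewhere in this thread, the loop in this patch does not visit every index. The sketch below replays the index arithmetic to show which positions are skipped. The class IsBlankCoverageCheck and the helper uncheckedIndices are hypothetical names used only for illustration; isBlankPatched is the patch above with the diff markers removed.

```java
import java.util.ArrayList;
import java.util.List;

class IsBlankCoverageCheck {

    /** The patched isBlank() from the comment above, copied without the diff markers. */
    static boolean isBlankPatched(CharSequence cs) {
        int strLen;
        if (cs == null || (strLen = cs.length()) == 0) {
            return true;
        }
        if (!Character.isWhitespace(cs.charAt(0))) {
            return false;
        }
        for (int m = 1 + strLen / 2, i = 1; m < strLen; m++, i++) {
            if (!Character.isWhitespace(cs.charAt(m)) || !Character.isWhitespace(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** Hypothetical helper: lists the positions the patch never reads for a given length. */
    static List<Integer> uncheckedIndices(int strLen) {
        boolean[] seen = new boolean[strLen];
        if (strLen > 0) {
            seen[0] = true; // first-character check
        }
        // Same loop header as the patch; only the visited indices are recorded.
        for (int m = 1 + strLen / 2, i = 1; m < strLen; m++, i++) {
            seen[m] = true;
            seen[i] = true;
        }
        List<Integer> missing = new ArrayList<>();
        for (int k = 0; k < strLen; k++) {
            if (!seen[k]) {
                missing.add(k);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        for (int len = 1; len <= 6; len++) {
            System.out.println("length " + len + ": unchecked " + uncheckedIndices(len));
        }
        // Two spaces, 'X', one space: the 'X' sits at index strLen / 2.
        System.out.println(isBlankPatched("  X "));
    }
}
```

For even lengths the index strLen / 2 is never read, so a four-character string with its only non-whitespace character at index 2 is reported as blank. Odd lengths are fully covered.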