Posted to issues@commons.apache.org by "Rustem Galiev (Jira)" <ji...@apache.org> on 2020/09/01 08:32:00 UTC

[jira] [Created] (LANG-1606) StringUtils.countMatches returns incorrect value while handling intersecting substrings

Rustem Galiev created LANG-1606:
-----------------------------------

             Summary: StringUtils.countMatches returns incorrect value while handling intersecting substrings
                 Key: LANG-1606
                 URL: https://issues.apache.org/jira/browse/LANG-1606
             Project: Commons Lang
          Issue Type: Bug
          Components: lang.*
    Affects Versions: 3.11
            Reporter: Rustem Galiev


Steps to reproduce:

1. Call the method like this:
{code:java}
int count = StringUtils.countMatches("abaabaababaab", "aba");
{code}
Actual result: count equals 3
Expected result: count equals 4

The four occurrences are highlighted in red:
 {color:#ff0000}aba{color}abaababaab
 aba{color:#ff0000}aba{color}ababaab
 abaaba{color:#ff0000}aba{color}baab
 abaabaab{color:#ff0000}aba{color}ab

The method returns an incorrect value because of this code:
{code:java}
while ((idx = CharSequenceUtils.indexOf(str, sub, idx)) != INDEX_NOT_FOUND) {
    count++;
    idx += sub.length();
}
{code}
The loop is greedy: after each match it advances idx by the full length of the substring, so any occurrence that starts inside the previous match is skipped. For example:

Suppose idx = 6, so the search runs over the highlighted suffix:
 abaaba{color:#ff0000}ababaab{color}

The substring is found at index 6, so idx becomes 6 + 3 = 9 and the next search runs over this suffix:
 abaabaaba{color:#ff0000}baab{color}
 Because idx was advanced by 3, the search misses the occurrence at index 8 (abaabaab{color:#ff0000}aba{color}ab), which overlaps the match found in the previous step.
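
The walkthrough above can be reproduced with a small trace. This is only a sketch: it mirrors the quoted loop but uses java.lang.String.indexOf as a stand-in for CharSequenceUtils.indexOf, and the class and method names (NonOverlappingTrace, matchStarts) are hypothetical, not part of Commons Lang.

```java
import java.util.ArrayList;
import java.util.List;

class NonOverlappingTrace {

    // Records the start index of every match the quoted loop would count.
    // Assumption: String.indexOf stands in for CharSequenceUtils.indexOf.
    static List<Integer> matchStarts(String str, String sub) {
        List<Integer> starts = new ArrayList<>();
        if (str.isEmpty() || sub.isEmpty()) {
            return starts;
        }
        int idx = 0;
        while ((idx = str.indexOf(sub, idx)) != -1) {
            starts.add(idx);
            idx += sub.length(); // jumps past the whole match, as in the quoted code
        }
        return starts;
    }

    public static void main(String[] args) {
        // The occurrence starting at index 8 never shows up in the trace.
        System.out.println(matchStarts("abaabaababaab", "aba")); // prints [0, 3, 6]
    }
}
```

The trace stops at index 6 because the next search starts at index 9, one position past the start of the overlapping occurrence.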

In general, the method undercounts whenever occurrences of the substring overlap each other.
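
For comparison, here is a minimal sketch of a counter that does include overlapping occurrences, assuming that is the desired behavior. It is not a proposed patch: countOverlappingMatches and the OverlappingCount class are hypothetical names, and String.indexOf is again used in place of CharSequenceUtils.indexOf. The only change from the quoted loop is that idx advances by one character instead of sub.length().

```java
class OverlappingCount {

    // Counts every position at which sub occurs in str, including
    // occurrences that overlap an earlier match.
    static int countOverlappingMatches(String str, String sub) {
        if (str == null || sub == null || str.isEmpty() || sub.isEmpty()) {
            return 0;
        }
        int count = 0;
        int idx = 0;
        while ((idx = str.indexOf(sub, idx)) != -1) {
            count++;
            idx++; // step one character so the next search can overlap this match
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOverlappingMatches("abaabaababaab", "aba")); // prints 4
    }
}
```

Stepping by one keeps the loop simple at the cost of re-scanning characters inside each match; whether that trade-off is acceptable depends on which behavior the method is meant to have.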

There is also a unit test with an incorrect expected value:
{code:java}
assertEquals(4,
     StringUtils.countMatches("oooooooooooo", "ooo"));
{code}
If this behavior (counting only substrings that do not intersect) is intended, please update the JavaDoc to reflect it. Right now it reads:
{code:java}
Counts how many times the substring appears in the larger string.
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)