Posted to issues@commons.apache.org by "Sebb (Jira)" <ji...@apache.org> on 2020/09/12 22:25:00 UTC

[jira] [Resolved] (LANG-1606) StringUtils.countMatches returns incorrect value while handling intersecting substrings

     [ https://issues.apache.org/jira/browse/LANG-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebb resolved LANG-1606.
------------------------
    Fix Version/s: 3.12
       Resolution: Fixed

Clarified Javadoc; added more tests to agree with Javadoc

> StringUtils.countMatches returns incorrect value while handling intersecting substrings
> ---------------------------------------------------------------------------------------
>
>                 Key: LANG-1606
>                 URL: https://issues.apache.org/jira/browse/LANG-1606
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*
>    Affects Versions: 3.11
>            Reporter: Rustem Galiev
>            Priority: Major
>             Fix For: 3.12
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> 1. Call the method like this:
> {code:java}
> int count = StringUtils.countMatches("abaabaababaab", "aba");
> {code}
> Actual result: count equals 3.
> Expected result: count equals 4.
> The four occurrences are highlighted in red:
>  {color:#ff0000}aba{color}abaababaab
>  aba{color:#ff0000}aba{color}ababaab
>  abaaba{color:#ff0000}aba{color}baab
>  abaabaab{color:#ff0000}aba{color}ab
> The method returns an incorrect value because of this code:
> {code:java}
> while ((idx = CharSequenceUtils.indexOf(str, sub, idx)) != INDEX_NOT_FOUND) {
>     count++;
>     idx += sub.length();
> }
> {code}
> This is a greedy algorithm: advancing idx by the length of the substring after each match skips any occurrence that overlaps the one just found, as in this example.
> Suppose idx = 6, so the search runs over the highlighted suffix:
>  abaaba{color:#ff0000}ababaab{color}
> The substring is found at index 6, so idx becomes idx + 3 = 9, and the next search runs over this suffix:
>  abaabaaba{color:#ff0000}baab{color}
> Because idx was advanced by 3, the search misses the occurrence (abaabaab{color:#ff0000}aba{color}ab) that overlaps the one found in the previous step.
> In general, this method undercounts whenever occurrences of the substring overlap each other.
> There is also a unit test with an incorrect expected value:
> {code:java}
> assertEquals(4,
>      StringUtils.countMatches("oooooooooooo", "ooo"));
> {code}
> If this behavior (counting only substrings that do not intersect) is intended, please update the Javadoc to reflect it. It currently reads:
> {code:java}
> Counts how many times the substring appears in the larger string.
> {code}
> Link for the PR: https://github.com/apache/commons-lang/pull/615
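
For illustration, the two counting semantics discussed in the report can be compared side by side. The following is a minimal sketch, not the Commons Lang implementation: the class name CountMatchesSketch and both method names are hypothetical, and it uses String.indexOf in place of CharSequenceUtils.indexOf. The resolution note says the Javadoc was clarified, which suggests the non-overlapping count is the intended behavior; the overlapping variant is shown only to make the difference concrete.

```java
public final class CountMatchesSketch {

    // Non-overlapping count: after each match the search resumes past the
    // whole match, mirroring the loop quoted in the report.
    public static int countNonOverlapping(String str, String sub) {
        if (str == null || sub == null || str.isEmpty() || sub.isEmpty()) {
            return 0;
        }
        int count = 0;
        int idx = 0;
        while ((idx = str.indexOf(sub, idx)) != -1) {
            count++;
            idx += sub.length();
        }
        return count;
    }

    // Overlapping count (hypothetical alternative): after each match the
    // search resumes one character later, so intersecting matches are counted.
    public static int countOverlapping(String str, String sub) {
        if (str == null || sub == null || str.isEmpty() || sub.isEmpty()) {
            return 0;
        }
        int count = 0;
        int idx = 0;
        while ((idx = str.indexOf(sub, idx)) != -1) {
            count++;
            idx += 1;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countNonOverlapping("abaabaababaab", "aba")); // prints 3
        System.out.println(countOverlapping("abaabaababaab", "aba"));    // prints 4
        System.out.println(countNonOverlapping("oooooooooooo", "ooo"));  // prints 4
        System.out.println(countOverlapping("oooooooooooo", "ooo"));     // prints 10
    }
}
```

Under the non-overlapping reading, the expected value of 4 in the quoted unit test is consistent; only the overlapping reading would give a different result for that input.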



--
This message was sent by Atlassian Jira
(v8.3.4#803005)