You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Adrian Nistor (JIRA)" <ji...@apache.org> on 2012/06/20 22:45:42 UTC

[jira] [Created] (COLLECTIONS-412) CollectionUtils.subtract() is very slow

Adrian Nistor created COLLECTIONS-412:
-----------------------------------------

             Summary: CollectionUtils.subtract() is very slow
                 Key: COLLECTIONS-412
                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-412
             Project: Commons Collections
          Issue Type: Bug
         Environment: java 1.6.0_24
Ubuntu 11.10
            Reporter: Adrian Nistor
         Attachments: Test.java, patch.diff

Hi,

I am encountering a performance problem in CollectionUtils.subtract().
It appears in version 3.2.1 and also in revision 1352300 (20 June
2012).  I attached a test that exposes this problem and a patch that
fixes it.  On my machine, for this test, the patch provides a 204X
speedup.

To run the test, just do:

$ java Test

The output for the un-patched version is:
Time is 11036

The output for the patched version is:
Time is 54

The root cause of this problem is similar to the root cause of the
previously fixed COLLECTIONS-406 in ListUtils.subtract(), i.e.,
quadratic complexity instead of linear complexity.  This problem
affects two methods:

CollectionUtils.subtract(final Iterable<? extends O> a, final Iterable<? extends O> b)
and
CollectionUtils.subtract(final Iterable<? extends O> a, final Iterable<? extends O> b, final Predicate<O> p)

because the former just calls the latter.

Currently, the code for
"CollectionUtils.subtract(final Iterable<? extends O> a, final Iterable<? extends O> b, final Predicate<O> p)"
is:

ArrayList<O> list = new ArrayList<O>();
addAll(list, a);
for (O element : b) {
    if (p.evaluate(element)) {
        list.remove(element);
    }
}

which is quadratic, because "list.remove(element)" has linear
complexity for "ArrayList<O> list = new ArrayList<O>()".

The attached patch makes the remove() be constant complexity by
removing from an org.apache.commons.collections.bag.HashBag.  The
attached patch is very similar to the patch of COLLECTIONS-406, so I
assume the risk of applying this patch is minimal.  Just like in the
patch for COLLECTIONS-406, this patch uses a HashBag (and not a
HashSet) to respect cardinality when there are repeated objects.

Can you please confirm if the patch is correct?

Thanks,

Adrian



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (COLLECTIONS-412) CollectionUtils.subtract() is very slow

Posted by "Adrian Nistor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COLLECTIONS-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian Nistor updated COLLECTIONS-412:
--------------------------------------

    Attachment: Test.java
                patch.diff
    
> CollectionUtils.subtract() is very slow
> ---------------------------------------
>
>                 Key: COLLECTIONS-412
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-412
>             Project: Commons Collections
>          Issue Type: Bug
>         Environment: java 1.6.0_24
> Ubuntu 11.10
>            Reporter: Adrian Nistor
>         Attachments: Test.java, patch.diff
>
>
> Hi,
> I am encountering a performance problem in CollectionUtils.subtract().
> It appears in version 3.2.1 and also in revision 1352300 (20 June
> 2012).  I attached a test that exposes this problem and a patch that
> fixes it.  On my machine, for this test, the patch provides a 204X
> speedup.
> To run the test, just do:
> $ java Test
> The output for the un-patched version is:
> Time is 11036
> The output for the patched version is:
> Time is 54
> The root cause of this problem is similar to the root cause of the
> previously fixed COLLECTIONS-406 in ListUtils.subtract(), i.e.,
> quadratic complexity instead of linear complexity.  This problem
> affects two methods:
> CollectionUtils.subtract(final Iterable<? extends O> a, final Iterable<? extends O> b)
> and
> CollectionUtils.subtract(final Iterable<? extends O> a, final Iterable<? extends O> b, final Predicate<O> p)
> because the former just calls the latter.
> Currently, the code for
> "CollectionUtils.subtract(final Iterable<? extends O> a, final Iterable<? extends O> b, final Predicate<O> p)"
> is:
> ArrayList<O> list = new ArrayList<O>();
> addAll(list, a);
> for (O element : b) {
>     if (p.evaluate(element)) {
>         list.remove(element);
>     }
> }
> which is quadratic, because "list.remove(element)" has linear
> complexity for "ArrayList<O> list = new ArrayList<O>()".
> The attached patch makes the remove() be constant complexity by
> removing from an org.apache.commons.collections.bag.HashBag.  The
> attached patch is very similar to the patch of COLLECTIONS-406, so I
> assume the risk of applying this patch is minimal.  Just like in the
> patch for COLLECTIONS-406, this patch uses a HashBag (and not a
> HashSet) to respect cardinality when there are repeated objects.
> Can you please confirm if the patch is correct?
> Thanks,
> Adrian

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (COLLECTIONS-412) CollectionUtils.subtract() is very slow

Posted by "Thomas Neidhart (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COLLECTIONS-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Neidhart resolved COLLECTIONS-412.
-----------------------------------------

       Resolution: Fixed
    Fix Version/s: 4.0

Fixed in r1353111.

I did not apply the patch, but rather changed the method in a similar way as outlined in the patch. Instead of evaluating the predicate on all elements of A, only elements of B that satisfy the predicate are added to a bag. The rest is similar to the patch.

Also added more information in the javadoc how the resulting collection will look like depending on the predicate evaluation.

Thanks for reporting and the patch!
                
> CollectionUtils.subtract() is very slow
> ---------------------------------------
>
>                 Key: COLLECTIONS-412
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-412
>             Project: Commons Collections
>          Issue Type: Bug
>         Environment: java 1.6.0_24
> Ubuntu 11.10
>            Reporter: Adrian Nistor
>             Fix For: 4.0
>
>         Attachments: Test.java, patch.diff
>
>
> Hi,
> I am encountering a performance problem in CollectionUtils.subtract().
> It appears in version 3.2.1 and also in revision 1352300 (20 June
> 2012).  I attached a test that exposes this problem and a patch that
> fixes it.  On my machine, for this test, the patch provides a 204X
> speedup.
> To run the test, just do:
> $ java Test
> The output for the un-patched version is:
> Time is 11036
> The output for the patched version is:
> Time is 54
> The root cause of this problem is similar to the root cause of the
> previously fixed COLLECTIONS-406 in ListUtils.subtract(), i.e.,
> quadratic complexity instead of linear complexity.  This problem
> affects two methods:
> CollectionUtils.subtract(final Iterable<? extends O> a, final Iterable<? extends O> b)
> and
> CollectionUtils.subtract(final Iterable<? extends O> a, final Iterable<? extends O> b, final Predicate<O> p)
> because the former just calls the latter.
> Currently, the code for
> "CollectionUtils.subtract(final Iterable<? extends O> a, final Iterable<? extends O> b, final Predicate<O> p)"
> is:
> ArrayList<O> list = new ArrayList<O>();
> addAll(list, a);
> for (O element : b) {
>     if (p.evaluate(element)) {
>         list.remove(element);
>     }
> }
> which is quadratic, because "list.remove(element)" has linear
> complexity for "ArrayList<O> list = new ArrayList<O>()".
> The attached patch makes the remove() be constant complexity by
> removing from an org.apache.commons.collections.bag.HashBag.  The
> attached patch is very similar to the patch of COLLECTIONS-406, so I
> assume the risk of applying this patch is minimal.  Just like in the
> patch for COLLECTIONS-406, this patch uses a HashBag (and not a
> HashSet) to respect cardinality when there are repeated objects.
> Can you please confirm if the patch is correct?
> Thanks,
> Adrian

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (COLLECTIONS-412) CollectionUtils.subtract() is very slow

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COLLECTIONS-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399923#comment-13399923 ] 

Hudson commented on COLLECTIONS-412:
------------------------------------

Integrated in commons-collections #29 (See [https://builds.apache.org/job/commons-collections/29/])
    [COLLECTIONS-412] Improved performance of CollectionUtils#subtract. Thanks to Adrian Nistor for report and patch. (Revision 1353111)

     Result = FAILURE
tn : http://svn.apache.org/viewvc/?view=rev&rev=1353111
Files : 
* /commons/proper/collections/trunk/src/main/java/org/apache/commons/collections/CollectionUtils.java

                
> CollectionUtils.subtract() is very slow
> ---------------------------------------
>
>                 Key: COLLECTIONS-412
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-412
>             Project: Commons Collections
>          Issue Type: Bug
>         Environment: java 1.6.0_24
> Ubuntu 11.10
>            Reporter: Adrian Nistor
>             Fix For: 4.0
>
>         Attachments: Test.java, patch.diff
>
>
> Hi,
> I am encountering a performance problem in CollectionUtils.subtract().
> It appears in version 3.2.1 and also in revision 1352300 (20 June
> 2012).  I attached a test that exposes this problem and a patch that
> fixes it.  On my machine, for this test, the patch provides a 204X
> speedup.
> To run the test, just do:
> $ java Test
> The output for the un-patched version is:
> Time is 11036
> The output for the patched version is:
> Time is 54
> The root cause of this problem is similar to the root cause of the
> previously fixed COLLECTIONS-406 in ListUtils.subtract(), i.e.,
> quadratic complexity instead of linear complexity.  This problem
> affects two methods:
> CollectionUtils.subtract(final Iterable<? extends O> a, final Iterable<? extends O> b)
> and
> CollectionUtils.subtract(final Iterable<? extends O> a, final Iterable<? extends O> b, final Predicate<O> p)
> because the former just calls the latter.
> Currently, the code for
> "CollectionUtils.subtract(final Iterable<? extends O> a, final Iterable<? extends O> b, final Predicate<O> p)"
> is:
> ArrayList<O> list = new ArrayList<O>();
> addAll(list, a);
> for (O element : b) {
>     if (p.evaluate(element)) {
>         list.remove(element);
>     }
> }
> which is quadratic, because "list.remove(element)" has linear
> complexity for "ArrayList<O> list = new ArrayList<O>()".
> The attached patch makes the remove() be constant complexity by
> removing from an org.apache.commons.collections.bag.HashBag.  The
> attached patch is very similar to the patch of COLLECTIONS-406, so I
> assume the risk of applying this patch is minimal.  Just like in the
> patch for COLLECTIONS-406, this patch uses a HashBag (and not a
> HashSet) to respect cardinality when there are repeated objects.
> Can you please confirm if the patch is correct?
> Thanks,
> Adrian

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira