You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Thomas Neidhart (JIRA)" <ji...@apache.org> on 2012/06/19 21:24:42 UTC

[jira] [Resolved] (COLLECTIONS-408) performance problem in SetUniqueList.removeAll()

     [ https://issues.apache.org/jira/browse/COLLECTIONS-408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Neidhart resolved COLLECTIONS-408.
-----------------------------------------

       Resolution: Fixed
    Fix Version/s: 4.0

Fixed in r1351804.

I have not applied the patch, but rather used a similar approach as for ListOrderedSet: instead of using the removeAll method of the underlying collection, iterate over the elements of the collection to be removed and call the remove method on each element. The remove method does now the same check as in ListOrderedSet, just the other way round: first remove in the set, and then in the list if something was removed.
                
> performance problem in SetUniqueList.removeAll()
> ------------------------------------------------
>
>                 Key: COLLECTIONS-408
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-408
>             Project: Commons Collections
>          Issue Type: Bug
>         Environment: java 1.6.0_24
> Ubuntu 11.10
>            Reporter: Adrian Nistor
>             Fix For: 4.0
>
>         Attachments: Test.java, patch.diff
>
>
> Hi,
> I am encountering a performance problem in SetUniqueList.removeAll().
> It appears in version 3.2.1 and also in revision 1344775 (31 May
> 2012).  I have attached a test that exposes this problem and a
> one-line patch that fixes it.  The patch makes the code two times
> faster for this test.
> To run the test, just do:
> $ java Test
> The output for the un-patched version is:
> Time is: 5027
> The output for the patched version is:
> Time is: 2554
> The one-line patch I attached changes the 
> SetUniqueList.removeAll(Collection<?> coll) code from:
> boolean result = super.removeAll(coll);
> set.removeAll(coll);
> return result;
> to:
> boolean result = super.removeAll(coll);
> if (result) set.removeAll(coll);
> return result;
> If "super.removeAll(coll)" did not change the collection, there is no
> need to call "set.removeAll(coll)", because we already know there is
> nothing to remove.
> As one may expect "set.removeAll(coll)" (on a set) to be faster than
> "super.removeAll(coll)" (on a list), one may have expected the speedup
> gained by avoiding "set.removeAll(coll)" to be smaller than 2X
> achieved for the attached test.  However, the speedup is 2X because
> "java.util.HashSet.removeAll(Collection<?> collection)" has quadratic
> (not linear) complexity if "this.size() <= collection.size()" and the
> "collection" is a list.  Thus, "set.removeAll(coll)" is about as slow
> as "super.removeAll(coll)" in this case, and not executing
> "set.removeAll(coll)" reduces the work done by half.  The quadratic
> behavior of "java.util.HashSet.removeAll(Collection<?> collection)"
> comes from "java.util.AbstractSet.removeAll(Collection<?> c)" and is
> discussed for example here:
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-July/007148.html
> (The link is for OpenJDK, but Oracle JDK has the same problem.)
> In many other cases "set.removeAll(coll)" is actually faster than
> "super.removeAll(coll)", so one can get even more speedup by
> reordering those two checks:
> boolean result = set.removeAll(coll);
> if (result) super.removeAll(coll);
> return result;
> Is this a bug, or am I misunderstanding the intended behavior?  If so,
> can you please confirm that the patch is correct?
> Thanks,
> Adrian

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira