You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Adrian Nistor (JIRA)" <ji...@apache.org> on 2012/06/02 08:57:23 UTC

[jira] [Created] (COLLECTIONS-408) performance problem in SetUniqueList.removeAll()

Adrian Nistor created COLLECTIONS-408:
-----------------------------------------

             Summary: performance problem in SetUniqueList.removeAll()
                 Key: COLLECTIONS-408
                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-408
             Project: Commons Collections
          Issue Type: Bug
         Environment: java 1.6.0_24
Ubuntu 11.10
            Reporter: Adrian Nistor
         Attachments: Test.java, patch.diff

Hi,

I am encountering a performance problem in SetUniqueList.removeAll().
It appears in version 3.2.1 and also in revision 1344775 (31 May
2012).  I have attached a test that exposes this problem and a
one-line patch that fixes it.  The patch makes the code two times
faster for this test.

To run the test, just do:

$ java Test

The output for the un-patched version is:
Time is: 5027

The output for the patched version is:
Time is: 2554

The one-line patch I attached changes the 
SetUniqueList.removeAll(Collection<?> coll) code from:

boolean result = super.removeAll(coll);
set.removeAll(coll);
return result;

to:

boolean result = super.removeAll(coll);
if (result) set.removeAll(coll);
return result;

If "super.removeAll(coll)" did not change the collection, there is no
need to call "set.removeAll(coll)", because we already know there is
nothing to remove.

As one may expect "set.removeAll(coll)" (on a set) to be faster than
"super.removeAll(coll)" (on a list), one may have expected the speedup
gained by avoiding "set.removeAll(coll)" to be smaller than 2X
achieved for the attached test.  However, the speedup is 2X because
"java.util.HashSet.removeAll(Collection<?> collection)" has quadratic
(not linear) complexity if "this.size() <= collection.size()" and the
"collection" is a list.  Thus, "set.removeAll(coll)" is about as slow
as "super.removeAll(coll)" in this case, and not executing
"set.removeAll(coll)" reduces the work done by half.  The quadratic
behavior of "java.util.HashSet.removeAll(Collection<?> collection)"
comes from "java.util.AbstractSet.removeAll(Collection<?> c)" and is
discussed for example here:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-July/007148.html
(The link is for OpenJDK, but Oracle JDK has the same problem.)

In many other cases "set.removeAll(coll)" is actually faster than
"super.removeAll(coll)", so one can get even more speedup by
reordering those two checks:

boolean result = set.removeAll(coll);
if (result) super.removeAll(coll);
return result;

Is this a bug, or am I misunderstanding the intended behavior?  If so,
can you please confirm that the patch is correct?

Thanks,

Adrian



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (COLLECTIONS-408) performance problem in SetUniqueList.removeAll()

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COLLECTIONS-408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397003#comment-13397003 ] 

Hudson commented on COLLECTIONS-408:
------------------------------------

Integrated in commons-collections #22 (See [https://builds.apache.org/job/commons-collections/22/])
    [COLLECTIONS-408] improve performance of remove and removeAll methods, add missing javadoc. Thanks to Adrian Nistor for reporting. (Revision 1351804)

     Result = SUCCESS
tn : http://svn.apache.org/viewvc/?view=rev&rev=1351804
Files : 
* /commons/proper/collections/trunk/src/main/java/org/apache/commons/collections/list/SetUniqueList.java

                
> performance problem in SetUniqueList.removeAll()
> ------------------------------------------------
>
>                 Key: COLLECTIONS-408
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-408
>             Project: Commons Collections
>          Issue Type: Bug
>         Environment: java 1.6.0_24
> Ubuntu 11.10
>            Reporter: Adrian Nistor
>             Fix For: 4.0
>
>         Attachments: Test.java, patch.diff
>
>
> Hi,
> I am encountering a performance problem in SetUniqueList.removeAll().
> It appears in version 3.2.1 and also in revision 1344775 (31 May
> 2012).  I have attached a test that exposes this problem and a
> one-line patch that fixes it.  The patch makes the code two times
> faster for this test.
> To run the test, just do:
> $ java Test
> The output for the un-patched version is:
> Time is: 5027
> The output for the patched version is:
> Time is: 2554
> The one-line patch I attached changes the 
> SetUniqueList.removeAll(Collection<?> coll) code from:
> boolean result = super.removeAll(coll);
> set.removeAll(coll);
> return result;
> to:
> boolean result = super.removeAll(coll);
> if (result) set.removeAll(coll);
> return result;
> If "super.removeAll(coll)" did not change the collection, there is no
> need to call "set.removeAll(coll)", because we already know there is
> nothing to remove.
> As one may expect "set.removeAll(coll)" (on a set) to be faster than
> "super.removeAll(coll)" (on a list), one may have expected the speedup
> gained by avoiding "set.removeAll(coll)" to be smaller than 2X
> achieved for the attached test.  However, the speedup is 2X because
> "java.util.HashSet.removeAll(Collection<?> collection)" has quadratic
> (not linear) complexity if "this.size() <= collection.size()" and the
> "collection" is a list.  Thus, "set.removeAll(coll)" is about as slow
> as "super.removeAll(coll)" in this case, and not executing
> "set.removeAll(coll)" reduces the work done by half.  The quadratic
> behavior of "java.util.HashSet.removeAll(Collection<?> collection)"
> comes from "java.util.AbstractSet.removeAll(Collection<?> c)" and is
> discussed for example here:
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-July/007148.html
> (The link is for OpenJDK, but Oracle JDK has the same problem.)
> In many other cases "set.removeAll(coll)" is actually faster than
> "super.removeAll(coll)", so one can get even more speedup by
> reordering those two checks:
> boolean result = set.removeAll(coll);
> if (result) super.removeAll(coll);
> return result;
> Is this a bug, or am I misunderstanding the intended behavior?  If so,
> can you please confirm that the patch is correct?
> Thanks,
> Adrian

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (COLLECTIONS-408) performance problem in SetUniqueList.removeAll()

Posted by "Adrian Nistor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COLLECTIONS-408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian Nistor updated COLLECTIONS-408:
--------------------------------------

    Attachment: Test.java
                patch.diff
    
> performance problem in SetUniqueList.removeAll()
> ------------------------------------------------
>
>                 Key: COLLECTIONS-408
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-408
>             Project: Commons Collections
>          Issue Type: Bug
>         Environment: java 1.6.0_24
> Ubuntu 11.10
>            Reporter: Adrian Nistor
>         Attachments: Test.java, patch.diff
>
>
> Hi,
> I am encountering a performance problem in SetUniqueList.removeAll().
> It appears in version 3.2.1 and also in revision 1344775 (31 May
> 2012).  I have attached a test that exposes this problem and a
> one-line patch that fixes it.  The patch makes the code two times
> faster for this test.
> To run the test, just do:
> $ java Test
> The output for the un-patched version is:
> Time is: 5027
> The output for the patched version is:
> Time is: 2554
> The one-line patch I attached changes the 
> SetUniqueList.removeAll(Collection<?> coll) code from:
> boolean result = super.removeAll(coll);
> set.removeAll(coll);
> return result;
> to:
> boolean result = super.removeAll(coll);
> if (result) set.removeAll(coll);
> return result;
> If "super.removeAll(coll)" did not change the collection, there is no
> need to call "set.removeAll(coll)", because we already know there is
> nothing to remove.
> As one may expect "set.removeAll(coll)" (on a set) to be faster than
> "super.removeAll(coll)" (on a list), one may have expected the speedup
> gained by avoiding "set.removeAll(coll)" to be smaller than 2X
> achieved for the attached test.  However, the speedup is 2X because
> "java.util.HashSet.removeAll(Collection<?> collection)" has quadratic
> (not linear) complexity if "this.size() <= collection.size()" and the
> "collection" is a list.  Thus, "set.removeAll(coll)" is about as slow
> as "super.removeAll(coll)" in this case, and not executing
> "set.removeAll(coll)" reduces the work done by half.  The quadratic
> behavior of "java.util.HashSet.removeAll(Collection<?> collection)"
> comes from "java.util.AbstractSet.removeAll(Collection<?> c)" and is
> discussed for example here:
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-July/007148.html
> (The link is for OpenJDK, but Oracle JDK has the same problem.)
> In many other cases "set.removeAll(coll)" is actually faster than
> "super.removeAll(coll)", so one can get even more speedup by
> reordering those two checks:
> boolean result = set.removeAll(coll);
> if (result) super.removeAll(coll);
> return result;
> Is this a bug, or am I misunderstanding the intended behavior?  If so,
> can you please confirm that the patch is correct?
> Thanks,
> Adrian

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (COLLECTIONS-408) performance problem in SetUniqueList.removeAll()

Posted by "Thomas Neidhart (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COLLECTIONS-408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Neidhart resolved COLLECTIONS-408.
-----------------------------------------

       Resolution: Fixed
    Fix Version/s: 4.0

Fixed in r1351804.

I have not applied the patch, but rather used a similar approach as for ListOrderedSet: instead of using the removeAll method of the underlying collection, iterate over the elements of the collection to be removed and call the remove method on each element. The remove method does now the same check as in ListOrderedSet, just the other way round: first remove in the set, and then in the list if something was removed.
                
> performance problem in SetUniqueList.removeAll()
> ------------------------------------------------
>
>                 Key: COLLECTIONS-408
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-408
>             Project: Commons Collections
>          Issue Type: Bug
>         Environment: java 1.6.0_24
> Ubuntu 11.10
>            Reporter: Adrian Nistor
>             Fix For: 4.0
>
>         Attachments: Test.java, patch.diff
>
>
> Hi,
> I am encountering a performance problem in SetUniqueList.removeAll().
> It appears in version 3.2.1 and also in revision 1344775 (31 May
> 2012).  I have attached a test that exposes this problem and a
> one-line patch that fixes it.  The patch makes the code two times
> faster for this test.
> To run the test, just do:
> $ java Test
> The output for the un-patched version is:
> Time is: 5027
> The output for the patched version is:
> Time is: 2554
> The one-line patch I attached changes the 
> SetUniqueList.removeAll(Collection<?> coll) code from:
> boolean result = super.removeAll(coll);
> set.removeAll(coll);
> return result;
> to:
> boolean result = super.removeAll(coll);
> if (result) set.removeAll(coll);
> return result;
> If "super.removeAll(coll)" did not change the collection, there is no
> need to call "set.removeAll(coll)", because we already know there is
> nothing to remove.
> As one may expect "set.removeAll(coll)" (on a set) to be faster than
> "super.removeAll(coll)" (on a list), one may have expected the speedup
> gained by avoiding "set.removeAll(coll)" to be smaller than 2X
> achieved for the attached test.  However, the speedup is 2X because
> "java.util.HashSet.removeAll(Collection<?> collection)" has quadratic
> (not linear) complexity if "this.size() <= collection.size()" and the
> "collection" is a list.  Thus, "set.removeAll(coll)" is about as slow
> as "super.removeAll(coll)" in this case, and not executing
> "set.removeAll(coll)" reduces the work done by half.  The quadratic
> behavior of "java.util.HashSet.removeAll(Collection<?> collection)"
> comes from "java.util.AbstractSet.removeAll(Collection<?> c)" and is
> discussed for example here:
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-July/007148.html
> (The link is for OpenJDK, but Oracle JDK has the same problem.)
> In many other cases "set.removeAll(coll)" is actually faster than
> "super.removeAll(coll)", so one can get even more speedup by
> reordering those two checks:
> boolean result = set.removeAll(coll);
> if (result) super.removeAll(coll);
> return result;
> Is this a bug, or am I misunderstanding the intended behavior?  If so,
> can you please confirm that the patch is correct?
> Thanks,
> Adrian

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira