You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Adrian Nistor (JIRA)" <ji...@apache.org> on 2012/06/29 23:11:42 UTC

[jira] [Created] (COLLECTIONS-416) ListUtils.removeAll() is very slow

Adrian Nistor created COLLECTIONS-416:
-----------------------------------------

             Summary: ListUtils.removeAll() is very slow
                 Key: COLLECTIONS-416
                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-416
             Project: Commons Collections
          Issue Type: Bug
    Affects Versions: 3.2.1
         Environment: java 1.6.0_24
Ubuntu 11.10
            Reporter: Adrian Nistor
         Attachments: Test.java, patch.diff

Hi,

I am encountering a performance problem in ListUtils.removeAll().  It
appears in version 3.2.1 and also in revision 1355448.  I attached a
test that exposes this problem and a one-line patch that fixes it.  On
my machine, for this test, the patch provides a 217X speedup.

To run the test, just do:

$ java Test

The output for the un-patched version is:
Time is 5430

The output for the patched version is:
Time is 25

As the patch shows, the problem is that
"ListUtils.removeAll(Collection<E> collection, Collection<?> remove)"
performs "remove.contains(obj)" for each element in "collection",
which can be very expensive if "remove.contains(obj)" is expensive,
e.g., when "remove" is a list.

The one-line patch I attached puts the elements of "remove" in a
HashSet (which has very fast "contains()"), if "remove" is not already
a set:

"if (!(remove instanceof java.util.Set<?>)) remove = new HashSet<Object>(remove);"

Is this a bug, or am I misunderstanding the intended behavior? If so,
can you please confirm that the patch is correct?

Thanks,

Adrian


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (COLLECTIONS-416) ListUtils.removeAll() is very slow

Posted by "Adrian Nistor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COLLECTIONS-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406156#comment-13406156 ] 

Adrian Nistor commented on COLLECTIONS-416:
-------------------------------------------

Hi Thomas,

Yes, you are absolutely right, my patch:

"if (!(remove instanceof java.util.Set<?>)) remove = new HashSet<Object>(remove);"

assumes that anything else except Set has slow contains(), which is
false.

Maybe the best tradeoff is to document this problem, just like you
said, and handle the most common case of slow contains(), i.e., when
the collection is a list:

"if (remove instanceof java.util.List) remove = new HashSet<Object>(remove);"

The scalable solution would be to have a helper method, presumably in
CollectionUtils:

public static <E> Collection<E> createFastContainsCollection(Collection<E> c) {
   return (c instanceof List<E>) ? new HashSet<E>(c) : c;
}

and use it when the complexity of contains() affects the complexity of
the algorithm.  This is similar with choosing your algorithm based on
if a collection implements java.util.RandomAccess or not.

Best,

Adrian


                
> ListUtils.removeAll() is very slow
> ----------------------------------
>
>                 Key: COLLECTIONS-416
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-416
>             Project: Commons Collections
>          Issue Type: Bug
>    Affects Versions: 3.2.1
>         Environment: java 1.6.0_24
> Ubuntu 11.10
>            Reporter: Adrian Nistor
>         Attachments: Test.java, patch.diff
>
>
> Hi,
> I am encountering a performance problem in ListUtils.removeAll().  It
> appears in version 3.2.1 and also in revision 1355448.  I attached a
> test that exposes this problem and a one-line patch that fixes it.  On
> my machine, for this test, the patch provides a 217X speedup.
> To run the test, just do:
> $ java Test
> The output for the un-patched version is:
> Time is 5430
> The output for the patched version is:
> Time is 25
> As the patch shows, the problem is that
> "ListUtils.removeAll(Collection<E> collection, Collection<?> remove)"
> performs "remove.contains(obj)" for each element in "collection",
> which can be very expensive if "remove.contains(obj)" is expensive,
> e.g., when "remove" is a list.
> The one-line patch I attached puts the elements of "remove" in a
> HashSet (which has very fast "contains()"), if "remove" is not already
> a set:
> "if (!(remove instanceof java.util.Set<?>)) remove = new HashSet<Object>(remove);"
> Is this a bug, or am I misunderstanding the intended behavior? If so,
> can you please confirm that the patch is correct?
> Thanks,
> Adrian

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (COLLECTIONS-416) ListUtils.removeAll() is very slow

Posted by "Thomas Neidhart (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COLLECTIONS-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414674#comment-13414674 ] 

Thomas Neidhart commented on COLLECTIONS-416:
---------------------------------------------

Hi Adrian,

maybe this case is not as clear as for example COLLECTIONS-420. I think that users should use the proper data structures for their use-cases and we should be careful to not just mimick HashSet behavior regardless of what comes along. If somebody provides a List to removeAll, he/she needs to be aware that this will slow things down, and that a different data structure would be more appropriate (IF it contains lots of data of course, when there are just 1 or 2 elements it makes not much of a difference).

Other people may have different opinions on this?

Thomas
                
> ListUtils.removeAll() is very slow
> ----------------------------------
>
>                 Key: COLLECTIONS-416
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-416
>             Project: Commons Collections
>          Issue Type: Bug
>    Affects Versions: 3.2.1
>         Environment: java 1.6.0_24
> Ubuntu 11.10
>            Reporter: Adrian Nistor
>         Attachments: Test.java, patch.diff
>
>
> Hi,
> I am encountering a performance problem in ListUtils.removeAll().  It
> appears in version 3.2.1 and also in revision 1355448.  I attached a
> test that exposes this problem and a one-line patch that fixes it.  On
> my machine, for this test, the patch provides a 217X speedup.
> To run the test, just do:
> $ java Test
> The output for the un-patched version is:
> Time is 5430
> The output for the patched version is:
> Time is 25
> As the patch shows, the problem is that
> "ListUtils.removeAll(Collection<E> collection, Collection<?> remove)"
> performs "remove.contains(obj)" for each element in "collection",
> which can be very expensive if "remove.contains(obj)" is expensive,
> e.g., when "remove" is a list.
> The one-line patch I attached puts the elements of "remove" in a
> HashSet (which has very fast "contains()"), if "remove" is not already
> a set:
> "if (!(remove instanceof java.util.Set<?>)) remove = new HashSet<Object>(remove);"
> Is this a bug, or am I misunderstanding the intended behavior? If so,
> can you please confirm that the patch is correct?
> Thanks,
> Adrian

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (COLLECTIONS-416) ListUtils.removeAll() is very slow

Posted by "Adrian Nistor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COLLECTIONS-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415173#comment-13415173 ] 

Adrian Nistor commented on COLLECTIONS-416:
-------------------------------------------

Hi Thomas,

Sure, I agree, documenting this behavior is equally good.

Best,

Adrian


                
> ListUtils.removeAll() is very slow
> ----------------------------------
>
>                 Key: COLLECTIONS-416
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-416
>             Project: Commons Collections
>          Issue Type: Bug
>    Affects Versions: 3.2.1
>         Environment: java 1.6.0_24
> Ubuntu 11.10
>            Reporter: Adrian Nistor
>         Attachments: Test.java, patch.diff
>
>
> Hi,
> I am encountering a performance problem in ListUtils.removeAll().  It
> appears in version 3.2.1 and also in revision 1355448.  I attached a
> test that exposes this problem and a one-line patch that fixes it.  On
> my machine, for this test, the patch provides a 217X speedup.
> To run the test, just do:
> $ java Test
> The output for the un-patched version is:
> Time is 5430
> The output for the patched version is:
> Time is 25
> As the patch shows, the problem is that
> "ListUtils.removeAll(Collection<E> collection, Collection<?> remove)"
> performs "remove.contains(obj)" for each element in "collection",
> which can be very expensive if "remove.contains(obj)" is expensive,
> e.g., when "remove" is a list.
> The one-line patch I attached puts the elements of "remove" in a
> HashSet (which has very fast "contains()"), if "remove" is not already
> a set:
> "if (!(remove instanceof java.util.Set<?>)) remove = new HashSet<Object>(remove);"
> Is this a bug, or am I misunderstanding the intended behavior? If so,
> can you please confirm that the patch is correct?
> Thanks,
> Adrian

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (COLLECTIONS-416) ListUtils.removeAll() is very slow

Posted by "Thomas Neidhart (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COLLECTIONS-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422638#comment-13422638 ] 

Thomas Neidhart commented on COLLECTIONS-416:
---------------------------------------------

Added clarifying javadoc to the method in r1365784.
                
> ListUtils.removeAll() is very slow
> ----------------------------------
>
>                 Key: COLLECTIONS-416
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-416
>             Project: Commons Collections
>          Issue Type: Bug
>    Affects Versions: 3.2.1
>         Environment: java 1.6.0_24
> Ubuntu 11.10
>            Reporter: Adrian Nistor
>         Attachments: Test.java, patch.diff
>
>
> Hi,
> I am encountering a performance problem in ListUtils.removeAll().  It
> appears in version 3.2.1 and also in revision 1355448.  I attached a
> test that exposes this problem and a one-line patch that fixes it.  On
> my machine, for this test, the patch provides a 217X speedup.
> To run the test, just do:
> $ java Test
> The output for the un-patched version is:
> Time is 5430
> The output for the patched version is:
> Time is 25
> As the patch shows, the problem is that
> "ListUtils.removeAll(Collection<E> collection, Collection<?> remove)"
> performs "remove.contains(obj)" for each element in "collection",
> which can be very expensive if "remove.contains(obj)" is expensive,
> e.g., when "remove" is a list.
> The one-line patch I attached puts the elements of "remove" in a
> HashSet (which has very fast "contains()"), if "remove" is not already
> a set:
> "if (!(remove instanceof java.util.Set<?>)) remove = new HashSet<Object>(remove);"
> Is this a bug, or am I misunderstanding the intended behavior? If so,
> can you please confirm that the patch is correct?
> Thanks,
> Adrian

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (COLLECTIONS-416) ListUtils.removeAll() is very slow

Posted by "Thomas Neidhart (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COLLECTIONS-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406087#comment-13406087 ] 

Thomas Neidhart commented on COLLECTIONS-416:
---------------------------------------------

Hi Adrian,

in this case (similar to the other issues like COLLECTIONS-415, 417, 418) I am not sure if the proposed patch is the right way to go. I would actually prefer to document the behavior of the method and to make it clear that removeAll will call contains() on the collection to be removed, so that users have to use a collection that supports this operation fast, e.g. by wrapping it themselves in the same way as outlined in the patch.
                
> ListUtils.removeAll() is very slow
> ----------------------------------
>
>                 Key: COLLECTIONS-416
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-416
>             Project: Commons Collections
>          Issue Type: Bug
>    Affects Versions: 3.2.1
>         Environment: java 1.6.0_24
> Ubuntu 11.10
>            Reporter: Adrian Nistor
>         Attachments: Test.java, patch.diff
>
>
> Hi,
> I am encountering a performance problem in ListUtils.removeAll().  It
> appears in version 3.2.1 and also in revision 1355448.  I attached a
> test that exposes this problem and a one-line patch that fixes it.  On
> my machine, for this test, the patch provides a 217X speedup.
> To run the test, just do:
> $ java Test
> The output for the un-patched version is:
> Time is 5430
> The output for the patched version is:
> Time is 25
> As the patch shows, the problem is that
> "ListUtils.removeAll(Collection<E> collection, Collection<?> remove)"
> performs "remove.contains(obj)" for each element in "collection",
> which can be very expensive if "remove.contains(obj)" is expensive,
> e.g., when "remove" is a list.
> The one-line patch I attached puts the elements of "remove" in a
> HashSet (which has very fast "contains()"), if "remove" is not already
> a set:
> "if (!(remove instanceof java.util.Set<?>)) remove = new HashSet<Object>(remove);"
> Is this a bug, or am I misunderstanding the intended behavior? If so,
> can you please confirm that the patch is correct?
> Thanks,
> Adrian

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (COLLECTIONS-416) ListUtils.removeAll() is very slow

Posted by "Adrian Nistor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COLLECTIONS-416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian Nistor updated COLLECTIONS-416:
--------------------------------------

    Attachment: Test.java
                patch.diff
    
> ListUtils.removeAll() is very slow
> ----------------------------------
>
>                 Key: COLLECTIONS-416
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-416
>             Project: Commons Collections
>          Issue Type: Bug
>    Affects Versions: 3.2.1
>         Environment: java 1.6.0_24
> Ubuntu 11.10
>            Reporter: Adrian Nistor
>         Attachments: Test.java, patch.diff
>
>
> Hi,
> I am encountering a performance problem in ListUtils.removeAll().  It
> appears in version 3.2.1 and also in revision 1355448.  I attached a
> test that exposes this problem and a one-line patch that fixes it.  On
> my machine, for this test, the patch provides a 217X speedup.
> To run the test, just do:
> $ java Test
> The output for the un-patched version is:
> Time is 5430
> The output for the patched version is:
> Time is 25
> As the patch shows, the problem is that
> "ListUtils.removeAll(Collection<E> collection, Collection<?> remove)"
> performs "remove.contains(obj)" for each element in "collection",
> which can be very expensive if "remove.contains(obj)" is expensive,
> e.g., when "remove" is a list.
> The one-line patch I attached puts the elements of "remove" in a
> HashSet (which has very fast "contains()"), if "remove" is not already
> a set:
> "if (!(remove instanceof java.util.Set<?>)) remove = new HashSet<Object>(remove);"
> Is this a bug, or am I misunderstanding the intended behavior? If so,
> can you please confirm that the patch is correct?
> Thanks,
> Adrian

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (COLLECTIONS-416) ListUtils.removeAll() is very slow

Posted by "Thomas Neidhart (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COLLECTIONS-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422644#comment-13422644 ] 

Thomas Neidhart commented on COLLECTIONS-416:
---------------------------------------------

Resolved as "Won't Fix". The user is responsible for using proper data structures as argument to this method. This is also inline with the jdk whenever there are are methods that take a Collection as input.
                
> ListUtils.removeAll() is very slow
> ----------------------------------
>
>                 Key: COLLECTIONS-416
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-416
>             Project: Commons Collections
>          Issue Type: Bug
>    Affects Versions: 3.2.1
>         Environment: java 1.6.0_24
> Ubuntu 11.10
>            Reporter: Adrian Nistor
>         Attachments: Test.java, patch.diff
>
>
> Hi,
> I am encountering a performance problem in ListUtils.removeAll().  It
> appears in version 3.2.1 and also in revision 1355448.  I attached a
> test that exposes this problem and a one-line patch that fixes it.  On
> my machine, for this test, the patch provides a 217X speedup.
> To run the test, just do:
> $ java Test
> The output for the un-patched version is:
> Time is 5430
> The output for the patched version is:
> Time is 25
> As the patch shows, the problem is that
> "ListUtils.removeAll(Collection<E> collection, Collection<?> remove)"
> performs "remove.contains(obj)" for each element in "collection",
> which can be very expensive if "remove.contains(obj)" is expensive,
> e.g., when "remove" is a list.
> The one-line patch I attached puts the elements of "remove" in a
> HashSet (which has very fast "contains()"), if "remove" is not already
> a set:
> "if (!(remove instanceof java.util.Set<?>)) remove = new HashSet<Object>(remove);"
> Is this a bug, or am I misunderstanding the intended behavior? If so,
> can you please confirm that the patch is correct?
> Thanks,
> Adrian

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (COLLECTIONS-416) ListUtils.removeAll() is very slow

Posted by "Thomas Neidhart (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COLLECTIONS-416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Neidhart resolved COLLECTIONS-416.
-----------------------------------------

    Resolution: Won't Fix
    
> ListUtils.removeAll() is very slow
> ----------------------------------
>
>                 Key: COLLECTIONS-416
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-416
>             Project: Commons Collections
>          Issue Type: Bug
>    Affects Versions: 3.2.1
>         Environment: java 1.6.0_24
> Ubuntu 11.10
>            Reporter: Adrian Nistor
>         Attachments: Test.java, patch.diff
>
>
> Hi,
> I am encountering a performance problem in ListUtils.removeAll().  It
> appears in version 3.2.1 and also in revision 1355448.  I attached a
> test that exposes this problem and a one-line patch that fixes it.  On
> my machine, for this test, the patch provides a 217X speedup.
> To run the test, just do:
> $ java Test
> The output for the un-patched version is:
> Time is 5430
> The output for the patched version is:
> Time is 25
> As the patch shows, the problem is that
> "ListUtils.removeAll(Collection<E> collection, Collection<?> remove)"
> performs "remove.contains(obj)" for each element in "collection",
> which can be very expensive if "remove.contains(obj)" is expensive,
> e.g., when "remove" is a list.
> The one-line patch I attached puts the elements of "remove" in a
> HashSet (which has very fast "contains()"), if "remove" is not already
> a set:
> "if (!(remove instanceof java.util.Set<?>)) remove = new HashSet<Object>(remove);"
> Is this a bug, or am I misunderstanding the intended behavior? If so,
> can you please confirm that the patch is correct?
> Thanks,
> Adrian

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira