You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Adrian Nistor (JIRA)" <ji...@apache.org> on 2012/05/27 21:54:23 UTC

[jira] [Updated] (COLLECTIONS-406) ListUtils.subtract is very slow

     [ https://issues.apache.org/jira/browse/COLLECTIONS-406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian Nistor updated COLLECTIONS-406:
--------------------------------------

    Attachment: Test.java
                patch.diff
    
> ListUtils.subtract is very slow 
> --------------------------------
>
>                 Key: COLLECTIONS-406
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-406
>             Project: Commons Collections
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.2.1
>         Environment: java 1.6.0_24
> Ubuntu 11.10
>            Reporter: Adrian Nistor
>         Attachments: Test.java, patch.diff
>
>
> Hi,
> ListUtils.subtract is very slow when subtracting two large lists.  The
> root cause of this problem is similar to the root cause of the
> previously fixed COLLECTIONS-328 in ListUtils, i.e., quadratic
> complexity instead of linear complexity.
> I am encountering this problem in version 3.2.1 and also in revision
> 1342815 (May 25th 2012).  I have attached a test that exposes this
> problem and a simple patch.  On my machine, for the attached test,
> this patch provides a 95X speedup.
> To run the test, just do:
> $ java Test
> Currently, the code for ListUtils.subtract is:
> final ArrayList<E> result = new ArrayList<E>(list1);
> for (E e : list2) {
>     result.remove(e);
> }
> return result;
> which is quadratic, because result.remove(e) has linear complexity.
> The attached patch makes the remove() be constant complexity by
> removing from an org.apache.commons.collections.bag.HashBag.  I use
> HashBag and not HashSet because ListUtils.subtract needs to respect
> the cardinality when there are repeated objects in the list.
> As mentioned in COLLECTIONS-328 discussion, for small lists, there is
> some overhead for creating the HashBag.  This can be fixed with a
> threshold, but I did not implement it in my patch because the
> COLLECTIONS-328 patch does not implement it.
> Unlike the patch for COLLECTIONS-328, my patch does not choose the
> list to iterate over based on size, because of the cardinality
> requirement in subtract.  This means the code could be made even
> faster if we could use something like a LinkedHashBag but neither
> Apache Collections nor standard Java libraries have such a class.
> Even so, this patch is still a lot faster than the original version.
> Is this truly a bug, or am I missing something here?  If so, can you
> please confirm if the patch is correct?
> Thanks,
> Adrian

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira