You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Adrian Nistor (JIRA)" <ji...@apache.org> on 2012/05/27 21:54:23 UTC
[jira] [Updated] (COLLECTIONS-406) ListUtils.subtract is very slow
[ https://issues.apache.org/jira/browse/COLLECTIONS-406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrian Nistor updated COLLECTIONS-406:
--------------------------------------
Attachment: Test.java
patch.diff
> ListUtils.subtract is very slow
> --------------------------------
>
> Key: COLLECTIONS-406
> URL: https://issues.apache.org/jira/browse/COLLECTIONS-406
> Project: Commons Collections
> Issue Type: Bug
> Components: Core
> Affects Versions: 3.2.1
> Environment: java 1.6.0_24
> Ubuntu 11.10
> Reporter: Adrian Nistor
> Attachments: Test.java, patch.diff
>
>
> Hi,
> ListUtils.subtract is very slow when subtracting two large lists. The
> root cause of this problem is similar to the root cause of the
> previously fixed COLLECTIONS-328 in ListUtils, i.e., quadratic
> complexity instead of linear complexity.
> I am encountering this problem in version 3.2.1 and also in revision
> 1342815 (May 25th 2012). I have attached a test that exposes this
> problem and a simple patch. On my machine, for the attached test,
> this patch provides a 95X speedup.
> To run the test, just do:
> $ java Test
> Currently, the code for ListUtils.subtract is:
> final ArrayList<E> result = new ArrayList<E>(list1);
> for (E e : list2) {
> result.remove(e);
> }
> return result;
> which is quadratic, because result.remove(e) has linear complexity.
> The attached patch makes the remove() be constant complexity by
> removing from an org.apache.commons.collections.bag.HashBag. I use
> HashBag and not HashSet because ListUtils.subtract needs to respect
> the cardinality when there are repeated objects in the list.
> As mentioned in COLLECTIONS-328 discussion, for small lists, there is
> some overhead for creating the HashBag. This can be fixed with a
> threshold, but I did not implement it in my patch because the
> COLLECTIONS-328 patch does not implement it.
> Unlike the patch for COLLECTIONS-328, my patch does not choose the
> list to iterate over based on size, because of the cardinality
> requirement in subtract. This means the code could be made even
> faster if we could use something like a LinkedHashBag but neither
> Apache Collections nor standard Java libraries have such a class.
> Even so, this patch is still a lot faster than the original version.
> Is this truly a bug, or am I missing something here? If so, can you
> please confirm if the patch is correct?
> Thanks,
> Adrian
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira