You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Hudson (JIRA)" <ji...@apache.org> on 2012/06/03 12:35:23 UTC
[jira] [Commented] (COLLECTIONS-406) ListUtils.subtract is very
slow
[ https://issues.apache.org/jira/browse/COLLECTIONS-406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288138#comment-13288138 ]
Hudson commented on COLLECTIONS-406:
------------------------------------
Integrated in commons-collections #20 (See [https://builds.apache.org/job/commons-collections/20/])
[COLLECTIONS-406] Improved ListUtils.subtract to O(n) performance. Thanks to Adrian Nistor for reporting and providing a patch. (Revision 1345644)
Result = SUCCESS
tn : http://svn.apache.org/viewvc/?view=rev&rev=1345644
Files :
* /commons/proper/collections/trunk/src/main/java/org/apache/commons/collections/ListUtils.java
* /commons/proper/collections/trunk/src/test/java/org/apache/commons/collections/TestListUtils.java
> ListUtils.subtract is very slow
> --------------------------------
>
> Key: COLLECTIONS-406
> URL: https://issues.apache.org/jira/browse/COLLECTIONS-406
> Project: Commons Collections
> Issue Type: Bug
> Affects Versions: 3.2.1
> Environment: java 1.6.0_24
> Ubuntu 11.10
> Reporter: Adrian Nistor
> Fix For: 4.0
>
> Attachments: Test.java, patch.diff
>
>
> Hi,
> ListUtils.subtract is very slow when subtracting two large lists. The
> root cause of this problem is similar to the root cause of the
> previously fixed COLLECTIONS-328 in ListUtils, i.e., quadratic
> complexity instead of linear complexity.
> I am encountering this problem in version 3.2.1 and also in revision
> 1342815 (May 25th 2012). I have attached a test that exposes this
> problem and a simple patch. On my machine, for the attached test,
> this patch provides a 95X speedup.
> To run the test, just do:
> $ java Test
> Currently, the code for ListUtils.subtract is:
> final ArrayList<E> result = new ArrayList<E>(list1);
> for (E e : list2) {
> result.remove(e);
> }
> return result;
> which is quadratic, because result.remove(e) has linear complexity.
> The attached patch makes the remove() be constant complexity by
> removing from an org.apache.commons.collections.bag.HashBag. I use
> HashBag and not HashSet because ListUtils.subtract needs to respect
> the cardinality when there are repeated objects in the list.
> As mentioned in COLLECTIONS-328 discussion, for small lists, there is
> some overhead for creating the HashBag. This can be fixed with a
> threshold, but I did not implement it in my patch because the
> COLLECTIONS-328 patch does not implement it.
> Unlike the patch for COLLECTIONS-328, my patch does not choose the
> list to iterate over based on size, because of the cardinality
> requirement in subtract. This means the code could be made even
> faster if we could use something like a LinkedHashBag but neither
> Apache Collections nor standard Java libraries have such a class.
> Even so, this patch is still a lot faster than the original version.
> Is this truly a bug, or am I missing something here? If so, can you
> please confirm if the patch is correct?
> Thanks,
> Adrian
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira