You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Mert Guldur (JIRA)" <ji...@apache.org> on 2012/09/06 18:09:08 UTC
[jira] [Updated] (COLLECTIONS-434) performance problem in
CursorableLinkedList.containsAll()
[ https://issues.apache.org/jira/browse/COLLECTIONS-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mert Guldur updated COLLECTIONS-434:
------------------------------------
Attachment: Test.java
patchSmall.diff
patchFull.diff
> performance problem in CursorableLinkedList.containsAll()
> ---------------------------------------------------------
>
> Key: COLLECTIONS-434
> URL: https://issues.apache.org/jira/browse/COLLECTIONS-434
> Project: Commons Collections
> Issue Type: Bug
> Affects Versions: 3.2.1
> Reporter: Mert Guldur
> Attachments: patchFull.diff, patchSmall.diff, Test.java
>
>
> I am encountering a performance problem in
> CursorableLinkedList.containsAll() (which also affects
> NodeCachingLinkedList). It appears in version 3.2.1 and also in
> revision 1381642. I attached a test that exposes this problem and a
> patch that fixes it. On my machine, for this test, the patch provides
> a 182X speedup.
> To run the test, just do:
> $ java Test
> The output for the un-patched version is:
> Time is 5110
> The output for the patched version is:
> Time is 28
> The problem is that "containsAll(Collection<?> coll)" iterates over
> the elements of "coll" and calls "contains" on "this", which is very
> slow, because "this" is a list.
> This problem is different from COLLECTIONS-416. As established in the
> COLLECTIONS-416 discussion, there the user was responsible for passing
> the proper data structure as the parameter "coll" to the method.
> However, for "AbstractLinkedList.containsAll()", the problem is not
> related to the collection passed as the parameter. The problem will
> always manifest for this method, irrespective of the parameter type.
> I attached two patches. Both implement a linear time algorithm using
> a HashSet. patchSmall.diff populates the HashSet eagerly, and
> patchFull.diff populates the HashSet lazily. patchFull.diff is faster
> than patchSmall.diff when containsAll() returns false after inspecting
> only a few elements, though in most cases they are equally fast. I am
> including patchSmall.diff just in case you prefer smaller code.
> There is a potential issue in using a HashSet for comparisons: HashSet
> uses "equals()" not "AbstractLinkedList.isEqualValue(Object value1,
> Object value2)". Currently, this is not a problem, because
> "AbstractLinkedList.isEqualValue" is implemented like this:
> {code:java|borderStyle=solid}
> protected boolean isEqualValue(Object value1, Object value2) {
> return (value1 == value2 || (value1 == null ? false : value1.equals(value2)));
> }
> {code}
> which is the code used by HashSet to check equality, and no subclass
> of AbstractLinkedList (there are only two subclasses:
> "CursorableLinkedList" and "NodeCachingLinkedList") overrides
> "isEqualValue".
> If "isEqualValue" is indeed likely to be overridden in the future, we
> can either (1) wrap the objects in the HashSet and define an
> "equals()" that calls "isEqualValue" or (2) detect if the method is
> really overridden through reflection with something like this:
> {code:java|borderStyle=solid}
> boolean isOverriden = false;
> Class<?> curClass = this.getClass();
> while (!curClass.equals(AbstractLinkedList.class)) {
> try {
> curClass.getDeclaredMethod("isEqualValue", Object.class, Object.class);
> curClass = curClass.getSuperclass();
> } catch (Exception e) {
> isOverriden = true;
> break;
> }
> }
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira