You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Mert Guldur (JIRA)" <ji...@apache.org> on 2012/09/06 18:09:08 UTC

[jira] [Updated] (COLLECTIONS-434) performance problem in CursorableLinkedList.containsAll()

     [ https://issues.apache.org/jira/browse/COLLECTIONS-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mert Guldur updated COLLECTIONS-434:
------------------------------------

    Attachment: Test.java
                patchSmall.diff
                patchFull.diff
    
> performance problem in CursorableLinkedList.containsAll()
> ---------------------------------------------------------
>
>                 Key: COLLECTIONS-434
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-434
>             Project: Commons Collections
>          Issue Type: Bug
>    Affects Versions: 3.2.1
>            Reporter: Mert Guldur
>         Attachments: patchFull.diff, patchSmall.diff, Test.java
>
>
> I am encountering a performance problem in
> CursorableLinkedList.containsAll() (which also affects
> NodeCachingLinkedList).  It appears in version 3.2.1 and also in
> revision 1381642.  I attached a test that exposes this problem and a
> patch that fixes it.  On my machine, for this test, the patch provides
> a 182X speedup.
> To run the test, just do:
> $ java Test
> The output for the un-patched version is:
> Time is 5110
> The output for the patched version is:
> Time is 28
> The problem is that "containsAll(Collection<?> coll)" iterates over
> the elements of "coll" and calls "contains" on "this", which is very
> slow, because "this" is a list.
> This problem is different from COLLECTIONS-416.  As established in the
> COLLECTIONS-416 discussion, there the user was responsible for passing
> the proper data structure as the parameter "coll" to the method.
> However, for "AbstractLinkedList.containsAll()", the problem is not
> related to the collection passed as the parameter.  The problem will
> always manifest for this method, irrespective of the parameter type.
> I attached two patches.  Both implement a linear time algorithm using
> a HashSet.  patchSmall.diff populates the HashSet eagerly, and
> patchFull.diff populates the HashSet lazily.  patchFull.diff is faster
> than patchSmall.diff when containsAll() returns false after inspecting
> only a few elements, though in most cases they are equally fast.  I am
> including patchSmall.diff just in case you prefer smaller code.
> There is a potential issue in using a HashSet for comparisons: HashSet
> uses "equals()" not "AbstractLinkedList.isEqualValue(Object value1,
> Object value2)".  Currently, this is not a problem, because
> "AbstractLinkedList.isEqualValue" is implemented like this:
> {code:java|borderStyle=solid}
> protected boolean isEqualValue(Object value1, Object value2) {
>     return (value1 == value2 || (value1 == null ? false : value1.equals(value2)));
> }
> {code} 
> which is the code used by HashSet to check equality, and no subclass
> of AbstractLinkedList (there are only two subclasses:
> "CursorableLinkedList" and "NodeCachingLinkedList") overrides
> "isEqualValue".
> If "isEqualValue" is indeed likely to be overridden in the future, we
> can either (1) wrap the objects in the HashSet and define an
> "equals()" that calls "isEqualValue" or (2) detect if the method is
> really overridden through reflection with something like this:
> {code:java|borderStyle=solid}
> boolean isOverriden = false;
> Class<?> curClass = this.getClass();
> while (!curClass.equals(AbstractLinkedList.class)) {
>     try {
>         curClass.getDeclaredMethod("isEqualValue", Object.class, Object.class);
>         curClass = curClass.getSuperclass();
>     } catch (Exception e) {
>         isOverriden = true;
>         break;
>     }
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira