You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Mert Guldur (JIRA)" <ji...@apache.org> on 2012/09/06 18:09:07 UTC

[jira] [Created] (COLLECTIONS-434) performance problem in CursorableLinkedList.containsAll()

Mert Guldur created COLLECTIONS-434:
---------------------------------------

             Summary: performance problem in CursorableLinkedList.containsAll()
                 Key: COLLECTIONS-434
                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-434
             Project: Commons Collections
          Issue Type: Bug
    Affects Versions: 3.2.1
            Reporter: Mert Guldur
         Attachments: patchFull.diff, patchSmall.diff, Test.java

I am encountering a performance problem in
CursorableLinkedList.containsAll() (which also affects
NodeCachingLinkedList).  It appears in version 3.2.1 and also in
revision 1381642.  I attached a test that exposes this problem and a
patch that fixes it.  On my machine, for this test, the patch provides
a 182X speedup.

To run the test, just do:

$ java Test

The output for the un-patched version is:
Time is 5110

The output for the patched version is:
Time is 28

The problem is that "containsAll(Collection<?> coll)" iterates over
the elements of "coll" and calls "contains" on "this", which is very
slow, because "this" is a list.

This problem is different from COLLECTIONS-416.  As established in the
COLLECTIONS-416 discussion, there the user was responsible for passing
the proper data structure as the parameter "coll" to the method.
However, for "AbstractLinkedList.containsAll()", the problem is not
related to the collection passed as the parameter.  The problem will
always manifest for this method, irrespective of the parameter type.

I attached two patches.  Both implement a linear time algorithm using
a HashSet.  patchSmall.diff populates the HashSet eagerly, and
patchFull.diff populates the HashSet lazily.  patchFull.diff is faster
than patchSmall.diff when containsAll() returns false after inspecting
only a few elements, though in most cases they are equally fast.  I am
including patchSmall.diff just in case you prefer smaller code.

There is a potential issue in using a HashSet for comparisons: HashSet
uses "equals()" not "AbstractLinkedList.isEqualValue(Object value1,
Object value2)".  Currently, this is not a problem, because
"AbstractLinkedList.isEqualValue" is implemented like this:

{code:java|borderStyle=solid}
protected boolean isEqualValue(Object value1, Object value2) {
    return (value1 == value2 || (value1 == null ? false : value1.equals(value2)));
}
{code} 

which is the code used by HashSet to check equality, and no subclass
of AbstractLinkedList (there are only two subclasses:
"CursorableLinkedList" and "NodeCachingLinkedList") overrides
"isEqualValue".

If "isEqualValue" is indeed likely to be overridden in the future, we
can either (1) wrap the objects in the HashSet and define an
"equals()" that calls "isEqualValue" or (2) detect if the method is
really overridden through reflection with something like this:

{code:java|borderStyle=solid}
boolean isOverriden = false;
Class<?> curClass = this.getClass();
while (!curClass.equals(AbstractLinkedList.class)) {
    try {
        curClass.getDeclaredMethod("isEqualValue", Object.class, Object.class);
        curClass = curClass.getSuperclass();
    } catch (Exception e) {
        isOverriden = true;
        break;
    }
}
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (COLLECTIONS-434) performance problem in CursorableLinkedList.containsAll()

Posted by "Mert Guldur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COLLECTIONS-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mert Guldur updated COLLECTIONS-434:
------------------------------------

    Description: 
I am encountering a performance problem in
CursorableLinkedList.containsAll() (which also affects
NodeCachingLinkedList).  It appears in version 3.2.1 and also in
revision 1381642.  I attached a test that exposes this problem and a
patch that fixes it.  On my machine, for this test, the patch provides
a 182X speedup.

To run the test, just do:

$ java Test

The output for the un-patched version is:
Time is 5110

The output for the patched version is:
Time is 28

The problem is that "containsAll(Collection<?> coll)" iterates over
the elements of "coll" and calls "contains" on "this", which is very
slow, because "this" is a list.

This problem is different from COLLECTIONS-416.  As established in the
COLLECTIONS-416 discussion, there the user was responsible for passing
the proper data structure as the parameter "coll" to the method.
However, for "AbstractLinkedList.containsAll()", the problem is not
related to the collection passed as the parameter.  The problem will
always manifest for this method, irrespective of the parameter type.

I attached two patches.  Both implement a linear time algorithm using
a HashSet.  patchSmall.diff populates the HashSet eagerly, and
patchFull.diff populates the HashSet lazily.  patchFull.diff is faster
than patchSmall.diff when containsAll() returns false after inspecting
only a few elements, though in most cases they are equally fast.  I am
including patchSmall.diff just in case you prefer smaller code.

There is a potential issue in using a HashSet for comparisons: HashSet
uses "equals()" not "AbstractLinkedList.isEqualValue(Object value1,
Object value2)".  Currently, this is not a problem, because
"AbstractLinkedList.isEqualValue" is implemented like this:

{code:java|borderStyle=solid}
protected boolean isEqualValue(Object value1, Object value2) {
    return (value1 == value2 || (value1 == null ? false : value1.equals(value2)));
}
{code} 

which is the code used by HashSet to check equality, and no subclass
of AbstractLinkedList (there are only two subclasses:
"CursorableLinkedList" and "NodeCachingLinkedList") overrides
"isEqualValue".

If "isEqualValue" is indeed likely to be overridden in the future, we
can either (1) wrap the objects in the HashSet and define an
"equals()" that calls "isEqualValue" or (2) detect if the method is
really overridden through reflection with something like this:

{code:java|borderStyle=solid}
boolean isOverriden = false;
Class<?> curClass = this.getClass();
while (!curClass.equals(AbstractLinkedList.class)) {
    try {
        curClass.getDeclaredMethod("isEqualValue", Object.class, Object.class);
        isOverriden = true;
        break;
    } catch (Exception e) {
        curClass = curClass.getSuperclass();
    }
}
{code}

  was:
I am encountering a performance problem in
CursorableLinkedList.containsAll() (which also affects
NodeCachingLinkedList).  It appears in version 3.2.1 and also in
revision 1381642.  I attached a test that exposes this problem and a
patch that fixes it.  On my machine, for this test, the patch provides
a 182X speedup.

To run the test, just do:

$ java Test

The output for the un-patched version is:
Time is 5110

The output for the patched version is:
Time is 28

The problem is that "containsAll(Collection<?> coll)" iterates over
the elements of "coll" and calls "contains" on "this", which is very
slow, because "this" is a list.

This problem is different from COLLECTIONS-416.  As established in the
COLLECTIONS-416 discussion, there the user was responsible for passing
the proper data structure as the parameter "coll" to the method.
However, for "AbstractLinkedList.containsAll()", the problem is not
related to the collection passed as the parameter.  The problem will
always manifest for this method, irrespective of the parameter type.

I attached two patches.  Both implement a linear time algorithm using
a HashSet.  patchSmall.diff populates the HashSet eagerly, and
patchFull.diff populates the HashSet lazily.  patchFull.diff is faster
than patchSmall.diff when containsAll() returns false after inspecting
only a few elements, though in most cases they are equally fast.  I am
including patchSmall.diff just in case you prefer smaller code.

There is a potential issue in using a HashSet for comparisons: HashSet
uses "equals()" not "AbstractLinkedList.isEqualValue(Object value1,
Object value2)".  Currently, this is not a problem, because
"AbstractLinkedList.isEqualValue" is implemented like this:

{code:java|borderStyle=solid}
protected boolean isEqualValue(Object value1, Object value2) {
    return (value1 == value2 || (value1 == null ? false : value1.equals(value2)));
}
{code} 

which is the code used by HashSet to check equality, and no subclass
of AbstractLinkedList (there are only two subclasses:
"CursorableLinkedList" and "NodeCachingLinkedList") overrides
"isEqualValue".

If "isEqualValue" is indeed likely to be overridden in the future, we
can either (1) wrap the objects in the HashSet and define an
"equals()" that calls "isEqualValue" or (2) detect if the method is
really overridden through reflection with something like this:

{code:java|borderStyle=solid}
boolean isOverriden = false;
Class<?> curClass = this.getClass();
while (!curClass.equals(AbstractLinkedList.class)) {
    try {
        curClass.getDeclaredMethod("isEqualValue", Object.class, Object.class);
        curClass = curClass.getSuperclass();
    } catch (Exception e) {
        isOverriden = true;
        break;
    }
}
{code}

    
> performance problem in CursorableLinkedList.containsAll()
> ---------------------------------------------------------
>
>                 Key: COLLECTIONS-434
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-434
>             Project: Commons Collections
>          Issue Type: Bug
>    Affects Versions: 3.2.1
>            Reporter: Mert Guldur
>         Attachments: patchFull.diff, patchSmall.diff, Test.java
>
>
> I am encountering a performance problem in
> CursorableLinkedList.containsAll() (which also affects
> NodeCachingLinkedList).  It appears in version 3.2.1 and also in
> revision 1381642.  I attached a test that exposes this problem and a
> patch that fixes it.  On my machine, for this test, the patch provides
> a 182X speedup.
> To run the test, just do:
> $ java Test
> The output for the un-patched version is:
> Time is 5110
> The output for the patched version is:
> Time is 28
> The problem is that "containsAll(Collection<?> coll)" iterates over
> the elements of "coll" and calls "contains" on "this", which is very
> slow, because "this" is a list.
> This problem is different from COLLECTIONS-416.  As established in the
> COLLECTIONS-416 discussion, there the user was responsible for passing
> the proper data structure as the parameter "coll" to the method.
> However, for "AbstractLinkedList.containsAll()", the problem is not
> related to the collection passed as the parameter.  The problem will
> always manifest for this method, irrespective of the parameter type.
> I attached two patches.  Both implement a linear time algorithm using
> a HashSet.  patchSmall.diff populates the HashSet eagerly, and
> patchFull.diff populates the HashSet lazily.  patchFull.diff is faster
> than patchSmall.diff when containsAll() returns false after inspecting
> only a few elements, though in most cases they are equally fast.  I am
> including patchSmall.diff just in case you prefer smaller code.
> There is a potential issue in using a HashSet for comparisons: HashSet
> uses "equals()" not "AbstractLinkedList.isEqualValue(Object value1,
> Object value2)".  Currently, this is not a problem, because
> "AbstractLinkedList.isEqualValue" is implemented like this:
> {code:java|borderStyle=solid}
> protected boolean isEqualValue(Object value1, Object value2) {
>     return (value1 == value2 || (value1 == null ? false : value1.equals(value2)));
> }
> {code} 
> which is the code used by HashSet to check equality, and no subclass
> of AbstractLinkedList (there are only two subclasses:
> "CursorableLinkedList" and "NodeCachingLinkedList") overrides
> "isEqualValue".
> If "isEqualValue" is indeed likely to be overridden in the future, we
> can either (1) wrap the objects in the HashSet and define an
> "equals()" that calls "isEqualValue" or (2) detect if the method is
> really overridden through reflection with something like this:
> {code:java|borderStyle=solid}
> boolean isOverriden = false;
> Class<?> curClass = this.getClass();
> while (!curClass.equals(AbstractLinkedList.class)) {
>     try {
>         curClass.getDeclaredMethod("isEqualValue", Object.class, Object.class);
>         isOverriden = true;
>         break;
>     } catch (Exception e) {
>         curClass = curClass.getSuperclass();
>     }
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (COLLECTIONS-434) performance problem in CursorableLinkedList.containsAll()

Posted by "Mert Guldur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COLLECTIONS-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mert Guldur updated COLLECTIONS-434:
------------------------------------

    Attachment: Test.java
                patchSmall.diff
                patchFull.diff
    
> performance problem in CursorableLinkedList.containsAll()
> ---------------------------------------------------------
>
>                 Key: COLLECTIONS-434
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-434
>             Project: Commons Collections
>          Issue Type: Bug
>    Affects Versions: 3.2.1
>            Reporter: Mert Guldur
>         Attachments: patchFull.diff, patchSmall.diff, Test.java
>
>
> I am encountering a performance problem in
> CursorableLinkedList.containsAll() (which also affects
> NodeCachingLinkedList).  It appears in version 3.2.1 and also in
> revision 1381642.  I attached a test that exposes this problem and a
> patch that fixes it.  On my machine, for this test, the patch provides
> a 182X speedup.
> To run the test, just do:
> $ java Test
> The output for the un-patched version is:
> Time is 5110
> The output for the patched version is:
> Time is 28
> The problem is that "containsAll(Collection<?> coll)" iterates over
> the elements of "coll" and calls "contains" on "this", which is very
> slow, because "this" is a list.
> This problem is different from COLLECTIONS-416.  As established in the
> COLLECTIONS-416 discussion, there the user was responsible for passing
> the proper data structure as the parameter "coll" to the method.
> However, for "AbstractLinkedList.containsAll()", the problem is not
> related to the collection passed as the parameter.  The problem will
> always manifest for this method, irrespective of the parameter type.
> I attached two patches.  Both implement a linear time algorithm using
> a HashSet.  patchSmall.diff populates the HashSet eagerly, and
> patchFull.diff populates the HashSet lazily.  patchFull.diff is faster
> than patchSmall.diff when containsAll() returns false after inspecting
> only a few elements, though in most cases they are equally fast.  I am
> including patchSmall.diff just in case you prefer smaller code.
> There is a potential issue in using a HashSet for comparisons: HashSet
> uses "equals()" not "AbstractLinkedList.isEqualValue(Object value1,
> Object value2)".  Currently, this is not a problem, because
> "AbstractLinkedList.isEqualValue" is implemented like this:
> {code:java|borderStyle=solid}
> protected boolean isEqualValue(Object value1, Object value2) {
>     return (value1 == value2 || (value1 == null ? false : value1.equals(value2)));
> }
> {code} 
> which is the code used by HashSet to check equality, and no subclass
> of AbstractLinkedList (there are only two subclasses:
> "CursorableLinkedList" and "NodeCachingLinkedList") overrides
> "isEqualValue".
> If "isEqualValue" is indeed likely to be overridden in the future, we
> can either (1) wrap the objects in the HashSet and define an
> "equals()" that calls "isEqualValue" or (2) detect if the method is
> really overridden through reflection with something like this:
> {code:java|borderStyle=solid}
> boolean isOverriden = false;
> Class<?> curClass = this.getClass();
> while (!curClass.equals(AbstractLinkedList.class)) {
>     try {
>         curClass.getDeclaredMethod("isEqualValue", Object.class, Object.class);
>         curClass = curClass.getSuperclass();
>     } catch (Exception e) {
>         isOverriden = true;
>         break;
>     }
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira