You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Adrian Nistor (JIRA)" <ji...@apache.org> on 2012/07/26 21:04:34 UTC

[jira] [Created] (COLLECTIONS-429) Performance problem in MultiValueMap

Adrian Nistor created COLLECTIONS-429:
-----------------------------------------

             Summary: Performance problem in MultiValueMap
                 Key: COLLECTIONS-429
                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-429
             Project: Commons Collections
          Issue Type: Bug
    Affects Versions: 3.2.1
         Environment: java 1.6.0_24
Ubuntu 11.10
            Reporter: Adrian Nistor
         Attachments: Test.java, TestDifferentParameter.java, Test_AbstractHashedMap.java, Test_StaticBucketMap.java, patchFull.diff, patchFull_AbstractHashedMap.diff, patchFull_StaticBucketMap.diff, patchSmall.diff, patchSmall_AbstractHashedMap.diff, patchSmall_StaticBucketMap.diff

Hi,

I am encountering a performance problem in MultiValueMap.  It appears
in version 3.2.1 and also in revision 1366088.  I attached a test that
exposes this problem and a patch that fixes it.  On my machine, for
this test, the patch provides a 1158X speedup.

To run the test, just do:

$ java Test

The output for the un-patched version is:
Time is 44040

The output for the patched version is:
Time is 38

The attached test shows that, for a "MultiValueMap multi" object, the
following operation is very slow:

multi.values().containsAll(toContain);

"MultiValueMap.values()" returns a "MultiValueMap.Values" object,
which inherits containsAll() from "java.util.AbstractCollection",
which has quadratic complexity.

I attached two patches.  Both patches override containsAll() and
implement a linear time algorithm.  patchSmall.diff populates a
HashSet eagerly, and patchFull.diff populates a HashSet lazily.
patchFull.diff is faster than patchSmall.diff when containsAll()
returns false after inspecting only a few elements, though in most
cases they are equally fast.  I am including patchSmall.diff just in
case you prefer smaller code.

Note that this problem is different from COLLECTIONS-416.  As
established in the COLLECTIONS-416 discussion, there the user was
responsible for using the proper data structures as argument for the
method.

For "MultiValueMap.values()", the problem is not related to the
collection passed as parameter.  The problem will always manifest for
this method, irrespective of the parameter type.  I attached a test
(TestDifferentParameter.java) that shows that, even with a HashSet
parameter, the current problem still manifests (which did not happen
for COLLECTIONS-416).

This problem also exists for the two other "Values" classes
(AbstractHashedMap.Values, StaticBucketMap.Values).  I attached tests
and patches for these classes as well.  If you want me to file
separate reports, just let me know.

Is this truly a performance problem, or am I misunderstanding the
intended behavior?  If so, can you please confirm that the patches are
correct?

Thanks,

Adrian


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (COLLECTIONS-429) Performance problem in MultiValueMap

Posted by "Adrian Nistor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COLLECTIONS-429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian Nistor updated COLLECTIONS-429:
--------------------------------------

    Attachment: TestDifferentParameter.java
                Test.java
                patchSmall.diff
                patchFull.diff
    
> Performance problem in MultiValueMap
> ------------------------------------
>
>                 Key: COLLECTIONS-429
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-429
>             Project: Commons Collections
>          Issue Type: Bug
>    Affects Versions: 3.2.1
>         Environment: java 1.6.0_24
> Ubuntu 11.10
>            Reporter: Adrian Nistor
>         Attachments: Test.java, TestDifferentParameter.java, Test_AbstractHashedMap.java, Test_StaticBucketMap.java, patchFull.diff, patchFull_AbstractHashedMap.diff, patchFull_StaticBucketMap.diff, patchSmall.diff, patchSmall_AbstractHashedMap.diff, patchSmall_StaticBucketMap.diff
>
>
> Hi,
> I am encountering a performance problem in MultiValueMap.  It appears
> in version 3.2.1 and also in revision 1366088.  I attached a test that
> exposes this problem and a patch that fixes it.  On my machine, for
> this test, the patch provides a 1158X speedup.
> To run the test, just do:
> $ java Test
> The output for the un-patched version is:
> Time is 44040
> The output for the patched version is:
> Time is 38
> The attached test shows that, for a "MultiValueMap multi" object, the
> following operation is very slow:
> multi.values().containsAll(toContain);
> "MultiValueMap.values()" returns a "MultiValueMap.Values" object,
> which inherits containsAll() from "java.util.AbstractCollection",
> which has quadratic complexity.
> I attached two patches.  Both patches override containsAll() and
> implement a linear time algorithm.  patchSmall.diff populates a
> HashSet eagerly, and patchFull.diff populates a HashSet lazily.
> patchFull.diff is faster than patchSmall.diff when containsAll()
> returns false after inspecting only a few elements, though in most
> cases they are equally fast.  I am including patchSmall.diff just in
> case you prefer smaller code.
> Note that this problem is different from COLLECTIONS-416.  As
> established in the COLLECTIONS-416 discussion, there the user was
> responsible for using the proper data structures as argument for the
> method.
> For "MultiValueMap.values()", the problem is not related to the
> collection passed as parameter.  The problem will always manifest for
> this method, irrespective of the parameter type.  I attached a test
> (TestDifferentParameter.java) that shows that, even with a HashSet
> parameter, the current problem still manifests (which did not happen
> for COLLECTIONS-416).
> This problem also exists for the two other "Values" classes
> (AbstractHashedMap.Values, StaticBucketMap.Values).  I attached tests
> and patches for these classes as well.  If you want me to file
> separate reports, just let me know.
> Is this truly a performance problem, or am I misunderstanding the
> intended behavior?  If so, can you please confirm that the patches are
> correct?
> Thanks,
> Adrian

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (COLLECTIONS-429) Performance problem in MultiValueMap

Posted by "Adrian Nistor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COLLECTIONS-429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian Nistor updated COLLECTIONS-429:
--------------------------------------

    Attachment: Test_StaticBucketMap.java
                Test_AbstractHashedMap.java
                patchSmall_StaticBucketMap.diff
                patchSmall_AbstractHashedMap.diff
                patchFull_StaticBucketMap.diff
                patchFull_AbstractHashedMap.diff
    
> Performance problem in MultiValueMap
> ------------------------------------
>
>                 Key: COLLECTIONS-429
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-429
>             Project: Commons Collections
>          Issue Type: Bug
>    Affects Versions: 3.2.1
>         Environment: java 1.6.0_24
> Ubuntu 11.10
>            Reporter: Adrian Nistor
>         Attachments: Test.java, TestDifferentParameter.java, Test_AbstractHashedMap.java, Test_StaticBucketMap.java, patchFull.diff, patchFull_AbstractHashedMap.diff, patchFull_StaticBucketMap.diff, patchSmall.diff, patchSmall_AbstractHashedMap.diff, patchSmall_StaticBucketMap.diff
>
>
> Hi,
> I am encountering a performance problem in MultiValueMap.  It appears
> in version 3.2.1 and also in revision 1366088.  I attached a test that
> exposes this problem and a patch that fixes it.  On my machine, for
> this test, the patch provides a 1158X speedup.
> To run the test, just do:
> $ java Test
> The output for the un-patched version is:
> Time is 44040
> The output for the patched version is:
> Time is 38
> The attached test shows that, for a "MultiValueMap multi" object, the
> following operation is very slow:
> multi.values().containsAll(toContain);
> "MultiValueMap.values()" returns a "MultiValueMap.Values" object,
> which inherits containsAll() from "java.util.AbstractCollection",
> which has quadratic complexity.
> I attached two patches.  Both patches override containsAll() and
> implement a linear time algorithm.  patchSmall.diff populates a
> HashSet eagerly, and patchFull.diff populates a HashSet lazily.
> patchFull.diff is faster than patchSmall.diff when containsAll()
> returns false after inspecting only a few elements, though in most
> cases they are equally fast.  I am including patchSmall.diff just in
> case you prefer smaller code.
> Note that this problem is different from COLLECTIONS-416.  As
> established in the COLLECTIONS-416 discussion, there the user was
> responsible for using the proper data structures as argument for the
> method.
> For "MultiValueMap.values()", the problem is not related to the
> collection passed as parameter.  The problem will always manifest for
> this method, irrespective of the parameter type.  I attached a test
> (TestDifferentParameter.java) that shows that, even with a HashSet
> parameter, the current problem still manifests (which did not happen
> for COLLECTIONS-416).
> This problem also exists for the two other "Values" classes
> (AbstractHashedMap.Values, StaticBucketMap.Values).  I attached tests
> and patches for these classes as well.  If you want me to file
> separate reports, just let me know.
> Is this truly a performance problem, or am I misunderstanding the
> intended behavior?  If so, can you please confirm that the patches are
> correct?
> Thanks,
> Adrian

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira