You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Thomas Neidhart <th...@gmail.com> on 2015/01/24 19:21:01 UTC

[collections] Revert a performance related fix in 4.1

Hi,

from time to time some researchers trying to find performance bugs in
open-source software create issues for collections.

One of the easy targets is the Collection#retainAll(Collection) method
as the default implementation in AbstractCollection calls contains on
the provided collection.

Now, in the worst-case this may lead to a runtime complexity of O(n^2),
i.e. when the collection has a contains implementation with O(n) complexity.

The proposed solution is usually to create an intermediate set and use
it to speed up the contains calls.

My position was always that a given Collection class shall not overwrite
this method trying to optimize its runtime behavior for any possible
input by creating such intermediate data structures as this will just
increase the space complexity (in many cases unnecessarily). Instead,
the runtime complexity was documented (one can even question this as the
Collection framework should be well-known by java users).

It looks like that at 2 occasions these "performance bugs" got fixed:

 * https://issues.apache.org/jira/browse/COLLECTIONS-426
 * https://issues.apache.org/jira/browse/COLLECTIONS-427

and I would like to revert these fixes as they are wrong imho and just
create a precedent for further tickets.

Does anybody challenge my rationale behind this or can I go ahead with it?

Thomas

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [collections] Revert a performance related fix in 4.1

Posted by James Carman <ja...@carmanconsulting.com>.
On Mon, Jan 26, 2015 at 7:03 AM, Benedikt Ritter <br...@apache.org> wrote:
> Hello Adrian
>
> 2015-01-24 19:43 GMT+01:00 Adrian Crum <ad...@sandglass-software.com>:
>
>> Slightly off-topic but somewhat related...
>>
>> I saw a recent commit where a "performance improvement" went something
>> like this:
>>
>> StringBuilder sb = new StringBuilder();
>> sb.append("foo");
>>
>> was replaced with
>>
>> StringBuilder sb = new StringBuilder("foo");
>>
>> The change reduced the code by one line, but there was no "performance
>> improvement" - because the StringBuilder constructor calls append().
>>
>
> All methods on StringBuffer are synchronized, so there a slight overhead of
> acquiring the lock. This is usually the argument for using StringBuilder
> over StringBuffer.
>

But, there was no StringBuffer referenced in the above code snippet,
right, so the sync vs. non-sync doesn't apply.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [collections] Revert a performance related fix in 4.1

Posted by Benedikt Ritter <br...@apache.org>.
2015-01-26 14:05 GMT+01:00 Gary Gregory <ga...@gmail.com>:

> On Mon, Jan 26, 2015 at 7:03 AM, Benedikt Ritter <br...@apache.org>
> wrote:
>
> > Hello Adrian
> >
> > 2015-01-24 19:43 GMT+01:00 Adrian Crum <
> adrian.crum@sandglass-software.com
> > >:
> >
> > > Slightly off-topic but somewhat related...
> > >
> > > I saw a recent commit where a "performance improvement" went something
> > > like this:
> > >
> > > StringBuilder sb = new StringBuilder();
> > > sb.append("foo");
> > >
> > > was replaced with
> > >
> > > StringBuilder sb = new StringBuilder("foo");
> > >
> > > The change reduced the code by one line, but there was no "performance
> > > improvement" - because the StringBuilder constructor calls append().
> > >
> >
> > All methods on StringBuffer are synchronized, so there a slight overhead
> of
> > acquiring the lock. This is usually the argument for using StringBuilder
> > over StringBuffer.
> >
>
> Woa, going from builder to buffer sounds wrong unless synchronization is
> required.
>

D'oh, meant it the other way around ;-) sorry for the confusion!


>
> Gary
>
> >
> >
> > >
> > > So, I agree that suggested performance improvements should be met with
> a
> > > great deal of skepticism and they should be closely scrutinized.
> > >
> > > Adrian Crum
> > > Sandglass Software
> > > www.sandglass-software.com
> > >
> > >
> > > On 1/24/2015 10:21 AM, Thomas Neidhart wrote:
> > >
> > >> Hi,
> > >>
> > >> from time to time some researchers trying to find performance bugs in
> > >> open-source software create issues for collections.
> > >>
> > >> One of the easy targets is the Collection#retainAll(Collection) method
> > >> as the default implementation in AbstractCollection calls contains on
> > >> the provided collection.
> > >>
> > >> Now, in the worst-case this may lead to a runtime complexity of
> O(n^2),
> > >> i.e. when the collection has a contains implementation with O(n)
> > >> complexity.
> > >>
> > >> The proposed solution is usually to create an intermediate set and use
> > >> it to speed up the contains calls.
> > >>
> > >> My position was always that a given Collection class shall not
> overwrite
> > >> this method trying to optimize its runtime behavior for any possible
> > >> input by creating such intermediate data structures as this will just
> > >> increase the space complexity (in many cases unnecessarily). Instead,
> > >> the runtime complexity was documented (one can even question this as
> the
> > >> Collection framework should be well-known by java users).
> > >>
> > >> It looks like that at 2 occasions these "performance bugs" got fixed:
> > >>
> > >>   * https://issues.apache.org/jira/browse/COLLECTIONS-426
> > >>   * https://issues.apache.org/jira/browse/COLLECTIONS-427
> > >>
> > >> and I would like to revert these fixes as they are wrong imho and just
> > >> create a precedent for further tickets.
> > >>
> > >> Does anybody challenge my rationale behind this or can I go ahead with
> > it?
> > >>
> > >> Thomas
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > >> For additional commands, e-mail: dev-help@commons.apache.org
> > >>
> > >>
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > For additional commands, e-mail: dev-help@commons.apache.org
> > >
> > >
> >
> >
> > --
> > http://people.apache.org/~britter/
> > http://www.systemoutprintln.de/
> > http://twitter.com/BenediktRitter
> > http://github.com/britter
> >
>
>
>
> --
> E-Mail: garydgregory@gmail.com | ggregory@apache.org
> Java Persistence with Hibernate, Second Edition
> <http://www.manning.com/bauer3/>
> JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
> Spring Batch in Action <http://www.manning.com/templier/>
> Blog: http://garygregory.wordpress.com
> Home: http://garygregory.com/
> Tweet! http://twitter.com/GaryGregory
>



-- 
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter

Re: [collections] Revert a performance related fix in 4.1

Posted by Gary Gregory <ga...@gmail.com>.
On Mon, Jan 26, 2015 at 7:03 AM, Benedikt Ritter <br...@apache.org> wrote:

> Hello Adrian
>
> 2015-01-24 19:43 GMT+01:00 Adrian Crum <adrian.crum@sandglass-software.com
> >:
>
> > Slightly off-topic but somewhat related...
> >
> > I saw a recent commit where a "performance improvement" went something
> > like this:
> >
> > StringBuilder sb = new StringBuilder();
> > sb.append("foo");
> >
> > was replaced with
> >
> > StringBuilder sb = new StringBuilder("foo");
> >
> > The change reduced the code by one line, but there was no "performance
> > improvement" - because the StringBuilder constructor calls append().
> >
>
> All methods on StringBuffer are synchronized, so there a slight overhead of
> acquiring the lock. This is usually the argument for using StringBuilder
> over StringBuffer.
>

Woa, going from builder to buffer sounds wrong unless synchronization is
required.

Gary

>
>
> >
> > So, I agree that suggested performance improvements should be met with a
> > great deal of skepticism and they should be closely scrutinized.
> >
> > Adrian Crum
> > Sandglass Software
> > www.sandglass-software.com
> >
> >
> > On 1/24/2015 10:21 AM, Thomas Neidhart wrote:
> >
> >> Hi,
> >>
> >> from time to time some researchers trying to find performance bugs in
> >> open-source software create issues for collections.
> >>
> >> One of the easy targets is the Collection#retainAll(Collection) method
> >> as the default implementation in AbstractCollection calls contains on
> >> the provided collection.
> >>
> >> Now, in the worst-case this may lead to a runtime complexity of O(n^2),
> >> i.e. when the collection has a contains implementation with O(n)
> >> complexity.
> >>
> >> The proposed solution is usually to create an intermediate set and use
> >> it to speed up the contains calls.
> >>
> >> My position was always that a given Collection class shall not overwrite
> >> this method trying to optimize its runtime behavior for any possible
> >> input by creating such intermediate data structures as this will just
> >> increase the space complexity (in many cases unnecessarily). Instead,
> >> the runtime complexity was documented (one can even question this as the
> >> Collection framework should be well-known by java users).
> >>
> >> It looks like that at 2 occasions these "performance bugs" got fixed:
> >>
> >>   * https://issues.apache.org/jira/browse/COLLECTIONS-426
> >>   * https://issues.apache.org/jira/browse/COLLECTIONS-427
> >>
> >> and I would like to revert these fixes as they are wrong imho and just
> >> create a precedent for further tickets.
> >>
> >> Does anybody challenge my rationale behind this or can I go ahead with
> it?
> >>
> >> Thomas
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> >> For additional commands, e-mail: dev-help@commons.apache.org
> >>
> >>
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > For additional commands, e-mail: dev-help@commons.apache.org
> >
> >
>
>
> --
> http://people.apache.org/~britter/
> http://www.systemoutprintln.de/
> http://twitter.com/BenediktRitter
> http://github.com/britter
>



-- 
E-Mail: garydgregory@gmail.com | ggregory@apache.org
Java Persistence with Hibernate, Second Edition
<http://www.manning.com/bauer3/>
JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
Spring Batch in Action <http://www.manning.com/templier/>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory

Re: [collections] Revert a performance related fix in 4.1

Posted by Benedikt Ritter <br...@apache.org>.
Hello Adrian

2015-01-24 19:43 GMT+01:00 Adrian Crum <ad...@sandglass-software.com>:

> Slightly off-topic but somewhat related...
>
> I saw a recent commit where a "performance improvement" went something
> like this:
>
> StringBuilder sb = new StringBuilder();
> sb.append("foo");
>
> was replaced with
>
> StringBuilder sb = new StringBuilder("foo");
>
> The change reduced the code by one line, but there was no "performance
> improvement" - because the StringBuilder constructor calls append().
>

All methods on StringBuffer are synchronized, so there a slight overhead of
acquiring the lock. This is usually the argument for using StringBuilder
over StringBuffer.


>
> So, I agree that suggested performance improvements should be met with a
> great deal of skepticism and they should be closely scrutinized.
>
> Adrian Crum
> Sandglass Software
> www.sandglass-software.com
>
>
> On 1/24/2015 10:21 AM, Thomas Neidhart wrote:
>
>> Hi,
>>
>> from time to time some researchers trying to find performance bugs in
>> open-source software create issues for collections.
>>
>> One of the easy targets is the Collection#retainAll(Collection) method
>> as the default implementation in AbstractCollection calls contains on
>> the provided collection.
>>
>> Now, in the worst-case this may lead to a runtime complexity of O(n^2),
>> i.e. when the collection has a contains implementation with O(n)
>> complexity.
>>
>> The proposed solution is usually to create an intermediate set and use
>> it to speed up the contains calls.
>>
>> My position was always that a given Collection class shall not overwrite
>> this method trying to optimize its runtime behavior for any possible
>> input by creating such intermediate data structures as this will just
>> increase the space complexity (in many cases unnecessarily). Instead,
>> the runtime complexity was documented (one can even question this as the
>> Collection framework should be well-known by java users).
>>
>> It looks like that at 2 occasions these "performance bugs" got fixed:
>>
>>   * https://issues.apache.org/jira/browse/COLLECTIONS-426
>>   * https://issues.apache.org/jira/browse/COLLECTIONS-427
>>
>> and I would like to revert these fixes as they are wrong imho and just
>> create a precedent for further tickets.
>>
>> Does anybody challenge my rationale behind this or can I go ahead with it?
>>
>> Thomas
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


-- 
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter

Re: [collections] Revert a performance related fix in 4.1

Posted by Adrian Crum <ad...@sandglass-software.com>.
Slightly off-topic but somewhat related...

I saw a recent commit where a "performance improvement" went something 
like this:

StringBuilder sb = new StringBuilder();
sb.append("foo");

was replaced with

StringBuilder sb = new StringBuilder("foo");

The change reduced the code by one line, but there was no "performance 
improvement" - because the StringBuilder constructor calls append().

So, I agree that suggested performance improvements should be met with a 
great deal of skepticism and they should be closely scrutinized.

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 1/24/2015 10:21 AM, Thomas Neidhart wrote:
> Hi,
>
> from time to time some researchers trying to find performance bugs in
> open-source software create issues for collections.
>
> One of the easy targets is the Collection#retainAll(Collection) method
> as the default implementation in AbstractCollection calls contains on
> the provided collection.
>
> Now, in the worst-case this may lead to a runtime complexity of O(n^2),
> i.e. when the collection has a contains implementation with O(n) complexity.
>
> The proposed solution is usually to create an intermediate set and use
> it to speed up the contains calls.
>
> My position was always that a given Collection class shall not overwrite
> this method trying to optimize its runtime behavior for any possible
> input by creating such intermediate data structures as this will just
> increase the space complexity (in many cases unnecessarily). Instead,
> the runtime complexity was documented (one can even question this as the
> Collection framework should be well-known by java users).
>
> It looks like that at 2 occasions these "performance bugs" got fixed:
>
>   * https://issues.apache.org/jira/browse/COLLECTIONS-426
>   * https://issues.apache.org/jira/browse/COLLECTIONS-427
>
> and I would like to revert these fixes as they are wrong imho and just
> create a precedent for further tickets.
>
> Does anybody challenge my rationale behind this or can I go ahead with it?
>
> Thomas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [collections] Revert a performance related fix in 4.1

Posted by Phil Steitz <ph...@gmail.com>.
On 1/24/15 1:16 PM, Oliver Heger wrote:
> Hi Thomas,
>
> On 24.01.2015 19:21, Thomas Neidhart wrote:
>> Hi,
>>
>> from time to time some researchers trying to find performance
>> bugs in
>> open-source software create issues for collections.
>>
>> One of the easy targets is the Collection#retainAll(Collection)
>> method
>> as the default implementation in AbstractCollection calls
>> contains on
>> the provided collection.
>>
>> Now, in the worst-case this may lead to a runtime complexity of
>> O(n^2),
>> i.e. when the collection has a contains implementation with O(n)
>> complexity.
>>
>> The proposed solution is usually to create an intermediate set
>> and use
>> it to speed up the contains calls.
>>
>> My position was always that a given Collection class shall not
>> overwrite
>> this method trying to optimize its runtime behavior for any possible
>> input by creating such intermediate data structures as this will
>> just
>> increase the space complexity (in many cases unnecessarily).
>> Instead,
>> the runtime complexity was documented (one can even question this
>> as the
>> Collection framework should be well-known by java users).
>>
>> It looks like that at 2 occasions these "performance bugs" got
>> fixed:
>>
>>   * https://issues.apache.org/jira/browse/COLLECTIONS-426
>>   * https://issues.apache.org/jira/browse/COLLECTIONS-427
>>
>> and I would like to revert these fixes as they are wrong imho and
>> just
>> create a precedent for further tickets.
>>
>> Does anybody challenge my rationale behind this or can I go ahead
>> with it?
>>
>> Thomas
>
> I follow your reasoning. It seems that the proposed "optimization"
> would have a negative impact on average use cases while trying to
> prevent users from doing something stupid.
>
> A general purpose library should focus on such average use cases.

+1 - and transparency.  Being able to clearly document behavior (or
rely on existing JDK doco) outweighs the benefit here IMO.

Phil
>
> Oliver
>
>>
>> ---------------------------------------------------------------------
>>
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [collections] Revert a performance related fix in 4.1

Posted by Oliver Heger <ol...@oliver-heger.de>.
Hi Thomas,

On 24.01.2015 19:21, Thomas Neidhart wrote:
> Hi,
>
> from time to time some researchers trying to find performance bugs in
> open-source software create issues for collections.
>
> One of the easy targets is the Collection#retainAll(Collection) method
> as the default implementation in AbstractCollection calls contains on
> the provided collection.
>
> Now, in the worst-case this may lead to a runtime complexity of O(n^2),
> i.e. when the collection has a contains implementation with O(n) complexity.
>
> The proposed solution is usually to create an intermediate set and use
> it to speed up the contains calls.
>
> My position was always that a given Collection class shall not overwrite
> this method trying to optimize its runtime behavior for any possible
> input by creating such intermediate data structures as this will just
> increase the space complexity (in many cases unnecessarily). Instead,
> the runtime complexity was documented (one can even question this as the
> Collection framework should be well-known by java users).
>
> It looks like that at 2 occasions these "performance bugs" got fixed:
>
>   * https://issues.apache.org/jira/browse/COLLECTIONS-426
>   * https://issues.apache.org/jira/browse/COLLECTIONS-427
>
> and I would like to revert these fixes as they are wrong imho and just
> create a precedent for further tickets.
>
> Does anybody challenge my rationale behind this or can I go ahead with it?
>
> Thomas

I follow your reasoning. It seems that the proposed "optimization" would 
have a negative impact on average use cases while trying to prevent 
users from doing something stupid.

A general purpose library should focus on such average use cases.

Oliver

>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org