You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by David McManamon <dm...@gmail.com> on 2017/11/16 01:46:52 UTC

TreeSet to AVLTreeSet?

There were interesting papers regarding balanced binary trees in 2015 & 2016

http://sidsen.azurewebsites.net//

And after reading about "Rank-balanced Trees"  I started comparing AVL &
Red-black tree performance and was surprised to see that red-black trees
are the default in Java given the performance differences noted in Ben
Pfaff's 2004 work.
So I wrote several new TreeMap implementations and put them on GitHub with
two short blog posts-

https://refactoringlightly.wordpress.com/

Seems that TreeSet is used a lot in Lucene and if the input has sequences
then the constants in the Olog(n) operations make those places a very
strong candidate to switch to an AVLTreeSet.

With so many uses of TreeSet I wasn't sure where/how to start benchmarking?
--David

Re: TreeSet to AVLTreeSet?

Posted by David Smiley <da...@gmail.com>.
Also, there are a number of TreeMaps that ought to be HashMaps:
https://issues.apache.org/jira/browse/LUCENE-8041

On Wed, Nov 15, 2017 at 9:20 PM Erick Erickson <er...@gmail.com>
wrote:

> The first thing I'd do is profile a running Solr instance and see if it
> spends enough time building TreeSets to bother with before doing wholesale
> surgery. My guess is that the efficiencies are hard to measure if
> measurable at all.
>
> The proof is in the profiling of course.
>
> Best,
> Erick
>
> On Wed, Nov 15, 2017 at 5:46 PM, David McManamon <dm...@gmail.com>
> wrote:
>
>> There were interesting papers regarding balanced binary trees in 2015 &
>> 2016
>>
>> http://sidsen.azurewebsites.net//
>>
>> And after reading about "Rank-balanced Trees"  I started comparing AVL &
>> Red-black tree performance and was surprised to see that red-black trees
>> are the default in Java given the performance differences noted in Ben
>> Pfaff's 2004 work.
>> So I wrote several new TreeMap implementations and put them on GitHub
>> with two short blog posts-
>>
>> https://refactoringlightly.wordpress.com/
>>
>> Seems that TreeSet is used a lot in Lucene and if the input has sequences
>> then the constants in the Olog(n) operations make those places a very
>> strong candidate to switch to an AVLTreeSet.
>>
>> With so many uses of TreeSet I wasn't sure where/how to start
>> benchmarking?
>> --David
>>
>
> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com

Re: TreeSet to AVLTreeSet?

Posted by Erick Erickson <er...@gmail.com>.
The first thing I'd do is profile a running Solr instance and see if it
spends enough time building TreeSets to bother with before doing wholesale
surgery. My guess is that the efficiencies are hard to measure if
measurable at all.

The proof is in the profiling of course.

Best,
Erick

On Wed, Nov 15, 2017 at 5:46 PM, David McManamon <dm...@gmail.com> wrote:

> There were interesting papers regarding balanced binary trees in 2015 &
> 2016
>
> http://sidsen.azurewebsites.net//
>
> And after reading about "Rank-balanced Trees"  I started comparing AVL &
> Red-black tree performance and was surprised to see that red-black trees
> are the default in Java given the performance differences noted in Ben
> Pfaff's 2004 work.
> So I wrote several new TreeMap implementations and put them on GitHub with
> two short blog posts-
>
> https://refactoringlightly.wordpress.com/
>
> Seems that TreeSet is used a lot in Lucene and if the input has sequences
> then the constants in the Olog(n) operations make those places a very
> strong candidate to switch to an AVLTreeSet.
>
> With so many uses of TreeSet I wasn't sure where/how to start benchmarking?
> --David
>