Posted to solr-user@lucene.apache.org by Yonik Seeley <ys...@gmail.com> on 2014/01/07 19:53:57 UTC

ANN: Solr Next

It's time to start working on the next major evolution of Solr (much
as we did years ago for the SolrCloud effort).  To kick things off,
I've started a project on github and implemented "off-heap" filters,
as a first step toward taking performance to the next level.

For a number of reasons, we felt it best to incubate this project at
github, where we could have a community dedicated solely to its
advancement.  The plan is to bring it back to the ASF once it has
stabilized and gained enough traction.

Off-Heap Filters:
JVMs have never been good at dealing with large heaps. Large heaps
mean the JVM needs to do a lot of garbage collection work, and often
mean some pretty long stop-the-world GC pauses.

Filters (Solr DocSets) stored in the filterCache are now allocated
off-heap and reference counted so they can be freed as soon as they
are no longer needed.  The JVM no longer needs to waste time copying
around these potentially long-lived blocks of memory. This should both
help eliminate the long GC pauses as well as increase request
throughput.
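
To make the reference counting idea above concrete, here is a minimal sketch. It is not the Heliosearch code: the class name OffHeapDocSetSketch and its methods are hypothetical, and it assumes a plain bit set of document ids held in a direct ByteBuffer. One caveat is that dropping a direct buffer only makes its memory reclaimable; a real implementation that frees memory at the moment the count hits zero would need explicit native allocation, which this sketch does not attempt.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Heliosearch code: a reference-counted bit set of
// document ids whose bits live in a direct buffer, outside the Java heap.
final class OffHeapDocSetSketch {
    private final AtomicInteger refCount = new AtomicInteger(1); // creator holds one reference
    private volatile ByteBuffer bits;                            // null once released

    OffHeapDocSetSketch(int maxDoc) {
        int words = (maxDoc + 63) >>> 6;                         // one 64-bit word per 64 docs
        bits = ByteBuffer.allocateDirect(words * 8).order(ByteOrder.nativeOrder());
    }

    void add(int doc) {
        ByteBuffer b = live();
        int byteIndex = (doc >>> 6) << 3;                        // byte offset of the doc's word
        b.putLong(byteIndex, b.getLong(byteIndex) | (1L << (doc & 63)));
    }

    boolean contains(int doc) {
        ByteBuffer b = live();
        int byteIndex = (doc >>> 6) << 3;
        return (b.getLong(byteIndex) & (1L << (doc & 63))) != 0;
    }

    // Taken by each additional user, e.g. a request reading a cached filter.
    void incref() {
        for (;;) {
            int current = refCount.get();
            if (current <= 0) throw new IllegalStateException("already released");
            if (refCount.compareAndSet(current, current + 1)) return;
        }
    }

    // Given back when a user is done. Returns true if this call released the set.
    boolean decref() {
        int remaining = refCount.decrementAndGet();
        if (remaining < 0) throw new IllegalStateException("released too many times");
        if (remaining == 0) {
            bits = null;   // assumption: a real implementation would free native memory here
            return true;
        }
        return false;
    }

    boolean isReleased() {
        return bits == null;
    }

    private ByteBuffer live() {
        ByteBuffer b = bits;
        if (b == null) throw new IllegalStateException("already released");
        return b;
    }

    public static void main(String[] args) {
        OffHeapDocSetSketch filter = new OffHeapDocSetSketch(1000);
        filter.add(42);
        filter.incref();                           // a request starts using the cached filter
        System.out.println(filter.contains(42));
        filter.decref();                           // the request is finished
        filter.decref();                           // the cache evicts the entry
        System.out.println(filter.isReleased());
    }
}
```

The point of the count is that a cache eviction and an in-flight request can each give up their reference independently, and whichever does so last triggers the release.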

Performance Results:
  I'm still putting together a blog on the results, but they look good!
It was pretty trivial to reproduce >1s stop-the-world GC pauses with a
4GB heap, and then see those pauses completely go away when I switched
to off-heap filters.  Throughput also increased since much less time
was spent doing GC.

Next major feature: Native Code Optimizations.
In addition to moving more large data structures off-heap (like
UnInvertedField?), I am planning to implement native code
optimizations for certain hotspots.  Native code faceting would be an
obvious first choice since it can often be a CPU bottleneck.

Project resources:

https://github.com/Heliosearch/heliosearch

https://groups.google.com/forum/#!forum/heliosearch
https://groups.google.com/forum/#!forum/heliosearch-dev

Freenode IRC: #heliosearch #heliosearch-dev

-Yonik

Re: ANN: Solr Next

Posted by Yonik Seeley <yo...@heliosearch.com>.
That would be cool, but it seems it would only work for simple term queries.
I guess having both would be best.

http://heliosearch.org -- off-heap filters for solr
-Yonik


On Mon, Jan 13, 2014 at 2:21 PM, Mikhail Khludnev
<mk...@griddynamics.com> wrote:
> Yonik,
> Don't you think that a proper codec format can get a comparable gain
> without changes in design?
> https://issues.apache.org/jira/browse/LUCENE-5052
>
>
> On Mon, Jan 13, 2014 at 9:15 PM, Yonik Seeley <ys...@gmail.com> wrote:
>
>> Update on my initial performance findings for off-heap filters:
>> http://heliosearch.org/off-heap-filters/
>>
>> -Yonik
>> http://heliosearch.org -- making solr shine
>>
>>
>> On Tue, Jan 7, 2014 at 1:53 PM, Yonik Seeley <ys...@gmail.com> wrote:
>> > Off-Heap Filters:
>> > JVMs have never been good at dealing with large heaps. Large heaps
>> > mean the JVM needs to do a lot of garbage collection work, and often
>> > mean some pretty long stop-the-world GC pauses.
>> >
>> > Filters (Solr DocSets) stored in the filterCache are now allocated
>> > off-heap and reference counted so they can be freed as soon as they
>> > are no longer needed.  The JVM no longer needs to waste time copying
>> > around these potentially long-lived blocks of memory. This should both
>> > help eliminate the long GC pauses as well as increase request
>> > throughput.
>> >
>> > Performance Results:
>> >   I'm still putting together a blog on the results, but they look good!
>> > It was pretty trivial to reproduce >1s stop-the-world GC pauses with a
>> > 4GB heap, and then see those pauses completely go away when I switched
>> > to off-heap filters.  Throughput also increased since much less time
>> > was spent doing GC.
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
>  <mk...@griddynamics.com>

Re: ANN: Solr Next

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Yonik,
Don't you think that a proper codec format can get a comparable gain
without changes in design?
https://issues.apache.org/jira/browse/LUCENE-5052


On Mon, Jan 13, 2014 at 9:15 PM, Yonik Seeley <ys...@gmail.com> wrote:

> Update on my initial performance findings for off-heap filters:
> http://heliosearch.org/off-heap-filters/
>
> -Yonik
> http://heliosearch.org -- making solr shine
>
>
> On Tue, Jan 7, 2014 at 1:53 PM, Yonik Seeley <ys...@gmail.com> wrote:
> > Off-Heap Filters:
> > JVMs have never been good at dealing with large heaps. Large heaps
> > mean the JVM needs to do a lot of garbage collection work, and often
> > mean some pretty long stop-the-world GC pauses.
> >
> > Filters (Solr DocSets) stored in the filterCache are now allocated
> > off-heap and reference counted so they can be freed as soon as they
> > are no longer needed.  The JVM no longer needs to waste time copying
> > around these potentially long-lived blocks of memory. This should both
> > help eliminate the long GC pauses as well as increase request
> > throughput.
> >
> > Performance Results:
> >   I'm still putting together a blog on the results, but they look good!
> > It was pretty trivial to reproduce >1s stop-the-world GC pauses with a
> > 4GB heap, and then see those pauses completely go away when I switched
> > to off-heap filters.  Throughput also increased since much less time
> > was spent doing GC.
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: ANN: Solr Next

Posted by Yonik Seeley <ys...@gmail.com>.
Update on my initial performance findings for off-heap filters:
http://heliosearch.org/off-heap-filters/

-Yonik
http://heliosearch.org -- making solr shine


On Tue, Jan 7, 2014 at 1:53 PM, Yonik Seeley <ys...@gmail.com> wrote:
> Off-Heap Filters:
> JVMs have never been good at dealing with large heaps. Large heaps
> mean the JVM needs to do a lot of garbage collection work, and often
> mean some pretty long stop-the-world GC pauses.
>
> Filters (Solr DocSets) stored in the filterCache are now allocated
> off-heap and reference counted so they can be freed as soon as they
> are no longer needed.  The JVM no longer needs to waste time copying
> around these potentially long-lived blocks of memory. This should both
> help eliminate the long GC pauses as well as increase request
> throughput.
>
> Performance Results:
>   I'm still putting together a blog on the results, but they look good!
> It was pretty trivial to reproduce >1s stop-the-world GC pauses with a
> 4GB heap, and then see those pauses completely go away when I switched
> to off-heap filters.  Throughput also increased since much less time
> was spent doing GC.

RE: ANN: Solr Next

Posted by Jean-Sebastien Vachon <je...@wantedanalytics.com>.
Hi Yonik,

Very impressive results. Looking forward to using this on our systems. Any idea what the plan is for this feature? Will it make its way into Solr 4.9, or do we have to switch to Heliosearch to be able to use it?

Thanks

> -----Original Message-----
> From: Yonik Seeley [mailto:yseeley@gmail.com]
> Sent: June-09-14 10:50 AM
> To: solr-user@lucene.apache.org
> Subject: Re: ANN: Solr Next
> 
> On Tue, Jan 7, 2014 at 1:53 PM, Yonik Seeley <ys...@gmail.com> wrote:
> [...]
> > Next major feature: Native Code Optimizations.
> > In addition to moving more large data structures off-heap (like
> > UnInvertedField?), I am planning to implement native code
> > optimizations for certain hotspots.  Native code faceting would be an
> > obvious first choice since it can often be a CPU bottleneck.
> 
> It's in!  Abbreviated report: 2x performance increase over stock solr faceting
> (which is already fast!) http://heliosearch.org/native-code-faceting/
> 
> -Yonik
> http://heliosearch.org -- making solr shine
> 
> > Project resources:
> >
> > https://github.com/Heliosearch/heliosearch
> >
> > https://groups.google.com/forum/#!forum/heliosearch
> > https://groups.google.com/forum/#!forum/heliosearch-dev
> >
> > Freenode IRC: #heliosearch #heliosearch-dev
> >
> > -Yonik
> 

Re: ANN: Solr Next

Posted by Yonik Seeley <ys...@gmail.com>.
On Tue, Jan 7, 2014 at 1:53 PM, Yonik Seeley <ys...@gmail.com> wrote:
[...]
> Next major feature: Native Code Optimizations.
> In addition to moving more large data structures off-heap (like
> UnInvertedField?), I am planning to implement native code
> optimizations for certain hotspots.  Native code faceting would be an
> obvious first choice since it can often be a CPU bottleneck.

It's in!  Abbreviated report: 2x performance increase over stock solr
faceting (which is already fast!)
http://heliosearch.org/native-code-faceting/

-Yonik
http://heliosearch.org -- making solr shine

> Project resources:
>
> https://github.com/Heliosearch/heliosearch
>
> https://groups.google.com/forum/#!forum/heliosearch
> https://groups.google.com/forum/#!forum/heliosearch-dev
>
> Freenode IRC: #heliosearch #heliosearch-dev
>
> -Yonik