Posted to solr-user@lucene.apache.org by Uwe Reh <re...@hebis.uni-frankfurt.de> on 2015/09/21 14:09:40 UTC

faceting is unusable slow since upgrade to 5.3.0

Hi,

Our bibliographic index (~20M entries) runs fine with Solr 4.10.3.
With Solr 5.3, faceted searching is consistently extremely slow (~20 
seconds).
> Output of 'debugQuery':
> <lst name="process"><double name="time">17705.0</double>
> <lst name="query"><double name="time">2.0</double></lst>
> <lst name="facet"><double name="time">17590.0</double></lst> !!!!!!
> <lst name="debug"><double name="time">111.0</double></lst>

The 'fieldValueCache' seems to be unused (neither inserts nor lookups) in 
Solr 5.3. In Solr 4.10 the 'fieldValueCache' is in heavy use with a 
cumulative_hitratio of 1.
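
To compare the 'debugQuery' timing blocks across runs, a small Python sketch like the following may help. It parses a fragment shaped like the one quoted above and reports the slowest component. The closing </lst> in the sample is added only so the fragment is well-formed, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Timing fragment copied from the debugQuery output in this mail.
# The final </lst> is added here so that the fragment parses as XML.
SAMPLE = """
<lst name="process"><double name="time">17705.0</double>
<lst name="query"><double name="time">2.0</double></lst>
<lst name="facet"><double name="time">17590.0</double></lst>
<lst name="debug"><double name="time">111.0</double></lst>
</lst>
"""


def component_times(xml_text):
    """Return {component name: milliseconds} for each child <lst> of the root."""
    root = ET.fromstring(xml_text.strip())
    times = {}
    for child in root.findall("lst"):
        value = child.find("double[@name='time']")
        if value is not None:
            times[child.get("name")] = float(value.text)
    return times


def slowest(xml_text):
    """Return (name, milliseconds) of the component with the largest time."""
    times = component_times(xml_text)
    name = max(times, key=times.get)
    return name, times[name]


if __name__ == "__main__":
    print(component_times(SAMPLE))
    print(slowest(SAMPLE))
```

For the numbers above this points at the facet component, which accounts for nearly all of the 17705 ms.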

- the behavior is the same whether Solr 5.3 runs on a copy of the old index 
(luceneMatch=4.6) or on a newly built index
- using 'facet.method=enum' makes no noticeable difference
- declaring 'docValues' (with reindexing) makes no noticeable difference
- 'softCommit' isn't used
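
For anyone who wants to repeat the 'facet.method' comparison, here is a minimal Python sketch that builds one request URL per method. The base URL and the field name are placeholders (assumptions, not my real setup); 'rows=0' is added only to keep the response small.

```python
from urllib.parse import urlencode

# Placeholder values: replace with your own core URL and facet field.
BASE_URL = "http://localhost:8983/solr/mycore/select"
FACET_FIELD = "author_facet"

# Values of the facet.method request parameter to compare.
METHODS = ("enum", "fc", "fcs")


def facet_url(method, field=FACET_FIELD, base=BASE_URL):
    """Build a match-all facet request that forces the given facet.method."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.method", method),
        ("debugQuery", "true"),
    ]
    return base + "?" + urlencode(params)


def comparison_urls():
    """Return {method: url} so each variant can be requested and timed."""
    return {method: facet_url(method) for method in METHODS}


if __name__ == "__main__":
    for method, url in comparison_urls().items():
        print(method, url)
```

Requesting each URL a few times and comparing the facet time in the debug output should show whether the method makes any difference on a given index.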

My environment is:
   OS: Solaris 5.11 on AMD64
   JDK: 1.8.0_25 and 1.8.0_60 (same behavior)
   JavaOpts: -Xmx 10g -XX:+UseG1GC -XX:+AggressiveOpts -XX:+UseLargePages -XX:LargePageSizeInBytes=2m

Any help/advice is welcome
Uwe

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Uwe Reh <re...@hebis.uni-frankfurt.de>.
Am 22.09.2015 um 02:12 schrieb Joel Bernstein:
> Have you looked at your Solr instance with a cpu profiler like YourKit? It
> would be useful to see the hotspots which should be really obvious with 20
> second response times.

No, I have not done any profiling so far. I thought the unused 
fieldValueCache was a clear indicator of a mistake on my side.
Because we are a public service, I cannot use YourKit (the license 
itself is not the problem; the local cost of licensing is the blocker). 
I will try to find the hotspot with VisualVM.

> Also are you running in distributed mode or on a single Solr instance?
Just a single instance.

Thanks for the attention
Uwe


Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Joel Bernstein <jo...@gmail.com>.
Can you also try testing with one facet at a time and see if we hit a
particular facet that is slow?
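
A rough Python sketch of that test, in case it is useful: it issues one request per facet field and ranks the fields by facet time. It assumes the JSON response nests the timing under debug -> timing -> process -> facet -> time, mirroring the XML posted earlier in this thread; please check that against your Solr version. The function names and the injectable 'fetch' argument are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def one_facet_url(base, field):
    """Build a match-all request that facets on exactly one field."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.field", field),
        ("debugQuery", "true"),
        ("wt", "json"),
    ]
    return base + "?" + urlencode(params)


def http_fetch(url):
    """Fetch a URL and decode the JSON body into a dict."""
    with urlopen(url) as resp:
        return json.load(resp)


def facet_time_ms(response):
    """Read the facet component time; the key path is an assumption (see above)."""
    return float(response["debug"]["timing"]["process"]["facet"]["time"])


def rank_facets(base, fields, fetch=http_fetch):
    """Return [(field, facet time in ms)] sorted with the slowest field first."""
    timings = [
        (field, facet_time_ms(fetch(one_facet_url(base, field))))
        for field in fields
    ]
    return sorted(timings, key=lambda item: item[1], reverse=True)
```

Passing a different 'fetch' makes it possible to try the ranking without a running Solr instance.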

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Sep 22, 2015 at 9:36 AM, Uwe Reh <re...@hebis.uni-frankfurt.de> wrote:

> The exact version as shown by the UI is:
> - solr-impl   5.3.0 1696229 - noble - 2015-08-17 17:10:43
> - lucene-impl 5.3.0 1696229 - noble - 2015-08-17 16:59:03
>
> Unfortunately my debugging skills are limited, so I'm not sure what is
> meant by a 'deeper caller stack'.
> Did you mean the attached snapshot from VisualVM, a stack trace like
> below, or something else? Please give me a hint.
>
> uwe
>
>> "qtp1734853116-68" #68 prio=5 os_prio=64 tid=0x00000000117fd800 nid=0x77 runnable [0xfffffd7f991fc000]
>>    java.lang.Thread.State: RUNNABLE
>>         at java.util.HashMap.resize(HashMap.java:734)
>>         at java.util.HashMap.putVal(HashMap.java:662)
>>         at java.util.HashMap.put(HashMap.java:611)
>>         at org.apache.lucene.index.FieldInfos$Builder.addOrUpdateInternal(FieldInfos.java:344)
>>         at org.apache.lucene.index.FieldInfos$Builder.add(FieldInfos.java:366)
>>         at org.apache.lucene.index.FieldInfos$Builder.add(FieldInfos.java:304)
>>         at org.apache.lucene.index.MultiFields.getMergedFieldInfos(MultiFields.java:245)
>>         at org.apache.lucene.index.SlowCompositeReaderWrapper.getFieldInfos(SlowCompositeReaderWrapper.java:237)
>>         at org.apache.lucene.index.SlowCompositeReaderWrapper.getSortedSetDocValues(SlowCompositeReaderWrapper.java:174)
>>         at org.apache.solr.request.DocValuesFacets.getCounts(DocValuesFacets.java:72)
>>         at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:492)
>>         at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:385)
>>         at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:628)
>>         at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:619)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>         at org.apache.solr.request.SimpleFacets$2.execute(SimpleFacets.java:573)
>>         at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:644)
>>         at org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetComponent.java:294)
>>         at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:256)
>>         at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:285)
>>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
>>         at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
>>         at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
>>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:210)
>>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
>>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>>         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>>         at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>>         at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>>         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>>         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>>         at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>>         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>>         at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>>         at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>>         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>>         at org.eclipse.jetty.server.Server.handle(Server.java:499)
>>         at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>>         at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>>         at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>>         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>>         at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>
>
>
> Am 22.09.2015 um 12:56 schrieb Mikhail Khludnev:
>
>> It's quite strange:
>> https://issues.apache.org/jira/browse/SOLR-7730 significantly optimized
>> DV facets in 5.3.0 precisely by avoiding the FieldInfos merge.
>> Would you mind providing a deeper caller stack for
>> org.apache.lucene.index.MultiFields.getMergedFieldInfos()?
>> Or the time spent in SlowCompositeReaderWrapper, DocValuesFacets,
>> MultiDocValues and their hot methods.
>> Which version exactly are you on, and how do you know that?
>> Thanks
>>
>>
>
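
When a thread dump arrives re-wrapped and re-quoted by several mail clients, a small Python sketch like this can pull the frames back out and show the first Lucene or Solr frame under the JDK calls. The sample lines are taken from the stack trace in this thread; the regular expression and the function names are my own and only cover the "at package.Class.method(File.java:NN)" shape seen here.

```python
import re

# Matches "at <class>.<method>(" and tolerates a line break after "at".
FRAME = re.compile(r"\bat\s+([\w.$]+)\.([\w$<>]+)\(")

# A few frames from the stack trace posted in this thread.
SAMPLE = """\
    at java.util.HashMap.resize(HashMap.java:734)
    at java.util.HashMap.putVal(HashMap.java:662)
    at java.util.HashMap.put(HashMap.java:611)
    at org.apache.lucene.index.FieldInfos$Builder.addOrUpdateInternal(FieldInfos.java:344)
    at org.apache.lucene.index.MultiFields.getMergedFieldInfos(MultiFields.java:245)
    at org.apache.lucene.index.SlowCompositeReaderWrapper.getFieldInfos(SlowCompositeReaderWrapper.java:237)
    at org.apache.solr.request.DocValuesFacets.getCounts(DocValuesFacets.java:72)
"""


def parse_frames(text):
    """Return [(class name, method name)] in stack order, innermost first."""
    # Drop mail quote markers and indentation at the start of every line.
    unquoted = re.sub(r"(?m)^[> \t]+", "", text)
    return FRAME.findall(unquoted)


def first_frame_in(frames, package_prefix):
    """Return the innermost frame whose class starts with the prefix, or None."""
    for cls, method in frames:
        if cls.startswith(package_prefix):
            return cls, method
    return None


if __name__ == "__main__":
    frames = parse_frames(SAMPLE)
    print(first_frame_in(frames, "org.apache.lucene."))
    print(first_frame_in(frames, "org.apache.solr."))
```

On the sample, the innermost Solr frame is DocValuesFacets.getCounts, which is the entry point the discussion in this thread centers on.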

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Uwe Reh <re...@hebis.uni-frankfurt.de>.
Am 23.09.2015 um 10:02 schrieb Mikhail Khludnev:
> ...
> How to accelerate non-DV facets is not so clear so far. Please show a profiler
> snapshot for non-DV facets if you wish to go this way.
>
Hi,

attached is a VisualVM profile of a simplified query (just one facet), 
run several times:
> http://xyz/solr/hebis/select/?q=*:*&facet=true&facet.mincount=1&facet.limit=30&facet.field=author_facet&debugQuery=true

The average "QTime" for the query is ~5 seconds:
> <lst name="process">
>   <double name="time">5254.0</double>
>   <lst name="query"><double name="time">0.0</double></lst>
>   <lst name="facet"><double name="time">5253.0</double></lst>
>   <lst name="mlt"><double name="time">0.0</double></lst>
>   <lst name="stats"><double name="time">0.0</double></lst>
>   <lst name="debug"><double name="time">0.0</double></lst>
>   <lst name="elevator"><double name="time">0.0</double></lst>
> </lst>

The profile was made with Solr 5.3 running on a 4.10 index with no 
'docValues' at all in the schema. (A native 5.3 index with docValues is 
still building.)

I find it surprising that a lot of "docValue" entries show up in the 
profile.

Uwe

PS.
Meanwhile I tried 5.1 and got the same behavior.

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Although docValues provide NRT faceting with great performance (since
5.4), the enum method is still really important for edge cases (many docs,
small number of terms).
Also, Solr's UnInvertedField had a really smart BigTerms strategy, in which
the fattest terms were counted by enum and the remaining ones with fc. DocValues
needs to evolve a lot before it can provide that level of
flexibility. I might be confusing something again, but I am a little worried
about whether BigTerms/UnInvertedField still exists in recent versions.
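
To illustrate the difference, here is a toy Python sketch of the three counting styles. It is not Solr's implementation: the data, the names and the threshold are invented, and it only shows that walking terms (enum style), walking documents (fc style) and a BigTerms-like mix of both arrive at the same counts.

```python
from collections import Counter

# Toy inverted index: term -> set of document ids containing it (invented data).
POSTINGS = {
    "book": {0, 1, 2, 3, 4, 5},
    "journal": {6, 7},
    "map": {8},
}


def invert(postings):
    """Build the per-document view (doc id -> terms) that fc-style counting walks."""
    per_doc = {}
    for term, docs in postings.items():
        for doc in docs:
            per_doc.setdefault(doc, []).append(term)
    return per_doc


def count_enum(postings, matching_docs):
    """enum style: one set intersection per term, so cost grows with the term count."""
    return {term: len(docs & matching_docs) for term, docs in postings.items()}


def count_fc(per_doc, matching_docs, all_terms):
    """fc style: walk the matching documents and bump a counter per term."""
    counts = Counter({term: 0 for term in all_terms})
    for doc in matching_docs:
        counts.update(per_doc.get(doc, []))
    return dict(counts)


def count_hybrid(postings, per_doc, matching_docs, big_threshold):
    """Mixed style: terms with many docs by intersection, the rest by document walk."""
    big = {t for t, docs in postings.items() if len(docs) >= big_threshold}
    counts = {t: len(postings[t] & matching_docs) for t in big}
    for term in postings:
        counts.setdefault(term, 0)
    for doc in matching_docs:
        for term in per_doc.get(doc, []):
            if term not in big:
                counts[term] += 1
    return counts


if __name__ == "__main__":
    matching = {0, 1, 6, 8}
    per_doc = invert(POSTINGS)
    print(count_enum(POSTINGS, matching))
    print(count_fc(per_doc, matching, POSTINGS))
    print(count_hybrid(POSTINGS, per_doc, matching, big_threshold=3))
```

With few distinct terms and many documents, the per-term intersections stay cheap, which is the edge case where enum remains attractive.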

On Wed, Sep 23, 2015 at 12:44 PM, Alessandro Benedetti <benedetti.alex85@gmail.com> wrote:

> Well done Mikhail,
> curious to see the performance!
>
> Apart from the disk usage (of course building docValues will cost more space),
> and taking the field cardinality into consideration: in the past, when the
> field cardinality was low (few unique values in the field), the enum approach
> was suggested (so docValues were not strictly necessary for
> faceting).
>
> After this improvement, in your opinion, what is the current status? After
> the 5.4 release, will the enum faceting approach still make sense?
> Has anyone benchmarked this?
>
> Cheers



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Alessandro Benedetti <be...@gmail.com>.
Well done Mikhail,
curious to see the performance!

Apart from the disk usage (of course building docValues will cost more space),
and taking the field cardinality into consideration: in the past, when the
field cardinality was low (few unique values in the field), the enum approach
was suggested (so docValues were not strictly necessary for
faceting).

After this improvement, in your opinion, what is the current status? After
the 5.4 release, will the enum faceting approach still make sense?
Has anyone benchmarked this?

Cheers

2015-09-23 9:02 GMT+01:00 Mikhail Khludnev <mk...@griddynamics.com>:

> Uwe,
>
> I'm sorry for the confusion: https://issues.apache.org/jira/browse/SOLR-7730
> goes into 5.4 only. Hence, to get fast DV facets you need to apply the patch
> (it's pretty small).
> How to accelerate non-DV facets is not so clear so far. Please show a profiler
> snapshot for non-DV facets if you wish to go this way.



-- 
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Uwe,

I'm sorry for the confusion: https://issues.apache.org/jira/browse/SOLR-7730
goes into 5.4 only. Hence, to get fast DV facets you need to apply the patch
(it's pretty small).
How to accelerate non-DV facets is not so clear so far. Please show a profiler
snapshot for non-DV facets if you wish to go this way.



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Uwe Reh <re...@hebis.uni-frankfurt.de>.
The exact version as shown by the UI is:
- solr-impl   5.3.0 1696229 - noble - 2015-08-17 17:10:43
- lucene-impl 5.3.0 1696229 - noble - 2015-08-17 16:59:03

Unfortunately my debugging skills are limited, so I'm not sure what is 
meant by a 'deeper caller stack'.
Did you mean the attached snapshot from VisualVM, a stack trace like 
below, or something else? Please give me a hint.

uwe

> "qtp1734853116-68" #68 prio=5 os_prio=64 tid=0x00000000117fd800 nid=0x77 runnable [0xfffffd7f991fc000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.util.HashMap.resize(HashMap.java:734)
> 	at java.util.HashMap.putVal(HashMap.java:662)
> 	at java.util.HashMap.put(HashMap.java:611)
> 	at org.apache.lucene.index.FieldInfos$Builder.addOrUpdateInternal(FieldInfos.java:344)
> 	at org.apache.lucene.index.FieldInfos$Builder.add(FieldInfos.java:366)
> 	at org.apache.lucene.index.FieldInfos$Builder.add(FieldInfos.java:304)
> 	at org.apache.lucene.index.MultiFields.getMergedFieldInfos(MultiFields.java:245)
> 	at org.apache.lucene.index.SlowCompositeReaderWrapper.getFieldInfos(SlowCompositeReaderWrapper.java:237)
> 	at org.apache.lucene.index.SlowCompositeReaderWrapper.getSortedSetDocValues(SlowCompositeReaderWrapper.java:174)
> 	at org.apache.solr.request.DocValuesFacets.getCounts(DocValuesFacets.java:72)
> 	at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:492)
> 	at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:385)
> 	at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:628)
> 	at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:619)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at org.apache.solr.request.SimpleFacets$2.execute(SimpleFacets.java:573)
> 	at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:644)
> 	at org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetComponent.java:294)
> 	at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:256)
> 	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:285)
> 	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
> 	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
> 	at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
> 	at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
> 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:210)
> 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
> 	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
> 	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
> 	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> 	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
> 	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
> 	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
> 	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
> 	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> 	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
> 	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> 	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
> 	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
> 	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
> 	at org.eclipse.jetty.server.Server.handle(Server.java:499)
> 	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
> 	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
> 	at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
> 	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
> 	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
> 	at java.lang.Thread.run(Thread.java:745)



Am 22.09.2015 um 12:56 schrieb Mikhail Khludnev:
> It's quite strange
> https://issues.apache.org/jira/browse/SOLR-7730 significantly optimized DV
> facets at 5.3.0 exactly by avoiding FileInfo merge.
> Would you mind to provide deeper caller stack for
> org.apache.lucene.index.FileInfos.MultibleFields.getMergedFieldInfos()?
> Or a time spend in SlowCompositeReaderWrapper, DocValuesFacets,
> MultiDocValues and their hot methods.
> Which version you exactly on? and how do you know that?
> Thanks
>


Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
That's quite strange:
https://issues.apache.org/jira/browse/SOLR-7730 significantly optimized DV
facets in 5.3.0 precisely by avoiding the FieldInfos merge.
Would you mind providing a deeper caller stack for
org.apache.lucene.index.MultiFields.getMergedFieldInfos()?
Or the time spent in SlowCompositeReaderWrapper, DocValuesFacets,
MultiDocValues and their hot methods.
Which version exactly are you on, and how do you know that?
Thanks


On Tue, Sep 22, 2015 at 2:34 PM, Uwe Reh <re...@hebis.uni-frankfurt.de> wrote:

> here is my try to detect with VirtualVM some hot spots with VirtualVM.
>
> Enviroment:
> A newly started node with ~15 times the query:
>
>>
>> http://yxz/solr/hebis/select/?q=darwin&facet=true&facet.mincount=1&facet.limit=30&facet.field=material_access&facet.field=department_3&facet.field=rvk_facet&facet.field=author_facet&facet.field=material_brief&facet.field=language&facet.prefix=&facet.sort=count&echoParams=all&debugQuery=true
>>
>
> Ordered by self time the top methods are:
>
>> org.eclipse.jetty.util.BlockingArrayQueue.poll()                  260s(self), 260s(total)
>> org.apache.lucene.index.FieldInfos.init()                          90s(self),  90s(total)
>> org.apache.lucene.index.FieldInfos.FieldNumbers.addOrGet()         60s(self),  60s(total)
>> org.apache.lucene.index.FieldInfos.Builder.addOrUpdateInternal()   51s(self), 121s(total)
>> org.apache.lucene.index.FieldInfos.Builder.finish()                13s(self), 102s(total)
>> org.apache.lucene.index.FieldInfos.Builder.fieldInfo()              9s(self),   9s(total)
>> org.apache.lucene.index.FieldInfos.Builder.add()                    4s(self), 126s(total)
>> org.apache.lucene.index.MultiFields.getMergedFieldInfos()           1s(self), 229s(total)
>> ...                                                               less than 1000ms
>>
>
> Ordered by total time the top (non http/jetty) methods are:
>
>> jetty ...                                                           231s(total)
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody() 231s(total)
>> org.apache.solr.request.SimpleFacets.*                              230s(total)
>> org.apache.solr.handler.component.FacetComponent.*                  230s(total)
>> org.apache.lucene.index.*                                           125s(total)
>> org.apache.lucene.search.*                                           .3s(total)
>> ...                                                                 less than 300ms
>>
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Uwe Reh <re...@hebis.uni-frankfurt.de>.
Here is my attempt to find some hot spots with VisualVM.

Environment:
A newly started node, with the following query run ~15 times:
> http://yxz/solr/hebis/select/?q=darwin&facet=true&facet.mincount=1&facet.limit=30&facet.field=material_access&facet.field=department_3&facet.field=rvk_facet&facet.field=author_facet&facet.field=material_brief&facet.field=language&facet.prefix=&facet.sort=count&echoParams=all&debugQuery=true

Ordered by self time the top methods are:
> org.eclipse.jetty.util.BlockingArrayQueue.poll()                  260s(self), 260s(total)
> org.apache.lucene.index.FieldInfos.init()                          90s(self),  90s(total)
> org.apache.lucene.index.FieldInfos.FieldNumbers.addOrGet()         60s(self),  60s(total)
> org.apache.lucene.index.FieldInfos.Builder.addOrUpdateInternal()   51s(self), 121s(total)
> org.apache.lucene.index.FieldInfos.Builder.finish()                13s(self), 102s(total)
> org.apache.lucene.index.FieldInfos.Builder.fieldInfo()              9s(self),   9s(total)
> org.apache.lucene.index.FieldInfos.Builder.add()                    4s(self), 126s(total)
> org.apache.lucene.index.MultiFields.getMergedFieldInfos()           1s(self), 229s(total)
> ...                                                               less than 1000ms

Ordered by total time the top (non http/jetty) methods are:
> jetty ...                                                           231s(total)
> org.apache.solr.handler.component.SearchHandler.handleRequestBody() 231s(total)
> org.apache.solr.request.SimpleFacets.*                              230s(total)
> org.apache.solr.handler.component.FacetComponent.*                  230s(total)
> org.apache.lucene.index.*                                           125s(total)
> org.apache.lucene.search.*                                           .3s(total)
> ...                                                                 less than 300ms
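
A rough cross-check of the two listings above, as a minimal sketch. The numbers are transcribed from the profiler output in this message; the class names follow the stack trace posted elsewhere in the thread, and the "per query" figure assumes exactly 15 runs, which the message only states approximately. The 260s in BlockingArrayQueue.poll() is left out on the assumption that it is idle Jetty pool threads waiting for work.

```python
# Self times (seconds) of the FieldInfos-related methods from the listing.
self_time = {
    "FieldInfos.init": 90,
    "FieldInfos.FieldNumbers.addOrGet": 60,
    "FieldInfos.Builder.addOrUpdateInternal": 51,
    "FieldInfos.Builder.finish": 13,
    "FieldInfos.Builder.fieldInfo": 9,
    "FieldInfos.Builder.add": 4,
    "MultiFields.getMergedFieldInfos": 1,
}
merged_total = 229  # total time of MultiFields.getMergedFieldInfos()
facet_total = 230   # total time of SimpleFacets / FacetComponent
queries = 15        # assumption: the message says "~15 times"

# The self times of the callees should add up to roughly the caller's total.
merge_self_sum = sum(self_time.values())
# Share of all facet time that is spent merging FieldInfos.
merge_share = merged_total / facet_total
# Average facet time per request under the 15-run assumption.
per_query = facet_total / queries

print(f"self time inside the merge: {merge_self_sum}s of {merged_total}s total")
print(f"share of facet time spent merging FieldInfos: {merge_share:.1%}")
print(f"facet time per query: about {per_query:.1f}s")
```

Read this way, nearly all of the facet time goes into rebuilding the merged FieldInfos on each request, which is in the same range as the per-request facet time reported by debugQuery at the start of the thread.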


Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Joel Bernstein <jo...@gmail.com>.
Have you looked at your Solr instance with a cpu profiler like YourKit? It
would be useful to see the hotspots which should be really obvious with 20
second response times.

Also are you running in distributed mode or on a single Solr instance?

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Sep 21, 2015 at 9:42 AM, Uwe Reh <re...@hebis.uni-frankfurt.de> wrote:

> Am 21.09.2015 um 15:16 schrieb Shalin Shekhar Mangar:
>
>> Can you post your complete facet request as well as the schema
>> definition of the field on which you are faceting?
>>
>>
> Query:
>
>>
>> http://yxz/solr/hebis/select/?q=darwin&facet=true&facet.mincount=1&facet.limit=30&facet.field=material_access&facet.field=department_3&facet.field=rvk_facet&facet.field=author_facet&facet.field=material_brief&facet.field=language&facet.prefix=&facet.sort=count&echoParams=all&debugQuery=true
>>
>
>
>
> Schema (with docValue):
>
>> ...
>> <field name="material_access" type="string" indexed="true" stored="false"
>> required="false" multiValued="true" docValues="true" />
>> <field name="author_facet" type="string" indexed="true" stored="false"
>> required="false" multiValued="true" docValues="true" />
>> ...
>> <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
>> ...
>>
>
>
>
> Schema (w/o docValue):
>
>> ...
>> <field name="material_access" type="string" indexed="true" stored="false"
>> required="false" multiValued="true" docValues="true" />
>> <field name="author_facet" type="string" indexed="true" stored="false"
>> required="false" multiValued="true" />
>> ...
>> <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
>> ...
>>
>
>
>
> solrconfig:
>
>> ...
>> <fieldValueCache class="solr.FastLRUCache" size="48" autowarmCount="20"
>> showItems="48" />
>> ...
>> <requestHandler name="/select" class="solr.SearchHandler">
>>       <lst name="defaults">
>>          <int name="rows">10</int>
>>          <str name="df">allfields</str>
>>          <str name="echoParams">none</str>
>>       </lst>
>>       <arr name="components">
>>          <str>query</str>
>>          <str>facet</str>
>>          <str>stats</str>
>>          <str>debug</str>
>>          <str>elevator</str>
>>       </arr>
>>    </requestHandler>
>>
>
>
>

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Uwe,
Unfortunately, fieldValueCache was dropped in this commit:
https://github.com/apache/lucene-solr/commit/fca4c22da81447867533fb28c0f06150cdc2eb9d#diff-5ac9dc7b128b4dd99b764060759222b2R428
However, I see that it's still available in the new JSON facets (so you
would need to amend your app).
Otherwise, you can postpone the migration until 5.4, or apply and measure
the DV facet fix from SOLR-7730 <https://issues.apache.org/jira/browse/SOLR-7730>.
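
As an illustration of what "amending the app" could look like, here is a minimal sketch that rewrites the facet.field parameters of the query posted in this thread as a json.facet parameter. The helper name build_json_facet is hypothetical, the host and core are the placeholders used in the thread, and nothing here has been run against a 5.3 server.

```python
import json
from urllib.parse import urlencode

# Facet fields taken from the query posted in this thread.
FACET_FIELDS = ["material_access", "department_3", "rvk_facet",
                "author_facet", "material_brief", "language"]


def build_json_facet(fields, limit=30, mincount=1):
    """Return a json.facet body with one terms facet per field."""
    return {
        name: {"type": "terms", "field": name,
               "limit": limit, "mincount": mincount, "sort": "count desc"}
        for name in fields
    }


params = {
    "q": "darwin",
    "json.facet": json.dumps(build_json_facet(FACET_FIELDS)),
    "debugQuery": "true",
}
# Placeholder host and core, as in the thread.
url = "http://yxz/solr/hebis/select?" + urlencode(params)
print(url)
```

Note that the JSON Facet API returns its counts under a "facets" key rather than "facet_counts", so the code that reads the response has to change as well.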

On Thu, Sep 24, 2015 at 12:38 PM, Uwe Reh <re...@hebis.uni-frankfurt.de>
wrote:

> Am 22.09.2015 um 18:10 schrieb Walter Underwood:
>
>> Faceting on an author field is almost always a bad idea. Or at least a
>> slow, expensive idea.
>>
>
> Hi Wunder,
> n a technical context, the 'author'-facet may be suboptimal. In our
> businesses (library services) it's a core feature.
> Yes the facet is expensive, but thanks to the fieldValueCache (4.10)
> sufficiently fast.
>
> uwe
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Sun, 2015-09-27 at 14:47 +0200, Uwe Reh wrote:
> Like Walter Underwood wrote, in technical sense faceting on authors 
> isn't a good idea.

In a technical sense, there is no good or bad about faceting on
high-cardinality fields in Solr. The faceting code is fairly efficient
(modulo the newly discovered regression) and scales well with the number
of references and unique terms. It gives the expected performance when
used with high-cardinality fields: Relatively heavy and with substantial
worst-case processing time.

As such, it should be enabled with care and a clear understanding of the
cost. But the same can be said of a great many other features when
building an IT system. Labelling it a good or bad idea only makes sense
when looking at the specific context.

I am being a stickler about this because high-cardinality faceting in
Solr has an undeserved bad rep. Rather than discouraging it, we should
be better at describing the consequences of using it.

> In the worst case, the relation book to author is 
> n:n. Never the less, thanks to authority files (which are intensively 
> used in Germany) the facet 'author' is often helpful.

We have been faceting on Author (10M uniques) since 2007. It helps our
users navigate the corpus. It is a good idea for us.

We tried faceting on 6 billion uniques/machine as default in our Net
Archive (custom hack). It raised our non-pathological 75th percentile to
2½ seconds, with little value for the researchers. It was a bad idea for
us.

- Toke Eskildsen, State and University Library, Denmark



Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Uwe Reh <re...@hebis.uni-frankfurt.de>.
Hi Mikhail,

Is this what you requested?
> lookups: 34084
> hits: 34067
> hitratio: 1
> inserts: 34
> evictions: 0
> ...
> item_author_facet: {field=author_facet,memSize=104189615,tindexSize=789195,time=16901,phase1=16534,nTerms=3989851,bigTerms=0,termInstances=16214154,uses=4065}
> item_topic_facet: {field=topic_facet,memSize=103817915,tindexSize=112199,time=8912,phase1=8496,nTerms=525261,bigTerms=0,termInstances=11050466,uses=1510}
> item_material_access: {field=material_access,memSize=4532,tindexSize=46,time=1820,phase1=1820,nTerms=2,bigTerms=2,termInstances=0,uses=3406}
(The fields 'author_facet' and 'topic_facet' have a lot of unique 
entries. 'material_access' has only two values, 'online' vs. 'print'.)
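
To put these figures in perspective, a small sketch that parses the author_facet item and derives a few ratios. The parse_stats helper is hypothetical; the interpretation of 'time' as milliseconds is an assumption about the UnInvertedField log format, not something stated in this thread.

```python
def parse_stats(item: str) -> dict:
    """Parse a '{key=value,...}' cache item into a dict, converting integers."""
    body = item.strip().strip("{}")
    stats = {}
    for pair in body.split(","):
        key, _, value = pair.partition("=")
        stats[key] = int(value) if value.isdigit() else value
    return stats


author = parse_stats(
    "{field=author_facet,memSize=104189615,tindexSize=789195,time=16901,"
    "phase1=16534,nTerms=3989851,bigTerms=0,termInstances=16214154,uses=4065}"
)

# Cache counters copied from the listing above.
hit_ratio = 34067 / 34084
# Size of the uninverted structure in MiB.
mem_mib = author["memSize"] / 2**20
# Average number of term instances per unique term.
per_term = author["termInstances"] / author["nTerms"]

print(f"hit ratio: {hit_ratio:.4f}")
print(f"author_facet memory: {mem_mib:.1f} MiB")
print(f"term instances per unique term: {per_term:.2f}")
print(f"uninvert time (assumed ms): {author['time']}, reused {author['uses']} times")
```

If 'time' is indeed milliseconds, uninverting author_facet cost roughly 17 seconds once on 4.10 and was then served from the cache several thousand times.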

Apart from "*:*", queries with more than maxdoc/2 hits happen very, very 
rarely. Typical requests match less than 1% of maxdoc.

Here is a typical example, searching for "Goethe" in the portfolio of the 
University Library Frankfurt/Main:
 > https://hds.hebis.de/ubffm/Search/Results?lookfor=goethe&search=new
The request yields over 31,000 results (~0.2% of maxdoc). The majority 
are books about Goethe; 'just' 5700 books are by him. The facet helps 
to detect professionals.

As Walter Underwood wrote, in a technical sense faceting on authors 
isn't a good idea. In the worst case, the relation of book to author is 
n:n. Nevertheless, thanks to authority files (which are used intensively 
in Germany) the 'author' facet is often helpful.

Uwe


Am 26.09.2015 um 14:08 schrieb Mikhail Khludnev:
> Uwe,
> Would you mind to provide a few details about your case?
> I wonder about number of bigterms and other stats as well at 'author' field
> (ant other most expensive facets). It looks like log rows:
>
> Sep 13, 2011 2:51:53 PM org.apache.solr.request.UnInvertedField uninvert
> INFO: UnInverted multi-valued field
> {*field=nomejornal*,memSize=827108,tindexSize=40,time=16,phase1=4,*nTerms=15,bigTerms=0*,termInstances=750,uses=0}
>
> Those heavy requests, do they find more than half of docs, eg hits>maxdoc/2 ?
>
>
> Thanks for your input!
>
>
> On Thu, Sep 24, 2015 at 11:38 AM, Uwe Reh <re...@hebis.uni-frankfurt.de>
> wrote:
>
>> Am 22.09.2015 um 18:10 schrieb Walter Underwood:
>>
>>> Faceting on an author field is almost always a bad idea. Or at least a
>>> slow, expensive idea.
>>>
>>
>> Hi Wunder,
>> n a technical context, the 'author'-facet may be suboptimal. In our
>> businesses (library services) it's a core feature.
>> Yes the facet is expensive, but thanks to the fieldValueCache (4.10)
>> sufficiently fast.
>>
>> uwe
>>
>>
>
>


Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Uwe,
Would you mind providing a few details about your case?
I'm wondering about the number of bigTerms and other stats for the 'author'
field (and the other most expensive facets). They look like these log rows:

Sep 13, 2011 2:51:53 PM org.apache.solr.request.UnInvertedField uninvert
INFO: UnInverted multi-valued field
{*field=nomejornal*,memSize=827108,tindexSize=40,time=16,phase1=4,*nTerms=15,bigTerms=0*,termInstances=750,uses=0}

Do those heavy requests match more than half of the docs, e.g. hits > maxdoc/2?


Thanks for your input!


On Thu, Sep 24, 2015 at 11:38 AM, Uwe Reh <re...@hebis.uni-frankfurt.de>
wrote:

> Am 22.09.2015 um 18:10 schrieb Walter Underwood:
>
>> Faceting on an author field is almost always a bad idea. Or at least a
>> slow, expensive idea.
>>
>
> Hi Wunder,
> n a technical context, the 'author'-facet may be suboptimal. In our
> businesses (library services) it's a core feature.
> Yes the facet is expensive, but thanks to the fieldValueCache (4.10)
> sufficiently fast.
>
> uwe
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Uwe Reh <re...@hebis.uni-frankfurt.de>.
Am 22.09.2015 um 18:10 schrieb Walter Underwood:
> Faceting on an author field is almost always a bad idea. Or at least a slow, expensive idea.

Hi Wunder,
In a technical context, the 'author' facet may be suboptimal. In our 
business (library services) it's a core feature.
Yes, the facet is expensive, but thanks to the fieldValueCache (4.10) it 
is sufficiently fast.

uwe


Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Walter Underwood <wu...@wunderwood.org>.
Faceting on an author field is almost always a bad idea. Or at least a slow, expensive idea.

Faceting makes big in-memory lists. More values, bigger lists. An author field usually has many, many values, so you will need a lot of memory.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Sep 21, 2015, at 6:42 AM, Uwe Reh <re...@hebis.uni-frankfurt.de> wrote:
> 
> Am 21.09.2015 um 15:16 schrieb Shalin Shekhar Mangar:
>> Can you post your complete facet request as well as the schema
>> definition of the field on which you are faceting?
>> 
> 
> Query:
>> http://yxz/solr/hebis/select/?q=darwin&facet=true&facet.mincount=1&facet.limit=30&facet.field=material_access&facet.field=department_3&facet.field=rvk_facet&facet.field=author_facet&facet.field=material_brief&facet.field=language&facet.prefix=&facet.sort=count&echoParams=all&debugQuery=true
> 
> 
> 
> Schema (with docValue):
>> ...
>> <field name="material_access" type="string" indexed="true" stored="false" required="false" multiValued="true" docValues="true" />
>> <field name="author_facet" type="string" indexed="true" stored="false" required="false" multiValued="true" docValues="true" />
>> ...
>> <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
>> ...
> 
> 
> 
> Schema (w/o docValue):
>> ...
>> <field name="material_access" type="string" indexed="true" stored="false" required="false" multiValued="true" docValues="true" />
>> <field name="author_facet" type="string" indexed="true" stored="false" required="false" multiValued="true" />
>> ...
>> <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
>> ...
> 
> 
> 
> solrconfig:
>> ...
>> <fieldValueCache class="solr.FastLRUCache" size="48" autowarmCount="20" showItems="48" />
>> ...
>> <requestHandler name="/select" class="solr.SearchHandler">
>>      <lst name="defaults">
>>         <int name="rows">10</int>
>>         <str name="df">allfields</str>
>>         <str name="echoParams">none</str>
>>      </lst>
>>      <arr name="components">
>>         <str>query</str>
>>         <str>facet</str>
>>         <str>stats</str>
>>         <str>debug</str>
>>         <str>elevator</str>
>>      </arr>
>>   </requestHandler>
> 
> 


Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Uwe Reh <re...@hebis.uni-frankfurt.de>.
Am 21.09.2015 um 15:16 schrieb Shalin Shekhar Mangar:
> Can you post your complete facet request as well as the schema
> definition of the field on which you are faceting?
>

Query:
> http://yxz/solr/hebis/select/?q=darwin&facet=true&facet.mincount=1&facet.limit=30&facet.field=material_access&facet.field=department_3&facet.field=rvk_facet&facet.field=author_facet&facet.field=material_brief&facet.field=language&facet.prefix=&facet.sort=count&echoParams=all&debugQuery=true



Schema (with docValue):
> ...
> <field name="material_access" type="string" indexed="true" stored="false" required="false" multiValued="true" docValues="true" />
> <field name="author_facet" type="string" indexed="true" stored="false" required="false" multiValued="true" docValues="true" />
> ...
> <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
> ...



Schema (w/o docValue):
> ...
> <field name="material_access" type="string" indexed="true" stored="false" required="false" multiValued="true" docValues="true" />
> <field name="author_facet" type="string" indexed="true" stored="false" required="false" multiValued="true" />
> ...
> <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
> ...



solrconfig:
> ...
> <fieldValueCache class="solr.FastLRUCache" size="48" autowarmCount="20" showItems="48" />
> ...
> <requestHandler name="/select" class="solr.SearchHandler">
>       <lst name="defaults">
>          <int name="rows">10</int>
>          <str name="df">allfields</str>
>          <str name="echoParams">none</str>
>       </lst>
>       <arr name="components">
>          <str>query</str>
>          <str>facet</str>
>          <str>stats</str>
>          <str>debug</str>
>          <str>elevator</str>
>       </arr>
>    </requestHandler>



Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Can you post your complete facet request as well as the schema
definition of the field on which you are faceting?

On Mon, Sep 21, 2015 at 5:39 PM, Uwe Reh <re...@hebis.uni-frankfurt.de> wrote:
> Hi,
>
> our bibliographic index (~20M entries) runs fine with Solr 4.10.3
> With Solr 5.3 faceted searching is constantly incredibly slow (~ 20 seconds)
>>
>> Output of 'debugQuery':
>> <lst name="process"><double name="time">17705.0</double>
>> <lst name="query"><double name="time">2.0</double></lst>
>> <lst name="facet"><double name="time">17590.0</double></lst> !!!!!!
>> <lst name="debug"><double name="time">111.0</double></lst>
>
>
> The 'fieldValueCache' seems to be unused (no inserts nor lookups) in Solr
> 5.3. In Solr 4.10 the 'fieldValueCache' is in heavy use with a
> cumulative_hitratio of 1.
>
> - the behavior is the same, running Solr5.3 on a copy of the old index
> (luceneMatch=4.6) or a newly build index
> - using 'facet.method=enum' makes no remarkable difference
> - declaring 'docValues' (with reindexing) makes no remarkable difference
> - 'softCommit' isn't used
>
> My enviroment is
>   OS: Solaris 5.11 on AMD64
>   JDK: 1.8.0_25 and 1.8.0_60 (same behavior)
>   JavaOpts: -Xmx 10g -XX:+UseG1GC -XX:+AggressiveOpts -XX:+UseLargePages
> -XX:LargePageSizeInBytes=2m
>
> Any help/advice is welcome
> Uwe



-- 
Regards,
Shalin Shekhar Mangar.

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
This fix definitely helps facet.field over docValues fields on
multi-segment indexes since 5.4.
I suppose it's irrelevant to JSON Facets, non-dv fields, and pre-5.4 versions.
I cannot comment on the relative performance of dv and non-dv fields,
because "it depends" (c): benchmarking and a profiler are the only advisers.

On Thu, Dec 17, 2015 at 9:22 AM, William Bell <bi...@gmail.com> wrote:

> Same question here....
>
> Wondering if faceting performance is fixed and how to take advantage of it
> ?
>
> On Wed, Dec 16, 2015 at 2:57 AM, Vincenzo D'Amore <v....@gmail.com>
> wrote:
>
> > Hi all,
> >
> > given that solr 5.4 is finally released, is this what's more stable and
> > efficient version of solrcloud ?
> >
> > I have a website which receives many search requests. It serve normally
> > about 2000 concurrent requests, but sometime there are peak from 4000 to
> > 10000 requests in few seconds.
> >
> > On January I'll have a chance to upgrade my old SolrCloud 4.8.1 cluster
> to
> > a new brand version, but following this thread I read about the problems
> > that can occur upgrading to latest version.
> >
> > I have seen that issue SOLR-7730 "speed-up faceting on doc values fields"
> > is fixed in 5.4.
> >
> > I'm using standard faceting without docValues. Should I add docValues in
> > order to benefit of such fix?
> >
> > Best regards,
> > Vincenzo
> >
> >
> >
> > On Thu, Oct 8, 2015 at 2:22 PM, Mikhail Khludnev <
> > mkhludnev@griddynamics.com
> > > wrote:
> >
> > > Uwe, it's good to know! I mean that you've recovered. Take care!
> > >
> > > On Thu, Oct 8, 2015 at 1:24 PM, Uwe Reh <re...@hebis.uni-frankfurt.de>
> > > wrote:
> > >
> > > > Sorry for the delay. I had an ugly flu.
> > > >
> > > > SOLR-7730 seems to work fine. Using docValues with Solr
> > > > 5.4.0-2015-09-29_08-29-55 1705813 makes my faceted queries fast
> again.
> > > > (90ms vs. 20000ms) :-)
> > > >
> > > > Thanks
> > > > Uwe
> > > >
> > > >
> > > >
> > > >
> > > > Am 27.09.2015 um 20:32 schrieb Mikhail Khludnev:
> > > >
> > > >> On Sun, Sep 27, 2015 at 2:00 PM, Uwe Reh <
> reh@hebis.uni-frankfurt.de>
> > > >> wrote:
> > > >>
> > > >> When 5.4 with SOLR-7730 will be released, I will start to use
> > docValues.
> > > >>> Going this way, seems more straight forward to me.
> > > >>>
> > > >>
> > > >>
> > > >> Sure. Giving your answers docValues facets has a really good chance
> to
> > > >> perform in your index after SOLR-7730. It's really interesting to
> see
> > > >> performance numbers on early 5.4 builds:
> > > >>
> > > >>
> > >
> >
> https://builds.apache.org/view/All/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/
> > > >>
> > > >>
> > > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > > Principal Engineer,
> > > Grid Dynamics
> > >
> > > <http://www.griddynamics.com>
> > > <mk...@griddynamics.com>
> > >
> >
> >
> >
> > --
> > Vincenzo D'Amore
> > email: v.damore@gmail.com
> > skype: free.dev
> > mobile: +39 349 8513251
> >
>
>
>
> --
> Bill Bell
> billnbell@gmail.com
> cell 720-256-8076
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by William Bell <bi...@gmail.com>.
Same question here.

I'm wondering whether faceting performance is fixed and how to take advantage of it.

On Wed, Dec 16, 2015 at 2:57 AM, Vincenzo D'Amore <v....@gmail.com>
wrote:

> Hi all,
>
> given that solr 5.4 is finally released, is this what's more stable and
> efficient version of solrcloud ?
>
> I have a website which receives many search requests. It serve normally
> about 2000 concurrent requests, but sometime there are peak from 4000 to
> 10000 requests in few seconds.
>
> On January I'll have a chance to upgrade my old SolrCloud 4.8.1 cluster to
> a new brand version, but following this thread I read about the problems
> that can occur upgrading to latest version.
>
> I have seen that issue SOLR-7730 "speed-up faceting on doc values fields"
> is fixed in 5.4.
>
> I'm using standard faceting without docValues. Should I add docValues in
> order to benefit of such fix?
>
> Best regards,
> Vincenzo
>
>
>
> On Thu, Oct 8, 2015 at 2:22 PM, Mikhail Khludnev <
> mkhludnev@griddynamics.com
> > wrote:
>
> > Uwe, it's good to know! I mean that you've recovered. Take care!
> >
> > On Thu, Oct 8, 2015 at 1:24 PM, Uwe Reh <re...@hebis.uni-frankfurt.de>
> > wrote:
> >
> > > Sorry for the delay. I had an ugly flu.
> > >
> > > SOLR-7730 seems to work fine. Using docValues with Solr
> > > 5.4.0-2015-09-29_08-29-55 1705813 makes my faceted queries fast again.
> > > (90ms vs. 20000ms) :-)
> > >
> > > Thanks
> > > Uwe
> > >
> > >
> > >
> > >
> > > Am 27.09.2015 um 20:32 schrieb Mikhail Khludnev:
> > >
> > >> On Sun, Sep 27, 2015 at 2:00 PM, Uwe Reh <re...@hebis.uni-frankfurt.de>
> > >> wrote:
> > >>
> > >> When 5.4 with SOLR-7730 will be released, I will start to use
> docValues.
> > >>> Going this way, seems more straight forward to me.
> > >>>
> > >>
> > >>
> > >> Sure. Giving your answers docValues facets has a really good chance to
> > >> perform in your index after SOLR-7730. It's really interesting to see
> > >> performance numbers on early 5.4 builds:
> > >>
> > >>
> >
> https://builds.apache.org/view/All/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/
> > >>
> > >>
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > <http://www.griddynamics.com>
> > <mk...@griddynamics.com>
> >
>
>
>
> --
> Vincenzo D'Amore
> email: v.damore@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
>



-- 
Bill Bell
billnbell@gmail.com
cell 720-256-8076

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Jamie Johnson <je...@gmail.com>.
Also can we get the capability to choose the method of faceting in the
older faceting component?  I'm not looking for complete feature parity just
the ability to specify the method.  As always thanks.

On Fri, Dec 18, 2015 at 8:04 AM, Jamie Johnson <je...@gmail.com> wrote:

> Can we still specify the cache implementation for the field cache?  When
> this change occurred to faceting (uninverting reader vs field ) it
> prevented us from moving to 5.x but if we can get the 4.x functionality
> using that api we could look to port to the latest.
>
> Jamie
> On Dec 17, 2015 9:18 AM, "Yonik Seeley" <ys...@gmail.com> wrote:
>
>> On Wed, Dec 16, 2015 at 4:57 AM, Vincenzo D'Amore <v....@gmail.com>
>> wrote:
>> > Hi all,
>> >
>> > given that solr 5.4 is finally released, is this what's more stable and
>> > efficient version of solrcloud ?
>> >
>> > I have a website which receives many search requests. It serve normally
>> > about 2000 concurrent requests, but sometime there are peak from 4000 to
>> > 10000 requests in few seconds.
>> >
>> > On January I'll have a chance to upgrade my old SolrCloud 4.8.1 cluster
>> to
>> > a new brand version, but following this thread I read about the problems
>> > that can occur upgrading to latest version.
>> >
>> > I have seen that issue SOLR-7730 "speed-up faceting on doc values
>> fields"
>> > is fixed in 5.4.
>> >
>> > I'm using standard faceting without docValues. Should I add docValues in
>> > order to benefit of such fix?
>>
>> You'll have to try it I think...
>> DocValues have a lot of advantages (much less heap consumption, and
>> much smaller overhead when opening a new searcher), but they can often
>> be slower as well.
>>
>> Comparing 4x to 5x non-docvalues, top-level field caches were removed
>> by lucene, and while that benefits certain things like NRT (opening a
>> new searcher very often), it will hurt performance for other
>> configurations.
>>
>> The JSON Facet API currently allows you to pick your strategy via the
>> "method" param for multi-valued string fields without docvalues:
>> "uif" (UninvertedField) gets you the top-level strategy from Solr 4,
>> while "dv" (DocValues built on-the-fly) gets you the NRT-friendly
>> "per-segment" strategy.
>>
>> -Yonik
>>
>

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Jamie Johnson <je...@gmail.com>.
Can we still specify the cache implementation for the field cache?  When
this change occurred to faceting (uninverting reader vs field ) it
prevented us from moving to 5.x but if we can get the 4.x functionality
using that api we could look to port to the latest.

Jamie
On Dec 17, 2015 9:18 AM, "Yonik Seeley" <ys...@gmail.com> wrote:

> On Wed, Dec 16, 2015 at 4:57 AM, Vincenzo D'Amore <v....@gmail.com>
> wrote:
> > Hi all,
> >
> > given that solr 5.4 is finally released, is this what's more stable and
> > efficient version of solrcloud ?
> >
> > I have a website which receives many search requests. It serve normally
> > about 2000 concurrent requests, but sometime there are peak from 4000 to
> > 10000 requests in few seconds.
> >
> > On January I'll have a chance to upgrade my old SolrCloud 4.8.1 cluster
> to
> > a new brand version, but following this thread I read about the problems
> > that can occur upgrading to latest version.
> >
> > I have seen that issue SOLR-7730 "speed-up faceting on doc values fields"
> > is fixed in 5.4.
> >
> > I'm using standard faceting without docValues. Should I add docValues in
> > order to benefit of such fix?
>
> You'll have to try it I think...
> DocValues have a lot of advantages (much less heap consumption, and
> much smaller overhead when opening a new searcher), but they can often
> be slower as well.
>
> Comparing 4x to 5x non-docvalues, top-level field caches were removed
> by lucene, and while that benefits certain things like NRT (opening a
> new searcher very often), it will hurt performance for other
> configurations.
>
> The JSON Facet API currently allows you to pick your strategy via the
> "method" param for multi-valued string fields without docvalues:
> "uif" (UninvertedField) gets you the top-level strategy from Solr 4,
> while "dv" (DocValues built on-the-fly) gets you the NRT-friendly
> "per-segment" strategy.
>
> -Yonik
>

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by William Bell <bi...@gmail.com>.
Thanks Jamie.

On Sat, Dec 19, 2015 at 11:31 PM, Jamie Johnson <je...@gmail.com> wrote:

> Bill,
>
> Check out the patch attached to
> https://issues.apache.org/jira/browse/SOLR-8096.  I had considered making
> the method uif after I had done most of the work, it would be trivial to
> change and would probably be more aligned with not adding unexpected
> changes to people that are currently using fc.
>
> -Jamie
>
> On Sat, Dec 19, 2015 at 11:03 PM, William Bell <bi...@gmail.com>
> wrote:
>
> > Can we add method=uif back when not using the JSON Facet API too?
> >
> > That would help a lot of people.
> >
> > On Thu, Dec 17, 2015 at 7:17 AM, Yonik Seeley <ys...@gmail.com> wrote:
> >
> > > On Wed, Dec 16, 2015 at 4:57 AM, Vincenzo D'Amore <v....@gmail.com>
> > > wrote:
> > > > Hi all,
> > > >
> > > > given that solr 5.4 is finally released, is this what's more stable
> and
> > > > efficient version of solrcloud ?
> > > >
> > > > I have a website which receives many search requests. It serve
> normally
> > > > about 2000 concurrent requests, but sometime there are peak from 4000
> > to
> > > > 10000 requests in few seconds.
> > > >
> > > > On January I'll have a chance to upgrade my old SolrCloud 4.8.1
> cluster
> > > to
> > > > a new brand version, but following this thread I read about the
> > problems
> > > > that can occur upgrading to latest version.
> > > >
> > > > I have seen that issue SOLR-7730 "speed-up faceting on doc values
> > fields"
> > > > is fixed in 5.4.
> > > >
> > > > I'm using standard faceting without docValues. Should I add docValues
> > in
> > > > order to benefit of such fix?
> > >
> > > You'll have to try it I think...
> > > DocValues have a lot of advantages (much less heap consumption, and
> > > much smaller overhead when opening a new searcher), but they can often
> > > be slower as well.
> > >
> > > Comparing 4x to 5x non-docvalues, top-level field caches were removed
> > > by lucene, and while that benefits certain things like NRT (opening a
> > > new searcher very often), it will hurt performance for other
> > > configurations.
> > >
> > > The JSON Facet API currently allows you to pick your strategy via the
> > > "method" param for multi-valued string fields without docvalues:
> > > "uif" (UninvertedField) gets you the top-level strategy from Solr 4,
> > > while "dv" (DocValues built on-the-fly) gets you the NRT-friendly
> > > "per-segment" strategy.
> > >
> > > -Yonik
> > >
> >
> >
> >
> > --
> > Bill Bell
> > billnbell@gmail.com
> > cell 720-256-8076
> >
>



-- 
Bill Bell
billnbell@gmail.com
cell 720-256-8076

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Jamie Johnson <je...@gmail.com>.
Bill,

Check out the patch attached to
https://issues.apache.org/jira/browse/SOLR-8096.  I had considered making
the method uif after I had done most of the work. It would be trivial to
change, and it would probably be more in line with not introducing
unexpected changes for people who are currently using fc.

-Jamie

On Sat, Dec 19, 2015 at 11:03 PM, William Bell <bi...@gmail.com> wrote:

> Can we add method=uif back when not using the JSON Facet API too?
>
> That would help a lot of people.
>
> On Thu, Dec 17, 2015 at 7:17 AM, Yonik Seeley <ys...@gmail.com> wrote:
>
> > On Wed, Dec 16, 2015 at 4:57 AM, Vincenzo D'Amore <v....@gmail.com>
> > wrote:
> > > Hi all,
> > >
> > > given that solr 5.4 is finally released, is this what's more stable and
> > > efficient version of solrcloud ?
> > >
> > > I have a website which receives many search requests. It serve normally
> > > about 2000 concurrent requests, but sometime there are peak from 4000
> to
> > > 10000 requests in few seconds.
> > >
> > > On January I'll have a chance to upgrade my old SolrCloud 4.8.1 cluster
> > to
> > > a new brand version, but following this thread I read about the
> problems
> > > that can occur upgrading to latest version.
> > >
> > > I have seen that issue SOLR-7730 "speed-up faceting on doc values
> fields"
> > > is fixed in 5.4.
> > >
> > > I'm using standard faceting without docValues. Should I add docValues
> in
> > > order to benefit of such fix?
> >
> > You'll have to try it I think...
> > DocValues have a lot of advantages (much less heap consumption, and
> > much smaller overhead when opening a new searcher), but they can often
> > be slower as well.
> >
> > Comparing 4x to 5x non-docvalues, top-level field caches were removed
> > by lucene, and while that benefits certain things like NRT (opening a
> > new searcher very often), it will hurt performance for other
> > configurations.
> >
> > The JSON Facet API currently allows you to pick your strategy via the
> > "method" param for multi-valued string fields without docvalues:
> > "uif" (UninvertedField) gets you the top-level strategy from Solr 4,
> > while "dv" (DocValues built on-the-fly) gets you the NRT-friendly
> > "per-segment" strategy.
> >
> > -Yonik
> >
>
>
>
> --
> Bill Bell
> billnbell@gmail.com
> cell 720-256-8076
>

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by William Bell <bi...@gmail.com>.
Can we add method=uif back when not using the JSON Facet API too?

That would help a lot of people.

On Thu, Dec 17, 2015 at 7:17 AM, Yonik Seeley <ys...@gmail.com> wrote:

> On Wed, Dec 16, 2015 at 4:57 AM, Vincenzo D'Amore <v....@gmail.com>
> wrote:
> > Hi all,
> >
> > given that solr 5.4 is finally released, is this what's more stable and
> > efficient version of solrcloud ?
> >
> > I have a website which receives many search requests. It serve normally
> > about 2000 concurrent requests, but sometime there are peak from 4000 to
> > 10000 requests in few seconds.
> >
> > On January I'll have a chance to upgrade my old SolrCloud 4.8.1 cluster
> to
> > a new brand version, but following this thread I read about the problems
> > that can occur upgrading to latest version.
> >
> > I have seen that issue SOLR-7730 "speed-up faceting on doc values fields"
> > is fixed in 5.4.
> >
> > I'm using standard faceting without docValues. Should I add docValues in
> > order to benefit of such fix?
>
> You'll have to try it I think...
> DocValues have a lot of advantages (much less heap consumption, and
> much smaller overhead when opening a new searcher), but they can often
> be slower as well.
>
> Comparing 4x to 5x non-docvalues, top-level field caches were removed
> by lucene, and while that benefits certain things like NRT (opening a
> new searcher very often), it will hurt performance for other
> configurations.
>
> The JSON Facet API currently allows you to pick your strategy via the
> "method" param for multi-valued string fields without docvalues:
> "uif" (UninvertedField) gets you the top-level strategy from Solr 4,
> while "dv" (DocValues built on-the-fly) gets you the NRT-friendly
> "per-segment" strategy.
>
> -Yonik
>



-- 
Bill Bell
billnbell@gmail.com
cell 720-256-8076

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Yonik Seeley <ys...@gmail.com>.
On Wed, Dec 16, 2015 at 4:57 AM, Vincenzo D'Amore <v....@gmail.com> wrote:
> Hi all,
>
> given that solr 5.4 is finally released, is this what's more stable and
> efficient version of solrcloud ?
>
> I have a website which receives many search requests. It serve normally
> about 2000 concurrent requests, but sometime there are peak from 4000 to
> 10000 requests in few seconds.
>
> On January I'll have a chance to upgrade my old SolrCloud 4.8.1 cluster to
> a new brand version, but following this thread I read about the problems
> that can occur upgrading to latest version.
>
> I have seen that issue SOLR-7730 "speed-up faceting on doc values fields"
> is fixed in 5.4.
>
> I'm using standard faceting without docValues. Should I add docValues in
> order to benefit of such fix?

You'll have to try it I think...
DocValues have a lot of advantages (much less heap consumption, and
much smaller overhead when opening a new searcher), but they can often
be slower as well.

Comparing 4x to 5x non-docvalues, top-level field caches were removed
by lucene, and while that benefits certain things like NRT (opening a
new searcher very often), it will hurt performance for other
configurations.

The JSON Facet API currently allows you to pick your strategy via the
"method" param for multi-valued string fields without docvalues:
"uif" (UninvertedField) gets you the top-level strategy from Solr 4,
while "dv" (DocValues built on-the-fly) gets you the NRT-friendly
"per-segment" strategy.
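
A minimal sketch of how the two strategies named above might be selected per facet, assuming the "method" key is accepted inside a JSON Facet API terms facet as described. The helper names terms_facet and build_body are hypothetical; the field names are borrowed from the curl example earlier in this thread. The code only builds the form body and does not contact a Solr server, so whether "method" is honoured still has to be checked against the Solr version in use.

```python
import json
from urllib.parse import urlencode

def terms_facet(field, method=None, limit=30):
    """Build one 'terms' facet; the 'method' key is left out unless given."""
    facet = {"type": "terms", "field": field, "limit": limit}
    if method is not None:
        # Only the two values mentioned in this thread are accepted here.
        if method not in ("uif", "dv"):
            raise ValueError("this sketch only knows 'uif' and 'dv'")
        facet["method"] = method
    return facet

def build_body(query, facets):
    """URL-encode q and json.facet as a form body for the /query handler."""
    return urlencode({"q": query, "json.facet": json.dumps(facets, sort_keys=True)})

facets = {
    "authors": terms_facet("author_facet", method="uif"),  # top-level, Solr 4 style
    "lang": terms_facet("language", method="dv"),          # per-segment, NRT-friendly
}
body = build_body("darwin", facets)
print(body)
```

The resulting string could be passed to curl with -d, in the same way as the json.facet example earlier in the thread.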

-Yonik

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Vincenzo D'Amore <v....@gmail.com>.
Hi all,

Given that Solr 5.4 is finally released, is it now the most stable and
efficient version of SolrCloud?

I have a website which receives many search requests. It normally serves
about 2000 concurrent requests, but sometimes there are peaks of 4000 to
10000 requests within a few seconds.

In January I'll have a chance to upgrade my old SolrCloud 4.8.1 cluster to
a brand-new version, but following this thread I read about the problems
that can occur when upgrading to the latest version.

I have seen that issue SOLR-7730 "speed-up faceting on doc values fields"
is fixed in 5.4.

I'm using standard faceting without docValues. Should I add docValues in
order to benefit from this fix?

Best regards,
Vincenzo



On Thu, Oct 8, 2015 at 2:22 PM, Mikhail Khludnev <mkhludnev@griddynamics.com
> wrote:

> Uwe, it's good to know! I mean that you've recovered. Take care!
>
> On Thu, Oct 8, 2015 at 1:24 PM, Uwe Reh <re...@hebis.uni-frankfurt.de>
> wrote:
>
> > Sorry for the delay. I had an ugly flu.
> >
> > SOLR-7730 seems to work fine. Using docValues with Solr
> > 5.4.0-2015-09-29_08-29-55 1705813 makes my faceted queries fast again.
> > (90ms vs. 20000ms) :-)
> >
> > Thanks
> > Uwe
> >
> >
> >
> >
> > Am 27.09.2015 um 20:32 schrieb Mikhail Khludnev:
> >
> >> On Sun, Sep 27, 2015 at 2:00 PM, Uwe Reh <re...@hebis.uni-frankfurt.de>
> >> wrote:
> >>
> >> When 5.4 with SOLR-7730 will be released, I will start to use docValues.
> >>> Going this way, seems more straight forward to me.
> >>>
> >>
> >>
> >> Sure. Giving your answers docValues facets has a really good chance to
> >> perform in your index after SOLR-7730. It's really interesting to see
> >> performance numbers on early 5.4 builds:
> >>
> >>
> https://builds.apache.org/view/All/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/
> >>
> >>
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> <mk...@griddynamics.com>
>



-- 
Vincenzo D'Amore
email: v.damore@gmail.com
skype: free.dev
mobile: +39 349 8513251

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Uwe, it's good to know! I mean that you've recovered. Take care!

On Thu, Oct 8, 2015 at 1:24 PM, Uwe Reh <re...@hebis.uni-frankfurt.de> wrote:

> Sorry for the delay. I had an ugly flu.
>
> SOLR-7730 seems to work fine. Using docValues with Solr
> 5.4.0-2015-09-29_08-29-55 1705813 makes my faceted queries fast again.
> (90ms vs. 20000ms) :-)
>
> Thanks
> Uwe
>
>
>
>
> Am 27.09.2015 um 20:32 schrieb Mikhail Khludnev:
>
>> On Sun, Sep 27, 2015 at 2:00 PM, Uwe Reh <re...@hebis.uni-frankfurt.de>
>> wrote:
>>
>> When 5.4 with SOLR-7730 will be released, I will start to use docValues.
>>> Going this way, seems more straight forward to me.
>>>
>>
>>
>> Sure. Giving your answers docValues facets has a really good chance to
>> perform in your index after SOLR-7730. It's really interesting to see
>> performance numbers on early 5.4 builds:
>>
>> https://builds.apache.org/view/All/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/
>>
>>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Uwe Reh <re...@hebis.uni-frankfurt.de>.
Sorry for the delay. I had an ugly flu.

SOLR-7730 seems to work fine. Using docValues with Solr 
5.4.0-2015-09-29_08-29-55 1705813 makes my faceted queries fast again. 
(90ms vs. 20000ms) :-)

Thanks
Uwe



Am 27.09.2015 um 20:32 schrieb Mikhail Khludnev:
> On Sun, Sep 27, 2015 at 2:00 PM, Uwe Reh <re...@hebis.uni-frankfurt.de> wrote:
>
>> When 5.4 with SOLR-7730 will be released, I will start to use docValues.
>> Going this way, seems more straight forward to me.
>
>
> Sure. Giving your answers docValues facets has a really good chance to
> perform in your index after SOLR-7730. It's really interesting to see
> performance numbers on early 5.4 builds:
> https://builds.apache.org/view/All/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/
>


Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
On Sun, Sep 27, 2015 at 2:00 PM, Uwe Reh <re...@hebis.uni-frankfurt.de> wrote:

> When 5.4 with SOLR-7730 will be released, I will start to use docValues.
> Going this way, seems more straight forward to me.


Sure. Given your answers, docValues facets have a really good chance of
performing well on your index after SOLR-7730. It would be really
interesting to see performance numbers on early 5.4 builds:
https://builds.apache.org/view/All/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/

-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Uwe Reh <re...@hebis.uni-frankfurt.de>.
Hi Mikhail,

thanks for the hint, and "no", it wasn't obvious to me. :-)
But I think it's better for us to remain on 4.10.3 and observe the
evolution of SOLR-8096. When 5.4 with SOLR-7730 is released, I will
start to use docValues. Going this way seems more straightforward to me.

Uwe

Am 27.09.2015 um 00:20 schrieb Mikhail Khludnev:
> Uwe,
>
> As a workaround, can you add facet.threads=Ncores to count fields in
> parallel?
> Also, setting fcs method for single value fields runs per segment faceting
> in parallel.
> Of course, fields which has small number of terms are beneficial from enum
> method.
> Excuse me if it's obvious.
> https://cwiki.apache.org/confluence/display/solr/Faceting
>


Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Uwe,

As a workaround, can you add facet.threads=Ncores to count fields in
parallel?
Also, setting the fcs method for single-valued fields runs per-segment
faceting in parallel.
Of course, fields which have a small number of terms benefit from the enum
method.
Excuse me if it's obvious.
https://cwiki.apache.org/confluence/display/solr/Faceting
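
A small sketch of what such a request might look like with the classic faceting parameters (facet.threads and the per-field f.<field>.facet.method override). The helper name facet_params is hypothetical, the thread count of 4 stands in for "Ncores", and which fields are single-valued or low-cardinality is an assumption made only for illustration. The code just builds the query string and sends nothing.

```python
from urllib.parse import urlencode

def facet_params(query, fields, threads=None, methods=None):
    """Build a classic (non-JSON) faceting query string."""
    params = [("q", query), ("facet", "true")]
    for field in fields:
        params.append(("facet.field", field))
    if threads is not None:
        # Count several facet fields in parallel.
        params.append(("facet.threads", str(threads)))
    for field, method in (methods or {}).items():
        # Per-field override of the faceting method (enum, fc or fcs).
        params.append((f"f.{field}.facet.method", method))
    return urlencode(params)

qs = facet_params(
    "darwin",
    ["author_facet", "language", "material_access"],
    threads=4,  # assumed core count
    methods={"language": "enum", "material_access": "fcs"},
)
print(qs)
```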


On Fri, Sep 25, 2015 at 1:33 PM, Uwe Reh <re...@hebis.uni-frankfurt.de> wrote:

> Am 25.09.2015 um 05:16 schrieb Yonik Seeley:
>
>> I did some performance benchmarks and opened an issue.  It's bad.
>> https://issues.apache.org/jira/browse/SOLR-8096
>>
>
> Hi Yonik,
> thanks a lot for your investigation.
> Using the JSON Facet API is fast and seems to be a usable workaround for
> new applications. But not really, as fast patch to our production
> environment.
>
> What' your assessment about Bill's question? Is there a chance to get the
> fieldValueCache back?
>
> I would like to have it back in 5.x, even marked as deprecated. This would
> help to migrate.
>
> Uwe
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Yonik Seeley <ys...@gmail.com>.
On Fri, Sep 25, 2015 at 6:33 AM, Uwe Reh <re...@hebis.uni-frankfurt.de> wrote:
> Am 25.09.2015 um 05:16 schrieb Yonik Seeley:
>>
>> I did some performance benchmarks and opened an issue.  It's bad.
>> https://issues.apache.org/jira/browse/SOLR-8096
>
>
> Hi Yonik,
> thanks a lot for your investigation.
> Using the JSON Facet API is fast and seems to be a usable workaround for new
> applications. But not really, as fast patch to our production environment.

Single-valued fields were likely also impacted (but probably not to
the extent that multi-valued fields were).
Are you faceting on any of those?

> What' your assessment about Bill's question? Is there a chance to get the
> fieldValueCache back?

Unclear.  If you look at
https://issues.apache.org/jira/browse/SOLR-8096
you see
"I was always in favour of removing those top-level facetting
algorithms. So they still have my strong +1."
which means that it could be vetoed.

-Yonik

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Uwe Reh <re...@hebis.uni-frankfurt.de>.
Am 25.09.2015 um 05:16 schrieb Yonik Seeley:
> I did some performance benchmarks and opened an issue.  It's bad.
> https://issues.apache.org/jira/browse/SOLR-8096

Hi Yonik,
thanks a lot for your investigation.
Using the JSON Facet API is fast and seems to be a usable workaround for
new applications, but it is not really usable as a quick patch for our
production environment.

What's your assessment of Bill's question? Is there a chance to get
the fieldValueCache back?

I would like to have it back in 5.x, even if marked as deprecated. This
would help us to migrate.

Uwe


Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Yonik Seeley <ys...@gmail.com>.
On Thu, Sep 24, 2015 at 9:58 AM, Yonik Seeley <ys...@gmail.com> wrote:
> Indeed.  Use of the fieldValueCache (UnInvertedField) was secretly
> removed as part of LUCENE-5666, causing these performance regressions.

I did some performance benchmarks and opened an issue.  It's bad.
https://issues.apache.org/jira/browse/SOLR-8096

-Yonik

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by bi...@gmail.com.
Can we at least add it back with a parameter?

Bill Bell
Sent from mobile


> On Sep 24, 2015, at 8:58 AM, Yonik Seeley <ys...@gmail.com> wrote:
> 
>> On Mon, Sep 21, 2015 at 8:09 AM, Uwe Reh <re...@hebis.uni-frankfurt.de> wrote:
>> our bibliographic index (~20M entries) runs fine with Solr 4.10.3
>> With Solr 5.3 faceted searching is constantly incredibly slow (~ 20 seconds)
> [...]
>> 
>> The 'fieldValueCache' seems to be unused (no inserts nor lookups) in Solr
>> 5.3. In Solr 4.10 the 'fieldValueCache' is in heavy use with a
>> cumulative_hitratio of 1.
> 
> 
> Indeed.  Use of the fieldValueCache (UnInvertedField) was secretly
> removed as part of LUCENE-5666, causing these performance regressions.
> 
> This code had been evolved over years to be very fast for specific use
> cases.  No one facet algorithm is going to be optimal for everyone, so
> it's important we have multiple.  But use of the UnInvertedField was
> removed without any notification or discussion whatsoever (and
> obviously no benchmarking), and was only discovered later by Solr devs
> in SOLR-7190 that it was essentially dead code.
> 
> 
> When I brought back my "JSON Facet API" work to Solr (which was based
> on 4.10.x) it came with a heavily modified version of UnInvertedField
> that is available via the JSON Facet API.  It might currently work
> better for your usecase.
> 
> On your normal (non-docValues) index, you can try something like the
> following to see what the performance would be:
> 
> $ curl http://yxz/solr/hebis/query -d 'q=darwin&
> json.facet={
>  authors : { type:terms, field:author_facet, limit:30 },
>  material_access : { type:terms, field:material_access, limit:30 },
>  material_brief : { type:terms, field:material_brief, limit:30 },
>  rvk : { type:terms, field:rvk_facet, limit:30 },
>  lang : { type:terms, field:language, limit:30 },
>  dept : { type:terms, field:department_3, limit:30 }
> }'
> 
> There were other changes in LUCENE-5666 that will probably slow down
> faceting on the single valued fields as well (so this may still be a
> little slower than 4.10.x), but hopefully it would be more
> competitive.
> 
> -Yonik

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Alessandro Benedetti <be...@gmail.com>.
Clear!
Now I understand the current situation.
I hope the issue will be fixed soon and that the conference is recorded.
Good luck!

Cheers

2015-09-25 15:22 GMT+01:00 Yonik Seeley <ys...@gmail.com>:

> On Fri, Sep 25, 2015 at 5:07 AM, Alessandro Benedetti
> <be...@gmail.com> wrote:
> >    There is an undocumented "method" parameter - I need to enable that to
> >
> >> allow switching between the docvalues approach and the UnInvertedField
> >> approach.
> >>
> >
> > Only to clarify, please correct me Yonik if my understanding is wrong or
> > outdated :
> > To calculate facets, without going into the algorithm details there are 2
> > approaches available :
> > Term Enum ( good for limited number of unique values for your field) and
> Fc
> > ( FieldCache) good for a lot of unique values, but not for big fields.
> >
> > For the FC approach,
> >  - storing the DocValues for the field would transparently use them (
> with
> > the known benefit at the cost of disk space for the docValues data
> > structures)
> >  - without the DocValues , there algorithm will un-invert the index at
> > runtime using the field cache to store the results
>
> Yeah, that's right so far.
> We should add a switch though for the method of uninversion...
> UnInvertedField (for indexes that change less frequently) vs DocValues
> (i.e. if you didn't index with DocValues, UnInvertedReader will
> uninvert to an in-memory structure that looks like DocValues).
>
> > So , from your quote, Term Enum will not be supported by Json Faceting ?
>
> We can, it just hasn't been a priority yet.
>
> Anyway, I'm going to step away from email and
> https://issues.apache.org/jira/browse/SOLR-8096 for a couple of days.
> I need to go focus on putting some slides together for
> Strata/HadoopWorld next week. I'll be talking about the new facet
> module / json facets there.
>
> -Yonik
>



-- 
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Yonik Seeley <ys...@gmail.com>.
On Fri, Sep 25, 2015 at 5:07 AM, Alessandro Benedetti
<be...@gmail.com> wrote:
>    There is an undocumented "method" parameter - I need to enable that to
>
>> allow switching between the docvalues approach and the UnInvertedField
>> approach.
>>
>
> Only to clarify, please correct me Yonik if my understanding is wrong or
> outdated :
> To calculate facets, without going into the algorithm details there are 2
> approaches available :
> Term Enum ( good for limited number of unique values for your field) and Fc
> ( FieldCache) good for a lot of unique values, but not for big fields.
>
> For the FC approach,
>  - storing the DocValues for the field would transparently use them ( with
> the known benefit at the cost of disk space for the docValues data
> structures)
>  - without the DocValues , there algorithm will un-invert the index at
> runtime using the field cache to store the results

Yeah, that's right so far.
We should add a switch though for the method of uninversion...
UnInvertedField (for indexes that change less frequently) vs DocValues
(i.e. if you didn't index with DocValues, UnInvertedReader will
uninvert to an in-memory structure that looks like DocValues).

> So , from your quote, Term Enum will not be supported by Json Faceting ?

We can, it just hasn't been a priority yet.

Anyway, I'm going to step away from email and
https://issues.apache.org/jira/browse/SOLR-8096 for a couple of days.
I need to go focus on putting some slides together for
Strata/HadoopWorld next week. I'll be talking about the new facet
module / json facets there.

-Yonik

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Alessandro Benedetti <be...@gmail.com>.
> There is an undocumented "method" parameter - I need to enable that to
> allow switching between the docvalues approach and the UnInvertedField
> approach.
>

Just to clarify (please correct me, Yonik, if my understanding is wrong or
outdated):
To calculate facets, without going into the algorithm details, there are 2
approaches available:
Term Enum (good for a limited number of unique values in your field) and FC
(FieldCache), good for a lot of unique values, but not for big fields.

For the FC approach,
 - storing DocValues for the field means they are used transparently (with
the known benefits, at the cost of disk space for the docValues data
structures)
 - without DocValues, the algorithm will un-invert the index at
runtime, using the field cache to store the results
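
To make the difference between the two approaches concrete, here is a toy sketch in plain Python. It is not Solr's implementation, and all names and data are invented for illustration. The enum style walks the terms and intersects each term's document set with the result set; the uninverted style builds a doc-to-terms view once (the part a cache would hold) and then walks only the result documents.

```python
from collections import Counter

# Toy inverted index for one field: term -> set of matching doc ids.
inverted = {
    "en": {1, 2, 3, 5},
    "de": {2, 4},
    "fr": {5},
}

def facet_enum(inverted, result_docs):
    """Term-enumeration style: one set intersection per term."""
    counts = {}
    for term, docs in inverted.items():
        n = len(docs & result_docs)
        if n:
            counts[term] = n
    return counts

def uninvert(inverted):
    """Build the doc -> terms view once; this is what would be cached."""
    by_doc = {}
    for term, docs in inverted.items():
        for doc in docs:
            by_doc.setdefault(doc, []).append(term)
    return by_doc

def facet_uninverted(by_doc, result_docs):
    """Field-cache style: walk the result docs and bump a counter per term."""
    counts = Counter()
    for doc in result_docs:
        counts.update(by_doc.get(doc, ()))
    return dict(counts)

result = {2, 3, 5}
by_doc = uninvert(inverted)
print(facet_enum(inverted, result))
print(facet_uninverted(by_doc, result))
```

Both functions return the same counts; the cost profile differs. The enum style scales with the number of unique terms, while the uninverted style scales with the size of the result set plus the one-time cost of building the doc-to-terms view.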

So, from your quote, Term Enum will not be supported by JSON faceting?
Will DocValues usage not happen automatically?

Cheers

-- 
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Yonik Seeley <ys...@gmail.com>.
On Thu, Sep 24, 2015 at 10:16 AM, Alessandro Benedetti
<be...@gmail.com> wrote:
> Yonik, I am really excited about the Json faceting module.
> I find it really interesting.
> Is there any pros/cons in using them, or it's definitely the "approach of
> the future" ?

Thanks!

The cons to the new stuff is that it doesn't yet have everything the
old stuff has.  But it does already have new stuff that the old stuff
doesn't have (like sorting by any statistic and rudimentary block-join
integration).

And yes, I do see it as "the future", a platform for integrating the
disparate features that have been developed for solr over time, but
don't always work that well together:
 - search
 - statistics
 - grouping
 - joins


> I saw your benchmarks and seems impressive.
>
> I have not read all the topic in details, just briefly, but is Json
> faceting using different faceting algorithms from the standard ones ? (
> Enum and fc)

I wouldn't say different fundamental algorithms yet... (compared to
4.10) but different code (to support some of the new features) and in
some places more optimized.

> I can not find the algorithm parameter to be passed in the Json facets.

There is an undocumented "method" parameter - I need to enable that to
allow switching between the docvalues approach and the UnInvertedField
approach.

-Yonik


> Are they using a complete different approach ?
> Is the algorithm used expressed anywhere ?
> This could give very good insights on when to use them.
>
> Cheers
>
> 2015-09-24 14:58 GMT+01:00 Yonik Seeley <ys...@gmail.com>:
>
>> On Mon, Sep 21, 2015 at 8:09 AM, Uwe Reh <re...@hebis.uni-frankfurt.de>
>> wrote:
>> > our bibliographic index (~20M entries) runs fine with Solr 4.10.3
>> > With Solr 5.3 faceted searching is constantly incredibly slow (~ 20
>> seconds)
>> [...]
>> >
>> > The 'fieldValueCache' seems to be unused (no inserts nor lookups) in Solr
>> > 5.3. In Solr 4.10 the 'fieldValueCache' is in heavy use with a
>> > cumulative_hitratio of 1.
>>
>>
>> Indeed.  Use of the fieldValueCache (UnInvertedField) was secretly
>> removed as part of LUCENE-5666, causing these performance regressions.
>>
>> This code had been evolved over years to be very fast for specific use
>> cases.  No one facet algorithm is going to be optimal for everyone, so
>> it's important we have multiple.  But use of the UnInvertedField was
>> removed without any notification or discussion whatsoever (and
>> obviously no benchmarking), and was only discovered later by Solr devs
>> in SOLR-7190 that it was essentially dead code.
>>
>>
>> When I brought back my "JSON Facet API" work to Solr (which was based
>> on 4.10.x) it came with a heavily modified version of UnInvertedField
>> that is available via the JSON Facet API.  It might currently work
>> better for your usecase.
>>
>> On your normal (non-docValues) index, you can try something like the
>> following to see what the performance would be:
>>
>> $ curl http://yxz/solr/hebis/query -d 'q=darwin&
>> json.facet={
>>   authors : { type:terms, field:author_facet, limit:30 },
>>   material_access : { type:terms, field:material_access, limit:30 },
>>   material_brief : { type:terms, field:material_brief, limit:30 },
>>   rvk : { type:terms, field:rvk_facet, limit:30 },
>>   lang : { type:terms, field:language, limit:30 },
>>   dept : { type:terms, field:department_3, limit:30 }
>> }'
>>
>> There were other changes in LUCENE-5666 that will probably slow down
>> faceting on the single valued fields as well (so this may still be a
>> little slower than 4.10.x), but hopefully it would be more
>> competitive.
>>
>> -Yonik
>>
>
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Alessandro Benedetti <be...@gmail.com>.
Yonik, I am really excited about the JSON faceting module.
I find it really interesting.
Are there any pros/cons to using it, or is it definitely the "approach of
the future"?
I saw your benchmarks and they seem impressive.

I have not read the whole thread in detail, just briefly, but is JSON
faceting using different faceting algorithms from the standard ones
(enum and fc)?
I cannot find the algorithm parameter to be passed in the JSON facets.
Is it using a completely different approach?
Is the algorithm used documented anywhere?
This could give very good insights into when to use it.

Cheers

2015-09-24 14:58 GMT+01:00 Yonik Seeley <ys...@gmail.com>:

> On Mon, Sep 21, 2015 at 8:09 AM, Uwe Reh <re...@hebis.uni-frankfurt.de>
> wrote:
> > our bibliographic index (~20M entries) runs fine with Solr 4.10.3
> > With Solr 5.3 faceted searching is constantly incredibly slow (~ 20
> seconds)
> [...]
> >
> > The 'fieldValueCache' seems to be unused (no inserts nor lookups) in Solr
> > 5.3. In Solr 4.10 the 'fieldValueCache' is in heavy use with a
> > cumulative_hitratio of 1.
>
>
> Indeed.  Use of the fieldValueCache (UnInvertedField) was secretly
> removed as part of LUCENE-5666, causing these performance regressions.
>
> This code had evolved over years to be very fast for specific use
> cases.  No one facet algorithm is going to be optimal for everyone, so
> it's important we have multiple.  But use of the UnInvertedField was
> removed without any notification or discussion whatsoever (and
> obviously no benchmarking), and Solr devs only discovered later, in
> SOLR-7190, that it was essentially dead code.
>
>
> When I brought back my "JSON Facet API" work to Solr (which was based
> on 4.10.x) it came with a heavily modified version of UnInvertedField
> that is available via the JSON Facet API.  It might currently work
> better for your use case.
>
> On your normal (non-docValues) index, you can try something like the
> following to see what the performance would be:
>
> $ curl http://yxz/solr/hebis/query -d 'q=darwin&
> json.facet={
>   authors : { type:terms, field:author_facet, limit:30 },
>   material_access : { type:terms, field:material_access, limit:30 },
>   material_brief : { type:terms, field:material_brief, limit:30 },
>   rvk : { type:terms, field:rvk_facet, limit:30 },
>   lang : { type:terms, field:language, limit:30 },
>   dept : { type:terms, field:department_3, limit:30 }
> }'
>
> There were other changes in LUCENE-5666 that will probably slow down
> faceting on the single valued fields as well (so this may still be a
> little slower than 4.10.x), but hopefully it would be more
> competitive.
>
> -Yonik
>



-- 
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: faceting is unusable slow since upgrade to 5.3.0

Posted by Yonik Seeley <ys...@gmail.com>.
On Mon, Sep 21, 2015 at 8:09 AM, Uwe Reh <re...@hebis.uni-frankfurt.de> wrote:
> our bibliographic index (~20M entries) runs fine with Solr 4.10.3
> With Solr 5.3 faceted searching is constantly incredibly slow (~ 20 seconds)
[...]
>
> The 'fieldValueCache' seems to be unused (no inserts nor lookups) in Solr
> 5.3. In Solr 4.10 the 'fieldValueCache' is in heavy use with a
> cumulative_hitratio of 1.


Indeed.  Use of the fieldValueCache (UnInvertedField) was secretly
removed as part of LUCENE-5666, causing these performance regressions.

This code had evolved over years to be very fast for specific use
cases.  No one facet algorithm is going to be optimal for everyone, so
it's important we have multiple.  But use of the UnInvertedField was
removed without any notification or discussion whatsoever (and
obviously no benchmarking), and Solr devs only discovered later, in
SOLR-7190, that it was essentially dead code.


When I brought back my "JSON Facet API" work to Solr (which was based
on 4.10.x) it came with a heavily modified version of UnInvertedField
that is available via the JSON Facet API.  It might currently work
better for your use case.

On your normal (non-docValues) index, you can try something like the
following to see what the performance would be:

$ curl http://yxz/solr/hebis/query -d 'q=darwin&
json.facet={
  authors : { type:terms, field:author_facet, limit:30 },
  material_access : { type:terms, field:material_access, limit:30 },
  material_brief : { type:terms, field:material_brief, limit:30 },
  rvk : { type:terms, field:rvk_facet, limit:30 },
  lang : { type:terms, field:language, limit:30 },
  dept : { type:terms, field:department_3, limit:30 }
}'

There were other changes in LUCENE-5666 that will probably slow down
faceting on the single valued fields as well (so this may still be a
little slower than 4.10.x), but hopefully it would be more
competitive.
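
If you would rather script the comparison than paste the curl command,
the same request can be assembled programmatically. The following is
only a rough sketch: the helper names (build_json_facet,
build_request_body, run_query) are made up for illustration, the URL is
the placeholder from the curl example, and wt=json is added on the
assumption that your /query handler does not already default to JSON
output. It sends strict JSON rather than the relaxed syntax shown above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Facet label -> index field, copied from the curl example above.
FACET_FIELDS = {
    "authors": "author_facet",
    "material_access": "material_access",
    "material_brief": "material_brief",
    "rvk": "rvk_facet",
    "lang": "language",
    "dept": "department_3",
}


def build_json_facet(fields, limit=30):
    """Build the json.facet structure: one terms facet per field."""
    return {
        label: {"type": "terms", "field": field, "limit": limit}
        for label, field in fields.items()
    }


def build_request_body(query, fields, limit=30):
    """Form-encode q, json.facet and wt=json as a POST body."""
    params = {
        "q": query,
        "json.facet": json.dumps(build_json_facet(fields, limit)),
        "wt": "json",  # assumption: ask for a JSON response explicitly
    }
    return urlencode(params).encode("utf-8")


def run_query(url, query, fields, limit=30):
    """POST the query to Solr and return the decoded JSON response.

    The url is whatever your core's query handler is, for example the
    placeholder http://yxz/solr/hebis/query from the curl command.
    """
    request = Request(url, data=build_request_body(query, fields, limit))
    with urlopen(request) as response:
        return json.load(response)
```

Calling run_query("http://yxz/solr/hebis/query", "darwin", FACET_FIELDS)
a few times and comparing QTime in the response header against the
classic facet.field request should show whether the JSON Facet API
helps on your index.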

-Yonik