Posted to solr-user@lucene.apache.org by Sundeep T <su...@gmail.com> on 2017/08/25 06:02:01 UTC

What is the org.apache.solr.uninverting.FieldCacheImpl?

Hi,

In our enterprise application, we occasionally get range facet queries
ordered by the timestamp field. The timestamp field is of date type.

Below is the query from solr.log -

2017-08-25 05:18:51.048 INFO  (qtp1321530272-90) [   x:drums]
o.a.s.c.S.Request [drums]  webapp=/solr path=/select
params={df=text&distrib=false&_facet_={}&fl=id&fl=score&shards.purpose=1048580&start=0&fsv=true&shard.url=
http://localhost:8983/solr/drums&rows=0&version=2&q=*:*&json.facet={"timestamp":{"type":"range","field":"timestamp","start":"2016-05-28T16:19:09.857Z","end":"2017-08-18T10:57:10.365Z","gap":"+5000SECOND","limit":100000,"sort":{"index":"desc"},"facet":{}}}&NOW=1503638261623&isShard=true&timeAllowed=-1&wt=javabin}
hits=68541066 status=0 QTime=69422

Whenever such a query runs, we see
org.apache.solr.uninverting.FieldCacheImpl being populated in the
JVM heap. When we analyzed a heap dump, all the underlying
objects in the FieldCacheImpl had timestamp as the cache key. It seems to
be taking quite a bit of memory.

Does anyone have an idea what this cache is and why it's being populated?
Also, what are the criteria for clearing this cache?

Really appreciate your response. Thanks!

Re: What is the org.apache.solr.uninverting.FieldCacheImpl?

Posted by Erick Erickson <er...@gmail.com>.
You need to enable docValues on the field (and completely reindex).

The standard inverted index structure is great for answering "for term
X in field Y, what docs does it appear in?". It's rotten for the
"uninverted" case: "For doc X, what is the value of field Y?". This
latter question is the one that needs to be answered for sorting,
faceting and grouping. So when you do one of those operations and have
not specified docValues="true", Solr (well, Lucene actually) "uninverts"
the field onto the JVM heap, building a structure efficient for
answering this latter question. That structure is what you are seeing
in FieldCacheImpl.
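
As a rough illustration of the two questions above, here is a toy Python
sketch. It is not how Lucene stores anything internally; the sample
documents and the function names are made up for the example.

```python
from collections import defaultdict

# Toy corpus: doc id -> value of the "timestamp" field (made-up values).
docs = {0: "2017-08-01", 1: "2016-06-15", 2: "2017-08-01", 3: "2017-01-20"}


def build_inverted(docs):
    """Build term -> list of doc ids, the question an inverted index answers well."""
    inverted = defaultdict(list)
    for doc_id in sorted(docs):
        inverted[docs[doc_id]].append(doc_id)
    return dict(inverted)


def uninvert(inverted, num_docs):
    """Walk every term once and fill an in-memory doc id -> value array."""
    values = [None] * num_docs
    for term, doc_ids in inverted.items():
        for doc_id in doc_ids:
            values[doc_id] = term
    return values


inverted = build_inverted(docs)
by_doc = uninvert(inverted, len(docs))

# "For term X, which docs?" is a single lookup in the inverted form.
docs_for_term = inverted["2017-08-01"]

# "For doc X, what value?" is a single lookup only after uninverting.
value_for_doc = by_doc[3]
```

The uninverted array has one slot per document, which is why it can take
a lot of heap on an index with tens of millions of documents.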

Specifying docValues="true" effectively builds this "uninverted"
structure at _index time_ and serializes it out to disk. At search time
the structure is memory-mapped through MMapDirectory, so it lives in OS
memory rather than on the JVM heap (much more efficient).
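
If the collection uses a managed schema, one way to turn docValues on is
a Schema API "replace-field" command. The Python sketch below only builds
the JSON body and does not send it. The field type name "tdate" and the
indexed/stored settings are assumptions, so use whatever your schema
already defines for the timestamp field. With a hand-edited schema.xml you
would add docValues="true" to the field definition instead.

```python
import json


def docvalues_field_command(name, field_type):
    """Return a Schema API replace-field command that enables docValues.

    replace-field takes the full field definition, so the attributes below
    must match the existing field apart from docValues.
    """
    return {
        "replace-field": {
            "name": name,
            "type": field_type,   # assumption: a date field type from your schema
            "indexed": True,      # assumption: keep in sync with the current field
            "stored": True,       # assumption: keep in sync with the current field
            "docValues": True,
        }
    }


command = docvalues_field_command("timestamp", "tdate")
body = json.dumps(command, indent=2)
```

The body would be POSTed to the collection's /schema endpoint (for the
core in the log above, presumably http://localhost:8983/solr/drums/schema).
Changing the schema does not touch documents that are already indexed, so
the full reindex is still required.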

Helpful background for MMapDirectory:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

See also:
https://lucene.apache.org/solr/guide/6_6/docvalues.html

Best,
Erick
