Posted to solr-user@lucene.apache.org by IJ <ja...@gmail.com> on 2014/10/11 10:42:32 UTC
Query on Solr caches, OOM errors, and how q and fq affect Solr cache memory consumption

This question relates to the Out of Memory (OOM) errors I am seeing on my
SolrCloud setup. I am running Solr 4.5.1.

Here is how my setup looks:
1. I have 6 Solr Tomcat nodes distributed across 3 servers, i.e. 2 nodes per
server.
2. Each Tomcat node has been allocated 2 GB of heap (via the -Xmx setting).

I have two collections:
1. UserGroupMappings (2 shards x 3 replicas = 6 Cores)
2. GroupCustomerMappings (2 shards x 3 replicas = 6 Cores)

Each Solr Tomcat node contains 2 cores, one from each of the 2 collections.
Every Solr Tomcat node manages a combined index size (across its 2 cores) of
approximately 6 GB.

Here is the approximate structure of my collections (I have omitted some
fields for brevity):
1. UserGroupCollection - UserId, CompanyName, GroupName
Each user is affiliated with exactly one company, but may service
multiple groups.

2. GroupCustomerMappings - CustomerId, FirstName, LastName, Address,
CustomerNameAndAddress (copy field), GroupName (multi-valued field),
CompanyName
A customer can be affiliated with multiple companies, and within a
company can be affiliated with one or more groups.

From my front end (FE), when users perform a search, they are looking for
customers by entering the customer's name or address.
My application performs two queries:

Query 1: Fetch all Groups for current user from UserGroupCollection
q= *:*
fq= UserId:<current user id> AND CompanyName:<current user's company>
fl = GroupName
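
A minimal Python sketch of how Query 1 might be assembled as request parameters for Solr's /select handler. It assumes the classic Lucene query parser (where boolean operators must be written in uppercase) and string-typed UserId and CompanyName fields; the quote_term helper and the sample values are hypothetical illustrations, not part of Solr or SolrJ.

```python
from urllib.parse import urlencode


def quote_term(value: str) -> str:
    """Wrap a field value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_query1_params(user_id: str, company: str) -> list[tuple[str, str]]:
    """Original Query 1: match all docs, restrict them with a filter query."""
    fq = f"UserId:{quote_term(user_id)} AND CompanyName:{quote_term(company)}"
    return [
        ("q", "*:*"),
        ("fq", fq),
        ("fl", "GroupName"),
    ]


# Hypothetical values for illustration only.
params = build_query1_params("u42", "Acme Corp")
print(urlencode(params))  # URL-encoded query string for a /select request
```

The encoded string would form the query part of a /select request; actually sending the request is left out of the sketch.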

Query 2: Search for Customers within groups returned from Query 1
q= CustomerNameAndAddress: <user entered search term>
fq= CompanyName:<current user's company> AND (GroupName: <first groupName
returned by Query 1> OR GroupName: <second groupName returned by Query 1> OR
.... )
fl = CustomerId, FirstName, LastName, Address
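
Query 2 needs its fq built from however many group names Query 1 returns. A minimal sketch, assuming string-typed CompanyName and GroupName fields and the classic Lucene query parser; build_query2_filter and quote_term are hypothetical helper names.

```python
def quote_term(value: str) -> str:
    """Wrap a field value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_query2_filter(company: str, group_names: list[str]) -> str:
    """Build the Query 2 fq: the user's company AND any of the user's groups."""
    if not group_names:
        raise ValueError("Query 1 returned no groups; nothing to search")
    groups = " OR ".join(f"GroupName:{quote_term(g)}" for g in group_names)
    return f"CompanyName:{quote_term(company)} AND ({groups})"


# Hypothetical group names, standing in for the results of Query 1.
fq = build_query2_filter("Acme Corp", ["Sales East", "Sales West"])
print(fq)
# CompanyName:"Acme Corp" AND (GroupName:"Sales East" OR GroupName:"Sales West")
```

With at most 50 groups per user, the generated OR clause stays well below Lucene's default limit of 1,024 boolean clauses (the maxBooleanClauses setting in solrconfig.xml).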

I have been load testing the application with 100 unique users, and every
now and then some of my nodes run out of memory.

After some research online I came across this article,
http://teaspoon-consulting.com/articles/solr-cache-tuning.html, which
explains the structure of the query cache, filter cache and document cache.
Specifically, it states that:
1. Each entry in the Query Cache holds an array of integers containing
DocIds that are returned as part of the search results.
2. Each entry in the Filter Cache holds an array of bits, and the size of
the array equals the total number of documents in the current core (this was
a real eye-opener: it means the memory requirements of this cache grow with
the size of the index)
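
To make the article's two layouts concrete, here is a toy Python model (not Solr's actual classes): a query cache entry as a packed array of 32-bit doc ids, and a filter cache entry as a bitset with one bit per document in the core. Lucene's own bitsets (such as FixedBitSet) are backed by long[] words rather than bytes, and Solr may store small filter results more compactly, so treat this as the article's simplified picture.

```python
from array import array


def query_cache_entry(doc_ids: list[int]) -> array:
    """Query cache model: one 32-bit int per matching doc id."""
    return array("i", doc_ids)


def filter_cache_entry(doc_ids: list[int], max_doc: int) -> bytearray:
    """Filter cache model: one bit per document in the core, set if it matches."""
    bits = bytearray((max_doc + 7) // 8)
    for doc in doc_ids:
        bits[doc // 8] |= 1 << (doc % 8)
    return bits


def filter_contains(bits: bytearray, doc: int) -> bool:
    """Check whether a doc id's bit is set in the bitset."""
    return bool(bits[doc // 8] & (1 << (doc % 8)))


matches = [3, 9, 14]
q_entry = query_cache_entry(matches)
f_entry = filter_cache_entry(matches, max_doc=1_000)

# The int array grows with the number of matches (4 bytes each on typical platforms)...
print(len(q_entry) * q_entry.itemsize)
# ...while the bitset grows with the size of the core, regardless of matches.
print(len(f_entry))  # 125 bytes for a 1,000-document core
```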

Based on the above information, should I change Query 1 to something like the
following?
Query 1 (Modified)
q= UserId:<current user id> AND CompanyName:<current user's company>
fl = GroupName

Basically, I have removed the "fq" completely and moved its clauses into
the "q". I am doing this because I know that each user in my system manages
at most 50 groups, which means at most 50 doc ids (integers) in each query
cache entry, compared with 500,000 bits per filter cache entry, which is a
lot more memory.
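
Working the per-entry comparison through with the numbers above, under the article's model (one bit per document for a filter cache entry, one 32-bit int per doc id for a query cache entry). These figures ignore JVM object headers and cache bookkeeping, so treat them as rough lower bounds per entry.

```python
import math

MAX_DOC = 500_000        # documents in the core = bits per filter cache entry
GROUPS_PER_USER = 50     # max doc ids per query cache entry for Query 1
BYTES_PER_INT = 4        # a Java int is 32 bits

filter_entry_bytes = math.ceil(MAX_DOC / 8)
query_entry_bytes = GROUPS_PER_USER * BYTES_PER_INT

print(f"filter cache entry: {filter_entry_bytes:,} bytes")      # 62,500 bytes
print(f"query cache entry:  {query_entry_bytes:,} bytes")       # 200 bytes
print(f"ratio: {filter_entry_bytes / query_entry_bytes:.1f}x")  # 312.5x
```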

I get the feeling that the original Query 1 is probably the semantically
correct solution (I don't care about scoring for Query 1, since it is a data
fetch rather than a search), but that the modified Query 1 will perform much
better. Is this the right decision for Query 1?

Also on Query 2 - I am looking to modify it as follows:
Query 2 (Modified)
q= CustomerNameAndAddress: <user entered search term> AND
CompanyName:<current user's company>
fq= CompanyName:<current user's company> AND (GroupName: <first groupName
returned by Query 1> OR GroupName: <second groupName returned by Query 1> OR
.... )
fl = CustomerId, FirstName, LastName, Address

The only change here is that I add "CompanyName:<current user's company>",
which was originally only in the "fq", to the "q" as well (please note that
I am NOT removing this clause from the fq).
My reasoning is that if a user searches for a person named "john", I don't
want the query cache to hold doc ids for Johns across all companies. I only
want the query cache to hold doc ids for the Johns within the current
company.
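
A sketch of the q change described above, leaving the fq untouched. build_query2_q and quote_term are hypothetical helpers, and the search term is assumed to be already sanitized, since escaping arbitrary user input is a separate concern.

```python
def quote_term(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_query2_q(search_term: str, company: str, scope_to_company: bool) -> str:
    """Build Query 2's q, optionally repeating the company clause from the fq."""
    q = f"CustomerNameAndAddress:({search_term})"
    if scope_to_company:
        q += f" AND CompanyName:{quote_term(company)}"
    return q


# Hypothetical inputs; the fq (company AND groups) stays exactly as before.
original_q = build_query2_q("john", "Acme Corp", scope_to_company=False)
modified_q = build_query2_q("john", "Acme Corp", scope_to_company=True)
print(original_q)  # CustomerNameAndAddress:(john)
print(modified_q)  # CustomerNameAndAddress:(john) AND CompanyName:"Acme Corp"
```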

Is this the right decision for Query 2? The one thing I am NOT sure about is
whether it is appropriate or justifiable in my use case to have the same
clause on "CompanyName" in both the "q" and the "fq".

I should also mention that I fell into the trap of setting extremely large
Solr cache sizes: I had set the size of each of the 3 caches (query, filter
and document caches) to 500,000 entries, which is far too large. I am going
to reduce that to something between 500 and 1,000 entries per cache, based
on the query patterns I see in the live system.
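
A rough worst-case estimate of why 500,000-entry caches cannot fit in a 2 GB heap, assuming the article's one-bit-per-document model and the 500,000-document core implied by the earlier figure. Only the filter cache is counted, per-entry overhead is ignored, and the bound is reached only if that many distinct filters are actually issued. Solr keeps separate caches per core, so a node hosting two cores could in principle hold two such caches.

```python
MAX_DOC = 500_000                  # documents per core, from the figure above
FILTER_ENTRY_BYTES = MAX_DOC // 8  # one bit per document (article's model)


def worst_case_filter_cache_bytes(max_entries: int) -> int:
    """Upper bound if every filter cache slot holds a full-size bitset."""
    return max_entries * FILTER_ENTRY_BYTES


for entries in (500_000, 1_000, 500):
    gib = worst_case_filter_cache_bytes(entries) / 2**30
    print(f"{entries:>7,} entries -> {gib:7.3f} GiB")
```

At 500,000 entries the bound is about 29 GiB, versus roughly 0.06 GiB at 1,000 entries.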

I took a heap dump on one of my Solr Tomcat nodes when it was on the verge of
running out of memory (with my extreme cache settings), and the biggest
consumers of memory were:

Class                                Instances   Size (bytes)
[Ljava.lang.Object;                      9,692    721,392,616
[B                                      25,285     91,429,183
org.apache.lucene.search.ScoreDoc    3,431,025     41,172,300
[I                                       3,556     34,444,276
[C                                      92,970     13,751,378
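
To put the dump in proportion, this sketch tabulates the five classes listed above, computing each class's share of the listed bytes and its average size per instance. The sizes are taken as reported by the heap-dump tool (likely shallow sizes); nothing here identifies which cache, if any, owns which arrays.

```python
# (class, instances, bytes) as reported in the heap dump above
rows = [
    ("[Ljava.lang.Object;", 9_692, 721_392_616),
    ("[B", 25_285, 91_429_183),
    ("org.apache.lucene.search.ScoreDoc", 3_431_025, 41_172_300),
    ("[I", 3_556, 34_444_276),
    ("[C", 92_970, 13_751_378),
]

total = sum(size for _, _, size in rows)
for name, count, size in rows:
    share = 100 * size / total   # percent of the bytes listed here, not of the heap
    avg = size / count           # average bytes per instance
    print(f"{name:<36} {share:5.1f}%  avg {avg:,.0f} bytes/instance")
print(f"total listed: {total / 2**20:,.1f} MiB")
```

For example, the ScoreDoc figures work out to exactly 12 bytes per instance, while the object arrays average roughly 74,000 bytes each and account for about 80% of the listed bytes, so the largest consumer is a relatively small number of large arrays.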
 
I am trying to correlate / map these classes to the 3 caches: the filter,
query and document caches.

Is it fair to assume the following?
1. Does "class [I" (the array of integers) map to the query cache? It is
quite possible that I ran 3,556 unique queries.
2. Does "class [B" (the array of bytes) map to the filter cache? I am
actually a little surprised here: I am seeing 25,285 instances, whereas I was
expecting something closer to 100. I had 100 unique users in my load test,
and the way my system works, each user has a unique filter query, which
should make a total of 100 filter cache entries.

Also, I have no idea what class [Ljava.lang.Object; (an array of objects)
and class [C (an array of characters) represent. Are they related to any of
the caches, or to anything else in Solr? The object array is the biggest
consumer of heap space; what could it be?

Finally, why am I seeing so many instances (3.4 million) of
org.apache.lucene.search.ScoreDoc? Are these being stored in one of the
caches?




--
View this message in context: http://lucene.472066.n3.nabble.com/Query-on-Solr-Caches-OOM-Errors-and-how-q-and-fq-affect-Solr-Cache-memory-consumption-tp4163814.html
Sent from the Solr - User mailing list archive at Nabble.com.