Posted to solr-user@lucene.apache.org by Srinivas Kashyap <sr...@bamboorose.com.INVALID> on 2020/10/02 07:22:20 UTC

RE: Sql entity processor sortedmapbackedcache out of memory issue

Hi Shawn,

Continuing the older thread: I have added a WHERE clause to the inner child entity. When the import runs, does it bring only the records that match the WHERE condition into JVM memory, or does it bring the entire SQL result, with the joined tables, into the JVM and apply the WHERE filter in memory?

Also, I have written custom Java code: an 'onImportEnd' event listener. Can I call the destroy() method of the SortedMapBackedCache class in this listener to remove the cached entities? This is needed because every import brings some new entities that were not present in the DIH cache from the previous run. My assumption is that calling destroy() would free up JVM memory and avoid an OOM.


Also, is there a way to make garbage collection run on the DIH cache every time an import finishes on a core?

P.S.: Ours is a standalone Solr server with 18 cores. Each core is kept in sync by running a full-import on SortedMapBackedCache entities, with a WHERE clause on the child entities based on a timestamp (the last index time).

-----Original Message-----
From: Shawn Heisey <ap...@elyograg.org>
Sent: 09 April 2019 13:27
To: solr-user@lucene.apache.org
Subject: Re: Sql entity processor sortedmapbackedcache out of memory issue

On 4/8/2019 11:47 PM, Srinivas Kashyap wrote:
> I'm using DIH to index the data, and the structure of the DIH config for the Solr core is like this:
>
> <entity>
> 16 child entities
> </entity>
>
> During indexing, the number of requests made to the database was high (17 queries to process one document), and they used most of the database connections, thereby blocking our web application.

If you have 17 entities, then one document will indeed take 17 queries.
That's the nature of multiple DIH entities.
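A back-of-the-envelope model of the query count, assuming (as in the excerpt) that all 16 child entities sit directly under one root entity, and that a cached child query runs only once. The function names are hypothetical and nothing here calls Solr:

```python
def queries_without_cache(parent_rows: int, child_entities: int) -> int:
    # The root entity's query runs once; each child entity then runs
    # one query for every row the root entity returns.
    return 1 + child_entities * parent_rows


def queries_with_cache(child_entities: int) -> int:
    # Assumption: each cached child entity (e.g. cacheImpl="SortedMapBackedCache")
    # runs its query once, and per-row lookups are then served from memory.
    return 1 + child_entities


if __name__ == "__main__":
    for n in (1, 1_000, 100_000):
        print(n, queries_without_cache(n, 16), queries_with_cache(16))
```

For a single document the uncached count is 17, matching the figure above; for 1,000 parent rows it is 16,001, while the cached variant stays at 17 regardless of row count.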

> To tackle this, we implemented SortedMapBackedCache with the cacheImpl parameter to reduce the number of requests to the database.

When you use SortedMapBackedCache on an entity, you are asking Solr to store the results of the entire query in memory, even if you don't need all of the results.  If the database has a lot of rows, that's going to take a lot of memory.
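A simplified model of that memory behaviour. ToySortedMapCache is a hypothetical stand-in, not Solr's SortedMapBackedCache class: it loads every row of the child query up front, grouped by a join key, and only afterwards answers per-parent lookups. The field names are invented for illustration:

```python
from collections import defaultdict


class ToySortedMapCache:
    """Hypothetical stand-in for a SortedMapBackedCache-style cache.

    Every row of the child query is loaded and grouped by the join key,
    whether or not any parent row will ever ask for it.
    """

    def __init__(self, rows, key):
        grouped = defaultdict(list)
        for row in rows:
            grouped[row[key]].append(row)
        # Keep keys in sorted order, mirroring the "sorted map" idea.
        self._map = dict(sorted(grouped.items()))

    def rows_held(self) -> int:
        # Total number of child rows currently kept in memory.
        return sum(len(group) for group in self._map.values())

    def lookup(self, key_value):
        # Per-parent lookup served from memory; unknown keys yield [].
        return self._map.get(key_value, [])


if __name__ == "__main__":
    child_rows = [{"def_id": i % 1000, "value": i} for i in range(10_000)]
    cache = ToySortedMapCache(child_rows, key="def_id")
    used = sum(len(cache.lookup(p)) for p in range(10))
    print(cache.rows_held(), used)
```

Here all 10,000 child rows are held in memory even though the ten parent rows only ever look up 100 of them.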

In your excerpt from the config, your inner entity doesn't have a WHERE clause.  Which means that it's going to retrieve all of the rows of the ABC table for *EVERY* single entry in the DEF table.  That's going to be exceptionally slow.  Normally the SQL query on inner entities will have some kind of WHERE clause that limits the results to rows that match the entry from the outer entity.
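A small sqlite3 simulation of the nested-entity pattern described above. The ABC and DEF table names come from the thread; the def_id column and the row counts are assumptions. Each outer DEF row triggers one inner query, and the function counts how many ABC rows come back with and without a WHERE clause. In this sketch the WHERE clause is evaluated by the database engine, so only matching rows are returned to the program:

```python
import sqlite3


def build_db(n_def: int, abc_per_def: int) -> sqlite3.Connection:
    # Hypothetical schema: each ABC row points at its DEF parent via def_id.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE DEF (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE ABC (id INTEGER PRIMARY KEY, def_id INTEGER)")
    conn.executemany("INSERT INTO DEF (id) VALUES (?)", [(i,) for i in range(n_def)])
    conn.executemany(
        "INSERT INTO ABC (def_id) VALUES (?)",
        [(i,) for i in range(n_def) for _ in range(abc_per_def)],
    )
    return conn


def rows_fetched(conn: sqlite3.Connection, use_where: bool) -> int:
    # Mimic a nested entity: one inner query per outer (DEF) row.
    total = 0
    for (def_id,) in conn.execute("SELECT id FROM DEF").fetchall():
        if use_where:
            inner = conn.execute("SELECT * FROM ABC WHERE def_id = ?", (def_id,))
        else:
            inner = conn.execute("SELECT * FROM ABC")
        total += len(inner.fetchall())
    return total


if __name__ == "__main__":
    db = build_db(100, 5)
    print(rows_fetched(db, use_where=False), rows_fetched(db, use_where=True))
```

With 100 DEF rows and 5 ABC rows each, the inner query without a WHERE clause returns 50,000 rows in total, while the filtered version returns 500.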

You may need to write a custom indexing program that runs separately from Solr, possibly on an entirely different server.  That might be a lot more efficient than DIH.
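One possible shape for such a program, sketched under assumptions: the DEF(id, name) and ABC(def_id, value) schema is hypothetical, sqlite3 stands in for the real database, and sending the documents to Solr (for example with SolrJ or the HTTP update handler) is left out. Instead of one query per parent row, it runs a single JOIN query ordered by the parent key and groups the streamed rows into one document per parent:

```python
import itertools
import sqlite3
from operator import itemgetter


def documents_from_join(conn: sqlite3.Connection):
    """Yield one document dict per DEF row from a single joined query.

    Hypothetical schema: DEF(id, name), ABC(id, def_id, value).
    """
    cur = conn.execute(
        "SELECT d.id, d.name, a.value "
        "FROM DEF d LEFT JOIN ABC a ON a.def_id = d.id "
        "ORDER BY d.id"
    )
    # Rows arrive ordered by parent id, so groupby sees each parent once and
    # the Python side only builds one parent's document at a time.
    for (def_id, name), rows in itertools.groupby(cur, key=itemgetter(0, 1)):
        # LEFT JOIN yields a NULL value for parents without ABC rows; skip it.
        values = [value for _, _, value in rows if value is not None]
        yield {"id": def_id, "name": name, "abc_value": values}
```

Joining several one-to-many child tables in one query multiplies the rows per parent, so with 16 child entities a program like this might instead run one ordered query per child table and merge the streams by parent key.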

Thanks,
Shawn