Posted to solr-user@lucene.apache.org by solr2020 <ps...@gmail.com> on 2014/03/17 19:26:12 UTC

More heap usage in Solr during indexing

Hi,

We have 80 million records in the index now and we are indexing 800k
records every day. We have one shard and 4 replicas on 4 servers under
SolrCloud. Currently we have a 16GB heap, but during indexing the usage
sometimes reaches 16GB and sometimes stays normal. Why does it hit the
max heap only sometimes during indexing?

Is it due to the large index size (80M docs) or to some large incoming
records?


Thanks.




Re: More heap usage in Solr during indexing

Posted by solr2020 <ps...@gmail.com>.
We are doing an autoCommit every five minutes.
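
For reference, a five-minute hard autoCommit in solrconfig.xml typically
looks something like the sketch below; setting openSearcher to false is a
common choice during heavy indexing, but the exact values are illustrative
rather than taken from this thread.

    <autoCommit>
      <maxTime>300000</maxTime>      <!-- five minutes, in milliseconds -->
      <openSearcher>false</openSearcher>
    </autoCommit>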




Re: More heap usage in Solr during indexing

Posted by Greg Walters <gr...@answers.com>.
It's entirely possible that you're seeing higher memory usage while indexing due to more objects being created and abandoned. Another thing to consider could be your commit settings. Perhaps http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html can answer some of your questions and it might also be worthwhile to check http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning for some general suggestions.
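
As an illustration only (these flags are not quoted from that wiki page or
from this thread), CMS-style GC tuning for a 16GB Solr heap on a Java 7 JVM
might look something like:

    -Xms16g -Xmx16g
    -XX:+UseConcMarkSweepGC
    -XX:+UseParNewGC
    -XX:+CMSParallelRemarkEnabled
    -XX:+ParallelRefProcEnabled
    -XX:+UseCMSInitiatingOccupancyOnly
    -XX:CMSInitiatingOccupancyFraction=70

Treat the specific flags and values as a starting point to test against your
own GC logs.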

Thanks,
Greg

On Mar 17, 2014, at 1:39 PM, solr2020 <ps...@gmail.com> wrote:

> Previously we faced an OOM when we tried to index 1.2M records at the same
> time. Now we have divided that into two chunks and index in two passes. So
> now we are not getting an OOM, but heap usage is higher. We are analyzing it
> and trying to find the cause, to make sure we don't hit an OOM again.


Re: More heap usage in Solr during indexing

Posted by solr2020 <ps...@gmail.com>.
Yes, Shawn. Our data source is an Oracle DB. Here is the dataSource section
of the config:

<dataSource name="jdbc" driver="oracle.jdbc.OracleDriver"
            url="jdbc:oracle:thin:@dbname:port:dbname"
            user="user" password="password"
            batchSize="5000" autoCommit="false"
            transactionIsolation="TRANSACTION_READ_COMMITTED"
            holdability="CLOSE_CURSORS_AT_COMMIT"/>






Re: More heap usage in Solr during indexing

Posted by Shawn Heisey <so...@elyograg.org>.
On 3/17/2014 12:39 PM, solr2020 wrote:
> Previously we faced an OOM when we tried to index 1.2M records at the same
> time. Now we have divided that into two chunks and index in two passes. So
> now we are not getting an OOM, but heap usage is higher. We are analyzing it
> and trying to find the cause, to make sure we don't hit an OOM again.

How are you indexing?  A previous message you sent to the mailing list 
indicates that your source is a DB table.

If that's true, can you share the dataSource section(s) from your 
dataimport handler configuration? You might be running into a situation 
where DIH is retrieving the entire dataset via JDBC.

For a MySQL JDBC driver, you can avoid this with a batchSize parameter 
set to -1.  This causes the JDBC driver to stream the results from the 
server rather than read them into memory.  Other JDBC drivers may need 
different settings.

http://mysolr.com/tips/dataimporthandler-runs-out-of-memory-on-large-table/
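
As a sketch of the MySQL case described above (the driver class is the
standard Connector/J one; host, port, database name, and credentials are
placeholders, not values from this thread):

    <dataSource name="jdbc" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://dbhost:3306/dbname"
                user="user" password="password"
                batchSize="-1"/>

With batchSize set to -1, DIH asks the MySQL driver to stream rows from the
server instead of buffering the whole result set in memory.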

Thanks,
Shawn


Re: More heap usage in Solr during indexing

Posted by solr2020 <ps...@gmail.com>.
Previously we faced an OOM when we tried to index 1.2M records at the same
time. Now we have divided that into two chunks and index in two passes. So
now we are not getting an OOM, but heap usage is higher. We are analyzing it
and trying to find the cause, to make sure we don't hit an OOM again.




Re: More heap usage in Solr during indexing

Posted by Greg Walters <gr...@answers.com>.
Is your JVM running out of RAM (actual exceptions), or is the used heap just reaching 16G prior to a garbage collection? If it's the latter, that is expected behavior; it is simply how Java's garbage collection works.
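
One way to tell the two apart (an illustration using standard HotSpot options
of that era; the log path is a placeholder) is to enable GC logging and look
for the normal sawtooth of the heap filling up and then dropping after each
collection:

    -verbose:gc
    -XX:+PrintGCDetails
    -XX:+PrintGCTimeStamps
    -Xloggc:/var/log/solr/gc.log

If the heap drops back down after each collection and no OutOfMemoryError
appears, the spikes to 16GB are normal GC behavior rather than a leak.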

Thanks,
Greg

On Mar 17, 2014, at 1:26 PM, solr2020 <ps...@gmail.com> wrote:

> Hi,
> 
> We have 80 million records in the index now and we are indexing 800k
> records every day. We have one shard and 4 replicas on 4 servers under
> SolrCloud. Currently we have a 16GB heap, but during indexing the usage
> sometimes reaches 16GB and sometimes stays normal. Why does it hit the
> max heap only sometimes during indexing?
> 
> Is it due to the large index size (80M docs) or to some large incoming
> records?
> 
> 
> Thanks.