Posted to solr-user@lucene.apache.org by "Dan ." <ro...@gmail.com> on 2017/05/18 10:08:07 UTC
Slow Bulk InPlace DocValues updates
Hi,
- Solr 6.5.1
- SSD disk
- 23M docs, 64G index, single shard
I'm trying to do around 4M in-place docValue updates to a collection
(single shard of around 23M docs) [these are ALL in-place updates]
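
For readers unfamiliar with the update format, here is a minimal sketch of how such a bulk in-place update could be batched on the client side. It assumes Solr's atomic-update JSON syntax ("set" on a field), which Solr applies in place only when the target field is a non-indexed, non-stored, single-valued numeric docValues field. The field name "popularity", the "id" key, and the batch size are hypothetical and are not taken from the original post; sending the payloads to the /update handler is left out.

```python
import json
from typing import Dict, Iterable, Iterator, List, Tuple


def build_update_docs(updates: Iterable[Tuple[str, int]], field: str) -> Iterator[Dict]:
    """Yield one atomic-update document per (id, value) pair."""
    for doc_id, value in updates:
        # "set" replaces the stored docValue; "inc" would increment it instead.
        yield {"id": doc_id, field: {"set": value}}


def batch_payloads(docs: Iterable[Dict], batch_size: int) -> Iterator[str]:
    """Group documents into JSON array payloads of at most batch_size docs."""
    batch: List[Dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield json.dumps(batch)
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield json.dumps(batch)


if __name__ == "__main__":
    # Hypothetical data: five documents, each getting a new "popularity" value.
    updates = [(f"doc{i}", i * 10) for i in range(5)]
    for payload in batch_payloads(build_update_docs(updates, "popularity"), 2):
        print(payload)
```

Batching only affects how fast the updates are added; it does not change how the buffered updates are later applied during the flush.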
I can add the updates in around 7 minutes, but flushing them to disk takes
around 40 minutes! I was able to make the adds this fast by setting:
<indexConfig>
  <ramBufferSizeMB>4000</ramBufferSizeMB>
</indexConfig>
autoSoftCommit and autoCommit are currently disabled.
From the thread dump I can see that the flush runs in a single thread and
is extremely slow. The dump is below; the culprit seems to be
org.apache.lucene.index.BufferedUpdatesStream.applyDocValuesUpdates(BufferedUpdatesStream.java:666):
- org.apache.lucene.codecs.blocktree.SegmentTermsEnum.pushFrame(SegmentTermsEnum.java:256)
- org.apache.lucene.codecs.blocktree.SegmentTermsEnum.pushFrame(SegmentTermsEnum.java:248)
- org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekExact(SegmentTermsEnum.java:538)
- org.apache.lucene.index.BufferedUpdatesStream.applyDocValuesUpdates(BufferedUpdatesStream.java:666)
- org.apache.lucene.index.BufferedUpdatesStream.applyDocValuesUpdatesList(BufferedUpdatesStream.java:612)
- org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:269)
- org.apache.lucene.index.IndexWriter.applyAllDeletesAndUpdates(IndexWriter.java:3454)
- org.apache.lucene.index.IndexWriter.applyDeletesAndPurge(IndexWriter.java:4990)
- org.apache.lucene.index.DocumentsWriter$ApplyDeletesEvent.process(DocumentsWriter.java:717)
- org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:5040)
- org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:5031)
- org.apache.lucene.index.IndexWriter.updateDocValues(IndexWriter.java:1731)
- org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:911)
- org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:302)
- org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:239)
- org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:194)
I think this is related to
SOLR-6838 [https://issues.apache.org/jira/browse/SOLR-6838]
and
LUCENE-6161 [https://issues.apache.org/jira/browse/LUCENE-6161]
I need to make the flush faster so that the whole update completes sooner.
Does anyone have a workaround or any suggestions?
Many thanks,
Dan
Re: Slow Bulk InPlace DocValues updates
Posted by Damien Kamerman <da...@gmail.com>.
Adding more shards will scale your writes.
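
As a rough illustration of why sharding could help here: the thread dump shows the flush running in a single thread, and each shard has its own index writer, so each shard would apply only its own share of the updates. The sketch below is a back-of-the-envelope estimate, not a measurement. It assumes the updates are spread evenly across shards, that the shards flush in parallel without competing for CPU or disk, and that flush time scales linearly with the number of updates per shard; none of these assumptions is confirmed in this thread.

```python
def estimated_flush_minutes(single_shard_minutes: float, num_shards: int) -> float:
    """Estimate per-shard flush time under the even-split, linear-cost assumptions."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # Each shard applies roughly 1/num_shards of the updates, in parallel.
    return single_shard_minutes / num_shards


if __name__ == "__main__":
    # 40 minutes is the single-shard flush time reported in the original post.
    for shards in (1, 2, 4, 8):
        print(shards, "shard(s):", estimated_flush_minutes(40.0, shards), "minutes")
```

In practice the gain is likely to be smaller if the shards share the same disk or cores.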