Posted to solr-user@lucene.apache.org by "Dan ." <ro...@gmail.com> on 2017/05/18 10:08:07 UTC

Slow Bulk InPlace DocValues updates

Hi,

- Solr 6.5.1
- SSD disk
- 23M docs, 64G index, single shard

I'm trying to do around 4M in-place docValues updates to a collection
(single shard of around 23M docs). These are ALL in-place updates.
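
For readers unfamiliar with the request shape, here is a minimal sketch of how such updates might be batched on the client side. It only builds the JSON bodies and sends nothing. The field name "popularity", the batch size, and the helper names are hypothetical and not taken from this thread. The {"set": value} form is Solr's atomic update syntax; Solr only applies it in place when the target field is a single-valued, non-indexed, non-stored numeric docValues field.

```python
import json
from typing import Dict, Iterable, Iterator, List


def in_place_update(doc_id: str, field: str, value: float) -> Dict[str, object]:
    """Build one atomic-update document that sets a single field."""
    # {"set": value} is atomic update syntax; whether Solr treats it as
    # in-place depends on the schema definition of `field`.
    return {"id": doc_id, field: {"set": value}}


def batched(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `size` documents, preserving order."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_payloads(values: Dict[str, float], field: str, batch_size: int) -> List[str]:
    """Return one JSON request body per batch of updates."""
    docs = (in_place_update(doc_id, field, v) for doc_id, v in values.items())
    return [json.dumps(batch) for batch in batched(docs, batch_size)]


if __name__ == "__main__":
    # "popularity" is a hypothetical docValues-only field.
    for body in build_payloads({"a": 1, "b": 2, "c": 3}, "popularity", 2):
        print(body)
```

Each string returned by build_payloads would be posted as the body of one update request; the batch size is a tuning knob and not a recommendation.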

I can add the updates in around 7 minutes, but flushing to disk takes around
40 minutes! I was able to make adding the updates fast by setting:

<indexConfig>
  <ramBufferSizeMB>4000</ramBufferSizeMB>
</indexConfig>

autoSoftCommit and autoCommit are currently disabled.

From the thread dump I see that the flush runs in a single thread and is
extremely slow. The culprit seems to be
org.apache.lucene.index.BufferedUpdatesStream.applyDocValuesUpdates(BufferedUpdatesStream.java:666).
Dump below:


   - org.apache.lucene.codecs.blocktree.SegmentTermsEnum.pushFrame(SegmentTermsEnum.java:256)
   - org.apache.lucene.codecs.blocktree.SegmentTermsEnum.pushFrame(SegmentTermsEnum.java:248)
   - org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekExact(SegmentTermsEnum.java:538)

   - org.apache.lucene.index.BufferedUpdatesStream.applyDocValuesUpdates(BufferedUpdatesStream.java:666)
   - org.apache.lucene.index.BufferedUpdatesStream.applyDocValuesUpdatesList(BufferedUpdatesStream.java:612)
   - org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:269)
   - org.apache.lucene.index.IndexWriter.applyAllDeletesAndUpdates(IndexWriter.java:3454)
   - org.apache.lucene.index.IndexWriter.applyDeletesAndPurge(IndexWriter.java:4990)
   - org.apache.lucene.index.DocumentsWriter$ApplyDeletesEvent.process(DocumentsWriter.java:717)
   - org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:5040)
   - org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:5031)
   - org.apache.lucene.index.IndexWriter.updateDocValues(IndexWriter.java:1731)
   - org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:911)
   - org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:302)
   - org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:239)
   - org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:194)


I think this is related to
SOLR-6838 [https://issues.apache.org/jira/browse/SOLR-6838]
and
LUCENE-6161 [https://issues.apache.org/jira/browse/LUCENE-6161]

I need to make the flush faster so that the whole update completes sooner.
Does anyone have a workaround or any suggestions?

Many thanks,
Dan

Re: Slow Bulk InPlace DocValues updates

Posted by Damien Kamerman <da...@gmail.com>.
Adding more shards will scale your writes.
