Posted to issues@lucene.apache.org by "Thomas Wöckinger (Jira)" <ji...@apache.org> on 2021/01/04 22:29:00 UTC

[jira] [Comment Edited] (SOLR-14923) Indexing performance is unacceptable when child documents are involved

    [ https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254118#comment-17254118 ] 

Thomas Wöckinger edited comment on SOLR-14923 at 1/4/21, 10:28 PM:
-------------------------------------------------------------------

I ran the indexing tests five times; each run took about 60 minutes (±2 minutes).

Compared to the first version, this is about 3-4 minutes faster, roughly 5%.

Lock contention on UpdateLog has also improved slightly.

Another very interesting behavior emerged: within 5 minutes, about 12,000 java.io.FileNotFoundException and about 2,000 java.nio.file.NoSuchFileException are thrown. The first comes from RAMDirectory.fileLength, the second from FSDirectory.fileLength. Both are called from NRTCachingDirectory.

The implementation differs between master and 8.x, but *createTempOutput* still uses the method *slowFileExists*, which relies on exception handling to detect whether a file exists; in most cases this could be avoided with the existing implementations. I'm not sure whether Solr runs with -_XX:-StackTraceInThrowable_; if it doesn't, avoiding the exceptions could make these calls up to a hundred times faster. However, this seems to be a Lucene issue. Should I open a separate issue for it?
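A minimal sketch of the difference described above, using a hypothetical in-memory ToyDirectory (it is not Lucene's RAMDirectory, FSDirectory or NRTCachingDirectory code). existsViaException answers "does this file exist?" by calling a fileLength() that throws FileNotFoundException for a missing file, in the style of slowFileExists; existsViaLookup simply asks the directory's own file map. firstFreeTempName is a hypothetical name-probing loop, only loosely in the spirit of createTempOutput.

```java
import java.io.FileNotFoundException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory directory for illustration only; NOT Lucene code.
final class ToyDirectory {
    private final Map<String, Long> lengths = new ConcurrentHashMap<>();

    void createFile(String name, long length) {
        lengths.put(name, length);
    }

    // Like the fileLength() calls mentioned above: a missing file is reported by throwing.
    long fileLength(String name) throws FileNotFoundException {
        Long len = lengths.get(name);
        if (len == null) {
            throw new FileNotFoundException(name);
        }
        return len;
    }

    // Exception-driven check in the style of slowFileExists: every miss allocates
    // an exception and fills in its stack trace (unless the JVM runs with
    // -XX:-StackTraceInThrowable).
    boolean existsViaException(String name) {
        try {
            fileLength(name);
            return true;
        } catch (FileNotFoundException e) {
            return false;
        }
    }

    // Cheap alternative: the directory already tracks its files, so a map lookup suffices.
    boolean existsViaLookup(String name) {
        return lengths.containsKey(name);
    }

    // Hypothetical temp-name probing: try candidate names until one is free.
    String firstFreeTempName(String prefix, boolean useExceptions) {
        for (int i = 0; ; i++) {
            String candidate = prefix + "_" + i + ".tmp";
            boolean taken = useExceptions ? existsViaException(candidate) : existsViaLookup(candidate);
            if (!taken) {
                return candidate;
            }
        }
    }
}
```

Both paths return the same answers; the difference is that each miss on the exception path creates a throwable (and, by default, captures a stack trace), while the lookup path is a plain map query.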

[~dsmiley] From my side, this looks very good!

 



> Indexing performance is unacceptable when child documents are involved
> ----------------------------------------------------------------------
>
>                 Key: SOLR-14923
>                 URL: https://issues.apache.org/jira/browse/SOLR-14923
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: update, UpdateRequestProcessors
>    Affects Versions: 8.3, 8.4, 8.5, 8.6, 8.7, master (9.0)
>            Reporter: Thomas Wöckinger
>            Priority: Critical
>              Labels: performance, pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Parallel indexing does not make sense at the moment when child documents are used.
> At the end of the method doVersionAdd, org.apache.solr.update.processor.DistributedUpdateProcessor checks whether the ulog caches should be refreshed.
> This check returns true if any child document is included in the AddUpdateCommand.
> If so, ulog.openRealtimeSearcher() is called. This call is very expensive and is executed in a block synchronized on the UpdateLog instance, so all other operations on the UpdateLog are blocked as well.
> Because every important UpdateLog method (add, delete, ...) uses a synchronized block, almost every operation is blocked.
> This reduces multi-threaded index updates to single-threaded behavior.
> The described behavior does not depend on any option of the UpdateRequest, so it makes no difference whether 'waitFlush', 'waitSearcher' or 'softCommit' is true or false.
> The described behavior makes child documents practically unusable, because the performance is unacceptable.
>  
>  
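A minimal sketch of the contention described in the issue, assuming a hypothetical ToyUpdateLog whose methods all synchronize on the same instance (it is not Solr's UpdateLog or DistributedUpdateProcessor code). While one thread is inside the stand-in for the expensive openRealtimeSearcher(), another thread calling add() is parked in the BLOCKED state until the monitor is released.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an update log; NOT Solr's UpdateLog.
final class ToyUpdateLog {
    private int adds;

    // Stand-in for an expensive openRealtimeSearcher(): it keeps the monitor on
    // "this" until "release" is counted down. CountDownLatch.await() does not
    // release a monitor held by the caller, so the lock stays taken.
    synchronized void openRealtimeSearcher(CountDownLatch entered, CountDownLatch release)
            throws InterruptedException {
        entered.countDown();
        release.await();
    }

    // Every "important" operation synchronizes on the same instance.
    synchronized void add() {
        adds++;
    }

    synchronized int addCount() {
        return adds;
    }
}

final class ContentionDemo {
    private ContentionDemo() {}

    // Returns true if an add() issued while the searcher is being opened was
    // observed in the BLOCKED state and completed once the monitor was freed.
    static boolean addBlockedDuringSearcherOpen() throws InterruptedException {
        ToyUpdateLog ulog = new ToyUpdateLog();
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread opener = new Thread(() -> {
            try {
                ulog.openRealtimeSearcher(entered, release);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        opener.start();
        entered.await(); // the opener thread now holds the monitor

        Thread indexer = new Thread(ulog::add);
        indexer.start();

        // Poll until the indexing thread is waiting for the monitor (or give up).
        boolean blocked = false;
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (System.nanoTime() < deadline) {
            if (indexer.getState() == Thread.State.BLOCKED) {
                blocked = true;
                break;
            }
            Thread.sleep(1);
        }

        release.countDown(); // let the "expensive" call finish
        opener.join();
        indexer.join();
        return blocked && ulog.addCount() == 1;
    }
}
```

With many indexing threads, each add() queues behind the same monitor in this way, which is the single-threaded behavior the issue describes.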



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
