Posted to dev@lucene.apache.org by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2019/04/02 15:40:00 UTC

[jira] [Commented] (SOLR-12833) Use timed-out lock in DistributedUpdateProcessor

    [ https://issues.apache.org/jira/browse/SOLR-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807868#comment-16807868 ] 

Andrzej Bialecki  commented on SOLR-12833:
------------------------------------------

There is one unfortunate consequence of this change: the {{VersionBucket}} size increased from ~24 bytes to ~104 bytes. This may not seem like much, but for every {{VersionInfo}} (which is created for every SolrCore) we create 64k of them by default. So before the change the buckets of one {{VersionInfo}} took ~1.5 MB, but now they take ~7 MB. With 100 cores this becomes ~0.7 GB. As a result, a scenario where 100 collections could be created just fine in 7.5 now fails with OOM in 7.7.
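
The arithmetic above can be checked with a small sketch. The per-bucket sizes (24 and 104 bytes) and the 64k default are the estimates quoted in this comment; the class and method names are hypothetical and exist only for illustration.

```java
// Hypothetical helper, not part of Solr: estimates the heap taken by VersionBucket
// instances, using the approximate per-bucket sizes quoted in the comment.
class VersionBucketFootprint {
    // "64k" buckets per VersionInfo (one VersionInfo per SolrCore), as stated above.
    static final int BUCKETS_PER_CORE = 65536;

    // Approximate bytes used by the buckets of the given number of cores.
    static long bucketBytes(int bytesPerBucket, int cores) {
        return (long) bytesPerBucket * BUCKETS_PER_CORE * cores;
    }

    public static void main(String[] args) {
        double mib = 1024.0 * 1024.0;
        System.out.printf("before, 1 core:   %.1f MiB%n", bucketBytes(24, 1) / mib);
        System.out.printf("after, 1 core:    %.1f MiB%n", bucketBytes(104, 1) / mib);
        System.out.printf("after, 100 cores: %.1f MiB%n", bucketBytes(104, 100) / mib);
    }
}
```

Actual object sizes depend on the JVM and its settings, so these numbers are only a rough estimate.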

Currently the workaround is to either decrease the number of buckets (which comes with its own side effects) or increase the heap size, but I wonder if we really need a Lock+Condition instance and reference in every {{VersionBucket}}; AFAIK not all 64k buckets are updated simultaneously? Also, the {{versionBucketLockTimeoutMs}} value is common to all {{VersionBucket}}-s, so we don't need to keep the additional {{int}} in every {{VersionBucket}}.
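
One possible direction, purely as an illustration and not a proposed patch: keep the timeout in a single place and let buckets share a small pool of locks instead of each owning one. The class {{SharedBucketLocks}}, its methods, and the stripe count below are all assumptions made up for this sketch.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not Solr code: a small pool of locks shared by all buckets,
// with the timeout stored once instead of in every bucket.
class SharedBucketLocks {
    private final ReentrantLock[] locks; // far fewer locks than buckets
    private final long timeoutMs;        // common to all buckets

    SharedBucketLocks(int stripes, long timeoutMs) {
        if (stripes <= 0 || Integer.bitCount(stripes) != 1) {
            throw new IllegalArgumentException("stripes must be a power of two");
        }
        locks = new ReentrantLock[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new ReentrantLock();
        }
        this.timeoutMs = timeoutMs;
    }

    // Maps a bucket index to one of the shared locks.
    ReentrantLock lockFor(int bucketIndex) {
        return locks[bucketIndex & (locks.length - 1)];
    }

    // Tries to lock the stripe for this bucket; returns false on timeout or interrupt.
    boolean tryLock(int bucketIndex) {
        try {
            return lockFor(bucketIndex).tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    void unlock(int bucketIndex) {
        lockFor(bucketIndex).unlock();
    }
}
```

The trade-off is that buckets mapped to the same stripe contend with each other, so this has side effects of its own and would need measuring. It also does not address the Condition part of the current design.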

> Use timed-out lock in DistributedUpdateProcessor
> ------------------------------------------------
>
>                 Key: SOLR-12833
>                 URL: https://issues.apache.org/jira/browse/SOLR-12833
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: update, UpdateRequestProcessors
>    Affects Versions: 7.5, 8.0
>            Reporter: jefferyyuan
>            Assignee: Mark Miller
>            Priority: Minor
>             Fix For: 7.7, 8.0
>
>         Attachments: SOLR-12833.patch, SOLR-12833.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> There is a synchronized block that blocks other update requests whose IDs fall in the same hash bucket. An update waits forever until it gets the lock at the synchronized block, which can be a problem in some cases.
>  
> Some add/update requests (for example, updates with spatial/shape analysis) may take a long time (30+ seconds or even more), which would make the request time out and fail.
> The client may retry the same request multiple times or for several minutes, which makes things worse.
> The server side receives all the update requests, but all except one can do nothing and have to wait. This wastes precious memory and CPU resources.
> We have seen a case where 2000+ threads were blocked at the synchronized lock and only a few updates were making progress. Each thread takes 3+ MB of memory, which causes OOM.
> Also, if the update can't get the lock within the expected time range, it's better to fail fast.
>  
> We can have one configuration option in solrconfig.xml: updateHandler/versionLock/timeInMill, so users can specify how long they want to wait for the version bucket lock.
> The default value can be -1, so the behavior stays the same: wait forever until it gets the lock.
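
The behavior proposed in the quoted description (wait up to a configured time, with -1 meaning wait forever) can be sketched roughly as follows. This is a minimal illustration using {{java.util.concurrent.locks.ReentrantLock}}, not the code from the attached patch; the class {{TimedBucketLock}} and its method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not the actual patch: a lock that waits at most timeoutMs,
// or forever when timeoutMs is negative (the proposed default of -1).
class TimedBucketLock {
    private final ReentrantLock lock = new ReentrantLock();
    private final long timeoutMs;

    TimedBucketLock(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Returns true if the lock was acquired, false if the wait timed out
    // or the thread was interrupted while waiting.
    boolean acquire() {
        if (timeoutMs < 0) {
            lock.lock(); // same behavior as before: block until the lock is free
            return true;
        }
        try {
            return lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    void release() {
        lock.unlock();
    }
}
```

A caller would fail the update fast when {{acquire()}} returns false, and otherwise do its work and call {{release()}} in a finally block.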


