Posted to issues@lucene.apache.org by "Istvan Farkas (Jira)" <ji...@apache.org> on 2020/03/30 08:30:00 UTC
[jira] [Commented] (SOLR-14373) HDFS block cache allows overallocation
[ https://issues.apache.org/jira/browse/SOLR-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070787#comment-17070787 ]
Istvan Farkas commented on SOLR-14373:
--------------------------------------
Started working on this, will submit a patch when ready.
> HDFS block cache allows overallocation
> --------------------------------------
>
> Key: SOLR-14373
> URL: https://issues.apache.org/jira/browse/SOLR-14373
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: hdfs
> Affects Versions: 4.10
> Reporter: Istvan Farkas
> Priority: Minor
>
> For the HDFS block cache, when we allocate more slabs than the available direct memory can hold, the underlying error message seems to be hidden.
> In such cases the BlockCache constructor throws an OutOfMemoryError, which is caught in the HdfsDirectoryFactory itself and rethrown as a RuntimeException:
> {code}
> try {
>   blockCache = new BlockCache(metrics, directAllocation, totalMemory, slabSize, blockSize);
> } catch (OutOfMemoryError e) {
>   throw new RuntimeException(
>       "The max direct memory is likely too low. Either increase it (by adding -XX:MaxDirectMemorySize=<size>g -XX:+UseLargePages to your containers startup args)"
>           + " or disable direct allocation using solr.hdfs.blockcache.direct.memory.allocation=false in solrconfig.xml. If you are putting the block cache on the heap,"
>           + " your java heap size might not be large enough."
>           + " Failed allocating ~" + totalMemory / 1000000.0 + " MB.",
>       e);
> }
> {code}
> This then manifests as a NullPointerException during core load:
> {code}
> 2020-02-24 06:50:23,492 ERROR (coreLoadExecutor-5-thread-8)-c: collection1-s:shard2-r:core_node2-x: collection1_shard2_replica1-o.a.s.c.SolrCore: Error while closing
> java.lang.NullPointerException
> at org.apache.solr.core.SolrCore.close(SolrCore.java:1352)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:967)
> {code}
> When directAllocation is true, the directoryFactory logs an approximation of the memory to be allocated:
> {code}
> 2020-02-24 06:49:53,153 INFO (coreLoadExecutor-5-thread-8)-c:collection1-s:shard2-r:core_node2-x:collection1_shard2_replica1-o.a.s.c.HdfsDirectoryFactory: Number of slabs of block cache [16384] with direct memory allocation set to [true]
> 2020-02-24 06:49:53,153 INFO (coreLoadExecutor-5-thread-8)-c:collection1-s:shard2-r:core_node2-x:collection1_shard2_replica1-o.a.s.c.HdfsDirectoryFactory: Block cache target memory usage, slab size of [134217728] will allocate [16384] slabs and use ~[2199023255552] bytes
> {code}
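> The numbers in the log above line up: 16384 slabs of 134217728 bytes (128 MB) each come to 2199023255552 bytes, i.e. roughly 2 TB of direct memory, far beyond any realistic -XX:MaxDirectMemorySize. A quick check of the arithmetic (illustrative only, not part of the Solr code):
> {code}
> public class SlabMath {
>   public static void main(String[] args) {
>     long slabSize = 134217728L; // 128 MB per slab, from the log
>     int slabCount = 16384;      // slab count, from the log
>     long total = slabSize * slabCount;
>     System.out.println(total);  // 2199023255552 bytes, ~2 TB
>   }
> }
> {code}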
> This was detected on Solr 4.10, but it seems to also affect current versions; I will double check.
> Plan to resolve:
> - correct the logging and throwable instance checking so the failure does not manifest as a NullPointerException during core load
> - add a detection step which checks whether the memory to be allocated exceeds the available direct memory. If it does, fall back to a smaller slab count and log a warning message.
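> A minimal sketch of the proposed fallback (the method name and parameters here are hypothetical, not the actual patch): clamp the slab count so the total cache size fits within the direct memory limit, and let the caller log a warning when the cache was shrunk.
> {code}
> public class SlabClamp {
>   /**
>    * Returns the largest slab count whose total size fits within
>    * maxDirectMemory, never exceeding the requested count and never
>    * going below one slab. Hypothetical helper, for illustration.
>    */
>   static int clampSlabCount(int requestedSlabs, long slabSize, long maxDirectMemory) {
>     long fitting = maxDirectMemory / slabSize;
>     if (fitting >= requestedSlabs) {
>       return requestedSlabs;
>     }
>     // Caller should log a warning that the block cache was shrunk.
>     return (int) Math.max(1, fitting);
>   }
>
>   public static void main(String[] args) {
>     long slabSize = 134217728L;               // 128 MB slabs, as in the log
>     long maxDirect = 8L * 1024 * 1024 * 1024; // e.g. -XX:MaxDirectMemorySize=8g
>     System.out.println(clampSlabCount(16384, slabSize, maxDirect)); // prints 64
>   }
> }
> {code}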
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org