Posted to issues@lucene.apache.org by "Istvan Farkas (Jira)" <ji...@apache.org> on 2020/03/30 08:30:00 UTC

[jira] [Commented] (SOLR-14373) HDFS block cache allows overallocation

    [ https://issues.apache.org/jira/browse/SOLR-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070787#comment-17070787 ] 

Istvan Farkas commented on SOLR-14373:
--------------------------------------

Started working on this, will submit a patch when ready.

> HDFS block cache allows overallocation
> --------------------------------------
>
>                 Key: SOLR-14373
>                 URL: https://issues.apache.org/jira/browse/SOLR-14373
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: hdfs
>    Affects Versions: 4.10
>            Reporter: Istvan Farkas
>            Priority: Minor
>
> For the HDFS block cache, when we allocate more slabs than the available direct memory can hold, the error message seems to be hidden.
> In such cases the block cache allocation throws an OutOfMemoryError, which is caught in the HdfsDirectoryFactory itself and rethrown as a RuntimeException: 
> {code}
>  try {
>       blockCache = new BlockCache(metrics, directAllocation, totalMemory, slabSize, blockSize);
>     } catch (OutOfMemoryError e) {
>       throw new RuntimeException(
>           "The max direct memory is likely too low.  Either increase it (by adding -XX:MaxDirectMemorySize=<size>g -XX:+UseLargePages to your containers startup args)"
>               + " or disable direct allocation using solr.hdfs.blockcache.direct.memory.allocation=false in solrconfig.xml. If you are putting the block cache on the heap,"
>               + " your java heap size might not be large enough."
>               + " Failed allocating ~" + totalMemory / 1000000.0 + " MB.",
>           e);
>     }
> {code}
> This then manifests as a NullPointerException during core load, because the RuntimeException aborts the SolrCore constructor, which in turn calls close() on the partially initialized core:
> {code}
> 2020-02-24 06:50:23,492 ERROR (coreLoadExecutor-5-thread-8)-c:collection1-s:shard2-r:core_node2-x:collection1_shard2_replica1-o.a.s.c.SolrCore: Error while closing
> java.lang.NullPointerException
>         at org.apache.solr.core.SolrCore.close(SolrCore.java:1352)
>         at org.apache.solr.core.SolrCore.<init>(SolrCore.java:967)
> {code}
> When directAllocation is true, the HdfsDirectoryFactory logs an approximation of the memory to be allocated:
> {code}
> 2020-02-24 06:49:53,153 INFO (coreLoadExecutor-5-thread-8)-c:collection1-s:shard2-r:core_node2-x:collection1_shard2_replica1-o.a.s.c.HdfsDirectoryFactory: Number of slabs of block cache [16384] with direct memory allocation set to [true]
> 2020-02-24 06:49:53,153 INFO (coreLoadExecutor-5-thread-8)-c:collection1-s:shard2-r:core_node2-x:collection1_shard2_replica1-o.a.s.c.HdfsDirectoryFactory: Block cache target memory usage, slab size of [134217728] will allocate [16384] slabs and use ~[2199023255552] bytes
> {code}
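> As a quick sanity check of the log lines above, the target usage is simply slab size times slab count (a minimal, self-contained sketch; the variable names are illustrative, not Solr's actual fields):
> {code}
> public class BlockCacheMath {
>   public static void main(String[] args) {
>     long slabSize  = 134217728L; // 128 MB per slab, from the log above
>     long slabCount = 16384L;     // requested number of slabs, from the log above
>     long targetBytes = slabSize * slabCount;
>     // Prints 2199023255552, i.e. ~2 TB of direct memory requested.
>     System.out.println(targetBytes);
>   }
> }
> {code}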
> This was detected on Solr 4.10, but it seems to affect current versions as well; I will double-check.
> Plan to resolve:
> - correct the logging and the throwable instance checking so the failure does not manifest as a NullPointerException during core load
> - add a check that detects whether the memory to be allocated exceeds the available direct memory; if so, fall back to a smaller slab count and log a warning message (see the sketch below).
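> A rough sketch of that fallback (a hypothetical helper, not from the Solr codebase; how the direct-memory limit is obtained, e.g. from -XX:MaxDirectMemorySize, is deliberately left as a parameter):
> {code}
> import org.slf4j.Logger;
>
> class BlockCacheSizing {
>   // Clamp the requested slab count so that slabCount * slabSize fits into the
>   // available direct memory, logging a warning when it has to shrink.
>   static int clampSlabCount(int requestedSlabs, long slabSize, long maxDirectMemory, Logger log) {
>     long requestedBytes = slabSize * requestedSlabs;
>     if (requestedBytes <= maxDirectMemory) {
>       return requestedSlabs;
>     }
>     int fallbackSlabs = (int) Math.max(1L, maxDirectMemory / slabSize);
>     log.warn("Block cache needs {} bytes but only {} bytes of direct memory are available; "
>         + "falling back from {} slabs to {}.", requestedBytes, maxDirectMemory, requestedSlabs, fallbackSlabs);
>     return fallbackSlabs;
>   }
> }
> {code}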



