Posted to issues@lucene.apache.org by "Istvan Farkas (Jira)" <ji...@apache.org> on 2020/03/30 08:29:00 UTC

[jira] [Created] (SOLR-14373) HDFS block cache allows overallocation

Istvan Farkas created SOLR-14373:
------------------------------------

             Summary: HDFS block cache allows overallocation
                 Key: SOLR-14373
                 URL: https://issues.apache.org/jira/browse/SOLR-14373
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: hdfs
    Affects Versions: 4.10
            Reporter: Istvan Farkas


For the HDFS block cache, when we allocate more slabs than the available direct memory can hold, the error message seems to be hidden.

In such cases the BlockCache constructor throws an OutOfMemoryError, which is caught in the HdfsDirectoryFactory itself and rethrown as a RuntimeException:

{code}
try {
  blockCache = new BlockCache(metrics, directAllocation, totalMemory, slabSize, blockSize);
} catch (OutOfMemoryError e) {
  throw new RuntimeException(
      "The max direct memory is likely too low.  Either increase it (by adding -XX:MaxDirectMemorySize=<size>g -XX:+UseLargePages to your containers startup args)"
          + " or disable direct allocation using solr.hdfs.blockcache.direct.memory.allocation=false in solrconfig.xml. If you are putting the block cache on the heap,"
          + " your java heap size might not be large enough."
          + " Failed allocating ~" + totalMemory / 1000000.0 + " MB.",
      e);
}
{code}

This manifests as a NullPointerException during core load, because the failed SolrCore constructor then closes the partially initialized core:

{code}
2020-02-24 06:50:23,492 ERROR (coreLoadExecutor-5-thread-8)-c: collection1-s:shard2-r:core_node2-x: collection1_shard2_replica1-o.a.s.c.SolrCore: Error while closing
java.lang.NullPointerException
        at org.apache.solr.core.SolrCore.close(SolrCore.java:1352)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:967)
{code}
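
The root cause gets masked because the cleanup after the failed constructor throws as well. Below is a simplified, hypothetical sketch of that pattern and of how addSuppressed could keep the original failure visible (init() and close() are illustrative stand-ins, not the actual SolrCore code):

{code}
try {
  init();                    // fails with the RuntimeException from HdfsDirectoryFactory
} catch (RuntimeException t) {
  try {
    close();                 // throws NullPointerException on never-initialized fields
  } catch (Throwable inner) {
    t.addSuppressed(inner);  // attach the NPE instead of letting it hide the root cause
  }
  throw t;                   // "The max direct memory is likely too low..." stays visible
}
{code}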

When directAllocation is true, the HdfsDirectoryFactory logs an approximation of the memory to be allocated:

{code}
2020-02-24 06:49:53,153 INFO (coreLoadExecutor-5-thread-8)-c:collection1-s:shard2-r:core_node2-x:collection1_shard2_replica1-o.a.s.c.HdfsDirectoryFactory: Number of slabs of block cache [16384] with direct memory allocation set to [true]
2020-02-24 06:49:53,153 INFO (coreLoadExecutor-5-thread-8)-c:collection1-s:shard2-r:core_node2-x:collection1_shard2_replica1-o.a.s.c.HdfsDirectoryFactory: Block cache target memory usage, slab size of [134217728] will allocate [16384] slabs and use ~[2199023255552] bytes
{code}
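
The logged target is simply the slab count multiplied by the slab size, so a quick back-of-the-envelope check (values taken from the log lines above) shows how far past any realistic -XX:MaxDirectMemorySize this can go:

{code}
// Values from the log lines above.
long slabSize = 134217728L;        // 128 MiB per slab
long slabs    = 16384L;
long total    = slabs * slabSize;  // 2199023255552 bytes, i.e. ~2 TiB of direct memory
{code}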

This was detected on Solr 4.10, but it seems to also affect current versions; I will double-check.

Plan to resolve:
- correct the logging and the throwable instance checking so that the failure does not manifest as a NullPointerException during core load
- add a detection step that checks whether the memory to be allocated exceeds the available direct memory; if it does, fall back to a smaller slab count and log a warning message (a rough sketch follows below)
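
A rough sketch of what that detection could look like (not an actual patch; it assumes JDK 8's sun.misc.VM.maxDirectMemory() to read the -XX:MaxDirectMemorySize limit, and slabCount/slabSize/log are illustrative stand-ins for the corresponding fields in HdfsDirectoryFactory):

{code}
// Hypothetical fallback: clamp the slab count to what fits into direct memory.
long maxDirect = sun.misc.VM.maxDirectMemory();  // JDK 8 API for the configured limit
long requested = (long) slabCount * slabSize;
if (directAllocation && requested > maxDirect) {
  int fallback = (int) (maxDirect / slabSize);
  log.warn("Block cache would need ~{} bytes of direct memory but only {} are available;"
      + " falling back from {} to {} slabs.", requested, maxDirect, slabCount, fallback);
  slabCount = fallback;
}
long totalMemory = (long) slabCount * slabSize;
blockCache = new BlockCache(metrics, directAllocation, totalMemory, slabSize, blockSize);
{code}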
