Posted to commits@lucene.apache.org by "Mark Miller (Confluence)" <co...@apache.org> on 2013/07/24 18:26:00 UTC

[CONF] Apache Solr Reference Guide > Running Solr on HDFS

Space: Apache Solr Reference Guide (https://cwiki.apache.org/confluence/display/solr)
Page: Running Solr on HDFS (https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS)

Added by Mark Miller:
---------------------------------------------------------------------
Running Solr with HDFS

Solr supports writing and reading its index and transaction log files to and from the HDFS distributed filesystem. To use HDFS rather than a local filesystem, you must configure solrconfig.xml as described below.

You need to use an HdfsDirectoryFactory and a data dir of the form hdfs://host:port/path.

You need to specify an UpdateLog location of the form hdfs://host:port/path.

You should specify a lock factory type of 'hdfs' or 'none'.

With the default configuration files, you can start Solr on HDFS with the following command:

java -Dsolr.directoryFactory=HdfsDirectoryFactory \
     -Dsolr.lockType=hdfs \
     -Dsolr.data.dir=hdfs://host:port/path \
     -Dsolr.updatelog=hdfs://host:port/path \
     -jar start.jar


The Block Cache

For performance, the HdfsDirectoryFactory uses a Directory that caches HDFS blocks. This caching mechanism is meant to replace the standard file system cache that Solr relies on so heavily. By default, this cache is allocated off heap. The cache will often need to be quite large, and you may need to raise the off-heap memory limit for your JVM. For the Oracle/OpenJDK JVMs, this limit is set with the -XX:MaxDirectMemorySize option, for example -XX:MaxDirectMemorySize=20g.


The HdfsDirectoryFactory has a number of settings.

Block Cache Settings

|| Param || Default || Description ||
| solr.hdfs.blockcache.enabled | true | Enable the block cache |
| solr.hdfs.blockcache.read.enabled | true | Enable the read cache |
| solr.hdfs.blockcache.write.enabled | true | Enable the write cache |
| solr.hdfs.blockcache.direct.memory.allocation | true | Enable direct memory allocation. If this is false, heap is used |
| solr.hdfs.blockcache.slab.count | 1 | Number of memory slabs to allocate. Each slab is 128 MB in size |
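
Because each slab is 128 MB, the total block cache size is the slab count times 128 MB. With direct memory allocation enabled, that total has to fit under the JVM's off-heap limit. The following is a minimal sketch of the arithmetic; block_cache_mb and slabs_for_mb are hypothetical helper names used only for illustration, not part of Solr.

```shell
# Slab size in MB, as stated in the settings table.
SLAB_MB=128

# Total block cache size in MB for a given slab count.
block_cache_mb() {
  echo $(( $1 * SLAB_MB ))
}

# Smallest slab count whose total size is at least the requested size in MB.
slabs_for_mb() {
  echo $(( ($1 + SLAB_MB - 1) / SLAB_MB ))
}

block_cache_mb 1      # prints 128 (the default slab count)
block_cache_mb 40     # prints 5120, i.e. 5 GB of cache
slabs_for_mb 16384    # prints 128, the slab count for a 16 GB cache
```

When choosing a value for -XX:MaxDirectMemorySize, it is probably wise to leave some headroom above the computed cache size, since the block cache is not the only consumer of direct memory in a JVM.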

NRTCachingDirectory Settings

|| Param || Default || Description ||
| solr.hdfs.nrtcachingdirectory.enable | true | Enable the use of NRTCachingDirectory |
| solr.hdfs.nrtcachingdirectory.maxmergesizemb | 16 | NRTCachingDirectory max segment size for merges, in MB |
| solr.hdfs.nrtcachingdirectory.maxcachedmb | 192 | NRTCachingDirectory max cache size, in MB |

HDFS Client Configuration Settings

solr.hdfs.confdir: Pass the location of the HDFS client configuration files. This is needed for HDFS HA, for example.
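
As an illustration, the sketch below builds the option for a Hadoop client configuration directory after checking that the usual client files (core-site.xml and hdfs-site.xml) are present. It assumes that your solrconfig.xml passes the solr.hdfs.confdir system property through to the HdfsDirectoryFactory; hdfs_confdir_opt is a hypothetical helper and /etc/hadoop/conf is only an example path.

```shell
# Print the -Dsolr.hdfs.confdir option for a Hadoop client configuration
# directory, after checking that the usual client files are present.
# hdfs_confdir_opt is a hypothetical helper, not part of Solr.
hdfs_confdir_opt() {
  conf_dir=$1
  for f in core-site.xml hdfs-site.xml; do
    if [ ! -f "$conf_dir/$f" ]; then
      echo "missing $conf_dir/$f" >&2
      return 1
    fi
  done
  printf '%s\n' "-Dsolr.hdfs.confdir=$conf_dir"
}

# Example use (assumes solrconfig.xml reads the solr.hdfs.confdir property;
# /etc/hadoop/conf is an example location):
#   java $(hdfs_confdir_opt /etc/hadoop/conf) <other options> -jar start.jar
```

Failing early on a missing file is a design choice for this sketch: a wrong directory would otherwise only surface later as an HDFS connection problem.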

