Posted to solr-user@lucene.apache.org by Jou Sung-Shik <li...@gmail.com> on 2015/03/01 04:03:23 UTC

Is it possible to use multiple index data directory in Apache Solr?

I'm new to Apache Lucene/Solr.

I'm trying to move from Elasticsearch to Apache Solr.

So, I have a question about the following index data location configuration.


*in Elasticsearch*

# Can optionally include more than one location, causing data to be striped across
# the locations (a la RAID 0) on a file level, favouring locations with most free
# space on creation. For example:
#
# path.data: /path/to/data1,/path/to/data2

*in Apache Solr*

<dataDir>/var/data/solr/</dataDir>


I want to configure multiple index data directories in Apache Solr, as in
Elasticsearch.

Is it possible?

How can I do this?





-- 
---------------------------------------------------------------------
BLOG : http://www.codingstar.net
---------------------------------------------------------------------

RE: Is it possible to use multiple index data directory in Apache Solr?

Posted by Susheel Kumar <su...@thedigitalgroup.com>.
Under the Solr example folder you will find a "multicore" folder. Under it you can create a directory for each core/index, then edit solr.xml to list each new core directory.

When you start Solr from the example directory, use a command line like the one below. You should then see the cores in the Solr admin UI, with the index data in each core's data directory.

> java -Dsolr.solr.home=multicore -jar start.jar 

Thanks


Re: Is it possible to use multiple index data directory in Apache Solr?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 3/1/2015 9:33 AM, Alexandre Rafalovitch wrote:
> On 1 March 2015 at 01:03, Shawn Heisey <ap...@elyograg.org> wrote:
>> How exactly does ES split the index files when multiple paths are
>> configured?  I am very curious about exactly how this works.  Google is
>> not helping me figure it out.  I even grabbed the ES master branch and
>> wasn't able to trace how path.data is used after it makes it into the
>> environment.
> Elasticsearch automatically creates indexes and shards. So, multiple
> directories are just used to distribute the shards' indexes among
> them. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-dir-layout.html
> So, when a new shard is created, one of the directories is used either
> randomly or usage-based.

So specifying multiple paths for path.data in ES is NOT a way to split a
single Lucene index across multiple directories?  That was the
implication that I took from the OP's question.  If path.data is a more
general config that is used to contain the data for all indexes in the
application and not something that gets specified per index, then there
is no need for Solr to emulate it, because Solr can already specify a
completely different data directory for each core/shard.
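
To illustrate that last point, here is a minimal Python sketch (not from
the thread). It assumes a Solr version that uses core discovery, where
each core directory holds a core.properties file whose dataDir property
can point anywhere. The helper name write_core_properties, the core
names, and the paths are hypothetical, and each core would still need its
own conf directory before Solr loads it.

```python
from pathlib import Path


def write_core_properties(solr_home, cores):
    """Write one core.properties per core, each with its own dataDir.

    `cores` maps a core name to the directory that should hold its index.
    Names and paths are illustrative assumptions, not values from the thread.
    """
    written = []
    for name, data_dir in cores.items():
        core_dir = Path(solr_home) / name
        core_dir.mkdir(parents=True, exist_ok=True)
        props = core_dir / "core.properties"
        # One key=value pair per line, as in a Java properties file.
        props.write_text(f"name={name}\ndataDir={data_dir}\n")
        written.append(props)
    return written


if __name__ == "__main__":
    # Hypothetical layout: two cores, each on a different disk.
    for path in write_core_properties(
        "solr_home_demo",
        {"core0": "/path/to/data1/core0", "core1": "/path/to/data2/core1"},
    ):
        print(path)
```

This spreads whole cores across disks. It does not split a single Lucene
index across directories.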

Thanks,
Shawn


Re: Is it possible to use multiple index data directory in Apache Solr?

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
On 1 March 2015 at 01:03, Shawn Heisey <ap...@elyograg.org> wrote:
> How exactly does ES split the index files when multiple paths are
> configured?  I am very curious about exactly how this works.  Google is
> not helping me figure it out.  I even grabbed the ES master branch and
> wasn't able to trace how path.data is used after it makes it into the
> environment.

Elasticsearch automatically creates indexes and shards. So, multiple
directories are just used to distribute the shards' indexes among
them. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-dir-layout.html
So, when a new shard is created, one of the directories is used either
randomly or usage-based.
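
As a rough illustration of "randomly or usage-based" (this is a sketch of
the idea only, not Elasticsearch's actual code), a new shard's directory
could be chosen like this in Python. The function names, the strategy
labels, and the injectable free_of parameter are assumptions made for the
example.

```python
import random
import shutil


def free_space(path):
    """Return the free bytes on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def pick_data_path(data_paths, strategy="most_free", free_of=free_space):
    """Pick the directory that should host a newly created shard.

    strategy="random" picks any configured path; strategy="most_free"
    picks the path whose filesystem currently has the most free space.
    """
    data_paths = list(data_paths)
    if not data_paths:
        raise ValueError("at least one data path is required")
    if strategy == "random":
        return random.choice(data_paths)
    if strategy == "most_free":
        return max(data_paths, key=free_of)
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    # Uses directories that exist on most systems, for demonstration only.
    print(pick_data_path([".", ".."], strategy="most_free"))
```

Either way, each shard's index lives entirely in one directory. Only the
set of shards is spread across the paths.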

So, to me, the question is less about matching the implementation and
more about what the OP is trying to achieve with it: replication? more
even disk utilization? something else?

Regards,
    Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/

Re: Is it possible to use multiple index data directory in Apache Solr?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 2/28/2015 8:03 PM, Jou Sung-Shik wrote:
> *in Elasticsearch*
> 
> # Can optionally include more than one location, causing data to be striped across
> # the locations (a la RAID 0) on a file level, favouring locations with most free
> # space on creation. For example:
> #
> # path.data: /path/to/data1,/path/to/data2
> 
> *in Apache Solr*
> 
> <dataDir>/var/data/solr/</dataDir>
> 
> 
> I want to configure multiple index data directory like Elasticsearch in
> Apache Solr.
> 
> Is it possible?
> 
> How I can reach the goal?

I don't believe this is possible in Solr.

How exactly does ES split the index files when multiple paths are
configured?  I am very curious about exactly how this works.  Google is
not helping me figure it out.  I even grabbed the ES master branch and
wasn't able to trace how path.data is used after it makes it into the
environment.

In truth, for most people I do not really see this feature as all that
much of an advantage.  For best performance, you want to completely
avoid hitting the disk at all -- the index should be entirely cached in
RAM.  When that is achieved, disk performance won't matter.  It could
help in situations where the total index data on a single server is far
too big to ever fit into RAM, or where each disk is small.
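
A quick way to check which situation applies is to compare the on-disk
index size with the memory you expect to be free for caching. The Python
sketch below is only an illustration: the function names are made up, and
cache_bytes is a number you supply yourself (for example, total RAM minus
the Java heap and other processes).

```python
import os


def index_size_bytes(data_dir):
    """Sum the sizes of all files under a core's data directory."""
    total = 0
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def fits_in_cache(data_dir, cache_bytes):
    """Return True if the whole index could be held in `cache_bytes` of RAM."""
    return index_size_bytes(data_dir) <= cache_bytes


if __name__ == "__main__":
    # Hypothetical budget: 8 GiB of memory left over for caching the index.
    print(fits_in_cache(".", 8 * 1024**3))
```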

Thanks,
Shawn