Posted to solr-user@lucene.apache.org by adfel70 <ad...@gmail.com> on 2013/11/25 14:12:27 UTC

Setting solr.data.dir for SolrCloud instance

I found something strange while trying to create more than one collection in
SolrCloud:
I am running every instance with -Dsolr.data.dir=/data
If I look at the Core Admin section, I can see that I have one core and its
dataDir is set to this fixed location. The problem is that if I create a new
collection, another core is created, but with this same fixed index location
again.
I was expecting the path I passed to serve as the BASE path for all
cores the node hosts. The current behaviour seems like a bug to me, because
one collection will obviously see data that was not indexed into it.
Is there a way to overcome this? That is, can I change the default data dir
location but still be able to create more than one collection correctly?



--
View this message in context: http://lucene.472066.n3.nabble.com/Setting-solr-data-dir-for-SolrCloud-instance-tp4103052.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Setting solr.data.dir for SolrCloud instance

Posted by Shawn Heisey <so...@elyograg.org>.
On 11/26/2013 9:19 AM, adfel70 wrote:
> The problem we had was that we tried to run:
> java -Dsolr.data.dir=/opt/solr/data -Dsolr.solr.home=/opt/solr/home -jar start.jar
> and got different behavior in how Solr handles these 2 params.
>
> We created 2 collections, which created 2 cores.
> Then we got 2 home dirs for the cores, as expected:
> /opt/solr/home/collection1_shard1_replica1
> /opt/solr/home/collection2_shard1_replica1
>
> but instead of creating 2 data dirs like:
> /opt/solr/data/collection1_shard1_replica1
> /opt/solr/data/collection2_shard1_replica1
>
> Solr had both cores' data dirs pointing to the same directory,
> /opt/solr/data
>
> When we tried putting a relative path in -Dsolr.data.dir, it worked as
> expected.
>
> I don't know if this is a bug, but we thought of 2 solutions in our case:
> 1. point -Dsolr.data.dir to a relative path and symlink that path to the
> absolute path we wanted in the first place.
> 2. don't provide -Dsolr.data.dir at all, and then Solr puts the data dir
> inside each core's home dir, which, as mentioned, works with relative paths.
>
> We chose the first option for now.

The dataDir is a per-core setting, you cannot set it for the entire 
application.  If you make it relative, then it will be relative to each 
individual instanceDir.  It defaults to ./data, so you get 
$instanceDir/data as the location.

Thanks,
Shawn
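A toy model of the rule Shawn describes may make the collision easier to see. This is a sketch, not Solr's code: it assumes a relative dataDir is joined onto each core's instanceDir and an absolute one is used unchanged, which is the behaviour reported in this thread. `resolve_data_dir` is a hypothetical helper; the instance directories are the ones from adfel70's example.

```python
import posixpath


def resolve_data_dir(instance_dir, data_dir="./data"):
    """Toy model (not Solr's code): a relative dataDir is resolved against
    the core's instanceDir; an absolute one is used as-is, because join()
    discards everything before an absolute component."""
    return posixpath.normpath(posixpath.join(instance_dir, data_dir))


instance_dirs = [
    "/opt/solr/home/collection1_shard1_replica1",
    "/opt/solr/home/collection2_shard1_replica1",
]

# Default ("./data") or a relative value such as -Dsolr.data.dir=data:
# each core ends up with its own directory.
relative = [resolve_data_dir(d) for d in instance_dirs]

# Absolute value such as -Dsolr.data.dir=/opt/solr/data:
# every core resolves to the same directory.
absolute = [resolve_data_dir(d, "/opt/solr/data") for d in instance_dirs]

print(relative)
print(absolute)
```

With the absolute value both cores resolve to /opt/solr/data, which is the shared index location described at the start of the thread.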


Re: Setting solr.data.dir for SolrCloud instance

Posted by adfel70 <ad...@gmail.com>.
The problem we had was that we tried to run:
java -Dsolr.data.dir=/opt/solr/data -Dsolr.solr.home=/opt/solr/home -jar start.jar
and got different behavior in how Solr handles these 2 params.

We created 2 collections, which created 2 cores.
Then we got 2 home dirs for the cores, as expected:
/opt/solr/home/collection1_shard1_replica1
/opt/solr/home/collection2_shard1_replica1

but instead of creating 2 data dirs like:
/opt/solr/data/collection1_shard1_replica1
/opt/solr/data/collection2_shard1_replica1

Solr had both cores' data dirs pointing to the same directory,
/opt/solr/data

When we tried putting a relative path in -Dsolr.data.dir, it worked as
expected.

I don't know if this is a bug, but we thought of 2 solutions in our case:
1. point -Dsolr.data.dir to a relative path and symlink that path to the
absolute path we wanted in the first place.
2. don't provide -Dsolr.data.dir at all, and then Solr puts the data dir
inside each core's home dir, which, as mentioned, works with relative paths.

We chose the first option for now.
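Workaround 1 could look roughly like the sketch below. It assumes the relative value is "data" (so each core's data dir is <instanceDir>/data) and that one symlink is created per core, pointing at <data root>/<core name>; the thread doesn't say exactly which path was linked, or when relative to core creation. `link_core_data_dirs` is a hypothetical helper, and a temporary directory stands in for /opt/solr/home and /opt/solr/data so the example runs anywhere with POSIX symlinks.

```python
import os
import tempfile


def link_core_data_dirs(solr_home, data_root, core_names):
    """Hypothetical helper: make <solr_home>/<core>/data a symlink to
    <data_root>/<core>, so a relative dataDir of "data" lands in a
    separate per-core directory under data_root."""
    for core in core_names:
        target = os.path.join(data_root, core)
        os.makedirs(target, exist_ok=True)        # real per-core data location
        instance_dir = os.path.join(solr_home, core)
        os.makedirs(instance_dir, exist_ok=True)  # stands in for the core's instanceDir
        link = os.path.join(instance_dir, "data")  # where a relative "data" resolves
        if not os.path.lexists(link):
            os.symlink(target, link)              # link -> target


cores = ["collection1_shard1_replica1", "collection2_shard1_replica1"]

with tempfile.TemporaryDirectory() as root:
    home = os.path.join(root, "home")  # plays the role of /opt/solr/home
    data = os.path.join(root, "data")  # plays the role of /opt/solr/data
    link_core_data_dirs(home, data, cores)
    # Where each core's relative "data" dir really points, relative to the data root.
    landed = {
        core: os.path.relpath(
            os.path.realpath(os.path.join(home, core, "data")),
            os.path.realpath(data),
        )
        for core in cores
    }

print(landed)
```

Each core's data then ends up in its own subdirectory of the data root, which is the layout the original post was after.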






Re: Setting solr.data.dir for SolrCloud instance

Posted by Erick Erickson <er...@gmail.com>.
The data _is_ separated from the code. It's all relative
to solr_home, which need not have any relation to where
the code is executing from.

For instance, I can start Solr like
java -Dsolr.solr.home=/Users/Erick/testdir/solr -jar start.jar

and have my war in a completely different place.

Best,
Erick


On Tue, Nov 26, 2013 at 1:08 AM, adfel70 <ad...@gmail.com> wrote:

> Thanks for the reply, Erick.
> Actually, I didn't think this through. I just thought it would be a good
> idea to separate the data from the application code.
> I guess I'll leave it without setting the datadir parameter and add a
> symlink.

Re: Setting solr.data.dir for SolrCloud instance

Posted by adfel70 <ad...@gmail.com>.
Thanks for the reply, Erick.
Actually, I didn't think this through. I just thought it would be a good
idea to separate the data from the application code.
I guess I'll leave it without setting the datadir parameter and add a
symlink.




Re: Setting solr.data.dir for SolrCloud instance

Posted by Erick Erickson <er...@gmail.com>.
The first thing I'd do is not send an absolute path. What
happens if you just send -Dsolr.data.dir=data (no leading '/')?

We had this discussion a while ago when we were working
on auto-discovery, and it turns out that
there _are_ legitimate cases in which more than one
core/collection can point to the same data dir. You have to very
carefully control who writes to the core, and I wouldn't do it
unless there was no choice, but some people find it useful.

And, in general, I wouldn't mix and match the _core_ admin API
with the _collections_ api unless you're very confident in what
you are doing.

Why doesn't just letting the default data.dir location work for you?
There can be good reasons to make it explicit; I'm mostly just checking
that you're not over-thinking the problem. Usually the default data dirs
end up in a reasonable place.

Best,
Erick



On Mon, Nov 25, 2013 at 8:12 AM, adfel70 <ad...@gmail.com> wrote:

> I found something strange while trying to create more than one collection in
> SolrCloud:
> I am running every instance with -Dsolr.data.dir=/data
> If I look at the Core Admin section, I can see that I have one core and its
> dataDir is set to this fixed location. The problem is that if I create a new
> collection, another core is created, but with this same fixed index location
> again.
> I was expecting the path I passed to serve as the BASE path for all
> cores the node hosts. The current behaviour seems like a bug to me, because
> one collection will obviously see data that was not indexed into it.
> Is there a way to overcome this? That is, can I change the default data dir
> location but still be able to create more than one collection correctly?

Re: Setting solr.data.dir for SolrCloud instance

Posted by Mark Miller <ma...@gmail.com>.
On Nov 25, 2013, at 8:12 AM, adfel70 <ad...@gmail.com> wrote:

> I was expecting the path I passed to serve as the BASE path for all
> cores the node hosts

When running Solr on HDFS, there is a similar property you can use: -Dsolr.hdfs.home. If you set it, all data dirs are created under it.

We talked about wanting a similar option for SolrCloud and local filesystem a while back. If there is no JIRA issue for it, please file one!

- Mark
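For reference, the node-wide "base path" behaviour asked for at the start of the thread could be modelled like this. It is a hypothetical sketch of the requested feature, not an existing Solr property and not necessarily how solr.hdfs.home lays out its directories: each core gets <base>/<core name> unless it sets its own dataDir. `core_data_dir` and its `override` parameter are made up for illustration.

```python
import posixpath


def core_data_dir(base, core_name, override=None):
    """Hypothetical 'base path' resolution (not an existing Solr option).

    - no per-core override: <base>/<core_name>
    - absolute override: used as-is
    - relative override: resolved under <base>/<core_name>
    """
    if override is None:
        return posixpath.join(base, core_name)
    if posixpath.isabs(override):
        return posixpath.normpath(override)
    return posixpath.normpath(posixpath.join(base, core_name, override))


cores = ["collection1_shard1_replica1", "collection2_shard1_replica1"]
layout = [core_data_dir("/opt/solr/data", c) for c in cores]
print(layout)
```

Under this model the two cores from the thread get /opt/solr/data/collection1_shard1_replica1 and /opt/solr/data/collection2_shard1_replica1, the layout adfel70 expected.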