Posted to solr-user@lucene.apache.org by Tom Evans <te...@googlemail.com> on 2015/12/14 18:49:45 UTC

Moving to SolrCloud, specifying dataDir correctly

Hi all

We're currently in the process of migrating our distributed search
running on 5.0 to SolrCloud running on 5.4, and setting up a test
cluster for performance testing etc.

We have several cores/collections, and in each core's solrconfig.xml,
we were specifying an empty <dataDir>, and specifying the same
core.baseDataDir in core.properties.

When I tried this in SolrCloud mode, specifying
"-Dsolr.data.dir=/mnt/solr/" when starting each node, it worked fine
for the first collection, but then the second collection tried to use
the same directory to store its index, which obviously failed. I fixed
this by changing solrconfig.xml in each collection to specify a
specific directory, like so:

  <dataDir>${solr.data.dir:}products</dataDir>

Looking back after the weekend, I'm not a big fan of this. Is there a
way to add a core.properties to ZK, or a way to specify
core.baseDataDir on the command line, or just a better way of handling
this that I'm not aware of?
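
A minimal sketch of why the shared setting collides, assuming each
collection's solrconfig.xml carries the stock
<dataDir>${solr.data.dir:}</dataDir> expression and that placeholders of
the form ${name:default} are expanded from system properties. The
substitute helper and the "orders" collection name are hypothetical and
only illustrate the expansion; this is not Solr's own code.

```python
import re

# Matches ${name} or ${name:default}, the placeholder form used in solrconfig.xml.
_PROPERTY = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def substitute(text, props):
    """Expand ${name:default} placeholders from a dict of system properties."""
    def expand(match):
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        raise KeyError("no value and no default for property " + name)
    return _PROPERTY.sub(expand, text)


# Property passed to every node with -Dsolr.data.dir=/mnt/solr/
node_props = {"solr.data.dir": "/mnt/solr/"}

# The same expression in every collection's config expands to one shared path.
shared = [substitute("${solr.data.dir:}", node_props)
          for _ in ("products", "orders")]

# A per-collection suffix (the workaround above) keeps the paths apart.
separate = [substitute("${solr.data.dir:}" + name, node_props)
            for name in ("products", "orders")]
```

With the property unset, the empty default applies and the suffixed form
collapses to a bare relative name such as "products".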

Cheers

Tom

Re: Moving to SolrCloud, specifying dataDir correctly

Posted by Rahul Ramesh <rr...@gmail.com>.
We recently moved our data from magnetic drives to SSD. We run Solr in cloud
mode. Only the data is stored on the drive; the configuration is stored in ZK.
We start Solr using the -s option to specify the data dir.
Command to start Solr:
./bin/solr start -c -h <host_name> -p <port> -z <zk_instances> -s <solr data directory name>

We followed these steps to migrate the data:

1. Stop all new insertions.
2. Copy the Solr data to the new location.
3. Restart the server with the -s option pointing to the new Solr directory name.
4. We have a 3-node Solr cluster. The restarted server will get in sync
with the other two servers.
5. Repeat this procedure for the other two servers.

We used a similar procedure to upgrade from 5.2.1 to 5.3.1.
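
Steps 2 and 3 of the procedure above could be scripted roughly as
follows. This is only a sketch under stated assumptions: the function
names are hypothetical, the node is assumed to be stopped already, and
the command simply mirrors the bin/solr invocation quoted in this
message.

```python
import shutil
from pathlib import Path


def copy_solr_data(old_dir, new_dir):
    """Step 2: copy the whole Solr directory tree to the new location."""
    # copytree refuses to overwrite an existing destination, which guards
    # against clobbering anything already present on the new drive.
    shutil.copytree(old_dir, new_dir)
    return Path(new_dir)


def start_command(host, port, zk_hosts, solr_dir):
    """Step 3: build the start command for the new directory."""
    return ["./bin/solr", "start", "-c",
            "-h", host, "-p", str(port),
            "-z", zk_hosts, "-s", str(solr_dir)]
```

The returned list can be handed to subprocess.run once the copy has
finished; repeating this one node at a time follows steps 4 and 5.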





On Tue, Dec 15, 2015 at 5:07 AM, Jeff Wartes <jw...@whitepages.com> wrote:

>
> Don’t set solr.data.dir. Instead, set the install dir. Something like:
> -Dsolr.solr.home=/data/solr
> -Dsolr.install.dir=/opt/solr
>
> I have many solrcloud collections, and separate data/install dirs, and
> I’ve never had to do anything with manual per-collection or per-replica
> data dirs.
>
> That said, it’s been a while since I set this up, and I may not remember
> all the pieces.
> You might need something like this too, for example:
>
> -Djetty.home=/opt/solr/server

Re: Moving to SolrCloud, specifying dataDir correctly

Posted by Jeff Wartes <jw...@whitepages.com>.
Don’t set solr.data.dir. Instead, set the install dir. Something like:
-Dsolr.solr.home=/data/solr
-Dsolr.install.dir=/opt/solr

I have many solrcloud collections, and separate data/install dirs, and
I’ve never had to do anything with manual per-collection or per-replica
data dirs.

That said, it’s been a while since I set this up, and I may not remember
all the pieces. 
You might need something like this too, for example:

-Djetty.home=/opt/solr/server


On 12/14/15, 3:11 PM, "Erick Erickson" <er...@gmail.com> wrote:

>Currently, it'll be a little tedious but here's what you can do (going
>partly from memory)...
>
>When you create the collection, specify the special value EMPTY for
>createNodeSet (Solr 5.3+).
>Use ADDREPLICA to add each individual replica. When you do this, you
>can add a dataDir for
>each individual replica and thus keep them separate, i.e. for a
>particular box the first
>replica would get /data/solr/collection1_shard1_replica1, the second
>/data/solr/collection1_shard2_replica1 and so forth.
>
>If you don't have Solr 5.3+, you can still do the same thing, except
>you create your collection letting
>the replicas fall where they will. Then do the ADDREPLICA as above.
>When that's all done,
>DELETEREPLICA for the original replicas.
>
>Best,
>Erick
>
>On Mon, Dec 14, 2015 at 2:21 PM, Tom Evans <te...@googlemail.com>
>wrote:
>> On Mon, Dec 14, 2015 at 1:22 PM, Shawn Heisey <ap...@elyograg.org>
>>wrote:
>>> On 12/14/2015 10:49 AM, Tom Evans wrote:
>>>> When I tried this in SolrCloud mode, specifying
>>>> "-Dsolr.data.dir=/mnt/solr/" when starting each node, it worked fine
>>>> for the first collection, but then the second collection tried to use
>>>> the same directory to store its index, which obviously failed. I fixed
>>>> this by changing solrconfig.xml in each collection to specify a
>>>> specific directory, like so:
>>>>
>>>>   <dataDir>${solr.data.dir:}products</dataDir>
>>>>
>>>> Looking back after the weekend, I'm not a big fan of this. Is there a
>>>> way to add a core.properties to ZK, or a way to specify
>>>> core.baseDataDir on the command line, or just a better way of handling
>>>> this that I'm not aware of?
>>>
>>> Since you're running SolrCloud, just let Solr handle the dataDir, don't
>>> try to override it.  It will default to "data" relative to the
>>> instanceDir.  Each instanceDir is likely to be in the solr home.
>>>
>>> With SolrCloud, your cores will not contain a "conf" directory (unless
>>> you create it manually), therefore the on-disk locations will be *only*
>>> data, there's not really any need to have separate locations for
>>> instanceDir and dataDir.  All active configuration information for
>>> SolrCloud is in zookeeper.
>>>
>>
>> That makes sense, but I guess I was asking the wrong question :)
>>
>> We have our SSDs mounted on /data/solr, which is where our indexes
>> should go, but our solr install is on /opt/solr, with the default solr
>> home in /opt/solr/server/solr. How do we change where the indexes get
>> put so they end up on the fast storage?
>>
>> Cheers
>>
>> Tom


Re: Moving to SolrCloud, specifying dataDir correctly

Posted by Erick Erickson <er...@gmail.com>.
Currently, it'll be a little tedious, but here's what you can do (going
partly from memory)...

When you create the collection, specify the special value EMPTY for
createNodeSet (Solr 5.3+). Use ADDREPLICA to add each individual replica.
When you do this, you can add a dataDir for each individual replica and
thus keep them separate, i.e. for a particular box the first replica would
get /data/solr/collection1_shard1_replica1, the second
/data/solr/collection1_shard2_replica1 and so forth.

If you don't have Solr 5.3+, you can still do the same thing, except you
create your collection letting the replicas fall where they will. Then do
the ADDREPLICA as above. When that's all done, DELETEREPLICA for the
original replicas.
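
The sequence above might look like the following Collections API
requests. This is a sketch, not a tested recipe: the base URL, the node
name, the shard count and the replica name core_node1 are assumptions,
and the code only builds the URLs rather than sending them. Depending on
the setup, CREATE may also need a collection.configName parameter.

```python
from urllib.parse import urlencode

# Hypothetical node address; adjust to a real Solr node.
BASE = "http://localhost:8983/solr/admin/collections"


def collections_url(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = {"action": action}
    query.update(params)
    return BASE + "?" + urlencode(query)


# 1. Create the collection with no replicas placed yet (Solr 5.3+).
create = collections_url("CREATE", name="collection1", numShards=2,
                         createNodeSet="EMPTY")

# 2. Add each replica explicitly, giving it its own dataDir.
add_replicas = [
    collections_url("ADDREPLICA", collection="collection1", shard=shard,
                    node="localhost:8983_solr",
                    dataDir="/data/solr/collection1_%s_replica1" % shard)
    for shard in ("shard1", "shard2")
]

# 3. Before 5.3 only: remove an automatically placed replica afterwards.
#    "core_node1" is a made-up replica name; look up the real one first.
delete_original = collections_url("DELETEREPLICA", collection="collection1",
                                  shard="shard1", replica="core_node1")
```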

Best,
Erick

On Mon, Dec 14, 2015 at 2:21 PM, Tom Evans <te...@googlemail.com> wrote:
> On Mon, Dec 14, 2015 at 1:22 PM, Shawn Heisey <ap...@elyograg.org> wrote:
>> On 12/14/2015 10:49 AM, Tom Evans wrote:
>>> When I tried this in SolrCloud mode, specifying
>>> "-Dsolr.data.dir=/mnt/solr/" when starting each node, it worked fine
>>> for the first collection, but then the second collection tried to use
>>> the same directory to store its index, which obviously failed. I fixed
>>> this by changing solrconfig.xml in each collection to specify a
>>> specific directory, like so:
>>>
>>>   <dataDir>${solr.data.dir:}products</dataDir>
>>>
>>> Looking back after the weekend, I'm not a big fan of this. Is there a
>>> way to add a core.properties to ZK, or a way to specify
>>> core.baseDataDir on the command line, or just a better way of handling
>>> this that I'm not aware of?
>>
>> Since you're running SolrCloud, just let Solr handle the dataDir, don't
>> try to override it.  It will default to "data" relative to the
>> instanceDir.  Each instanceDir is likely to be in the solr home.
>>
>> With SolrCloud, your cores will not contain a "conf" directory (unless
>> you create it manually), therefore the on-disk locations will be *only*
>> data, there's not really any need to have separate locations for
>> instanceDir and dataDir.  All active configuration information for
>> SolrCloud is in zookeeper.
>>
>
> That makes sense, but I guess I was asking the wrong question :)
>
> We have our SSDs mounted on /data/solr, which is where our indexes
> should go, but our solr install is on /opt/solr, with the default solr
> home in /opt/solr/server/solr. How do we change where the indexes get
> put so they end up on the fast storage?
>
> Cheers
>
> Tom

Re: Moving to SolrCloud, specifying dataDir correctly

Posted by Tom Evans <te...@googlemail.com>.
On Mon, Dec 14, 2015 at 1:22 PM, Shawn Heisey <ap...@elyograg.org> wrote:
> On 12/14/2015 10:49 AM, Tom Evans wrote:
>> When I tried this in SolrCloud mode, specifying
>> "-Dsolr.data.dir=/mnt/solr/" when starting each node, it worked fine
>> for the first collection, but then the second collection tried to use
>> the same directory to store its index, which obviously failed. I fixed
>> this by changing solrconfig.xml in each collection to specify a
>> specific directory, like so:
>>
>>   <dataDir>${solr.data.dir:}products</dataDir>
>>
>> Looking back after the weekend, I'm not a big fan of this. Is there a
>> way to add a core.properties to ZK, or a way to specify
>> core.baseDataDir on the command line, or just a better way of handling
>> this that I'm not aware of?
>
> Since you're running SolrCloud, just let Solr handle the dataDir, don't
> try to override it.  It will default to "data" relative to the
> instanceDir.  Each instanceDir is likely to be in the solr home.
>
> With SolrCloud, your cores will not contain a "conf" directory (unless
> you create it manually), therefore the on-disk locations will be *only*
> data, there's not really any need to have separate locations for
> instanceDir and dataDir.  All active configuration information for
> SolrCloud is in zookeeper.
>

That makes sense, but I guess I was asking the wrong question :)

We have our SSDs mounted on /data/solr, which is where our indexes
should go, but our solr install is on /opt/solr, with the default solr
home in /opt/solr/server/solr. How do we change where the indexes get
put so they end up on the fast storage?

Cheers

Tom

Re: Moving to SolrCloud, specifying dataDir correctly

Posted by Shawn Heisey <ap...@elyograg.org>.
On 12/14/2015 10:49 AM, Tom Evans wrote:
> When I tried this in SolrCloud mode, specifying
> "-Dsolr.data.dir=/mnt/solr/" when starting each node, it worked fine
> for the first collection, but then the second collection tried to use
> the same directory to store its index, which obviously failed. I fixed
> this by changing solrconfig.xml in each collection to specify a
> specific directory, like so:
>
>   <dataDir>${solr.data.dir:}products</dataDir>
>
> Looking back after the weekend, I'm not a big fan of this. Is there a
> way to add a core.properties to ZK, or a way to specify
> core.baseDataDir on the command line, or just a better way of handling
> this that I'm not aware of?

Since you're running SolrCloud, just let Solr handle the dataDir; don't
try to override it.  It will default to "data" relative to the
instanceDir.  Each instanceDir is likely to be in the solr home.

With SolrCloud, your cores will not contain a "conf" directory (unless
you create it manually), so the on-disk locations will be *only*
data, and there's not really any need to have separate locations for
instanceDir and dataDir.  All active configuration information for
SolrCloud is in ZooKeeper.
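
To make the default concrete, here is a small sketch of where an index
would land under the rule described above, assuming each instanceDir
sits directly inside the solr home and is named after the core. The
helper name and the core name are illustrative only; the two solr home
paths are the ones mentioned elsewhere in this thread.

```python
from pathlib import PurePosixPath


def default_data_dir(solr_home, core_name):
    """Return the default dataDir: "data" under the core's instanceDir.

    Assumes the instanceDir is <solr_home>/<core_name>.
    """
    instance_dir = PurePosixPath(solr_home) / core_name
    return instance_dir / "data"


# With the default solr home, the index lives next to the install.
on_install_disk = default_data_dir("/opt/solr/server/solr",
                                   "collection1_shard1_replica1")

# Pointing the solr home at another mount moves the index with it.
on_ssd = default_data_dir("/data/solr", "collection1_shard1_replica1")
```

Under these assumptions no per-collection dataDir is needed: moving the
solr home moves every core's data directory along with it.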

Thanks,
Shawn