Posted to solr-user@lucene.apache.org by Walter Underwood <wu...@wunderwood.org> on 2017/02/24 01:41:35 UTC

Setting Solr data dir isn't really working (6.3.0)

I did this in the solrconfig.xml for both collections (tutors and questions). 

  <dataDir>/solr/data</dataDir>

I deleted the old collection indexes, reloaded, restarted, and created a new collection for "tutors". And I see this on the disk.

[wunder@new-solr-c02.test3]# ls -l /solr/data
total 36
drwxr-xr-x 2 bin bin 20480 Feb 23 17:40 index
drwxr-xr-x 2 bin bin  4096 Feb 23 15:57 snapshot_metadata
drwxr-xr-x 2 bin bin  4096 Feb 23 15:57 suggest_subject_names_fuzzy
drwxr-xr-x 2 bin bin  4096 Feb 23 15:57 suggest_subject_names_infix
drwxr-xr-x 2 bin bin  4096 Feb 23 17:40 tlog
[wunder@new-solr-c02.test3]# ls -l /apps/solr6/server/solr
total 12
drwxr-xr-x 5 bin bin   93 Jul 14  2016 configsets
-rw-r--r-- 1 bin bin 3037 Jul 14  2016 README.txt
-rw-r--r-- 1 bin bin 2117 Aug 31 20:13 solr.xml
drwxr-xr-x 2 bin bin   28 Feb 23 15:57 tutors_shard1_replica5
-rw-r--r-- 1 bin bin  501 Jul 14  2016 zoo.cfg
[wunder@new-solr-c02.test3]#

Seems pretty broken to me.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


Re: Setting Solr data dir isn't really working (6.3.0)

Posted by Erick Erickson <er...@gmail.com>.
I don't see that any index data _is_ outside the data dir.
Configuration stuff might be. What's the content of

/apps/solr6/server/solr/tutors_shard1_replica5?

If it's just some configs (and maybe even just core.properties), then
the index data isn't there, only the "instanceDir" is.

I should probably be specific here in that I'm taking "index data" to mean
"tlogs and segment files"

If segment files and/or tlogs are in /apps/solr6/server/solr, that's
very bad indeed, but they look to be under /solr/data.

Not sure what's with /solr/data/suggest_subject_names_fuzzy though.

All that said, the whole concept of "instanceDir" is something that we
should probably revisit. It made more sense when we stored conf locally
for each core. With configsets it's pretty degenerate.

And one thing that does disturb me is that some other core with the
same data dir would, presumably, also have the /solr/data/index
directory and you'd have two cores pointing to the same data
directory...

On Thu, Feb 23, 2017 at 6:44 PM, Walter Underwood <wu...@wunderwood.org> wrote:
> The bug is that the dataDir is /solr/data and the index data is in /apps/solr6/server/solr. Except for the suggest data. No index data should be outside the dataDir, right?

Re: Setting Solr data dir isn't really working (6.3.0)

Posted by Walter Underwood <wu...@wunderwood.org>.
The bug is that the dataDir is /solr/data and the index data is in /apps/solr6/server/solr. Except for the suggest data. No index data should be outside the dataDir, right?

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Feb 23, 2017, at 6:11 PM, Erick Erickson <er...@gmail.com> wrote:
> 
> Not quite sure what your complaint is. Is it that
> you've got the index directory under /solr/data and
> not under, say, /solr/data/tutors? Or that
> /apps/solr6/server/solr/tutors_shard1_replica5 exists at all?
> 
> And what's in tutors_shard1_replica5 anyway? Just the
> core.properties file?
> 
> Erick


Re: Setting Solr data dir isn't really working (6.3.0)

Posted by Erick Erickson <er...@gmail.com>.
Not quite sure what your complaint is. Is it that
you've got the index directory under /solr/data and
not under, say, /solr/data/tutors? Or that
/apps/solr6/server/solr/tutors_shard1_replica5 exists at all?

And what's in tutors_shard1_replica5 anyway? Just the
core.properties file?

Erick

On Thu, Feb 23, 2017 at 5:41 PM, Walter Underwood <wu...@wunderwood.org> wrote:
> I did this in the solrconfig.xml for both collections (tutors and questions).
>
>   <dataDir>/solr/data</dataDir>
>
> I deleted the old collection indexes, reloaded, restarted, and created a new collection for "tutors". And I see this on the disk.
>
> [wunder@new-solr-c02.test3]# ls -l /solr/data
> total 36
> drwxr-xr-x 2 bin bin 20480 Feb 23 17:40 index
> drwxr-xr-x 2 bin bin  4096 Feb 23 15:57 snapshot_metadata
> drwxr-xr-x 2 bin bin  4096 Feb 23 15:57 suggest_subject_names_fuzzy
> drwxr-xr-x 2 bin bin  4096 Feb 23 15:57 suggest_subject_names_infix
> drwxr-xr-x 2 bin bin  4096 Feb 23 17:40 tlog
> [wunder@new-solr-c02.test3]# ls -l /apps/solr6/server/solr
> total 12
> drwxr-xr-x 5 bin bin   93 Jul 14  2016 configsets
> -rw-r--r-- 1 bin bin 3037 Jul 14  2016 README.txt
> -rw-r--r-- 1 bin bin 2117 Aug 31 20:13 solr.xml
> drwxr-xr-x 2 bin bin   28 Feb 23 15:57 tutors_shard1_replica5
> -rw-r--r-- 1 bin bin  501 Jul 14  2016 zoo.cfg
> [wunder@new-solr-c02.test3]#
>
> Seems pretty broken to me.
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>

Re: Setting Solr data dir isn't really working (6.3.0)

Posted by Walter Underwood <wu...@wunderwood.org>.
Thanks! 

Now I need to write up the mistakes I made trying to use the solr command.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Feb 24, 2017, at 11:17 AM, Erick Erickson <er...@gmail.com> wrote:
> 
> bq: Which means the docs have a problem, since they are recommending
> something that should not be recommended
> 
> Absolutely, just changed:
> https://cwiki.apache.org/confluence/display/solr/DataDir+and+DirectoryFactory+in+SolrConfig.
> Was that the one that misled you?
> 
> Erick


Re: Setting Solr data dir isn't really working (6.3.0)

Posted by Erick Erickson <er...@gmail.com>.
bq: Which means the docs have a problem, since they are recommending
something that should not be recommended

Absolutely, just changed:
https://cwiki.apache.org/confluence/display/solr/DataDir+and+DirectoryFactory+in+SolrConfig.
Was that the one that misled you?

Erick

On Fri, Feb 24, 2017 at 10:42 AM, Walter Underwood
<wu...@wunderwood.org> wrote:
> Running with this, which works the way we want.
>
>   <dataDir>/solr/data/${solr.core.name}</dataDir>
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)

Re: Setting Solr data dir isn't really working (6.3.0)

Posted by Walter Underwood <wu...@wunderwood.org>.
Running with this, which works the way we want.

  <dataDir>/solr/data/${solr.core.name}</dataDir>

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)
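
To see why the per-core variable matters, here is a minimal Python sketch of the substitution. It is only a simplified stand-in for Solr's ${name} and ${name:default} property syntax, not Solr's own code, and the second core name is hypothetical. With the literal /solr/data every core resolves to the same directory; with ${solr.core.name} each core gets its own.

```python
import re

# Simplified imitation of ${name} / ${name:default} substitution.
# Illustration only; this is not how Solr implements it internally.
_PROP = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def resolve(template, props):
    """Replace each ${name} or ${name:default} in template using props."""
    def repl(match):
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        raise KeyError(name)
    return _PROP.sub(repl, template)


# The first name is taken from the directory listing in this thread;
# the second is a hypothetical core for the "questions" collection.
cores = ["tutors_shard1_replica5", "questions_shard1_replica2"]


def data_dirs(template):
    """Map each core name to the dataDir it would resolve to."""
    return {core: resolve(template, {"solr.core.name": core}) for core in cores}


shared = data_dirs("/solr/data")
per_core = data_dirs("/solr/data/${solr.core.name}")

print(len(set(shared.values())))    # 1: both cores share one directory
print(len(set(per_core.values())))  # 2: one directory per core
```

The shared case is the collision Erick and Shawn warn about in this thread: two cores writing index and tlog files into the same /solr/data.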

> On Feb 24, 2017, at 10:08 AM, Walter Underwood <wu...@wunderwood.org> wrote:
> 
> Dang it. I know better than that, but I was blindly following the docs. Which means the docs have a problem, since they are recommending something that should not be recommended.
> 
> Putting variable data on a different volume is very common. Official support for that goes at least as far back as Unix V7 (1979), with /var. It should be easy to do in Solr.
> 
> I expected to see the shard names as directories under /solr/data, but I now remember that I need to set that with a variable.
> 
> Time to delete everything and rebuild everything again.
> 
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)


Re: Setting Solr data dir isn't really working (6.3.0)

Posted by Walter Underwood <wu...@wunderwood.org>.
Dang it. I know better than that, but I was blindly following the docs. Which means the docs have a problem, since they are recommending something that should not be recommended.

Putting variable data on a different volume is very common. Official support for that goes at least as far back as Unix V7 (1979), with /var. It should be easy to do in Solr.

I expected to see the shard names as directories under /solr/data, but I now remember that I need to set that with a variable.

Time to delete everything and rebuild everything again.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Feb 24, 2017, at 8:30 AM, Shawn Heisey <ap...@elyograg.org> wrote:
> 
> On 2/23/2017 6:41 PM, Walter Underwood wrote:
>> I did this in the solrconfig.xml for both collections (tutors and questions). 
>> 
>>  <dataDir>/solr/data</dataDir>
>> 
>> I deleted the old collection indexes, reloaded, restarted, and created a new collection for "tutors". And I see this on the disk.
> 
> Setting dataDir in solrconfig.xml, especially to an absolute path like
> that, is generally not a good idea.  It's VERY bad if that config will
> be used by multiple cores.  The best place to do it is in
> core.properties, so it's part of the core definition and independent of
> config/schema.  IMHO it's best to make it a relative path.  Below is a
> core.properties file from my dev system running 6.3.0, in a
> "cores/sparkinc_0" directory under the solr home.
> 
> I do not see anything broken in the directory listings you provided. 
> What do you see that is misplaced?
> 
> With SolrCloud, I wouldn't be setting dataDir *at all* -- I would let
> Solr handle that, mostly because the config for SolrCloud is not on the
> disk and therefore dataDir doesn't need to be separated from instanceDir.
> 
> #Written by CorePropertiesLocator
> #Mon Feb 06 19:24:18 UTC 2017
> name=sparkinclive
> loadonStartup=false
> dataDir=../../data/sparkinc_0
> transient=false
> 
> Thanks,
> Shawn
> 


Re: Setting Solr data dir isn't really working (6.3.0)

Posted by Shawn Heisey <ap...@elyograg.org>.
On 2/23/2017 6:41 PM, Walter Underwood wrote:
> I did this in the solrconfig.xml for both collections (tutors and questions). 
>
>   <dataDir>/solr/data</dataDir>
>
> I deleted the old collection indexes, reloaded, restarted, and created a new collection for "tutors". And I see this on the disk.

Setting dataDir in solrconfig.xml, especially to an absolute path like
that, is generally not a good idea.  It's VERY bad if that config will
be used by multiple cores.  The best place to do it is in
core.properties, so it's part of the core definition and independent of
config/schema.  IMHO it's best to make it a relative path.  Below is a
core.properties file from my dev system running 6.3.0, in a
"cores/sparkinc_0" directory under the solr home.

I do not see anything broken in the directory listings you provided. 
What do you see that is misplaced?

With SolrCloud, I wouldn't be setting dataDir *at all* -- I would let
Solr handle that, mostly because the config for SolrCloud is not on the
disk and therefore dataDir doesn't need to be separated from instanceDir.

#Written by CorePropertiesLocator
#Mon Feb 06 19:24:18 UTC 2017
name=sparkinclive
loadonStartup=false
dataDir=../../data/sparkinc_0
transient=false

Thanks,
Shawn
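
As a rough illustration of where the relative dataDir in that core.properties ends up, here is a small Python sketch. It assumes a relative dataDir is resolved against the core's instance directory, which is how the layout above reads but is not confirmed in this thread, and "/var/solr" is a hypothetical solr home used only for the example.

```python
import posixpath


def resolve_data_dir(solr_home, instance_dir, data_dir):
    """Resolve data_dir, treating a relative value as relative to the
    core's instance directory (an assumption, see the note above)."""
    base = posixpath.join(solr_home, instance_dir)
    # posixpath.join keeps an absolute data_dir as-is;
    # normpath collapses the ".." components.
    return posixpath.normpath(posixpath.join(base, data_dir))


# "/var/solr" is a hypothetical solr home, not taken from the thread.
print(resolve_data_dir("/var/solr", "cores/sparkinc_0", "../../data/sparkinc_0"))
# /var/solr/data/sparkinc_0
```

Under that assumption the index lands in a "data" directory beside "cores" under the solr home, and because the path is relative the whole tree can be moved without editing core.properties.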