Posted to solr-user@lucene.apache.org by Prateek Jain J <pr...@ericsson.com> on 2016/11/14 15:37:16 UTC

index and data directories

Hi All,

We are using Solr 4.8.1 and would like to know whether it is possible to store data and indexes in separate directories. I know the following tag exists in the solrconfig.xml file:

<!-- Data Directory
     Used to specify an alternate directory to hold all index data
     other than the default ./data under the Solr home. If replication
     is in use, this should match the replication configuration. -->
<dataDir>C:/del-it/solr/cm_events_nbi/data</dataDir>
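For reference, a minimal sketch of the same setting made overridable per environment. It assumes the `${property:default}` substitution syntax of solrconfig.xml, and the property name solr.data.dir (used by the stock 4.x example config) should be treated as an assumption for your install. In either form, dataDir relocates the whole data directory, so every segment file moves together.

```xml
<!-- Sketch: dataDir comes from the solr.data.dir system property if it is set;
     an empty value falls back to the default ./data directory.
     All segment files (stored *.fdt/*.fdx and indexed *.tim, *.doc, ...)
     end up together under this directory's index/ subdirectory. -->
<dataDir>${solr.data.dir:}</dataDir>
```

Started with, for example, -Dsolr.data.dir=C:/del-it/solr/cm_events_nbi/data, this would resolve to the path shown in the post.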



Regards,
Prateek Jain

Re: index and data directories

Posted by Erick Erickson <er...@gmail.com>.
Oh, and to make matters even more "interesting", for
docValues=true fields there's no need to store anything at
all: you can return fields that are docValues=true,
stored=false in the fl list.
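A minimal schema.xml sketch of the kind of field Erick describes. The field name event_type is hypothetical, and string is assumed to be the solr.StrField type from the example schema. Whether fl can return a docValues-only field depends on the Solr version, so check the Reference Guide for the release you run.

```xml
<!-- Hypothetical field: searchable (indexed="true"), with no stored copy in
     the *.fdt/*.fdx files (stored="false"); its values are kept column-wise
     in the segment's docValues files instead. -->
<field name="event_type" type="string" indexed="true" stored="false" docValues="true"/>
```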

On Tue, Nov 15, 2016 at 1:53 AM, Prateek Jain J
<pr...@ericsson.com> wrote:
>
> Thanks a lot, Erick
>
>
> Regards,
> Prateek Jain

RE: index and data directories

Posted by Prateek Jain J <pr...@ericsson.com>.
Thanks a lot, Erick


Regards,
Prateek Jain

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: 14 November 2016 09:14 PM
To: solr-user <so...@lucene.apache.org>
Subject: Re: index and data directories

Theoretically, perhaps. And it's quite true that stored data for fields marked stored=true is just passed through verbatim and compressed on disk, while the data associated with indexed=true fields goes through an analysis chain and is stored in a very different format. However, these different data are simply stored in files with different suffixes within a segment. So you might have _0.fdx, _0.fdt, _0.tim, _0.tvx, etc. that together form a single segment.

This is done on a per-segment basis. So certain segment files, namely the *.fdt and *.fdx files, contain the stored data, while files with other extensions hold the indexed data; see "File naming" here for a somewhat out-of-date format, but close enough for this discussion:
https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/codecs/lucene40/package-summary.html
And there's no option to store the *.fdt and *.fdx files separately from the rest of the segment files.

This statement: "I mean documents which are to be indexed" really doesn't make sense. You send these things called Solr documents to be indexed, but they are just a set of fields with values, handled as their definitions indicate (i.e. respecting stored=true|false, indexed=true|false, docValues=true|false). The Solr document sent by SolrJ is simply thrown away after it is processed into segment files.

If you're sending semi-structured docs (say Word, PDF, etc.) to be indexed through Tika, they are simply transformed into a Solr doc (a set of field/value pairs) and the original document is thrown away as well. There's no option to store the original semi-structured doc either.


Best,
Erick

Re: index and data directories

Posted by Erick Erickson <er...@gmail.com>.
Theoretically, perhaps. And it's quite true that stored data for
fields marked stored=true is just passed through verbatim and
compressed on disk, while the data associated with indexed=true fields
goes through an analysis chain and is stored in a very different
format. However, these different data are simply stored in files with
different suffixes within a segment. So you might have _0.fdx, _0.fdt,
_0.tim, _0.tvx, etc. that together form a single segment.

This is done on a per-segment basis. So certain segment files, namely
the *.fdt and *.fdx files, contain the stored data, while files with
other extensions hold the indexed data; see "File naming" here for a
somewhat out-of-date format, but close enough for this discussion:
https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/codecs/lucene40/package-summary.html
And there's no option to store the *.fdt and *.fdx files separately
from the rest of the segment files.
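To tie the field attributes to the files mentioned here, a hypothetical schema.xml fragment in the spirit of Prateek's setup (all fields stored, only some indexed). The field names are invented, and the types are assumed to come from the 4.x example schema. Both fields are written into the same segment, just into differently suffixed files.

```xml
<!-- Stored but not indexed: the raw value is compressed into the segment's
     *.fdt file (located via *.fdx); it can be returned but not searched. -->
<field name="raw_event" type="string" indexed="false" stored="true"/>

<!-- Indexed and stored: analyzed terms go to the term dictionary and postings
     files (e.g. *.tim, *.tip, *.doc, *.pos), and the original value also goes
     to *.fdt/*.fdx, all within the same segment. -->
<field name="description" type="text_general" indexed="true" stored="true"/>
```

Small segments may instead be packed into a compound *.cfs file, which still keeps both kinds of data together.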

This statement: "I mean documents which are to be indexed" really
doesn't make sense. You send these things called Solr documents to be
indexed, but they are just a set of fields with values, handled as
their definitions indicate (i.e. respecting stored=true|false,
indexed=true|false, docValues=true|false). The Solr document sent by
SolrJ is simply thrown away after it is processed into segment files.

If you're sending semi-structured docs (say Word, PDF, etc.) to be
indexed through Tika, they are simply transformed into a Solr doc (a set
of field/value pairs) and the original document is thrown away as
well. There's no option to store the original semi-structured doc
either.


Best,
Erick

On Mon, Nov 14, 2016 at 12:35 PM, Prateek Jain J
<pr...@ericsson.com> wrote:
>
> By data, I mean documents which are to be indexed. Some fields can be stored="true", but that doesn’t matter.
>
> For example: App1 creates an object (AppObj) to be indexed and sends it to Solr via SolrJ. Some of the attributes of this object can be declared to be used for storage.
>
> Now, my understanding is that data and the indexes generated from that data are two separate things. In my particular example, all fields have stored="true" but only selected fields have indexed="true". My expectation is that indexes are stored separately from data, because indexes can be generated by different techniques/algorithms while the data/documents remain unchanged. Please correct me if my understanding is not correct.
>
>
> Regards,
> Prateek Jain

RE: index and data directories

Posted by Prateek Jain J <pr...@ericsson.com>.
By data, I mean documents which are to be indexed. Some fields can be stored="true", but that doesn’t matter.

For example: App1 creates an object (AppObj) to be indexed and sends it to Solr via SolrJ. Some of the attributes of this object can be declared to be used for storage.

Now, my understanding is that data and the indexes generated from that data are two separate things. In my particular example, all fields have stored="true" but only selected fields have indexed="true". My expectation is that indexes are stored separately from data, because indexes can be generated by different techniques/algorithms while the data/documents remain unchanged. Please correct me if my understanding is not correct.


Regards,
Prateek Jain

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: 14 November 2016 07:05 PM
To: solr-user <so...@lucene.apache.org>
Subject: Re: index and data directories

The question is pretty opaque. What do you mean by "data" as opposed to "indexes"? Are you talking about where Lucene puts stored="true" fields? If not, what do you mean by "data"?

If you are talking about where Lucene puts the stored="true" bits, then no, there's no way to segregate those from the other files that make up a segment.

Best,
Erick

Re: index and data directories

Posted by Erick Erickson <er...@gmail.com>.
The question is pretty opaque. What do you mean by "data" as opposed
to "indexes"? Are you talking about where Lucene puts stored="true"
fields? If not, what do you mean by "data"?

If you are talking about where Lucene puts the stored="true" bits, then
no, there's no way to segregate those from the other files that
make up a segment.

Best,
Erick

On Mon, Nov 14, 2016 at 7:58 AM, Prateek Jain J
<pr...@ericsson.com> wrote:
>
> Hi Alex,
>
> I'm not sure I understand this correctly. Is it possible to store indexes and data separately?
>
>
> Regards,
> Prateek Jain

RE: index and data directories

Posted by Prateek Jain J <pr...@ericsson.com>.
Hi Alex,

I'm not sure I understand this correctly. Is it possible to store indexes and data separately?


Regards,
Prateek Jain

-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafalov@gmail.com] 
Sent: 14 November 2016 03:53 PM
To: solr-user <so...@lucene.apache.org>
Subject: Re: index and data directories

solr.xml also has a bunch of properties under the core tag:

  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0">
      <property name="dataDir" value="/data/core0"/></core>
    <core name="core1" instanceDir="core1"/>
  </cores>

You can get the Reference Guide for your specific version here:
http://archive.apache.org/dist/lucene/solr/ref-guide/

Regards,
   Alex.
----
Solr Example reading group is starting November 2016, join us at http://j.mp/SolrERG
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


Re: index and data directories

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
solr.xml also has a bunch of properties under the core tag:

  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0">
      <property name="dataDir" value="/data/core0"/></core>
    <core name="core1" instanceDir="core1"/>
  </cores>
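For comparison, a sketch of the same legacy (pre-core-discovery) solr.xml with dataDir given as a `<core>` attribute rather than a nested property. Treat the attribute form as an assumption to verify against the Reference Guide for 4.8. Either way, it moves a core's whole data directory, so stored and indexed files stay together.

```xml
<cores adminPath="/admin/cores">
  <!-- core0's index lives under /data/core0 instead of core0/data -->
  <core name="core0" instanceDir="core0" dataDir="/data/core0"/>
  <!-- core1 keeps the default data directory inside its instanceDir -->
  <core name="core1" instanceDir="core1"/>
</cores>
```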

You can get the Reference Guide for your specific version here:
http://archive.apache.org/dist/lucene/solr/ref-guide/

Regards,
   Alex.
----
Solr Example reading group is starting November 2016, join us at
http://j.mp/SolrERG
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 15 November 2016 at 02:37, Prateek Jain J
<pr...@ericsson.com> wrote:
>
> Hi All,
>
> We are using Solr 4.8.1 and would like to know whether it is possible to store data and indexes in separate directories. I know the following tag exists in the solrconfig.xml file:
>
> <!-- Data Directory Used to specify an alternate directory to hold all index
>                                 data other than the default ./data under the Solr home. If replication is
>                                 in use, this should match the replication configuration. -->
>                 <dataDir>C:/del-it/solr/cm_events_nbi/data</dataDir>
>
>
>
> Regards,
> Prateek Jain