Posted to solr-user@lucene.apache.org by tomasv <da...@gmail.com> on 2014/07/01 01:52:29 UTC

Indexing non-stored fields

Hello All, (warning: newbie question)

In our schema.xml we have defined many fields such as:
<field name="firstname" type="string" indexed="true" stored="false" />

Other fields are defined like this:
<field name="recordid" type="long" indexed="true" stored="true" />

Q: If my server is restarted/rebooted, will I still be able to search for
documents using the "firstname" field? Or will my records need to be
re-indexed before I can search by first name?
It seems that after a reboot, I can search on the "stored='true'" fields
but not the "stored='false'" fields.

Am I interpreting this correctly, or am I missing something?

Thanks for any help or links! (Still working through the wiki)



--
View this message in context: http://lucene.472066.n3.nabble.com/Indexing-non-stored-fields-tp4144893.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Indexing non-stored fields

Posted by tomasv <da...@gmail.com>.
Thank you very much for that explanation. Well done!
-tomas

Re: Indexing non-stored fields

Posted by Steve McKay <st...@b.abbies.us>.
"Stored" doesn't mean "stored to disk"; it means something more like "stored verbatim". When you index a field, Solr analyzes the field value and makes it part of the index. The index is persisted to disk when you commit, which is why it sticks around after a restart. Searching the index, that is, mapping from search terms to doc ids, is very fast. However, the index is very, very bad at going in reverse, from doc ids to terms. That's where stored fields come in. When you store a field, Solr takes the field value and stores the entire value separately from the index. This makes it trivial to get the value for a particular doc id, but it's terrible for searching.

So the stored attribute and the indexed attribute have different purposes. Indexed means you want to be able to search on the value, and stored means you want to be able to see the value in search results.
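
As a rough illustration of the distinction above, here is a toy Python model. It is not Solr or Lucene code, and every name in it (ToyIndex, SCHEMA, demo) is made up for this sketch. It keeps an inverted index that maps terms to doc ids and a separate map from doc ids to verbatim stored values, writes both to a JSON file on "commit", and reloads them to mimic a restart.

```python
import json
import os
import tempfile

# Toy model only: these flags mirror the schema.xml snippets from the question.
SCHEMA = {
    "firstname": {"indexed": True, "stored": False},
    "recordid": {"indexed": True, "stored": True},
}


class ToyIndex:
    def __init__(self):
        self.inverted = {}  # "field:term" -> list of doc ids (used for searching)
        self.stored = {}    # doc id as string -> {field: verbatim value}

    def add(self, doc_id, doc):
        for field, value in doc.items():
            flags = SCHEMA[field]
            if flags["indexed"]:
                ids = self.inverted.setdefault(f"{field}:{value}", [])
                if doc_id not in ids:
                    ids.append(doc_id)
            if flags["stored"]:
                self.stored.setdefault(str(doc_id), {})[field] = value

    def commit(self, path):
        # Both structures go to disk, whether or not a field is "stored".
        with open(path, "w") as fh:
            json.dump({"inverted": self.inverted, "stored": self.stored}, fh)

    @classmethod
    def load(cls, path):
        # Simulates a restart: rebuild the in-memory state from disk.
        index = cls()
        with open(path) as fh:
            data = json.load(fh)
        index.inverted = data["inverted"]
        index.stored = data["stored"]
        return index

    def search(self, field, term):
        ids = self.inverted.get(f"{field}:{term}", [])
        # Only stored fields can be returned with each hit.
        return [{"doc_id": i, **self.stored.get(str(i), {})} for i in ids]


def demo():
    index = ToyIndex()
    index.add(1, {"firstname": "bob", "recordid": 12345})
    path = os.path.join(tempfile.mkdtemp(), "toy_index.json")
    index.commit(path)
    restarted = ToyIndex.load(path)
    return restarted.search("firstname", "bob")
```

In this sketch, demo() still finds the document by firstname after the simulated restart, but the hit carries only recordid, because firstname was indexed and never stored.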

Re: Indexing non-stored fields

Posted by tomasv <da...@gmail.com>.
Thanks for the quick response.

Follow-up newbie question:
If the fields are not stored, how is the server able to search for them
after a restart? Where does it get the data to be searched?

Example:  "bob" (firstname) is indexed but not stored. After initial
indexing, I query for "firstname:(bob)" and I get my document back. But if
I restart the server, where does the server go to retrieve information that
will allow me to query for "bob" once again? It would seem that "bob" got
stored someplace if I can query on it after a restart.

My untrained mind thinks that searching for "firstname:(bob)" (after a
restart) will fail, but that searching for "recordid:(12345)" (in my
original example) will succeed since it was indexed+stored.

(stored + indexed makes total sense to me; it's the indexed but NOT stored
that I can't get my head around).

Thanks!



-- 
/*-----------------------
 * Tomas at Home
 * dadkind@gmail.com
 * ---------------------*/





Re: Indexing non-stored fields

Posted by Shawn Heisey <so...@elyograg.org>.
Not storing a field simply means that its value will not be returned in
search results. If the field is indexed, you will still be able to search
on it.

This holds both before and after a restart.
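
One way to check this is to run the same query before and after a restart. The Python below is only a sketch: the host, port, and core name (localhost:8983, collection1) are assumptions about your setup, and build_select_url is a helper made up for this example. It relies on the standard q, fl, and wt request parameters of the /select handler.

```python
from urllib.parse import urlencode

# Assumed location of the Solr core; adjust for your installation.
BASE_URL = "http://localhost:8983/solr/collection1"


def build_select_url(field, value, return_fields):
    """Build a /select URL that searches one field and asks for others back."""
    params = {
        "q": f"{field}:{value}",          # search on the indexed field
        "fl": ",".join(return_fields),    # fields requested in the response
        "wt": "json",
    }
    return f"{BASE_URL}/select?{urlencode(params)}"


url = build_select_url("firstname", "bob", ["recordid", "firstname"])
print(url)
```

Fetching that URL should match the document on firstname, yet the response should contain only recordid for it, since firstname is indexed but not stored.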

Thanks,
Shawn