Posted to solr-user@lucene.apache.org by Christian <en...@gmail.com> on 2011/07/07 00:05:33 UTC

Getting the indexed value rather than the stored value

Hi,

I am using the following fieldType to change what gets indexed for a given field:

    <fieldtype name="profanity" stored="false" indexed="true"
               class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.KeepWordFilterFactory" words="profane-words.txt"
                ignoreCase="true"/>
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="^(.+)$"
                replacement="YES"
                replace="all"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
      </analyzer>
    </fieldtype>


This is great for finding all things with or without profanity (as separate
queries), but I would like to get the value as part of the query and let
the consumer of the call decide what to do with the data.

Is there a way to do this w/o having to instantiate a KeepWordFilterFactory
in the Java class that is responsible for inserting the document into Solr?
For example, I know that I can do this in Java code during the insert, but I
would rather get the indexed value (the one that shows up when faceting).

Please let me know if this is not clear.

Thanks,
Christian

Re: Getting the indexed value rather than the stored value

Posted by Chris Hostetter <ho...@fucit.org>.
: However, when I get the value of the field from a Solr query, I get the
: original sentence ("some sentence like this") which is not what I want (in
: this particular case).

the stored field is always the original stored value -- analysis is only 
used for producing the indexed terms.

: For now, i ended up creating a custom updateprocessor and configured it in
: solrconfig.xml, but I would still like to know if there's a way through the
: SOLR API to get the actual indexed value (like the way the SOLR api does it)

an updateprocessor is definitely the "right" way to go about a problem 
like this.

Solr actually doesn't have an efficient way to get the indexed values for 
a document. The very nature of the indexed values is that they are an 
*inverted* index -- it's efficient to go from indexed term -> doc, not the 
other way around.

The caveat to this is that things like the FieldCache and UnInvertedField 
can be used internally for fast lookup of indexed terms but they have 
heavy initialization cost to build up these data structures for each 
newSearcher.

Bottom line: an updateprocessor (or generating this value in your indexing 
code) is the way to go.

-Hoss
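
As a rough sketch of the update-processor approach recommended above, a
custom processor is registered in solrconfig.xml as part of an update
request processor chain. The chain name "profanity-flag" and the class
com.example.ProfanityFlagProcessorFactory are hypothetical placeholders
(that class would have to be written; it does not ship with Solr), while
the Log and Run processor factories are standard Solr classes.

```xml
<!-- solrconfig.xml (sketch only) -->
<updateRequestProcessorChain name="profanity-flag">
  <!-- Hypothetical custom processor: inspects the raw text of each incoming
       document and sets a separate flag field before the document is indexed. -->
  <processor class="com.example.ProfanityFlagProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- RunUpdateProcessorFactory performs the actual index update, so it is last. -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

How a chain is selected for update requests depends on the Solr version, so
check the documentation for the release in use.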

Re: Getting the indexed value rather than the stored value

Posted by Christian <en...@gmail.com>.
Hi Gora,

The problem I am finding is that the copyField directive sends the original
value to the new field type.
The field type then munges the index until it's completely different
(original -> some sentence like this, index -> true), but the stored value
is still the original sentence.
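
For illustration, the setup described above might look roughly like the
following in schema.xml. The field names "comment" and "comment_profanity"
and the "text" field type are assumptions, not taken from the actual schema;
"profanity" is the field type from the first message.

```xml
<!-- schema.xml (sketch only; field names are hypothetical) -->
<field name="comment" type="text" indexed="true" stored="true"/>

<!-- stored="true" on the field overrides the stored="false" default of the
     profanity field type. The stored value is whatever copyField delivers,
     which is the raw input text, not the output of the analyzer. -->
<field name="comment_profanity" type="profanity" indexed="true" stored="true"/>

<!-- copyField copies the original value before any analysis runs. -->
<copyField source="comment" dest="comment_profanity"/>
```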

When I facet on this field, I get the actual "true"/"false" value in the
index (which is what i want) and not the stored value "some sentence like
this".

However, when I get the value of the field from a Solr query, I get the
original sentence ("some sentence like this") which is not what I want (in
this particular case).

So in other words, I want to pull a document out of Solr with the indexed
value of one of the fields ("true") and not the stored value ("some sentence
like this"). I understand that faceting does this for me, but I don't just
want the # of docs that are 'true'; I want the actual document with its
field set to "true".

For now, I ended up creating a custom updateprocessor and configured it in
solrconfig.xml, but I would still like to know if there's a way through the
SOLR API to get the actual indexed value (like the way the SOLR api does it)
or to change the stored value and set it equal to the indexed value (for
example, after the stop word filter, the index becomes the original sentence
minus the stop words and the stored value remains the same. Is there a way
to change the stored value to be the same as the indexed value?).

Thanks,
Christian
On Thu, Jul 7, 2011 at 1:05 PM, Gora Mohanty <go...@mimirtech.com> wrote:

> On Thu, Jul 7, 2011 at 3:35 AM, Christian <en...@gmail.com> wrote:
> [...]
> > This is great for finding all things with or without profanity (as
> > separate queries), but I would like to get the value as part of the
> > query and let the consumer of the call decide what to do with the data.
> >
> > Is there a way to do this w/o having to instantiate a
> > KeepWordFilterFactory in the Java class that is responsible for
> > inserting the document into Solr? For example, I know that I can do
> > this in Java code during the insert, but I would rather get the indexed
> > value (the one that shows up when faceting).
> >
> > Please let me know if this is not clear.
>
> Not sure that I follow what you are after:
> * If you are using this field as part of faceting, the facet values should
>   be what you are after.
> * In the general case, if you want to retrieve the actual value, you should
>   have both indexed=true, and stored=true.
>
> Regards,
> Gora
>

Re: Getting the indexed value rather than the stored value

Posted by Gora Mohanty <go...@mimirtech.com>.
On Thu, Jul 7, 2011 at 3:35 AM, Christian <en...@gmail.com> wrote:
[...]
> This is great for finding all things with or without profanity (as separate
> queries), but I would like to get the value as part of the query and let
> the consumer of the call decide what to do with the data.
>
> Is there a way to do this w/o having to instantiate a KeepWordFilterFactory
> in the Java class that is responsible for inserting the document into Solr?
> For example, I know that I can do this in Java code during the insert, but I
> would rather get the indexed value (the one that shows up when faceting).
>
> Please let me know if this is not clear.

Not sure that I follow what you are after:
* If you are using this field as part of faceting, the facet values should be
  what you are after.
* In the general case, if you want to retrieve the actual value, you should
  have both indexed=true, and stored=true.

Regards,
Gora
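
A minimal sketch of the "both indexed and stored" suggestion, assuming the
flag is computed before indexing (by an update processor or by the indexing
client) and written to its own field. The field name "has_profanity" is
hypothetical. Because solr.StrField indexes its value verbatim as a single
term, the value returned with the document and the value seen when faceting
are the same.

```xml
<!-- schema.xml (sketch only; the field name is hypothetical) -->
<fieldType name="string" class="solr.StrField"/>

<!-- Holds a precomputed flag such as "true" or "false". Being both indexed
     and stored, it can be searched, faceted on, and returned in results. -->
<field name="has_profanity" type="string" indexed="true" stored="true"/>
```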