Posted to solr-user@lucene.apache.org by Shanmugavel SRD <sr...@gmail.com> on 2010/11/23 12:15:05 UTC
copyField is not tokenizing the values at index time
schema.xml config:
<fieldType name="textWordSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern=", *" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
</types>
<fields>
<field name="spellword" type="textWordSpell" indexed="true" stored="true" multiValued="true"/>
<copyField source="keywords_t" dest="spellword"/>
feed.xml:
<field name="keywords_t"><![CDATA[Internet, Songs, Canada]]></field>
After indexing, if I search for spellword:[* TO *], the result looks like this:
Actual:
<arr name="spellword">
<str>Internet, Songs, Canada</str>
</arr>
Expected:
<arr name="spellword">
<str>Internet</str>
<str>Songs</str>
<str>Canada</str>
</arr>
Could anyone tell me what configuration I need to get the expected output above?
--
View this message in context: http://lucene.472066.n3.nabble.com/copyField-is-not-tokenizing-the-values-at-index-time-tp1952756p1952756.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: copyField is not tokenizing the values at index time
Posted by Shanmugavel SRD <sr...@gmail.com>.
Thanks Erick.
Re: copyField is not tokenizing the values at index time
Posted by Erick Erickson <er...@gmail.com>.
I think you got fooled by what's returned as a field value. When you store a field and later return it as part of a document, your exact input is returned *regardless* of what analysis has been done. So your query spellword:[* TO *] matches the document on its indexed tokens, but what it displays is the stored value, not those tokens.
I claim that if you examine your index for spellword via the admin page, you'll see three distinct tokens (internet, songs, canada, lowercased by LowerCaseFilterFactory). I further claim that if you interrogate your spellword field with the spellcheck component, you'll get what you expect. The proof is left as an exercise for the reader <G>...
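To make the stored-versus-indexed distinction concrete, here is a small Python model of what happens when "Internet, Songs, Canada" is copied into spellword. It is not Solr code: the functions `analyze` and `index_document` are hypothetical, and the sketch assumes the textWordSpell chain behaves like a regex split on ", *" (dropping empty pieces, as Lucene's PatternTokenizer skips zero-length tokens) followed by lowercasing.

```python
import re
from typing import Dict, List


def analyze(value: str) -> List[str]:
    """Hypothetical model of the textWordSpell analyzer chain."""
    # PatternTokenizerFactory with pattern ", *": split on each match,
    # discarding zero-length pieces.
    tokens = [t for t in re.split(r", *", value) if t]
    # LowerCaseFilterFactory: lowercase every token.
    tokens = [t.lower() for t in tokens]
    # RemoveDuplicatesTokenFilterFactory only drops identical tokens at the
    # SAME position; the tokenizer emits one token per position, so nothing
    # is removed here and the list is returned unchanged.
    return tokens


def index_document(value: str) -> Dict[str, List[str]]:
    """Model a stored + indexed field: raw value kept, analyzed terms indexed."""
    return {"stored": [value], "indexed_terms": analyze(value)}


doc = index_document("Internet, Songs, Canada")
print(doc["stored"])         # what a search result displays
print(doc["indexed_terms"])  # what the index (and the spellchecker) sees
```

The stored list still holds the single original string, which is what the [* TO *] result showed, while the indexed terms are the three separate lowercased tokens.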
Best
Erick