Posted to solr-user@lucene.apache.org by Willie Whitehead <bw...@gmail.com> on 2010/03/22 19:06:11 UTC
Correct way to use tokenizer for whitespace
Hi,
In my schema.xml, I am trying to remove whitespace from a multivalued
field as they come from the database. Is this the correct way:
<fieldType name="size" class="solr.TextField">
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.TrimFilterFactory" />
</analyzer>
</fieldType>
I do not believe this is working.
Thanks!
Re: Correct way to use tokenizer for whitespace
Posted by Ahmet Arslan <io...@yahoo.com>.
> Thank you. I tried that but it did not work to remove trailing spaces.
> I believe this is why my size facet queries are not working. After
> reloading, the XML result entries still have:
>
> <arr name="size">
> <str>LARGE </str>
> <str>MEDIUM </str>
> <str>SMALL </str>
> </arr>
>
> I am using this:
> <fieldType name="size" class="solr.TextField">
> <analyzer>
> <tokenizer class="solr.StandardTokenizerFactory"/>
> </analyzer>
> </fieldType>
>
> And here is my size field:
> <field name="size" type="string" indexed="true" stored="true"
> multiValued="true" required="false"/>
The problem is that you are using the string type (type="string") here, which is not analyzed. It should be:
<field name="size" type="size" indexed="true" stored="true"
multiValued="true" required="false"/>
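For reference, a minimal sketch of how the two declarations fit together in schema.xml, assuming the field type keeps the name "size" used in this thread. The type attribute on the field has to match the name attribute of the fieldType, and existing documents need to be reindexed after the change because analysis runs at index time. Note that analysis only changes the indexed terms, which is what facets are built from; the stored value returned in the <str> elements of a result is always the original input, so a trailing space can still show up there.

```xml
<!-- Field type: the tokenizer splits on word boundaries,
     so surrounding whitespace never reaches the indexed terms -->
<fieldType name="size" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- Field: type="size" points at the fieldType above
     instead of the unanalyzed "string" type -->
<field name="size" type="size" indexed="true" stored="true"
       multiValued="true" required="false"/>
```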
Re: Correct way to use tokenizer for whitespace
Posted by Willie Whitehead <bw...@gmail.com>.
Thank you. I tried that but it did not work to remove trailing spaces.
I believe this is why my size facet queries are not working. After
reloading, the XML result entries still have:
<arr name="size">
<str>LARGE </str>
<str>MEDIUM </str>
<str>SMALL </str>
</arr>
I am using this:
<fieldType name="size" class="solr.TextField">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
</analyzer>
</fieldType>
And here is my size field:
<field name="size" type="string" indexed="true" stored="true"
multiValued="true" required="false"/>
I do not know what difference this makes:
<analyzer type="query">
vs this:
<analyzer type="index">
But it appears I do not need that part.
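For context, the type attribute selects when an analyzer runs: type="index" applies while documents are being indexed, type="query" applies to the query text at search time, and an <analyzer> element with no type attribute is used for both. A sketch with both declared explicitly follows; the type name "size_example" is hypothetical, and the lowercase filter is included purely for illustration.

```xml
<fieldType name="size_example" class="solr.TextField">
  <!-- Runs when documents are added to the index -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Runs on the query string at search time -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

When both chains are identical, as here, a single untyped <analyzer> is the shorter way to get the same behaviour.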
On Mon, Mar 22, 2010 at 2:12 PM, Ahmet Arslan <io...@yahoo.com> wrote:
>
>> In my schema.xml, I am trying to remove whitespace from a multivalued
>> field as they come from the database. Is this the correct way:
>>
>> <fieldType name="size" class="solr.TextField">
>> <analyzer type="query">
>> <tokenizer class="solr.StandardTokenizerFactory"/>
>> <filter class="solr.TrimFilterFactory" />
>> </analyzer>
>> </fieldType>
>>
>> I do not believe this is working.
>
> TrimFilterFactory trims leading and trailing whitespace from each token, but StandardTokenizerFactory already discards whitespace when it splits the input. In other words, it is pointless to use TrimFilterFactory with StandardTokenizerFactory.
>
> In your field type definition you specified only a query analyzer but not an index analyzer. You can use this directly:
>
> <fieldType name="size" class="solr.TextField">
> <analyzer>
> <tokenizer class="solr.StandardTokenizerFactory"/>
> </analyzer>
> </fieldType>
>
> What do you mean by removing whitespace from a multivalued field as they come from the database?
Re: Correct way to use tokenizer for whitespace
Posted by Ahmet Arslan <io...@yahoo.com>.
> In my schema.xml, I am trying to remove whitespace from a multivalued
> field as they come from the database. Is this the correct way:
>
> <fieldType name="size" class="solr.TextField">
> <analyzer type="query">
> <tokenizer class="solr.StandardTokenizerFactory"/>
> <filter class="solr.TrimFilterFactory" />
> </analyzer>
> </fieldType>
>
> I do not believe this is working.
TrimFilterFactory trims leading and trailing whitespace from each token, but StandardTokenizerFactory already discards whitespace when it splits the input. In other words, it is pointless to use TrimFilterFactory with StandardTokenizerFactory.
In your field type definition you specified only a query analyzer but not an index analyzer. You can use this directly:
<fieldType name="size" class="solr.TextField">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
</analyzer>
</fieldType>
What do you mean by removing whitespace from a multivalued field as they come from the database?
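If the goal is to keep each database value as a single term and only strip the surrounding whitespace, one option (a sketch, not something proposed in this thread) is to pair TrimFilterFactory with KeywordTokenizerFactory. That tokenizer emits the whole input as one token, so the padding is still there for the filter to remove. The type name "size_trimmed" is hypothetical.

```xml
<fieldType name="size_trimmed" class="solr.TextField">
  <analyzer>
    <!-- Emits the entire field value as a single token, whitespace included -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Strips leading and trailing whitespace from that token -->
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>
```

With this type, a value such as "EXTRA LARGE " would be indexed as the single term "EXTRA LARGE", whereas StandardTokenizerFactory would split it into two terms.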