Posted to solr-user@lucene.apache.org by Willie Whitehead <bw...@gmail.com> on 2010/03/22 19:06:11 UTC

Correct way to use tokenizer for whitespace

Hi,

In my schema.xml, I am trying to remove whitespace from the values of a
multivalued field as they come from the database. Is this the correct way?

<fieldType name="size" class="solr.TextField">
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.TrimFilterFactory"/>
    </analyzer>
</fieldType>

I do not believe this is working.

Thanks!

Re: Correct way to use tokenizer for whitespace

Posted by Ahmet Arslan <io...@yahoo.com>.
> Thank you. I tried that, but it did not remove the trailing spaces.
> I believe this is why my size facet queries are not working. After
> reloading, the XML result entries still have:
>
> <arr name="size">
> <str>LARGE     </str>
> <str>MEDIUM    </str>
> <str>SMALL     </str>
> </arr>
>
> I am using this:
> <fieldType name="size" class="solr.TextField">
>     <analyzer>
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>     </analyzer>
> </fieldType>
>
> And here is my size field:
> <field name="size" type="string" indexed="true" stored="true"
>        multiValued="true" required="false"/>

The problem is that you are using the string type (type="string") here, which is not analyzed. It should be:

<field name="size" type="size" indexed="true" stored="true"
multiValued="true" required="false"/>
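A minimal sketch of an alternative analysis chain, assuming each database value is a single size label that may itself contain spaces (a hypothetical "EXTRA LARGE", for instance) plus trailing padding. KeywordTokenizerFactory emits the whole value as one token and TrimFilterFactory then strips the padding, so faceting, which counts indexed terms, should see LARGE rather than "LARGE     ". StandardTokenizerFactory also drops the spaces, but it would split a multi-word label into separate terms and therefore separate facet entries.

```xml
<!-- Sketch only. In a Solr 1.4-style schema.xml the <fieldType> goes
     inside <types> and the <field> goes inside <fields>. -->
<fieldType name="size" class="solr.TextField">
    <!-- No type attribute: the same chain is used at index and query time -->
    <analyzer>
        <!-- Emits the entire field value as a single token -->
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <!-- Removes leading and trailing whitespace from that token -->
        <filter class="solr.TrimFilterFactory"/>
    </analyzer>
</fieldType>

<!-- The field must reference the analyzed "size" type, not "string" -->
<field name="size" type="size" indexed="true" stored="true"
       multiValued="true" required="false"/>
```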

Re: Correct way to use tokenizer for whitespace

Posted by Willie Whitehead <bw...@gmail.com>.
Thank you. I tried that, but it did not remove the trailing spaces.
I believe this is why my size facet queries are not working. After
reloading, the XML result entries still have:

<arr name="size">
<str>LARGE     </str>
<str>MEDIUM    </str>
<str>SMALL     </str>
</arr>

I am using this:
<fieldType name="size" class="solr.TextField">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
    </analyzer>
</fieldType>

And here is my size field:
<field name="size" type="string" indexed="true" stored="true"
       multiValued="true" required="false"/>
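The padded values inside <arr name="size"> are the field's stored values. Solr analyzers only change the indexed terms, never the stored value that is echoed back in results, so no tokenizer or filter will remove those spaces from the XML output. One way to clean the stored values as well is to trim them before they reach Solr. The fragment below is a sketch that assumes the data is loaded with the DataImportHandler from a SQL database supporting RTRIM; the entity, table, and column names (product, products, product_sizes, product_id) are hypothetical placeholders.

```xml
<!-- Hypothetical data-config.xml fragment for the DataImportHandler.
     Entity, table, and column names are placeholders; the <dataSource>
     definition is omitted. -->
<dataConfig>
    <document>
        <entity name="product" query="SELECT id FROM products">
            <field column="id" name="id"/>
            <!-- Child entity: one row per size, which fills the multiValued
                 "size" field. RTRIM strips the trailing padding in SQL, so the
                 stored value is already clean when Solr receives it. -->
            <entity name="sizes"
                    query="SELECT RTRIM(size) AS size FROM product_sizes WHERE product_id = '${product.id}'">
                <field column="size" name="size"/>
            </entity>
        </entity>
    </document>
</dataConfig>
```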

I did not know what difference this makes:
<analyzer type="query">

vs this:

<analyzer type="index">

But it appears I do not need that part.
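On the index/query question: an <analyzer> with type="index" is applied to field values when documents are added, one with type="query" is applied to the query text, and an <analyzer> with no type attribute is used for both. The sketch below spells out the two phases separately with identical chains, which should behave the same as a single untyped <analyzer>. Separate analyzers are only worth declaring when the two phases genuinely need different processing.

```xml
<!-- Explicit form: separate index-time and query-time analyzers.
     Both chains are identical here, equivalent to one <analyzer>
     element without a type attribute. -->
<fieldType name="size" class="solr.TextField">
    <!-- Applied to field values when documents are indexed -->
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
    </analyzer>
    <!-- Applied to the text of queries against this field -->
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
    </analyzer>
</fieldType>
```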

On Mon, Mar 22, 2010 at 2:12 PM, Ahmet Arslan <io...@yahoo.com> wrote:
>
>> In my schema.xml, I am trying to remove whitespace from the values of a
>> multivalued field as they come from the database. Is this the correct way?
>>
>> <fieldType name="size" class="solr.TextField">
>>     <analyzer type="query">
>>         <tokenizer class="solr.StandardTokenizerFactory"/>
>>         <filter class="solr.TrimFilterFactory"/>
>>     </analyzer>
>> </fieldType>
>>
>> I do not believe this is working.
>
> TrimFilterFactory trims leading and trailing whitespace. But StandardTokenizerFactory already discards whitespace when it splits the text into tokens, so combining TrimFilterFactory with StandardTokenizerFactory accomplishes nothing.
>
> In your field type definition you specified only a query analyzer, not an index analyzer. You can use a single analyzer, which applies to both indexing and querying:
>
> <fieldType name="size" class="solr.TextField">
>     <analyzer>
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>     </analyzer>
> </fieldType>
>
> What do you mean by removing whitespace from the values of a multivalued field as they come from the database?

Re: Correct way to use tokenizer for whitespace

Posted by Ahmet Arslan <io...@yahoo.com>.
> In my schema.xml, I am trying to remove whitespace from the values of a
> multivalued field as they come from the database. Is this the correct way?
>
> <fieldType name="size" class="solr.TextField">
>     <analyzer type="query">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.TrimFilterFactory"/>
>     </analyzer>
> </fieldType>
>
> I do not believe this is working.

TrimFilterFactory trims leading and trailing whitespace. But StandardTokenizerFactory already discards whitespace when it splits the text into tokens, so combining TrimFilterFactory with StandardTokenizerFactory accomplishes nothing.

In your field type definition you specified only a query analyzer, not an index analyzer. You can use a single analyzer, which applies to both indexing and querying:

<fieldType name="size" class="solr.TextField">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
    </analyzer>
</fieldType>

What do you mean by removing whitespace from the values of a multivalued field as they come from the database?