Posted to solr-user@lucene.apache.org by Alexandre Rafalovitch <ar...@gmail.com> on 2013/09/20 02:52:38 UTC

Re: Question on ICUFoldingFilterFactory

What do you mean by "output"? Are you looking at fields in the returned
documents? In that case you should see the original stored value. Or are
you, for example, looking at facet/group values, which use the tokenized,
post-processed terms?
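
If it is the latter, one common approach is to keep the analyzed field for
matching and copy the raw value into a separate unanalyzed field for
faceting or grouping. A minimal sketch, assuming hypothetical field names
(title, title_exact) that are not taken from your schema, and assuming the
stock "string" (solr.StrField) type is defined:

```xml
<!-- Hypothetical field names; adjust to your own schema.xml. -->

<!-- Analyzed field: ICU folding lowercases the indexed terms, so
     NBAE, nbae and Nbae all match. The stored value keeps its casing. -->
<field name="title" type="text_phrase" indexed="true" stored="true"/>

<!-- Unanalyzed copy: indexed terms keep the original casing, so
     facet/group values come back as they were entered. -->
<field name="title_exact" type="string" indexed="true" stored="false"/>

<!-- copyField copies the raw input value, before any analysis. -->
<copyField source="title" dest="title_exact"/>
```

With something like this you would query against title and facet or group
on title_exact (for example facet.field=title_exact).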

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Sep 20, 2013 at 2:22 AM, Nemani, Raj <Ra...@turner.com> wrote:

> Hello,
>
> I was wondering if anybody who has experience with ICUFoldingFilterFactory
> can help out with the following issue.  Thank you so much in advance.
>
> Raj
>
> ------------------------------------------------------------------
>
> Problem:
> When a document is created/updated, the value's casing is indexed
> properly. However, when it's queried, the value is returned in lowercase.
> Example:
> Document input: NBAE
> Document value: NBAE
> Query input: NBAE,nbae,Nbae...etc
> Query Output: nbae
>
> If I remove the ICUFoldingFilterFactory filter, the casing problem goes
> away, but then searches for nbae (lowercase) or Nbae (mixed case) return no
> values.
>
>
> Field Type:
> <fieldType name="text_phrase" class="solr.TextField"
>            positionIncrementGap="20" autoGeneratePhraseQueries="true">
>   <analyzer>
>     <filter class="solr.PatternReplaceFilterFactory"
>             pattern="\s&amp;\s" replacement="\sand\s"/>
>     <charFilter class="solr.PatternReplaceCharFilterFactory"
>                 pattern="[\p{Punct}\u00BF\u00A1]" replaceWith=" "/>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.TrimFilterFactory"/>
>     <filter class="solr.PatternReplaceFilterFactory"
>             pattern="[\p{Cntrl}]" replacement=""/>
>     <filter class="solr.ICUFoldingFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords_en.txt" enablePositionIncrements="true"/>
>   </analyzer>
> </fieldType>
>
>
> Let me know if that makes sense. I'm curious whether
> solr.ICUFoldingFilterFactory has additional attributes that I can use to
> control the casing behavior but retain its other filtering properties
> (those of ASCIIFoldingFilter and ICUNormalizer2Filter).
>
> Thanks!!!
>
>