Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2010/07/05 16:36:37 UTC

[Solr Wiki] Update of "SchemaDesign" by KojiSekiguchi

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SchemaDesign" page has been changed by KojiSekiguchi.
The comment on this change is: No more CharStreamAwareTokenizers needed.
http://wiki.apache.org/solr/SchemaDesign?action=diff&rev1=10&rev2=11

--------------------------------------------------

  Searching text in different languages is difficult. The Latin-1 accent filters map accented European characters to their unaccented US-ASCII equivalents: the French spelling ''protégé'' becomes the English spelling ''protege''.
  In Solr-1.3, use this in the filter stack of your "text" field type:
  {{{
+ <tokenizer class="solr.WhitespaceTokenizerFactory" />
  <filter class="solr.ISOLatin1AccentFilterFactory" />
  }}}
  In Solr-1.4, use this:
  {{{
  <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
+ <tokenizer class="solr.WhitespaceTokenizerFactory" />
  }}}
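  As a rough sketch of how the Solr-1.4 lines might sit inside a complete field type: the name text_folded and the lowercase filter are illustrative assumptions, not part of the original page, and mapping-ISOLatin1Accent.txt is assumed to be present in the conf directory. The charFilter is listed first because it rewrites the character stream before the tokenizer splits it.
  {{{
  <!-- "text_folded" is a hypothetical name chosen for this example -->
  <fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- fold accented characters before tokenizing: protégé -> protege -->
      <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
      <!-- split the folded character stream on whitespace -->
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- optional, assumed here: lowercase tokens so matching is case-insensitive -->
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  }}}
  With a single analyzer element like this, the same folding applies at index time and at query time, so a query for ''protege'' should match documents containing ''protégé''.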
  
- At the moment you must also use this tokenizer with solr.MappingCharFilterFactory:
- {{{
- <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
- }}}
- Otherwise you will get errors (potentially including fatal, uncaught exceptions) when using the lucene highlighter, etc: 
-