Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2011/07/08 08:56:31 UTC

[Solr Wiki] Update of "AnalyzersTokenizersTokenFilters" by Bill Bell

The "AnalyzersTokenizersTokenFilters" page has been changed by Bill Bell:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?action=diff&rev1=121&rev2=122

  
  === solr.HTMLStripCharFilterFactory ===
  Creates `org.apache.solr.analysis.HTMLStripCharFilter`, which strips HTML from the input stream and passes the result to the next `CharFilter` or to the `Tokenizer`. Like other CharFilters, it is specified using a <charFilter> tag, which must come before the <tokenizer>. An example:
+ 
  {{{
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
@@ -116, +117 @@

    <filter class="solr.StandardFilterFactory"/>
  </analyzer>
  }}}
- 
  HTML stripping features:
  
   * The input need not be an HTML document as only constructs that look like HTML will be removed.
@@ -134, +134 @@

     * terminating '`;`' is mandatory to avoid false matches on something like "`Alpha&Omega Corp`"
  
  HTML stripping examples:
- ||{{{my <a href="www.foo.bar">link</a> }}}||`my link `||
+ ||{{{my <a href="www.foo.bar">link</a> }}} ||`my link ` ||
- ||{{{<br>hello<!--comment--> }}}||`hello `||
+ ||{{{<br>hello<!--comment--> }}} ||`hello ` ||
- ||{{{hello<script><!-- f('<!--internal--></script>'); --></script> }}}||`hello `||
+ ||{{{hello<script><!-- f('<!--internal--></script>'); --></script> }}} ||`hello ` ||
- ||{{{if a<b then print a; }}}||`if a<b then print a; `||
+ ||{{{if a<b then print a; }}} ||`if a<b then print a; ` ||
- ||{{{hello <td height=22 nowrap align="left"> }}}||`hello `||
+ ||{{{hello <td height=22 nowrap align="left"> }}} ||`hello ` ||
- ||{{{a<b &#65; Alpha&Omega O}}} ||`a<b A Alpha&Omega O `||
+ ||{{{a<b &#65; Alpha&Omega O}}} ||`a<b A Alpha&Omega O ` ||
- ||{{{M&eacute;xico}}}||`México`||
+ ||{{{M&eacute;xico}}} ||`México` ||
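
To illustrate the rule that a terminating '`;`' is mandatory, here is a minimal, hypothetical Java sketch (Java 9 or later) of entity decoding on its own. It is not Solr's `HTMLStripCharFilter` code. The class name `HtmlEntitySketch`, the method `decodeEntities` and its small table of named entities are assumptions made for illustration. It also does not strip tags, comments or scripts. A reference is decoded only when it ends in '`;`' and resolves, so `&#65;` and `&eacute;` are replaced while `Alpha&Omega` passes through unchanged, as in the last two table rows.

```java
import java.util.Map;

// Hypothetical sketch: decodes character references only. It does not strip
// tags, comments or scripts, and it is not Solr's HTMLStripCharFilter code.
final class HtmlEntitySketch {

    // Small, hand-picked subset of HTML named entities (assumption: enough
    // for the examples; real HTML defines many more).
    private static final Map<String, Integer> NAMED = Map.of(
            "amp", (int) '&',
            "lt", (int) '<',
            "gt", (int) '>',
            "quot", (int) '"',
            "eacute", 0xE9);

    // Replaces &name;, &#NNN; and &#xHHH; with the referenced character.
    // A reference is decoded only if it is terminated by ';' and resolves;
    // otherwise the '&' and the following text are copied unchanged.
    static String decodeEntities(String in) {
        StringBuilder out = new StringBuilder(in.length());
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c == '&') {
                int semi = in.indexOf(';', i + 1);
                Integer cp = semi > i + 1 ? resolve(in.substring(i + 1, semi)) : null;
                if (cp != null) {
                    out.appendCodePoint(cp);
                    i = semi + 1; // skip past the terminating ';'
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // Maps the text between '&' and ';' to a code point, or null if unknown.
    private static Integer resolve(String name) {
        if (name.startsWith("#x") || name.startsWith("#X")) {
            return parseCodePoint(name.substring(2), 16);
        }
        if (name.startsWith("#")) {
            return parseCodePoint(name.substring(1), 10);
        }
        return NAMED.get(name);
    }

    // Parses ASCII digits in the given radix; rejects empty or overlong input,
    // signs, and values outside the Unicode code point range.
    private static Integer parseCodePoint(String digits, int radix) {
        if (digits.isEmpty() || digits.length() > 7) {
            return null;
        }
        for (int k = 0; k < digits.length(); k++) {
            char d = digits.charAt(k);
            if (d > 127 || Character.digit(d, radix) < 0) {
                return null;
            }
        }
        int cp = Integer.parseInt(digits, radix);
        return Character.isValidCodePoint(cp) ? cp : null;
    }

    public static void main(String[] args) {
        // Entity rows from the examples table (tags are not handled here).
        System.out.println(decodeEntities("a<b &#65; Alpha&Omega O")); // prints: a<b A Alpha&Omega O
        System.out.println(decodeEntities("M&eacute;xico"));
    }
}
```

In this sketch, insisting on the '`;`' means sloppy input such as `&amp` without a semicolon is left alone, in exchange for never corrupting ordinary text like "`Alpha&Omega Corp`" that merely contains an ampersand.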