Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2011/06/15 00:56:59 UTC

[Solr Wiki] Update of "AnalyzersTokenizersTokenFilters" by SteveRowe

The "AnalyzersTokenizersTokenFilters" page has been changed by SteveRowe:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?action=diff&rev1=119&rev2=120

Comment:
Escape HTML markup characters in the HTML stripping examples

     * terminating ';' is mandatory to avoid false matches on something like "Alpha&Omega Corp"
  
  HTML stripping examples:
- ||my <a href="www.foo.bar">link</a> ||my link ||
+ ||my &lt;a href="www.foo.bar"&gt;link&lt;/a&gt; ||my link ||
- ||hello<!--comment--> ||hello ||
+ ||&lt;br&gt;hello&lt;!--comment--&gt; ||hello ||
- ||hello<script><-- f('<--internal--></script>'); --></script> ||hello ||
+ ||hello&lt;script&gt;&lt;!-- f('&lt;!--internal--&gt;&lt;/script&gt;'); --&gt;&lt;/script&gt; ||hello ||
- ||if a<b then print a; ||if a<b then print a; ||
+ ||if a&lt;b then print a; ||if a&lt;b then print a; ||
- ||hello <td height=22 nowrap align="left"> ||hello ||
+ ||hello &lt;td height=22 nowrap align="left"&gt; ||hello ||
- ||a<b A Alpha&Omega Ω ||a<b A Alpha&Omega Ω ||
+ ||a&lt;b &amp;#65; Alpha&Omega Ω ||a&lt;b A Alpha&Omega Ω ||
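
The rule above that a terminating ';' is mandatory can be illustrated with a small sketch. This is not Solr's HTML stripping implementation; it is a minimal Python approximation, and the names `decode_entities` and `_ENTITY` are hypothetical. It decodes only entities that end in ';', so the bare "&Omega" in "Alpha&Omega Corp" is left untouched, while "&#65;" and "&Omega;" are decoded.

```python
import re
from html.entities import name2codepoint

# Hypothetical helper, not part of Solr. The pattern requires the
# trailing ';', so "Alpha&Omega Corp" never matches.
_ENTITY = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def decode_entities(text: str) -> str:
    """Decode named and numeric character entities that end in ';'."""

    def repl(match: re.Match) -> str:
        body = match.group(1)
        try:
            if body[:2] in ("#x", "#X"):
                return chr(int(body[2:], 16))  # hexadecimal, e.g. &#x41;
            if body.startswith("#"):
                return chr(int(body[1:]))      # decimal, e.g. &#65;
        except (ValueError, OverflowError):
            return match.group(0)              # out-of-range code point
        code_point = name2codepoint.get(body)
        # Unknown names are left exactly as written.
        return chr(code_point) if code_point is not None else match.group(0)

    return _ENTITY.sub(repl, text)


if __name__ == "__main__":
    print(decode_entities("a<b &#65; Alpha&Omega &Omega;"))
    print(decode_entities("Alpha&Omega Corp"))
```

The sketch covers entity decoding only; it does not strip tags, comments, or script blocks as the other rows of the table do.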