Posted to user@nutch.apache.org by Lyndon Maydwell <ma...@gmail.com> on 2007/08/10 09:25:22 UTC
Snippet contents.
I've noticed that the snippets returned by Nutch's search seem to have
the formatting added to them first, and are then escaped into XML strings.
How would I go about changing the process so that the content is
escaped, then the formatting is added, then the whole snippet is escaped?
The reason I want this is so that I can return valid XML with the
formatting as XML entities, but with the actual snippet text escaped.
Example of how Nutch does it:
original text:
"red fox & lazy dog"
formatting applied:
"red <span class="highlight">fox</span> & lazy dog"
escaped:
"red &lt;span class="highlight"&gt;fox&lt;/span&gt; &amp; lazy dog"
Example of what I'm after:
original text:
"red fox & lazy dog"
escaped text:
"red fox &amp; lazy dog"
formatting applied:
"red <span class="highlight">fox</span> &amp; lazy dog"
escaped:
"red &lt;span class="highlight"&gt;fox&lt;/span&gt; &amp;amp; lazy dog"
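
For illustration, the order of operations asked about above (escape the text, add the formatting, escape the result) can be sketched in plain Java. This is not Nutch's code and does not use any Nutch API: the class SnippetSketch and the helpers escapeXml, highlight and render are hypothetical names made up for this example, and the highlighter is assumed to be a simple literal match on one term.

```java
// Hypothetical sketch, not Nutch code: escape, then highlight, then escape again.
class SnippetSketch {

    // Escape the characters that are significant in XML text content.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // Assumed highlighter: wrap each literal occurrence of the term in a span.
    static String highlight(String text, String term) {
        return text.replace(term, "<span class=\"highlight\">" + term + "</span>");
    }

    // Step 1: escape the raw text. Step 2: add the markup. Step 3: escape everything.
    static String render(String raw, String term) {
        String escapedText = escapeXml(raw);
        // The term is escaped too, so it can still match inside the escaped text.
        String formatted = highlight(escapedText, escapeXml(term));
        return escapeXml(formatted);
    }

    public static void main(String[] args) {
        // prints: red &lt;span class="highlight"&gt;fox&lt;/span&gt; &amp;amp; lazy dog
        System.out.println(render("red fox & lazy dog", "fox"));
    }
}
```

A consumer that unescapes the final string once gets back the span markup around text that is still escaped. One limitation of this naive sketch: because the highlighting runs on already-escaped text, a term such as "amp" would also match inside an entity like &amp;, so a real highlighter would need to match on the raw text and escape each fragment separately.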