Posted to solr-user@lucene.apache.org by nick19701 <to...@yahoo.com> on 2007/03/07 00:04:25 UTC
Re: [2] Highlighting problems with HTML tagged fields
Yonik Seeley wrote:
>
> HTMLStripWhitespaceTokenizerFactory works in two phases...
> HTMLStripReader removes the HTML and passes the result to
> WhitespaceTokenizer... at that point, Tokens are generated, but the
> offsets will correspond to the text after HTML removal, not before.
>
> I did it this way so that HTMLStripReader could go before any
> tokenizer (like StandardTokenizer).
>
> Can you open a JIRA bug for this? The fix would be a special version
> of HTMLStripReader integrated with a WhitespaceTokenizer to keep
> offsets correct.
>
> -Yonik
>
>
Is there a fix for this problem?
My copy of Solr is dated 12/17/2006. HTMLStripWhitespaceTokenizerFactory +
highlighting still doesn't work. All the wrong items are highlighted.
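
To make the offset mismatch described in the quoted explanation concrete,
here is a minimal Java sketch. It is not Solr code: OffsetDemo, strip,
tokenize and toOriginal are hypothetical names, and the tag stripper is a toy
stand-in for HTMLStripReader. It shows that token offsets computed on the
stripped text select the wrong span when applied to the original HTML, and
that keeping a per-character map back to the original is one possible way to
correct them (an assumption for illustration, not the fix proposed on the
list).

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a toy tag stripper, not Solr's HTMLStripReader. */
class OffsetDemo {

    /** Stripped text plus, for each kept character, its index in the original HTML. */
    static final class Stripped {
        final String text;
        final int[] originalIndex;

        Stripped(String text, int[] originalIndex) {
            this.text = text;
            this.originalIndex = originalIndex;
        }
    }

    /** Drops everything between '<' and '>' and records where each kept char came from. */
    static Stripped strip(String html) {
        StringBuilder out = new StringBuilder();
        List<Integer> map = new ArrayList<>();
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;
            } else if (c == '>' && inTag) {
                inTag = false;
            } else if (!inTag) {
                out.append(c);
                map.add(i);
            }
        }
        int[] idx = new int[map.size()];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = map.get(i);
        }
        return new Stripped(out.toString(), idx);
    }

    /** Whitespace tokenization; returns {start, end} offsets into the given text. */
    static List<int[]> tokenize(String text) {
        List<int[]> tokens = new ArrayList<>();
        int start = -1;
        for (int i = 0; i <= text.length(); i++) {
            boolean ws = i == text.length() || Character.isWhitespace(text.charAt(i));
            if (!ws && start < 0) {
                start = i;
            }
            if (ws && start >= 0) {
                tokens.add(new int[] {start, i});
                start = -1;
            }
        }
        return tokens;
    }

    /** Maps a {start, end} pair from stripped-text offsets back to the original HTML. */
    static int[] toOriginal(Stripped s, int[] token) {
        return new int[] {s.originalIndex[token[0]], s.originalIndex[token[1] - 1] + 1};
    }

    public static void main(String[] args) {
        String html = "<p>Solr <b>highlighting</b> works</p>";
        Stripped s = strip(html);
        int[] token = tokenize(s.text).get(1); // second token of the stripped text

        // Offsets from the stripped text, wrongly applied to the original HTML.
        System.out.println("uncorrected: " + html.substring(token[0], token[1]));

        // Offsets mapped back to the original HTML before use.
        int[] fixed = toOriginal(s, token);
        System.out.println("corrected:   " + html.substring(fixed[0], fixed[1]));
    }
}
```

With this input the uncorrected offsets select "lr <b>highli" from the
original HTML, while the mapped offsets select "highlighting".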
--
View this message in context: http://www.nabble.com/Highlighting-problems-with-HTML-tagged-fields-tf2017260.html#a9343253
Sent from the Solr - User mailing list archive at Nabble.com.
Re: [2] Highlighting problems with HTML tagged fields
Posted by nick19701 <to...@yahoo.com>.
Chris Hostetter wrote:
>
>
> patches for issues can't be applied until someone who cares about them
> writes them and contributes them for committers to consider/apply :)
>
>
It seems I'm one of the very few people who care about this feature :)
Unfortunately my daily languages are C++ and C#. I only know a little bit
of Java; otherwise I would contribute.
--
View this message in context: http://www.nabble.com/Highlighting-problems-with-HTML-tagged-fields-tf2017260.html#a9365098
Re: [2] Highlighting problems with HTML tagged fields
Posted by Chris Hostetter <ho...@fucit.org>.
: The suggested fix from Mirko seems very simple. Hopefully a patch will be
: applied very soon. In the meantime, I'll use my backup solution:
Patches for issues can't be applied until someone who cares about them
writes them and contributes them for committers to consider/apply :)
-Hoss
Re: [2] Highlighting problems with HTML tagged fields
Posted by nick19701 <to...@yahoo.com>.
Chris Hostetter wrote:
>
>
> It is tracked in http://issues.apache.org/jira/browse/SOLR-42
>
> ...there are currently no patches.
>
>
The suggested fix from Mirko seems very simple. Hopefully a patch will be
applied very soon. In the meantime, I'll use my backup solution:
http://fucoder.com/code/se-hilite/
--
View this message in context: http://www.nabble.com/Highlighting-problems-with-HTML-tagged-fields-tf2017260.html#a9363720
Re: [2] Highlighting problems with HTML tagged fields
Posted by Chris Hostetter <ho...@fucit.org>.
It is tracked in http://issues.apache.org/jira/browse/SOLR-42
...there are currently no patches.
-Hoss