Posted to solr-user@lucene.apache.org by nick19701 <to...@yahoo.com> on 2007/03/07 00:04:25 UTC

Re: [2] Highlighting problems with HTML tagged fields


Yonik Seeley wrote:
> 
> HTMLStripWhitespaceTokenizerFactory works in two phases...
> HTMLStripReader removes the HTML and passes the result to
> WhitespaceTokenizer... at that point, Tokens are generated, but the
> offsets will correspond to the text after HTML removal, not before.
> 
> I did it this way so that HTMLStripReader could go before any
> tokenizer (like StandardTokenizer).
> 
> Can you open a JIRA bug for this?  The fix would be a special version
> of HTMLStripReader integrated with a WhitespaceTokenizer to keep
> offsets correct.
> 
> -Yonik
> 
> 
Is there a fix for this problem?

My Solr build is dated 12/17/2006. HTMLStripWhitespaceTokenizerFactory +
highlighting still doesn't work. All the wrong items are highlighted.
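
To make the offset mismatch described in the quoted explanation concrete,
here is a minimal plain-Java sketch. It does not use Solr's HTMLStripReader
or WhitespaceTokenizer. The names OffsetDemo, strip, whitespaceTokens,
originalStart and originalEnd are hypothetical, and the tag stripping is
deliberately naive (anything between '<' and '>' is dropped). It only shows
why offsets computed on the stripped text point at the wrong characters in
the stored HTML, and how keeping a per-character offset map could correct
them, which is roughly the idea behind the suggested fix.

```java
import java.util.ArrayList;
import java.util.List;

final class OffsetDemo {

    /** Stripped text plus, for each stripped char, its index in the raw input. */
    static final class Stripped {
        final String text;
        final int[] originalOffset;

        Stripped(String text, int[] originalOffset) {
            this.text = text;
            this.originalOffset = originalOffset;
        }
    }

    /** Naive tag removal (assumption: no '<' or '>' outside of tags). */
    static Stripped strip(String raw) {
        StringBuilder out = new StringBuilder();
        int[] map = new int[raw.length()];
        boolean inTag = false;
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '<') {
                inTag = true;
            } else if (c == '>' && inTag) {
                inTag = false;
            } else if (!inTag) {
                map[out.length()] = i; // remember where this char came from
                out.append(c);
            }
        }
        return new Stripped(out.toString(), map);
    }

    /** Whitespace tokenization; each span is {start, end} with end exclusive. */
    static List<int[]> whitespaceTokens(String text) {
        List<int[]> spans = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) spans.add(new int[] {start, i});
        }
        return spans;
    }

    /** Map a stripped-text start offset back into the raw HTML. */
    static int originalStart(Stripped s, int[] span) {
        return s.originalOffset[span[0]];
    }

    /** Map a stripped-text end offset (exclusive) back into the raw HTML. */
    static int originalEnd(Stripped s, int[] span) {
        return s.originalOffset[span[1] - 1] + 1;
    }

    public static void main(String[] args) {
        String raw = "<p>Solr <b>highlighting</b> works</p>";
        Stripped s = strip(raw);
        for (int[] span : whitespaceTokens(s.text)) {
            String token = s.text.substring(span[0], span[1]);
            // Offsets from the stripped text, applied to the raw HTML (wrong).
            String wrong = raw.substring(span[0], span[1]);
            // Offsets mapped back to the raw HTML (what a highlighter needs).
            String right = raw.substring(originalStart(s, span), originalEnd(s, span));
            System.out.println(token + " | uncorrected: [" + wrong
                    + "] | corrected: [" + right + "]");
        }
    }
}
```

With this input, the token "highlighting" has offsets 5-17 in the stripped
text. Applied directly to the stored HTML, those offsets select
"lr <b>highli", while the mapped offsets 11-23 select "highlighting".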
-- 
View this message in context: http://www.nabble.com/Highlighting-problems-with-HTML-tagged-fields-tf2017260.html#a9343253
Sent from the Solr - User mailing list archive at Nabble.com.


Re: [2] Highlighting problems with HTML tagged fields

Posted by nick19701 <to...@yahoo.com>.

Chris Hostetter wrote:
> 
> 
> patches for issues can't be applied until someone who cares about them
> writes them and contributes them for committers to consider/apply :)
> 
> 

It seems I'm one of the very few people who care about this feature :)

Unfortunately, my daily languages are C++ and C#, and I only know a little
Java. Otherwise I would contribute.



Re: [2] Highlighting problems with HTML tagged fields

Posted by Chris Hostetter <ho...@fucit.org>.
: The suggested fix from Mirko seems very simple. Hopefully a patch will be
: applied very soon. In the meantime, I'll use my backup solution:

Patches for issues can't be applied until someone who cares about them
writes them and contributes them for committers to consider/apply :)

-Hoss


Re: [2] Highlighting problems with HTML tagged fields

Posted by nick19701 <to...@yahoo.com>.

Chris Hostetter wrote:
> 
> 
> It is tracked in http://issues.apache.org/jira/browse/SOLR-42
> 
> ...there are currently no patches.
> 
> 

The suggested fix from Mirko seems very simple. Hopefully a patch will be
applied very soon. In the meantime, I'll use my backup solution:
http://fucoder.com/code/se-hilite/




Re: [2] Highlighting problems with HTML tagged fields

Posted by Chris Hostetter <ho...@fucit.org>.
It is tracked in http://issues.apache.org/jira/browse/SOLR-42

...there are currently no patches.

-Hoss