Posted to dev@lucene.apache.org by "suleman mubarik (JIRA)" <ji...@apache.org> on 2014/09/11 22:18:34 UTC

[jira] [Created] (LUCENE-5943) HTML strip filter removes text between < and >

suleman mubarik created LUCENE-5943:
---------------------------------------

             Summary: HTML strip filter removes text between < and >
                 Key: LUCENE-5943
                 URL: https://issues.apache.org/jira/browse/LUCENE-5943
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/index
         Environment: Production
            Reporter: suleman mubarik


If I use this as input: "I love <pizza  hut> so much"
When I apply the HTML strip filter, it removes "pizza  hut" and I get the tokens "i", "love", "so", "much",
with these offsets: ((0,1), (2,6), (20,22), (23,27)).
The HTML strip filter should return "i", "love", "pizza", "hut", "so", "much".
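To make the reported behavior concrete, here is a minimal, self-contained sketch that reproduces the same tokens and offsets. It is NOT Lucene's HTMLStripCharFilter; the class NaiveTagStripDemo and its rule "any '<' followed by a letter, up to the next '>', is a tag" are hypothetical, chosen only to show why "<pizza  hut>" looks like a tag (element "pizza" with an attribute "hut") and how offsets still point into the original text after stripping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not Lucene's HTMLStripCharFilter.
final class NaiveTagStripDemo {

    // A token plus its [start, end) offsets into the ORIGINAL input.
    record Token(String text, int start, int end) {}

    static List<Token> stripAndTokenize(String input) {
        StringBuilder kept = new StringBuilder();   // text left after stripping
        List<Integer> origPos = new ArrayList<>();  // original index of each kept char
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            // Assumed rule: '<' followed by a letter starts a "tag" that ends at the next '>'.
            if (c == '<' && i + 1 < input.length() && Character.isLetter(input.charAt(i + 1))) {
                int close = input.indexOf('>', i);
                if (close >= 0) {
                    i = close + 1; // drop the whole "tag", including "hut" as an attribute
                    continue;
                }
            }
            kept.append(c);
            origPos.add(i);
            i++;
        }
        // Whitespace tokenization with lowercasing; offsets are mapped back via origPos.
        List<Token> tokens = new ArrayList<>();
        int n = kept.length();
        int j = 0;
        while (j < n) {
            while (j < n && Character.isWhitespace(kept.charAt(j))) j++;
            int s = j;
            while (j < n && !Character.isWhitespace(kept.charAt(j))) j++;
            if (j > s) {
                String text = kept.substring(s, j).toLowerCase(Locale.ROOT);
                tokens.add(new Token(text, origPos.get(s), origPos.get(j - 1) + 1));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : stripAndTokenize("I love <pizza  hut> so much")) {
            System.out.println(t.text() + " (" + t.start() + "," + t.end() + ")");
        }
    }
}
```

Running main prints i (0,1), love (2,6), so (20,22), much (23,27), the same tokens and offsets as in the report: "pizza" and "hut" disappear because the sketch, like the filter described above, treats the whole bracketed span as markup.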




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
