Posted to solr-user@lucene.apache.org by Ranveer Kumar <ra...@gmail.com> on 2010/02/05 14:15:33 UTC

html tag problem while searching

Hi All,

I have a problem related to HTML tags.

Basically, some columns in the database carry HTML tags, for example:
<p> Hello how are you? </p>
I am indexing this content as it is.

I am filtering the special characters that Solr supports at query time.

Now the problem is that when I search for "p", the result is
*<p> Hello how are you? </p>*
I don't want the search to match HTML tag content.

Please help.

thanks

Re: html tag problem while searching

Posted by Ahmet Arslan <io...@yahoo.com>.
> I have a problem related to HTML tags.
> 
> Basically, some columns in the database carry HTML tags, for
> example:
> <p> Hello how are you? </p>
> I am indexing this content as it is.
> 
> I am filtering the special characters that Solr supports at query
> time.
> 
> Now the problem is that when I search for "p", the result
> is
> *<p> Hello how are you? </p>*
> I don't want the search to match HTML tag content.
> 
> Please help.

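You can remove HTML tags in the analysis phase with HTMLStripCharFilterFactory[1].
With this, searching for p won't return *<p> Hello how are you? </p>* anymore.
But when you search for hello, the returned document will still contain the <p> tags, because the char filter only changes the indexed tokens and leaves the stored value untouched.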
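
For illustration, a field type using the char filter might look like the sketch below in schema.xml. The type name text_html, the field name body, and the choice of tokenizer and lowercase filter are assumptions made for the example, not taken from your setup.

```xml
<!-- Hypothetical field type: strips HTML markup before tokenizing -->
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Char filters run first, so tags such as <p> never become tokens -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field: indexed without tags, stored exactly as received -->
<field name="body" type="text_html" indexed="true" stored="true"/>
```

Existing documents need to be reindexed after a change like this, since analysis is applied at index time.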

If you do not want this behavior (that is, you want only *Hello how are you?* to be returned) and you are using DIH, you can use HTMLStripTransformer[2].
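
As a rough sketch, the transformer is enabled on the entity and switched on per field with stripHTML="true" in the DIH configuration. The entity name, query, and column names below are hypothetical, and the dataSource definition is left out.

```xml
<document>
  <!-- Hypothetical entity: table and column names are placeholders -->
  <entity name="item"
          transformer="HTMLStripTransformer"
          query="SELECT id, body FROM item">
    <field column="id" name="id"/>
    <!-- stripHTML="true" removes the markup before the value reaches Solr,
         so both the indexed and the stored value are tag-free -->
    <field column="body" name="body" stripHTML="true"/>
  </entity>
</document>
```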

[1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory
[2] http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer