Posted to solr-user@lucene.apache.org by Ranveer Kumar <ra...@gmail.com> on 2010/02/05 14:15:33 UTC
html tag problem while searching
Hi All,
I have a problem related to HTML tags.
Some columns in my database contain HTML tags, for example:
<p> Hello how are you? </p>
I index the value as is.
At query time I filter out Solr's special query characters.
The problem is that when I search for "p", the result is
*<p> Hello how are you? </p>*
I don't want searches to match the HTML tag content.
Please help.
Thanks
Re: html tag problem while searching
Posted by Ahmet Arslan <io...@yahoo.com>.
> I have a problem related to HTML tags.
>
> Some columns in my database contain HTML tags, for
> example:
> <p> Hello how are you? </p>
> I index the value as is.
>
> At query time I filter out Solr's special query
> characters.
>
> The problem is that when I search for "p", the result
> is
> *<p> Hello how are you? </p>*
> I don't want searches to match the HTML tag content.
>
> Please help.
You can remove HTML tags in the analysis phase with HTMLStripCharFilterFactory [1].
With this, searching for p won't return *<p> Hello how are you? </p>* anymore.
But when you search for hello, the returned document will still contain the <p> tags, because the char filter only changes what gets indexed, not the stored value.
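As a rough sketch of the char filter setup in schema.xml (the type name text_html and the field name description are made up for illustration, and the tokenizer and lowercase filter are just one reasonable choice, not necessarily what your schema uses):

```xml
<!-- Hypothetical field type: strips HTML markup before tokenizing -->
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Char filters run before the tokenizer, so <p> never becomes the token "p" -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using that type; stored="true" still returns the raw HTML -->
<field name="description" type="text_html" indexed="true" stored="true"/>
```

You would need to reindex after changing the field type, since documents already in the index keep their old tokens.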
If you want the returned document to contain only *Hello how are you?*, without the tags, and you are using DIH, you can use HTMLStripTransformer [2].
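For the DIH route, a minimal sketch of the entity in data-config.xml could look like this (the entity name, query, and column names are assumptions, so adapt them to your table):

```xml
<!-- Goes inside the <document> element of data-config.xml -->
<entity name="item" transformer="HTMLStripTransformer"
        query="select id, description from item">
  <!-- stripHTML="true" tells the transformer to remove tags from this column -->
  <field column="description" name="description" stripHTML="true"/>
</entity>
```

Here the tags are removed before the document reaches Solr, so both the indexed and the stored value are plain text.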
[1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory
[2] http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer