Posted to solr-user@lucene.apache.org by Fiz Newyorker <fi...@gmail.com> on 2017/12/04 20:37:49 UTC

Index Content Removing the HTML Tags.

Hello Solr Group,

Good morning!

I am working with Solr 6.5 and am trying to index data from MongoDB
3.2.5.

I have a content collection in MongoDB with a body field that contains
HTML tags.
I want to index the body field without the HTML tags.

*Please see the sample body field data from MongoDB below:*

"<body><p>i cant hear the other side but they can hear me we are both using
same android software and Note 4 what seems to be the problem on her phone
that i cant hear her on messenger</p></body>"

I want to index only the text content; I don't want the HTML tags to be
indexed or searched.

Please let me know how to go about this.


Thanks
Fiz Ahmed.

Re: Index Content Removing the HTML Tags.

Posted by Erick Erickson <er...@gmail.com>.
Have you tried HTMLStripCharFilterFactory?
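
For reference, here is a minimal sketch of how that char filter might be
wired into the schema. The field type name text_html_stripped and the
choice of tokenizer and lowercase filter are assumptions for illustration,
not something stated in this thread; only the body field name comes from
the question.

```xml
<!-- Sketch for managed-schema (or schema.xml). The name text_html_stripped
     is a hypothetical field type name chosen for this example. -->
<fieldType name="text_html_stripped" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Char filters run before the tokenizer, so markup such as
         <body> and <p> is removed before any tokens are produced. -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Assumes the MongoDB body column is mapped to a Solr field named body. -->
<field name="body" type="text_html_stripped" indexed="true" stored="true"/>
```

Note that a char filter only changes what gets indexed and searched. The
stored value returned in results still contains the original HTML. If the
stored value should be clean as well, the tags would have to be stripped
before or during the update, for example with
HTMLStripFieldUpdateProcessorFactory in an update processor chain.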
