Posted to solr-user@lucene.apache.org by Ryan McKinley <ry...@gmail.com> on 2009/11/12 04:40:17 UTC

Re: add XML/HTML documents using SolrJ, without bypassing HTML char filter

The HTMLStripCharFilter strips the HTML from the *indexed* terms only;
it does not affect the *stored* field, which keeps the original value.

If you don't want HTML in the stored field, can you just strip it out
before passing it to Solr?
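
A minimal sketch of that client-side approach, assuming a crude regex-based
tag stripper is good enough for the input. The class name HtmlStripSketch and
the method stripTags are made up for illustration and are not part of SolrJ.
A regex does not decode entities and does not handle comments or script
blocks, so real-world HTML would need a proper HTML parser instead.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of SolrJ: removes tags before the value is sent to Solr.
class HtmlStripSketch {
    // Anything that looks like a tag: "<", a run of non-">" characters, then ">".
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String stripTags(String html) {
        if (html == null) {
            return "";
        }
        // Replace each tag with a space so adjacent words do not run together,
        // then collapse the resulting whitespace.
        String noTags = TAG.matcher(html).replaceAll(" ");
        return WHITESPACE.matcher(noTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String clean = stripTags("<center>content</center>");
        System.out.println(clean); // prints "content"

        // With SolrJ on the classpath, the cleaned value would be added as usual:
        //   SolrInputDocument doc = new SolrInputDocument();
        //   doc.addField("id", "http://haha.com");
        //   doc.addField("text", clean);
    }
}
```

With this in place the stored field holds only the text, while the
HTMLStripCharFilter in the analyzer can stay as a safety net for the indexed
terms.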


On Nov 11, 2009, at 8:07 PM, aseem cheema wrote:

> Hey Guys,
> How do I add HTML/XML documents using SolrJ such that they do not
> bypass the HTML char filter?
>
> SolrJ escapes the HTML/XML value of a field, and that makes it bypass
> the HTML char filter. For example, <center>content</center>, if added
> using SolrJ to a field with HTMLStripCharFilter, is not stripped of
> its center tags. But if I check in analysis.jsp, it does get
> stripped. When I look at the SolrJ XML feed, it looks like this:
> <add><doc boost="1.0"><field name="id">http://haha.com</field><field
> name="text">&lt;center&gt;content&lt;/center&gt;</field></doc></add>
>
> Any help is highly appreciated. Thanks.
>
> -- 
> Aseem


Re: add XML/HTML documents using SolrJ, without bypassing HTML char filter

Posted by aseem cheema <as...@gmail.com>.
Ohhh... you are a life saver... thank you so much.. it makes sense.

Aseem
