Posted to solr-user@lucene.apache.org by aseem cheema <as...@gmail.com> on 2009/11/12 02:07:41 UTC
add XML/HTML documents using SolrJ, without bypassing HTML char filter
Hey Guys,
How do I add HTML/XML documents using SolrJ such that it does not
bypass the HTML char filter?
SolrJ escapes the HTML/XML value of a field, and that makes it bypass
the HTML char filter. For example, <center>content</center>, if added
using SolrJ to a field with HTMLStripCharFilter on it, is not
stripped of its center tags. But if I check in analysis.jsp, it does get
stripped. When I look at the SolrJ XML feed, it looks like this:
<add><doc boost="1.0"><field name="id">http://haha.com</field><field
name="text">&lt;center&gt;content&lt;/center&gt;</field></doc></add>
Any help is highly appreciated. Thanks.
--
Aseem
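On the escaping concern in the question above: entity escaping in an XML update message is only transport encoding, and an XML parser turns it back into the original characters before the field value is analyzed. The sketch below illustrates this with the JDK's built-in DOM parser rather than Solr's own update handler; the class and method names (EscapeRoundTrip, escapeXml, fieldValue) are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class EscapeRoundTrip {
    // Minimal XML character-data escaping (assumption: roughly what a client
    // does before putting a field value into an XML update message).
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Parses an <add><doc>...</doc></add> message and returns the text of the
    // first <field> whose name attribute matches, or null if there is none.
    static String fieldValue(String xml, String name) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList fields = dom.getElementsByTagName("field");
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                if (name.equals(field.getAttribute("name"))) {
                    return field.getTextContent();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse update message", e);
        }
    }

    public static void main(String[] args) {
        String html = "<center>content</center>";
        String feed = "<add><doc><field name=\"id\">http://haha.com</field>"
                + "<field name=\"text\">" + escapeXml(html) + "</field></doc></add>";
        System.out.println(feed);                      // escaped form on the wire
        System.out.println(fieldValue(feed, "text"));  // value after XML parsing
    }
}
```

The second line printed is the value a receiver would see after parsing, with the angle brackets restored, so the escaping in the feed does not by itself hide the tags from a char filter.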
Re: add XML/HTML documents using SolrJ, without bypassing HTML char filter
Posted by aseem cheema <as...@gmail.com>.
Ohhh... you are a life saver... thank you so much.. it makes sense.
Aseem
On Wed, Nov 11, 2009 at 7:40 PM, Ryan McKinley <ry...@gmail.com> wrote:
> The HTMLStripCharFilter strips the HTML from the *indexed* terms; it does
> not affect the *stored* field.
>
> If you don't want HTML in the stored field, can you just strip it out before
> passing it to Solr?
--
Aseem
Re: add XML/HTML documents using SolrJ, without bypassing HTML char filter
Posted by Ryan McKinley <ry...@gmail.com>.
The HTMLStripCharFilter strips the HTML from the *indexed* terms;
it does not affect the *stored* field.
If you don't want HTML in the stored field, can you just strip it out
before passing it to Solr?
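A minimal sketch of the "strip it out before passing it to Solr" suggestion, under some loud assumptions: the regex-based stripTags helper is a naive stand-in and not a real HTML parser (it does not decode entities or handle comments and scripts), and a plain Map stands in for SolrJ's SolrInputDocument so the example runs without the SolrJ jar. The names StripBeforeIndexing, stripTags, and buildDoc are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StripBeforeIndexing {
    // Naive tag removal: replaces anything that looks like <...> with a space,
    // then collapses runs of whitespace. Not robust against malformed HTML.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    // Builds the field values for one document. With SolrJ on the classpath,
    // the same values would go into a SolrInputDocument via
    // addField("id", id) and addField("text", stripTags(html)).
    static Map<String, Object> buildDoc(String id, String html) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);
        doc.put("text", stripTags(html)); // stored value no longer contains tags
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(buildDoc("http://haha.com", "<center>content</center>"));
    }
}
```

Because the tags are removed on the client, both the indexed terms and the stored value are tag-free, regardless of how the field's analyzer is configured. For real-world HTML, a dedicated parser would likely be a safer choice than a regex.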