Posted to solr-user@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2011/06/16 21:06:05 UTC
RE: HTMLStripTransformer will remove the content in XML??
FYI: There's a new patch specifically for dealing with xml tags and entities
that handles the CDATA case...
https://issues.apache.org/jira/browse/SOLR-2597
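
For anyone reading this thread in the archive: the behavior guessed at in the quoted messages below, and the workaround that fixed it, can be reproduced with a small sketch. This is only an illustration under assumptions, not Solr's actual code. The regex-based `naive_strip` is a hypothetical stand-in for a stripper that treats everything between < and > as a tag, and `strip_cdata_markers` mimics the two mapping rules from mappings.txt. All function names here are made up for the example.

```python
import re

# Hypothetical stand-in for a stripper that treats everything between
# '<' and the next '>' as a tag. This is NOT Solr's implementation.
TAG = re.compile(r"<[^>]*>")


def naive_strip(text):
    """Replace every '<...>' run with a space."""
    return TAG.sub(" ", text)


def strip_cdata_markers(text):
    """Mimic the two mapping rules: '<![CDATA[' => '' and ']]>' => ''."""
    return text.replace("<![CDATA[", "").replace("]]>", "")


def tokens(text):
    """Split on whitespace, dropping empty pieces."""
    return text.split()


# The XML string from the original question.
xml = (
    '<?xml version="1.0" encoding="UTF-8"?><language>'
    "<intl><![CDATA[hello]]></intl><loc><![CDATA[solr ]]></loc></language>"
)

# A whole CDATA section contains no '>' before its end, so it looks
# like a single tag and its text is discarded along with the markup.
print(tokens(naive_strip(xml)))                       # []

# Removing the CDATA markers first leaves the text between real tags.
print(tokens(naive_strip(strip_cdata_markers(xml))))  # ['hello', 'solr']
```

The order matters: the marker removal has to run before the tag stripping, which matches the order of the two charFilter entries in the quoted schema.xml.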
: Date: Fri, 27 May 2011 17:01:26 +0800
: From: Ellery Leung <el...@be-o.com>
: Reply-To: solr-user@lucene.apache.org, elleryleung@be-o.com
: To: solr-user@lucene.apache.org
: Subject: RE: HTMLStripTransformer will remove the content in XML??
:
: Got it. I now use solr.MappingCharFilterFactory to replace <![CDATA[ and ]]> with empty strings first, and then HTMLStripCharFilterFactory to get "hello" and "solr".
:
: For future reference, here is part of schema.xml
:
: <fieldType name="textMaxWord" class="solr.TextField">
:   <analyzer type="index">
:     <charFilter class="solr.MappingCharFilterFactory" mapping="mappings.txt"/>
:     <charFilter class="solr.HTMLStripCharFilterFactory"/>
:     ...
:
: In mappings.txt (2 lines)
:
: "<![CDATA[" => ""
:
: "]]>" => ""
:
: Restart Solr
:
: It works.
:
: Thank you
:
: -----Original Message-----
: From: bryan rasmussen [mailto:rasmussen.bryan@gmail.com]
: Sent: May 27, 2011 4:20 PM
: To: solr-user@lucene.apache.org; elleryleung@be-o.com
: Subject: Re: HTMLStripTransformer will remove the content in XML??
:
: I would expect that it doesn't understand CDATA and thinks of
: everything between < and > as a 'tag'.
:
: Best Regards,
: Bryan Rasmussen
:
: On Fri, May 27, 2011 at 9:41 AM, Ellery Leung <el...@be-o.com> wrote:
: > I have an XML string like this:
: >
: > <?xml version="1.0"
: > encoding="UTF-8"?><language><intl><![CDATA[hello]]></intl><loc><![CDATA[solr
: > ]]></loc></language>
: >
: > By using HTMLStripTransformer, I expect to get 'hello,solr'.
: >
: > But actually this transformer removes ALL THE TEXT INSIDE!
: >
: > Did I do something silly, or is it a bug?
: >
: > Thank you
: >
:
:
-Hoss