Posted to solr-user@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2011/06/16 21:06:05 UTC

RE: HTMLStripTransformer will remove the content in XML??

FYI: There's a new patch specifically for dealing with XML tags and entities 
that handles the CDATA case...

https://issues.apache.org/jira/browse/SOLR-2597
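
For anyone finding this thread later, here is a rough Python sketch (not
Solr code) of why the mapping-first workaround quoted below helps. It
assumes a stripper that treats everything between < and > as a tag, as
Bryan guessed; the regex and the function names are hypothetical and
only for illustration, and the real char filters may behave differently.

```python
import re

# Hypothetical stand-in for a tag stripper that treats everything
# between '<' and '>' as a tag. This is NOT Solr's implementation.
TAG = re.compile(r"<[^>]*>")

# The same two rules as the mappings.txt quoted below.
CDATA_MAPPINGS = {"<![CDATA[": "", "]]>": ""}


def naive_strip(text):
    """Replace every '<...>' run with a space."""
    return TAG.sub(" ", text)


def apply_mappings(text):
    """Apply each literal mapping rule, like a simple mapping char filter."""
    for src, dst in CDATA_MAPPINGS.items():
        text = text.replace(src, dst)
    return text


doc = ('<?xml version="1.0" encoding="UTF-8"?><language>'
       '<intl><![CDATA[hello]]></intl><loc><![CDATA[solr]]></loc></language>')

# Stripping alone: each CDATA section looks like one big "tag" and vanishes.
stripped_only = naive_strip(doc).split()

# Removing the CDATA markers first leaves the text between ordinary tags.
mapped_then_stripped = naive_strip(apply_mappings(doc)).split()

print(stripped_only)
print(mapped_then_stripped)
```

In this toy model the order matters: the mapping step has to run before
the stripping step, which matches the order of the charFilter elements in
the schema.xml fragment quoted below.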

: Date: Fri, 27 May 2011 17:01:26 +0800
: From: Ellery Leung <el...@be-o.com>
: Reply-To: solr-user@lucene.apache.org, elleryleung@be-o.com
: To: solr-user@lucene.apache.org
: Subject: RE: HTMLStripTransformer will remove the content in XML??
: 
: Got it.  I first use solr.MappingCharFilterFactory to replace <![CDATA[ and ]]> with empty strings, and then use HTMLStripCharFilterFactory to get "hello" and "solr".
: 
: For future reference, here is part of my schema.xml:
: 
: <fieldType name="textMaxWord" class="solr.TextField" >
: 	<analyzer type="index">
: 		<charFilter class="solr.MappingCharFilterFactory" mapping="mappings.txt"/>
: 		<charFilter class="solr.HTMLStripCharFilterFactory" />
: ...
: 
: In mappings.txt (2 lines)
: 
: "<![CDATA[" => ""
: 
: "]]>" => ""
: 
: Restart Solr
: 
: It works.
: 
: Thank you
: 
: -----Original Message-----
: From: bryan rasmussen [mailto:rasmussen.bryan@gmail.com] 
: Sent: May 27, 2011 4:20 PM
: To: solr-user@lucene.apache.org; elleryleung@be-o.com
: Subject: Re: HTMLStripTransformer will remove the content in XML??
: 
: I would expect that it doesn't understand CDATA and thinks of
: everything between < and > as a 'tag'.
: 
: Best Regards,
: Bryan Rasmussen
: 
: On Fri, May 27, 2011 at 9:41 AM, Ellery Leung <el...@be-o.com> wrote:
: > I have an XML string like this:
: >
: >
: >
: > <?xml version="1.0"
: > encoding="UTF-8"?><language><intl><![CDATA[hello]]></intl><loc><![CDATA[solr
: > ]]></loc></language>
: >
: >
: >
: > By using HTMLStripTransformer, I expect to get 'hello,solr'.
: >
: >
: >
: > But this transformer actually removes ALL THE TEXT INSIDE!
: >
: >
: >
: > Did I do something silly, or is it a bug?
: >
: >
: >
: > Thank you
: >
: >
: 
: 

-Hoss