Posted to solr-user@lucene.apache.org by zqzuk <zi...@hotmail.com> on 2008/01/21 22:24:04 UTC

illegal characters in xml file to be posted?

Hi, I am using the SimplePostTool to post files to Solr, and I have run into
a problem with the content of my XML files. If a file has fields whose
values contain the character "&", "<" or ">", the post fails and I get this
exception:

"javax.xml.stream.XMLStreamException: ParseError at [row, col]:[x,y]
Message: The entity name must immediately follow the '&' in the entity
reference"

It looks like these characters are illegal as embedded content in XML, even
though I extracted them from XML in the first place. Is there a list of the
characters I need to deal with before I pass the file to SimplePostTool?

Thanks!
-- 
View this message in context: http://www.nabble.com/illegal-characters-in-xml-file-to-be-posted--tp15006748p15006748.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: illegal characters in xml file to be posted?

Posted by zqzuk <zi...@hotmail.com>.
Thanks for the quick advice!


pbinkley wrote:
> 
> You should encode those three characters, and it doesn't hurt to encode
> the apostrophe and double-quote characters too:
> http://en.wikipedia.org/wiki/XML#Entity_references
> 
> Peter 
> 



RE: illegal characters in xml file to be posted?

Posted by "Binkley, Peter" <Pe...@ualberta.ca>.
You should encode those three characters, and it doesn't hurt to encode
the apostrophe and double-quote characters too:
http://en.wikipedia.org/wiki/XML#Entity_references
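
A minimal sketch of that encoding step, assuming the field values are
escaped in Python before the XML file is written. The helper names
escape_field_value and make_field are made up for illustration;
xml.sax.saxutils.escape is from the Python standard library and handles
"&", "<" and ">" by default:

```python
from xml.sax.saxutils import escape

# escape() already replaces &, < and >; these two are optional extras.
EXTRA_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_field_value(value: str) -> str:
    """Replace the five XML special characters with entity references."""
    # The ampersand is replaced first, so the entities added afterwards
    # are not escaped a second time.
    return escape(value, EXTRA_ENTITIES)


def make_field(name: str, value: str) -> str:
    """Build one <field> element for a Solr XML add document.

    Hypothetical helper, not part of Solr or SimplePostTool.
    """
    return '<field name="{}">{}</field>'.format(
        escape_field_value(name), escape_field_value(value)
    )


if __name__ == "__main__":
    print(make_field("title", "R&D <draft>"))
```

This assumes the input is raw text. A value that already contains entity
references such as &amp; would be escaped a second time, so apply it only
once, at the point where the file is generated.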

Peter 
