
Posted to user@xmlbeans.apache.org by Lawrence Jones <lj...@bea.com> on 2006/01/03 22:22:35 UTC

RE: AW: Illegal characters, can xmlbeans be forgiving?

Just to chime in a bit here - I worked on I18N for a different product and can help a little here. Unfortunately I don't think there's much XMLBeans (or even the parser) can do - the errors below are not fatal - in fact the parser doesn't even know that they are errors.

 

The fact that you can take a String, convert it to bytes, and have the parser read those bytes as if they were in a different encoding is not necessarily something that the parser (or anything else) can pick up. For noticing errors, what matters is whether the bytes produced by encoding a string in the "charsetBytes" encoding also represent a valid encoding of _any_ sequence of characters in the "charsetDocument" encoding. If they do, the parser will treat them as those characters and will not (and cannot) complain, unless you happen to produce chars which are invalid under XML (e.g. \u0001), in which case the document is not well-formed.

 

Let me explain a bit more - suppose you have a String of characters ABCD (ABCD are not literal - they each stand for "some character"). Translate that under charsetBytes to a sequence of bytes abcdefgh (each of a, b, c, etc. represents a byte - not a character). Now translate back from abcdefgh to a string XYZ using encoding charsetDocument (XYZ may or may not be the same as ABCD). If this process succeeds, i.e. if there exists _any_ string which is validly encoded as abcdefgh under encoding charsetDocument, then the parser has _no_ way of telling that this wasn't what you meant to do. You passed it a series of bytes and a character encoding. It translated that into a series of characters in a valid way. It has no way of knowing whether the characters it is producing are the characters you intended.

 

Some character encodings (e.g. ISO-8859-1) have mappings from _every_ possible byte sequence to a valid sequence of characters, and as such they will never cause an exception in your test (though the result may or may not be the original characters you input). Try putting in an xmlText() call and you'll see that the output document has different chars than the input one.
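
As a rough illustration of the two points above, here is a minimal sketch using only JDK classes (it does not involve XMLBeans or its parser). The class and method names are made up for this example. It encodes the test string under one charset, decodes it under another without any exception, and checks that ISO-8859-1 accepts all 256 byte values.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
final class EncodingMismatchDemo {

    // Encode under charsetBytes, then decode under charsetDocument.
    // new String(byte[], Charset) never throws on bad input.
    static String reinterpret(String text, Charset charsetBytes, Charset charsetDocument) {
        byte[] bytes = text.getBytes(charsetBytes);
        return new String(bytes, charsetDocument);
    }

    // Decode every possible byte value as ISO-8859-1 and encode it back.
    // Returns true when all 256 values survive the round trip.
    static boolean everyByteDecodesAsLatin1() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        String decoded = new String(all, StandardCharsets.ISO_8859_1);
        byte[] back = decoded.getBytes(StandardCharsets.ISO_8859_1);
        return decoded.length() == 256 && Arrays.equals(all, back);
    }

    public static void main(String[] args) {
        // The id/password text from the test, written with escapes so the
        // source file's own encoding does not matter.
        String original = "\u00e0\u00e1\u00e2\u00e4\u00e3a";
        String misread = reinterpret(original,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(original.equals(misread)); // false
        System.out.println(misread.length());         // 11: one char per UTF-8 byte
        System.out.println(everyByteDecodesAsLatin1());
    }
}
```

The misread string is perfectly legal text, just not the text that was intended, which is why nothing downstream can object.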

 

Some encodings (e.g. UTF-8) do have a certain structure that they expect of the bytes and if you fail to follow that structure an exception is thrown. E.g. in your own test if you call:

 

        test ("UTF-8", "ISO-8859-1");

 

You will get a java.io.CharConversionException from the parser. If you print out the stack trace for this you'll see something like:

 

java.io.CharConversionException: Malformed UTF-8 character: 0xe4 0xe3 0x61

        at org.apache.xmlbeans.impl.piccolo.xml.UTF8XMLDecoder.decode(UTF8XMLDecoder.java:108)
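
The same kind of structural check can be reproduced outside the parser. The following is a sketch with the JDK's own CharsetDecoder, not the Piccolo decoder named in the stack trace. The class and method names are assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of XMLBeans.
final class StrictUtf8Check {

    // Returns true only if the bytes follow the structure UTF-8 requires.
    static boolean isWellFormedUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // Thrown for sequences such as a lead byte with no continuation byte.
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "\u00e0\u00e1\u00e2\u00e4\u00e3a";
        // ISO-8859-1 bytes: e0 e1 e2 e4 e3 61, which is not valid UTF-8.
        System.out.println(isWellFormedUtf8(text.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println(isWellFormedUtf8(text.getBytes(StandardCharsets.UTF_8)));
    }
}
```

A check like this only says whether the bytes could be UTF-8. It says nothing about whether UTF-8 is what the author of the document intended.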

 

Note that validation has little or nothing to do with this. If you end up producing characters which are invalid XML characters, then the document is not well-formed and this should be caught well before XmlBeans would attempt validation. However, such characters are relatively rare, so it's not something you can rely on. Any other result means you have valid XML chars - validation might catch this if you had a restriction on which chars are allowed and you happened to produce ones which are not allowed, but again this is rare, and with most schemas there is nothing for validation to catch here.

 

On the other hand, the very last output from your program I do find interesting - in that I would expect it to throw an UnsupportedEncodingException just as for the other ones. I haven't a lot of experience with UTF-16. I think in this case the parser may be reading the byte order mark and using it regardless of anything else. If there is both a byte order mark _and_ an encoding decl which is inconsistent with the byte order mark then (according to http://www.xml.com/axml/testaxml.htm section 4.3.3) I think that the parser should error but I don't think that's happening at the moment.
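
For anyone experimenting with the byte order mark question, here is a small sketch of BOM sniffing in plain Java. It is not the parser's actual detection logic, the class and method names are invented for the example, and it ignores UTF-32. It also shows that Java's "UTF-16" charset writes a big-endian BOM when encoding, whereas "UTF-16BE" writes none.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; illustrative only.
final class BomSniffer {

    // Returns the encoding implied by a leading byte order mark,
    // or null when the bytes do not start with one.
    static String encodingFromBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return null;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"anything\"?><vap/>";
        // "UTF-16" encodes big-endian and prepends the BOM fe ff.
        System.out.println(encodingFromBom(xml.getBytes(StandardCharsets.UTF_16)));
        // "UTF-16BE" and "UTF-8" write no BOM when encoding.
        System.out.println(encodingFromBom(xml.getBytes(StandardCharsets.UTF_16BE)));
        System.out.println(encodingFromBom(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```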

 

Sorry - I know that doesn't help a lot - but in general in Java, if you pass a series of bytes and an encoding to a Reader and the Reader can read those bytes according to that encoding, then there is no way to detect that either the bytes or the encoding were in some way "wrong".
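
A related point, sketched here with JDK classes only (nothing XMLBeans-specific, and the class name is made up): an InputStreamReader built from a Charset is even more lenient than the parser's decoder. It replaces byte sequences it cannot decode with U+FFFD instead of throwing, so the caller sees no error at all.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; illustrative only.
final class LenientReaderDemo {

    // Reads all characters from the bytes using the given charset.
    static String readAll(byte[] bytes, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] latin1 = "\u00e0\u00e1\u00e2\u00e4\u00e3a".getBytes(StandardCharsets.ISO_8859_1);
        // Reading ISO-8859-1 bytes as UTF-8: no exception, just replacement chars.
        String result = readAll(latin1, StandardCharsets.UTF_8);
        System.out.println(result.indexOf('\uFFFD') >= 0); // true
    }
}
```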

 

Cheers,

 

Lawrence

 

________________________________

From: maarten [mailto:maartenb@dns.be] 
Sent: Friday, December 30, 2005 2:41 AM
To: user@xmlbeans.apache.org
Subject: Re: AW: Illegal characters, can xmlbeans be forgiving?

 

profos@rspd.ch wrote: 

Hi all,
 
concerning the question of whether XmlBeans should enforce the "environmental" rules about encoding, and consequently about erroneous byte sequences, I have a slightly different viewpoint: I agree that - given an encoding declaration - the parser should detect and reject erroneous sequences. However, I don't see why such declarations must strictly be present: think of internal parameter files for configuration or user setup which are strictly internal to the application using them - here I don't see any reason to enforce such rules. I would therefore opt for XmlBeans options that switch the rigorous enforcement of such rules on or off.
 
What do you think?
 
Dieter
 
  

What is the problem with using UTF-8 (or UTF-16) for these internal files?




________________________________
 
From: Dennis Sosnoski [mailto:dms@sosnoski.com]
Sent: Thu 29.12.2005 09:44
To: user@xmlbeans.apache.org
Subject: Re: Illegal characters, can xmlbeans be forgiving?
 
 
 
The XML recommendation says (4.3.3):
 
"It is a fatal error when an XML processor encounters an entity with an
encoding that it is unable to process. It is a fatal error if an XML
entity is determined (via default, encoding declaration, or higher-level
protocol) to be in a certain encoding but contains octet sequences that
are not legal in that encoding. It is also a fatal error if an XML
entity contains no encoding declaration and its content is not legal
UTF-8 or UTF-16."
 
Fatal errors are supposed to end processing. Since this doesn't seem to
be enforced by XMLBeans (or more likely, by the parser), you should
report this as an error.
 
I think it'd be a much more serious problem if XMLBeans fails to process
a document written as UTF-8 or UTF-16 without an encoding declaration,
or a document written as ISO-8859-1 with an encoding declaration. You
might want to test those variations.
 
  - Dennis
 
maarten wrote:
 
  

	I have noticed that xmlbeans 2.0 doesn't care whether the encoding
	declaration in the xml document matches the byte-encoding that is
	actually used.
	It seems to be more forgiving than I would like it to be.
	 
	For example:
	 
	// Assumes java.io.ByteArrayInputStream is imported and that VapDocument
	// is the XMLBeans class generated from the vap schema.
	public static void test(String charsetDocument, String charsetBytes)
	        throws Exception {
	    System.out.print("doc: " + charsetDocument + ", bytes: " +
	            charsetBytes + " => ");
	    // Declare one encoding in the XML declaration...
	    String xml =
	        "<?xml version=\"1.0\" encoding=\"" + charsetDocument + "\"?>\n" +
	        "<vap xmlns=\"http://www.eurid.eu/2005/vap\">" +
	        " <command>\n" +
	        " <login>\n" +
	        " <id>àáâäãa</id>\n" +
	        " <password>àáâäãa</password>\n" +
	        " </login> \n" +
	        " </command>\n" +
	        "</vap>";
	    // ...but encode the bytes with a possibly different one.
	    byte[] bytes = xml.getBytes(charsetBytes);
	    ByteArrayInputStream in = new ByteArrayInputStream(bytes);
	    try {
	        VapDocument document = VapDocument.Factory.parse(in);
	        if (document.validate()) {
	            System.out.println("valid, encoding = " +
	                    document.documentProperties().getEncoding());
	        } else {
	            // Terminate the output line when validation fails.
	            System.out.println("invalid");
	        }
	    } catch (Exception e) {
	        System.out.println(e.getClass().getName());
	    }
	}
	 
	public static void main(String[] args) throws Exception {
	    test("UTF-8", "UTF-8");
	    test("UTF-8", "UTF-16");
	    test("ISO-8859-1", "UTF-8");
	    test("ISO-8859-1", "UTF-16");
	    test("anything", "ISO-8859-1");
	    test("anything", "UTF-8");
	    test("anything", "UTF-16");
	}
	 
	gives the following output:
	 
	doc: UTF-8, bytes: UTF-8 => valid, encoding = UTF-8
	doc: UTF-8, bytes: UTF-16 => valid, encoding = UTF-8
	doc: ISO-8859-1, bytes: UTF-8 => valid, encoding = ISO-8859-1
	doc: ISO-8859-1, bytes: UTF-16 => valid, encoding = ISO-8859-1
	doc: anything, bytes: ISO-8859-1 => java.io.UnsupportedEncodingException
	doc: anything, bytes: UTF-8 => java.io.UnsupportedEncodingException
	doc: anything, bytes: UTF-16 => valid, encoding = anything
	 
	 
	Anything I can do about this ?
	 
	Maarten
	 
	 
	Dennis Sosnoski wrote:
	 
	    

		Do your XML documents specify the encoding in the XML declaration? If
		not, there's no way to distinguish between UTF-8 and ISO-8859-X
		without the multiple parses - and the multiple parse approach doesn't
		even come close to guaranteeing that you've ended up with the correct
		encoding (since the different flavors of ISO-8859-X reuse the same
		byte values for different characters). If the documents *do* give the
		encoding in the XML declaration, XMLBeans should be reading it and
		interpreting the document correctly.
		 
		- Dennis
		 
		Christophe Bouhier (MC/ECM) wrote:
		 
		      

			Hi Lawrence,
			I am not sure how to detect the XML charsets, besides just looping
			through the list of supported encodings and trying to parse
			successfully. This is not elegant but it worked for me. Thanks for
			your help.
			Cheers / Christophe
			 
			 
			        

				-----Original Message-----
				From: Lawrence Jones [mailto:ljones@bea.com]
				Sent: 17 December 2005 0:59
				To: user@xmlbeans.apache.org
				Subject: RE: Illegal characters, can xmlbeans be forgiving?
				 
				Have a look at the code in:
				 
				$XMLBEANS/src/common/org/apache/xmlbeans/impl/common/EncodingMap.java
				 
				and the code that calls it in
				 
				$XMLBEANS/src/store/org/apache/xmlbeans/impl/store/Saver.java
				around line 1760 onwards
				 
				EncodingMap.java contains all the supported encodings in the static
				initializer at line 70.
				 
				Cheers,
				 
				Lawrence
				 
				 
				          

					-----Original Message-----
					From: Christophe Bouhier (MC/ECM)
					[mailto:Christophe.Bouhier@ericsson.com]
					Sent: Thursday, December 15, 2005 7:25 PM
					To: 'user@xmlbeans.apache.org'
					Subject: RE: Illegal characters, can xmlbeans be forgiving?
					 
					Thanks! That helps. I checked the API doc for setCharacterEncoding but
					couldn't find the supported encoding types. In other words, which
					encodings are allowed in the function
					setCharacterEncoding("encoding")?
					 
					Cheers / Christophe
					 

					-----Original Message-----
					From: Lawrence Jones [mailto:ljones@bea.com]
					Sent: 16 December 2005 2:11
					To: user@xmlbeans.apache.org
					Subject: RE: Illegal characters, can xmlbeans be forgiving?
					 
					Hi Christophe
					 
					It's very unlikely that the characters are the problem - nearly all
					Unicode characters are allowed in XML - see e.g.
					http://www.xml.com/axml/testaxml.htm (section 2.2) and hence in
					XmlBeans.
					 
					What is more likely is that the characters are not encoded (as
					bytes) in the way XmlBeans expects. By default XmlBeans assumes
					UTF-8 encoding. Yours are probably ISO8859_1 or some such thing. If
					you want to play around with character encoding have a look at
					XmlOptions.setCharacterEncoding().
					 
					Cheers,
					 
					Lawrence
					 

					-----Original Message-----
					From: Christophe Bouhier (MC/ECM)
					[mailto:Christophe.Bouhier@ericsson.com]
					Sent: Wednesday, December 14, 2005 6:04 PM
					To: 'user@xmlbeans.apache.org'
					Subject: Illegal characters, can xmlbeans be forgiving?
					 
					Hi,
					 
					My application parses XML from many different sources (it's an RSS
					reader/podcast receiver).
					Before I switched to XMLBeans I was using an XML parser called NanoXML,
					which didn't mind some illegal characters, especially when wrapped in
					CDATA.
					Now XMLBeans stumbles over the illegal chars below (âEURoe) and throws
					an exception.
					 
					....
					<description><![CDATA[
					Miljenko âEURoeMikeâEUR? Grgich first gained international recognition at
					the celebrated âEURoeParis TastingâEUR? of 1976. They had chosen MikeâEUR(tm)s
					1973 Chateau Montelena Chardonnay as the finest white wine in the world.
					Today, Mike oversees daily operations at his winery Grgich Hills.
					His aim, year after year, is to improve the quality of their
					[...]]]></description> ......
					 
					Is there any way I can set an option to ignore illegal chars and go on?
					For me this could be a deal-breaker. I unfortunately can't expect all
					XML out on the web to be "nice and tidy".
					 
					Thanks for the help!
					Cheers / Christophe
					 
					 
					 
					                

					---------------------------------------------------------------------
					To unsubscribe, e-mail: user-unsubscribe@xmlbeans.apache.org
					For additional commands, e-mail: user-help@xmlbeans.apache.org
					 
					                


Re: AW: Illegal characters, can xmlbeans be forgiving?

Posted by maarten <ma...@dns.be>.

Of course, you are right. I didn't think hard enough about this one.

Concerning the last case, removing the byte order mark does not make a
difference.
I will file this in JIRA.

doc: anything, bytes: UTF-16 without BOM => valid, encoding = anything

Maarten
