
Posted to user@xmlbeans.apache.org by Dennis Sosnoski <dm...@sosnoski.com> on 2005/12/24 03:29:27 UTC

Re: Illegal characters, can xmlbeans be forgiving?

Do your XML documents specify the encoding in the XML declaration? If 
not, there's no way to distinguish between UTF-8 and ISO-8859-X without 
the multiple parses - and the multiple parse approach doesn't even come 
close to guaranteeing that you've ended up with the correct encoding 
(since the different flavors of ISO-8859-X reuse the same byte values 
for different characters). If the documents *do* give the encoding in 
the XML declaration, XMLBeans should be reading it and interpreting the 
document correctly.

  - Dennis
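Dennis's point can be made concrete. XML 1.0 Appendix F (Autodetection of Character Encodings, non-normative) describes how a processor can work out the encoding from the first bytes. A byte order mark identifies UTF-8 or UTF-16. Failing that, the `encoding` pseudo-attribute of the XML declaration, which is readable as plain ASCII in any ASCII-compatible encoding, names it. With neither, UTF-8 is assumed. The sketch below uses only the JDK and a hypothetical class name, `EncodingSniffer`. It covers just those cases: no UCS-4, no EBCDIC, and no BOM-less UTF-16. It is an illustration and not how XMLBeans or its parser actually implement detection. Note that in the no-declaration case it can only fall back to a default, which is exactly why ISO-8859-X input without a declaration cannot be identified reliably.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a simplified version of XML 1.0 Appendix F autodetection.
public class EncodingSniffer {
    // Matches the encoding pseudo-attribute of an XML declaration at the very start of the input.
    private static final Pattern DECL = Pattern.compile(
        "^<\\?xml\\s[^?]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    public static String sniff(byte[] b) {
        // 1. A byte order mark identifies the Unicode encoding unambiguously.
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2 && (((b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF)
                || ((b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE))) {
            return "UTF-16";
        }
        // 2. No BOM: read the start as ISO-8859-1 (one char per byte) so that the
        //    ASCII-only XML declaration is legible whatever the real encoding is.
        String head = new String(b, 0, Math.min(b.length, 200), StandardCharsets.ISO_8859_1);
        Matcher m = DECL.matcher(head);
        if (m.find()) {
            return m.group(1);
        }
        // 3. No BOM and no declaration: the document must be UTF-8.
        return "UTF-8";
    }

    public static void main(String[] args) {
        byte[] declared = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"
            .getBytes(StandardCharsets.ISO_8859_1);
        byte[] undeclared = "<a>\u00e9</a>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(sniff(declared));   // ISO-8859-1
        System.out.println(sniff(undeclared)); // UTF-8, although these bytes are really Latin-1
    }
}
```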

Christophe Bouhier (MC/ECM) wrote:

>Hi Lawrence,
>
>I am not sure how to detect the XML charsets, besides just looping through the list of supported encodings
>and trying to parse successfully. This is not elegant, but it worked for me.
>Thanks for your help.
>
>Cheers / Christophe
>
>>-----Original Message-----
>>From: Lawrence Jones [mailto:ljones@bea.com]
>>Sent: 17 December 2005 0:59
>>To: user@xmlbeans.apache.org
>>Subject: RE: Illegal characters, can xmlbeans be forgiving?
>>
>>Have a look at the code in:
>>
>>$XMLBEANS/src/common/org/apache/xmlbeans/impl/common/EncodingMap.java
>>
>>and the code that calls it in
>>
>>$XMLBEANS/src/store/org/apache/xmlbeans/impl/store/Saver.java
>>around line 1760 onwards
>>
>>EncodingMap.java contains all the supported encodings in the
>>static initializer at line 70.
>>
>>Cheers,
>>
>>Lawrence
>>
>>>-----Original Message-----
>>>From: Christophe Bouhier (MC/ECM)
>>>[mailto:Christophe.Bouhier@ericsson.com]
>>>Sent: Thursday, December 15, 2005 7:25 PM
>>>To: 'user@xmlbeans.apache.org'
>>>Subject: RE: Illegal characters, can xmlbeans be forgiving?
>>>
>>>Thanks! That helps. I checked the API doc for setCharacterEncoding but
>>>couldn't find the supported encoding types. In other words, which
>>>encodings are allowed in the function
>>>setCharacterEncoding("encoding")?
>>>
>>>Cheers / Christophe
>>>
>>>>-----Original Message-----
>>>>From: Lawrence Jones [mailto:ljones@bea.com]
>>>>Sent: 16 December 2005 2:11
>>>>To: user@xmlbeans.apache.org
>>>>Subject: RE: Illegal characters, can xmlbeans be forgiving?
>>>>
>>>>Hi Christophe
>>>>
>>>>It's very unlikely that the characters are the problem - all Unicode
>>>>characters are allowed in XML - see e.g.
>>>>http://www.xml.com/axml/testaxml.htm (section 2.2) and hence in
>>>>XmlBeans.
>>>>
>>>>What is more likely is that the characters are not encoded (as
>>>>bytes) in the way XmlBeans expects. By default XmlBeans assumes
>>>>UTF-8 encoding. Yours are probably ISO8859_1 or some such thing. If
>>>>you want to play around with character encoding, have a look at
>>>>XmlOptions.setCharacterEncoding().
>>>>
>>>>Cheers,
>>>>
>>>>Lawrence
>>>>
>>>>>-----Original Message-----
>>>>>From: Christophe Bouhier (MC/ECM)
>>>>>[mailto:Christophe.Bouhier@ericsson.com]
>>>>>Sent: Wednesday, December 14, 2005 6:04 PM
>>>>>To: 'user@xmlbeans.apache.org'
>>>>>Subject: Illegal characters, can xmlbeans be forgiving?
>>>>>
>>>>>Hi,
>>>>>
>>>>>My application parses XML from many different sources (it's an RSS
>>>>>reader/podcast receiver).
>>>>>Before I switched to XMLBeans I was using an XML parser called NanoXML,
>>>>>which didn't mind some illegal characters, especially when wrapped in
>>>>>CDATA.
>>>>>Now XMLBeans stumbles over the illegal chars below (â€œ) and throws an
>>>>>exception.
>>>>>
>>>>>....
>>>>><description><![CDATA[
>>>>>	Miljenko â€œMikeâ€? Grgich first gained international recognition at
>>>>>the celebrated â€œParis Tastingâ€? of 1976.  They had chosen Mikeâ€™s
>>>>>1973 Chateau Montelena Chardonnay as the finest white wine in the world.
>>>>>	Today, Mike oversees daily operations at his winery Grgich Hills.
>>>>>His aim, year after year, is to improve the quality of their
>>>>>[...]]]></description> ......
>>>>>
>>>>>Is there any way I can set an option to ignore illegal chars and go on?
>>>>>For me this could be a deal-breaker. I unfortunately can't expect all
>>>>>XML out on the web to be "nice and tidy".
>>>>>
>>>>>Thanks for the help!
>>>>>Cheers / Christophe
>>>>>
>>>>>---------------------------------------------------------------------
>>>>>To unsubscribe, e-mail: user-unsubscribe@xmlbeans.apache.org
>>>>>For additional commands, e-mail: user-help@xmlbeans.apache.org
>
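The sequences in Christophe's sample (â€œ, â€?, â€™) follow a well-known pattern rather than being random garbage. They are what the UTF-8 bytes of the typographic quotes U+201C, U+201D and U+2019 turn into when something upstream decodes them as Windows-1252. The "?" most likely stands for byte 0x9D, which Windows-1252 leaves undefined. That fits Lawrence's diagnosis that the byte encoding is the problem, not the characters themselves. The following is a small JDK-only demonstration with a hypothetical class name. It assumes the `windows-1252` charset is available: standard JDKs ship it, but it is not among the charsets every Java platform must support.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Simulates the upstream mistake: text encoded as UTF-8, then decoded as Windows-1252.
    public static String misdecode(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    // Reverses the mistake. This only works if every garbled character maps back to a
    // Windows-1252 byte, which is not the case once an undefined byte such as 0x9D was lost.
    public static String repair(String garbled) {
        return new String(garbled.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String left = "\u201CParis Tasting";  // left double quotation mark + text
        String apostrophe = "Mike\u2019s";      // right single quotation mark
        // U+201C is E2 80 9C in UTF-8; read as Windows-1252 that is a-circumflex, euro sign, oe.
        System.out.println(misdecode(left).equals("\u00E2\u20AC\u0153Paris Tasting")); // true
        System.out.println(misdecode(apostrophe).equals("Mike\u00E2\u20AC\u2122s"));    // true
        System.out.println(repair(misdecode(left)).equals(left));                        // true
    }
}
```

The repair direction is shown only to confirm the diagnosis. The real fix is to read the feed with the encoding it was actually written in.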



Re: AW: Illegal characters, can xmlbeans be forgiving?

Posted by maarten <ma...@dns.be>.

profos@rspd.ch wrote:

>Hi all,
> 
>Regarding the question of whether XmlBeans should enforce the "environmental" rules about encoding, and consequently about erroneous byte sequences, I have a slightly different viewpoint: I agree that, given an encoding declaration, the parser should detect and reject erroneous sequences. However, I don't see why such a declaration must strictly be present. Think of internal parameter files for configuration or user setup that are strictly internal to the application using them: there I see no reason to enforce such rules. I would therefore opt for XmlBeans options that switch the rigorous enforcement of these rules on or off.
> 
>What do you think?
> 
>Dieter
>
>  
>
What is the problem with using UTF-8 (or UTF-16) for these internal files?
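Maarten's alternative is easy to follow when the application writes the files itself. If the writer always emits UTF-8 bytes and declares UTF-8, the declaration and the bytes cannot disagree. The sketch below uses the standard JAXP `Transformer` from the JDK, not XMLBeans' own save options, and the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class Utf8ConfigWriter {
    // Serializes a DOM as UTF-8; the transformer writes a matching encoding declaration.
    public static byte[] write(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("setting");
        root.setTextContent("\u00e0\u00e1\u00e2"); // a non-ASCII value
        doc.appendChild(root);

        byte[] bytes = write(doc);
        System.out.println(new String(bytes, StandardCharsets.UTF_8).contains("encoding=\"UTF-8\"")); // true

        // Reading the file back needs no guessing: the bytes are UTF-8, as declared.
        Document back = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
        System.out.println(back.getDocumentElement().getTextContent().equals("\u00e0\u00e1\u00e2")); // true
    }
}
```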


AW: Illegal characters, can xmlbeans be forgiving?

Posted by pr...@rspd.ch.

Hi all,
 
Regarding the question of whether XmlBeans should enforce the "environmental" rules about encoding, and consequently about erroneous byte sequences, I have a slightly different viewpoint: I agree that, given an encoding declaration, the parser should detect and reject erroneous sequences. However, I don't see why such a declaration must strictly be present. Think of internal parameter files for configuration or user setup that are strictly internal to the application using them: there I see no reason to enforce such rules. I would therefore opt for XmlBeans options that switch the rigorous enforcement of these rules on or off.
 
What do you think?
 
Dieter
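Whatever switch XMLBeans may or may not provide, an application can get this kind of leniency for its own files by decoding the bytes itself and handing the parser characters instead of bytes. Under the SAX `InputSource` contract, when a character stream is supplied the parser reads it directly and disregards any encoding declaration. The sketch below uses the JDK's built-in JAXP parser rather than XMLBeans, and the class and method names are hypothetical. XMLBeans' `parse(Reader)` overloads should behave analogously, but that is an assumption not tested here. Malformed bytes become U+FFFD instead of ending the parse.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class LenientLoader {
    // Strict: the parser decodes the bytes itself; illegal sequences are fatal errors.
    public static Document parseStrict(byte[] data) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(data));
    }

    // Lenient: decode with a caller-chosen charset, replacing malformed input with U+FFFD,
    // then parse the resulting characters (the encoding declaration is ignored).
    public static Document parseLenient(byte[] data, Charset charset) throws Exception {
        String text = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .decode(ByteBuffer.wrap(data))
                .toString();
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(text)));
    }

    public static void main(String[] args) throws Exception {
        // Declared as UTF-8, but the content was written as ISO-8859-1.
        byte[] data = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><p>caf\u00e9</p>"
                .getBytes(StandardCharsets.ISO_8859_1);
        try {
            parseStrict(data);
            System.out.println("strict: parsed");
        } catch (Exception e) {
            System.out.println("strict: rejected");
        }
        Document doc = parseLenient(data, StandardCharsets.UTF_8);
        System.out.println(doc.getDocumentElement().getTextContent().equals("caf\uFFFD")); // true
    }
}
```

The replaced characters are lost for good, so this trade-off suits files like the internal ones described above better than documents fetched from the web.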

________________________________

From: Dennis Sosnoski [mailto:dms@sosnoski.com]
Sent: Thu 29.12.2005 09:44
To: user@xmlbeans.apache.org
Subject: Re: Illegal characters, can xmlbeans be forgiving?

The XML recommendation says (4.3.3):

"It is a fatal error when an XML processor encounters an entity with an
encoding that it is unable to process. It is a fatal error if an XML
entity is determined (via default, encoding declaration, or higher-level
protocol) to be in a certain encoding but contains octet sequences that
are not legal in that encoding. It is also a fatal error if an XML
entity contains no encoding declaration and its content is not legal
UTF-8 or UTF-16."

Fatal errors are supposed to end processing. Since this doesn't seem to
be enforced by XMLBeans (or, more likely, by the underlying parser), you should
report this as a bug.

I think it'd be a much more serious problem if XMLBeans fails to process
a document written as UTF-8 or UTF-16 without an encoding declaration,
or a document written as ISO-8859-1 with an encoding declaration. You
might want to test those variations.

  - Dennis


Re: Illegal characters, can xmlbeans be forgiving?

Posted by Dennis Sosnoski <dm...@sosnoski.com>.

The XML recommendation says (4.3.3):

"It is a fatal error when an XML processor encounters an entity with an 
encoding that it is unable to process. It is a fatal error if an XML 
entity is determined (via default, encoding declaration, or higher-level 
protocol) to be in a certain encoding but contains octet sequences that 
are not legal in that encoding. It is also a fatal error if an XML 
entity contains no encoding declaration and its content is not legal 
UTF-8 or UTF-16."

Fatal errors are supposed to end processing. Since this doesn't seem to 
be enforced by XMLBeans (or, more likely, by the underlying parser), you should 
report this as a bug.

I think it'd be a much more serious problem if XMLBeans fails to process 
a document written as UTF-8 or UTF-16 without an encoding declaration, 
or a document written as ISO-8859-1 with an encoding declaration. You 
might want to test those variations.

  - Dennis
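The second rule quoted above (octet sequences that are not legal in the declared encoding) can be checked with a strict `CharsetDecoder`. The sketch below uses a hypothetical helper name, `isLegalIn`. It also shows why Maarten's "ISO-8859-1 declared, UTF-8 bytes" case cannot be caught this way. Every byte value is legal ISO-8859-1, so that mismatch only produces wrong characters, never an error. A UTF-8 declaration over ISO-8859-1 bytes, by contrast, is usually detectable.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingLegality {
    // Returns true if every byte sequence in data is legal in the given charset.
    public static boolean isLegalIn(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "\u00e0\u00e1\u00e2\u00e4\u00e3a"; // the accented letters from Maarten's test
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(isLegalIn(utf8, StandardCharsets.UTF_8));      // true
        System.out.println(isLegalIn(latin1, StandardCharsets.UTF_8));    // false: a fatal error per 4.3.3
        System.out.println(isLegalIn(utf8, StandardCharsets.ISO_8859_1)); // true: the mismatch goes unnoticed
    }
}
```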

maarten wrote:

> I have noticed that xmlbeans 2.0 doesn't care whether the encoding
> declaration in the XML document matches the byte encoding that is
> actually used. It seems to be more forgiving than I would like it to be.
>
> For example:
>
> import java.io.ByteArrayInputStream;
> // VapDocument is the class XMLBeans generates from the EURid "vap" schema
> // (namespace http://www.eurid.eu/2005/vap); import it from the generated package.
>
> public class EncodingTest { // illustrative class name
>
>     public static void test(String charsetDocument, String charsetBytes)
>             throws Exception {
>         System.out.print("doc: " + charsetDocument + ", bytes: " +
>                 charsetBytes + " => ");
>         // Declare one encoding in the XML declaration...
>         String xml =
>                 "<?xml version=\"1.0\" encoding=\"" + charsetDocument + "\"?>\n" +
>                 "<vap xmlns=\"http://www.eurid.eu/2005/vap\" >" +
>                 " <command>\n" +
>                 " <login>\n" +
>                 " <id>àáâäãa</id>\n" +
>                 " <password>àáâäãa</password>\n" +
>                 " </login> \n" +
>                 " </command>\n" +
>                 "</vap>";
>         // ...but write the bytes in a (possibly different) encoding.
>         byte[] bytes = xml.getBytes(charsetBytes);
>         ByteArrayInputStream in = new ByteArrayInputStream(bytes);
>         try {
>             VapDocument document = VapDocument.Factory.parse(in);
>             if (document.validate()) {
>                 System.out.println("valid, encoding = " +
>                         document.documentProperties().getEncoding());
>             } else {
>                 // Report invalid documents too, so every test prints a complete line.
>                 System.out.println("invalid");
>             }
>         } catch (Exception e) {
>             System.out.println(e.getClass().getName());
>         }
>     }
>
>     public static void main(String[] args) throws Exception {
>         test("UTF-8", "UTF-8");
>         test("UTF-8", "UTF-16");
>         test("ISO-8859-1", "UTF-8");
>         test("ISO-8859-1", "UTF-16");
>         test("anything", "ISO-8859-1");
>         test("anything", "UTF-8");
>         test("anything", "UTF-16");
>     }
> }
>
> gives the following output:
>
> doc: UTF-8, bytes: UTF-8 => valid, encoding = UTF-8
> doc: UTF-8, bytes: UTF-16 => valid, encoding = UTF-8
> doc: ISO-8859-1, bytes: UTF-8 => valid, encoding = ISO-8859-1
> doc: ISO-8859-1, bytes: UTF-16 => valid, encoding = ISO-8859-1
> doc: anything, bytes: ISO-8859-1 => java.io.UnsupportedEncodingException
> doc: anything, bytes: UTF-8 => java.io.UnsupportedEncodingException
> doc: anything, bytes: UTF-16 => valid, encoding = anything
>
>
> Is there anything I can do about this?
>
> Maarten
>
>
> Dennis Sosnoski wrote:
>
>> Do your XML documents specify the encoding in the XML declaration? If 
>> not, there's no way to distinguish between UTF-8 and ISO-8859-X 
>> without the multiple parses - and the multiple parse approach doesn't 
>> even come close to guaranteeing that you've ended up with the correct 
>> encoding (since the different flavors of ISO-8859-X reuse the same 
>> byte values for different characters). If the documents *do* give the 
>> encoding in the XML declaration, XMLBeans should be reading it and 
>> interpreting the document correctly.
>>
>> - Dennis
>>
>> Christophe Bouhier (MC/ECM) wrote:
>>
>>> Hi Lawrence,
>>> I am not sure how to detect the XML charsets, besides just looping 
>>> through the list of supported encodings and trying to parse 
>>> successfully. This is not elegant, but it worked for me. Thanks for 
>>> your help.
>>> Cheers / Christophe
>>>
>>>
>>>> -----Original Message-----
>>>> From: Lawrence Jones [mailto:ljones@bea.com]
>>>> Sent: 17 December 2005 0:59
>>>> To: user@xmlbeans.apache.org
>>>> Subject: RE: Illegal characters, can xmlbeans be forgiving?
>>>> Have a look at the code in:
>>>>
>>>> $XMLBEANS/src/common/org/apache/xmlbeans/impl/common/EncodingMap.java
>>>>
>>>> and the code that calls it in
>>>>
>>>> $XMLBEANS/src/store/org/apache/xmlbeans/impl/store/Saver.java 
>>>> around line 1760 onwards
>>>>
>>>> EncodingMap.java contains all the supported encodings in the static 
>>>> initializer at line 70.
>>>>
>>>> Cheers,
>>>>
>>>> Lawrence
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Christophe Bouhier (MC/ECM) 
>>>>> [mailto:Christophe.Bouhier@ericsson.com]
>>>>> Sent: Thursday, December 15, 2005 7:25 PM
>>>>> To: 'user@xmlbeans.apache.org'
>>>>> Subject: RE: Illegal characters, can xmlbeans be forgiving?
>>>>>
>>>>> Thanks! That helps. I checked the API doc for setCharacterEncoding but
>>>>> couldn't find the supported encoding types. In other words, which 
>>>>> encodings are allowed in the function 
>>>>> setCharacterEncoding("encoding")?
>>>>>
>>>>> Cheers / Christophe
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Lawrence Jones [mailto:ljones@bea.com]
>>>>>> Sent: 16 December 2005 2:11
>>>>>> To: user@xmlbeans.apache.org
>>>>>> Subject: RE: Illegal characters, can xmlbeans be forgiving?
>>>>>>
>>>>>> Hi Christophe
>>>>>>
>>>>>> It's very unlikely that the characters are the problem - all Unicode
>>>>>> characters are allowed in XML - see e.g.
>>>>>> http://www.xml.com/axml/testaxml.htm (section 2.2) and hence in 
>>>>>> XmlBeans.
>>>>>>
>>>>>> What is more likely is that the characters are not encoded (as 
>>>>>> bytes) in the way XmlBeans expects. By default XmlBeans assumes 
>>>>>> UTF-8 encoding. Yours are probably ISO8859_1 or some such thing. If
>>>>>> you want to play around with character encoding, have a look at 
>>>>>> XmlOptions.setCharacterEncoding().
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Lawrence
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Christophe Bouhier (MC/ECM)
>>>>>>> [mailto:Christophe.Bouhier@ericsson.com]
>>>>>>> Sent: Wednesday, December 14, 2005 6:04 PM
>>>>>>> To: 'user@xmlbeans.apache.org'
>>>>>>> Subject: Illegal characters, can xmlbeans be forgiving?
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> My application parses XML from many different sources (it's an RSS
>>>>>>> reader/podcast receiver).
>>>>>>> Before I switched to XMLBeans I was using an XML parser called NanoXML,
>>>>>>> which didn't mind some illegal characters, especially when wrapped in
>>>>>>> CDATA.
>>>>>>> Now XMLBeans stumbles over the illegal chars below (â€œ) and throws an
>>>>>>> exception.
>>>>>>>
>>>>>>> ....
>>>>>>> <description><![CDATA[
>>>>>>> Miljenko â€œMikeâ€? Grgich first gained international
>>>>>>>
>>>>>> recognition at
>>>>>>
>>>>>>> the celebrated â€œParis Tastingâ€? of 1976. They had
>>>>>>>
>>>>>> chosen Mikeâ€™s
>>>>>>
>>>>>>> 1973 Chateau Montelena Chardonnay as the finest white wine
>>>>>>>
>>>>>> in the world.
>>>>>>
>>>>>>> Today, Mike oversees daily operations at his winery
>>>>>>>
>>>>>> Grgich Hills.
>>>>>>
>>>>>>> His aim, year after year, is to improve the quality of their 
>>>>>>> [...]]]></description> ......
>>>>>>>
>>>>>>> Is there anyway I can set an option to ignore illegal chars
>>>>>>>
>>>>>> and go on.
>>>>>>
>>>>>>> For me this could be a deal-breaker. I unfortunatly can't
>>>>>>>
>>>>>> expect all
>>>>>>
>>>>>>> XML out on the web to be "nice and tidy".
>>>>>>>
>>>>>>> Thanks for the help!
>>>>>>> Cheers / Christophe
>>>>>>>
>>>>>>>
>>>>>>>
>>>> --------------------------------------------------------------------
>>>>
>>>>>> -
>>>>>>
>>>>>>> To unsubscribe, e-mail: user-unsubscribe@xmlbeans.apache.org
>>>>>>> For additional commands, e-mail: user-help@xmlbeans.apache.org
>>>>>>>
>>>>>>
>>>>>
>>>> ---------------------------------------------------------------------
>>>>
>>>>> To unsubscribe, e-mail: user-unsubscribe@xmlbeans.apache.org
>>>>> For additional commands, e-mail: user-help@xmlbeans.apache.org
>>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@xmlbeans.apache.org
>>> For additional commands, e-mail: user-help@xmlbeans.apache.org
>>>
>>>
>>>
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@xmlbeans.apache.org
>> For additional commands, e-mail: user-help@xmlbeans.apache.org
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@xmlbeans.apache.org
> For additional commands, e-mail: user-help@xmlbeans.apache.org
>



Re: Illegal characters, can xmlbeans be forgiving?

Posted by maarten <ma...@dns.be>.

I have noticed that XMLBeans 2.0 doesn't check whether the encoding declared
in the XML document matches the encoding actually used for its bytes.
It is more forgiving than I would like it to be.

For example:

    // Needs: import java.io.ByteArrayInputStream;
    // VapDocument is presumably the XMLBeans type generated from the
    // http://www.eurid.eu/2005/vap schema.
    public static void test(String charsetDocument, String charsetBytes)
            throws Exception {
        System.out.print("doc: " + charsetDocument + ", bytes: "
                + charsetBytes + " => ");
        // Declare one encoding in the XML declaration...
        String xml =
            "<?xml version=\"1.0\" encoding=\"" + charsetDocument + "\"?>\n" +
            "<vap xmlns=\"http://www.eurid.eu/2005/vap\" >" +
            " <command>\n" +
            " <login>\n" +
            " <id>àáâäãā</id>\n" +
            " <password>àáâäãā</password>\n" +
            " </login> \n" +
            " </command>\n" +
            "</vap>";
        // ...but encode the bytes with a possibly different charset.
        byte[] bytes = xml.getBytes(charsetBytes);
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        try {
            VapDocument document = VapDocument.Factory.parse(in);
            if (document.validate()) {
                System.out.println("valid, encoding = "
                        + document.documentProperties().getEncoding());
            } else {
                // Without this branch an invalid document printed no result line.
                System.out.println("not valid");
            }
        } catch (Exception e) {
            System.out.println(e.getClass().getName());
        }
    }

    public static void main(String[] args) throws Exception {
        test("UTF-8", "UTF-8");
        test("UTF-8", "UTF-16");
        test("ISO-8859-1", "UTF-8");
        test("ISO-8859-1", "UTF-16");
        test("anything", "ISO-8859-1");
        test("anything", "UTF-8");
        test("anything", "UTF-16");
    }

This gives the following output:

    doc: UTF-8, bytes: UTF-8 => valid, encoding = UTF-8
    doc: UTF-8, bytes: UTF-16 => valid, encoding = UTF-8
    doc: ISO-8859-1, bytes: UTF-8 => valid, encoding = ISO-8859-1
    doc: ISO-8859-1, bytes: UTF-16 => valid, encoding = ISO-8859-1
    doc: anything, bytes: ISO-8859-1 => java.io.UnsupportedEncodingException
    doc: anything, bytes: UTF-8 => java.io.UnsupportedEncodingException
    doc: anything, bytes: UTF-16 => valid, encoding = anything
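The "doc: ISO-8859-1, bytes: UTF-8 => valid" line is what one would expect from any parser that trusts the declaration: ISO-8859-1 assigns a character to every byte value from 0x00 to 0xFF, so UTF-8 bytes decode without any error, only into different characters (the id and password most likely come back as two-character sequences such as "Ã" followed by a no-break space in place of "à"). The same property underlies Dennis's point that trial parses cannot pick out the right ISO-8859-X flavor. The standalone sketch below uses only java.nio.charset; the class and method names (EncodingDemo, utf8ReadAsLatin1, decodeByte) are illustrative and not part of XMLBeans.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Encode text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.
    static String utf8ReadAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // ISO-8859-1 maps every byte 0x00-0xFF to a character, so this
        // decode can never fail; it silently produces different text.
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Decode a single byte value with the named charset.
    static char decodeByte(int b, String charsetName) {
        byte[] one = { (byte) b };
        return new String(one, Charset.forName(charsetName)).charAt(0);
    }

    public static void main(String[] args) {
        // U+00E0 (a with grave) is the UTF-8 byte pair 0xC3 0xA0, which reads
        // back as U+00C3 (A with tilde) followed by U+00A0 (no-break space).
        String s = utf8ReadAsLatin1("\u00e0");
        System.out.printf("%d chars: U+%04X U+%04X%n",
                s.length(), (int) s.charAt(0), (int) s.charAt(1));
        // prints: 2 chars: U+00C3 U+00A0

        // The same byte 0xA4 is the currency sign in ISO-8859-1
        // but the euro sign in ISO-8859-15.
        System.out.printf("ISO-8859-1: U+%04X, ISO-8859-15: U+%04X%n",
                (int) decodeByte(0xA4, "ISO-8859-1"),
                (int) decodeByte(0xA4, "ISO-8859-15"));
        // prints: ISO-8859-1: U+00A4, ISO-8859-15: U+20AC
    }
}
```

Neither decode raises an error, which is why a parser cannot tell from the bytes alone that the declared ISO-8859-1 is wrong.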


Is there anything I can do about this?

Maarten
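One way to get the stricter behaviour asked about here is to check the bytes yourself before handing them to VapDocument.Factory.parse(). The sketch below is an assumption, not an XMLBeans feature: checkDeclaredEncoding() is a hypothetical helper built only on java.nio.charset and java.util.regex. It sniffs a byte order mark (or a UTF-16-encoded '<'), reads the encoding pseudo-attribute from the XML declaration, rejects combinations that the XML 1.0 rules on character encoding (section 4.3.3) treat as errors, such as UTF-16 bytes declared as UTF-8, and then decodes the whole input with a strict CharsetDecoder so malformed sequences raise an exception instead of being replaced. It cannot catch UTF-8 bytes declared as ISO-8859-1, because every byte sequence is valid ISO-8859-1.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodingCheck {
    // Matches the encoding pseudo-attribute of an XML declaration.
    private static final Pattern DECL = Pattern.compile(
            "^<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Hypothetical helper: returns the charset the bytes should be read with,
    // or throws IOException if the declaration contradicts the bytes.
    // Only UTF-8, UTF-16 and ASCII-compatible encodings are considered.
    static String checkDeclaredEncoding(byte[] b) throws IOException {
        String sniffed = null; // encoding implied by the first bytes, if any
        int skip = 0;          // length of a byte order mark to skip
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            sniffed = "UTF-8"; skip = 3;
        } else if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            sniffed = "UTF-16BE"; skip = 2;
        } else if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            sniffed = "UTF-16LE"; skip = 2;
        } else if (b.length >= 2 && b[0] == 0x3C && b[1] == 0x00) {
            sniffed = "UTF-16LE"; // '<' in UTF-16LE without a byte order mark
        } else if (b.length >= 2 && b[0] == 0x00 && b[1] == 0x3C) {
            sniffed = "UTF-16BE"; // '<' in UTF-16BE without a byte order mark
        }

        // Read the start of the document with the sniffed charset, or else
        // as ISO-8859-1, which leaves the ASCII bytes of the declaration intact.
        Charset prefixCs = Charset.forName(sniffed != null ? sniffed : "ISO-8859-1");
        String prefix = new String(b, skip, Math.min(b.length - skip, 200), prefixCs);
        Matcher m = DECL.matcher(prefix);
        String declared = m.find() ? m.group(1).toUpperCase(Locale.ROOT) : null;

        String effective;
        if (sniffed != null && sniffed.startsWith("UTF-16")) {
            if (declared != null && !declared.startsWith("UTF-16")) {
                throw new IOException("bytes are UTF-16 but encoding=\"" + declared + "\"");
            }
            effective = sniffed;
        } else {
            effective = declared != null ? declared : "UTF-8"; // XML default
            if (sniffed != null && !effective.equals("UTF-8")) {
                throw new IOException("UTF-8 byte order mark but encoding=\"" + declared + "\"");
            }
            if (effective.startsWith("UTF-16")) {
                throw new IOException("encoding=\"" + declared + "\" but bytes are not UTF-16");
            }
        }
        if (!Charset.isSupported(effective)) {
            throw new IOException("unsupported encoding \"" + effective + "\"");
        }
        // Strict decode: report malformed or unmappable input instead of
        // silently substituting replacement characters.
        try {
            Charset.forName(effective).newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(b, skip, b.length - skip));
        } catch (CharacterCodingException e) {
            throw new IOException("bytes are not valid " + effective, e);
        }
        return effective;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><id>\u00e0\u00e1</id>";
        // Declaration and bytes agree: prints UTF-8
        System.out.println(checkDeclaredEncoding(xml.getBytes(StandardCharsets.UTF_8)));
        try {
            // UTF-16 bytes (with byte order mark) declared as UTF-8: rejected
            checkDeclaredEncoding(xml.getBytes(StandardCharsets.UTF_16));
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the check passes, the same byte array can be wrapped in a ByteArrayInputStream and parsed as in the test above; if it throws, the document can be rejected or logged instead of being silently accepted.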


Dennis Sosnoski wrote:

> Do your XML documents specify the encoding in the XML declaration? If 
> not, there's no way to distinguish between UTF-8 and ISO-8859-X 
> without the multiple parses - and the multiple parse approach doesn't 
> even come close to guaranteeing that you've ended up with the correct 
> encoding (since the different flavors of ISO-8859-X reuse the same 
> byte values for different characters). If the documents *do* give the 
> encoding in the XML declaration, XMLBeans should be reading it and 
> interpreting the document correctly.
>
> - Dennis

