Posted to user@xmlbeans.apache.org by El...@ibi.com on 2007/07/23 22:48:13 UTC

RE: trouble validating UTF-8 document with international characters - please help!

Sorry, my e-mail had files attached to it, and the xml file was the document in question. 
Anyway, the offending line is:

         <COMP_NAME>IDES México, S.A. de C.V.</COMP_NAME>

As you can see, there are 25 characters in this element, but xmlbeans thinks there are 26.

Thanks,
Elvira
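
A minimal sketch of how the 25-versus-26 question can be checked, assuming the 'é' in the element is the single precomposed code point U+00E9 (the class and method names here are made up for illustration). It counts Unicode code points and, separately, the number of bytes the same text occupies in UTF-8:

```java
import java.nio.charset.StandardCharsets;

final class CompNameLength {
    // Element text from the post; the e-acute is written as the single
    // code point U+00E9 (an assumption about the actual file contents).
    static final String COMP_NAME = "IDES M\u00e9xico, S.A. de C.V.";

    /** Number of Unicode code points in the string. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Number of bytes the string occupies once encoded as UTF-8. */
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println("code points: " + codePoints(COMP_NAME));
        System.out.println("UTF-8 bytes: " + utf8Bytes(COMP_NAME));
    }
}
```

Under that assumption the string has 25 code points but 26 UTF-8 bytes, so a count of 26 hints that something is treating each byte as a character.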

-----Original Message-----
From: Radu Preotiuc-Pietro [mailto:radup@bea.com] 
Sent: Monday, July 23, 2007 4:33 PM
To: user@xmlbeans.apache.org
Subject: Re: trouble validating UTF-8 document with internationalcharacters - please help!

It would be more interesting to see the document in question at the line
and column referenced in the error message: is that string longer than
the declared maxLength facet?

(I wouldn't take XmlSpy as reference, since it is known to be
unreliable)

Radu

On Mon, 2007-07-23 at 15:57 -0400, Elvira_Gurevich@ibi.com wrote:
> Hello,
> 
>  
> 
> I am using XmlBeans to validate a document against its schema.
> 
> It works fine, except when international characters are used in the
> document.
> 
> For the attached document and the corresponding schema, the error
> message is as follows:
> 
>  
> 
> Node: COMP_NAME, Line: 148, Column: 10, Detail: string length (string) is greater than maxLength facet (26) for 25
>   Document encoding is: null
>  
> This document validates successfully in XmlSpy. What am I missing? The document file was written as UTF-8. 
> 
>  
> 
> The code follows. Thanks so much for your help. 
> 
>  
> 
>     private boolean xmlBeanValidate(File xmlFile, List sdocs) {
>         XmlObject[] schemas = (XmlObject[]) sdocs.toArray(new XmlObject[0]);
>         SchemaTypeLoader sLoader;
>         Collection compErrors = new ArrayList();
>         XmlOptions schemaOptions = new XmlOptions();
>         schemaOptions.setErrorListener(compErrors);
> 
>         try {
>             sLoader = XmlBeans.loadXsd(schemas, schemaOptions);
>         } catch (Exception e) {
>             if (compErrors.isEmpty() || !(e instanceof XmlException)) {
>                 e.printStackTrace();
>             }
>             logError("Schema is invalid");
>             for (Iterator i = compErrors.iterator(); i.hasNext();)
>                 log(i.next().toString());
>             return false;
>         }
> 
>         XmlObject xobj = null;
>         try {
>             Reader sr = new FileReader(xmlFile);
>             XmlOptions opt = new XmlOptions();
>             opt.setCharacterEncoding("UTF-8");
>             opt.setLoadLineNumbers();
>             xobj = sLoader.parse(sr, null, opt);
>         } catch (Exception e) {
>             logError("xml not loadable: " + e);
>             e.printStackTrace();
>             return false;
>         }
> 
>         Collection errors = new ArrayList();
>         if (xobj.schemaType() == XmlObject.type) {
>             logError("xml is NOT valid. Document type not found.");
>             return false;
>         } else if (xobj.validate(new XmlOptions().setErrorListener(errors))) {
>             log("Document validation completed successfully.");
>             return true;
>         } else {
>             for (Iterator it = errors.iterator(); it.hasNext();) {
>                 XmlError xmlError = (XmlError) it.next();
>                 logError("  Node: "
>                         + xmlError.getCursorLocation().getDomNode().getNodeName()
>                         + ", Line: " + xmlError.getLine()
>                         + ", Column: " + xmlError.getColumn()
>                         + ", Detail: " + xmlError.getMessage());
>                 logError("  Document encoding is: "
>                         + xobj.documentProperties().getEncoding());
>             }
>             return false;
>         }
>     }
> 
>  
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@xmlbeans.apache.org
> For additional commands, e-mail: user-help@xmlbeans.apache.org

Notice:  This email message, together with any attachments, may contain information  of  BEA Systems,  Inc.,  its subsidiaries  and  affiliated entities,  that may be confidential,  proprietary,  copyrighted  and/or legally privileged, and is intended solely for the use of the individual or entity named in this message. If you are not the intended recipient, and have received this message in error, please immediately return this by email and then delete it.


RE: trouble validating UTF-8 document with international characters - please help!

Posted by Radu Preotiuc-Pietro <ra...@bea.com>.
That makes sense, I had run the code on Linux, with the default encoding
set to UTF-8.

Thanks for posting the solution back to the list.

Radu
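
The Linux-versus-Windows difference described above can be illustrated with a small sketch. It assumes ISO-8859-1 as a stand-in for a Western Windows code page (the actual default on the poster's machine is not stated in the thread), and the class and method names are hypothetical. It prints the JVM's default charset and shows how the same UTF-8 bytes decode to a different number of chars depending on the charset used:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DefaultCharsetCheck {
    /** Decodes the bytes with the given charset and returns the char count. */
    static int decodedLength(byte[] bytes, Charset charset) {
        return new String(bytes, charset).length();
    }

    public static void main(String[] args) {
        // "México" with the e-acute as U+00E9, encoded the way the file was written.
        byte[] utf8 = "M\u00e9xico".getBytes(StandardCharsets.UTF_8);

        // This is the charset a FileReader created without an explicit
        // charset would pick up on this machine.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Correct decoder: the two bytes of the e-acute become one char.
        System.out.println("decoded as UTF-8:      "
                + decodedLength(utf8, StandardCharsets.UTF_8));

        // Single-byte decoder (stand-in for a Windows code page): each of
        // the two bytes becomes its own char, so the string grows by one.
        System.out.println("decoded as ISO-8859-1: "
                + decodedLength(utf8, StandardCharsets.ISO_8859_1));
    }
}
```

On a JVM whose default charset is UTF-8 (as on the Linux setup mentioned above, and by default on Java 18 and later) the mismatch does not show up, which would explain why the same code validated there.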

On Thu, 2007-07-26 at 11:37 -0400, Elvira_Gurevich@ibi.com wrote:
> Hi Radu,
> 
> With your help, I finally figured out the problem:
> In my code, instead of passing the XML document file to the SchemaTypeLoader to parse, I first created a FileReader, which uses the platform default character encoding, and that is not UTF-8 on Windows.
> 
> Thanks again, 
> Elvira
> 
> -----Original Message-----
> From: Gurevich, Elvira 
> Sent: Wednesday, July 25, 2007 5:15 PM
> To: user@xmlbeans.apache.org
> Subject: RE: trouble validating UTF-8 document withinternationalcharacters-please help!
> 
> Radu,
> 
> I remember you said before that you ran my code and it worked, but it does not work for me for some reason, in the same exact environment where your "validate" works! I really don't understand this! 
> I only sent you one method of my class, you had to complete it into a class... Something must be different! Would you mind sending to me the source for this class that you tested with my code? I am going to go through the code line by line, I don't know how else I can resolve it...
> 
> Thanks a lot,
> Elvira
> 
> -----Original Message-----
> From: Radu Preotiuc-Pietro [mailto:radup@bea.com] 
> Sent: Wednesday, July 25, 2007 4:43 PM
> To: user@xmlbeans.apache.org
> Subject: RE: trouble validating UTF-8 document withinternationalcharacters-please help!
> 
> I am not sure what 1.x is. If you want to see the source code of
> InstanceValidator for 2.3, here is a link to ViewSVN:
> http://svn.apache.org/viewvc/xmlbeans/tags/2.3.0/src/xmlcomp/org/apache/xmlbeans/impl/tool/InstanceValidator.java?view=markup
> 
> Look at http://xmlbeans.apache.org/sourceAndBinaries/index.html for
> instructions on how to sync to SVN or simply download one of the source
> distributions.
> 
> However, the code you posted earlier also worked for me, so I don't know
> how much you should focus on the differences between the
> InstanceValidator code and your code.
> 
> Radu
> 
> On Wed, 2007-07-25 at 15:56 -0400, Elvira_Gurevich@ibi.com wrote:
> > Radu,
> > Thank you for your reply. Your information helped me isolate the problem further.
> > 
> > I ran xmlbeans's "validate" and it worked. 
> > 
> > Then I ran my validation code with the same classpath, same document and schema, and it still fails.
> >  
> > I tried to find source for org.apache.xmlbeans.impl.tool.InstanceValidator, I was looking in 1.x. Is this the right place for 2.3? The source I found in 1.x\src\xmlcomp\org\apache\xmlbeans\impl\tool does not seem to be the same level code, since if I run "validate" with no parameters, the help message is different, it says:
> > 
> > C:\xmlbeans\xmlbeans-2.3.0\bin>validate
> > Validates the specified instance against the specified schema.
> > Contrast with the svalidate tool, which validates using a stream.
> > Usage: validate [-dl] [-nopvr] [-noupa] [-license] schema.xsd instance.xml
> > Options:
> >     -dl - permit network downloads for imports and includes (default is off)
> >     -noupa - do not enforce the unique particle attribution rule
> >     -nopvr - do not enforce the particle valid (restriction) rule
> >     -partial - allow partial schema type system
> >     -license - prints license information  
> > 
> > Instead of what's in the code:
> > 
> >         if (cl.args().length == 0)
> >         {
> >             System.out.println("Validates a schema defintion and instances within the schema.");
> >             System.out.println("Usage: validate [switches] schema.xsd instance.xml");
> >             System.out.println("Switches:");
> >             System.out.println("    -dl    enable network downloads for imports and includes");
> >             System.out.println("    -nopvr disable particle valid (restriction) rule");
> >             System.out.println("    -noupa diable unique particle attributeion rule");
> >             System.out.println("    -license prints license information");
> >             return;
> >         }
> > 
> > I want to find the source to the InstanceValidator class that works with my document and schema. Please help,
> > 
> > Elvira
> > 
> > -----Original Message-----
> > From: Radu Preotiuc-Pietro [mailto:radup@bea.com] 
> > Sent: Tuesday, July 24, 2007 4:42 PM
> > To: user@xmlbeans.apache.org
> > Subject: RE: trouble validating UTF-8 document withinternationalcharacters- please help!
> > 
> > Elvira,
> > 
> > You are right that two bytes CAN mean one single char, but my (and probably Vinh's) point is that it can ALSO mean two chars (depending on what the bytes are really).
> > 
> > 1. I am using XmlBeans 2.3
> > 2. Not sure what role Xerces plays, but it probably shouldn't matter
> > 3. I use "validate test.xml SAP_schema.xsd", having added XMLBEANS_HOME/bin to my PATH
> > 
> > One thing you should try, if you haven't already, is to download the files from your own post and try those; it's not impossible that the files were changed in the process of going through the mail system.
> > 
> > Radu
> > 
> > > -----Original Message-----
> > > From: Elvira_Gurevich@ibi.com [mailto:Elvira_Gurevich@ibi.com] 
> > > Sent: Tuesday, July 24, 2007 7:32 AM
> > > To: user@xmlbeans.apache.org
> > > Subject: RE: trouble validating UTF-8 document 
> > > withinternationalcharacters- please help!
> > > 
> > > Radu and Vinh, Thank you so much for putting your time into this!
> > > 
> > > Yes, I understand that the problem manifests because the "e" 
> > > in Mexico is a double-byte char. But Java is supposed to 
> > > count the characters, not the bytes, and UTF-8 supports 
> > > international characters.
> > > 
> > > Radu, since you can validate the document, it means that 
> > > something else is different. 
> > > 1. What version of xmlbeans are you using? 
> > > 2. Should I be concerned about the xerces version? 
> > > 3. Could you please send me the exact code that you use to 
> > > validate, so that I can try to isolate my problem further? 
> > > 4. How do I run the 'validate' utility of xmlbeans?
> > > 
> > > I've been trying to solve this problem for a while already, 
> > > before I posted to the forum, now I am becoming really desperate.
> > > 
> > > Thank you guys so much for helping!
> > > Elvira
> > >  
> > > 
> > > -----Original Message-----
> > > From: Radu Preotiuc-Pietro [mailto:radup@bea.com]
> > > Sent: Monday, July 23, 2007 6:54 PM
> > > To: user@xmlbeans.apache.org
> > > Subject: RE: trouble validating UTF-8 document 
> > > withinternationalcharacters- please help!
> > > 
> > > Elvira,
> > > 
> > > I had run the Schema and the document through the 'validate' 
> > > utility that ships with XmlBeans initially. Now that you 
> > > mentioned it, I have also tried your code, same result, 
> > > document validates.
> > > 
> > > Radu
> > > 
> > > On Mon, 2007-07-23 at 14:57 -0700, Vinh Nguyen (vinguye2) wrote:
> > > > My guess is that the "e" in Mexico is a double-byte char. So your
> > > > XML document should actually be UTF-16, not UTF-8. In your db table,
> > > > perhaps you are using 16-bit chars, so your string would correctly
> > > > appear to have 25 chars. But in byte representation, it's actually
> > > > 26 bytes = 26 chars.
> > > >  
> > > > 
> > > > -----Original Message-----
> > > > From: Elvira_Gurevich@ibi.com [mailto:Elvira_Gurevich@ibi.com]
> > > > Sent: Monday, July 23, 2007 2:45 PM
> > > > To: user@xmlbeans.apache.org
> > > > Subject: RE: trouble validating UTF-8 document with 
> > > internationalcharacters- please help!
> > > > 
> > > > Radu,
> > > > 
> > > > You mean you tried the attached document with the attached schema,
> > > > and the same code as in the original question, and it worked with no
> > > > errors? If you open the attachment, can you go to line 148 and see the
> > > > same line? Could you send your test case back to me so that I can try
> > > > to run it? Because when I run my setup, no matter what I do, I cannot
> > > > get around this error. Are you using the latest release of xmlbeans?
> > > > Mine is dated 6/12/2006.
> > > > 
> > > > BTW, this document is composed from table data, and the column width
> > > > for this column is defined as 25. That is how the schema is
> > > > constructed.
> > > > 
> > > > Thanks for your help.
> > > > Elvira
> > > > 
> > > > -----Original Message-----
> > > > From: Radu Preotiuc-Pietro [mailto:radup@bea.com]
> > > > Sent: Monday, July 23, 2007 5:26 PM
> > > > To: user@xmlbeans.apache.org
> > > > Subject: RE: trouble validating UTF-8 document with 
> > > internationalcharacters- please help!
> > > > 
> > > > Sorry, my mistake, I did not notice the attachments.
> > > > 
> > > > However, it is not clear to me whether there are 25 or 26 characters
> > > > in that string. I am rusty on UTF-8 encoding rules and Unicode, but I
> > > > am sure that the 'é' character can be represented as either 'eacute'
> > > > (Unicode 00E9) or a composite character (I would imagine there are
> > > > many ways to represent it in this way). What XMLSchema says is 'count
> > > > Unicode codepoints' as far as I can tell.
> > > > 
> > > > So, while it is not impossible that there is a bug, I think the far
> > > > more likely possibility is that your document does not contain the
> > > > characters you think it contains. I also doubt that the attachment is
> > > > the same document that gives the error, because I have tried it and it
> > > > works for me. So you would have to do some additional investigation on
> > > > this, ideally get the exact bytes from that document.
> > > > 
> > > > Radu
> > > > 

RE: trouble validating UTF-8 document with international characters - please help!

Posted by El...@ibi.com.
Hi Radu,

With your help, I finally figured out the problem:
In my code, instead of passing the XML document file to the SchemaTypeLoader to parse, I first created a FileReader, which uses the platform default character encoding, and that is not UTF-8 on Windows.

Thanks again, 
Elvira
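
A hedged sketch of one way to apply the fix described above while still handing a Reader to the parser: open the file with an explicit UTF-8 decoder instead of `new FileReader(xmlFile)`. Only JDK classes are used here; the class and helper names are invented for illustration, and the temp file stands in for the real document. In the posted method, the other option is to pass the File itself to the loader and leave encoding detection to the parser, as the message says.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

final class Utf8ReaderFix {
    /** Opens the file with an explicit UTF-8 decoder, not the platform default. */
    static Reader openUtf8(File file) throws IOException {
        return new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8));
    }

    /** Reads everything the reader produces into a String. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    /** Writes the sample element as UTF-8 and checks it reads back unchanged. */
    static boolean roundTrips() {
        String xml = "<COMP_NAME>IDES M\u00e9xico, S.A. de C.V.</COMP_NAME>";
        try {
            File tmp = File.createTempFile("comp_name", ".xml");
            tmp.deleteOnExit();
            Files.write(tmp.toPath(), xml.getBytes(StandardCharsets.UTF_8));
            try (Reader reader = openUtf8(tmp)) {
                return readAll(reader).equals(xml);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("round trip intact: " + roundTrips());
    }
}
```

With the explicit charset the result no longer depends on the operating system's default, so the same code should behave the same way on Windows and Linux.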

-----Original Message-----
From: Gurevich, Elvira 
Sent: Wednesday, July 25, 2007 5:15 PM
To: user@xmlbeans.apache.org
Subject: RE: trouble validating UTF-8 document withinternationalcharacters-please help!

Radu,

I remember you said before that you ran my code and it worked, but it does not work for me for some reason, in the same exact environment where your "validate" works! I really don't understand this! 
I only sent you one method of my class, you had to complete it into a class... Something must be different! Would you mind sending to me the source for this class that you tested with my code? I am going to go through the code line by line, I don't know how else I can resolve it...

Thanks a lot,
Elvira

-----Original Message-----
From: Radu Preotiuc-Pietro [mailto:radup@bea.com] 
Sent: Wednesday, July 25, 2007 4:43 PM
To: user@xmlbeans.apache.org
Subject: RE: trouble validating UTF-8 document withinternationalcharacters-please help!

I am not sure what 1.x is. If you want to see the source code of
InstanceValidator for 2.3, here is a link to ViewSVN:
http://svn.apache.org/viewvc/xmlbeans/tags/2.3.0/src/xmlcomp/org/apache/xmlbeans/impl/tool/InstanceValidator.java?view=markup

Look at http://xmlbeans.apache.org/sourceAndBinaries/index.html for
instructions on how to sync to SVN or simply download one of the source
distributions.

However, the code you posted earlier also worked for me, so I don't know
how much you should focus on the differences between the
InstanceValidator code and your code.

Radu

On Wed, 2007-07-25 at 15:56 -0400, Elvira_Gurevich@ibi.com wrote:
> Radu,
> Thank you for your reply. Your information helped me isolate the problem further.
> 
> I ran xmlbeans's "validate" and it worked. 
> 
> Then I ran my validation code with the same classpath, same document and schema, and it still fails.
>  
> I tried to find source for org.apache.xmlbeans.impl.tool.InstanceValidator, I was looking in 1.x. Is this the right place for 2.3? The source I found in 1.x\src\xmlcomp\org\apache\xmlbeans\impl\tool does not seem to be the same level code, since if I run "validate" with no parameters, the help message is different, it says:
> 
> C:\xmlbeans\xmlbeans-2.3.0\bin>validate
> Validates the specified instance against the specified schema.
> Contrast with the svalidate tool, which validates using a stream.
> Usage: validate [-dl] [-nopvr] [-noupa] [-license] schema.xsd instance.xml
> Options:
>     -dl - permit network downloads for imports and includes (default is off)
>     -noupa - do not enforce the unique particle attribution rule
>     -nopvr - do not enforce the particle valid (restriction) rule
>     -partial - allow partial schema type system
>     -license - prints license information  
> 
> Instead of what's in the code:
> 
>         if (cl.args().length == 0)
>         {
>             System.out.println("Validates a schema defintion and instances within the schema.");
>             System.out.println("Usage: validate [switches] schema.xsd instance.xml");
>             System.out.println("Switches:");
>             System.out.println("    -dl    enable network downloads for imports and includes");
>             System.out.println("    -nopvr disable particle valid (restriction) rule");
>             System.out.println("    -noupa diable unique particle attributeion rule");
>             System.out.println("    -license prints license information");
>             return;
>         }
> 
> I want to find the source to the InstanceValidator class that works with my document and schema. Please help,
> 
> Elvira
> 
> -----Original Message-----
> From: Radu Preotiuc-Pietro [mailto:radup@bea.com] 
> Sent: Tuesday, July 24, 2007 4:42 PM
> To: user@xmlbeans.apache.org
> Subject: RE: trouble validating UTF-8 document withinternationalcharacters- please help!
> 
> Elvira,
> 
> You are right that two bytes CAN mean one single char, but my (and probably Vinh's) point is that it can ALSO mean two chars (depending on what the bytes are really).
> 
> 1. I am using XmlBeans 2.3
> 2. Not sure what role does Xerces play, but probably shouldn't matter
> 3. I use "validate test.xml SAP_schema.xsd", having added XMLBEANS_HOME/bin to my PATH
> 
> One thing you should try, not sure if you've already tried it, is download the files from your own post and try those, it's not impossible that in the process of going through the mail system the files have been changed.
> 
> Radu
> 
> > -----Original Message-----
> > From: Elvira_Gurevich@ibi.com [mailto:Elvira_Gurevich@ibi.com] 
> > Sent: Tuesday, July 24, 2007 7:32 AM
> > To: user@xmlbeans.apache.org
> > Subject: RE: trouble validating UTF-8 document 
> > withinternationalcharacters- please help!
> > 
> > Radu and Vinh, Thank you so much for putting your time into this!
> > 
> > Yes, I understand that the problem manifests because the "e" 
> > in Mexico is a double-byte char. But Java is supposed to 
> > count the characters, not the bytes, and UTF-8 supports 
> > international characters.
> > 
> > Radu, since you can validate the document, it means that 
> > something else is different. 
> > 1. What version of xmlbeans are you using? 
> > 2. Should I be concerned about the xerces version? 
> > 3. Could you please send me the exact code that you use to 
> > validate, so that I can try to isolate my problem further? 
> > 4. How do I run the 'validate' utility of xmlbeans?
> > 
> > I've been trying to solve this problem for a while already, 
> > before I posted to the forum, now I am becoming really desperate.
> > 
> > Thank you guys so much for helping!
> > Elvira
> >  
> > 
> > -----Original Message-----
> > From: Radu Preotiuc-Pietro [mailto:radup@bea.com]
> > Sent: Monday, July 23, 2007 6:54 PM
> > To: user@xmlbeans.apache.org
> > Subject: RE: trouble validating UTF-8 document 
> > withinternationalcharacters- please help!
> > 
> > Elvira,
> > 
> > I had run the Schema and the document through the 'validate' 
> > utility that ships with XmlBeans initially. Now that you 
> > mentioned it, I have also tried your code, same result, 
> > document validates.
> > 
> > Radu
> > 
> > On Mon, 2007-07-23 at 14:57 -0700, Vinh Nguyen (vinguye2) wrote:
> > > My guess is that the "e" in Mexico is a double-byte char.  So your XML 
> > > document should actually be UTF-16, not UTF-8.  In your db table, 
> > > perhaps you are using 16-bit chars, so your string would correctly 
> > > appear to have 25 chars.  But in byte representation, it's actually 
> > > 26 bytes = 26 chars.
> > >  
> > > 
> > > -----Original Message-----
> > > From: Elvira_Gurevich@ibi.com [mailto:Elvira_Gurevich@ibi.com]
> > > Sent: Monday, July 23, 2007 2:45 PM
> > > To: user@xmlbeans.apache.org
> > > Subject: RE: trouble validating UTF-8 document with 
> > internationalcharacters- please help!
> > > 
> > > Radu,
> > > 
> > > You mean you tried the attached document with the attached schema, and 
> > > the same code as in the original question, and it worked with no errors? 
> > > If you open the attachment, can you go to line 148 and see the same line? 
> > > Could you send your test case back to me so I can try to run it? Because 
> > > when I run my setup, no matter what I do, I cannot get around this error. 
> > > Are you using the latest release of xmlbeans? Mine is dated 6/12/2006.
> > > 
> > > BTW, this document is composed from table data, and the column width for 
> > > this column is defined as 25. That is how the schema is constructed. 
> > > 
> > > Thanks for your help.
> > > Elvira
> > > 
> > > -----Original Message-----
> > > From: Radu Preotiuc-Pietro [mailto:radup@bea.com]
> > > Sent: Monday, July 23, 2007 5:26 PM
> > > To: user@xmlbeans.apache.org
> > > Subject: RE: trouble validating UTF-8 document with 
> > internationalcharacters- please help!
> > > 
> > > Sorry, my mistake, I did not notice the attachments.
> > > 
> > > However, it is not clear to me whether there are 25 or 26 characters in 
> > > that string. I am rusty on UTF-8 encoding rules and Unicode, but I am 
> > > sure that the 'é' character can be represented as either 'eacute' 
> > > (Unicode 00E9) or a composite character (I would imagine there are many 
> > > ways to represent it in this way). What XMLSchema says is 'count Unicode 
> > > codepoints' as far as I can tell.
> > > 
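The two representations of 'é' mentioned above can be compared with java.text.Normalizer. This is a minimal sketch; the class name ComposedVersusDecomposed and its helpers are invented for illustration, and it assumes the only alternative in play is the decomposed form, a plain 'e' followed by the combining acute accent U+0301.

```java
import java.text.Normalizer;

final class ComposedVersusDecomposed {
    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Canonical composition (NFC): 'e' + U+0301 becomes the single code point U+00E9.
    static String toComposed(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String composed = "IDES M\u00e9xico, S.A. de C.V.";
        // Canonical decomposition (NFD): U+00E9 becomes 'e' + U+0301.
        String decomposed = Normalizer.normalize(composed, Normalizer.Form.NFD);

        System.out.println("composed code points:   " + codePoints(composed));
        System.out.println("decomposed code points: " + codePoints(decomposed));
        System.out.println("recomposed equals original: "
                + toComposed(decomposed).equals(composed));
    }
}
```

The composed form has 25 code points and the decomposed form has 26, so a document that looks identical on screen can still exceed a maxLength of 25 if it stores the accent as a separate combining character.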
> > > So, while it is not impossible that there is a bug, I think the far more 
> > > likely possibility is that your document does not contain the characters 
> > > you think it contains. I also doubt that the attachment is the same 
> > > document that gives the error, because I have tried it and it works for 
> > > me. So you would have to do some additional investigation on this, 
> > > ideally get the exact bytes from that document.
> > > 
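Getting the exact bytes, as suggested above, takes only a small hex dump. The sketch below is one way to do it; the class name HexDump and the method hex are hypothetical, and the two sample strings are the composed and decomposed forms of 'é', not data taken from the actual document.

```java
import java.nio.charset.StandardCharsets;

final class HexDump {
    // Render bytes as lowercase two-digit hex values separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Precomposed U+00E9 encoded as UTF-8.
        System.out.println(hex("\u00e9".getBytes(StandardCharsets.UTF_8)));
        // 'e' followed by combining acute accent U+0301, encoded as UTF-8.
        System.out.println(hex("e\u0301".getBytes(StandardCharsets.UTF_8)));
    }
}
```

In a UTF-8 file the precomposed 'é' shows up as c3 a9, while the decomposed form shows up as 65 cc 81. Running the same helper over the bytes of the COMP_NAME line would show which of the two the document really contains.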
> > > Radu
> > > 
> > > On Mon, 2007-07-23 at 16:48 -0400, Elvira_Gurevich@ibi.com wrote:
> > > > Sorry, my e-mail had files attached to it, and the xml file was the document in question. 
> > > > Anyway, the offending line is:
> > > > 
> > > >          <COMP_NAME>IDES México, S.A. de C.V.</COMP_NAME>
> > > > 
> > > > As you can see, there are 25 characters in this element, but xmlbeans thinks there are 26.
> > > > 
> > > > Thanks,
> > > > Elvira
> > > > 
> > > > -----Original Message-----
> > > > From: Radu Preotiuc-Pietro [mailto:radup@bea.com]
> > > > Sent: Monday, July 23, 2007 4:33 PM
> > > > To: user@xmlbeans.apache.org
> > > > Subject: Re: trouble validating UTF-8 document with 
> > internationalcharacters - please help!
> > > > 
> > > > It would be more interesting to see the document in question at the 
> > > > line and column referenced in the error message: is that string 
> > > > longer than the declared maxLength facet?
> > > > 
> > > > (I wouldn't take XmlSpy as reference, since it is known to be
> > > > unreliable)
> > > > 
> > > > Radu
> > > > 
> > > > On Mon, 2007-07-23 at 15:57 -0400, Elvira_Gurevich@ibi.com wrote:
> > > > > Hello,
> > > > > 
> > > > >  
> > > > > 
> > > > > I am using XmlBeans to validate a document against its schema.
> > > > > 
> > > > > It works fine, except when international characters are used in 
> > > > > the document.
> > > > > 
> > > > > > For the attached document and the corresponding schema, the error 
> > > > > > message is as follows:
> > > > > > 
> > > > > >  
> > > > > > 
> > > > > > Node: COMP_NAME, Line: 148, Column: 10, Detail: string length (string) is greater than maxLength facet (26) for 25
> > > > > >   Document encoding is: null
> > > > > >  
> > > > > > This document validates in XmlSpy. What am I missing? The document file was written as UTF-8. 
> > > > > 
> > > > >  
> > > > > 
> > > > > The code follows. Thanks so much for your help. 
> > > > > 
> > > > >  
> > > > > 
> > > > > >       private boolean xmlBeanValidate(File xmlFile, List sdocs) {
> > > > > >             XmlObject[] schemas = (XmlObject[]) sdocs.toArray(new XmlObject[0]);
> > > > > >             SchemaTypeLoader sLoader;
> > > > > >             Collection compErrors = new ArrayList();
> > > > > >             XmlOptions schemaOptions = new XmlOptions();
> > > > > >             schemaOptions.setErrorListener(compErrors);
> > > > > > 
> > > > > >             try {
> > > > > >                   sLoader = XmlBeans.loadXsd(schemas, schemaOptions);
> > > > > >             } catch (Exception e) {
> > > > > >                   if (compErrors.isEmpty() || !(e instanceof XmlException)) {
> > > > > >                         e.printStackTrace();
> > > > > >                   }
> > > > > >                   logError("Schema is invalid");
> > > > > >                   for (Iterator i = compErrors.iterator(); i.hasNext();)
> > > > > >                         log(i.next().toString());
> > > > > >                   return false;
> > > > > >             }
> > > > > > 
> > > > > >             XmlObject xobj = null;
> > > > > >             try {
> > > > > >                   Reader sr = new FileReader(xmlFile);
> > > > > >                   XmlOptions opt = new XmlOptions();
> > > > > >                   opt.setCharacterEncoding("UTF-8");
> > > > > >                   opt.setLoadLineNumbers();
> > > > > >                   xobj = sLoader.parse(sr, null, opt);
> > > > > >             } catch (Exception e) {
> > > > > >                   logError("xml not loadable: " + e);
> > > > > >                   e.printStackTrace();
> > > > > >                   return false;
> > > > > >             }
> > > > > > 
> > > > > >             Collection errors = new ArrayList();
> > > > > >             if (xobj.schemaType() == XmlObject.type) {
> > > > > >                   logError("xml is NOT valid. Document type not found.");
> > > > > >                   return false;
> > > > > >             } else if (xobj.validate(new XmlOptions().setErrorListener(errors))) {
> > > > > >                   log("Document validation completed successfully.");
> > > > > >                   return true;
> > > > > >             } else {
> > > > > >                   for (Iterator it = errors.iterator(); it.hasNext();) {
> > > > > >                         XmlError xmlError = (XmlError) it.next();
> > > > > >                         logError("  Node: "
> > > > > >                               + xmlError.getCursorLocation().getDomNode().getNodeName()
> > > > > >                               + ", Line: " + xmlError.getLine()
> > > > > >                               + ", Column: " + xmlError.getColumn()
> > > > > >                               + ", Detail: " + xmlError.getMessage());
> > > > > >                         logError("  Document encoding is: "
> > > > > >                               + xobj.documentProperties().getEncoding());
> > > > > >                   }
> > > > > >                   return false;
> > > > > >             }
> > > > > >       }
> > > > > 
> > > > >  
> > > > > 
> > > > > 
> > > > > 
> > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: user-unsubscribe@xmlbeans.apache.org
> > > > > > For additional commands, e-mail: user-help@xmlbeans.apache.org
> > > > 
> > > > Notice:  This email message, together with any attachments, may contain information  of  BEA Systems,  Inc.,  its subsidiaries  and  affiliated entities,  that may be confidential,  proprietary,  copyrighted  and/or legally privileged, and is intended solely for the use of the individual or entity named in this message. If you are not the intended recipient, and have received this message in error, please immediately return this by email and then delete it.

RE: trouble validating UTF-8 document withinternationalcharacters-please help!

Posted by El...@ibi.com.
Radu,

I remember you said before that you ran my code and it worked, but it does not work for me for some reason, in the same exact environment where your "validate" works! I really don't understand this! 
I only sent you one method of my class, you had to complete it into a class... Something must be different! Would you mind sending to me the source for this class that you tested with my code? I am going to go through the code line by line, I don't know how else I can resolve it...

Thanks a lot,
Elvira

-----Original Message-----
From: Radu Preotiuc-Pietro [mailto:radup@bea.com] 
Sent: Wednesday, July 25, 2007 4:43 PM
To: user@xmlbeans.apache.org
Subject: RE: trouble validating UTF-8 document withinternationalcharacters-please help!

I am not sure what 1.x is. If you want to see the source code of
InstanceValidator for 2.3, here is a link to ViewSVN:
http://svn.apache.org/viewvc/xmlbeans/tags/2.3.0/src/xmlcomp/org/apache/xmlbeans/impl/tool/InstanceValidator.java?view=markup

Look at http://xmlbeans.apache.org/sourceAndBinaries/index.html for
instructions on how to sync to SVN or simply download one of the source
distributions.

However, the code you posted earlier also worked for me, so I don't know
how much you should focus on the differences between the
InstanceValidator code and your code.

Radu

On Wed, 2007-07-25 at 15:56 -0400, Elvira_Gurevich@ibi.com wrote:
> Radu,
> Thank you for your reply. Your information helped me isolate the problem further.
> 
> I ran xmlbeans's "validate" and it worked. 
> 
> Then I ran my validation code with the same classpath, same document and schema, and it still fails.
>  
> I tried to find source for org.apache.xmlbeans.impl.tool.InstanceValidator, I was looking in 1.x. Is this the right place for 2.3? The source I found in 1.x\src\xmlcomp\org\apache\xmlbeans\impl\tool does not seem to be the same level code, since if I run "validate" with no parameters, the help message is different, it says:
> 
> C:\xmlbeans\xmlbeans-2.3.0\bin>validate
> Validates the specified instance against the specified schema.
> Contrast with the svalidate tool, which validates using a stream.
> Usage: validate [-dl] [-nopvr] [-noupa] [-license] schema.xsd instance.xml
> Options:
>     -dl - permit network downloads for imports and includes (default is off)
>     -noupa - do not enforce the unique particle attribution rule
>     -nopvr - do not enforce the particle valid (restriction) rule
>     -partial - allow partial schema type system
>     -license - prints license information  
> 
> Instead of what's in the code:
> 
>         if (cl.args().length == 0)
>         {
>             System.out.println("Validates a schema defintion and instances within the schema.");
>             System.out.println("Usage: validate [switches] schema.xsd instance.xml");
>             System.out.println("Switches:");
>             System.out.println("    -dl    enable network downloads for imports and includes");
>             System.out.println("    -nopvr disable particle valid (restriction) rule");
>             System.out.println("    -noupa diable unique particle attributeion rule");
>             System.out.println("    -license prints license information");
>             return;
>         }
> 
> I want to find the source to the InstanceValidator class that works with my document and schema. Please help,
> 
> Elvira
> 
> -----Original Message-----
> From: Radu Preotiuc-Pietro [mailto:radup@bea.com] 
> Sent: Tuesday, July 24, 2007 4:42 PM
> To: user@xmlbeans.apache.org
> Subject: RE: trouble validating UTF-8 document withinternationalcharacters- please help!
> 
> Elvira,
> 
> You are right that two bytes CAN mean one single char, but my (and probably Vinh's) point is that it can ALSO mean two chars (depending on what the bytes are really).
> 
> 1. I am using XmlBeans 2.3
> 2. Not sure what role does Xerces play, but probably shouldn't matter
> 3. I use "validate test.xml SAP_schema.xsd", having added XMLBEANS_HOME/bin to my PATH
> 
> One thing you should try, not sure if you've already tried it, is download the files from your own post and try those, it's not impossible that in the process of going through the mail system the files have been changed.
> 
> Radu
> 
> > -----Original Message-----
> > From: Elvira_Gurevich@ibi.com [mailto:Elvira_Gurevich@ibi.com] 
> > Sent: Tuesday, July 24, 2007 7:32 AM
> > To: user@xmlbeans.apache.org
> > Subject: RE: trouble validating UTF-8 document 
> > withinternationalcharacters- please help!
> > 
> > Radu and Vinh, Thank you so much for putting your time into this!
> > 
> > Yes, I understand that the problem manifests because the "e" 
> > in Mexico is a double-byte char. But java is supposed to 
> > count the characters, not the bytes, and UTF-8 supports 
> > international characters.
> > 
> > Radu, since you can validate the document, it means that 
> > something else is different. 
> > 1. What version of xmlbeans are you using? 
> > 2. Should I be concerned about the xerces version? 
> > 3. Could you please send me the exact code that you use to 
> > validate, so that 4. I can try to isolate my problem further? 
> > 5. How do I run the 'validate' utility of xmlbeans?
> > 
> > I've been trying to solve this problem for a while already, 
> > before I posted to the forum, now I am becoming really desperate.
> > 
> > Thank you guys so much for helping!
> > Elvira
> >  
> > 
> > -----Original Message-----
> > From: Radu Preotiuc-Pietro [mailto:radup@bea.com]
> > Sent: Monday, July 23, 2007 6:54 PM
> > To: user@xmlbeans.apache.org
> > Subject: RE: trouble validating UTF-8 document 
> > withinternationalcharacters- please help!
> > 
> > Elvira,
> > 
> > I had ran the Schema and the document through the 'validate' 
> > utility that ships with XmlBeans initially. Now that you 
> > mentioned it, I have also tried your code, same result, 
> > document validates.
> > 
> > Radu
> > 
> > On Mon, 2007-07-23 at 14:57 -0700, Vinh Nguyen (vinguye2) wrote:
> > > My guess is that the "e" in Mexico is a double-byte char.  
> > So your XML document should actually be UTF-16, not UTF-8.  
> > In your db table, perhaps you are using 16-bit chars, so your 
> > string would correctly appear to have 25 chars.  But in byte 
> > representation, it's actually 26 bytes = 26 chars.
> > >  
> > > 
> > > -----Original Message-----
> > > From: Elvira_Gurevich@ibi.com [mailto:Elvira_Gurevich@ibi.com]
> > > Sent: Monday, July 23, 2007 2:45 PM
> > > To: user@xmlbeans.apache.org
> > > Subject: RE: trouble validating UTF-8 document with 
> > internationalcharacters- please help!
> > > 
> > > Radu,
> > > 
> > > You mean you tried the attached document with the attached 
> > schema, and the same code as in the original question, and it 
> > worked with no errors? If you open the attachment, can you go 
> > to line 148 and see the same line? Could you send your test 
> > case back to me and I'll try to run it? Because when I run my 
> > setup, no matter what I do, I cannot get around this error. 
> > Are you using the latest release of xmlbeans? Mine is dated 6/12/2006.
> > > 
> > > BTW, this document is composed from data from table data, 
> > and the column width for this column is defined as 25. That 
> > is how the schema is constructed. 
> > > 
> > > Thanks for your help.
> > > Elvira
> > > 
> > > -----Original Message-----
> > > From: Radu Preotiuc-Pietro [mailto:radup@bea.com]
> > > Sent: Monday, July 23, 2007 5:26 PM
> > > To: user@xmlbeans.apache.org
> > > Subject: RE: trouble validating UTF-8 document with 
> > internationalcharacters- please help!
> > > 
> > > Sorry, my mistake, I did not notice the attachments.
> > > 
> > > However, it is not clear to me whether there are 25 or 26 
> > characters in that string. I am rusty on UTF-8 encoding rules 
> > and Unicode, but I am sure that the 'é' character can be 
> > represented as either 'eacute' (Unicode 00E9) or a composite 
> > character (I would imagine there are many ways to represent 
> > it in this way). What XMLSchema says is 'count Unicode 
> > codepoints' as far as I can tell.
> > > 
> > > So, while it is not impossible that there is a bug, I think 
> > the far more likely possibility is that your document does 
> > not contain the characters you think it contains. I also 
> > doubt that the attachment is the same document that gives the 
> > error, because I have tried it and it works for me. So you 
> > would have to do some additional investigation on this, 
> > ideally get the exact bytes from that document.
> > > 
> > > Radu
> > > 
> > > On Mon, 2007-07-23 at 16:48 -0400, Elvira_Gurevich@ibi.com wrote:
> > > > Sorry, my e-mail had files attached to it, and the xml 
> > file was the document in question. 
> > > > Anyway, the offending line is:
> > > > 
> > > >          <COMP_NAME>IDES México, S.A. de C.V.</COMP_NAME>
> > > > 
> > > > As you can see, there are 25 characters in this element, 
> > but xmlbeans thinks there are 26.
> > > > 
> > > > Thanks,
> > > > Elvira
> > > > 
> > > > -----Original Message-----
> > > > From: Radu Preotiuc-Pietro [mailto:radup@bea.com]
> > > > Sent: Monday, July 23, 2007 4:33 PM
> > > > To: user@xmlbeans.apache.org
> > > > Subject: Re: trouble validating UTF-8 document with 
> > internationalcharacters - please help!
> > > > 
> > > > It would be more interesting to see the document in 
> > question at the 
> > > > line and column referenced in the error message: is that string 
> > > > longer than the declared maxLength facet?
> > > > 
> > > > (I wouldn't take XmlSpy as reference, since it is known to be
> > > > unreliable)
> > > > 
> > > > Radu
> > > > 
> > > > On Mon, 2007-07-23 at 15:57 -0400, Elvira_Gurevich@ibi.com wrote:
> > > > > Hello,
> > > > > 
> > > > >  
> > > > > 
> > > > > I am using XmlBeans to validate a document against its schema.
> > > > > 
> > > > > It works fine, except when international characters are used in 
> > > > > the document.
> > > > > 
> > > > > For the attached document and the corresponding schema, 
> > the error 
> > > > > message is as following:
> > > > > 
> > > > >  
> > > > > 
> > > > > Node: COMP_NAME, Line: 148, Column: 10, Detail: string 
> > length (string) is greater than maxLength facet (26) for 25
> > > > >   Document encoding is: null
> > > > >  
> > > > > This document is validated with XmlSpy. What am I 
> > missing? The document file was written as UTF-8. 
> > > > > 
> > > > >  
> > > > > 
> > > > > The code follows. Thanks so much for your help. 
> > > > > 
> > > > >  
> > > > > 
> > > > >       private boolean xmlBeanValidate(File xmlFile, 
> > List sdocs) {
> > > > > 
> > > > >             XmlObject[] schemas = (XmlObject[]) 
> > sdocs.toArray(new 
> > > > > XmlObject[0]);
> > > > > 
> > > > >             SchemaTypeLoader sLoader;
> > > > > 
> > > > >             Collection compErrors = new ArrayList();
> > > > > 
> > > > >             XmlOptions schemaOptions = new XmlOptions();
> > > > > 
> > > > >             schemaOptions.setErrorListener(compErrors);
> > > > > 
> > > > >  
> > > > > 
> > > > >            try {
> > > > > 
> > > > >                   sLoader = XmlBeans.loadXsd(schemas, 
> > > > > schemaOptions);
> > > > > 
> > > > >             } catch (Exception e) {
> > > > > 
> > > > >                  if(compErrors.isEmpty() || !(e instanceof
> > > > > XmlException)) {
> > > > > 
> > > > >                         e.printStackTrace();
> > > > > 
> > > > >                   }
> > > > > 
> > > > >                   logError("Schema is invalid");
> > > > > 
> > > > >                  for (Iterator i = compErrors.iterator();
> > > > > i.hasNext();)
> > > > > 
> > > > >                         log(i.next().toString());
> > > > > 
> > > > >                  return false;
> > > > > 
> > > > >             }
> > > > > 
> > > > >  
> > > > > 
> > > > >             XmlObject xobj = null;
> > > > > 
> > > > >            try {
> > > > > 
> > > > >                   Reader sr = newFileReader(xmlFile);
> > > > > 
> > > > >                   XmlOptions opt = new XmlOptions();
> > > > > 
> > > > >                   opt.setCharacterEncoding("UTF-8");
> > > > > 
> > > > >                   opt.setLoadLineNumbers();
> > > > > 
> > > > >                   xobj = sLoader.parse(sr, null, opt);
> > > > > 
> > > > >             } catch (Exception e) {
> > > > > 
> > > > >                   logError("xml not loadable: " + e);
> > > > > 
> > > > >                   e.printStackTrace();
> > > > > 
> > > > >                  return false;
> > > > > 
> > > > >             }
> > > > > 
> > > > >  
> > > > > 
> > > > >             Collection errors = new ArrayList();
> > > > > 
> > > > >            if(xobj.schemaType() == XmlObject.type) {
> > > > > 
> > > > >                   logError("xml is NOT valid. Document type not 
> > > > > found.");
> > > > > 
> > > > >                  return false;
> > > > > 
> > > > >             } else if (xobj.validate(new 
> > > > > XmlOptions().setErrorListener(errors))){
> > > > > 
> > > > >                   log("Document validation completed 
> > > > > successfully.");
> > > > > 
> > > > >                  return true;
> > > > > 
> > > > >             }else {
> > > > > 
> > > > >                  for (Iterator it = errors.iterator();
> > > > > it.hasNext();) {
> > > > > 
> > > > >                         XmlError xmlError = (XmlError)it.next();
> > > > > 
> > > > >                     logError("  Node: " 
> > > > > 
> > > > > 
> > > > > +xmlError.getCursorLocation().getDomNode().getNodeName()
> > > > > 
> > > > >                               +", Line: " + xmlError.getLine()
> > > > > 
> > > > >                               +", Column: " + 
> > xmlError.getColumn()
> > > > > 
> > > > >                               +", Detail: " + 
> > > > > xmlError.getMessage());
> > > > > 
> > > > >                     logError("  Document encoding is: " 
> > > > > 
> > > > > 
> > > > > +xobj.documentProperties().getEncoding());
> > > > > 
> > > > >  
> > > > > 
> > > > >                   }
> > > > > 
> > > > >                  return false;
> > > > > 
> > > > >             }
> > > > > 
> > > > >       }
> > > > > 
> > > > >  
> > > > > 
> > > > > 
> > > > > 
> > ------------------------------------------------------------------
> > > > > --
> > > > > - To unsubscribe, e-mail: user-unsubscribe@xmlbeans.apache.org
> > > > > For additional commands, e-mail: user-help@xmlbeans.apache.org
> > > > 
> > > > Notice:  This email message, together with any 
> > attachments, may contain information  of  BEA Systems,  Inc., 
> >  its subsidiaries  and  affiliated entities,  that may be 
> > confidential,  proprietary,  copyrighted  and/or legally 
> > privileged, and is intended solely for the use of the 
> > individual or entity named in this message. If you are not 
> > the intended recipient, and have received this message in 
> > error, please immediately return this by email and then delete it.

RE: trouble validating UTF-8 document with international characters - please help!

Posted by Radu Preotiuc-Pietro <ra...@bea.com>.
I am not sure what 1.x is. If you want to see the source code of
InstanceValidator for 2.3, here is a link to ViewSVN:
http://svn.apache.org/viewvc/xmlbeans/tags/2.3.0/src/xmlcomp/org/apache/xmlbeans/impl/tool/InstanceValidator.java?view=markup

Look at http://xmlbeans.apache.org/sourceAndBinaries/index.html for
instructions on how to sync to SVN or simply download one of the source
distributions.

However, the code you posted earlier also worked for me, so I don't know
how much you should focus on the differences between the
InstanceValidator code and your code.

Radu


RE: trouble validating UTF-8 document with international characters - please help!

Posted by El...@ibi.com.
Radu,
Thank you for your reply. Your information helped me isolate the problem further.

I ran the XmlBeans "validate" tool and it worked.

Then I ran my validation code with the same classpath, same document and schema, and it still fails.
 
I tried to find the source for org.apache.xmlbeans.impl.tool.InstanceValidator and was looking in 1.x. Is this the right place for 2.3? The source I found in 1.x\src\xmlcomp\org\apache\xmlbeans\impl\tool does not seem to be at the same code level: when I run "validate" with no parameters, the help message is different. It says:

C:\xmlbeans\xmlbeans-2.3.0\bin>validate
Validates the specified instance against the specified schema.
Contrast with the svalidate tool, which validates using a stream.
Usage: validate [-dl] [-nopvr] [-noupa] [-license] schema.xsd instance.xml
Options:
    -dl - permit network downloads for imports and includes (default is off)
    -noupa - do not enforce the unique particle attribution rule
    -nopvr - do not enforce the particle valid (restriction) rule
    -partial - allow partial schema type system
    -license - prints license information  

Instead of what's in the code:

        if (cl.args().length == 0)
        {
            System.out.println("Validates a schema defintion and instances within the schema.");
            System.out.println("Usage: validate [switches] schema.xsd instance.xml");
            System.out.println("Switches:");
            System.out.println("    -dl    enable network downloads for imports and includes");
            System.out.println("    -nopvr disable particle valid (restriction) rule");
            System.out.println("    -noupa diable unique particle attributeion rule");
            System.out.println("    -license prints license information");
            return;
        }

I want to find the source of the InstanceValidator class that works with my document and schema. Please help.

Elvira
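
One possible explanation for the 25-versus-26 discrepancy, offered as an assumption and not something confirmed in this thread: the validation code from the original post opens the instance document with FileReader, which decodes bytes using the platform default charset. If that default is a single-byte charset such as ISO-8859-1, the two UTF-8 bytes of the accented letter arrive as two characters, and a Reader hands the parser characters that are already decoded. The sketch below uses only the JDK (no XmlBeans calls); the class and method names are made up for illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of XmlBeans.
class DecodeLengthDemo {

    // Number of Java chars obtained when the bytes are decoded with the given charset.
    static int decodedLength(byte[] bytes, Charset charset) {
        return new String(bytes, charset).length();
    }

    // Opens a file with an explicit UTF-8 decoder instead of the platform default.
    static Reader openUtf8(File file) throws IOException {
        return new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The element content from the thread; U+00E9 is the precomposed e-acute.
        String name = "IDES M\u00E9xico, S.A. de C.V.";
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);

        System.out.println("UTF-8 bytes:          " + utf8.length);
        System.out.println("chars via UTF-8:      " + decodedLength(utf8, StandardCharsets.UTF_8));
        System.out.println("chars via ISO-8859-1: " + decodedLength(utf8, StandardCharsets.ISO_8859_1));
    }
}
```

If the last two numbers differ for the real file, it may be worth handing the parser a File or InputStream (so it can pick up the encoding itself) or a reader built like openUtf8 above, and then re-running the validation.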

-----Original Message-----
From: Radu Preotiuc-Pietro [mailto:radup@bea.com] 
Sent: Tuesday, July 24, 2007 4:42 PM
To: user@xmlbeans.apache.org
Subject: RE: trouble validating UTF-8 document with international characters - please help!

Elvira,

You are right that two bytes CAN mean one single char, but my (and probably Vinh's) point is that they can ALSO mean two chars, depending on what the bytes really are.

1. I am using XmlBeans 2.3
2. Not sure what role Xerces plays, but it probably shouldn't matter
3. I use "validate test.xml SAP_schema.xsd", having added XMLBEANS_HOME/bin to my PATH

One thing you should try, if you haven't already, is to download the files from your own post and try those. It's not impossible that the files were changed on their way through the mail system.

Radu
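
One way to check which characters a string really contains is to dump its code points. A small sketch, assuming nothing about how XmlBeans counts length internally: it compares the precomposed form of the accented letter (U+00E9) with the decomposed form ('e' followed by the combining acute accent U+0301). Both render the same on screen but differ in length. The class and method names are illustrative only.

```java
import java.text.Normalizer;

// Hypothetical demo class; uses only the JDK.
class CodePointCountDemo {

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Space-separated list of the string's code points, e.g. "U+0065 U+0301".
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String composed = "M\u00E9xico";
        // NFD splits the precomposed letter into base letter plus combining mark.
        String decomposed = Normalizer.normalize(composed, Normalizer.Form.NFD);

        System.out.println(codePoints(composed) + ": " + dump(composed));
        System.out.println(codePoints(decomposed) + ": " + dump(decomposed));
    }
}
```

Running the same dump over the text read from the real document would show whether it holds the characters one expects.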

> -----Original Message-----
> From: Elvira_Gurevich@ibi.com [mailto:Elvira_Gurevich@ibi.com] 
> Sent: Tuesday, July 24, 2007 7:32 AM
> To: user@xmlbeans.apache.org
> Subject: RE: trouble validating UTF-8 document 
> withinternationalcharacters- please help!
> 
> Radu and Vinh, Thank you so much for putting your time into this!
> 
> Yes, I understand that the problem manifests because the "e" 
> in Mexico is a double-byte char. But java is supposed to 
> count the characters, not the bytes, and UTF-8 supports 
> international characters.
> 
> Radu, since you can validate the document, it means that 
> something else is different. 
> 1. What version of xmlbeans are you using? 
> 2. Should I be concerned about the xerces version? 
> 3. Could you please send me the exact code that you use to 
> validate, so that I can try to isolate my problem further? 
> 4. How do I run the 'validate' utility of xmlbeans?
> 
> I've been trying to solve this problem for a while already, 
> before I posted to the forum, now I am becoming really desperate.
> 
> Thank you guys so much for helping!
> Elvira
>  
> 
> -----Original Message-----
> From: Radu Preotiuc-Pietro [mailto:radup@bea.com]
> Sent: Monday, July 23, 2007 6:54 PM
> To: user@xmlbeans.apache.org
> Subject: RE: trouble validating UTF-8 document 
> withinternationalcharacters- please help!
> 
> Elvira,
> 
> I had initially run the Schema and the document through the 'validate' 
> utility that ships with XmlBeans. Now that you 
> mentioned it, I have also tried your code, with the same result: 
> the document validates.
> 
> Radu
> 
> On Mon, 2007-07-23 at 14:57 -0700, Vinh Nguyen (vinguye2) wrote:
> > My guess is that the "e" in Mexico is a double-byte char.  
> So your XML document should actually be UTF-16, not UTF-8.  
> In your db table, perhaps you are using 16-bit chars, so your 
> string would correctly appear to have 25 chars.  But in byte 
> representation, it's actually 26 bytes = 26 chars.
> >  
> > 
> > -----Original Message-----
> > From: Elvira_Gurevich@ibi.com [mailto:Elvira_Gurevich@ibi.com]
> > Sent: Monday, July 23, 2007 2:45 PM
> > To: user@xmlbeans.apache.org
> > Subject: RE: trouble validating UTF-8 document with 
> internationalcharacters- please help!
> > 
> > Radu,
> > 
> > You mean you tried the attached document with the attached 
> schema, and the same code as in the original question, and it 
> worked with no errors? If you open the attachment, can you go 
> to line 148 and see the same line? Could you send your test 
> case back to me and I'll try to run it? Because when I run my 
> setup, no matter what I do, I cannot get around this error. 
> Are you using the latest release of xmlbeans? Mine is dated 6/12/2006.
> > 
> > BTW, this document is composed from table data, 
> and the column width for this column is defined as 25. That 
> is how the schema is constructed. 
> > 
> > Thanks for your help.
> > Elvira
> > 
> > -----Original Message-----
> > From: Radu Preotiuc-Pietro [mailto:radup@bea.com]
> > Sent: Monday, July 23, 2007 5:26 PM
> > To: user@xmlbeans.apache.org
> > Subject: RE: trouble validating UTF-8 document with 
> internationalcharacters- please help!
> > 
> > Sorry, my mistake, I did not notice the attachments.
> > 
> > However, it is not clear to me whether there are 25 or 26 
> characters in that string. I am rusty on UTF-8 encoding rules 
> and Unicode, but I am sure that the 'é' character can be 
> represented as either 'eacute' (Unicode 00E9) or a composite 
> character (I would imagine there are many ways to represent 
> it in this way). What XMLSchema says is 'count Unicode 
> codepoints' as far as I can tell.
> > 
> > So, while it is not impossible that there is a bug, I think 
> the far more likely possibility is that your document does 
> not contain the characters you think it contains. I also 
> doubt that the attachment is the same document that gives the 
> error, because I have tried it and it works for me. So you 
> would have to do some additional investigation on this, 
> ideally get the exact bytes from that document.
> > 
> > Radu
> > 
> > On Mon, 2007-07-23 at 16:48 -0400, Elvira_Gurevich@ibi.com wrote:
> > > Sorry, my e-mail had files attached to it, and the xml 
> file was the document in question. 
> > > Anyway, the offending line is:
> > > 
> > >          <COMP_NAME>IDES México, S.A. de C.V.</COMP_NAME>
> > > 
> > > As you can see, there are 25 characters in this element, 
> but xmlbeans thinks there are 26.
> > > 
> > > Thanks,
> > > Elvira
> > > 
> > > -----Original Message-----
> > > From: Radu Preotiuc-Pietro [mailto:radup@bea.com]
> > > Sent: Monday, July 23, 2007 4:33 PM
> > > To: user@xmlbeans.apache.org
> > > Subject: Re: trouble validating UTF-8 document with 
> internationalcharacters - please help!
> > > 
> > > It would be more interesting to see the document in 
> question at the 
> > > line and column referenced in the error message: is that string 
> > > longer than the declared maxLength facet?
> > > 
> > > (I wouldn't take XmlSpy as reference, since it is known to be
> > > unreliable)
> > > 
> > > Radu
> > > 
> > > On Mon, 2007-07-23 at 15:57 -0400, Elvira_Gurevich@ibi.com wrote:
> > > > Hello,
> > > > 
> > > >  
> > > > 
> > > > I am using XmlBeans to validate a document against its schema.
> > > > 
> > > > It works fine, except when international characters are used in 
> > > > the document.
> > > > 
> > > > For the attached document and the corresponding schema, 
> the error 
> > > > message is as following:
> > > > 
> > > >  
> > > > 
> > > > Node: COMP_NAME, Line: 148, Column: 10, Detail: string 
> length (string) is greater than maxLength facet (26) for 25
> > > >   Document encoding is: null
> > > >  
> > > > This document is validated with XmlSpy. What am I 
> missing? The document file was written as UTF-8. 
> > > > 
> > > >  
> > > > 
> > > > The code follows. Thanks so much for your help. 
> > > > 
> > > >  
> > > > 
> > > >       private boolean xmlBeanValidate(File xmlFile, 
> List sdocs) {
> > > > 
> > > >             XmlObject[] schemas = (XmlObject[]) 
> sdocs.toArray(new 
> > > > XmlObject[0]);
> > > > 
> > > >             SchemaTypeLoader sLoader;
> > > > 
> > > >             Collection compErrors = new ArrayList();
> > > > 
> > > >             XmlOptions schemaOptions = new XmlOptions();
> > > > 
> > > >             schemaOptions.setErrorListener(compErrors);
> > > > 
> > > >  
> > > > 
> > > >            try {
> > > > 
> > > >                   sLoader = XmlBeans.loadXsd(schemas, 
> > > > schemaOptions);
> > > > 
> > > >             } catch (Exception e) {
> > > > 
> > > >                  if(compErrors.isEmpty() || !(e instanceof
> > > > XmlException)) {
> > > > 
> > > >                         e.printStackTrace();
> > > > 
> > > >                   }
> > > > 
> > > >                   logError("Schema is invalid");
> > > > 
> > > >                  for (Iterator i = compErrors.iterator();
> > > > i.hasNext();)
> > > > 
> > > >                         log(i.next().toString());
> > > > 
> > > >                  return false;
> > > > 
> > > >             }
> > > > 
> > > >  
> > > > 
> > > >             XmlObject xobj = null;
> > > > 
> > > >            try {
> > > > 
> > > >                   Reader sr = new FileReader(xmlFile);
> > > > 
> > > >                   XmlOptions opt = new XmlOptions();
> > > > 
> > > >                   opt.setCharacterEncoding("UTF-8");
> > > > 
> > > >                   opt.setLoadLineNumbers();
> > > > 
> > > >                   xobj = sLoader.parse(sr, null, opt);
> > > > 
> > > >             } catch (Exception e) {
> > > > 
> > > >                   logError("xml not loadable: " + e);
> > > > 
> > > >                   e.printStackTrace();
> > > > 
> > > >                  return false;
> > > > 
> > > >             }
> > > > 
> > > >  
> > > > 
> > > >             Collection errors = new ArrayList();
> > > > 
> > > >            if(xobj.schemaType() == XmlObject.type) {
> > > > 
> > > >                   logError("xml is NOT valid. Document type not 
> > > > found.");
> > > > 
> > > >                  return false;
> > > > 
> > > >             } else if (xobj.validate(new 
> > > > XmlOptions().setErrorListener(errors))){
> > > > 
> > > >                   log("Document validation completed 
> > > > successfully.");
> > > > 
> > > >                  return true;
> > > > 
> > > >             }else {
> > > > 
> > > >                  for (Iterator it = errors.iterator();
> > > > it.hasNext();) {
> > > > 
> > > >                         XmlError xmlError = (XmlError)it.next();
> > > > 
> > > >                     logError("  Node: " 
> > > > 
> > > > 
> > > > +xmlError.getCursorLocation().getDomNode().getNodeName()
> > > > 
> > > >                               +", Line: " + xmlError.getLine()
> > > > 
> > > >                               +", Column: " + 
> xmlError.getColumn()
> > > > 
> > > >                               +", Detail: " + 
> > > > xmlError.getMessage());
> > > > 
> > > >                     logError("  Document encoding is: " 
> > > > 
> > > > 
> > > > +xobj.documentProperties().getEncoding());
> > > > 
> > > >  
> > > > 
> > > >                   }
> > > > 
> > > >                  return false;
> > > > 
> > > >             }
> > > > 
> > > >       }

Notice:  This email message, together with any attachments, may contain information  of  BEA Systems,  Inc.,  its subsidiaries  and  affiliated entities,  that may be confidential,  proprietary,  copyrighted  and/or legally privileged, and is intended solely for the use of the individual or entity named in this message. If you are not the intended recipient, and have received this message in error, please immediately return this by email and then delete it.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@xmlbeans.apache.org
For additional commands, e-mail: user-help@xmlbeans.apache.org



RE: trouble validating UTF-8 document with international characters - please help!

Posted by Radu Preotiuc-Pietro <ra...@bea.com>.
Elvira,

You are right that two bytes CAN encode a single char, but my (and probably Vinh's) point is that they can ALSO be two chars, depending on what the bytes really are.

1. I am using XmlBeans 2.3.
2. I am not sure what role Xerces plays, but it probably shouldn't matter.
3. I use "validate test.xml SAP_schema.xsd", having added XMLBEANS_HOME/bin to my PATH.

One thing you should try, if you haven't already, is to download the files from your own post and test with those; it is not impossible that the files were changed on their way through the mail system.

Radu
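
To make the point about bytes and chars concrete, here is a minimal Java sketch. It is not taken from the thread: the class and method names are made up for illustration, and ISO-8859-1 is only an assumed example of a non-UTF-8 charset. It encodes the COMP_NAME value as UTF-8 and then decodes the very same bytes with two different charsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of XmlBeans or of the posted code.
class DecodeDemo {
    // The element content from the thread; \u00E9 is the precomposed 'é'.
    static final String NAME = "IDES M\u00E9xico, S.A. de C.V.";

    // Encode NAME as UTF-8, decode those bytes with the given charset,
    // and return how many Java chars the decoded string has.
    static int decodedLength(Charset charset) {
        byte[] utf8 = NAME.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, charset).length();
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 bytes:        " + NAME.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("decoded as UTF-8:   " + decodedLength(StandardCharsets.UTF_8));
        System.out.println("decoded as Latin-1: " + decodedLength(StandardCharsets.ISO_8859_1));
    }
}
```

Decoded as UTF-8 the two bytes of the 'é' become one char (25 in total); decoded as ISO-8859-1 each byte becomes its own char (26 in total).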

> -----Original Message-----
> From: Elvira_Gurevich@ibi.com [mailto:Elvira_Gurevich@ibi.com] 
> Sent: Tuesday, July 24, 2007 7:32 AM
> To: user@xmlbeans.apache.org
> Subject: RE: trouble validating UTF-8 document 
> withinternationalcharacters- please help!
> 
> Radu and Vinh, Thank you so much for putting your time into this!
> 
> Yes, I understand that the problem manifests because the "e" 
> in Mexico is a double-byte char. But java is supposed to 
> count the characters, not the bytes, and UTF-8 supports 
> international characters.
> 
> Radu, since you can validate the document, it means that 
> something else is different. 
> 1. What version of xmlbeans are you using? 
> 2. Should I be concerned about the xerces version? 
> 3. Could you please send me the exact code that you use to 
> validate, so that I can try to isolate my problem further? 
> 4. How do I run the 'validate' utility of xmlbeans?
> 
> I've been trying to solve this problem for a while already, 
> before I posted to the forum, now I am becoming really desperate.
> 
> Thank you guys so much for helping!
> Elvira
>  
> 


RE: trouble validating UTF-8 document with international characters - please help!

Posted by El...@ibi.com.
Radu and Vinh, thank you so much for putting your time into this!

Yes, I understand that the problem shows up because the "é" in México is a double-byte char in UTF-8. But Java is supposed to count characters, not bytes, and UTF-8 supports international characters.

Radu, since you can validate the document, something else must be different.
1. What version of xmlbeans are you using?
2. Should I be concerned about the xerces version?
3. Could you please send me the exact code that you use to validate, so that I can try to isolate my problem further?
4. How do I run the 'validate' utility of xmlbeans?

I had been trying to solve this problem for a while before I posted to the forum, and now I am becoming really desperate.

Thank you guys so much for helping!
Elvira
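
The different ways of "counting" this string can be checked with a short Java sketch. It is illustrative only (class and method names are hypothetical, not from the thread). Note that String.length() counts UTF-16 code units, which for this string equals the number of code points. The sketch also builds the decomposed form ('e' followed by the combining acute accent U+0301) mentioned elsewhere in the thread, using java.text.Normalizer.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical demo class, not part of XmlBeans or of the posted code.
class LengthDemo {
    // Precomposed form: 'é' is the single code point U+00E9.
    static final String COMPOSED = "IDES M\u00E9xico, S.A. de C.V.";
    // Decomposed form (NFD): 'e' followed by combining acute accent U+0301.
    static final String DECOMPOSED =
            Normalizer.normalize(COMPOSED, Normalizer.Form.NFD);

    // Number of UTF-16 code units, which is what String.length() reports.
    static int chars(String s) { return s.length(); }

    // Number of Unicode code points.
    static int codePoints(String s) { return s.codePointCount(0, s.length()); }

    // Number of bytes when the string is encoded as UTF-8.
    static int utf8Bytes(String s) { return s.getBytes(StandardCharsets.UTF_8).length; }

    public static void main(String[] args) {
        for (String s : new String[] { COMPOSED, DECOMPOSED }) {
            System.out.println(chars(s) + " chars, "
                    + codePoints(s) + " code points, "
                    + utf8Bytes(s) + " UTF-8 bytes");
        }
    }
}
```

The precomposed form has 25 chars and 26 UTF-8 bytes; the decomposed form looks identical on screen but has 26 chars and 27 UTF-8 bytes, so a length limit of 25 can fail for either of two unrelated reasons.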
 

-----Original Message-----
From: Radu Preotiuc-Pietro [mailto:radup@bea.com] 
Sent: Monday, July 23, 2007 6:54 PM
To: user@xmlbeans.apache.org
Subject: RE: trouble validating UTF-8 document withinternationalcharacters- please help!

Elvira,

Initially I had run the schema and the document through the 'validate'
utility that ships with XmlBeans. Now that you mention it, I have also
tried your code, with the same result: the document validates.

Radu



RE: trouble validating UTF-8 document with international characters - please help!

Posted by Radu Preotiuc-Pietro <ra...@bea.com>.
Elvira,

Initially I had run the schema and the document through the 'validate'
utility that ships with XmlBeans. Now that you mention it, I have also
tried your code, with the same result: the document validates.

Radu
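
One possible explanation for the same code validating on one machine and failing on another, offered here only as a hypothesis, is the charset used to build the Reader. The posted code obtains its Reader from newFileReader(xmlFile), whose definition is not shown. If that ends up as a plain java.io.FileReader, then on Java versions before 18 the file is decoded with the platform default charset, and since a Reader hands the parser already-decoded chars, an encoding option set afterwards cannot undo a wrong decoding. The sketch below (hypothetical class and method names, standard library only) shows a Reader built with an explicit UTF-8 charset instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of XmlBeans or of the posted code.
class Utf8ReaderDemo {
    // Wrap a byte stream in a Reader that always decodes as UTF-8,
    // independent of the platform default charset. For a file, pass
    // new FileInputStream(xmlFile) as the stream.
    static Reader openUtf8(InputStream in) {
        return new InputStreamReader(in, StandardCharsets.UTF_8);
    }

    // Count the chars a Reader delivers; closes the Reader when done.
    static int countChars(Reader reader) {
        try (Reader r = reader) {
            int n = 0;
            while (r.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the file content: the COMP_NAME value as UTF-8 bytes.
        byte[] onDisk = "IDES M\u00E9xico, S.A. de C.V."
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(countChars(openUtf8(new ByteArrayInputStream(onDisk))));
    }
}
```

Passing such a Reader (or the File or InputStream itself) to the parser removes the dependency on the machine's default charset, which would be one way to test the hypothesis.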



RE: trouble validating UTF-8 document with international characters - please help!

Posted by "Vinh Nguyen (vinguye2)" <vi...@cisco.com>.
My guess is that the "é" in "México" is a double-byte char, so your XML document should actually be UTF-16, not UTF-8. In your DB table, perhaps you are using 16-bit chars, so your string would correctly appear to have 25 chars. But in byte representation, it's actually 26 bytes = 26 chars.
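
To separate byte counts from character counts, here is a minimal sketch using only the JDK (the class and method names are hypothetical). UTF-8 can represent 'é' (U+00E9); it simply takes two bytes there, and the XML Schema length facets for strings are defined in terms of characters rather than bytes. The sketch says nothing about what Elvira's file actually contains.

```java
import java.nio.charset.StandardCharsets;

class BytesVersusCharacters {
    // Element content quoted in the thread; \u00e9 is the precomposed 'é'.
    static final String COMP_NAME = "IDES M\u00e9xico, S.A. de C.V.";

    // Number of bytes needed to store the string as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes needed as UTF-16; the BE variant writes no byte order mark.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Number of Unicode code points, independent of any byte encoding.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 bytes:  " + utf8Bytes(COMP_NAME));
        System.out.println("UTF-16 bytes: " + utf16Bytes(COMP_NAME));
        System.out.println("code points:  " + codePoints(COMP_NAME));
    }
}
```

The string occupies 26 bytes in UTF-8 and 50 bytes in UTF-16, yet it is 25 code points in both cases, so a correctly decoded UTF-8 document should not exceed a maxLength of 25 for this value.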
 



RE: trouble validating UTF-8 document with international characters - please help!

Posted by El...@ibi.com.
Radu,

You mean you tried the attached document with the attached schema, and the same code as in the original question, and it worked with no errors? If you open the attachment, can you go to line 148 and see the same line? Could you send your test case back to me and I'll try to run it? Because when I run my setup, no matter what I do, I cannot get around this error. Are you using the latest release of xmlbeans? Mine is dated 6/12/2006.

BTW, this document is composed from table data, and the column width for this column is defined as 25. That is how the schema is constructed.

Thanks for your help.
Elvira



RE: trouble validating UTF-8 document with international characters - please help!

Posted by Radu Preotiuc-Pietro <ra...@bea.com>.
Sorry, my mistake, I did not notice the attachments.

However, it is not clear to me whether there are 25 or 26 characters in
that string. I am rusty on UTF-8 encoding rules and Unicode, but I am
sure that the 'é' character can be represented as either
'eacute' (Unicode 00E9) or a composite character (I would imagine there
are many ways to represent it in this way). What XMLSchema says is
'count Unicode codepoints' as far as I can tell.
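
The precomposed versus composite distinction can be made concrete with java.text.Normalizer from the JDK. This is only a sketch with hypothetical class and method names; it shows that the same visible text has a different code point count depending on its normalization form, not which form the document in question uses.

```java
import java.text.Normalizer;

class ComposedVersusDecomposed {
    // Same visible text as the quoted element, using precomposed U+00E9.
    static final String PRECOMPOSED = "IDES M\u00e9xico, S.A. de C.V.";

    // NFD splits 'é' into 'e' followed by the combining acute accent U+0301.
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // NFC recombines the pair into the single code point U+00E9.
    static String compose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String decomposed = decompose(PRECOMPOSED);
        System.out.println("precomposed code points: " + codePoints(PRECOMPOSED));
        System.out.println("decomposed code points:  " + codePoints(decomposed));
        System.out.println("round trip equal: " + compose(decomposed).equals(PRECOMPOSED));
    }
}
```

The precomposed form has 25 code points and the decomposed form has 26, so a document that stores 'é' as 'e' plus a combining accent would legitimately exceed a maxLength of 25 even though it looks identical on screen.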

So, while it is not impossible that there is a bug, I think the far more
likely possibility is that your document does not contain the characters
you think it contains. I also doubt that the attachment is the same
document that gives the error, because I have tried it and it works for
me. So you would have to do some additional investigation on this,
ideally get the exact bytes from that document.
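
Getting the exact bytes can be done with a few lines of plain Java. The sketch below is hypothetical (the class name and the optional file argument are assumptions, not anything from the thread): it prints reference byte sequences for the two representations of 'é' and, if a path is given, dumps the file as hex so the bytes at the offending element can be compared against them.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class HexDump {
    // Render bytes as space-separated, lowercase, two-digit hex values.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Reference sequences to look for in the dump.
        System.out.println("U+00E9 in UTF-8:     "
                + toHex("\u00e9".getBytes(StandardCharsets.UTF_8)));
        System.out.println("e + U+0301 in UTF-8: "
                + toHex("e\u0301".getBytes(StandardCharsets.UTF_8)));
        if (args.length == 1) {
            // Hypothetical usage: java HexDump document.xml
            System.out.println(toHex(Files.readAllBytes(Paths.get(args[0]))));
        }
    }
}
```

In a UTF-8 file, "c3 a9" after the bytes for "M" would indicate the precomposed character, "65 cc 81" the composite form, and "c3 83 c2 a9" text that was encoded twice. A lone "e9" would suggest the file is not UTF-8 at all.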

Radu
