Posted to j-dev@xerces.apache.org by "liaomingxue (JIRA)" <xe...@xml.apache.org> on 2009/02/20 04:06:03 UTC
[jira] Commented: (XERCESJ-1359) DOMParser exception with an xml file which name contains Chinese characters

    [ https://issues.apache.org/jira/browse/XERCESJ-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675222#action_12675222 ] 

liaomingxue commented on XERCESJ-1359:
--------------------------------------

I have found a solution to this problem. I think Xerces should support the GB2312 (Simplified Chinese) encoding.
The solution also fixes another problem: Xerces does not support files encoded in GB2312.
The solution is as follows:

In org.apache.xerces.impl.XMLEntityManager.createReader(InputStream, String, Boolean), add:
	        /**
	         * Why not support GB2312?
	         * @author liaomingxue@sohu.com
	         */
	        if ("GB2312".equalsIgnoreCase(encoding))
	        {
	           return new InputStreamReader(inputStream, encoding);
	        }
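For context, the snippet above hands GB2312 input straight to the JDK's own decoder. The standalone sketch below is not Xerces code; the class and method names are hypothetical. It shows the same idea with a case-insensitive name check, since the sample document declares `encoding="gb2312"` in lower case. GB2312 is not one of the charsets every Java implementation is required to support, so the sketch assumes a full JDK that ships it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper illustrating the GB2312 branch; not part of Xerces.
public class Gb2312ReaderSketch {

    // Returns a reader that decodes the stream as GB2312 when the declared
    // encoding name matches case-insensitively (e.g. "gb2312" in an XML declaration).
    static Reader createReader(InputStream in, String encoding) {
        if ("GB2312".equalsIgnoreCase(encoding)) {
            return new InputStreamReader(in, Charset.forName("GB2312"));
        }
        throw new IllegalArgumentException("Not handled in this sketch: " + encoding);
    }

    // Decodes a whole byte array through createReader.
    static String decode(byte[] bytes, String encoding) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = createReader(new ByteArrayInputStream(bytes), encoding)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // 0xD6 0xD0 is the GB2312 encoding of U+4E2D, the character in "中.xml".
        byte[] zhong = {(byte) 0xD6, (byte) 0xD0};
        System.out.println(decode(zhong, "gb2312").equals("\u4E2D")); // true
    }
}
```

Any other encoding name is rejected here only to keep the sketch short; a real reader factory would fall through to its existing encoding table instead.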

In org.apache.xerces.util.URI.initializePath(String, int), update the path branch:
                else if (!isPathCharacter(testChar))
                {
                  /**
                   * @author liaomingxue@sohu.com
                   * The path part of a URI may contain characters that are not allowed by the URI spec.
                   */
                  if(Character.isUnicodeIdentifierStart(testChar)||Character.isUnicodeIdentifierPart(testChar))
                  {
                    ++index;
                    continue;
                  }

                  if (testChar == '?' || testChar == '#')
                  {
                    break;
                  }
                  throw new MalformedURIException("Path contains invalid character: " + testChar);
                }
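The relaxed check above accepts any character for which `Character.isUnicodeIdentifierStart` or `Character.isUnicodeIdentifierPart` returns true. The sketch below shows which characters then pass; the class and method names are hypothetical, and the ASCII set is a simplified stand-in for RFC 2396 path characters, not Xerces' actual `isPathCharacter` table. One deliberate deviation from the snippet: `isUnicodeIdentifierPart` also returns true for "identifier-ignorable" control characters such as U+0000, so the sketch rejects ISO control characters first.

```java
// Hypothetical sketch of the relaxed path-character test; not Xerces code.
public class RelaxedPathCheck {

    // Simplified stand-in for RFC 2396 path characters (unreserved marks,
    // '%' for escapes, and path punctuation). Xerces' own table may differ.
    private static final String ASCII_PATH_PUNCT = "-_.!~*'()%;/:@&=+$,";

    static boolean isAsciiPathChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || ASCII_PATH_PUNCT.indexOf(c) >= 0;
    }

    // Accepts the ASCII set plus Unicode identifier characters, as the patch does,
    // but rejects control characters that isUnicodeIdentifierPart would let through.
    static boolean isRelaxedPathChar(char c) {
        if (isAsciiPathChar(c)) {
            return true;
        }
        if (Character.isISOControl(c)) {
            return false;
        }
        return Character.isUnicodeIdentifierStart(c) || Character.isUnicodeIdentifierPart(c);
    }

    // Returns the index of the first rejected character, or -1 if the whole path passes.
    static int firstInvalid(String path) {
        for (int i = 0; i < path.length(); i++) {
            if (!isRelaxedPathChar(path.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstInvalid("/E:/\u4E2D.xml")); // -1: accepted
        System.out.println(firstInvalid("/E:/a b.xml"));    // 5: the space is rejected
    }
}
```

Like the snippet, this works on UTF-16 `char` values, so characters outside the Basic Multilingual Plane (stored as surrogate pairs) are still rejected.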

In the same method, URI.initializePath(String, int), also update the opaque-part branch:
                else if (!isURICharacter(testChar))
                {
                  /**
                   * A path may contain Chinese characters,
                   * but I am not sure that the method used here is right,
                   * and I believe other parts of this file must be corrected too.
                   * Why not use java.net.URI?
                   * By liaomingxue@sohu.com
                   */
                  if(Character.isUnicodeIdentifierPart(testChar)||Character.isUnicodeIdentifierStart(testChar))
                  {
                    index++;
                    continue;
                  }
                  throw new MalformedURIException(
                      "Opaque part contains invalid character: " + testChar);
                }
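The comment above asks why java.net.URI is not used. As an alternative to relaxing the parser's character checks, a caller could percent-encode the file name before passing it to `parser.parse(...)`. The sketch below assumes a Windows-style absolute path such as `E:/中.xml`; the class and method names are hypothetical, and whether the resulting system ID avoids the reported exception in Xerces 2.9.1 has not been verified here.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Xerces: builds an ASCII-only file: URI
// for a Windows-style path so no non-ASCII character reaches a strict URI parser.
public class FileSystemIdSketch {

    static String toAsciiSystemId(String windowsPath) throws URISyntaxException {
        // Normalize separators and add the leading '/' that a drive letter needs
        // in a hierarchical file: URI path ("E:/x" becomes "/E:/x").
        String path = windowsPath.replace('\\', '/');
        if (!path.startsWith("/")) {
            path = "/" + path;
        }
        // The (scheme, host, path, fragment) constructor quotes illegal characters
        // such as spaces; toASCIIString() then percent-encodes non-ASCII
        // characters as UTF-8 octets.
        return new URI("file", null, path, null).toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(toAsciiSystemId("E:/\u4E2D.xml")); // file:/E:/%E4%B8%AD.xml
    }
}
```

On Windows, `new File(path).toURI().toASCIIString()` yields an equivalent string for an absolute path; building the URI by hand just keeps the sketch independent of the host platform.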




> DOMParser exception with an xml file which name contains Chinese characters
> ---------------------------------------------------------------------------
>
>                 Key: XERCESJ-1359
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1359
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: JAXP (javax.xml.parsers)
>    Affects Versions: 2.9.1
>         Environment: Windows in China
>            Reporter: liaomingxue
>            Priority: Minor
>
> In the same directory there is an XML file, a.xml, and a schema file, r.xsd.
> With the code below, everything works. But if a.xml is renamed to a name containing Chinese characters (e.g. 中.xml), DOMParser throws an exception:
> java.net.MalformedURLException: unknown protocol: e
> 	at java.net.URL.<init>(URL.java:586)
> 	at java.net.URL.<init>(URL.java:476)
> 	at java.net.URL.<init>(URL.java:425)
> 	at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
> 	at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
> 	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> 	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> 	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> 	at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
> 	at xml.DOMParserDemo.main(DOMParserDemo.java:36)
>  
> If parser.parse("E:/a.xml") is replaced with parser.parse("file:E:/中.xml"), the parser instead reports the following warnings and errors:
> [Warning] 中.xml:3:117: schema_reference.4: Failed to read schema document 'r.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
> [Error] 中.xml:3:117: cvc-elt.1: Cannot find the declaration of element 'ResourceReg'.
> [Warning] 中.xml:5:16: schema_reference.4: Failed to read schema document 'r.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
> [Warning] 中.xml:8:18: schema_reference.4: Failed to read schema document 'r.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
> [Warning] 中.xml:10:15: schema_reference.4: Failed to read schema document 'r.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
> [Warning] 中.xml:12:18: schema_reference.4: Failed to read schema document 'r.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
> [Warning] 中.xml:18:19: schema_reference.4: Failed to read schema document 'r.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
> [Warning] 中.xml:20:11: schema_reference.4: Failed to read schema document 'r.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
> [Warning] 中.xml:22:11: schema_reference.4: Failed to read schema document 'r.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
> [Warning] 中.xml:24:11: schema_reference.4: Failed to read schema document 'r.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
> [Warning] 中.xml:26:10: schema_reference.4: Failed to read schema document 'r.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
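> The code:
> import org.apache.xerces.parsers.DOMParser;
> import org.w3c.dom.Document;
>
> public class DOMParserDemo {
>   public static void main(String[] args) {
>     try
>     {
>       DOMParser parser = new DOMParser();
>       // turn on validation and XML Schema validation
>       parser.setFeature("http://xml.org/sax/features/validation", true);
>       parser.setFeature("http://apache.org/xml/features/validation/schema", true);
>       parser.parse("E:/a.xml");
>       Document doc = parser.getDocument();
>     }
>     catch (Exception e)
>     {
>       e.printStackTrace();
>     }
>   }
> }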
> a.xml:
> <?xml version="1.0" encoding="gb2312"?> 
> <ResourceReg xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="r.xsd"> 
>   <ResourceFig> 
>     <ResourceKID>512 </ResourceKID> 
>     <PortAddr>192.192.192.222:1:1 </PortAddr> 
>     <ResourceSID>3 </ResourceSID> 
>   </ResourceFig> 
>   <ResourceStatus> 
>     <ZBWZ>135,26 </ZBWZ> 
>     <YXZT>1 </YXZT> 
>     <CSQB>true </CSQB> 
>     <BKF>2 </BKF> 
>   </ResourceStatus> 
> </ResourceReg> 
> r.xsd:  
> <?xml version="1.0"?> 
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> 
> <xs:element name="ResourceReg"> 
>   <xs:complexType> 
>   <xs:all> 
>     <xs:element name="ResourceFig" type="ResourceFig" minOccurs="1" maxOccurs="1" /> 
>     <xs:element name="ResourceStatus" minOccurs="1" maxOccurs="1" /> 
>   </xs:all> 
>   </xs:complexType> 
> </xs:element> 
> <xs:complexType name="ResourceFig"> 
>   <xs:all> 
>   <xs:element name="ResourceKID" type="xs:unsignedShort" /> 
>   <xs:element name="PortAddr" type="xs:token" /> 
>   <xs:element name="ResourceSID" type="xs:unsignedByte" /> 
>   </xs:all> 
> </xs:complexType> 
> </xs:schema>

-- 
This message is automatically generated by JIRA.