Posted to j-dev@xerces.apache.org by "liaomingxue (JIRA)" <xe...@xml.apache.org> on 2009/02/20 04:06:03 UTC
[jira] Commented: (XERCESJ-1359) DOMParser exception with an xml file which name contains Chinese characters
[ https://issues.apache.org/jira/browse/XERCESJ-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675222#action_12675222 ]
liaomingxue commented on XERCESJ-1359:
--------------------------------------
I have found a solution to this problem. I think Xerces should support the GB2312 (Chinese) encoding.
The solution also fixes a second problem: Xerces does not support files encoded in GB2312.
The solution is as follows:
In org.apache.xerces.impl.XMLEntityManager.createReader(InputStream,String,Boolean), add:
    /**
     * why not supporting GB2312?
     * @author liaomingxue@sohu.com
     */
    if(encoding.equals("GB2312"))
    {
        return new InputStreamReader(inputStream,encoding);
    }
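To show what that added branch relies on, here is a minimal standalone sketch of decoding GB2312 bytes with a plain InputStreamReader. The class and method names (Gb2312ReaderSketch, openReader, roundTrip) are made up for this illustration and are not part of Xerces. It assumes the running JDK ships the GB2312 charset, and it says nothing about how Xerces itself resolves encoding names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;

// Hypothetical demo class, not part of Xerces.
class Gb2312ReaderSketch {

    // Same call as the proposed branch: let the JDK decoder handle the stream.
    static Reader openReader(InputStream in, String encoding)
            throws UnsupportedEncodingException {
        return new InputStreamReader(in, encoding);
    }

    // Encodes text as GB2312, then decodes it again through openReader.
    static String roundTrip(String text) {
        try {
            byte[] bytes = text.getBytes("GB2312");
            StringBuilder sb = new StringBuilder();
            try (Reader reader = openReader(new ByteArrayInputStream(bytes), "GB2312")) {
                char[] buf = new char[256];
                int n;
                while ((n = reader.read(buf)) != -1) {
                    sb.append(buf, 0, n);
                }
            }
            return sb.toString();
        } catch (IOException e) {
            // Also covers UnsupportedEncodingException if GB2312 is missing.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // \u4e2d is the character used in the file name from the report.
        String xml = "<?xml version=\"1.0\" encoding=\"gb2312\"?><name>\u4e2d</name>";
        System.out.println(roundTrip(xml).equals(xml));
    }
}
```

Note that the sample document declares encoding="gb2312" in lower case, while the proposed branch compares against "GB2312" with equals. Whether the encoding name has already been normalized at that point in createReader is not something this sketch can confirm.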
and in org.apache.xerces.util.URI.initializePath(String, int) update:
    else if (!isPathCharacter(testChar))
    {
        /**
         * @author liaomingxue@sohu.com
         * The path part of a URI may contain characters which are not included in URI Spec.
         */
        if(Character.isUnicodeIdentifierStart(testChar)||Character.isUnicodeIdentifierPart(testChar))
        {
            ++index;
            continue;
        }
        if (testChar == '?' || testChar == '#')
        {
            break;
        }
        throw new MalformedURIException("Path contains invalid character: " + testChar);
    }
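Since the comment in the second change below admits some doubt about this check, a small sketch of what the two java.lang.Character tests accept on their own may help. The class and method names (IdentifierCheckSketch, acceptedByPatch) are hypothetical. Only the predicate is reproduced here, not the surrounding Xerces loop.

```java
// Hypothetical demo class, not part of Xerces.
class IdentifierCheckSketch {

    // The same condition the proposed change adds to initializePath.
    static boolean acceptedByPatch(char c) {
        return Character.isUnicodeIdentifierStart(c)
                || Character.isUnicodeIdentifierPart(c);
    }

    public static void main(String[] args) {
        char[] samples = { '\u4e2d', 'a', ' ', '<', '\u0000' };
        for (char c : samples) {
            // Print the code point in hex so control characters stay readable.
            System.out.println("U+" + Integer.toHexString(c) + " -> " + acceptedByPatch(c));
        }
    }
}
```

The Chinese character and the ASCII letter pass, while the space and '<' do not. According to the Character Javadoc, identifier-ignorable control characters such as U+0000 also count as identifier parts, so this predicate lets them through as well. That may be broader than intended for a URI path.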
and, also in org.apache.xerces.util.URI.initializePath(String, int), update the opaque-part check:
    else if (!isURICharacter(testChar))
    {
        /**
         * A path may contain Chinese characters,
         * but I am not sure that the method used here is right.
         * And I believe that there must be other parts of this file to be corrected.
         * And why not use java.net.URI?
         * By liaomingxue@sohu.com
         */
        if(Character.isUnicodeIdentifierPart(testChar)||Character.isUnicodeIdentifierStart(testChar))
        {
            index++;
            continue;
        }
        throw new MalformedURIException(
            "Opaque part contains invalid character: " + testChar);
    }
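On the java.net.URI question raised in the comment above, here is a rough sketch of what the JDK classes do with such a file name on the calling side. The names FileUriSketch and toSystemId are hypothetical. This only shows the JDK behaviour; whether passing the resulting string to parser.parse avoids the exception in Xerces 2.9.1 has not been verified here.

```java
import java.io.File;

// Hypothetical demo class, not part of Xerces.
class FileUriSketch {

    // Builds an absolute file: URI and percent-encodes non-ASCII characters as UTF-8.
    static String toSystemId(File file) {
        return file.toURI().toASCIIString();
    }

    public static void main(String[] args) {
        // \u4e2d.xml is the file name from the report; the file need not exist.
        System.out.println(toSystemId(new File("\u4e2d.xml")));
    }
}
```

The printed value starts with "file:" followed by the absolute path, and the file name ends up as %E4%B8%AD.xml, which contains only characters the URI syntax allows.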
> DOMParser exception with an xml file which name contains Chinese characters
> ---------------------------------------------------------------------------
>
> Key: XERCESJ-1359
> URL: https://issues.apache.org/jira/browse/XERCESJ-1359
> Project: Xerces2-J
> Issue Type: Bug
> Components: JAXP (javax.xml.parsers)
> Affects Versions: 2.9.1
> Environment: Windows in China
> Reporter: liaomingxue
> Priority: Minor
>
> In the same directory there are an XML file, a.xml, and a schema file, r.xsd.
> With the code below, everything works. But if a.xml is renamed to a name containing Chinese characters (e.g. 中.xml), the DOMParser throws an exception:
> java.net.MalformedURLException: unknown protocol: e
> at java.net.URL.<init>(URL.java:586)
> at java.net.URL.<init>(URL.java:476)
> at java.net.URL.<init>(URL.java:425)
> at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
> at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
> at xml.DOMParserDemo.main(DOMParserDemo.java:36)
>
> And if parser.parse("E:/a.xml"); is replaced with parser.parse("file:E:/中.xml"); it reports these warnings and errors:
> [Warning] 中.xml:3:117: schema_reference.4: Failed to read schema document 'r.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
> [Error] 中.xml:3:117: cvc-elt.1: Cannot find the declaration of element 'ResourceReg'.
> [Warning] 中.xml:5:16: schema_reference.4: Failed to read schema document 'r.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
> [Warning] 中.xml:8:18: schema_reference.4: Failed to read schema document 'r.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
> [Warning] 中.xml:10:15: schema_reference.4: Failed to read schema document 'r.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
> [Warning] 中.xml:12:18: schema_reference.4: Failed to read schema document 'r.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
> [Warning] 中.xml:18:19: schema_reference.4: Failed to read schema document 'r.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
> [Warning] 中.xml:20:11: schema_reference.4: Failed to read schema document 'r.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
> [Warning] 中.xml:22:11: schema_reference.4: Failed to read schema document 'r.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
> [Warning] 中.xml:24:11: schema_reference.4: Failed to read schema document 'r.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
> [Warning] 中.xml:26:10: schema_reference.4: Failed to read schema document 'r.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
> The code:
> try
> {
>     DOMParser parser = new DOMParser();
>     parser.setFeature("http://xml.org/sax/features/validation",true);
>     parser.setFeature("http://apache.org/xml/features/validation/schema",true);
>     parser.parse("E:/a.xml");
>     Document doc = parser.getDocument();
> }
> catch(Exception e)
> {
>     e.printStackTrace();
> }
> a.xml:
> <?xml version="1.0" encoding="gb2312"?>
> <ResourceReg xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNameSpaceSchemaLocation="r.xsd">
> <ResourceFig>
> <ResourceKID>512 </ResourceKID>
> <PortAddr>192.192.192.222:1:1 </PortAddr>
> <ResourceSID>3 </ResourceSID>
> </ResourceFig>
> <ResourceStatus>
> <ZBWZ>135,26 </ZBWZ>
> <YXZT>1 </YXZT>
> <CSQB>true </CSQB>
> <BKF>2 </BKF>
> </ResourceStatus>
> </ResourceReg>
> r.xsd:
> <?xml version="1.0"?>
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
> <xs:element name="ResourceReg">
> <xs:complexType>
> <xs:all>
> <xs:element name="ResourceFig" type="ResourceFig" minOccurs="1" maxOccurs="1" />
> <xs:element name="ResourceStatus" minOccurs="1" maxOccurs="1" />
> </xs:all>
> </xs:complexType>
> </xs:element>
> <xs:complexType name="ResourceFig">
> <xs:all>
> <xs:element name="ResourceKID" type="xs:unsignedShort" />
> <xs:element name="PortAddr" type="xs:token" />
> <xs:element name="ResourceSID" type="xs:unsignedByte" />
> </xs:all>
> </xs:complexType>
> </xs:schema>