Posted to j-users@xerces.apache.org by Joshua Santelli <js...@cornell.edu> on 2004/04/07 22:59:38 UTC
high value unicode characters
Hello,
We're using Xerces SAX2Print, version 2.5.0
(xerces-c_2_5_0-solaris_27-cc_62), and have run into a problem with a few
"high value" Unicode characters. What we would like to do is validate the
file and convert it to UTF-8. The SAX2Print process completes with no
errors, but some strange characters appear after the high value
Unicode characters (𝖢, 𝖧 and 𝒫) in the output.
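For reference, here is a minimal Python sketch (not part of Xerces) of how a character above U+FFFF splits into a UTF-16 surrogate pair and what its UTF-8 form should look like. It assumes the three characters are U+1D5A2, U+1D5A7 and U+1D4AB; the helper names to_surrogate_pair and utf8_hex are made up for illustration.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point (above U+FFFF) into a UTF-16 pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # leading surrogate, D800..DBFF
    low = 0xDC00 + (offset & 0x3FF)   # trailing surrogate, DC00..DFFF
    return high, low


def utf8_hex(code_point):
    """Return the UTF-8 bytes of one code point as space-separated hex."""
    return " ".join(f"{b:02x}" for b in chr(code_point).encode("utf-8"))


if __name__ == "__main__":
    # Assumed code points for the three characters mentioned above.
    for cp in (0x1D5A2, 0x1D5A7, 0x1D4AB):
        high, low = to_surrogate_pair(cp)
        print(f"U+{cp:05X}  UTF-16: {high:04X} {low:04X}  UTF-8: {utf8_hex(cp)}")
```

Each of these characters should come out as a single four-byte UTF-8 sequence. A trailing surrogate (DC00 to DFFF) should never appear on its own in well-formed output.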
The command is: # SAX2Print -v=always -x=UTF-8 test1.xml
The error that I get using SAX2Print on the output XML file is:
Fatal Error at file test1-out.xml, line 5, char 35
Message: Got an unexpected trailing surrogate character
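To narrow down where the output goes wrong, a rough check like the following could be run on test1-out.xml. It is only a sketch: it assumes the stray bytes are surrogates that were each written out as a separate three-byte sequence (ED, then A0..BF, then 80..BF), which I have not confirmed. find_encoded_surrogates is a hypothetical helper name, not anything from Xerces.

```python
import re
import sys

# A surrogate code unit (U+D800..U+DFFF) pushed through a UTF-8 encoder on its
# own shows up as ED, then A0..BF, then 80..BF. Valid UTF-8 never contains this.
ENCODED_SURROGATE = re.compile(rb"\xED[\xA0-\xBF][\x80-\xBF]")


def find_encoded_surrogates(data):
    """Return (byte offset, surrogate value) for each suspicious sequence."""
    hits = []
    for match in ENCODED_SURROGATE.finditer(data):
        b0, b1, b2 = match.group()  # iterating bytes yields ints
        value = ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)
        hits.append((match.start(), value))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_surrogates.py test1-out.xml
    with open(sys.argv[1], "rb") as handle:
        for offset, value in find_encoded_surrogates(handle.read()):
            kind = "leading" if value < 0xDC00 else "trailing"
            print(f"byte {offset}: {kind} surrogate U+{value:04X}")
```

If this reports nothing, the assumption above is wrong and the extra bytes are something else.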
Any idea what is going wrong here?
Thanks in advance,
josh
=========================
<?xml version="1.0"?>
<!DOCTYPE test SYSTEM "test.dtd">
<test>
<testPara>
<head>1. high value Unicode characters and some
punctuation as entities</head>
<p>Assuming 𝖢𝖧, Hindman [ht1] showed that
the existence of certain ultrafilters on the power set of the natural
numbers is equivalent to Hindman’s Theorem. Adapting this work to a
countable setting formalized in RCA<sub>0</sub>, this article proves the
equivalence of the existence of certain ultrafilters on countable Boolean
algebras and an iterated form of Hindman’s Theorem, which is closely
related to Milliken’s Theorem.</p>
</testPara>
<testPara>
<head>2. high value Unicode char and some Greek as
entities</head>
<p>This article is a continuation of our search for
tautologies that are hard even for strong propositional proof systems like
EF, cf. [Kra-wphp,Kra-tau]. The particular tautologies we study, the
τ-formulas, are obtained from any 𝒫/poly map g; they express
that a string is outside of the range of g. Maps g considered here are
particular pseudorandom generators. The ultimate goal is to deduce the
hardness of the τ-formulas for at least EF from some general,
plausible computational hardness hypothesis.</p>
</testPara>
</test>
=========================
<!ELEMENT test (testPara+) >
<!ELEMENT testPara (head, p) >
<!ELEMENT head (#PCDATA) >
<!ELEMENT p (#PCDATA | b | i | sub)* >
<!ELEMENT b (#PCDATA) >
<!ELEMENT i (#PCDATA) >
<!ELEMENT sub (#PCDATA) >
=========================