Posted to j-users@xerces.apache.org by Joshua Santelli <js...@cornell.edu> on 2004/04/07 22:59:38 UTC

high value unicode characters

Hello,

We're using the Xerces-C++ SAX2Print sample, version 2.5.0
(xerces-c_2_5_0-solaris_27-cc_62), and have run into a problem with a few
"high value" Unicode characters (code points above U+FFFF).  What we would
like to do is validate the file and convert it to UTF-8.  SAX2Print
completes with no error, but there appear to be some strange characters
after the high-value Unicode characters (&#x1D5A2;, &#x1D5A7; and
&#x1D4AB;) in the output.
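
For anyone who wants to narrow down where the output goes wrong, here is a
minimal sketch (Python, used only for illustration; the helper name
first_invalid_utf8 is hypothetical) that reports the byte offset of the
first sequence a strict UTF-8 decoder rejects.  It assumes the output file
is meant to be pure UTF-8.

```python
def first_invalid_utf8(data):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")  # strict decoding; also rejects encoded surrogates
    except UnicodeDecodeError as err:
        return err.start
    return None


# Well-formed: U+1D5A2 written as a single four-byte sequence.
print(first_invalid_utf8(b"Assuming \xf0\x9d\x96\xa2"))

# Malformed: the two UTF-16 surrogates each encoded separately as three bytes.
print(first_invalid_utf8(b"Assuming \xed\xa0\xb5\xed\xb6\xa2"))
```

To check the real file, read it in binary mode, for example
open("test1-out.xml", "rb").read(), and pass the bytes to the function.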

     The command is: # SAX2Print -v=always -x=UTF-8 test1.xml

Running SAX2Print again on the output XML file gives this error:

     Fatal Error at file test1-out.xml, line 5, char 35
       Message: Got an unexpected trailing surrogate character
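
As background for the message above: code points above U+FFFF are
represented in UTF-16 as a pair of surrogates (a leading one in the range
D800-DBFF followed by a trailing one in DC00-DFFF), and in UTF-8 as a single
four-byte sequence.  The sketch below (Python, for illustration only;
utf16_surrogates is a hypothetical helper) computes both forms for the three
characters in the test file, which may help when comparing against the bytes
actually written.

```python
def utf16_surrogates(cp):
    """Split a supplementary code point (above U+FFFF) into its UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000
    leading = 0xD800 + (v >> 10)     # top 10 bits
    trailing = 0xDC00 + (v & 0x3FF)  # bottom 10 bits
    return leading, trailing


for cp in (0x1D5A2, 0x1D5A7, 0x1D4AB):
    leading, trailing = utf16_surrogates(cp)
    utf8 = " ".join(f"{b:02X}" for b in chr(cp).encode("utf-8"))
    print(f"U+{cp:05X}: UTF-16 {leading:04X} {trailing:04X}, UTF-8 {utf8}")
```

A trailing surrogate that shows up without its leading partner is malformed
input for a parser, which matches the wording of the error.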


Any idea what is going wrong here?

Thanks in advance,
josh


=========================
<?xml version="1.0"?>
<!DOCTYPE test SYSTEM "test.dtd">
<test>
         <testPara>
                 <head>1. high value Unicode characters and some 
punctuation as entities</head>
                 <p>Assuming &#x1D5A2;&#x1D5A7;, Hindman [ht1] showed that 
the existence of certain ultrafilters on the power set of the natural 
numbers is equivalent to Hindman&#x2019;s Theorem.  Adapting this work to a 
countable setting formalized in RCA<sub>0</sub>, this article proves the 
equivalence of the existence of certain ultrafilters on countable Boolean 
algebras and an iterated form of Hindman&#x2019;s Theorem, which is closely 
related to Milliken&#x2019;s Theorem.</p>
         </testPara>
         <testPara>
                 <head>2. high value Unicode char and some Greek as 
entities</head>
                 <p>This article is a continuation of our search for 
tautologies that are hard even for strong propositional proof systems like 
EF, cf. [Kra-wphp,Kra-tau].  The particular tautologies we study, the 
&#x03C4;-formulas, are obtained from any &#x1D4AB;/poly map g; they express 
that a string is outside of the range of g. Maps g considered here are 
particular pseudorandom generators. The ultimate goal is to deduce the 
hardness of the &#x03C4;-formulas for at least EF from some general, 
plausible computational hardness hypothesis.</p>
         </testPara>
</test>
=========================
<!ELEMENT test (testPara+) >
<!ELEMENT testPara (head, p) >
<!ELEMENT head (#PCDATA) >
<!ELEMENT p (#PCDATA | b | i | sub)* >
<!ELEMENT b (#PCDATA) >
<!ELEMENT i (#PCDATA) >
<!ELEMENT sub (#PCDATA) >
=========================

