Posted to xindice-dev@xml.apache.org by James Bates <ja...@amplexor.com> on 2002/10/16 14:17:44 UTC
Another peek at Xindice development
Just downloaded and checked out latest CVS state of Xindice, in particular
UTF-8 status ;)
Seems that using the cmd-line tools everything is fine, though the cmdline
option to choose output encoding I once added has disappeared.
Now that Xindice is a webapp, however, it is also possible to retrieve
documents through HTTP, and these are delivered incorrectly. Here's why:
In the file org/apache/xindice/server/XindiceServlet.java, method doGET():
First, the result is calculated as a java.lang.String (i.e. pure Unicode).
Near the end of the method, this string is written to the Servlet's output
stream using the *DANGEROUS* java.lang.String.getBytes(), which encodes with
whatever the platform's default charset happens to be. More correct would be
resultString.getBytes(SOME_ENCODING). In addition, as the output is
XML in the case of a document, and the XML contains no encoding="xxx" in its
declaration, the UTF-8 encoding is in fact required.
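To illustrate the difference, here is a minimal sketch, not the actual XindiceServlet code. The class and method names are made up for the example, and it uses java.nio.charset.StandardCharsets, which only exists in Java 7 and later; on older JDKs the equivalent would be getBytes("UTF-8").

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Xindice.
class ExplicitEncoding {
    // Encode with a fixed charset instead of the platform default.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><name>Caf\u00e9</name>";
        // Result depends on the JVM's default charset (locale, file.encoding).
        byte[] platform = xml.getBytes();
        // Result is the same on every machine.
        byte[] utf8 = toUtf8(xml);
        System.out.println("platform default: " + platform.length + " bytes");
        System.out.println("UTF-8:            " + utf8.length + " bytes");
    }
}
```

On a machine whose default charset is not UTF-8, the two byte arrays differ for any document containing non-ASCII characters.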
Changing the encoding to UTF-8 causes two other, easily resolvable problems:
1) the content-type "text/xml" should then become "text/xml; charset=utf-8" so
the browser's encoding detection can work.
2) the specified content-length is wrong, as with UTF-8 the string length
and the encoded byte-array length are different. Simply omitting the
content-length header is the easiest solution.
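A small sketch of why the length no longer matches, again with made-up names and no dependency on the servlet API. In a real servlet the content type string would presumably go to response.setContentType(), and if a length were sent at all it would have to be the byte count rather than String.length().

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Xindice.
class ResponseHeaders {
    // Content type that tells the client which charset the bytes use.
    static final String CONTENT_TYPE = "text/xml; charset=utf-8";

    // Number of UTF-16 code units in the string (what String.length() gives).
    static int charCount(String body) {
        return body.length();
    }

    // Number of bytes actually written to the output stream as UTF-8.
    static int byteCount(String body) {
        return body.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String body = "<a>\u00e9</a>"; // one non-ASCII character
        System.out.println("Content-Type: " + CONTENT_TYPE);
        System.out.println("chars: " + charCount(body)); // 8
        System.out.println("bytes: " + byteCount(body)); // 9
    }
}
```

Omitting the header, as described above, sidesteps the mismatch entirely; computing it from the encoded byte array would be the alternative.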
I have tweaked my local copy in this way, and HTTP output is now totally
correct.
Would it be appreciated if I attempted to patch the CVS version of
XindiceServlet.java?
Regards,
James Bates