
Posted to xindice-dev@xml.apache.org by James Bates <ja...@amplexor.com> on 2002/10/16 14:17:44 UTC

Another peek at Xindice development

Just downloaded and checked out the latest CVS state of Xindice, in particular 
to look at its UTF-8 status ;)

It seems that everything is fine when using the command-line tools, though the 
command-line option I once added for choosing the output encoding has 
disappeared.

Now that Xindice is a webapp, however, it is also possible to retrieve 
documents through HTTP, and these are delivered incorrectly. Here's why:

In the file org/apache/xindice/server/XindiceServlet.java, method doGet():

First, the result is calculated as a java.lang.String (i.e. pure Unicode). 
Near the end of the method, this string is written to the Servlet's output 
stream using the *DANGEROUS* java.lang.String.getBytes(), which encodes with 
the platform's default charset. It would be more correct to use 
resultString.getBytes(SOME_ENCODING). In addition, since in the case of a 
document the output is XML whose declaration contains no encoding="xxx", 
UTF-8 is in fact required.
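A minimal sketch of that change in modern Java, assuming UTF-8 as 
SOME_ENCODING. The class Utf8Result and its encodeResult helper are 
hypothetical, not actual Xindice code. StandardCharsets is a Java 7+ API; 
2002-era code would have called getBytes("UTF-8") and handled the checked 
UnsupportedEncodingException instead.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: encode the result string explicitly as UTF-8 instead
// of relying on String.getBytes(), which uses the platform default charset.
class Utf8Result {
    // Hypothetical helper; the real XindiceServlet code may look different.
    static byte[] encodeResult(String resultString) {
        return resultString.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><name>Fran\u00e7ois</name>";
        byte[] bytes = encodeResult(xml);
        // U+00E7 takes two bytes in UTF-8, so the byte count is one more
        // than the char count.
        System.out.println(xml.length() + " chars, " + bytes.length + " bytes");
    }
}
```

Running main shows the encoded byte count exceeding the character count, which 
matters for the Content-Length header as well.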

Changing the encoding to UTF-8 causes two other, easily resolved problems:
1) the content type "text/xml" should then become "text/xml; charset=utf-8" 
so that the browser's encoding detection works.
2) the specified Content-Length is wrong, because with UTF-8 the string length 
and the length of the encoded byte array can differ. Simply omitting the 
Content-Length header is the easiest solution.
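The two header fixes can be sketched with a plain Map standing in for the 
servlet response (in a real servlet the values would go through 
ServletResponse.setContentType(...) and, if used, setContentLength(...)). The 
class XmlHeaders and its sendLength flag are hypothetical; sending a byte-based 
length is shown only as an alternative to simply omitting the header.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for the Servlet API response headers.
class XmlHeaders {
    static final String CONTENT_TYPE = "text/xml; charset=utf-8";

    static Map<String, String> headersFor(String xml, boolean sendLength) {
        Map<String, String> headers = new LinkedHashMap<>();
        // Declare the charset so the browser decodes the bytes correctly.
        headers.put("Content-Type", CONTENT_TYPE);
        if (sendLength) {
            // If a Content-Length is sent at all, it must count encoded bytes,
            // not xml.length() (the number of UTF-16 code units).
            int byteCount = xml.getBytes(StandardCharsets.UTF_8).length;
            headers.put("Content-Length", Integer.toString(byteCount));
        }
        return headers;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><p>\u00fcber</p>";
        System.out.println(headersFor(xml, false)); // suggested fix: no length
        System.out.println(headersFor(xml, true));  // alternative: byte length
    }
}
```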

I have tweaked my local copy in this way, and the HTTP output is now 
entirely correct.

Would it be appreciated if I attempted to patch the CVS version of 
XindiceServlet.java?

Regards,
James Bates