Posted to xindice-dev@xml.apache.org by Konrad Scherer <bc...@uottawa.ca> on 2002/05/28 17:19:21 UTC

UTF-8 [again]

Hello all,

As I am new to the list, I will briefly explain my situation. I work for a 
small university project that is creating a fully bilingual Canadian 
English and French dictionary. The project started in the 1980s and is 
currently still done in SGML. I have completed the conversion to XML and I 
am now planning to use the Xindice and Cocoon combination to search the 
XML documents and make them available on the Internet.
As I am sure most of you know, the 1.0 release totally garbled all special 
characters, including the letters with French accents. The CVS version 
(from last Friday, May 24) didn't solve the problem, but the patch from
http://lambiek.amplexor.be/downloads/xindice-utf8-patches
outputs the special characters in UTF-8. Thank you very much for your work.
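
For readers unfamiliar with this kind of garbling, here is a minimal
illustration in Python. It assumes the common case where UTF-8 bytes are
decoded as Latin-1; that is an assumption for illustration only, not a
diagnosis of what Xindice 1.0 actually did. The function name garble is
hypothetical.

```python
def garble(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode the bytes as Latin-1.

    Assumption: this mimics one typical encoding mix-up; the real cause
    of the garbling in Xindice 1.0 may be different.
    """
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    word = "é"
    broken = garble(word)
    print(broken)  # each accented letter becomes two characters: 'Ã©'
    # The damage is reversible as long as no bytes were lost on the way.
    print(broken.encode("latin-1").decode("utf-8"))
```

Plain ASCII text passes through such a mix-up unchanged, which is why only
the accented letters and other special characters are affected.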
Some additional observations.
1) The last command in the patch failed. Probably not important.
2) The Xindice-HTTP-0.8 package does not like UTF-8 characters.
I also have a question. Xindice resolves all entities when storing a 
document. Would it not be possible to store the entities unresolved? XPath 
queries would resolve the entity during a search, but I could retrieve the 
document with the entities unresolved and then let the browser (or whatever 
else displays it) worry about the display.
Again, thank you for your time.

Konrad Scherer