Posted to xindice-users@xml.apache.org by Olle Olsson <ol...@sics.se> on 2005/03/21 17:50:17 UTC

Apache Xindice - export/import - encodings lost

About:   xindice-1.1b4 - commandline tool
Question: How to preserve encodings across export/import ???

--------------------------------
Scenario explaining the problem
--------------------------------
Step 1.
Adding document:
     xindice ad -c xmldb:xindice://localhost:8080/db/foo/ -n 36.xml -f 36.xml
on document 36.xml:
     <?xml version="1.0" encoding="iso-8859-1"?> ... etc ...
This works OK

Step 2.
Retrieving document:
     xindice rd -c xmldb:xindice://localhost:8080/db/foo/ -n 36.xml -f 36a.xml
results in document 36a.xml:
     <?xml version="1.0"?> ... etc ...
The payload of the extracted document (36a.xml) appears to be identical 
to the payload of the original document (36.xml); only the encoding 
declaration is missing.

Step 3.
Adding document:
     xindice ad -c xmldb:xindice://localhost:8080/db/foo/ -n 36a.xml -f 36a.xml
using the document 36a.xml retrieved in step 2.

This results in error:
     ERROR : Invalid byte 2 of 3-byte UTF-8 sequence.

--------------------------------
What is happening here?
--------------------------------

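The cause appears to be that the source document of step 1 contains a 
non-ASCII ISO-8859-1 character (which was the reason for the explicit 
encoding declaration in that document). The document retrieved in step 2 
has no encoding declaration, so in step 3 the parser assumes UTF-8 and 
rejects the byte sequence.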
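
The failure in step 3 can be reproduced outside Xindice. The following 
is a minimal Python sketch, not taken from Xindice itself. The payload 
"café au lait" is an assumption (the thread does not show the real 
document), and the helper names parses and is_valid_utf8 are made up 
for illustration. In ISO-8859-1 the character é is the single byte 
0xE9, which in UTF-8 announces a 3-byte sequence; the following byte is 
a plain space, not a continuation byte, which matches the reported 
"Invalid byte 2 of 3-byte UTF-8 sequence".

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: "é" stands in for the unknown ISO-8859-1 character.
latin1_body = "<doc>café au lait</doc>".encode("iso-8859-1")

# Same bytes, with and without the encoding declaration (as in 36.xml / 36a.xml).
with_decl = b'<?xml version="1.0" encoding="iso-8859-1"?>' + latin1_body
without_decl = b'<?xml version="1.0"?>' + latin1_body


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def parses(data: bytes) -> bool:
    """Return True if an XML parser accepts the bytes as a document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False
```

With these definitions, parses(with_decl) succeeds, while 
parses(without_decl) fails because a parser that sees no encoding 
declaration falls back to UTF-8 and is_valid_utf8(latin1_body) is false.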

This is a practical problem.
In the off-the-shelf installation of Xindice 1.0 this was a non-problem 
-- that tool seems to be more tolerant.
In Xindice 1.1 ... how can one make it work?
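
One possible workaround, which is not confirmed anywhere in this thread, 
is to transcode the exported file to UTF-8 before adding it again. The 
sketch below assumes that the exported file still holds ISO-8859-1 
bytes and merely lacks the declaration; if that assumption is wrong, 
the result will be garbled. The function name transcode_to_utf8 is 
made up for illustration.

```python
import re

# Matches a leading XML declaration; it is pure ASCII, so a bytes regex is safe
# for ASCII-compatible encodings such as ISO-8859-1.
XML_DECL = re.compile(rb"^<\?xml[^>]*\?>\s*")


def transcode_to_utf8(data: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Re-encode an XML document as UTF-8 with an explicit declaration.

    source_encoding is an assumption supplied by the caller; nothing in the
    exported file says which encoding its bytes are really in.
    """
    body = XML_DECL.sub(b"", data, count=1)   # drop the old declaration, if any
    text = body.decode(source_encoding)       # interpret bytes in the assumed encoding
    return b'<?xml version="1.0" encoding="UTF-8"?>\n' + text.encode("utf-8")
```

Applied to the bytes of 36a.xml, the result could be written to a new 
file and passed to xindice ad in place of 36a.xml.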

--------------------------------
Run-time environment
--------------------------------
  
The Xindice server runs in an off-the-shelf Tomcat/Cocoon framework.
 - Tomcat 4.1.12
 - Cocoon 2.1.6


-- 
------------------------------------------------------------------
Olle Olsson   olleo@sics.se   Tel: +46 8 633 15 19  Fax: +46 8 751 72 30
	[Svenska W3C-kontoret: olleo@w3.org]
SICS [Swedish Institute of Computer Science]
Box 1263
SE - 164 29 Kista
Sweden
------------------------------------------------------------------



Re: Apache Xindice - export/import - encodings lost

Posted by Vadim Gritsenko <va...@reverycodes.com>.
Olle Olsson wrote:
> About:   xindice-1.1b4 - commandline tool
> Question: How to preserve encodings across export/import ???
> 
> --------------------------------
> Scenario explaining the problem
> --------------------------------
> Step 1.
> Adding document:
>     xindice ad -c xmldb:xindice://localhost:8080/db/foo/ -n 36.xml  -f   
> 36.xml
> on document 36.xml:
>     <?xml version="1.0" encoding="iso-8859-1"?> ... etc ...
> This works OK
> 
> Step 2.
> Retrieving document:
>     xindice rd -c xmldb:xindice://localhost:8080/db/foo/ -n 36.xml  -f 
> 36a.xml
> results in document 36a.xml:
>     <?xml version="1.0"?> ... etc ...
> Payload of extracted doc (36a.xml) is very much identical to payload of 
> original document (36.xml)
> 
> Step 3.
> Adding document:
>     xindice ad -c xmldb:xindice://localhost:8080/db/foo/ -n 36a.xml  
> -f   36a.xml
> on document 36a.xml above.
> 
> This results in error:
>     ERROR : Invalid byte 2 of 3-byte UTF-8 sequence.

This seems like a bug. Please file a bug report in Bugzilla. And if you know 
how to fix it, attach a patch to the bug report.

Thanks,
Vadim