Posted to xindice-users@xml.apache.org by Adrian Petru Dimulescu <ad...@free.fr> on 2002/06/10 22:00:17 UTC

problems with iso-8859-2 text

hello,

i've just discovered Xindice and i am planning to use it with cocoon for a
small digital library project.

the problem is that importing iso-8859-2 (east-european) xml documents results 
in "?" characters replacing every non-ascii character, rendering the 
resulting text unusable.

i currently have the 1.0 version but i played a little with a cvs build, with 
the same (non) results.

i would then appreciate any idea concerning the use of non-english text in
Xindice. somebody on the cocoon list told me that russian text and the
iso-8859-1 encoding work. i have not tried it but if anybody here can share
some experience, it would be most welcome.

thank you,
adrian.

Re: problems with iso-8859-2 text

Posted by Juozas Baliuka <ba...@centras.lt>.
No, "single encoding mode" is not a Java term.
If an application uses the Java I/O API without specifying an encoding, as in
"new InputStreamReader(in)", InputStreamReader will use a "default" encoding,
which can be set in the system properties. This is not a problem if your
application uses a single encoding (all documents are encoded the same way)
and it is the only application running in the JVM.
The constructor "InputStreamReader(InputStream in, String enc)" ignores the
"default" setting and uses the "enc" parameter to convert bytes to characters.
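
To illustrate the difference described above, here is a minimal sketch that is
not taken from Xindice. The class name ExplicitEncodingDemo and the helper
decode are made up for illustration, and the sketch assumes the JVM ships the
ISO-8859-2 charset, which is not among the charsets every Java platform is
required to support. It decodes the same byte with two explicitly named
encodings, so the result does not depend on the JVM default:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;

public class ExplicitEncodingDemo {

    // Hypothetical helper: decode bytes with an explicitly named encoding,
    // so the JVM default encoding is never consulted.
    public static String decode(byte[] bytes, String enc) {
        try (Reader reader =
                 new InputStreamReader(new ByteArrayInputStream(bytes), enc)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            // Also covers UnsupportedEncodingException for unknown names.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Byte 0xB1 is U+0105 (a with ogonek) in ISO-8859-2,
        // but U+00B1 (plus-minus sign) in ISO-8859-1.
        byte[] bytes = { (byte) 0xB1 };
        System.out.println((int) decode(bytes, "ISO-8859-2").charAt(0)); // 261
        System.out.println((int) decode(bytes, "ISO-8859-1").charAt(0)); // 177
    }
}
```

The same input byte yields two different characters, which is why reading
iso-8859-2 documents through a reader that silently falls back to another
default encoding can corrupt the text.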


> Hello,
>
> It is the first time I hear about single encoding mode. Is it a switch,
> an option to java?


RE: problems with iso-8859-2 text

Posted by Adrian Petru Dimulescu <ad...@free.fr>.
Hello,

This is the first time I have heard of a single encoding mode. Is it a switch,
an option to Java?


> -----Original Message-----
> From: Juozas Baliuka [mailto:baliuka@centras.lt] 
> Sent: Tuesday, June 11, 2002 6:26 PM
> To: xindice-users@xml.apache.org
> Subject: Re: problems with iso-8859-2 text
> 
> 
> I think it should work in single encoding mode if you set the
> JVM system property for the default encoding, "file.encoding"
> or something like that; I don't remember the exact key. This is
> common to all Java applications on all systems, if the
> application uses the "default" encoding.



Re: problems with iso-8859-2 text

Posted by Juozas Baliuka <ba...@centras.lt>.
I think it should work in single encoding mode if you set the JVM system
property for the default encoding, "file.encoding" or something like that; I
don't remember the exact key.
This is common to all Java applications on all systems, if the application
uses the "default" encoding.
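
A small sketch for checking what the JVM actually uses as its default (the
class name DefaultEncodingCheck and the method describeDefaults are made up
for illustration). Whether passing a -Dfile.encoding option on the command
line changes the default depends on the JVM version, so treat that as
something to verify on your own setup rather than a guarantee:

```java
import java.nio.charset.Charset;

public class DefaultEncodingCheck {

    // Hypothetical helper: report the "file.encoding" system property and
    // the charset the JVM uses when no encoding is passed explicitly.
    public static String describeDefaults() {
        return "file.encoding=" + System.getProperty("file.encoding")
            + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
    }
}
```

Running it before and after changing the locale or the JVM options shows
whether the setting actually reached the JVM.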


> Hi,
> It was me (on the cocoon list). To store non-ASCII characters under Xindice,
> it is necessary to set the appropriate system locale. That works at least
> for me on Windows NT.
>
> Best regards
> Roman


Re: problems with iso-8859-2 text

Posted by Kimbro Staken <ks...@xmldatabases.org>.
On Tuesday, June 11, 2002, at 01:58  PM, Adrian Petru Dimulescu wrote:
>
> On the other hand, what's happening with the UTF-8 under Xindice? I took
> a look at the black Xindice mail archive and I saw a message with a
> patch. Getting the latest Xindice CVS, I could not find this patch in
> the current source. Or perhaps I didn't look well enough?
>

As far as I know this patch was applied, but it will only work with the 
XML:DB API in scratchpad.

> I am sorry this is the twentieth time this question is asked on the list
> but, since I am new here, could anybody inform me what's the i18n status
> on Xindice?

As far as I know, the code in CVS is working with UTF-8, but the details 
are incomplete.

>
> Thank you,
> Adrian.
>
Kimbro Staken
Java and XML Software, Consulting and Writing http://www.xmldatabases.org/
Apache Xindice native XML database http://xml.apache.org/xindice
XML:DB Initiative http://www.xmldb.org


RE: problems with iso-8859-2 text

Posted by Adrian Petru Dimulescu <ad...@free.fr>.
Hello again :)

This is in fact the thing I don't understand: how can the system locale
influence the way Xindice stores the special characters?

I built Xindice under Linux, and I wonder how I should set the locale there.
Something like "LANG=RO $XINDICE_HOME/start"? I must admit I am quite
confused at this point.

On the other hand, what's happening with the UTF-8 under Xindice? I took
a look at the black Xindice mail archive and I saw a message with a
patch. Getting the latest Xindice CVS, I could not find this patch in
the current source. Or perhaps I didn't look well enough?

I am sorry this is the twentieth time this question is asked on the list
but, since I am new here, could anybody inform me what's the i18n status
on Xindice?

Thank you,
Adrian.


> -----Original Message-----
> From: KOZLOV Roman [mailto:r-kozlov@opencascade.com] 
> Sent: Tuesday, June 11, 2002 9:30 AM
> To: xindice-users@xml.apache.org
> Subject: Re: problems with iso-8859-2 text
> 
> 
> Hi,
> It was me (in cocoon list). To store non ASCII characters 
> under Xindice it is necessary to set appropriate system 
> locale. That works at least for me on Windows NT.
> 
> Best regards
> Roman



Re: problems with iso-8859-2 text

Posted by KOZLOV Roman <r-...@opencascade.com>.
Hi,
It was me (on the cocoon list). To store non-ASCII characters under Xindice, it
is necessary to set the appropriate system locale. That works at least for me
on Windows NT.
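
One possible explanation for the "?" characters, which this thread does not
confirm for Xindice itself: when Java encodes text into a charset that has no
mapping for a character, String.getBytes(Charset) substitutes the charset's
replacement byte, which is "?" for ISO-8859-1. If the default encoding derived
from the system locale cannot represent the document's characters, the same
loss would occur. A minimal sketch (the class name LossyEncodingDemo and the
method roundTripLatin1 are made up for illustration):

```java
import java.nio.charset.StandardCharsets;

public class LossyEncodingDemo {

    // Hypothetical helper: encode text as ISO-8859-1 and decode it again.
    // Characters outside ISO-8859-1 are replaced with '?' during encoding.
    public static String roundTripLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+0105 (a with ogonek) exists in ISO-8859-2 but not in ISO-8859-1.
        System.out.println(roundTripLatin1("\u0105bc")); // prints "?bc"
        System.out.println(roundTripLatin1("abc"));      // prints "abc"
    }
}
```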

Best regards
Roman

Adrian Petru Dimulescu wrote:

> hello,
>
> i've just discovered Xindice and i am planning to use it with cocoon for a
> small digital library project.
>
> the problem is that importing iso-8859-2 (east-european) xml documents results
> in "?" characters replacing every non-ascii character, rendering the
> resulting text unusable.
>
> i currently have the 1.0 version but i played a little with a cvs build, with
> the same (non) results.
>
> i would then appreciate any idea concerning the use of non-english text in
> Xindice. somebody on the cocoon list told me that russian text and the
> iso-8859-1 encoding work. i have not tried it but if anybody here can share
> some experience, it would be most welcome.
>
> thank you,
> adrian.