Posted to user@pivot.apache.org by Jérôme Serré <je...@gmail.com> on 2010/10/29 18:06:21 UTC

Character '€'

Hello,

 

The character ‘€’ in a WTKX file is displayed as a square on the screen. Is this normal?

<?xml version="1.0" encoding="ISO-8859-1"?>
.....
<BoxPane Form.label="Prix (€)">
.....

--

Regards

Jérôme Serré

 



Re: Character '€'

Posted by Niclas Hedhman <ni...@hedhman.org>.
On Mon, Nov 15, 2010 at 9:03 AM, Greg Brown <gk...@mac.com> wrote:
> The problem is that, even if the PI specifies UTF-8 for example, the file itself may be saved with a different encoding (so they may not match).

That is not a problem, but a bug with the user... Don't try to
outsmart the stupidity of others; it is a losing battle. ;-)

-- 
Niclas Hedhman, Software Developer
http://www.qi4j.org - New Energy for Java

I  live here; http://tinyurl.com/2qq9er
I  work here; http://tinyurl.com/2ymelc
I relax here; http://tinyurl.com/2cgsug

Re: Character '€'

Posted by Greg Brown <gk...@mac.com>.
Either way, you're probably right that entering it as a bug makes sense. That way we can track it and investigate further.
G

On Nov 14, 2010, at 8:03 PM, Greg Brown wrote:

> The problem is that, even if the PI specifies UTF-8 for example, the file itself may be saved with a different encoding (so they may not match).
> 


Re: Character '€'

Posted by Greg Brown <gk...@mac.com>.
The problem is that, even if the PI specifies UTF-8 for example, the file itself may be saved with a different encoding (so they may not match).
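
A minimal Java sketch of how such a mismatch could be caught before parsing, assuming the declared encoding name is already known. The class and method names (EncodingMismatchCheck, isValidFor) are hypothetical and not part of Pivot. A strict decoder reports byte sequences that are invalid in the declared encoding instead of silently replacing them:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchCheck {

    // Hypothetical helper: true if the bytes decode cleanly in the declared encoding.
    public static boolean isValidFor(byte[] fileBytes, String declaredEncoding) {
        try {
            Charset.forName(declaredEncoding)
                    .newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(fileBytes));
            return true;
        } catch (CharacterCodingException e) {
            // The bytes are not legal in the declared encoding.
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "Prix (\u20AC)"; // "Prix (€)"
        byte[] savedAsCp1252 = text.getBytes(Charset.forName("windows-1252"));
        byte[] savedAsUtf8 = text.getBytes(StandardCharsets.UTF_8);

        // File saved as windows-1252 but declared as UTF-8: rejected.
        System.out.println(isValidFor(savedAsCp1252, "UTF-8"));
        // File saved as UTF-8 and declared as UTF-8: accepted.
        System.out.println(isValidFor(savedAsUtf8, "UTF-8"));
    }
}
```

This check only helps when the declared encoding has invalid byte sequences, as UTF-8 does. A single-byte encoding such as ISO-8859-1 accepts every byte value, so a UTF-8 file declared as ISO-8859-1 passes the check and simply decodes to the wrong characters.
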

On Nov 14, 2010, at 8:00 PM, Niclas Hedhman wrote:

> On Mon, Nov 15, 2010 at 8:52 AM, Greg Brown <gk...@mac.com> wrote:
>>> Doesn't the XML deserializer you use just work correctly if you pass
>>> an InputStream instead of a Reader??
>> 
>> 
>> Actually, I think a Reader would work but we don't currently expose that API. We use javax.xml.stream.XMLInputFactory#createXMLStreamReader() to process the XML, which takes an InputStream as an argument. What we should probably do is allow the caller to specify the character set to read (there is another version of createXMLStreamReader() that takes both an InputStream and an encoding name as a String).
> 
> That is incorrect. The XML specification says that the <?xml?> processing
> instruction is in (IIRC) ASCII and that it contains the encoding of the
> rest of the document, such as <?xml version="1.0" encoding="UTF-8"?>,
> and compliant parsers should understand this. So, for instance, if
> the document is in UTF-16, the <?xml?> PI is NOT, and a regular text
> editor would have a problem handling that. For UTF-8, ISO-8859-X
> and others, the ASCII encodings coincide, so it is not so obvious.
> 


Re: Character '€'

Posted by Niclas Hedhman <ni...@hedhman.org>.
On Mon, Nov 15, 2010 at 8:52 AM, Greg Brown <gk...@mac.com> wrote:
>> Doesn't the XML deserializer you use just work correctly if you pass
>> an InputStream instead of a Reader??
>
>
> Actually, I think a Reader would work but we don't currently expose that API. We use javax.xml.stream.XMLInputFactory#createXMLStreamReader() to process the XML, which takes an InputStream as an argument. What we should probably do is allow the caller to specify the character set to read (there is another version of createXMLStreamReader() that takes both an InputStream and an encoding name as a String).

That is incorrect. The XML specification says that the <?xml?> processing
instruction is in (IIRC) ASCII and that it contains the encoding of the
rest of the document, such as <?xml version="1.0" encoding="UTF-8"?>,
and compliant parsers should understand this. So, for instance, if
the document is in UTF-16, the <?xml?> PI is NOT, and a regular text
editor would have a problem handling that. For UTF-8, ISO-8859-X
and others, the ASCII encodings coincide, so it is not so obvious.

Cheers
-- 
Niclas Hedhman, Software Developer
http://www.qi4j.org - New Energy for Java

I  live here; http://tinyurl.com/2qq9er
I  work here; http://tinyurl.com/2ymelc
I relax here; http://tinyurl.com/2cgsug
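
A minimal Java sketch of the behaviour described in this message, using the JDK's bundled StAX parser. The class name DeclaredEncodingDemo, the method readLabel, and the sample attribute "label" are hypothetical and chosen only for illustration. The document is handed to the parser as raw bytes, and the parser is left to pick the decoder from the encoding named in the XML declaration:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class DeclaredEncodingDemo {

    // Returns the "label" attribute of the first element in the document.
    public static String readLabel(byte[] xmlBytes) throws XMLStreamException {
        // An InputStream is passed, so the parser decides how to decode the bytes.
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new ByteArrayInputStream(xmlBytes));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getAttributeValue(null, "label");
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<BoxPane label=\"Qualit\u00E9\"/>";
        // Save the document in the same encoding that the declaration names.
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(readLabel(bytes));
    }
}
```

In ISO-8859-1 the character é is the single byte 0xE9, which is not valid UTF-8 when followed by a quote character, so the label can only come back intact if the parser uses the declared encoding rather than assuming UTF-8.
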

Re: Character '€'

Posted by Greg Brown <gk...@mac.com>.
>> It's because BXMLSerializer assumes that BXML files are encoded in UTF-8. There is currently no way to specify an alternate encoding.
> 
> I would categorize this as a bug. XML deserializers must respect the
> encoding, otherwise we end up with a mess ;-) especially when we have
> mixed namespaces and multiple intermixed consumers...
> 
> Doesn't the XML deserializer you use just work correctly if you pass
> an InputStream instead of a Reader??


Actually, I think a Reader would work but we don't currently expose that API. We use javax.xml.stream.XMLInputFactory#createXMLStreamReader() to process the XML, which takes an InputStream as an argument. What we should probably do is allow the caller to specify the character set to read (there is another version of createXMLStreamReader() that takes both an InputStream and an encoding name as a String).

G
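
A minimal Java sketch of the two options mentioned above, written against the standard javax.xml.stream API rather than Pivot's BXMLSerializer. The class name ExplicitEncodingDemo, its method names, and the sample attribute "label" are hypothetical. One method passes the encoding name next to the InputStream; the other decodes the bytes itself and hands the parser a Reader:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ExplicitEncodingDemo {

    // Returns the "label" attribute of the first element, then closes the reader.
    private static String firstLabel(XMLStreamReader reader) throws XMLStreamException {
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getAttributeValue(null, "label");
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    // Option 1: the caller names the encoding alongside the byte stream.
    public static String readWithEncodingName(byte[] xmlBytes, String encoding)
            throws XMLStreamException {
        InputStream in = new ByteArrayInputStream(xmlBytes);
        return firstLabel(XMLInputFactory.newInstance().createXMLStreamReader(in, encoding));
    }

    // Option 2: the caller decodes the bytes and passes characters to the parser.
    public static String readWithReader(byte[] xmlBytes, Charset charset)
            throws XMLStreamException {
        Reader chars = new InputStreamReader(new ByteArrayInputStream(xmlBytes), charset);
        return firstLabel(XMLInputFactory.newInstance().createXMLStreamReader(chars));
    }

    public static void main(String[] args) throws XMLStreamException {
        // No XML declaration here, so the encoding has to come from the caller.
        byte[] bytes = "<BoxPane label=\"Qualit\u00E9\"/>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(readWithEncodingName(bytes, "ISO-8859-1"));
        System.out.println(readWithReader(bytes, StandardCharsets.ISO_8859_1));
    }
}
```

The sample document deliberately has no XML declaration. How a parser resolves a conflict between a caller-supplied encoding and a declared one may depend on the implementation, so that case is not shown here.
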


Fwd: Character '€'

Posted by Niclas Hedhman <ni...@hedhman.org>.
(can't post to user@ :-( )

On Sat, Oct 30, 2010 at 8:08 PM, Greg Brown <gk...@mac.com> wrote:
> It's because BXMLSerializer assumes that BXML files are encoded in UTF-8. There is currently no way to specify an alternate encoding.

I would categorize this as a bug. XML deserializers must respect the
encoding, otherwise we end up with a mess ;-) especially when we have
mixed namespaces and multiple intermixed consumers...

Doesn't the XML deserializer you use just work correctly if you pass
an InputStream instead of a Reader??

Cheers
--
Niclas Hedhman, Software Developer
http://www.qi4j.org - New Energy for Java

I  live here; http://tinyurl.com/2qq9er
I  work here; http://tinyurl.com/2ymelc
I relax here; http://tinyurl.com/2cgsug




Re: Character '€'

Posted by Greg Brown <gk...@mac.com>.
It's because BXMLSerializer assumes that BXML files are encoded in UTF-8. There is currently no way to specify an alternate encoding.
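
A minimal Java sketch of what a UTF-8 assumption does to a file saved in a Latin-family encoding. It uses plain string decoding, not Pivot code, and the class and method names are hypothetical. The thread does not say which bytes were actually in the file, so the two encodings below are assumptions:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8AssumptionDemo {

    // Encodes the text as it would be saved on disk, then decodes it as UTF-8.
    public static String decodeAsUtf8(String text, String savedAs) {
        byte[] onDisk = text.getBytes(Charset.forName(savedAs));
        return new String(onDisk, StandardCharsets.UTF_8);
    }

    // Prints each character as a code point so the result does not depend on the console.
    private static void dump(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            out.append(String.format("U+%04X ", (int) c));
        }
        System.out.println(out.toString().trim());
    }

    public static void main(String[] args) {
        String label = "Prix (\u20AC)"; // "Prix (€)"
        // In ISO-8859-15 the euro sign is byte 0xA4; in windows-1252 it is 0x80.
        // Neither is a valid UTF-8 sequence on its own, so each becomes U+FFFD.
        dump(decodeAsUtf8(label, "ISO-8859-15"));
        dump(decodeAsUtf8(label, "windows-1252"));
    }
}
```

In both cases the euro sign comes back as U+FFFD, the Unicode replacement character, which many fonts draw as a box or question mark. That would be consistent with the square reported in the original question.
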

On Oct 30, 2010, at 6:27 AM, Jérôme Serré wrote:

> Yes, but that doesn't work either.
> 


RE: Character '€'

Posted by Jérôme Serré <je...@gmail.com>.
Yes, but that doesn't work either.

-----Original Message-----
From: Thomas Leclaire [mailto:zeusviper@gmail.com]
Sent: Friday, October 29, 2010 20:36
To: user@pivot.apache.org
Subject: Re: Character '€'

Hi!

The euro symbol is not in ISO 8859-1 but in ISO 8859-15.


see http://fr.wikipedia.org/wiki/ISO_8859-1#ISO_8859-15

Regards,
Thomas




Re: Character '€'

Posted by Thomas Leclaire <ze...@gmail.com>.
In fact, there's no valid reason not to use UTF-8.

However, particularly on French Windows systems, there is still a lot of software that doesn't handle UTF-8 well.
The same goes for some web hosting providers that still offer databases locked to CP-1252 (the Windows encoding).
There are also historical reasons for web apps (it was really difficult to use UTF-8 before MySQL 4 and PHP 5).
So, depending on the context, working with UTF-8 can sometimes be really difficult!

But we agree, UTF-8 is the best choice!


On Oct 29, 2010, at 20:40, Greg Brown wrote:

> Good to know. But I'd still ask, why not just use UTF-8?  ;-)
> 


Re: Character '€'

Posted by Greg Brown <gk...@mac.com>.
Good to know. But I'd still ask, why not just use UTF-8?  ;-)

On Oct 29, 2010, at 2:36 PM, Thomas Leclaire wrote:

> Hi!
> 
> The euro symbol is not in ISO 8859-1 but in ISO 8859-15.
> 
> 
> see http://fr.wikipedia.org/wiki/ISO_8859-1#ISO_8859-15
> 
> Regards,
> Thomas
> 
> 


Re: Character '€'

Posted by Thomas Leclaire <ze...@gmail.com>.
Hi!

The euro symbol is not in ISO 8859-1 but in ISO 8859-15.


see http://fr.wikipedia.org/wiki/ISO_8859-1#ISO_8859-15

Regards,
Thomas
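
A minimal Java sketch that checks the point above with the JDK's own charset tables. The class name EuroCharsetCheck and the method canEncode are hypothetical. Each call asks a charset whether it has any byte representation for a given character:

```java
import java.nio.charset.Charset;

public class EuroCharsetCheck {

    // True if the named charset can represent the character at all.
    public static boolean canEncode(String charsetName, char c) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char euro = '\u20AC';    // €
        char eAcute = '\u00E9';  // é

        System.out.println("ISO-8859-1   euro: " + canEncode("ISO-8859-1", euro));   // false
        System.out.println("ISO-8859-15  euro: " + canEncode("ISO-8859-15", euro));  // true
        System.out.println("windows-1252 euro: " + canEncode("windows-1252", euro)); // true
        System.out.println("UTF-8        euro: " + canEncode("UTF-8", euro));        // true

        // The accented French letters are available in all four encodings.
        System.out.println("ISO-8859-1   e-acute: " + canEncode("ISO-8859-1", eAcute)); // true
    }
}
```

ISO 8859-1 covers é and à but has no code for the euro sign, so a file that is really saved as ISO 8859-1 cannot contain '€' at all. ISO 8859-15, windows-1252, and UTF-8 all can.
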


On Oct 29, 2010, at 18:23, Jérôme Serré wrote:

> Because I have to use French characters such as é, à, etc.
> 


RE: Character '€'

Posted by Jérôme Serré <je...@gmail.com>.
OK, thanks.

-----Original Message-----
From: Greg Brown [mailto:gkbrown@mac.com]
Sent: Friday, October 29, 2010 18:34
To: user@pivot.apache.org
Subject: Re: Character '€'

Those characters are supported in UTF-8.



Re: Character '€'

Posted by Greg Brown <gk...@mac.com>.
Those characters are supported in UTF-8.

On Oct 29, 2010, at 12:23 PM, Jérôme Serré wrote:

> Because I have to use French characters such as é, à, etc.
> 


RE: Character '€'

Posted by Jérôme Serré <je...@gmail.com>.
Because I have to use French characters such as é, à, etc.

-----Original Message-----
From: Greg Brown [mailto:gkbrown@mac.com]
Sent: Friday, October 29, 2010 18:19
To: user@pivot.apache.org
Subject: Re: Character '€'

Probably a file encoding mismatch. Why are you using ISO-8859 instead of UTF-8?



Re: Character '€'

Posted by Greg Brown <gk...@mac.com>.
Probably a file encoding mismatch. Why are you using ISO-8859 instead of UTF-8?

On Oct 29, 2010, at 12:06 PM, Jérôme Serré wrote:

> Hello,
> 
> 
> 
> The character ‘€’ in a WTKX file is displayed as a square on the screen. Is this normal?
> 
> <?xml version="1.0" encoding="ISO-8859-1"?>
> .....
> <BoxPane Form.label="Prix (€)">
> .....
> 
> --
> 
> Regards
> 
> Jérôme Serré
> 
> 
> 
>