Posted to dev@cocoon.apache.org by Sylvain Wallez <sy...@apache.org> on 2005/03/11 20:05:04 UTC
Flowscript encoding weirdness and a solution
Hi all,
I encountered some weird things with a flowscript containing strings
with accented characters, saved in UTF-8. This is because the flow
interpreter uses the platform's default encoding to read script files.
And of course this default encoding isn't the same on Windows and Mac...
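To illustrate the kind of "weird things" meant here (the post does not show the actual symptoms, so this is an assumption), the sketch below encodes a French word as UTF-8 and decodes the bytes with a single-byte charset. ISO-8859-1 stands in for "some other platform default". The class name `EncodingMismatch` and the helper `decodeAs` are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {
    // Encode text with one charset, then decode the bytes with another,
    // as happens when a script saved in one encoding is read in a different one.
    static String decodeAs(String text, Charset savedAs, Charset readAs) {
        byte[] bytes = text.getBytes(savedAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        // French "element" with both e's accented (U+00E9), written with
        // unicode escapes so this example does not itself depend on the
        // encoding javac uses to read source files.
        String word = "\u00e9l\u00e9ment";

        // Saved and read as UTF-8: unchanged.
        System.out.println(decodeAs(word, StandardCharsets.UTF_8, StandardCharsets.UTF_8));
        // Saved as UTF-8 but read as ISO-8859-1: each accented e (bytes C3 A9)
        // turns into the two characters U+00C3 U+00A9.
        System.out.println(decodeAs(word, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
        // What the JVM uses when no encoding is given explicitly; varies per machine.
        System.out.println("Platform default: " + Charset.defaultCharset());
    }
}
```

On the second line each "é" comes out as "Ã©", which is the classic sign of UTF-8 read with the wrong charset.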
To solve this, I added the possibility to specify the file's encoding as
a comment in the very first line of the script, e.g.
// encoding = UTF-8
function blah()
...
If no special comment exists, we fall back to the platform's default
encoding as of today.
This works beautifully, and I'm thinking of adding this to 2.1 even if
(or especially because) the release is coming soon.
WDYT?
--
Sylvain Wallez Anyware Technologies
http://www.apache.org/~sylvain http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Re: Flowscript encoding weirdness and a solution
Posted by Sylvain Wallez <sy...@apache.org>.
Bertrand Delacretaz wrote:
> On 11 Mar 05, at 21:42, Sylvain Wallez wrote:
>
>> ....Or even a more javadoc-like
>>
>> // @encoding UTF-8...
>
>
> Looks good.
>
> Note that IIUC the same problem exists for java source files: unless
> the -encoding switch is used for javac, the default platform encoding
> is used to compile. Should we add it to our build targets?
>
> I haven't seen problems, but if you have a use case for encoded
> strings in flowscript it might apply to java source code as well.
Over time, I have written a small (but useful) library of flowscript
dialog functions inspired by javax.swing.JOptionPane. For example, I can
write:
if (Dialog.confirm("Item already exists. Overwrite it?")) {
    overwrite();
} else {
    cancel();
}
As you can see, the message is the one displayed to the user, and may
therefore contain accented letters in French. There's also an i18n-ized
version, but setting up a dictionary is overkill for quick
single-language demos and prototypes.
I never encountered this problem in Java classes as they're used as
logic components and therefore don't produce user-readable messages, and
also because encoding problems are solved at compilation time and not at
runtime.
Now with Javaflow + CompilingClassLoader, which compiles Java sources at
runtime, this problem is quite likely to arise. So the source encoding
should probably be a setting of the CompilingClassLoader.
Sylvain
Re: Flowscript encoding weirdness and a solution
Posted by Bertrand Delacretaz <bd...@apache.org>.
On 11 Mar 05, at 21:42, Sylvain Wallez wrote:
> ....Or even a more javadoc-like
>
> // @encoding UTF-8...
Looks good.
Note that IIUC the same problem exists for java source files: unless
the -encoding switch is used for javac, the default platform encoding
is used to compile. Should we add it to our build targets?
I haven't seen problems, but if you have a use case for encoded strings
in flowscript it might apply to java source code as well.
-Bertrand
Re: Flowscript encoding weirdness and a solution
Posted by Sylvain Wallez <sy...@apache.org>.
Stefano Mazzocchi wrote:
> how about
>
> //@ encoding = UTF-8
>
> instead? so that we can discriminate between comments and 'metadata
> comments'?
Or even a more javadoc-like
// @encoding UTF-8
However, just like <?xml encoding="..."?>, this comment must appear on
the _first_ line: a PushbackInputStream, whose pushback buffer has a fixed
size, is used to re-read the script with the correct encoding, so we
cannot do any complicated parsing to determine the encoding.
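A minimal sketch of the PushbackInputStream technique described above, assuming a fixed look-ahead of 1024 bytes and reusing the regexp from Sylvain's follow-up in this thread. This is not Cocoon's actual code (the post does not show it); `ScriptReaderFactory`, `open` and `MAX_FIRST_LINE` are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScriptReaderFactory {
    // Hypothetical look-ahead limit; it is also the pushback buffer size,
    // which is why the declaration has to sit on the first line.
    static final int MAX_FIRST_LINE = 1024;

    // The regexp from the follow-up message in this thread.
    static final Pattern ENCODING = Pattern.compile("^.*encoding\\s*=\\s*([^\\s]+)");

    static Reader open(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, MAX_FIRST_LINE);

        // Read raw bytes up to the first line break, end of stream, or the limit.
        byte[] buf = new byte[MAX_FIRST_LINE];
        int len = 0;
        while (len < buf.length) {
            int b = pin.read();
            if (b == -1) {
                break;
            }
            buf[len++] = (byte) b;
            if (b == '\n' || b == '\r') {
                break;
            }
        }

        // Push the bytes back so the returned reader starts at the very beginning.
        pin.unread(buf, 0, len);

        // Assumes the declaration is plain ASCII: ISO-8859-1 maps every byte to
        // a char, so the comment can be inspected before the charset is known.
        String firstLine = new String(buf, 0, len, StandardCharsets.ISO_8859_1);

        Charset charset = Charset.defaultCharset(); // fallback: platform default
        Matcher m = ENCODING.matcher(firstLine);
        if (m.find()) {
            // Unknown names throw an IllegalArgumentException subclass.
            charset = Charset.forName(m.group(1));
        }
        return new InputStreamReader(pin, charset);
    }
}
```

Because the bytes are unread before the InputStreamReader is created, the declaration line itself is still part of the script text handed to the interpreter.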
Sylvain
Re: Flowscript encoding weirdness and a solution
Posted by Sylvain Wallez <sy...@apache.org>.
Stefano Mazzocchi wrote:
>>> how about
>>>
>>> //@ encoding = UTF-8
>>>
>>> instead? so that we can discriminate between comments and 'metadata
>>> comments'?
>>>
>>
>>
>> had a similar reflex, but from a different angle though:
>> namely by considering how vim is doing this:
>>
>> // vim: set fileencoding=iso-8859-1 nu ai:
>>
>> so: I surely like the @ idea, but am doubting if we shouldn't
>> 'namespace' it some more (god knows how many more apps out there
>> might be willing to do interesting annotations inside comments)
>>
>>
>> thinking of annotations, and the resemblance of js to java: we could
>> require /** comments?
>> (which is not single line however, so stretches the first-line
>> requirement)
>
>
> here people would suggest embedding RDF in it ;-)
>
> KISS!
Ok, so here's the regexp:
^.*encoding\s*=\s*([^\s]+)
This matches "encoding = xxx" on the first line with any amount of
whitespace around "=" and with anything you like before "encoding", be
it "//", "// @" or "// vim: set file".
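To check which first-line variants from this thread the regexp actually accepts, here is a small Java sketch using `java.util.regex`. The class and method names are hypothetical; the pattern is the one above, escaped as a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodingRegexDemo {
    // The regexp from the message, written as a Java string literal.
    static final Pattern ENCODING = Pattern.compile("^.*encoding\\s*=\\s*([^\\s]+)");

    // Returns the declared encoding, or null when the line declares none.
    static String declaredEncoding(String firstLine) {
        Matcher m = ENCODING.matcher(firstLine);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "// encoding = UTF-8",
            "//@ encoding = UTF-8",
            "// vim: set fileencoding=iso-8859-1 nu ai:",
            "// @encoding UTF-8",
            "function blah() {"
        };
        for (String line : samples) {
            System.out.println(line + "  ->  " + declaredEncoding(line));
        }
    }
}
```

The first three lines yield UTF-8, UTF-8 and iso-8859-1 (the capture stops at the first whitespace, so vim's trailing options are ignored). Note that the javadoc-like `// @encoding UTF-8` variant yields null, because the pattern requires an `=`.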
Sylvain
Re: Flowscript encoding weirdness and a solution
Posted by Stefano Mazzocchi <st...@apache.org>.
Marc Portier wrote:
>
>
> Stefano Mazzocchi wrote:
>
>> how about
>>
>> //@ encoding = UTF-8
>>
>> instead? so that we can discriminate between comments and 'metadata
>> comments'?
>>
>
>
> had a similar reflex, but from a different angle though:
> namely by considering how vim is doing this:
>
> // vim: set fileencoding=iso-8859-1 nu ai:
>
> so: I surely like the @ idea, but am doubting if we shouldn't
> 'namespace' it some more (god knows how many more apps out there might
> be willing to do interesting annotations inside comments)
>
>
> thinking of annotations, and the resemblance of js to java: we could
> require /** comments?
> (which is not single line however, so stretches the first-line requirement)
here people would suggest embedding RDF in it ;-)
KISS!
--
Stefano.
Re: Flowscript encoding weirdness and a solution
Posted by Marc Portier <mp...@outerthought.org>.
Stefano Mazzocchi wrote:
> how about
>
> //@ encoding = UTF-8
>
> instead? so that we can discriminate between comments and 'metadata
> comments'?
>
had a similar reflex, but from a different angle though:
namely by considering how vim is doing this:
// vim: set fileencoding=iso-8859-1 nu ai:
so: I surely like the @ idea, but am doubting if we shouldn't
'namespace' it some more (god knows how many more apps out there might
be willing to do interesting annotations inside comments)
thinking of annotations, and the resemblance of js to java: we could
require /** comments?
(which is not single line however, so stretches the first-line requirement)
-marc=
--
Marc Portier http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at http://blogs.cocoondev.org/mpo/
mpo@outerthought.org mpo@apache.org
Re: Flowscript encoding weirdness and a solution
Posted by Stefano Mazzocchi <st...@apache.org>.
Sylvain Wallez wrote:
> Hi all,
>
> I encountered some weird things with a flowscript containing strings
> with accented characters, saved in UTF-8. This is because the flow
> interpreter uses the platform's default encoding to read script files.
> And of course this default encoding isn't the same on Windows and Mac...
>
> To solve this, I added the possibility to specify the file's encoding as
> a comment in the very first line of the script, e.g.
>
> // encoding = UTF-8
> function blah()
> ...
>
> If no special comment exists, we fall back to the platform's default
> encoding as of today.
>
> This works beautifully, and I'm thinking of adding this to 2.1 even if
> (or especially because) the release is coming soon.
how about
//@ encoding = UTF-8
instead? so that we can discriminate between comments and 'metadata
comments'?
--
Stefano.