Posted to dev@freemarker.apache.org by Daniel Dekany <dd...@freemail.hu> on 2017/03/19 18:22:28 UTC

[FM3] Rename encoding to charset, use Charset instead of String

We have this retro terminology where instead of charset we say
encoding. (I understand that encoding has a wider meaning, but we only
intend to support encoding/decoding via a charset.) So I think
cfg.setDefaultEncoding and <#ftl encoding=...> and such should be
renamed to cfg.setDefaultCharset and <#ftl charset=...>.

Also, in the Java APIs we should use Charset instead of a String
(java.nio.charset.Charset didn't exist when FM 2.3 was created).
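
As a rough illustration of the second point, here is a minimal sketch
using only the JDK (the class and method names are invented for this
example and are not FreeMarker API). It shows why a Charset-typed
setting is attractive: an invalid name is rejected once, where the
value is supplied, rather than surfacing later as a checked
UnsupportedEncodingException wherever the String is finally used.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of FreeMarker.
final class CharsetVersusStringSketch {

    // String-based style: the name is only checked when it is finally used,
    // and every such call site has to deal with a checked exception.
    static byte[] encodeWithName(String text, String charsetName)
            throws UnsupportedEncodingException {
        return text.getBytes(charsetName);
    }

    // Charset-based style: the value is already a valid charset, so no
    // checked exception is involved.
    static byte[] encodeWithCharset(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // Converts a name once, at the configuration boundary. Charset.forName
    // throws an unchecked IllegalArgumentException subclass for illegal or
    // unsupported names.
    static Charset toCharset(String charsetName) {
        return Charset.forName(charsetName);
    }

    public static void main(String[] args) throws Exception {
        Charset cs = toCharset("utf-8"); // charset names are case-insensitive
        System.out.println(cs.equals(StandardCharsets.UTF_8));
        System.out.println(encodeWithCharset("abc", cs).length);
        System.out.println(encodeWithName("abc", "UTF-8").length);
    }
}
```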

-- 
Thanks,
 Daniel Dekany


Re: [FM3] Rename encoding to charset, use Charset instead of String

Posted by Daniel Dekany <dd...@apache.org>.
Since nobody has objected, I have implemented and committed what's
described below (about a week ago). That is, the setting names are
"sourceEncoding", "outputEncoding", and "URLEscapingCharset". Of
course, all these settings have java.nio.charset.Charset type (as
opposed to String in FM2).

But nothing is set in stone until there's a release. So people other
than me need to take the time to check this out and criticize it,
because more eyes see more. As always,
src/manual/en_US/FM3-CHANGE-LOG.txt contains the (not entirely
trivial) changes made so far.

BTW, right now I'm working towards immutable Configurations (the
builder thing), which kind of implies immutable TemplateConfigurations
and immutable Templates. I'm in the middle of this, so that part is a
bit messy ATM, but it compiles and is supposed to work without bugs.
(But it's not backward compatible, mind you.)
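
To make the "builder thing" a bit more concrete, here is a purely
hypothetical sketch of an immutable settings object with three
Charset-typed settings named as above. It is not the FM3
Configuration API; the class name, the accessor and builder method
names, and the UTF-8 defaults are all assumptions made for
illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Hypothetical illustration only; this is not the FM3 Configuration API.
final class EncodingSettingsSketch {
    private final Charset sourceEncoding;
    private final Charset outputEncoding;
    private final Charset urlEscapingCharset;

    // Only the builder can create instances; all fields are final, so an
    // instance cannot change after build().
    private EncodingSettingsSketch(Builder b) {
        this.sourceEncoding = b.sourceEncoding;
        this.outputEncoding = b.outputEncoding;
        this.urlEscapingCharset = b.urlEscapingCharset;
    }

    Charset getSourceEncoding() { return sourceEncoding; }
    Charset getOutputEncoding() { return outputEncoding; }
    Charset getURLEscapingCharset() { return urlEscapingCharset; }

    static final class Builder {
        // The UTF-8 defaults are an assumption of this sketch.
        private Charset sourceEncoding = StandardCharsets.UTF_8;
        private Charset outputEncoding = StandardCharsets.UTF_8;
        private Charset urlEscapingCharset = StandardCharsets.UTF_8;

        Builder sourceEncoding(Charset cs) {
            this.sourceEncoding = Objects.requireNonNull(cs, "sourceEncoding");
            return this;
        }

        Builder outputEncoding(Charset cs) {
            this.outputEncoding = Objects.requireNonNull(cs, "outputEncoding");
            return this;
        }

        Builder urlEscapingCharset(Charset cs) {
            this.urlEscapingCharset = Objects.requireNonNull(cs, "urlEscapingCharset");
            return this;
        }

        EncodingSettingsSketch build() {
            return new EncodingSettingsSketch(this);
        }
    }
}
```

Usage would look something like
new EncodingSettingsSketch.Builder().sourceEncoding(StandardCharsets.ISO_8859_1).build().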


Monday, March 27, 2017, 4:03:23 PM, Daniel Dekany wrote:

> I have second thoughts regarding encoding VS charset... When it's
> about the charset of a file, people always seem to use "encoding":
>
> - In Eclipse the setting name is called "Text file encoding"
> - In IntelliJ it's called "File Encoding"
> - In Notepad++ the related top-level menu point is "Encoding"
> - In TextMate it's called "File Encoding"
>
> I didn't run into any case where an editor uses the term "charset" for
> this.
>
> Worse, there's this: <?xml version="1.0" encoding="UTF-8"?>. We also
> have `<#ftl encoding="...">`. Because it's somewhat reminiscent of the
> XML declaration, people will tend to write "encoding". Of course, we
> can have "encoding" there and still call the setting sourceCharset,
> but that would be a bit confusing.
>
> Things get less obvious when it comes to settings like
> URLEscapingCharset and outputEncoding (these are the FM2 names)...
>
> For URL escaping... first of all, the FM2 name isn't very good, as
> this kind of escaping is called "URL encoding", not escaping. (But FM
> has auto-escaping, and all the related directives and built-ins, all
> using the term escaping, so I guess that's how it got the wrong name.)
> Anyway, I think the charset term is used more often in this context
> (or rather the mutated forms of it, like "character set"). Certainly
> because it would be confusing to talk about the encoding used for URL
> encoding, as opposed to the charset used for URL encoding.
>
> For the charset of the output, the Content-Type HTTP response header
> has "charset". So developers often talk about the charset,
> rather than about the encoding. But, web browsers call this thing the
> encoding of the page, though that's because from their side it's
> analogous to opening a file, so they inherit the terminology from file
> editors.
>
> So, yeah... you can't be consistent with everything. Maybe the charset
> VS encoding terminology choices of FM2 were the right compromise.
> Except that we will still say "sourceEncoding" instead of just
> "encoding", and use the Charset type instead of String.
>
>
> Friday, March 24, 2017, 4:50:09 PM, Woonsan Ko wrote:
>
>> On Tue, Mar 21, 2017 at 2:39 PM, Daniel Dekany <dd...@apache.org> wrote:
>>> Tuesday, March 21, 2017, 3:31:56 PM, Woonsan Ko wrote:
>>>
>>>> +1 on both.
>>>
>>> Furthermore, as the "encoding" parameter of
>>> getTemplate/#include/#import was removed in FM3, the
>>> locale-to-encoding map (`Configuration.setEncoding(Locale, String)`)
>>> was also removed. So now it should just be `charset`, not
>>> `defaultCharset` (similarly as we have Template.charset). However,
>>> that name is still pretty bad, as it doesn't tell what it is the
>>> charset of. It's the charset of the template file when we read it.
>>> So, maybe, it should be "sourceCharset"?
>>
>> Yes, "sourceCharset" helps clarify the meaning, indeed!
>>
>> Cheers,
>>
>> Woonsan
>>
>>>
>>>> Woonsan
>>>>
>>>> On Sun, Mar 19, 2017 at 2:22 PM, Daniel Dekany <dd...@freemail.hu> wrote:
>>>>> We have this retro terminology where instead of charset we say
>>>>> encoding. (I understand that encoding has a wider meaning, but we only
>>>>> intend to support encoding/decoding via a charset.) So I think
>>>>> cfg.setDefaultEncoding and <#ftl encoding=...> and such should be
>>>>> renamed to cfg.setDefaultCharset and <#ftl charset=...>.
>>>>>
>>>>> Also, in the Java API-s we should use Charset instead of a String
>>>>> (java.nio.charset.Charset didn't exist when FM 2.3 was created).
>>>>>
>>>>> --
>>>>> Thanks,
>>>>>  Daniel Dekany
>>>>>
>>>>
>>>
>>> --
>>> Thanks,
>>>  Daniel Dekany
>>>
>>
>

-- 
Thanks,
 Daniel Dekany


Re: [FM3] Rename encoding to charset, use Charset instead of String

Posted by Daniel Dekany <dd...@apache.org>.
I have second thoughts regarding encoding VS charset... When it's
about the charset of a file, people always seem to use "encoding":

- In Eclipse the setting name is called "Text file encoding"
- In IntelliJ it's called "File Encoding"
- In Notepad++ the related top-level menu point is "Encoding"
- In TextMate it's called "File Encoding"

I didn't run into any case where an editor uses the term "charset" for
this.

Worse, there's this: <?xml version="1.0" encoding="UTF-8"?>. We also
have `<#ftl encoding="...">`. Because it's somewhat reminiscent of the
XML declaration, people will tend to write "encoding". Of course, we
can have "encoding" there and still call the setting sourceCharset,
but that would be a bit confusing.

Things get less obvious when it comes to settings like
URLEscapingCharset and outputEncoding (these are the FM2 names)...

For URL escaping... first of all, the FM2 name isn't very good, as
this kind of escaping is called "URL encoding", not escaping. (But FM
has auto-escaping, and all the related directives and built-ins, all
using the term escaping, so I guess that's how it got the wrong name.)
Anyway, I think the charset term is used more often in this context
(or rather the mutated forms of it, like "character set"). Certainly
because it would be confusing to talk about the encoding used for URL
encoding, as opposed to the charset used for URL encoding.
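
To show what "the charset used for URL encoding" means in practice,
here is a small JDK-only sketch (the class and method names are made
up for this example; it is not how FreeMarker implements the
setting). The same text gives different percent-escapes depending on
the charset, because the charset decides which bytes get escaped.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of FreeMarker.
final class UrlEncodingCharsetSketch {

    // URL-encodes the text, converting characters to bytes with the given
    // charset before percent-escaping them.
    static String urlEncode(String text, Charset charset) {
        try {
            // The (String, String) overload exists in all Java versions.
            return URLEncoder.encode(text, charset.name());
        } catch (UnsupportedEncodingException e) {
            // Should not happen: the name comes from an existing Charset.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 au lait"; // \u00e9 is e with acute accent
        System.out.println(urlEncode(text, StandardCharsets.UTF_8));      // caf%C3%A9+au+lait
        System.out.println(urlEncode(text, StandardCharsets.ISO_8859_1)); // caf%E9+au+lait
    }
}
```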

For the charset of the output, the Content-Type HTTP response header
has "charset". So developers often talk about the charset,
rather than about the encoding. But, web browsers call this thing the
encoding of the page, though that's because from their side it's
analogous to opening a file, so they inherit the terminology from file
editors.
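
For reference, this is roughly how the output charset shows up on the
HTTP side. The sketch below uses only the JDK; the class and method
names are invented for illustration and do not come from FreeMarker
or any servlet API. The "charset" parameter of Content-Type names the
charset that was used to turn the output text into bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of FreeMarker.
final class OutputCharsetSketch {

    // Builds a Content-Type header value such as "text/html; charset=UTF-8".
    static String contentType(String mimeType, Charset outputCharset) {
        return mimeType + "; charset=" + outputCharset.name();
    }

    // Encodes the rendered text into the bytes sent as the response body.
    static byte[] toBody(String renderedText, Charset outputCharset) {
        return renderedText.getBytes(outputCharset);
    }

    public static void main(String[] args) {
        Charset cs = StandardCharsets.UTF_8;
        System.out.println(contentType("text/html", cs)); // text/html; charset=UTF-8
        // \u00e9 (e with acute accent) takes two bytes in UTF-8.
        System.out.println(toBody("\u00e9", cs).length);  // 2
    }
}
```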

So, yeah... you can't be consistent with everything. Maybe the charset
VS encoding terminology choices of FM2 were the right compromise.
Except that we will still say "sourceEncoding" instead of just
"encoding", and use the Charset type instead of String.


Friday, March 24, 2017, 4:50:09 PM, Woonsan Ko wrote:

> On Tue, Mar 21, 2017 at 2:39 PM, Daniel Dekany <dd...@apache.org> wrote:
>> Tuesday, March 21, 2017, 3:31:56 PM, Woonsan Ko wrote:
>>
>>> +1 on both.
>>
>> Furthermore, as the "encoding" parameter of
>> getTemplate/#include/#import was removed in FM3, the
>> locale-to-encoding map (`Configuration.setEncoding(Locale, String)`)
>> was also removed. So now it should just be `charset`, not
>> `defaultCharset` (similarly as we have Template.charset). However,
>> that name is still pretty bad, as it doesn't tell what it is the
>> charset of. It's the charset of the template file when we read it.
>> So, maybe, it should be "sourceCharset"?
>
> Yes, "sourceCharset" helps clarify the meaning, indeed!
>
> Cheers,
>
> Woonsan
>
>>
>>> Woonsan
>>>
>>> On Sun, Mar 19, 2017 at 2:22 PM, Daniel Dekany <dd...@freemail.hu> wrote:
>>>> We have this retro terminology where instead of charset we say
>>>> encoding. (I understand that encoding has a wider meaning, but we only
>>>> intend to support encoding/decoding via a charset.) So I think
>>>> cfg.setDefaultEncoding and <#ftl encoding=...> and such should be
>>>> renamed to cfg.setDefaultCharset and <#ftl charset=...>.
>>>>
>>>> Also, in the Java API-s we should use Charset instead of a String
>>>> (java.nio.charset.Charset didn't exist when FM 2.3 was created).
>>>>
>>>> --
>>>> Thanks,
>>>>  Daniel Dekany
>>>>
>>>
>>
>> --
>> Thanks,
>>  Daniel Dekany
>>
>

-- 
Thanks,
 Daniel Dekany


Re: [FM3] Rename encoding to charset, use Charset instead of String

Posted by Jacques Le Roux <ja...@les7arts.com>.
I agree on all points

Jacques


On 24/03/2017 at 15:50, Woonsan Ko wrote:
> On Tue, Mar 21, 2017 at 2:39 PM, Daniel Dekany <dd...@apache.org> wrote:
>> Tuesday, March 21, 2017, 3:31:56 PM, Woonsan Ko wrote:
>>
>>> +1 on both.
>> Furthermore, as the "encoding" parameter of
>> getTemplate/#include/#import was removed in FM3, the
>> locale-to-encoding map (`Configuration.setEncoding(Locale, String)`)
>> was also removed. So now it should just be `charset`, not
>> `defaultCharset` (similarly as we have Template.charset). However,
>> that name is still pretty bad, as it doesn't tell what it is the
>> charset of. It's the charset of the template file when we read it.
>> So, maybe, it should be "sourceCharset"?
> Yes, "sourceCharset" helps clarify the meaning, indeed!
>
> Cheers,
>
> Woonsan
>
>>> Woonsan
>>>
>>> On Sun, Mar 19, 2017 at 2:22 PM, Daniel Dekany <dd...@freemail.hu> wrote:
>>>> We have this retro terminology where instead of charset we say
>>>> encoding. (I understand that encoding has a wider meaning, but we only
>>>> intend to support encoding/decoding via a charset.) So I think
>>>> cfg.setDefaultEncoding and <#ftl encoding=...> and such should be
>>>> renamed to cfg.setDefaultCharset and <#ftl charset=...>.
>>>>
>>>> Also, in the Java API-s we should use Charset instead of a String
>>>> (java.nio.charset.Charset didn't exist when FM 2.3 was created).
>>>>
>>>> --
>>>> Thanks,
>>>>   Daniel Dekany
>>>>
>> --
>> Thanks,
>>   Daniel Dekany
>>


Re: [FM3] Rename encoding to charset, use Charset instead of String

Posted by Woonsan Ko <wo...@apache.org>.
On Tue, Mar 21, 2017 at 2:39 PM, Daniel Dekany <dd...@apache.org> wrote:
> Tuesday, March 21, 2017, 3:31:56 PM, Woonsan Ko wrote:
>
>> +1 on both.
>
> Furthermore, as the "encoding" parameter of
> getTemplate/#include/#import was removed in FM3, the
> locale-to-encoding map (`Configuration.setEncoding(Locale, String)`)
> was also removed. So now it should just be `charset`, not
> `defaultCharset` (similarly as we have Template.charset). However,
> that name is still pretty bad, as it doesn't tell what it is the
> charset of. It's the charset of the template file when we read it.
> So, maybe, it should be "sourceCharset"?

Yes, "sourceCharset" helps clarify the meaning, indeed!

Cheers,

Woonsan

>
>> Woonsan
>>
>> On Sun, Mar 19, 2017 at 2:22 PM, Daniel Dekany <dd...@freemail.hu> wrote:
>>> We have this retro terminology where instead of charset we say
>>> encoding. (I understand that encoding has a wider meaning, but we only
>>> intend to support encoding/decoding via a charset.) So I think
>>> cfg.setDefaultEncoding and <#ftl encoding=...> and such should be
>>> renamed to cfg.setDefaultCharset and <#ftl charset=...>.
>>>
>>> Also, in the Java API-s we should use Charset instead of a String
>>> (java.nio.charset.Charset didn't exist when FM 2.3 was created).
>>>
>>> --
>>> Thanks,
>>>  Daniel Dekany
>>>
>>
>
> --
> Thanks,
>  Daniel Dekany
>

Re: [FM3] Rename encoding to charset, use Charset instead of String

Posted by Daniel Dekany <dd...@apache.org>.
Tuesday, March 21, 2017, 3:31:56 PM, Woonsan Ko wrote:

> +1 on both.

Furthermore, as the "encoding" parameter of
getTemplate/#include/#import was removed in FM3, the
locale-to-encoding map (`Configuration.setEncoding(Locale, String)`)
was also removed. So now it should just be `charset`, not
`defaultCharset` (similarly as we have Template.charset). However,
that name is still pretty bad, as it doesn't tell what it is the
charset of. It's the charset of the template file when we read it.
So, maybe, it should be "sourceCharset"?
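
To illustrate what "the charset of the template file when we read it"
amounts to, here is a minimal JDK-only sketch. The class and method
names are invented for this example and are not FreeMarker API. The
same bytes on disk decode to different text depending on which source
charset is assumed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of FreeMarker.
final class SourceCharsetSketch {

    // Decodes the raw content of a template file into text, using the
    // charset that the file is assumed to be stored in.
    static String decodeTemplateSource(byte[] fileContent, Charset sourceCharset) {
        return new String(fileContent, sourceCharset);
    }

    public static void main(String[] args) {
        // The two bytes that UTF-8 uses for U+00E9 (e with acute accent).
        byte[] fileContent = {(byte) 0xC3, (byte) 0xA9};

        // With the right source charset this is one character...
        String asUtf8 = decodeTemplateSource(fileContent, StandardCharsets.UTF_8);
        // ...with the wrong one it becomes two unrelated characters.
        String asLatin1 = decodeTemplateSource(fileContent, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.length());   // 1
        System.out.println(asLatin1.length()); // 2
    }
}
```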

> Woonsan
>
> On Sun, Mar 19, 2017 at 2:22 PM, Daniel Dekany <dd...@freemail.hu> wrote:
>> We have this retro terminology where instead of charset we say
>> encoding. (I understand that encoding has a wider meaning, but we only
>> intend to support encoding/decoding via a charset.) So I think
>> cfg.setDefaultEncoding and <#ftl encoding=...> and such should be
>> renamed to cfg.setDefaultCharset and <#ftl charset=...>.
>>
>> Also, in the Java API-s we should use Charset instead of a String
>> (java.nio.charset.Charset didn't exist when FM 2.3 was created).
>>
>> --
>> Thanks,
>>  Daniel Dekany
>>
>

-- 
Thanks,
 Daniel Dekany


Re: [FM3] Rename encoding to charset, use Charset instead of String

Posted by Woonsan Ko <wo...@apache.org>.
+1 on both.

Woonsan

On Sun, Mar 19, 2017 at 2:22 PM, Daniel Dekany <dd...@freemail.hu> wrote:
> We have this retro terminology where instead of charset we say
> encoding. (I understand that encoding has a wider meaning, but we only
> intend to support encoding/decoding via a charset.) So I think
> cfg.setDefaultEncoding and <#ftl encoding=...> and such should be
> renamed to cfg.setDefaultCharset and <#ftl charset=...>.
>
> Also, in the Java API-s we should use Charset instead of a String
> (java.nio.charset.Charset didn't exist when FM 2.3 was created).
>
> --
> Thanks,
>  Daniel Dekany
>