Posted to docs@httpd.apache.org by André Malo <nd...@perlig.de> on 2003/09/06 00:04:01 UTC

Character References for Japanese translations

* Tetsuya Kitahata wrote:

> Other language resource files are using "Character Mnemonic Entities",

Just to clarify what we're talking about:

&#number; or &#xhexnumber; are "character references", i.e. they refer to a
particular code point of the Unicode character set.
&somename; are general entities; some of them are predefined to expand to
character references. We use the W3C character entity definitions for easier
work, plus some of our own for better readability.
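
To illustrate the distinction, here is a minimal Python sketch. It assumes
Python's standard html module as a stand-in for an entity-aware parser (it
is not the toolchain used for the httpd docs), and the sample characters are
chosen only for illustration.

```python
import html

# Decimal and hexadecimal character references for the same code point
# (U+65E5, the first character of "Nihongo").
decimal_ref = "&#26085;"
hex_ref = "&#x65E5;"

# A named entity from the W3C/HTML entity sets (u-umlaut, U+00FC).
named = "&uuml;"

# html.unescape resolves both kinds to the characters they stand for.
resolved_decimal = html.unescape(decimal_ref)
resolved_hex = html.unescape(hex_ref)
resolved_named = html.unescape(named)

print(resolved_decimal == resolved_hex)   # both refer to the same code point
print(resolved_named == "\u00fc")
```

Numeric references need no declaration, whereas a named entity only works
if the parser knows its definition (in XML, via the DTD).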

> it seems. I think that japanese translations should follow
> that principle.

That's basically the translator's decision. Though I'd suggest not using
them, because they inflate the XML source by a factor of 2 to 4 (or more).
Note that it's only a *source* issue. Character references are resolved
during the source parsing stage and are inserted as raw iso-2022-jp into the
transformed result. I'm wondering a bit why you want to make such an effort.
I can imagine (but I don't know) that it's even simpler and more efficient to
type in the characters (or symbols, whatever) directly.
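
A rough Python sketch of both points, the size difference and the
resolution at parse time. The sample string and the use of
xml.etree.ElementTree are assumptions made for illustration; the actual docs
build uses its own XSLT-based transformation.

```python
import xml.etree.ElementTree as ET

# Three Japanese characters ("Nihongo"), written as escapes so the
# example itself stays pure ASCII.
text = "\u65e5\u672c\u8a9e"

# The same text as raw iso-2022-jp bytes and as ASCII character references.
raw_bytes = text.encode("iso-2022-jp")
ref_bytes = text.encode("ascii", errors="xmlcharrefreplace")

print(len(raw_bytes), len(ref_bytes))   # the reference form is larger

# The XML parser resolves the references while reading the source ...
element = ET.fromstring("<p>" + ref_bytes.decode("ascii") + "</p>")

# ... so the result can be written out as raw iso-2022-jp again.
output = element.text.encode("iso-2022-jp")
print(output == raw_bytes)
```

The exact ratio depends on the text: short strings pay relatively more for
the iso-2022-jp escape sequences, long runs of Japanese text approach
8 bytes per character versus 2.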

> Fortunately, PHP.exe will convert japanese characters into
> "Character Mnemonic Entities" easily equivalent to them, it seems.

Doh. Same question as Kess. PHP.exe? What do you want to do?

nd

---------------------------------------------------------------------
To unsubscribe, e-mail: docs-unsubscribe@httpd.apache.org
For additional commands, e-mail: docs-help@httpd.apache.org


Re: Character References for Japanese translations

Posted by Yoshiki Hayashi <yo...@xemacs.org>.
Erik Abele <er...@codefaktor.de> writes:

>> That's basically the translator's decision. Though I'd suggest not using
>> them, because they inflate the XML source by a factor of 2 to 4 (or
>> more). Note that it's only a *source* issue. Character references are
>> resolved during the source parsing stage and are inserted as raw
>> iso-2022-jp into the transformed result. I'm wondering a bit why you want
>> to make such an effort. I can imagine (but I don't know) that it's even
>> simpler and more efficient to type in the characters (or symbols,
>> whatever) directly.
>
> I'm clearly with nd here; I can't see any reason why someone would want
> to use only character entities in an XML source which is transformed
> anyway, and I would strongly suggest *not* using them exclusively in
> the error doc typemaps because of the increased size (they are not
> transformed).

+1.

Funny things may happen if we insert characters encoded in
iso-2022-jp directly into the typemap file, but since we are
going to use the URI keyword to refer to a separate file for
the Japanese translation, I don't see any need to use
character references.

-- 
Yoshiki Hayashi



Re: Character References for Japanese translations

Posted by Erik Abele <er...@codefaktor.de>.
On 07/09/2003, at 06:34, Tetsuya Kitahata wrote:

>   Yoshiki Hayashi <yo...@xemacs.org> wrote:
>
>> The funny things may happen if we insert characters encoded
>> in iso-2022-jp directly to typemap file but we are going to
>> use URI keyword to refer to separate file for Japanese
>> translation, I don't see any need to use character
>
> If this (separation of the files) is true, I will put
> my plus one (and just my 2 YEN :-), too.

Yes, I'd also say the best way is to simply store 'special' languages
in separate files and then refer to these files from the typemap itself.

btw, the attached materials already contained separate files and
the corresponding patches to the typemap files, so there shouldn't
be a problem ;)

Cheers,
Erik

> Sincerely,
>
> -- Tetsuya. (tetsuya@apache.org)




Re: Character References for Japanese translations

Posted by Tetsuya Kitahata <te...@apache.org>.
On Sat, 6 Sep 2003 20:24:18 +0200
Erik Abele <er...@codefaktor.de> wrote:

> >> Fortunately, PHP.exe will convert japanese characters into
> >> "Character Mnemonic Entities" easily equivalent to them, it seems.
> > Doh. Same question as Kess. PHP.exe? What do you want to do?
> I think Tetsuya wants to express that one can use a php script to 
> transform text written in iso-2022-jp encoded form into character 
> entities and vice versa, right Tetsuya?

> ...just my 0.02 Euro

Of course, YES :-)
(But ... not *vice versa*. Hey, you can check the reverse ones
using your favorite browser :-)

The best way is to use something like "native2ascii" (I think
native2ascii is the best choice when thinking of i18n/l10n/m17n
in Java); however, I am not sure what the best tool would be for
others to use.
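
As one possible tool-independent alternative, the conversion can be
sketched in a few lines of Python. This is only an illustration, not the
PHP script mentioned above; the helper names to_char_refs and
from_char_refs are made up for this example.

```python
import html


def to_char_refs(raw: bytes, encoding: str = "iso-2022-jp") -> str:
    """Decode raw bytes and replace every non-ASCII character with &#N;."""
    text = raw.decode(encoding)
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def from_char_refs(source: str) -> str:
    """Resolve character references back to characters (what a browser does)."""
    return html.unescape(source)


# Sample input: ASCII text followed by three Japanese characters,
# encoded as iso-2022-jp the way a translator's file might be.
sample = "Apache \u65e5\u672c\u8a9e".encode("iso-2022-jp")

converted = to_char_refs(sample)
restored = from_char_refs(converted)

print(converted)
print(restored == "Apache \u65e5\u672c\u8a9e")
```

Note that html.unescape also resolves named entities such as &amp;, so the
reverse direction is only safe on plain text, not on a complete XML source.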

By the way, did you know that you can do the same thing (i.e. find
out the number for a character reference) with some HTML editors?
E.g. MS FrontPage :) :) .. by setting "charset" to us-ascii and
     writing Japanese words on the page (and then looking at the HTML source).

Of course, I wanted to refer to FrontPage; however, I knew that
most of the folks here would not be willing to use it...
.... So, I decided to keep it secret :-)

--

I am sure that character references for Japanese translations and
for other Asian translations can co-exist. However, I was not
sure whether the translations could co-exist when each uses the
encoding peculiar to its native language, and character references
would be a smart way to deal with multilingualization.

I think this matter deserves more discussion.

However,

  Yoshiki Hayashi <yo...@xemacs.org> wrote:

> The funny things may happen if we insert characters encoded
> in iso-2022-jp directly to typemap file but we are going to
> use URI keyword to refer to separate file for Japanese
> translation, I don't see any need to use character

If this (separation of the files) is true, I will put
my plus one (and just my 2 YEN :-), too.

Sincerely,

-- Tetsuya. (tetsuya@apache.org)





Re: Character References for Japanese translations

Posted by Erik Abele <er...@codefaktor.de>.
On 06/09/2003, at 12:04, André Malo wrote:

> * Tetsuya Kitahata wrote:
>
>> Other language resource files are using "Character Mnemonic Entities",
>
> Just to clarify, what we're talking about:
>
> &#number; or &#xhexnumber; are "character references", i.e. they refer
> to a particular code point of the Unicode character set.
> &somename; are general entities; some of them are predefined to expand
> to character references. We use the W3C character entity definitions
> for easier work, plus some of our own for better readability.

Yep, and the present error doc typemaps, for example, use them
only to replace single characters (e.g. the German umlauts). I don't
think it makes any sense to write a translated doc/typemap exclusively
with them (readability, size, old and broken clients?).

>> it seems. I think that japanese translations should follow
>> that principle.
>
> That's basically the translator's decision. Though I'd suggest not using
> them, because they inflate the XML source by a factor of 2 to 4 (or
> more). Note that it's only a *source* issue. Character references are
> resolved during the source parsing stage and are inserted as raw
> iso-2022-jp into the transformed result. I'm wondering a bit why you want
> to make such an effort. I can imagine (but I don't know) that it's even
> simpler and more efficient to type in the characters (or symbols,
> whatever) directly.

I'm clearly with nd here; I can't see any reason why someone would want
to use only character entities in an XML source which is transformed
anyway, and I would strongly suggest *not* using them exclusively in
the error doc typemaps because of the increased size (they are not
transformed).

>> Fortunately, PHP.exe will convert japanese characters into
>> "Character Mnemonic Entities" easily equivalent to them, it seems.
>
> Doh. Same question as Kess. PHP.exe? What do you want to do?

I think Tetsuya wants to say that one can use a PHP script to
transform text written in iso-2022-jp encoded form into character
entities and vice versa, right Tetsuya?

...just my 0.02 €uro

Cheers,
Erik
