Posted to l10n@openoffice.apache.org by janI <ja...@apache.org> on 2013/03/16 00:47:30 UTC
Language codes ???
Hi
I am (as usual) confused. I have merged translation files from our sources,
sdf files and Pootle. I have the following codes (directories):
af brx dz eu he ka ky my om ro sk tr ts zu
ar bs el fa hi kab lt nb or ru sl sw_TZ ug
as ca en_AU fi hr kk lv ne pa_IN rw so ta uk
ast ca_XV en_GB fr hu km mai nl pap sa_IN son te uz
be_BY cs en_US fur id kn mk nn pl sat sq tg ve
bg cy en_ZA ga is ko ml nr ps sc sr th vi
bn da eo gd it kok mn nso pt sd ss tk xh
bo de es gl ja ks mni ny pt_BR sh st tlh zh_CN
br dgo et gu jbo ku mr oc pyg si sv tn zh_TW
(All the po files are available in "branches/l10n/main/l10ntools/lang" once
svn is finished)
Where can I find the relation between the directory names and the
languages (human names)? Someone (I think Andrea) mentioned they were country
codes.
I expected dialects within a language to be written as e.g. es_XX, and I
know there is an ongoing effort on translating to
Catalan, Euskadi and Gallego,
but I cannot find them, so I am afraid I have missed something for my test.
(Personal note to my friends up north: sorry for using the word "dialect",
but Spain is still one country.)
I am also a bit puzzled about pt_BR and ca_XV (Google just gave me LO as
an answer).
Thanks a lot in advance for any help.
rgds
jan I.
Re: Language codes ???
Posted by Andrea Pescetti <pe...@apache.org>.
On 16/03/2013 janI wrote:
> 3 possibilities when inserting a language message that has not been
> translated:
> 1) Do not insert the message for this language
> 2) Insert the message with an empty string
> 3) Replace the string with the en-US string and insert that
> I think 3) is the most correct approach ? or is there an automatic fallback
> for non-existing strings so 1) would be the correct way ?
Option 3 is surely the current and expected outcome, i.e., if a string
is not translated then the English string is used instead.
But I don't know how we handle it now internally, whether it is
automatic that a missing translation is replaced by the English original
at build time (like your option 1) or we need to explicitly put the
English version in place of the translation (while leaving PO files in
the untranslated status, option 3). If the replacement is automatic then
option 1 seems cleaner, just leave untranslated what is untranslated.
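The three options discussed above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the function name,
the "policy" values and the dictionary layout are hypothetical and are not
how genLang or the build actually work.

```python
# Hypothetical sketch: names and data layout are assumptions, not genLang's API.

def fill_untranslated(translations, languages, policy="fallback",
                      source_lang="en-US"):
    """Decide what to insert for each language of one message.

    translations: dict mapping language code -> translated string
                  (missing key or empty string means "not translated").
    policy: "skip"     -> option 1, do not insert the message
            "empty"    -> option 2, insert an empty string
            "fallback" -> option 3, insert the en-US string instead
    """
    source_text = translations[source_lang]
    result = {}
    for lang in languages:
        text = translations.get(lang, "")
        if text:
            result[lang] = text          # translated: always inserted as-is
        elif policy == "skip":
            continue                     # option 1
        elif policy == "empty":
            result[lang] = ""            # option 2
        elif policy == "fallback":
            result[lang] = source_text   # option 3
        else:
            raise ValueError(f"unknown policy: {policy!r}")
    return result


if __name__ == "__main__":
    message = {"en-US": "Open File", "de": "Datei öffnen", "da": ""}
    langs = ["en-US", "de", "da", "fr"]
    for policy in ("skip", "empty", "fallback"):
        print(policy, fill_untranslated(message, langs, policy))
```

If the build already falls back to English on its own, the "skip" policy is
enough; otherwise "fallback" reproduces the expected outcome explicitly.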
Regards,
Andrea.
---------------------------------------------------------------------
To unsubscribe, e-mail: l10n-unsubscribe@openoffice.apache.org
For additional commands, e-mail: l10n-help@openoffice.apache.org
Re: Language codes ???
Posted by Aivaras Stepukonis <as...@gmail.com>.
In that case, I think it is better to offer information in English than
none at all (this way the user is retaining the option to do the
improvised translation on his/her own).
Sincerely,
Aivaras
2013.03.16 12:50, janI wrote:
> Most of our languages are not translated 100% meaning a lot of strings are
> empty, when genLang generates source files with all languages (as today) I
> have 3 possibilities when inserting a language message that has not been
> translated:
>
> 1) Do not insert the message for this language
> 2) Insert the message with an empty string
> 3) Replace the string with the en-US string and insert that
>
> I think 3) is the most correct approach ? or is there an automatic fallback
> for non-existing strings so 1) would be the correct way ?
>
> Ps. this does of course not affect the .po files, they stay untranslated.
Re: Language codes ???
Posted by janI <ja...@apache.org>.
On 16 March 2013 10:51, Andrea Pescetti <pe...@apache.org> wrote:
> janI wrote:
>
>> I have the following codes (directories):
>> af brx dz eu he ka ky my om ro ...
>>
>> Where can I find the relation between the directory names and the
>> languages (human names), someone (I think andrea) mentioned it was country
>> codes ?
>>
>
> We don't use country codes, we rely on the LANGUAGE codes, which are ISO
> standards. So, in general:
> - if it is a two-letter code, look it up in ISO 639-1:
> http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes ("af" -> "Afrikaans")
> - if it is a three-letter code, use ISO 639-2 or (more complete, extends
> 639-2) 639-3: http://en.wikipedia.org/wiki/List_of_ISO_639-3_codes ("pap" -> "Papiamento")
>
>
>> I expected dialects within a language to be written as e.g. es_XX, and I
>> know there is an ongoing effort on translating to
>> Catalan Euskadi and Gallego
>>
>
> No, this would be a dangerous approach! There is a lot of "political
> correctness" at work here. Everything that is in ISO is a language. So all
> languages spoken in Spain have equal dignity and their own codes. Catalan
> is "ca", Basque/Euskadi is "eu", Gallego is "gl" and you listed all three
> of them.
>
>
>> I am also a bit puzzled about pt_BR and ca_XV
>>
>
> These are extensions made to accommodate language variants. Languages in
> the form '[a-z]*_[A-Z]*' are an internal convention to be read as:
> language_PLACE. So en_US means "English, as spoken in the US"; en_GB =
> "English, as spoken in Great Britain"; pt_BR = "Portuguese, as spoken in
> Brazil"; ca_XV = "Catalan, as spoken in Valencia [or Comunidad
> Valenciana]". zh_CN and zh_TW are often called "simplified" and
> "traditional" Chinese, instead of being linked to China and Taiwan as the
> two codes would mean.
>
Thanks a lot for a very fulfilling answer.
Most of our languages are not translated 100%, meaning a lot of strings are
empty. When genLang generates source files with all languages (as today), I
have 3 possibilities when inserting a language message that has not been
translated:
1) Do not insert the message for this language
2) Insert the message with an empty string
3) Replace the string with the en-US string and insert that
I think 3) is the most correct approach? Or is there an automatic fallback
for non-existing strings, so that 1) would be the correct way?
PS: this does of course not affect the .po files; they stay untranslated.
>
> Regards,
> Andrea.
Re: Language codes ???
Posted by Xuacu <xu...@gmail.com>.
Hi!
2013/3/16 Andrea Pescetti <pe...@apache.org>:
A good explanation about ISO codes for languages!
>
>> I expected dialects within a language to be written as e.g. es_XX, and I
>> know there is an ongoing effort on translating to
>> Catalan Euskadi and Gallego
>
>
> No, this would be a dangerous approach! There is a lot of "political
> correctness" at work here. Everything that is in ISO is a language. So all
> languages spoken in Spain have equal dignity and their own codes. Catalan is
> "ca", Basque/Euskadi is "eu", Gallego is "gl" and you listed all three of
> them.
>
And there are also the Asturian (ast) and the Aragonese (an)
languages, often forgotten because they don't have an official legal
status in Spain. But still we exist ;)
All the best
--
Xuacu
Re: Language codes ???
Posted by janI <ja...@apache.org>.
On Mar 19, 2013 1:45 PM, "Claudio Filho" <fi...@gmail.com> wrote:
>
> Hi
>
> 2013/3/19 Jürgen Schmidt <jo...@gmail.com>:
> > I think we have a mix of both which was confusing to me as well at the
> > beginning. Pootle seems to use "_" where we in the office
> > "extras/l10n/source/..." use "-" and also for the language selection in
> > configure: --with-lang="en-US de es pt-BR ..."
>
> In other software (I remember Mozilla), they use "_".
>
> IMHO, we can change from "-" to "_", but we need to evaluate the cost
> of changing. Maybe open a branch only for the l10n adaptation, like janI
> is doing.
No need to; I think it is part of my integration, or maybe a second phase.
>
> Cheers,
> Claudio
Re: Language codes ???
Posted by Claudio Filho <fi...@gmail.com>.
Hi
2013/3/19 Jürgen Schmidt <jo...@gmail.com>:
> I think we have a mix of both which was confusing to me as well at the
> beginning. Pootle seems to use "_" where we in the office
> "extras/l10n/source/..." use "-" and also for the language selection in
> configure: --with-lang="en-US de es pt-BR ..."
In other software (I remember Mozilla), they use "_".
IMHO, we can change from "-" to "_", but we need to evaluate the cost
of changing. Maybe open a branch only for the l10n adaptation, like janI
is doing.
Cheers,
Claudio
Re: Language codes ???
Posted by Jürgen Schmidt <jo...@gmail.com>.
On 3/18/13 11:06 PM, Rob Weir wrote:
> On Mon, Mar 18, 2013 at 4:10 PM, Andrea Pescetti <pe...@apache.org> wrote:
>> Rob Weir wrote:
>>>
>>> Do you know why we don't just follow the IETF's recommendations in
>>> this area? They have a similar scheme, BCP 47, but use a hyphen
>>> rather than underscore, e.g., en-US, pt-BR. This is what is used on
>>> the web in general, e.g., in HTTP headers.
>>> See: http://www.rfc-editor.org/bcp/bcp47.txt
>>
>>
>> I have absolutely no idea, probably it just happened that someone chose a
>> convention for OpenOffice.
>>
>
> If it is possible to synch up on the BCP 47 standard, it might have
> some advantages. For example, it should make recommending a specific
> download for AOO very easy. Most browsers put the user's locale into
> the HTTP request header "Accept-Language" using the BCP 47 format.
> They can even put multiple, prioritized languages. For example, IE
> can send something like this:
>
> Accept-Language: fr-FR,de-DE;q=0.5
>
> That means it prefers French (with default weight q=1.0) but will also
> accept German, but with a lower weight.
>
> If we were consistent with how we tag the languages, we could make
> better recommendations for users whose 1st language we don't support,
> using the same logic that websites do today.
I think we have a mix of both, which was confusing to me as well at the
beginning. Pootle seems to use "_", whereas we in the office
("extras/l10n/source/...") use "-", and also for the language selection in
configure: --with-lang="en-US de es pt-BR ..."
Juergen
>
> -Rob
>
>>
>>> They even take it a step further, which might be useful in some cases.
>>> For example: sr-Latn-RS means Serbian language written in Latin
>>> script, as used in Serbia.
>>
>>
>> In this case we have both, and we call them "sh" and "sr":
>> http://www.openoffice.org/download/legacy/other.html
>> But indeed we wouldn't be able to use this trick in other, similar cases.
>>
>>
>> Regards,
>> Andrea.
Re: Language codes ???
Posted by Rob Weir <ro...@apache.org>.
On Mon, Mar 18, 2013 at 4:10 PM, Andrea Pescetti <pe...@apache.org> wrote:
> Rob Weir wrote:
>>
>> Do you know why we don't just follow the IETF's recommendations in
>> this area? They have a similar scheme, BCP 47, but use a hyphen
>> rather than underscore, e.g., en-US, pt-BR. This is what is used on
>> the web in general, e.g., in HTTP headers.
>> See: http://www.rfc-editor.org/bcp/bcp47.txt
>
>
> I have absolutely no idea, probably it just happened that someone chose a
> convention for OpenOffice.
>
If it is possible to synch up on the BCP 47 standard, it might have
some advantages. For example, it should make recommending a specific
download for AOO very easy. Most browsers put the user's locale into
the HTTP request header "Accept-Language" using the BCP 47 format.
They can even put multiple, prioritized languages. For example, IE
can send something like this:
Accept-Language: fr-FR,de-DE;q=0.5
That means it prefers French (with default weight q=1.0) but will also
accept German, but with a lower weight.
If we were consistent with how we tag the languages, we could make
better recommendations for users whose 1st language we don't support,
using the same logic that websites do today.
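As a rough illustration of that logic, here is a minimal Python sketch that
parses an Accept-Language header and picks the best build we offer. The
function names and the list of available languages are made up for the
example; this is not code from the AOO download page, and it ignores the
finer points of the header grammar (wildcards, extended matching).

```python
def parse_accept_language(header):
    """Return (tag, q) pairs from an Accept-Language header, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0                       # default weight when no q= is given
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0               # malformed weight: treat as "not wanted"
        prefs.append((tag.strip(), q))
    # sorted() is stable, so equal weights keep their order in the header
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def recommend(header, available):
    """Pick the first available language the browser accepts, else None.

    `available` is a hypothetical list of the tags we ship, e.g. ["de", "fr"].
    An exact match is tried first, then the bare language ("de-DE" -> "de").
    """
    by_lower = {tag.lower(): tag for tag in available}
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        key = tag.lower()
        if key in by_lower:
            return by_lower[key]
        primary = key.split("-")[0]
        if primary in by_lower:
            return by_lower[primary]
    return None


if __name__ == "__main__":
    print(parse_accept_language("fr-FR,de-DE;q=0.5"))
    print(recommend("fr-FR,de-DE;q=0.5", ["en-US", "de", "fr"]))
```

With consistent tags on our side, the lookup stays a simple dictionary match;
with mixed "_" and "-" names it would need a normalisation step first.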
-Rob
>
>> They even take it a step further, which might be useful in some cases.
>> For example: sr-Latn-RS means Serbian language written in Latin
>> script, as used in Serbia.
>
>
> In this case we have both, and we call them "sh" and "sr":
> http://www.openoffice.org/download/legacy/other.html
> But indeed we wouldn't be able to use this trick in other, similar cases.
>
>
> Regards,
> Andrea.
Re: Language codes ???
Posted by Andrea Pescetti <pe...@apache.org>.
Rob Weir wrote:
> Do you know why we don't just follow the IETF's recommendations in
> this area? They have a similar scheme, BCP 47, but use a hyphen
> rather than underscore, e.g., en-US, pt-BR. This is what is used on
> the web in general, e.g., in HTTP headers.
> See: http://www.rfc-editor.org/bcp/bcp47.txt
I have absolutely no idea, probably it just happened that someone chose
a convention for OpenOffice.
> They even take it a step further, which might be useful in some cases.
> For example: sr-Latn-RS means Serbian language written in Latin
> script, as used in Serbia.
In this case we have both, and we call them "sh" and "sr":
http://www.openoffice.org/download/legacy/other.html
But indeed we wouldn't be able to use this trick in other, similar cases.
Regards,
Andrea.
Re: Language codes ???
Posted by Rob Weir <ro...@apache.org>.
On Sat, Mar 16, 2013 at 5:51 AM, Andrea Pescetti <pe...@apache.org> wrote:
> janI wrote:
>>
>> I have the following codes (directories):
>> af brx dz eu he ka ky my om ro ...
>>
>> Where can I find the relation between the directory names and the
>> languages (human names), someone (I think andrea) mentioned it was country
>> codes ?
>
>
> We don't use country codes, we rely on the LANGUAGE codes, which are ISO
> standards. So, in general:
> - if it is a two-letter code, look it up in ISO 639-1:
> http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes ("af" -> "Afrikaans")
> - if it is a three-letter code, use ISO 639-2 or (more complete, extends
> 639-2) 639-3: http://en.wikipedia.org/wiki/List_of_ISO_639-3_codes ("pap" ->
> "Papiamento")
>
>
>> I expected dialects within a language to be written as e.g. es_XX, and I
>> know there is an ongoing effort on translating to
>> Catalan Euskadi and Gallego
>
>
> No, this would be a dangerous approach! There is a lot of "political
> correctness" at work here. Everything that is in ISO is a language. So all
> languages spoken in Spain have equal dignity and their own codes. Catalan is
> "ca", Basque/Euskadi is "eu", Gallego is "gl" and you listed all three of
> them.
>
>
>> I am also a bit puzzled about pt_BR and ca_XV
>
>
> These are extensions made to accommodate language variants. Languages in the
> form '[a-z]*_[A-Z]*' are an internal convention to be read as:
> language_PLACE. So en_US means "English, as spoken in the US"; en_GB =
> "English, as spoken in Great Britain"; pt_BR = "Portuguese, as spoken in
> Brazil"; ca_XV = "Catalan, as spoken in Valencia [or Comunidad Valenciana]".
> zh_CN and zh_TW are often called "simplified" and "traditional" Chinese,
> instead of being linked to China and Taiwan as the two codes would mean.
>
Do you know why we don't just follow the IETF's recommendations in
this area? They have a similar scheme, BCP 47, but use a hyphen
rather than underscore, e.g., en-US, pt-BR. This is what is used on
the web in general, e.g., in HTTP headers.
See: http://www.rfc-editor.org/bcp/bcp47.txt
They even take it a step further, which might be useful in some cases.
For example: sr-Latn-RS means Serbian language written in Latin
script, as used in Serbia.
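To make the difference concrete, here is a small Python sketch that turns
one of our underscore directory names into a hyphenated tag and splits a
simple language[-Script][-REGION] tag into its parts. The helper names are
invented for the example, and it handles only this simple shape, not the
full BCP 47 grammar (variants, extensions, private use).

```python
def to_bcp47(dirname):
    """Mechanically convert an underscore name like 'pt_BR' to 'pt-BR'.

    Caveat (assumption): names such as 'ca_XV' use a non-ISO place code, so a
    purely mechanical conversion may not give a sensible tag for them; a
    hand-maintained exception list would probably be needed.
    """
    return dirname.replace("_", "-")


def split_tag(tag):
    """Split a simple language[-Script][-REGION] tag into a dict."""
    parts = tag.split("-")
    result = {"language": parts[0].lower(), "script": None, "region": None}
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            result["script"] = part.title()      # e.g. "Latn"
        elif (len(part) == 2 and part.isalpha()) or \
                (len(part) == 3 and part.isdigit()):
            result["region"] = part.upper()      # e.g. "RS"
    return result


if __name__ == "__main__":
    print(to_bcp47("pt_BR"))
    print(split_tag("sr-Latn-RS"))
```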
-Rob
> Regards,
> Andrea.
Re: Language codes ???
Posted by Andrea Pescetti <pe...@apache.org>.
janI wrote:
> I have the following codes (directories):
> af brx dz eu he ka ky my om ro ...
> Where can I find the relation between the directory names and the
> languages (human names), someone (I think andrea) mentioned it was country
> codes ?
We don't use country codes, we rely on the LANGUAGE codes, which are ISO
standards. So, in general:
- if it is a two-letter code, look it up in ISO 639-1:
http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes ("af" -> "Afrikaans")
- if it is a three-letter code, use ISO 639-2 or (more complete, extends
639-2) 639-3: http://en.wikipedia.org/wiki/List_of_ISO_639-3_codes
("pap" -> "Papiamento")
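The lookup rule above (two letters: ISO 639-1, three letters: ISO 639-2/3)
could be sketched like this in Python. The two tables are a tiny hand-typed
sample for illustration only, not the full ISO lists, and the function name
is made up.

```python
# Tiny illustrative samples; the real ISO 639 tables are much larger.
ISO_639_1 = {
    "af": "Afrikaans",
    "ca": "Catalan",
    "eu": "Basque",
    "gl": "Galician",
    "pt": "Portuguese",
}
ISO_639_3 = {
    "ast": "Asturian",
    "pap": "Papiamento",
}


def language_name(dirname):
    """Return the human name for a directory name, or None if not in the sample.

    Only the language part is looked up, so "pt_BR" is treated as "pt".
    """
    base = dirname.split("_")[0].lower()
    if len(base) == 2:
        return ISO_639_1.get(base)
    if len(base) == 3:
        return ISO_639_3.get(base)
    return None


if __name__ == "__main__":
    for code in ("af", "pap", "pt_BR", "tlh"):
        print(code, "->", language_name(code))
```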
> I expected dialects within a language to be written as e.g. es_XX, and I
> know there is an ongoing effort on translating to
> Catalan Euskadi and Gallego
No, this would be a dangerous approach! There is a lot of "political
correctness" at work here. Everything that is in ISO is a language. So
all languages spoken in Spain have equal dignity and their own codes.
Catalan is "ca", Basque/Euskadi is "eu", Gallego is "gl" and you listed
all three of them.
> I am also a bit puzzled about pt_BR and ca_XV
These are extensions made to accommodate language variants. Languages in
the form '[a-z]*_[A-Z]*' are an internal convention to be read as:
language_PLACE. So en_US means "English, as spoken in the US"; en_GB =
"English, as spoken in Great Britain"; pt_BR = "Portuguese, as spoken in
Brazil"; ca_XV = "Catalan, as spoken in Valencia [or Comunidad
Valenciana]". zh_CN and zh_TW are often called "simplified" and
"traditional" Chinese, instead of being linked to China and Taiwan as
the two codes would mean.
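For anyone scripting over the directories, the language_PLACE convention can
be split with a regular expression. The sketch below uses a slightly stricter
pattern than '[a-z]*_[A-Z]*' (two or three lowercase letters, optionally
followed by "_" and two uppercase letters); that pattern and the function
name are assumptions chosen to fit the directory list in this thread, not an
official definition.

```python
import re

# Assumed shape: 2-3 lowercase letters, optional "_" plus 2 uppercase letters.
DIR_NAME = re.compile(r"^([a-z]{2,3})(?:_([A-Z]{2}))?$")


def split_dir_name(name):
    """Return (language, place) for a directory name; place may be None."""
    match = DIR_NAME.match(name)
    if match is None:
        raise ValueError(f"not a language_PLACE name: {name!r}")
    return match.group(1), match.group(2)


if __name__ == "__main__":
    for name in ("en_US", "pt_BR", "ca_XV", "ast", "zh_TW"):
        print(name, split_dir_name(name))
```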
Regards,
Andrea.