Posted to docs@httpd.apache.org by Christophe JAILLET <ch...@wanadoo.fr> on 2020/06/16 19:52:36 UTC

A new approach to doc translation ?

Hi,

What I consider the biggest drawback of our current doc translation 
process is that you have to keep it updated all the time in order to be 
able to follow the updates from the English version.

For a newcomer, or someone who has just a few hours a week or month for
it, I think that it is quite hard.

Not that doc updates happen so often, but when a translation gets out of
sync, getting it back into good shape looks hard to me.
You have to diff the English version to see what has changed, then find
the impact in the translated files, then update them, then propose the
change via ML or BZ, then wait for someone to take it and apply it.

The few contributors I have seen in the past years quickly get
discouraged and soon stop updating the docs.
One special mention to Lucien for the GREAT work he does for the French 
translation.


I've been looking for a tool that could handle XML --> PO file updates.
The material to translate would then be only small pieces of text that
could be handled by poedit or equivalent software.

The main advantages I see are:
    - changes are easy to spot
    - identical sentences in different files (or even branches) are
translated only once
    - the work of different contributors is easy to merge
    - some translation web sites offer a workflow that eases access for
contributors and lets the translation community validate each other's
translations (some years ago, I used https://translatewiki.net for that)

The drawbacks are those of PO files:
    - the context is missing when translating
    - this requires some additional scripting to generate and update the
PO files, and to convert them back to XML for our XSL-based toolchain



Using something like PO files for the translation would also lead to
only partly localized files. Little by little, the outdated parts of
the doc would get replaced by the more up-to-date English version. I
don't think it is an issue. I prefer a mixed-language document to
something that I cannot trust because I don't know what is up to date
and what is not.
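
A minimal sketch of that fallback, for illustration only. It assumes a
catalog that maps English source strings to translations; the names
`catalog_fr` and `localize` and the sample sentences are made up here and
do not come from any existing tool. Strings without a current translation
simply stay in English.

```python
# Hypothetical catalog: English source string -> French translation.
catalog_fr = {
    "This directive enables the module.": "Cette directive active le module.",
    "The default value is 10.": "",  # known to the catalog, not translated yet
}


def localize(paragraphs, catalog):
    """Return each paragraph translated when possible, else the English text."""
    result = []
    for text in paragraphs:
        translated = catalog.get(text, "")
        # A missing or empty entry means "no current translation": keep English.
        result.append(translated if translated else text)
    return result


english_doc = [
    "This directive enables the module.",
    "The default value is 10.",
    "A sentence added to the English docs last week.",
]
localized = localize(english_doc, catalog_fr)
for line in localized:
    print(line)
```

The result is a mixed-language document, but every English sentence in it
is known to match the current English source.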

itstool [1] is the most promising tool I found so far.
Its main advantage is that it can easily be configured to tell what must
not be translated. It also has a kind of placeholder mechanism. This
fits perfectly well with our current XML-based master documents.
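
To illustrate the idea only (this is not how itstool is implemented, and
the element names in `DO_NOT_TRANSLATE` as well as the sample documents
are hypothetical), here is a rough sketch of an extraction step that skips
elements marked as non-translatable and records each distinct sentence
once, whatever the number of files it appears in. For brevity it only
looks at the leading text of each element.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set: elements whose content must not be translated.
DO_NOT_TRANSLATE = {"example", "directive"}


def collect(elem, source_name, catalog):
    """Walk one element tree and record translatable text in the catalog."""
    if elem.tag in DO_NOT_TRANSLATE:
        return  # skip this element and everything inside it
    # Normalize whitespace so that re-wrapped text still matches.
    text = " ".join((elem.text or "").split())
    if text:
        sources = catalog.setdefault(text, [])
        if source_name not in sources:
            sources.append(source_name)
    for child in elem:
        collect(child, source_name, catalog)


def extract(xml_text, source_name, catalog):
    """Add the translatable strings of one XML document to a shared catalog."""
    collect(ET.fromstring(xml_text), source_name, catalog)
    return catalog


doc_a = ("<section><p>Restart the server to apply changes.</p>"
         "<example>LogLevel warn</example></section>")
doc_b = ("<section><p>Restart the server to apply changes.</p>"
         "<p>See the log files.</p></section>")

catalog = {}
extract(doc_a, "a.xml", catalog)
extract(doc_b, "b.xml", catalog)
for message, sources in catalog.items():
    print(message, sources)
```

The sentence shared by both files ends up as a single catalog entry with
two source locations, and the content of `<example>` never reaches the
translator.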

I'm close to having a working PoC, but I wanted your feedback on this
approach to doc translation.

Attached is an example of all the mod/*/xml files processed and the 
rules file I've written so far.



Do you think that such an approach is viable?

CJ


[1]: http://itstool.org/index.html

Re: A new approach to doc translation ?

Posted by Lucien Gentis <lu...@univ-lorraine.fr>.
On 20/06/2020 at 11:11, Tom Fredrik Blenning wrote:
> I digress.
>
> For you and me that might not be a hurdle, but I dare you to introduce
> that process to the next 10 people you meet outside of a developer
> environment; chances are they will not understand what you are talking
> about.
>
> We have pensioners who are, with all due respect, computer illiterate,
> doing translations for us. Apache is a very specialized project, so I
> don't think there will be an avalanche of pensioners volunteering to do
> translations on this, but I do think there are a lot of more casual users
> who would be able to do this if it were more accessible. Even if they are
> capable of understanding svn and diff, there's a hurdle to participation.
>
> Participation is always the key, but participation often requires ease
> of access.
>
> -Tom Fredrik
>
I only wanted to show the Apache doc team that handling doc translation
is not so difficult.

Of course, if a newcomer asked me for some tips, I would give him/her
more detailed explanations (with
http://httpd.apache.org/docs-project/translations.html as support).

All in all, there are only a few things to know: a few svn commands, the
diff command, and how to save, open and edit files in the file system.

Finally, if another translation environment is to be installed, this is
not a problem for me; I'll adapt.


---------------------------------------------------------------------
To unsubscribe, e-mail: docs-unsubscribe@httpd.apache.org
For additional commands, e-mail: docs-help@httpd.apache.org


Re: A new approach to doc translation ?

Posted by Tom Fredrik Blenning <bf...@blenning.no>.
On 18/06/2020 17:37, Lucien Gentis wrote:
> 
> Hello everybody,
> 
> About newcomers, it seems that the main problem is finding reviewers.
> (Aleksey, are you still here?)
> 
> About translation updates:
> 
> I have downloaded the two svn repos, say in /2.4-repos and /trunk-repos.
> 
> All English XML files are saved in a backup directory on my computer.
> 
> Every time I want to update my XML files, I run "svn update" in
> /2.4-repos and /trunk-repos, then I filter the output to only see XML
> files.
> 
> Then I only have to do a diff between the original XML file in the backup
> directory and the corresponding one that was modified in the svn repos.
> 
> I think it's not so hard to do.

I digress.

For you and me that might not be a hurdle, but I dare you to introduce
that process to the next 10 people you meet outside of a developer
environment; chances are they will not understand what you are talking
about.

We have pensioners who are, with all due respect, computer illiterate,
doing translations for us. Apache is a very specialized project, so I
don't think there will be an avalanche of pensioners volunteering to do
translations on this, but I do think there are a lot of more casual users
who would be able to do this if it were more accessible. Even if they are
capable of understanding svn and diff, there's a hurdle to participation.

Participation is always the key, but participation often requires ease
of access.

-Tom Fredrik



Re: A new approach to doc translation ?

Posted by Lucien Gentis <lu...@univ-lorraine.fr>.
On 17/06/2020 at 13:45, Tom Fredrik Blenning wrote:
> Hi,
>
> I'm just a lurker who once did some Norwegian translation, but I am 
> from time to time involved in translations in other projects.
>
> The process you describe is consistent with what we do in other
> projects, and is in my opinion the preferred method. The drawback of
> missing context can to a large degree be ameliorated by build automation.
>
> What I do in some projects I am responsible for is set a limit: at
> least X% of the project must be translated in order for it to be
> published. In my personal opinion, at about 95% a translation becomes
> useful; anything less leaves the whole thing a mess. It's better to
> concede defeat and either publish outdated docs, clearly marked, or
> redirect to an actually completed translation in another language, e.g.
> English as a default.
>
> I'm a big believer in using Weblate as it enables the whole 
> translation to be somewhat democratized. Anyone can suggest a new 
> translation if enabled, and someone authorized can choose to accept or 
> reject it. This is separated from the actual repository access.
>
> So in short, I think this is the way forward.
>
>
Hello everybody,

About newcomers, it seems that the main problem is finding reviewers.
(Aleksey, are you still here?)

About translation updates:

I have downloaded the two svn repos, say in /2.4-repos and /trunk-repos.

All English XML files are saved in a backup directory on my computer.

Every time I want to update my XML files, I run "svn update" in
/2.4-repos and /trunk-repos, then I filter the output to only see XML files.

Then I only have to do a diff between the original XML file in the backup
directory and the corresponding one that was modified in the svn repos.

I think it's not so hard to do.
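
The two steps above (filtering the update output, then diffing against
the backup copy) can be sketched roughly as follows. This is only an
illustration under assumptions: the sample "svn update" output, the file
paths and the helper names are invented, and the file contents are kept
in memory so that the sketch runs without a working copy.

```python
import difflib


def changed_xml_files(svn_update_output):
    """Pick the .xml paths out of the text printed by "svn update"."""
    paths = []
    for line in svn_update_output.splitlines():
        parts = line.split()
        # Lines describing a file look like "U    some/path"; skip the rest.
        if len(parts) == 2 and parts[1].endswith(".xml"):
            paths.append(parts[1])
    return paths


def english_changes(backup_text, updated_text, name):
    """Unified diff between the backup copy and the freshly updated file."""
    return list(difflib.unified_diff(
        backup_text.splitlines(),
        updated_text.splitlines(),
        fromfile=f"backup/{name}",
        tofile=f"repos/{name}",
        lineterm="",
    ))


# Invented sample of what "svn update" might print.
sample_output = (
    "Updating '.':\n"
    "U    docs/manual/mod/core.xml\n"
    "A    docs/manual/images/new.png\n"
    "Updated to revision 1234.\n"
)
changed = changed_xml_files(sample_output)

backup = "<p>Old text.</p>\n<p>Same.</p>"
updated = "<p>New text.</p>\n<p>Same.</p>"
diff = english_changes(backup, updated, "core.xml")
print(changed)
print("\n".join(diff))
```

The lines starting with "-" and "+" in the diff are the English
paragraphs whose translation has to be revisited.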

Lucien





Re: A new approach to doc translation ?

Posted by Tom Fredrik Blenning <bf...@blenning.no>.
On 6/16/2020 9:52 PM, Christophe JAILLET wrote:
> Hi,
>
> What I consider the biggest drawback of our current doc translation 
> process is that you have to keep it updated all the time in order to 
> be able to follow the updates from the English version.
>
> For a newcomer, or someone who has just a few hours a week or month
> for it, I think that it is quite hard.
>
> Not that doc updates happen so often, but when a translation gets out of
> sync, getting it back into good shape looks hard to me.
> You have to diff the English version to see what has changed, then find
> the impact in the translated files, then update them, then propose the
> change via ML or BZ, then wait for someone to take it and apply it.
>
> The few contributors I have seen in the past years quickly get
> discouraged and soon stop updating the docs.
> One special mention to Lucien for the GREAT work he does for the 
> French translation.
>
>
> I've been looking for a tool that could handle XML --> PO file updates.
> The material to translate would then be only small pieces of text that
> could be handled by poedit or equivalent software.
>
> The main advantages I see are:
>    - changes are easy to spot
>    - identical sentences in different files (or even branches) are
> translated only once
>    - the work of different contributors is easy to merge
>    - some translation web sites offer a workflow that eases access for
> contributors and lets the translation community validate each other's
> translations (some years ago, I used https://translatewiki.net for that)
>
> The drawbacks are those of PO files:
>    - the context is missing when translating
>    - this requires some additional scripting to generate and update the
> PO files, and to convert them back to XML for our XSL-based toolchain
>
>
>
> Using something like PO files for the translation would also lead to
> only partly localized files. Little by little, the outdated parts of
> the doc would get replaced by the more up-to-date English version. I
> don't think it is an issue. I prefer a mixed-language document to
> something that I cannot trust because I don't know what is up to date
> and what is not.
>
> itstool [1] is the most promising tool I found so far.
> Its main advantage is that it can easily be configured to tell what must
> not be translated. It also has a kind of placeholder mechanism. This
> fits perfectly well with our current XML-based master documents.
>
> I'm close to having a working PoC, but I wanted your feedback on this
> approach to doc translation.
>
> Attached is an example of all the mod/*/xml files processed and the 
> rules file I've written so far.
>
>
>
> Do you think that such an approach is viable?


Hi,

I'm just a lurker who once did some Norwegian translation, but I am from 
time to time involved in translations in other projects.

The process you describe is consistent with what we do in other
projects, and is in my opinion the preferred method. The drawback of
missing context can to a large degree be ameliorated by build automation.

What I do in some projects I am responsible for is set a limit: at
least X% of the project must be translated in order for it to be
published. In my personal opinion, at about 95% a translation becomes
useful; anything less leaves the whole thing a mess. It's better to
concede defeat and either publish outdated docs, clearly marked, or
redirect to an actually completed translation in another language, e.g.
English as a default.
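
A minimal sketch of such a publication gate, assuming a catalog that maps
each source string to its translation, with an empty string meaning "not
translated yet". The function names, the sample catalog and the choice to
fall back to English below the limit are illustrative assumptions, not an
existing build rule.

```python
def translated_percent(catalog):
    """Percentage of catalog entries that have a non-empty translation."""
    if not catalog:
        return 0.0
    done = sum(1 for translation in catalog.values() if translation)
    return 100.0 * done / len(catalog)


def choose_output(catalog, threshold=95.0):
    """Publish the translation only when it reaches the threshold."""
    if translated_percent(catalog) >= threshold:
        return "translation"
    return "english"


# Hypothetical catalog with 20 entries, 19 of them translated.
catalog = {f"sentence {i}": f"phrase {i}" for i in range(19)}
catalog["sentence 19"] = ""

percent_before = translated_percent(catalog)
decision_before = choose_output(catalog)

# One more entry becomes outdated: the translation drops below the limit.
catalog["sentence 18"] = ""
percent_after = translated_percent(catalog)
decision_after = choose_output(catalog)

print(percent_before, decision_before)
print(percent_after, decision_after)
```

The threshold is a parameter, so each project (or each language) could
pick its own value of X.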

I'm a big believer in using Weblate as it enables the whole translation 
to be somewhat democratized. Anyone can suggest a new translation if 
enabled, and someone authorized can choose to accept or reject it. This 
is separated from the actual repository access.

So in short, I think this is the way forward.




Re: A new approach to doc translation ?

Posted by Jacob Champion <ch...@gmail.com>.
Hi Christophe,

I've been researching reST/Sphinx lately (email coming Sometime Soon -- 
hopefully later today) and wanted to chime in with some observations, 
based on the translation workflow for that.

On 6/16/20 12:52 PM, Christophe JAILLET wrote:
> I've been looking for a tool that could handle XML --> PO file updates.
> The material to translate would then be only small pieces of text that
> could be handled by poedit or equivalent software.

This is also Sphinx's approach to translation, based on gettext.

> The main advantages I see are:
>     - changes are easy to spot
>     - identical sentences in different files (or even branches) are
> translated only once

FWIW, Sphinx splits translatable chunks by paragraph when constructing 
its .pot templates, and it puts them into separate files based on the 
source location. So though you might have to duplicate some work, you 
also get a little more context.
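
As a rough sketch of that trade-off (this is not Sphinx's code; the
helper names, the template naming and the sample sources are made up for
illustration), splitting each source into paragraphs and keeping one
template per source file looks like this:

```python
import re


def split_paragraphs(text):
    """Split a source text into paragraphs separated by blank lines."""
    chunks = re.split(r"\n\s*\n", text.strip())
    # Collapse line wrapping inside a paragraph into single spaces.
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]


def build_templates(sources):
    """Build one list of translatable paragraphs per source file."""
    return {f"{name}.pot": split_paragraphs(text)
            for name, text in sources.items()}


# Hypothetical source documents sharing one paragraph.
sources = {
    "install": "Run the installer.\n\nRestart the server\nto apply changes.",
    "upgrade": "Back up your configuration.\n\nRestart the server\nto apply changes.",
}
templates = build_templates(sources)
for template_name, paragraphs in templates.items():
    print(template_name, paragraphs)
```

The shared paragraph shows up in both templates, which is the duplicated
work, but each template keeps the paragraphs of one document together, in
order, which is the extra context.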

(Though from looking at your .pot template, it looks like your tool also 
set up some, but not all, of the translations this way.)

>     - the work of different contributors is easy to merge
>     - some translation web sites offer a workflow that eases access for
> contributors and lets the translation community validate each other's
> translations (some years ago, I used https://translatewiki.net for that)

As another example, the sphinx-intl tool integrates directly with the 
Transifex service, which appears to be used by the Sphinx project 
itself. It looks like it may have a "free" tier for OSS projects. I know 
nothing more about it.

> Using something like PO files for the translation would also lead to
> only partly localized files. Little by little, the outdated parts of
> the doc would get replaced by the more up-to-date English version. I
> don't think it is an issue. I prefer a mixed-language document to
> something that I cannot trust because I don't know what is up to date
> and what is not.

This was also one of the big questions I had about the Sphinx approach. 
My own opinion isn't particularly useful here, since I consume the docs 
in English.

> Do you think that such an approach is viable?

Given that other large projects seem to use a similar approach, it seems 
like it should be viable from a technical perspective. I can't speak to 
the usability of the .po translation tools themselves.

--Jacob
