You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Daniel Shahaf <d....@daniel.shahaf.name> on 2010/09/09 10:48:46 UTC

Re: svnsync failure when syncing with a repository that used ISO-8859-1 for log messages

CC += dev@, and let me point you to our patch submission guidelines:
http://subversion.apache.org/docs/community-guide/general.html#patches

Summary for dev@: allow svnsync to translate non-UTF-8 log messages to UTF-8.

(more below)

Daniel Trebbien wrote on Wed, Sep 08, 2010 at 18:58:06 -0700:
> On Wed, Sep 8, 2010 at 4:39 PM, Daniel Trebbien <dt...@gmail.com> wrote:
> > I think that a call to `svn_subst_translate_string`
> > (http://svn.collab.net/svn-doxygen/svn__subst_8h.html#a29) needs to be
> > added in the `svnsync_normalize_revprops` function when `propname` is
> > "svn:log".
> 
> After applying the following patch to `subversion/svnsync/sync.c`, I
> can confirm that the "svnsync: Error while replaying commit" error
> disappears, allowing the sync to complete, and that the problem is
> indeed a log message encoding issue:
> diff --git a/subversion/svnsync/sync.c b/subversion/svnsync/sync.c
> index 8f7be9e..c7a5e38 100644
> --- a/subversion/svnsync/sync.c
> +++ b/subversion/svnsync/sync.c
> @@ -114,6 +114,14 @@ svnsync_normalize_revprops(apr_hash_t *rev_props,
>                /* And count this. */
>                (*normalized_count)++;
>              }
> +
> +          if (strcmp(propname, "svn:log") == 0)
> +            {

Should this use svn_prop_needs_translation()?

> +              svn_string_t *new_propval;
> +              SVN_ERR(svn_subst_translate_string(&new_propval, propval, "ISO-8859-1", pool));
> +              apr_hash_set(rev_props, propname, APR_HASH_KEY_STRING, propval = new_propval);

Please, please, please, no assignment inside a function call.           ^^^^^^^^^

> +              (*normalized_count)++;
> +            }
>          }
>      }
>    return SVN_NO_ERROR;
> 
> 
> Here I hard-coded the encoding, but I think that the encoding to use
> should be retrieved from the Subversion config file. Now the questions
> are:
> 
> 1. What is the best way to pass the `log-encoding` option of the
> [miscellany] section to the `svnsync_normalize_revprops` function?
> 

What is 'log-encoding' normally used for?  Only for recoding the commit
message supplied by an editor, right?  So I'm not sure we should use it
here, perhaps a new command-line option will be better.  (svnsync
$subcommand --source-encoding=%s)

If you'd like to use a config option, get it from 'svn_config_t *config'
in main(), stash it in the subcommand baton, and drill it down as deep
as needed.

And either way, the default behaviour should be to translate nothing.
(If you always translate from latin1, you break syncs for people who,
correctly, used UTF-8 log messages the entire time.)

> 2. Should `svnsync` always store the log messages in UTF-8 even though
> the original repository log message encoding is different?
> 

You have no choice on the matter: the RA API instructs you to supply it
a UTF-8 log message.  (IIRC, even the libsvn_client API expects an
already-UTF-8 message.)

> 3. Should there be separate configuration options for the log message
> encoding of the source repository and the log message encoding of the
> destination repository?

No, see (2).

Re: svnsync failure when syncing with a repository that used ISO-8859-1 for log messages

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Trebbien wrote on Thu, Sep 09, 2010 at 08:48:43 -0700:
> On Thu, Sep 9, 2010 at 3:48 AM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> > Daniel Trebbien wrote on Wed, Sep 08, 2010 at 18:58:06 -0700:
> >> +          if (strcmp(propname, "svn:log") == 0)
> >> +            {
> >
> > Should this use svn_prop_needs_translation()?
> 
> Though it does not appear so, the added lines are within the check for
> `svn_prop_needs_translation`.
> 

Okay.  But my point was really, "shouldn't the new block you're adding
apply to any prop that svn_prop_needs_translation() returns TRUE for,
rather than only for SVN_PROP_REVISION_LOG?"

> >> 1. What is the best way to pass the `log-encoding` option of the
> >> [miscellany] section to the `svnsync_normalize_revprops` function?
> >>
> >
> > What is 'log-encoding' normally used for?  Only for recoding the commit
> > message supplied by an editor, right?  So I'm not sure we should use it
> > here, perhaps a new command-line option will be better.  (svnsync
> > $subcommand --source-encoding=%s)
> 
> I like the idea of adding a command line option, so I think that this
> is the way to go.
> 
> Do other properties need to be re-encoded, potentially?

That's why I suggested svn_prop_needs_translation().  Have you read its
docstring?  

[[[
/** If @a prop_name requires that its value be stored as UTF8/LF in the
 * repository, then return @c TRUE.  Else return @c FALSE.  This is for
 * users of libsvn_client or libsvn_fs, since it their responsibility
 * to do this translation in both directions.  (See
 * svn_subst_translate_string()/svn_subst_detranslate_string() for
 * help with this task.)
 */
svn_boolean_t
svn_prop_needs_translation(const char *prop_name);
]]]

> I was only
> thinking about `svn:log`, but perhaps other properties might need
> re-encoding as well. If not, then I think that the command line option
> should be more specific (e.g.: `--source-log-encoding=%s`).
> 

svn:log is certainly the most common case.  I think we had very few
reports on users@ of this error occurring with other props (e.g.,
svn:author and svn:externals).

> Another question:  which line of code raises the "svnsync: Error while
> replaying commit" error message? I would like to try to make this more
> helpful.

If you build --enable-maintainer-mode, you should get the file-line
coordinates of where the error was raised (and most places that
returned it, too).

Failing that, there's a shortcut that works most of the time: look in
subversion/po/fr.po for "Error while replaying commit" :-)

Daniel

Re: svnsync failure when syncing with a repository that used ISO-8859-1 for log messages

Posted by Danny Trebbien <dt...@gmail.com>.
> > Hi Danny,
> > I am working with Geir Engebakken on this and we understand you won't release
> > the patch now until 1.7.x - That's fine but I wanted to try out the patch myself
> > now so I tried to run it in and it is looking for sync.c and sync.h which dont
> > seem to exist in my source tree. I have downloaded the latest version 1.6.15
> > ??
> >
> > Any help is appreciated!!
> >
> > -Chris
>
> Resolved now, seems the sync.c file is added in the Subversion source trunk (1.7?) , but is not not in any
> 1.6.x releases of the source, so that fooled us.
>
> Geir

Hi Chris and Geir,

My apologies for the belated response. I do not receive e-mails from
the Subversion Developers' or Subversion Users' lists unless I am
CC'ed directly.

I do not know if you saw
<http://article.gmane.org/gmane.comp.version-control.subversion.user/101687>
or not, but you may find it helpful.

Let me know if you have any other questions. Just be sure to click
"Reply to All" so that I will see it :)

Danny

Re: svnsync failure when syncing with a repository that used ISO-8859-1 for log messages

Posted by Danny Trebbien <dt...@gmail.com>.
Geir,

I looked through some scripts that I wrote to help me sync the GNU
Nano repository and I came across a Perl script that might be useful
to you in quickly identifying all log messages that are not
representable in ASCII (hence possibly not UTF-8).

Attached is the source of the script. To use it, you will need the
libsvn Perl bindings (on Debian, install the `libsvn-perl` package),
and you will need to edit line 20 to change the URL of the Subversion
repository that you wish to examine.

Example output for svn://svn.sv.gnu.org/nano is:
------------------------------------------------------------------------
r619
Added Galician translation by Jacobo Tarr<jt...@trasno.net>.

------------------------------------------------------------------------
r757
Updated Galician translation; thanks, Jacobo Tarr

------------------------------------------------------------------------
r826
Galician translation brought up to date for 1.1.2 by Jacobo Tarr

------------------------------------------------------------------------
r954
Galician translation update (Jacobo Tarr.

------------------------------------------------------------------------
r958
French translation update (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r962
French translation update (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1009
Moved no.po to nn.po.
New Norwegian bokm欠translation, by Stig E Sandoe <st...@ii.uib.no>.
Updated Norwegian nynorsk translation, by Kjetil Torgrim Homme
<kj...@linpro.no>.

------------------------------------------------------------------------
r1013
Moved no.po to nn.po.
New Norwegian bokm欠translation, by Stig E Sand𠼳tig@users.sourceforge.net>.
Added missing entries to THANKS.

------------------------------------------------------------------------
r1047
French translation updates (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1070
Norwegian bokm欠translation updates (Stig E Sandoe).

------------------------------------------------------------------------
r1071
Norwegian bokm欠translation updates (Stig E Sand𩮍

------------------------------------------------------------------------
r1072
Norwegian bokm欠translation updates (Stig E Sand𩮍

------------------------------------------------------------------------
r1125
French translation updates (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1133
French translation updates (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1258
French translation update (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1259
Spanish translation updates (Ricardo Javier Cⳤenes Medina).

------------------------------------------------------------------------
r1299
Updated Spanish translation (Ricardo Javier Cⳤenes Medina).

------------------------------------------------------------------------
r1301
Updated French translation (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1500
Updated French translation by Jean-Philippe Gu곡rd.

------------------------------------------------------------------------
r1537
Updated French translation by Jean-Philippe Gu곡rd.

------------------------------------------------------------------------
r1923
Updated French translation by Jean-Philippe Guérard.

------------------------------------------------------------------------
r2102
spell Ulf H峮hammar's name right

------------------------------------------------------------------------
r2373
in do_credits(), display Florian König's name properly in UTF-8 mode;
since we can't dynamically set that element of the array to its UTF-8
equivalent when in UTF-8 mode, we have to use the ISO-8859-1 version and
pass every string in the credits through make_mbstring() to make sure
they're all UTF-8 (sigh)

------------------------------------------------------------------------
r2784
rework the credits handling to display Florian König's name properly
whether we're in a UTF-8 locale or not.  This requires a minor hack, but
it's better than requiring a massive function that we only use once

------------------------------------------------------------------------
r2898
Update French manpages by Jean-Philippe Guérard.

------------------------------------------------------------------------
r3924
Update French manpages by Jean-Philippe Guérard.

------------------------------------------------------------------------
r4181
per Jean-Philippe Guérard's updates, in doc/man/fr/*.1,
doc/man/fr/nanorc.5, fix copyright notices; the copyrights are
disclaimed on these translations, but the copyrights of the untranslated
works also apply

------------------------------------------------------------------------
r4182
per Jean-Philippe Guérard's updates, in doc/man/fr/*.1,
doc/man/fr/nanorc.5, fix copyright notices; the copyrights are
disclaimed on these translations, but the copyrights of the untranslated
works also apply

------------------------------------------------------------------------
r4208
in print_opt_full(), use strlenpt() instead of strlen(), so that tabs
are placed properly when displaying translated strings in UTF-8, as
found by Jean-Philippe Guérard

------------------------------------------------------------------------

The corrupted-looking entries are the ones where the log message is
incorrectly stored in ISO-8859-1.

Re: svnsync failure when syncing with a repository that used ISO-8859-1 for log messages

Posted by Danny Trebbien <dt...@gmail.com>.
Geir,

I looked through some scripts that I wrote to help me sync the GNU
Nano repository and I came across a Perl script that might be useful
to you in quickly identifying all log messages that are not
representable in ASCII (hence possibly not UTF-8).

Attached is the source of the script. To use it, you will need the
libsvn Perl bindings (on Debian, install the `libsvn-perl` package),
and you will need to edit line 20 to change the URL of the Subversion
repository that you wish to examine.

Example output for svn://svn.sv.gnu.org/nano is:
------------------------------------------------------------------------
r619
Added Galician translation by Jacobo Tarr<jt...@trasno.net>.

------------------------------------------------------------------------
r757
Updated Galician translation; thanks, Jacobo Tarr

------------------------------------------------------------------------
r826
Galician translation brought up to date for 1.1.2 by Jacobo Tarr

------------------------------------------------------------------------
r954
Galician translation update (Jacobo Tarr.

------------------------------------------------------------------------
r958
French translation update (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r962
French translation update (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1009
Moved no.po to nn.po.
New Norwegian bokm欠translation, by Stig E Sandoe <st...@ii.uib.no>.
Updated Norwegian nynorsk translation, by Kjetil Torgrim Homme
<kj...@linpro.no>.

------------------------------------------------------------------------
r1013
Moved no.po to nn.po.
New Norwegian bokm欠translation, by Stig E Sand𠼳tig@users.sourceforge.net>.
Added missing entries to THANKS.

------------------------------------------------------------------------
r1047
French translation updates (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1070
Norwegian bokm欠translation updates (Stig E Sandoe).

------------------------------------------------------------------------
r1071
Norwegian bokm欠translation updates (Stig E Sand𩮍

------------------------------------------------------------------------
r1072
Norwegian bokm欠translation updates (Stig E Sand𩮍

------------------------------------------------------------------------
r1125
French translation updates (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1133
French translation updates (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1258
French translation update (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1259
Spanish translation updates (Ricardo Javier Cⳤenes Medina).

------------------------------------------------------------------------
r1299
Updated Spanish translation (Ricardo Javier Cⳤenes Medina).

------------------------------------------------------------------------
r1301
Updated French translation (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1500
Updated French translation by Jean-Philippe Gu곡rd.

------------------------------------------------------------------------
r1537
Updated French translation by Jean-Philippe Gu곡rd.

------------------------------------------------------------------------
r1923
Updated French translation by Jean-Philippe Guérard.

------------------------------------------------------------------------
r2102
spell Ulf H峮hammar's name right

------------------------------------------------------------------------
r2373
in do_credits(), display Florian König's name properly in UTF-8 mode;
since we can't dynamically set that element of the array to its UTF-8
equivalent when in UTF-8 mode, we have to use the ISO-8859-1 version and
pass every string in the credits through make_mbstring() to make sure
they're all UTF-8 (sigh)

------------------------------------------------------------------------
r2784
rework the credits handling to display Florian König's name properly
whether we're in a UTF-8 locale or not.  This requires a minor hack, but
it's better than requiring a massive function that we only use once

------------------------------------------------------------------------
r2898
Update French manpages by Jean-Philippe Guérard.

------------------------------------------------------------------------
r3924
Update French manpages by Jean-Philippe Guérard.

------------------------------------------------------------------------
r4181
per Jean-Philippe Guérard's updates, in doc/man/fr/*.1,
doc/man/fr/nanorc.5, fix copyright notices; the copyrights are
disclaimed on these translations, but the copyrights of the untranslated
works also apply

------------------------------------------------------------------------
r4182
per Jean-Philippe Guérard's updates, in doc/man/fr/*.1,
doc/man/fr/nanorc.5, fix copyright notices; the copyrights are
disclaimed on these translations, but the copyrights of the untranslated
works also apply

------------------------------------------------------------------------
r4208
in print_opt_full(), use strlenpt() instead of strlen(), so that tabs
are placed properly when displaying translated strings in UTF-8, as
found by Jean-Philippe Guérard

------------------------------------------------------------------------

The corrupted-looking entries are the ones where the log message is
incorrectly stored in ISO-8859-1.

Re: svnsync failure when syncing with a repository that used ISO-8859-1 for log messages

Posted by Danny Trebbien <dt...@gmail.com>.
Geir,

I looked through some scripts that I wrote to help me sync the GNU
Nano repository and I came across a Perl script that might be useful
to you in quickly identifying all log messages that are not
representable in ASCII (hence possibly not UTF-8).

Attached is the source of the script. To use it, you will need the
libsvn Perl bindings (on Debian, install the `libsvn-perl` package),
and you will need to edit line 20 to change the URL of the Subversion
repository that you wish to examine.

Example output for svn://svn.sv.gnu.org/nano is:
------------------------------------------------------------------------
r619
Added Galician translation by Jacobo Tarr<jt...@trasno.net>.

------------------------------------------------------------------------
r757
Updated Galician translation; thanks, Jacobo Tarr

------------------------------------------------------------------------
r826
Galician translation brought up to date for 1.1.2 by Jacobo Tarr

------------------------------------------------------------------------
r954
Galician translation update (Jacobo Tarr.

------------------------------------------------------------------------
r958
French translation update (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r962
French translation update (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1009
Moved no.po to nn.po.
New Norwegian bokm欠translation, by Stig E Sandoe <st...@ii.uib.no>.
Updated Norwegian nynorsk translation, by Kjetil Torgrim Homme
<kj...@linpro.no>.

------------------------------------------------------------------------
r1013
Moved no.po to nn.po.
New Norwegian bokm欠translation, by Stig E Sand𠼳tig@users.sourceforge.net>.
Added missing entries to THANKS.

------------------------------------------------------------------------
r1047
French translation updates (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1070
Norwegian bokm欠translation updates (Stig E Sandoe).

------------------------------------------------------------------------
r1071
Norwegian bokm欠translation updates (Stig E Sand𩮍

------------------------------------------------------------------------
r1072
Norwegian bokm欠translation updates (Stig E Sand𩮍

------------------------------------------------------------------------
r1125
French translation updates (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1133
French translation updates (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1258
French translation update (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1259
Spanish translation updates (Ricardo Javier Cⳤenes Medina).

------------------------------------------------------------------------
r1299
Updated Spanish translation (Ricardo Javier Cⳤenes Medina).

------------------------------------------------------------------------
r1301
Updated French translation (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1500
Updated French translation by Jean-Philippe Gu곡rd.

------------------------------------------------------------------------
r1537
Updated French translation by Jean-Philippe Gu곡rd.

------------------------------------------------------------------------
r1923
Updated French translation by Jean-Philippe Guérard.

------------------------------------------------------------------------
r2102
spell Ulf H峮hammar's name right

------------------------------------------------------------------------
r2373
in do_credits(), display Florian König's name properly in UTF-8 mode;
since we can't dynamically set that element of the array to its UTF-8
equivalent when in UTF-8 mode, we have to use the ISO-8859-1 version and
pass every string in the credits through make_mbstring() to make sure
they're all UTF-8 (sigh)

------------------------------------------------------------------------
r2784
rework the credits handling to display Florian König's name properly
whether we're in a UTF-8 locale or not.  This requires a minor hack, but
it's better than requiring a massive function that we only use once

------------------------------------------------------------------------
r2898
Update French manpages by Jean-Philippe Guérard.

------------------------------------------------------------------------
r3924
Update French manpages by Jean-Philippe Guérard.

------------------------------------------------------------------------
r4181
per Jean-Philippe Guérard's updates, in doc/man/fr/*.1,
doc/man/fr/nanorc.5, fix copyright notices; the copyrights are
disclaimed on these translations, but the copyrights of the untranslated
works also apply

------------------------------------------------------------------------
r4182
per Jean-Philippe Guérard's updates, in doc/man/fr/*.1,
doc/man/fr/nanorc.5, fix copyright notices; the copyrights are
disclaimed on these translations, but the copyrights of the untranslated
works also apply

------------------------------------------------------------------------
r4208
in print_opt_full(), use strlenpt() instead of strlen(), so that tabs
are placed properly when displaying translated strings in UTF-8, as
found by Jean-Philippe Guérard

------------------------------------------------------------------------

The corrupted-looking entries are the ones where the log message is
incorrectly stored in ISO-8859-1.

Re: svnsync failure when syncing with a repository that used ISO-8859-1 for log messages

Posted by Danny Trebbien <dt...@gmail.com>.
Geir,

I looked through some scripts that I wrote to help me sync the GNU
Nano repository and I came across a Perl script that might be useful
to you in quickly identifying all log messages that are not
representable in ASCII (hence possibly not UTF-8).

Attached is the source of the script. To use it, you will need the
libsvn Perl bindings (on Debian, install the `libsvn-perl` package),
and you will need to edit line 20 to change the URL of the Subversion
repository that you wish to examine.

Example output for svn://svn.sv.gnu.org/nano is:
------------------------------------------------------------------------
r619
Added Galician translation by Jacobo Tarr<jt...@trasno.net>.

------------------------------------------------------------------------
r757
Updated Galician translation; thanks, Jacobo Tarr

------------------------------------------------------------------------
r826
Galician translation brought up to date for 1.1.2 by Jacobo Tarr

------------------------------------------------------------------------
r954
Galician translation update (Jacobo Tarr.

------------------------------------------------------------------------
r958
French translation update (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r962
French translation update (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1009
Moved no.po to nn.po.
New Norwegian bokm欠translation, by Stig E Sandoe <st...@ii.uib.no>.
Updated Norwegian nynorsk translation, by Kjetil Torgrim Homme
<kj...@linpro.no>.

------------------------------------------------------------------------
r1013
Moved no.po to nn.po.
New Norwegian bokm欠translation, by Stig E Sand𠼳tig@users.sourceforge.net>.
Added missing entries to THANKS.

------------------------------------------------------------------------
r1047
French translation updates (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1070
Norwegian bokm欠translation updates (Stig E Sandoe).

------------------------------------------------------------------------
r1071
Norwegian bokm欠translation updates (Stig E Sand𩮍

------------------------------------------------------------------------
r1072
Norwegian bokm欠translation updates (Stig E Sand𩮍

------------------------------------------------------------------------
r1125
French translation updates (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1133
French translation updates (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1258
French translation update (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1259
Spanish translation updates (Ricardo Javier Cⳤenes Medina).

------------------------------------------------------------------------
r1299
Updated Spanish translation (Ricardo Javier Cⳤenes Medina).

------------------------------------------------------------------------
r1301
Updated French translation (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1500
Updated French translation by Jean-Philippe Gu곡rd.

------------------------------------------------------------------------
r1537
Updated French translation by Jean-Philippe Gu곡rd.

------------------------------------------------------------------------
r1923
Updated French translation by Jean-Philippe Guérard.

------------------------------------------------------------------------
r2102
spell Ulf H峮hammar's name right

------------------------------------------------------------------------
r2373
in do_credits(), display Florian König's name properly in UTF-8 mode;
since we can't dynamically set that element of the array to its UTF-8
equivalent when in UTF-8 mode, we have to use the ISO-8859-1 version and
pass every string in the credits through make_mbstring() to make sure
they're all UTF-8 (sigh)

------------------------------------------------------------------------
r2784
rework the credits handling to display Florian König's name properly
whether we're in a UTF-8 locale or not.  This requires a minor hack, but
it's better than requiring a massive function that we only use once

------------------------------------------------------------------------
r2898
Update French manpages by Jean-Philippe Guérard.

------------------------------------------------------------------------
r3924
Update French manpages by Jean-Philippe Guérard.

------------------------------------------------------------------------
r4181
per Jean-Philippe Guérard's updates, in doc/man/fr/*.1,
doc/man/fr/nanorc.5, fix copyright notices; the copyrights are
disclaimed on these translations, but the copyrights of the untranslated
works also apply

------------------------------------------------------------------------
r4182
per Jean-Philippe Guérard's updates, in doc/man/fr/*.1,
doc/man/fr/nanorc.5, fix copyright notices; the copyrights are
disclaimed on these translations, but the copyrights of the untranslated
works also apply

------------------------------------------------------------------------
r4208
in print_opt_full(), use strlenpt() instead of strlen(), so that tabs
are placed properly when displaying translated strings in UTF-8, as
found by Jean-Philippe Guérard

------------------------------------------------------------------------

The corrupted-looking entries are the ones where the log message is
incorrectly stored in ISO-8859-1.

Re: svnsync failure when syncing with a repository that used ISO-8859-1 for log messages

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Danny Trebbien wrote on Tue, Nov 30, 2010 at 07:35:38 -0800:
> I suspect that when all is said and done, the version of Subversion
> that will contain the patch will be in the 1.7.x series.

Indeed.  We don't add new features in patch releases (1.6.0->1.6.x), and
I'm afraid that Danny's patches do qualify as "new features" in this
context...

(They add new API's and new --flags.)

Daniel

Re: svnsync failure when syncing with a repository that used ISO-8859-1 for log messages

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Danny Trebbien wrote on Tue, Nov 30, 2010 at 07:35:38 -0800:
> I suspect that when all is said and done, the version of Subversion
> that will contain the patch will be in the 1.7.x series.

Indeed.  We don't add new features in patch releases (1.6.0->1.6.x), and
I'm afraid that Danny's patches do qualify as "new features" in this
context...

(They add new API's and new --flags.)

Daniel

RE: svnsync failure when syncing with a repository that used ISO-8859-1 for log messages

Posted by Engebakken Geir <ge...@edb.com>.
Resolved now, seems the sync.c file is added in the Subversion source trunk (1.7?) , but is not not in any 1.6.x releases of the source, so that fooled us.



Geir

Note : All inquiries regarding Subversion, MKS and general Development servers should be directed to "EDB SourceControl System"


> -----Original Message-----
> From: Christopher Dobbs [mailto:christopher.dobbs@edb.com]
> Sent: 14. desember 2010 15:25
> To: users@subversion.apache.org
> Subject: Re: svnsync failure when syncing with a repository that used
> ISO-8859-1 for log messages
> 
> Danny Trebbien <dtrebbien <at> gmail.com> writes:
> 
> >
> > > I must be missing something here, is this patch applied to the
> current SVN
> source?
> > >
> > > I have downloaded SVN 1.6.15 source from Tigris.org, and cant find
> the
> changes in the source. Also this
> > patch is applied to a file : subversion/svnsync/sync.c but in my
> downloaded
> source there is only a file
> > main.c in the svnsync directory.
> > >
> > > We are for the moment stuck with this UTF-8 problem in one of our
> svn:ignore
> properties, and cant proceed
> > with seting up this sync, so any pointers are welcome!
> > >
> > > Geir
> >
> > Hi Geir,
> >
> Hi Danny,
> I am working with Geir Engebakken on this and we understand you won't
> release
> the patch now until 1.7.x - That's fine but I wanted to try out the
> patch myself
> now so I tried to run it in and it is looking for sync.c and sync.h
> which dont
> seem to exist in my source tree. I have downloaded the latest version
> 1.6.15
> ??
> 
> Any help is appreciated!!
> 
> -Chris
> 
> > No, this patch has not yet been applied to SVN trunk.  I have been
> > working with some of the Subversion devs (Daniel Shahaf, Gavin, and
> > Julian) to fix this problem in a series of commits.  We are currently
> > working on a patch to libsvn_subr-1 upon which my svnsync patch
> > depends, and it is very close to being applied to trunk, but hasn't
> > been committed yet.  I suspect that when all is said and done, the
> > version of Subversion that will contain the patch will be in the
> 1.7.x
> > series.
> >
> > In the meantime, the way that I got past the issue with syncing the
> > GNU Nano repository was to compile Subversion with the above patch
> > (the one that always recodes revprops into ISO-8859-1), use the
> > newly-compiled `svnsync` to sync the repository, revert to a standard
> > build of `svnsync`, and use `svnsync copy-revprops` to "go back" and
> > fix the revprops of revisions that really use UTF-8.  The idea here
> is
> > to pick an encoding (ISO-8859-1) that is partially compatible with
> > UTF-8 and in which any sequence of bytes is a "valid" string in that
> > encoding.  After syncing the entire repository in that encoding, any
> > revprops that are encoded in UTF-8 will be incorrect.  But, after
> > identifying specific revisions or ranges of revisions where revprops
> > are encoded in UTF-8, you "go back" and use the standard `svnsync
> > copy-revprops` for each of these revisions to correct them.
> >
> > Hope that helps,
> >
> > Danny
> >
> >
> 
> 
> 

Re: svnsync failure when syncing with a repository that used ISO-8859-1 for log messages

Posted by Christopher Dobbs <ch...@edb.com>.
Danny Trebbien <dtrebbien <at> gmail.com> writes:

> 
> > I must be missing something here, is this patch applied to the current SVN
source?
> >
> > I have downloaded SVN 1.6.15 source from Tigris.org, and cant find the
changes in the source. Also this
> patch is applied to a file : subversion/svnsync/sync.c but in my downloaded
source there is only a file
> main.c in the svnsync directory.
> >
> > We are for the moment stuck with this UTF-8 problem in one of our svn:ignore
properties, and cant proceed
> with seting up this sync, so any pointers are welcome!
> >
> > Geir
> 
> Hi Geir,
> 
Hi Danny,
I am working with Geir Engebakken on this and we understand you won't release
the patch now until 1.7.x - That's fine but I wanted to try out the patch myself
now so I tried to run it in and it is looking for sync.c and sync.h which dont
seem to exist in my source tree. I have downloaded the latest version 1.6.15
??

Any help is appreciated!!

-Chris

> No, this patch has not yet been applied to SVN trunk.  I have been
> working with some of the Subversion devs (Daniel Shahaf, Gavin, and
> Julian) to fix this problem in a series of commits.  We are currently
> working on a patch to libsvn_subr-1 upon which my svnsync patch
> depends, and it is very close to being applied to trunk, but hasn't
> been committed yet.  I suspect that when all is said and done, the
> version of Subversion that will contain the patch will be in the 1.7.x
> series.
> 
> In the meantime, the way that I got past the issue with syncing the
> GNU Nano repository was to compile Subversion with the above patch
> (the one that always recodes revprops into ISO-8859-1), use the
> newly-compiled `svnsync` to sync the repository, revert to a standard
> build of `svnsync`, and use `svnsync copy-revprops` to "go back" and
> fix the revprops of revisions that really use UTF-8.  The idea here is
> to pick an encoding (ISO-8859-1) that is partially compatible with
> UTF-8 and in which any sequence of bytes is a "valid" string in that
> encoding.  After syncing the entire repository in that encoding, any
> revprops that are encoded in UTF-8 will be incorrect.  But, after
> identifying specific revisions or ranges of revisions where revprops
> are encoded in UTF-8, you "go back" and use the standard `svnsync
> copy-revprops` for each of these revisions to correct them.
> 
> Hope that helps,
> 
> Danny
> 
> 




Re: svnsync failure when syncing with a repository that used ISO-8859-1 for log messages

Posted by Danny Trebbien <dt...@gmail.com>.
> I must be missing something here, is this patch applied to the current SVN source?
>
> I have downloaded SVN 1.6.15 source from Tigris.org, and cant find the changes in the source. Also this patch is applied to a file : subversion/svnsync/sync.c but in my downloaded source there is only a file main.c in the svnsync directory.
>
> We are for the moment stuck with this UTF-8 problem in one of our svn:ignore properties, and cant proceed with seting up this sync, so any pointers are welcome!
>
> Geir

Hi Geir,

No, this patch has not yet been applied to SVN trunk.  I have been
working with some of the Subversion devs (Daniel Shahaf, Gavin, and
Julian) to fix this problem in a series of commits.  We are currently
working on a patch to libsvn_subr-1 upon which my svnsync patch
depends, and it is very close to being applied to trunk, but hasn't
been committed yet.  I suspect that when all is said and done, the
version of Subversion that will contain the patch will be in the 1.7.x
series.

In the meantime, the way that I got past the issue with syncing the
GNU Nano repository was to compile Subversion with the above patch
(the one that always recodes revprops into ISO-8859-1), use the
newly-compiled `svnsync` to sync the repository, revert to a standard
build of `svnsync`, and use `svnsync copy-revprops` to "go back" and
fix the revprops of revisions that really use UTF-8.  The idea here is
to pick an encoding (ISO-8859-1) that is partially compatible with
UTF-8 and in which any sequence of bytes is a "valid" string in that
encoding.  After syncing the entire repository in that encoding, any
revprops that are encoded in UTF-8 will be incorrect.  But, after
identifying specific revisions or ranges of revisions where revprops
are encoded in UTF-8, you "go back" and use the standard `svnsync
copy-revprops` for each of these revisions to correct them.

Hope that helps,

Danny

Re: svnsync failure when syncing with a repository that used ISO-8859-1 for log messages

Posted by Danny Trebbien <dt...@gmail.com>.
> I must be missing something here, is this patch applied to the current SVN source?
>
> I have downloaded SVN 1.6.15 source from Tigris.org, and cant find the changes in the source. Also this patch is applied to a file : subversion/svnsync/sync.c but in my downloaded source there is only a file main.c in the svnsync directory.
>
> We are for the moment stuck with this UTF-8 problem in one of our svn:ignore properties, and cant proceed with seting up this sync, so any pointers are welcome!
>
> Geir

Hi Geir,

No, this patch has not yet been applied to SVN trunk.  I have been
working with some of the Subversion devs (Daniel Shahaf, Gavin, and
Julian) to fix this problem in a series of commits.  We are currently
working on a patch to libsvn_subr-1 upon which my svnsync patch
depends, and it is very close to being applied to trunk, but hasn't
been committed yet.  I suspect that when all is said and done, the
version of Subversion that will contain the patch will be in the 1.7.x
series.

In the meantime, the way that I got past the issue with syncing the
GNU Nano repository was to compile Subversion with the above patch
(the one that always recodes revprops into ISO-8859-1), use the
newly-compiled `svnsync` to sync the repository, revert to a standard
build of `svnsync`, and use `svnsync copy-revprops` to "go back" and
fix the revprops of revisions that really use UTF-8.  The idea here is
to pick an encoding (ISO-8859-1) that is partially compatible with
UTF-8 and in which any sequence of bytes is a "valid" string in that
encoding.  After syncing the entire repository in that encoding, any
revprops that are encoded in UTF-8 will be incorrect.  But, after
identifying specific revisions or ranges of revisions where revprops
are encoded in UTF-8, you "go back" and use the standard `svnsync
copy-revprops` for each of these revisions to correct them.

Hope that helps,

Danny

RE: svnsync failure when syncing with a repository that used ISO-8859-1 for log messages

Posted by Engebakken Geir <ge...@edb.com>.
I must be missing something here, is this patch applied to the current SVN source?

I have downloaded SVN 1.6.15 source from Tigris.org, and cant find the changes in the source. Also this patch is applied to a file : subversion/svnsync/sync.c but in my downloaded source there is only a file main.c in the svnsync directory.

We are for the moment stuck with this UTF-8 problem in one of our svn:ignore properties, and cant proceed with seting up this sync, so any pointers are welcome!



Geir

Note : All inquiries regarding Subversion, MKS and general Development servers should be directed to "EDB SourceControl System"


> -----Original Message-----
> From: Daniel Trebbien [mailto:dtrebbien@gmail.com]
> Sent: 9. september 2010 17:49
> To: Daniel Shahaf
> Cc: users@subversion.apache.org; dev@subversion.apache.org
> Subject: Re: svnsync failure when syncing with a repository that used
> ISO-8859-1 for log messages
> 
> On Thu, Sep 9, 2010 at 3:48 AM, Daniel Shahaf <d....@daniel.shahaf.name>
> wrote:
> > CC += dev@, and let me point you to our patch submission guidelines:
> > http://subversion.apache.org/docs/community-
> guide/general.html#patches
> >
> > Summary for dev@: allow svnsync to translate non-UTF-8 log messages
> to UTF-8.
> >
> > (more below)
> >
> > Daniel Trebbien wrote on Wed, Sep 08, 2010 at 18:58:06 -0700:
> >> On Wed, Sep 8, 2010 at 4:39 PM, Daniel Trebbien
> <dt...@gmail.com> wrote:
> >> > I think that a call to `svn_subst_translate_string`
> >> > (http://svn.collab.net/svn-doxygen/svn__subst_8h.html#a29) needs
> to be
> >> > added in the `svnsync_normalize_revprops` function when `propname`
> is
> >> > "svn:log".
> >>
> >> After applying the following patch to `subversion/svnsync/sync.c`, I
> >> can confirm that the "svnsync: Error while replaying commit" error
> >> disappears, allowing the sync to complete, and that the problem is
> >> indeed a log message encoding issue:
> >> diff --git a/subversion/svnsync/sync.c b/subversion/svnsync/sync.c
> >> index 8f7be9e..c7a5e38 100644
> >> --- a/subversion/svnsync/sync.c
> >> +++ b/subversion/svnsync/sync.c
> >> @@ -114,6 +114,14 @@ svnsync_normalize_revprops(apr_hash_t
> *rev_props,
> >>                /* And count this. */
> >>                (*normalized_count)++;
> >>              }
> >> +
> >> +          if (strcmp(propname, "svn:log") == 0)
> >> +            {
> >
> > Should this use svn_prop_needs_translation()?
> 
> Though it does not appear so, the added lines are within the check for
> `svn_prop_needs_translation`.
> 
> >> +              svn_string_t *new_propval;
> >> +              SVN_ERR(svn_subst_translate_string(&new_propval,
> propval, "ISO-8859-1", pool));
> >> +              apr_hash_set(rev_props, propname,
> APR_HASH_KEY_STRING, propval = new_propval);
> >
> > Please, please, please, no assignment inside a function call.
>   ^^^^^^^^^
> 
> Fixed
> 
> >> +              (*normalized_count)++;
> >> +            }
> >>          }
> >>      }
> >>    return SVN_NO_ERROR;
> >>
> >>
> >> Here I hard-coded the encoding, but I think that the encoding to use
> >> should be retrieved from the Subversion config file. Now the
> questions
> >> are:
> >>
> >> 1. What is the best way to pass the `log-encoding` option of the
> >> [miscellany] section to the `svnsync_normalize_revprops` function?
> >>
> >
> > What is 'log-encoding' normally used for?  Only for recoding the
> commit
> > message supplied by an editor, right?  So I'm not sure we should use
> it
> > here, perhaps a new command-line option will be better.  (svnsync
> > $subcommand --source-encoding=%s)
> 
> I like the idea of adding a command line option, so I think that this
> is the way to go.
> 
> Do other properties need to be re-encoded, potentially? I was only
> thinking about `svn:log`, but perhaps other properties might need
> re-encoding as well. If not, then I think that the command line option
> should be more specific (e.g.: `--source-log-encoding=%s`).
> 
> > And either way, the default behaviour should be to translate nothing.
> > (If you always translate from latin1, you break syncs for people who,
> > correctly, used UTF-8 log messages the entire time.)
> 
> I agree. The default behavior should definitely be to not re-encode
> the log messages.
> 
> >> 2. Should `svnsync` always store the log messages in UTF-8 even
> though
> >> the original repository log message encoding is different?
> >>
> >
> > You have no choice on the matter: the RA API instructs you to supply
> it
> > a UTF-8 log message.  (IIRC, even the libsvn_client API expects an
> > already-UTF-8 message.)
> >
> >> 3. Should there be separate configuration options for the log
> message
> >> encoding of the source repository and the log message encoding of
> the
> >> destination repository?
> >
> > No, see (2).
> >
> 
> Another question:  which line of code raises the "svnsync: Error while
> replaying commit" error message? I would like to try to make this more
> helpful.

RE: svnsync failure when syncing with a repository that used ISO-8859-1 for log messages

Posted by Engebakken Geir <ge...@edb.com>.
I must be missing something here, is this patch applied to the current SVN source?

I have downloaded SVN 1.6.15 source from Tigris.org, and cant find the changes in the source. Also this patch is applied to a file : subversion/svnsync/sync.c but in my downloaded source there is only a file main.c in the svnsync directory.

We are for the moment stuck with this UTF-8 problem in one of our svn:ignore properties, and cant proceed with seting up this sync, so any pointers are welcome!



Geir

Note : All inquiries regarding Subversion, MKS and general Development servers should be directed to "EDB SourceControl System"


> -----Original Message-----
> From: Daniel Trebbien [mailto:dtrebbien@gmail.com]
> Sent: 9. september 2010 17:49
> To: Daniel Shahaf
> Cc: users@subversion.apache.org; dev@subversion.apache.org
> Subject: Re: svnsync failure when syncing with a repository that used
> ISO-8859-1 for log messages
> 
> On Thu, Sep 9, 2010 at 3:48 AM, Daniel Shahaf <d....@daniel.shahaf.name>
> wrote:
> > CC += dev@, and let me point you to our patch submission guidelines:
> > http://subversion.apache.org/docs/community-
> guide/general.html#patches
> >
> > Summary for dev@: allow svnsync to translate non-UTF-8 log messages
> to UTF-8.
> >
> > (more below)
> >
> > Daniel Trebbien wrote on Wed, Sep 08, 2010 at 18:58:06 -0700:
> >> On Wed, Sep 8, 2010 at 4:39 PM, Daniel Trebbien
> <dt...@gmail.com> wrote:
> >> > I think that a call to `svn_subst_translate_string`
> >> > (http://svn.collab.net/svn-doxygen/svn__subst_8h.html#a29) needs
> to be
> >> > added in the `svnsync_normalize_revprops` function when `propname`
> is
> >> > "svn:log".
> >>
> >> After applying the following patch to `subversion/svnsync/sync.c`, I
> >> can confirm that the "svnsync: Error while replaying commit" error
> >> disappears, allowing the sync to complete, and that the problem is
> >> indeed a log message encoding issue:
> >> diff --git a/subversion/svnsync/sync.c b/subversion/svnsync/sync.c
> >> index 8f7be9e..c7a5e38 100644
> >> --- a/subversion/svnsync/sync.c
> >> +++ b/subversion/svnsync/sync.c
> >> @@ -114,6 +114,14 @@ svnsync_normalize_revprops(apr_hash_t
> *rev_props,
> >>                /* And count this. */
> >>                (*normalized_count)++;
> >>              }
> >> +
> >> +          if (strcmp(propname, "svn:log") == 0)
> >> +            {
> >
> > Should this use svn_prop_needs_translation()?
> 
> Though it does not appear so, the added lines are within the check for
> `svn_prop_needs_translation`.
> 
> >> +              svn_string_t *new_propval;
> >> +              SVN_ERR(svn_subst_translate_string(&new_propval,
> propval, "ISO-8859-1", pool));
> >> +              apr_hash_set(rev_props, propname,
> APR_HASH_KEY_STRING, propval = new_propval);
> >
> > Please, please, please, no assignment inside a function call.
>   ^^^^^^^^^
> 
> Fixed
> 
> >> +              (*normalized_count)++;
> >> +            }
> >>          }
> >>      }
> >>    return SVN_NO_ERROR;
> >>
> >>
> >> Here I hard-coded the encoding, but I think that the encoding to use
> >> should be retrieved from the Subversion config file. Now the
> questions
> >> are:
> >>
> >> 1. What is the best way to pass the `log-encoding` option of the
> >> [miscellany] section to the `svnsync_normalize_revprops` function?
> >>
> >
> > What is 'log-encoding' normally used for?  Only for recoding the
> commit
> > message supplied by an editor, right?  So I'm not sure we should use
> it
> > here, perhaps a new command-line option will be better.  (svnsync
> > $subcommand --source-encoding=%s)
> 
> I like the idea of adding a command line option, so I think that this
> is the way to go.
> 
> Do other properties need to be re-encoded, potentially? I was only
> thinking about `svn:log`, but perhaps other properties might need
> re-encoding as well. If not, then I think that the command line option
> should be more specific (e.g.: `--source-log-encoding=%s`).
> 
> > And either way, the default behaviour should be to translate nothing.
> > (If you always translate from latin1, you break syncs for people who,
> > correctly, used UTF-8 log messages the entire time.)
> 
> I agree. The default behavior should definitely be to not re-encode
> the log messages.
> 
> >> 2. Should `svnsync` always store the log messages in UTF-8 even
> though
> >> the original repository log message encoding is different?
> >>
> >
> > You have no choice on the matter: the RA API instructs you to supply
> it
> > a UTF-8 log message.  (IIRC, even the libsvn_client API expects an
> > already-UTF-8 message.)
> >
> >> 3. Should there be separate configuration options for the log
> message
> >> encoding of the source repository and the log message encoding of
> the
> >> destination repository?
> >
> > No, see (2).
> >
> 
> Another question:  which line of code raises the "svnsync: Error while
> replaying commit" error message? I would like to try to make this more
> helpful.

Re: svnsync failure when syncing with a repository that used ISO-8859-1 for log messages

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Trebbien wrote on Thu, Sep 09, 2010 at 08:48:43 -0700:
> On Thu, Sep 9, 2010 at 3:48 AM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> > Daniel Trebbien wrote on Wed, Sep 08, 2010 at 18:58:06 -0700:
> >> +          if (strcmp(propname, "svn:log") == 0)
> >> +            {
> >
> > Should this use svn_prop_needs_translation()?
> 
> Though it does not appear so, the added lines are within the check for
> `svn_prop_needs_translation`.
> 

Okay.  But my point was really, "shouldn't the new block you're adding
apply to any prop that svn_prop_needs_translation() returns TRUE for,
rather than only for SVN_PROP_REVISION_LOG?"

> >> 1. What is the best way to pass the `log-encoding` option of the
> >> [miscellany] section to the `svnsync_normalize_revprops` function?
> >>
> >
> > What is 'log-encoding' normally used for?  Only for recoding the commit
> > message supplied by an editor, right?  So I'm not sure we should use it
> > here, perhaps a new command-line option will be better.  (svnsync
> > $subcommand --source-encoding=%s)
> 
> I like the idea of adding a command line option, so I think that this
> is the way to go.
> 
> Do other properties need to be re-encoded, potentially?

That's why I suggested svn_prop_needs_translation().  Have you read its
docstring?  

[[[
/** If @a prop_name requires that its value be stored as UTF8/LF in the
 * repository, then return @c TRUE.  Else return @c FALSE.  This is for
 * users of libsvn_client or libsvn_fs, since it their responsibility
 * to do this translation in both directions.  (See
 * svn_subst_translate_string()/svn_subst_detranslate_string() for
 * help with this task.)
 */
svn_boolean_t
svn_prop_needs_translation(const char *prop_name);
]]]

> I was only
> thinking about `svn:log`, but perhaps other properties might need
> re-encoding as well. If not, then I think that the command line option
> should be more specific (e.g.: `--source-log-encoding=%s`).
> 

svn:log is certainly the most common case.  I think we had very few
reports on users@ of this error occurring with other props (e.g.,
svn:author and svn:externals).

> Another question:  which line of code raises the "svnsync: Error while
> replaying commit" error message? I would like to try to make this more
> helpful.

If you build --enable-maintainer-mode, you should get the file-line
coordinates of where the error was raised (and most places that
returned it, too).

Failing that, there's a shortcut that works most of the time: look in
subversion/po/fr.po for "Error while replaying commit" :-)

Daniel

Re: svnsync failure when syncing with a repository that used ISO-8859-1 for log messages

Posted by Daniel Trebbien <dt...@gmail.com>.
On Thu, Sep 9, 2010 at 3:48 AM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> CC += dev@, and let me point you to our patch submission guidelines:
> http://subversion.apache.org/docs/community-guide/general.html#patches
>
> Summary for dev@: allow svnsync to translate non-UTF-8 log messages to UTF-8.
>
> (more below)
>
> Daniel Trebbien wrote on Wed, Sep 08, 2010 at 18:58:06 -0700:
>> On Wed, Sep 8, 2010 at 4:39 PM, Daniel Trebbien <dt...@gmail.com> wrote:
>> > I think that a call to `svn_subst_translate_string`
>> > (http://svn.collab.net/svn-doxygen/svn__subst_8h.html#a29) needs to be
>> > added in the `svnsync_normalize_revprops` function when `propname` is
>> > "svn:log".
>>
>> After applying the following patch to `subversion/svnsync/sync.c`, I
>> can confirm that the "svnsync: Error while replaying commit" error
>> disappears, allowing the sync to complete, and that the problem is
>> indeed a log message encoding issue:
>> diff --git a/subversion/svnsync/sync.c b/subversion/svnsync/sync.c
>> index 8f7be9e..c7a5e38 100644
>> --- a/subversion/svnsync/sync.c
>> +++ b/subversion/svnsync/sync.c
>> @@ -114,6 +114,14 @@ svnsync_normalize_revprops(apr_hash_t *rev_props,
>>                /* And count this. */
>>                (*normalized_count)++;
>>              }
>> +
>> +          if (strcmp(propname, "svn:log") == 0)
>> +            {
>
> Should this use svn_prop_needs_translation()?

Though it does not appear so, the added lines are within the check for
`svn_prop_needs_translation`.

>> +              svn_string_t *new_propval;
>> +              SVN_ERR(svn_subst_translate_string(&new_propval, propval, "ISO-8859-1", pool));
>> +              apr_hash_set(rev_props, propname, APR_HASH_KEY_STRING, propval = new_propval);
>
> Please, please, please, no assignment inside a function call.           ^^^^^^^^^

Fixed

>> +              (*normalized_count)++;
>> +            }
>>          }
>>      }
>>    return SVN_NO_ERROR;
>>
>>
>> Here I hard-coded the encoding, but I think that the encoding to use
>> should be retrieved from the Subversion config file. Now the questions
>> are:
>>
>> 1. What is the best way to pass the `log-encoding` option of the
>> [miscellany] section to the `svnsync_normalize_revprops` function?
>>
>
> What is 'log-encoding' normally used for?  Only for recoding the commit
> message supplied by an editor, right?  So I'm not sure we should use it
> here, perhaps a new command-line option will be better.  (svnsync
> $subcommand --source-encoding=%s)

I like the idea of adding a command line option, so I think that this
is the way to go.

Do other properties need to be re-encoded, potentially? I was only
thinking about `svn:log`, but perhaps other properties might need
re-encoding as well. If not, then I think that the command line option
should be more specific (e.g.: `--source-log-encoding=%s`).

> And either way, the default behaviour should be to translate nothing.
> (If you always translate from latin1, you break syncs for people who,
> correctly, used UTF-8 log messages the entire time.)

I agree. The default behavior should definitely be to not re-encode
the log messages.

>> 2. Should `svnsync` always store the log messages in UTF-8 even though
>> the original repository log message encoding is different?
>>
>
> You have no choice on the matter: the RA API instructs you to supply it
> a UTF-8 log message.  (IIRC, even the libsvn_client API expects an
> already-UTF-8 message.)
>
>> 3. Should there be separate configuration options for the log message
>> encoding of the source repository and the log message encoding of the
>> destination repository?
>
> No, see (2).
>

Another question:  which line of code raises the "svnsync: Error while
replaying commit" error message? I would like to try to make this more
helpful.

Re: svnsync failure when syncing with a repository that used ISO-8859-1 for log messages

Posted by Daniel Trebbien <dt...@gmail.com>.
On Thu, Sep 9, 2010 at 3:48 AM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> CC += dev@, and let me point you to our patch submission guidelines:
> http://subversion.apache.org/docs/community-guide/general.html#patches
>
> Summary for dev@: allow svnsync to translate non-UTF-8 log messages to UTF-8.
>
> (more below)
>
> Daniel Trebbien wrote on Wed, Sep 08, 2010 at 18:58:06 -0700:
>> On Wed, Sep 8, 2010 at 4:39 PM, Daniel Trebbien <dt...@gmail.com> wrote:
>> > I think that a call to `svn_subst_translate_string`
>> > (http://svn.collab.net/svn-doxygen/svn__subst_8h.html#a29) needs to be
>> > added in the `svnsync_normalize_revprops` function when `propname` is
>> > "svn:log".
>>
>> After applying the following patch to `subversion/svnsync/sync.c`, I
>> can confirm that the "svnsync: Error while replaying commit" error
>> disappears, allowing the sync to complete, and that the problem is
>> indeed a log message encoding issue:
>> diff --git a/subversion/svnsync/sync.c b/subversion/svnsync/sync.c
>> index 8f7be9e..c7a5e38 100644
>> --- a/subversion/svnsync/sync.c
>> +++ b/subversion/svnsync/sync.c
>> @@ -114,6 +114,14 @@ svnsync_normalize_revprops(apr_hash_t *rev_props,
>>                /* And count this. */
>>                (*normalized_count)++;
>>              }
>> +
>> +          if (strcmp(propname, "svn:log") == 0)
>> +            {
>
> Should this use svn_prop_needs_translation()?

Though it does not appear so, the added lines are within the check for
`svn_prop_needs_translation`.

>> +              svn_string_t *new_propval;
>> +              SVN_ERR(svn_subst_translate_string(&new_propval, propval, "ISO-8859-1", pool));
>> +              apr_hash_set(rev_props, propname, APR_HASH_KEY_STRING, propval = new_propval);
>
> Please, please, please, no assignment inside a function call.           ^^^^^^^^^

Fixed

>> +              (*normalized_count)++;
>> +            }
>>          }
>>      }
>>    return SVN_NO_ERROR;
>>
>>
>> Here I hard-coded the encoding, but I think that the encoding to use
>> should be retrieved from the Subversion config file. Now the questions
>> are:
>>
>> 1. What is the best way to pass the `log-encoding` option of the
>> [miscellany] section to the `svnsync_normalize_revprops` function?
>>
>
> What is 'log-encoding' normally used for?  Only for recoding the commit
> message supplied by an editor, right?  So I'm not sure we should use it
> here, perhaps a new command-line option will be better.  (svnsync
> $subcommand --source-encoding=%s)

I like the idea of adding a command line option, so I think that this
is the way to go.

Do other properties need to be re-encoded, potentially? I was only
thinking about `svn:log`, but perhaps other properties might need
re-encoding as well. If not, then I think that the command line option
should be more specific (e.g.: `--source-log-encoding=%s`).

> And either way, the default behaviour should be to translate nothing.
> (If you always translate from latin1, you break syncs for people who,
> correctly, used UTF-8 log messages the entire time.)

I agree. The default behavior should definitely be to not re-encode
the log messages.

>> 2. Should `svnsync` always store the log messages in UTF-8 even though
>> the original repository log message encoding is different?
>>
>
> You have no choice on the matter: the RA API instructs you to supply it
> a UTF-8 log message.  (IIRC, even the libsvn_client API expects an
> already-UTF-8 message.)
>
>> 3. Should there be separate configuration options for the log message
>> encoding of the source repository and the log message encoding of the
>> destination repository?
>
> No, see (2).
>

Another question:  which line of code raises the "svnsync: Error while
replaying commit" error message? I would like to try to make this more
helpful.