Posted to dev@subversion.apache.org by Mark Phippard <ma...@gmail.com> on 2014/04/04 19:30:48 UTC

Script or process to fix old invalid data in repositories or dumps?

I am loading a dump file into a new repository and running into this:

svnadmin: E125005: Invalid property value found in dumpstream; consider
repairing the source or using --bypass-prop-validation while loading.
svnadmin: E125005: Cannot accept non-LF line endings in 'svn:log' property

So I realize I can use that option to force the file to load, but that is
just punting the problem to the future.  Has anyone ever written any
scripts that can run through an entire repository and fix these sorts of
problems?  In this case, maybe a script that goes through a repos and
retrieves and then sets each revprop using the current command line?

Another problem I've seen is when the data is not UTF-8.  I know you can use
svnsync to fix this problem by using the --source-prop-encoding ARG option.
Are there any scripts to do this without doing an svnsync?

If not, does svnsync at least auto-fix the line-ending problem?
-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: Script or process to fix old invalid data in repositories or dumps?

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Mark Phippard wrote on Sat, Apr 05, 2014 at 10:58:59 -0400:
> On Sat, Apr 5, 2014 at 7:12 AM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> 
> > Mark Phippard wrote on Fri, Apr 04, 2014 at 13:30:48 -0400:
> > > So I realize I can use that option to force the file to load, but that is
> > > just punting the problem to the future.  Has anyone ever written any
> > > scripts that can run through an entire repository and fix these sorts of
> > > problems?  In this case, maybe a script that goes through a repos and
> > > retrieves and then sets each revprop using the current command line?
> >
> > That's easy enough:
> >
> > [[[
> > #!/usr/bin/env zsh
> > #
> > # Renormalize svn:* revprops in repository $1 (local path).
> > #
> > set -e
> > [ $# -eq 1 ] && [ -d "$1" ] || { echo "Usage: $0 REPOS" >&2; exit 1 }
> > REPOS=$1
> > ABS_REPOS=`cd -- "$REPOS" && pwd`
> > for revnum in {0..`svnlook youngest -- "$REPOS"`};
> >   for prop in $(svnlook proplist -r$revnum --revprop -- "$REPOS" | cut -c3- | grep '^svn:');
> >     svnadmin setrevprop -r$revnum -- "$REPOS" "$prop" =(svn propget --strict --revprop -r$revnum -- "$prop" "file://$ABS_REPOS")
> > ]]]
> >
> 
> Thanks.  I first ran svnsync and that worked and it said that it fixed 3
> revision properties.
> 
> I then went back and ran your script on the source repository.  This is
> what it output:
> 
> svnlook: E135000: Inconsistent line ending style
> svnlook: E135000: Inconsistent line ending style
> svnlook: E135000: Inconsistent line ending style
> 
> Do you think that means the script would have fixed those three properties,
> or did svnlook fail to list them because of the line endings?

Probably the latter.  You could work around it by changing '$(svnlook
proplist ...)' to 'svn:log' and 'svn propget' to 'svnlook propget'
(which will create empty log messages on revisions that have no svn:log
property).  It works for me on "foo\r\nbar\n"; if it fails on your data,
using the FS API directly (svn_fs_revision_proplist() and
svn_fs_revision_prop()) will definitely work, because the validation is
done in libsvn_repos.
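
A minimal sketch of that workaround, written as a plain sh function rather
than as an edit to the zsh script.  It touches only svn:log and reads the
value with 'svnlook propget'.  The name renormalize_svn_log is hypothetical
and mktemp is assumed to be available.  One deliberate difference from the
substitution described above: when 'svnlook propget' fails on a revision,
the sketch skips that revision instead of writing an empty log message.

```shell
# Sketch: renormalize only svn:log, reading values with svnlook propget.
# renormalize_svn_log is a hypothetical name; assumes mktemp exists.
renormalize_svn_log() {
    repos=$1
    youngest=$(svnlook youngest -- "$repos") || return 1
    tmp=$(mktemp) || return 1
    revnum=0
    while [ "$revnum" -le "$youngest" ]; do
        # Skip the revision if the value cannot be read, rather than
        # overwriting the log message with an empty file.
        if svnlook propget --revprop -r"$revnum" -- "$repos" svn:log > "$tmp"; then
            svnadmin setrevprop -r"$revnum" -- "$repos" svn:log "$tmp" ||
                { rm -f "$tmp"; return 1; }
        else
            echo "r$revnum: could not read svn:log, skipped" >&2
        fi
        revnum=$((revnum + 1))
    done
    rm -f "$tmp"
}
```

It would be called as 'renormalize_svn_log /path/to/repos', preferably on a
copy of the repository first.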

Separately, I'm thinking we should call that a bug in svnlook.  I mean,
yes, some property *values* break a semantic repos-layer invariant, but
it would still be nice to allow 'svnlook proplist' to succeed, and not
prevent querying of innocent bystander properties' names...

Daniel

Re: Script or process to fix old invalid data in repositories or dumps?

Posted by Mark Phippard <ma...@gmail.com>.
On Sat, Apr 5, 2014 at 7:12 AM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:

> Mark Phippard wrote on Fri, Apr 04, 2014 at 13:30:48 -0400:
> > So I realize I can use that option to force the file to load, but that is
> > just punting the problem to the future.  Has anyone ever written any
> > scripts that can run through an entire repository and fix these sorts of
> > problems?  In this case, maybe a script that goes through a repos and
> > retrieves and then sets each revprop using the current command line?
>
> That's easy enough:
>
> [[[
> #!/usr/bin/env zsh
> #
> # Renormalize svn:* revprops in repository $1 (local path).
> #
> set -e
> [ $# -eq 1 ] && [ -d "$1" ] || { echo "Usage: $0 REPOS" >&2; exit 1 }
> REPOS=$1
> ABS_REPOS=`cd -- "$REPOS" && pwd`
> for revnum in {0..`svnlook youngest -- "$REPOS"`};
>   for prop in $(svnlook proplist -r$revnum --revprop -- "$REPOS" | cut -c3- | grep '^svn:');
>     svnadmin setrevprop -r$revnum -- "$REPOS" "$prop" =(svn propget --strict --revprop -r$revnum -- "$prop" "file://$ABS_REPOS")
> ]]]
>

Thanks.  I first ran svnsync and that worked and it said that it fixed 3
revision properties.

I then went back and ran your script on the source repository.  This is
what it output:

svnlook: E135000: Inconsistent line ending style
svnlook: E135000: Inconsistent line ending style
svnlook: E135000: Inconsistent line ending style

Do you think that means the script would have fixed those three properties,
or did svnlook fail to list them because of the line endings?


-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: Script or process to fix old invalid data in repositories or dumps?

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Mark Phippard wrote on Fri, Apr 04, 2014 at 13:30:48 -0400:
> So I realize I can use that option to force the file to load, but that is
> just punting the problem to the future.  Has anyone ever written any
> scripts that can run through an entire repository and fix these sorts of
> problems?  In this case, maybe a script that goes through a repos and
> retrieves and then sets each revprop using the current command line?

That's easy enough:

[[[
#!/usr/bin/env zsh
#
# Renormalize svn:* revprops in repository $1 (local path).
#
set -e
[ $# -eq 1 ] && [ -d "$1" ] || { echo "Usage: $0 REPOS" >&2; exit 1 }
REPOS=$1
ABS_REPOS=`cd -- "$REPOS" && pwd`
for revnum in {0..`svnlook youngest -- "$REPOS"`};
  for prop in $(svnlook proplist -r$revnum --revprop -- "$REPOS" | cut -c3- | grep '^svn:');
    svnadmin setrevprop -r$revnum -- "$REPOS" "$prop" =(svn propget --strict --revprop -r$revnum -- "$prop" "file://$ABS_REPOS")
]]]

(It should be easy enough to convert this to plain sh --- all the {} and
=() parts are just syntactic sugar.)
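
A minimal sketch of such a plain sh conversion, assuming mktemp is
available: the {0..N} range becomes a while loop and =(cmd) becomes an
explicit temporary file.  The function name renormalize_revprops and the
variable names are illustrative only, and the sketch stops at the first
failing command, similar in spirit to 'set -e'.

```shell
#!/bin/sh
# Plain sh sketch of the zsh script above.  renormalize_revprops is a
# hypothetical name; assumes mktemp is available.
renormalize_revprops() {
    repos=$1
    [ -d "$repos" ] || { echo "Usage: renormalize_revprops REPOS" >&2; return 1; }
    abs_repos=$(cd -- "$repos" && pwd) || return 1
    youngest=$(svnlook youngest -- "$repos") || return 1
    tmp=$(mktemp) || return 1
    revnum=0
    # Replaces the zsh range {0..N}.
    while [ "$revnum" -le "$youngest" ]; do
        for prop in $(svnlook proplist -r"$revnum" --revprop -- "$repos" | cut -c3- | grep '^svn:'); do
            # Replaces =(cmd): write the value to a file, then pass the
            # file name to svnadmin.
            svn propget --strict --revprop -r"$revnum" -- "$prop" "file://$abs_repos" > "$tmp" &&
                svnadmin setrevprop -r"$revnum" -- "$repos" "$prop" "$tmp" ||
                { rm -f "$tmp"; return 1; }
        done
        revnum=$((revnum + 1))
    done
    rm -f "$tmp"
}
```

It would be called as 'renormalize_revprops /path/to/repos'.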

That doesn't fix node properties, but those cannot be fixed in place on an
existing repository.  svnsync does fix nodeprops.

In my testing, the 'svnlook proplist' part errors out when a property
has mixed EOLs (both CRLF and LF in a single property value).  I don't
have a recent 1.8/1.9-dev build to test with to see if that issue persists.

> Another problem I've seen is when the data is not UTF-8.  I know you can use
> svnsync to fix this problem by using the --source-prop-encoding ARG option.
> Are there any scripts to do this without doing an svnsync?

I imagine you can just take the above script and change:

    =(svn propget --strict ...)

to

    =(svn propget --strict ... | iconv -f iso-8859-1 -t utf-8)

I'm not sure whether config:miscellany:log-encoding needs to be unset,
since I expect --strict mode to ignore it.
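
One caveat with a bare iconv step is that a value which is already UTF-8
would be converted a second time.  A minimal sketch of a guarded filter,
assuming the legacy encoding is ISO-8859-1 as in the example above: it
passes input through unchanged when it is already valid UTF-8 and converts
it otherwise.  The name to_utf8 is hypothetical, the validity check is an
addition to the one-liner, and mktemp is assumed to be available.

```shell
# Sketch: convert stdin to UTF-8 only if it is not already valid UTF-8.
# to_utf8 is a hypothetical name; ISO-8859-1 is an assumed source encoding.
to_utf8() {
    in=$(mktemp) || return 1
    cat > "$in"
    # iconv exits non-zero when the input is not valid in the -f encoding.
    if iconv -f utf-8 -t utf-8 < "$in" > /dev/null 2>&1; then
        cat "$in"
    else
        iconv -f iso-8859-1 -t utf-8 < "$in"
    fi
    status=$?
    rm -f "$in"
    return "$status"
}
```

In the script, '| iconv -f iso-8859-1 -t utf-8' would then become
'| to_utf8'.  Pure ASCII values pass through unchanged either way.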

> If not, does svnsync at least auto-fix the line-ending problem?

Yep:

[[[
% svnsync init file://$PWD/{r2,r}
Copied properties for revision 0.
NOTE: Normalized svn:* properties to LF line endings (1 rev-props, 0 node-props).
]]]

This behaviour is in 1.7, probably earlier too.
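
For completeness, a minimal sketch of the whole svnsync route for local
repositories: create an empty mirror, allow revprop changes on it (svnsync
needs that), then initialize and sync.  The function name
mirror_with_svnsync is hypothetical, and the hook below allows every
revprop change, which is only reasonable for a private, local mirror.

```shell
# Sketch: mirror local repository SRC into a new repository DEST.
# mirror_with_svnsync is a hypothetical name.
mirror_with_svnsync() {
    src=$1
    dest=$2
    abs_src=$(cd -- "$src" && pwd) || return 1
    svnadmin create "$dest" || return 1
    abs_dest=$(cd -- "$dest" && pwd) || return 1
    # svnsync changes revision properties on the mirror, so the mirror
    # needs a pre-revprop-change hook that permits them.
    printf '#!/bin/sh\nexit 0\n' > "$dest/hooks/pre-revprop-change" || return 1
    chmod +x "$dest/hooks/pre-revprop-change" || return 1
    # Argument order is destination first, then source.
    svnsync init "file://$abs_dest" "file://$abs_src" || return 1
    svnsync sync "file://$abs_dest"
}
```

It would be called as 'mirror_with_svnsync r r2' to get the same layout as
in the transcript above.  If the source also has non-UTF-8 properties, the
--source-prop-encoding ARG option mentioned earlier would presumably be
added to both the init and the sync call.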

Daniel

P.S. =(cmd) is shorthand for 'a plain file whose contents are the output of cmd'.