Posted to users@subversion.apache.org by Marc Lustig <ml...@marclustig.com> on 2009/09/07 14:49:32 UTC

explaining syntax-check hook-script

Could somebody explain what exactly the pre-commit hook-script

http://svn.collab.net/repos/svn/trunk/contrib/hook-scripts/syntax-check.sh

is doing?

The comments state
# This script provides language independant syntax checking
# functionality intended to be invoked from a subversion pre-commit
# hook.

Which rules does it apply when checking syntax?
What is guaranteed once a commit passes the script?

We have a requirement to ensure that log messages, file names, and the
contents of character-based files are properly encoded, i.e. in UTF-8.
Does anybody know of a script that does this job?
-- 
View this message in context: http://www.nabble.com/explaining-syntax-check-hook-script-tp25331704p25331704.html
Sent from the Subversion Users mailing list archive at Nabble.com.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2391893

To unsubscribe from this discussion, e-mail: [users-unsubscribe@subversion.tigris.org].

Re: explaining syntax-check hook-script

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Ryan Schmidt wrote on Mon, 7 Sep 2009 at 16:43 -0500:
> On Sep 7, 2009, at 16:39, Daniel Shahaf wrote:
> > Filenames are validated in libsvn_fs (fs-loader.c:path_valid()), log
> > messages in libsvn_repos (fs-wrap.c:validate_prop()).  Only the former
> > set of checks applies during 'svnadmin load', but both sets apply during
> > commits and svnsync's.
> 
> So... it would be possible to "svnadmin load" something with non-UTF-8 log
> messages, and then you wouldn't be able to "svnsync" that repository?

Yes.  (Until 1.6.3, the same was true for non-LF log messages.)

> Are there plans to fix this?

Not that I know of.  But the workaround is to edit the log messages
(after the 'svnadmin load') into valid UTF-8 (and LF newlines).  We
(cmpilato) did that in the svn repository.
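
The workaround above (rewriting each offending log message as valid UTF-8 with LF newlines) can be sketched as follows. This is only an illustration, not the procedure used on the svn repository: the function name `repair_log_message` and the `latin-1` fallback are assumptions, and the right fallback depends on which encoding the old messages were actually written in. Writing the repaired text back into the repository (for example with `svnadmin setlog`) is left out.

```python
def repair_log_message(raw: bytes, fallback_encoding: str = "latin-1") -> bytes:
    """Return raw re-encoded as UTF-8 with LF-only line endings.

    fallback_encoding is a guess used only when raw is not valid UTF-8;
    pick the encoding your old clients really used.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: reinterpret the bytes in the assumed legacy encoding.
        text = raw.decode(fallback_encoding)
    # Normalize CRLF and bare CR line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")
```

Messages that are already valid UTF-8 with LF newlines pass through unchanged, so the function can be applied to every revision's log message without first sorting out which ones are broken.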

FWIW, separating the checks (of filenames v. log messages) is by
design (e.g., the semantics of svn:* props do not belong in the
filesystem layer).

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2392174


Re: explaining syntax-check hook-script

Posted by Ryan Schmidt <su...@ryandesign.com>.
On Sep 7, 2009, at 16:39, Daniel Shahaf wrote:

> Ryan Schmidt wrote on Mon, 7 Sep 2009 at 15:26 -0500:
>> On Sep 7, 2009, at 09:49, Marc Lustig wrote:
>>> We have the requirement to ensure that log-messages as well as file-
>>> names and the contents of character-based files are properly
>>> encoded,  i .e. in UTF-8.  Anybody knows a script to do this job?
>>
>> For filenames and log message, Subversion should be checking this
>> already. It should not be possible for non-UTF-8 filenames or log
>> messages to end up in the repository. Though I think there may be some
>> cases where it can happen, for example when loading a dumpfile via
>> svnadmin load? There seem to be more lax checks in place there than
>> with the "svn commit" or "svn import" mechanisms.
>
> Filenames are validated in libsvn_fs (fs-loader.c:path_valid()), log
> messages in libsvn_repos (fs-wrap.c:validate_prop()).  Only the former
> set of checks applies during 'svnadmin load', but both sets apply during
> commits and svnsync's.

So... it would be possible to "svnadmin load" something with non-UTF-8  
log messages, and then you wouldn't be able to "svnsync" that  
repository? Are there plans to fix this?

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2392165


Re: explaining syntax-check hook-script

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Ryan Schmidt wrote on Mon, 7 Sep 2009 at 15:26 -0500:
> On Sep 7, 2009, at 09:49, Marc Lustig wrote:
> > We have the requirement to ensure that log-messages as well as file-
> > names and the contents of character-based files are properly
> > encoded,  i .e. in UTF-8.  Anybody knows a script to do this job?
> 
> For filenames and log message, Subversion should be checking this  
> already. It should not be possible for non-UTF-8 filenames or log  
> messages to end up in the repository. Though I think there may be some  
> cases where it can happen, for example when loading a dumpfile via  
> svnadmin load? There seem to be more lax checks in place there than  
> with the "svn commit" or "svn import" mechanisms.

Filenames are validated in libsvn_fs (fs-loader.c:path_valid()), log
messages in libsvn_repos (fs-wrap.c:validate_prop()).  Only the former
set of checks applies during 'svnadmin load', but both sets apply during
commits and svnsync's.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2392164


Re: explaining syntax-check hook-script

Posted by Marc Lustig <ml...@marclustig.com>.
Marc Lustig wrote:
> 
> problem scenario 2: svnsync operates, but it creates "empty" commits, i.e.,
> all of the revisions are created, but they have no content. (I am not
> quite sure whether this is related to an encoding problem, though. It
> might be a compatibility issue between different SVN versions.)
> 

Let me add something regarding this problem scenario 2.
The reason why I am suspicious that the problem is related to character
encoding is that SOME of our repos could be mirrored using svnsync without
any issues.
Other repos, using the same infrastructure (master instance, slave
instance), are NOT synced properly (content is empty as described above).
So, our educated reason tells us, there must be a certain difference between
those repos which do sync successfully and those which do not.
It could be a permission problem, but I assume that this would be reported
by svnsync.
So the only difference that is left is the content of the repo, right?!
And this strongly suggests an encoding-related issue.

Any input on this matter is appreciated.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2392270


RE: Repository cleanup

Posted by Bob Archer <bo...@amsi.com>.
> > If you mean "get rid of" so it isn't in the repository any more... no.
> > If you dump, filter and reload, all of your history for that path will
> > be gone.
> It should no longer be in the repository (space) and I do not need any
> history for what is gone.
> What I'm not sure about is how this would then affect the other folders,
> and especially the trunk.
> Thank you,
> -Dieter

It will only affect the path you filter out. Give it a try... if something goes wrong you can always go back to the original dump. You can also load the filtered dump to a new repo to check it out before you do it for real.

BOb

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2392646



RE: Repository cleanup

Posted by dieter oberkofler <do...@gmail.com>.
> If you mean "get rid of" so it isn't in the repository any more... no.
> If you dump, filter and reload, all of your history for that path will
> be gone.
It should no longer be in the repository (space) and I do not need any
history for what is gone.
What I'm not sure about is how this would then affect the other folders,
and especially the trunk.
Thank you,
-Dieter

> -----Original Message-----
> From: Bob Archer [mailto:Bob.Archer@amsi.com]
> Sent: Tuesday, September 08, 2009 22:08
> To: dieter oberkofler; users@subversion.tigris.org
> Subject: RE: Repository cleanup
> 
> > We are using a pretty standard repository setup with a folder
> > branches/private that contains the (private and temporary) branches
> > that will later be merged into the main trunk.
> > Is there a safe way to get rid of the no longer used (private)
> > branches without losing any of the history information in trunk?
> > Thank you,
> > -Dieter
> 
> If you mean "get rid of" so they don't show up in the HEAD revision,
> sure, just delete them using svn delete. The folder will still be
> there and all the history will remain.
> 
> If you mean "get rid of" so it isn't in the repository any more... no.
> If you dump, filter and reload, all of your history for that path will
> be gone.
> 
> BOb

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2392645


RE: Repository cleanup

Posted by Bob Archer <bo...@amsi.com>.
> We are using a pretty standard repository setup with a folder
> branches/private that contains the (private and temporary) branches
> that will later be merged into the main trunk.
> Is there a safe way to get rid of the no longer used (private) branches
> without losing any of the history information in trunk?
> Thank you,
> -Dieter

If you mean "get rid of" so they don't show up in the HEAD revision, sure, just delete them using svn delete. The folder will still be there and all the history will remain.

If you mean "get rid of" so it isn't in the repository any more... no. If you dump, filter and reload, all of your history for that path will be gone.

BOb

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2392638



RE: Repository cleanup

Posted by dieter oberkofler <do...@gmail.com>.
The idea is to get rid of the no longer needed branches in the repository
itself, and so reduce the time of our backup operation, which dumps
the repository.

Using dump, filter and load looks fine, but I'm still a little unsure whether
this might affect the continuity of my log history in any other folders of
the repository.

Thank you,

-Dieter

 

 

From: David Weintraub [mailto:qazwart@gmail.com] 
Sent: Tuesday, September 08, 2009 18:32
To: dieter oberkofler
Subject: Re: Repository cleanup

 

The only real way to delete old information in Subversion is to do a
"svnadmin dump", use svndumpfilter to remove the branches, create a new clean
repository, and do a "svnadmin load".
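
The dump, filter and load cycle described above can be sketched as a small pipeline. The filter step is performed by the `svndumpfilter` tool that ships with Subversion. The function names and the example paths are hypothetical; the sketch loads the filtered dump into a freshly created repository and leaves the original untouched, so the result can be inspected before it replaces anything.

```python
import subprocess


def cleanup_commands(old_repo, new_repo, excluded_prefix):
    """Return the command lines of one dump/filter/load cycle."""
    return {
        "dump": ["svnadmin", "dump", old_repo],
        "filter": ["svndumpfilter", "exclude", excluded_prefix],
        "create": ["svnadmin", "create", new_repo],
        "load": ["svnadmin", "load", new_repo],
    }


def run_cleanup(old_repo, new_repo, excluded_prefix):
    """Pipe dump -> filter -> load into a newly created repository."""
    cmds = cleanup_commands(old_repo, new_repo, excluded_prefix)
    subprocess.run(cmds["create"], check=True)
    dump = subprocess.Popen(cmds["dump"], stdout=subprocess.PIPE)
    filt = subprocess.Popen(cmds["filter"], stdin=dump.stdout,
                            stdout=subprocess.PIPE)
    dump.stdout.close()  # the filter now owns the read end of the pipe
    load = subprocess.Popen(cmds["load"], stdin=filt.stdout)
    filt.stdout.close()
    # Wait for every stage and fail if any of them reported an error.
    codes = [proc.wait() for proc in (load, filt, dump)]
    if any(codes):
        raise RuntimeError(f"dump/filter/load exit codes: {codes}")
```

A call such as `run_cleanup("/srv/svn/old", "/srv/svn/new", "branches/private")` (paths made up for illustration) would produce a new repository without the private branches.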

You can do a "svn delete" on the branch  which will remove the branch from
the head of your repository tree and thus render it pretty invisible.
However, the data is still in the repository, and can still be restored to
the head.

On Tue, Sep 8, 2009 at 11:41 AM, dieter oberkofler
<do...@gmail.com> wrote:

We are using a pretty standard repository setup with a folder
branches/private that contains the (private and temporary) branches that
will later be merged into the main trunk.
Is there a safe way to get rid of the no longer used (private) branches
without losing any of the history information in trunk?
Thank you,
-Dieter

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2392540





-- 
David Weintraub
qazwart@gmail.com

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2392559


Repository cleanup

Posted by dieter oberkofler <do...@gmail.com>.
We are using a pretty standard repository setup with a folder
branches/private that contains the (private and temporary) branches that
will later be merged into the main trunk.
Is there a safe way to get rid of the no longer used (private) branches
without losing any of the history information in trunk?
Thank you,
-Dieter

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2392540


Re: explaining syntax-check hook-script

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Marc Lustig wrote on Tue, 8 Sep 2009 at 00:05 -0700:
> This is my understanding of the process:
> when a file is checked in, Subversion detects the file's encoding and saves
> this information somewhere. The file is then converted to UTF-8 and placed
> in the repo.
> When the file is checked out again, it is re-encoded in the original
> encoding.
> Is that correct?
> 

As far as I know:

No.  The *only* transformations done on files are line-ending normalization
(if svn:eol-style is set) and $Keyword$ expansion (if svn:keywords is set).
We do not recode files, not even if 'svn:mime-type' is set.

But I suppose we do recode log messages at commit time if log-encoding 
is set in your config file.

> Another question: is there any requirement of Subversion regarding the
> encoding of files when they are checked in?
> 

Subversion treats files as opaque binary strings, unless you tell it
otherwise (see above).

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2392361


Re: explaining syntax-check hook-script

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Marc Lustig wrote on Tue, 8 Sep 2009 at 00:05 -0700:
> problem scenario 1: svnsync refuses to operate because it cannot handle some
> svn:log properties. You can fix this by editing those messages.
> 

Usually svnsync works fine and it is the destination repository which 
rejects the commit that svnsync attempts to make.

> problem scenario 2: svnsync operates, but it creates "empty" commits, i.e.,
> all of the revisions are created, but they have no content. (I am not
> quite sure whether this is related to an encoding problem, though. It
> might be a compatibility issue between different SVN versions.)
> 

Could be authz related (you see an empty commit when you don't have
permission to see the paths modified in that commit).

> problem scenario 3: user A commits a file; user B checks the file out, and
> apparently special characters are "spoiled": they are not re-encoded the way
> they were encoded before.
> 

Are you sure it's not that one user's tools assume a different encoding
than the other user's?

Try to check the raw file.  (e.g., use md5sum or a hex editor.)
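
To illustrate the suggestion above, here is a minimal sketch of comparing files at the byte level. The helper names `fingerprint` and `hex_bytes` are made up for this example. If both users get the same checksum, the stored bytes are identical and the "spoiled" characters come from a tool interpreting those bytes in a different encoding.

```python
import hashlib


def hex_bytes(data: bytes) -> str:
    """Render bytes the way a hex editor would, e.g. 'c3 a9'."""
    return " ".join(f"{byte:02x}" for byte in data)


def fingerprint(path: str) -> str:
    """MD5 of the raw file contents, comparable to md5sum's first column."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


# The same character has different byte representations per encoding ...
e_acute = "\u00e9"
utf8_bytes = e_acute.encode("utf-8")      # bytes c3 a9
latin1_bytes = e_acute.encode("latin-1")  # byte e9

# ... and UTF-8 bytes read by a Latin-1 tool show up as two wrong characters.
misread = utf8_bytes.decode("latin-1")    # "\u00c3\u00a9"
```

Running `fingerprint()` on user A's original file and on user B's checked-out copy shows whether the content changed at all. If the file has no svn:eol-style or svn:keywords property set, the two checksums should match.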

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2392366


Re: explaining syntax-check hook-script

Posted by Marc Lustig <ml...@marclustig.com>.
Ryan Schmidt-60 wrote:
> 
> On Sep 7, 2009, at 09:49, Marc Lustig wrote:
> 
>> Could somebody explain what exactly the pre-commit hook-script
>>
>> http://svn.collab.net/repos/svn/trunk/contrib/hook-scripts/syntax-check.sh
>>
>> is doing?
> 
> Sadly it does not appear to be described here:
> 
> http://subversion.tigris.org/tools_contrib.html
> 
> 
>> The comments state
>> # This script provides language independant syntax checking
>> # functionality intended to be invoked from a subversion pre-commit
>> # hook.
>>
>> What rules are used for the syntax?
>> What will be ensured if the script is passed successfully?
> 
> As shipped, it looks like the script only checks the syntax of PHP  
> files. I believe you are meant to modify the following variables if  
> you want it to check a different language:
> 
> FPATTERN="\.\(php\|phpt\)$"
> FLANG="PHP"
> SYNTAX_CMD="php"
> SYNTAX_ARGS="-l"
> 
> 
>> We have the requirement to ensure that log-messages as well as file-names
>> and the contents of character-based files are properly encoded, i.e. in
>> UTF-8.
>> Anybody knows a script to do this job?
> 
> For filenames and log message, Subversion should be checking this  
> already. It should not be possible for non-UTF-8 filenames or log  
> messages to end up in the repository. Though I think there may be some  
> cases where it can happen, for example when loading a dumpfile via  
> svnadmin load? There seem to be more lax checks in place there than  
> with the "svn commit" or "svn import" mechanisms.
> ...
> 

We are using SVN 1.5.2, and apparently we have had trouble with log messages,
file names and file contents.

problem scenario 1: svnsync refuses to operate because it cannot handle some
svn:log properties. You can fix this by editing those messages.

problem scenario 2: svnsync operates, but it creates "empty" commits, i.e.,
all of the revisions are created, but they have no content. (I am not
quite sure whether this is related to an encoding problem, though. It
might be a compatibility issue between different SVN versions.)

problem scenario 3: user A commits a file; user B checks the file out, and
apparently special characters are "spoiled": they are not re-encoded the way
they were encoded before.

This is my understanding of the process:
when a file is checked in, Subversion detects the file's encoding and saves
this information somewhere. The file is then converted to UTF-8 and placed
in the repo.
When the file is checked out again, it is re-encoded in the original
encoding.
Is that correct?

Another question: is there any requirement of Subversion regarding the
encoding of files when they are checked in?

One more: does anybody have experience using the dos2unix-tool in the
pre-commit hook?

best regards.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2392258


Re: explaining syntax-check hook-script

Posted by Ryan Schmidt <su...@ryandesign.com>.
On Sep 7, 2009, at 09:49, Marc Lustig wrote:

> Could somebody explain what exactly the pre-commit hook-script
>
> http://svn.collab.net/repos/svn/trunk/contrib/hook-scripts/syntax-check.sh
>
> is doing?

Sadly it does not appear to be described here:

http://subversion.tigris.org/tools_contrib.html


> The comments state
> # This script provides language independant syntax checking
> # functionality intended to be invoked from a subversion pre-commit
> # hook.
>
> What rules are used for the syntax?
> What will be ensured if the script is passed successfully?

As shipped, it looks like the script only checks the syntax of PHP  
files. I believe you are meant to modify the following variables if  
you want it to check a different language:

FPATTERN="\.\(php\|phpt\)$"
FLANG="PHP"
SYNTAX_CMD="php"
SYNTAX_ARGS="-l"
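
As a rough illustration of what the FPATTERN setting selects, the sketch below applies the same pattern in Python. It assumes FPATTERN is a grep-style basic regular expression matched against the changed paths; the thread does not show how the script actually uses it. The name `select_paths` and the sample paths are made up.

```python
import re

# Python spelling of the shipped FPATTERN: grep's "\(" and "\|" become "(" and "|".
PHP_PATTERN = re.compile(r"\.(php|phpt)$")


def select_paths(paths, pattern=PHP_PATTERN):
    """Return only the paths whose names the pattern matches."""
    return [path for path in paths if pattern.search(path)]


changed = ["trunk/index.php", "trunk/t/basic.phpt",
           "trunk/README", "trunk/old.php5"]
selected = select_paths(changed)
```

Here `selected` contains only the `.php` and `.phpt` paths. To check another language you would swap in a different extension list, together with a different checker command.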


> We have the requirement to ensure that log-messages as well as file-names
> and the contents of character-based files are properly encoded, i.e. in
> UTF-8.
> Anybody knows a script to do this job?

For filenames and log messages, Subversion should be checking this  
already. It should not be possible for non-UTF-8 filenames or log  
messages to end up in the repository. Though I think there may be some  
cases where it can happen, for example when loading a dumpfile via  
svnadmin load? There seem to be more lax checks in place there than  
with the "svn commit" or "svn import" mechanisms.

To verify that file contents are UTF-8, a script probably already exists;
you could point the syntax-check.sh script at it via the
above-mentioned variables.
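
A checker of that kind could look like the sketch below. It is not an existing Subversion tool. The function names are made up, and because the thread does not show how syntax-check.sh hands file contents to SYNTAX_CMD, the sketch accepts either a file name argument or data on standard input.

```python
import sys


def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def main(argv) -> int:
    """Exit status 0 if the input is valid UTF-8, 1 otherwise."""
    if len(argv) > 1:
        # A path was given on the command line.
        with open(argv[1], "rb") as handle:
            data = handle.read()
    else:
        # No argument: read the raw bytes from standard input.
        data = sys.stdin.buffer.read()
    offset = first_invalid_utf8_offset(data)
    if offset is not None:
        print(f"invalid UTF-8 at byte offset {offset}", file=sys.stderr)
        return 1
    return 0
```

Saved as a script that ends with `sys.exit(main(sys.argv))`, it returns a non-zero status for any input that is not valid UTF-8. A pre-commit hook can use that status to reject the commit.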

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2392142
