Posted to dev@subversion.apache.org by Swen Thuemmler <sw...@mediaways.net> on 2002/07/19 10:59:55 UTC

[Patch] cvs2svn.py

Hi all,

While playing with cvs2svn.py I ran into a problem with the handling of
CVS log messages: only the last message of a changeset was incorporated
into the svn commit. The enclosed patch fixes this. Now each log message
from CVS is recorded in Subversion together with the affected files (there
might be the same message for some files and a different message for
some other files). Additionally, each file gets a marker showing
whether it was a deletion, addition or modification.

For example:

M create.sql (1.2)
M foo.c (1.4)
A bar.c (1.1)
M lib/tools.c (1.3)
  funny logmessage
D context.sql (1.2)
D drop.sql (1.2)
  No longer needed
A sink.sql (1.1)
  Here is all the hot stuff
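
The patch itself is not reproduced in this archive. As a rough illustration of the format above, here is a minimal Python sketch that groups files by their log message and prefixes each file with its A/M/D marker. The function name `combine_log_messages` and the tuple layout are assumptions made for this example, not code taken from cvs2svn.py.

```python
from collections import OrderedDict


def combine_log_messages(changes):
    """Build one commit message from per-file CVS log messages.

    `changes` is a list of (op, path, rev, message) tuples, where op is
    'A', 'M' or 'D'. Files sharing a message are listed together,
    followed by that message indented by two spaces.
    """
    groups = OrderedDict()  # message -> list of "op path (rev)" lines
    for op, path, rev, message in changes:
        groups.setdefault(message, []).append("%s %s (%s)" % (op, path, rev))

    lines = []
    for message, files in groups.items():
        lines.extend(files)
        for msg_line in message.splitlines():
            lines.append("  " + msg_line)
    return "\n".join(lines)
```

Grouping by message keeps the combined log short when most files in a changeset share one message, while still preserving every distinct message.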

This is probably a horrible hack to any decent Python programmer (it is my
first Python code), and there may be better ways to implement this, but
it does at least work. It would be nice if some variation of this patch
could get committed. I think it is important not to lose the version history when
switching from CVS to Subversion.

Comments?

--Swen

Re: [Patch] cvs2svn.py

Posted by Swen Thuemmler <sw...@mediaways.net>.
On Fri, Jul 19, 2002 at 08:57:03PM +0200, Sander Striker wrote:

> Ok, Swen, what are you seeing that you needed this patch?

Well, I'm seeing missing log messages. You can verify this easily:

1. create a new repository:
   cvs -d /var/tmp/cvs init
2. create empty directory and import it:
   mkdir /var/tmp/svntest
   cd /var/tmp/svntest
   cvs -d /var/tmp/cvs import cvstest vendortag releasetag
3. check out the created directory
   cd /var/tmp
   cvs -d /var/tmp/cvs co cvstest
4. add some files
   cd cvstest
   cp /dev/null foo
   cp /dev/null bar
   cp /dev/null baz
   cvs add foo bar baz
5. commit the files with different messages
   cvs commit -m 'added foo' foo
   cvs commit -m 'added bar' bar
   cvs commit -m 'added baz' baz
6. convert the cvs repository
   cd /path/to/cvs2svn.py
   ./cvs2svn.py -v -s /var/tmp/svn/ /var/tmp/cvs/cvstest
7. check out the new Subversion repository
   cd /var/tmp
   svn co file:///var/tmp/svn svntest
8. check log messages
   cd /var/tmp/svntest
   svn log

Without my patch, only one message is there (but all three files):
------------------------------------------------------------------------
rev 1:  swen | 2002-07-20 15:55:24 +0200 (Sat, 20 Jul 2002) | 2 lines

added foo

------------------------------------------------------------------------

With my patch, all three messages are there:
------------------------------------------------------------------------
rev 1:  swen | 2002-07-20 15:55:24 +0200 (Sat, 20 Jul 2002) | 7 lines

A baz (1.1):
  added baz
A foo (1.1):
  added foo
A bar (1.1):
  added bar

------------------------------------------------------------------------

So clearly messages are lost. Btw, I think it is common for CVS users to provide
different log messages when checking in different files, even when the changes
together logically form one changeset. I do this myself sometimes, when I have
a $Log$ entry in my source files and do not want unrelated log messages from other
files to end up in there (or one huge log message). So often one logical changeset has
different log messages...

Hope this helps,

--Swen




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [Patch] cvs2svn.py

Posted by Scott Lamb <sl...@slamb.org>.
Greg Stein wrote:
> On Fri, Jul 19, 2002 at 03:16:51PM +0200, Sander Striker wrote:
>>Odd, if this is happening, I'll deflect my question to Greg: why are
>>commits with a different log msg identified as being in the same
>>changeset?
> 
> *if* being the operative word. I cannot believe that we would be combining
> things with different messages into a single commit. The identifying key is
> a combination of the author and log message. A second use of that key must
> occur within a five minute window to be combined into the commit with the
> first use. The time lag is because the commits are not atomic -- they will
> span periods of time, based on the speed of the user's connection and what
> they are uploading/committing. When converting the mod_dav repository, the
> commits have spanned no more than a few seconds.

I'm seeing this happening - same author, same five-minute span, 
different log message collapsed into a single commit. What's worse is 
that it's two revisions of the _same_file_. This is with a dry-run - I 
don't know what would happen if it actually tried to commit that. (Maybe 
segfault - that's what it's doing now. Not sure if it's for that 
reason.) Output looks like this:

     committing: Wed Jul  3 21:02:50 2002, over 193 seconds
         changing 1.13 : file1
         changing 1.14 : file2
         changing 1.15 : file2
         (skipped; dry run enabled)

The first two belong, the third doesn't.
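
One possible guard against the situation in this output, sketched in Python: start a new commit group whenever a file is about to appear a second time. This is a hypothetical check, not something cvs2svn.py is known to do; the name `split_on_repeated_file` and the (rev, path) pair layout are assumptions for the example.

```python
def split_on_repeated_file(changes):
    """Split (rev, path) pairs so that no path occurs twice in one group.

    A new group is started as soon as a path that is already in the
    current group shows up again.
    """
    groups = [[]]
    seen = set()  # paths already present in the current group
    for rev, path in changes:
        if path in seen:
            groups.append([])
            seen = set()
        groups[-1].append((rev, path))
        seen.add(path)
    # Drop the empty placeholder group left over when `changes` is empty.
    return [group for group in groups if group]
```

Applied to the three changes listed above, this would keep 1.13 of file1 and 1.14 of file2 together and move 1.15 of file2 into a commit of its own.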

I noticed that all the digests in cvs2svn-data.resync were the same. 
Could that be why it isn't telling them apart based on log messages? 
(What does this file do, anyway? I'm not very clear on the difference 
between it and cvs2svn-data.revs, etc. The comments confused me.)

The digests in cvs2svn-data.revs seemed reasonable for this repository. 
A few digests occurred many times, many digests occurred 1-3 times. 
Since it's a bunch of very small projects, that makes sense. (The ones 
that occurred many times were probably imports or something.)

--
Scott Lamb



RE: [Patch] cvs2svn.py

Posted by Sander Striker <st...@apache.org>.
> From: Greg Stein [mailto:gstein@lyra.org]
> Sent: 19 July 2002 20:46

> On Fri, Jul 19, 2002 at 03:16:51PM +0200, Sander Striker wrote:
> > > From: Swen Thuemmler [mailto:swen@mediaways.net]
> > > Sent: 19 July 2002 15:02
> > 
> > > On Fri, Jul 19, 2002 at 03:05:33PM +0200, Sander Striker wrote:
> > > 
> > > > Question: How do you determine if multiple commits are in the same
> > > > changeset when their commit msg differs?  I mean, mostly a cvs commit
> > > > of multiple files will have the same commit msg.  When are the commit
> > > > msgs ever different _and_ still detectably in the same changeset?
> > > 
> > > As I understand it, the changeset is identified by a time frame
> > > (initially 5 minutes, I think) and possibly by the author of the changes. I
> >...
> > > I just noticed the missing log messages and tried to put them back.
> > 
> > Odd, if this is happening, I'll deflect my question to Greg: why are
> > commits with a different log msg identified as being in the same
> > changeset?
> 
> *if* being the operative word. I cannot believe that we would be combining
> things with different messages into a single commit. The identifying key is
> a combination of the author and log message. A second use of that key must
> occur within a five minute window to be combined into the commit with the
> first use. The time lag is because the commits are not atomic -- they will
> span periods of time, based on the speed of the user's connection and what
> they are uploading/committing. When converting the mod_dav repository, the
> commits have spanned no more than a few seconds.

Ok, Swen, what are you seeing that you needed this patch?

Sander


Re: [Patch] cvs2svn.py

Posted by Greg Stein <gs...@lyra.org>.
On Fri, Jul 19, 2002 at 03:16:51PM +0200, Sander Striker wrote:
> > From: Swen Thuemmler [mailto:swen@mediaways.net]
> > Sent: 19 July 2002 15:02
> 
> > On Fri, Jul 19, 2002 at 03:05:33PM +0200, Sander Striker wrote:
> > 
> > > Question: How do you determine if multiple commits are in the same
> > > changeset when their commit msg differs?  I mean, mostly a cvs commit
> > > of multiple files will have the same commit msg.  When are the commit
> > > msgs ever different _and_ still detectably in the same changeset?
> > 
> > As I understand it, the changeset is identified by a time frame
> > (initially 5 minutes, I think) and possibly by the author of the changes. I
>...
> > I just noticed the missing log messages and tried to put them back.
> 
> Odd, if this is happening, I'll deflect my question to Greg: why are
> commits with a different log msg identified as being in the same
> changeset?

*if* being the operative word. I cannot believe that we would be combining
things with different messages into a single commit. The identifying key is
a combination of the author and log message. A second use of that key must
occur within a five minute window to be combined into the commit with the
first use. The time lag is because the commits are not atomic -- they will
span periods of time, based on the speed of the user's connection and what
they are uploading/committing. When converting the mod_dav repository, the
commits have spanned no more than a few seconds.
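
The grouping rule described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in cvs2svn.py: the names `commit_key`, `group_revisions` and `COMMIT_THRESHOLD`, the choice of SHA-1, the NUL separator, and the tuple layout are all made up for the example.

```python
import hashlib

COMMIT_THRESHOLD = 5 * 60  # seconds; the five minute window described above


def commit_key(author, log):
    """Digest identifying a changeset candidate: author plus log message."""
    data = (log + "\0" + author).encode("utf-8")
    return hashlib.sha1(data).hexdigest()


def group_revisions(revisions, threshold=COMMIT_THRESHOLD):
    """Group (timestamp, author, log, path, rev) tuples into commits.

    A revision joins the most recent commit with the same key if it falls
    within `threshold` seconds of that commit's first revision; otherwise
    it starts a new commit.
    """
    commits = []
    open_commits = {}  # key -> index of the latest commit using that key
    for rev in sorted(revisions):
        timestamp, author, log = rev[0], rev[1], rev[2]
        key = commit_key(author, log)
        idx = open_commits.get(key)
        if idx is not None and timestamp - commits[idx][0][0] <= threshold:
            commits[idx].append(rev)
        else:
            open_commits[key] = len(commits)
            commits.append([rev])
    return commits
```

Under this rule, two revisions by the same author with different log messages get different keys, so they should never land in the same commit, whatever the time gap between them.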

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


Re: [Patch] cvs2svn.py

Posted by cm...@collab.net.
"Sander Striker" <st...@apache.org> writes:

> > I just noticed the missing log messages and tried to put them back.
> 
> Odd, if this is happening, I'll deflect my question to Greg: why are
> commits with a different log msg identified as being in the same
> changeset?

I wonder if, since CVS has no atomic commits anyway, some developers
might have been in the habit of doing per-file commits, each file with
its own log message?  I can see a simple script parsing `cvs up' for
a local mods list, crawling that list doing something like:

   for file in `get_local_mods`; do
     cvs ci -F ${file}.log_msg ${file}
   done
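
The `get_local_mods` helper in the loop above is hypothetical. One way it might be approximated, as a Python sketch: parse the one-letter status lines that `cvs -n -q update` prints and keep the paths flagged A (added), M (modified) or R (removed). The function name `parse_local_mods` is an assumption for this example, and running cvs itself is left out so that the parsing stays self-contained.

```python
def parse_local_mods(update_output):
    """Return paths flagged A, M or R in `cvs -n -q update` style output.

    Each status line is a one-letter code, a space, then the path.
    Lines with other codes (such as U, P, C or ?) are ignored.
    """
    mods = []
    for line in update_output.splitlines():
        if len(line) > 2 and line[1] == " " and line[0] in "AMR":
            mods.append(line[2:])
    return mods
```

Each path returned could then be committed on its own with its own message file, which would produce exactly the kind of per-file log messages discussed in this thread.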



RE: [Patch] cvs2svn.py

Posted by Sander Striker <st...@apache.org>.
> From: Swen Thuemmler [mailto:swen@mediaways.net]
> Sent: 19 July 2002 15:02

> On Fri, Jul 19, 2002 at 03:05:33PM +0200, Sander Striker wrote:
> 
> > Question: How do you determine if multiple commits are in the same changeset when
> > their commit msg differs?  I mean, mostly a cvs commit of multiple files will have
> > the same commit msg.  When are the commit msgs ever different _and_ still
> > detectably in the same changeset?
> 
> As I understand it, the changeset is identified by a time frame
> (initially 5 minutes, I think) and possibly by the author of the changes. I
> did not write the logic to identify changesets,

I know :)  Greg did.

> I just noticed the missing log messages and tried to put them back.

Odd, if this is happening, I'll deflect my question to Greg: why are commits with
a different log msg identified as being in the same changeset?

Sander



Re: [Patch] cvs2svn.py

Posted by Swen Thuemmler <sw...@mediaways.net>.
On Fri, Jul 19, 2002 at 03:05:33PM +0200, Sander Striker wrote:

> Question: How do you determine if multiple commits are in the same changeset when
> their commit msg differs?  I mean, mostly a cvs commit of multiple files will have
> the same commit msg.  When are the commit msgs ever different _and_ still
> detectably in the same changeset?

As I understand it, the changeset is identified by a time frame
(initially 5 minutes, I think) and possibly by the author of the changes. I
did not write the logic to identify changesets; I just noticed the
missing log messages and tried to put them back.

Greetings, Swen


RE: [Patch] cvs2svn.py

Posted by Sander Striker <st...@apache.org>.
> From: Swen Thuemmler [mailto:swen@mediaways.net]
> Sent: 19 July 2002 13:00

> Hi all,
> 
> While playing with cvs2svn.py I ran into a problem with the handling of
> CVS log messages: only the last message of a changeset was incorporated
> into the svn commit. The enclosed patch fixes this. Now each log message
> from CVS is recorded in Subversion together with the affected files (there
> might be the same message for some files and a different message for
> some other files). Additionally, each file gets a marker showing
> whether it was a deletion, addition or modification.
> 
> For example:
> 
> M create.sql (1.2)
> M foo.c (1.4)
> A bar.c (1.1)
> M lib/tools.c (1.3)
>   funny logmessage
> D context.sql (1.2)
> D drop.sql (1.2)
>   No longer needed
> A sink.sql (1.1)
>   Here is all the hot stuff
> 
> This is probably a horrible hack to any decent Python programmer (it is my
> first Python code), and there may be better ways to implement this, but
> it does at least work. It would be nice if some variation of this patch
> could get committed. I think it is important not to lose the version history when
> switching from CVS to Subversion.
> 
> Comments?

Question: How do you determine if multiple commits are in the same changeset when
their commit msg differs?  I mean, mostly a cvs commit of multiple files will have
the same commit msg.  When are the commit msgs ever different _and_ still detectably
in the same changeset?
 
> --Swen

Sander
