Posted to dev@subversion.apache.org by Daniel Berlin <db...@dberlin.org> on 2002/08/13 05:59:25 UTC

How to do annotate

Just so it doesn't get lost in the noise, I've excerpted how to do 
annotate from my email to Tom.

The exact details on how/where to track the lines should be fairly easy 
for Brane.  If it weren't so late, I'd do it myself.
Looking at the delta combiner code, it looks fairly simple.

> 
> 	>> * svndiff (our delta format) isn't very "cvs annotate" friendly.
> 	>> This is really depressing, esp. after Brane spent a lot of hard work
> 	>> writing a delta combiner for us. :( Thoughts about how to handle the
> 	>> delta-window problem wrt svndiff would be most welcome.
> 
....

So, to show there's really no problem, and that the delta combiner doesn't 
make it easier or harder, ....

Before the delta combiner, you would have to perform the following steps:

1. For each revision starting from 1, up to the revision we need to 
annotate to, grab the name of the delta window that starts that revision.
Make a nice mapping structure so that you can look up "foo", and it'll say 
"1.3".
2. As you apply deltas, and generate bytes, each range of bytes generated 
gets marked with the revision number associated with that window, which 
we look up (if any; if you don't find it in the map, it means it doesn't 
delineate a new revision, and we assign it the previous number).
3. Now you must transform the byte ranges into *lines*, so it looks 
pretty. This is done by looking at the actual content, and figuring out 
where each line would break.
Each line is assigned the newest revision that touches something on that 
line.
4. Boom, you're done.
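
Step 3 above can be sketched in Python roughly as follows. This is only an
illustration, not code from the Subversion tree: the function name
ranges_to_line_revs and the (start, end, rev) shape of the byte ranges are
assumptions, and revisions are assumed to be plain integers so that "newest"
means "largest".

```python
def ranges_to_line_revs(content, byte_ranges):
    """Assign each line the newest revision whose byte range overlaps it.

    content     -- the reconstructed file, as bytes
    byte_ranges -- (start, end, rev) triples, end exclusive (assumed shape)
    """
    annotated = []
    pos = 0
    for line in content.splitlines(keepends=True):
        line_start, line_end = pos, pos + len(line)
        # A range touches the line if the two half-open intervals overlap.
        touching = [rev for start, end, rev in byte_ranges
                    if start < line_end and end > line_start]
        annotated.append((max(touching) if touching else None, line))
        pos = line_end
    return annotated


content = b"foo\nbar\nbaz\n"
byte_ranges = [(0, 6, 1), (6, 12, 3)]  # bytes 0-5 from rev 1, 6-11 from rev 3
for rev, line in ranges_to_line_revs(content, byte_ranges):
    print(rev, line)
```

The second line straddles both ranges, so it is credited to the newer
revision, which is the rule step 3 describes.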

With a delta combiner, the first step is the same, but the second step's 
tracking needs to be performed *in the delta combiner* (since now *it* 
is what is effectively applying the deltas to generate a combined one, 
it just does so *without* actually executing the commands). 
Otherwise, you have to avoid the delta combiner, and do step two as before.
How to do it in the delta combiner depends on how exactly your combiner works.


--Dan




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: How to do annotate

Posted by Branko Čibej <br...@xbc.nu>.

Daniel Berlin wrote:

>I'm of the theory that annotate/blame is not used often enough that it's 
>worth the penalties.  Let's just do it slowly.
>  
>
Hear, hear! I can't agree more. :-)

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: How to do annotate

Posted by Karl Fogel <kf...@newton.ch.collab.net>.

Daniel Berlin <db...@dberlin.org> writes:
> Hey, that's not a half bad idea.
> 
> I can write a python script to do it (using the bindings), so we can speed 
> test it.
> 
> It's a trivial modification of getfile.py.
> In fact, I just did it.
> Let me get some numbers.

:-) !  Nice.

Can you get some numbers on files with, like, 300 revisions?  Maybe
use cvs2svn to convert some of the GCC tree or something...

Just a thought,
-K




Re: How to do annotate

Posted by Daniel Berlin <db...@dberlin.org>.

On Tue, 13 Aug 2002, Garrett Rooney wrote:

> On Tue, Aug 13, 2002 at 09:51:47AM -0500, B. W. Fitzpatrick wrote:
> > Daniel Berlin <db...@dberlin.org> writes:
> > [snip]
> > > I'm of the theory that annotate/blame is not used often enough that it's 
> > > worth the penalties.  Let's just do it slowly.
> > 
> > +1
> 
> +1
> 
> -garrett 
> 
> (who was actually toying with the idea of implementing this as a perl script 
> last night...) 

Hey, that's not a half bad idea.

I can write a python script to do it (using the bindings), so we can speed 
test it.

It's a trivial modification of getfile.py.
In fact, I just did it.
Let me get some numbers.

> 
> 



Re: How to do annotate

Posted by Garrett Rooney <ro...@electricjellyfish.net>.

On Tue, Aug 13, 2002 at 09:51:47AM -0500, B. W. Fitzpatrick wrote:
> Daniel Berlin <db...@dberlin.org> writes:
> [snip]
> > I'm of the theory that annotate/blame is not used often enough that it's 
> > worth the penalties.  Let's just do it slowly.
> 
> +1

+1

-garrett 

(who was actually toying with the idea of implementing this as a perl script 
last night...) 

-- 
garrett rooney                    Remember, any design flaw you're 
rooneg@electricjellyfish.net      sufficiently snide about becomes  
http://electricjellyfish.net/     a feature.       -- Dan Sugalski


Re: How to do annotate

Posted by Daniel Berlin <db...@dberlin.org>.

On 13 Aug 2002, Greg Hudson wrote:

> On Tue, 2002-08-13 at 09:58, Daniel Berlin wrote:
> > So should we just point out the problems that we know we can't solve, or 
> > should we do the best we can?
> 
> In this case, I think our lemonade is going to taste like rotten
> tomatoes if we make it this way.  Let me raise a third concern, related
> to windows: if I insert just one line at the beginning of a file, then
> it will look like there is a bit of new data every 128K through.
> 
> Bill has noted that if we went back to having source windows larger than
> target windows, this wouldn't be an issue.  But we can't really do that
> and have a delta combiner.  If byte 0 of rev 1 can depend on byte 150K
> of rev 2, and byte 150K of rev 2 can depend on byte 300K of rev 3, and
> so on, then byte 0 of rev 1 can depend on pretty much any byte of rev
> 1000.  We lose streaminess of delta application.
> 
> Ignoring windows: a line-based diff does a pretty damn good job of
> telling you what has changed, most of the time. 
Correct.
>  This is because it's
> only allowed to insert source data from the current point into the
> source file.  We, on the other hand, are allowed to insert data from
> anywhere in the source window which happens to look good, or even from
> earlier parts of the target window.  So we do a miserable job of
> reflecting what has changed.
> 
> So, what are our options?  I listed these before, in passing:
> 
>   1. Don't implement "svn blame", noting that it happens to be easy in
>      the CVS design and not in ours.  See how much people complain.

I'm not sure this is a good idea, only because svn blame isn't hard to 
implement, it's just going to be slow if we do it line based.

> 
>   2. Implement "svn blame" slowly, by regenerating each rev of the file
>      and doing a line-based diff between adjacent pairs.  This won't
>      actually be too bad until your repository starts to look like
>      gcc's, with hundreds of revs per file.

Even then, skip-deltas should allow us to generate the fulltexts fast, so 
the slowdown is completely dependent on the speed of the diff algorithm.


> 
>   3. Add annotation data to the repository.  This could be done in a
>      number of ways: each revision could be annotated as it arrives
>      (fast, heavy space penalty, but some people have lots of space);
>      we could annotate every N revisions (a little slower, space penalty
>      drops by a factor of N); there could be an "annotation cache" where
>      we store annotations when they are requested (unpredictable speed,
>      but space penalty is fixed at the size of the cache); we could
>      store annotation diffs alongside the binary deltas (skip deltas
>      mean it won't take long to reconstruct any annotation).

> 
>   4. We could decide that for text files, we will store line-based diffs
>      instead of svndiffs.  It's not clear whether this would be a win or
>      a lose for space in general, though it's certainly more
>      complicated.
> 
>   5. Or we could simply make rotten tomato lemonade.
> 
> I would suggest that we go with #2 for a while because it's relatively
> easy and will suffice for the common case.  The last variation of #3
> might produce the best results overall, though it's distasteful that the
> filesystem layer should be performing line-based diffs.

I'm of the theory that annotate/blame is not used often enough that it's 
worth the penalties.  Let's just do it slowly.
--Dan



Re: How to do annotate

Posted by Greg Hudson <gh...@MIT.EDU>.

On Tue, 2002-08-13 at 09:58, Daniel Berlin wrote:
> So should we just point out the problems that we know we can't solve, or 
> should we do the best we can?

In this case, I think our lemonade is going to taste like rotten
tomatoes if we make it this way.  Let me raise a third concern, related
to windows: if I insert just one line at the beginning of a file, then
it will look like there is a bit of new data every 128K through.
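
A small Python sketch of that window effect, under stated assumptions: the
source and target windows are taken to sit at the same offsets and have the
same size, the window is shrunk from 128K to 40 bytes so the result is easy
to inspect, and the helper names are made up for the illustration.

```python
WINDOW = 40  # stand-in for the 128K window

def windows(data, size=WINDOW):
    """Split data into fixed-size windows."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def starts_with_unmatched_line(src_win, dst_win, line_len):
    """True if the first line of the target window is absent from the
    source window at the same position, so it would look like new data."""
    return dst_win[:line_len] not in src_win

# Sixteen distinct 10-byte lines, then the same file with one line inserted
# at the front.
source = b"".join(b"line %04d\n" % i for i in range(16))
target = b"NEW LINE!\n" + source

flags = [starts_with_unmatched_line(s, d, 10)
         for s, d in zip(windows(source), windows(target))]
print(flags)
```

Every window pair is flagged, because the inserted line shifts all later
content by 10 bytes, and the first 10 bytes of each target window then belong
to the previous source window.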

Bill has noted that if we went back to having source windows larger than
target windows, this wouldn't be an issue.  But we can't really do that
and have a delta combiner.  If byte 0 of rev 1 can depend on byte 150K
of rev 2, and byte 150K of rev 2 can depend on byte 300K of rev 3, and
so on, then byte 0 of rev 1 can depend on pretty much any byte of rev
1000.  We lose streaminess of delta application.

Ignoring windows: a line-based diff does a pretty damn good job of
telling you what has changed, most of the time.  This is because it's
only allowed to insert source data from the current point into the
source file.  We, on the other hand, are allowed to insert data from
anywhere in the source window which happens to look good, or even from
earlier parts of the target window.  So we do a miserable job of
reflecting what has changed.

So, what are our options?  I listed these before, in passing:

  1. Don't implement "svn blame", noting that it happens to be easy in
     the CVS design and not in ours.  See how much people complain.

  2. Implement "svn blame" slowly, by regenerating each rev of the file
     and doing a line-based diff between adjacent pairs.  This won't
     actually be too bad until your repository starts to look like
     gcc's, with hundreds of revs per file.

  3. Add annotation data to the repository.  This could be done in a
     number of ways: each revision could be annotated as it arrives
     (fast, heavy space penalty, but some people have lots of space);
     we could annotate every N revisions (a little slower, space penalty
     drops by a factor of N); there could be an "annotation cache" where
     we store annotations when they are requested (unpredictable speed,
     but space penalty is fixed at the size of the cache); we could
     store annotation diffs alongside the binary deltas (skip deltas
     mean it won't take long to reconstruct any annotation).

  4. We could decide that for text files, we will store line-based diffs
     instead of svndiffs.  It's not clear whether this would be a win or
     a lose for space in general, though it's certainly more
     complicated.

  5. Or we could simply make rotten tomato lemonade.

I would suggest that we go with #2 for a while because it's relatively
easy and will suffice for the common case.  The last variation of #3
might produce the best results overall, though it's distasteful that the
filesystem layer should be performing line-based diffs.
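
Option #2 can be sketched in Python along these lines. This is a rough
illustration only: it uses the standard difflib module as the line-based
diff (not whatever diff Subversion would actually use), and it assumes the
fulltext of every revision is already available as a list of
(rev, text) pairs, oldest first.

```python
import difflib

def blame(revisions):
    """Return [(rev, line), ...] for the newest revision.

    revisions -- list of (rev_number, fulltext) pairs, oldest first
    """
    annotated = []   # (rev, line) pairs for the previous revision
    prev_lines = []
    for rev, text in revisions:
        lines = text.splitlines(keepends=True)
        matcher = difflib.SequenceMatcher(None, prev_lines, lines,
                                          autojunk=False)
        new_annotated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep whatever revision they already had.
                new_annotated.extend(annotated[i1:i2])
            else:
                # Inserted or replaced lines belong to this revision;
                # for a pure delete, lines[j1:j2] is empty.
                new_annotated.extend((rev, line) for line in lines[j1:j2])
        annotated, prev_lines = new_annotated, lines
    return annotated


history = [(1, "a\nb\nc\n"), (2, "a\nB\nc\n"), (3, "a\nB\nc\nd\n")]
for rev, line in blame(history):
    print(rev, line, end="")
```

The cost is one line-based diff per adjacent pair of revisions, which is why
it only starts to hurt on files with hundreds of revisions.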


Re: How to do annotate

Posted by Karl Fogel <kf...@newton.ch.collab.net>.

Daniel Berlin <db...@dberlin.org> writes:
> No, but we only need new data structures if diffing the fulltexts is too 
> slow, which I'm actually not convinced it is right now, even for files 
> with 100's of revisions.
> 
> One would *think* it is, but that's not a certainty.

That'd be great!  I'd be happy to have my proposal filed in /dev/null
if there's a simpler solution :-).

-K


Re: How to do annotate

Posted by Daniel Berlin <db...@dberlin.org>.

On 13 Aug 2002, Karl Fogel wrote:

> Daniel Berlin <db...@dberlin.org> writes:
> > CVS's isn't perfect.
> > It can't be.
> 
> Sorry, I was being sloppy.  What I should have said is, CVS tends not
> to lose any important information in annotate, and we shouldn't
> either.
> 
> I understand that CVS might not be perfect, in the sense that it
> probably opts for the fast-but-not-always-optimal diff algorithm.  But
> that's a choice we can tune, right?
> 
> Let's take a line-based text file, so we're talking explicitly about
> the case that "cvs ann" works on.
> 
>    Rev 1: Obviously, all lines belong to the first committer.
> 
>    Rev 2: Any deleted lines go away.  Any changed lines belong to the
>    new committer.  Any added lines belong to the new committer.
> 
>    Rev 3: See Rev 2.
> 
>    And so on, and so on.
> 
> So the question reduces to determining what happened to which lines in
> a given commit, which boils down to the diffing algorithm's ability to
> determine what happened.  If we're using GNU diff, there are various
> flags (--minimal, --horizon-lines, --speed-large-files) that can
> affect speed, correctness, or both.  This gets more and more expensive
> as we approach true perfection, so probably we want to settle for "as
> close to perfection as CVS gets".
> 
> IOW, my point in my earlier mail was really that we shouldn't choose a
> solution that gives noticeably worse results than "cvs ann".  I should
> have said that more clearly.
> 
> Are the svndiff-based proposals leading toward an equivalent level of
> correctness?

No, but we only need new data structures if diffing the fulltexts is too 
slow, which I'm actually not convinced it is right now, even for files 
with 100's of revisions.

One would *think* it is, but that's not a certainty.

> 
> -K
> 
> 



Re: How to do annotate

Posted by Karl Fogel <kf...@newton.ch.collab.net>.

Daniel Berlin <db...@dberlin.org> writes:
> CVS's isn't perfect.
> It can't be.

Sorry, I was being sloppy.  What I should have said is, CVS tends not
to lose any important information in annotate, and we shouldn't
either.

I understand that CVS might not be perfect, in the sense that it
probably opts for the fast-but-not-always-optimal diff algorithm.  But
that's a choice we can tune, right?

Let's take a line-based text file, so we're talking explicitly about
the case that "cvs ann" works on.

   Rev 1: Obviously, all lines belong to the first committer.

   Rev 2: Any deleted lines go away.  Any changed lines belong to the
   new committer.  Any added lines belong to the new committer.

   Rev 3: See Rev 2.

   And so on, and so on.

So the question reduces to determining what happened to which lines in
a given commit, which boils down to the diffing algorithm's ability to
determine what happened.  If we're using GNU diff, there are various
flags (--minimal, --horizon-lines, --speed-large-files) that can
affect speed, correctness, or both.  This gets more and more expensive
as we approach true perfection, so probably we want to settle for "as
close to perfection as CVS gets".

IOW, my point in my earlier mail was really that we shouldn't choose a
solution that gives noticeably worse results than "cvs ann".  I should
have said that more clearly.

Are the svndiff-based proposals leading toward an equivalent level of
correctness?

-K


Re: How to do annotate

Posted by Daniel Berlin <db...@dberlin.org>.

On 13 Aug 2002, Karl Fogel wrote:

> Daniel Berlin <db...@dberlin.org> writes:
> > But there is little that can be done that isn't going to be very fragile.
> > The reality is that we need not be perfect in our annotate output.
> 
> Whoa.  Wait a second, here :-).
> 
> There is no reason our annotate output can't be perfect like CVS's.
CVS's isn't perfect.
It can't be.


> All the information is there.  We shouldn't settle for anything less
> than perfect correctness.

Then you won't be doing it like CVS.
> 
> We've been spending a lot of effort here trying to get "svn blame"
> data cheaply from our svndiff delta format.  If we can do it that way,
> then great.  But if we can't, the answer isn't to settle for
> inaccuracy -- it's to implement blame in some other way.

> 
> If we must resort to manual diffing and counting lines, then so be it.
> Below is a loose description of such a system.  I'm not saying it has
> to be this way (haven't been closely following the svndiff-centric
> discussion, just enough to see that it's non-trivial).


> 
> Doing Blame the Brute Force Way:
> ================================
> 
> There is a new table, `blame', mapping NodeChangeIDs to lists of the
> form "((RANGE1 REV1) (RANGE2 REV2) ...)".

Whoa, whoa, we don't need a new table to do this.
We have all the info already.
--Dan



Re: How to do annotate

Posted by Karl Fogel <kf...@newton.ch.collab.net>.

Daniel Berlin <db...@dberlin.org> writes:
> But there is little that can be done that isn't going to be very fragile.
> The reality is that we need not be perfect in our annotate output.

Whoa.  Wait a second, here :-).

There is no reason our annotate output can't be perfect like CVS's.
All the information is there.  We shouldn't settle for anything less
than perfect correctness.

We've been spending a lot of effort here trying to get "svn blame"
data cheaply from our svndiff delta format.  If we can do it that way,
then great.  But if we can't, the answer isn't to settle for
inaccuracy -- it's to implement blame in some other way.

If we must resort to manual diffing and counting lines, then so be it.
Below is a loose description of such a system.  I'm not saying it has
to be this way (haven't been closely following the svndiff-centric
discussion, just enough to see that it's non-trivial).

Doing Blame the Brute Force Way:
================================

There is a new table, `blame', mapping NodeChangeIDs to lists of the
form "((RANGE1 REV1) (RANGE2 REV2) ...)".

Each RANGE indicates a range of lines "(offset len)" in the node's
content, and the REV indicates the revision in which that range was
introduced.  (Variant: or we can store ranges of bytes instead of
ranges of lines).

After a new revision is committed, it is added to a list of revs whose
annotations need to be updated.  An asynchronous process, or an
internal post-commit thread, runs over that list.  For each revision
in the list, it finds all the changed file paths; for each path, it
calculates and stores the blame information.  This may involve
re-diffing the fulltexts (variant: use a non-compressing svndiff
instead of line-based diff, if we're storing byte ranges instead of
line ranges).

Is this horrendously inefficient?  Well, it's certainly inefficient,
but not horrendously so because it doesn't delay anything.  All the
work happens after the commit has already succeeded.  It'd be nice to
find a more efficient way, just not at the price of correctness.

Backwards compatibility and crash handling are both covered by the
same rule: if you go to fetch an annotation and it isn't there, then
calculate it on the spot, recursively.  (There will have to be some
mechanism to make sure you don't get two processes calculating the
same annotations at the same time, but I'm going to hand-wave on that
as we all know it's solvable).
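
A rough Python sketch of the table and the fetch-or-calculate rule, under
loud assumptions: the Node class, the in-memory dict standing in for the
`blame' table, and the use of difflib for the line diff are all made up for
illustration, and locking between concurrent processes is left out entirely.

```python
import difflib
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:                      # hypothetical stand-in for a node change
    node_id: str
    rev: int
    text: str
    pred: Optional["Node"] = None

blame_table = {}  # node_id -> [((offset, length), rev), ...] in line units

def expand(ranges):
    """Turn ((offset, len), rev) runs back into one rev per line."""
    revs = []
    for (_offset, length), rev in ranges:
        revs.extend([rev] * length)
    return revs

def compress(revs):
    """Collapse one-rev-per-line into ((offset, len), rev) runs."""
    ranges = []
    for i, rev in enumerate(revs):
        if ranges and ranges[-1][1] == rev:
            (offset, length), _ = ranges[-1]
            ranges[-1] = ((offset, length + 1), rev)
        else:
            ranges.append(((i, 1), rev))
    return ranges

def get_blame(node):
    """Fetch the annotation; if it isn't there, calculate it on the spot,
    recursing into the predecessor when needed."""
    if node.node_id in blame_table:
        return blame_table[node.node_id]
    lines = node.text.splitlines()
    if node.pred is None:
        revs = [node.rev] * len(lines)
    else:
        old_revs = expand(get_blame(node.pred))
        old_lines = node.pred.text.splitlines()
        matcher = difflib.SequenceMatcher(None, old_lines, lines,
                                          autojunk=False)
        revs = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                revs.extend(old_revs[i1:i2])
            else:
                revs.extend([node.rev] * (j2 - j1))
    blame_table[node.node_id] = compress(revs)
    return blame_table[node.node_id]


n1 = Node("n1", 1, "a\nb\nc\n")
n2 = Node("n2", 2, "a\nB\nc\n", n1)
n3 = Node("n3", 3, "a\nB\nc\nd\n", n2)
print(get_blame(n3))
```

Asking for the newest node fills in the missing entries for its ancestors as
a side effect, which is the backwards-compatibility behaviour described
above. A real version would need an iterative walk for very long histories,
since this one recurses once per missing ancestor.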

-K


Re: How to do annotate

Posted by Daniel Berlin <db...@dberlin.org>.

> Realize, however, that CVS has these same problems, just on a lesser scale 
> (because the granularity of its diff is larger, and its diff algorithm 
> doesn't do self-similarity).  If it decides that the best way to represent 
> your change is to delete the first 3000 lines, and add 6000 lines to the 
> front in their place, annotate will show you as having changed those 6000 
> lines, when you've only added 3000.

Bad example, of course, unless its diff algorithm does something stupid 
(since if you've only added 3000 lines, the other 3000 shouldn't be different), but 
the point still stands: if its diff algorithm doesn't produce perfect 
results (which it doesn't), then the output of cvs annotate can't always 
be correct, or even necessarily close to it.

CVS doesn't perform annotate based off anything but the diff ops (I 
checked), so its output is only as good as the diff ops it produced.  If 
they match reality, you get nice annotate results. If they are a bit 
weird, your annotate output is a bit weird.
--Dan



Re: How to do annotate

Posted by Daniel Berlin <db...@dberlin.org>.

> 
> > 
> > Fundamentally, our binary deltas are about the most efficient way we can
> > come up with of reconstructing DST given SRC.  They are not about
> > determining what changed, either by line or by byte range.
> 
> Of course.
> Thus, we should not worry about trying to solve problems that are 
> impossible.
> 
> If you want to do an annotate based *only* on svndiff, the output will be 
> imperfect.
> But the other options are not feasible.
> 
> So should we just point out the problems that we know we can't solve, or 
> should we do the best we can?

This, BTW, is known as the "If life gives you oranges, make lemonade" 
approach.

But seriously, svndiff is effectively an append+copy format, rather than 
insert+copy, because we can't specify where the insert occurs.  If we 
could, things like inserting 128k of text at the front wouldn't be a 
problem, but only because the delta-window would "look right".  But that's 
not what we have, so we can't.

Realize, however, that CVS has these same problems, just on a lesser scale 
(because the granularity of its diff is larger, and its diff algorithm 
doesn't do self-similarity).  If it decides that the best way to represent 
your change is to delete the first 3000 lines, and add 6000 lines to the 
front in their place, annotate will show you as having changed those 6000 
lines, when you've only added 3000.

So people aren't expecting perfection (I hope, since CVS won't give it to 
them in this regard either)

--Dan



Re: How to do annotate

Posted by Daniel Berlin <db...@dberlin.org>.

On 13 Aug 2002, Greg Hudson wrote:

> On Tue, 2002-08-13 at 01:59, Daniel Berlin wrote:
> > 2. As you apply deltas, and generate bytes, each range of bytes generated 
> > gets marked with the revision number associated with that window (if any. If 
> > you don't find it in the map, it means it doesn't delineate a new 
> > revision, and we assign it the previous number), which we lookup.
> 
> This is too simplistic.  Let me throw out some concerns.  First, an
> example:
> 
>   SRC: foo\nbar\nbar\n
>   DST: foo\nbar\nfoo\nbar\n
> 
> The instructions for this delta look like:
> 
>   Copy 8 bytes from src offset 0
>   Copy 8 bytes from src offset 8
> 
> A line-based diff would notice that only the third line (bytes 8-11) was
> new in the destination.  So *half* of the second copy is new, and half
> is old.

Yes, I'm aware.
But there is little that can be done that isn't going to be very fragile.
The reality is that we need not be perfect in our annotate output.
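
To make the example concrete, here is a small Python sketch that applies
copy instructions and records where each output byte came from. The op
tuples ("copy_src", offset, length) and ("new", data) are an invented
notation, not the real svndiff encoding. One more assumption: with the
12-byte SRC as quoted, a copy of 8 bytes from offset 8 would run past the
end of the source, so this sketch reads both copies from offset 0, which is
what reproduces DST.

```python
def apply_ops(src, ops):
    """Apply delta ops; return the output and a per-byte origin label."""
    out = bytearray()
    origin = []  # one entry per output byte: "copied" or "new"
    for op in ops:
        if op[0] == "copy_src":
            _, offset, length = op
            chunk = src[offset:offset + length]
            if len(chunk) != length:
                raise ValueError("copy runs past the end of the source")
            label = "copied"
        else:  # ("new", data)
            chunk = op[1]
            label = "new"
        out += chunk
        origin.extend([label] * len(chunk))
    return bytes(out), origin


SRC = b"foo\nbar\nbar\n"
DST = b"foo\nbar\nfoo\nbar\n"
ops = [("copy_src", 0, 8), ("copy_src", 0, 8)]

result, origin = apply_ops(SRC, ops)
print(result == DST)         # the two copies rebuild DST
print(origin.count("new"))   # number of bytes the delta marks as new
```

The delta rebuilds DST without carrying any new bytes at all, so tracking
byte ranges through the copy instructions alone cannot say that the third
line is the one that was added.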

> 
> Bill Tutt implied he had a way of recognizing this, by comparing the
> self-compression of SRC to the delta between SRC and DST.  I'm dubious
> (it seems very fragile at best), but regardless, let's move on to
> concern number two: windows.  If I insert 128K at the beginning of a
> file, then *every* window will look like new data, even though only the
> first 128K is new.

Once again, if the windows say everything is "new" in the 
revision, that's life.  You can't solve this problem either in a 
reasonable way, unless you consider constructing the full text of every 
revision, and then doing some other type of diff on them, reasonable.

> 
> Fundamentally, our binary deltas are about the most efficient way we can
> come up with of reconstructing DST given SRC.  They are not about
> determining what changed, either by line or by byte range.

Of course.
Thus, we should not worry about trying to solve problems that are 
impossible.

If you want to do an annotate based *only* on svndiff, the output will be 
imperfect.
But the other options are not feasible.

So should we just point out the problems that we know we can't solve, or 
should we do the best we can?

--Dan


Re: How to do annotate (blame)

Posted by sam th <sa...@uchicago.edu>.

On Tue, Aug 13, 2002 at 10:58:57AM -0400, Greg Hudson wrote:
> On Tue, 2002-08-13 at 10:45, Daniel Berlin wrote:
> > How often does one do annotate?
> 
> I will note that some CVS web interfaces make it easier to generate an
> annotated source listing than a non-annotated source listing.  I'm sure
> CVS repositories using those web interfaces get lots of annotate
> operations.
> 
> But it's easy enough to say "don't do that, it hurts" for Subversion, in
> which case I'm happy with making annotate slow, for now.  (And, as I
> noted, not really slow until you start getting files with many many
> revisions.)

Let me preface this by saying that I'm very interested in using SVN, and have 
already used it for a small personal project.  But I really think that this 
decision would be a real problem for Subversion's adoption by the community 
that uses cvs.  I know that annotate is a much more common operation for me 
(often through the Bonsai web interface) than moving files, or changing 
directory structure.  It's invaluable for understanding how the code got into 
the situation that it's in.  

Further, it's precisely those projects and files with lots (100+) of revisions 
that need this tool.  Something that's only been changed twice has an easy to 
understand revision history.  Something that has been changed 300 times since 
the piece of code in question was committed is where this tool is most 
useful.  

Finally, the tools, like Bonsai, that have grown up around CVS are so useful 
that I, personally, would find it unreasonable to give them up, simply 
because Subversion doesn't implement annotate in a way that makes them 
possible.  

sam th
sam@uchicago.edu


Re: How to do annotate (blame)

Posted by Greg Hudson <gh...@MIT.EDU>.

On Tue, 2002-08-13 at 10:45, Daniel Berlin wrote:
> How often does one do annotate?

I will note that some CVS web interfaces make it easier to generate an
annotated source listing than a non-annotated source listing.  I'm sure
CVS repositories using those web interfaces get lots of annotate
operations.

But it's easy enough to say "don't do that, it hurts" for Subversion, in
which case I'm happy with making annotate slow, for now.  (And, as I
noted, not really slow until you start getting files with many many
revisions.)



RE: Re: How to do annotate (blame)

Posted by Bill Tutt <ra...@lyra.org>.

Indeed, just think how useful blame output can be in a three way merge
UI:
http://bitmover.com/gifs/fm3new.gif

Bill

> From: Karl Fogel [mailto:kfogel@newton.ch.collab.net]
> 
> Daniel Berlin <db...@dberlin.org> writes:
> > How often does one do annotate?
> > I think I've used it twice, ever.
> 
> Oh, I've used it on a regular basis in past projects, and depended on
> the information being correct.  I've *wanted* to use it in svn, and
> been frustrated that I couldn't.
> 
> I think this might be one of those features where those who don't use it
> can't understand the enthusiasm of those who do :-).  But it is
> important to get it right, for those who depend on it.
> 
> -K
> 




Re: How to do annotate (blame)

Posted by Karl Fogel <kf...@newton.ch.collab.net>.

Daniel Berlin <db...@dberlin.org> writes:
> How often does one do annotate?
> I think I've used it twice, ever.

Oh, I've used it on a regular basis in past projects, and depended on
the information being correct.  I've *wanted* to use it in svn, and
been frustrated that I couldn't.

I think this might be one of those features where those who don't use it
can't understand the enthusiasm of those who do :-).  But it is
important to get it right, for those who depend on it.

-K


Re: How often does one do annotate?

Posted by Greg Stein <gs...@lyra.org>.

While frequency is an important measure, I think the *relative* frequency of
operations is more important. For example, we definitely don't want to
impact simple *retrieval* at the cost of "blame" support. As a result, we
probably would not want to use a system whereby we store [in the repository]
line-oriented diffs for text files.

Personally, I use blame maybe once every couple months. I always use ViewCVS
to do it, as I don't have the "cvs ann" command in my head.

Cheers,
-g

On Tue, Aug 13, 2002 at 10:33:09AM -0500, Dave Glowacki wrote:
> Daniel Berlin wrote:
> > How often does one do annotate?
> > I think i've used it twice, ever.
> 
> I probably use 'cvs ann' at least once a month.  I'm involved
> in several medium-sized projects (500-1000 files, averaging
> 700-1000 lines each) and I find it useful to figure out either
> the context in which a particular change was made or to determine
> what the file looked like before the change was made.
> 
> For either one, I'll do 'cvs ann' to find the revision where
> the line or lines I'm interested in were changed.
> 
> To get the context, I'll do 'cvs diff -r<rev-1> -r<rev>'.
> 
> To find out what the file looked like before the change,
> I'll either do 'cvs ann -r<rev-1> | less' or even
> 'cvs update -r<rev-1>'.

-- 
Greg Stein, http://www.lyra.org/


Re: How to do annotate (blame)

Posted by Daniel Berlin <db...@dberlin.org>.

On Tue, 13 Aug 2002, William Uther wrote:

> 
> On Tuesday, August 13, 2002, at 02:22  AM, Greg Hudson wrote:
> 
> > On Tue, 2002-08-13 at 01:59, Daniel Berlin wrote:
> >> 2. As you apply deltas, and generate bytes, each range of bytes 
> >> generated
> >> gets marked with the revision number associated with that window (if 
> >> any. If
> >> you don't find it in the map, it means it doesn't delineate a new
> >> revision, and we assign it the previous number), which we lookup.
> >
> > This is too simplistic.  Let me throw out some concerns.  First, an
> > example:
> >
> >   SRC: foo\nbar\nbar\n
> >   DST: foo\nbar\nfoo\nbar\n
> >
> > The instructions for this delta look like:
> >
> >   Copy 8 bytes from src offset 0
> >   Copy 8 bytes from src offset 0
> 
> Yes.  I think it is quite interesting that svndiff is not really a 
> "delta" algorithm, but an efficient compression algorithm for the second 
> piece of text given the first.  (I agree the difference is subtle.)  The 
> fact that these deltas are not easily reversible has made some of my 
> thinking about merging tricky.
> 
> I believe another problem is that we currently use skip-deltas.  This 
> means that many revisions are not stored as deltas to their neighbouring 
> revisions.  You can't just apply the deltas one by one.

Yes, this would be a problem.
But the only "perfect" solution in light of this (where "perfect" means 
as good as cvs, not absolutely correct output) is either to generate the 
fulltext of every revision and run a line-based diff algorithm over 
them, or to turn off skip-deltas.

How often does one do annotate?
I think i've used it twice, ever.
Is it often enough that we should be all that concerned with speed?
We are probably talking 10 seconds vs 1 second for the speed 
difference between doing it based only on svndiff and doing it based on 
the generated fulltexts.

But doing it based on fulltexts is actually improved by skip-deltas, since 
we can generate fulltexts faster.
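
As a rough illustration of the fulltext approach (generate every revision's 
text, then line-diff neighbouring revisions), here is a minimal Python 
sketch. It is not Subversion code: the `annotate` function and its input 
shape are hypothetical, and Python's difflib stands in for whatever 
line-based diff would really be used.

```python
import difflib


def annotate(fulltexts):
    """Blame each line of the newest text on a revision.

    `fulltexts` is a hypothetical input: a list of (revision, text)
    pairs, oldest first, where each text is the full file content.
    Returns a list of (revision, line) pairs for the newest text.
    """
    blame = []       # revision credited for each line of prev_lines
    prev_lines = []
    for rev, text in fulltexts:
        lines = text.splitlines(keepends=True)
        matcher = difflib.SequenceMatcher(None, prev_lines, lines,
                                          autojunk=False)
        new_blame = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep the revision they already had.
                new_blame.extend(blame[i1:i2])
            elif tag in ("insert", "replace"):
                # Added or rewritten lines are credited to this revision.
                new_blame.extend([rev] * (j2 - j1))
            # "delete": the lines are gone, nothing to carry forward.
        blame = new_blame
        prev_lines = lines
    return list(zip(blame, prev_lines))


history = [
    (1, "foo\nbar\nbar\n"),
    (2, "foo\nbar\nfoo\nbar\n"),
    (3, "foo\nbar\nfoo\nbar\nbaz\n"),
]
for rev, line in annotate(history):
    print(rev, line, end="")
```

The cost is one fulltext reconstruction and one line diff per revision, 
which is why faster fulltext generation helps this approach.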


Re: How to do annotate (blame)

Posted by William Uther <wi...@cs.cmu.edu>.

On Tuesday, August 13, 2002, at 02:22  AM, Greg Hudson wrote:

> On Tue, 2002-08-13 at 01:59, Daniel Berlin wrote:
>> 2. As you apply deltas, and generate bytes, each range of bytes 
>> generated
>> gets marked with the revision number associated with that window (if 
>> any. If
>> you don't find it in the map, it means it doesn't delineate a new
>> revision, and we assign it the previous number), which we lookup.
>
> This is too simplistic.  Let me throw out some concerns.  First, an
> example:
>
>   SRC: foo\nbar\nbar\n
>   DST: foo\nbar\nfoo\nbar\n
>
> The instructions for this delta look like:
>
>   Copy 8 bytes from src offset 0
>   Copy 8 bytes from src offset 0

Yes.  I think it is quite interesting that svndiff is not really a 
"delta" algorithm, but an efficient compression algorithm for the second 
piece of text given the first.  (I agree the difference is subtle.)  The 
fact that these deltas are not easily reversible has made some of my 
thinking about merging tricky.

I believe another problem is that we currently use skip-deltas.  This 
means that many revisions are not stored as deltas to their neighbouring 
revisions.  You can't just apply the deltas one by one.
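
To make the skip-delta point concrete, here is a small Python sketch of one 
possible skip-delta scheme. The rule used (the delta base of revision n is n 
with its lowest set bit cleared) is an assumption chosen for illustration, 
not necessarily what the repository does, and `delta_base` and `delta_chain` 
are hypothetical names.

```python
def delta_base(n):
    """Illustrative rule: clear the lowest set bit of n to get its base."""
    return n & (n - 1)


def delta_chain(n):
    """Revisions visited when reconstructing n, ending at the fulltext 0."""
    chain = [n]
    while n > 0:
        n = delta_base(n)
        chain.append(n)
    return chain


print(delta_chain(54))                 # 54 -> 52 -> 48 -> 32 -> 0
print(delta_base(53), delta_base(54))  # both are stored against 52
```

Under this rule revisions 53 and 54 are both deltas against 52 rather than 
against each other, so walking the stored deltas does not visit the 
revisions in order, although each chain stays short.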

later,

\x/ill           :-}


Re: How to do annotate

Posted by Greg Hudson <gh...@MIT.EDU>.

On Tue, 2002-08-13 at 01:59, Daniel Berlin wrote:
> 2. As you apply deltas, and generate bytes, each range of bytes generated 
> gets marked with the revision number associated with that window (if any. If 
> you don't find it in the map, it means it doesn't delineate a new 
> revision, and we assign it the previous number), which we lookup.

This is too simplistic.  Let me throw out some concerns.  First, an
example:

  SRC: foo\nbar\nbar\n
  DST: foo\nbar\nfoo\nbar\n

The instructions for this delta look like:

  Copy 8 bytes from src offset 0
  Copy 8 bytes from src offset 0

A line-based diff would notice that only the third line (bytes 8-11) was
new in the destination.  So *half* of the second copy is new, and half
is old.
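
A small Python sketch of this mismatch follows. It assumes both copies read 
the first 8 bytes of SRC (SRC is only 12 bytes long, and that pair of copies 
does reproduce DST). The (offset, length) instruction form and the helper 
names are made up for illustration, and difflib stands in for a line-based 
diff.

```python
import difflib

SRC = b"foo\nbar\nbar\n"
DST = b"foo\nbar\nfoo\nbar\n"

# Hypothetical instruction form: (source_offset, length).
OPS = [(0, 8), (0, 8)]


def apply_copies(src, ops):
    """Rebuild the target by running copy-from-source instructions."""
    out = bytearray()
    for offset, length in ops:
        if offset + length > len(src):
            raise ValueError("copy runs past the end of the source")
        out += src[offset:offset + length]
    return bytes(out)


def output_ranges(ops):
    """Half-open target byte range produced by each instruction."""
    ranges, pos = [], 0
    for _, length in ops:
        ranges.append((pos, pos + length))
        pos += length
    return ranges


def new_line_numbers(src, dst):
    """1-based target line numbers that a line-based diff reports as new."""
    a = src.decode().splitlines(keepends=True)
    b = dst.decode().splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    new = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            new.extend(range(j1 + 1, j2 + 1))
    return new


print(apply_copies(SRC, OPS) == DST)  # the copies do rebuild DST
print(output_ranges(OPS))             # second copy covers target bytes 8-15
print(new_line_numbers(SRC, DST))     # the line diff flags only line 3
```

The second instruction's output spans target lines 3 and 4, while the line 
diff marks only line 3 as new, so tagging whole instruction ranges with a 
revision would blame line 4 incorrectly.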

Bill Tutt implied he had a way of recognizing this, by comparing the
self-compression of SRC to the delta between SRC and DST.  I'm dubious
(it seems very fragile at best), but regardless, let's move on to
concern number two: windows.  If I insert 128K at the beginning of a
file, then *every* window will look like new data, even though only the
first 128K is new.
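
The window concern can be sketched in Python as well. This is a crude model, 
not the real encoder: it uses a tiny 64-byte window instead of the real 
size, and it treats a target window as "old" only if it is identical to the 
source window at the same position. All names are hypothetical.

```python
import difflib

WINDOW = 64  # stand-in for the real window size, kept tiny for illustration


def windows(data, size=WINDOW):
    """Split data into consecutive fixed-size windows."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def windows_that_look_new(src, dst, size=WINDOW):
    """Count target windows that differ from the same-position source window."""
    src_w, dst_w = windows(src, size), windows(dst, size)
    changed = 0
    for i, win in enumerate(dst_w):
        if i >= len(src_w) or src_w[i] != win:
            changed += 1
    return changed, len(dst_w)


def inserted_line_count(src, dst):
    """Number of target lines a line-based diff reports as added."""
    a = src.splitlines(keepends=True)
    b = dst.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return sum(j2 - j1
               for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
               if tag in ("insert", "replace"))


old = b"".join(b"line %d\n" % i for i in range(1000))
new = b"new\n" * (WINDOW // 4) + old  # insert exactly one window of new lines

changed, total = windows_that_look_new(old, new)
print(changed, "of", total, "windows look new")
print(inserted_line_count(old, new), "lines are actually new")
```

Every window differs from its same-position counterpart because the 
insertion shifts all later content, while the line diff still reports only 
the inserted lines as new.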

Fundamentally, our binary deltas are about the most efficient way we can
come up with of reconstructing DST given SRC.  They are not about
determining what changed, either by line or by byte range.
