Posted to users@subversion.apache.org by Eric Seppanen <ed...@reric.net> on 2004/11/20 00:42:29 UTC

svn diff with binary files

Hi,

I'm evaluating Subversion right now, and have svn 1.1.1 installed
along with websvn 1.6.1.  I have a tree that contains several large
binary files (8 files at about 400kb each) and I notice that this is
slow to work with.

The reason: if I ask websvn to show me what changed between two
versions, it executes the command:

> svn --non-interactive diff -r2:4 file:///my-repos/tree/

... and the problem is this command takes 5-7 seconds of CPU time to
complete.  All a waste, I think, because it can't actually print 
anything about the binary files other than:

  Cannot display: file marked as a binary type.
  svn:mime-type = application/octet-stream

So why does it take so long?  I can only guess that it's actually
chewing on the binary files for a while.

Is there any way to get svn to not waste cpu time trying to diff files
it already knows are binary and undisplayable?  Should there be a way?

I might even suggest that that's the way it ought to always work.

Thanks,
Eric

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: svn diff with binary files

Posted by Gary Feldman <g1...@marsdome.com>.
kfogel@collab.net wrote:
> The problem is that the part of Subversion that fetches the files
> would have to "know" that the diff was being computed by another part
> of Subversion, instead of by some external program that Subversion
> passes off to.  (Because in the latter case, you *can* diff files that
> Subversion thinks are binary.)  We could do this, but it would be
> tricky to do it without introducing a layering violation.  In the
> interests of simplicity, we've unofficially decided not to.

Umm, isn't it the client's responsibility (where I'm using "client" to 
mean the piece of code that controls the fetching and execution of the 
diff) to interrogate the type of the file before fetching, to make sure 
the type is appropriate?  In other words, it's the piece of code that 
knows that there are multiple types of diff that also needs to know that 
each type of diff has different prerequisites (in the general case).
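To make that concrete, here is a minimal sketch of such a client-side check, written in Python purely for illustration. It is not Subversion's actual code. The function names are hypothetical, and it assumes a file counts as binary when svn:mime-type is set to something that does not start with "text/".

```python
from typing import Optional


def is_binary_mime_type(mime_type: Optional[str]) -> bool:
    """Return True if an svn:mime-type value marks the file as binary.

    Assumption: a file is binary when the property is set and its value
    does not start with "text/". An unset property (None) means text.
    """
    if mime_type is None:
        return False
    return not mime_type.strip().lower().startswith("text/")


def should_fetch_for_diff(mime_type: Optional[str], external_diff: bool) -> bool:
    """Decide whether the file contents are needed at all (hypothetical helper)."""
    # An external diff program may be able to handle binary data,
    # so the contents are always needed in that case.
    if external_diff:
        return True
    # The built-in diff can only print "Cannot display" for binary files,
    # so fetching their contents would be wasted work.
    return not is_binary_mime_type(mime_type)
```

With this, the application/octet-stream files from the original report would be skipped for the built-in diff but still fetched whenever --diff-cmd is in use.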

As I look at the new external diff command, it occurs to me that by 
taking a totally agnostic view to --diff-cmd, it stops being a diff 
command.  One could, in theory, say things like "--diff-cmd lpr", 
"--diff-cmd /bin/rm", or even (if one were really perverse) "--diff-cmd 
'cvs' -x 'commit'".  (Actually, the manual doesn't specify how the files 
are passed to the diff program, so I'm not quite sure whether these work 
as given or need to be munged.)  The old school UNIX geek in me says 
this is great; the new school O-O QA engineer in me says it's awful.  A 
compromise idea that pops into mind is to have "--diff-type 
diff-type-name", where "diff-type-name" is defined inside of some 
configuration file.  This saves the user the trouble of remembering 
where the diff executable is (e.g., windiff is in some weird place on my 
system).
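A rough sketch of how that lookup might work, in Python for illustration only. Neither "--diff-type" nor a [diff-types] config section exists in Subversion. The section name, the entries, and the windiff path below are all made up for the example.

```python
import configparser
import shlex
from typing import List

# Hypothetical configuration: maps a diff-type name to a command line.
SAMPLE_CONFIG = """
[diff-types]
unified = diff -u
windiff = "C:/Program Files/WinDiff/windiff.exe"
"""


def resolve_diff_command(config_text: str, type_name: str) -> List[str]:
    """Look up a named diff type and return its command as an argument list."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_option("diff-types", type_name):
        raise KeyError(f"unknown diff type: {type_name}")
    # shlex.split honours quoting, so a path containing spaces stays one argument.
    return shlex.split(parser.get("diff-types", type_name))
```

Because only names listed in the configuration resolve to a command, something like "--diff-type /bin/rm" would be rejected rather than executed.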

Just some early morning thoughts, before the caffeine kicks in....

Gary



Re: svn diff with binary files

Posted by kf...@collab.net.
Eric Seppanen <ed...@reric.net> writes:
> I'm evaluating Subversion right now, and have svn 1.1.1 installed
> along with websvn 1.6.1.  I have a tree that contains several large
> binary files (8 files at about 400kb each) and I notice that this is
> slow to work with.
> 
> The reason: if I ask websvn to show me what changed between two
> versions, it executes the command:
> 
> > svn --non-interactive diff -r2:4 file:///my-repos/tree/
> 
> ... and the problem is this command takes 5-7 seconds of CPU time to
> complete.  All a waste, I think, because it can't actually print 
> anything about the binary files other than:
> 
>   Cannot display: file marked as a binary type.
>   svn:mime-type = application/octet-stream
> 
> So why does it take so long?  I can only guess that it's actually
> chewing on the binary files for a while.
> 
> Is there any way to get svn to not waste cpu time trying to diff files
> it already knows are binary and undisplayable?  Should there be a way?
> 
> I might even suggest that that's the way it ought to always work.

The optimization you suggest is possible, but surprisingly difficult.

The problem is that the part of Subversion that fetches the files
would have to "know" that the diff was being computed by another part
of Subversion, instead of by some external program that Subversion
passes off to.  (Because in the latter case, you *can* diff files that
Subversion thinks are binary.)  We could do this, but it would be
tricky to do it without introducing a layering violation.  In the
interests of simplicity, we've unofficially decided not to.
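To illustrate the layering concern, here is a toy sketch in Python. It is not how Subversion is structured. The in-memory "repo" dict, the function names, and the callback approach are all assumptions made for the example. The idea is that the fetching code never learns who computes the diff; it only consults a predicate handed to it by the caller.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical stand-in for repository access: path -> (svn:mime-type, contents).
Repo = Dict[str, Tuple[Optional[str], bytes]]


def fetch_files(
    repo: Repo,
    paths: Iterable[str],
    wants_contents: Callable[[Optional[str]], bool],
) -> List[Tuple[str, Optional[bytes]]]:
    """Fetch layer: knows nothing about diffing, only asks the caller's predicate."""
    results = []
    for path in paths:
        mime_type, contents = repo[path]
        # Contents come back as None when the caller declined them.
        results.append((path, contents if wants_contents(mime_type) else None))
    return results


def internal_diff_wants(mime_type: Optional[str]) -> bool:
    """Built-in diff: only text files are worth fetching (assumed rule)."""
    return mime_type is None or mime_type.startswith("text/")


def external_diff_wants(mime_type: Optional[str]) -> bool:
    """External diff program: it may cope with binaries, so fetch everything."""
    return True
```

The cost is an extra parameter threaded through every fetching call, which gives some idea of why this is more than a one-line patch.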

Of course, if there were an easier way, some unexpectedly simple
patch, then it would be worth it.

-K

