Posted to commits@subversion.apache.org by jc...@apache.org on 2010/11/21 02:17:45 UTC

svn commit: r1037365 - /subversion/branches/diff-optimizations-tokens/BRANCH-README

Author: jcorvel
Date: Sun Nov 21 01:17:44 2010
New Revision: 1037365

URL: http://svn.apache.org/viewvc?rev=1037365&view=rev
Log:
On the diff-optimizations-tokens branch:

Add a BRANCH-README

Added:
    subversion/branches/diff-optimizations-tokens/BRANCH-README

Added: subversion/branches/diff-optimizations-tokens/BRANCH-README
URL: http://svn.apache.org/viewvc/subversion/branches/diff-optimizations-tokens/BRANCH-README?rev=1037365&view=auto
==============================================================================
--- subversion/branches/diff-optimizations-tokens/BRANCH-README (added)
+++ subversion/branches/diff-optimizations-tokens/BRANCH-README Sun Nov 21 01:17:44 2010
@@ -0,0 +1,24 @@
+The purpose of this branch is to experiment with speeding up 'svn diff',
+especially for large files with lots of unchanged lines.
+
+As a secondary objective, this should also speed up 'svn blame', since blame
+performs a diff on the client side for every revision that is part of the
+blame operation. This will only be noticeable if the server and network are
+fast enough, so that the client becomes the bottleneck (e.g. on a LAN, with a
+server that has a fast backend, such as FSFS on SSD).
+
+General approach: reduce the problem set for the LCS algorithm as much as
+possible, by eliminating identical prefix and suffix before putting the
+tokens (lines) into the token tree (see [1] for some background).
+
+Specific approach for this branch: scan for identical prefix/suffix lines,
+line by line, in the token handling layer (subversion/libsvn_diff/token.c).
+This is done by getting tokens (lines) from all files together, skipping
+identical prefix/suffix lines, and only then starting to insert the tokens
+into the token tree. This allows the prefix/suffix scanning to take advantage
+of normalization (the ignore-whitespace and ignore-eol-style options). Also,
+it may allow additional optimizations that are difficult to do when scanning
+byte by byte.
+
+
+[1] http://en.wikipedia.org/wiki/Longest_common_subsequence_problem#Reduce_the_problem_set