You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by VK Sameer <sa...@collab.net> on 2005/01/21 09:16:16 UTC

[PATCH] Limit size of emails sent using commit-email.pl v2

Hello,

A new log message and modified commit-email.pl.in are attached.
Changing the truncate function to zero out the difflines became too 
difflines array specific, so it has been called process_diff instead.

Regards
Sameer

Re: [PATCH] Limit size of emails sent using commit-email.pl v2

Posted by kf...@collab.net.
> A new log message and modified commit-email.pl.in are attached.
> Changing the truncate function to zero out the difflines became too 
> difflines array specific, so it has been called process_diff instead.

Here's a review; sorry for the long delay in giving this feedback.

(Note that Branden Robinson submitted a similar patch, at
http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=89960,
which has also not yet been reviewed.)

> Controls to modify email message based on diff size
> 
> * commit-email.pl.in
>   Added new commandline arguments (-ds, -dl)
>   -ds: diff_size (max) - overrides default of 1_000_000
>   -dl: diff_location   - added to email body if set
>   $max_diff_size: New variable
>   check_diff(): New function to modify body based on diff size
>   Added call to check (and possibly truncate) size of @difflines
> 
> 
> Index: tools/hook-scripts/commit-email.pl.in
> ===================================================================
> --- tools/hook-scripts/commit-email.pl.in	(revision 12807)
> +++ tools/hook-scripts/commit-email.pl.in	(working copy)
> @@ -57,6 +57,9 @@
>  # $no_diff_added to 1.
>  my $no_diff_added = 0;
>  
> +# This controls the size of the commit email.
> +my $max_diff_size  = 1_000_000;   # characters
> +

The comment says it controls the size of the commit email, but the
variable name says it controls the size of the diff.  Which is it? :-)
(Diff, I assume.)

>  # Since the path to svnlook depends upon the local installation
>  # preferences, check that the required programs exist to insure that
>  # the administrator has set up the script properly.
> @@ -108,7 +111,9 @@
>                         '-l'     => 'log_file',
>                         '-m'     => '',
>                         '-r'     => 'reply_to',
> -                       '-s'     => 'subject_prefix');
> +                       '-s'     => 'subject_prefix',
> +                       '-ds'    => 'diff_size',
> +                       '-dl'    => 'diff_location');

I think long names would be better here, especially since this script
is almost never invoked manually (only when it's being tested,
basically).  So '--diff-size-limit' and '--diff-alternative', maybe?

I call the latter '--diff-alternative' instead of '--diff-location'
because it would not always be used to give the location of the diff.
It might just be a string of text explaining that the diff could not
be shown, or perhaps giving an svn command line client example command
for how to get the diff, or whatever.  Whether or not it is a location
is up to the caller, not up to us.

>  while (@ARGV)
>    {
> @@ -367,6 +372,7 @@
>  push(@body, "Log:\n");
>  push(@body, @log);
>  push(@body, "\n");
> +&process_diff();
>  push(@body, map { /[\r\n]+$/ ? $_ : "$_\n" } @difflines);
>  
>  # Go through each project and see if there are any matches for this
> @@ -500,6 +506,8 @@
>        "  -m regex              Regular expression to match committed path\n",
>        "  -r email_address      Email address for 'Reply-To:'\n",
>        "  -s subject_prefix     Subject line prefix\n",
> +      "  -ds diff_size         Max size of diff content\n",
> +      "  -dl diff_location     String to be put before diff content\n",
>        "\n",
>        "This script supports a single repository with multiple projects,\n",
>        "where each project receives email only for commits that modify that\n",

Usage for diff_size should say what units (bytes?  lines?).

> @@ -518,7 +526,12 @@
>        "a list of email addresses on the command line.  If you do not want\n",
>        "a project that matches the entire repository, then use a -m with a\n",
>        "regular expression before any other command line options or email\n",
> -      "addresses.\n";
> +      "addresses.\n",
> +      "\n",
> +      "The maximum amount of diff content and hence the size of the email\n",
> +      "can be set using the -ds option and a diff 'location' can be set\n",
> +      "using the -dl option like so:\n",
> +      "-ds 'http://svn.collab.net/viewcvs/svn/?view=rev&rev=12798'\n";
>  }

I think that example meant '-dl' where it says '-ds'?

Again, the diff content does not define the size of the email, since
there's also the log message, which we aren't limiting.  So we should
find a more accurate way to phrase what's going on.

But more importantly, it seems like this is saying that if the diff is
too large, we'll show some of it, just not all of it.  If so, that's
not a useful behavior -- after all, there's no way to programatically
determine which parts of a diff are important and which aren't.  If we
can't show the whole thing, we should show none, and the reader should
use the alternative (ViewCVS or whatever) instead.  It's not helpful
to show just the first N lines and leave the rest out, IMHO.

>  # Return a new hash data structure for a new empty project that
> @@ -598,3 +611,30 @@
>        return @output;
>      }
>  }
> +
> +# Check size of @difflines and truncate if necessary
> +# Add diff_location (e.g, a ViewCVS URL) if specified
> +sub process_diff
> +{
> +  # all projects have the same diff_location
> +  if ($project_settings_list[0]->{diff_location})
> +    {
> +      push(@body, "$project_settings_list[0]->{diff_location}\n");
> +    }
> +
> +  if ($project_settings_list[0]->{diff_size})
> +    {
> +      $max_diff_size = $project_settings_list[0]->{diff_size};
> +    }
> +
> +  for (my ($line, $size) = (0, 0); $line <= $#difflines; $line++)
> +    {
> +      if (($size + length $difflines[$line]) > $max_diff_size)
> +        {
> +          splice(@difflines, 0, $#difflines,
> +                 "\n(Diff too large to include in email. See above)");
> +          last;
> +        }
> +      $size += length($difflines[$line]);
> +    }
> +}

It looks like there's no way to have a max_diff_size of infinity.
People can override the default with an even higher number, but
shouldn't there be some way to say "No limit at all"?  A value of -1
or something like that?

Okay, I see here you're truncating @difflines from its beginning, and
replacing it with a single-element list containing just a message
about the diff being too large.  So the problem is only in the usage
message, not in the implementation.

Here's an interesting implementation thought: one of the things I had
hoped for was that this new limit option would also mean that the
server would not have to hold the entire diff in memory at once if the
diff were over the limit.  That is, we'd accumulate the diff to a tmp
file, check the size of the file, and then splice it into the mail iff
it's below the limit.

I realize this could be considered an optimization; on the other hand,
it's kind of an important one, since a diff could be arbitrarily
large, and it's been a principle (in Subversion itself, anyway) to
avoid holding all the data in memory at once when the data is of
arbitrary size.  What do you think of doing that here?

-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Limit size of emails sent using commit-email.pl v2

Posted by Michael W Thelen <mi...@pietdepsi.com>.
kfogel@collab.net wrote:
> VK Sameer <sa...@collab.net> writes:
>
>>A new log message and modified commit-email.pl.in are attached.
>>Changing the truncate function to zero out the difflines became too
>>difflines array specific, so it has been called process_diff instead.
>
>
> Thanks, Sameer!  I've printed this out, will try to review it on the
> plane so can ask you any questions in person this coming week.

I've filed the patch as issue #2213:
http://subversion.tigris.org/issues/show_bug.cgi?id=2213

--
Michael W Thelen
It is a mistake to think you can solve any major problems just with
potatoes.       -- Douglas Adams

Re: [PATCH] Limit size of emails sent using commit-email.pl v2

Posted by kf...@collab.net.
VK Sameer <sa...@collab.net> writes:
> A new log message and modified commit-email.pl.in are attached.
> Changing the truncate function to zero out the difflines became too
> difflines array specific, so it has been called process_diff instead.

Thanks, Sameer!  I've printed this out, will try to review it on the
plane so can ask you any questions in person this coming week.

-Karl

> Controls to modify email message based on diff size
> 
> * commit-email.pl.in
>   Added new commandline arguments (-ds, -dl)
>   -ds: diff_size (max) - overrides default of 1_000_000
>   -dl: diff_location   - added to email body if set
>   $max_diff_size: New variable
>   check_diff(): New function to modify body based on diff size
>   Added call to check (and possibly truncate) size of @difflines
> Index: tools/hook-scripts/commit-email.pl.in
> ===================================================================
> --- tools/hook-scripts/commit-email.pl.in	(revision 12807)
> +++ tools/hook-scripts/commit-email.pl.in	(working copy)
> @@ -57,6 +57,9 @@
>  # $no_diff_added to 1.
>  my $no_diff_added = 0;
>  
> +# This controls the size of the commit email.
> +my $max_diff_size  = 1_000_000;   # characters
> +
>  # Since the path to svnlook depends upon the local installation
>  # preferences, check that the required programs exist to insure that
>  # the administrator has set up the script properly.
> @@ -108,7 +111,9 @@
>                         '-l'     => 'log_file',
>                         '-m'     => '',
>                         '-r'     => 'reply_to',
> -                       '-s'     => 'subject_prefix');
> +                       '-s'     => 'subject_prefix',
> +                       '-ds'    => 'diff_size',
> +                       '-dl'    => 'diff_location');
>  
>  while (@ARGV)
>    {
> @@ -367,6 +372,7 @@
>  push(@body, "Log:\n");
>  push(@body, @log);
>  push(@body, "\n");
> +&process_diff();
>  push(@body, map { /[\r\n]+$/ ? $_ : "$_\n" } @difflines);
>  
>  # Go through each project and see if there are any matches for this
> @@ -500,6 +506,8 @@
>        "  -m regex              Regular expression to match committed path\n",
>        "  -r email_address      Email address for 'Reply-To:'\n",
>        "  -s subject_prefix     Subject line prefix\n",
> +      "  -ds diff_size         Max size of diff content\n",
> +      "  -dl diff_location     String to be put before diff content\n",
>        "\n",
>        "This script supports a single repository with multiple projects,\n",
>        "where each project receives email only for commits that modify that\n",
> @@ -518,7 +526,12 @@
>        "a list of email addresses on the command line.  If you do not want\n",
>        "a project that matches the entire repository, then use a -m with a\n",
>        "regular expression before any other command line options or email\n",
> -      "addresses.\n";
> +      "addresses.\n",
> +      "\n",
> +      "The maximum amount of diff content and hence the size of the email\n",
> +      "can be set using the -ds option and a diff 'location' can be set\n",
> +      "using the -dl option like so:\n",
> +      "-ds 'http://svn.collab.net/viewcvs/svn/?view=rev&rev=12798'\n";
>  }
>  
>  # Return a new hash data structure for a new empty project that
> @@ -598,3 +611,30 @@
>        return @output;
>      }
>  }
> +
> +# Check size of @difflines and truncate if necessary
> +# Add diff_location (e.g, a ViewCVS URL) if specified
> +sub process_diff
> +{
> +  # all projects have the same diff_location
> +  if ($project_settings_list[0]->{diff_location})
> +    {
> +      push(@body, "$project_settings_list[0]->{diff_location}\n");
> +    }
> +
> +  if ($project_settings_list[0]->{diff_size})
> +    {
> +      $max_diff_size = $project_settings_list[0]->{diff_size};
> +    }
> +
> +  for (my ($line, $size) = (0, 0); $line <= $#difflines; $line++)
> +    {
> +      if (($size + length $difflines[$line]) > $max_diff_size)
> +        {
> +          splice(@difflines, 0, $#difflines,
> +                 "\n(Diff too large to include in email. See above)");
> +          last;
> +        }
> +      $size += length($difflines[$line]);
> +    }
> +}
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: dev-help@subversion.tigris.org

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org