You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Jon Bendtsen <jo...@laerdal.dk> on 2005/08/19 10:00:22 UTC

[PATCH] size and force text files to be considered binary in mailer.py

Hi

Here follows a patch that allows one to specify a max mailsize in the  
mailer.py script.
Further more the script allows you to specifiy which text files are  
to be considered binary
and thus no diff included in the mail.

my mailer.py is taken from debian unstable and the head of the  
mailer.py script looks like this.

# $HeadURL: http://svn.collab.net/repos/svn/branches/1.2.x/tools/hook- 
scripts/mailer/mailer.py $
# $LastChangedDate: 2005-04-21 14:12:41 -0400 (Thu, 21 Apr 2005) $
# $LastChangedBy: breser $
# $LastChangedRevision: 14358 $

I think it is version 1.2.0

The mailsize is calculated by having a hardcoded max size limit in  
the mailer.py script.
For every diff
     for every line
         i take the length of the line and subtract it from the max  
size.
         If the max size reaches 0 or below, one more line is added
         saying that the limit has been reached and no more text is
         included in the email.

Currently the limit is hardcoded to 2MB

Further more the number of diffs are counted and if there is more  
than 128 diffs/files, it aborts
futher generation as well, but does send the mail that was genereated  
so far.


To force files to be considered binary, the name of the file is  
checked against a list of regular
expressions. The regular expressions is taken from the file  
filediff.conf that usually resides in
the conf directory right next to the mailer.conf.
Every line except those beginning with # are used in a regular  
expression match.
If any match is found we stop checking for other regular expressions,  
and stops generating
diffs for this file. A message is included that it is forced binary  
because of $patern

I'm looking for input, suggestions, ... and eventually maybe someone  
will include it into the
mailer.py which exists now.




JonB


Re: [PATCH] size and force text files to be considered binary in mailer.py

Posted by Michael Brouwer <mi...@tlaloc.net>.
Making mailer.py aware of renames would be cool.  Currently if you  
rename a file you get the old file with all --- and the new one with + 
++.  Instead it should diff the file with the new name against the  
contents of the file with the old name resulting in only the actual  
changes made to the file after the rename (if any).

Michael

On Aug 19, 2005, at 3:00 AM, Jon Bendtsen wrote:

> Hi
>
> Here follows a patch that allows one to specify a max mailsize in  
> the mailer.py script.
> Further more the script allows you to specifiy which text files are  
> to be considered binary
> and thus no diff included in the mail.
>
> my mailer.py is taken from debian unstable and the head of the  
> mailer.py script looks like this.
>
> # $HeadURL: http://svn.collab.net/repos/svn/branches/1.2.x/tools/ 
> hook-scripts/mailer/mailer.py $
> # $LastChangedDate: 2005-04-21 14:12:41 -0400 (Thu, 21 Apr 2005) $
> # $LastChangedBy: breser $
> # $LastChangedRevision: 14358 $
>
> I think it is version 1.2.0
>
> The mailsize is calculated by having a hardcoded max size limit in  
> the mailer.py script.
> For every diff
>     for every line
>         i take the length of the line and subtract it from the max  
> size.
>         If the max size reaches 0 or below, one more line is added
>         saying that the limit has been reached and no more text is
>         included in the email.
>
> Currently the limit is hardcoded to 2MB
>
> Further more the number of diffs are counted and if there is more  
> than 128 diffs/files, it aborts
> futher generation as well, but does send the mail that was  
> genereated so far.
>
>
> To force files to be considered binary, the name of the file is  
> checked against a list of regular
> expressions. The regular expressions is taken from the file  
> filediff.conf that usually resides in
> the conf directory right next to the mailer.conf.
> Every line except those beginning with # are used in a regular  
> expression match.
> If any match is found we stop checking for other regular  
> expressions, and stops generating
> diffs for this file. A message is included that it is forced binary  
> because of $patern
>
> I'm looking for input, suggestions, ... and eventually maybe  
> someone will include it into the
> mailer.py which exists now.
>
>
>
>
> JonB
>
>
> <patch-120>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: dev-help@subversion.tigris.org


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: [PATCH] size and force text files to be considered binary in mailer.py

Posted by Jon Bendtsen <jo...@laerdal.dk>.
Thank you for your input


Den 23. aug 2005 kl. 16:55 skrev C. Michael Pilato:

> Jon Bendtsen <jo...@laerdal.dk> writes:
>
>
>> The mailsize is calculated by having a hardcoded max size limit in
>> the mailer.py script.
>>
>
> This should be a runtime configuration parameter, not hardcoded.

I know that, but it was easier just to hardcode it at first, and  
leave it
at that if noone wanted my patch.


>> For every diff
>>      for every line
>>          i take the length of the line and subtract it from the  
>> max size.
>>          If the max size reaches 0 or below, one more line is added
>>          saying that the limit has been reached and no more text is
>>          included in the email.
>>
>
> Why not go ahead and subtract the cost in size of your "This mail is
> too big" message from the limit?  That way a mail not supposed to
> exceed some value doesn't exceed that value.

Because there are some text above the diff messages.
There are headers
There are the list of changed files
There are the log message.

This takes space too, and it was easier just to count the diff lines.
I guess i could try to subtract the above datasizes, but i think it  
is going
to be HARD to ensure that the mail does not exceed the limit.


>> Further more the number of diffs are counted and if there is more
>> than 128 diffs/files, it aborts futher generation as well, but does
>> send the mail that was genereated so far.
>>
>
> That kind of arbitrary action is nonsense.  Sorry, but who cares how
> many diff sections are in a file?  I guarantee that having more than
> 128 has never caused anybody a lick of technical trouble.

Well, my customer asked me to make this limit, so i made it.


>> To force files to be considered binary, the name of the file is
>> checked against a list of regular expressions. The regular
>> expressions is taken from the file filediff.conf that usually
>> resides in the conf directory right next to the mailer.conf.  Every
>> line except those beginning with # are used in a regular expression
>> match.  If any match is found we stop checking for other regular
>> expressions, and stops generating diffs for this file. A message is
>> included that it is forced binary because of $patern
>>
>
> So, why is a file that lists regular expressions of binary filenames
> called "filediff.conf"?  That's not intuitive at all.  And might not
> different repositories have different ideas about binariness?  In
> fact, as you'll read below, I have doubts about the need for this
> feature at all.

Well again my customer asked for it. Currently they exclude 4 kinds of
files, .po is one of them.
I guess i could change the name to exclude_files_from_diffmails.conf
or something like that, unless you have a better name.


>> I'm looking for input, suggestions, ... and eventually maybe someone
>> will include it into the mailer.py which exists now.
>>
>
> As it stands, I'm unfortunately very much agaist rolling this code (in
> particular) and several of this concepts (in general) into mailer.py.
> Some code review follows.
>
>
>> --- mailer-120.py    2005-08-18 12:32:15.000000000 +0200
>> +++ mailer-size_and_filediff-120.py    2005-08-19  
>> 11:47:50.000000000 +0200
>> @@ -667,6 +667,28 @@
>>        if self.paths.has_key(path) != self.in_paths:
>>          continue
>>
>> +      #Added ny JonB 20050818
>>
>
> We don't do in-code attribution.

Allright, i'll remove it


>> +      patern_matching = None
>> +      reason = ''
>> +
>> +      filediff_conf = open(diff_config_fname, 'r')
>> +      diffpatern = filediff_conf.readline()
>> +      while diffpatern != '':
>> +        fixed_diff_patern = diffpatern.strip()
>>
>
> In 'patern' and 'diffpatern' and 'fixed_diff_patern', the common
> English spelling is 'pattern' (with two T's).

i'll correct that then


>> +        comment_matching = re.search('^#',fixed_diff_patern)
>> +        if comment_matching != None:
>>
>
> Using re.search here is quite unnecessary.
>
>    if fixed_diff_patern[0] == '#':
>       ...
>
> Or even:
>
>    if fixed_diff_patern.find('#') == 0:
>       ...

heh, thats certainly smarter than my waste of resources.


>> +      if patern_matching != None:
>> +        reason = 'FORCED Binary file because of patern: %s\n' %
>> fixed_diff_patern
>>
>
> Users aren't going to care which pattern caused their file not to show
> up, so I think alot of this code could be greatly simplified by losing
> the "reason" thing altogether.  So all this could would need to do is
> override the answer returned by diff.either_binary() farther down.
> Basically, you give Subversion an chance to say the file is binary.
> If Subversion says, "Nope, not binary" then you see if the filename
> matches your pattern, and if so, override that binariness answer.
> From there the code flows the same as it does today, and you get the
> universal "Binary file. No diff available." notification in the mail.

I wrote a reason to make it easier for my customer to figure out why
the file was excluded.
I check the filename before letting subversion generate a diff, because
i believe that subversion uses more resources to create those diff's
than would be spent checking the filenames.


> But here's the deal -- why isn't Subversion's is-binary question not
> enough for you?  You are aware of svn:mime-type and how it's used to
> determine binariness on a per-file basis, right?

no, i am not. And neither was my customer i guess.


>> @@ -1163,6 +1243,7 @@
>>                }
>>
>>    config_fname = None
>> +  diff_config_fname = None
>>    argc = len(sys.argv)
>>    if argc < 3:
>>      usage()
>> @@ -1192,6 +1273,16 @@
>>      raise MissingConfig(config_fname)
>>    config_fp = open(config_fname)
>>
>> +  # Settle on a filediff file location, and open it.
>> +  if diff_config_fname is None:
>> +    # Default to REPOS-DIR/conf/filediff.conf.
>> +    diff_config_fname = os.path.join(repos_dir, 'conf',  
>> 'filediff.conf')
>> +    if not os.path.exists(diff_config_fname):
>> +      # Okay.  Look for 'filediff.conf' as a sibling of this script.
>> +      diff_config_fname = os.path.join(os.path.dirname(sys.argv 
>> [0]), 'filediff.conf')
>> +  if not os.path.exists(diff_config_fname):
>> +    raise MissingConfig(diff_config_fname)
>> +
>>    svn.core.run_app(main, cmd, config_fname, repos_dir,
>>                     sys.argv[3:3+expected_args])
>>
>
> You don't seem to actually pass diff_config_fname off anywhere, and it
> isn't declared in a global scope.  Also, raising an exception when you
> don't find it means that current users of mailer.py can't upgrade to a
> new version without creating this new config file, even if that file
> would be completely empty.

i guess i have to correct that as well.





JonB

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: [PATCH] size and force text files to be considered binary in mailer.py

Posted by "C. Michael Pilato" <cm...@collab.net>.
Jon Bendtsen <jo...@laerdal.dk> writes:

> The mailsize is calculated by having a hardcoded max size limit in
> the mailer.py script.

This should be a runtime configuration parameter, not hardcoded.

> For every diff
>      for every line
>          i take the length of the line and subtract it from the max size.
>          If the max size reaches 0 or below, one more line is added
>          saying that the limit has been reached and no more text is
>          included in the email.

Why not go ahead and subtract the cost in size of your "This mail is
too big" message from the limit?  That way a mail not supposed to
exceed some value doesn't exceed that value.

> Further more the number of diffs are counted and if there is more
> than 128 diffs/files, it aborts futher generation as well, but does
> send the mail that was genereated so far.

That kind of arbitrary action is nonsense.  Sorry, but who cares how
many diff sections are in a file?  I guarantee that having more than
128 has never caused anybody a lick of technical trouble.

> To force files to be considered binary, the name of the file is
> checked against a list of regular expressions. The regular
> expressions is taken from the file filediff.conf that usually
> resides in the conf directory right next to the mailer.conf.  Every
> line except those beginning with # are used in a regular expression
> match.  If any match is found we stop checking for other regular
> expressions, and stops generating diffs for this file. A message is
> included that it is forced binary because of $patern

So, why is a file that lists regular expressions of binary filenames
called "filediff.conf"?  That's not intuitive at all.  And might not
different repositories have different ideas about binariness?  In
fact, as you'll read below, I have doubts about the need for this
feature at all.

> I'm looking for input, suggestions, ... and eventually maybe someone
> will include it into the mailer.py which exists now.

As it stands, I'm unfortunately very much agaist rolling this code (in
particular) and several of this concepts (in general) into mailer.py.
Some code review follows.

> --- mailer-120.py	2005-08-18 12:32:15.000000000 +0200
> +++ mailer-size_and_filediff-120.py	2005-08-19 11:47:50.000000000 +0200
> @@ -667,6 +667,28 @@
>        if self.paths.has_key(path) != self.in_paths:
>          continue
>  
> +      #Added ny JonB 20050818

We don't do in-code attribution.

> +      patern_matching = None
> +      reason = ''
> +
> +      filediff_conf = open(diff_config_fname, 'r')
> +      diffpatern = filediff_conf.readline()
> +      while diffpatern != '':
> +        fixed_diff_patern = diffpatern.strip()

In 'patern' and 'diffpatern' and 'fixed_diff_patern', the common
English spelling is 'pattern' (with two T's).

> +        comment_matching = re.search('^#',fixed_diff_patern)
> +        if comment_matching != None:

Using re.search here is quite unnecessary.

   if fixed_diff_patern[0] == '#':
      ...

Or even:

   if fixed_diff_patern.find('#') == 0:
      ...

> +      if patern_matching != None:
> +        reason = 'FORCED Binary file because of patern: %s\n' %
> fixed_diff_patern

Users aren't going to care which pattern caused their file not to show
up, so I think alot of this code could be greatly simplified by losing
the "reason" thing altogether.  So all this could would need to do is
override the answer returned by diff.either_binary() farther down.
Basically, you give Subversion an chance to say the file is binary.
If Subversion says, "Nope, not binary" then you see if the filename
matches your pattern, and if so, override that binariness answer.

Re: [PATCH] size and force text files to be considered binary in mailer.py

Posted by Jon Bendtsen <jo...@laerdal.dk>.
Den 26. sep 2005 kl. 21:33 skrev Michael W Thelen:

> Jon Bendtsen wrote:
>
>> Here follows a patch that allows one to specify a max mailsize in the
>> mailer.py script.
>> Further more the script allows you to specifiy which text files  
>> are  to
>> be considered binary
>> and thus no diff included in the mail.
>>
>
> Thanks for the patch, and I'm sorry it's been so long without a reply.
>
>
>> The mailsize is calculated by having a hardcoded max size limit  
>> in  the
>> mailer.py script.
>> For every diff
>>     for every line
>>         i take the length of the line and subtract it from the  
>> max  size.
>>         If the max size reaches 0 or below, one more line is added
>>         saying that the limit has been reached and no more text is
>>         included in the email.
>>
>> Currently the limit is hardcoded to 2MB
>>
>> Further more the number of diffs are counted and if there is more   
>> than
>> 128 diffs/files, it aborts
>> futher generation as well, but does send the mail that was genereated
>> so far.
>>
>
> I don't know Python, so I can't comment much on the patch itself,  
> but it
> does seem like the max mail size and max diffs could be specified as
> parameters instead of hard-coded.

i might do those at some point in the future.


> Is anyone else able to comment?  Is this a feature we'd like mailer.py
> to have?  I can file a patch issue if no one responds within a few  
> days.

There was a few that replied, but C. Michael Pilato seemed the most
interesting as he? gives technical feedback. He mailed the users list
though.
     http://subversion.tigris.org/servlets/ReadMsg? 
listName=users&msgNo=37271

As for using the svn:mime-type i dont think that is the right idea,  
because
other applications, like viewcvs, websvn, ..., might use the svn:mime- 
type
field.
Else i do believe that Mr. Pilato has his? points, my code script code
quality is not that impressive, it could be better, but hey, it works  
for my
client, and they dont want to put more time/money into this matter.



JonB

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] size and force text files to be considered binary in mailer.py

Posted by Branko Čibej <br...@xbc.nu>.
Michael W Thelen wrote:

>Jon Bendtsen wrote:
>  
>
>>The mailsize is calculated by having a hardcoded max size limit in  the
>>mailer.py script.
>>For every diff
>>    for every line
>>        i take the length of the line and subtract it from the max  size.
>>        If the max size reaches 0 or below, one more line is added
>>        saying that the limit has been reached and no more text is
>>        included in the email.
>>
>>Currently the limit is hardcoded to 2MB
>>
>>Further more the number of diffs are counted and if there is more  than
>>128 diffs/files, it aborts
>>futher generation as well, but does send the mail that was genereated 
>>so far.
>>    
>>
>
>I don't know Python, so I can't comment much on the patch itself, but it
>does seem like the max mail size and max diffs could be specified as
>parameters instead of hard-coded.
>  
>
Yes, mailer.py already has a config file, so adding those parameters 
seems logical.

-- Brane


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] size and force text files to be considered binary in mailer.py

Posted by Michael W Thelen <mi...@pietdepsi.com>.
Jon Bendtsen wrote:
> Here follows a patch that allows one to specify a max mailsize in the 
> mailer.py script.
> Further more the script allows you to specifiy which text files are  to
> be considered binary
> and thus no diff included in the mail.

Thanks for the patch, and I'm sorry it's been so long without a reply.

> The mailsize is calculated by having a hardcoded max size limit in  the
> mailer.py script.
> For every diff
>     for every line
>         i take the length of the line and subtract it from the max  size.
>         If the max size reaches 0 or below, one more line is added
>         saying that the limit has been reached and no more text is
>         included in the email.
> 
> Currently the limit is hardcoded to 2MB
> 
> Further more the number of diffs are counted and if there is more  than
> 128 diffs/files, it aborts
> futher generation as well, but does send the mail that was genereated 
> so far.

I don't know Python, so I can't comment much on the patch itself, but it
does seem like the max mail size and max diffs could be specified as
parameters instead of hard-coded.

Is anyone else able to comment?  Is this a feature we'd like mailer.py
to have?  I can file a patch issue if no one responds within a few days.

-- 
Michael W Thelen
It is a mistake to think you can solve any major problems just with
potatoes.       -- Douglas Adams

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org