Posted to users@spamassassin.apache.org by Eugene Morozov <sa...@eltex.net> on 2004/09/01 14:02:20 UTC

Please help me understand this snippet of SA code

Hello!
What does this code from the HTML.pm module do?
  if ($self->{last_text}) {
    # ideas discarded since they would be easy to evade:
    # 1. using \w or [A-Za-z] instead of \S or non-punctuation
    # 2. exempting certain tags
    if ($text =~ /^[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]/s &&
	$self->{last_text} =~ /[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]\z/s)
    {
      $self->{html}{obfuscation}++;
    }
    if ($self->{last_text} =~
	/\b([^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]{1,7})\z/s)
    {
      my $start = length($1);
      if ($text =~ /^([^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]{1,7})\b/s) {
	my $backhair = $start . "_" . length($1);
	$self->{html}{backhair}->{$backhair}++;
	$self->{html}{backhair_count} = keys %{ $self->{html}{backhair} };
      }
    }
  }
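The snippet can be run outside SpamAssassin with a small harness to see what it counts. This is a minimal sketch, not SpamAssassin code. The sub check_tag_boundary and the %html hash are hypothetical stand-ins for the HTML.pm parser state ($self->{last_text} and $self->{html}), and the sketch assumes each call receives the text seen before a tag and the text seen after it. The negated character class excludes only whitespace and printable ASCII punctuation. For ASCII input it therefore matches letters and digits, and it also matches every non-ASCII character. The first test counts a tag wedged between two such characters (as in "V<b></b>iagra"). The second records the lengths of the 1 to 7 character word pieces on either side of the tag as a "before_after" key, then counts how many distinct keys have been seen.

```perl
use strict;
use warnings;

# Character class copied from the HTML.pm snippet: anything that is neither
# whitespace nor printable ASCII punctuation (0x21-0x2f, 0x3a-0x40,
# 0x5b-0x60, 0x7b-0x7e). For ASCII that leaves letters and digits.
my $wordish = qr/[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]/;

# Hypothetical stand-in for $self->{html}; not the real HTML.pm object.
my %html = (obfuscation => 0, backhair => {}, backhair_count => 0);

# Hypothetical helper: $last is the text before a tag, $text the text after.
sub check_tag_boundary {
    my ($last, $text) = @_;
    return unless $last;    # mirrors: if ($self->{last_text})

    # Check 1: a tag sits between two word characters, e.g. "V<b></b>iagra".
    if ($text =~ /^$wordish/s && $last =~ /$wordish\z/s) {
        $html{obfuscation}++;
    }

    # Check 2: lengths (1..7) of the word pieces on each side of the tag,
    # stored as "before_after"; backhair_count is the number of distinct keys.
    if ($last =~ /\b((?:$wordish){1,7})\z/s) {
        my $start = length($1);
        if ($text =~ /^((?:$wordish){1,7})\b/s) {
            my $backhair = $start . "_" . length($1);
            $html{backhair}{$backhair}++;
            $html{backhair_count} = keys %{ $html{backhair} };
        }
    }
}

# "Buy V<b></b>iagra now, che<i></i>ap", split into text chunks at each tag.
check_tag_boundary("Buy V", "iagra now, che");
check_tag_boundary("iagra now, che", "ap");
print "$html{obfuscation} $html{backhair_count}\n";
```

Both tag boundaries split a word, so the script prints "2 2": obfuscation is 2, and the two distinct keys are 1_5 ("V" / "iagra") and 3_2 ("che" / "ap").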

I'm debugging my Unicode patch for SpamAssassin, and this is one of the
places that I think may need rewriting, because it probably doesn't
support Unicode input.
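On the Unicode question, the character class itself is not tied to ASCII letters. It only excludes whitespace and ASCII punctuation, so any non-ASCII character passes. The behavior that depends on how the text is represented comes from the \b anchors and the {1,7} counts. The sketch below uses a hypothetical helper, backhair_key, with the same regexes as the second check. It assumes Perl's default matching semantics, with no "use locale" and no "use feature 'unicode_strings'". On decoded character strings, Cyrillic letters are \w, so a word split after three letters yields 3_3. On the same text as raw UTF-8 bytes, the high bytes are not \w, no word boundary is found, and nothing is recorded. The first check, which has no \b, still fires on raw bytes, and any length measured on undecoded text is a byte count rather than a character count.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Same character class as in the HTML.pm snippet.
my $wordish = qr/[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]/;

# Hypothetical helper: the "before_after" key the snippet's second check
# would record for text before/after a tag, or undef if it records nothing.
sub backhair_key {
    my ($last, $text) = @_;
    return unless $last =~ /\b((?:$wordish){1,7})\z/s;
    my $start = length($1);
    return unless $text =~ /^((?:$wordish){1,7})\b/s;
    return $start . "_" . length($1);
}

# Russian "Privet" (U+041F U+0440 U+0438 | U+0432 U+0435 U+0442), split by a
# tag after the third letter.
my ($before, $after) = (" \x{41f}\x{440}\x{438}", "\x{432}\x{435}\x{442}");

# Decoded character strings: Cyrillic letters are \w, lengths are in characters.
my $decoded = backhair_key($before, $after);

# Raw UTF-8 bytes: under default semantics bytes 0x80-0xFF are not \w,
# so the \b anchors never match.
my $bytes = backhair_key(encode('UTF-8', $before), encode('UTF-8', $after));

my $shown = join ' / ', map { defined $_ ? $_ : 'undef' } $decoded, $bytes;
print "$shown\n";
```

Run as is, it prints "3_3 / undef".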

-- 
Email: eugene @ renice.org

Re: Please help me understand this snippet of SA code

Posted by Loren Wilton <lw...@earthlink.net>.
I'm guessing, but it appears to be scanning for oddball graphics characters
in probably unlikely combinations. From the names used, I'm assuming this
is an inlined form of Jen's backhair rules.

        Loren
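The guess about graphics characters can be checked against the class directly. The following quick sketch is not from the thread. It lists which printable ASCII characters the class accepts and tests two non-ASCII letters (e-acute and Cyrillic Zhe, chosen only as examples). It should print only the digits and the upper- and lowercase letters, which matches the snippet's own comment about "non-punctuation", and it should report that the non-ASCII letters match. The class therefore appears to pick out word characters rather than graphics symbols.

```perl
use strict;
use warnings;

# Same character class as in the HTML.pm snippet.
my $wordish = qr/[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]/;

# Printable ASCII characters (0x21-0x7e) accepted by the class.
my $ascii_hits = join '', grep { $_ =~ $wordish } map { chr($_) } 0x21 .. 0x7e;
print "$ascii_hits\n";

# Non-ASCII characters are never in the excluded ranges.
my @non_ascii = ("\x{e9}", "\x{416}");
my $all_non_ascii_hit = !grep { $_ !~ $wordish } @non_ascii;
print $all_non_ascii_hit ? "non-ASCII letters match\n"
                         : "a non-ASCII letter was rejected\n";
```

Note that the underscore (0x5f) falls inside the excluded 0x5b-0x60 range, so for ASCII the class is [A-Za-z0-9] rather than \w.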
