Posted to users@spamassassin.apache.org by Eugene Morozov <sa...@eltex.net> on 2004/09/01 14:02:20 UTC
Please help me understand this snippet of SA code
Hello!
What does this code from the HTML.pm module do?
if ($self->{last_text}) {
    # ideas discarded since they would be easy to evade:
    # 1. using \w or [A-Za-z] instead of \S or non-punctuation
    # 2. exempting certain tags
    if ($text =~ /^[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]/s &&
        $self->{last_text} =~ /[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]\z/s)
    {
        $self->{html}{obfuscation}++;
    }
    if ($self->{last_text} =~
        /\b([^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]{1,7})\z/s)
    {
        my $start = length($1);
        if ($text =~ /^([^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]{1,7})\b/s) {
            my $backhair = $start . "_" . length($1);
            $self->{html}{backhair}->{$backhair}++;
            $self->{html}{backhair_count} = keys %{ $self->{html}{backhair} };
        }
    }
}
I'm debugging my Unicode patch for SpamAssassin, and this is one of the
places that I think may need rewriting, because it probably doesn't
support Unicode input.
--
Email: eugene @ renice.org
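
To make the Unicode concern concrete, here is a minimal standalone sketch (not SpamAssassin code). It assumes the core Encode module is available; trailing_run is a hypothetical helper that reuses the character class from the snippet. It suggests that the same Cyrillic text yields different fragment lengths depending on whether the string holds raw UTF-8 bytes or decoded characters, which would change the "N_M" keys built from length($1).

```perl
use strict;
use warnings;
use Encode qw(decode);

# Same class as in the snippet: anything that is neither whitespace
# nor ASCII punctuation.
my $NP = qr/[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]/;

# Hypothetical helper: length of the trailing run (at most 7) matched
# by the class, or 0 if the string does not end with such a run.
sub trailing_run {
    my ($s) = @_;
    return $s =~ /((?:$NP){1,7})\z/s ? length($1) : 0;
}

my $bytes = "\xd0\xbf\xd1\x80\xd0\xb8";   # UTF-8 bytes of three Cyrillic letters
my $chars = decode('UTF-8', $bytes);      # the same text as 3 characters

print trailing_run($bytes), "\n";   # prints 6 (counted in bytes)
print trailing_run($chars), "\n";   # prints 3 (counted in characters)
```

Whether the byte count or the character count is the intended one is a design question for the patch; the sketch only shows that the two differ.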
Re: Please help me understand this snippet of SA code
Posted by Loren Wilton <lw...@earthlink.net>.
I'm guessing, but it appears to be scanning for oddball graphics characters
in what are probably unlikely combinations. From the names used, I'm
assuming this is an inlined form of Jen's backhair rules.
Loren
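
Reading the regexes literally, the character class matches anything that is neither whitespace nor ASCII punctuation. Below is a minimal standalone sketch of what the two checks appear to compute. check_boundary is a hypothetical helper, not part of SpamAssassin, and it assumes that $last_text and $text are the text chunks seen before and after an HTML tag (the snippet itself does not show where they come from).

```perl
use strict;
use warnings;

# Same class as in the snippet: not whitespace, not ASCII punctuation.
my $NP = qr/[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]/;

# Hypothetical helper mirroring the two checks from the snippet.
sub check_boundary {
    my ($last_text, $text) = @_;
    my %result = (obfuscation => 0, backhair => undef);

    # First check: the previous chunk ends with, and the next chunk
    # starts with, a non-space, non-punctuation character.
    if ($text =~ /^$NP/s && $last_text =~ /$NP\z/s) {
        $result{obfuscation} = 1;
    }

    # Second check: record the lengths of the short (1 to 7 character)
    # fragments on each side as a "start_end" key.
    if ($last_text =~ /\b((?:$NP){1,7})\z/s) {
        my $start = length($1);
        if ($text =~ /^((?:$NP){1,7})\b/s) {
            $result{backhair} = $start . "_" . length($1);
        }
    }
    return \%result;
}

# Chunks "Vi" and "agra", as if a tag had been placed inside one word.
my $r = check_boundary("Vi", "agra");
print "$r->{obfuscation} $r->{backhair}\n";   # prints "1 2_4"
```

In the original code the key is counted in a hash and backhair_count is the number of distinct keys seen, so it grows with the variety of split lengths rather than with the number of splits.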