Posted to users@spamassassin.apache.org by Keith Hackworth <ke...@rpemail.com> on 2004/10/27 15:35:11 UTC

Re: 'non-pristine message'?

> On Tue, Oct 26, 2004 at 04:00:21PM -0400, Keith Hackworth wrote:
>> Is there a Mail::SpamAssassin::PerMsgStatus method that will return just
>> the body of the message (no attachments/multiparts)?  I need a
>> "non-pristine" message to speed things up in my plugins.  I know there's
>> get_message(), but that seems to return the whole message, with the
>> attachments.
>
> It's hard to give you an answer when you don't explain what you're trying
> to get ...  do you want the decoded text-only body?  rendered text-only
> body?  something completely different?
>
> I'm guessing you want PMS::get_decoded_stripped_body_text_array().
>
> --
> Randomly Generated Tagline:
> Why buy shampoo when real poo is still free?
>

Thanks, Theo - this may work for HTML-only messages, which might be good
enough for what I'm trying to do.  I need just the HTML version of the
email.  No attachments, just the HTML body.  If the 1st part is multipart,
I need the 1st HTML part.

Here's what I'm trying to do:
I'm trying to find invalid HTML tags and, if there are too many, bump the SA
score up a bit.  I noticed a bunch of messages coming in with obfuscation
like "v-wo<notatag>rd" in the body of the HTML message, which shows up as
"v-word" in a normal webmail or Outlook email client.  I want to see how
many "notatag"s we're getting in a message.  I have code that does this and
it works fine, but it's just WAY too slow using PMS::get_message().
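
A minimal sketch of the counting idea described above, written in Python
rather than as a SpamAssassin plugin.  The allow-list KNOWN_TAGS and the
names UnknownTagCounter and count_unknown_tags are hypothetical and only
for illustration; a real rule would need a complete list of valid HTML tags.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately incomplete allow-list of valid HTML tags.
KNOWN_TAGS = {
    "html", "head", "title", "body", "p", "br", "a", "b", "i", "u",
    "font", "div", "span", "table", "tr", "td", "img", "center",
}


class UnknownTagCounter(HTMLParser):
    """Counts opening tags whose names are not in KNOWN_TAGS."""

    def __init__(self):
        super().__init__()
        self.unknown = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes the tag name already lowercased.
        if tag not in KNOWN_TAGS:
            self.unknown += 1


def count_unknown_tags(html):
    """Return how many unrecognised opening tags appear in the HTML text."""
    parser = UnknownTagCounter()
    parser.feed(html)
    parser.close()
    return parser.unknown
```

A rule could then compare the count against a threshold before adding to
the score; the threshold itself would have to be tuned against real mail.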

PMS::get_decoded_stripped_body_text_array() will work for HTML-only
messages, which are more than likely the only place this type of
obfuscation works anyway.  I don't know of any way a plain-text message can
take advantage of email clients like this.

Thanks,
Keith


Re: 'non-pristine message'?

Posted by Keith Hackworth <ke...@rpemail.com>.
> On Wed, Oct 27, 2004 at 09:35:11AM -0400, Keith Hackworth wrote:
>> > I'm guessing you want PMS::get_decoded_stripped_body_text_array().
>>
>> Thanks, Theo - this may work for HTML-only messages, which might be good
>> enough for what I'm trying to do.  I need just the HTML version of the
>> email.  No attachments, just the HTML body.  If the 1st part is multipart,
>> I need the 1st HTML part.
>
> If you want to limit what you're looking at in that way, you'd need to access
> the Message object directly and use find_parts to grab just the first matching
> part you're interested in.  The PMS functions work on all text/* parts, and
> aren't limited to HTML.
>
>> Here's what I'm trying to do:
>> I'm trying to find invalid HTML tags and, if there are too many, bump the SA
>> score up a bit.  I noticed a bunch of messages coming in with obfuscation
>> like "v-wo<notatag>rd" in the body of the HTML message, which shows up as
>> "v-word" in a normal webmail or Outlook email client.  I want to see how
>> many "notatag"s we're getting in a message.  I have code that does this and
>> it works fine, but it's just WAY too slow using PMS::get_message().
>
> Yeah, that'll get you a bunch of stuff you really don't care about.
> get_decoded_stripped... is also not the right thing, since it will have
> stripped all the HTML tags.  I'd try get_decoded_body_text_array(),
> or since you're doing code anyway, just use find_parts and grab the
> m@^text/html@i parts of the message.  You can then easily call decode()
> on them (object function) and get the raw HTML out.
>
> Just curious though, why limit yourself to invalid HTML tags?  Why not just
> target the HTML-tag-in-middle-of-word behavior?  And isn't this the same idea
> as the backhair code?
>
> --
> Randomly Generated Tagline:
> "Exactly what it should've been, give people what they expect.  The third
>  one can be clever."               - John Hughes about Home Alone 2
>

Wow!  I guess if I had RTFM'd a little better, I'd have saved myself a lot
of trouble.  I didn't realize backhair did this already.

On to E@sy Tr@ns, which catches "cr@p l|k3 th!s" in the subject.  Yes - I
know chicken pox does this already, but I have many custom rules built on
my server for this one and it seems to be much more accurate.

Thanks!
Keith


Re: 'non-pristine message'?

Posted by Theo Van Dinter <fe...@kluge.net>.
On Wed, Oct 27, 2004 at 09:35:11AM -0400, Keith Hackworth wrote:
> > I'm guessing you want PMS::get_decoded_stripped_body_text_array().
>
> Thanks, Theo - this may work for HTML-only messages, which might be good
> enough for what I'm trying to do.  I need just the HTML version of the
> email.  No attachments, just the HTML body.  If the 1st part is multipart,
> I need the 1st HTML part.

If you want to limit what you're looking at in that way, you'd need to access
the Message object directly and use find_parts to grab just the first matching
part you're interested in.  The PMS functions work on all text/* parts, and
aren't limited to HTML.

> Here's what I'm trying to do:
> I'm trying to find invalid HTML tags and, if there are too many, bump the SA
> score up a bit.  I noticed a bunch of messages coming in with obfuscation
> like "v-wo<notatag>rd" in the body of the HTML message, which shows up as
> "v-word" in a normal webmail or Outlook email client.  I want to see how
> many "notatag"s we're getting in a message.  I have code that does this and
> it works fine, but it's just WAY too slow using PMS::get_message().

Yeah, that'll get you a bunch of stuff you really don't care about.
get_decoded_stripped... is also not the right thing, since it will have
stripped all the HTML tags.  I'd try get_decoded_body_text_array(),
or since you're writing code anyway, just use find_parts and grab the
m@^text/html@i parts of the message.  You can then easily call decode()
on them (it's an object method) and get the raw HTML out.
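
A rough sketch of that approach, using Python's standard email package as a
stand-in for find_parts and decode(); it is not the SpamAssassin API, and
the function name first_html_part is hypothetical.  It walks the MIME tree,
stops at the first text/html part, and returns its decoded content.

```python
from email import message_from_string


def first_html_part(raw_message):
    """Return the decoded text of the first text/html part, or None."""
    msg = message_from_string(raw_message)
    # walk() visits the message itself and then every sub-part, depth first.
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # decode=True undoes base64 / quoted-printable transfer encoding.
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            try:
                return payload.decode(charset, errors="replace")
            except LookupError:
                # Unknown charset label: fall back to a lossless byte mapping.
                return payload.decode("latin-1")
    return None
```

Attachments and text/plain parts are skipped entirely, so any later
scanning only touches the HTML body.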

Just curious though, why limit yourself to invalid HTML tags?  Why not just
target the HTML-tag-in-middle-of-word behavior?  And isn't this the same idea
as the backhair code?
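
To illustrate the tag-in-middle-of-word idea, here is a small Python sketch.
It is an assumption for illustration only and not how the backhair code is
implemented; the pattern and the name count_mid_word_tags are hypothetical.
It counts any tag that has a letter immediately before and after it.

```python
import re

# A tag with a letter directly on each side, e.g. the "<notatag>" in
# "v-wo<notatag>rd".  Lookarounds keep adjacent matches from overlapping.
MID_WORD_TAG = re.compile(r"(?<=[A-Za-z])<[^<>]+>(?=[A-Za-z])")


def count_mid_word_tags(html):
    """Return the number of tags embedded inside a run of letters."""
    return len(MID_WORD_TAG.findall(html))
```

This fires on valid tags too (for example a <b> placed inside a word), which
is the point of the question above: the tag does not need to be invalid to
be suspicious.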

-- 
Randomly Generated Tagline:
"Exactly what it should've been, give people what they expect.  The third 
 one can be clever."               - John Hughes about Home Alone 2