Posted to users@spamassassin.apache.org by Bob Newhart <fl...@gmail.com> on 2006/11/29 23:31:14 UTC

HTML Source Rule

Hello, I was wondering if there is a way to write a rule for HTML
source code contained in an email. I am getting many of these "Buy
This Stock" emails and I am finding that the pictures contained in
them all have a portion of a line of source that says...

src="cid:

Thanks in advance for any help anyone may be able to provide.

-- 
Jason Broyles

"Use Linux, it's free."

Re: HTML Source Rule

Posted by Loren Wilton <lw...@earthlink.net>.
> This Stock" emails and I am finding that the pictures contained in
> them all have a portion of a line of source that says...
>
> src="cid:

*ANY* inline image of any sort is going to contain that string.  That is what 
links the image tag to the other MIME part containing the image.

There are quite a number of rules for image stock spams.  The ImageInfo and 
FuzzyOCR plugins also help quite a lot in these cases, as do many SARE rules 
and the network rules.

        Loren


Re: HTML Source Rule

Posted by Matt Kettler <mk...@verizon.net>.
Bob Newhart wrote:
> Hello, I was wondering if there is a way to write a rule for HTML
> source code contained in an email.
Use rawbody as the rule type. This will match the text after decoding
(e.g. base64) and line-wrap removal, but before HTML tags are removed.
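
As a minimal sketch of the rawbody approach described above, a rule along
these lines could go in local.cf. The rule name LOCAL_HTML_CID_IMG, the
regular expression, and the score are illustrative assumptions, not rules
shipped with SpamAssassin. Because other replies in this thread point out
that any inline image produces this fragment, the score is kept
deliberately low.

```
# local.cf: illustrative only; rule name, regex and score are assumptions.
# Match an <img> tag whose src attribute points at an inline MIME part.
rawbody  LOCAL_HTML_CID_IMG  /<img\b[^>]{0,200}\bsrc\s*=\s*["']?cid:/i
describe LOCAL_HTML_CID_IMG  HTML img tag references an inline (cid:) image
score    LOCAL_HTML_CID_IMG  0.1
```

Running "spamassassin --lint" after editing local.cf should catch syntax
errors before the rule goes live. If a sender splits the tag across lines,
this pattern may not match.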


Re: HTML Source Rule

Posted by Kelson <ke...@speed.net>.
Bret Miller wrote:
>> pictures contained in them all have a portion of a line of 
>> source that says...
>>
>> src="cid:
>>
>> Thanks in advance for any help anyone may be able to provide.
> 
> So does every message sent from Outlook that includes an image. I'd
> suspect that you'd end up rejecting a lot of legitimate e-mail, unless
> no one that sends you e-mail uses Outlook or Outlook Express...

Actually, I believe any email with embedded* images, regardless of the 
sending software, will contain that fragment.  cid: is the URL scheme 
(RFC 2392) for identifying a resource in another MIME part of the same 
message.

I've got one in my inbox right now that was sent from Thunderbird.


*Embedded meaning that they appear inline in the message body and the 
data is included in the message, not retrieved from a remote server.

-- 
Kelson Vibber
SpeedGate Communications <www.speed.net>

Re: HTML Source Rule

Posted by Richard Frovarp <Ri...@sendit.nodak.edu>.
Kenneth Porter wrote:
> On Thursday, November 30, 2006 5:01 PM -0600 Richard Frovarp 
> <Ri...@sendit.nodak.edu> wrote:
>
>> Kenneth Porter wrote:
>>> --On Wednesday, November 29, 2006 5:17 PM -0600 Richard Frovarp
>>> <Ri...@sendit.nodak.edu> wrote:
>>>
>>>> I have a few legit messages that are scoring over 5.0 due to
>>>> SARE_STOCKS and the TVD rules to catch stocks, and this is after
>>>> ALL_TRUSTED has done its work to reduce the score. These messages
>>>> of course have inline images and are being sent via Outlook
>>>> Express. Some of the scores on those rules are over 2.0. I have
>>>> started to reduce the scores, as the stock messages I get usually
>>>> have header problems and hit on Razor as well. I've seen legit
>>>> messages fire the MY_CID set of rules enough to rack up a score of
>>>> over 7.0 from those rules alone.
>>>
>>> Can you attach a sample? Perhaps the sender can be convinced to change
>>> the format to make the message look less spammy.
>>>
>> I'll find one tomorrow. The big three rules are/were
>> 2.00 PART_CID_STOCK 2.00 PART_CID_STOCK_LESS 2.80 TVD_FW_GRAPHIC_ID1
>>
>> The PART_CID rules have been removed from wherever they were located. I
>> have reduced the score on the TVD rule. I have 40K+ users. Talking to
>> individual users isn't something that I can do effectively. To make a
>> message look less spammy, they would have to not inline the image with OE.
>
> [Please reply to the list.]
>
> My point is simply that others may be seeing the same issue but not 
> know how to report it so that rule developers can exclude the ham. 
> Given some samples, it may be possible to separate the wheat from the 
> chaff.
>
Just followed the reply-to header, was too tired to notice anything 
different.

I was wrong: the PART_CID_STOCK and PART_CID_STOCK_LESS rules are there. 
Some of my machines were not running sa-update correctly. Attached is 
one of my FPs. Pretty brutal for including a simple GIF.

Here is the report for the attached message:
score = 8.98
-1.44 ALL_TRUSTED
0.81 EXTRA_MPART_TYPE
0.00 HTML_MESSAGE
0.81 INFO_TLD
2.00 PART_CID_STOCK 
2.00 PART_CID_STOCK_LESS 
2.80 TVD_FW_GRAPHIC_ID1 
2.00 TVD_FW_MESG1
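
In the report above, PART_CID_STOCK, PART_CID_STOCK_LESS and
TVD_FW_GRAPHIC_ID1 together contribute 6.80 of the 8.98 points. A sketch of
the kind of score reduction discussed in this thread, assuming the
overrides live in local.cf; the values of 0.5 are purely illustrative and
not a recommendation:

```
# local.cf: illustrative overrides; the 0.5 values are assumptions.
# Lower the three rules that dominate the false positive above.
score PART_CID_STOCK       0.5
score PART_CID_STOCK_LESS  0.5
score TVD_FW_GRAPHIC_ID1   0.5
```

With these example values the same message would score
8.98 - 6.80 + 1.50 = 3.68, below the 5.0 level mentioned earlier in the
thread. Whether the rules still catch enough stock spam at that weight
would need checking against local mail.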

Re: HTML Source Rule

Posted by Kenneth Porter <sh...@sewingwitch.com>.
On Thursday, November 30, 2006 5:01 PM -0600 Richard Frovarp 
<Ri...@sendit.nodak.edu> wrote:

> Kenneth Porter wrote:
>> --On Wednesday, November 29, 2006 5:17 PM -0600 Richard Frovarp
>> <Ri...@sendit.nodak.edu> wrote:
>>
>>> I have a few legit messages that are scoring over 5.0 due to
>>> SARE_STOCKS and the TVD rules to catch stocks, and this is after
>>> ALL_TRUSTED has done its work to reduce the score. These messages
>>> of course have inline images and are being sent via Outlook
>>> Express. Some of the scores on those rules are over 2.0. I have
>>> started to reduce the scores, as the stock messages I get usually
>>> have header problems and hit on Razor as well. I've seen legit
>>> messages fire the MY_CID set of rules enough to rack up a score of
>>> over 7.0 from those rules alone.
>>
>> Can you attach a sample? Perhaps the sender can be convinced to change
>> the format to make the message look less spammy.
>>
> I'll find one tomorrow. The big three rules are/were
> 2.00 PART_CID_STOCK 2.00 PART_CID_STOCK_LESS 2.80 TVD_FW_GRAPHIC_ID1
>
> The PART_CID rules have been removed from wherever they were located. I
> have reduced the score on the TVD rule. I have 40K+ users. Talking to
> individual users isn't something that I can do effectively. To make a
> message look less spammy, they would have to not inline the image with OE.

[Please reply to the list.]

My point is simply that others may be seeing the same issue but not know 
how to report it so that rule developers can exclude the ham. Given some 
samples, it may be possible to separate the wheat from the chaff.

Re: HTML Source Rule

Posted by Kenneth Porter <sh...@sewingwitch.com>.
--On Wednesday, November 29, 2006 5:17 PM -0600 Richard Frovarp 
<Ri...@sendit.nodak.edu> wrote:

> I have a few legit messages that are scoring over 5.0 due to SARE_STOCKS
> and the TVD rules to catch stocks, and this is after ALL_TRUSTED has done
> its work to reduce the score. These messages of course have inline images
> and are being sent via Outlook Express. Some of the scores on those rules
> are over 2.0. I have started to reduce the scores, as the stock messages
> I get usually have header problems and hit on Razor as well. I've seen
> legit messages fire the MY_CID set of rules enough to rack up a score of
> over 7.0 from those rules alone.

Can you attach a sample? Perhaps the sender can be convinced to change the 
format to make the message look less spammy.



Re: HTML Source Rule

Posted by Richard Frovarp <Ri...@sendit.nodak.edu>.
Bret Miller wrote:
>> Hello, I was wondering if there is a way to write a rule for
>> HTML source code contained in an email. I am getting many of
>> these "Buy This Stock" emails and I am finding that the
>> pictures contained in them all have a portion of a line of
>> source that says...
>>
>> src="cid:
>>
>> Thanks in advance for any help anyone may be able to provide.
>>     
>
> So does every message sent from Outlook that includes an image. I'd
> suspect that you'd end up rejecting a lot of legitimate e-mail, unless
> no one that sends you e-mail uses Outlook or Outlook Express...
>
> Bret

I have a few legit messages that are scoring over 5.0 due to SARE_STOCKS 
and the TVD rules to catch stocks, and this is after ALL_TRUSTED has 
done its work to reduce the score. These messages of course have inline 
images and are being sent via Outlook Express. Some of the scores on 
those rules are over 2.0. I have started to reduce the scores, as the 
stock messages I get usually have header problems and hit on Razor as 
well. I've seen legit messages fire the MY_CID set of rules enough to 
rack up a score of over 7.0 from those rules alone.
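
One way to act on the observation that the stock spam usually also hits
Razor is to score the inline-image pattern only in combination with other
evidence. The following is a hedged sketch rather than a tested rule: the
LOCAL_* names, the regular expression, and the score are assumptions, while
RAZOR2_CHECK and ALL_TRUSTED are the stock rule names and require the
Razor2 plugin and trusted-network settings to be configured.

```
# local.cf: illustrative only; LOCAL_* names and the score are assumptions.
# Sub-rule (double underscore prefix means it carries no score of its own).
rawbody  __LOCAL_HTML_CID_IMG  /<img\b[^>]{0,200}\bsrc\s*=\s*["']?cid:/i

# Fire only when the inline image coincides with a Razor hit and the
# message did not arrive solely through trusted relays.
meta     LOCAL_CID_RAZOR  (__LOCAL_HTML_CID_IMG && RAZOR2_CHECK && !ALL_TRUSTED)
describe LOCAL_CID_RAZOR  Inline cid: image plus Razor hit, not all trusted
score    LOCAL_CID_RAZOR  1.0
```

A legitimate Outlook Express message with an inline GIF would match the
sub-rule but should not trigger the meta rule unless Razor also lists it.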

RE: HTML Source Rule

Posted by Bret Miller <br...@wcg.org>.
> Hello, I was wondering if there is a way to write a rule for 
> HTML source code contained in an email. I am getting many of 
> these "Buy This Stock" emails and I am finding that the 
> pictures contained in them all have a portion of a line of 
> source that says...
> 
> src="cid:
> 
> Thanks in advance for any help anyone may be able to provide.

So does every message sent from Outlook that includes an image. I'd
suspect that you'd end up rejecting a lot of legitimate e-mail, unless
no one that sends you e-mail uses Outlook or Outlook Express...

Bret