Posted to users@spamassassin.apache.org by Sebastian Arcus <s....@open-t.co.uk> on 2018/03/31 21:24:37 UTC

BODY custom rule not working if text and html parts are different?

I have a really simple rule looking for a custom text string contained in
spam URLs in the body of the email, like so:

body      SHORT_BITCOIN_DATING    /specific_string_here/i
score     SHORT_BITCOIN_DATING    3.0
describe  SHORT_BITCOIN_DATING    Body URL signature of spam

I just realised that it only works if the URL exists in both the
text and HTML versions. If the text version doesn't have the URL, the
rule doesn't fire. Do "body" rules only work on the HTML part of the
message? I've tried searching through the documentation, but I can't see
that being the case. Maybe something else is having an effect here?

Many thanks for any hints.

Re: BODY custom rule not working if text and html parts are different?

Posted by Sebastian Arcus <s....@open-t.co.uk>.
On 31/03/18 22:39, John Hardin wrote:
> On Sat, 31 Mar 2018, Sebastian Arcus wrote:
> 
>> I have a really simple rule looking for a custom text string contained
>> in spam URLs in the body of the email, like so:
>>
>> body      SHORT_BITCOIN_DATING    /specific_string_here/i
>> score     SHORT_BITCOIN_DATING    3.0
>> describe  SHORT_BITCOIN_DATING    Body URL signature of spam
>>
>> I just realised that it only works if the URL exists in both the
>> text and HTML versions. If the text version doesn't have the URL, the
>> rule doesn't fire. Do "body" rules only work on the HTML part of the
>> message? I've tried searching through the documentation, but I can't
>> see that being the case. Maybe something else is having an
>> effect here?
> 
> "body" includes the *rendered* part of HTML. If the URL only appears
> within <a href="..."> in the HTML part, then "body" will not see it.
> 
> If you are looking for URLs, you should probably be using a "uri" rule.
> There are heuristics to pull those out of the body text, as well as out of
> HTML tags.

Thank you for the suggestions - much appreciated. As my original rule 
worked initially, I didn't realise the subtle difference between using 
BODY and URI rules. It is working fine now. Thank you again!

Re: BODY custom rule not working if text and html parts are different?

Posted by Sebastian Arcus <s....@open-t.co.uk>.
On 01/04/18 19:18, John Hardin wrote:
> On Sun, 1 Apr 2018, John Hardin wrote:
> 
>> On Sun, 1 Apr 2018, Matus UHLAR - fantomas wrote:
>>
>>> On 01.04.18 05:47, Pedro David Marco wrote:
>>>> This is a problem I see often...
>>>> What if the URL is only in the TEXT part and not in the HTML? Many
>>>> email applications show those URLs as clickable, as if they were valid
>>>> HTML HREFs, when they are not...
>>>
>>> in this case, body rule matches, but uri does not.
>>
>> I think there are heuristics to pull (non-obfuscated) URIs out of body
>> text.
> 
> Yeah, just confirmed. A non-obfuscated URI in plain-text body part is 
> recognized and extracted for uri rules.

That's great - thank you for testing this out and letting us know.

Re: BODY custom rule not working if text and html parts are different?

Posted by John Hardin <jh...@impsec.org>.
On Mon, 2 Apr 2018, Pedro David Marco wrote:

>
>
>> Yeah, just confirmed. A non-obfuscated URI in plain-text body part is
>> recognized and extracted for uri rules.
>
> Thanks John... can you provide a pastebin sample, please?...

It's trivially easy to add a URI to the text body part of any test message 
you may have lying around. If you run SpamAssassin in rule debug mode and 
add a rule like this to your test environment it will be really easy to 
see the extracted URIs:

   uri     __ALL_URI    /.+/
   tflags  __ALL_URI    multiple


(running SA in rule debug mode:

   ./spamassassin -L -t --debug area=all,rules,rules-all < $MSG

)

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Gun Control: The theory that a woman found dead in an alley,
   raped and strangled with her panty hose, is somehow
   morally superior to a woman explaining to police
   how her attacker got that fatal bullet wound.      -- L. Neil Smith
-----------------------------------------------------------------------
  368 days since the first commercial re-flight of an orbital booster (SpaceX)

Re: BODY custom rule not working if text and html parts are different?

Posted by Pedro David Marco <pe...@yahoo.com>.
 

>Yeah, just confirmed. A non-obfuscated URI in plain-text body part is 
>recognized and extracted for uri rules.

Thanks John... can you provide a pastebin sample, please?...
----PedroD  

Re: BODY custom rule not working if text and html parts are different?

Posted by John Hardin <jh...@impsec.org>.
On Sun, 1 Apr 2018, John Hardin wrote:

> On Sun, 1 Apr 2018, Matus UHLAR - fantomas wrote:
>
>> On 01.04.18 05:47, Pedro David Marco wrote:
>>> This is a problem I see often...
>>> What if the URL is only in the TEXT part and not in the HTML? Many email
>>> applications show those URLs as clickable, as if they were valid HTML HREFs,
>>> when they are not...
>> 
>> in this case, body rule matches, but uri does not.
>
> I think there are heuristics to pull (non-obfuscated) URIs out of body text.

Yeah, just confirmed. A non-obfuscated URI in plain-text body part is 
recognized and extracted for uri rules.
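If you want the rule to fire whether the string ends up in the rendered body text or in an extracted URI (for example, when a URL in the text part is obfuscated enough that the extraction heuristics may skip it), one option is to run both test types and combine them in a meta rule. A minimal sketch, assuming the placeholder string from the original post; the sub-rule names __SBD_BODY and __SBD_URI are made up for illustration (the leading double underscore keeps them from scoring on their own):

   # Hypothetical sub-rules (names invented for this sketch); the "__"
   # prefix means they only contribute through the meta rule below.
   body      __SBD_BODY              /specific_string_here/i
   uri       __SBD_URI               /specific_string_here/i

   # Fire if the string appears in rendered body text OR in any
   # extracted URI (HTML href or plain-text URL).
   meta      SHORT_BITCOIN_DATING    (__SBD_BODY || __SBD_URI)
   score     SHORT_BITCOIN_DATING    3.0
   describe  SHORT_BITCOIN_DATING    URL signature of spam (body text or URI)

Scoring only the meta rule avoids counting the same URL twice when it appears in both places.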


Re: BODY custom rule not working if text and html parts are different?

Posted by John Hardin <jh...@impsec.org>.
On Sun, 1 Apr 2018, Matus UHLAR - fantomas wrote:

> On 01.04.18 05:47, Pedro David Marco wrote:
>> This is a problem I see often...
>> What if the URL is only in the TEXT part and not in the HTML? Many email
>> applications show those URLs as clickable, as if they were valid HTML HREFs,
>> when they are not...
>
> in this case, body rule matches, but uri does not.

I think there are heuristics to pull (non-obfuscated) URIs out of body
text.


Re: BODY custom rule not working if text and html parts are different?

Posted by Sebastian Arcus <s....@open-t.co.uk>.
On 01/04/18 07:10, Matus UHLAR - fantomas wrote:
> On 01.04.18 05:47, Pedro David Marco wrote:
>> This is a problem I see often...
>> What if the URL is only in the TEXT part and not in the HTML? Many
>> email applications show those URLs as clickable, as if they were valid
>> HTML HREFs, when they are not...
> 
> in this case, body rule matches, but uri does not.

I wonder if RAWBODY would match the URL both in the text part and in the
HTML part? Does anybody know?
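For reference, the Mail::SpamAssassin::Conf documentation describes the "raw body" as the data inside all textual parts, decoded from base64 or quoted-printable but with HTML tags and line breaks still present. On that description, a rawbody test should see the string both in the plain-text part and inside an <a href="..."> attribute in the HTML part. A minimal sketch; the rule name SHORT_BITCOIN_DATING_RAW is hypothetical:

   # Hypothetical rule name. rawbody runs against the decoded text of every
   # textual part with HTML markup intact, so href attributes are visible.
   rawbody   SHORT_BITCOIN_DATING_RAW    /specific_string_here/i
   score     SHORT_BITCOIN_DATING_RAW    3.0
   describe  SHORT_BITCOIN_DATING_RAW    Raw body URL signature of spam

Because line breaks are kept, a string that the sender wrapped across lines will not match a single-line pattern like this one; the documentation notes that multiline expressions are needed in that case.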

Re: BODY custom rule not working if text and html parts are different?

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 01.04.18 05:47, Pedro David Marco wrote:
> This is a problem I see often...
>What if the URL is only in the TEXT part and not in the HTML? Many email applications show those URLs as clickable, as if they were valid HTML HREFs, when they are not...

in this case, body rule matches, but uri does not.
-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Microsoft dick is soft to do no harm

Re: BODY custom rule not working if text and html parts are different?

Posted by Leandro <le...@spfbl.net>.
2018-04-01 2:47 GMT-03:00 Pedro David Marco <pe...@yahoo.com>:

> This is a problem I see often...
>
> What if the URL is only in the TEXT part and not in the HTML? Many email
> applications show those URLs as clickable, as if they were valid HTML HREFs,
> when they are not...
>

We have a script that can extract URLs from the text part; see lines 998-1016:

https://www.dropbox.com/s/5aorrijafw5ygk0/uribl.pl?dl=0

You can use it as a model for your own script, or use it as is.


>
> -----
> PedroD
>

Re: BODY custom rule not working if text and html parts are different?

Posted by Pedro David Marco <pe...@yahoo.com>.
This is a problem I see often...
What if the URL is only in the TEXT part and not in the HTML? Many email applications show those URLs as clickable, as if they were valid HTML HREFs, when they are not...
-----PedroD

Re: BODY custom rule not working if text and html parts are different?

Posted by John Hardin <jh...@impsec.org>.
On Sat, 31 Mar 2018, Sebastian Arcus wrote:

> I have a really simple rule looking for a custom text string contained in spam
> URLs in the body of the email, like so:
>
> body      SHORT_BITCOIN_DATING    /specific_string_here/i
> score     SHORT_BITCOIN_DATING    3.0
> describe  SHORT_BITCOIN_DATING    Body URL signature of spam
>
> I just realised that it only works if the URL exists in both the text
> and HTML versions. If the text version doesn't have the URL, the rule
> doesn't fire. Do "body" rules only work on the HTML part of the message? I've
> tried searching through the documentation, but I can't see that being the
> case. Maybe something else is having an effect here?

"body" includes the *rendered* part of HTML. If the URL only appears
within <a href="..."> in the HTML part, then "body" will not see it.

If you are looking for URLs, you should probably be using a "uri" rule.
There are heuristics to pull those out of the body text, as well as out of
HTML tags.
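Applied to the rule from the original post, the change is roughly the following sketch: the same pattern and score, with only the rule type switched from body to uri (the describe text is lightly adjusted, and the string is still the placeholder from the original post):

   # uri rules match against URIs extracted from HTML tags (e.g. href)
   # and, heuristically, from plain-text body parts, not rendered text.
   uri       SHORT_BITCOIN_DATING    /specific_string_here/i
   score     SHORT_BITCOIN_DATING    3.0
   describe  SHORT_BITCOIN_DATING    URL signature of spam

Re-running a test message through "spamassassin -t" should then list SHORT_BITCOIN_DATING among the hits whether the URL sits only in an HTML href or also in the text part.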
