Posted to dev@spamassassin.apache.org by bu...@issues.apache.org on 2010/06/24 03:49:03 UTC

[Bug 6458] New: add blacklist_uri_host, whitelist_uri_host; and A record lookups to URIs in URIDNSBL plugin

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6458

           Summary: add blacklist_uri_host, whitelist_uri_host; and A
                    record lookups to URIs in URIDNSBL plugin
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Libraries
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: Mark.Martinec@ijs.si


Here are two enhancements which are mostly unrelated (except that
they both deal with URIs found in a message), but need a common
modification to the underlying support code and data structures,
which is why I'm bundling them in a single ticket. The enhancement
was suggested by AXB, and I think it makes a worthwhile addition.

1. added configuration directives blacklist_uri_host,
whitelist_uri_host and unlist_uri_host to manage a list of
black- or whitelisted host names or domain names as found in URLs
of a message. This is functionally much like specifying 'uri' rules
with sufficiently precise parsing regular expressions, except that
it is much easier to specify, less error-prone than regexps, and
quicker to execute. Consider it syntactic sugar for a common
need. It is intended to work with user_prefs files ('scoresonly'
switching), although this aspect has not yet been thoroughly tested.
It is mostly implemented in Plugin::WLBLEval and Conf.pm.

2. added tflags 'a' and 'ns' to the 'uridnsbl' directive in the
Plugin::URIDNSBL (while preserving compatibility). Traditionally
the uridnsbl rules did an NS lookup on domain names found in URIs,
then mapped the name servers to their IP addresses by an A lookup; these
addresses were in turn sent to DNSBL lists. Not all DNSBL lists are meant
to be used this way: an example is the "black_a.txt" list at
http://www.uribl.com/datasets.shtml , which expects to be queried
by IP addresses of hosts in URIs, not by their name servers.
With the addition of both tflags, one may choose one or the other
type of lookup, or even both.

The implementation was complicated by the fact that the underlying
code stripped host names down to their registrar boundary, which loses
information needed both for the uridnsbl A lookups and for URI black-
and whitelisting. So the change needed to touch some supporting code,
while preserving compatibility.
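To illustrate why stripping to the registrar boundary loses information, here is a minimal Python sketch of trimming a host to its registered domain. The suffix table and the function name registrar_domain are hypothetical illustrations, not SpamAssassin code (the real registrar-boundary data is far larger): once www.example.com and mail.example.com both collapse to example.com, neither an A lookup on the actual host nor a per-host blacklist entry can tell them apart.

```python
# A deliberately tiny, hypothetical suffix table standing in for real
# registrar-boundary data; only a few multi-label suffixes are listed.
MULTI_LABEL_SUFFIXES = {"co.uk", "com.br", "co.nr"}


def registrar_domain(host: str) -> str:
    """Trim a host name down to its (approximate) registered domain."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        # e.g. www.autofinanceuk.co.uk -> autofinanceuk.co.uk
        return ".".join(labels[-3:])
    # e.g. www.example.com -> example.com
    return ".".join(labels[-2:])


if __name__ == "__main__":
    for host in ("www.example.com", "mail.example.com", "www.AutoFinanceUK.co.uk"):
        # The first two hosts become indistinguishable after trimming.
        print(host, "->", registrar_domain(host))
```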


Example use:


if can(Mail::SpamAssassin::Conf::feature_uri_host_wblist)

blacklist_uri_host wWw.Example.COM example.NET 127.0.0.1
blacklist_uri_host 127.0.0.2
whitelist_uri_host aaa.bbb.example.org edu mil

header URI_HOST_IN_BLACKLIST        eval:check_uri_host_in_blacklist()
describe URI_HOST_IN_BLACKLIST      Host or domain found in URI is blacklisted
tflags URI_HOST_IN_BLACKLIST        userconf noautolearn
score URI_HOST_IN_BLACKLIST 0.1

header URI_HOST_IN_WHITELIST        eval:check_uri_host_in_whitelist()
describe URI_HOST_IN_WHITELIST      Host or domain found in URI is whitelisted
tflags URI_HOST_IN_WHITELIST        userconf noautolearn
score URI_HOST_IN_WHITELIST -0.1

endif


uridnsbl URIBL_TEST  testbl.example.org   TXT
body     URIBL_TEST  eval:check_uridnsbl('URIBL_TEST')
describe URIBL_TEST  Contains a URL listed in the xxx blocklist
tflags   URIBL_TEST  net a




Below are excerpts from the new documentation:



=item uridnsbl NAME_OF_RULE dnsbl_zone lookuptype

Specify a lookup.  C<NAME_OF_RULE> is the name of the rule to be
used, C<dnsbl_zone> is the zone to look up IPs in, and C<lookuptype>
is the type of lookup (B<TXT> or B<A>).   Note that you must also
define a body-eval rule calling C<check_uridnsbl()> to use this.

This works by collecting domain names from URLs and querying DNS
blocklists either with the IP addresses of host names found in URLs or
with the IP addresses of their name servers, according to tflags as follows.

If the corresponding body rule has a tflag 'a', the DNS blocklist will
be queried with the IP address of a host found in URLs.

If the corresponding body rule has a tflag 'ns', DNS will be queried
for name servers (NS records) of a domain name found in URLs, then
these name server names will be resolved to their IP addresses, which
in turn will be sent to the DNS blocklist.

The tflags directive may specify 'a', 'ns', or both. In the absence
of either flag, the default is 'ns', which is compatible with
pre-3.4 versions of SpamAssassin.
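The tflags logic described above can be sketched in Python as follows. This is a hedged illustration, not the URIDNSBL plugin's code: resolve_a and resolve_ns are hypothetical stand-ins for real DNS resolution (the demo uses in-memory tables with documentation-range addresses), only IPv4 is handled, and the reversed-octet query name is the common DNSBL convention rather than something specified here. The caller is assumed to supply both the full host (used by 'a') and its registered domain (used by 'ns').

```python
import ipaddress
from typing import Callable, Iterable


def lookup_modes(tflags: str) -> set[str]:
    """Which lookups a uridnsbl rule performs, according to its tflags."""
    modes = set(tflags.split()) & {"a", "ns"}
    # Neither 'a' nor 'ns' present: fall back to the pre-3.4 'ns' behaviour.
    return modes or {"ns"}


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Common DNSBL convention: reversed IPv4 octets prepended to the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone.rstrip(".")


def plan_queries(host: str, domain: str, tflags: str, zone: str,
                 resolve_a: Callable[[str], Iterable[str]],
                 resolve_ns: Callable[[str], Iterable[str]]) -> list[str]:
    """Collect the DNSBL query names that one URI would produce."""
    ips: set[str] = set()
    modes = lookup_modes(tflags)
    if "a" in modes:
        # 'a': query with the IP addresses of the host itself
        ips.update(resolve_a(host))
    if "ns" in modes:
        # 'ns': NS lookup on the domain, then an A lookup on each name server
        for ns_name in resolve_ns(domain):
            ips.update(resolve_a(ns_name))
    return sorted(dnsbl_query_name(ip, zone) for ip in ips)


if __name__ == "__main__":
    # Hypothetical in-memory "DNS" for the demo.
    A = {"www.example.com": ["192.0.2.10"],
         "ns1.example.net": ["198.51.100.1"],
         "ns2.example.net": ["198.51.100.2"]}
    NS = {"example.com": ["ns1.example.net", "ns2.example.net"]}
    for flags in ("net a", "net", "net a ns"):
        print(flags, plan_queries("www.example.com", "example.com", flags,
                                  "testbl.example.org",
                                  lambda n: A.get(n, []), lambda n: NS.get(n, [])))
```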

The choice of tflags must correspond to the policy and expected use of
each DNS blocklist and is normally not a local decision. As an example,
the "black_a.txt" list ( http://www.uribl.com/datasets.shtml ) expects
queries resulting from an 'a' tflag.

Example:
 uridnsbl        URIBL_SBLXBL    sbl-xbl.spamhaus.org.   TXT
 body            URIBL_SBLXBL    eval:check_uridnsbl('URIBL_SBLXBL')
 describe        URIBL_SBLXBL    Contains a URL listed in the SBL/XBL blocklist
 tflags          URIBL_SBLXBL    net ns

[...]

=item tflags NAME_OF_RULE ns

The 'ns' flag may be applied to rules corresponding to uridnsbl and uridnssub
directives. Host names from URLs will be mapped to their name server IP
addresses (an NS lookup followed by an A lookup), which in turn will be sent
to blocklists. This is the default when neither 'a' nor 'ns' flags are specified.

=item tflags NAME_OF_RULE a

The 'a' flag may be applied to rules corresponding to uridnsbl and uridnssub
directives. Host names from URLs will be mapped to their IP addresses, which
will be sent to blocklists. When both 'ns' and 'a' flags are specified,
both queries will be performed.



[...]

=item blacklist_uri_host host-or-domain ...

Adds one or more host names to a list of blacklisted URI domains.

No wildcards are supported, but subdomains do match implicitly. There is
only one combined list for black- and whitelisting of host names in URIs.
Search starts by looking up the full hostname first, then leading fields
are progressively stripped off (e.g.: sub.example.com, example.com, com)
until a match is found or we run out of fields. The first matching entry
(the most specific) determines whether a lookup yielded a blacklisted or a
whitelisted result.

If a URL contains an IP address in place of a host name, the
black- (or white-) list must specify the exact same IP address.

A domain cannot be both blacklisted and whitelisted at the same time; the
last directive prevails. Use the unlist_uri_host directive to neutralize
previous blacklist_uri_host and whitelist_uri_host settings.
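The lookup rules above can be modelled with a small Python sketch. It is a hedged illustration, not the WLBLEval code: the class and method names are hypothetical, and names are lowercased on the assumption (suggested by the mixed-case example wWw.Example.COM) that matching is case-insensitive, as DNS names are.

```python
import ipaddress
from typing import Optional


class UriHostWBList:
    """Hypothetical model of the single combined URI host black/white list."""

    def __init__(self) -> None:
        # lowercased host, domain or IP address -> "black" or "white"
        self.entries: dict[str, str] = {}

    def _set(self, kind: str, names: tuple) -> None:
        for name in names:
            # A later directive simply overwrites an earlier one for the same name.
            self.entries[name.lower()] = kind

    def blacklist(self, *names: str) -> None:
        self._set("black", names)

    def whitelist(self, *names: str) -> None:
        self._set("white", names)

    def unlist(self, *names: str) -> None:
        for name in names:
            # Removing a name that is not listed is silently ignored.
            self.entries.pop(name.lower(), None)

    def lookup(self, host: str) -> Optional[str]:
        """Return "black", "white", or None for a host found in a URL."""
        host = host.lower().rstrip(".")
        try:
            ipaddress.ip_address(host)
            return self.entries.get(host)  # IP addresses must match exactly
        except ValueError:
            pass
        labels = host.split(".")
        # Most specific first: sub.example.com, example.com, com
        for i in range(len(labels)):
            kind = self.entries.get(".".join(labels[i:]))
            if kind is not None:
                return kind
        return None


if __name__ == "__main__":
    wb = UriHostWBList()
    wb.blacklist("wWw.Example.COM", "example.NET", "127.0.0.1")
    wb.blacklist("127.0.0.2")
    wb.whitelist("aaa.bbb.example.org", "edu", "mil")
    for h in ("WWW.example.com", "example.com", "mail.example.net",
              "x.aaa.bbb.example.org", "127.0.0.1", "127.0.0.3"):
        print(h, wb.lookup(h))
```

Note that blacklisting www.example.com does not cover example.com itself: matching only walks from the URL host towards its parent domains, never downwards.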


=item whitelist_uri_host host-or-domain ...

Adds one or more host names to a list of whitelisted URI domains.
See blacklist_uri_host directive for details.


=item unlist_uri_host host-or-domain ...

Removes one or more specified host names from the list of black- or
whitelisted URI domains. Removing an unlisted name is ignored (is not an error).

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6458] add blacklist_uri_host, whitelist_uri_host; and A record lookups to URIs in URIDNSBL plugin

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6458

Mark Martinec <Ma...@ijs.si> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #4 from Mark Martinec <Ma...@ijs.si> 2011-09-24 01:30:49 UTC ---
closing, this is in trunk/3.4.0 for a year now

[Bug 6458] add blacklist_uri_host, whitelist_uri_host; and A record lookups to URIs in URIDNSBL plugin

Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6458

--- Comment #1 from Mark Martinec <Ma...@ijs.si> 2010-06-23 21:53:14 EDT ---
Created an attachment (id=4784)
 --> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4784)
proposed patch

trunk:
  Bug 6458: add blacklist_uri_host, whitelist_uri_host;
  and A record lookups to URIs in URIDNSBL plugin
Sending lib/Mail/SpamAssassin/Conf/Parser.pm
Sending lib/Mail/SpamAssassin/Conf.pm
Sending lib/Mail/SpamAssassin/PerMsgStatus.pm
Sending lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
Sending lib/Mail/SpamAssassin/Plugin/WLBLEval.pm
Sending lib/Mail/SpamAssassin/Util.pm
Committed revision 957401.

[Bug 6458] add blacklist_uri_host, whitelist_uri_host; and A record lookups to URIs in URIDNSBL plugin

Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6458

--- Comment #2 from Mark Martinec <Ma...@ijs.si> 2010-06-24 12:56:10 EDT ---
avoids reinventing the wheel...

trunk:
  Bug 6458: fix handling user_prefs for uri wblisting
Sending lib/Mail/SpamAssassin/Conf/Parser.pm
Sending lib/Mail/SpamAssassin/Conf.pm
Sending lib/Mail/SpamAssassin/Plugin/WLBLEval.pm
Committed revision 957624.

[Bug 6458] add blacklist_uri_host, whitelist_uri_host; and A record lookups to URIs in URIDNSBL plugin

Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6458

--- Comment #3 from Mark Martinec <Ma...@ijs.si> 2010-06-28 19:35:05 EDT ---
> Amazing stuff!
>   blacklist_uri_host ru cn kr

Now this brings up some ideas for added flexibility. A hard-core
blacklisting with a very high score may not suit every need, it
would be nice to be able to associate a score with a list of
'not-so-black'-listed URI domains.

The following change implements just that: instead of a single
black (or white) list, it allows one to form any number of
named lists of URI hosts, and associate a score with each list.


Example 1:

enlist_uri_host (LOW)  geocities.com
enlist_uri_host (MED)  geocities.yahoo.com.br
enlist_uri_host (LOW)  AutoFinanceUK.co.uk
enlist_uri_host (HIGH) blasdutro buckrea.com
enlist_uri_host (MED)  True.com
enlist_uri_host (LOW)  imageshack.us

and the corresponding rules:

header   URI_HOST_LOW   eval:check_uri_host_listed('LOW')
describe URI_HOST_LOW   Host or domain found in URI is listed in the LOW list
tflags   URI_HOST_LOW   userconf noautolearn
score    URI_HOST_LOW   1.5

header   URI_HOST_MED   eval:check_uri_host_listed('MED')
describe URI_HOST_MED   Host or domain found in URI is listed in the MED list
tflags   URI_HOST_MED   userconf noautolearn
score    URI_HOST_MED   4

header   URI_HOST_HIGH  eval:check_uri_host_listed('HIGH')
describe URI_HOST_HIGH  Host or domain found in URI is listed in the HIGH list
tflags   URI_HOST_HIGH  userconf noautolearn
score    URI_HOST_HIGH  12


Example 2:

blacklist_uri_host www.need-lust.com www.crave-lust
blacklist_uri_host sommerphantasie.com klick2go.com lucymeier.com
blacklist_uri_host www.replaceftpsmtp.com www.aectransfer.org
blacklist_uri_host epsore.com www.alveal.com
blacklist_uri_host reppsetinte.com preprotissit.com
blacklist_uri_host www.weinportale.de www.fasctvideos.cn
blacklist_uri_host www.dilcasino.com www.hotgoldgambling.net
blacklist_uri_host www.antos.si www.omegaic.net www.clickonevent.com
blacklist_uri_host www.exorcism.org www.eturning.com www.piramidasunca.ba
blacklist_uri_host 64.15.147.100
blacklist_uri_host bot.tormaxusa.net www.qtechna.si www.clecle.si
blacklist_uri_host www.ninadesign.co.nr constructionfiles.net aecfiles02.com
blacklist_uri_host filetransfer00.com filetransfer01.com filetransfer02.com
blacklist_uri_host filetransfer03.com filetransfer04.com filetransfer05.com
blacklist_uri_host filetransfer06.com filetransfer07.com filetransfer08.com
blacklist_uri_host filetransfer09.com

header URI_HOST_IN_BLACKLIST    eval:check_uri_host_listed('BLACK')
describe URI_HOST_IN_BLACKLIST  Host or domain found in URI is blacklisted
tflags URI_HOST_IN_BLACKLIST    userconf noautolearn
score URI_HOST_IN_BLACKLIST     8

header URI_HOST_IN_WHITELIST    eval:check_uri_host_listed('WHITE')
describe URI_HOST_IN_WHITELIST  Host or domain found in URI is whitelisted
tflags URI_HOST_IN_WHITELIST    userconf nice noautolearn
score URI_HOST_IN_WHITELIST     -10


Example 3:

enlist_uri_host (RCKT) ru !aaa.example.kr cn kr tr
header URI_HOST_RCKT  eval:check_uri_host_listed('RCKT')
score  URI_HOST_RCKT 0.1

enlist_uri_host (RU) ru
header URI_HOST_RU  eval:check_uri_host_listed('RU')
score  URI_HOST_RU  1.8

enlist_uri_host (CN) cn
header URI_HOST_CN  eval:check_uri_host_listed('CN')
score  URI_HOST_CN  1.2

enlist_uri_host (KR) kr
header URI_HOST_KR  eval:check_uri_host_listed('KR')
score  URI_HOST_KR  1.5

enlist_uri_host (TR) tr
header URI_HOST_TR  eval:check_uri_host_listed('TR')
score  URI_HOST_TR  1.5
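Since each named list feeds its own rule, a message collects the score of every rule whose list matches one of its URI hosts. A hedged Python sketch of Example 3, assuming each rule hits at most once per message when any of its URI hosts is listed; the helper names are hypothetical and IP-address entries are left out for brevity. A host under .ru hits both URI_HOST_RCKT and URI_HOST_RU (0.1 + 1.8 = 1.9), while aaa.example.kr escapes the RCKT list through its '!' entry and only hits URI_HOST_KR (1.5).

```python
# Example 3 as data: list name -> (entries, score of the rule consulting it).
EXAMPLE_3 = {
    "RCKT": (["ru", "!aaa.example.kr", "cn", "kr", "tr"], 0.1),
    "RU": (["ru"], 1.8),
    "CN": (["cn"], 1.2),
    "KR": (["kr"], 1.5),
    "TR": (["tr"], 1.5),
}


def listed(entries: list[str], host: str) -> bool:
    """Most specific matching entry decides; a '!' entry yields False."""
    table = {}
    for entry in entries:
        negated = entry.startswith("!")
        table[(entry[1:] if negated else entry).lower()] = not negated
    labels = host.lower().split(".")
    for i in range(len(labels)):  # e.g. aaa.example.kr, example.kr, kr
        suffix = ".".join(labels[i:])
        if suffix in table:
            return table[suffix]
    return False


def uri_host_hits(hosts: list[str]) -> tuple[list[str], float]:
    """List names whose rule hits for a message's URI hosts, and the summed score."""
    hits = [name for name, (entries, _score) in EXAMPLE_3.items()
            if any(listed(entries, h) for h in hosts)]
    return hits, sum((EXAMPLE_3[name][1] for name in hits), 0.0)


if __name__ == "__main__":
    for hosts in (["www.shop.ru"], ["aaa.example.kr"], ["bbb.example.kr"]):
        hits, score = uri_host_hits(hosts)
        print(hosts, hits, round(score, 2))
```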



Here is the corresponding (changed/added) documentation:

=item enlist_uri_host (listname) host ...

Adds one or more host names or domain names to a named list of URI domains.
The named list can then be consulted through a check_uri_host_listed()
eval rule, which takes the list name as an argument. Parentheses around
a list name are literal and required syntax.

Host names may optionally be prefixed by an exclamation mark '!', which
produces false as a result if this entry matches. This makes it easier
to exclude some subdomains when their superdomain is listed, for example:

  enlist_uri_host (MYLIST) !sub1.example.com !sub2.example.com example.com

No wildcards are supported, but subdomains do match implicitly. Lists
are independent. Search for each named list starts by looking up the
full hostname first, then leading fields are progressively stripped off
(e.g.: sub.example.com, example.com, com) until a match is found or we run
out of fields. The first matching entry (the most specific) determines if
a lookup yielded a true (no '!' prefix) or a false ('!'-prefixed) result.

If a URL found in a message contains an IP address in place of a host name,
the given list must specify the exact same IP address (instead of a host name)
in order to match.

Use the delist_uri_host directive to neutralize previous enlist_uri_host
settings. Listnames 'BLACK' and 'WHITE' have their shorthand directives
blacklist_uri_host and whitelist_uri_host and default rules, but are
otherwise not special or reserved.
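The matching rules for named lists can be modelled with a small Python sketch. It is a hedged illustration, not the WLBLEval implementation: the class and method names are hypothetical, names are lowercased on the assumption that matching is case-insensitive, and listed() only approximates what a check_uri_host_listed() eval would decide for a single host. The demo also treats blacklist_uri_host as enlisting into a list named BLACK, as the shorthand description states.

```python
import ipaddress


class UriHostLists:
    """Hypothetical model of independent, named URI host lists."""

    def __init__(self) -> None:
        # list name -> {lowercased host/domain/IP: True, or False if '!'-prefixed}
        self.lists: dict[str, dict[str, bool]] = {}

    def enlist(self, listname: str, *hosts: str) -> None:
        table = self.lists.setdefault(listname, {})
        for host in hosts:
            negated = host.startswith("!")
            name = host[1:] if negated else host
            table[name.lower()] = not negated

    def listed(self, listname: str, host: str) -> bool:
        """Approximate per-host decision for check_uri_host_listed(listname)."""
        table = self.lists.get(listname, {})
        host = host.lower()
        try:
            ipaddress.ip_address(host)
            candidates = [host]  # an IP address must match an entry exactly
        except ValueError:
            labels = host.split(".")
            candidates = [".".join(labels[i:]) for i in range(len(labels))]
        for candidate in candidates:  # most specific candidate first
            if candidate in table:
                return table[candidate]
        return False


if __name__ == "__main__":
    lists = UriHostLists()
    lists.enlist("MYLIST", "!sub1.example.com", "!sub2.example.com", "example.com")
    lists.enlist("BLACK", "64.15.147.100")  # what blacklist_uri_host would add
    print(lists.listed("MYLIST", "www.example.com"))     # listed
    print(lists.listed("MYLIST", "a.sub1.example.com"))  # excluded by '!'
    print(lists.listed("BLACK", "64.15.147.100"))        # exact IP match
```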



=item delist_uri_host [ (listname) ] host ...

Removes one or more specified host names from a named list of URI domains.
Removing an unlisted name is ignored (is not an error). The list name is
optional: if specified, only the named list is affected; otherwise the hosts
are removed from all URI host lists created so far. Parentheses around a list
name are required syntax.

Note that directives in configuration files are processed in sequence;
delist_uri_host only applies to previously listed entries and has
no effect on entries enlisted by yet-to-be-processed directives.

For convenience (similarity to the enlist_uri_host directive), host names
may be prefixed by an exclamation mark, which is stripped off from each
name and has no meaning here.
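A hedged Python sketch of delist_uri_host over a plain dict of named lists. The function names and the dict layout are hypothetical, and the sketch assumes that naming a list which does not exist is simply a no-op. Because each call mutates the current state, a later enlist is unaffected by an earlier delist, mirroring the in-sequence processing described above.

```python
from typing import Optional

# Hypothetical state: list name -> {lowercased host: True, or False if '!'-prefixed}
Lists = dict[str, dict[str, bool]]


def enlist_uri_host(lists: Lists, listname: str, *hosts: str) -> None:
    table = lists.setdefault(listname, {})
    for host in hosts:
        negated = host.startswith("!")
        table[(host[1:] if negated else host).lower()] = not negated


def delist_uri_host(lists: Lists, listname: Optional[str], *hosts: str) -> None:
    """Remove hosts from one named list, or from every list created so far."""
    if listname is None:
        targets = list(lists.values())
    else:
        # Assumption: an unknown list name is silently ignored.
        targets = [lists[listname]] if listname in lists else []
    for host in hosts:
        # A leading '!' is accepted for symmetry with enlist, and stripped.
        name = (host[1:] if host.startswith("!") else host).lower()
        for table in targets:
            table.pop(name, None)  # removing an unlisted name is not an error


if __name__ == "__main__":
    lists: Lists = {}
    enlist_uri_host(lists, "LOW", "geocities.com", "imageshack.us")
    enlist_uri_host(lists, "MED", "imageshack.us")
    delist_uri_host(lists, "LOW", "imageshack.us")   # only LOW is affected
    delist_uri_host(lists, None, "!geocities.com")   # all lists are affected
    print(lists)
```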



=item blacklist_uri_host host-or-domain ...

Is a shorthand for a directive:  enlist_uri_host (BLACK) host ...
Please see directives enlist_uri_host and delist_uri_host for details.



=item whitelist_uri_host host-or-domain ...

Is a shorthand for a directive:  enlist_uri_host (WHITE) host ...
Please see directives enlist_uri_host and delist_uri_host for details.




trunk:
  Bug 6458 - add enlist_uri_host and delist_uri_host conf directives,
  allowing for arbitrarily named URI lists, each associated with
  its own scoring rule
Sending lib/Mail/SpamAssassin/Conf.pm
Sending lib/Mail/SpamAssassin/Plugin/WLBLEval.pm
Committed revision 958790.

[Bug 6458] add blacklist_uri_host, whitelist_uri_host; and A record lookups to URIs in URIDNSBL plugin

Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6458

Mark Martinec <Ma...@ijs.si> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P3
   Target Milestone|Undefined                   |3.4.0
