Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/04/26 05:45:12 UTC

[Bug 3305] New: Are cid.* urls useful to return from get_uri_list for URIDNSBL.pm ?

http://bugzilla.spamassassin.org/show_bug.cgi?id=3305

           Summary: Are cid.* urls useful to return from get_uri_list for
                    URIDNSBL.pm ?
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Libraries
        AssignedTo: spamassassin-dev@incubator.apache.org
        ReportedBy: yusufg@outblaze.com


Hi. URIDNSBL.pm currently uses SA's get_uri_list to get the list of URIs
in a message. The current regex also seems to put URIs of the form
cid:random_characters in that list.

cid:.* seems to refer to Content-IDs, i.e. attachments in the same message.
When these URIs are run through uri_to_domain, they come back unchanged,
as cid:.*

Currently, SURBL relies on get_uri_list to grab the list of domains, but
some URIs are not appropriate as a basis for extracting domains. If that
list could be cut down, the pool from which the random selection is made
would be more interesting.

E.g., I could write a message with maybe 25-30 cid:.* URLs and one
real spamvertised URL. The probability of URIDNSBL.pm picking the
spamvertised URL would be higher if the noise from the cid:.* URLs and
other uninteresting URLs were removed.
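
To put a rough number on the dilution, here is a small sketch in Python
(not SA's Perl). It assumes the plugin looks up a uniformly random sample
of at most max_lookups distinct URIs from the list; max_lookups and the
function name are hypothetical and not taken from URIDNSBL.pm.

```python
from fractions import Fraction


def chance_real_uri_is_checked(noise_uris: int, max_lookups: int) -> Fraction:
    """Probability that the one real URI lands in a uniform random sample.

    noise_uris  -- number of uninteresting URIs (e.g. cid:...) in the message
    max_lookups -- hypothetical cap on how many URIs get looked up
    """
    total = noise_uris + 1  # the noise plus the single spamvertised URI
    if max_lookups >= total:
        return Fraction(1)  # everything is checked, so nothing is missed
    # Sampling max_lookups items without replacement from total items
    # includes any one given item with probability max_lookups / total.
    return Fraction(max_lookups, total)
```

Under that assumption, 29 cid: URIs and a cap of 10 lookups would leave
the real URI only a 1-in-3 chance of being checked; with the noise
removed, it would always be checked.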

Eric Kolve, author of SpamcopURI.pm, wrote this on the SURBL list:
--
I use URI to do all the URI parsing and then check to see if it
has a host method, which only schemes such as http, ftp, gopher, etc.
actually implement.  The cid scheme translates to an internal _foreign
URI type, which has no host implementation.
--

Maybe URIDNSBL.pm should apply a filter to get_uri_list that only passes
URIs whose scheme has a host implementation, and draw the random set that is
fed to uri_to_domain from that filtered list.
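
The filtering idea could look roughly like this. It is only a sketch in
Python using the standard urllib.parse module, not the Perl URI module
Eric describes, and uris_with_host is a hypothetical name rather than
anything in SA.

```python
from urllib.parse import urlsplit


def uris_with_host(uris):
    """Keep only URIs whose parsed form carries a non-empty host."""
    kept = []
    for uri in uris:
        try:
            host = urlsplit(uri).hostname
        except ValueError:
            # Malformed input (e.g. a broken IPv6 literal): skip it.
            continue
        if host:
            kept.append(uri)
    return kept
```

With this, cid: and mailto: URIs drop out because they have no
"//host" part, while http and ftp URIs pass through. Note that it also
drops scheme-less strings such as "www.example.com", so a real filter
would need to decide how to handle those.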



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.