Posted to users@spamassassin.apache.org by mastered <al...@pizzarelli.it> on 2017/07/15 10:19:54 UTC

Re: ramsonware URI list

Hi Nicola, 

I'm not good at shell scripting, but this might be fine:

1 - Save file into lista.txt

2 - transform lista.txt into SpamAssassin rules:

cat lista.txt | sed s'/http:\/\///' | sed s'/\/.*//' | sed s'/\./\\./g' |
sed s'/^/\//' | sed s'/$/\\b\/i/' | nl | awk '{print "uri;RULE_NR_"$1";"$2"
describe;RULE_NR_"$1";Url;presente;nella;Blacklist;Ramsonware
score;RULE_NR_"$1";5.0" }' > listone.txt ;for i in $(sed -n p listone.txt) ;
do echo "$i" ; done | sed s'/;/ /g' > blacklist.cf 
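
For an input line like http://example.com/bad.exe (just an illustration),
the generated blacklist.cf entry should come out roughly as:

   uri RULE_NR_1 /example\.com\b/i
   describe RULE_NR_1 Url presente nella Blacklist Ramsonware
   score RULE_NR_1 5.0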


If anyone can optimize it, I'm happy.

Alberto.



--
View this message in context: http://spamassassin.1065346.n5.nabble.com/ramsonware-URI-list-tp122939p135313.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.

Re: ramsonware URI list

Posted by RW <rw...@googlemail.com>.
On Sat, 15 Jul 2017 13:13:31 -0500 (CDT)
David B Funk wrote:

> > On Sat, 15 Jul 2017, Antony Stone wrote:

> One observation; that list has over 10,000 entries which means that
> you're going to be adding thousands of additional rules to SA on an
> automated basis.
> 
> Some time in the past other people had worked up automated mechanisms
> to add large numbers of rules derived from example spam messages (Hi
> Chris;) and there were performance issues (significant increase in SA
> load time, memory usage, etc).

I'm not an expert on perl internals, so I may be wide of the mark,
but I would have thought that the most efficient way to do this
using uri rule(s) would be to generate a single regex recursively so
that scanning would be O(log(n)) in the number of entries rather than
O(n). 

You start by stripping the http:// and then make a list of all the
first characters, then for each character you recurse. You end up
with something like

^http://(a(...)|b(...)...|z(...))

Where each of the (...) contains a similar list of alternations to the
top level. 

You can take this a bit further and detect when all the strings in the
current list start with a common sub-string - you can then generate the
equivalent of a patricia trie in regex form.


> Be aware, you may run into that situation. Using a URI-dnsbl avoids
> that risk.

The list contains full URLs; I presume there's a reason for that. For
example:

http://invoiceholderqq.com/85.exe
http://invoiceholderqq.com/87.exe
http://invoiceholderqq.com/93.exe
http://inzt.net/08yhrf3
http://inzt.net/0ftce4
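
For the five entries above, the recursively merged expression described
earlier would come out something like:

   ^http://in(?:voiceholderqq\.com/(?:8[57]|93)\.exe|zt\.net/0(?:8yhrf3|ftce4))

where each shared prefix is matched only once, however many entries
share it.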

Re: ramsonware URI list

Posted by Rob McEwen <ro...@invaluement.com>.
On 7/15/2017 2:13 PM, David B Funk wrote:
> How quickly do stale entries get removed from it?

I randomly sorted this list, then I tried visiting 10 randomly selected 
links. I know that isn't a very large sample size, but it is a strong 
indicator since they were purely randomly chosen. 9 of the 10 links had 
already been taken down. So there might be a lot of stale data in that list.

I also extracted the host names, deleted duplicates, randomly sorted 
those, then ran checks of 500 randomly selected host names against 
SURBL, URIBL, DBL, and ivmURI. The number of hits on all 4 lists was 
shockingly low. But I think that probably has more to do with stale data 
on this URL list (and this is really a URL list, not a URI list) than 
with any lack of effectiveness of these other domain/URI blacklists.
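
That extraction and sampling step is a one-liner along these lines
(GNU sort and shuf assumed):

   sed 's|^http://||; s|/.*||' lista.txt | sort -u | shuf -n 500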

Still, there can be situations where a URI list won't list such a host 
name due to too much collateral damage, yet a URL list that specifically 
lists the entire URL can still be effective.

Because such a URL list would be LESS efficient (due to being 
rules-based), it would be preferable for such a list to have much 
less stale data - and perhaps to focus on the stuff that isn't found 
on any (or very many) of the 4 major URI lists I mentioned, so as to 
keep the data small and focused, for maximum processing efficiency.

-- 
Rob McEwen
http://www.invaluement.com

Re: ramsonware URI list

Posted by David B Funk <db...@engineering.uiowa.edu>.
> On Sat, 15 Jul 2017, Antony Stone wrote:
>
>> On Saturday 15 July 2017 at 11:19:54, mastered wrote:
>> 
>>> Hi Nicola,
>>> 
>>> I'm not good at shell scripting, but this might be fine:
>>> 
>>> 1 - Save file into lista.txt
>>> 
>>> 2 - transform lista.txt into SpamAssassin rules:
>>> 
>>> cat lista.txt | sed s'/http:\/\///' | sed s'/\/.*//' | sed s'/\./\\./g' |
>>> sed s'/^/\//' | sed s'/$/\\b\/i/' | nl | awk '{print 
>>> "uri;RULE_NR_"$1";"$2"
>>> describe;RULE_NR_"$1";Url;presente;nella;Blacklist;Ramsonware
>>> score;RULE_NR_"$1";5.0" }' > listone.txt ;for i in $(sed -n p listone.txt)
>>> ; do echo "$i" ; done | sed s'/;/ /g' > blacklist.cf
[snip..]

One observation; that list has over 10,000 entries which means that you're going 
to be adding thousands of additional rules to SA on an automated basis.

Some time in the past other people had worked up automated mechanisms to add 
large numbers of rules derived from example spam messages (Hi Chris;) and there 
were performance issues (significant increase in SA load time, memory usage, 
etc).
Be aware, you may run into that situation. Using a URI-dnsbl avoids that risk.

I see that list gets updated frequently. How quickly do stale entries get 
removed from it?
I couldn't find a policy statement about that other than the note about the 30 
days retention for the RW_IPBL list.
Checking a random sample of the URLs on that list, the majority of them hit 
404 errors.
If that list grows without bound and isn't periodically pruned of stale entries 
then it will become problematic for automated rule generation.
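
That kind of spot check is easy to script; something along these lines 
(curl and GNU shuf assumed) prints the HTTP status for a random sample:

   shuf -n 20 lista.txt | while read -r url; do
       printf '%s ' "$url"
       curl -s -o /dev/null -m 10 -w '%{http_code}\n' "$url"
   done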

I'm not saying that this isn't an idea worth pursuing, just be aware there may 
be issues.

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: ramsonware URI list

Posted by Axb <ax...@gmail.com>.
On 07/16/17 06:07, Ian Zimmerman wrote:
> But one still needs to signal rbldnsd to reload the data, right?

nope... no need to signal rbldnsd

see -c switch
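
for example (zone name and file paths below are only placeholders):

   rbldnsd -r /var/lib/rbldnsd -b 127.0.0.1/5353 -c 60 ransom.local:dnset:ransom.hosts

with -c 60 it re-checks the data files every 60 seconds and reloads them
by itself when they change, so appending entries is enough.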

Re: ramsonware URI list

Posted by Ian Zimmerman <it...@very.loosely.org>.
On 2017-07-15 12:19, David B Funk wrote:

> Another way to use that data is to extract the hostnames and feed them
> into a local URI-dnsbl.

> Using "rbldnsd" is an easy to maintain, lightweight (low CPU/RAM
> overhead) way to implement a local DNSbl for multiple purposes (EG an
> IP-addr based list for RBLDNSd or host-name based URI-dnsbl).

> The URI-dnsbl has an advantage of being easy to add names (just 'cat'
> them on to the end of the data-file with appropriate suffix) and
> doesn't require a restart of any daemon to take effect.

But one still needs to signal rbldnsd to reload the data, right?

If one has just hostname data or fixed IP address data (no ranges), yet
another option is the "constant database" cdb [1].  I use it a lot for
these purposes.  You can even match domain wildcards, by successively
stripping the leading (most specific) labels of the subject domain and
retrying the match.
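
A rough sketch of that wildcard matching, assuming tinycdb's "cdb -q"
query front end (exit status 0 when the key is found):

   lookup() {
       h="$1"
       while [ -n "$h" ]; do
           cdb -q blacklist.cdb "$h" >/dev/null && { echo "listed: $h"; return 0; }
           case "$h" in
               *.*) h="${h#*.}" ;;   # strip the leading label and retry
               *)   return 1 ;;
           esac
       done
   }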

I am wondering whether (and if not, why not) a similar no-daemon option
exists for CIDR range data.  There are definitely perl modules that
manipulate such data, but none I'm aware of has a built-in compiled,
quickly loaded dataset format.

[1]
https://cr.yp.to/cdb.html
-- 
Please don't Cc: me privately on mailing lists and Usenet,
if you also post the followup to the list or newsgroup.
Do obvious transformation on domain to reply privately _only_ on Usenet.

Re: ramsonware URI list

Posted by David B Funk <db...@engineering.uiowa.edu>.
On Sat, 15 Jul 2017, Antony Stone wrote:

> On Saturday 15 July 2017 at 11:19:54, mastered wrote:
>
>> Hi Nicola,
>>
>> I'm not good at shell scripting, but this might be fine:
>>
>> 1 - Save file into lista.txt
>>
>> 2 - transform lista.txt into SpamAssassin rules:
>>
>> cat lista.txt | sed s'/http:\/\///' | sed s'/\/.*//' | sed s'/\./\\./g' |
>> sed s'/^/\//' | sed s'/$/\\b\/i/' | nl | awk '{print "uri;RULE_NR_"$1";"$2"
>> describe;RULE_NR_"$1";Url;presente;nella;Blacklist;Ramsonware
>> score;RULE_NR_"$1";5.0" }' > listone.txt ;for i in $(sed -n p listone.txt)
>> ; do echo "$i" ; done | sed s'/;/ /g' > blacklist.cf
>>
>>
>> If anyone can optimize it, I'm happy.
>
> My first comment would be "useless use of cat" :)
>
> My second comment would be that you can combine sed commands into a single
> string, separated by ; so that you only have to call sed itself once at the
> start of all that:
>
> sed "s'/http:\/\///'; s'/\/.*//'; s'/\./\\./g'; s'/^/\//'; s'/$/\\b\/i/'"
> lista.txt | nl .....

Another observation/optimization: use perl's alternative pattern-match 
delimiter syntax to avoid delimiter collision (e.g. "m!...!").

The following two regexes are functionally equivalent but one is easier to 
write/read:

   /http:\/\/site\.com\/this\/that\/the\/other\//i

   m!http://site\.com/this/that/the/other/!i

Second one avoids the "Leaning toothpick syndrome" 
https://en.wikipedia.org/wiki/Leaning_toothpick_syndrome

Another way to use that data is to extract the hostnames and feed them into a 
local URI-dnsbl.
Using "rbldnsd" is an easy to maintain, lightweight (low CPU/RAM overhead) way 
to implement a local DNSbl for multiple purposes (EG an IP-addr based list for 
RBLDNSd or host-name based URI-dnsbl).
The URI-dnsbl has an advantage of being easy to add names (just 'cat' them on to 
the end of the data-file with appropriate suffix) and doesn't require a restart 
of any daemon to take effect.
Clearly it has a greater risk of FPs than a targeted rule that matches on the 
specific URL of the malware. However, if the site was purpose-created by blackhats 
to disseminate malware, or is a legitimate site that has been compromised and isn't 
being maintained, then there's a high probability that it will be (ab)used again 
for other payloads. In that case blacklisting the host name catches all future 
garbage too.
IMHO: any site on that list with more than 3 entries or a registration age of 
less than a year is fair game for URIdnsbl listing.
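
On the SpamAssassin side, wiring up such a local zone only takes a few 
lines; the zone and rule names here are just placeholders:

   urirhsbl   LOCAL_RANSOM_URIBL  ransom.local.  A
   body       LOCAL_RANSOM_URIBL  eval:check_uridnsbl('LOCAL_RANSOM_URIBL')
   describe   LOCAL_RANSOM_URIBL  Hostname listed in local ransomware URI DNSBL
   tflags     LOCAL_RANSOM_URIBL  net
   score      LOCAL_RANSOM_URIBL  5.0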

Looking at that data there are clearly several patterns that could be used to 
create targeted rules.


-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: ramsonware URI list

Posted by mastered <al...@pizzarelli.it>.
Ahuhaauahu ok ok

Thank you for the reply.





--
View this message in context: http://spamassassin.1065346.n5.nabble.com/ramsonware-URI-list-tp122939p135315.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.

Re: ramsonware URI list

Posted by Dianne Skoll <df...@roaringpenguin.com>.
My only comment on this is that shell scripting is a completely inappropriate
language to use for this.  Use a real language like Perl, Python, Ruby, or
whatever.

Regards,

Dianne.

Re: ramsonware URI list

Posted by Martin Gregorie <ma...@gregorie.org>.
On Sat, 2017-07-15 at 09:59 -0700, Ian Zimmerman wrote:
> On 2017-07-15 11:59, Antony Stone wrote:
> 
> > Maybe other people have further optimisations.
> 
> With awk already part of the pipeline, all those seds are screaming
> for
> a vacation.
> 
Indeed. I think the whole job can be done fairly easily with a single
awk script. I didn't look at the input (have parts of it appeared on
this list?), which makes it hard to work out what the entire pipeline
does. However, the more I look at it, the more it looks as if awk's
default action of chopping each line into words, combined with the awk
functions that use regexes to modify words - gsub() and friends -
should simplify the whole exercise.
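
A rough sketch of that single-awk approach (rule names and the describe
text are only placeholders; input assumed to be one http://host/path URL
per line):

   # makerules.awk
   {
       host = $0
       sub(/^https?:\/\//, "", host)   # drop the scheme
       sub(/\/.*$/, "", host)          # drop the path, keep the hostname
       gsub(/\./, "\\\\.", host)       # escape literal dots for the regex
       printf "uri      RULE_NR_%d /%s\\b/i\n", NR, host
       printf "describe RULE_NR_%d URL listed in the ransomware blacklist\n", NR
       printf "score    RULE_NR_%d 5.0\n", NR
   }

Run as: awk -f makerules.awk lista.txt > blacklist.cf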

To the OP: if you want to raise your game with sed and awk, about the
best thing you can do is to get the O'Reilly "sed & awk" book by Dale
Dougherty - it's a real eye-opener and much easier to read and
understand than the manpages, if only because it's better organised and
includes a lot of example code.


Martin


Re: ramsonware URI list

Posted by Ian Zimmerman <it...@very.loosely.org>.
On 2017-07-15 11:59, Antony Stone wrote:

> Maybe other people have further optimisations.

With awk already part of the pipeline, all those seds are screaming for
a vacation.

Also, isn't the following command just a no-op?

sed -n p

A couple of quick tests failed to detect any difference from cat ;-)

-- 
Please don't Cc: me privately on mailing lists and Usenet,
if you also post the followup to the list or newsgroup.
Do obvious transformation on domain to reply privately _only_ on Usenet.

Re: ramsonware URI list

Posted by Antony Stone <An...@spamassassin.open.source.it>.
On Saturday 15 July 2017 at 11:19:54, mastered wrote:

> Hi Nicola,
> 
> I'm not good at shell scripting, but this might be fine:
> 
> 1 - Save file into lista.txt
> 
> 2 - transform lista.txt into SpamAssassin rules:
> 
> cat lista.txt | sed s'/http:\/\///' | sed s'/\/.*//' | sed s'/\./\\./g' |
> sed s'/^/\//' | sed s'/$/\\b\/i/' | nl | awk '{print "uri;RULE_NR_"$1";"$2"
> describe;RULE_NR_"$1";Url;presente;nella;Blacklist;Ramsonware
> score;RULE_NR_"$1";5.0" }' > listone.txt ;for i in $(sed -n p listone.txt)
> ; do echo "$i" ; done | sed s'/;/ /g' > blacklist.cf
> 
> 
> If anyone can optimize it, I'm happy.

My first comment would be "useless use of cat" :)

My second comment would be that you can combine sed commands into a single 
string, separated by ; so that you only have to call sed itself once at the 
start of all that:

sed "s'/http:\/\///'; s'/\/.*//'; s'/\./\\./g'; s'/^/\//'; s'/$/\\b\/i/'" 
lista.txt | nl .....

My only other comment is that you might want to adjust the spelling of 
Ransomware :)

Maybe other people have further optimisations.


Antony.

-- 
The gravitational attraction exerted by a single doctor at a distance of 6 
inches is roughly twice that of Jupiter at its closest point to the Earth.

                                                   Please reply to the list;
                                                         please *don't* CC me.