Posted to users@spamassassin.apache.org by phuong hanu <ph...@gmail.com> on 2011/03/17 11:01:42 UTC

Script to generate whitelist based on INCOMING email????

	
Hi,

I've just read your post on Nabble.

I'm sending this message to ask about a problem that I have to solve now.
I have a database of email on my Linux virtual machine. The table includes
fields such as ID, Spam, Data, Time, Sender_add, sender_ip,
sender_domain, ...

I'm doing a project on automatic whitelisting, so the data preprocessing is
very important. My problem is that I still don't know how to generate a
database for my whitelist from that database, because one domain can include
many IP addresses. My job is to group them all (by a script, maybe).

For example: gmail.com: 38.98.127.148, 74.125.46.29, 74.125.46.30, .....
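
The grouping step described above could be sketched roughly as follows. This
is only an illustration under assumptions: an in-memory SQLite table stands in
for the MySQL database, the helper name group_ips_by_domain is hypothetical,
and the sample rows are made up. Only the table name and the two column names
come from the post.

```python
import sqlite3
from collections import defaultdict


def group_ips_by_domain(rows):
    """Map each sender domain to the sorted list of distinct IPs seen for it."""
    grouped = defaultdict(set)
    for sender_ip, sender_domain in rows:
        grouped[sender_domain.lower()].add(sender_ip)
    return {domain: sorted(ips) for domain, ips in grouped.items()}


# Assumption: in-memory SQLite replaces the real MySQL connection here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emailsl (sender_ip TEXT, sender_domain TEXT)")
conn.executemany(
    "INSERT INTO emailsl VALUES (?, ?)",
    [
        ("74.125.46.29", "gmail.com"),
        ("74.125.46.30", "gmail.com"),
        ("74.125.46.29", "gmail.com"),  # repeated pair, collapsed by the set
        ("192.0.2.10", "example.org"),  # made-up documentation address
    ],
)

rows = conn.execute("SELECT sender_ip, sender_domain FROM emailsl").fetchall()
domain_ips = group_ips_by_domain(rows)
for domain, ips in sorted(domain_ips.items()):
    print(f"{domain}: {', '.join(ips)}")
```

In MySQL itself, GROUP_CONCAT(DISTINCT sender_ip) combined with GROUP BY
sender_domain should give a similar one-row-per-domain listing without any
script.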

From those IP-domain pairs, I have to find a threshold to figure out which
IPs are used for sending spam. The threshold could be "3 days" (for example),
because spammers will only use an IP to spread spam for such a short time.
After removing the illegal IPs, we have the final whitelist to apply in the
email system.
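
One possible reading of this threshold idea, as a minimal sketch: keep a
domain-IP pair only if the time between its first and last appearance is
longer than the threshold. The function name build_whitelist, the record
layout (domain, IP, timestamp) and the sample data are assumptions made for
illustration; whether such a heuristic actually separates spam sources from
legitimate ones is not established here.

```python
from datetime import datetime, timedelta


def build_whitelist(records, threshold=timedelta(days=3)):
    """Keep (domain, ip) pairs whose first-to-last-seen span exceeds threshold."""
    first_last = {}
    for domain, ip, seen_at in records:
        key = (domain.lower(), ip)
        first, last = first_last.get(key, (seen_at, seen_at))
        first_last[key] = (min(first, seen_at), max(last, seen_at))

    whitelist = {}
    for (domain, ip), (first, last) in first_last.items():
        if last - first > threshold:
            whitelist.setdefault(domain, []).append(ip)
    return {domain: sorted(ips) for domain, ips in whitelist.items()}


# Made-up records using documentation addresses.
records = [
    ("example.com", "192.0.2.10", datetime(2011, 3, 1, 9, 0)),
    ("example.com", "192.0.2.10", datetime(2011, 3, 15, 9, 0)),   # 14-day span
    ("example.com", "198.51.100.7", datetime(2011, 3, 2, 8, 0)),
    ("example.com", "198.51.100.7", datetime(2011, 3, 3, 8, 0)),  # 1-day span
]
whitelist = build_whitelist(records)
print(whitelist)
```

With the default 3-day threshold, only the pair seen over a 14-day span
survives; the pair seen over a single day is dropped.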

So all I care about are sender_ip and sender_domain. When I use a MySQL
command to list the rows in the table, the result is more than 46,000 rows
>.< (SELECT sender_ip, sender_domain FROM emailsl;)
---> I cannot do it manually by reading each line and noting down on paper
"what domain" has "what IP".

That's why I'm asking you for a method to solve this pre-problem. This step
in data preprocessing is very important because it creates the DB for my
whitelist in any email system. After that, I'll create a plugin for
SpamAssassin to whitelist in the email system automatically, based on the
list that I preprocessed.

What I have: email DB, Linux virtual machine, MySQL

What I want: build a DB which shows the pairs of sender domain-legal IP
(crossing out the domain-illegal IP pairs based on the threshold)

Hope you see my point and can help me with that.
-- 
View this message in context: http://old.nabble.com/Script-to-generate-whitelist-based-on-outgoing-email-tp15254287p31171257.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


Re: Script to generate whitelist based on INCOMING email????

Posted by phuong hanu <ph...@gmail.com>.

http://old.nabble.com/file/p31192159/db.rar db.rar 

In fact, I have an email DB (see the attachment). Now I want to generate
my own DB, which stores the info about a domain and its legal IP addresses
(this is my whitelist).

I think there are 2 ways to do that:

1. Contact the domain name owners --> ask for the IP addresses of each
domain, but I think that's mission impossible.

2. Build the DB based on the info extracted from email headers ------> my
question. Do you get my point?
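
As a rough sketch of option 2, the domain-IP pair could be pulled from a raw
message with Python's standard email module. The function name sender_pair
and the sample message are hypothetical. The sketch assumes the top-most
Received header was added by your own mail server and carries the connecting
IPv4 address in square brackets; Received headers further down can be forged
by the sender, so they are ignored.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Matches an IPv4 address written in square brackets, e.g. [192.0.2.25].
IPV4_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def sender_pair(raw_message):
    """Return (sender_domain, sender_ip), or None if either part is missing."""
    msg = message_from_string(raw_message)

    # Domain part of the From: address.
    _, address = parseaddr(msg.get("From", ""))
    _, sep, domain = address.rpartition("@")
    if not sep or not domain:
        return None

    # Only the first (top-most) Received header is inspected.
    received = msg.get_all("Received") or []
    match = IPV4_IN_BRACKETS.search(received[0]) if received else None
    if match is None:
        return None
    return domain.lower(), match.group(1)


# Made-up message using documentation names and addresses.
raw = (
    "Received: from mail.example.com (mail.example.com [192.0.2.25])\n"
    "\tby mx.example.net with ESMTP; Thu, 17 Mar 2011 11:01:42 +0000\n"
    "From: Alice <alice@example.com>\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)
print(sender_pair(raw))
```

The From: header is also under the sender's control, so a pair built this way
only says which IP claimed to send for which domain.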
-- 
View this message in context: http://old.nabble.com/Script-to-generate-whitelist-based-on-outgoing-email-tp15254287p31192159.html


Re: Script to generate whitelist based on INCOMING email????

Posted by Martin Gregorie <ma...@gregorie.org>.
On Sat, 2011-03-19 at 07:37 -0700, phuong hanu wrote:
> I'm just having difficulty creating my list (whitelist) from the
> email DB.
> 
> You know, we must have our own whitelist before using any technique (plugin,
> service) to prevent spam based on that list.
> 
Before we can help we need to know exactly what you are trying to do and
how you are storing mail in the 'email db'.

This is not clear from what you've just written. For starters, a
whitelist is *NOT* used to 'prevent spam'. A whitelist is used to
prevent mail from members of the whitelist from being treated as spam
even if that is what it is.


Martin




Re: Script to generate whitelist based on INCOMING email????

Posted by phuong hanu <ph...@gmail.com>.
I'm just having difficulty creating my list (whitelist) from the
email DB.

You know, we must have our own whitelist before using any technique (plugin,
service) to prevent spam based on that list.

So I really need help from those of you who have experience.
-- 
View this message in context: http://old.nabble.com/Script-to-generate-whitelist-based-on-outgoing-email-tp15254287p31189121.html


Re: Script to generate whitelist based on INCOMING email????

Posted by Martin Gregorie <ma...@gregorie.org>.
On Thu, 2011-03-17 at 23:21 -0700, phuong hanu wrote:
> Actually, the problem isn't with the MySQL command. I just want a
> suggestion about a script that can extract info from the email headers in
> my email DB to create a list (whitelist) for future use.
> 
IMO doing what you are asking about is asking for trouble. The only
auto-whitelist I'd trust would be built from the recipients of outgoing
mail.


Martin



Re: Script to generate whitelist based on INCOMING email????

Posted by phuong hanu <ph...@gmail.com>.
Actually, the problem isn't with the MySQL command. I just want a
suggestion about a script that can extract info from the email headers in
my email DB to create a list (whitelist) for future use.

--> whitelist process. I'm working on the plugin, but that's not the process
of generating the DB for my whitelist.

Before testing the plugin, I have to have my own whitelist.

-- 
View this message in context: http://old.nabble.com/Script-to-generate-whitelist-based-on-outgoing-email-tp15254287p31178856.html


Re: Script to generate whitelist based on INCOMING email????

Posted by Bowie Bailey <Bo...@BUC.com>.
On 3/17/2011 6:01 AM, phuong hanu wrote:
> 	
> Hi,
>
> I've just read your post on Nabble.
>
> I'm sending this message to ask about a problem that I have to solve now.
> I have a database of email on my Linux virtual machine. The table includes
> fields such as ID, Spam, Data, Time, Sender_add, sender_ip,
> sender_domain, ...
>
> I'm doing a project on automatic whitelisting, so the data preprocessing is
> very important. My problem is that I still don't know how to generate a
> database for my whitelist from that database, because one domain can include
> many IP addresses. My job is to group them all (by a script, maybe).
>
> For example: gmail.com: 38.98.127.148, 74.125.46.29, 74.125.46.30, .....
>
> From those IP-domain pairs, I have to find a threshold to figure out which
> IPs are used for sending spam. The threshold could be "3 days" (for example),
> because spammers will only use an IP to spread spam for such a short time.
> After removing the illegal IPs, we have the final whitelist to apply in the
> email system.
>
> So all I care about are sender_ip and sender_domain. When I use a MySQL
> command to list the rows in the table, the result is more than 46,000 rows
> >.< (SELECT sender_ip, sender_domain FROM emailsl;)
> ---> I cannot do it manually by reading each line and noting down on paper
> "what domain" has "what IP".
>
> That's why I'm asking you for a method to solve this pre-problem. This step
> in data preprocessing is very important because it creates the DB for my
> whitelist in any email system. After that, I'll create a plugin for
> SpamAssassin to whitelist in the email system automatically, based on the
> list that I preprocessed.
>
> What I have: email DB, Linux virtual machine, MySQL
>
> What I want: build a DB which shows the pairs of sender domain-legal IP
> (crossing out the domain-illegal IP pairs based on the threshold)
>
> Hope you see my point and can help me with that.

That's a very open-ended question and, other than the comment about
creating a plugin, it is mostly off-topic here.  Questions related to
pulling data from your DB would probably be better asked in a MySQL
forum.  Once you get a bit farther and are trying to integrate with SA,
we can help with that.

In either case, try to ask simple, specific questions.  You will get far
more responses that way.

-- 
Bowie