Posted to users@spamassassin.apache.org by Ryan Coleman <ry...@cwis.biz> on 2015/10/19 23:21:42 UTC
Learning only on read emails?
Ok so it was established I don’t have a ham scan (correct). So how do I do it so that it only scans the read emails in a MAILDIR?
—
Ryan
Re: Learning only on read emails?
Posted by Dave Warren <da...@hireahit.com>.
On 2015-10-19 14:41, Ryan Coleman wrote:
> Actually it makes absolute sense since I dump my spam into a folder to be scanned as spam and anything that is still in my inbox, and read, is indeed ham.
>
> I just have to re-investigate the ./new and ./cur folders to make sure they will operate how I want. But if the email was delivered to my phone and it moves (but not read) then it’s not an option.
I agree completely, this has proved to be quite useful here. In my case,
I scavenge the "Archive" folder of various accounts for my users who use
the Archive functionality of modern mail clients, in particular, mobile
clients and Thunderbird.
The concept of "Copy" doesn't exist on most mobile clients, and even
when it does, most users simply won't be bothered to copy non-spam with
any regularity, so at least for me, I've had far better success dealing
with messages on an automated basis than trying to influence user behaviour.
At this point I only implement this for specially selected users (and of
course, only with a user's consent, but since I approach users when
they've had a message get misclassified, they're usually happy to help).
For users who don't use an "Archived" folder, capturing
left-in-the-inbox, marked-as-read would be useful, but it hasn't been
worth the time to implement (yet). Unfortunately I don't use maildir, so
any tools I've created here aren't useful in the general case and are
really platform and environment specific.
Also, if a user later takes a message from their archive and places it
into the spam folder, I do have a tool that detects the duplicate and
purges it from the corpus. Currently I think I just delete the whole
message, although I did actually write code to detect which was newer
and trust the most recent decision made by the user, but ultimately I
decided it was safer to just delete it completely.
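The message does not say how the duplicate is detected, so the following is only a rough sketch under the assumption that matching Message-ID headers identify the same mail. The directory layout (one message per file in a ham corpus and a spam corpus) and the function names are hypothetical. It returns every file whose Message-ID appears in both corpora, so the caller can purge them all, mirroring the "delete it completely" choice described above.

```python
from email.parser import BytesParser
from pathlib import Path


def message_ids(folder):
    """Map each Message-ID found in `folder` to the files that carry it."""
    ids = {}
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            # Only the headers are needed, so skip parsing the body.
            headers = BytesParser().parse(fh, headersonly=True)
        mid = headers.get("Message-ID")
        if mid:
            ids.setdefault(str(mid).strip(), []).append(path)
    return ids


def conflicting_files(ham_dir, spam_dir):
    """Return files whose Message-ID is present in both corpora."""
    ham = message_ids(ham_dir)
    spam = message_ids(spam_dir)
    conflicts = []
    for mid in sorted(ham.keys() & spam.keys()):
        conflicts.extend(ham[mid] + spam[mid])
    return conflicts
```

Deleting the returned paths is left to the caller, so the result can be reviewed first.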
--
Dave Warren
http://www.hireahit.com/
http://ca.linkedin.com/in/davejwarren
Re: Learning only on read emails?
Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 19.10.15 17:30, Ryan Coleman wrote:
>I actually get THOUSANDS of emails a day. Most of it is spam. And not
> caught by SA. And when it is put into the spam folder it is not learned.
>
>But, hey, you know… you obviously know me better than me so why don’t you
> have this back and forth publicly with yourself and keep me out of it.
I would still be careful about training all read mail as ham... there are cases
where you don't notice it's spam (some are hard to distinguish), forget
to move it to spam or don't have time to do it...
>>>> P.S.: no need for reply-all on a mailing list
>>>
>>> Habit. Besides, there’s no reply-to header rewrite on this mailing list. If I hit reply it goes only to you
that's why there are list headers and why some MUAs support them.
>> On Oct 19, 2015, at 5:25 PM, Reindl Harald <h....@thelounge.net> wrote:
>> nonsense - there are list headers and if you use a broken client just remove anything but the list-address
>Wow, you really are an asshole, huh?
>
>I looked at the headers before I said anything. Broken Client? No… Apple Mail. There are lists where it works because it EXISTS IN THE HEADERS.
not that I like him, but he's right that a mail client that is not capable of
handling mailing lists is kind of broken...
Lists should not break mail by inserting a reply-to header, because it's
supposed to be inserted by a client, not by a mailing list.
and the fact that it's made by Apple doesn't make it a good client.
Microsoft and Apple tend to screw things their way just because they are
huge companies and don't care about (even backwards) compatibility and
correctness
>Speaking of learning spam… your email address will be joining the blacklist very soon.
just be careful when blacklisting and spam-training...
--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Nothing is fool-proof to a talented fool.
Re: Learning only on read emails?
Posted by Reindl Harald <h....@thelounge.net>.
On 20.10.2015 at 00:30, Ryan Coleman wrote:
> I actually get THOUSANDS of emails a day. Most of it is spam. And not caught by SA. And when it is put into the spam folder it is not learned
sounds like you are training the wrong bayes database (a recurring
problem for newcomers), so better look at the output of "sa-learn --dump
magic" *running as the user SA is running as*, not root or someone else,
and in general you need to run "sa-learn" the same way
however, if you get thousands of spam messages per day for a single user
your setup is questionable; we have a catch rate of 99.9% for a few
hundred users, while only 150-250 spam messages make it to the content
filter and 150000-250000 per day are caught by RBL scoring, postscreen,
PTR checks, HELO checks etc.
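As a small illustration of the advice about running sa-learn as the same user SpamAssassin runs as, here is a sketch that wraps the command in `sudo -u`. The helper names are made up for this example, and the account name passed as `run_as` (for instance "spamd") is an assumption that depends on the local setup.

```python
import subprocess


def sa_learn_command(args, run_as=None):
    """Build an sa-learn command line, optionally wrapped in `sudo -u <user>`."""
    cmd = ["sa-learn", *args]
    if run_as:
        # Run under the account that SpamAssassin itself uses, so the
        # same bayes database is read and written.
        cmd = ["sudo", "-u", run_as, *cmd]
    return cmd


def dump_magic(run_as=None):
    """Run `sa-learn --dump magic` and return its text output."""
    result = subprocess.run(
        sa_learn_command(["--dump", "magic"], run_as),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Comparing the output of `dump_magic()` with and without `run_as` is one way to see whether two different databases are being trained.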
Re: Learning only on read emails?
Posted by Ryan Coleman <ry...@cwis.biz>.
> On Oct 19, 2015, at 5:25 PM, Reindl Harald <h....@thelounge.net> wrote:
>
>
>
> On 20.10.2015 at 00:17, Ryan Coleman wrote:
>>> On Oct 19, 2015, at 4:45 PM, Reindl Harald <h....@thelounge.net> wrote:
>>> On 19.10.2015 at 23:41, Ryan Coleman wrote:
>>>> Actually it makes absolute sense since I dump my spam into a folder to be scanned as spam and anything that is still in my inbox, and read, is indeed ham.
>>>
>>> do what you want - everybody else in this world selects messages and does not rely on a read state, where a spam message you are not sure about can easily get trained as ham
>>
>> I sort my emails out at the end of each calendar year. Training a ham folder at that point is, well, pointless. I will try to find another way.
>
> then you don't really get mail; getting a few hundred mails each day and sorting them out at the end of the year won't work - *anyway*, that has nothing to do with copying specific training mails to just two folders
I actually get THOUSANDS of emails a day. Most of it is spam. And not caught by SA. And when it is put into the spam folder it is not learned.
But, hey, you know… you obviously know me better than me so why don’t you have this back and forth publicly with yourself and keep me out of it.
> not possible?
> surely!
>
> 0 47764 SPAM
> 0 20371 HAM
> 0 2229900 TOKEN
>
>>> a sane training setup is specific spam / ham folders, a script which retrieves those messages via IMAP, stores them on the mailfilter machine for a later rebuild of bayes and deletes them from the IMAP folder at the end
>>>
>>> P.S.: no need for reply-all on a mailing list
>>
>> Habit. Besides, there’s no reply-to header rewrite on this mailing list. If I hit reply it goes only to you
>
> nonsense - there are list headers and if you use a broken client just remove anything but the list-address
Wow, you really are an asshole, huh?
I looked at the headers before I said anything. Broken Client? No… Apple Mail. There are lists where it works because it EXISTS IN THE HEADERS.
Speaking of learning spam… your email address will be joining the blacklist very soon.
> list-help: <ma...@spamassassin.apache.org>
> list-unsubscribe: <ma...@spamassassin.apache.org>
> List-Post: <ma...@spamassassin.apache.org>
> List-Id: <users.spamassassin.apache.org>
>
Re: Learning only on read emails?
Posted by Reindl Harald <h....@thelounge.net>.
On 20.10.2015 at 00:17, Ryan Coleman wrote:
>> On Oct 19, 2015, at 4:45 PM, Reindl Harald <h....@thelounge.net> wrote:
>> On 19.10.2015 at 23:41, Ryan Coleman wrote:
>>> Actually it makes absolute sense since I dump my spam into a folder to be scanned as spam and anything that is still in my inbox, and read, is indeed ham.
>>
>> do what you want - everybody else in this world selects messages and does not rely on a read state, where a spam message you are not sure about can easily get trained as ham
>
> I sort my emails out at the end of each calendar year. Training a ham folder at that point is, well, pointless. I will try to find another way.
then you don't really get mail; getting a few hundred mails each day and
sorting them out at the end of the year won't work - *anyway*, that has
nothing to do with copying specific training mails to just two folders
not possible?
surely!
0 47764 SPAM
0 20371 HAM
0 2229900 TOKEN
>> a sane training setup is specific spam / ham folders, a script which retrieves those messages via IMAP, stores them on the mailfilter machine for a later rebuild of bayes and deletes them from the IMAP folder at the end
>>
>> P.S.: no need for reply-all on a mailing list
>
> Habit. Besides, there’s no reply-to header rewrite on this mailing list. If I hit reply it goes only to you
nonsense - there are list headers and if you use a broken client just
remove anything but the list-address
list-help: <ma...@spamassassin.apache.org>
list-unsubscribe: <ma...@spamassassin.apache.org>
List-Post: <ma...@spamassassin.apache.org>
List-Id: <users.spamassassin.apache.org>
Re: Learning only on read emails?
Posted by Ryan Coleman <ry...@cwis.biz>.
> On Oct 19, 2015, at 4:45 PM, Reindl Harald <h....@thelounge.net> wrote:
> On 19.10.2015 at 23:41, Ryan Coleman wrote:
>> Actually it makes absolute sense since I dump my spam into a folder to be scanned as spam and anything that is still in my inbox, and read, is indeed ham.
>
> do what you want - everybody else in this world selects messages and does not rely on a read state, where a spam message you are not sure about can easily get trained as ham
I sort my emails out at the end of each calendar year. Training a ham folder at that point is, well, pointless. I will try to find another way.
> a sane training setup is specific spam / ham folders, a script which retrieves those messages via IMAP, stores them on the mailfilter machine for a later rebuild of bayes and deletes them from the IMAP folder at the end
>
> P.S.: no need for reply-all on a mailing list
Habit. Besides, there’s no reply-to header rewrite on this mailing list. If I hit reply it goes only to you. Reply to all, as a keystroke, is the only way I can make sure it stays on the record.
Re: Learning only on read emails?
Posted by Reindl Harald <h....@thelounge.net>.
On 19.10.2015 at 23:41, Ryan Coleman wrote:
> Actually it makes absolute sense since I dump my spam into a folder to be scanned as spam and anything that is still in my inbox, and read, is indeed ham.
do what you want - everybody else in this world selects messages
and does not rely on a read state, where a spam message you are not
sure about can easily get trained as ham
a sane training setup is specific spam / ham folders, a script which retrieves
those messages via IMAP, stores them on the mailfilter machine for a later
rebuild of bayes and deletes them from the IMAP folder at the end
P.S.: no need for reply-all on a mailing list
> I just have to re-investigate the ./new and ./cur folders to make sure they will operate how I want. But if the email was delivered to my phone and it moves (but not read) then it’s not an option.
>
>> On Oct 19, 2015, at 4:35 PM, Reindl Harald <h....@thelounge.net> wrote:
>>
>>
>>
>> On 19.10.2015 at 23:21, Ryan Coleman wrote:
>>> Ok so it was established I don’t have a ham scan (correct). So how do I do it so that it only scans the read emails in a MAILDIR?
>>
>> that makes no sense
>>
>> train a specific ham and a specific spam folder where you move messages you are sure how to classify, not a generic inbox just because you have read a message
Re: Learning only on read emails?
Posted by Ryan Coleman <ry...@cwis.biz>.
I figured out a way to get the spamd user to scan the spam folders. Definitely helping.
Learning emails in the inbox that have been read as ham is next on the list.
> On Oct 20, 2015, at 9:39 AM, RW <rw...@googlemail.com> wrote:
>
> On Tue, 20 Oct 2015 08:29:27 -0500
> Ryan Coleman wrote:
>
>>
>>> On Oct 20, 2015, at 8:21 AM, RW <rw...@googlemail.com> wrote:
>>>
>>> On Tue, 20 Oct 2015 15:14:42 +0300
>>> Jari Fredriksson wrote:
>>>
>>>> On 10/20/2015 12:41 AM, Ryan Coleman wrote:
>>>>> Actually it makes absolute sense since I dump my spam into a
>>>>> folder to be scanned as spam and anything that is still in my
>>>>> inbox, and read, is indeed ham.
>>>>>
>>>>> I just have to re-investigate the ./new and ./cur folders to make
>>>>> sure they will operate how I want. But if the email was delivered
>>>>> to my phone and it moves (but not read) then it’s not an option.
>>>>
>>>> cur and new folders work as supposed when the IMAP server is
>>>> Courier, but NOT when you use Dovecot.
>>>
>>> How does it not work as expected?
>>
>> I haven’t seen anything appear in the “new” folder, to be honest.
>
> Bear in mind that the "new" directory is there for mail that's been
> delivered into the maildir folder without going through a mail client.
> If the mail is delivered there by a pop/imap client, or copied/moved
> between maildir folders, "new" shouldn't be used. Even when it is used,
> an IMAP server should move mail from "new" to "cur" immediately
> after its existence has been reported to a client, and that can be
> instantaneous if the IMAP client supports IDLE.
>
> In my experience Dovecot's MDA does the right thing. There is a
> complication though in that when Sieve is used to set a flag, the MDA
> has no choice but to put it in "cur".
Re: Learning only on read emails?
Posted by RW <rw...@googlemail.com>.
On Tue, 20 Oct 2015 08:29:27 -0500
Ryan Coleman wrote:
>
> > On Oct 20, 2015, at 8:21 AM, RW <rw...@googlemail.com> wrote:
> >
> > On Tue, 20 Oct 2015 15:14:42 +0300
> > Jari Fredriksson wrote:
> >
> >> On 10/20/2015 12:41 AM, Ryan Coleman wrote:
> >>> Actually it makes absolute sense since I dump my spam into a
> >>> folder to be scanned as spam and anything that is still in my
> >>> inbox, and read, is indeed ham.
> >>>
> >>> I just have to re-investigate the ./new and ./cur folders to make
> >>> sure they will operate how I want. But if the email was delivered
> >>> to my phone and it moves (but not read) then it’s not an option.
> >>
> >> cur and new folders work as supposed when the IMAP server is
> >> Courier, but NOT when you use Dovecot.
> >
> > How does it not work as expected?
>
> I haven’t seen anything appear in the “new” folder, to be honest.
Bear in mind that the "new" directory is there for mail that's been
delivered into the maildir folder without going through a mail client.
If the mail is delivered there by a pop/imap client, or copied/moved
between maildir folders, "new" shouldn't be used. Even when it is used,
an IMAP server should move mail from "new" to "cur" immediately
after its existence has been reported to a client, and that can be
instantaneous if the IMAP client supports IDLE.
In my experience Dovecot's MDA does the right thing. There is a
complication though in that when Sieve is used to set a flag, the MDA
has no choice but to put it in "cur".
Re: Learning only on read emails?
Posted by Bowie Bailey <Bo...@BUC.com>.
On 10/20/2015 9:29 AM, Ryan Coleman wrote:
>> On Oct 20, 2015, at 8:21 AM, RW <rw...@googlemail.com> wrote:
>>
>> On Tue, 20 Oct 2015 15:14:42 +0300
>> Jari Fredriksson wrote:
>>
>>> On 10/20/2015 12:41 AM, Ryan Coleman wrote:
>>>> Actually it makes absolute sense since I dump my spam into a folder
>>>> to be scanned as spam and anything that is still in my inbox, and
>>>> read, is indeed ham.
>>>>
>>>> I just have to re-investigate the ./new and ./cur folders to make
>>>> sure they will operate how I want. But if the email was delivered
>>>> to my phone and it moves (but not read) then it’s not an option.
>>> cur and new folders work as supposed when the IMAP server is Courier,
>>> but NOT when you use Dovecot.
>> How does it not work as expected?
> I haven’t seen anything appear in the “new” folder, to be honest.
On my server (Courier), messages are in the "new" folder until I look at
the folder in my client. At that point, they are moved to the "cur"
folder. This doesn't mean that they have been read, just that I have
seen the list of messages.
To find read messages in a maildir, look for filenames ending in "S" in
the "cur" directory.
--
Bowie
Re: Learning only on read emails?
Posted by Ryan Coleman <ry...@cwis.biz>.
> On Oct 20, 2015, at 8:21 AM, RW <rw...@googlemail.com> wrote:
>
> On Tue, 20 Oct 2015 15:14:42 +0300
> Jari Fredriksson wrote:
>
>> On 10/20/2015 12:41 AM, Ryan Coleman wrote:
>>> Actually it makes absolute sense since I dump my spam into a folder
>>> to be scanned as spam and anything that is still in my inbox, and
>>> read, is indeed ham.
>>>
>>> I just have to re-investigate the ./new and ./cur folders to make
>>> sure they will operate how I want. But if the email was delivered
>>> to my phone and it moves (but not read) then it’s not an option.
>>
>> cur and new folders work as supposed when the IMAP server is Courier,
>> but NOT when you use Dovecot.
>
> How does it not work as expected?
I haven’t seen anything appear in the “new” folder, to be honest.
Re: Learning only on read emails?
Posted by RW <rw...@googlemail.com>.
On Tue, 20 Oct 2015 15:14:42 +0300
Jari Fredriksson wrote:
> On 10/20/2015 12:41 AM, Ryan Coleman wrote:
> > Actually it makes absolute sense since I dump my spam into a folder
> > to be scanned as spam and anything that is still in my inbox, and
> > read, is indeed ham.
> >
> > I just have to re-investigate the ./new and ./cur folders to make
> > sure they will operate how I want. But if the email was delivered
> > to my phone and it moves (but not read) then it’s not an option.
>
> cur and new folders work as supposed when the IMAP server is Courier,
> but NOT when you use Dovecot.
How does it not work as expected?
Re: Learning only on read emails?
Posted by Jari Fredriksson <ja...@iki.fi>.
On 10/20/2015 12:41 AM, Ryan Coleman wrote:
> Actually it makes absolute sense since I dump my spam into a folder to be scanned as spam and anything that is still in my inbox, and read, is indeed ham.
>
> I just have to re-investigate the ./new and ./cur folders to make sure they will operate how I want. But if the email was delivered to my phone and it moves (but not read) then it’s not an option.
cur and new folders work as expected when the IMAP server is Courier,
but NOT when you use Dovecot.
That is how I have been learning from these two.
br. jarif
>
>
>> On Oct 19, 2015, at 4:35 PM, Reindl Harald <h....@thelounge.net> wrote:
>>
>>
>>
>> On 19.10.2015 at 23:21, Ryan Coleman wrote:
>>> Ok so it was established I don’t have a ham scan (correct). So how do I do it so that it only scans the read emails in a MAILDIR?
>> that makes no sense
>>
>> train a specific ham and a specific spam folder where you move messages you are sure how to classify, not a generic inbox just because you have read a message
>>
>>
Re: Learning only on read emails?
Posted by Ryan Coleman <ry...@cwis.biz>.
Actually it makes absolute sense since I dump my spam into a folder to be scanned as spam and anything that is still in my inbox, and read, is indeed ham.
I just have to re-investigate the ./new and ./cur folders to make sure they will operate how I want. But if the email was delivered to my phone and it moves (but not read) then it’s not an option.
> On Oct 19, 2015, at 4:35 PM, Reindl Harald <h....@thelounge.net> wrote:
>
>
>
> On 19.10.2015 at 23:21, Ryan Coleman wrote:
>> Ok so it was established I don’t have a ham scan (correct). So how do I do it so that it only scans the read emails in a MAILDIR?
>
> that makes no sense
>
> train a specific ham and a specific spam folder where you move messages you are sure how to classify, not a generic inbox just because you have read a message
>
>
Re: Learning only on read emails?
Posted by Reindl Harald <h....@thelounge.net>.
On 19.10.2015 at 23:21, Ryan Coleman wrote:
> Ok so it was established I don’t have a ham scan (correct). So how do I do it so that it only scans the read emails in a MAILDIR?
that makes no sense
train a specific ham and a specific spam folder where you move messages
you are sure how to classify, not a generic inbox just because you
have read a message
Re: Learning only on read emails?
Posted by Ryan Coleman <ry...@cwis.biz>.
> On Oct 19, 2015, at 6:48 PM, Bill Cole <sa...@billmail.scconsult.com> wrote:
>
> On 19 Oct 2015, at 17:21, Ryan Coleman wrote:
>
>> Ok so it was established I don’t have a ham scan (correct). So how do I do it so that it only scans the read emails in a MAILDIR?
>
> Assuming your delivery and client access mechanisms (IMAP4/POP3/whatever) follow standard Maildir behavior & naming, a message which has been seen by a mail client program (i.e. marked as "read" by an IMAP4 client, maybe just downloaded by a POP3 client, consult your IMAP4/POP3/whatever server docs & config for the last word...) will be in the 'cur' subdirectory and its name will match the regular expression "^[0-9]*\..*\..*:2,[A-R]*S[T-Za-z]*$". Or, looked at another way: the name will end with ':2,' followed by one or more letters in ASCII ordering (capitals first) with one of those letters being 'S' (for "Seen").
>
> (Or in shell, with a slight chance of breakage on a pathological system: *.*.*:2,*S*)
Well, damn! I learned something new today. I never bothered to look at the file name structure that closely before!
Re: Learning only on read emails?
Posted by Eric Wong <e...@80x24.org>.
Ryan Coleman <ry...@cwis.biz> wrote:
> That’s more information than Dovecot gives for the structure, so that will help.
>
> Do you happen to know what the other flags mean?
http://cr.yp.to/proto/maildir.html
My inotify setup posted earlier relies only on the 'S' flag and
alphabetical ordering of flags.
Re: Learning only on read emails?
Posted by Ryan Coleman <ry...@cwis.biz>.
That’s more information than Dovecot gives for the structure, so that will help.
Do you happen to know what the other flags mean?
Examples I have:
Tch
Tad
STad
Sade
Sadg
RSad
FRSadfi
F I presume is flagged - that email (the last one) is definitely one I flagged in Apple Mail. The “fi” seems to be the waiting on the flag. I changed it and now it’s FRSadj.
My guesses so far…
F=Flagged
R=Replied
S=Seen
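For reference, the Maildir specification linked elsewhere in this thread defines six uppercase flags: D (draft), F (flagged), P (passed), R (replied), S (seen) and T (trashed). The sketch below decodes a flag string on that basis. Treating the lowercase letters as server-specific keyword markers (as Dovecot is understood to use them) is an assumption, and the function names are invented for this example.

```python
# Uppercase flags defined by the Maildir specification.
STANDARD_FLAGS = {
    "D": "draft",
    "F": "flagged",
    "P": "passed",
    "R": "replied",
    "S": "seen",
    "T": "trashed",
}


def flags_of(filename):
    """Return the flag letters after ':2,' or '' if the name has no such suffix."""
    _, sep, info = filename.rpartition(":2,")
    return info if sep else ""


def describe_flags(flags):
    """Split a flag string into known flag names and leftover letters."""
    known = [STANDARD_FLAGS[c] for c in flags if c in STANDARD_FLAGS]
    # Anything else (for example lowercase letters) is assumed to be
    # server-specific and is returned untouched.
    other = "".join(c for c in flags if c not in STANDARD_FLAGS)
    return known, other
```

For example, `describe_flags("FRSadfi")` separates the flagged, replied and seen flags from the trailing "adfi" letters.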
> On Oct 19, 2015, at 6:48 PM, Bill Cole <sa...@billmail.scconsult.com> wrote:
>
> On 19 Oct 2015, at 17:21, Ryan Coleman wrote:
>
>> Ok so it was established I don’t have a ham scan (correct). So how do I do it so that it only scans the read emails in a MAILDIR?
>
> Assuming your delivery and client access mechanisms (IMAP4/POP3/whatever) follow standard Maildir behavior & naming, a message which has been seen by a mail client program (i.e. marked as "read" by an IMAP4 client, maybe just downloaded by a POP3 client, consult your IMAP4/POP3/whatever server docs & config for the last word...) will be in the 'cur' subdirectory and its name will match the regular expression "^[0-9]*\..*\..*:2,[A-R]*S[T-Za-z]*$". Or, looked at another way: the name will end with ':2,' followed by one or more letters in ASCII ordering (capitals first) with one of those letters being 'S' (for "Seen").
>
> (Or in shell, with a slight chance of breakage on a pathological system: *.*.*:2,*S*)
Re: Learning only on read emails?
Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 19 Oct 2015, at 17:21, Ryan Coleman wrote:
> Ok so it was established I don’t have a ham scan (correct). So how
> do I do it so that it only scans the read emails in a MAILDIR?
Assuming your delivery and client access mechanisms
(IMAP4/POP3/whatever) follow standard Maildir behavior & naming, a
message which has been seen by a mail client program (i.e. marked as
"read" by an IMAP4 client, maybe just downloaded by a POP3 client,
consult your IMAP4/POP3/whatever server docs & config for the last
word...) will be in the 'cur' subdirectory and its name will match the
regular expression "^[0-9]*\..*\..*:2,[A-R]*S[T-Za-z]*$". Or, looked at
another way: the name will end with ':2,' followed by one or more
letters in ASCII ordering (capitals first) with one of those letters
being 'S' (for "Seen").
(Or in shell, with a slight chance of breakage on a pathological system:
*.*.*:2,*S*)
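The regular expression above can be applied directly to the file names in a Maildir's 'cur' subdirectory. This is a minimal sketch assuming a standard Maildir layout; the function name `seen_messages` is made up for illustration, and the pattern is copied unchanged from the message.

```python
import re
from pathlib import Path

# Regular expression quoted in the message above, unchanged.
SEEN_RE = re.compile(r"^[0-9]*\..*\..*:2,[A-R]*S[T-Za-z]*$")


def seen_messages(maildir):
    """Return sorted paths in <maildir>/cur whose names carry the S flag."""
    cur = Path(maildir) / "cur"
    return sorted(
        p for p in cur.iterdir() if p.is_file() and SEEN_RE.match(p.name)
    )
```

The returned paths could then be handed to a ham-learning step, bearing in mind the caution raised elsewhere in the thread that "read" does not always mean "ham".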
Re: Learning only on read emails?
Posted by Ryan Coleman <ry...@cwis.biz>.
Thanks, I’m going to read about it tonight.
> On Oct 19, 2015, at 5:40 PM, Eric Wong <e...@80x24.org> wrote:
>
> Ryan Coleman <ry...@cwis.biz> wrote:
>> Ok so it was established I don’t have a ham scan (correct). So how do
>> I do it so that it only scans the read emails in a MAILDIR?
>
> Since 2008, I use inotify (via incrond) on Maildirs:
>
> http://mid.gmane.org/20140822083434.GA8581@dcvr.yhbt.net
Re: Learning only on read emails?
Posted by Eric Wong <e...@80x24.org>.
Ryan Coleman <ry...@cwis.biz> wrote:
> Ok so it was established I don’t have a ham scan (correct). So how do
> I do it so that it only scans the read emails in a MAILDIR?
Since 2008, I use inotify (via incrond) on Maildirs:
http://mid.gmane.org/20140822083434.GA8581@dcvr.yhbt.net