Posted to users@spamassassin.apache.org by LuKreme <kr...@kreme.com> on 2014/08/30 15:49:07 UTC

sa-learn and find

The following command seems to get stuck if there is no result from the find. Any suggestions on how to avoid passing an empty find result to sa-learn?

sa-learn --ham -u ${i} `find /home/${i}/Maildir/.notspam -type f -mtime -7` 

(where user $i has no emails in notspam that are new in the last 7 days)

I am already testing for the presence of the folder. Checking if the folder is empty isn’t going to help because the folder may have mail in it, just old mail.

The only thing I can think of to do is something like this:

MYFIND=`find $H_PATH/cur -type f -mtime -7`
if [ -n $MYFIND ]; then
       /usr/local/bin/sa-learn --ham -u ${i} $MYFIND
fi

but I haven’t gotten that to work, as the test seems to pass even with a string for which echo "\"$MYFIND\"" prints "".

-- 
"Why, you stuck-up, half-witted, scruffy-looking... NERFHERDER!"
"Who's Scruffy looking?"


Re: sa-learn and find

Posted by LuKreme <kr...@kreme.com>.
> On 03 Sep 2014, at 02:05 , Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:
> 
>> On Sat, 30 Aug 2014 08:23:02 -0600
>> LuKreme wrote:
>> 
>>>  if test -d "$J_PATH"; then
>>>    MYFIND=`find $J_PATH/ -type f -mtime -7|grep -v dovecot`
> 
> On 30.08.14 22:32, RW wrote:
>> mtime may not be the best choice. Ideally what you want is the time
>> since the spam was moved to Junk, rather than the time since it was
>> delivered.
> 
> ctime should provide this information - it's changed when a file is moved.
> For example, courier-imap uses ctime information for deleting old mail from
> trash and spam (and whatever you configure in the TRASH variable).

I agree that it should. However, I’ve had very poor luck with -ctime.

For example, I have a command I run to delete files in my ~/tmp that are more than 30 days old. If I use -ctime, none of the files are ever deleted, while if I use -mtime, everything works as expected.
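For reference, a cleanup command of the kind described here might look like the sketch below. The function name clean_old is hypothetical and this is not the poster's actual command. -mtime +30 matches files whose age, counted in whole 24-hour periods, exceeds 30; swapping in -ctime +30 would match nothing if something (such as a backup that restores atime) keeps refreshing the files' ctime.

```sh
#!/bin/sh
# Hypothetical helper: delete regular files under DIR whose last
# modification is more than 30 whole days ago.
clean_old() {
    # {} + passes many paths to each rm invocation; -- guards against
    # file names that start with a dash.
    find "$1" -type f -mtime +30 -exec rm -f -- {} +
}

# Example:
#   clean_old "$HOME/tmp"
```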
 
> Note that something that manipulates file status can break this feature,
> e.g. a backup system that reads files and then resets atime will also
> change the ctime.  Setting it _not_ to reset atime (nobody uses atime
> nowadays) should fix the problem.

That may be what is happening then, since the system is backed up with rsnapshot.

-- 
Personal isn't the same as important


Re: sa-learn and find

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
>On Sat, 30 Aug 2014 08:23:02 -0600
>LuKreme wrote:
>
>>   if test -d "$J_PATH"; then
>>     MYFIND=`find $J_PATH/ -type f -mtime -7|grep -v dovecot`

On 30.08.14 22:32, RW wrote:
>mtime may not be the best choice. Ideally what you want is the time
>since the spam was moved to Junk, rather than the time since it was
>delivered.

ctime should provide this information - it's changed when a file is moved.
For example, courier-imap uses ctime information for deleting old mail from
trash and spam (and whatever you configure in the TRASH variable).

Note that something that manipulates file status can break this feature,
e.g. a backup system that reads files and then resets atime will also
change the ctime.  Setting it _not_ to reset atime (nobody uses atime
nowadays) should fix the problem.
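A small way to see the mtime/ctime difference discussed here: the sketch below (function name hypothetical) gives a file an old mtime, roughly what happens when a move preserves the delivery time. touch -t sets mtime and atime, but changing timestamps is itself an inode change, so the kernel sets ctime to the current time. This is the same effect as a backup tool restoring atime.

```sh
#!/bin/sh
# Hypothetical demo: an old mtime hides a file from -mtime -7,
# while -ctime -7 still finds it because ctime was just updated.
demo_mtime_vs_ctime() {
    dir=$1
    touch -t 201401010000 "$dir/msg"   # mtime/atime = 2014-01-01 00:00
    echo "matched by -mtime -7:"
    find "$dir" -type f -mtime -7
    echo "matched by -ctime -7:"
    find "$dir" -type f -ctime -7
}

# Example:
#   demo_mtime_vs_ctime "$(mktemp -d)"
```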

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Fighting for peace is like fucking for virginity...

Re: sa-learn and find

Posted by LuKreme <kr...@kreme.com>.
On 31 Aug 2014, at 18:16 , Ian Zimmerman <it...@buug.org> wrote:

> find /home/${i}/Maildir/.notspam -type f -mtime -7 | xargs -r sa-learn --ham -u ${i}

Right. Doh. I got so hung up on running find inside the sa-learn command...

Well, that does make things a lot easier, doesn't it.

Thanks for your patience.

-- 
"There will always be women in rubber flirting with me."


Re: sa-learn and find

Posted by Ian Zimmerman <it...@buug.org>.
On Sun, 31 Aug 2014 17:37:50 -0600,
LuKreme <kr...@kreme.com> wrote:

Ian> xargs (the GNU one at least) has an option to not run the inferior
Ian> when there are no args to give it.

LuKreme> The interior is the find:

_Inferior_, which is GNU-speak for "subprocess".  I should have tried to
be less concise :-)

> sa-learn --ham -u ${i} `find /home/${i}/Maildir/.notspam -type f -mtime -7`

find /home/${i}/Maildir/.notspam -type f -mtime -7 | xargs -r sa-learn --ham -u ${i}

LuKreme> (FreeBSD xargs never runs the command if the input is empty)

You may not need -r then.
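The pipeline above can be wrapped up roughly as follows. This is a sketch: the function name learn_recent is hypothetical, and the learner command is passed as arguments so it can be replaced by a stub. -print0 with xargs -0 keeps file names intact even if they contain spaces. -r stops GNU xargs from running the command on empty input; FreeBSD xargs already skips empty input (as noted above) and, as far as I know, accepts -r for compatibility.

```sh
#!/bin/sh
# Hypothetical wrapper: run the given command on files in DIR modified
# within the last 7 days, and not at all if find matches nothing.
learn_recent() {
    dir=$1
    shift
    find "$dir" -type f -mtime -7 -print0 | xargs -0 -r "$@"
}

# Example (hypothetical user "alice"):
#   learn_recent /home/alice/Maildir/.notspam sa-learn --ham -u alice
```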

-- 
Please *no* private copies of mailing list or newsgroup messages.
Local Variables:
mode:claws-external
End:

Re: sa-learn and find

Posted by LuKreme <kr...@kreme.com>.
On 31 Aug 2014, at 14:46 , Ian Zimmerman <it...@buug.org> wrote:

> On Sat, 30 Aug 2014 19:59:53 -0600,
> LuKreme <kr...@kreme.com> wrote:
> 
> RW> This may run into shell argument limits if you have to learn a lot
> RW> of spam. Consider piping the output of find to xargs, or using -exec
> RW> ...{} + in find.
> 
> LuKreme> Yes, I tried to do that, but as I said in my first post, if I
> LuKreme> do the find as part of the sa-learn command, then it stalls when
> LuKreme> the find command returns null.
> 
> xargs (the GNU one at least) has an option to not run the inferior when
> there are no args to give it.

The interior is the find:

This was my original command:

sa-learn --ham -u ${i} `find /home/${i}/Maildir/.notspam -type f -mtime -7`

Which stalls if find returns nothing. I am not seeing how xargs would help this.

(FreeBSD xargs never runs the command if the input is empty)

-- 
'I really should talk to him, sir. He's had a near-death experience!'
'We all do. It's called living.'


Re: sa-learn and find

Posted by Ian Zimmerman <it...@buug.org>.
On Sat, 30 Aug 2014 19:59:53 -0600,
LuKreme <kr...@kreme.com> wrote:

RW> This may run into shell argument limits if you have to learn a lot
RW> of spam. Consider piping the output of find to xargs, or using -exec
RW> ...{} + in find.

LuKreme> Yes, I tried to do that, but as I said in my first post, if I
LuKreme> do the find as part of the sa-learn command, then it stalls when
LuKreme> the find command returns null.

xargs (the GNU one at least) has an option to not run the inferior when
there are no args to give it.


Re: sa-learn and find

Posted by LuKreme <kr...@kreme.com>.
> On 30 Aug 2014, at 15:32 , RW <rw...@googlemail.com> wrote:
> 
> On Sat, 30 Aug 2014 08:23:02 -0600
> LuKreme wrote:
> 
>>  if test -d "$J_PATH"; then
>>    MYFIND=`find $J_PATH/ -type f -mtime -7|grep -v dovecot`
> 
> mtime may not be the best choice. Ideally what you want is the time
> since the spam was moved to Junk, rather than the time since it was
> delivered. What I see with dovecot when I move mail with claws mail is
> that a new file is created with the mtime preserved at the
> delivery time and the current epoch time in the filename. In that case
> the ideal would be Btime if your OS supports it, or failing that
> ctime. 
> 
> You could also use the time in the filename. Note that epoch times are
> 10 digits until long after we're dead so simple lexicographical
> comparisons between maildir filenames or between a maildir filename and
> an epoch time will work.

On my system the file is not renamed when it is moved.

> You may want to check what happens with whatever you use to move the
> spam.

Spam is delivered to the junk box at delivery time, or is manually moved via IMAP by the user.

Is there a way to actually show the mtime and ctime of a file?
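One way to answer this is with stat, whose format options differ between GNU coreutils and BSD. The sketch below (function name hypothetical) tries the GNU form first and falls back to the BSD form. Birth time prints as "-" (GNU) or an epoch-ish placeholder (BSD) where the filesystem does not record it. Plain ls -l shows mtime and ls -lc shows ctime on either system.

```sh
#!/bin/sh
# Hypothetical helper: print mtime, ctime and (if known) birth time.
show_times() {
    for f in "$@"; do
        # GNU coreutils stat: %y = mtime, %z = ctime, %w = birth time.
        # If that fails (e.g. on FreeBSD), use BSD stat:
        # %Sm = mtime, %Sc = ctime, %SB = birth time.
        stat -c '%n  mtime=%y  ctime=%z  btime=%w' "$f" 2>/dev/null ||
            stat -f '%N  mtime=%Sm  ctime=%Sc  btime=%SB' "$f"
    done
}

# Example:
#   show_times /home/alice/Maildir/.Junk/cur/*
```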

>>    if test -n "$MYFIND"; then
>>      /usr/local/bin/sa-learn --spam -u ${i} $MYFIND #>/dev/null 2>&1
> 
> This may run into shell argument limits if you have to learn a lot of
> spam. Consider piping the output of find to xargs, or using 
> -exec ...{} + in find.

Yes, I tried to do that, but as I said in my first post, if I do the find as part of the sa-learn command, then it stalls when the find command returns null.


-- 
The fact that Bob and John are married does nothing to diminish anyone
else's marriage any more than a black woman marrying a white man, a Jew
marrying a Catholic, or an ugly Lyle marrying a Pretty Woman


Re: sa-learn and find

Posted by RW <rw...@googlemail.com>.
On Sat, 30 Aug 2014 08:23:02 -0600
LuKreme wrote:

>   if test -d "$J_PATH"; then
>     MYFIND=`find $J_PATH/ -type f -mtime -7|grep -v dovecot`

mtime may not be the best choice. Ideally what you want is the time
since the spam was moved to Junk, rather than the time since it was
delivered. What I see with dovecot when I move mail with claws mail is
that a new file is created with the mtime preserved at the
delivery time and the current epoch time in the filename. In that case
the ideal would be Btime if your OS supports it, or failing that
ctime. 

You could also use the time in the filename. Note that epoch times are
10 digits until long after we're dead so simple lexicographical
comparisons between maildir filenames or between a maildir filename and
an epoch time will work.
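Selecting by the time in the filename could be sketched like this, assuming the usual maildir naming "<epoch>.<unique>.<host>[:2,flags]" and a date that supports +%s (GNU and BSD both do). The function name is hypothetical. It compares numerically rather than lexicographically, which is simpler in sh and equivalent while the timestamps stay at 10 digits. Names that do not start with a timestamp, such as dovecot's index files, are skipped.

```sh
#!/bin/sh
# Hypothetical helper: print files in DIR whose maildir file name starts
# with an epoch timestamp no older than DAYS days.
recent_by_name() {
    dir=$1
    days=$2
    cutoff=$(( $(date +%s) - days * 86400 ))
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        name=${f##*/}      # strip the directory part
        ts=${name%%.*}     # text before the first dot
        case $ts in
            ''|*[!0-9]*) continue ;;   # not a timestamp: skip
        esac
        if [ "$ts" -ge "$cutoff" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Example:
#   recent_by_name /home/alice/Maildir/.Junk/cur 7
```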

You may want to check what happens with whatever you use to move the
spam.  


>     if test -n "$MYFIND"; then
>       /usr/local/bin/sa-learn --spam -u ${i} $MYFIND #>/dev/null 2>&1

This may run into shell argument limits if you have to learn a lot of
spam. Consider piping the output of find to xargs, or using 
-exec ...{} + in find.
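The -exec ... {} + form might look like the sketch below. The function name and the LEARN override are hypothetical (LEARN just makes it easy to substitute a stub; it defaults to the sa-learn path from the script above). {} + appends as many matched paths as fit to each run of the command, like xargs, so a large Junk folder is split across several runs instead of overflowing the argument limit. GNU and BSD find do not run the command at all when nothing matches, which also avoids the empty-argument stall.

```sh
#!/bin/sh
# Hypothetical helper: learn recent spam for USER from DIR.
learn_spam_exec() {
    user=$1
    dir=$2
    learn=${LEARN:-/usr/local/bin/sa-learn}
    # ! -path '*dovecot*' mirrors the original `grep -v dovecot`.
    find "$dir" -type f -mtime -7 ! -path '*dovecot*' \
        -exec "$learn" --spam -u "$user" {} +
}

# Example (hypothetical user "alice"):
#   learn_spam_exec alice /home/alice/Maildir/.Junk
```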




Re: sa-learn and find

Posted by LuKreme <kr...@kreme.com>.
On 30 Aug 2014, at 07:49 , LuKreme <kr...@kreme.com> wrote:
> MYFIND=`find $H_PATH/cur -type f -mtime -7`
> if [ -n $MYFIND ]; then
>       /usr/local/bin/sa-learn --ham -u ${i} $MYFIND
> fi

Doh!

if [ -n "$MYFIND" ]; then

or

if test -n "$MYFIND"; then
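Why the quotes matter: when MYFIND is empty, the unquoted [ -n $MYFIND ] expands to [ -n ], a one-argument test that is true simply because the string "-n" is non-empty. The quoted form keeps the empty string as an argument, so the test correctly fails. The helper names below are hypothetical and only reproduce the two forms. (Unquoted, a result with several file names would also give [ "too many arguments".)

```sh
#!/bin/sh
# Hypothetical demo of the unquoted vs quoted -n test.
check_unquoted() {
    # $1 is deliberately unquoted to reproduce the bug.
    if [ -n $1 ]; then echo yes; else echo no; fi
}

check_quoted() {
    if [ -n "$1" ]; then echo yes; else echo no; fi
}

# Example:
#   check_unquoted ""   # yes (wrong)
#   check_quoted ""     # no
```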

Sigh. Feeling extra stupid this Saturday morning.

It works, and is no longer processing thousands of old messages for no reason.

#!/bin/sh
#
# Straightforward shell script to be run as root.  This parses the /home
# directory for mailboxes named .Junk and learns those as spam, and then
# parses the inbox (cur, not new) for ham.

# sa-learn-script (sal) v2.1  Lewis Butler, released to the Public Domain 2012

UROOT="/home/"
echo "Running SAL"
for i in `ls $UROOT` ; do
  J_PATH="${UROOT}${i}/Maildir/.Junk";
  H_PATH="${UROOT}${i}/Maildir";

  if test -d "$J_PATH"; then
    # Messages modified in the last 7 days, skipping dovecot's own files.
    MYFIND=`find $J_PATH/ -type f -mtime -7|grep -v dovecot`
    # Quoted test: unquoted, an empty $MYFIND would always pass.
    if test -n "$MYFIND"; then
      # $MYFIND is unquoted on purpose so it splits into one argument per
      # file (maildir file names normally contain no spaces).
      /usr/local/bin/sa-learn --spam -u ${i} $MYFIND #>/dev/null 2>&1
    fi
  else
     echo "No $J_PATH for $i"
  fi

  if test -d "$H_PATH"; then
    MYFIND=`find $H_PATH/cur -type f -mtime -7|grep -v dovecot`
    if test -n "$MYFIND"; then
      echo "Processing $H_PATH"
      /usr/local/bin/sa-learn --ham -u ${i} $MYFIND #>/dev/null 2>&1
    fi
  #else
  #  echo "No $H_PATH for $i"
  fi
done

If I were feeling really clever, I’d make sure the user existed first, but I’m not feeling that clever today.
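Checking that the user exists could be as small as the sketch below, which relies on id failing for unknown accounts (local or via NSS). The function name is hypothetical; inside the loop it would skip directories under /home that have no matching account before sa-learn -u is called for them.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if NAME is a known account.
user_exists() {
    id "$1" >/dev/null 2>&1
}

# Possible use at the top of the for loop:
#   user_exists "$i" || { echo "Skipping $i: no such user"; continue; }
```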

-- 
A marriage is always made up of two people who are prepared to swear
that only the other one snores.