Posted to users@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2008/08/10 22:55:52 UTC
Re: Mass-check not scanning all messages.
If ham and spam are directories containing mboxes, you might be better
off with this:
$WORKINGDIR/mass-check --progress --all --showdots \
ham:mbox:/var/home/c/h/chris/spamcorpus/custom/ham/* \
spam:mbox:/var/home/c/h/chris/spamcorpus/custom/spam/*
I usually use ham:detect:/path/to/dir and then call mbox files "foo.mbox".
--j.
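
For readers following along: the choice of target format depends on what is actually on disk. Below is a minimal sketch of a helper that inspects a corpus path and suggests a format. The function names guess_format and make_target are hypothetical and not part of SpamAssassin. The format names dir and mbox appear in this thread; file (a single message in one file) is assumed here to be accepted by mass-check as well. The check relies on the mbox convention that each message begins with a "From " line.

```shell
#!/bin/sh
# Hypothetical helper (not part of SpamAssassin): guess which mass-check
# target format matches a corpus path.
#   directory                   -> dir  (one message per file)
#   file whose first line
#   starts with "From "         -> mbox
#   any other file              -> file (assumed format name for one message)
guess_format() {
    if [ -d "$1" ]; then
        echo dir
    elif [ -f "$1" ] && head -n 1 "$1" | grep -q '^From '; then
        echo mbox
    else
        echo file
    fi
}

# Print a class:format:path target string for the given class and path.
make_target() {
    printf '%s:%s:%s\n' "$1" "$(guess_format "$2")" "$2"
}
```

Calling make_target with a class of ham and a directory path prints a ham:dir: target for that path, which can then be passed to mass-check in place of a hand-written target.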
RN-Chris writes:
>
> I made a small bash wrapper script so I could set it up to scan a few
> different corpora but this is what it is executing. I specified the --all
> switch so large messages should not be an issue.
>
> In the two respective corpus directories (ham | spam) emails are just
> dumped in there. I looked at the spam.log and the ham.log and then looked at
> the corresponding messages thinking it was something special about those
> messages but everything looks normal about them. :/
>
> $WORKINGDIR/mass-check --progress --all --showdots \
> ham:mbox:/var/home/c/h/chris/spamcorpus/custom/ham \
> spam:mbox:/var/home/c/h/chris/spamcorpus/custom/spam
>
> $WORKINGDIR/hit-frequencies -x -p -a > freqs
> egrep '(LOCAL|OVERALL%)' $WORKINGDIR/freqs
>
>
> Theo Van Dinter-2 wrote:
> >
> > On Sun, Aug 10, 2008 at 12:16:38PM -0700, RN-Chris wrote:
> >> I have a custom spam corpus that I am trying to run rules against to test
> >> their effectiveness however mass-check will only scan a few ( < 5 ) messages
> >> of the spam and usually only 1 or 2 of the ham messages. Any clues? Roughly
> >> a week of googling and I can't find anyone with this exact problem.
> >
> > Can you be more specific about what you're doing / how your corpus
> > is setup / etc? You've essentially said "things don't work, what's
> > wrong". :)
> >
> > Some random thoughts: do you have mbox files but are not specifying them
> > as such? are the majority of messages > 250k?
> >
> > --
> > Randomly Selected Tagline:
> > "Besides, I think [Slackware] sounds better than 'Microsoft,' don't you?"
> > - Patrick Volkerding
> >
> >
> >
>
> --
> View this message in context: http://www.nabble.com/Mass-check-not-scanning-all-messages.-tp18916106p18916593.html
> Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
RE: Mass-check not scanning all messages.
Posted by Chris Reed <ch...@revogate.com>.
J,
Really appreciate it, sir. I wasn't aware that mbox was a switch. I changed
the targets to spam:dir and ham:dir, removed the trailing /*, and it worked
(or at least appeared to work) wonderfully.
Thanks to all that helped. :)
-- Chris
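
One way to check that a run really covered the whole corpus, rather than only appearing to, is to compare the number of message files with the number of result lines in the log. The sketch below assumes that the corpus is one message per file (the dir format) and that comment lines in ham.log and spam.log start with "#", with every other non-empty line being one scanned message. The function names are hypothetical.

```shell
#!/bin/sh
# Count regular files under a corpus directory
# (one message per file, as with the "dir" format).
count_corpus_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Count result lines in a mass-check log. Assumption: comment lines
# start with "#"; every other non-empty line is one scanned message.
count_log_results() {
    grep -c '^[^#]' "$1"
}

# Print how many messages were logged versus how many exist on disk.
# Usage: report_coverage CORPUS_DIR LOG_FILE
report_coverage() {
    printf '%s of %s messages logged\n' \
        "$(count_log_results "$2")" "$(count_corpus_files "$1")"
}
```

If the two numbers differ widely, as in the original report of fewer than 5 messages scanned, the target format is a likely suspect.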
-----Original Message-----
From: jm@jmason.org [mailto:jm@jmason.org]
Sent: Monday, August 11, 2008 6:29 AM
To: chris@revogate.com
Cc: jm@jmason.org; users@spamassassin.apache.org
Subject: Re: Mass-check not scanning all messages.
Hmm, that sounds like you're not actually using mboxes. Are you sure you are?
If not, just use "ham:detect:" or "ham:dir:".
--j.
Chris Reed writes:
> I added /* to the end of the dir paths, but that didn't change anything. The
> mails do have a somewhat weird naming convention. They were used on IMAP, so
> a sample filename would be something like this:
>
> 1214839027.5368_1.servername:2,
>
> However, if the problem were with scanning that type of filename, I would
> think it wouldn't still scan 4 messages. I would expect it to scan zero
> messages if it were a naming problem.
>
> -- Chris
RE: Mass-check not scanning all messages.
Posted by Chris Reed <ch...@revogate.com>.
I added /* to the end of the dir paths, but that didn't change anything. The
mails do have a somewhat weird naming convention. They were used on IMAP, so
a sample filename would be something like this:
1214839027.5368_1.servername:2,
However, if the problem were with scanning that type of filename, I would
think it wouldn't still scan 4 messages. I would expect it to scan zero
messages if it were a naming problem.
-- Chris