Posted to ruleqa@spamassassin.apache.org by Daniel Lemke <le...@jam-software.com> on 2012/08/09 16:55:48 UTC

setup masscheck account

So I've just received my rsync account (thanks Kevin!), tried to set up the mass check account and already ran into some issues.

To be more specific (the system is Debian 6.0.4):
- Installed all required Perl modules
- Manually installed SpamAssassin 3.3.2 (used the archive on the apache site)
- Extracted the auto-mass-check script
- Copied the files to their proper location (according to http://wiki.apache.org/spamassassin/NightlyMassCheck)
- Modified the auto-mass-check.cf (defined rsync account, set directories for run_masscheck spam and ham)
- Ran the auto-mass-check script, got the following output:

++ ./mass-check --hamlog=ham-dlemke-ham.log --spamlog=spam-dlemke-ham.log -j --progress ham:dir:/home/lemke/masses/HAM '}'
./auto-mass-check.sh: line 133: ./mass-check: No such file or directory
++ LOGLIST=' ham-dlemke-ham.log spam-dlemke-ham.log'
++ set +x
rsync: failed to connect to rsync.spamassassin.org: Connection timed out (110)
rsync error: error in socket IO (code 10) at clientserver.c(122) [Receiver=3.0.7]

I guess the mass-check script it couldn't find is the one located in the masses directory of the SVN repo?
Do I have to download and compile this before executing the auto-mass-check script?

Sorry if I'm asking something obvious, I'm not that used to unix boxes ;-)

Regards,
Daniel

________________________________



--------------------------------------------------------
JAM Software GmbH
Managing Director: Joachim Marder
Am Wissenschaftspark 26 * 54296 Trier * Germany
Phone: +49 (0)651-145 653 -0 * Fax: +49 (0)651-145 653 -29
Commercial register number HRB 4920 (AG Wittlich) http://www.jam-software.com

Re: setup masscheck account

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/9/2012 10:55 AM, Daniel Lemke wrote:
> [...]
> rsync: failed to connect to rsync.spamassassin.org: Connection timed out (110)
> rsync error: error in socket IO (code 10) at clientserver.c(122) [Receiver=3.0.7]
>
> I guess the mass-check script it couldn't find is the one located in the masses directory of the SVN repo?
> Do I have to download and compile this before executing the auto-mass-check script?
>
> Sorry if I'm asking something obvious, I'm not that used to unix boxes ;-)
Hi Daniel,

I'm going to be setting up a mass check box of my own (I've previously 
used the corpora upload or JM processed my mail back in the old days) so 
I'll know more about your questions soon.

However, SA doesn't have to be installed for masscheck.  Masscheck uses 
the latest trunk version.

Are you somehow blocking outbound connections for rsync to grab that 
latest trunk version?  That's what I believe is going on.

Regards,
KAM

Re: setup masscheck account

Posted by Axb <ax...@gmail.com>.
On 08/09/2012 05:46 PM, Daniel Lemke wrote:
>> From: Kevin A. McGrail [mailto:KMcGrail@PCCC.com]
>> Are you somehow blocking outbound connections for rsync to grab that latest
>> trunk version?  That's what I believe is going on.
>
> Most likely, I guess our corporate firewall is blocking the script here. I'll talk to our admins on this tomorrow.

sounds like it

>> From: Axb [mailto:axb.lists@gmail.com]
>> yep - is this cygwin or a *nix distro?
>
> My Debian is hosted in VirtualBox, so kind of native *nix ;-)

standard Linux - no biggie.

"whereis rsync"
will tell you...

>> did it create a  ~/masscheckwork/weekly_mass_check/
>> (in your home directory)
>
> It created the masscheckwork dir but it's empty.
>

Alex





Re: setup masscheck account

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/9/2012 12:11 PM, Daniel Lemke wrote:
>> From: Kevin A. McGrail [mailto:KMcGrail@PCCC.com]
>> I would check with your network admin to see if your internet router/firewall is
>> blocking it and also see if you have an iptables or similar firewalls running on
>> your Debian box/virtual host that are blocking certain ports.
>
> Yeah, I'll do that tomorrow.
> I've added a point for that in the Wiki. I think it's not a bad idea to document any issues we run into while setting up those boxes ;-)
> http://wiki.apache.org/spamassassin/NightlyMassCheck
I saw and agree completely.

Thanks again for your help.  Now get back to your thesis.

RE: setup masscheck account

Posted by Daniel Lemke <le...@jam-software.com>.
> From: Kevin A. McGrail [mailto:KMcGrail@PCCC.com]
> I would check with your network admin to see if your internet router/firewall is
> blocking it and also see if you have an iptables or similar firewalls running on
> your Debian box/virtual host that are blocking certain ports.


Yeah, I'll do that tomorrow.
I've added a point for that in the Wiki. I think it's not a bad idea to document any issues we run into while setting up those boxes ;-)
http://wiki.apache.org/spamassassin/NightlyMassCheck



Re: setup masscheck account

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/9/2012 12:03 PM, Daniel Lemke wrote:
>>
>> Is rsync on our server using port 22? He might need another port
>> opened; 873 is the standard port, I think.
> Yep, that's it: 22 is working, 873 gives me a timeout...
>
I would check with your network admin to see if your internet 
router/firewall is blocking it and also see if you have an iptables or 
similar firewalls running on your Debian box/virtual host that are 
blocking certain ports.

Regards,
KAM

RE: setup masscheck account

Posted by Daniel Lemke <le...@jam-software.com>.
> From: Kevin A. McGrail [mailto:KMcGrail@PCCC.com]
> Sent: Thursday, August 09, 2012 6:00 PM
> To: ruleqa@spamassassin.apache.org
> Subject: Re: setup masscheck account
>
> On 8/9/2012 11:53 AM, darxus@chaosreigns.com wrote:
> > Try telnetting to port 22, or sshing to that host.
> Is rsync on our server using port 22? He might need another port
> opened; 873 is the standard port, I think.

Yep, that's it: 22 is working, 873 gives me a timeout...


Re: setup masscheck account

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/9/2012 11:53 AM, darxus@chaosreigns.com wrote:
> Try telnetting to port 22, or sshing to that host.
Is rsync on our server using port 22? He might need another port 
opened; 873 is the standard port, I think.

telnet rsync.spamassassin.org 873
Trying 140.211.11.80...
Connected to rsync.spamassassin.org.
Escape character is '^]'.
@RSYNCD: 28
This is the SpamAssassin Corpus rsync machine.

Modules that are available:

corpus
nightly mass-check result upload area.  It is password protected.
If you would like a password, please send a request to
pmc@spamassassin.apache.org and request a "nightly" username and password.

submit
Score generation mass-check result upload area.  It is password
protected.  If you would like a password, please send a request to
pmc@spamassassin.apache.org and request a "score generation" username
and password.  Generally these are only granted after a mass-check
announcement has been made on the spamassassin developer mailing list.

anoncorpus
mass-check result download area, available via anonymous access.
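The first line the daemon sends, "@RSYNCD: 28", is what distinguishes a reachable rsync server from a port that is merely open. The Python sketch below reproduces that telnet check in a script; it is an illustration, not part of the masscheck tooling. It assumes only that the daemon opens with an "@RSYNCD: <version>" line and matches just the start of that line, so anything after the version number is ignored. The names parse_greeting and read_greeting are hypothetical.

```python
import re
import socket

# Matches the start of an rsync daemon greeting such as "@RSYNCD: 28".
# Only the leading protocol number is captured; anything after it is ignored.
GREETING = re.compile(r"^@RSYNCD: (\d+)")


def parse_greeting(line):
    """Return the protocol version from a greeting line, or None if it isn't one."""
    match = GREETING.match(line.strip())
    return int(match.group(1)) if match else None


def read_greeting(host, port=873, timeout=10.0):
    """Connect like `telnet host 873` and return the advertised protocol version."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # makefile() lets us read the first newline-terminated line as text
        with sock.makefile("r", encoding="ascii", errors="replace") as reader:
            return parse_greeting(reader.readline())
```

Against the server as it answered in the session above, read_greeting("rsync.spamassassin.org") would have returned 28; if a firewall drops port 873, the call raises a timeout error instead of returning.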

Re: setup masscheck account

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/9/2012 12:57 PM, Benny Pedersen wrote:
> Or just move to Mars. We are here to help, not to make it harder to
> get the corpus fixed. If my corpus work is not appreciated, I will just
> spare my time and unsubscribe from all spam mailing lists. Thanks for
> using my free time for all this shit asking for help; I will remember
> that when there is a problem again.
>
> That said, I still like to learn; it's fun in the time of life. But
> Alex's "please go away" comment is not paid here.

Email is cold, so I wouldn't read his statements that harshly. People are 
just trying to work out the purpose of this mailing list, which I left in a 
vacuum unless you were aware of the original setup request.

I expect that traffic will die down after the first few days and that we 
can use this to announce new versions of masscheck, etc.

In the meantime, post your questions on users or dev and I'm sure Alex 
will happily join in there.  He enjoys a good debate and likes helping 
people battle spammers.

regards,
KAM

Re: setup masscheck account

Posted by Benny Pedersen <me...@junc.org>.
Den 2012-08-09 18:49, Kevin A. McGrail skrev:
>> PLEASE keep this list for masscheck support only.
>> blabber should go to sa-users or yahoogroups
>
> Actually, this mailing list was called ruleqa because it involves
> improving our rules and doesn't cover just masscheck.  I'll make that
> clearer on the wiki but it's likely better just to respond and add
> MOVING TO DEV or USERS as appropriate.

Or just move to Mars. We are here to help, not to make it harder to
get the corpus fixed. If my corpus work is not appreciated, I will just
spare my time and unsubscribe from all spam mailing lists. Thanks for
using my free time for all this shit asking for help; I will remember
that when there is a problem again.

That said, I still like to learn; it's fun in the time of life. But
Alex's "please go away" comment is not paid here.




Re: setup masscheck account

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/13/2012 3:28 AM, Daniel Lemke wrote:
> There isn't actually a wiki article for the ruleqa list, is there?
Not yet, no.
> I hope it's ok to continue posting questions regarding the mass check setup stuff here, otherwise please let me know so I can move the discussion to users or dev list.
Mass check is definitely in the ruleqa list purpose.  However, RuleQA is 
a bit broader than just masscheck and hopefully will expand in the future.

> So back to topic: I think the script is basically running now, but I've got some more questions before moving on:
>
> The CorpusCleaning article in the Wiki (the one I should read before continuing ;-)) says one must not use data that has been collected from third-party accounts.
> Does that mean I should only feed the corpus with mails from my own personal mail account, or is it ok if I also add mails from our sales, marketing, ..., departments?
> Those departments are relatively small, as is our company, and I'd call them a trustworthy source, as I personally 'advise' our personnel on handling their mailboxes.
If the data is hand-sorted and trustworthy, I would say a resounding yes.  
The point we are trying to make is something I see a lot: I have users 
reporting XYZ mailing list as spam ALL the time.

However, there are companies that are clearly not sending unsolicited 
mail, yet we see their mailing lists reported as spam.

This is because, in practice, end users treat a spam filter as more of a 
"what I want to see in my inbox" filter than a filter for purely 
unsolicited email.

It sounds to me like you'll explain what this means to the people 
helping, and that should lead to a great source of corpora!
> Secondly, I'd like to know how frequently the corpus should be fed with fresh data. Is it sufficient to do this once a week, or does the nightly masscheck rely on fresh data on a daily basis?
As often as you can, but we'll take what we can get. Week-old corpus 
data is still very useful, and we use many years of ham data. Plus, 
spammers would just recycle their tricks if we arbitrarily decided 
not to use old spam.
> Small side note:
> Point 11 on the NightlyMassCheck article says you need to check the ham-*.log and spam-*.log in the ~/masscheckwork/nightly_mass_check/
> I found them in ~/masscheckwork/nightly_mass_check/masses, small mistake in the guide?
Likely, yes.  I'll edit it!
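Because the wiki step and the actual log location disagreed, a helper that looks in both places avoids the confusion. A minimal Python sketch: find_masscheck_logs is a hypothetical helper, and its default path is the one from Daniel's setup, which may differ on other machines.

```python
from pathlib import Path


def find_masscheck_logs(workdir="~/masscheckwork/nightly_mass_check"):
    """Collect ham-*.log and spam-*.log from workdir and workdir/masses."""
    base = Path(workdir).expanduser()
    logs = {"ham": [], "spam": []}
    # The wiki pointed at the top-level directory; in Daniel's setup the logs
    # actually appeared in the masses/ subdirectory, so check both.
    for directory in (base, base / "masses"):
        if not directory.is_dir():
            continue  # a missing directory simply contributes no logs
        for kind in logs:
            logs[kind].extend(sorted(directory.glob(f"{kind}-*.log")))
    return logs
```

An empty result for both keys suggests the run never got as far as writing logs, rather than the logs being in an unexpected place.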

Regards,
KAM

RE: setup masscheck account

Posted by Daniel Lemke <le...@jam-software.com>.
> -----Original Message-----
> From: Kevin A. McGrail [mailto:KMcGrail@PCCC.com]
> Sent: Thursday, August 09, 2012 6:50 PM
> To: ruleqa@spamassassin.apache.org
> Cc: Axb
> Subject: Re: setup masscheck account
>
>
> > PLEASE keep this list for masscheck support only.
> > blabber should go to sa-users or yahoogroups
>
> Actually, this mailing list was called ruleqa because it involves improving our
> rules and doesn't cover just masscheck.  I'll make that clearer on the wiki but
> it's likely better just to respond and add MOVING TO DEV or USERS as
> appropriate.


There isn't actually a wiki article for the ruleqa list, is there?
I hope it's ok to continue posting questions regarding the mass check setup stuff here, otherwise please let me know so I can move the discussion to users or dev list.


So back to topic: I think the script is basically running now, but I've got some more questions before moving on:

The CorpusCleaning article in the Wiki (the one I should read before continuing ;-)) says one must not use data that has been collected from third-party accounts.
Does that mean I should only feed the corpus with mails from my own personal mail account, or is it ok if I also add mails from our sales, marketing, ..., departments?
Those departments are relatively small, as is our company, and I'd call them a trustworthy source, as I personally 'advise' our personnel on handling their mailboxes.

Secondly, I'd like to know how frequently the corpus should be fed with fresh data. Is it sufficient to do this once a week, or does the nightly masscheck rely on fresh data on a daily basis?

Small side note:
Point 11 on the NightlyMassCheck article says you need to check the ham-*.log and spam-*.log in the ~/masscheckwork/nightly_mass_check/
I found them in ~/masscheckwork/nightly_mass_check/masses, small mistake in the guide?

Regards,
Daniel



Re: setup masscheck account

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
> PLEASE keep this list for masscheck support only.
> blabber should go to sa-users or yahoogroups

Actually, this mailing list was called ruleqa because it involves 
improving our rules and doesn't cover just masscheck.  I'll make that 
clearer on the wiki but it's likely better just to respond and add 
MOVING TO DEV or USERS as appropriate.

Regards,
KAM

Re: setup masscheck account

Posted by Axb <ax...@gmail.com>.
On 08/09/2012 06:29 PM, Benny Pedersen wrote:
> Den 2012-08-09 17:58, Kevin A. McGrail skrev:
>> On 8/9/2012 11:53 AM, darxus@chaosreigns.com wrote:
>>> So this is the new masscheck users list?  Nice.  Daniel, your post
>>> was the
>>> first ever.
>> About time, eh?
>>
>> I look forward to improving and increasing our RuleQA.  I've got a
>> lot of ideas of how we can battle spammers!
>
> My postfix-logwatch says I reject 12%, so not much spam is sent to me.

OFFTOPIC

> But to your question: I'm thinking about urlbl host blacklisting. If a
> web hoster hosts one single spammer, let's list the other domains that
> have the same web host IP as the spamming domain, used as a reputation
> signal to keep false positives low.

OFFTOPIC

> I had the idea, but don't know if it's useful for all the phishing mails
> I still get daily :(
>
> Anyone other than me using kernel 3.5.0? :)

OFFTOPIC

PLEASE keep this list for masscheck support only.
blabber should go to sa-users or yahoogroups


Re: setup masscheck account

Posted by Benny Pedersen <me...@junc.org>.
Den 2012-08-09 17:58, Kevin A. McGrail skrev:
> On 8/9/2012 11:53 AM, darxus@chaosreigns.com wrote:
>> So this is the new masscheck users list?  Nice.  Daniel, your post 
>> was the
>> first ever.
> About time, eh?
>
> I look forward to improving and increasing our RuleQA.  I've got a
> lot of ideas of how we can battle spammers!

My postfix-logwatch says I reject 12%, so not much spam is sent to me.

But to your question: I'm thinking about urlbl host blacklisting. If a 
web hoster hosts one single spammer, let's list the other domains that 
have the same web host IP as the spamming domain, used as a reputation 
signal to keep false positives low.

I had the idea, but don't know if it's useful for all the phishing mails 
I still get daily :(

Anyone other than me using kernel 3.5.0? :)




Re: setup masscheck account

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/9/2012 11:53 AM, darxus@chaosreigns.com wrote:
> So this is the new masscheck users list?  Nice.  Daniel, your post was the
> first ever.
About time, eh?

I look forward to improving and increasing our RuleQA.  I've got a lot 
of ideas of how we can battle spammers!

Regards,
KAM

Re: setup masscheck account

Posted by da...@chaosreigns.com.
On 08/09, Daniel Lemke wrote:
> > From: Kevin A. McGrail [mailto:KMcGrail@PCCC.com]
> > Are you somehow blocking outbound connections for rsync to grab that latest
> > trunk version?  That's what I believe is going on.
> 
> Most likely, I guess our corporate firewall is blocking the script here. I'll talk to our admins on this tomorrow.

Try telnetting to port 22, or sshing to that host.

Without a firewall blocking it, it'll look like this:


$ telnet rsync.spamassassin.org 22
Trying 140.211.11.80...
Connected to spamassassin.zones.apache.org.
Escape character is '^]'.
SSH-2.0-Sun_SSH_1.1.2


$ ssh rsync.spamassassin.org
The authenticity of host 'rsync.spamassassin.org (140.211.11.80)' can't be established.
RSA key fingerprint is 31:80:50:bb:84:85:e0:af:1e:cb:17:56:7d:bf:b3:53.
Are you sure you want to continue connecting (yes/no)? 
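The telnet test above can also be scripted, which helps tell the failure modes apart. A minimal Python sketch, not part of the masscheck scripts: check_port is a hypothetical helper that attempts a plain TCP connection and classifies the outcome. A timeout usually points to a firewall silently dropping packets, while "refused" means the host answered but nothing is listening on that port.

```python
import socket


def check_port(host, port, timeout=5.0):
    """Attempt a TCP connection (like `telnet host port`) and classify the result."""
    try:
        # create_connection resolves the host and tries each address it returns
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No reply at all: often a firewall silently dropping the packets
        return "timeout"
    except ConnectionRefusedError:
        # The host replied with a reset: reachable, but nothing listens there
        return "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and other socket errors
        return f"error: {exc}"
```

For example, check_port("rsync.spamassassin.org", 22) returning "open" while check_port("rsync.spamassassin.org", 873) returns "timeout" would match the symptoms reported in this thread: SSH gets through, the rsync daemon port does not.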


So this is the new masscheck users list?  Nice.  Daniel, your post was the
first ever.

-- 
"Life is either a daring adventure or it is nothing at all."
- Helen Keller
http://www.ChaosReigns.com

RE: setup masscheck account

Posted by Daniel Lemke <le...@jam-software.com>.
> From: Kevin A. McGrail [mailto:KMcGrail@PCCC.com]
> Are you somehow blocking outbound connections for rsync to grab that latest
> trunk version?  That's what I believe is going on.

Most likely, I guess our corporate firewall is blocking the script here. I'll talk to our admins on this tomorrow.


> From: Axb [mailto:axb.lists@gmail.com]
> yep - is this cygwin or a *nix distro?

My Debian is hosted in VirtualBox, so kind of native *nix ;-)

> did it create a  ~/masscheckwork/weekly_mass_check/
> (in your home directory)

It created the masscheckwork dir but it's empty.


Regards,
Daniel


Re: setup masscheck account

Posted by Axb <ax...@gmail.com>.
On 08/09/2012 04:55 PM, Daniel Lemke wrote:
> [...]
> ++ ./mass-check --hamlog=ham-dlemke-ham.log --spamlog=spam-dlemke-ham.log -j --progress ham:dir:/home/lemke/masses/HAM '}'
> ./auto-mass-check.sh: line 133: ./mass-check: No such file or directory
> ++ LOGLIST=' ham-dlemke-ham.log spam-dlemke-ham.log'
> ++ set +x
> rsync: failed to connect to rsync.spamassassin.org: Connection timed out (110)
> rsync error: error in socket IO (code 10) at clientserver.c(122) [Receiver=3.0.7]
>
> I guess the mass-check script it couldn't find is the one located in the masses directory of the SVN repo?

yep - is this cygwin or a *nix distro?

> Do I have to download and compile this before executing the auto-mass-check script?

The script rsyncs the source and places it in

~/masscheckwork/weekly_mass_check/
>
> Sorry if I'm asking something obvious, I'm not that used to unix boxes ;-)

did it create a  ~/masscheckwork/weekly_mass_check/
(in your home directory)
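Since "./mass-check: No such file or directory" is a downstream symptom of the failed rsync, a quick look at the work directory separates "rsync never populated it" from a genuine setup problem. A minimal Python sketch, assuming (as the later messages in this thread suggest) that the rsynced tree ends up with a masses/ subdirectory holding the mass-check script; the function names and that layout are assumptions, and the directory name (weekly_mass_check here, nightly_mass_check later in the thread) depends on the setup.

```python
from pathlib import Path


def find_mass_check(workdir):
    """Return workdir/masses/mass-check if it exists as a file, else None.

    Assumes (hypothetically) that the rsync step unpacks a source tree
    whose masses/ subdirectory contains the mass-check script.
    """
    script = Path(workdir).expanduser() / "masses" / "mass-check"
    return script if script.is_file() else None


def preflight(workdir):
    """Explain whether the rsync step appears to have populated workdir."""
    base = Path(workdir).expanduser()
    if not base.is_dir():
        return f"{base} does not exist: rsync never created it"
    if not any(base.iterdir()):
        return f"{base} is empty: the rsync step most likely failed"
    script = find_mass_check(base)
    if script is None:
        return f"{base} has files but no masses/mass-check"
    return f"ok: {script}"
```

In Daniel's case, preflight("~/masscheckwork") would report an empty directory, which points back at the blocked rsync connection rather than at a missing download or compile step.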

Alex