Posted to ruleqa@spamassassin.apache.org by Henrik Krohns <he...@hege.li> on 2018/09/14 10:30:53 UTC

slow ruleqa?

Everything is quite slow; this takes 10 seconds to load:
http://ruleqa.spamassassin.org/20180913-r1840789-n/MIMEOLE_DIRECT_TO_MX/detail

Is sa-vm1 overloaded or something else broken?

I made an identical setup on my home server, which outputs exactly the same
page; there it loads in 3 seconds (i5-2400 CPU @ 3.10GHz).  Looking into
whether I can optimize it a bit..

Cheers,
Henrik

Re: slow ruleqa?

Posted by Henrik Krohns <he...@hege.li>.
On Tue, Sep 18, 2018 at 07:41:51AM -0400, Kevin A. McGrail wrote:
> On 9/18/2018 2:26 AM, Henrik Krohns wrote:
> > Could someone install Compress::LZ4 on sa-vm1 please. :-)
> Dave, I'll defer to you, but tell me if you want me to handle it.  I
> don't know if there is a simple package or if it needs cpan, etc.

It's a standalone module with zero dependencies, so it's perfectly fine to
install from cpan.  It doesn't seem to be packaged for Ubuntu 16.04 anyway.

Re: slow ruleqa?

Posted by "Kevin A. McGrail" <km...@apache.org>.
On 9/18/2018 2:26 AM, Henrik Krohns wrote:
> Could someone install Compress::LZ4 on sa-vm1 please. :-)
Dave, I'll defer to you, but tell me if you want me to handle it.  I
don't know if there is a simple package or if it needs cpan, etc.

-- 
Kevin A. McGrail
VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171


Re: slow ruleqa?

Posted by Henrik Krohns <he...@hege.li>.
On Mon, Sep 17, 2018 at 07:36:23AM -0400, Kevin A. McGrail wrote:
> On 9/17/2018 7:34 AM, Henrik Krohns wrote:
> > On Fri, Sep 14, 2018 at 10:40:48PM +0300, Henrik Krohns wrote:
> >> On Fri, Sep 14, 2018 at 02:25:34PM -0500, Dave Jones wrote:
> >>> I installed XML::SAX::ExpatXS on sa-vm1.  Not sure if this will help
> >>> immediately or if we have to wait until something runs.
> >>>
> >>> I will be the first to admit that I am not a Perl person.  I was just the
> >>> guy who stepped forward to get the RuleQA stuff running again about 19
> >>> months ago.
> >> Didn't seem to do much; it still takes 10 seconds to load..
> >>
> >> I'm open to looking into RuleQA problems too; there are many bugs around, not
> >> to mention the reuse issue and guidance.  Let me know (Kevin?) if you want
> >> help tuning sa-vm1; it's probably easier to debug with access..
> > I have a pretty extensively modified and tested ruleqa.cgi ready to commit.
> > Compress::LZ4 needs to be installed for it to work.
> >
> > Does committing it to trunk automatically drop it into sa-vm1?
> Most likely, yes.  I would suggest coordinating with sysadmins@s.a.o to
> get that module installed.

Could someone install Compress::LZ4 on sa-vm1 please. :-)

Cheers,
Henrik

Re: slow ruleqa?

Posted by "Kevin A. McGrail" <km...@apache.org>.
On 9/17/2018 7:34 AM, Henrik Krohns wrote:
> On Fri, Sep 14, 2018 at 10:40:48PM +0300, Henrik Krohns wrote:
>> On Fri, Sep 14, 2018 at 02:25:34PM -0500, Dave Jones wrote:
>>> I installed XML::SAX::ExpatXS on sa-vm1.  Not sure if this will help
>>> immediately or if we have to wait until something runs.
>>>
>>> I will be the first to admit that I am not a Perl person.  I was just the
>>> guy who stepped forward to get the RuleQA stuff running again about 19
>>> months ago.
>> Didn't seem to do much; it still takes 10 seconds to load..
>>
>> I'm open to looking into RuleQA problems too; there are many bugs around, not
>> to mention the reuse issue and guidance.  Let me know (Kevin?) if you want
>> help tuning sa-vm1; it's probably easier to debug with access..
> I have a pretty extensively modified and tested ruleqa.cgi ready to commit.
> Compress::LZ4 needs to be installed for it to work.
>
> Does committing it to trunk automatically drop it into sa-vm1?
Most likely, yes.  I would suggest coordinating with sysadmins@s.a.o to
get that module installed.

Re: slow ruleqa?

Posted by Henrik Krohns <he...@hege.li>.
On Mon, Sep 17, 2018 at 02:34:17PM +0300, Henrik Krohns wrote:
> On Fri, Sep 14, 2018 at 10:40:48PM +0300, Henrik Krohns wrote:
> > On Fri, Sep 14, 2018 at 02:25:34PM -0500, Dave Jones wrote:
> > > 
> > > I installed XML::SAX::ExpatXS on sa-vm1.  Not sure if this will help
> > > immediately or if we have to wait until something runs.
> > > 
> > > I will be the first to admit that I am not a Perl person.  I was just the
> > > guy who stepped forward to get the RuleQA stuff running again about 19
> > > months ago.
> > 
> > Didn't seem to do much; it still takes 10 seconds to load..
> > 
> > I'm open to looking into RuleQA problems too; there are many bugs around, not
> > to mention the reuse issue and guidance.  Let me know (Kevin?) if you want
> > help tuning sa-vm1; it's probably easier to debug with access..
> 
> I have a pretty extensively modified and tested ruleqa.cgi ready to commit.
> Compress::LZ4 needs to be installed for it to work.
> 
> Does committing it to trunk automatically drop it into sa-vm1?

So Dave has installed it now.  Today the normal load time seemed to be 6
seconds; with the new ruleqa.cgi it dropped to 3 seconds.  Some improvement
at least.  :-)

There may be some more data that I can't simulate at home, but it's fast
enough to browse around for now.

-hk

Re: slow ruleqa?

Posted by Henrik Krohns <he...@hege.li>.
On Fri, Sep 14, 2018 at 10:40:48PM +0300, Henrik Krohns wrote:
> On Fri, Sep 14, 2018 at 02:25:34PM -0500, Dave Jones wrote:
> > 
> > I installed XML::SAX::ExpatXS on sa-vm1.  Not sure if this will help
> > immediately or if we have to wait until something runs.
> > 
> > I will be the first to admit that I am not a Perl person.  I was just the
> > guy who stepped forward to get the RuleQA stuff running again about 19
> > months ago.
> 
> Didn't seem to do much; it still takes 10 seconds to load..
> 
> I'm open to looking into RuleQA problems too; there are many bugs around, not
> to mention the reuse issue and guidance.  Let me know (Kevin?) if you want
> help tuning sa-vm1; it's probably easier to debug with access..

I have a pretty extensively modified and tested ruleqa.cgi ready to commit.
Compress::LZ4 needs to be installed for it to work.

Does committing it to trunk automatically drop it into sa-vm1?

Cheers,
Henrik

Re: slow ruleqa?

Posted by Henrik Krohns <he...@hege.li>.
On Fri, Sep 14, 2018 at 02:25:34PM -0500, Dave Jones wrote:
> 
> I installed XML::SAX::ExpatXS on sa-vm1.  Not sure if this will help
> immediately or if we have to wait until something runs.
> 
> I will be the first to admit that I am not a Perl person.  I was just the
> guy who stepped forward to get the RuleQA stuff running again about 19
> months ago.

Didn't seem to do much; it still takes 10 seconds to load..

I'm open to looking into RuleQA problems too; there are many bugs around, not
to mention the reuse issue and guidance.  Let me know (Kevin?) if you want
help tuning sa-vm1; it's probably easier to debug with access..

Cheers,
Henrik

Re: slow ruleqa?

Posted by Dave Jones <da...@apache.org>.
On 9/14/18 2:13 PM, Henrik Krohns wrote:
> 
> On Fri, Sep 14, 2018 at 07:18:29PM +0300, Henrik Krohns wrote:
>> On Fri, Sep 14, 2018 at 07:48:48AM -0500, David Jones wrote:
>>> On 9/14/18 5:30 AM, Henrik Krohns wrote:
>>>>
>>>> Everything is quite slow; this takes 10 seconds to load:
>>>> http://ruleqa.spamassassin.org/20180913-r1840789-n/MIMEOLE_DIRECT_TO_MX/detail
>>>>
>>>> Is sa-vm1 overloaded or something else broken?
>>>>
>>>> I made an identical setup on my home server, which outputs exactly the same
>>>> page; there it loads in 3 seconds (i5-2400 CPU @ 3.10GHz).  Looking into
>>>> whether I can optimize it a bit..
>>>>
>>>> Cheers,
>>>> Henrik
>>>>
>>>
>>> Do you have the same large corpus on your test machine and is it running the
>>> hit-frequencies script hourly to update the ruleqa web page information?
>>
>> Yes, as I said, the output page is identical.
>>
>> I already implemented a Storable+LZ4 cache for read_freqs_file, which is the
>> most time-consuming part.  Now it can simply load the preprocessed hash from
>> .scache files (created by -refresh -scache).  That and a few regex tweaks
>> dropped processing time from 3s to 1.5s; I'm pretty sure more can be
>> tweaked too..
>>
>>   377745 Sep 14 10:30 DETAILS.age
>> 12218511 Sep 14 19:04 DETAILS.age.scache
>>  1434012 Sep 14 10:30 DETAILS.all
>>  7509747 Sep 14 19:04 DETAILS.all.scache
>>   151428 Sep 14 17:28 DETAILS.new
>>  6838930 Sep 14 19:04 DETAILS.new.scache
>> 67612070 Sep 14 12:53 OVERLAP.new
>> 12029621 Sep 14 19:04 OVERLAP.new.scache
>>  2212249 Sep 14 11:02 SCOREMAP.new
>>  6800679 Sep 14 19:04 SCOREMAP.new.scache
>>
>> Overall it only adds ~10% to daily directory sizes thanks to LZ4 compression
>> of the cache files (it would be 6x otherwise).  LZ4 is so fast that there is
>> no speed difference with or without it.
>>
>> Since ruleqa keeps lots of history, probably only the last week should be
>> cached?  Or a few days.  Does anyone actually browse ruleqa?
>>
>> I'll probably switch ruleqa.cache to ruleqa.scache too if it helps.
>>
>> I guess I'll wait for RTC to end and tweak some more..
> 
> Looks like it still spends a crazy amount of time just parsing
> rulemetadata.xml and friends.  Why do we even need XML?
> 
> Things sped up from 1.5s to 0.9s when I installed XML::SAX::ExpatXS; check
> that it's installed on sa-vm1..
> 
> https://metacpan.org/pod/XML::Simple::FAQ#Why-is-XML::Simple-so-slow?
> 

I installed XML::SAX::ExpatXS on sa-vm1.  Not sure if this will help
immediately or if we have to wait until something runs.

I will be the first to admit that I am not a Perl person.  I was just
the guy who stepped forward to get the RuleQA stuff running again about
19 months ago.

Dave

Re: slow ruleqa?

Posted by Henrik Krohns <he...@hege.li>.
On Fri, Sep 14, 2018 at 07:18:29PM +0300, Henrik Krohns wrote:
> On Fri, Sep 14, 2018 at 07:48:48AM -0500, David Jones wrote:
> > On 9/14/18 5:30 AM, Henrik Krohns wrote:
> > >
> > >Everything is quite slow; this takes 10 seconds to load:
> > >http://ruleqa.spamassassin.org/20180913-r1840789-n/MIMEOLE_DIRECT_TO_MX/detail
> > >
> > >Is sa-vm1 overloaded or something else broken?
> > >
> > >I made an identical setup on my home server, which outputs exactly the same
> > >page; there it loads in 3 seconds (i5-2400 CPU @ 3.10GHz).  Looking into
> > >whether I can optimize it a bit..
> > >
> > >Cheers,
> > >Henrik
> > >
> > 
> > Do you have the same large corpus on your test machine and is it running the
> > hit-frequencies script hourly to update the ruleqa web page information?
> 
> Yes, as I said, the output page is identical.
> 
> I already implemented a Storable+LZ4 cache for read_freqs_file, which is the
> most time-consuming part.  Now it can simply load the preprocessed hash from
> .scache files (created by -refresh -scache).  That and a few regex tweaks
> dropped processing time from 3s to 1.5s; I'm pretty sure more can be
> tweaked too..
> 
>   377745 Sep 14 10:30 DETAILS.age
> 12218511 Sep 14 19:04 DETAILS.age.scache
>  1434012 Sep 14 10:30 DETAILS.all
>  7509747 Sep 14 19:04 DETAILS.all.scache
>   151428 Sep 14 17:28 DETAILS.new
>  6838930 Sep 14 19:04 DETAILS.new.scache
> 67612070 Sep 14 12:53 OVERLAP.new
> 12029621 Sep 14 19:04 OVERLAP.new.scache
>  2212249 Sep 14 11:02 SCOREMAP.new
>  6800679 Sep 14 19:04 SCOREMAP.new.scache
> 
> Overall it only adds ~10% to daily directory sizes thanks to LZ4 compression
> of the cache files (it would be 6x otherwise).  LZ4 is so fast that there is
> no speed difference with or without it.
> 
> Since ruleqa keeps lots of history, probably only the last week should be
> cached?  Or a few days.  Does anyone actually browse ruleqa?
> 
> I'll probably switch ruleqa.cache to ruleqa.scache too if it helps.
> 
> I guess I'll wait for RTC to end and tweak some more..

Looks like it still spends a crazy amount of time just parsing
rulemetadata.xml and friends.  Why do we even need XML?

Things sped up from 1.5s to 0.9s when I installed XML::SAX::ExpatXS; check
that it's installed on sa-vm1..

https://metacpan.org/pod/XML::Simple::FAQ#Why-is-XML::Simple-so-slow?


Re: slow ruleqa?

Posted by John Hardin <jh...@impsec.org>.
On Fri, 14 Sep 2018, Henrik Krohns wrote:

> Since ruleqa keeps lots of history, probably only the last week should be
> cached?  Or a few days.  Does anyone actually browse ruleqa?

I'd keep at least one full week, preferably at least two.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   The problem is when people look at Yahoo, slashdot, or groklaw and
   jump from obvious and correct observations like "Oh my God, this
   place is teeming with utter morons" to incorrect conclusions like
   "there's nothing of value here".        -- Al Petrofsky, in Y! SCOX
-----------------------------------------------------------------------
  3 days until the 231st anniversary of the signing of the U.S. Constitution

Re: slow ruleqa?

Posted by Henrik Krohns <he...@hege.li>.
On Fri, Sep 14, 2018 at 07:48:48AM -0500, David Jones wrote:
> On 9/14/18 5:30 AM, Henrik Krohns wrote:
> >
> >Everything is quite slow; this takes 10 seconds to load:
> >http://ruleqa.spamassassin.org/20180913-r1840789-n/MIMEOLE_DIRECT_TO_MX/detail
> >
> >Is sa-vm1 overloaded or something else broken?
> >
> >I made an identical setup on my home server, which outputs exactly the same
> >page; there it loads in 3 seconds (i5-2400 CPU @ 3.10GHz).  Looking into
> >whether I can optimize it a bit..
> >
> >Cheers,
> >Henrik
> >
> 
> Do you have the same large corpus on your test machine and is it running the
> hit-frequencies script hourly to update the ruleqa web page information?

Yes, as I said, the output page is identical.

I already implemented a Storable+LZ4 cache for read_freqs_file, which is the
most time-consuming part.  Now it can simply load the preprocessed hash from
.scache files (created by -refresh -scache).  That and a few regex tweaks
dropped processing time from 3s to 1.5s; I'm pretty sure more can be
tweaked too..

  377745 Sep 14 10:30 DETAILS.age
12218511 Sep 14 19:04 DETAILS.age.scache
 1434012 Sep 14 10:30 DETAILS.all
 7509747 Sep 14 19:04 DETAILS.all.scache
  151428 Sep 14 17:28 DETAILS.new
 6838930 Sep 14 19:04 DETAILS.new.scache
67612070 Sep 14 12:53 OVERLAP.new
12029621 Sep 14 19:04 OVERLAP.new.scache
 2212249 Sep 14 11:02 SCOREMAP.new
 6800679 Sep 14 19:04 SCOREMAP.new.scache

Overall it only adds ~10% to daily directory sizes thanks to LZ4 compression
of the cache files (it would be 6x otherwise).  LZ4 is so fast that there is
no speed difference with or without it.
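For readers less familiar with the Perl side, here is a rough Python analogue of the caching idea described above: parse the slow text file once, store the resulting hash serialized and compressed in a sibling .scache file, and have later page loads read the cache instead of reparsing. This is a sketch under stated assumptions, not the actual ruleqa.cgi code: `pickle` stands in for Perl's Storable, the stdlib `zlib` at its fastest level stands in for Compress::LZ4 (Python has no LZ4 in its standard library), and the function names and the one-line "RULE count" input format are hypothetical.

```python
import os
import pickle
import zlib


def parse_freqs(path):
    """Hypothetical slow parser: one 'RULE_NAME hit_count' pair per line."""
    freqs = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            parts = line.split()
            if len(parts) == 2:
                freqs[parts[0]] = int(parts[1])
    return freqs


def write_scache(path):
    """Refresh step: parse once, then store the dict compressed next to the source."""
    data = parse_freqs(path)
    # Level 1 favours speed over ratio, loosely mimicking why LZ4 was chosen.
    blob = zlib.compress(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL), 1)
    with open(path + ".scache", "wb") as fh:
        fh.write(blob)
    return data


def load_freqs(path):
    """Page-load step: use the .scache file if it is not older than the source."""
    cache = path + ".scache"
    if os.path.exists(cache) and os.path.getmtime(cache) >= os.path.getmtime(path):
        with open(cache, "rb") as fh:
            return pickle.loads(zlib.decompress(fh.read()))
    # No usable cache: fall back to the slow parse.
    return parse_freqs(path)
```

In this shape the expensive parsing moves into the refresh job, and each page view only pays for a decompress plus a deserialize.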

Since ruleqa keeps lots of history, probably only the last week should be
cached?  Or a few days.  Does anyone actually browse ruleqa?

I'll probably switch ruleqa.cache to ruleqa.scache too if it helps.

I guess I'll wait for RTC to end and tweak some more..

Cheers,
Henrik

Re: slow ruleqa?

Posted by David Jones <dj...@ena.com>.
On 9/14/18 5:30 AM, Henrik Krohns wrote:
> 
> Everything is quite slow; this takes 10 seconds to load:
> http://ruleqa.spamassassin.org/20180913-r1840789-n/MIMEOLE_DIRECT_TO_MX/detail
> 
> Is sa-vm1 overloaded or something else broken?
> 
> I made an identical setup on my home server, which outputs exactly the same
> page; there it loads in 3 seconds (i5-2400 CPU @ 3.10GHz).  Looking into
> whether I can optimize it a bit..
> 
> Cheers,
> Henrik
> 

Do you have the same large corpus on your test machine and is it running 
the hit-frequencies script hourly to update the ruleqa web page information?

-- 
David Jones