You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2013/09/05 20:26:29 UTC

[Bug 6972] New: BayesStore/Redis.pm: ditching CPAN module Redis gains speed, avoids its bugs, works on IPv6

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6972

            Bug ID: 6972
           Summary: BayesStore/Redis.pm: ditching CPAN module Redis gains
                    speed, avoids its bugs, works on IPv6
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Libraries
          Assignee: dev@spamassassin.apache.org
          Reporter: Mark.Martinec@ijs.si

Considering that the CPAN module Redis (implementing client-side
of the redis protocol in pure perl) is:

- quite inefficient due to its baroque API design (e.g. PR #46, #49)
  (e.g. poor batching, use of callbacks, use of ungetc);

- the project seems to have stalled - no response to problem reports
  for the last four months, last release from January;

- has some bugs affecting its use in SpamAssassin (#38),
  whose workaround is only available since version 1.956 (see
  Bug 6965) which some Linux distributions haven't picked up yet;

- does not support IPv6, which is available since redis 2.8.0;

- adds one more module dependency to SpamAssassin;  ...


I decided to try my hand at re-writing the client-side of the redis
protocol from scratch, with the intent of avoiding bottlenecks
which have been indicated by profiling our usage and benchmarks
with a perl profiler. The redis protocol is fairly simple, so
the solution came out as 200 lines of perl code, including
necessary error handling and comments. This code is now included
in amavisd-new version 2.8.2, where I faced the same problems.

The BayesStore/Redis.pm module needed a couple of fairly
straightforward adjustments, often in a form of simplifications,
and during the few weeks since I'm running this code it proved
itself to be fast and robust. Works well over IPv6 too :)

The code paths involving a Lua scripting in a redis server are
still faster than the non-Lua case, but due to the now efficient
command batching (pipelining) code, the non-Lua case comes fairly
close to the Lua case. In two Bayes function they turned out to be
truly the same speed, which eliminated a need for two Lua functions:
multi_expire_script and multi_hincrby.

So despite including the full redis protocol support, the resulting
BayesStore/Redis.pm still has fewer lines than the current version
we have, and saves a few bytes of memory by not implementing what
we don't need (like message passing / subscriptions).

Perhaps the most illustrative case comes from timing of the
'sa-learn --backup' usage. While this is a marginal and non-critical
usage case, the same ratio applies to other code sections (like
multi-token lookups, as proved by measuring elapsed times on a
production mailer).

Benchmark:
  elapsed time needed for 'sa-learn --backup',
  800.000 real-life tokens, 2200 'seen' entries:

- original CPAN Redis module, with Lua:  37 seconds
- new protocol client code,   with Lua:  26 seconds

- original CPAN Redis module, no Lua:    88 seconds
- new protocol client code,   no Lua:    37 seconds

As can be seen, the batched classical use is now more than
twice the speed, while the Lua case also benefited by some 50%.


So I'd like to fold in the replaced BayesStore/Redis.pm, if
you don't mind, and un-document a dependency on the CPAN module.
The database itself is unchanged, so full compatibility is retained.


Due to the number of diffs which makes a review difficult,
I'll attach the full replacement Redis.pm, not a patch.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6972] BayesStore/Redis.pm: ditching CPAN module Redis gains speed, avoids its bugs, works on IPv6

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6972

--- Comment #6 from AXB <ax...@gmail.com> ---
(In reply to Henrik Krohns from comment #4)
> Mark you are a machine! :-) Happy to test.

I was going to deploy the chnages before this today - now I'm in a dilemma.
Have these changes been running for a while on your systems?

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6972] BayesStore/Redis.pm: ditching CPAN module Redis gains speed, avoids its bugs, works on IPv6

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6972

--- Comment #5 from Henrik Krohns <he...@hege.li> ---
Just one though here.. would it be easy to create a generic Redis-connection
module for SA? Think something like AskDNS. There were ideas around that Redis
could be used for some hashing services etc..

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6972] BayesStore/Redis.pm: ditching CPAN module Redis gains speed, avoids its bugs, works on IPv6

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6972

--- Comment #8 from Mark Martinec <Ma...@ijs.si> ---
DependencyInfo.pm: dropped dependency on module Redis,
updated PROPOSED-3.4.0.txt accordingly
  Sending build/announcements/PROPOSED-3.4.0.txt
  Sending lib/Mail/SpamAssassin/Util/DependencyInfo.pm
Committed revision 1520513.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6972] BayesStore/Redis.pm: ditching CPAN module Redis gains speed, avoids its bugs, works on IPv6

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6972

Mark Martinec <Ma...@ijs.si> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Mark Martinec <Ma...@ijs.si> ---
As far as I'm concerned this is done, works well, docs updated.
Closing.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6972] BayesStore/Redis.pm: ditching CPAN module Redis gains speed, avoids its bugs, works on IPv6

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6972

--- Comment #1 from Mark Martinec <Ma...@ijs.si> ---
Created attachment 5170
  --> https://issues.apache.org/SpamAssassin/attachment.cgi?id=5170&action=edit
The proposed replacement BayesStore/Redis.pm

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6972] BayesStore/Redis.pm: ditching CPAN module Redis gains speed, avoids its bugs, works on IPv6

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6972

--- Comment #7 from Mark Martinec <Ma...@ijs.si> ---
> I was going to deploy the chnages before this today - now I'm in a dilemma.
> Have these changes been running for a while on your systems?

Yes, running on our production mailer since middle of August (about
three weeks now), with some final refinements and benchmarking during
the test period. It survives redis server restarts and outages (logs
warnings during on outage, but reconnects and reloads Lua programs
when a server reappears).

Tried also the backup/restore (and noticed we need to add some
randomness to expiration times on a '--restore' operation to avoid
mass expiration of tokens in the same moment, now fixed).

The exact same redis client code is also used in my amavisd for its
pen pals and IP reputation databases. Currently released as 2.8.2-rc1
( http://amavis.org/release-notes.txt ).


Henrik writes:
> Just one though here.. would it be easy to create a generic Redis-connection
> module for SA? Think something like AskDNS. There were ideas around that
> Redis could be used for some hashing services etc..

I thought of it. The 'TinyRedis' package could be made into its own file,
or moved to Util.pm. Eventually it would be nice to make it a CPAN module
to avoid code duplication between SpamAssassin and amavisd, although this
would shed one of its benefits: self-contained code with no dependency
on new modules. The code is fairly small so code duplication is not a
big deal for now.

So yes, if other parts of SpamAssassin develop a need for the TinyRedis
package, we'll just move it out of the BayesStore/Redis.pm to its own.
It is not specific to a bayes function. AWL comes to mind.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6972] BayesStore/Redis.pm: ditching CPAN module Redis gains speed, avoids its bugs, works on IPv6

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6972

--- Comment #3 from Mark Martinec <Ma...@ijs.si> ---
I should add: I also benchmarked the other alternative for the
redis client module, the Redis::hiredis CPAN module.

It is based of the hiredis library (writen in C), which makes
it small and fast. Unfortunately its batching API (pipelining)
is not 8-bit clean, which makes it useless for our purpose.
The non-batching mode is somewhat faster but in the same league
as my new protocol implementation here.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6972] BayesStore/Redis.pm: ditching CPAN module Redis gains speed, avoids its bugs, works on IPv6

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6972

Mark Martinec <Ma...@ijs.si> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|Undefined                   |3.4.0

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6972] BayesStore/Redis.pm: ditching CPAN module Redis gains speed, avoids its bugs, works on IPv6

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6972

--- Comment #2 from Mark Martinec <Ma...@ijs.si> ---
trunk:
  Bug 6972: BayesStore/Redis.pm: ditching CPAN module Redis
  gains speed, avoids its bugs, works on IPv6
Sending lib/Mail/SpamAssassin/BayesStore/Redis.pm
Committed revision 1520380.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6972] BayesStore/Redis.pm: ditching CPAN module Redis gains speed, avoids its bugs, works on IPv6

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6972

Henrik Krohns <he...@hege.li> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hege@hege.li

--- Comment #4 from Henrik Krohns <he...@hege.li> ---
Mark you are a machine! :-) Happy to test.

-- 
You are receiving this mail because:
You are the assignee for the bug.