You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2013/08/07 11:35:38 UTC

[Bug 6965] New: BayesStore/Redis: dealing with server restarts and out-of-lockstep protocol

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6965

            Bug ID: 6965
           Summary: BayesStore/Redis: dealing with server restarts and
                    out-of-lockstep protocol
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Libraries
          Assignee: dev@spamassassin.apache.org
          Reporter: Mark.Martinec@ijs.si

Noticed some problems with the Redis backend for bayes:

1. About once in a week I can see in our logs that Redis goes out of sync
on the protocol with a server, and doesn't recover. This happens when
some redis operation times out: the result comes back from the server
but is not read by SpamAssassin because of a timeout. The next mail check
task in the same child process then tries to make its query, but receives
remains of a previous reply. This situation does not correct itself
for the lifetime of the child process.

As the Redis CPAN module does not offer any method to flush input buffers,
the cleanest way to recover is to just drop the TCP session, and re-connect
on the next request. The current operation still fails, but at least all
subsequent operations will be able to work on a fresh session.

To implement this, I propose to fully decouple the redis server session's
connect/disconnect from the SpamAssassin's notion of a "logical connection",
i.e. the tie_db_readonly / tie_db_writable / untie_db.

2. To be able to detect such out-of-sequence results in the future,
more detailed checks are added, including passing a random 'nonce'
to a multi_hmget_script and expecting the same value to come back
with a reply - and dropping a session if it doesn't match.

3. In view of the Redis module's problem report 38:
| https://github.com/melo/perl-redis/issues/38
| Select new database doesn't survive after reconnect.
| If you select to a different database, and then loose the connection
| and reconnect, you'll end back into the default database. We should
| keep track of the current database and select to it after reconnect.
we are currently in trouble if one choses a non-zero database index
and enables automatic reconnects. After a reconnect, SpamAssassin
ends up connected to a database index zero.

The solution is in providing a method on_connect() to the Redis module.
This callback is called on every connect to a server, initial and
automatic re-connects. In this routine we can select() the database
or do whatever is necessary to ensure a consistent state.

Even if the issue 38 is eventually resolved in Redis.pm, it does
no harm to call select() twice on a connect.

The only drawback is that the on_connect callback is only available
since Redis.pm version 1.956, so we need to bump the minimal requirement
for this module from 1.954 to 1.956. Btw, the current version is 1.961
from January 2013.


Attached is a patch to implement this. It is a bit larger then necessary
as it also renames the {is_really_open} to {connected} and swaps some
code sections for better readability (shorter 'if' branch first, longer
branch after an 'else').

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6965] BayesStore/Redis: dealing with server restarts and out-of-lockstep protocol

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6965

--- Comment #1 from Mark Martinec <Ma...@ijs.si> ---
Created attachment 5166
  --> https://issues.apache.org/SpamAssassin/attachment.cgi?id=5166&action=edit
The patch implementing the above changes

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6965] BayesStore/Redis: dealing with server restarts and out-of-lockstep protocol

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6965

Mark Martinec <Ma...@ijs.si> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #3 from Mark Martinec <Ma...@ijs.si> ---
Now with the Bug 6972 proposed changes, this PR becomes mostly irrelevant
(but served as a foundation for Bug 6972 rewrites).
Closing.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6965] BayesStore/Redis: dealing with server restarts and out-of-lockstep protocol

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6965

Mark Martinec <Ma...@ijs.si> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|Undefined                   |3.4.0

--- Comment #2 from Mark Martinec <Ma...@ijs.si> ---
trunk:
  Bug 6965: BayesStore/Redis: dealing with server restarts
  and out-of-lockstep protocol
Sending lib/Mail/SpamAssassin/BayesStore/Redis.pm
Sending lib/Mail/SpamAssassin/Util/DependencyInfo.pm
Committed revision 1511228.

-- 
You are receiving this mail because:
You are the assignee for the bug.