You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2009/01/19 15:50:34 UTC

[Bug 6046] New: Add BayesStore backend using direct calls to BerkeleyDB

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046

           Summary: Add BayesStore backend using direct calls to BerkeleyDB
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P3
         Component: Libraries
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: mdorman@ironicdesign.com


This is intended to be a high-speed, high-concurrency local BayesStore
implementation.  I think it succeeds in all of those respects.

In benchmarking using an updated (but largely unchanged in procedure)
masses/bayes-testing/benchmark setup, against a corpus of 6000 messages each of
ham and spam, it beats all of the other back-ends I tested in all operations.

The smallest gains are against the DB_File back-end, where it generally shows a
margin of 10%-20%, while it scores upwards of 3x the speed of the SQL
back-ends.

I have not yet run this code in production.  I intend to start that this week. 
I would certainly appreciate more eyes on the code.


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046

Mark Martinec <Ma...@ijs.si> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|3.3.0                       |3.3.1

--- Comment #26 from Mark Martinec <Ma...@ijs.si> 2009-11-25 10:25:36 UTC ---
I tried it running for a couple of hours, seems to work: it did learn tokens
starting with an empty database, it did start providing good results once
it reached the 200 messages limit, but after a couple of hours it became
slower and slower (using libdb47), so I had to turn it off.

As it is, it's good enough for 3.3.0 (provided some warning text is written
in release notes). I'm moving it off to 3.3.1 target, perhaps someone would
be willing to improve it by then.

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046





--- Comment #22 from Justin Mason <jm...@jmason.org>  2009-06-29 04:35:25 PST ---
is this still occasionally flaky?  we could put it into 3.3.0alpha as a
disabled-by-default optional "alpha-quality" plugin....

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046





--- Comment #11 from Michael Alan Dorman <md...@ironicdesign.com>  2009-01-23 11:18:45 PST ---
> Specifically, bdb and db_file are very close right up until the long
> single-threaded slogs through large mailboxes at the end of the benchmark, at
> which point bdb pulls out a 5x-6x speed differential.  That seems to me wildly
> improbable.

Oh, wait, I forgot, this was with auto_expire enabled, so it's not surprising
that it takes longer.  with auto_expire disabled, bdb is a little faster...but
only a little.


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

Re: [Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by Michael Alan Dorman <md...@ironicdesign.com>.
> The module looks alright, but needs testing. I have a few minor
> coding/style comments, but that can be addressed later or during
> importing it.

Yeah, I tried to keep to the style, but I have to admit a couple of
things go against deeply-ingrained habits, so I wouldn't be surprised
if I missed some some stuff.

> Michael, is the attached code the latest that you have, or have
> you done any changes later?

I haven't done any more development at this point.  Though as I noted
in another mail, some more development may be warranted---it did not
perform as expected when I set it up in production.  I anticipate this
being relatively minor to correct, since it seems to work well in
testing.

> Do we have a licensing agreement from Michael?

You should---I emailed a scan of a completed and signed form to
secretary@apache.org more than a week ago, and there should also be a
corporate assignment for me from ironicdesign.com that was faxed in
even before that.

If either of those are missing, please let me know---we didn't get any
ack for either.

> I'd suggest we go on with importing the module -
> - as it is an optional backend module what is important is that
> we have an intention of adopting it and that it compiles cleanly,
> any minor bugs can be fixed along the way.

That would be very cool.  I certainly intend to keep working on it, but
I would also appreciate any input others have.

Mike.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046





--- Comment #15 from Mark Martinec <Ma...@ijs.si>  2009-02-06 05:32:10 PST ---
Now back to the original topic.

The module looks alright, but needs testing. I have a few minor
coding/style comments, but that can be addressed later or during
importing it.

Michael, is the attached code the latest that you have, or have
you done any changes later? Do we have a licensing agreement
from Michael? I'd suggest we go on with importing the module -
- as it is an optional backend module what is important is that
we have an intention of adopting it and that it compiles cleanly,
any minor bugs can be fixed along the way.


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046





--- Comment #3 from Mark Martinec <Ma...@ijs.si>  2009-01-22 06:12:19 PST ---
> I've still not quite got this in production use yet, but on benchmarks it
> thrashes all of the other options pretty soundly.  I'll try to add those
> figures after a quick re-run.

What does benchmarking say on auto-purging the database?
In my experience, lengthy auto expiration runs was the single
and most pressing reason to abandon bdb and switch our Bayes
to SQL.


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046





--- Comment #5 from Mark Martinec <Ma...@ijs.si>  2009-01-22 07:31:13 PST ---
> Well we don't have to do the dance of move-and-rebuild that DBM does, in part
> because we are able to create a secondary index on atime that makes it very
> easy to estimate whether we would expire too many tokens in a given run.  So I
> would expect it to be as efficient as the SQL back-ends in that respect.

Good to hear.

> Still, it's a good question that would not be answered out of the box by the
> benchmark code, so I've changed the local.cf files in the benchmark directory
> to remove bayes_auto_expire 0, and I'll re-run and see what the results look
> like.

Thanks!

> I'm curious, which SQL back-end are you using?

Mail::SpamAssassin::BayesStore::MySQL

> And if it's MySQL, do you have any performance tuning tips?

Initially not. Later I added (/etc/my.cnf) :

[mysqld]
bind=127.0.0.1
key_buffer_size=60M
innodb_buffer_pool_size=384M
innodb_log_buffer_size=6M
innodb_flush_log_at_trx_commit=0
max_connections=60

based on some tips from:
http://www.mysqlperformanceblog.com/2006/09/29/what-to-tune-in-mysql-server-after-installation/

This is with MySQL 5.1.24 InnoDB (current size 5.5 GB), on FreeBSD,
SA 3.3; currently bayes_token has 1M records, bayes_seen has 26M records
(I know, I need to ditch bayes_seen and start it from scratch).

Initially I used MyISAM and Mail::SpamAssassin::BayesStore::SQL,
which would get me in trouble every now and then, requiring a
REPAIR TABLE. Now with InnoDB and the dedicated BayesStore::MySQL
it never again got me into trouble in two years.

> I've been a little startled to find that PgSQL is
> actually outperforming MySQL on my benchmarks (given their reps)
> but then I know how to tune PgSQL well; I'm more ignorant about MySQL.

I was running Bayes for a while on PostgreSQL 8.2 using
Mail::SpamAssassin::BayesStore::PgSQL, but the SELECT ... IN (...)
with a large set of tokens in the IN-set was quite slow,
much slower than with MySQL.

I'm still using PostgreSQL for everything else except Bayes,
i.e. for AWL, and for amavisd-new SQL logging / pen pals database,
which is quite large and outperforms MySQL, especially on purging.


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046





--- Comment #21 from Mark Martinec <Ma...@ijs.si>  2009-04-01 10:15:28 PST ---
Now that the perl 5.8.9/5.10.0 stack overflow when Berkeley DB is in use
is resolved (Bug 6060) by a workaround, I could play again with this
BDB bayesstore backend.

  M::SA::BayesStore::BDB : keep databases persistently open in an
  attempt to reduce likelihood of database corruption or permanent
  lockout, and likely to improve performance; do a db close in a
  DESTROY method to ensure a clean rundown; remove DB_LOG_AUTOREMOVE
  as it is not compatible with libdb 4.7 (bdb API change).
Sending lib/Mail/SpamAssassin/BayesStore/BDB.pm
Committed revision 760962 ( https://svn.apache.org/viewcvs.cgi?view=rev&rev=760962 ).

I'm still having trouble with this module. It can happen that I end up with
a database permanently locked (BerkeleyDB::Env->new hangs indefinitely),
or DB_SECONDARY_BAD is returned by db_put while expiring tokens,
and tests 17 and 24 in t/bayesbdb.t are failing. 


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046

--- Comment #24 from Mark Martinec <Ma...@ijs.si> 2009-10-15 11:06:44 UTC ---
Considering there is no progress or interest in this backend, do we
drop it from 3.3 code? (The high-end sites should be using SQL anyway.)

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

Re: [Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by Mark Martinec <Ma...@ijs.si>.
Mike,

> I'll grab a checkout of your changes, and see if I can't run it
> through Devel::NYTProf and see how things pan out.

Thanks! It appears to be running alright now, I just put
it on our production machine an hour ago - so far so good,
scores make sense. I'm using bdb 4.6.21 from FreeBSD ports.

  Mark

Re: [Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by Michael Alan Dorman <md...@ironicdesign.com>.
> - BDB.pm: no wonder no tokens were ever found, sought in 'seen'
> instead of in a 'tokens' database;

Um, thanks Mark, though I have to admit that that error makes me feel
pretty stupid.  No, make that really stupid.

> - BDB.pm: add new subroutine _mget, and move a loop through
> search tokens from tok_get_all into _mget, avoiding one level
> of procedure call for few-hundred tokens

Though I obviously need to go back and check with code that's retrieving
tokens properly (I'm going to be kicking myself that one for a while), I
wrote tok_get_all the way I did because it lowered its impact in
profiling enormously---I was finding that tok_get was basically unused,
so writing tok_get in terms of tok_get_all was a huge win.

Now much of this was when I was also using transactions, so that
profile might have changed.

I'll grab a checkout of your changes, and see if I can't run it through
Devel::NYTProf and see how things pan out.

Mike.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046





--- Comment #19 from Mark Martinec <Ma...@ijs.si>  2009-02-10 08:14:32 PST ---
- BDB.pm: no wonder no tokens were ever found, sought in 'seen'
instead of in a 'tokens' database;

- BDB.pm: fix 'sa-learn --dump' failing:
plugin: eval failed: bayes: dump_db_toks: not implemented
plugin: eval failed: Can't locate object method "compute_prob_for_token"
via package "Mail::SpamAssassin::Plugin::Bayes" at BDB.pm line 615

- BDB.pm: add new subroutine _mget, and move a loop through
search tokens from tok_get_all into _mget, avoiding one level
of procedure call for few-hundred tokens

- BDB.pm: comment-out debug calls to Data::Dumper

- added a timing report to Plugin::Bayes::learner_expire_old_training

Sending        lib/Mail/SpamAssassin/BayesStore/BDB.pm
Sending        lib/Mail/SpamAssassin/Plugin/Bayes.pm
Transmitting file data ..
Committed revision 743007 ( https://svn.apache.org/viewcvs.cgi?view=rev&rev=743007 ).


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046





--- Comment #2 from Michael Alan Dorman <md...@ironicdesign.com>  2009-01-22 05:52:16 PST ---
Created an attachment (id=4427)
 --> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4427)
The patch adding BayesStore/BDB.pm

I've still not quite got this in production use yet, but on benchmarks it
thrashes all of the other options pretty soundly.  I'll try to add those
figures after a quick re-run.


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046





--- Comment #7 from Justin Mason <jm...@jmason.org>  2009-01-23 01:59:22 PST ---
wow.  those are great numbers ;)


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046





--- Comment #4 from Michael Alan Dorman <md...@ironicdesign.com>  2009-01-22 06:31:56 PST ---
(In reply to comment #3)
> What does benchmarking say on auto-purging the database?
> In my experience, lengthy auto expiration runs was the single
> and most pressing reason to abandon bdb and switch our Bayes
> to SQL.

Well we don't have to do the dance of move-and-rebuild that DBM does, in part
because we are able to create a secondary index on atime that makes it very
easy to estimate whether we would expire too many tokens in a given run.  So I
would expect it to be as efficient as the SQL back-ends in that respect.

Still, it's a good question that would not be answered out of the box by the
benchmark code, so I've changed the local.cf files in the benchmark directory
to remove bayes_auto_expire 0, and I'll re-run and see what the results look
like.

I'm curious, which SQL back-end are you using?  And if it's MySQL, do you have
any performance tuning tips?  I've been a little startled to find that PgSQL is
actually outperforming MySQL on my benchmarks (given their reps) but then I
know how to tune PgSQL well; I'm more ignorant about MySQL.


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046





--- Comment #12 from Mark Martinec <Ma...@ijs.si>  2009-02-06 04:14:47 PST ---
I intended to try out the module, but was sidetracked by a perl5.8.9
problem, which crashes the interpreter on FreeBSD when it tries to
compile the code autogenerated from rules as soon as some module
with threading like BerkeleyDB is brought in:

perl -MBerkeleyDB body_0.pm
Bus error: 10 (core dumped)

I filed a perl bug report #63054:

  http://rt.perl.org/rt3/Public/Bug/Display.html?id=63054
  Perl_peep recursion exceeds stack in 5.8.9 eval-ing a long program

but I'm not yet sure how to proceed. Looks like I need to
downgrade to 5.8.8 or upgrade to 5.10.0, which is a bit of a
pain when most Perl modules are installed from FreeBSD ports.

A workaround could be if SA produced smaller code sections,
the 83600 lines of code for body rules at prio=0 is a lot
(using some SARE and ZDF rules), perl5.8.9 can survive about
two thirds of that.


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046





--- Comment #6 from Michael Alan Dorman <md...@ironicdesign.com>  2009-01-22 09:39:41 PST ---
So, running masses/bayes-testing/benchmark (with auto-expire enabled) across
each back-end on the same corpus on an otherwise idle machine produces the
following results:

Benchmarking BDB

real    14m19.809s
user    10m27.823s
sys     0m42.495s

Benchmarking PostgreSQL

real    44m23.183s
user    8m15.167s
sys     0m30.978s

Benchmarking MySQL

real    44m18.552s
user    11m2.257s
sys     1m6.612s

Benchmarking SDBM

real    25m14.481s
user    9m56.833s
sys     1m1.748s

Benchmarking DB_File

real    41m6.284s
user    12m33.203s
sys     0m57.692s

I won't pretend that MySQL is the best-tuned MySQL install ever, though I have
done some.  I may make try tweaking that a bit based on Mark Martinec's
suggestions and re-run.

Anyway, that's based on the patch I've included here.  I would appreciate any
comments or suggestions anyone has about the code.


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046

--- Comment #25 from Mark Martinec <Ma...@ijs.si> 2009-11-19 11:21:07 UTC ---
(In reply to comment #21)
> and tests 17 and 24 in t/bayesbdb.t are failing.

These failing tests I have fixed:

  r881316 | mmartinec | 2009-11-17 16:00:34 +0100 (Tue, 17 Nov 2009) | 2 lines
  Fixed bug in BayesStore::BDB::tok_get, fixed corresponding test

The rest about alpha-quality still applies.

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046





--- Comment #10 from Michael Alan Dorman <md...@ironicdesign.com>  2009-01-23 10:42:09 PST ---
(In reply to comment #7)
> wow.  those are great numbers ;)

Yeah, they're so great that I have to admit I wonder if I've got a bug
somewhere.

Specifically, bdb and db_file are very close right up until the long
single-threaded slogs through large mailboxes at the end of the benchmark, at
which point bdb pulls out a 5x-6x speed differential.  That seems to me wildly
improbable.

I think the code is pretty straightforward; I'd love to have other eyes looking
at it.


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046





--- Comment #14 from Mark Martinec <Ma...@ijs.si>  2009-02-06 05:22:28 PST ---
> > A workaround could be if SA produced smaller code sections,
> > the 83600 lines of code for body rules at prio=0 is a lot
> 
> yeah, we can do that I think.  can you open a separate SA bug for this?

Ok, opened Bug 6060.


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046





--- Comment #20 from Mark Martinec <Ma...@ijs.si>  2009-02-10 10:49:00 PST ---
It seems to be running just fine for a while, but then a database seems
to get corrupted:

Feb 10 19:34:42 xxx amavis[36895]: (36895-30) _WARN:
  rules: failed to run BAYES_05 test, skipping:
  (Couldn't put record:
  Secondary index corrupt: not consistent with primary
  at /usr/local/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/BayesStore/BDB.pm
  line 948)

Feb 10 19:34:42 xxx amavis[36827]: (36827-30) _WARN:
  plugin: eval failed: bayes: (in learn)
  Couldn't put record:
  Secondary index corrupt: not consistent with primary
  at /usr/local/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/BayesStore/BDB.pm
  line 816.

Same happened yesterday, which made me scratch the bdb database and start
anew, but it's happened again. I'm using db46-4.6.21.3 and BerkeleyDB-0.36
from FreeBSD ports. The same pair of db46 and BerkeleyDB is working well
for busy databases maintained by amavisd (nanny/process status, SNMP-like
statistics counters, cache and a cache-queue).

Could it be some concurrency/locking problem between multiple child processes,
each doing its Bayes/BDB thing independently (just like in spamd)?


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046





--- Comment #23 from Mark Martinec <Ma...@ijs.si>  2009-06-29 06:45:16 PST ---
> is this still occasionally flaky?  we could put it into 3.3.0alpha as a
> disabled-by-default optional "alpha-quality" plugin....

I think it doesn't work at all at the moment, although possibly there
isn't much that is missing from making it work. It would be very nice
to get it to at least alpha quality for 3.3.0.

Regardless, I'd vote for including it in the 3.3.0alpha but
disabled and marked as not-quite-ready.

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046





--- Comment #13 from Justin Mason <jm...@jmason.org>  2009-02-06 04:34:14 PST ---
(In reply to comment #12)
> I intended to try out the module, but was sidetracked by a perl5.8.9
> problem, which crashes the interpreter on FreeBSD when it tries to
> compile the code autogenerated from rules as soon as some module
> with threading like BerkeleyDB is brought in:

:(

> A workaround could be if SA produced smaller code sections,
> the 83600 lines of code for body rules at prio=0 is a lot
> (using some SARE and ZDF rules), perl5.8.9 can survive about
> two thirds of that.

yeah, we can do that I think.  can you open a separate SA bug for this?


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046





--- Comment #17 from Mark Martinec <Ma...@ijs.si>  2009-02-09 06:42:37 PST ---
> Mark, if you want to import it into SVN, go right ahead... I'm +1

Imported a new Bayes backend M::S::BayesStore::BDB, along
with its benchmarks and tests (from Bug 6046 attachment);
author: Michael Alan Dorman
Sending        MANIFEST
Adding         lib/Mail/SpamAssassin/BayesStore/BDB.pm
Sending        masses/bayes-testing/benchmark/README
Adding         masses/bayes-testing/benchmark/helper/bdb
Adding         masses/bayes-testing/benchmark/helper/bdb/cleardb
Adding         masses/bayes-testing/benchmark/helper/bdb/dbsize
Adding         masses/bayes-testing/benchmark/helper/bdb/setup
Adding         masses/bayes-testing/benchmark/tests/bdb
Adding         masses/bayes-testing/benchmark/tests/bdb/site
Adding         masses/bayes-testing/benchmark/tests/bdb/site/init.pre
Adding         masses/bayes-testing/benchmark/tests/bdb/site/local.cf
Adding         masses/bayes-testing/benchmark/tests/bdb/user_prefs
Adding         t/bayesbdb.t
Transmitting file data ..........
Committed revision 742522 ( https://svn.apache.org/viewcvs.cgi?view=rev&rev=742522 ).


Quench the warning:
  BerkeleyDB::db_version" used only once: possible typo at t/bayesbdb.t
Prevent bayesbdb.t from aborting:
  Can't use an undefined value as an ARRAY reference at
  ../blib/lib/Mail/SpamAssassin/BayesStore/BDB.pm line 683
Sending        lib/Mail/SpamAssassin/BayesStore/BDB.pm
Sending        t/bayesbdb.t
Transmitting file data ..
Committed revision 742535 ( https://svn.apache.org/viewcvs.cgi?view=rev&rev=742535 ).


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046





--- Comment #8 from Quanah Gibson-Mount <qu...@zimbra.com>  2009-01-23 09:47:35 PST ---
Just as an aside, is there any way to make it so the bayes table can be shared
when using BDB?  I'm guessing not, but figured it doesn't hurt to ask. ;)


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046

Mark Martinec <Ma...@ijs.si> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P5

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046


Justin Mason <jm...@jmason.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|3.2.6                       |3.3.0




--- Comment #1 from Justin Mason <jm...@jmason.org>  2009-01-19 06:57:41 PST ---
aiming at 3.3.0.  3.2.6 is unlikely to see any new features ;)


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046

Mark Martinec <Ma...@ijs.si> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED
   Target Milestone|3.3.1                       |3.3.0

--- Comment #27 from Mark Martinec <Ma...@ijs.si> 2010-01-27 04:05:10 UTC ---
Ok, this ended up in 3.3.0, with a warning in release notes that it
is an alpha. Closing it now. For any particular problems please
open separate tickets.

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046





--- Comment #16 from Justin Mason <jm...@jmason.org>  2009-02-06 06:18:27 PST ---
(In reply to comment #15)
> Michael, is the attached code the latest that you have, or have
> you done any changes later? Do we have a licensing agreement
> from Michael? I'd suggest we go on with importing the module -
> - as it is an optional backend module what is important is that
> we have an intention of adopting it and that it compiles cleanly,
> any minor bugs can be fixed along the way.

yep, the CLA is sorted; http://people.apache.org/~jim/committers.html .
so it's ready to go!

Mark, if you want to import it into SVN, go right ahead... I'm +1


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046





--- Comment #9 from Michael Alan Dorman <md...@ironicdesign.com>  2009-01-23 10:32:34 PST ---
> Just as an aside, is there any way to make it so the bayes table can be shared
> when using BDB?  I'm guessing not, but figured it doesn't hurt to ask. ;)

Technically, I believe it is possible.  My DB4.6 reference material discusses
the an RPC-based client-server mechanism.

That is, however, so far beyond my expertise, I wouldn't really be sure where
to begin.


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6046] Add BayesStore backend using direct calls to BerkeleyDB

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6046





--- Comment #18 from Mark Martinec <Ma...@ijs.si>  2009-02-09 11:10:46 PST ---
some rather cosmetic changes to BDB.pm: wrap long lines; remove some
redundant parenthesis; TAB -> SP in POD sections (doesn't come out
nice); uncomment most dbg calls for the time being, change debug
area id 'BDB:' -> 'bayes:' for consistency with other backends;
changed 'assert'-like idiom to: 'assert_condition or die' (previously
a mix of 'negative_condition and die' and 'positive_cond or die'
was used); test status of BerkeleyDB methods for zero as per docs,
instead of testing for boolean; avoid idiom 'my $v=expr or die/dbg'
separating assignment and a test ('my $x=0 or die' yields:
'Found = in conditional', while 'my($x)=0 or die' never dies)

Sending        lib/Mail/SpamAssassin/BayesStore/BDB.pm
Transmitting file data .
Committed revision 742680 ( https://svn.apache.org/viewcvs.cgi?view=rev&rev=742680 ).


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.