Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2007/03/20 16:32:30 UTC

[Bug 5383] New: compile_now() doesn't pre-load all necessary Perl modules

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5383

           Summary: compile_now() doesn't pre-load all necessary Perl
                    modules
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P3
         Component: Libraries
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: Mark.Martinec@ijs.si


SA man page states:
  $f->compile_now (...)
    Compile all patterns, load all configuration files,
    and load all possibly-required Perl modules.

Depending on the version of SA, there is always a handful
of Perl modules which compile_now() fails to load, such as
some plugins and their underlying modules, BayesStore::*,
NetAddr::IP, SA::Locker::* and the like.

In a normal (non-chrooted) environment this just means that
these modules will be loaded later by each child process (spamd,
amavisd), instead of being loaded by a master process before a
fork; which adds a slight performance penalty on each child startup.
It would be nice to avoid it (even with spamd), although it is not crucial.

In a chrooted environment (optional in amavisd), this is more serious.
It implies that either the whole Perl lib hierarchy needs to be present
in a chroot jail (or at least the missing parts), or a master process
needs to pre-load missing modules manually before forking.

For every version of SA since the early 2.x, for amavisd-new I'm manually
preparing a list of missing modules and keeping it in the amavisd program,
which then preloads the remaining missing modules in a master process.
This hacking approach is getting unsightly long, and can't cope
cleanly with future versions of SA.
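
As a rough illustration of the manual approach described above (this is not
the actual amavisd-new code), a master process could walk a hand-maintained
list and require each module before forking. The helper name preload_modules
and the module list are made up for this sketch; core modules stand in for
the SpamAssassin ones.

```perl
use strict;
use warnings;

# Hypothetical helper: try to load each named module in the current
# (master) process and return the names that could not be loaded.
sub preload_modules {
    my (@names) = @_;
    my @failed;
    for my $name (@names) {
        # Turn Foo::Bar into Foo/Bar.pm so require() gets a file path
        # and no string eval is needed.
        (my $file = "$name.pm") =~ s{::}{/}g;
        eval { require $file; 1 } or push @failed, $name;
    }
    return @failed;
}

# Illustrative list only; the last entry is expected to be absent.
my @wanted = qw(File::Spec List::Util No::Such::Module::Hopefully);
my @failed = preload_modules(@wanted);
print "could not preload: @failed\n" if @failed;
```

A list like this has to be revised for every SA release, which is the
maintenance burden this report is about.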

I'm proposing that compile_now be enhanced to pre-load ALL the
required Perl modules, at least the ones directly needed by SA for
its current set of loaded plugins and selected bayes store etc.,
or that some alternative API call be provided which can be invoked by
a master process just before forking. Certainly SA is in a better
position to know which plugins are being loaded, what its
current bayes db backend is, and which of its other components
are not exercised by a sample message evaluated by compile_now.

I think the SA 3.2 jump is a perfect place in time to introduce
this enhancement :)


As an illustration, here is a list of missing modules needed for SA 3.2.0:

basic modules (non-plugins):

Mail::SpamAssassin::PersistentAddrList Mail::SpamAssassin::SQLBasedAddrList
Mail::SpamAssassin::Locker Mail::SpamAssassin::Locker::Flock
Mail::SpamAssassin::Locker::UnixNFSSafe Mail::SpamAssassin::BayesStore
Mail::SpamAssassin::BayesStore::DBM Mail::SpamAssassin::BayesStore::SQL
Mail::SpamAssassin::BayesStore::MySQL Mail::SpamAssassin::BayesStore::PgSQL
Mail::SpamAssassin::Bayes Mail::SpamAssassin::Bayes::CombineChi
Mail::SpamAssassin::PerMsgLearner
Mail::SpamAssassin::Locales (not sure about this one)

underlying missing Perl modules:

Net::DNS::RR::SOA Net::DNS::RR::NS Net::DNS::RR::MX
Net::DNS::RR::A Net::DNS::RR::AAAA Net::DNS::RR::PTR
Net::DNS::RR::CNAME Net::DNS::RR::TXT
NetAddr::IP NetAddr::IP::Util
auto::NetAddr::IP::Util::inet_n2dx auto::NetAddr::IP::Util::ipv6_n2d

(I'm not sure about the 'URI' Perl module, it used to be necessary
to pre-load its components, but apparently no longer is)

plugins:

Hashcash RelayCountry SPF URIDNSBL
AWL AccessDB AntiVirus AutoLearnThreshold DCC DomainKeys DKIM MIMEHeader
Pyzor Razor2 ReplaceTags SpamCop TextCat URIDetail WhiteListSubject
BodyEval DNSEval HTMLEval HeaderEval MIMEEval RelayEval URIEval WLBLEval
ASN Bayes BodyRuleBaseExtractor Check HTTPSMismatch OneLineBodyRuleType 
Rule2XSBody Shortcircuit VBounce

and their supporting modules if an associated plugin is loaded:

Razor2::Client::Agent IP::Country::Fast
Mail::DomainKeys Mail::DomainKeys::Message Mail::DomainKeys::Policy
Mail::DKIM Mail::DKIM::Verifier
Mail::SpamAssassin::Plugin::SPF

needed by SPF plugin:
Mail::SPF Mail::SPF::Query Mail::SPF::Mech Mail::SPF::Mech::A
Mail::SPF::Mech::All Mail::SPF::Mech::Exists Mail::SPF::Mech::IP4
Mail::SPF::Mech::IP6 Mail::SPF::Mech::Include Mail::SPF::Mech::MX
Mail::SPF::Mech::PTR Mail::SPF::Mod Mail::SPF::Mod::Exp
Mail::SPF::Mod::Redirect Mail::SPF::SenderIPAddrMech
Mail::SPF::v1::Record Mail::SPF::v2::Record

needed by DKIM and DomainKeys plugins:
Crypt::OpenSSL::RSA
auto::Crypt::OpenSSL::RSA::new_public_key
auto::Crypt::OpenSSL::RSA::new_key_from_parameters
auto::Crypt::OpenSSL::RSA::get_key_parameters
auto::Crypt::OpenSSL::RSA::import_random_seed
Digest::SHA


  Mark



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5383] compile_now() doesn't pre-load all necessary Perl modules

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5383


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|3.3.0                       |3.2.0




------- Additional Comments From jm@jmason.org  2007-03-22 10:24 -------
'Would it be straightforward to provide an API function to just
return a list of loaded plugins? (the ones listed by loadplugin)'

actually, yep, that's pretty trivial; we already track that in the PluginHandler
object hanging off M:SA.  should be able to get that into 3.2.0-rc2, hopefully ;)










------- Additional Comments From Mark.Martinec@ijs.si  2007-03-20 10:31 -------
Created an attachment (id=3883)
 --> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=3883&action=view)
Example/illustration code - relevant sections from amavisd-new 2.5.0









------- Additional Comments From Mark.Martinec@ijs.si  2007-03-20 09:52 -------
> 'auto::NetAddr::IP::Util::ipv6_n2d' is definitely an implementation detail
> of NetAddr::IP, not something we should require directly, since there's a
> possibility that a future version of that module might rename/delete that
> class.

I agree. As long as the plugins, Bayes store backends, locks and
similar 'vendor provided' modules are catered for, that's good enough
for me and eliminates the bulk of my hacks.

  Mark








------- Additional Comments From jm@jmason.org  2007-03-20 10:03 -------
btw can you post your current code (even if it's for the 3.1.x module set)?  it
might be handy, at least for reference...





jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|3.2.0                       |3.3.0




------- Additional Comments From jm@jmason.org  2007-03-25 06:05 -------
ok, that's added;

: jm 816...; svn commit -m "add new public API:
Mail::SA::get_loaded_plugins_list(), to allow callers to get a list of the
currently-loaded plugin objects" lib/Mail/SpamAssassin.pm
lib/Mail/SpamAssassin/PluginHandler.pm t/plugin_file.t t/data/testplugin.pm
Sending        lib/Mail/SpamAssassin/PluginHandler.pm
Sending        lib/Mail/SpamAssassin.pm
Sending        t/data/testplugin.pm
Sending        t/plugin_file.t
Transmitting file data ....
Committed revision 522258.


if rc1 winds up being released as 3.2.0 GA (which IMO is unlikely right now),
you may need to use a can() check to verify that the API exists before calling it.

moving to 3.3.0 for the pre-existing issue.
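
The can() check suggested above might look roughly like the sketch below.
get_loaded_plugins_list is the method named in the commit message; the two
My::* scanner packages and the loaded_plugin_classes helper are hypothetical
stand-ins, used so that the example runs without SpamAssassin installed.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for a scanner object without and with the new API.
{ package My::OldScanner; sub new { return bless {}, shift } }
{
    package My::NewScanner;
    sub new { return bless {}, shift }
    # Mimics the description "a list of the currently-loaded plugin objects".
    sub get_loaded_plugins_list { return (bless({}, 'My::Plugin::A')) }
}

# Return the class names of the loaded plugins, or an empty list when the
# object does not provide the method (e.g. an older release).
sub loaded_plugin_classes {
    my ($scanner) = @_;
    return () unless $scanner->can('get_loaded_plugins_list');
    return map { ref $_ } $scanner->get_loaded_plugins_list;
}

for my $scanner (My::OldScanner->new, My::NewScanner->new) {
    my @classes = loaded_plugin_classes($scanner);
    printf "%s: %s\n", ref $scanner, @classes ? "@classes" : "(API not available)";
}
```

A caller can then fall back to its own hand-maintained list whenever the
empty list comes back.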








------- Additional Comments From Mark.Martinec@ijs.si  2007-03-20 09:26 -------
> you left it a bit late ;)

I know, sorry.

> let's give it a try and see if it doesn't introduce any issues.

Here is a little illustration code that can help detect missing modules:

  my(%modules_basic) = %INC;  # remember current state (e.g. after compile_now)
  require URI;   # let the program do its normal flow, possibly
                 # secretly loading additional modules
  # detect and show additionally loaded modules:
  my(@modules_extra) = grep {!exists $modules_basic{$_}} keys %INC;
  printf("missing modules: %s\n", join(", ",@modules_extra));









------- Additional Comments From Mark.Martinec@ijs.si  2007-03-21 16:46 -------
> actually, I think I'd prefer to push this out to 3.3.0.
> I don't think it's wise to introduce a major new change like this so late

Ok, understood.

> loading just the ones that will be needed at runtime, and ignoring
> the ones we won't need to save RAM, will require a bit more thought

Would it be straightforward to provide an API function to just
return a list of loaded plugins? (the ones listed by loadplugin)







jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|Undefined                   |3.2.0




------- Additional Comments From jm@jmason.org  2007-03-20 08:54 -------
> For every version of SA since the early 2.x, for amavisd-new I'm manually
> preparing a list of missing modules and keeping it in the amavisd program,
> which then preloads the remaining missing modules in a master process.
> This hacking approach is getting unsightly long, and can't cope
> cleanly with future versions of SA.

definitely better if we do that.

> I'm proposing that the compile_now is enhanced to pre-load ALL the
> required Perl modules, at least the ones directly needed by SA for
> its current set of loaded plugins and selected bayes store etc.,
> or some alternative API call be provided which can be invoked by
> a master process just before forking. Certainly SA is in a better
> position to know which plugins are being loaded, what is its
> current bayes db backend, and what are its other components that
> are not exercised by a sample message evaluated by compile_now.
> 
> I think the SA 3.2 jump is a perfect place in time to introduce
> this enhancement :)

you left it a bit late ;)  let's give it a try and see if it doesn't
introduce any issues.








------- Additional Comments From jm@jmason.org  2007-03-20 09:45 -------
> Here is a little illustration code that can help detect missing modules:
> 
>   my(%modules_basic) = %INC;  # remember current state (e.g. after compile_now)
>   require URI;   # let the program do its normal flow, possibly
>                  # secretly loading additional modules
>   # detect and show additionally loaded modules:
>   my(@modules_extra) = grep {!exists $modules_basic{$_}} keys %INC;
>   printf("missing modules: %s\n", join(", ",@modules_extra))

I'd be worried that this would include sub-modules that version X.Y of module
Foo requires, whereas version X.Z might not include/require those modules.

e.g. 'auto::NetAddr::IP::Util::ipv6_n2d' is definitely an implementation detail
of NetAddr::IP, not something we should require directly, since there's a
possibility that a future version of that module might rename/delete that class.
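
One partial way to act on this concern is to drop AutoLoader fragments
(the %INC keys under auto/) when comparing snapshots, and to report only
real .pm files. The sketch below assumes that approach; the newly_loaded
helper is hypothetical, and it does not address ordinary sub-modules that
a dependency may add or remove between versions.

```perl
use strict;
use warnings;

# Given "before" and "after" snapshots of %INC (hash refs), return the
# names of modules loaded in between, skipping auto/ fragments, which
# are internals of the module that owns them.
sub newly_loaded {
    my ($before, $after) = @_;
    my @names;
    for my $file (sort keys %$after) {
        next if exists $before->{$file};   # already loaded earlier
        next if $file =~ m{^auto/};        # AutoLoader fragment
        next unless $file =~ /\.pm\z/;     # keep real modules only
        (my $name = $file) =~ s{/}{::}g;   # Foo/Bar.pm -> Foo::Bar.pm
        $name =~ s/\.pm\z//;               # Foo::Bar.pm -> Foo::Bar
        push @names, $name;
    }
    return @names;
}

my %before = %INC;
require File::Temp;   # stands in for the program's normal flow
print "loaded later: ", join(", ", newly_loaded(\%before, \%INC)), "\n";
```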





jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|3.2.0                       |3.3.0




------- Additional Comments From jm@jmason.org  2007-03-21 10:10 -------
actually, I think I'd prefer to push this out to 3.3.0.

I don't think it's wise to introduce a major new change like this so late, since
it *has* to work for spamd to work (since it's in the spamd init code). I'd
prefer to get 3.2.0 out ASAP, instead.

also -- I'm worried, on second thoughts...  some of those modules, e.g.:

Mail::SpamAssassin::Locker::Flock
Mail::SpamAssassin::Locker::UnixNFSSafe 
Mail::SpamAssassin::BayesStore::DBM Mail::SpamAssassin::BayesStore::SQL
Mail::SpamAssassin::BayesStore::MySQL Mail::SpamAssassin::BayesStore::PgSQL

are deliberately lazy-loaded in order to minimize memory usage (compiled
perl bytecode is surprisingly memory-hungry!)  Dealing with this
correctly, loading just the ones that will be needed at runtime, and ignoring
the ones we won't need to save RAM, will require a bit more thought, I think.
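
A minimal sketch of that idea, assuming the configured backend is known by
the time the master process preloads: look up the one selected class and
require only that, leaving the alternatives uncompiled. The %backend_class
table and load_backend helper are hypothetical, and core modules stand in
for classes such as the BayesStore::* ones.

```perl
use strict;
use warnings;

# Hypothetical table: configured backend name => class to load.
my %backend_class = (
    dbm => 'File::Basename',
    sql => 'Text::Wrap',
);

# Load only the class for the selected backend and return its name.
sub load_backend {
    my ($choice) = @_;
    my $class = $backend_class{$choice}
        or die "unknown backend '$choice'\n";
    (my $file = "$class.pm") =~ s{::}{/}g;
    require $file;   # the other candidates are never compiled
    return $class;
}

my $class = load_backend('dbm');
print "preloaded $class\n";
```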


