Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2014/10/28 14:19:27 UTC

[Bug 7097] New: Set fillfactor for PostgreSQL sample bayes_token and awl tables

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7097

            Bug ID: 7097
           Summary: Set fillfactor for PostgreSQL sample bayes_token and
                    awl tables
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Learner
          Assignee: dev@spamassassin.apache.org
          Reporter: tometzky@batory.org.pl

In a PostgreSQL database, an update is essentially:
- mark old version of a row as deleted;
- add new version of a row.

When the current 8kB page has no empty or deleted space left, the new row
version has to be written to another page. Both pages are then saved to the
write-ahead log (think journal) and later to the data file.
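
The row-version behaviour described above can be observed through the ctid
system column, which holds a row's physical location as (page, item). A
minimal sketch, assuming a scratch table called fillfactor_demo (a made-up
name, not part of the SpamAssassin schema):

```sql
-- Hypothetical scratch table; not part of the SpamAssassin schema.
CREATE TEMP TABLE fillfactor_demo (id integer PRIMARY KEY, hits integer);
INSERT INTO fillfactor_demo VALUES (1, 0);

-- Physical location of the row before the update.
SELECT ctid, id, hits FROM fillfactor_demo;

UPDATE fillfactor_demo SET hits = hits + 1 WHERE id = 1;

-- The new row version has a different ctid; the old version stays
-- on the page as dead space until it is cleaned up.
SELECT ctid, id, hits FROM fillfactor_demo;
```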

Setting the fillfactor table attribute, available since PostgreSQL 8.2, to
less than 100 hints to Postgres that the table is updated often, so that
inserts fill each page only to the set percentage. Setting it to, say, 95
leaves room on each page for a few extra row versions: about 7-8 out of 157
rows in bayes_token and 3-4 out of 55 in awl, according to my tests.
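
Rows-per-page figures like these can be approximated from the system catalog.
This is only a rough sketch: relpages and reltuples are estimates refreshed by
VACUUM and ANALYZE, the result depends on the actual data, and the alias
approx_rows_per_page is just an illustrative name.

```sql
-- Estimated average number of rows stored per 8kB page.
SELECT relname,
       relpages,
       reltuples,
       reltuples / nullif(relpages, 0) AS approx_rows_per_page
FROM pg_class
WHERE relname IN ('bayes_token', 'awl')
  AND relkind = 'r';
```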

This would make these tables more efficient, as a large percentage of writes
to them are updates (more than 99.5% on my server). It would need fewer seeks
and fewer index writes.
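
One possible way to check the share of updates on a given server is the
pg_stat_user_tables statistics view. The sketch below assumes the default
table names; update_pct is an illustrative alias, and the n_tup_hot_upd
column exists only in PostgreSQL 8.3 and later, where it counts updates that
needed no separate index update.

```sql
-- Counters accumulate since the statistics were last reset.
SELECT relname,
       n_tup_ins,
       n_tup_upd,
       n_tup_del,
       n_tup_hot_upd,
       round(100.0 * n_tup_upd
             / nullif(n_tup_ins + n_tup_upd + n_tup_del, 0), 2) AS update_pct
FROM pg_stat_user_tables
WHERE relname IN ('bayes_token', 'awl');
```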

The largest impact would be visible just after importing data using
`sa-learn --spam`, `sa-learn --ham` or `sa-learn --restore`. It would be less
visible during normal operation, as the table would partly tune itself by
automatically reusing space marked as deleted, though not as well as with
fillfactor. Without it, benchmarks of bayes stores are also not entirely fair
to Postgres.

Please add to bayes_pg.sql:
alter table bayes_token set (fillfactor=95);

And to awl_pg.sql:
alter table awl set (fillfactor=95);
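
For an already populated database, a sketch of how the setting could be
verified and applied to existing rows follows. It assumes the default table
names. Changing fillfactor affects only pages filled afterwards, so existing
data keeps its old packing until the table is rewritten; the rewrite step is
optional and is expected to honour fillfactor on PostgreSQL 9.0 and later.

```sql
-- reloptions lists per-table storage parameters, e.g. {fillfactor=95}.
SELECT relname, reloptions
FROM pg_class
WHERE relname IN ('bayes_token', 'awl')
  AND relkind = 'r';

-- Optional: rewrite existing rows so they follow the new setting.
-- VACUUM FULL holds an exclusive lock on the table while it runs.
VACUUM FULL bayes_token;
VACUUM FULL awl;
```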

These would generate an error on ancient Postgres versions older than 8.2,
which are unsupported by upstream. But it is a harmless error: the statement
simply would not be applied. As far as I remember, only RHEL/CentOS 5 still
supports PostgreSQL 8.1, and even there upgrading to 8.4 is a supported and
encouraged option.

Is there a standard benchmark for bayes stores to measure impact?

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7097] Set fillfactor for PostgreSQL sample bayes_token and awl tables

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7097

Joe Quinn <jq...@pccc.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |jquinn+SAbug@pccc.com
         Resolution|---                         |FIXED

--- Comment #1 from Joe Quinn <jq...@pccc.com> ---
Very thorough! I don't feel bad about letting 8.1 throw a harmless warning,
since it's been unmaintained for 4 years. The performance benefits of a smaller
fillfactor are well documented elsewhere, so I don't see a problem with adding
this.

Committed revision 1640413.
