You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2005/06/09 16:17:22 UTC

[Bug 4400] New: PostgreSQL schema maintains unnecessary additional index

http://bugzilla.spamassassin.org/show_bug.cgi?id=4400

           Summary: PostgreSQL schema maintains unnecessary additional index
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P3
         Component: Libraries
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: mdorman@debian.org


Since the index assocated with the primary key on the bayes_token table was not
being chosen by the PostgreSQL planner for the most common queries (the id
parameter wasn't necessarily selective enough to trigger an index scan), an
index was added to the bayes_token table that indexes on token.

However, that means that two indices with extreme overlap are now being
maintained on that table, which is a drag on performance.

A better solution, that I'm testing with a 3.0.3-based install, is to reverse
the keys in the primary key declaration---this means the index will be used for
matches against the token field (removing the need for the index on token) while
still providing the same uniqueness guarantees as the exiting primary key
declaration.

I personally rate this as a high-return, low-impact change, which is why I've
put the priority up.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4400] PostgreSQL schema maintains unnecessary additional index

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4400


felicity@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|3.1.2                       |3.2.0




------- Additional Comments From felicity@apache.org  2006-05-02 03:49 -------
since there's been no movement on this ticket for ages, I'm punting it to the 3.2. queue.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4400] PostgreSQL schema maintains unnecessary additional index

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4400


Bob@Menschel.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|Undefined                   |3.1.1




------- Additional Comments From Bob@Menschel.net  2005-06-09 22:32 -------
Triage: My impression is we're too close to 3.1.0 to test this type of change
adequately for that release, but it seems like a reasonable enhancement to put
into 3.1.1.  Michael: Please do keep this bug updated with the results of your
testing, and if you can supply a patch, that would be a big help too. 



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4400] PostgreSQL schema maintains unnecessary additional index

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4400





------- Additional Comments From parkerm@pobox.com  2005-06-13 13:41 -------
Subject: Re:  PostgreSQL schema maintains unnecessary additional
 index

Patches accepted for improvements to the benchmark.

Michael




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4400] PostgreSQL schema maintains unnecessary additional index

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4400





------- Additional Comments From parkerm@pobox.com  2005-06-13 11:38 -------
Subject: Re:  PostgreSQL schema maintains unnecessary additional
 index

Since the Pg code has been heavily benchmarked and tweaked accordingly,
I'm pretty certain I won't make this change without similar benchmarks
to back it up.  Please see:
http://wiki.apache.org/spamassassin/BayesBenchmark

And run the before and after benchmarks.

Thanks
Michael




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4400] PostgreSQL schema maintains unnecessary additional index

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4400





------- Additional Comments From mdorman@debian.org  2005-06-13 10:52 -------
Well, I'm not going to raise a ruckus if it doesn't go into 3.1.0, though I
think it should (and I'll tweak my 3.1.0 install so it does there) since I see
it as obviously equivalent code with a significant upside (1/3 less writing on
inserts and updates).

I suppose I may have been a little unclear before about it's testing
status---I've been running like this for a week, and before I was having exactly
the problem that the additional index present in HEAD are intended to solve
(sequential scans for every lookup).

FWIW, I'm going to attach a short script that others could run that will test
both setups, albeit against tokens that are in my db.

I'm also going to attach the output of running that script (on my bayes db, with
about 46K messages and 1.3M tokens), and it's easy to see that the exact same
plans are being generated in both places.

Seriously, this is a big win for any postgresql installations that see any
measure of activity because it's lightening the write load on the disk
subsystem, and, perhaps only as a result of my long use of PostgreSQL, I believe
it is absolutely zero-risk.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4400] PostgreSQL schema maintains unnecessary additional index

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4400





------- Additional Comments From mdorman@debian.org  2005-06-13 13:30 -------
(In reply to comment #5)

> Since the Pg code has been heavily benchmarked and tweaked accordingly,
> I'm pretty certain I won't make this change without similar benchmarks
> to back it up.  Please see:
> http://wiki.apache.org/spamassassin/BayesBenchmark

Your benchmark has at least one issue that I can see---unless I missed
something, all entries you put in the test database will all have the same ID.

This seems unlikely to be the case in most real installations, and the
selectivity of the id column could have implications for PostgreSQL's planner's
choice of index for queries of the form "where id = ? and token in ()".  OTOH, I
don't know that it's likely enough to be worth worrying about.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4400] PostgreSQL schema maintains unnecessary additional index

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4400


felicity@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|3.1.1                       |3.1.2






------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4400] PostgreSQL schema maintains unnecessary additional index

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4400





------- Additional Comments From mdorman@debian.org  2005-06-13 12:48 -------
Well, I'll try and get to this at some point, but it's fairly low-priority for
me---my installations will see the increased performance, so it's hard for me to
work up a lot of enthusiasm for jumping through hoops to show that two index
writes for each insert or update is going to perform worse than one.

Perhaps some other PostgreSQL user will be more motivated than I, and can jump
in and run the tests.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4400] PostgreSQL schema maintains unnecessary additional index

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4400





------- Additional Comments From parkerm@pobox.com  2005-08-05 21:52 -------
Can you clear something up.

Are you suggesting this change to 3.0 only? or did you find a benefit in the 3.1
(svn HEAD) case?

Does adding an additional index on token help?



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4400] PostgreSQL schema maintains unnecessary additional index

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4400


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|3.2.0                       |3.3.0




------- Additional Comments From jm@jmason.org  2006-12-11 10:51 -------
again, no movement, punting to 3.3.0



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4400] PostgreSQL schema maintains unnecessary additional index

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4400





------- Additional Comments From mdorman@debian.org  2005-06-13 10:56 -------
Created an attachment (id=2935)
 --> (http://bugzilla.spamassassin.org/attachment.cgi?id=2935&action=view)
Output of pgschema.sql script

This is the output of the script attached to this bug, demonstrating that the
proposed change generates a query plan identical to the schema in HEAD. 
Although execution times are included in the output, they should be taken with
a grain of salt, as no attempt was made to compensate for caching or other
effects.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4400] PostgreSQL schema maintains unnecessary additional index

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4400





------- Additional Comments From mdorman@debian.org  2005-06-13 10:54 -------
Created an attachment (id=2934)
 --> (http://bugzilla.spamassassin.org/attachment.cgi?id=2934&action=view)
Script to alter schema and run identical queries to verify query plans are
identical

Running this SQL script will block access to your database as it drops and
creates various indices, so don't run it on a busy production system.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.