Posted to commits@spamassassin.apache.org by Apache Wiki <wi...@apache.org> on 2005/11/05 04:23:18 UTC

[Spamassassin Wiki] Update of "RuleQaApp" by JustinMason

The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/RuleQaApp

New page:
= The Rule-QA application =

The application is available [http://buildbot.spamassassin.org/ruleqa/ here]. It has two display modes:

  * the [http://buildbot.spamassassin.org/ruleqa/ aggregate overview], where all rules are visible in an overview form
  * the [http://buildbot.spamassassin.org/ruleqa/ruleqa?keywords=&rule=RCVD_HELO_IP_MISMATCH&s_detail=1 rule-detail view], which 'zooms in' to provide lots of detail about a specific rule
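The rule-detail view is selected entirely through URL query parameters, so a link can be generated for any rule. Below is a minimal Python sketch. It assumes that the parameter names visible in the links on this page (`daterev`, `keywords`, `rule`, `s_detail`) are all that is needed. The helper `rule_detail_url` is hypothetical and is not part of Rule-QA:

```python
from urllib.parse import urlencode

# Base URL of the Rule-QA CGI, taken from the links on this page.
RULEQA_BASE = "http://buildbot.spamassassin.org/ruleqa/ruleqa"


def rule_detail_url(rule, daterev=None):
    """Build a rule-detail link (hypothetical helper, not Rule-QA code).

    `daterev` picks a specific mass-check, in the 'YYYYMMDD-rNNNNNN' form
    seen in this page's links; when it is None the parameter is left out.
    """
    params = {}
    if daterev is not None:
        params["daterev"] = daterev
    params["keywords"] = ""
    params["rule"] = rule
    params["s_detail"] = 1
    # urlencode keeps dict insertion order and renders "" as "keywords=".
    return RULEQA_BASE + "?" + urlencode(params)


print(rule_detail_url("RCVD_HELO_IP_MISMATCH"))
print(rule_detail_url("RCVD_HELO_IP_MISMATCH", daterev="20051104-r330762"))
```

The first call reproduces the rule-detail link given above.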

Data is loaded from two sources:

  * the 'preflight' mass-checks, which run after each checkin: PreflightBuildBot
  * nightly mass-checks, which take place in a more leisurely, decentralised manner: NightlyMassCheck

== Selecting a Mass-Check ==

At the top of the page, under the 'Which Corpus?' heading, there is a table listing the recently performed mass-checks.  Each line is a link that displays that mass-check, and the following columns help you choose one:

  * Date: the date of the mass-check
  * MC-Rev: the current SVN revision in the repository at the time the mass-check started
  * Prior Commit and Rev: the latest revision committed to ''the SpamAssassin part of the repository'' when the mass-check started
  * Author: the person who committed that SVN revision
  * Net: whether the mass-check included network rules

(Note that 'Prior Commit and Rev' is more accurate than 'MC-Rev'.  We share
a repository with other Apache projects, so the 'MC-Rev' figure increments
even when nothing has been checked in to ''our'' part of the repository.)
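The difference between the two columns can be illustrated with a small sketch. Given the log of a shared repository, 'Prior Commit' is taken here to be the newest revision at or before MC-Rev that touched the SpamAssassin tree. This is an assumption about how the column is derived, not the application's actual code. The `Commit` record, the `/spamassassin/` path prefix and the sample log entries are all hypothetical:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Commit:
    rev: int               # SVN revision number (shared by all projects)
    author: str            # committer username
    paths: Sequence[str]   # repository paths changed in this revision


def prior_commit(log: Sequence[Commit], mc_rev: int,
                 prefix: str = "/spamassassin/") -> Optional[Commit]:
    """Return the newest commit at or before `mc_rev` that touched `prefix`.

    Commits to other projects bump the shared revision number (and so
    MC-Rev), but they are skipped here because none of their paths match.
    """
    best = None
    for c in log:
        if c.rev <= mc_rev and any(p.startswith(prefix) for p in c.paths):
            if best is None or c.rev > best.rev:
                best = c
    return best


# Hypothetical log: r101 touches SpamAssassin, r102 and r103 do not.
log = [
    Commit(101, "alice", ["/spamassassin/trunk/rules/example.cf"]),
    Commit(102, "bob", ["/httpd/trunk/README"]),
    Commit(103, "carol", ["/jakarta/trunk/README"]),
]
print(prior_commit(log, mc_rev=103).rev)  # prints 101
```

In this example MC-Rev would read 103, while 'Prior Commit and Rev' points at r101 by alice, the change that actually affects the rules.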

In addition, the line below each entry shows the commit message for that
revision and the usernames of the mass-checkers who submitted logs.

(This UI is probably going to get better btw ;)

== The Aggregate Overview ==

The [http://buildbot.spamassassin.org/ruleqa/ aggregate overview] displays all the rules in a form based on that of HitFrequencies.  There are a few minor differences, however; most notably, each rule name links to its rule-detail view.

If multiple people performed mass-checks on that revision, all their data is aggregated and averaged, as if it were one gigantic mass-check.
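One reading of "as if it were one gigantic mass-check" is that raw counts are pooled before dividing, instead of averaging each person's percentage. The two approaches give different results when corpus sizes differ. The following is a minimal sketch under that reading. The `Counts` record and the sample figures are hypothetical:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Counts:
    hits: int    # messages of one type (spam or ham) that the rule hit
    total: int   # messages of that type in this submitter's corpus


def pooled_rate(per_submitter: Iterable[Counts]) -> float:
    """Hit-rate (%) over all corpora combined: sum(hits) / sum(total)."""
    items = list(per_submitter)
    hits = sum(c.hits for c in items)
    total = sum(c.total for c in items)
    return 100.0 * hits / total if total else 0.0


def mean_of_rates(per_submitter: Iterable[Counts]) -> float:
    """Unweighted mean of each submitter's own hit-rate (%), for comparison."""
    rates = [100.0 * c.hits / c.total for c in per_submitter if c.total]
    return sum(rates) / len(rates) if rates else 0.0


spam = [Counts(hits=900, total=1000), Counts(hits=10, total=100)]
print(pooled_rate(spam))    # 910 / 1100, about 82.73
print(mean_of_rates(spam))  # (90 + 10) / 2 = 50.0
```

Pooling lets a large corpus count for more than a small one, which matches the idea of treating every submitted log as part of a single mass-check.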

Note that you can restrict the display to a subset of rules using the 'Which Rules?' textbox.

== The Rule-Detail View ==

The [http://buildbot.spamassassin.org/ruleqa/ruleqa?keywords=&rule=RCVD_HELO_IP_MISMATCH&s_detail=1 rule-detail view] displays the following sections:

 * '''set 0, in aggregate''': same as the aggregate overview
 * '''set 0, broken down by message age in weeks''': a way to quickly see if the hit-rate is trending up or down for the rule
 * '''set 0, broken down by contributor''': see how the hit-rate changes from person to person
 * '''set 0, score-map''': hit-rates broken down by the score of the messages it hits, letting you see if a rule fires mostly on already-high-scoring spam
 * '''set 0, overlaps between rules''': display overlap with other rules
 * '''Graph, hit-rate over time''': see below
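The per-week, score-map and overlap sections can all be derived from the same per-message data: which rules hit each message, its score and its receive date. Each would be computed separately for spam and for ham. The following is a rough sketch of how such breakdowns could be computed, not Rule-QA's actual code. The `Message` record, the 1-point score buckets and the overlap definition (the share of one rule's hits that the other rule also hits) are all assumptions:

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Hashable, List

WEEK_SECS = 7 * 24 * 60 * 60


@dataclass
class Message:
    rules: FrozenSet[str]  # names of the rules that hit this message
    score: float           # the message's total score
    received: int          # Unix timestamp when the message was received


def breakdown(msgs: List[Message], rule: str,
              key: Callable[[Message], Hashable]) -> Dict[Hashable, float]:
    """Hit-rate (%) of `rule` within each group of messages defined by `key`."""
    hits: Dict[Hashable, int] = defaultdict(int)
    totals: Dict[Hashable, int] = defaultdict(int)
    for m in msgs:
        k = key(m)
        totals[k] += 1
        if rule in m.rules:
            hits[k] += 1
    return {k: 100.0 * hits[k] / totals[k] for k in totals}


def age_in_weeks(now: int) -> Callable[[Message], int]:
    """Key function: whole weeks between a message's receipt and `now`."""
    return lambda m: (now - m.received) // WEEK_SECS


def score_bucket(m: Message) -> int:
    """Key function: 1-point score buckets (an assumed granularity)."""
    return math.floor(m.score)


def overlap(msgs: List[Message], a: str, b: str) -> float:
    """Percentage of the messages hit by rule `a` that rule `b` also hits."""
    hit_a = [m for m in msgs if a in m.rules]
    if not hit_a:
        return 0.0
    return 100.0 * sum(1 for m in hit_a if b in m.rules) / len(hit_a)


# Hypothetical sample: three messages received 1, 9 and 10 days before `now`.
now = 10 * WEEK_SECS
msgs = [
    Message(frozenset({"RULE_A", "RULE_B"}), 8.2, now - 1 * 86400),
    Message(frozenset({"RULE_A"}), 3.1, now - 9 * 86400),
    Message(frozenset(), 1.0, now - 10 * 86400),
]
print(breakdown(msgs, "RULE_A", age_in_weeks(now)))  # {0: 100.0, 1: 50.0}
print(breakdown(msgs, "RULE_A", score_bucket))       # {8: 100.0, 3: 100.0, 1: 0.0}
print(overlap(msgs, "RULE_A", "RULE_B"))             # 50.0
```

Keeping the grouping in a key function means the same counting loop serves both the age and the score-map breakdowns.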

Note the (more info) link at the top right of every freqs graph; it shows
the header lines from that mass-check.  This is useful if you want to find
out whose corpus was used, how many messages were used, and so on.

Finally, at the bottom, there's a link to go 'back' to the aggregate view.

== The 'hit-rate over time' graph ==

[http://buildbot.spamassassin.org/ruleqa/ruleqa?daterev=20051104-r330762&keywords=&rule=RCVD_HELO_IP_MISMATCH&s_detail=1&s_g_over_time=1#over_time_anchor This graph] displays how the rule's hit-rate has changed over time,
broken down by message date and by submitter.

There are two graphs, one for spam and one for ham.  The vertical axis of
each graph shows the percentage of messages of that type, received in each
time period, that the rule hit.  The horizontal axis shows the date the
messages were received, running from the past (on the left) to the present
(on the right).

Each submitter has their own colour, which is used to highlight a scatter-plot
of points indicating the hit-rates on their corpus; in addition, they have
a [http://search.cpan.org/~jhi/Statistics-DEA-0.04/lib/Statistics/DEA.pm Discontiguous Exponential Average] line, which attempts to give a reasonable
average of these points.
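The averaging line has to cope with points that fall on irregular dates, with gaps between them. The sketch below shows that idea in simplified form. It is not a port of `Statistics::DEA` and does not reproduce that module's API. Each earlier point's weight is multiplied by `(1 - alpha)` per unit of elapsed time, and the average is `sum_of_data / sum_of_weights`. The class name `DecayingAverage`, the `alpha` value and the sample points are hypothetical:

```python
from typing import Optional


class DecayingAverage:
    """Exponentially decaying average for irregularly spaced points.

    A simplified sketch in the spirit of discontiguous exponential
    averaging; not the Statistics::DEA implementation.
    """

    def __init__(self, alpha: float):
        if not 0.0 < alpha < 1.0:
            raise ValueError("alpha must be between 0 and 1")
        self.alpha = alpha
        self.sum_of_weights = 0.0
        self.sum_of_data = 0.0
        self.previous_time: Optional[float] = None

    def update(self, value: float, t: float) -> None:
        """Add a point `value` observed at time `t` (times must not decrease)."""
        if self.previous_time is not None:
            dt = t - self.previous_time
            if dt < 0:
                raise ValueError("points must be added in time order")
            # Older points lose weight in proportion to the elapsed time,
            # so a long gap discounts them more than a short one.
            decay = (1.0 - self.alpha) ** dt
            self.sum_of_weights *= decay
            self.sum_of_data *= decay
        self.sum_of_weights += 1.0
        self.sum_of_data += value
        self.previous_time = t

    def average(self) -> float:
        if self.sum_of_weights == 0.0:
            raise ValueError("no data yet")
        return self.sum_of_data / self.sum_of_weights


# Hypothetical weekly hit-rates (%) for one submitter, with a gap after week 1.
dea = DecayingAverage(alpha=0.5)
for week, rate in [(0, 10.0), (1, 20.0), (4, 20.0)]:
    dea.update(rate, week)
    print(week, round(dea.average(), 3))
```

A larger `alpha` makes the line follow recent points more closely. A smaller one smooths over more of each submitter's history.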