Posted to dev@spamassassin.apache.org by Marc Perkel <ma...@perkel.com> on 2005/02/14 18:50:24 UTC
Spam and Ham have different headers - bayesian tricks
Continuing my experiments with a second Bayesian filter - using
spamprobe, controlling the tokens myself, and using SA to score the
output.
So - I noticed that spam and ham often have different header fields.
Some headers only show up in ham - and some headers only show up in
spam. So I tokenized the headers themselves and fed just the header
names in as data and got some really good results.
So - I don't know if SA is doing this, but tokenizing the header names
(excluding the common ones that all messages have) is very effective.
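
A minimal sketch of the header-name tokenization described above, written in Python purely for illustration. The "hdr_" prefix follows the token names shown in this thread; the COMMON_HEADERS set and the function name header_name_tokens are assumptions, not something spamprobe or SA provides, and how the resulting tokens are fed to the filter is left out.

```python
from email import message_from_string

# Hypothetical set of headers that nearly every message carries.
# These are skipped because they cannot separate ham from spam.
COMMON_HEADERS = {
    "received", "from", "to", "subject", "date", "message-id",
    "mime-version", "content-type", "content-transfer-encoding",
    "return-path",
}

def header_name_tokens(raw_message, prefix="hdr_"):
    """Return one token per distinct, uncommon header name (values are ignored)."""
    msg = message_from_string(raw_message)
    names = {name.lower() for name in msg.keys()}
    return sorted(prefix + name for name in names if name not in COMMON_HEADERS)

sample = (
    "Received: from example.org\n"
    "From: a@example.org\n"
    "To: b@example.org\n"
    "Subject: hi\n"
    "X-Mailing-List-Name: dev\n"
    "Mail-Followup-To: dev@example.org\n"
    "\n"
    "body text\n"
)

print(header_name_tokens(sample))
# ['hdr_mail-followup-to', 'hdr_x-mailing-list-name']
```

Only the names are kept, so a header counts once per message no matter what its value is or how many times it repeats.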
--
Marc Perkel - marc@perkel.com
Spam Filter: http://www.junkemailfilter.com
My Blog: http://marc.perkel.com
My Religion: http://www.churchofreality.org
~ "If it's real - we believe in it!" ~
Re: Spam and Ham have different headers - bayesian tricks
Posted by Marc Perkel <ma...@perkel.com>.
Examples:
Ham Headers:
0.0000018 786 0 0x00000395 hdr_article
0.0000019 731 0 0x00000395 hdr_x-yahoo-profile
0.0000026 535 0 0x00000395 hdr_x-virus-checked
0.0000027 518 0 0x00000395 hdr_x-asf-spam-status
0.0000048 289 0 0x00000395 hdr_x-egroups-approved-by
0.0000052 267 0 0x00000395 hdr_x-elnk-trace
0.0000054 259 0 0x00000395 hdr_x-authentication-info
0.0000058 243 0 0x00000395 hdr_mail-followup-to
0.0000070 199 0 0x00000395 hdr_x-x-sender
0.0000076 184 0 0x00000395 hdr_resent-to
0.0000085 164 0 0x00000395 hdr_x-egroups-edited-by
0.0000086 163 0 0x00000395 hdr_x-content-filtered-by
0.0000088 159 0 0x00000395 hdr_x-list-host
0.0000100 140 0 0x00000395 hdr_x-enigmail-supports
0.0000100 140 0 0x00000395 hdr_x-enigmail-version
0.0000104 134 0 0x00000395 hdr_x-sequence
0.0000109 128 0 0x00000395 hdr_x-lyris-message-id
0.0000111 126 0 0x00000395 hdr_x-precedence
0.0000124 113 0 0x00000395 hdr_x-sasl-enc
0.0000132 106 0 0x00000395 hdr_x-ms-embedded-report
0.0000133 105 0 0x00000395 hdr_x-bugzilla-reason
0.0000135 104 0 0x00000395 hdr_x-envelope-from
0.0000140 100 0 0x00000395 hdr_x-plug
0.0000144 97 0 0x00000395 hdr_x-mailing-list-name
0.0000144 97 0 0x00000395 hdr_x-old-spam-check-by
0.0000144 97 0 0x00000395 hdr_x-old-spam-status
0.0000146 96 0 0x00000395 hdr_x-contentstamp
0.0000146 96 0 0x00000395 hdr_x-untd-originstamp
0.0000171 82 0 0x00000395 hdr_x-listprocessor-version
0.0000175 80 0 0x00000395 hdr_x-authenticated-sender
0.0000203 69 0 0x00000395 hdr_x-pmx-version
0.0000215 65 0 0x00000395 hdr_x-perlmx-spam
0.0000219 64 0 0x00000395 hdr_x-greylist
0.0000229 61 0 0x00000395 hdr_x-to
0.0000229 61 0 0x00000395 hdr_x-yahoo-newman-id
0.0000237 59 0 0x00000395 hdr_x-yahoo-alertid
0.0000237 59 0 0x00000395 hdr_x-yahoo-alerts-beta
0.0000237 59 0 0x00000395 hdr_x-yahoo-returnbounces
0.0000246 57 0 0x00000395 hdr_x-gmane-mailscanner
0.0000246 57 0 0x00000395 hdr_x-mimedefang-filter
0.0000264 53 0 0x00000394 hdr_x-ebay-mailtracker
0.0000264 53 0 0x00000394 hdr_x-yahoo-newman-expires
0.0000269 52 0 0x00000395 hdr_x-mail-info
0.0000274 51 0 0x00000395 hdr_x-juno-line-breaks
0.0000274 51 0 0x00000395 hdr_x-pop-user
0.0000280 50 0 0x00000394 hdr_x-coriate
0.0000280 50 0 0x00000395 hdr_x-domain
0.0000280 50 0 0x00000395 hdr_x-key
0.0000280 50 0 0x00000395 hdr_x-message-type
0.0000280 50 0 0x00000395 hdr_x-schema
0.0000286 49 0 0x00000395 hdr_x-listserver
0.0000292 48 0 0x00000395 hdr_x-dsncontext
0.0000311 45 0 0x00000395 hdr_envelope-sender
0.0000311 45 0 0x00000395 hdr_x-db
0.0000311 45 0 0x00000395 hdr_x-parse
0.0000318 44 0 0x00000395 hdr_x-cam-antivirus
0.0000318 44 0 0x00000395 hdr_x-cam-scannerinfo
0.0000318 44 0 0x00000395 hdr_x-cam-spamdetails
0.0000326 43 0 0x00000394 hdr_x-bigfish
0.0000341 41 0 0x00000395 hdr_seal-send-time
0.0000350 40 0 0x00000394 hdr_x-warning
0.0000359 39 0 0x00000395 hdr_x-amazon-corporate-relay
0.0000359 39 0 0x00000395 hdr_x-amazon-track
0.0000368 38 0 0x00000395 hdr_x-authenticated
0.0000368 38 0 0x00000395 hdr_x-converted-to-plain-text
0.0000378 37 0 0x00000395 hdr_x-msg-ref
0.0000378 37 0 0x00000395 hdr_x-starscan-version
0.0000378 37 0 0x00000395 hdr_x-viruschecked
0.0000378 37 0 0x00000395 hdr_x-wss-id
0.0000389 36 0 0x00000395 hdr_jobid
0.0000389 36 0 0x00000395 hdr_mailid
0.0000389 36 0 0x00000395 hdr_x-env-sender
0.0000400 35 0 0x00000395 hdr_x-operating-system
0.0000400 35 0 0x00000395 hdr_x-webtv-signature
0.0000400 35 0 0x00000395 hdr_x-y-gmx-trusted
0.0000412 34 0 0x00000395 hdr_x-mailscanner-to
0.0000412 34 0 0x00000395 hdr_x-subscription_info
0.0000424 33 0 0x00000394 hdr_x-evi-mailscanner
0.0000424 33 0 0x00000394 hdr_x-evi-mailscanner-information
0.0000424 33 0 0x00000394 hdr_x-evi-mailscanner-spamcheck
0.0000424 33 0 0x00000395 hdr_restrict
0.0000452 31 0 0x00000395 hdr_x-gmane-nntp-posting-host
0.0000452 31 0 0x00000395 hdr_x-note
0.0000467 30 0 0x00000395 hdr_x-server-uuid
0.0000483 29 0 0x00000395 hdr_x-fid
0.0000483 29 0 0x00000395 hdr_x-mail-handler
0.0000518 27 0 0x00000395 hdr_x-archived-at
0.0000538 26 0 0x00000394 hdr_x-disclaimer
0.0000538 26 0 0x00000395 hdr_x-reply-to
0.0000560 25 0 0x00000394 hdr_x-compuserve-customer
0.0000560 25 0 0x00000394 hdr_x-punge
0.0000560 25 0 0x00000394 hdr_x-sbi
0.0000560 25 0 0x00000394 hdr_x-terminate
0.0000560 25 0 0x00000394 hdr_x-treme
0.0000560 25 0 0x00000395 hdr_x-juno-att
0.0000560 25 0 0x00000395 hdr_x-juno-refparts
0.0000560 25 0 0x00000395 hdr_x-pgp-key
0.0000560 25 0 0x00000395 hdr_x-ufl-scanned-by
0.0000560 25 0 0x00000395 hdr_x-ufl-spam-status
0.0000583 24 0 0x00000394 hdr_x-emailedto
0.0000583 24 0 0x00000394 hdr_x-userid
0.0000583 24 0 0x00000395 hdr_x-frameusers
0.0000609 23 0 0x00000394 hdr_x-newsreader
0.0000609 23 0 0x00000394 hdr_x-ntf-cell_id
0.0000609 23 0 0x00000394 hdr_x-ntf-mime
0.0000609 23 0 0x00000394 hdr_x-ntf-unique_key
0.0000636 22 0 0x00000394 hdr_error
0.0000636 22 0 0x00000394 hdr_usage
0.0000636 22 0 0x00000395 hdr_x-egroups-from
0.0000636 22 0 0x00000395 hdr_x-ks
0.0000636 22 0 0x00000395 hdr_x-mailman-id
0.0000667 21 0 0x00000394 hdr_x-imail-spam-valhelo
0.0000667 21 0 0x00000395 hdr_x-originating-server
0.0000667 21 0 0x00000395 hdr_x-pair-authenticated
0.0000667 21 0 0x00000395 hdr_x-sympa-to
0.0000667 21 0 0x00000395 hdr_x-validation-by
0.0000700 20 0 0x00000394 hdr_x-cron-env
0.0000700 20 0 0x00000394 hdr_x-srs-rewrite
0.0000700 20 0 0x00000394 hdr_x-unsub
0.0000778 18 0 0x00000394 hdr_x-broadcast-flag
0.0000778 18 0 0x00000394 hdr_x-consensus-at-lawyerpoint
0.0000778 18 0 0x00000394 hdr_x-cruelty-to-analog
0.0000778 18 0 0x00000394 hdr_x-modulation
0.0000778 18 0 0x00000395 hdr_x-declude-sender
0.0000778 18 0 0x00000395 hdr_x-newsserver
0.0000778 18 0 0x00000395 hdr_x-smtpserver
0.0000778 18 0 0x00000395 hdr_x-whitelisted
0.0000823 17 0 0x00000394 hdr_x-clips-url
0.0000823 17 0 0x00000394 hdr_x-list-id
0.0000823 17 0 0x00000394 hdr_x-mailscanner-spamcheck
0.0000823 17 0 0x00000394 hdr_x-original-sender
0.0000823 17 0 0x00000394 hdr_x-pstn-levels
0.0000823 17 0 0x00000395 hdr_emacs
0.0000823 17 0 0x00000395 hdr_x-copyright
0.0000875 16 0 0x00000394 hdr_x-gpg-key-fingerprint
0.0000875 16 0 0x00000394 hdr_x-uwash-spam
0.0000875 16 0 0x00000395 hdr_x-me-uuid
0.0000875 16 0 0x00000395 hdr_x-ob-received
0.0000875 16 0 0x00000395 hdr_x-unsubscribe
0.0000875 16 0 0x00000395 hdr_x-usanet-msgid
0.0000875 16 0 0x00000395 hdr_x-usanet-source
0.0000933 15 0 0x00000394 hdr_x-newsgroups
0.0000933 15 0 0x00000394 hdr_x-quris
0.0000933 15 0 0x00000394 hdr_x-spamscore
0.0000933 15 0 0x00000395 hdr_x-mho-user
0.0000933 15 0 0x00000395 hdr_x-report-abuse-to
Spam Headers:
0.9997579 0 10 0x00000394 hdr_elistexpress-info
0.9997799 0 11 0x00000394 hdr_x-mdrcpt-to
0.9997982 0 12 0x00000394 hdr_error-to
0.9997982 0 12 0x00000395 hdr_content-alias
0.9998270 0 14 0x00000395 hdr_x-gmx-antivirus
0.9998386 0 15 0x00000394 hdr_x-mailnum
0.9998576 0 17 0x00000394 hdr_x-contact
0.9998576 0 17 0x00000394 hdr_x-titankey-e_id
0.9998655 0 18 0x00000394 hdr_x-gfol
0.9998725 0 19 0x00000394 hdr_x-library
0.9998725 0 19 0x00000395 hdr_x-desist
0.9998725 0 19 0x00000395 hdr_x-satatus
0.9998725 0 19 0x00000395 hdr_x_uid
0.9998725 0 19 0x00000395 hdr_x-yurttell
0.9998991 0 24 0x00000394 hdr_x-header-companydbusername
0.9998991 0 24 0x00000394 hdr_x-header-masterid
0.9998991 0 24 0x00000394 hdr_x-header-versions
0.9998991 0 24 0x00000395 hdr_x-kaspersky-antivirus
0.9998991 0 24 0x00000395 hdr_x-subscriber
0.9999069 0 26 0x00000395 hdr_x-campidz
0.9999165 0 29 0x00000395 hdr_language
0.9999243 0 32 0x00000394 hdr_x-astrocenter-type
0.9999243 0 32 0x00000394 hdr_x-astrocenter-uid
0.9999327 0 36 0x00000395 hdr_x-rav-antivirus
0.9999327 0 36 0x00000395 hdr_x-rocket-spam
0.9999363 0 38 0x00000394 hdr_x-spam-forward
0.9999395 0 40 0x00000395 hdr_authentication-results
0.9999395 0 40 0x00000395 hdr_x-yahoo-forwarded
0.9999534 0 52 0x00000394 hdr_x-mozilla-draft-info
0.9999534 0 52 0x00000394 hdr_x-mozilla-status
0.9999560 0 55 0x00000395 hdr_x-cid
0.9999560 0 55 0x00000395 hdr_x-yahoofilteredbulk
0.9999644 0 68 0x00000395 hdr_x-disembark
0.9999644 0 68 0x00000395 hdr_x-nthart
0.9999649 0 69 0x00000395 hdr_x-clienthost
0.9999649 0 69 0x00000395 hdr_x-ip
0.9999664 0 72 0x00000395 hdr_x-mailingid
0.9999693 0 79 0x00000395 hdr_x-unsent
0.9999839 0 150 0x00000395 hdr_original-recipient
0.9999916 0 290 0x00000395 hdr_x-message-info
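
For anyone who wants to work with listings like the ones above, here is a small Python sketch that parses such lines. It assumes the columns are spam probability, ham count, spam count, flags, and token, which matches the data shown (ham-only tokens have a zero third column and a probability near 0). The TokenStat and parse_dump_line names are made up for this example.

```python
from typing import NamedTuple

class TokenStat(NamedTuple):
    spam_prob: float   # assumed: estimated probability that the token indicates spam
    ham_count: int     # assumed: occurrences in ham
    spam_count: int    # assumed: occurrences in spam
    flags: int         # hex flags field, meaning not interpreted here
    token: str

def parse_dump_line(line):
    """Split one whitespace-separated listing line into its five fields."""
    prob, ham, spam, flags, token = line.split(None, 4)
    return TokenStat(float(prob), int(ham), int(spam), int(flags, 16), token.strip())

lines = [
    "0.0000018 786 0 0x00000395 hdr_article",
    "0.9999916 0 290 0x00000395 hdr_x-message-info",
]

stats = [parse_dump_line(line) for line in lines]
ham_only = [s.token for s in stats if s.spam_count == 0]
spam_only = [s.token for s in stats if s.ham_count == 0]

print(ham_only)   # ['hdr_article']
print(spam_only)  # ['hdr_x-message-info']
```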