Posted to users@spamassassin.apache.org by Giampaolo Tomassoni <Gi...@Tomassoni.biz> on 2006/11/14 03:29:56 UTC

RE: פריצת דרך מאתגרת

> At the risk of appearing to be (or revealing myself to be ;-) an
> anti-Windows bigot (actually, I'm more of a pro-Open Standards
> cheerleader), we score all of the "charset=Windows-125[0-8]"
> messages at 4.85...
> 
> Why?  Because none of the Windows charsets do anything that
> the ISO-8859-x charsets don't already do...  and at least one
> Internet Draft suggests requiring the following encoding rules:
> 
> * ASCII (NVT) should be encoded as US-ASCII
> 
> * Anything that can fit into ISO-8859-1 must be encoded as this
>   (assuming it doesn't fit into US-ASCII, of course);
> 
> * All else should be encoded as UTF-8.  Period.  Full-stop.
> 
> Makes sense to me.
> 
> It's easy enough (it's a single registry setting) to force Outlook
> to encode via either UTF-8 or Latin 1.

You don't need the registry setting. I don't know about Outlook Express, but Outlook has its own setting for this under Tools->whatever.


> Should the following test be included in the distribution (with a
> score of 0.00) and we can crank it up based on what ham vs.
> spam differentiation indicates?
> 
> # don't allow windows-125x text attachments...
> mimeheader __CTYPE_MH_WIN1252   Content-Type =~ /charset=(\"windows-125[0-8]\"|windows-125[0-8])/i
> meta WIN_CHARSET                ((__CTYPE_MH_HTML || __CTYPE_MH_TEXT_PLAIN) && __CTYPE_MH_WIN1252)
> describe WIN_CHARSET            Content-Type is Windows-specific text
> score WIN_CHARSET               0.01

To stress SA's inclination toward Open Standards and dispel any impression of anti-Windows bigotry, I would suggest naming the test something like, say, NOT_RFCxxx_CHARSET and testing for the charset not being any of the allowed ones.

Even IBM has (had?) its own charsets...
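
A rough sketch of what I mean, assuming the allowed set is just the three charsets from the rules quoted above (US-ASCII, ISO-8859-1, UTF-8). The __CTYPE_MH_CHARSET and __CTYPE_MH_STD_CHARSET sub-rule names are made up for illustration, the "xxx" is still a placeholder, and this is untested against any corpus. Like the rule quoted above, it needs the MIMEHeader plugin for "mimeheader":

# hypothetical: flag MIME parts that declare a charset outside the allowed set
mimeheader __CTYPE_MH_CHARSET      Content-Type =~ /\bcharset=/i
mimeheader __CTYPE_MH_STD_CHARSET  Content-Type =~ /\bcharset="?(?:us-ascii|iso-8859-1|utf-8)(?![\w-])/i
meta NOT_RFCxxx_CHARSET            (__CTYPE_MH_CHARSET && !__CTYPE_MH_STD_CHARSET)
describe NOT_RFCxxx_CHARSET        Content-Type charset is not one of the allowed ones
score NOT_RFCxxx_CHARSET           0.01

The lookahead keeps iso-8859-1 from also matching iso-8859-15, and parts with no charset parameter at all are left alone. As far as I know a mimeheader sub-rule fires when any part of the message matches, so a multipart message mixing an allowed charset with a disallowed one would slip past this version.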

-----------------------------------
Giampaolo Tomassoni - IT Consultant
Piazza VIII Aprile 1948, 4
I-53044 Chiusi (SI) - Italy
Ph: +39-0578-21100

MAI inviare una e-mail a:
NEVER send an e-mail to:
 rainbowl@tomassoni.eu

> 
> 
> -Philip
> 
> 
> 
> 
> Robert Nicholson wrote:
> 
> >You may have misunderstood, but that's the point.
> >
> >The message was _not_ being filtered out like it should be and that  
> >was because of the very generic /WINDOWS/ match.
> >
> >so that method doesn't really obey the locales you have set.
> >
> >when I take out the generic /WINDOWS/ match it does then screen it out.
> >
> >or rather is tagged against the rule.
> >
> >On Sep 11, 2006, at 8:40 AM, David Baron wrote:
> >
> >  
> >
> >>Locale for Hebrew is not in this list.
> >>
> >>    
> >>
> >>>Windows-1255
> >>>
> >>>and apparently with locales
> >>>
> >>>DB<6> x @locales
> >>>0  'en'
> >>>1  'th'
> >>>2  'it'
> >>>3  'en_US'
> >>>
> >>>Mail::SpamAssassin::Locales::is_charset_ok_for_locales($1, @locales)
> >>>
> >>>returns true
> >>>
> >>>Mail::SpamAssassin::Locales::is_charset_ok_for_locales(/home/robert/lib/perl5/site_perl/5.8.0/Mail/SpamAssassin/Locales.pm:91):
> >>>91:       return 1 if ($cs =~ /^WINDOWS/);      # argh, Windows
> >>>
> >>>what?
> >>>
> >>>On Sep 10, 2006, at 4:38 PM, Robert Nicholson wrote:
> >>>      
> >>>
> >>>>Why didn't foreign charset rules catch this?
> >>>>
> >>>>Begin forwarded message:
> >>>>        
> >>>>
> >>>>>From: ariel@kini12.com
> >>>>>Date: September 10, 2006 2:17:51 PM CDT
> >>>>>To: robert@elastica.com
> >>>>>Subject: פריצת דרך מאתגרת
> >>>>>X-Spam-Dcc: : grub.camros.com 1113; Body=5 Fuz1=5 Fuz2=3
> >>>>>X-Spam-Flag: YES
> >>>>>X-Spam-Checker-Version: SpamAssassin 3.1.1 (2006-03-10) on
> >>>>>grub.camros.com
> >>>>>X-Spam-Level: *****
> >>>>>X-Spam-Status: Yes, score=5.7 required=0.6
> >>>>>tests=BAYES_95,FRONTPAGE,
> >>>>>HTML_90_100,HTML_IMAGE_RATIO_02,HTML_MESSAGE,HTML_TITLE_SUBJ_DIFF,
> >>>>>MIME_HTML_ONLY,NO_REAL_NAME,UNPARSEABLE_RELAY autolearn=no
> >>>>>version=3.1.1
> >>>>>X-Spam-Report: *  1.0 NO_REAL_NAME From: does not include a real
> >>>>>name *  0.0 UNPARSEABLE_RELAY Informational: message has
> >>>>>unparseable relay *      lines *  0.5 HTML_IMAGE_RATIO_02 BODY:
> >>>>>HTML has a low ratio of text to image *      area *  0.1
> >>>>>HTML_90_100 BODY: Message is 90% to 100% HTML *  0.0 HTML_MESSAGE
> >>>>>BODY: HTML included in message *  3.0 BAYES_95 BODY: Bayesian spam
> >>>>>probability is 95 to 99% *      [score: 0.9667] *  0.0
> >>>>>MIME_HTML_ONLY BODY: Message only has text/html MIME parts *  0.9
> >>>>>FRONTPAGE RAW: Frontpage used to create the message *  0.3
> >>>>>HTML_TITLE_SUBJ_DIFF HTML_TITLE_SUBJ_DIFF
> >>>>>Received: (qmail 10557 invoked from network); 10 Sep 2006 18:17:08
> >>>>>-0000
> >>>>>Received: from  (HELO kini12.com) (208.53.131.241) by 64.34.193.12
> >>>>>with SMTP; 10 Sep 2006 18:17:08 -0000
> >>>>>Message-Id: <20...@kini12.com>
> >>>>>Mime-Version: 1.0
> >>>>>Content-Type: text/html; charset="windows-1255"
> >>>>>Content-Transfer-Encoding: quoted-printable
> >>>>>Lines: 124
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>להגיע למיליון לקוחות ?גם אתם רוצים
> >>>>>נא לחצו כאן
> >>>>>
> >>>>>
> >>>>>מתנצלים אם גרמנו להפרעה, להסרה
> >>>>>מרשימת הדיוורנמען נכבד, אנו לחץ
> >>>>>
> >>>>>להסרה לחצו כאן
> >>>>>          
> >>>>>
>