Posted to users@spamassassin.apache.org by Richard Humphrey <ri...@multicam.com> on 2004/07/09 14:59:59 UTC

[OT] SA 3.0 Maillog parser

On RedHat 9 I get the following error when running sa-stats.pl
Any ideas how to make it go away?

Malformed UTF-8 character (overflow at 0xfc332714, byte 0x78, after 
start byte 0xff) in pattern match (m//) at sa-stats.pl line 53, <F> line 
1038

Dallas L. Engelken wrote:

> I just whipped up a quick maillog parser to display the top rules firing in
> 3.0.  It reads all maillog* files and generates the top-firing rules for
> ham and spam, as well as some general info.  A copy of it can be found
> here: http://www.rulesemporium.com/programs/sa-stats.txt
> 
> [root@mailgw tmp]# perl sa-stats.pl 5
> Email:    31808  Autolearn:  2245  AvgScore:   1.68  AvgScanTime:  2.13 sec
> Spam:      4381  Autolearn:  2219  AvgScore:  12.43  AvgScanTime:  4.41 sec
> Ham:      27427  Autolearn:    26  AvgScore:  -0.04  AvgScanTime:  1.76 sec
> 
> Time Spent Running SA:        18.80 hours
> Time Spent Processing Spam:    5.37 hours
> Time Spent Processing Ham:    13.43 hours
> 
> TOP SPAM RULES FIRED
> ------------------------------------------------
> COUNT   RULE NAME                       PERCENT
> ------------------------------------------------
>  3616   HTML_MESSAGE                      5.55%
>  2249   URIBL_SBL                         3.45%
>  2069   MIME_HTML_ONLY                    3.18%
>  1885   URIBL_WS_SURBL                    2.89%
>  1630   URIBL_SC_SURBL                    2.50%
> ------------------------------------------------
> 
> TOP HAM RULES FIRED
> ------------------------------------------------
> COUNT   RULE NAME                       PERCENT
> ------------------------------------------------
>  6996   AWL                              17.56%
>  2969   HTML_MESSAGE                      7.45%
>  2546   NO_REAL_NAME                      6.39%
>  2465   FORGED_RCVD_HELO                  6.19%
>  2019   LONGWORD_TEST_1                   5.07%
> ------------------------------------------------
> 
> You can override the number of top rules shown by passing a number to
> the script:
> 
>  ./sa-stats.pl 10  # shows top 10
>  ./sa-stats.pl     # shows default of 20
> 
> You can change the default number of rules shown by changing 
>  $TOPRULES=20;
> in the script.
> 
> That's about all it does right now, but that's all I wanted it to do :)
> 
> 
> It's pretty CPU intensive on large maillogs, so be warned.  It takes less
> than 1 second (2.4 GHz P4, 512 MB) for around 30k records, at least on my
> maillogs...
> 
> [root@mailgw tmp]# time perl sa-stats.pl  | tail -0
> real    0m0.896s
> user    0m0.880s
> sys     0m0.010s
> 
> Using it like this works well...
> 
> [root@mailgw tmp]# perl sa-stats.pl | mail <youremail>
> 
> Have Fun!
> 
> 

Re: [OT] SA 3.0 Maillog parser

Posted by Dallas Engelken <da...@engelken.net>.
Richard Humphrey wrote:

> On RedHat 9 I get the following error when running sa-stats.pl
> Any ideas how to make it go away?
>
> Malformed UTF-8 character (overflow at 0xfc332714, byte 0x78, after 
> start byte 0xff) in pattern match (m//) at sa-stats.pl line 53, <F> 
> line 1038
>
>
http://www.google.com/search?hl=en&ie=UTF-8&q=Malformed+UTF-8+character++%22redhat+9%22&btnG=Google+Search
To my knowledge, this is related to your language settings in
/etc/sysconfig/i18n.

Check whether your LANG is set to a UTF-8 locale and, if so, try plain en_US instead.

    $ echo $LANG
    en_US.UTF-8

    $ export LANG=en_US

Then re-run ./sa-stats.pl to confirm the error goes away before making any changes to i18n.
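
If you would rather not change LANG for the whole shell session, a per-command override is one way to test the same idea. This is only a minimal sketch: run_non_utf8 is a hypothetical helper name, not part of sa-stats.pl, and it assumes an en_US locale is installed on the machine. LC_ALL and LC_CTYPE are blanked because either one, when set to a non-empty value, takes precedence over LANG.

```shell
# Hypothetical helper (not part of sa-stats.pl): run a single command with a
# non-UTF-8 locale. Only the child process sees the change; the login shell
# and /etc/sysconfig/i18n are left untouched.
run_non_utf8() {
    # An empty LC_ALL / LC_CTYPE is treated as unset, so LANG decides.
    env LC_ALL= LC_CTYPE= LANG=en_US "$@"
}

# Quick check that the override reaches the child process.
run_non_utf8 sh -c 'echo "LANG is now $LANG"'

# Intended use (needs sa-stats.pl in the current directory):
#   run_non_utf8 perl sa-stats.pl 5
```

If the warning disappears when run this way, that points at the locale setting rather than at the script itself.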

dallas