Posted to users@spamassassin.apache.org by Richard Humphrey <ri...@multicam.com> on 2004/07/09 14:59:59 UTC
[OT] SA 3.0 Maillog parser
On RedHat 9 I get the following error when running sa-stats.pl
Any ideas how to make it go away?
Malformed UTF-8 character (overflow at 0xfc332714, byte 0x78, after
start byte 0xff) in pattern match (m//) at sa-stats.pl line 53, <F> line
1038
Dallas L. Engelken wrote:
> I just whipped up a quick maillog parser to display the top rules firing in
> 3.0. It reads all maillog* files and generates top firing rules for
> ham and spam, as well as some general info. A copy of it can be found
> here http://www.rulesemporium.com/programs/sa-stats.txt
>
> [root@mailgw tmp]# perl sa-stats.pl 5
> Email: 31808  Autolearn:  2245  AvgScore:  1.68  AvgScanTime: 2.13 sec
> Spam:   4381  Autolearn:  2219  AvgScore: 12.43  AvgScanTime: 4.41 sec
> Ham:   27427  Autolearn:    26  AvgScore: -0.04  AvgScanTime: 1.76 sec
>
> Time Spent Running SA: 18.80 hours
> Time Spent Processing Spam: 5.37 hours
> Time Spent Processing Ham: 13.43 hours
>
> TOP SPAM RULES FIRED
> ------------------------------------------------
> COUNT RULE NAME PERCENT
> ------------------------------------------------
> 3616 HTML_MESSAGE 5.55%
> 2249 URIBL_SBL 3.45%
> 2069 MIME_HTML_ONLY 3.18%
> 1885 URIBL_WS_SURBL 2.89%
> 1630 URIBL_SC_SURBL 2.50%
> ------------------------------------------------
>
> TOP HAM RULES FIRED
> ------------------------------------------------
> COUNT RULE NAME PERCENT
> ------------------------------------------------
> 6996 AWL 17.56%
> 2969 HTML_MESSAGE 7.45%
> 2546 NO_REAL_NAME 6.39%
> 2465 FORGED_RCVD_HELO 6.19%
> 2019 LONGWORD_TEST_1 5.07%
> ------------------------------------------------
>
> You can override the number of top rules shown by passing a number to
> the script:
>
> ./sa-stats.pl 10 # shows top 10
> ./sa-stats.pl # shows default of 20
>
> You can change the default number of rules shown by changing
> $TOPRULES=20;
> in the script.
>
> That's about all it does right now, but that's all I wanted it to do :)
>
>
> It's pretty CPU intensive on large maillogs, so be warned. It takes less
> than 1 second (2.4 GHz P4, 512 MB) for around 30k records, at least on my
> maillogs...
>
> [root@mailgw tmp]# time perl sa-stats.pl | tail -0
> real 0m0.896s
> user 0m0.880s
> sys 0m0.010s
>
> Using it like this works well...
>
> [root@mailgw tmp]# perl sa-stats.pl | mail <youremail>
>
> Have Fun!
>
>
Re: [OT] SA 3.0 Maillog parser
Posted by Dallas Engelken <da...@engelken.net>.
Richard Humphrey wrote:
> On RedHat 9 I get the following error when running sa-stats.pl
> Any ideas how to make it go away?
>
> Malformed UTF-8 character (overflow at 0xfc332714, byte 0x78, after
> start byte 0xff) in pattern match (m//) at sa-stats.pl line 53, <F>
> line 1038
>
>
http://www.google.com/search?hl=en&ie=UTF-8&q=Malformed+UTF-8+character++%22redhat+9%22&btnG=Google+Search
To my knowledge, this is related to your language settings in
/etc/sysconfig/i18n.
See if your LANG is set to a UTF-8 locale, and if so, try en_US instead.
$ echo $LANG
en_US.UTF-8
$ export LANG=en_US
Then re-run ./sa-stats.pl before making changes to i18n.
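If you would rather not change LANG for the whole shell session, a small wrapper can apply the locale to a single command. This is only a sketch: run_with_locale is a made-up helper name, not part of sa-stats.pl or SpamAssassin, and it also sets LC_ALL on the assumption that LC_ALL may be set on your box and would otherwise override LANG.

```shell
# Hypothetical helper (not part of sa-stats.pl): run one command with the
# locale forced to the value given as the first argument.
run_with_locale() {
    loc=$1
    shift
    # env sets the variables for the child process only, so the LANG of
    # the calling shell is left untouched.
    env LANG="$loc" LC_ALL="$loc" "$@"
}

# Example, assuming sa-stats.pl is in the current directory:
#   run_with_locale en_US perl sa-stats.pl 5
```

If the error goes away with this, the UTF-8 locale is the likely cause, and you can then decide whether to change /etc/sysconfig/i18n.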
dallas