Posted to users@spamassassin.apache.org by Scott Ostrander <SO...@printronix.com> on 2013/03/07 21:21:20 UTC
ExtractText.pm not working with SA 3.4
Does anybody have ExtractText working with SA 3.4?
http://whatever.truls.org/graphdefang/ExtractText.zip
I loved this third-party plugin back in SA 3.2.5.
Every once in a while some attachment spam gets through.
Running unrtf on the command line works and gives the expected output:
/usr/local/bin/unrtf -t ExtractText.tags -nopict RTF.rtf
The debug output, however, shows that nothing was extracted:
Mar 7 10:22:15.405 [18289] dbg: extracttext: set: magic=1
Mar 7 10:22:15.405 [18289] dbg: extracttext: external: antiword "/usr/bin/antiword","-t","-w","0","-m","UTF-8.txt","-"
Mar 7 10:22:15.406 [18289] dbg: extracttext: use: antiword name .*\.doc
Mar 7 10:22:15.406 [18289] dbg: extracttext: use: antiword name .*\.dot
Mar 7 10:22:15.406 [18289] dbg: extracttext: use: antiword type application/(?:vnd\.?)?ms-?word.*
Mar 7 10:22:15.406 [18289] dbg: extracttext: external: unrtf "/usr/local/bin/unrtf","-t","ExtractText.tags","--nopict"
Mar 7 10:22:15.406 [18289] dbg: extracttext: use: unrtf name .*\.doc
Mar 7 10:22:15.407 [18289] dbg: extracttext: use: unrtf name .*\.rtf
Mar 7 10:22:15.407 [18289] dbg: extracttext: use: unrtf type application/rtf
Mar 7 10:22:15.407 [18289] dbg: extracttext: use: unrtf type text/rtf
Mar 7 10:22:15.407 [18289] dbg: extracttext: external: odt2txt "/usr/bin/odt2txt","--encoding=UTF-8","${file}"
Mar 7 10:22:15.407 [18289] dbg: extracttext: use: odt2txt name .*\.odt
Mar 7 10:22:15.407 [18289] dbg: extracttext: use: odt2txt name .*\.ott
Mar 7 10:22:15.408 [18289] dbg: extracttext: use: odt2txt type application/.*?opendocument.*text
Mar 7 10:22:15.408 [18289] dbg: extracttext: use: odt2txt name .*\.sdw
Mar 7 10:22:15.408 [18289] dbg: extracttext: use: odt2txt name .*\.stw
Mar 7 10:22:15.408 [18289] dbg: extracttext: use: odt2txt type application/(?:x-)?soffice
Mar 7 10:22:15.408 [18289] dbg: extracttext: use: odt2txt type application/(?:x-)?starwriter
Mar 7 10:22:15.409 [18289] dbg: extracttext: external: pdftohtml "/usr/bin/pdftohtml","-i","-xml","-stdout","-noframes","${file}"
Mar 7 10:22:15.409 [18289] dbg: extracttext: external: pdftotext "/usr/bin/pdftotext","-q","-nopgbrk","-enc","UTF-8","${file}","-"
Mar 7 10:22:15.409 [18289] dbg: extracttext: use: pdftotext name .*\.pdf
Mar 7 10:22:15.409 [18289] dbg: extracttext: use: pdftotext type application/pdf
Mar 7 10:22:18.048 [18289] dbg: extracttext: MIME database: /usr/share/mime
Mar 7 10:22:18.152 [18289] dbg: extracttext: Part: application/rtf RTF.rtf
Mar 7 10:22:18.152 [18289] dbg: extracttext: Match: name "RTF.rtf" =~ ".*\.rtf"
Mar 7 10:22:18.213 [18289] dbg: extracttext: External call: unrtf "/usr/local/bin/unrtf","-t","ExtractText.tags","--nopict"
Mar 7 10:22:18.214 [18289] info: extracttext: External extraction command: "/usr/local/bin/unrtf","-t","ExtractText.tags","--nopict"
Mar 7 10:22:18.214 [18289] info: extracttext: External extraction object: 17 application/rtf "RTF.rtf"
Mar 7 10:22:18.214 [18289] info: extracttext: External extraction error: unrtf 0 ?
Mar 7 10:22:18.259 [18289] dbg: extracttext: Not extracted
Mar 7 10:22:18.259 [18289] dbg: extracttext: X-ExtractText-Words: 0
Mar 7 10:22:18.259 [18289] dbg: extracttext: X-ExtractText-Chars: 0
Mar 7 10:22:18.389 [18289] dbg: bayes: header tokens for x-extracttext-chars = " 0"
Mar 7 10:22:18.389 [18289] dbg: bayes: header tokens for x-extracttext-words = " 0"
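The log lists unrtf without a ${file} placeholder, unlike odt2txt and pdftotext, so the plugin may be feeding the attachment to unrtf on standard input instead of passing a file name. That is an assumption drawn from the log, not confirmed from the plugin source. A minimal Python sketch for comparing the two invocations outside SpamAssassin follows; the helper name run_extractor is hypothetical, and the path, arguments and sample file name are the ones shown in the log.

```python
import os
import shutil
import subprocess

# Path, arguments and sample file copied from the debug log above.
UNRTF = "/usr/local/bin/unrtf"
ARGS = ["-t", "ExtractText.tags", "--nopict"]
SAMPLE = "RTF.rtf"


def run_extractor(cmd, data=None):
    """Run cmd and return (exit status, stdout bytes, stderr bytes).

    If data is given it is piped to the command's stdin (assumed to be
    what the plugin does when no ${file} placeholder is configured);
    otherwise stdin is connected to the null device.
    """
    if data is not None:
        stdin_args = {"input": data}
    else:
        stdin_args = {"stdin": subprocess.DEVNULL}
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        **stdin_args,
    )
    return proc.returncode, proc.stdout, proc.stderr


# Only attempt the comparison when both the binary and the sample exist.
if shutil.which(UNRTF) and os.path.exists(SAMPLE):
    with open(SAMPLE, "rb") as fh:
        rtf = fh.read()
    for label, cmd, data in (
        ("file argument", [UNRTF, *ARGS, SAMPLE], None),
        ("stdin", [UNRTF, *ARGS], rtf),
    ):
        status, out, err = run_extractor(cmd, data)
        print(f"{label}: exit={status} stdout={len(out)} bytes "
              f"stderr={len(err)} bytes")
```

If the stdin run yields no output while the file-argument run does, that would point at how the external command is invoked rather than at unrtf itself.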
Thanks,
Scott Ostrander