Posted to users@spamassassin.apache.org by Scott Ostrander <SO...@printronix.com> on 2013/03/07 21:21:20 UTC

ExtractText.pm not working with SA 3.4

Does anybody have ExtractText working with SA 3.4?
http://whatever.truls.org/graphdefang/ExtractText.zip

I loved this third-party plugin back in SA 3.2.5.
Every once in a while some attachment spam gets through.

Running unrtf on the command line works and gives the expected output:
/usr/local/bin/unrtf -t ExtractText.tags -nopict  RTF.rtf

But the SA debug output shows that nothing was extracted:

Mar  7 10:22:15.405 [18289] dbg: extracttext: set: magic=1
Mar  7 10:22:15.405 [18289] dbg: extracttext: external: antiword "/usr/bin/antiword","-t","-w","0","-m","UTF-8.txt","-"
Mar  7 10:22:15.406 [18289] dbg: extracttext: use: antiword name .*\.doc
Mar  7 10:22:15.406 [18289] dbg: extracttext: use: antiword name .*\.dot
Mar  7 10:22:15.406 [18289] dbg: extracttext: use: antiword type application/(?:vnd\.?)?ms-?word.*
Mar  7 10:22:15.406 [18289] dbg: extracttext: external: unrtf "/usr/local/bin/unrtf","-t","ExtractText.tags","--nopict"
Mar  7 10:22:15.406 [18289] dbg: extracttext: use: unrtf name .*\.doc
Mar  7 10:22:15.407 [18289] dbg: extracttext: use: unrtf name .*\.rtf
Mar  7 10:22:15.407 [18289] dbg: extracttext: use: unrtf type application/rtf
Mar  7 10:22:15.407 [18289] dbg: extracttext: use: unrtf type text/rtf
Mar  7 10:22:15.407 [18289] dbg: extracttext: external: odt2txt "/usr/bin/odt2txt","--encoding=UTF-8","${file}"
Mar  7 10:22:15.407 [18289] dbg: extracttext: use: odt2txt name .*\.odt
Mar  7 10:22:15.407 [18289] dbg: extracttext: use: odt2txt name .*\.ott
Mar  7 10:22:15.408 [18289] dbg: extracttext: use: odt2txt type application/.*?opendocument.*text
Mar  7 10:22:15.408 [18289] dbg: extracttext: use: odt2txt name .*\.sdw
Mar  7 10:22:15.408 [18289] dbg: extracttext: use: odt2txt name .*\.stw
Mar  7 10:22:15.408 [18289] dbg: extracttext: use: odt2txt type application/(?:x-)?soffice
Mar  7 10:22:15.408 [18289] dbg: extracttext: use: odt2txt type application/(?:x-)?starwriter
Mar  7 10:22:15.409 [18289] dbg: extracttext: external: pdftohtml "/usr/bin/pdftohtml","-i","-xml","-stdout","-noframes","${file}"
Mar  7 10:22:15.409 [18289] dbg: extracttext: external: pdftotext "/usr/bin/pdftotext","-q","-nopgbrk","-enc","UTF-8","${file}","-"
Mar  7 10:22:15.409 [18289] dbg: extracttext: use: pdftotext name .*\.pdf
Mar  7 10:22:15.409 [18289] dbg: extracttext: use: pdftotext type application/pdf
Mar  7 10:22:18.048 [18289] dbg: extracttext: MIME database: /usr/share/mime
Mar  7 10:22:18.152 [18289] dbg: extracttext: Part: application/rtf RTF.rtf
Mar  7 10:22:18.152 [18289] dbg: extracttext: Match: name "RTF.rtf" =~ ".*\.rtf"
Mar  7 10:22:18.213 [18289] dbg: extracttext: External call: unrtf "/usr/local/bin/unrtf","-t","ExtractText.tags","--nopict"
Mar  7 10:22:18.214 [18289] info: extracttext: External extraction command: "/usr/local/bin/unrtf","-t","ExtractText.tags","--nopict"
Mar  7 10:22:18.214 [18289] info: extracttext: External extraction object: 17 application/rtf "RTF.rtf"
Mar  7 10:22:18.214 [18289] info: extracttext: External extraction error: unrtf 0 ?
Mar  7 10:22:18.259 [18289] dbg: extracttext: Not extracted
Mar  7 10:22:18.259 [18289] dbg: extracttext: X-ExtractText-Words: 0
Mar  7 10:22:18.259 [18289] dbg: extracttext: X-ExtractText-Chars: 0
Mar  7 10:22:18.389 [18289] dbg: bayes: header tokens for x-extracttext-chars = " 0"
Mar  7 10:22:18.389 [18289] dbg: bayes: header tokens for x-extracttext-words = " 0"
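
One difference between the two runs may matter: my command-line test passes RTF.rtf as a file argument, while the unrtf command in the log has no ${file} placeholder, so I assume (not verified against the plugin source) that the plugin pipes the attachment to unrtf on stdin. Below is a minimal Python sketch for reproducing that case outside of SA. The helper names run_with_stdin and reproduce are made up for illustration; the command and file name are the ones from the log above.

```python
import subprocess
import sys


def run_with_stdin(cmd, data, timeout=30):
    """Run cmd with data piped to its stdin.

    Returns (exit_code, stdout_bytes, stderr_bytes) so that an empty
    result can be told apart from a failing command.
    """
    proc = subprocess.run(
        cmd,
        input=data,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr


def reproduce(path, cmd):
    """Feed the file at path to cmd on stdin and print a short report.

    Assumption: this mimics how the plugin calls a command that has no
    ${file} placeholder. That is a guess, not confirmed behaviour.
    """
    with open(path, "rb") as fh:
        data = fh.read()
    code, out, err = run_with_stdin(cmd, data)
    print("exit code:", code)
    print("stdout bytes:", len(out))
    print("stderr:", err.decode("utf-8", errors="replace"))
    return code, out, err


# Example call, using the command exactly as it appears in the log:
# reproduce("RTF.rtf",
#           ["/usr/local/bin/unrtf", "-t", "ExtractText.tags", "--nopict"])
```

If this prints zero stdout bytes while the file-argument form works, the stdin handling (or the working directory used to find ExtractText.tags) would be the first thing to look at.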

Thanks,
Scott Ostrander