Posted to users@spamassassin.apache.org by polloxx <po...@gmail.com> on 2008/05/02 15:38:41 UTC

ocr plugin

Hi,

Am I right to say that picture spam has dropped dramatically over the
last few months?
Is it still reasonable to run an ocr plugin? I see the latest FuzzyOcr
version is not SA 3.2.x compatible. Is there a more recent product
compatible with 3.2.x?
Are you guys still running an ocr plugin on production servers?

Thanks for your answers,
P.

Re: ocr plugin

Posted by decoder <de...@own-hero.net>.
Theo Van Dinter wrote:
> On Fri, May 02, 2008 at 09:12:12PM +0200, decoder wrote:
>   
>> Also, the SA plugin architecture is not designed to modify the message 
>> in any way, so you cannot push back the text into the normal processing 
>> line.
>>     
>
> Really?  Who says?  I made very specific modifications in 3.2 to allow for
> just that.
>
> Search the list archives for "post_message_parse".
>   
Ah ok, I was referring to the 3.1.x architecture. I haven't looked at the 
changes made in 3.2, but if this is technically possible now, then I 
apologize :D


Best regards,


Chris


Re: ocr plugin

Posted by Theo Van Dinter <fe...@apache.org>.
On Fri, May 02, 2008 at 09:12:12PM +0200, decoder wrote:
> Also, the SA plugin architecture is not designed to modify the message 
> in any way, so you cannot push back the text into the normal processing 
> line.

Really?  Who says?  I made very specific modifications in 3.2 to allow for
just that.

Search the list archives for "post_message_parse".

-- 
Randomly Selected Tagline:
"If you ever reach total enlightenment while drinking beer, I bet it
 makes beer shoot out your nose."  - Deep Thought, Jack Handy

Re: ocr plugin

Posted by decoder <de...@own-hero.net>.
Matus UHLAR - fantomas wrote:
> does it push the extracted text back to SA so it could be used by e.g.
> bayes? This is how it imho should be used.
>
> (and imho the same for .pdf and/or .doc - extract text _and_ images from
> it, call OCR for images...)
>
>   
That is a question that was very frequently asked around here and that's 
why I also included it in the FuzzyOcr FAQ:

"If you take a look at the actual results of the OCR engines used, then 
you'll see that the output suffers from a lot of noise. Hence, it is not 
suited for common word analysis like bayes, and FuzzyOcr uses a special 
fuzzy matching algorithm to find the words"
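
To illustrate the idea in the FAQ answer above, here is a minimal sketch of fuzzy word matching against noisy OCR output. It is not FuzzyOcr's actual code: the function names, the word list and the 0.25 threshold are assumptions made for this example, and plain Levenshtein distance stands in for whatever matching the plugin really uses.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed with the two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]


def fuzzy_hits(ocr_text, wordlist, threshold=0.25):
    """Count, per listed word, the OCR tokens within the allowed distance.

    The allowed distance scales with word length (hypothetical rule:
    floor(len(word) * threshold)), so short words must match almost exactly.
    """
    tokens = ocr_text.lower().split()
    hits = {}
    for word in wordlist:
        target = word.lower()
        allowed = int(len(target) * threshold)
        count = sum(1 for tok in tokens
                    if edit_distance(tok, target) <= allowed)
        if count:
            hits[word] = count
    return hits


if __name__ == "__main__":
    noisy = "v1agra ch3ap pi11s today"
    print(fuzzy_hits(noisy, ["viagra", "cheap", "pills"]))
```

With these settings "v1agra" and "ch3ap" are each one edit away from a listed word and count as hits, while "pi11s" is two edits from "pills" and is missed. That is the trade-off a threshold controls: raise it to catch noisier output, at the price of more false matches on ordinary words.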

Also, the SA plugin architecture is not designed to modify the message 
in any way, so you cannot push back the text into the normal processing 
line.

As to image spam in general: Yes, it has dropped dramatically and I 
haven't actually seen any for quite a long time now. I hope that my tool 
is one reason that this annoying technique is gone now :D


Best regards,


Chris


Re: ocr plugin

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
> >>> Am I right to say that picture spam has dropped dramatically over the
> >>> last few months?

On 02.05.08 11:38, Joseph Brennan wrote:
> Right.  There's close to none now.  Spam techniques come and go.

does it push the extracted text back to SA so it could be used by e.g.
bayes? This is how it imho should be used.

(and imho the same for .pdf and/or .doc - extract text _and_ images from
it, call OCR for images...)

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
A day without sunshine is like, night.

Re: ocr plugin

Posted by Joseph Brennan <br...@columbia.edu>.
>> > Am I right to say that picture spam has dropped dramatically over the
>> > last few months?


Right.  There's close to none now.  Spam techniques come and go.

Joseph Brennan
Columbia University IT



Re: ocr plugin

Posted by William Taylor <wi...@corp.sonic.net>.
On Fri, May 02, 2008 at 06:06:05PM +0300, Henrik K wrote:
> On Fri, May 02, 2008 at 03:38:41PM +0200, polloxx wrote:
> > Hi,
> > 
> > Am I right to say that picture spam has dropped dramatically over the
> > last few months?
> 
> Has there been any in a year? That's when I dropped using it.
> 

It's probably not worth the resources to run it right now. I only get a few that trickle in here and there.
Others' mileage may vary, though.

Re: ocr plugin

Posted by Henrik K <he...@hege.li>.
On Fri, May 02, 2008 at 03:38:41PM +0200, polloxx wrote:
> Hi,
> 
> Am I right to say that picture spam has dropped dramatically over the
> last few months?

Has there been any in a year? That's when I dropped using it.


Re: ocr plugin

Posted by William Taylor <wi...@corp.sonic.net>.
We are using the SVN version of FuzzyOcr. It seems to be working fine.

-William

On Fri, May 02, 2008 at 03:38:41PM +0200, polloxx wrote:
> Hi,
> 
> Am I right to say that picture spam has dropped dramatically over the
> last few months?
> Is it still reasonable to run an ocr plugin? I see the latest FuzzyOcr
> version is not SA 3.2.x compatible. Is there a more recent product
> compatible with 3.2.x?
> Are you guys still running an ocr plugin on production servers?
> 
> Thanks for your answers,
> P.
>