Posted to users@spamassassin.apache.org by polloxx <po...@gmail.com> on 2008/05/02 15:38:41 UTC
ocr plugin
Hi,
Am I right to say that picture spam has dropped dramatically over the
last few months?
Is it still reasonable to run an OCR plugin? I see that the latest FuzzyOcr
version is not SA 3.2.x compatible. Is there a more recent product that is
compatible with 3.2.x?
Are you guys still running an ocr plugin on production servers?
Thanks for your answers,
P.
Re: ocr plugin
Posted by decoder <de...@own-hero.net>.
Theo Van Dinter wrote:
> On Fri, May 02, 2008 at 09:12:12PM +0200, decoder wrote:
>
>> Also, the SA plugin architecture is not designed to modify the message
>> in any way, so you cannot push back the text into the normal processing
>> line.
>>
>
> Really? Who says? I made very specific modifications in 3.2 to allow for
> just that.
>
> Search the list archives for "post_message_parse".
>
Ah ok, I was referring to the 3.1.x architecture. I haven't looked at the
changes made in 3.2, but if this is technically possible now, then I
apologize :D
Best regards,
Chris
Re: ocr plugin
Posted by Theo Van Dinter <fe...@apache.org>.
On Fri, May 02, 2008 at 09:12:12PM +0200, decoder wrote:
> Also, the SA plugin architecture is not designed to modify the message
> in any way, so you cannot push back the text into the normal processing
> line.
Really? Who says? I made very specific modifications in 3.2 to allow for
just that.
Search the list archives for "post_message_parse".
Re: ocr plugin
Posted by decoder <de...@own-hero.net>.
Matus UHLAR - fantomas wrote:
> Does it push the extracted text back to SA so that it can be used by,
> e.g., Bayes? IMHO this is how it should be used.
>
> (and imho the same for .pdf and/or .doc - extract text _and_ images from
> it, call OCR for images...)
>
>
That is a question that was very frequently asked around here and that's
why I also included it in the FuzzyOcr FAQ:
"If you take a look at the actual results of the OCR engines used, then
you'll see that the output suffers from a lot of noise. Hence, it is not
suited for common word analysis like bayes, and FuzzyOcr uses a special
fuzzy matching algorithm to find the words"
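
To illustrate the idea in the FAQ quote above, here is a minimal Python sketch of matching keywords against noisy OCR output. It is not FuzzyOcr's actual algorithm or code: it assumes a plain Levenshtein edit distance with a per-word error allowance, and the names `edit_distance`, `fuzzy_hits` and `max_ratio` are hypothetical, chosen only for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (two-row dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def fuzzy_hits(ocr_text, keywords, max_ratio=0.25):
    """Return (keyword, matched_token) pairs found in noisy OCR text.

    A token matches a keyword when their edit distance is at most
    int(len(keyword) * max_ratio). Only the first matching token per
    keyword is reported. The threshold is an assumption for this sketch.
    """
    tokens = ocr_text.lower().split()
    hits = []
    for keyword in keywords:
        target = keyword.lower()
        limit = int(len(target) * max_ratio)
        for token in tokens:
            if edit_distance(token, target) <= limit:
                hits.append((keyword, token))
                break
    return hits


if __name__ == "__main__":
    noisy = "Buy v1agra n0w at l0w pr1ces"
    print(fuzzy_hits(noisy, ["viagra", "prices", "mortgage"]))
```

An exact word lookup, which is what a token-based classifier effectively does, would miss both "v1agra" and "pr1ces" here, while the distance-based comparison tolerates a character or two of OCR noise per word.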
Also, the SA plugin architecture is not designed to modify the message
in any way, so you cannot push back the text into the normal processing
line.
As to image spam in general: yes, it has dropped dramatically, and I
actually haven't seen any for quite a long time now. I hope that my tool
is one reason that this annoying technique is gone now :D
Best regards,
Chris
Re: ocr plugin
Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
> >>> Am I right to say that picture spam has dropped dramatically over the
> >>> last few months?
On 02.05.08 11:38, Joseph Brennan wrote:
> Right. There's close to none now. Spam techniques come and go.
Does it push the extracted text back to SA so that it can be used by,
e.g., Bayes? IMHO this is how it should be used.
(And IMHO the same goes for .pdf and/or .doc: extract text _and_ images
from it, call OCR for the images...)
--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Re: ocr plugin
Posted by Joseph Brennan <br...@columbia.edu>.
>> > Am I right to say that picture spam has dropped dramatically over the
>> > last few months?
Right. There's close to none now. Spam techniques come and go.
Joseph Brennan
Columbia University IT
Re: ocr plugin
Posted by William Taylor <wi...@corp.sonic.net>.
On Fri, May 02, 2008 at 06:06:05PM +0300, Henrik K wrote:
> On Fri, May 02, 2008 at 03:38:41PM +0200, polloxx wrote:
> > Hi,
> >
> > Am I right to say that picture spam has dropped dramatically over the
> > last few months?
>
> Has there been any in a year? That's when I dropped using it.
>
It's probably not worth the resources to run it right now. I only get a few that trickle in here and there.
Others' mileage may vary, though.
Re: ocr plugin
Posted by Henrik K <he...@hege.li>.
On Fri, May 02, 2008 at 03:38:41PM +0200, polloxx wrote:
> Hi,
>
> Am I right to say that picture spam has dropped dramatically over the
> last few months?
Has there been any in a year? That's when I dropped using it.
Re: ocr plugin
Posted by William Taylor <wi...@corp.sonic.net>.
We are using the SVN version of FuzzyOcr. It seems to be working fine.
-William
On Fri, May 02, 2008 at 03:38:41PM +0200, polloxx wrote:
> Hi,
>
> Am I right to say that picture spam has dropped dramatically over the
> last few months?
> Is it still reasonable to run an OCR plugin? I see that the latest FuzzyOcr
> version is not SA 3.2.x compatible. Is there a more recent product that is
> compatible with 3.2.x?
> Are you guys still running an ocr plugin on production servers?
>
> Thanks for your answers,
> P.
>