Posted to users@spamassassin.apache.org by James MacLean <ma...@ednet.ns.ca> on 2007/07/18 03:07:28 UTC
PDFText - pdftotext from Xpdf 3.02 limitation
Hi Folks,
I noticed that my bodies were no longer being parsed. It turned out that
spammers were creating PDFs that are copy protected. The Xpdf utils from 3.0
will present the text, but 3.02 at least reports that the file is copy
protected and does not parse it...
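
For anyone wanting to confirm they are hitting the same thing, here is a minimal sketch of a wrapper that tells a permissions refusal apart from other pdftotext failures. It assumes the exit codes listed in the Xpdf pdftotext man page (3 for a permissions error) and that the binary is on the PATH; other builds, such as the Poppler one, may behave differently. The names describe_exit and extract_text are made up for this illustration and are not part of SpamAssassin or the PDFText plugin.

```python
import subprocess

# Exit codes as listed in the Xpdf pdftotext man page (assumed here;
# other builds of pdftotext may not follow the same table).
PDFTOTEXT_EXIT_CODES = {
    0: "ok",
    1: "error opening PDF file",
    2: "error opening output file",
    3: "PDF permissions error",
    99: "other error",
}


def describe_exit(code):
    """Map a pdftotext exit code to a short description."""
    return PDFTOTEXT_EXIT_CODES.get(code, "unknown exit code %d" % code)


def extract_text(pdf_path, binary="pdftotext"):
    """Run pdftotext on pdf_path and return (text, status).

    The output file argument "-" asks pdftotext to write to stdout.
    On any non-zero exit the text is returned empty.
    """
    proc = subprocess.run(
        [binary, pdf_path, "-"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    status = describe_exit(proc.returncode)
    if proc.returncode != 0:
        return "", status
    return proc.stdout.decode("utf-8", errors="replace"), status
```

Calling extract_text("sample.pdf") on one of these spam attachments should come back with empty text and the permissions status, which at least lets a scanner log why the body was skipped.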
Simple fix here was to compile a _special_ pdftotext to be used for
SpamAssassin that would allow parsing of these files :).
JES
Re: PDFText - pdftotext from Xpdf 3.02 limitation
Posted by JT DeLys <jt...@gmail.com>.
> So, although our goal is just, and I don't believe we are "copying" the
> PDF against the spirit of this feature, I don't feel I can point out which
> lines to delete in the pdftotext.cc file before recompiling it :(.
>
> Maybe others can offer this info, or Google can help. For example :
>
> http://www.cs.cmu.edu/~dst/Adobe/Gallery/xpdf-0.93-ro-removed.patch
>
Ok. Wisely unhelpful of you.
I'll make sure not to look at / modify / use that patch!
;-)
--
Thanks,
JTDeLys
Re: PDFText - pdftotext from Xpdf 3.02 limitation
Posted by James MacLean <ma...@ednet.ns.ca>.
Hi JT,
There is the expectation that if the author requested that a PDF not be
copied, then the PDF is not to be copied. This is done by a password
protection mechanism that is applied when the PDF is saved and is stored in
the PDF file itself.
The author of Xpdf makes his position known on subverting this feature:
http://www.foolabs.com/xpdf/cracking.html
So, although our goal is just, and I don't believe we are "copying" the
PDF against the spirit of this feature, I don't feel I can point out
which lines to delete in the pdftotext.cc file before recompiling it :(.
Maybe others can offer this info, or Google can help. For example :
http://www.cs.cmu.edu/~dst/Adobe/Gallery/xpdf-0.93-ro-removed.patch
JES
JT DeLys wrote, on 17/07/07 10:23 PM:
>
> Simple fix here was to compile a _special_ pdftotext to be used for
> SpamAssassin that would allow parsing of these files :).
>
>
>
> could you share what you did that was 'special'?
>
> config options? other?
>
>
> --
> Thanks,
>
> JTDeLys
Re: PDFText - pdftotext from Xpdf 3.02 limitation
Posted by JT DeLys <jt...@gmail.com>.
> Simple fix here was to compile a _special_ pdftotext to be used for
> SpamAssassin that would allow parsing of these files :).
could you share what you did that was 'special'?
config options? other?
--
Thanks,
JTDeLys