Posted to users@spamassassin.apache.org by James MacLean <ma...@ednet.ns.ca> on 2007/07/18 03:07:28 UTC

PDFText - pdftotext from Xpdf 3.02 limitation

Hi Folks,

Noticed that my PDF bodies were not being parsed any more. Found out that 
spammers were creating PDFs that are copy protected. The Xpdf utils from 
3.0 will present the text, but at least 3.02 reports that the file is 
copy protected and does not parse it...

Simple fix here was to compile a _special_ pdftotext to be used for 
SpamAssassin that would allow parsing of these files :).
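
For anyone wanting to confirm they are hitting the same thing, here is a rough sketch of how the refusal could be detected from a script. It assumes the exit codes listed in the Xpdf pdftotext man page (3 meaning an error related to PDF permissions) and that "-" as the output file sends text to stdout; the function names are made up for illustration and are not part of SpamAssassin or any plugin.

```python
import subprocess

# Exit codes as listed in the Xpdf pdftotext man page (assumed to apply
# to the installed build; check `man pdftotext` locally).
PDFTOTEXT_EXIT_CODES = {
    0: "no error",
    1: "error opening the PDF file",
    2: "error opening the output file",
    3: "error related to PDF permissions",
    99: "other error",
}


def describe_exit_code(code):
    """Map a pdftotext exit code to a human-readable description."""
    return PDFTOTEXT_EXIT_CODES.get(code, "unknown exit code %d" % code)


def is_permission_refusal(code):
    """True when pdftotext declined to extract because of PDF permissions."""
    return code == 3


def extract_text(pdf_path, pdftotext="pdftotext"):
    """Run pdftotext on pdf_path and return (text, exit_code).

    The second argument "-" asks pdftotext to write the text to stdout.
    """
    proc = subprocess.run([pdftotext, pdf_path, "-"], capture_output=True)
    return proc.stdout.decode("utf-8", errors="replace"), proc.returncode
```

Calling extract_text() on a suspect attachment and passing the returned code to is_permission_refusal() should tell a copy-protected PDF apart from one that simply contains no text.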

JES

Re: PDFText - pdftotext from Xpdf 3.02 limitation

Posted by JT DeLys <jt...@gmail.com>.
> So, although our goal is just, and I don't believe we are "copying" the
> PDF against the spirit of this feature, I don't feel I can point out which
> lines to delete in the pdftotext.cc file before recompiling it :(.
>
> Maybe others can offer this info, or Google can help. For example :
>
> http://www.cs.cmu.edu/~dst/Adobe/Gallery/xpdf-0.93-ro-removed.patch
>


Ok.  Wisely unhelpful of you.

I'll make sure not to look at / modify / use that patch!

;-)

-- 
Thanks,

    JTDeLys

Re: PDFText - pdftotext from Xpdf 3.02 limitation

Posted by James MacLean <ma...@ednet.ns.ca>.
Hi JT,

There is the expectation that if the author requested that a PDF not be 
copied, then the PDF is not to be copied. This is done by a password 
protection mechanism that is applied when the PDF is saved and is stored 
in the PDF file itself. The author of Xpdf makes his position known on 
subverting this feature:

http://www.foolabs.com/xpdf/cracking.html

So, although our goal is just, and I don't believe we are "copying" the 
PDF against the spirit of this feature, I don't feel I can point out 
which lines to delete in the pdftotext.cc file before recompiling it :(.

Maybe others can offer this info, or Google can help. For example :

http://www.cs.cmu.edu/~dst/Adobe/Gallery/xpdf-0.93-ro-removed.patch

JES

JT DeLys wrote, on 17/07/07 10:23 PM:
>
>     Simple fix here was to compile a _special_ pdftotext to be used for
>     SpamAssassin that would allow parsing of these files :). 
>
>
>
> could you share what you did that was 'special'?
>
> config options? other?
>
>
> -- 
> Thanks,
>
>     JTDeLys 

Re: PDFText - pdftotext from Xpdf 3.02 limitation

Posted by JT DeLys <jt...@gmail.com>.
> Simple fix here was to compile a _special_ pdftotext to be used for
> SpamAssassin that would allow parsing of these files :).



could you share what you did that was 'special'?

config options? other?


-- 
Thanks,

    JTDeLys