
Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2014/03/29 20:37:25 UTC

[jira] [Commented] (PDFBOX-457) Invalid code encountered while decoding CCITT

    [ https://issues.apache.org/jira/browse/PDFBOX-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954413#comment-13954413 ] 

Tilman Hausherr commented on PDFBOX-457:
----------------------------------------

There was a PDFBox bug (which I have corrected): the CCITT filter got the wrong length if it was not the first filter in the chain. However, one problem is still unsolved, namely rendering that file. My current theory is that the CCITT stream I get from 580505.PR00003.000003.PDF after the Flate filter has been applied is broken (I get a perfect image by skipping 6 bytes), and that the decoders of pdf.js and Ghostscript (gs), whose source code is completely different from ours, are lenient.
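
The "skip 6 bytes" observation can be reproduced outside PDFBox with a minimal sketch like the one below. It assumes the stream's first filter is FlateDecode (zlib format, which java.util.zip.Inflater reads) and that the 6 extra bytes sit at the start of the decoded data; the comment does not say where they are. The class and method names are hypothetical, not PDFBox API, and handing the result to a CCITT decoder is left out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical diagnostic helper (not PDFBox API): undo the Flate layer, then drop leading bytes. */
final class FlateThenSkip {

    /** Inflates zlib-format data (the format FlateDecode uses) until the end of the stream. */
    static byte[] inflate(byte[] encoded) {
        Inflater inflater = new Inflater();
        inflater.setInput(encoded);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated input: keep what was decoded so far
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt Flate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Produces zlib-format test data standing in for a FlateDecode-compressed stream. */
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Returns a copy without the first {@code skip} bytes (assumed location of the extra bytes). */
    static byte[] skipLeading(byte[] data, int skip) {
        int start = Math.max(0, Math.min(skip, data.length));
        return Arrays.copyOfRange(data, start, data.length);
    }

    public static void main(String[] args) {
        byte[] decoded = inflate(deflate("123456payload".getBytes(StandardCharsets.US_ASCII)));
        byte[] ccittCandidate = skipLeading(decoded, 6); // would be handed to a CCITT decoder
        System.out.println(new String(ccittCandidate, StandardCharsets.US_ASCII));
    }
}
```

Rendering the data with and without the skip is one way to test the "broken stream" theory; it is a diagnostic, not a fix for the file.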

The bug I corrected didn't have a big impact, because CCITT data is normally not compressed a second time: the CCITT algorithm already works very well for most bitonal images. The bug would only result in the CCITT image data being cut off.
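
Why a wrong length only bites when CCITT is not the first filter can be shown with a minimal sketch of a /Filter array such as [/FlateDecode /CCITTFaxDecode]. It assumes, without checking the PDFBox source, that the wrong length was the stream's encoded /Length being applied to the output of the preceding filter. The class and method names are hypothetical, and the CCITT stage is a pass-through placeholder rather than a real decoder.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical illustration (not PDFBox code) of decoding a chain of stream filters. */
final class FilterChain {

    /** FlateDecode stage: inflates zlib data completely. */
    static byte[] inflate(byte[] encoded) {
        Inflater inflater = new Inflater();
        inflater.setInput(encoded);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated input: keep what was decoded so far
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt Flate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Builds a Flate-compressed test stream. */
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Correct: every filter receives the complete output of the previous filter. */
    static byte[] decode(byte[] encoded, List<UnaryOperator<byte[]>> filters) {
        byte[] data = encoded;
        for (UnaryOperator<byte[]> filter : filters) {
            data = filter.apply(data);
        }
        return data;
    }

    /** Assumed bug: later filters are limited to the encoded /Length, so their input is cut off. */
    static byte[] decodeWithEncodedLength(byte[] encoded, List<UnaryOperator<byte[]>> filters) {
        byte[] data = encoded;
        for (int i = 0; i < filters.size(); i++) {
            if (i > 0 && data.length > encoded.length) {
                data = Arrays.copyOf(data, encoded.length); // drops the tail of the decoded data
            }
            data = filters.get(i).apply(data);
        }
        return data;
    }

    public static void main(String[] args) {
        byte[] stream = deflate(new byte[1000]); // stand-in bytes, not real CCITT data
        UnaryOperator<byte[]> flate = FilterChain::inflate;
        UnaryOperator<byte[]> ccittPlaceholder = UnaryOperator.identity(); // hypothetical CCITT stage
        List<UnaryOperator<byte[]>> filters = List.of(flate, ccittPlaceholder);
        System.out.println("full: " + decode(stream, filters).length + " bytes");
        System.out.println("cut off: " + decodeWithEncodedLength(stream, filters).length + " bytes");
    }
}
```

With 1000 compressible bytes the correct chain returns all of them, while the buggy variant returns only as many bytes as the compressed stream had. When CCITT is the only filter, the encoded length is the right one, which is why the bug stayed mostly invisible.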

> Invalid code encountered while decoding CCITT
> ---------------------------------------------
>
>                 Key: PDFBOX-457
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-457
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 0.8.0-incubator
>            Reporter: Marcelo Tavares
>            Assignee: Daniel Wilson
>              Labels: CCITTFaxDecode, TIFF, ccitt
>         Attachments: 580505.PR00003.000003.PDF, pdfbox-457-Scan_from_a_Xerox_WorkCentre_Pro.PDF, pdfbox-457-as_fax.pdf, pdfbox-457.PNG, testPDFToImage1.png
>
>
> I tried to convert the following document to an image, but I got the attached result. 
> It parsed just the text. I also tried different formats like JPG. I ran it using the PDFToImage class, passing the document path as a parameter. 
> I've read that sometimes a document is not created according to the PDF standard. But is there a way to ignore that? In fact, it's very important to me, so could I use PDFBox despite those "errors"? 
> Thank you
> Marcelo



--
This message was sent by Atlassian JIRA
(v6.2#6252)