Posted to dev@pdfbox.apache.org by "Antti Lankila (JIRA)" <ji...@apache.org> on 2014/05/31 11:31:02 UTC

[jira] [Updated] (PDFBOX-2105) Support for multipage TIFFs in CCITTFactory, makes PDFBox capable of doing tiff2pdf

     [ https://issues.apache.org/jira/browse/PDFBOX-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antti Lankila updated PDFBOX-2105:
----------------------------------

    Attachment: pdfbox-multipagetiff.diff

Here's the patch. Please let me know if this form is acceptable, or if we should make usage a bit different.

An alternative design I thought of would have a static method that returns a List<Integer> of addresses corresponding to the starting points of the TIFF images in the RandomAccessBuffer. Instead of passing the page number, we would pass the address. This would be slightly better in that:

1) it avoids the O(N^2) algorithm for extracting TIFF pages
2) it allows the user to discover how many pages a TIFF contains
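
To make the alternative concrete, here is a rough sketch of such a static method. It is not part of the attached patch. It assumes that the "starting point" of each image is the offset of its IFD (image file directory) in a classic TIFF, and it reads from a plain byte[] rather than a RandomAccessBuffer so that it stands alone. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not a PDFBox class.
final class TiffPageOffsets {

    /**
     * Walks the IFD chain of a classic TIFF once and returns the offset
     * of every IFD, in file order of the chain (one IFD per page).
     */
    static List<Integer> findPageOffsets(byte[] tiff) {
        if (tiff.length < 8) {
            throw new IllegalArgumentException("too short for a TIFF header");
        }
        ByteBuffer buf = ByteBuffer.wrap(tiff);
        // Bytes 0-1 give the byte order: "II" little-endian, "MM" big-endian.
        if (tiff[0] == 'I' && tiff[1] == 'I') {
            buf.order(ByteOrder.LITTLE_ENDIAN);
        } else if (tiff[0] == 'M' && tiff[1] == 'M') {
            buf.order(ByteOrder.BIG_ENDIAN);
        } else {
            throw new IllegalArgumentException("not a TIFF: bad byte order mark");
        }
        // Bytes 2-3 hold the magic number 42.
        if (buf.getShort(2) != 42) {
            throw new IllegalArgumentException("not a TIFF: bad magic number");
        }
        Set<Integer> offsets = new LinkedHashSet<>();
        // Bytes 4-7 hold the offset of the first IFD.
        long next = buf.getInt(4) & 0xFFFFFFFFL;
        while (next != 0) {
            if (next + 2 > tiff.length) {
                throw new IllegalArgumentException("IFD offset out of range");
            }
            int ifd = (int) next;
            // add() returns false if the chain loops back on itself.
            if (!offsets.add(ifd)) {
                throw new IllegalArgumentException("cyclic IFD chain");
            }
            // An IFD is: 2-byte entry count, 12 bytes per entry,
            // then the 4-byte offset of the next IFD (0 ends the chain).
            int entries = buf.getShort(ifd) & 0xFFFF;
            long nextField = (long) ifd + 2 + 12L * entries;
            if (nextField + 4 > tiff.length) {
                throw new IllegalArgumentException("truncated IFD");
            }
            next = buf.getInt((int) nextField) & 0xFFFFFFFFL;
        }
        return new ArrayList<>(offsets);
    }
}
```

The size of the returned list is the page count, and each page can then be extracted starting at its own offset, so the chain is only walked once.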

> Support for multipage TIFFs in CCITTFactory, makes PDFBox capable of doing tiff2pdf
> -----------------------------------------------------------------------------------
>
>                 Key: PDFBOX-2105
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2105
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: PDModel
>            Reporter: Antti Lankila
>            Priority: Minor
>              Labels: features, patch
>         Attachments: pdfbox-multipagetiff.diff
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I created a patch based on Sergey Ushakov's work that handles multipage TIFFs. This allows fast and efficient conversion from TIFF to PDF.
> The general approach is to provide a new factory method that accepts an image (page) number; the appropriate page is then located when the CCITT stream is being extracted.
> There's a minor inefficiency in this approach: the seek starts from the beginning of the file for each page, so extracting every page is an O(N^2) algorithm. However, the maximum file size appears to be 2 GB and the cost of finding a single page will still be low, so I bet this will never come up in practice.
> There is no method that tells how many pages a TIFF file has. I opted to simply return null from the factory method that accepts a page number if there is no such page, so users can use this as the condition for breaking out of a TIFF-to-PDF conversion loop.
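
The conversion loop described in the issue could look roughly like the sketch below. The patch's actual factory method is not reproduced here; the loadPage parameter is a stand-in for it, and both the helper class and the zero-based page numbering are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical helper, not a PDFBox class.
final class PageLoop {

    /**
     * Requests pages 0, 1, 2, ... from the loader and stops at the first
     * null, which signals that there is no such page.
     */
    static <T> List<T> loadAllPages(IntFunction<T> loadPage) {
        List<T> pages = new ArrayList<>();
        for (int n = 0; ; n++) {
            T page = loadPage.apply(n);
            if (page == null) {
                break; // no page n: the previous page was the last one
            }
            pages.add(page);
        }
        return pages;
    }
}
```

In real use the lambda passed as loadPage would call the new factory method with the page number, and each non-null result would be drawn onto its own PDF page.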



--
This message was sent by Atlassian JIRA
(v6.2#6252)