Posted to dev@pdfbox.apache.org by "Abel Salgado Romero (JIRA)" <ji...@apache.org> on 2015/07/07 19:53:04 UTC

[jira] [Commented] (PDFBOX-19) Linearize command line tool

    [ https://issues.apache.org/jira/browse/PDFBOX-19?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617069#comment-14617069 ] 

Abel Salgado Romero commented on PDFBOX-19:
-------------------------------------------

I know this is an old issue, but I would like to share my opinion on it.

I work on content management systems, and this feature is far from being of little benefit. Business needs are usually related to some of the following requirements, which make this feature really useful:
· Cost and low solution complexity: deploying 'true' streaming solutions (those that serve PDF pages on demand) is expensive in terms of licences and requires customization of closed products. With linearized PDFs, on the other hand, you can start viewing the document while it downloads (as with a YouTube video), and you can do this with standard free products such as the Adobe PDF plugin or the embedded viewers in Firefox or Chrome. You can even use an open source JS-based solution.
· Some business activities require big PDF documents, for instance with high-res images (more than 30 MB, even hundreds): without 'true' streaming solutions, serving non-linearized documents over HTTP means that the whole PDF must be downloaded to see even the first page. Users (customers and workers) must therefore wait 20 s or even 1 min without being able to do anything. As you can imagine, this is not acceptable for some organizations.
· Human review/assessment of long documents: users usually only need to check the front page or some part of the document to carry out a business activity. Being able to see the first sections while the document downloads saves a lot of time by reducing the wait. It also saves bandwidth, because the user usually closes the connection before the whole document has downloaded.

I hope I have made myself clear enough :) and that you may reconsider the priority of this feature.

> Linearize command line tool
> ---------------------------
>
>                 Key: PDFBOX-19
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-19
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: Utilities
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552835&aid=1033054
> Originally submitted by benlitchfield on 2004-09-22 17:42.
> PDFBox should come with a utility to convert a pdf to a linearized pdf document.
> [comment on SourceForge]
> Originally sent by renaudw.
> Logged In: YES 
> user_id=609291
> Originator: NO
> I wish to voice my support for this feature request. It would be very useful to us too. Thanks for the great work!
> [comment on SourceForge]
> Originally sent by meshcurrent.
> Logged In: YES 
> user_id=989425
> I am involved in a free project where we are digitizing a very large number of books and turning them into PDF using open source software to serve on the Net. I thought I'd post a comment to say that linearization would be a really attractive feature for us in PDFBox if implemented.
> Youssef Eldakar
> Bibliotheca Alexandrina
> [comment on SourceForge]
> Originally sent by benlitchfield.
> Logged In: YES 
> user_id=601708
> Some example pdfs (linearized and not):
> The linearized version is created by an evaluation version of PdfLib. Don't worry about the blank page.
> The pdf is also being validated by our in-house pdf expert. I've tried it today.
> Sadly, it is urgent for us to deliver a correct version of the pdf to our customer; I think we will buy a version of PdfLib (we control it via a JNI Bridge).
> Anyhow, if you find a way of implementing linearization in PDFBox, I will be happy to throw away PdfLib.
> I think a constructor like org.pdfbox.pdmodel.PDDocument(COSDocument doc, boolean linearize) would be nice, and sorry I don't have time to help you in enhancing PdfBox now. (Maybe I'll write some examples of basic usage patterns of your library)
> See the following for examples.
> Linearized_c_14720040602en00010001.pdf
> Not_Linearized_c_14720040602en00010001.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
