Posted to dev@pdfbox.apache.org by "Tilman Hausherr (Jira)" <ji...@apache.org> on 2020/09/05 14:49:00 UTC

[jira] [Commented] (PDFBOX-4952) PDF compression - object stream creation

    [ https://issues.apache.org/jira/browse/PDFBOX-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191062#comment-17191062 ] 

Tilman Hausherr commented on PDFBOX-4952:
-----------------------------------------

I will probably not handle this myself, but the idea is a very useful one, especially for files that have a structure tree. {{COSWriter}} is a difficult class, and I need to handle PDFBOX-45 first. Make sure you're not breaking the API. Your diff replaces some types within "<..>"; I don't know if that works. Also try to avoid changing formatting, since this makes the diff longer. Is the DCT part needed at this time? IMHO that is a different problem. The shorter the change is, the less scary it looks :-)

> PDF compression - object stream creation
> ----------------------------------------
>
>                 Key: PDFBOX-4952
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4952
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: PDModel
>    Affects Versions: 2.0.21
>            Reporter: Christian Appl
>            Priority: Major
>
> I implemented a basic starting point for PDF compression, based on PDFBox 2.0.22-SNAPSHOT.
> I want to use this ticket to ask whether you would be interested in such a feature and in merging it into PDFBox.
> This is a sort of POC that implements only very basic functionality, which surely could and must be extended further, and it has only very basic and simplistic unit tests.
>  However, it is able to reduce the size of the resulting documents, and it creates object streams as defined in the PDF reference manual.
> *What it currently does:*
>  It bundles and compresses objects into object streams and additionally applies simple content compression to a small selection of content.
> To realize content compression, it provides a simple interface and an abstract class for "ContentCompressor"s, which search a document for specific content that can be compressed and then compress it.
> Currently two content compressors exist:
>  _ImageCompressor_
>  Searches for simple images that could be compressed using DCT.
> _UnencodedStreamCompressor_
>  Searches the document for streams that are not yet encoded and applies Flate compression where necessary.
> Both compressors can be parameterized using a central "CompressParameters" instance, which is passed to a new "saveCompressed" method of PDDocument.
> The compression is implemented as a set of extensions to the "COSWriter" class. Basically, it organizes the objects that are passed to the COSWriter into object streams and applies content optimization where necessary and possible.
> Currently this supports encryption, but it does not support linearization of the compressed documents.
> *Caveat:*
>  If this feature is interesting to you, I would not expect you to simply merge this fork into 2.0.22. I expect that you would like to have some details and concepts changed, and I am ready to implement the changes required for this to work to your liking.
> *POC:*
>  Four resulting documents can be found in "target/test-output/compression" after "COSDocumentCompressionTest" is run.
> *The pull request can be found on GitHub at:*
>  [https://github.com/apache/pdfbox/pull/86]
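
For readers unfamiliar with the object streams mentioned in the quoted description, the following is a minimal sketch of the idea, not the code from the pull request and not PDFBox API. It assumes ASCII-only, non-stream objects and uses only java.util.zip. The names ObjectStreamSketch, Packed, pack, flate, inflate and dictionary are hypothetical and chosen for illustration. The decoded payload starts with pairs of "object number, byte offset", followed by the objects themselves; the whole payload is then Flate-compressed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical illustration only; none of these names exist in PDFBox.
class ObjectStreamSketch {

    /** Decoded object stream payload plus the values for its /N and /First entries. */
    static final class Packed {
        final byte[] data;
        final int n;
        final int first;

        Packed(byte[] data, int n, int first) {
            this.data = data;
            this.n = n;
            this.first = first;
        }
    }

    /** Bundles serialized objects (assumed ASCII, no stream objects) into one payload. */
    static Packed pack(Map<Integer, String> objects) {
        StringBuilder header = new StringBuilder();
        StringBuilder body = new StringBuilder();
        for (Map.Entry<Integer, String> e : objects.entrySet()) {
            // Each header pair is "objectNumber offset", offset relative to the first object.
            header.append(e.getKey()).append(' ').append(body.length()).append(' ');
            body.append(e.getValue()).append('\n');
        }
        byte[] data = (header.toString() + body).getBytes(StandardCharsets.US_ASCII);
        // /First is the byte offset of the first object, i.e. the header length.
        return new Packed(data, objects.size(), header.length());
    }

    /** Flate-compresses the payload (zlib format, as read by a FlateDecode filter). */
    static byte[] flate(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(out)) {
            deflater.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Reverses flate(); used here only to check the round trip. */
    static byte[] inflate(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InflaterInputStream in =
                     new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Builds the stream dictionary text for the compressed object stream. */
    static String dictionary(Packed packed, int encodedLength) {
        return "<< /Type /ObjStm /N " + packed.n + " /First " + packed.first
                + " /Filter /FlateDecode /Length " + encodedLength + " >>";
    }

    public static void main(String[] args) {
        Map<Integer, String> objects = new LinkedHashMap<>();
        objects.put(4, "<< /Type /Page >>");
        objects.put(7, "[1 2 3]");
        Packed packed = pack(objects);
        byte[] encoded = flate(packed.data);
        System.out.println(dictionary(packed, encoded.length));
    }
}
```

Running main prints the dictionary for a two-object stream. With a payload this small, the Flate output may well be larger than the input; any size gain only shows up once many objects are bundled together.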
