Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2014/10/13 12:34:34 UTC

[jira] [Closed] (PDFBOX-384) sometimes, when PDFBox writes stream's content in a PDF file, it can no longer read it

     [ https://issues.apache.org/jira/browse/PDFBOX-384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler closed PDFBOX-384.
-------------------------------------
    Resolution: Won't Fix
      Assignee: Andreas Lehmkühler

Closed as I guess some of the ideas are already implemented.

> sometimes, when PDFBox writes stream's content in a PDF file, it can no longer read it
> --------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-384
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-384
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Writing
>    Affects Versions: 0.7.3
>         Environment: PDFBox 0.7.3, Java 5, Windows OS
>            Reporter: Son
>            Assignee: Andreas Lehmkühler
>         Attachments: COSStream.java, COSWriter.java
>
>
> When PDFBox writes a stream's content, it creates a Length entry in the stream's dictionary that is an indirect reference.
> The specification states (quoted from PDF Reference 1.5, but equally valid for every later edition), section 3.2.7, Stream Objects:
> ...
> stream consists of a dictionary that describes a sequence of bytes, followed by
> zero or more bytes bracketed between the keywords stream and endstream: 
> dictionary
> stream
> ...Zero or more bytes...
> endstream
> All streams must be indirect objects (see Section 3.2.9, "Indirect Objects") and
> the stream dictionary must be a direct object. The keyword stream that follows
> the stream dictionary should be followed by an end-of-line marker...
> The stream dictionary must be direct. What is not stated is whether the entries in the dictionary must be direct as well .... Later on, the Stream Extent paragraph says:
> ...
> Every stream dictionary has a Length entry that indicates how many bytes of the
> PDF file are used for the stream's data. (If the stream has a filter, Length is the
> number of bytes of encoded data.) In addition, most filters are defined so that the
> data is self-limiting; that is, they use an encoding scheme in which an explicit
> end-of-data (EOD) marker delimits the extent of the data. Finally, streams are
> used to represent many objects from whose attributes a length can be inferred. All
> of these constraints must be consistent.
> ...
> This indicates that most filters handle self-delimiting data ... so not every filtering algorithm is required to support it.
> So, in order to set the Length value explicitly inside the stream dictionary, the content should be filtered before the dictionary is written.
> The current PDFBox behavior does the following:
> (see org.pdfbox.pdfwriter.COSWriter.visitFromStream(COSStream obj) at line 929):
> ...
>             InputStream input = obj.getFilteredStream();
>             // set the length of the stream and write stream dictionary
>             COSObject lengthObject = new COSObject( null );
>             
>             obj.setItem(COSName.LENGTH, lengthObject);
>             // write the stream content
>             visitFromDictionary( obj );
>             getStandardOutput().write(STREAM);
> ...
>             // writes the content
> ...
>             lengthObject.setObject( new COSInteger( totalAmountWritten ) );
>             getStandardOutput().writeCRLF();
>             getStandardOutput().write(ENDSTREAM);
>             getStandardOutput().writeEOL();
>             return null;
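
For illustration, the ordering the reporter asks for (filter first, then write
the dictionary with a direct Length) can be sketched in plain Java. This is a
minimal sketch under stated assumptions, not the PDFBox implementation and not
the attached patch: the class DirectLengthSketch and its methods encode and
writeStream are hypothetical names, Flate stands in for whatever filter the
stream uses, and the surrounding "n 0 obj ... endobj" wrapper is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;

// Hypothetical helper, not part of PDFBox.
public class DirectLengthSketch {

    // Apply the filter up front so the number of encoded bytes is known
    // before any part of the stream object is written.
    static byte[] encode(byte[] raw) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(buffer)) {
            deflater.write(raw);
        }
        return buffer.toByteArray();
    }

    // Write the dictionary with a direct integer Length, then the encoded
    // bytes between the stream and endstream keywords.
    // Returns the value written for Length.
    static int writeStream(byte[] raw, OutputStream out) throws IOException {
        byte[] encoded = encode(raw);
        String dictionary = "<< /Length " + encoded.length + " /Filter /FlateDecode >>\n";
        out.write(dictionary.getBytes(StandardCharsets.US_ASCII));
        out.write("stream\n".getBytes(StandardCharsets.US_ASCII));
        out.write(encoded);
        out.write("\nendstream\n".getBytes(StandardCharsets.US_ASCII));
        return encoded.length;
    }

    public static void main(String[] args) throws IOException {
        byte[] content = "BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int length = writeStream(content, out);
        System.out.println("Length written to the dictionary: " + length);
    }
}
```

The trade-off in this sketch is that the encoded data is held in memory until
the dictionary has been written. The excerpt above avoids knowing the size in
advance by pointing Length at a separate object whose value is set only after
the content has been written.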


