Posted to dev@pdfbox.apache.org by "Son (JIRA)" <ji...@apache.org> on 2008/11/02 15:54:44 UTC

[jira] Created: (PDFBOX-384) sometimes, when PDFBox writes stream's content in a PDF file, it can no longer read it

sometimes, when PDFBox writes stream's content in a PDF file, it can no longer read it
--------------------------------------------------------------------------------------

                 Key: PDFBOX-384
                 URL: https://issues.apache.org/jira/browse/PDFBOX-384
             Project: PDFBox
          Issue Type: Bug
          Components: Writing
    Affects Versions: 0.7.3
         Environment: pdfbox 0.73, java 5, windows os
            Reporter: Son


When PDFBox writes a stream, it creates a Length entry in the stream's dictionary that is an indirect reference.
The specification states (quoted from the PDF Reference 1.5, but also valid for every edition of the reference since), in section 3.2.7, Stream Objects:

...
stream consists of a dictionary that describes a sequence of bytes, followed by
zero or more bytes bracketed between the keywords stream and endstream: 
dictionary
stream
...Zero or more bytes...
endstream
All streams must be indirect objects (see Section 3.2.9, "Indirect Objects") and
the stream dictionary must be a direct object. The keyword stream  that follows
the stream dictionary should be followed by an end-of-line marker...

The stream dictionary must be direct. What is not stated is that the entries in the dictionary must be direct as well. Later on, in the Stream Extent paragraph, it says:

...
Every stream dictionary has a Length entry that indicates how many bytes of the
PDF file are used for the stream's data. (If the stream has a filter, Length  is the
number of bytes of encoded data.) In addition, most filters are defined so that the
data is self-limiting; that is, they use  an encoding scheme  in which an explicit
end-of-data  (EOD) marker delimits the extent of the data. Finally, streams are
used to represent many objects from whose attributes a length can be inferred. All
of these constraints must be consistent. 
...
This indicates that most filters, but not all, produce self-delimiting data, so a filtering algorithm is not required to mark the end of its own data.

So, in order to set the Length value explicitly inside the stream dictionary, the content should be filtered before the dictionary is written.
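
As a rough illustration of that ordering (a minimal sketch only, not PDFBox code), the filtered bytes can be buffered first so that their count is known before the dictionary is emitted. The class DirectLengthStreamWriter and its methods are hypothetical names invented for this example, and the dictionary is reduced to a single Length entry:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; it is not part of PDFBox.
final class DirectLengthStreamWriter {

    /** Reads the already-filtered data completely so its length is known up front. */
    static byte[] drain(InputStream filtered) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = filtered.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    /** Writes the dictionary with Length as a direct integer, then the stream data. */
    static void writeStream(InputStream filtered, OutputStream out) throws IOException {
        byte[] data = drain(filtered);
        // The length is known here, before the dictionary is written.
        String header = "<< /Length " + data.length + " >>\nstream\r\n";
        out.write(header.getBytes(StandardCharsets.US_ASCII));
        out.write(data);
        out.write("\r\nendstream\n".getBytes(StandardCharsets.US_ASCII));
    }
}
```

The cost of this approach is that the whole filtered stream is held in memory before anything is written; a real writer might instead count the bytes in a first pass over the data.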

PDFBox currently does the following
(see org.pdfbox.pdfwriter.COSWriter.visitFromStream(COSStream obj) at line 929):

...
            InputStream input = obj.getFilteredStream();
            // set the length of the stream and write stream dictionary
            COSObject lengthObject = new COSObject( null );
            
            obj.setItem(COSName.LENGTH, lengthObject);
            // write the stream content
            visitFromDictionary( obj );
            getStandardOutput().write(STREAM);
...
            // writes the content
...
            lengthObject.setObject( new COSInteger( totalAmountWritten ) );
            getStandardOutput().writeCRLF();
            getStandardOutput().write(ENDSTREAM);
            getStandardOutput().writeEOL();
            return null;


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PDFBOX-384) sometimes, when PDFBox writes stream's content in a PDF file, it can no longer read it

Posted by "Son (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Son updated PDFBOX-384:
-----------------------

    Attachment: COSWriter.java
                COSStream.java

COSStream: added getFilteredLength(), which gives access to the length of the filtered data
COSWriter: changed visitFromStream() to set the length directly before writing the object
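
The attached files are not reproduced in this message. Purely as a guess at the general idea, the sketch below shows one way an accessor for the filtered length could behave: the data is encoded once (here with Flate via java.util.zip, chosen only for the example) and the encoded byte count is what would go into Length. FilteredData is a hypothetical stand-in, not COSStream, and its methods are assumptions rather than the attached implementation:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;

// Hypothetical stand-in for a stream object; not the attached COSStream.
final class FilteredData {
    private final byte[] encoded;

    /** Encodes the raw bytes once so the filtered form and its length are fixed. */
    FilteredData(byte[] raw) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DeflaterOutputStream deflate = new DeflaterOutputStream(sink);
        try {
            deflate.write(raw);
        } finally {
            deflate.close(); // finishes the compressed data and flushes it to sink
        }
        encoded = sink.toByteArray();
    }

    /** Number of bytes of encoded data, which is the value Length should carry. */
    long getFilteredLength() {
        return encoded.length;
    }

    /** Copy of the encoded bytes that would be written between stream and endstream. */
    byte[] getFilteredBytes() {
        return encoded.clone();
    }
}
```

With such an accessor the writer can put a direct integer into the dictionary before writing it, instead of a placeholder object that is only filled in after the stream data has been written.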


