Posted to dev@pdfbox.apache.org by "John Hewson (JIRA)" <ji...@apache.org> on 2015/07/20 22:43:04 UTC

[jira] [Comment Edited] (PDFBOX-2894) Remove COSStreamArray / SequenceRandomAccessRead

    [ https://issues.apache.org/jira/browse/PDFBOX-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634046#comment-14634046 ] 

John Hewson edited comment on PDFBOX-2894 at 7/20/15 8:42 PM:
--------------------------------------------------------------

I'm trying to solve some of the memory and performance issues with 2.0 by getting rid of SequenceRandomAccessRead, which necessitates removing COSStreamArray too. Otherwise I'd agree that we could postpone this.

It's a pretty simple change really, because we're replacing a very complex artificial abstraction with a simple one which represents the fact that pages can contain arrays of streams.

Also... we can't solve this problem by deprecating PDPage#getStream(), because that won't allow us to remove any of the classes I've mentioned. We'd just be adding yet another layer on top.


> Remove COSStreamArray / SequenceRandomAccessRead
> ------------------------------------------------
>
>                 Key: PDFBOX-2894
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2894
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 2.0.0
>            Reporter: John Hewson
>            Assignee: John Hewson
>             Fix For: 2.0.0
>
>
> This ties in with my COSStream simplification in PDFBOX-2893.
> COSStreamArray is a troublesome abstraction: it's not a real COS object, and it's the only COS object which can be generated _after_ parsing. Look at the implementation of COSStreamArray: most methods throw an exception because it's _not_ a COSStream, so it violates the contract of the very thing it claims to be. Even PDPageContentStream has to use instanceof to "peer through" the abstraction of COSStreamArray.
> There's no reason to have this class, other than to duct-tape over flaws in 1.8's APIs, namely that PDPage#getStream() returns a PDStream and PDFStreamParser expects a PDStream, yet both of these may be arrays of streams.
> We can fix this in 2.0 by getting rid of the erroneous PDPage#getStream() and by exposing the array of streams, rather than attempting to hide them. -This will also fix existing errors throughout the codebase which are associated with mistaking COSStreamArray for a COSStream.- We can still provide an InputStream API which abstracts over the array of streams, because there's nothing wrong with that - so users can have the same simple and convenient experience.
> An added benefit of doing this is that it will allow us to remove SequenceRandomAccessRead, a highly complex memory-holding class.
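
To illustrate the "InputStream API which abstracts over the array of streams" idea from the description, here is a minimal sketch using only the JDK. It is not the PDFBox implementation: the class name ContentStreamsSketch and the methods concatenate and readAll are hypothetical, and the streams are modelled as plain byte arrays rather than COSStream objects. The newline inserted after each part is a defensive assumption so that tokens from adjacent streams cannot run together.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these names are PDFBox APIs.
final class ContentStreamsSketch {

    // Presents an ordered list of content streams as one InputStream.
    // Each part is followed by a newline (an assumption of this sketch)
    // so the last token of one part stays separate from the next part.
    static InputStream concatenate(List<byte[]> parts) {
        List<InputStream> streams = new ArrayList<>();
        for (byte[] part : parts) {
            streams.add(new ByteArrayInputStream(part));
            streams.add(new ByteArrayInputStream(new byte[] {'\n'}));
        }
        // SequenceInputStream reads each stream in turn until all are exhausted.
        return new SequenceInputStream(Collections.enumeration(streams));
    }

    // Drains a stream into a String, one byte per character.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        // Two stand-in content streams, as a page with an array of streams might hold.
        List<byte[]> parts = Arrays.asList(
                "q 1 0 0 1 10 10 cm".getBytes(StandardCharsets.ISO_8859_1),
                "Q".getBytes(StandardCharsets.ISO_8859_1));
        try (InputStream in = concatenate(parts)) {
            System.out.print(readAll(in));
        }
    }
}
```

A caller that only wants the page content reads from the single InputStream, while code that needs the individual streams can still work with the list directly, so no stand-in object has to pretend to be a single stream.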


