Posted to dev@pdfbox.apache.org by "Craig Ringer (Created) (JIRA)" <ji...@apache.org> on 2012/03/19 14:15:38 UTC

[jira] [Created] (PDFBOX-1263) [PATCH] Rewrite Overlay.java's stream rewriting and rsrc dict renaming to use PDFStreamProcessor

[PATCH] Rewrite Overlay.java's stream rewriting and rsrc dict renaming to use PDFStreamProcessor
------------------------------------------------------------------------------------------------

                 Key: PDFBOX-1263
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1263
             Project: PDFBox
          Issue Type: Improvement
          Components: Utilities
    Affects Versions: 1.7.0
         Environment: N/A
            Reporter: Craig Ringer
            Priority: Minor


The attached patch reworks how Overlay.java rewrites content streams to avoid resource dictionary name clashes.

Prior to this patch, Overlay appends "overlay" to every name in the Font, XObject and ExtGState resource dictionaries, then rewrites the content stream(s) in the overlay PDF to reference the new names using a simple hand-rolled find-and-replace. It doesn't check for over-length names, and it doesn't check that the newly generated names don't clash. Because PDFs often use the same names for objects, this quickly becomes a problem when you're doing multiple overlays - something that becomes more likely with https://issues.apache.org/jira/browse/PDFBOX-1255 but is already useful to do with stock PDFBox.

This patch alters Overlay so that it renames objects from the overlay PDF only when they conflict with names in the PDF being overlaid upon. It also uses a name generation strategy that checks for conflicts and for over-length names, so multiple overlays work much better. The patch uses the PDFStreamProcessor (a simplified base extracted from PDFStreamEngine by https://issues.apache.org/jira/browse/PDFBOX-1256) to copy each stream of the overlay PDF to a PDFStreamWriter. It checks for names that reference renamed resources and substitutes the new name before writing each operator and its arguments to the output stream.
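
To illustrate the rename-only-on-conflict strategy described above, here is a minimal sketch in plain Java. It is not the code from the attached patch: the class name ResourceNameAllocator, the "_ovl" suffix and the counter scheme are all made up for illustration, and the 127 limit assumes the maximum name length listed in the PDF reference's implementation limits (counted here in chars, which matches bytes only for ASCII names).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of PDFBox or of the attached patch.
class ResourceNameAllocator {
    // Assumption: 127 is the name length limit given in the PDF reference.
    static final int MAX_NAME_LENGTH = 127;

    // Names already taken in the resource dictionary being overlaid upon.
    private final Set<String> used = new HashSet<>();

    ResourceNameAllocator(Set<String> namesInTargetPage) {
        used.addAll(namesInTargetPage);
    }

    /** Keeps the original name when it is free and short enough, else generates a new one. */
    String allocate(String original) {
        if (original.length() <= MAX_NAME_LENGTH && used.add(original)) {
            return original;
        }
        for (int i = 1; ; i++) {
            String suffix = "_ovl" + i;
            // Truncate the base so that base + suffix never exceeds the limit.
            int keep = Math.min(original.length(), MAX_NAME_LENGTH - suffix.length());
            String candidate = original.substring(0, keep) + suffix;
            if (used.add(candidate)) {
                return candidate;
            }
        }
    }

    public static void main(String[] args) {
        ResourceNameAllocator names =
                new ResourceNameAllocator(new HashSet<>(Arrays.asList("F1", "Im0")));
        System.out.println(names.allocate("F2")); // no clash, kept as-is
        System.out.println(names.allocate("F1")); // clash, renamed
    }
}
```

Because every allocated name is added to the used set, a second or third overlay run through the same allocator cannot reuse a name handed out earlier, which is the property the old fixed "overlay" suffix lacked.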

The main benefit of this patch is that it enables multiple overlays without name clashes.

A secondary benefit of this patch is that it replaces Overlay.java-specific code with facilities provided by the rest of PDFBox. That makes Overlay a better example, helps it exercise the rest of PDFBox more thoroughly, and lets it benefit from improvements in PDFBox's stream processor and writer.
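
The substitution step in the description (look at each operator's arguments, swap in the new name for any renamed resource, then write the operator) can be sketched roughly as follows. This is a simplified model and not the patch itself: it works on a flat list of string tokens rather than PDFBox's parsed objects, the class NameRewriteSketch and its methods are hypothetical, and only the Tf, Do and gs operators are mapped to their resource categories.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; tokens are plain strings, names are written as "/Name".
class NameRewriteSketch {
    /** Operator -> resource category that its name operand refers to. */
    static final Map<String, String> CATEGORY_BY_OPERATOR = new HashMap<>();
    static {
        CATEGORY_BY_OPERATOR.put("Tf", "Font");
        CATEGORY_BY_OPERATOR.put("Do", "XObject");
        CATEGORY_BY_OPERATOR.put("gs", "ExtGState");
    }

    /** Simplification: anything starting with a letter or a quote is an operator. */
    static boolean isOperator(String token) {
        if (token.isEmpty()) {
            return false;
        }
        char c = token.charAt(0);
        return Character.isLetter(c) || c == '\'' || c == '"';
    }

    /** renames: category -> (old name -> new name), names without the leading slash. */
    static List<String> rewrite(List<String> tokens, Map<String, Map<String, String>> renames) {
        List<String> out = new ArrayList<>();
        List<String> operands = new ArrayList<>();
        for (String token : tokens) {
            if (!isOperator(token)) {
                operands.add(token); // buffer operands until their operator arrives
                continue;
            }
            String category = CATEGORY_BY_OPERATOR.get(token);
            Map<String, String> map = category == null ? null : renames.get(category);
            for (String operand : operands) {
                String renamed = (map != null && operand.startsWith("/"))
                        ? map.get(operand.substring(1)) : null;
                out.add(renamed == null ? operand : "/" + renamed);
            }
            out.add(token);
            operands.clear();
        }
        out.addAll(operands); // trailing operands without an operator pass through
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> fonts = new HashMap<>();
        fonts.put("F1", "F1_ovl1");
        Map<String, Map<String, String>> renames = new HashMap<>();
        renames.put("Font", fonts);
        System.out.println(rewrite(
                Arrays.asList("BT", "/F1", "12", "Tf", "(Hi)", "Tj", "ET", "/F1", "Do"), renames));
    }
}
```

Keying the lookup on the operator is the point of the exercise: a font and an XObject may both be called F1, and a plain textual find-and-replace cannot tell them apart, whereas an operator-aware pass only touches the name in the category that operator actually uses.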

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (PDFBOX-1263) [PATCH] Rewrite Overlay.java's stream rewriting and rsrc dict renaming to use PDFStreamProcessor

Posted by "Craig Ringer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Ringer updated PDFBOX-1263:
---------------------------------

    Description: 
The attached patch reworks how Overlay.java rewrites content streams to avoid resource dictionary name clashes.

Prior to this patch, Overlay appends "overlay" to every name in the Font, XObject and ExtGState resource dictionaries, then rewrites the content stream(s) in the overlay PDF to reference the new names using a simple hand-rolled find-and-replace. It doesn't check for over-length names, and it doesn't check that the newly generated names don't clash. Because PDFs often use the same names for objects, this quickly becomes a problem when you're doing multiple overlays - something that becomes more likely with https://issues.apache.org/jira/browse/PDFBOX-1255 but is already useful to do with stock PDFBox.

This patch alters Overlay so that it renames objects from the overlay PDF only when they conflict with names in the PDF being overlaid upon. It also uses a name generation strategy that checks for conflicts and for over-length names, so multiple overlays work much better. The patch uses the PDFStreamProcessor (a simplified base extracted from PDFStreamEngine by https://issues.apache.org/jira/browse/PDFBOX-1256) to copy each stream of the overlay PDF to a ContentStreamWriter. It checks for names that reference renamed resources and substitutes the new name before writing each operator and its arguments to the output stream.

The main benefit of this patch is that it enables multiple overlays without name clashes.

A secondary benefit of this patch is that it replaces Overlay.java-specific code with facilities provided by the rest of PDFBox. That makes Overlay a better example, helps it exercise the rest of PDFBox more thoroughly, and lets it benefit from improvements in PDFBox's stream processor and writer.

> [PATCH] Rewrite Overlay.java's stream rewriting and rsrc dict renaming to use PDFStreamProcessor
> ------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1263
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1263
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Utilities
>    Affects Versions: 1.7.0
>         Environment: N/A
>            Reporter: Craig Ringer
>            Priority: Minor
>              Labels: newbie, overlay, patch, refactoring
>         Attachments: 0003-Major-rework-of-Overlay.java-to-use-PDFStreamProcess.patch


[jira] [Updated] (PDFBOX-1263) [PATCH] Rewrite Overlay.java's stream rewriting and rsrc dict renaming to use PDFStreamProcessor

Posted by "Craig Ringer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Ringer updated PDFBOX-1263:
---------------------------------

    Description: 
The attached patch reworks how Overlay.java rewrites content streams to avoid resource dictionary name clashes.

Prior to this patch, Overlay appends "overlay" to every name in the Font, XObject and ExtGState resource dictionaries, then rewrites the content stream(s) in the overlay PDF to reference the new names using a simple hand-rolled find-and-replace. It doesn't check for over-length names, and it doesn't check that the newly generated names don't clash. Because PDFs often use the same names for objects, this quickly becomes a problem when you're doing multiple overlays - something that becomes more likely with https://issues.apache.org/jira/browse/PDFBOX-1255 but is already useful to do with stock PDFBox.

This patch alters Overlay so that it renames objects from the overlay PDF only when they conflict with names in the PDF being overlaid upon. It also uses a name generation strategy that checks for conflicts and for over-length names, so multiple overlays work much better. The patch uses the PDFStreamProcessor (a simplified base extracted from PDFStreamEngine by https://issues.apache.org/jira/browse/PDFBOX-1256) to copy each stream of the overlay PDF to a ContentStreamWriter. It checks for names that reference renamed resources and substitutes the new name before writing each operator and its arguments to the output stream.

The main benefit of this patch is that it enables multiple overlays without name clashes.

A secondary benefit of this patch is that it replaces Overlay.java-specific code with facilities provided by the rest of PDFBox. That makes Overlay a better example, helps it exercise the rest of PDFBox more thoroughly, and lets it benefit from improvements in PDFBox's stream processor and writer.

Depends on prior patches in series:
https://issues.apache.org/jira/browse/PDFBOX-1256
https://issues.apache.org/jira/browse/PDFBOX-1255


[jira] [Updated] (PDFBOX-1263) [PATCH] Rewrite Overlay.java's stream rewriting and rsrc dict renaming to use PDFStreamProcessor

Posted by "Craig Ringer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Ringer updated PDFBOX-1263:
---------------------------------

    Attachment: 0003-Major-rework-of-Overlay.java-to-use-PDFStreamProcess.patch

Proposed patch. Many of the insertions are Javadoc changes and enhancements; the actual amount of code is much the same, if not slightly reduced. The structure of the inner class etc. adds some lines too.

The changes to ContentStreamWriter are the addition of a method to write a single token to the stream, and the addition of explicit control over whether it flushes the stream after every write operation. It used to flush after every write, which was slow and usually unnecessary; it still does so by default, but that can now be controlled at construction time.
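
As a rough picture of the flush control described above, the sketch below shows a writer that flushes after every token by default and lets the caller turn that off at construction time. It is only an illustration of the idea: TokenWriterSketch, its constructors and writeToken are hypothetical names, not the actual ContentStreamWriter API or the signatures added by the patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a content stream writer; not the PDFBox class.
class TokenWriterSketch {
    private final OutputStream out;
    private final boolean flushAfterWrite;

    /** Default keeps the old behaviour: flush after every write. */
    TokenWriterSketch(OutputStream out) {
        this(out, true);
    }

    TokenWriterSketch(OutputStream out, boolean flushAfterWrite) {
        this.out = out;
        this.flushAfterWrite = flushAfterWrite;
    }

    /** Writes one token followed by a space separator. */
    void writeToken(String token) throws IOException {
        out.write(token.getBytes(StandardCharsets.ISO_8859_1));
        out.write(' ');
        if (flushAfterWrite) {
            out.flush();
        }
    }

    void flush() throws IOException {
        out.flush();
    }

    /** In-memory sink that counts flush calls, used only to show the difference. */
    static class CountingStream extends ByteArrayOutputStream {
        int flushes;

        @Override
        public void flush() {
            flushes++;
        }
    }

    /** Writes the tokens, flushes once at the end, and reports the total flush count. */
    static int countFlushes(boolean flushAfterWrite, String... tokens) {
        CountingStream sink = new CountingStream();
        TokenWriterSketch writer = new TokenWriterSketch(sink, flushAfterWrite);
        try {
            for (String token : tokens) {
                writer.writeToken(token);
            }
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.flushes;
    }

    public static void main(String[] args) {
        System.out.println(countFlushes(true, "q", "/F1", "12", "Tf", "Q"));
        System.out.println(countFlushes(false, "q", "/F1", "12", "Tf", "Q"));
    }
}
```

With per-write flushing the number of flush calls grows with the number of tokens; with it disabled, a single flush at the end is enough, which matters when the underlying stream is buffered or compressing.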

Diffstat:

 .../src/main/java/org/apache/pdfbox/Overlay.java   |  555 ++++++++++++++------
 .../pdfbox/pdfwriter/ContentStreamWriter.java      |   47 ++-
 2 files changed, 437 insertions(+), 165 deletions(-)

