Posted to dev@pdfbox.apache.org by "Nazar Dub (JIRA)" <ji...@apache.org> on 2017/09/15 13:21:00 UTC

[jira] [Updated] (PDFBOX-3931) Losing fonts (embedded subset) when merge documents with PDFMergerUtility

     [ https://issues.apache.org/jira/browse/PDFBOX-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nazar Dub updated PDFBOX-3931:
------------------------------
    Description: 
*Story:*
I want to merge two _PDDocument_ instances with:
{code:java}
PDFMergerUtility#appendDocument(PDDocument destination, PDDocument source)
{code}
Both documents are created from scratch in Java. For each document I open a _PDPageContentStream_, add some text, and then close the _PDPageContentStream_. For each document I use a _PDFont_ loaded by the following code:
{code:java}
PDFont getFont(PDDocument document) throws IOException {
    InputStream fontStream = Thread.currentThread().getContextClassLoader().getResourceAsStream("font/Calibri.ttf");
    // The third argument (embedSubset) is true, so the font is embedded as a subset
    return PDType0Font.load(document, fontStream, true);
}
{code}
Then I merge the documents:
{code:java}
PDFMergerUtility mergerUtility = new PDFMergerUtility();
mergerUtility.appendDocument(document1, document2);
{code}
Then I close *document2*:
{code:java}
document2.close();
{code}
And I save *document1* to an _OutputStream_:
{code:java}
document1.save(someOutputStream);
{code}

*Expected result:*
I get a PDF file with all fonts embedded as subsets.

*Actual result:*
The font is embedded correctly only for the pages created in *document1*. The pages from *document2* are present, but no font is embedded for them.
As a result, if I open the resulting PDF file on an OS that has Calibri.ttf installed, I see the correct font on all pages. If Calibri.ttf is not installed, the font is correct only on the pages created in *document1*.

*Used workaround:*
I see that _PDDocument_ has this field:
{code:java}
// fonts to subset before saving
private final Set<PDFont> fontsToSubset = new HashSet<PDFont>();
{code}
Fonts are added to this field when the client calls:
{code:java}
PDPageContentStream#setFont(PDFont font, float fontSize)
{code}
and the actual embedding happens in the method:
{code:java}
PDDocument#save(OutputStream output);
{code}
In my example above, *save* is never called for *document2*.
We append *document2* to *document1* and *save* only *document1*.

I reviewed the method:
{code:java}
PDFMergerUtility#appendDocument(PDDocument destination, PDDocument source)
{code}
and did not find anything in it that handles the *fontsToSubset* field.
So I wrote the following method:
{code:java}
@SuppressWarnings("unchecked")
private static void subsetFonts(final PDDocument document) {
    try {
        Field fontsToSubsetField = document.getClass().getDeclaredField("fontsToSubset");
        fontsToSubsetField.setAccessible(true);
        Set<PDFont> fontsToSubset = (Set<PDFont>) fontsToSubsetField.get(document);
        for (PDFont font : fontsToSubset) {
            font.subset();
        }
    } catch (NoSuchFieldException | IOException | IllegalAccessException | ClassCastException e) {
        LOGGER.warn("Error when subset embedded fonts into pdf document", e);
    }
}
{code}

And I call it before merging the documents:
{code:java}
subsetFonts(document2);
mergerUtility.appendDocument(document1, document2);
{code}
(I need to use reflection because *fontsToSubset* is a private field of _PDDocument_.)

I think another, maybe better, option would be:
{code:java}
document1.fontsToSubset.addAll(document2.fontsToSubset);
{code}
But I have not tested this option.
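
For illustration only, the deferred-subsetting behaviour described above can be reproduced with a small stand-alone model. This is not PDFBox code: _ToyFont_, _ToyDocument_, _append_ and _mergeAndSave_ are hypothetical names, and the model assumes that merging copies each font in whatever state it has at merge time, which is a reading of this report and not something verified against the PDFBox sources. It only covers the tested workaround (subsetting the source before the merge).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of deferred font subsetting. NOT PDFBox code; all names are hypothetical. */
public class DeferredSubsetModel {

    /** Stand-in for a font that only counts as embedded once subset() has run. */
    static final class ToyFont {
        final String name;
        boolean embedded = false;
        ToyFont(String name) { this.name = name; }
        void subset() { embedded = true; }
    }

    /** Stand-in for a document that remembers which fonts still need subsetting. */
    static final class ToyDocument {
        final List<ToyFont> pageFonts = new ArrayList<>();        // one font per page
        final Set<ToyFont> fontsToSubset = new LinkedHashSet<>(); // pending work for save()

        void addPage(ToyFont font) {
            pageFonts.add(font);
            fontsToSubset.add(font); // mirrors what the report says setFont does
        }

        void subsetPendingFonts() {
            for (ToyFont font : fontsToSubset) {
                font.subset();
            }
            fontsToSubset.clear();
        }

        /** Subsets this document's pending fonts, then reports per page whether its font is embedded. */
        List<Boolean> save() {
            subsetPendingFonts();
            List<Boolean> embeddedPerPage = new ArrayList<>();
            for (ToyFont font : pageFonts) {
                embeddedPerPage.add(font.embedded);
            }
            return embeddedPerPage;
        }
    }

    /** Assumption of the model: pages are copied with the font's current state; source.fontsToSubset is ignored. */
    static void append(ToyDocument destination, ToyDocument source) {
        for (ToyFont font : source.pageFonts) {
            ToyFont copy = new ToyFont(font.name);
            copy.embedded = font.embedded;
            destination.pageFonts.add(copy);
        }
    }

    /** Builds two one-page documents, optionally subsets the source first, merges, and saves the destination. */
    static List<Boolean> mergeAndSave(boolean subsetSourceFirst) {
        ToyDocument document1 = new ToyDocument();
        ToyDocument document2 = new ToyDocument();
        document1.addPage(new ToyFont("Calibri"));
        document2.addPage(new ToyFont("Calibri"));
        if (subsetSourceFirst) {
            document2.subsetPendingFonts(); // the workaround: subset before merging
        }
        append(document1, document2);
        return document1.save();
    }

    public static void main(String[] args) {
        System.out.println("plain merge:        " + mergeAndSave(false)); // [true, false]
        System.out.println("subset, then merge: " + mergeAndSave(true));  // [true, true]
    }
}
```

In this model the page that came from *document2* ends up without an embedded font unless its pending fonts are subset before the merge, which matches the behaviour reported here.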

*Conclusion:*
I think this problem should be solved on the library side, in the _PDFMergerUtility#appendDocument_ method, and not in client code. Otherwise, the javadoc should state that _PDFMergerUtility#appendDocument_ must only be used with a _PDDocument_ that has already been saved.

> Losing fonts (embedded subset) when merge documents with PDFMergerUtility
> -------------------------------------------------------------------------
>
>                 Key: PDFBOX-3931
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3931
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel, Utilities
>    Affects Versions: 2.0.7
>            Reporter: Nazar Dub



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
