Posted to dev@pdfbox.apache.org by "Nazar Dub (JIRA)" <ji...@apache.org> on 2017/09/15 13:07:00 UTC

[jira] [Created] (PDFBOX-3931) Losing fonts (embedded subset) when merge documents with PDFMergerUtility

Nazar Dub created PDFBOX-3931:
---------------------------------

             Summary: Losing fonts (embedded subset) when merge documents with PDFMergerUtility
                 Key: PDFBOX-3931
                 URL: https://issues.apache.org/jira/browse/PDFBOX-3931
             Project: PDFBox
          Issue Type: Bug
          Components: PDModel, Utilities
    Affects Versions: 2.0.7
            Reporter: Nazar Dub


*Story:*
I want to merge two _PDDocument_ instances with:
{code:java}
PDFMergerUtility#appendDocument(PDDocument destination, PDDocument source)
{code}
Both documents are created from scratch in Java. For each document I open a _PDPageContentStream_, add some text, and then close the _PDPageContentStream_. For each document I use a _PDFont_ that is loaded by the following code:
{code:java}
// Note that the subset flag (third argument of load) is true
PDFont getFont(PDDocument document) throws IOException {
    InputStream fontStream = Thread.currentThread().getContextClassLoader().getResourceAsStream("font/Calibri.ttf");
    // load the font into the document that will use it
    return PDType0Font.load(document, fontStream, true);
}
{code}
Then I merge documents:
{code:java}
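// appendDocument is an instance method, so a PDFMergerUtility object is needed
PDFMergerUtility mergerUtility = new PDFMergerUtility();
mergerUtility.appendDocument(document1, document2);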
{code}
Then I close *document2*:
{code:java}
document2.close();
{code}
And save *document1* to an _OutputStream_:
{code:java}
document1.save(someOutputStream);
{code}

*Expected results:*
I get a PDF file in which all fonts are embedded as subsets.

*Actual result:*
The font is embedded correctly only for the pages that came from *document1*. The pages that came from *document2* are present, but no font is embedded for them.
As a result, if I open the resulting PDF file on an OS that has Calibri.ttf installed, I see the correct font on all pages. If Calibri.ttf is not installed, the font is correct only on the pages that came from *document1*.

*Used workaround:*
I see that _PDDocument_ has this field:
{code:java}
    // fonts to subset before saving
    private final Set<PDFont> fontsToSubset = new HashSet<PDFont>();
{code}
Fonts are added to this field when the client calls:
{code:java}
PDPageContentStream#setFont(PDFont font, float fontSize)
{code}
and the actual embedding happens in the method:
{code:java}
PDDocument#save(OutputStream output);
{code}
In my example above, *save* is never called for *document2*.
We append *document2* to *document1* and *save* only *document1*.
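
To make the mechanism above concrete, here is a minimal sketch of it in plain Java. It is a simplified stand-in model, not PDFBox code: _ModelFont_, _ModelDocument_ and _appendPagesOnly_ are hypothetical names, and the model simply assumes that subsetting is deferred until _save_ and that the merge step copies pages without touching the pending set.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Simplified stand-in model; none of these classes are PDFBox classes. */
class DeferredSubsetModel {

    /** Hypothetical font whose data counts as embedded only after subset() has run. */
    static class ModelFont {
        boolean embedded = false;

        void subset() {
            embedded = true;
        }
    }

    /** Hypothetical document that defers subsetting until save(). */
    static class ModelDocument {
        final List<ModelFont> pageFonts = new ArrayList<>();
        final Set<ModelFont> fontsToSubset = new LinkedHashSet<>();

        /** Adds a page that uses the font and remembers the font for later subsetting. */
        void addPageWithFont(ModelFont font) {
            pageFonts.add(font);
            fontsToSubset.add(font);
        }

        /** Subsets only the fonts registered on this document. */
        void save() {
            for (ModelFont font : fontsToSubset) {
                font.subset();
            }
            fontsToSubset.clear();
        }
    }

    /** Assumed merge behaviour: pages are copied, the pending set is ignored. */
    static void appendPagesOnly(ModelDocument destination, ModelDocument source) {
        destination.pageFonts.addAll(source.pageFonts);
    }

    public static void main(String[] args) {
        ModelDocument document1 = new ModelDocument();
        ModelDocument document2 = new ModelDocument();
        ModelFont font1 = new ModelFont();
        ModelFont font2 = new ModelFont();
        document1.addPageWithFont(font1);
        document2.addPageWithFont(font2);

        appendPagesOnly(document1, document2);
        document1.save(); // document2.save() is never called

        System.out.println("font1 embedded: " + font1.embedded); // true
        System.out.println("font2 embedded: " + font2.embedded); // false
    }
}
```

In this model the merged document contains both pages, but only the font registered on *document1* is ever subset, which matches the symptom described in this report.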

I reviewed method:
{code:java}
PDFMergerUtility#appendDocument(PDDocument destination, PDDocument source)
{code}
I did not find anything in this method that touches the *fontsToSubset* field.
So I created the following method:
{code:java}
@SuppressWarnings("unchecked")
private static void subsetFonts(final PDDocument document) {
    try {
        // fontsToSubset is private in PDDocument, so it has to be read via reflection
        Field fontsToSubsetField = document.getClass().getDeclaredField("fontsToSubset");
        fontsToSubsetField.setAccessible(true);
        Set<PDFont> fontsToSubset = (Set<PDFont>) fontsToSubsetField.get(document);
        // subset every pending font now, because save() is never called on this document
        for (PDFont font : fontsToSubset) {
            font.subset();
        }
    } catch (NoSuchFieldException | IOException | IllegalAccessException | ClassCastException e) {
        LOGGER.warn("Error while subsetting embedded fonts in the PDF document", e);
    }
}
{code}

And I call it before merging the documents:
{code:java}
subsetFonts(document2);
mergerUtility.appendDocument(document1, document2);
{code}
(I need to use reflection because *fontsToSubset* is a private field of _PDDocument_.)

I think another, and maybe better, option would be:
{code:java}
// only possible inside the library, because fontsToSubset is private
document1.fontsToSubset.addAll(document2.fontsToSubset);
{code}
But I have not tested this option.
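
From client code, that idea could only be tried through reflection again. The following is a rough sketch of a generic helper that copies the contents of a private _Set_ field from one object to another. _PendingSetMerge_, _mergePrivateSet_ and _Holder_ are hypothetical names introduced for illustration. _Holder_ is a stand-in class, not a PDFBox class, and the helper has not been tried against _PDDocument_.

```java
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Sketch only; the names here are hypothetical and not part of PDFBox. */
class PendingSetMerge {

    /** Adds every element of source's private Set field to the same field on destination. */
    @SuppressWarnings("unchecked")
    static <T> void mergePrivateSet(Class<T> type, String fieldName, T destination, T source) {
        try {
            Field field = type.getDeclaredField(fieldName);
            field.setAccessible(true);
            Set<Object> target = (Set<Object>) field.get(destination);
            Set<Object> pending = (Set<Object>) field.get(source);
            target.addAll(pending);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Could not merge field " + fieldName, e);
        }
    }

    /** Stand-in for a class that keeps a private set of pending items. */
    static class Holder {
        private final Set<String> fontsToSubset = new HashSet<>();

        Holder(String... names) {
            fontsToSubset.addAll(Arrays.asList(names));
        }

        int pendingCount() {
            return fontsToSubset.size();
        }
    }

    public static void main(String[] args) {
        Holder first = new Holder("FontA");
        Holder second = new Holder("FontB", "FontC");
        mergePrivateSet(Holder.class, "fontsToSubset", first, second);
        System.out.println(first.pendingCount()); // 3
    }
}
```

With PDFBox the corresponding call would presumably be _mergePrivateSet(PDDocument.class, "fontsToSubset", document1, document2)_, placed before the merge. Whether the fonts can still be subset when *document1* is saved after *document2* has been closed is something this sketch does not establish.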

*Conclusion:*
I think this problem should be solved on the library side, in the _PDFMergerUtility#appendDocument_ method, and not in client code. Alternatively, the javadoc should state that _PDFMergerUtility#appendDocument_ must only be used with a _PDDocument_ that has already been saved.


