Posted to dev@pdfbox.apache.org by "Tilman Hausherr (Jira)" <ji...@apache.org> on 2021/06/12 03:27:00 UTC

[jira] [Comment Edited] (PDFBOX-5209) Using Chinese character make the file size increases

    [ https://issues.apache.org/jira/browse/PDFBOX-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17361042#comment-17361042 ] 

Tilman Hausherr edited comment on PDFBOX-5209 at 6/12/21, 3:26 AM:
-------------------------------------------------------------------

I looked at this again by playing with the {{CreateSimpleFormWithEmbeddedFont}} example, and my analysis is now a different one:
- {{AppearanceGeneratorHelper}} gets the font from the default resources.
- The font is created with {{PDType0Font(COSDictionary fontDictionary)}}, which is not set up for subsetting.
- Later subsetting isn't possible ({{willBeSubset()}} is false), because {{embedder}} is null.
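
The three points above can be pictured with a small stand-in model. This is only a sketch under assumptions: {{SketchFont}} and {{SketchEmbedder}} are hypothetical simplifications, not PDFBox classes, and {{needsSubset()}} is an assumed helper name. Only {{willBeSubset()}} and {{embedder}} are taken from the analysis above.

```java
/** Hypothetical stand-in for the embedder that a freshly loaded font carries. */
final class SketchEmbedder {
    private final boolean embedSubset;

    SketchEmbedder(boolean embedSubset) {
        this.embedSubset = embedSubset;
    }

    // Assumed helper name: reports whether subsetting was requested at load time.
    boolean needsSubset() {
        return embedSubset;
    }
}

/** Hypothetical stand-in for a composite font; NOT the real PDType0Font. */
final class SketchFont {
    // null when the font is rebuilt from an existing font dictionary
    private final SketchEmbedder embedder;

    private SketchFont(SketchEmbedder embedder) {
        this.embedder = embedder;
    }

    /** Models loading the font from a font file. */
    static SketchFont loadFromFile(boolean embedSubset) {
        return new SketchFont(new SketchEmbedder(embedSubset));
    }

    /** Models creating the font from a dictionary found in the resources. */
    static SketchFont fromDictionary() {
        return new SketchFont(null);
    }

    /** Without an embedder there is nothing that could subset the font later. */
    boolean willBeSubset() {
        return embedder != null && embedder.needsSubset();
    }
}

final class SubsetStateSketch {
    public static void main(String[] args) {
        System.out.println(SketchFont.loadFromFile(true).willBeSubset()); // prints "true"
        System.out.println(SketchFont.fromDictionary().willBeSubset());   // prints "false"
    }
}
```

In this model, a font that comes back from the resources can never report that it will be subset, which matches the behaviour observed with the form example.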

1)
One solution might be to construct the appearances yourself by using a modified version of {{AppearanceGeneratorHelper.setAppearanceValue()}}. This version should then use the font directly, either by loading it with {{PDType0Font.load()}} instead of taking it from the resources (see the line {{PDFont font = defaultAppearance.getFont()}}), or by getting the actual font stream from the font in the resources (depending on the font type).

This might be feasible if it is "your" document, i.e. you know which fonts you want to use. The font object(s) should be constructed in advance so that they can be reused for all fields.

The {{PDDocument}} object is available from {{field.getAcroForm().getDocument()}}.

2)
A second solution could be that PDField accepts a PDFont object. I tried this and it seems to work:

At the beginning of {{AppearanceGeneratorHelper.setAppearanceContent()}}, add this code:
{code}
// Proposed patch: a font set on the text field takes precedence over
// the font that the default appearance got from the default resources.
if (field instanceof PDTextField)
{
	PDTextField textField = (PDTextField) field;
	if (textField.font != null)
	{
		defaultAppearance.setFont(textField.font);
	}
}
{code}
In {{PDTextField}}, add a field {{font}} and a setter.
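
As a rough illustration of how the patched pieces would interact, here is a self-contained sketch. All classes in it ({{FontStub}}, {{DefaultAppearanceStub}}, {{TextFieldStub}}) are hypothetical stand-ins written for this example, not PDFBox classes; only the shape of the check is taken from the snippet above.

```java
/** Hypothetical stand-in for PDFont. */
final class FontStub {
    final String name;

    FontStub(String name) {
        this.name = name;
    }
}

/** Hypothetical stand-in for the default appearance object. */
final class DefaultAppearanceStub {
    private FontStub font;

    DefaultAppearanceStub(FontStub font) {
        this.font = font;
    }

    FontStub getFont() {
        return font;
    }

    void setFont(FontStub font) {
        this.font = font;
    }
}

/** Hypothetical stand-in for PDTextField with the proposed field and setter. */
final class TextFieldStub {
    FontStub font; // package-visible so the appearance code can read it

    void setFont(FontStub font) {
        this.font = font;
    }
}

final class FontOverrideSketch {
    /** Same shape as the proposed check: a font set on the field wins. */
    static void applyFieldFont(Object field, DefaultAppearanceStub defaultAppearance) {
        if (field instanceof TextFieldStub) {
            TextFieldStub textField = (TextFieldStub) field;
            if (textField.font != null) {
                defaultAppearance.setFont(textField.font);
            }
        }
    }

    public static void main(String[] args) {
        DefaultAppearanceStub da = new DefaultAppearanceStub(new FontStub("FromResources"));
        TextFieldStub field = new TextFieldStub();

        applyFieldFont(field, da);
        System.out.println(da.getFont().name); // prints "FromResources"

        field.setFont(new FontStub("Subsettable"));
        applyFieldFont(field, da);
        System.out.println(da.getFont().name); // prints "Subsettable"
    }
}
```

Fields without an explicit font keep the existing behaviour, so the change would only affect fields where the application opts in.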

In your application, set the font for all your fields. At the end of your application, call {{font.subset()}} and then {{resources.getCOSObject().getCOSDictionary(COSName.FONT).removeItem(COSName.getPDFName(fontName))}} to remove the non-subsetted font from the AcroForm default resources. The removal is needed because of a flaw in this solution: PDFBox still expects the font to be in the AcroForm default resources.


> Using Chinese character make the file size increases 
> -----------------------------------------------------
>
>                 Key: PDFBOX-5209
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5209
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: AcroForm
>    Affects Versions: 2.0.15
>         Environment: java jdk 1.8
>            Reporter: LI MING
>            Priority: Blocker
>              Labels: FileSize
>
> As the title says, we use Chinese characters to generate a PDF form file. This succeeds, but the file size is larger than 10 MB. Apart from changing the font file, is there any other way to solve this problem?


