Posted to dev@pdfbox.apache.org by "Tilman Hausherr (Jira)" <ji...@apache.org> on 2022/05/14 02:31:00 UTC
[jira] [Updated] (PDFBOX-5433) PDFStreamEngine creating new operators that do not exist in document
[ https://issues.apache.org/jira/browse/PDFBOX-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr updated PDFBOX-5433:
------------------------------------
Attachment: screenshot-1.png
> PDFStreamEngine creating new operators that do not exist in document
> --------------------------------------------------------------------
>
> Key: PDFBOX-5433
> URL: https://issues.apache.org/jira/browse/PDFBOX-5433
> Project: PDFBox
> Issue Type: Bug
> Reporter: Mike Cantrell
> Priority: Major
> Attachments: pdfbox-stream-engine-operators.zip, screenshot-1.png
>
>
> We're using PDFStreamEngine to analyze and filter (optimize) the document's content streams. I've found an odd case where a form gives us extra (unwanted) operators that don't exist in the original stream.
> According to the PDFDebugger, the form's stream has the following contents:
>
> {code:java}
> 0 TL
> q
> BT
> 1 0 0 rg
> 0 i
> /TT0 20 Tf
> 0 Tc
> 0 Tw
> 0 Ts
> 100 Tz
> 0 Tr
> 0 -15.791 TD
> (HOODHD035236) Tj
> ET
> Q{code}
> I created a debug utility to print the operators reported by the PDFStreamEngine:
> {code:java}
> @Getter
> static class StreamDebugger extends PDFStreamEngine {
>     String formName;          // resource name of the form currently being shown, or null
>     Operator operator;        // most recent operator passed to processOperator
>     List<COSBase> operands;   // operands of that operator
>     int operatorCount;        // number of operators seen inside form "Fm0"
>
>     public StreamDebugger() {
>         addOperator(new BeginText());
>         addOperator(new Concatenate());
>         addOperator(new DrawObject()); // special text version
>         addOperator(new EndText());
>         addOperator(new SetGraphicsStateParameters());
>         addOperator(new Save());
>         addOperator(new Restore());
>         addOperator(new NextLine());
>         addOperator(new SetCharSpacing());
>         addOperator(new MoveText());
>         addOperator(new MoveTextSetLeading());
>         addOperator(new SetFontAndSize());
>         addOperator(new ShowText());
>         addOperator(new ShowTextAdjusted());
>         addOperator(new SetTextLeading());
>         addOperator(new SetMatrix());
>         addOperator(new SetTextRenderingMode());
>         addOperator(new SetTextRise());
>         addOperator(new SetWordSpacing());
>         addOperator(new SetTextHorizontalScaling());
>         addOperator(new ShowTextLine());
>         addOperator(new ShowTextLineAndSpace());
>     }
>
>     @Override
>     public void showForm(PDFormXObject form) throws IOException {
>         // operands still holds the operands of the Do operator that triggered
>         // this call, so element 0 is the form's resource name
>         this.formName = ((COSName) operands.get(0)).getName();
>         super.showForm(form);
>         this.formName = null;
>     }
>
>     @Override
>     protected void processOperator(Operator operator, List<COSBase> operands) throws IOException {
>         this.operator = operator;
>         this.operands = operands;
>         // only log operators that are processed while form "Fm0" is being shown
>         if (Objects.equals(this.formName, "Fm0")) {
>             this.operatorCount++;
>             System.out.printf("%s:%s%n", operator.getName(), operands.toString());
>         }
>         super.processOperator(operator, operands);
>     }
> }
> {code}
> The resulting output:
> {code:java}
> TL:[COSInt{0}]
> q:[]
> BT:[]
> rg:[COSInt{1}, COSInt{0}, COSInt{0}]
> i:[COSInt{0}]
> Tf:[COSName{TT0}, COSInt{20}]
> Tc:[COSInt{0}]
> Tw:[COSInt{0}]
> Ts:[COSInt{0}]
> Tz:[COSInt{100}]
> Tr:[COSInt{0}]
> TD:[COSInt{0}, COSFloat{-15.791}]
> TL:[COSFloat{15.791}]
> Td:[COSInt{0}, COSFloat{-15.791}]
> Tj:[COSString{HOODHD035236}]
> ET:[]
> Q:[] {code}
> These operators do not exist in the original stream:
> {code:java}
> TL:[COSFloat{15.791}]
> Td:[COSInt{0}, COSFloat{-15.791}]{code}
> If you rewrite the stream using the operators reported by the engine, the resulting PDF has display issues.
> I'm attaching a test case which demonstrates the issue.
>
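The two extra entries in the output above match the definition of TD in the PDF specification, where `tx ty TD` has the same effect as `-ty TL` followed by `tx ty Td`. With ty = -15.791 that gives `15.791 TL` and `0 -15.791 Td`, exactly the two operators that do not appear in the stream. One plausible explanation, offered here as an assumption and not as a confirmed diagnosis, is that the TD handler applies this definition by dispatching TL and Td back through the engine, so an overridden processOperator sees them as well. The sketch below models that behavior without PDFBox: `Engine`, `seen`, `topLevel` and `depth` are hypothetical names, and the depth counter is just one possible way to separate operators read from the stream from operators dispatched by a handler.

```java
import java.util.ArrayList;
import java.util.List;

public class TdExpansionSketch {

    /** Hypothetical stand-in for an engine that routes every operator through one hook. */
    static class Engine {
        final List<String> seen = new ArrayList<>();     // everything the hook observes
        final List<String> topLevel = new ArrayList<>(); // only operators dispatched at depth 0
        private int depth = 0;                           // > 0 while a handler is re-dispatching

        void processOperator(String name, List<Float> operands) {
            String entry = name + ":" + operands;
            seen.add(entry);
            if (depth == 0) {
                // not called from inside another operator's handler
                topLevel.add(entry);
            }
            depth++;
            try {
                handle(name, operands);
            } finally {
                depth--;
            }
        }

        // Assumption: the TD handler is implemented in terms of TL and Td.
        private void handle(String name, List<Float> operands) {
            if ("TD".equals(name) && operands.size() == 2) {
                // tx ty TD  is equivalent to  -ty TL  followed by  tx ty Td
                processOperator("TL", List.of(-operands.get(1)));
                processOperator("Td", operands);
            }
        }
    }

    public static void main(String[] args) {
        Engine engine = new Engine();
        engine.processOperator("TD", List.of(0f, -15.791f));
        engine.processOperator("ET", List.of());
        System.out.println("seen:     " + engine.seen);
        System.out.println("topLevel: " + engine.topLevel);
    }
}
```

In this model `seen` ends up with four entries (TD, TL, Td, ET) while `topLevel` keeps only TD and ET. If the same assumption holds for the real engine, a filter that rewrites content streams could apply a similar depth guard in its processOperator override and write out only the depth-0 operators.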