Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (Jira)" <ji...@apache.org> on 2022/03/05 14:14:00 UTC

[jira] [Comment Edited] (PDFBOX-5286) Runtime degredation in RC1 and alpha2

    [ https://issues.apache.org/jira/browse/PDFBOX-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17501748#comment-17501748 ] 

Andreas Lehmkühler edited comment on PDFBOX-5286 at 3/5/22, 2:13 PM:
---------------------------------------------------------------------

The compression code mixed up indirect COSInteger objects which are not the same object but hold the same value. The code that was removed during the simplification repaired that fault by transforming such indirect objects into direct objects.

My last commit fixes the root cause within the compression code. In the end, however, we still have to think about how to handle different COSInteger objects holding the same value w.r.t. hashCode and equals. We should discuss that on dev@.
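
To illustrate the hashCode/equals question, here is a minimal, self-contained sketch. It does not use PDFBox; IntObject is a hypothetical stand-in for an integer object with value-based equals and hashCode, and countKeys is an illustrative helper. It only shows how a value-based map collapses two distinct objects holding the same value, while an identity-based map keeps them apart.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class IndirectIntegerDemo {

    // Hypothetical stand-in (not a PDFBox class): equality is based on the value only.
    static final class IntObject {
        final long value;

        IntObject(long value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof IntObject && ((IntObject) o).value == value;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(value);
        }
    }

    // Registers every object as a key and returns how many distinct keys the map ends up with.
    static int countKeys(Map<IntObject, String> map, IntObject... objects) {
        for (int i = 0; i < objects.length; i++) {
            map.put(objects[i], "object " + i);
        }
        return map.size();
    }

    public static void main(String[] args) {
        // Two different objects holding the same value.
        IntObject a = new IntObject(42);
        IntObject b = new IntObject(42);

        // Value-based lookup: both objects collapse into a single key.
        System.out.println(countKeys(new HashMap<>(), a, b));         // prints 1
        // Identity-based lookup: the two objects stay distinct.
        System.out.println(countKeys(new IdentityHashMap<>(), a, b)); // prints 2
    }
}
```

Which of the two behaviours is appropriate for indirect objects is exactly the open point for the discussion on dev@.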



> Runtime degredation in RC1 and alpha2
> -------------------------------------
>
>                 Key: PDFBOX-5286
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5286
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: Maruan Sahyoun
>            Priority: Critical
>         Attachments: flame-cpu-forward.html, flame-cpu-reverse.html, profiling.png
>
>
> While working on and reviewing PDFBOX-5068 and PDFBOX-5263, I've been experiencing runtime issues with both 3.0.0-RC1 and 3.0.0-alpha2 when loading and saving a large PDF:
> https://crossasia-books.ub.uni-heidelberg.de/xasia/reader/download/506/506-42-86246-2-10-20190822.pdf 
> ||version||runtime in millis||
> |2.0.24 |2076|
> |3.0.0-RC1 |219472|
> |3.0.0-alpha2 |282284|
> Basic test:
> {code:java}
> // Time a full load/save round trip; the output is discarded by NullOutputStream.
> long start = System.currentTimeMillis();
> try (PDDocument pdf = Loader.loadPDF(new File("506-42-86246-2-10-20190822.pdf"))) {
>     pdf.save(new NullOutputStream());
> }
> long end = System.currentTimeMillis();
> System.out.println("Elapsed time in milliseconds: " + (end - start));
> {code}
> with NullOutputStream:
> {code:java}
> package org.apache.pdfbox;
> import java.io.IOException;
> import java.io.OutputStream;
> public class NullOutputStream extends OutputStream {
>     @Override
>     public void write(byte[] b) throws IOException {
>         // don't write anything
>     }
>     @Override
>     public void write(byte[] b, int off, int len) throws IOException {
>         // don't write anything
>     }
>     @Override
>     public void write(int b) throws IOException {
>         // don't write anything
>     }
> }
> {code}
> I've also run tests using JMH, and they support these numbers. The difference between the RC1 and alpha2 numbers is within regular variation.


