Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (Jira)" <ji...@apache.org> on 2020/07/18 10:36:00 UTC

[jira] [Comment Edited] (PDFBOX-4895) Faster COSNumber

    [ https://issues.apache.org/jira/browse/PDFBOX-4895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17160400#comment-17160400 ] 

Andreas Lehmkühler edited comment on PDFBOX-4895 at 7/18/20, 10:35 AM:
-----------------------------------------------------------------------

I've reverted parts of the commit because it introduced a regression. The regression is limited to corner cases, but in those cases the result was wrong.

{quote}We should not waste time trying to parse a COSFloat because it will only result in -MAX_VALUE or +MAX_VALUE.{quote}
This assumption is wrong. A float can represent larger values than a long, so it is wrong to set such values to Float.MAX_VALUE by default.

Have a look at the file attached to PDFBOX-4889. It contains some (invalid) object numbers such as 18446744073307448448, which are too big for a long but small enough to fit into a float. You can check the behaviour by running PDAcroFormFlattenTest; its debug output shows the values that are used.
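
To illustrate the point with plain JDK calls, here is a minimal sketch. It is not PDFBox code: the class LargeNumberDemo and the helper fitsInLong are hypothetical names used only for this example. It shows that the number quoted above overflows a long but still parses to a finite float below Float.MAX_VALUE.

```java
final class LargeNumberDemo {

    // Hypothetical helper (not part of PDFBox): true if the digits parse as a long.
    static boolean fitsInLong(String digits) {
        try {
            Long.parseLong(digits);
            return true;
        } catch (NumberFormatException e) {
            // Thrown when the value lies outside the range of a long.
            return false;
        }
    }

    public static void main(String[] args) {
        // Object number taken from the comment above.
        String number = "18446744073307448448";

        // Larger than Long.MAX_VALUE (9223372036854775807), so this prints false.
        System.out.println("fits in long: " + fitsInLong(number));

        // Still well inside the float range, so the parsed value is finite.
        float parsed = Float.parseFloat(number);
        System.out.println("finite float: " + !Float.isInfinite(parsed));
        System.out.println("below Float.MAX_VALUE: " + (parsed < Float.MAX_VALUE));
    }
}
```

The float loses precision for a value of this size, but it is a finite value and not +MAX_VALUE, which is the distinction that matters here.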




was (Author: lehmi):
I've reverted parts of the commit because it introduced a regression. The regression is limited to corner cases, but in those cases the result was wrong.

{quote}We should not waste time trying to parse a COSFloat because it will only result in -MAX_VALUE or +MAX_VALUE.{quote}
This assumption is wrong. A float can represent larger values than a long, so it is wrong to set such values to Float.MAX_VALUE.

Have a look at the file attached to PDFBOX-4889. It contains some (invalid) object numbers such as 18446744073307448448, which are too big for a long but small enough to fit into a float. You can check the behaviour by running PDAcroFormFlattenTest; its debug output shows the values that are used.



> Faster COSNumber
> ----------------
>
>                 Key: PDFBOX-4895
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4895
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 2.0.20, 3.0.0 PDFBox
>            Reporter: Alfred
>            Assignee: Tilman Hausherr
>            Priority: Trivial
>              Labels: Optimization
>             Fix For: 2.0.21, 3.0.0 PDFBox
>
>         Attachments: PDFBOX-4895-b.patch, PDFBOX-4895.patch
>
>
> A small improvement can be made to COSNumber when checking whether a number is a float.
> The current version uses indexOf twice, to check for '.' or 'e'.
>  We can do that in one scan.
>  
> Each call will scan through the entire string.
>  We only have to scan until we find the chars, and stop if found.
>  
> While profiling the code I found that the method gets called a lot, so the improvement makes a bit of a difference.
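
The single-scan check described in the quoted issue could look roughly like the sketch below. It is an illustration only and not the attached patch: the class FloatCheckSketch and the method looksLikeFloat are hypothetical names, and the check covers exactly the two characters named in the description ('.' and 'e').

```java
final class FloatCheckSketch {

    // Hypothetical helper (not the PDFBox implementation): one pass over the
    // string that stops at the first '.' or 'e' instead of calling indexOf twice.
    static boolean looksLikeFloat(String number) {
        for (int i = 0; i < number.length(); i++) {
            char c = number.charAt(i);
            if (c == '.' || c == 'e') {
                return true; // stop scanning as soon as a marker is found
            }
        }
        return false; // no marker found, treat the text as an integer
    }

    public static void main(String[] args) {
        System.out.println(looksLikeFloat("12345"));
        System.out.println(looksLikeFloat("1.5"));
        System.out.println(looksLikeFloat("2e10"));
    }
}
```

For a string without either character, the loop visits each character once, where two indexOf calls would each scan the whole string.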


