Posted to dev@pdfbox.apache.org by "John Hewson (JIRA)" <ji...@apache.org> on 2014/06/10 19:23:08 UTC

[jira] [Comment Edited] (PDFBOX-2127) Optimize calls of getPixel in SampledImageReader and PDImageXObject

    [ https://issues.apache.org/jira/browse/PDFBOX-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026705#comment-14026705 ] 

John Hewson edited comment on PDFBOX-2127 at 6/10/14 5:22 PM:
--------------------------------------------------------------

{quote}
When calling Raster#getPixel() in a loop, it is a good practice to make sure that the result array is allocated only once. SampledImageReader#getStencilImage() and PDImageXObject#applyMask() fail to do that. When rendering the attached example, this results in allocating 24 053 760 arrays containing 3 floats, which is about 0.5GB of data if my math is right.
{quote}

The reason you're not seeing a bigger improvement is that the JVM is able to perform optimisations which weren't available when the AWT Raster API was designed. Once the code becomes "hot" and the JVM's optimisations kick in, inlining and escape analysis mean that the array of floats doesn't get allocated, and element [0] can even end up being assigned directly to a CPU register. However, before the code becomes hot you will notice a performance difference, which is the case here. Asymptotically it's not a big deal, but it's still helpful for the command-line tools if we can be that little bit faster initially.
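
To make the two variants concrete, here is a minimal sketch of the pattern being discussed. It is not the code from GetPixel.patch; the class and method names (GetPixelReuse, sumAllocating, sumReusing) are made up for illustration, and only Raster#getPixel(int, int, float[]) is the real AWT call. Passing null makes getPixel allocate a new array on each call, whereas passing a preallocated array reuses it.

```java
import java.awt.image.BufferedImage;
import java.awt.image.Raster;

// Hypothetical illustration class, not part of PDFBox.
final class GetPixelReuse {

    // Variant 1: passing null lets getPixel allocate a new float[] for every pixel.
    static double sumAllocating(Raster raster) {
        double sum = 0;
        for (int y = raster.getMinY(); y < raster.getMinY() + raster.getHeight(); y++) {
            for (int x = raster.getMinX(); x < raster.getMinX() + raster.getWidth(); x++) {
                float[] pixel = raster.getPixel(x, y, (float[]) null);
                sum += pixel[0];
            }
        }
        return sum;
    }

    // Variant 2: one array is allocated up front and refilled by every call.
    static double sumReusing(Raster raster) {
        float[] pixel = new float[raster.getNumBands()];
        double sum = 0;
        for (int y = raster.getMinY(); y < raster.getMinY() + raster.getHeight(); y++) {
            for (int x = raster.getMinX(); x < raster.getMinX() + raster.getWidth(); x++) {
                raster.getPixel(x, y, pixel);
                sum += pixel[0];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Small test image; a real rendering workload would be far larger.
        BufferedImage image = new BufferedImage(64, 64, BufferedImage.TYPE_INT_RGB);
        image.setRGB(10, 10, 0xFF0000);
        Raster raster = image.getRaster();
        System.out.println(sumAllocating(raster) == sumReusing(raster));
    }
}
```

Both methods compute the same result. The second avoids the per-pixel allocation regardless of whether the JIT has compiled the loop yet, which is where the difference for short-lived command-line runs would come from.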


> Optimize calls of getPixel in SampledImageReader and PDImageXObject
> -------------------------------------------------------------------
>
>                 Key: PDFBOX-2127
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2127
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 2.0.0
>            Reporter: Petr Slaby
>            Assignee: Andreas Lehmkühler
>             Fix For: 2.0.0
>
>         Attachments: 000048.pdf, GetPixel.patch
>
>
> When calling Raster#getPixel() in a loop, it is a good practice to make sure that the result array is allocated only once. SampledImageReader#getStencilImage() and PDImageXObject#applyMask() fail to do that. When rendering the attached example, this results in allocating 24 053 760 arrays containing 3 floats, which is about 0.5GB of data if my math is right. Also, I have noticed that SampledImageReader#getStencilImage() reads and sets the same data without modification if the alpha of a pixel != 255.
> After applying the attached patch, the rendering time of the document drops from 8.5s to 7.4s. That is not as much as I expected (array allocation and the garbage collector seem to be fast), but still...
> Note: Rendering of the document is wrong, it does not find some of its fonts, but that is irrelevant for this issue.


