Posted to dev@pdfbox.apache.org by "John Hewson (JIRA)" <ji...@apache.org> on 2014/04/03 20:03:16 UTC

[jira] [Comment Edited] (PDFBOX-1996) PDSeparation optimization

    [ https://issues.apache.org/jira/browse/PDFBOX-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959039#comment-13959039 ] 

John Hewson edited comment on PDFBOX-1996 at 4/3/14 6:01 PM:
-------------------------------------------------------------

There are only 256 values (including zero), and separations are always single-color, so this can be safely cached. Storing a 1-element int[] in a HashMap doesn't seem like the right choice though: an array is an object with at least a pointer and a length to store, so it is going to have more memory overhead than storing just a boxed Integer.

As a rough estimate, each entry in a HashMap<Integer, int[]> is going to need at least 4 bytes for the Integer object pointer and 4 bytes for its int value. For the int[] there will be at least 4 bytes for the object pointer (arrays are objects), plus 4 bytes for the single int value, plus 4 bytes for the array's length. So we're looking at maybe 20 bytes per entry (5 x 4), or around 5KB (256 x 20 = 5120 bytes) if the cache is full (not too bad). The map is also going to spend time doing memory allocations and computing hashes.

Instead, consider a 256-element byte array which would have a fixed overhead of just 256 + 8 bytes and benefit from not having to do hash computations to perform a lookup.
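
To illustrate the idea, here is a minimal sketch of such a lookup table. It is not taken from the attached patch or from PDFBox: the class name TintLookupTable and the IntUnaryOperator standing in for the expensive per-sample calculation are hypothetical, and it assumes both the input sample and the cached result fit in a single unsigned byte (0-255). The table is filled eagerly so no "is this slot populated" flag is needed.

```java
import java.util.function.IntUnaryOperator;

/**
 * Hypothetical sketch: caches the result of an expensive per-sample
 * calculation in a fixed 256-element byte array instead of a HashMap.
 * Assumes inputs and outputs are both in the range 0-255.
 */
final class TintLookupTable {
    // One byte per possible 8-bit sample: 256 bytes of payload.
    private final byte[] table = new byte[256];

    /** Runs the expensive calculation once for each of the 256 inputs. */
    TintLookupTable(IntUnaryOperator expensiveTransform) {
        for (int sample = 0; sample < 256; sample++) {
            int result = expensiveTransform.applyAsInt(sample);
            if (result < 0 || result > 255) {
                throw new IllegalArgumentException(
                        "result out of byte range: " + result);
            }
            table[sample] = (byte) result;
        }
    }

    /** Array index lookup: no hashing, no boxing, no allocation. */
    int lookup(int sample) {
        // Mask the index to 0-255 and undo Java's signed byte on the way out.
        return table[sample & 0xFF] & 0xFF;
    }
}
```

After construction, every per-pixel call is a single array read, so the expensive calculation runs at most 256 times regardless of the image size.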



was (Author: jahewson):
There are only 256 values (including zero) and separations are always single-color so this can be safely cached. Storing a 1-element int[] in a HashMap doesn't seem like the right choice though, as an array is an object with at least a pointer and a length to store, so this is going to have more memory overhead than storing just a boxed Integer.

As a rough estimate, a HashMap<Integer, int[]> is going to need at least 4 bytes for the Integer object pointer and 4 bytes for its int value. For the int[] there will be at least 4 bytes for the object pointer (arrays are objects) plus 4 bytes for the single int value, plus 4 bytes for the array's length. So we're looking at maybe 20 bytes per entry, around 5KB if the cache is full (not bad). It's also going to spend time doing memory allocations and computing hashes.

A 256-element byte array would have a fixed overhead of just 256 + 8 bytes and benefit from not having to do hash computations to perform a lookup.


> PDSeparation optimization
> -------------------------
>
>                 Key: PDFBOX-1996
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1996
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Rendering
>    Affects Versions: 2.0.0
>            Reporter: Dave Smith
>            Priority: Minor
>         Attachments: pdfbox.patch, pdfbox.patch
>
>
> I have a 4-page black-and-white PDF that takes 32 seconds (8 seconds a page) to render. It uses a Separation color space and has to run numerous functions per pixel, which is causing the slowdown. I have a patch where I precalculate the black and white pixels and cache them instead of calculating them every time. This optimization gets the page rendering down to less than a second a page. I will attach my patch. I could see caching all calculated colours going forward, but floats in hash maps are tricky.



--
This message was sent by Atlassian JIRA
(v6.2#6252)