Posted to dev@pdfbox.apache.org by "John Hewson (JIRA)" <ji...@apache.org> on 2014/10/11 01:51:34 UTC

[jira] [Comment Edited] (PDFBOX-2092) Very slow rendering of scanned document

    [ https://issues.apache.org/jira/browse/PDFBOX-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026880#comment-14026880 ] 

John Hewson edited comment on PDFBOX-2092 at 10/10/14 11:50 PM:
----------------------------------------------------------------

To answer Tilman's question:
{quote}
And would this packed raster method also work with an image that has more than 4 color components? (Don't know if the spec allows such images, but we have a PDF with a DeviceN colorspace with 6 elements, but no image)
{quote}

No, it wouldn't: a packed raster stores each pixel in a single element of at most 32 bits, which is 4x 8-bit color components. DeviceN often has 6 components and can have up to 32 according to the spec. In addition, the other color spaces expect the raster passed to their toRGBImage implementation to be banded, not packed.
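
To make the 32-bit limit concrete, here is a minimal sketch using only the standard java.awt.image API. The class and method names (PackedLimitSketch, bitsPerPixel, packFour) are hypothetical and not part of PDFBox; the sketch just shows that four 8-bit samples share one int in a packed raster, while six would need 48 bits.

```java
import java.awt.Point;
import java.awt.image.DataBuffer;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;

// Illustrative only: names here are hypothetical, not PDFBox code.
final class PackedLimitSketch {

    /** Bits needed for one pixel when all samples share a single element. */
    static int bitsPerPixel(int components, int bitsPerComponent) {
        return components * bitsPerComponent;
    }

    /** Stores four 8-bit samples in a 1x1 packed raster and returns the backing int. */
    static int packFour(int s0, int s1, int s2, int s3) {
        // One mask per band; together they cover all 32 bits of the int.
        int[] masks = {0xFF000000, 0x00FF0000, 0x0000FF00, 0x000000FF};
        WritableRaster packed = Raster.createPackedRaster(
                DataBuffer.TYPE_INT, 1, 1, masks, new Point(0, 0));
        packed.setPixel(0, 0, new int[] {s0, s1, s2, s3});
        return packed.getDataBuffer().getElem(0);
    }

    public static void main(String[] args) {
        // prints 11223344
        System.out.println(Integer.toHexString(packFour(0x11, 0x22, 0x33, 0x44)));
        System.out.println(bitsPerPixel(4, 8)); // 32: fits in one int
        System.out.println(bitsPerPixel(6, 8)); // 48: does not fit in one int
    }
}
```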

Petr noticed this also:

{quote}
But it is just a quick and dirty test to see the effect on performance. SampledImageReader creates a packed raster and fills it in from8bitx(). I have changed just the color space class PDDeviceRGB to expect the different type of raster, all other color space classes would need to be adjusted, too.
{quote}

The problem is that color spaces with more than 4 8-bit components cannot use a packed raster, so a packed-only approach can't handle DeviceN. What we could do, though, is offer a fast path for images with <= 4 components where we generate a packed raster. That way only DeviceN needs to handle a banded raster. So Petr is quite right:

{quote}
a banded raster would still need to come into PDDeviceN#toRGBImage(), but what comes out can be a packed raster with DirectColorModel.
{quote}

But... a DeviceN color space can output to another DeviceN color space, so it is in theory possible to have a DeviceN with, say, 10 channels that outputs to 6 channels, which outputs to CMYK, then RGB (phew!). So DeviceN must still be able to output a banded raster when its output color space is another DeviceN space (as the existing code does). Alternatively, we could choose not to support DeviceN -> DeviceN because it's probably an extreme edge case.
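
As a rough illustration of "banded in, packed out", the sketch below reads a banded raster with any number of bands and writes a TYPE_INT_RGB image, whose raster is packed and whose color model is a DirectColorModel. BandedToPackedSketch, toPackedRgb and the toRgb function are hypothetical stand-ins, and the toy transform in main is not PDFBox's DeviceN tint transform. It also assumes the input raster starts at (0, 0).

```java
import java.awt.Point;
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;
import java.util.function.ToIntFunction;

// Illustrative only: names here are hypothetical, not PDFBox code.
final class BandedToPackedSketch {

    /**
     * Converts a banded raster (origin assumed at 0,0) to a packed RGB image.
     * toRgb maps one pixel's samples to 0xRRGGBB; it stands in for a real conversion.
     */
    static BufferedImage toPackedRgb(Raster banded, ToIntFunction<int[]> toRgb) {
        int w = banded.getWidth();
        int h = banded.getHeight();
        // TYPE_INT_RGB is backed by a packed int raster with a DirectColorModel.
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        WritableRaster packed = out.getRaster();
        int[] samples = new int[banded.getNumBands()];
        int[] row = new int[w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                banded.getPixel(x, y, samples);
                row[x] = toRgb.applyAsInt(samples) & 0xFFFFFF;
            }
            // Write one whole row of packed pixels at a time.
            packed.setDataElements(0, y, w, 1, row);
        }
        return out;
    }

    public static void main(String[] args) {
        WritableRaster sixBands = Raster.createBandedRaster(
                DataBuffer.TYPE_BYTE, 1, 1, 6, new Point(0, 0));
        sixBands.setPixel(0, 0, new int[] {10, 20, 30, 40, 50, 60});
        // Toy transform: uses the first three bands as R, G, B.
        BufferedImage rgb = toPackedRgb(sixBands, s -> (s[0] << 16) | (s[1] << 8) | s[2]);
        System.out.println(Integer.toHexString(rgb.getRGB(0, 0))); // prints ff0a141e
    }
}
```

For a DeviceN -> DeviceN chain, the intermediate stages would still have to pass banded rasters to each other; only the final stage could emit a packed image like this.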

In summary, in order to use packed rasters:
- SampledImageReader must read a packed raster for images with <= 4 components, and it will need to do this in both from8bit and fromAny. Banded rasters will still be read for DeviceN.
- All color spaces with <= 4 components can switch to expecting a packed raster, with no need to handle banded rasters any more.
- DeviceN will read from a banded raster, as is currently the case, but must now output a packed raster (as long as the output is not another DeviceN space, in which case the current code can be used).
- It's possible to optimise DeviceN spaces with <= 4 components (i.e., most) to use packed rasters as well; it's just a lot of work.

Note that the first three points _must_ be addressed in order for any patch using packed rasters to be applied without breaking other color spaces.
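
The raster choice described in the summary could look roughly like the sketch below. It is not the actual SampledImageReader code: RasterLayoutSketch and createRasterFor are hypothetical names, and it assumes 8 bits per component.

```java
import java.awt.Point;
import java.awt.image.DataBuffer;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;

// Illustrative only: names here are hypothetical, not PDFBox code.
final class RasterLayoutSketch {

    /** Picks a packed raster for <= 4 8-bit components, otherwise a banded one. */
    static WritableRaster createRasterFor(int numComponents, int width, int height) {
        if (numComponents < 1) {
            throw new IllegalArgumentException("numComponents must be >= 1");
        }
        Point origin = new Point(0, 0);
        if (numComponents <= 4) {
            // Fast path: one 8-bit mask per component, first component in the highest bits.
            int[] masks = new int[numComponents];
            for (int i = 0; i < numComponents; i++) {
                masks[i] = 0xFF << (8 * (numComponents - 1 - i));
            }
            return Raster.createPackedRaster(DataBuffer.TYPE_INT, width, height, masks, origin);
        }
        // More than 32 bits per pixel (e.g. 6-component DeviceN): one byte bank per component.
        return Raster.createBandedRaster(DataBuffer.TYPE_BYTE, width, height, numComponents, origin);
    }

    public static void main(String[] args) {
        System.out.println(createRasterFor(3, 4, 4).getSampleModel().getClass().getSimpleName());
        System.out.println(createRasterFor(6, 4, 4).getSampleModel().getClass().getSimpleName());
    }
}
```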


> Very slow rendering of scanned document
> ---------------------------------------
>
>                 Key: PDFBOX-2092
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2092
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Rendering
>    Affects Versions: 2.0.0
>         Environment: Win7 x64 EN
> JDK6,JDK7,JDK8
>            Reporter: Juraj Lonc
>         Attachments: PDFBOX-2092.patch, SCAN_20140522_160457490_page2.pdf
>
>
> It takes extremely long to render this file to an image.
> Depending on the computer, it can take 15s+ to render 1 page.
> When I skip drawing the inserted image /Im0, rendering is fast. So there is something wrong with drawing that image in
> {code}
> PageDrawer.drawImage(Image awtImage, AffineTransform at)
> {code}
> When I comment out the line
> {code}
> graphics.drawImage(awtImage, imageTransform, null);
> {code}
> then rendering process takes 6s



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)