Posted to dev@pdfbox.apache.org by "Michał Pomarański (JIRA)" <ji...@apache.org> on 2019/02/20 14:51:00 UTC

[jira] [Updated] (PDFBOX-4470) Red areas around text when converting a pdf to png with pdfbox

     [ https://issues.apache.org/jira/browse/PDFBOX-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michał Pomarański updated PDFBOX-4470:
--------------------------------------
    Description: 
I'm trying to convert a PDF to a PNG file using PDFBox. Unfortunately, the output has odd red areas in some places. I'm not sure what the cause is; it only happens with some PDF files.

Here's some of the code that I'm using:
{code:java}
    public static BufferedImage generateFromPdf(String ref, InputStream stream, int pageIndex, PreviewMode mode) throws IOException {
        PDDocument doc = null;
        try (InputStream buffered = new BufferedInputStream(stream)) {
            doc = PDDocument.load(buffered, PDF_LOADING_MEMORY_SETTING);
            if (pageIndex > doc.getNumberOfPages()) {
                return null;
            }
            PDFRenderer renderer = new PDFRenderer(doc);
            return rasterizePdfBox(ref, pageIndex, renderer, mode);
        } finally {
            if (doc != null) {
                doc.close();
            }
        }
    }
{code}

 and then:
{code:java}
    private static BufferedImage rasterizePdfBox(String ref, int pageIndex, PDFRenderer renderer, PreviewMode mode) throws IOException {
        Future<BufferedImage> result = executorService.submit(() -> {
            LOGGER.info(String.format("Generate preview for ref: %s, page: %s, mode: %s ", ref, pageIndex, mode.name()));
            return renderer.renderImageWithDPI(pageIndex - 1, mode.getDpi(), ImageType.RGB);
        });

        try {
            return result.get();
        } catch (InterruptedException | ExecutionException e) {
            LOGGER.error(String.format("Error when generating preview: %s", e.getMessage()));
            Thread.currentThread().interrupt();
            throw new IOException(e.getMessage());
        }
    }
{code}
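A side note on the error handling above, unrelated to the red areas themselves: the combined catch block sets the interrupt flag even when the rendering task merely failed, and the rethrown `IOException` drops the original cause, which makes a rendering failure harder to diagnose. A minimal sketch of one way to separate the two cases follows. `RenderTaskRunner` and `runAndWait` are hypothetical names introduced for illustration; they are not part of PDFBox or of the code above.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

public class RenderTaskRunner {

    // Hypothetical helper: submits the task, waits for it, and converts
    // failures into IOException while keeping the original cause attached.
    public static <T> T runAndWait(ExecutorService executor, Callable<T> task) throws IOException {
        Future<T> result = executor.submit(task);
        try {
            return result.get();
        } catch (InterruptedException e) {
            // Restore the interrupt flag only if this thread was actually interrupted.
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while waiting for the task", e);
        } catch (ExecutionException e) {
            // The task itself threw; expose its exception as the cause.
            throw new IOException("Task failed: " + e.getCause(), e.getCause());
        }
    }
}
```

With such a helper, the rendering call could be passed in as the `Callable`, and the stack trace of any exception thrown during rendering would stay available through `getCause()`.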


So far I've only figured out that the areas that are red in the output appear blank when I open the file in `Master PDF editor` on Linux. They look normal, though, when I open it with `Document Viewer`.

Some hints:
 - The affected PDFs have been scanned. I can select text in the parts that render correctly, but not in the places covered by the red overlay. Maybe it has something to do with OCR issues?
 - If I run the Linux tool `convert not-working-pdf.pdf converted.pdf` and then convert the resulting file to PNG, the issue no longer appears.
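
Since only some files are affected, it may help to flag affected pages automatically instead of checking each PNG by eye. Below is a minimal sketch that measures what fraction of a rendered `BufferedImage` is strongly red. `RedAreaDetector`, `redFraction`, and both thresholds are assumptions made for illustration; they would need tuning against real output, and a page that legitimately contains red content would also be flagged.

```java
import java.awt.image.BufferedImage;

public class RedAreaDetector {

    // Assumed thresholds for a "strongly red" pixel; adjust for real scans.
    private static final int MIN_RED = 200;
    private static final int MAX_OTHER = 80;

    // Returns the fraction (0.0 to 1.0) of pixels that are strongly red.
    public static double redFraction(BufferedImage image) {
        long redPixels = 0;
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                if (r >= MIN_RED && g <= MAX_OTHER && b <= MAX_OTHER) {
                    redPixels++;
                }
            }
        }
        long total = (long) image.getWidth() * image.getHeight();
        return (double) redPixels / total;
    }
}
```

The image returned by `renderImageWithDPI` could be passed to `redFraction` for both the original file and the one produced by `convert`, and the two values compared page by page.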

  was:
I'm trying to convert a PDF to a PNG file using PDFBox. Unfortunately, the output has odd red areas in some places. I'm not sure what the cause is; it only happens with some PDF files.

Here's some of the code that I'm using:
```java
    public static BufferedImage generateFromPdf(String ref, InputStream stream, int pageIndex, PreviewMode mode) throws IOException {
        PDDocument doc = null;
        try (InputStream buffered = new BufferedInputStream(stream)) {
            doc = PDDocument.load(buffered, PDF_LOADING_MEMORY_SETTING);
            if (pageIndex > doc.getNumberOfPages()) {
                return null;
            }
            PDFRenderer renderer = new PDFRenderer(doc);
            return rasterizePdfBox(ref, pageIndex, renderer, mode);
        } finally {
            if (doc != null) {
                doc.close();
            }
        }
    }
```
and then:
```java
    private static BufferedImage rasterizePdfBox(String ref, int pageIndex, PDFRenderer renderer, PreviewMode mode) throws IOException {
        Future<BufferedImage> result = executorService.submit(() -> {
            LOGGER.info(String.format("Generate preview for ref: %s, page: %s, mode: %s ", ref, pageIndex, mode.name()));
            return renderer.renderImageWithDPI(pageIndex - 1, mode.getDpi(), ImageType.RGB);
        });

        try {
            return result.get();
        } catch (InterruptedException | ExecutionException e) {
            LOGGER.error(String.format("Error when generating preview: %s", e.getMessage()));
            Thread.currentThread().interrupt();
            throw new IOException(e.getMessage());
        }
    }
```

So far I've only figured out that the areas that are red in the output appear blank when I open the file in `Master PDF editor` on Linux. They look normal, though, when I open it with `Document Viewer`.

Some hints:
- The affected PDFs have been scanned. I can select text in the parts that render correctly, but not in the places covered by the red overlay. Maybe it has something to do with OCR issues?
- If I run the Linux tool `convert not-working-pdf.pdf converted.pdf` and then convert the resulting file to PNG, the issue no longer appears.


> Red areas around text when converting a pdf to png with pdfbox
> --------------------------------------------------------------
>
>                 Key: PDFBOX-4470
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4470
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.13
>            Reporter: Michał Pomarański
>            Priority: Major
>         Attachments: 1206264-michael-sims-onboarding-documents-248 (1).pdf
>


