Posted to dev@pdfbox.apache.org by "Michał Pomarański (JIRA)" <ji...@apache.org> on 2019/02/20 14:51:00 UTC
[jira] [Updated] (PDFBOX-4470) Red areas around text when converting a pdf to png with pdfbox
[ https://issues.apache.org/jira/browse/PDFBOX-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michał Pomarański updated PDFBOX-4470:
--------------------------------------
Description:
I'm trying to convert a PDF to a PNG file using PDFBox. Unfortunately, the output contains weird red areas in some places. I'm not sure what the problem is; it only happens with some PDF files.
Here's some of the code that I'm using:
{code:java}
public static BufferedImage generateFromPdf(String ref, InputStream stream, int pageIndex, PreviewMode mode) throws IOException {
    PDDocument doc = null;
    try (InputStream buffered = new BufferedInputStream(stream)) {
        doc = PDDocument.load(buffered, PDF_LOADING_MEMORY_SETTING);
        if (pageIndex > doc.getNumberOfPages()) {
            return null;
        }
        PDFRenderer renderer = new PDFRenderer(doc);
        return rasterizePdfBox(ref, pageIndex, renderer, mode);
    } finally {
        if (doc != null) {
            doc.close();
        }
    }
}
{code}
and then:
{code:java}
private static BufferedImage rasterizePdfBox(String ref, int pageIndex, PDFRenderer renderer, PreviewMode mode) throws IOException {
    Future<BufferedImage> result = executorService.submit(() -> {
        LOGGER.info(String.format("Generate preview for ref: %s, page: %s, mode: %s ", ref, pageIndex, mode.name()));
        return renderer.renderImageWithDPI(pageIndex - 1, mode.getDpi(), ImageType.RGB);
    });
    try {
        return result.get();
    } catch (InterruptedException | ExecutionException e) {
        LOGGER.error(String.format("Error when generating preview: %s", e.getMessage()));
        Thread.currentThread().interrupt();
        throw new IOException(e.getMessage());
    }
}
{code}
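As a side note that is probably unrelated to the red areas: the catch block above re-interrupts the thread for both exception types and drops the original cause. Below is a minimal sketch of handling the two cases separately, using only `java.util.concurrent`. The class name `RenderTaskRunner` and the method `runAndWait` are hypothetical and are not part of the code above or of PDFBox.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical helper, shown only to illustrate the exception handling.
final class RenderTaskRunner {

    private RenderTaskRunner() {
    }

    /** Submits the task, waits for it, and maps failures to IOException. */
    static <T> T runAndWait(ExecutorService executor, Callable<T> task) throws IOException {
        Future<T> future = executor.submit(task);
        try {
            return future.get();
        } catch (InterruptedException e) {
            // Restore the interrupt flag only when the wait was really interrupted.
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while waiting for the task", e);
        } catch (ExecutionException e) {
            // The task itself failed: rethrow an IOException as-is,
            // wrap anything else and keep it as the cause.
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause;
            }
            throw new IOException("Task failed", cause);
        }
    }
}
```

With such a helper, the rendering call could be passed in as the task, so a rendering failure would surface with its original stack trace in the log.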
So far I've only figured out that the areas that are red in the output are blank when I open the file in `Master PDF editor` on Linux. They look normal, though, when I open it with `Document Viewer`.
Some hints:
- The problematic PDFs have been scanned. I can select the text in the parts that render correctly, but not in the places covered by the red overlay. Maybe it has something to do with OCR issues?
- If I run the Linux tool `convert not-working-pdf.pdf converted.pdf` and then convert the resulting file to PNG, the issue no longer occurs.
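Since only some files are affected, it may help to flag suspicious renderings automatically. The following is a rough sketch, using only the JDK, that measures the share of strongly red pixels in a rendered `BufferedImage`. The class `RedAreaCheck`, its methods, and the colour thresholds (red >= 200, green and blue <= 60) are assumptions for illustration, not part of PDFBox or of the code above.

```java
import java.awt.image.BufferedImage;

// Hypothetical diagnostic helper; thresholds are guesses and may need tuning.
final class RedAreaCheck {

    private RedAreaCheck() {
    }

    /** Returns the fraction (0.0 to 1.0) of pixels that are strongly red. */
    static double redFraction(BufferedImage image) {
        long redPixels = 0;
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                if (r >= 200 && g <= 60 && b <= 60) {
                    redPixels++;
                }
            }
        }
        long total = (long) image.getWidth() * image.getHeight();
        return (double) redPixels / total;
    }

    /** Builds a 100x100 white test image containing a 20x10 pure red block. */
    static BufferedImage sampleImage() {
        BufferedImage image = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 100; y++) {
            for (int x = 0; x < 100; x++) {
                boolean inBlock = x >= 10 && x < 30 && y >= 10 && y < 20;
                image.setRGB(x, y, inBlock ? 0xFF0000 : 0xFFFFFF);
            }
        }
        return image;
    }
}
```

Calling `redFraction` on each image returned by the renderer and logging pages above some chosen fraction would narrow down which PDFs show the problem. A document that legitimately contains large red graphics would also be flagged, so this is only a first filter.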
was:
I'm trying to convert a PDF to a PNG file using PDFBox. Unfortunately, the output contains weird red areas in some places. I'm not sure what the problem is; it only happens with some PDF files.
Here's some of the code that I'm using:
```java
public static BufferedImage generateFromPdf(String ref, InputStream stream, int pageIndex, PreviewMode mode) throws IOException {
    PDDocument doc = null;
    try (InputStream buffered = new BufferedInputStream(stream)) {
        doc = PDDocument.load(buffered, PDF_LOADING_MEMORY_SETTING);
        if (pageIndex > doc.getNumberOfPages()) {
            return null;
        }
        PDFRenderer renderer = new PDFRenderer(doc);
        return rasterizePdfBox(ref, pageIndex, renderer, mode);
    } finally {
        if (doc != null) {
            doc.close();
        }
    }
}
```
and then:
```java
private static BufferedImage rasterizePdfBox(String ref, int pageIndex, PDFRenderer renderer, PreviewMode mode) throws IOException {
    Future<BufferedImage> result = executorService.submit(() -> {
        LOGGER.info(String.format("Generate preview for ref: %s, page: %s, mode: %s ", ref, pageIndex, mode.name()));
        return renderer.renderImageWithDPI(pageIndex - 1, mode.getDpi(), ImageType.RGB);
    });
    try {
        return result.get();
    } catch (InterruptedException | ExecutionException e) {
        LOGGER.error(String.format("Error when generating preview: %s", e.getMessage()));
        Thread.currentThread().interrupt();
        throw new IOException(e.getMessage());
    }
}
```
So far I've only figured out that the areas that are red in the output are blank when I open the file in `Master PDF editor` on Linux. They look normal, though, when I open it with `Document Viewer`.
Some hints:
- The problematic PDFs have been scanned. I can select the text in the parts that render correctly, but not in the places covered by the red overlay. Maybe it has something to do with OCR issues?
- If I run the Linux tool `convert not-working-pdf.pdf converted.pdf` and then convert the resulting file to PNG, the issue no longer occurs.
> Red areas around text when converting a pdf to png with pdfbox
> --------------------------------------------------------------
>
> Key: PDFBOX-4470
> URL: https://issues.apache.org/jira/browse/PDFBOX-4470
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 2.0.13
> Reporter: Michał Pomarański
> Priority: Major
> Attachments: 1206264-michael-sims-onboarding-documents-248 (1).pdf
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)