Posted to dev@pdfbox.apache.org by "Tilman Hausherr (Jira)" <ji...@apache.org> on 2021/02/04 17:15:01 UTC
[jira] [Updated] (PDFBOX-5097) Rendered pdf image lacks all the text in this particular case
[ https://issues.apache.org/jira/browse/PDFBOX-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr updated PDFBOX-5097:
------------------------------------
Labels: jbig2 (was: )
> Rendered pdf image lacks all the text in this particular case
> -------------------------------------------------------------
>
> Key: PDFBOX-5097
> URL: https://issues.apache.org/jira/browse/PDFBOX-5097
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 2.0.22
> Environment: Linux DamianPad 5.4.0-65-generic #73-Ubuntu SMP Mon Jan 18 17:25:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
> Reporter: Robert-Andrei Damian
> Priority: Major
> Labels: jbig2
> Attachments: 0.png, 1.png, document(3).pdf
>
>
> Hello,
> I am using PDFBox to convert input PDF files to images, which are later fed to an OCR library. This works perfectly in most cases, but I stumbled upon one particular file for which all text is missing from the rendered image.
> Here is the source code of the method that converts the PDF into images:
>
> {code:java}
> public List<BufferedImage> splitPdf(File pdfFile) throws IOException {
>     List<BufferedImage> result = new ArrayList<>();
>     // try-with-resources closes the document even if rendering throws
>     try (PDDocument document = PDDocument.load(pdfFile)) {
>         PDFRenderer pdfRenderer = new PDFRenderer(document);
>         for (int pageIndex = 0; pageIndex < document.getNumberOfPages(); pageIndex++) {
>             // renderImage(int) renders at the default scale (72 DPI)
>             BufferedImage pageImage = pdfRenderer.renderImage(pageIndex);
>             result.add(pageImage);
>             debugPageImageInfo(pageImage);
>         }
>     }
>     return result;
> }
> {code}
>
> I have attached the PDF file that shows the problem, together with the resulting images.
>
> I hope this is helpful for anyone else encountering the same problem!
>
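The "jbig2" label added in this update suggests, though the message itself does not say so, that the page text is stored as JBIG2-encoded images. PDFBox 2.0.x decodes such images through an ImageIO reader that is not bundled with the core library, so one thing worth checking is whether a reader for that format is registered at runtime. The following is only a sketch under that assumption: the class name Jbig2ReaderCheck and the helper hasReaderFor are made up for illustration, and it uses nothing but the standard javax.imageio API.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

// Hypothetical helper class, not part of PDFBox or of the original report.
public class Jbig2ReaderCheck {

    // Returns true if at least one ImageIO reader is registered for the given format name.
    static boolean hasReaderFor(String formatName) {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }

    public static void main(String[] args) {
        // Pick up any ImageIO plugins present on the classpath.
        ImageIO.scanForPlugins();

        if (hasReaderFor("JBIG2")) {
            System.out.println("A JBIG2 ImageIO reader is registered.");
        } else {
            // Assumption: without such a reader, JBIG2 image streams cannot be decoded,
            // which could explain a rendered page with no visible text.
            System.out.println("No JBIG2 ImageIO reader found on the classpath.");
        }
    }
}
```

If the check reports that no reader is present, adding the Apache jbig2-imageio plugin (Maven artifact org.apache.pdfbox:jbig2-imageio) to the classpath and rendering again would be a reasonable next step. Whether that resolves this particular document is not confirmed by this message.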