Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (Commented JIRA)" <ji...@apache.org> on 2012/03/25 15:53:28 UTC
[jira] [Commented] (PDFBOX-1072) PDFImageWriter extracts black
images from arabic PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237864#comment-13237864 ]
Andreas Lehmkühler commented on PDFBOX-1072:
--------------------------------------------
The PDF contains JBIG2-encoded images, which are not yet supported; see PDFBOX-1067 for details.
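Since the reporter mentions several hundred affected files, a rough pre-check for the JBIG2 filter may help with triage. The following is only a minimal sketch and not part of PDFBox: the class name Jbig2Check and the method mentionsJbig2 are hypothetical, and the check simply searches the raw file bytes for the filter name /JBIG2Decode. It assumes the filter name is written literally in an uncompressed stream dictionary, so it can miss files that spell the name with #-escapes or reference the filter indirectly.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not a PDFBox class: a heuristic scan for the JBIG2 filter name.
final class Jbig2Check {

    // Returns true if the raw PDF bytes contain the literal name "/JBIG2Decode".
    static boolean mentionsJbig2(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary stream data
        // in the file cannot cause decoding errors or shift the text we search.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.contains("/JBIG2Decode");
    }

    // Usage: java Jbig2Check file1.pdf file2.pdf ...
    public static void main(String[] args) throws Exception {
        for (String path : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(path));
            String verdict = mentionsJbig2(bytes) ? "JBIG2 filter found" : "no JBIG2 filter found";
            System.out.println(path + ": " + verdict);
        }
    }
}
```

Files flagged this way are candidates for the limitation tracked in PDFBOX-1067; a file that is not flagged may still be affected for the reasons given above.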
> PDFImageWriter extracts black images from arabic PDFs
> -----------------------------------------------------
>
> Key: PDFBOX-1072
> URL: https://issues.apache.org/jira/browse/PDFBOX-1072
> Project: PDFBox
> Issue Type: Bug
> Components: Utilities
> Affects Versions: 1.6.0
> Reporter: Anton Stremoukhov
> Labels: JBIG2
> Attachments: page9_thumbnail.png
>
>
> When I tried to extract a JPEG image from an Arabic PDF, I got a corrupted file with a black area that overlays all the Arabic text on each page.
> In the console I got only this debug message, with no exceptions or other errors:
> DEBUG (PDPixelMap.java:241) - ColorModel: IndexColorModel: #pixelBits = 1 numComponents = 4 color space = java.awt.color.ICC_ColorSpace@2eeb3c84 transparency = 2 transIndex = 1 has alpha = true isAlphaPre = false
> This is not the only PDF file affected. I have about 400-500 files that produce the same result.
> Code:
> PDFImageWriter writer = new PDFImageWriter();
> PDDocument document = PDDocument.load(sourceFile);
> writer.writeImage(document, "jpg", "", 1, 1, filename);