Posted to dev@pdfbox.apache.org by "Tilman Hausherr (Jira)" <ji...@apache.org> on 2022/10/20 17:56:00 UTC
[jira] [Commented] (PDFBOX-5531) wrong image data is extracted from PDF having single image
[ https://issues.apache.org/jira/browse/PDFBOX-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17621294#comment-17621294 ]
Tilman Hausherr commented on PDFBOX-5531:
-----------------------------------------
Please attach your PDF and retry with the latest version and explain what you expected to get.
> wrong image data is extracted from PDF having single image
> ----------------------------------------------------------
>
> Key: PDFBOX-5531
> URL: https://issues.apache.org/jira/browse/PDFBOX-5531
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 2.0.26
> Reporter: Komal
> Priority: Major
>
> Dear Concerned,
> We are trying to extract the image from a PDF that contains a single image with the following properties: CCITTFaxDecode (CCITT Group 4) compression, 150 dpi. But when we use the following PDFBox code, we get an LZW image at 96 dpi:
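> import java.awt.image.BufferedImage;
> import java.io.File;
> import org.apache.pdfbox.cos.COSName;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.PDPage;
> import org.apache.pdfbox.pdmodel.PDResources;
> import org.apache.pdfbox.pdmodel.graphics.PDXObject;
> import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
>
> try (PDDocument document = PDDocument.load(new File("D:\\extractImage\\in\\20211125174048BT Exception Documents.pdf"))) {
>     for (PDPage page : document.getPages()) {
>         PDResources pdResources = page.getResources();
>         for (COSName c : pdResources.getXObjectNames()) {
>             PDXObject o = pdResources.getXObject(c);
>             if (o instanceof PDImageXObject) {
>                 // getImage() returns the decoded pixels; a BufferedImage carries no dpi
>                 BufferedImage img = ((PDImageXObject) o).getImage();
>             }
>         }
>     }
> }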
>
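An image XObject in a PDF stores only its pixel dimensions; it has no dpi of its own. The resolution seen on the page depends on how large the page content draws the image, at 72 PDF user-space units per inch. The BufferedImage returned by getImage() carries no resolution either, so a file written from it without resolution metadata may be reported at a viewer's default (96 dpi is a common default on Windows). The following is a minimal sketch of the arithmetic, assuming the drawn size in points is already known (for example, from the current transformation matrix at the moment the image is painted); the EffectiveDpi class and the page and scan sizes are hypothetical.

```java
/**
 * Effective resolution of an image XObject as drawn on a page.
 * A PDF image stores only pixel dimensions; its "dpi" depends on the size
 * at which the page content scales it (1 inch = 72 PDF user-space units).
 */
final class EffectiveDpi {
    private static final double POINTS_PER_INCH = 72.0;

    private EffectiveDpi() {}

    /**
     * @param pixels      image width (or height) in pixels
     * @param drawnPoints drawn width (or height) of the image in PDF points
     * @return pixels per inch along that axis
     */
    static double dpi(int pixels, double drawnPoints) {
        if (pixels <= 0 || drawnPoints <= 0) {
            throw new IllegalArgumentException("pixels and drawnPoints must be positive");
        }
        return pixels / (drawnPoints / POINTS_PER_INCH);
    }

    public static void main(String[] args) {
        // Hypothetical 1275 x 1650 pixel scan drawn over a full US Letter page (612 x 792 pt)
        System.out.println(dpi(1275, 612.0)); // horizontal: 150.0
        System.out.println(dpi(1650, 792.0)); // vertical: 150.0
    }
}
```

The same image drawn over a smaller or larger area would yield a different effective dpi, which is why the value has to be derived from the page rather than read from the image itself.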
> Also, when we try to get the raw stream bytes of the image using the following code, the byte array returned is incorrect:
> PDPage page1 = reader.getPage(pageNumber - 1); // reader: the loaded PDDocument
> PDResources pdResources = page1.getResources();
> for (COSName c : pdResources.getXObjectNames()) {
>     PDXObject o = pdResources.getXObject(c);
>     if (!(o instanceof PDImageXObject)) {
>         continue; // skip form XObjects, which cannot be cast to PDImageXObject
>     }
>     PDImageXObject ob = (PDImageXObject) o;
>     COSStream imageStream = ob.getCOSObject();
>     PDStream stream = new PDStream(imageStream);
>     // BufferedImage image = ob.getImage();
>     // toByteArray() applies the stream's filters (e.g. CCITTFaxDecode), so this holds decoded samples
>     byte[] streamDataBuffer = stream.toByteArray();
> }
>
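In PDFBox 2.0.x, PDStream.toByteArray() reads the stream through its filters, so the array holds the decoded image samples rather than the CCITT G4 bytes stored in the file; COSStream.createRawInputStream() reads the undecoded bytes instead. One way to check which of the two an array holds is to compare its length with the size of fully decoded samples: each row of PDF image data starts on a byte boundary, so a row needs ceil(width * components * bitsPerComponent / 8) bytes. The sketch below assumes the dimensions are taken from the image (for example, PDImageXObject.getWidth(), getHeight(), getBitsPerComponent() and getColorSpace().getNumberOfComponents()); DecodedImageSize and its methods are hypothetical helpers, not part of PDFBox.

```java
/**
 * Expected length of fully decoded PDF image sample data.
 * Each row starts on a byte boundary, so a row needs
 * ceil(width * components * bitsPerComponent / 8) bytes.
 */
final class DecodedImageSize {
    private DecodedImageSize() {}

    static long expectedDecodedLength(int width, int height, int components, int bitsPerComponent) {
        long bitsPerRow = (long) width * components * bitsPerComponent;
        long bytesPerRow = (bitsPerRow + 7) / 8; // round each row up to a whole byte
        return bytesPerRow * height;
    }

    /** True if the array has exactly the size of the decoded (unfiltered) samples. */
    static boolean looksDecoded(byte[] data, int width, int height, int components, int bitsPerComponent) {
        return data.length == expectedDecodedLength(width, height, components, bitsPerComponent);
    }

    public static void main(String[] args) {
        // Hypothetical 1-bit, 1-component (black and white) page scan of 1240 x 1754 pixels
        System.out.println(expectedDecodedLength(1240, 1754, 1, 1)); // 271870
    }
}
```

A G4-compressed scan of mostly white paper is usually far smaller than this figure, so an array whose length matches it exactly is almost certainly decoded data.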
> Kindly provide a method that returns a black-and-white image object and the raw image stream bytes.
> Thanks in advance.
> Regards,
> Komal Walia
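If the BufferedImage returned by getImage() is not already a 1-bit image, it can be converted into one with the standard java.awt.image API. The following is a minimal sketch, not a PDFBox method: the Bilevel class, the fixed threshold of 128 and the BT.601 luma weights are assumptions, and no dithering is applied, which suits scanned black-and-white pages better than photographs.

```java
import java.awt.image.BufferedImage;

/**
 * Converts any BufferedImage to a 1-bit black-and-white image using a fixed
 * luminance threshold (a simple sketch; no dithering, alpha is ignored).
 */
final class Bilevel {
    private static final int BLACK = 0xFF000000;
    private static final int WHITE = 0xFFFFFFFF;

    private Bilevel() {}

    static BufferedImage toBilevel(BufferedImage src, int threshold) {
        BufferedImage dst = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // ITU-R BT.601 luma weights in integer arithmetic
                int luma = (r * 299 + g * 587 + b * 114) / 1000;
                dst.setRGB(x, y, luma < threshold ? BLACK : WHITE);
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        BufferedImage src = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        src.setRGB(0, 0, 0x202020); // dark grey
        src.setRGB(1, 0, 0xE0E0E0); // light grey
        BufferedImage bw = toBilevel(src, 128);
        System.out.printf("%08X %08X%n", bw.getRGB(0, 0), bw.getRGB(1, 0)); // FF000000 FFFFFFFF
    }
}
```

Writing black and white pixels explicitly keeps the result deterministic, whereas drawing the source with Graphics2D leaves the color mapping to the rendering pipeline.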
--
This message was sent by Atlassian Jira
(v8.20.10#820010)