Posted to users@pdfbox.apache.org by Abid Hussain <ab...@abid76.de> on 2009/01/20 12:16:33 UTC
extract images
Hello everybody,
I'm trying to extract images from a PDF file, but it won't work... :-(
I tried the ExtractImages.exe which results in:
>ExtractImages.exe "C:\path\to\pdf_file"
Exception in thread "main" java.lang.NullPointerException
at org.pdfbox.ExtractImages.extractImages(ExtractImages.java:138)
at org.pdfbox.ExtractImages.main(ExtractImages.java:72)
Then I tried to extract the images using code I copied from the ExtractImages class:
Here's a snippet:
PDXObjectImage image = (PDXObjectImage) images.get(key);
String name = getUniqueFileName(key, image.getSuffix());
image.write2file(name);
The execution of the last line results in:
java.util.zip.ZipException: unknown compression method
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:140)
at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:110)
at org.pdfbox.cos.COSStream.doDecode(COSStream.java:290)
at org.pdfbox.cos.COSStream.doDecode(COSStream.java:235)
at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170)
at org.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:226)
at org.pdfbox.pdmodel.common.PDStream.getByteArray(PDStream.java:481)
at org.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:138)
at org.pdfbox.pdmodel.graphics.xobject.PDPixelMap.write2OutputStream(PDPixelMap.java:166)
at org.pdfbox.pdmodel.graphics.xobject.PDXObjectImage.write2file(PDXObjectImage.java:118)
at de.thecode.pdf.pdfbox.ExtractImages.extractImages(ExtractImages.java:52)
at de.thecode.pdf.pdfbox.ExtractImages.main(ExtractImages.java:30)
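A note on the "unknown compression method" message: zlib reports it when the low four bits of the first stream byte are not 8 (deflate), so one possible first diagnostic is to look at the first two bytes of the raw, still-encoded image stream. The sketch below is only an illustration of that check as described in RFC 1950; the class and method names are hypothetical, it is not part of PDFBox, and a failing check does not by itself tell you why the stream is not plain zlib data.

```java
public class ZlibHeaderCheck {

    /**
     * Returns true if the first two bytes look like a zlib (RFC 1950) header:
     * compression method 8 (deflate), window size field at most 7, and a
     * 16-bit header value that is a multiple of 31.
     */
    public static boolean looksLikeZlib(byte[] data) {
        if (data == null || data.length < 2) {
            return false;
        }
        int cmf = data[0] & 0xFF; // compression method and flags
        int flg = data[1] & 0xFF; // additional flags, includes check bits
        boolean deflate = (cmf & 0x0F) == 8;
        boolean windowOk = (cmf >> 4) <= 7;
        boolean checkOk = ((cmf << 8) | flg) % 31 == 0;
        return deflate && windowOk && checkOk;
    }

    public static void main(String[] args) {
        byte[] typical = {0x78, (byte) 0x9C}; // common zlib header
        byte[] other = {0x47, 0x49};          // low nibble is 7, not deflate
        System.out.println(looksLikeZlib(typical));
        System.out.println(looksLikeZlib(other));
    }
}
```

If the check fails for the bytes that reach the Flate decoder, the data is probably not a plain zlib stream at that point, for example because another filter has to be applied first or because the stream offset is wrong.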
Does anybody know how to get the image extraction to work correctly?
Best regards,
Abid
--
Abid Hussain
RE: extract images
Posted by "Balasubramaniam, Balaji" <Ba...@ejgallo.com>.
The patch is in the SVN repository. You have to update your workspace from the
SVN location
http://svn.apache.org/repos/asf/incubator/pdfbox/trunk/
and then build the project using Ant.
Re: extract images
Posted by Abid Hussain <ab...@abid76.de>.
Thanks for help. Where can I find the provided patch? I looked in the jira but
didn't find anything. Maybe I have overlooked something?
Regards,
Abid
RE: extract images
Posted by Pe...@ibi.com.
Abid,
This bug may be the same bug that was just patched.
The line of code it is blowing up on is the same as another bug report.
" RE: java.io.EOFException: Unexpected end of ZLIB input stream"
Please get the Patch that Andreas talks about and try that.
Good Luck,
Peter
Hi Peter,
I've checked all critical locations in org.apache.pdfbox.filter.FlateFilter
and provided a patch.
Thank you for your help.
BR
Andreas
Peter_Lenahan@ibi.com wrote:
> I forgot to add the number of bytes available in the variable mayRead
> to the while statement in the earlier message. Version 2 is below.
>
>
> int mayRead=compressedData.available(); // pjl
> while ((mayRead > 0 &&
> (amountRead = decompressor.read(buffer, 0,
> Math.min(mayRead,BUFFER_SIZE))) != -1))
>
> -----Original Message-----
> From: Lenahan, Peter
> Sent: Friday, January 16, 2009 10:26 AM
> To: pdfbox-users@incubator.apache.org
> Subject: RE: java.io.EOFException: Unexpected end of ZLIB input stream
> error message on UNIX box
>
> I did a Google search on your issue for "InflaterInputStream read
> Unexpected end of ZLIB". It came up with about 854 results, and there
> are a couple of solutions.
>
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4040920
>
> Work Around
> The workaround is to never attempt to read more bytes than the entry
> contains. Call ZipEntry.getSize() to get the actual size of the entry,
> then use this value to keep track of the number of bytes remaining in
> the entry while reading from it. To take the previous example:
>
> This code change may solve the issue for PDFBox.
>
> at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:97)
> Add the Math.min() to reduce the number of bytes you are trying to read.
>
> int mayRead=compressedData.available();
> while ((amountRead = decompressor.read(buffer, 0,
> Math.min(mayRead,BUFFER_SIZE))) != -1)
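For readers who want to try the idea outside PDFBox, here is a minimal, self-contained sketch of a tolerant inflate loop. It is not the patch that went into FlateFilter, and the class and method names are hypothetical. Instead of limiting reads with available() (which is only an estimate of what can be read without blocking and may be 0 even when more data follows), this variant reads normally and keeps whatever was decoded if the stream ends early or is malformed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;
import java.util.zip.ZipException;

public class TolerantInflate {

    private static final int BUFFER_SIZE = 2048;

    /**
     * Inflates a zlib stream and returns the decoded bytes. If the input is
     * truncated or malformed, the bytes decoded so far are returned instead
     * of propagating the exception. The input stream is closed on return.
     */
    public static byte[] inflate(InputStream compressedData) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        byte[] buffer = new byte[BUFFER_SIZE];
        try (InflaterInputStream decompressor = new InflaterInputStream(compressedData)) {
            int amountRead;
            while ((amountRead = decompressor.read(buffer, 0, BUFFER_SIZE)) != -1) {
                result.write(buffer, 0, amountRead);
            }
        } catch (EOFException | ZipException e) {
            // Truncated or malformed input: fall through with partial output.
        }
        return result.toByteArray();
    }

    /** Helper for the demo: compresses raw bytes into a zlib stream. */
    public static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out)) {
            dos.write(raw);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "sample content stream, sample content stream"
                .getBytes(StandardCharsets.UTF_8);
        byte[] packed = deflate(raw);

        // Complete stream: round trip restores the input.
        byte[] full = inflate(new ByteArrayInputStream(packed));
        System.out.println("full length: " + full.length);

        // Truncated stream: no exception, possibly fewer bytes.
        byte[] cut = java.util.Arrays.copyOf(packed, packed.length / 2);
        byte[] partial = inflate(new ByteArrayInputStream(cut));
        System.out.println("partial length: " + partial.length);
    }
}
```

Whether silently returning partial data is acceptable depends on the caller. For a corruption check like the one discussed in this thread, it may be better to log or rethrow the exception rather than swallow it.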
>
>
>
> I found another potential issue like this with a solution on the Sun
> site.
> It was described using windows, but the same could happen on UNIX.
> It suggests that the issue could happen if you are running several
> processes against the same directory. Please look this over to see if
> this is the problem. Are you running multiple processes to accomplish
> the job faster?
>
> http://forums.sun.com/thread.jspa?threadID=5316308
>
> paul.miner
> Posts:2,639
> Registered: 10/8/07
> Re: Unexpected end of ZLIB input stream error while compiling
> Jul 22, 2008 6:54 AM (reply 1 of 2) (In reply to original post )
>
> koko191 wrote:
> Main batch :
> start /B %SWIFT_LOCAL_HOME%\scripts\rmicAll.bat
> start /B %SWIFT_LOCAL_HOME%\scripts\create_jar.bat
>
> The "start" command does not wait for the command to finish, so both
> those batch files would be running in parallel. If they both work on
> the same jar, this could be a problem.
>
> If you want to run the batch files in sequence, use "call".
>
> -----Original Message-----
> From: Balasubramaniam, Balaji
> [mailto:Balaji.Balasubramaniam@ejgallo.com]
> Sent: Tuesday, January 13, 2009 7:05 PM
> To: pdfbox-users@incubator.apache.org
> Subject: java.io.EOFException: Unexpected end of ZLIB input stream
> error message on UNIX box
>
> Hello,
>
>
>
> I'm trying to use PDFBox to identify whether a PDF file is corrupted or not.
> We are trying to automate a process that loops through a given folder
> and counts how many of the PDF files are corrupted. This program works
> fine in a Windows XP environment (OS version: x86 Windows XP 5.1, Java
> version: Java HotSpot(tm) Client VM 1.5.0-15-b04). When we ran this
> application on a UNIX box (OS version: PA_RISC2.0 HP-UX B.11.23, Java
> version: Java HotSpot(tm) Client VM 1.5.0.11 jinteg:11.07.07-09:52
> PA2.0(aCC_AP)), it throws the following error.
>
>
>
> NOTE: This error does not happen all the time. It is thrown only for
> some of the PDF files. Those PDF files are not corrupted; I can open
> them manually without problems.
>
>
>
> java.io.EOFException: Unexpected end of ZLIB input stream
> at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:216)
> at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:134)
> at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:97)
> at org.pdfbox.cos.COSStream.doDecode(COSStream.java:290)
> at org.pdfbox.cos.COSStream.doDecode(COSStream.java:235)
> at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170)
> at org.pdfbox.pdmodel.common.COSStreamArray.getUnfilteredStream(COSStreamArray.java:200)
> at org.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:101)
> at ProcessDefinitions.RunAuditProcess.RunAuditProcessGenerateAuditLogMessage.invoke(RunAuditProcessGenerateAuditLogMessage.java:212)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:585)
> at com.tibco.plugin.java.JavaActivity.eval(JavaActivity.java:383)
> at com.tibco.pe.plugin.Activity.eval(Activity.java:209)
> at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:540)
> at com.tibco.pe.core.Job.a(Job.java:712)
> at com.tibco.pe.core.Job.k(Job.java:501)
> at com.tibco.pe.core.JobDispatcher$JobCourier.a(JobDispatcher.java:249)
> at com.tibco.pe.core.JobDispatcher$JobCourier.run(JobDispatcher.java:200)
>
>
>
> Sample code snippet I use to do the task.
>
>
>
> PDDocument document = PDDocument.load(<input stream>);
> List pages = document.getDocumentCatalog().getAllPages();
> if (pages != null) {
>     // The index i was undefined in the original snippet;
>     // a loop over all pages is assumed here.
>     for (int i = 0; i < pages.size(); i++) {
>         PDPage page = (PDPage) pages.get(i);
>         PDStream contents = page.getContents();
>         try {
>             // Parsing fails with an exception if the stream cannot be decoded
>             PDFStreamParser parser = new PDFStreamParser(contents.getStream());
>         } catch (Exception e) {
>             System.err.println("This PDF cannot be read. Most possibly it could be corrupted. " + pdfFileName);
>         }
>     }
> }
>
>
>
> Could somebody shed some light on this one?
>
>
>
> Thank you.
>
>
--
The packaging said "requires Windows 9x/2000/XP or BETTER", so I installed Linux.