Posted to users@pdfbox.apache.org by Abid Hussain <ab...@abid76.de> on 2009/01/20 12:16:33 UTC

extract images

Hello everybody,

I'm trying to extract images from a PDF file, but it doesn't work... :-(

I tried the ExtractImages.exe which results in:
 >ExtractImages.exe "C:\path\to\pdf_file"
Exception in thread "main" java.lang.NullPointerException
         at org.pdfbox.ExtractImages.extractImages(ExtractImages.java:138)
         at org.pdfbox.ExtractImages.main(ExtractImages.java:72)

Then I tried to extract the images using code I copied from the ExtractImages class:
Here's a snippet:
PDXObjectImage image = (PDXObjectImage) images.get(key);
String name = getUniqueFileName(key, image.getSuffix());
image.write2file(name);

The execution of the last line results in:
java.util.zip.ZipException: unknown compression method
	at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:140)
	at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:110)
	at org.pdfbox.cos.COSStream.doDecode(COSStream.java:290)
	at org.pdfbox.cos.COSStream.doDecode(COSStream.java:235)
	at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170)
	at org.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:226)
	at org.pdfbox.pdmodel.common.PDStream.getByteArray(PDStream.java:481)
	at org.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:138)
	at org.pdfbox.pdmodel.graphics.xobject.PDPixelMap.write2OutputStream(PDPixelMap.java:166)
	at org.pdfbox.pdmodel.graphics.xobject.PDXObjectImage.write2file(PDXObjectImage.java:118)
	at de.thecode.pdf.pdfbox.ExtractImages.extractImages(ExtractImages.java:52)
	at de.thecode.pdf.pdfbox.ExtractImages.main(ExtractImages.java:30)

Does anybody know how to get the image extraction to work correctly?

Best regards,

Abid

-- 

Abid Hussain
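
Editorial note: the "unknown compression method" message in the trace above is what java.util.zip reports when the first bytes handed to the inflater do not describe a deflate stream. One possible cause (an assumption, not confirmed in this thread) is that the image stream is not plain zlib data at the point where FlateFilter reads it. A minimal sketch of the two-byte zlib header check from RFC 1950 follows; the class and method names are hypothetical and are not part of PDFBox.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;

final class ZlibHeaderCheck {

    /**
     * Returns true if the first two bytes look like a zlib (RFC 1950) header:
     * compression method 8 (deflate) and a header value divisible by 31.
     */
    static boolean looksLikeZlibStream(byte[] data) {
        if (data == null || data.length < 2) {
            return false;
        }
        int cmf = data[0] & 0xFF;   // compression method and flags
        int flg = data[1] & 0xFF;   // flags, including the check bits
        boolean deflateMethod = (cmf & 0x0F) == 8;
        boolean headerCheckOk = ((cmf << 8) | flg) % 31 == 0;
        return deflateMethod && headerCheckOk;
    }

    /** Helper for trying the check: compresses bytes with the default zlib wrapper. */
    static byte[] zlibCompress(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out)) {
            dos.write(raw);
        }
        return out.toByteArray();
    }
}
```

Running a check like this on the raw stream bytes before decoding can help tell a damaged or differently encoded stream apart from a bug in the read loop.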

RE: extract images

Posted by "Balasubramaniam, Balaji" <Ba...@ejgallo.com>.
The patch is in the SVN repository. You have to update your workspace from the
SVN location

http://svn.apache.org/repos/asf/incubator/pdfbox/trunk/

and then build the project using Ant.


Re: extract images

Posted by Abid Hussain <ab...@abid76.de>.
Thanks for the help. Where can I find the provided patch? I looked in JIRA but 
didn't find anything. Maybe I have overlooked something?

Regards,

Abid


-- 

Abid Hussain

RE: extract images

Posted by Pe...@ibi.com.
Abid,

This may be the same bug that was just patched.
The line of code it fails on is the same as in another bug report:
"RE: java.io.EOFException: Unexpected end of ZLIB input stream"

Please get the patch that Andreas mentions and try it.

Good Luck,
Peter


Hi Peter,

I've checked all critical locations in org.apache.pdfbox.filter.FlateFilter
and provided a patch.

Thank you for your help.

BR
Andreas
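
Editorial note: the patch itself is not shown in this thread. As a rough illustration of the kind of read loop under discussion, here is a self-contained sketch that inflates a stream and, if the compressed data ends early, keeps whatever was decoded up to that point. This is a different technique from bounding each read by available(), and it is not the actual FlateFilter code; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class LenientInflate {
    private static final int BUFFER_SIZE = 2048;

    /**
     * Inflates the given zlib stream. If the compressed data is truncated,
     * the bytes decoded so far are returned instead of failing.
     * Other errors (for example a ZipException for bad data) still propagate.
     */
    static byte[] decodeLenient(InputStream compressedData) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        byte[] buffer = new byte[BUFFER_SIZE];
        try (InflaterInputStream decompressor = new InflaterInputStream(compressedData)) {
            int amountRead;
            while ((amountRead = decompressor.read(buffer, 0, BUFFER_SIZE)) != -1) {
                result.write(buffer, 0, amountRead);
            }
        } catch (EOFException e) {
            // "Unexpected end of ZLIB input stream": keep the partial output.
        }
        return result.toByteArray();
    }

    /** Helper for trying the loop: compresses bytes with the default zlib wrapper. */
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out)) {
            dos.write(raw);
        }
        return out.toByteArray();
    }
}
```

Whether partial output is acceptable depends on the caller: for a corruption check it would hide exactly the condition being tested, so such a loop would need to report the early end as well.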

Peter_Lenahan@ibi.com wrote:
> In the earlier message I forgot to add a check on the number of bytes 
> available (the variable mayRead) to the while statement. Version 2 is below.
> 
> 
>      int mayRead=compressedData.available(); // pjl
>      while ((mayRead > 0 && 
>             (amountRead = decompressor.read(buffer, 0, 
>                                Math.min(mayRead,BUFFER_SIZE))) != -1))
> 
> -----Original Message-----
> From: Lenahan, Peter
> Sent: Friday, January 16, 2009 10:26 AM
> To: pdfbox-users@incubator.apache.org
> Subject: RE: java.io.EOFException: Unexpected end of ZLIB input stream 
> error message on UNIX box
> 
> I did a Google search on your issue ("InflaterInputStream read Unexpected 
> end of ZLIB"), which returned about 854 results. There are a couple of 
> solutions.
> 
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4040920
> 
> Work Around	
> The workaround is to never attempt to read more bytes than the entry 
> contains. Call ZipEntry.getSize() to get the actual size of the entry, 
> then use this value to keep track of the number of bytes remaining in 
> the entry while reading from it. To take the previous example:
> 
> This code change may solve the issue for PDFBox. 
> 
> at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:97)
> Add the Math.min() to reduce the number of bytes you are trying to read.
> 
>                 int mayRead=compressedData.available();
>                 while ((amountRead = decompressor.read(buffer, 0,
>                                    Math.min(mayRead,BUFFER_SIZE))) != -1)
> 
> 
> 
> I found another potential issue like this, with a solution, on the Sun 
> site.
> It was described on Windows, but the same could happen on UNIX.
> It suggests that the issue could happen if you are running several 
> processes against the same directory. Please look this over to see if 
> this is the problem. Are you running multiple processes to accomplish 
> the job faster?
> 
> http://forums.sun.com/thread.jspa?threadID=5316308
> 
> paul.miner, Jul 22, 2008 6:54 AM, in the thread
> "Unexpected end of ZLIB input stream error while compiling":
> 
> koko191 wrote:
> Main batch :
> start /B %SWIFT_LOCAL_HOME%\scripts\rmicAll.bat
> start /B %SWIFT_LOCAL_HOME%\scripts\create_jar.bat
> 
> The "start" command does not wait for the command to finish, so both 
> those batch files would be running in parallel. If they both work on 
> the same jar, this could be a problem.
> 
> If you want to run the batch files in sequence, use "call".
> 
> -----Original Message-----
> From: Balasubramaniam, Balaji
> [mailto:Balaji.Balasubramaniam@ejgallo.com]
> Sent: Tuesday, January 13, 2009 7:05 PM
> To: pdfbox-users@incubator.apache.org
> Subject: java.io.EOFException: Unexpected end of ZLIB input stream 
> error message on UNIX box
> 
> Hello,
> 
>  
> 
> I'm trying to use PDFBox to identify whether a PDF file is corrupted. 
> We are trying to automate a process that loops through a given folder 
> and reports how many of the PDF files are corrupted. This program works 
> fine in a Windows XP environment (OS version: x86 Windows XP 5.1, Java 
> version: Java HotSpot(tm) Client VM 1.5.0-15-b04). When we ran this 
> application on a UNIX box (OS version: PA_RISC2.0 HP-UX B.11.23, Java 
> version: Java HotSpot(tm) Client VM 1.5.0.11 jinteg:11.07.07-09:52 
> PA2.0(aCC_AP)), it throws the following error.
> 
> NOTE: This error does not happen every time. It is thrown only for 
> some of the PDF files. Those PDF files are not corrupted; I can open 
> them manually without problems.
> 
>  
> 
> java.io.EOFException: Unexpected end of ZLIB input stream
>         at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:216)
>         at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:134)
>         at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:97)
>         at org.pdfbox.cos.COSStream.doDecode(COSStream.java:290)
>         at org.pdfbox.cos.COSStream.doDecode(COSStream.java:235)
>         at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170)
>         at org.pdfbox.pdmodel.common.COSStreamArray.getUnfilteredStream(COSStreamArray.java:200)
>         at org.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:101)
>         at ProcessDefinitions.RunAuditProcess.RunAuditProcessGenerateAuditLogMessage.invoke(RunAuditProcessGenerateAuditLogMessage.java:212)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:585)
>         at com.tibco.plugin.java.JavaActivity.eval(JavaActivity.java:383)
>         at com.tibco.pe.plugin.Activity.eval(Activity.java:209)
>         at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:540)
>         at com.tibco.pe.core.Job.a(Job.java:712)
>         at com.tibco.pe.core.Job.k(Job.java:501)
>         at com.tibco.pe.core.JobDispatcher$JobCourier.a(JobDispatcher.java:249)
>         at com.tibco.pe.core.JobDispatcher$JobCourier.run(JobDispatcher.java:200)
> 
>  
> 
> Sample code snippet I use to do the task.
> 
>  
> 
> // inputStream: the PDF being checked; pdfFileName: its name, used for reporting
> PDDocument document = PDDocument.load(inputStream);
> List pages = document.getDocumentCatalog().getAllPages();
> if (pages != null && pages.size() > 0) {
>   // visit every page (the original snippet used an undeclared index i)
>   for (int i = 0; i < pages.size(); i++) {
>     PDPage page = (PDPage) pages.get(i);
>     PDStream contents = page.getContents();
>     PDFStreamParser parser = null;
>     try {
>       // constructing the parser decodes the content stream
>       parser = new PDFStreamParser(contents.getStream());
>     } catch (Exception e) {
>       System.err.println("This PDF cannot be read. Most possibly it could be corrupted. " + pdfFileName);
>     }
>   }
> }
> 
>  
> 
> Could somebody shed some light on this one?
> 
>  
> 
> Thank you.
> 
> 


--
The packaging said "requires Windows 9x/2000/XP or BETTER", so I installed Linux.
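
Editorial note: the Sun workaround quoted above (never read more bytes than the entry contains, tracking the remaining count from ZipEntry.getSize()) can be sketched as follows. This applies to ZIP archive entries rather than to PDF streams, and the class and method names are hypothetical. getSize() returns -1 when the size is not known, so the sketch refuses to read in that case.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

final class BoundedZipEntryRead {

    /** Reads an entry without ever asking for more bytes than getSize() reports. */
    static byte[] readEntry(ZipFile zip, ZipEntry entry) throws IOException {
        long remaining = entry.getSize();
        if (remaining < 0) {
            throw new IOException("Entry size is unknown: " + entry.getName());
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[2048];
        try (InputStream in = zip.getInputStream(entry)) {
            while (remaining > 0) {
                int wanted = (int) Math.min(remaining, buffer.length);
                int amountRead = in.read(buffer, 0, wanted);
                if (amountRead == -1) {
                    break; // entry ended earlier than its declared size
                }
                out.write(buffer, 0, amountRead);
                remaining -= amountRead;
            }
        }
        return out.toByteArray();
    }

    /** Helper for trying the reader: writes a one-entry archive to a temp file. */
    static File writeSampleZip(String entryName, byte[] content) throws IOException {
        File file = File.createTempFile("bounded-read", ".zip");
        file.deleteOnExit();
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(file))) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(content);
            zos.closeEntry();
        }
        return file;
    }
}
```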


