Posted to users@pdfbox.apache.org by flywire <fl...@gmail.com> on 2022/02/11 20:36:31 UTC
Command-Line Tools ExtractImages Tiff
Scanning devices commonly embed TIFF images in PDF files, and the Command-Line
Tools ExtractImages command does not support that format. It seems TIFF
support would not be difficult to add:
https://www.gnostice.com/nl_article.asp?id=203&t=Convert_Multi-Page_
Could it be added?
Usage: java org.apache.pdfbox.tools.ExtractImages [options] <inputfile>
Options:
-directJPEG : Forces the direct extraction of JPEG/JPX images
regardless of colorspace or masking
E:\Scans>java -jar pdfbox-app-2.0.25.jar ExtractImages doc.pdf
Writing image: doc-1
Writing image: doc-2
Exception in thread "main" java.util.ServiceConfigurationError: javax.imageio.spi.ImageReaderSpi: Error reading configuration file
	at java.util.ServiceLoader.fail(Unknown Source)
	at java.util.ServiceLoader.parse(Unknown Source)
	at java.util.ServiceLoader.access$200(Unknown Source)
	at java.util.ServiceLoader$LazyIterator.hasNextService(Unknown Source)
	at java.util.ServiceLoader$LazyIterator.hasNext(Unknown Source)
	at java.util.ServiceLoader$1.hasNext(Unknown Source)
	at javax.imageio.spi.IIORegistry.registerApplicationClasspathSpis(Unknown Source)
	at javax.imageio.spi.IIORegistry.<init>(Unknown Source)
	at javax.imageio.spi.IIORegistry.getDefaultInstance(Unknown Source)
	at javax.imageio.ImageIO.<clinit>(Unknown Source)
	at org.apache.pdfbox.tools.imageio.ImageIOUtil.writeImage(ImageIOUtil.java:249)
	at org.apache.pdfbox.tools.imageio.ImageIOUtil.writeImage(ImageIOUtil.java:216)
	at org.apache.pdfbox.tools.imageio.ImageIOUtil.writeImage(ImageIOUtil.java:192)
	at org.apache.pdfbox.tools.imageio.ImageIOUtil.writeImage(ImageIOUtil.java:167)
	at org.apache.pdfbox.tools.ExtractImages$ImageGraphicsEngine.write2file(ExtractImages.java:505)
	at org.apache.pdfbox.tools.ExtractImages$ImageGraphicsEngine.drawImage(ExtractImages.java:271)
	at org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:67)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:939)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:514)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:492)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:155)
	at org.apache.pdfbox.tools.ExtractImages$ImageGraphicsEngine.run(ExtractImages.java:219)
	at org.apache.pdfbox.tools.ExtractImages.extract(ExtractImages.java:197)
	at org.apache.pdfbox.tools.ExtractImages.run(ExtractImages.java:158)
	at org.apache.pdfbox.tools.ExtractImages.main(ExtractImages.java:97)
	at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:64)
Caused by: java.io.FileNotFoundException: E:\Scans\pdfbox (The system cannot find the file specified)
	at java.util.zip.ZipFile.open(Native Method)
	at java.util.zip.ZipFile.<init>(Unknown Source)
	at java.util.zip.ZipFile.<init>(Unknown Source)
	at java.util.jar.JarFile.<init>(Unknown Source)
	at java.util.jar.JarFile.<init>(Unknown Source)
	at sun.net.www.protocol.jar.URLJarFile.<init>(Unknown Source)
	at sun.net.www.protocol.jar.URLJarFile.getJarFile(Unknown Source)
	at sun.net.www.protocol.jar.JarFileFactory.getOrCreate(Unknown Source)
	at sun.net.www.protocol.jar.JarURLConnection.connect(Unknown Source)
	at sun.net.www.protocol.jar.JarURLConnection.getInputStream(Unknown Source)
	at java.net.URL.openStream(Unknown Source)
	... 25 more
Re: Command-Line Tools ExtractImages Tiff
Posted by Tilman Hausherr <TH...@t-online.de>.
Your problem here is something different: the exception mentions
E:\Scans\pdfbox (The system cannot find the file specified).
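The "Caused by" section of the trace suggests that, while Image I/O was registering plugins from the classpath, java.util.ServiceLoader tried to open a copy of the service file META-INF/services/javax.imageio.spi.ImageReaderSpi through a jar: URL pointing at E:\Scans\pdfbox, which does not exist. A minimal Java sketch for locating such an entry, assuming it is run with a classpath equivalent to the failing command's: it lists every copy of that service file the context class loader can see and tries to open each one, much as ServiceLoader does. The class ServiceFileCheck and its method check are hypothetical, not part of PDFBox or the JDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical diagnostic helper, not part of PDFBox.
public class ServiceFileCheck {
    static final String IMAGE_READER_SPI =
            "META-INF/services/javax.imageio.spi.ImageReaderSpi";

    /** Returns one report line per copy of the resource visible to the loader. */
    static List<String> check(ClassLoader loader, String resource) throws IOException {
        List<String> report = new ArrayList<>();
        Enumeration<URL> urls = loader.getResources(resource);
        while (urls.hasMoreElements()) {
            URL url = urls.nextElement();
            // Opening the stream is the step that failed in the trace above.
            try (InputStream in = url.openStream()) {
                report.add("OK     " + url);
            } catch (IOException e) {
                report.add("FAILED " + url + " -> " + e);
            }
        }
        return report;
    }

    public static void main(String[] args) throws IOException {
        // Image I/O looks up application plugins via the context class loader.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (String line : check(loader, IMAGE_READER_SPI)) {
            System.out.println(line);
        }
    }
}
```

A FAILED line names the classpath entry that cannot be opened. Running it with -cp instead of -jar may not reproduce the original setup exactly, so treat an all-OK result as a hint rather than proof.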
Scanners don't embed TIFF files; they embed CCITT-compressed streams, which
PDFBox supports. To extract them, you need the appropriate imaging libraries.
Put them in a lib subfolder and then call PDFBox like this:
java -cp "pdfbox-app-2.0.25.jar;lib/*" org.apache.pdfbox.tools.PDFBox ExtractImages .....
On a non-Windows OS, use ":" instead of ";" as the classpath separator.
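A minimal Java sketch for checking this setup, assuming that ExtractImages writes CCITT images out as TIFF through Image I/O: it prints a -cp value built with the running platform's separator (File.pathSeparator, ";" on Windows and ":" elsewhere) and reports whether any registered Image I/O plugin can write TIFF. The class ImagingCheck and its methods are hypothetical; run it with the same -cp value as the PDFBox command so that the jars in lib are included.

```java
import java.io.File;
import javax.imageio.ImageIO;

// Hypothetical setup check, not part of PDFBox.
public class ImagingCheck {
    /** True if some registered Image I/O plugin can write the given format name. */
    static boolean canWrite(String formatName) {
        return ImageIO.getImageWritersByFormatName(formatName).hasNext();
    }

    /** Joins classpath entries with the platform separator. */
    static String classpath(String... entries) {
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        System.out.println("-cp \"" + classpath("pdfbox-app-2.0.25.jar", "lib/*") + "\"");
        for (String format : new String[] {"tiff", "png"}) {
            System.out.println(format + " writer available: " + canWrite(format));
        }
    }
}
```

On Java 8 the standard Image I/O has no TIFF writer, so "tiff" should only report true once a TIFF plugin is on the classpath; Java 9 and later include a TIFF plugin in the JDK itself.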
Tilman