Posted to users@pdfbox.apache.org by flywire <fl...@gmail.com> on 2022/02/11 20:36:31 UTC

Command-Line Tools ExtractImages Tiff

Scanning devices commonly embed TIFF in PDF files, and the format is not
supported by Command-Line Tools ExtractImages. It seems TIFF support would
not be difficult to add:
https://www.gnostice.com/nl_article.asp?id=203&t=Convert_Multi-Page_

Could it be added?

Usage: java org.apache.pdfbox.tools.ExtractImages [options] <inputfile>

Options:
  -directJPEG            : Forces the direct extraction of JPEG/JPX images
                           regardless of colorspace or masking

E:\Scans>java -jar pdfbox-app-2.0.25.jar ExtractImages doc.pdf
Writing image: doc-1
Writing image: doc-2
Exception in thread "main" java.util.ServiceConfigurationError: javax.imageio.spi.ImageReaderSpi: Error reading configuration file
        at java.util.ServiceLoader.fail(Unknown Source)
        at java.util.ServiceLoader.parse(Unknown Source)
        at java.util.ServiceLoader.access$200(Unknown Source)
        at java.util.ServiceLoader$LazyIterator.hasNextService(Unknown Source)
        at java.util.ServiceLoader$LazyIterator.hasNext(Unknown Source)
        at java.util.ServiceLoader$1.hasNext(Unknown Source)
        at javax.imageio.spi.IIORegistry.registerApplicationClasspathSpis(Unknown Source)
        at javax.imageio.spi.IIORegistry.<init>(Unknown Source)
        at javax.imageio.spi.IIORegistry.getDefaultInstance(Unknown Source)
        at javax.imageio.ImageIO.<clinit>(Unknown Source)
        at org.apache.pdfbox.tools.imageio.ImageIOUtil.writeImage(ImageIOUtil.java:249)
        at org.apache.pdfbox.tools.imageio.ImageIOUtil.writeImage(ImageIOUtil.java:216)
        at org.apache.pdfbox.tools.imageio.ImageIOUtil.writeImage(ImageIOUtil.java:192)
        at org.apache.pdfbox.tools.imageio.ImageIOUtil.writeImage(ImageIOUtil.java:167)
        at org.apache.pdfbox.tools.ExtractImages$ImageGraphicsEngine.write2file(ExtractImages.java:505)
        at org.apache.pdfbox.tools.ExtractImages$ImageGraphicsEngine.drawImage(ExtractImages.java:271)
        at org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:67)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:939)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:514)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:492)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:155)
        at org.apache.pdfbox.tools.ExtractImages$ImageGraphicsEngine.run(ExtractImages.java:219)
        at org.apache.pdfbox.tools.ExtractImages.extract(ExtractImages.java:197)
        at org.apache.pdfbox.tools.ExtractImages.run(ExtractImages.java:158)
        at org.apache.pdfbox.tools.ExtractImages.main(ExtractImages.java:97)
        at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:64)
Caused by: java.io.FileNotFoundException: E:\Scans\pdfbox (The system cannot find the file specified)
        at java.util.zip.ZipFile.open(Native Method)
        at java.util.zip.ZipFile.<init>(Unknown Source)
        at java.util.zip.ZipFile.<init>(Unknown Source)
        at java.util.jar.JarFile.<init>(Unknown Source)
        at java.util.jar.JarFile.<init>(Unknown Source)
        at sun.net.www.protocol.jar.URLJarFile.<init>(Unknown Source)
        at sun.net.www.protocol.jar.URLJarFile.getJarFile(Unknown Source)
        at sun.net.www.protocol.jar.JarFileFactory.getOrCreate(Unknown Source)
        at sun.net.www.protocol.jar.JarURLConnection.connect(Unknown Source)
        at sun.net.www.protocol.jar.JarURLConnection.getInputStream(Unknown Source)
        at java.net.URL.openStream(Unknown Source)
        ... 25 more

Re: Command-Line Tools ExtractImages Tiff

Posted by Tilman Hausherr <TH...@t-online.de>.
Your problem here is something different; the error mentions

E:\Scans\pdfbox (The system cannot find the file specified)
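The "Caused by" part of the trace shows that, while ImageIO was reading the META-INF/services/javax.imageio.spi.ImageReaderSpi configuration files, one of them resolved to a jar at E:\Scans\pdfbox that could not be opened. As a rough way to locate that entry, the JDK-only sketch below lists every copy of that service file the context class loader can see and tries to open each one, much as ServiceLoader does. The class ServiceFileCheck and its method are hypothetical and not part of PDFBox.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic, not part of PDFBox: lists every copy of a
// META-INF/services/<interface> file that a class loader can see and
// reports the copies that cannot be opened. ServiceLoader reads these
// same files, so an entry that fails here can also make ImageIO fail.
public class ServiceFileCheck {

    // Returns the URLs of all copies of the service file that fail to open.
    public static List<String> unreadableServiceFiles(ClassLoader loader, String serviceInterface)
            throws IOException {
        String resource = "META-INF/services/" + serviceInterface;
        List<String> broken = new ArrayList<>();
        for (URL url : Collections.list(loader.getResources(resource))) {
            try (InputStream in = url.openStream()) {
                System.out.println("OK      " + url);
            } catch (IOException e) {
                System.out.println("BROKEN  " + url + " -> " + e);
                broken.add(url.toString());
            }
        }
        return broken;
    }

    public static void main(String[] args) throws IOException {
        // IIORegistry uses the context class loader when it registers plugins.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        List<String> broken =
                unreadableServiceFiles(loader, "javax.imageio.spi.ImageReaderSpi");
        System.out.println(broken.isEmpty()
                ? "No unreadable service files found."
                : broken.size() + " unreadable service file(s) found.");
    }
}
```

For example, compile it in E:\Scans and run it with the PDFBox jar on the class path, e.g. java -cp "pdfbox-app-2.0.25.jar;." ServiceFileCheck. Any BROKEN line shows the URL of the entry that cannot be opened. With java -jar the effective class path can differ slightly, so treat the result as a hint rather than proof.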

Scanners don't embed TIFF files; they embed CCITT-compressed streams.
PDFBox supports those, but you need the appropriate imaging libraries.
Put them in a lib subfolder and then call PDFBox like this:

java -cp "pdfbox-app-2.0.25.jar;lib/*" org.apache.pdfbox.tools.PDFBox ExtractImages .....

On a non-Windows OS, use ":" instead of ";" as the class path separator.
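Whether the libraries in lib actually provide TIFF support can be checked separately from PDFBox. The ServiceLoader$LazyIterator frames in the trace point to Java 8 or earlier, and those runtimes have no built-in TIFF writer in javax.imageio (one was added in Java 9). The JDK-only sketch below reports whether any TIFF ImageWriter is registered and, if so, writes a small 1-bit image with CCITT Group 4 compression. The class TiffWriterCheck and its methods are hypothetical; the compression type name "CCITT T.6" is the one used by the JDK's built-in TIFF plugin, and a third-party plugin might use different names.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

// Hypothetical check: is an Image I/O TIFF writer visible on the class path,
// and can it encode a bilevel (1-bit) image with CCITT Group 4 compression?
public class TiffWriterCheck {

    // True if at least one ImageWriter is registered for the "tiff" format name.
    public static boolean hasTiffWriter() {
        return ImageIO.getImageWritersByFormatName("tiff").hasNext();
    }

    // Encodes a 1-bit image as a CCITT T.6 TIFF and returns the bytes,
    // or null if no TIFF writer is available.
    public static byte[] writeBilevelTiff(BufferedImage image) throws IOException {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("tiff");
        if (!writers.hasNext()) {
            return null;
        }
        ImageWriter writer = writers.next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        // Name used by the JDK's TIFF plugin (Java 9+); assumed, not checked, for others.
        param.setCompressionType("CCITT T.6");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream out = ImageIO.createImageOutputStream(bytes)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(image, null, null), param);
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("TIFF writer available: " + hasTiffWriter());
        // TYPE_BYTE_BINARY with default settings is a 1-bit black/white image.
        BufferedImage page = new BufferedImage(64, 64, BufferedImage.TYPE_BYTE_BINARY);
        byte[] tiff = writeBilevelTiff(page);
        System.out.println(tiff == null
                ? "No TIFF writer on the class path."
                : "Wrote " + tiff.length + " bytes of CCITT T.6 TIFF.");
    }
}
```

To test the same setup PDFBox will see, run it with the same class path, e.g. java -cp "pdfbox-app-2.0.25.jar;lib/*;." TiffWriterCheck (again with ":" on a non-Windows OS).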

Tilman



