Posted to dev@tika.apache.org by "Quentin Laville (JIRA)" <ji...@apache.org> on 2019/06/05 09:07:00 UTC
[jira] [Updated] (TIKA-2891) ForkClient "fillBootstrapJar()" lack few "MANIFEST.MF" properties
[ https://issues.apache.org/jira/browse/TIKA-2891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quentin Laville updated TIKA-2891:
----------------------------------
Description:
Due to "OOM: heap space" errors caused by big ".doc" files, we decided to move to a "ForkParser" so that we can catch these errors, log them and keep processing the following documents.
Unfortunately, whenever we have an image in a document, we get the following error:
{code:java}
Unexpected error in forked server process
org.apache.tika.exception.TikaException: Unexpected error in forked server process
... (several lines indicating that the call to "ForkParser.parse" failed)
Cause: java.util.ServiceConfigurationError: javax.imageio.spi.ImageOutputStreamSpi: Provider com.github.jaiimageio.impl.stream.ChannelImageOutputStreamSpi could not be instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:232)
at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at javax.imageio.spi.IIORegistry.registerApplicationClasspathSpis(IIORegistry.java:210)
at javax.imageio.spi.IIORegistry.<init>(IIORegistry.java:138)
at javax.imageio.spi.IIORegistry.getDefaultInstance(IIORegistry.java:159)
at javax.imageio.ImageIO.<clinit>(ImageIO.java:66)
at org.apache.pdfbox.tools.imageio.ImageIOUtil.writeImage(ImageIOUtil.java:174)
...
Cause: java.lang.ExceptionInInitializerError:
at com.github.jaiimageio.impl.stream.ChannelImageOutputStreamSpi.<init>(ChannelImageOutputStreamSpi.java:66)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at javax.imageio.spi.IIORegistry.registerApplicationClasspathSpis(IIORegistry.java:210)
...
Cause: java.lang.NullPointerException:
at com.github.jaiimageio.impl.common.PackageUtil.<clinit>(PackageUtil.java:91)
at com.github.jaiimageio.impl.stream.ChannelImageOutputStreamSpi.<init>(ChannelImageOutputStreamSpi.java:66)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
...
{code}
This kind of error didn't appear before, when we were only using an "AutoDetectParser". My search for a solution led me to "ForkClient", where you can see that only the "Main-Class" is defined in "META-INF/MANIFEST.MF", whereas "com.github.jaiimageio.impl.common.PackageUtil.<clinit>(PackageUtil.java:91)" checks that the "Implementation-Vendor" and "Implementation-Version" are not null.
As the name of the package suggests, it happens only with files containing image(s).
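For context, "Implementation-Vendor" and "Implementation-Version" are the manifest values that java.lang.Package exposes through getImplementationVendor() and getImplementationVersion(). Both return null when the value is not known, and Class.getPackage() can itself return null. The sketch below is only a diagnostic aid under those assumptions, not the jai-imageio code: the class name PackageInfoProbe and its describe method are made up for illustration.

```java
final class PackageInfoProbe {

    /**
     * Describes the manifest-derived metadata visible for the package of the
     * given class. Never throws when the metadata is missing.
     */
    static String describe(Class<?> type) {
        Package pkg = type.getPackage();
        if (pkg == null) {
            // Happens for primitives and arrays, and can happen when the
            // defining class loader did not register a package.
            return type.getName() + ": no Package object";
        }
        // Either value may be null if the manifest did not define it.
        return type.getName()
                + ": vendor=" + pkg.getImplementationVendor()
                + ", version=" + pkg.getImplementationVersion();
    }

    public static void main(String[] args) {
        System.out.println(describe(String.class));
        System.out.println(describe(int.class));
    }
}
```

Calling describe on a class from the jai-imageio package, once in the parent JVM and once in the forked one, would be one way to compare what each process actually sees.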
It's quite easy to reproduce:
# download a simple example file from [this link|https://file-examples.com/wp-content/uploads/2017/10/file-sample_100kB.odt]
# use this piece of code:
{code:java}
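import java.io.FileInputStream
import org.apache.tika.fork.ForkParser
import org.apache.tika.io.TikaInputStream
import org.apache.tika.metadata.Metadata
import org.apache.tika.parser.{AutoDetectParser, ParseContext}
import org.apache.tika.sax.BodyContentHandler
// Scala snippet. ExtractText is not defined here; presumably an object in the calling application.
def test = {
  val forkParser = new ForkParser(ExtractText.getClass.getClassLoader, new AutoDetectParser())
  val output = new BodyContentHandler()
  val stream = TikaInputStream.get(new FileInputStream("/path/to/file-sample_100kB.odt"))
  val ctx = new ParseContext()
  forkParser.parse(stream, output, new Metadata(), ctx)
}
{code}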
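To illustrate the kind of manifest meant above, here is a minimal sketch of a jar whose "META-INF/MANIFEST.MF" carries the "Implementation-Vendor" and "Implementation-Version" attributes next to "Main-Class". It uses only java.util.jar and is not the actual "fillBootstrapJar()" code. The class name BootstrapManifestSketch, the helper names and the attribute values are placeholders, and whether adding these attributes to the bootstrap jar is enough to make the error go away has not been verified here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

final class BootstrapManifestSketch {

    /** Builds an in-memory jar whose manifest has Main-Class plus Implementation-* attributes. */
    static byte[] buildJar(String mainClass, String vendor, String version) throws IOException {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version must be set, otherwise the main attributes are not written out.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(Attributes.Name.MAIN_CLASS, mainClass);
        main.put(Attributes.Name.IMPLEMENTATION_VENDOR, vendor);
        main.put(Attributes.Name.IMPLEMENTATION_VERSION, version);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes, manifest)) {
            // The constructor already wrote META-INF/MANIFEST.MF as the first entry.
            // Class file entries would be added here with jar.putNextEntry(...).
        }
        return bytes.toByteArray();
    }

    /** Reads the manifest back from the jar bytes. */
    static Manifest readManifest(byte[] jarBytes) throws IOException {
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            return in.getManifest();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder values, chosen only for the example.
        byte[] jar = buildJar("example.Main", "Example Vendor", "0.0.1");
        Attributes attrs = readManifest(jar).getMainAttributes();
        System.out.println(attrs.getValue(Attributes.Name.IMPLEMENTATION_VENDOR));
        System.out.println(attrs.getValue(Attributes.Name.IMPLEMENTATION_VERSION));
    }
}
```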
> ForkClient "fillBootstrapJar()" lack few "MANIFEST.MF" properties
> -----------------------------------------------------------------
>
> Key: TIKA-2891
> URL: https://issues.apache.org/jira/browse/TIKA-2891
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.18
> Reporter: Quentin Laville
> Priority: Blocker
> Labels: bug, forkclient, forkparser, parser