Posted to users@pdfbox.apache.org by Chris Gamache <cg...@gmail.com> on 2017/07/21 19:28:38 UTC

PDFBox JPEG2000 and Tomcat

Hi all,

I'm using PDFBox 2.0.7 to extract pages from PDFs, convert them to images
and stream them back out.

I have included

<dependency>
    <groupId>com.levigo.jbig2</groupId>
    <artifactId>levigo-jbig2-imageio</artifactId>
    <version>1.6.5</version>
</dependency>
<dependency>
    <groupId>com.github.jai-imageio</groupId>
    <artifactId>jai-imageio-core</artifactId>
    <version>1.3.1</version>
</dependency>
<dependency>
    <groupId>com.github.jai-imageio</groupId>
    <artifactId>jai-imageio-jpeg2000</artifactId>
    <version>1.3.0</version>
</dependency>

I can fire up Tomcat and everything works fine. At some point the next
day I get

ERROR 15:11:13,114 [http-nio-8080-exec-1 PDFStreamEngine] - Cannot read
JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not
installed

which boggles my mind! They were just there. How could they disappear? Of
course, after a restart everything is fine again, until next time.

Disk space is fine. RAM is fine. Swap is fine. Nothing else seems adversely
affected.

The way I see it working is:

org.apache.pdfbox.filter.JPXFilter#readJPX
calls org.apache.pdfbox.filter.Filter#findImageReader and that then
iterates over an ImageReader collection provided
by javax.imageio.ImageIO#getImageReadersByFormatName ...

That collection is contained in a singleton IIORegistry obtained from
javax.imageio.spi.IIORegistry#getDefaultInstance.

When that IIORegistry is constructed, it walks the classpath looking for
service provider instances
in javax.imageio.spi.IIORegistry#registerApplicationClasspathSpis ...

It obviously finds the JPEG2000 SPI early on, but then forgets it later. I
can't see how that would be possible, or how to remedy it!
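
One way to test the lookup chain described above is to log what the registry actually contains at the moment of failure. The following is only a sketch: ImageIoDiagnostics and its method names are hypothetical and not part of PDFBox, and it uses nothing beyond the standard javax.imageio API. It also assumes, as the IIORegistry Javadoc appears to say, that the default registry is handed out per thread group rather than as a strict JVM-wide singleton, and that ImageIO.scanForPlugins() searches via the calling thread's context class loader, so the result may depend on which thread runs it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.spi.IIORegistry;
import javax.imageio.spi.ImageReaderSpi;

/** Hypothetical helper (not part of PDFBox) for inspecting the Image I/O registry. */
final class ImageIoDiagnostics {

    /** Class names of the readers ImageIO currently offers for a format name. */
    static List<String> readerClassNames(String formatName) {
        List<String> names = new ArrayList<>();
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        while (readers.hasNext()) {
            ImageReader reader = readers.next();
            names.add(reader.getClass().getName());
            reader.dispose(); // the reader was only created to learn its class
        }
        return names;
    }

    /** One line per registered reader SPI, including the class loader that loaded it. */
    static List<String> describeReaderSpis() {
        List<String> lines = new ArrayList<>();
        Iterator<ImageReaderSpi> spis =
                IIORegistry.getDefaultInstance().getServiceProviders(ImageReaderSpi.class, true);
        while (spis.hasNext()) {
            ImageReaderSpi spi = spis.next();
            lines.add(spi.getClass().getName() + " loaded by " + spi.getClass().getClassLoader());
        }
        return lines;
    }

    /** Re-scans for plug-ins if no reader is found; returns true if one exists afterwards. */
    static boolean ensureReader(String formatName) {
        if (readerClassNames(formatName).isEmpty()) {
            ImageIO.scanForPlugins();
        }
        return !readerClassNames(formatName).isEmpty();
    }
}
```

Calling ImageIoDiagnostics.ensureReader("JPEG2000") just before rendering, and logging describeReaderSpis() when it returns false, would show whether the JPEG2000 SPI is really gone from the registry or whether the request is simply seeing a different registry instance or class loader.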

Please help!

Thanks so much,

CG

Re: PDFBox JPEG2000 and Tomcat

Posted by Andreas Lehmkühler <an...@lehmi.de>.
> Chris Gamache <cg...@gmail.com> wrote on 25 July 2017 at 03:10:
> 
> 
> I also recall one thread on SO where the developer had kept the scope on the imageio jars set to `test`, as it is in PDFBox's pom. I wish it were a contributing factor here because it is an easy fix.
> 
> What do you know about SPI? Can I prophylactically re-add the SPI for JPEG2000 in a safe way? I don't think the visibility of that registry is available way way up the call stack. Maybe there's a way I haven't found?
> 
According to [1], java.util.ServiceLoader is the class you are looking for.

Andreas
[1] https://docs.oracle.com/javase/tutorial/ext/basics/spi.html
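
A minimal sketch of how ServiceLoader could be used to re-add the readers, assuming the JPEG2000 jar declares its ImageReaderSpi under META-INF/services (the usual way Image I/O plug-ins are discovered). ReaderSpiRegistrar and registerReaderSpis are hypothetical names for illustration; only java.util.ServiceLoader and the javax.imageio.spi classes are real API. Whether this actually cures the disappearing provider is untested.

```java
import java.util.ServiceLoader;
import javax.imageio.spi.IIORegistry;
import javax.imageio.spi.ImageReaderSpi;

/** Hypothetical helper: explicitly (re-)registers reader SPIs found by ServiceLoader. */
final class ReaderSpiRegistrar {

    /**
     * Registers every ImageReaderSpi visible to the given class loader with the
     * default IIORegistry and returns how many were not registered before.
     */
    static int registerReaderSpis(ClassLoader loader) {
        IIORegistry registry = IIORegistry.getDefaultInstance();
        int added = 0;
        for (ImageReaderSpi spi : ServiceLoader.load(ImageReaderSpi.class, loader)) {
            // registerServiceProvider returns true only if no provider of the
            // same class was already present in this category.
            if (registry.registerServiceProvider(spi, ImageReaderSpi.class)) {
                added++;
            }
        }
        return added;
    }
}
```

In a web application this could be called with the webapp's own class loader, for example Thread.currentThread().getContextClassLoader() from a request thread or a startup listener, so that the jars inside the war are the ones being searched. Because the default registry may differ between thread groups, calling it on the thread that later does the rendering is probably the safer choice.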


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: PDFBox JPEG2000 and Tomcat

Posted by Tilman Hausherr <TH...@t-online.de>.
On 25.07.2017 at 03:10, Chris Gamache wrote:
> I also recall one thread on SO where the developer had kept the scope on the imageio jars set to `test`, as it is in PDFBox's pom. I wish it were a contributing factor here because it is an easy fix.
>
> What do you know about SPI? Can I prophylactically re-add the SPI for JPEG2000 in a safe way? I don't think the visibility of that registry is available way way up the call stack. Maybe there's a way I haven't found?


Sorry, I know nothing about SPI.

Tilman




Re: PDFBox JPEG2000 and Tomcat

Posted by Chris Gamache <cg...@gmail.com>.
I also recall one thread on SO where the developer had kept the scope on the imageio jars set to `test`, as it is in PDFBox's pom. I wish it were a contributing factor here because it is an easy fix.

What do you know about SPI? Can I prophylactically re-add the SPI for JPEG2000 in a safe way? I don't think the visibility of that registry is available way way up the call stack. Maybe there's a way I haven't found?


> On Jul 24, 2017, at 3:46 PM, Tilman Hausherr <TH...@t-online.de> wrote:
> 
> http://markmail.org/ offers a search engine for the user mailing list, but I haven't been able to find it either. One person had a problem but the cause was a bad pom file. The one you posted didn't have that problem. Maybe my memory was from a stackoverflow question.
> 
> Tilman


Re: PDFBox JPEG2000 and Tomcat

Posted by Tilman Hausherr <TH...@t-online.de>.
http://markmail.org/ offers a search engine for the user mailing list,
but I haven't been able to find it either. One person had a problem, but
the cause was a bad pom file. The one you posted didn't have that
problem. Maybe my memory was from a Stack Overflow question.

Tilman




Re: PDFBox JPEG2000 and Tomcat

Posted by Chris Gamache <cg...@gmail.com>.
100% sure (verified by yours truly on the file system) that the jars are
contained in the war. 100% sure that it does work in the beginning. 100%
sure that at some later point the provider goes AWOL, that message
gets emitted, and PDFBox ceases to render those PDF pages to images. I'll
never 100% rule out my own error, but I feel like I've checked and
double-checked, and this is a genuine conundrum.

These PDFs are coming from another tool hooked to a high-volume scanner
that outputs layered PDFs with invisible OCRed text, and the page images
are in JPEG2000. Not my scanner. Not my tool. But our job is to slice and
serve them up.

I haven't found a thread about a similar problem via Google, but my
Google-fu may be weak in this regard. Any pointers to this thread or an
existing issue are /very/ welcome.

I should mention that we're using Tomcat's parallel deployment feature. But
the problems don't happen after a parallel deployment occurs. I'm not
ruling it out as a contributing factor, but the timing of the failures
doesn't seem to be related.


On Fri, Jul 21, 2017 at 3:35 PM, Tilman Hausherr <TH...@t-online.de>
wrote:

> Are you sure that the jar files are in your classpath / in your .war file?
> I.e. are you sure that it did work at the beginning? PDFs with JPX images
> don't happen often.
>
> I think a similar problem was mentioned here some months ago... but there
> (I think) it was some IBM server...
>
> Tilman

Re: PDFBox JPEG2000 and Tomcat

Posted by Tilman Hausherr <TH...@t-online.de>.
Are you sure that the jar files are in your classpath / in your .war 
file? I.e. are you sure that it did work at the beginning? PDFs with JPX 
images don't happen often.

I think a similar problem was mentioned here some months ago... but 
there (I think) it was some IBM server...

Tilman


