Posted to user@tika.apache.org by Subhajit Das <Su...@live.com> on 2021/03/12 17:05:50 UTC

TikaServer not initializing properly

I am getting this in the console output:
org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
But nothing appears in the logs.

When a PUT request is sent to /tika for a PDF, I get a NullPointerException at line 434 of AbstractPDF2XHTML.java.

I am using this Tika config:

<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <service-loader loadErrorHandler="WARN"/>
  <parsers>
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        <param name="ocrStrategy" type="string">ocr_only</param>
        <param name="ocrImageType" type="string">rgb</param>
        <param name="ocrDPI" type="int">300</param>
      </params>
    </parser>
  </parsers>
</properties>


RE: TikaServer not initializing properly

Posted by Subhajit Das <su...@live.com>.
Hi

It seems to be an issue on my side, as I was not excluding PDFParser from the DefaultParser.

Now it is solved.

Thanks and Regards,
Subhajit
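
For readers who land on this thread: the final config was not posted, so the following is only a sketch of what the fix described above might look like. It assumes the standard tika-config.xml pattern of listing the DefaultParser with a parser-exclude entry for PDFParser, then declaring PDFParser separately with the OCR parameters from the original message. The DefaultParser entry keeps the other parsers (including Tesseract) loaded.

<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <service-loader loadErrorHandler="WARN"/>
  <parsers>
    <!-- Load all default parsers, but leave PDFParser out so the
         customized instance below is the only one handling PDFs.
         (Assumed layout; not confirmed by the poster.) -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.pdf.PDFParser"/>
    </parser>
    <!-- PDFParser with the OCR settings from the original config -->
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        <param name="ocrStrategy" type="string">ocr_only</param>
        <param name="ocrImageType" type="string">rgb</param>
        <param name="ocrDPI" type="int">300</param>
      </params>
    </parser>
  </parsers>
</properties>

Without the exclude, PDFParser would likely be registered twice (once via DefaultParser with default settings and once with the custom parameters), which appears to be the situation described in the message above.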

________________________________
From: Tim Allison <ta...@apache.org>
Sent: Friday, March 12, 2021 11:33:02 PM
To: user@tika.apache.org <us...@tika.apache.org>
Subject: Re: TikaServer not initializing properly

We should handle this more gracefully (and I think we do in our main
branch, Tika 2.0.0), but the problem is that you're only loading the
PDFParser...not the TesseractOCRParser so the PDFParser throws an NPE
when it can't find tesseract.

Make sure to include the DefaultParser, which will also load Tesseract.

<properties>
    <parsers>
        <parser class="org.apache.tika.parser.DefaultParser"/>
        <parser class="org.apache.tika.parser.pdf.PDFParser">
...



Re: TikaServer not initializing properly

Posted by Tim Allison <ta...@apache.org>.
We should handle this more gracefully (and I think we do in our main
branch, Tika 2.0.0), but the problem is that you're only loading the
PDFParser...not the TesseractOCRParser so the PDFParser throws an NPE
when it can't find tesseract.

Make sure to include the DefaultParser, which will also load Tesseract.

<properties>
    <parsers>
        <parser class="org.apache.tika.parser.DefaultParser"/>
        <parser class="org.apache.tika.parser.pdf.PDFParser">
...

