Posted to user@tika.apache.org by Latha Krishnamurthi <la...@shieldx.com> on 2018/07/18 17:15:17 UTC

TIKA-OCR issue

Hi,

I am running tika-server-1.16.jar within a Docker container. I build and run this using my own Dockerfile, and I connect to it using the tika-python library. It is not able to extract text from image files. I then downloaded Tesseract, installed the 'so' files in the container, and set LD_LIBRARY_PATH etc., but the extraction still does not happen. Any idea why? (Text extraction works fine for PDFs, DOCs, etc.)

(As a debugging step, I downloaded the prebuilt Docker image and tried it out; it works fine with image file extraction. I see that they just download Tesseract in addition.) I did not have a tika-config file; I then tried creating one, but it did not help.

10:14 $ cat tika-config.xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser"/>
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        <param name="extractInlineimages" type="bool">true</param>
        <param name="allowExtractionForAccessibility" type="bool">true</param>
        <param name="catchIntermediateExceptions" type="bool">false</param>
        <!-- we really should throw an exception for this.
             We are currently not checking -->
        <param name="someRandomThingOrOther" type="bool">true</param>
      </params>
    </parser>
  </parsers>
</properties>
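
To double-check what a config like the one above actually declares, here is a minimal sketch in Python (standard library only). The helper name list_parsers is made up for illustration and is not part of Tika or tika-python. It only reads the XML; it does not check whether Tika accepts the param names.

```python
import xml.etree.ElementTree as ET


def list_parsers(config_xml):
    """Return {parser class: {param name: param text}} for a tika-config document.

    Hypothetical helper for illustration. It reports what the file declares
    and makes no claim about which params Tika actually honors.
    """
    root = ET.fromstring(config_xml)
    result = {}
    for parser in root.iter("parser"):
        # Collect every <param> nested under this <parser> element.
        params = {p.get("name"): (p.text or "").strip()
                  for p in parser.iter("param")}
        result[parser.get("class")] = params
    return result
```

Passing it the bytes of tika-config.xml (for example, the result of open("tika-config.xml", "rb").read()) gives one entry per parser element, which makes it easy to see that the config above declares only DefaultParser and PDFParser.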

https://github.com/LogicalSpark/docker-tikaserver

Thanks in advance for your response.

Here are my debug log traces from when Tika starts.

======================================================================================================================================
2018-07-17 21:46:38,926 LathaDLP-NOX-18 user.notice tika: Jul 17, 2018 9:46:38 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
2018-07-17 21:46:38,927 LathaDLP-NOX-18 user.notice tika: WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
2018-07-17 21:46:38,927 LathaDLP-NOX-18 user.notice tika: See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
2018-07-17 21:46:38,927 LathaDLP-NOX-18 user.notice tika: for optional dependencies.
2018-07-17 21:46:38,927 LathaDLP-NOX-18 user.notice tika: TIFFImageWriter not loaded. tiff files will not be processed
2018-07-17 21:46:38,927 LathaDLP-NOX-18 user.notice tika: See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
2018-07-17 21:46:38,927 LathaDLP-NOX-18 user.notice tika: for optional dependencies.
2018-07-17 21:46:38,927 LathaDLP-NOX-18 user.notice tika: J2KImageReader not loaded. JPEG2000 files will not be processed.
2018-07-17 21:46:38,927 LathaDLP-NOX-18 user.notice tika: See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
2018-07-17 21:46:38,927 LathaDLP-NOX-18 user.notice tika: for optional dependencies.
2018-07-17 21:46:38,927 LathaDLP-NOX-18 user.notice tika:
2018-07-17 21:46:39,250 LathaDLP-NOX-18 user.notice tika: Jul 17, 2018 9:46:39 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
2018-07-17 21:46:39,250 LathaDLP-NOX-18 user.notice tika: WARNING: org.xerial's sqlite-jdbc is not loaded.
2018-07-17 21:46:39,250 LathaDLP-NOX-18 user.notice tika: Please provide the jar on your classpath to parse sqlite files.
2018-07-17 21:46:39,250 LathaDLP-NOX-18 user.notice tika: See tika-parsers/pom.xml for the correct version.
2018-07-17 21:46:39,367 LathaDLP-NOX-18 user.notice tika: INFO  Starting Apache Tika 1.16 server
2018-07-17 21:46:40,660 LathaDLP-NOX-18 user.notice tika: INFO  Setting the server's publish address to be http://0.0.0.0:9998/
2018-07-17 21:46:40,871 LathaDLP-NOX-18 user.notice tika: INFO  jetty-8.y.z-SNAPSHOT
2018-07-17 21:46:40,963 LathaDLP-NOX-18 user.notice tika: INFO  Started SelectChannelConnector@0.0.0.0:9998
2018-07-17 21:46:40,997 LathaDLP-NOX-18 user.notice tika: INFO  Started Apache Tika server at http://0.0.0.0:9998/
======================================================================================================================================




RE: TIKA-OCR issue

Posted by Latha Krishnamurthi <la...@shieldx.com>.
No luck with 1.18 ☹ Just an FYI: I am using tika-python to talk to tika-server over its RESTful interface. I just call parsed = unpack.from_file(file, tserver) to extract.

Thanks,
Latha.
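
One way to tell a client-side problem from a server-side one is to send the image straight to tika-server and bypass tika-python. The sketch below uses only the Python standard library and assumes the server's PUT /tika endpoint with an Accept: text/plain header. The function names, the example file name and the localhost address are placeholders, not taken from this thread.

```python
import urllib.request


def build_tika_request(server, data, content_type=None):
    """Build a PUT request for tika-server's /tika endpoint, asking for plain text."""
    headers = {"Accept": "text/plain"}
    if content_type:
        # Optional hint; without it the server detects the type itself.
        headers["Content-Type"] = content_type
    url = server.rstrip("/") + "/tika"
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


def extract_text(server, path, content_type=None, timeout=60):
    """Send one file to a running tika-server and return the extracted text."""
    with open(path, "rb") as fh:
        request = build_tika_request(server, fh.read(), content_type)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, extract_text("http://localhost:9998", "scan.png", "image/png") should return the OCR text if the server side is working. An empty string for an image that clearly contains text would point at the server setup rather than the Python client.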





RE: TIKA-OCR issue

Posted by Latha Krishnamurthi <la...@shieldx.com>.
Also, it looks like the stable release is 1.18, so should I try that instead of 1.19?

Thanks,
Latha.





RE: TIKA-OCR issue

Posted by Latha Krishnamurthi <la...@shieldx.com>.
Hi, thank you very much for your response.

I am starting tika-server using the following command line options. I am using 1.16 (I think this was the version LogicalSpark was using when I got it).

Initially
----------
tika-server-1.16.jar --host 0.0.0.0

With Tesseract libraries
--------------------------------
tika-server-1.16.jar --host 0.0.0.0 -c tika-config.xml

I tried the LogicalSpark image in my development environment and it works fine. In the actual production environment, we didn’t want to download the jar each time, so I download it once and run it from our own Dockerfile. This is a better option since we have a microservices architecture. My Dockerfile just copies the Tika jar, the Tesseract shared objects, and the Tika config file to the right location on the target; that’s all.
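
A possible check for a setup like this one: to the best of my knowledge, Tika's TesseractOCRParser launches the tesseract command-line executable as a separate process rather than loading the shared objects itself. If so, copying only the 'so' files would not be enough, and the binary has to be findable in the container. The sketch below (standard-library Python; the helper name is made up) reports whether a tesseract executable is on the search path and what version it announces.

```python
import shutil
import subprocess


def tesseract_status(path=None):
    """Return (executable_path, first_version_line), or (None, None) if not found.

    path: optional PATH-style string to search instead of the environment's PATH.
    """
    exe = shutil.which("tesseract", path=path)
    if exe is None:
        return None, None
    # Depending on the release, tesseract prints its version banner on
    # stdout or stderr, so merge the two streams.
    done = subprocess.run([exe, "--version"], stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    lines = done.stdout.splitlines()
    return exe, (lines[0] if lines else "")
```

Running tesseract_status() inside the container, as the same user that starts tika-server, shows whether the executable is visible to that process. A result of (None, None) would mean the binary is missing from PATH.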

Maybe I should try downloading 1.19, since this bug points to a fix in 1.19 (LogicalSpark seems to run 1.18, though). Let me try this version and update.

Thank you once again!

Latha.






Re: TIKA-OCR issue

Posted by Tim Allison <ta...@apache.org>.
How are you starting tika-server? What’s your command line?

This may be caused by
https://issues.apache.org/jira/browse/TIKA-2669, which is now fixed.

Are you using the same version of Tika as the one from LogicalSpark?

If LogicalSpark works, why build your own? I hear some very savvy Tika
folks are behind that. :)
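
The version question can also be answered by asking the running server directly. This is a minimal sketch, assuming tika-server's GET /version endpoint returns a banner like the "Apache Tika 1.16" seen in the startup log. The function names are illustrative, and the 1.19 threshold is only what this thread reports for the TIKA-2669 fix.

```python
import re
import urllib.request


def parse_tika_version(banner):
    """Extract (major, minor) from a banner such as 'Apache Tika 1.16'."""
    match = re.search(r"(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("no version number in %r" % banner)
    return int(match.group(1)), int(match.group(2))


def server_version(server, timeout=10):
    """Ask a running tika-server for its version banner via GET /version."""
    url = server.rstrip("/") + "/version"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

Comparing parse_tika_version(server_version("http://localhost:9998")) against (1, 19) on both the self-built container and the LogicalSpark one shows whether the two really run the same release.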
