Posted to user@tika.apache.org by Katsuya Tomioka <ka...@gmail.com> on 2019/11/12 18:48:58 UTC
Encoding detectors in OSGi (tika-bundle)
I'm having trouble accessing encoding detectors in OSGi with Tika 1.22. AutoDetectParser returns "Failed to detect the character encoding of a document" for non-Latin text. We are migrating from 1.10, so I'm sure many things are different. My problem seems to be that, while all the encoding detectors are in tika-parsers, the code is loading from tika-core's class loader. I see that parsers and detectors are tracked as services. Do I need to do something similar to load encoding detectors as well?
Thanks,
-Katsuya
Re: Encoding detectors in OSGi (tika-bundle)
Posted by Katsuya Tomioka <ka...@gmail.com>.
My current approach is to set the context class loader on Tika's ServiceLoader, which seems to be working. It's a bit awkward, but this is what I'm doing:
ServiceLoader.setContextClassLoader(Icu4jEncodingDetector.class.getClassLoader());
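For anyone wondering why the class loader matters here, below is a minimal sketch of the general mechanism using only the JDK's java.util.ServiceLoader, not Tika's own ServiceLoader class. The Detector interface, SampleDetector provider, and the temporary directory are all made up for illustration. The idea is that a provider is only discovered when its META-INF/services registration is visible through the class loader doing the lookup, which I assume is analogous to tika-core not seeing the registrations that live in another bundle.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class ClassLoaderVisibilityDemo {

    /** Hypothetical service interface, standing in for an encoding detector. */
    public interface Detector {
        String name();
    }

    /** Hypothetical provider; public, with an implicit public no-arg constructor. */
    public static class SampleDetector implements Detector {
        @Override
        public String name() {
            return "sample";
        }
    }

    /** Lists the names of the providers visible through the given class loader. */
    public static List<String> visibleProviders(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        for (Detector d : ServiceLoader.load(Detector.class, loader)) {
            names.add(d.name());
        }
        return names;
    }

    /** Builds a loader whose only extra resource is a provider-configuration file. */
    public static URLClassLoader loaderWithRegistration() throws IOException {
        Path root = Files.createTempDirectory("services-demo");
        Path services = Files.createDirectories(root.resolve("META-INF").resolve("services"));
        // File name: binary name of the service interface.
        // File content: binary name of the provider class.
        Files.write(services.resolve(Detector.class.getName()),
                SampleDetector.class.getName().getBytes(StandardCharsets.UTF_8));
        return new URLClassLoader(new URL[] {root.toUri().toURL()},
                ClassLoaderVisibilityDemo.class.getClassLoader());
    }

    public static void main(String[] args) throws IOException {
        ClassLoader plain = ClassLoaderVisibilityDemo.class.getClassLoader();
        try (URLClassLoader withRegistration = loaderWithRegistration()) {
            // Same provider class in both cases; only the registration's visibility differs.
            System.out.println("plain loader:      " + visibleProviders(plain));
            System.out.println("with registration: " + visibleProviders(withRegistration));
        }
    }
}
```

The first lookup should come back empty and the second should find the provider, even though the provider class itself is loadable in both cases. Pointing the lookup at a class loader that can see the registrations is what the setContextClassLoader call above is doing, as far as I can tell.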
On 2019/11/13 06:44:02, Nick Burch <ap...@gagravarr.org> wrote:
> On Tue, 12 Nov 2019, Katsuya Tomioka wrote:
> > I'm having trouble accessing encoding detectors in OSGi with Tika 1.22.
> > AutoDetectParser returns "Failed to detect the character encoding of a
> > document" for non-Latin text. We are migrating from 1.10, I'm sure many
> > things are different. It seems like my problem is while all the
> > detectors are in tika-parser, the code is loading from tika-core's. I
> > see parsers and detectors are tracked as services. Do I need to do
> > something similar to load encoding detectors as well?
>
> The things which are currently loaded via services are:
> * Parsers
> * Detectors (file type)
> * Translators
> * Encoding Detection
> * Language Detection
> * Probability-based type detectors
>
> I think there might be helpers to assist with those, hopefully one of our
> OSGi experts will be along shortly to advise!
>
> Nick
>
Re: Encoding detectors in OSGi (tika-bundle)
Posted by Nick Burch <ap...@gagravarr.org>.
On Tue, 12 Nov 2019, Katsuya Tomioka wrote:
> I'm having trouble accessing encoding detectors in OSGi with Tika 1.22.
> AutoDetectParser returns "Failed to detect the character encoding of a
> document" for non-Latin text. We are migrating from 1.10, I'm sure many
> things are different. It seems like my problem is while all the
> detectors are in tika-parser, the code is loading from tika-core's. I
> see parsers and detectors are tracked as services. Do I need to do
> something similar to load encoding detectors as well?
The things which are currently loaded via services are:
* Parsers
* Detectors (file type)
* Translators
* Encoding Detection
* Language Detection
* Probability-based type detectors
I think there might be helpers to assist with those. Hopefully one of our
OSGi experts will be along shortly to advise!
Nick