Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2017/11/03 18:58:00 UTC

[jira] [Comment Edited] (TIKA-2491) Cannot use TikaConfig

    [ https://issues.apache.org/jira/browse/TIKA-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16238177#comment-16238177 ] 

Tim Allison edited comment on TIKA-2491 at 11/3/17 6:57 PM:
------------------------------------------------------------

[~gagravarr] solved this:
bq. I think you need to give both the classloader and the config file for your setup
bq. Can you try this constructor: https://tika.apache.org/1.16/api/org/apache/tika/config/TikaConfig.html#TikaConfig-java.net.URL-java.lang.ClassLoader-

bq. With something like new TikaConfig(conf.getResource(customConfFile), this.getClass().getClassLoader());

Nick, it seems strange that we allow the classloader to be omitted with the regular TikaConfig(), but require it when a config file is specified.  Should we do something like this:

{noformat}
    private static ServiceLoader serviceLoaderFromDomElement(Element element, ClassLoader loader) throws TikaConfigException {

        if (serviceLoaderElement != null) {
             ...some stuff...
+           if (loader == null) {
+               loader = ServiceLoader.getContextClassLoader();
+           }
            serviceLoader = new ServiceLoader(loader, loadErrorHandler, initializableProblemHandler, dynamic);
        } else if(loader != null) {
            serviceLoader = new ServiceLoader(loader);
        } else {
            serviceLoader = new ServiceLoader();
        }
{noformat}
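
For illustration only, the null-loader fallback proposed above can be sketched with plain JDK classes. This is not Tika's code: the class LoaderFallback and the method resolve are hypothetical names, and falling back first to the thread context class loader and then to the system class loader is an assumption about what a sensible default might look like.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical helper, not part of Tika: shows the "use a default when the
// caller passed no class loader" pattern using only JDK classes.
final class LoaderFallback {
    private LoaderFallback() {}

    // Returns the given loader if it is non-null; otherwise falls back to the
    // thread context class loader, and finally to the system class loader.
    static ClassLoader resolve(ClassLoader loader) {
        if (loader != null) {
            return loader;
        }
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        if (context != null) {
            return context;
        }
        return ClassLoader.getSystemClassLoader();
    }

    public static void main(String[] args) {
        // An explicitly supplied loader is returned unchanged.
        ClassLoader explicit = new URLClassLoader(new URL[0]);
        System.out.println("explicit kept: " + (resolve(explicit) == explicit));

        // With no loader supplied, a non-null default is chosen.
        System.out.println("fallback non-null: " + (resolve(null) != null));
    }
}
```

With a fallback like this applied before the ServiceLoader is built, a caller that passes only a config file would get the same default loader behaviour as a caller that passes nothing at all.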




> Cannot use TikaConfig
> ---------------------
>
>                 Key: TIKA-2491
>                 URL: https://issues.apache.org/jira/browse/TIKA-2491
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.16
>            Reporter: Markus Jelsma
>             Fix For: 1.17
>
>         Attachments: tika-config.xml
>
>
> I need to use a custom tika-config.xml in Nutch, which supports it, but I can't get it to work.
> This is how Nutch gets the parser:
> Parser parser = tikaConfig.getParser(MediaType.parse(mimeType));
> When no custom config is specified, the config is:
> new TikaConfig(this.getClass().getClassLoader());
> When I specify a custom config, it is:
> tikaConfig = new TikaConfig(conf.getResource(customConfFile));
> getParser always returns null with a custom config file. There are no errors or exceptions. The config itself is fine: it fixed the encoding problem in a parser outside of Nutch (thanks again Timothy), but I need to get it to work in Nutch too.
> Our external project does:
> AutoDetectParser parser = new AutoDetectParser(tikaConfig); parser.parse(..);
> and it just works! If I do this in Nutch, however, nothing is passed through the content handlers and the parser result is completely empty.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)