Posted to dev@jackrabbit.apache.org by "Dan Ducar (JIRA)" <ji...@apache.org> on 2010/05/27 10:57:38 UTC

[jira] Created: (JCR-2642) JackrabbitParser and tika 0.7 parser

JackrabbitParser and tika 0.7 parser
------------------------------------

                 Key: JCR-2642
                 URL: https://issues.apache.org/jira/browse/JCR-2642
             Project: Jackrabbit Content Repository
          Issue Type: New Feature
          Components: jackrabbit-core
    Affects Versions: 2.1.0
            Reporter: Dan Ducar


Hi,

I was trying to implement a custom parser and found the following problem.
Since Tika 0.7 it is possible to implement a custom parser and register it in a service provider configuration file (META-INF/services/org.apache.tika.parser.Parser). That way there is no need to maintain a custom tika-config.xml file just to add a custom parser.
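
To illustrate what such a file contains, here is a minimal sketch using only the JDK. It is not Tika's own loading code; it only shows the standard provider-configuration file format (one fully qualified class name per line, '#' starts a comment, blank lines are ignored). The class names com.example.MyCustomParser and com.example.OtherParser are hypothetical, as is the helper ProviderFileSketch itself.

```java
import java.util.ArrayList;
import java.util.List;

public class ProviderFileSketch {

    /**
     * Returns the provider class names listed in the text of a
     * provider-configuration file: one name per line, with '#' starting
     * a comment and blank lines ignored.
     */
    public static List<String> providerNames(String fileContent) {
        List<String> names = new ArrayList<>();
        for (String line : fileContent.split("\\R")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // drop the comment part
            }
            line = line.trim();
            if (!line.isEmpty()) {
                names.add(line);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical content of META-INF/services/org.apache.tika.parser.Parser
        String content = "# custom parsers\n"
                + "com.example.MyCustomParser\n"
                + "\n"
                + "com.example.OtherParser  # hypothetical second entry\n";
        System.out.println(providerNames(content));
    }
}
```

In a real deployment the file would simply sit on the classpath next to the custom parser classes, so nothing like this helper would need to be written by hand.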

The problem I ran into is in JackrabbitParser: it does not let the AutoDetectParser be instantiated with its default constructor, which would in turn set it up using the default TikaConfig constructor.
Basically, from Tika 0.7 on, TikaConfig.getTikaConfig() instantiates the TikaConfig using the default constructor instead of reading the tika-config.xml file from within the package. The default constructor reads the service provider configuration files and populates the parsers map.

What I'm proposing is to change JackrabbitParser to instantiate the AutoDetectParser using the default constructor. That way, with Tika >= 0.7 we could easily implement our own parsers and there would be no reason to maintain the tika-config.xml. A sort of "backward" compatibility would also be maintained: with the AutoDetectParser default constructor, the TikaConfig is obtained through TikaConfig.getTikaConfig(), which for Tika versions < 0.7 calls the TikaConfig(InputStream) constructor, which reads the configuration directly from the package.

Basically the JackrabbitParser should look like this:

    public JackrabbitParser() {
        parser = new AutoDetectParser();
    }
 
Thanks,
Dan

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (JCR-2642) JackrabbitParser and tika 0.7 parser

Posted by "Dan Ducar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Ducar updated JCR-2642:
---------------------------

    Issue Type: Improvement  (was: New Feature)

> JackrabbitParser and tika 0.7 parser
> ------------------------------------
>
>                 Key: JCR-2642
>                 URL: https://issues.apache.org/jira/browse/JCR-2642
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core
>    Affects Versions: 2.1.0
>            Reporter: Dan Ducar
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
