Posted to solr-user@lucene.apache.org by sp...@gmx.eu on 2012/06/08 21:35:21 UTC

Adding Custom-Parser to Tika

Hi,

I have written a new parser for Tika. The problem is that I have to edit
org.apache.tika.parser.Parser in the tika.jar, and I do not want to edit the
jar. Is there another way to register the new parser? It must work with a
plain AutoDetectParser, since that is used directly in other parsers (e.g.
RFC822Parser).

Thank you.


Re: Adding Custom-Parser to Tika

Posted by Lance Norskog <go...@gmail.com>.
How do you add it to the classpath? And, is there an example somewhere
of how to package one of these external parsers?

If all else fails, the Tika code for loading external parsers is
available for viewing.

On Sat, Jun 9, 2012 at 3:00 AM,  <sp...@gmx.eu> wrote:
>> The doc is old. Tika hunts for parsers in the classpath now.
>>
>> http://www.lucidimagination.com/search/link?url=https://issues.apache.org/jira/browse/SOLR-2116?focusedCommentId=12977072#action_12977072
>
> "Re: tika-config.xml vs. META-INF/services/...; The service provider
> mechanism [1] makes it easy to add custom parser implementations without
> having to maintain a separate copy of the full Tika configuration file. You
> could for example create a my-custom-parsers.jar file with a
> META-INF/services/o.a.tika.parser.Parser file that lists only your custom
> parser classes. When you add that jar to the classpath, Tika would then
> automatically pick up those parsers in addition to the standard parser
> classes from the tika-parsers jar."
>
> This was exactly what I tried, but it did not work.
>
> I'm using Tika 1.1
>



-- 
Lance Norskog
goksron@gmail.com

RE: Adding Custom-Parser to Tika

Posted by sp...@gmx.eu.
> The doc is old. Tika hunts for parsers in the classpath now.
> 
> http://www.lucidimagination.com/search/link?url=https://issues.apache.org/jira/browse/SOLR-2116?focusedCommentId=12977072#action_12977072

"Re: tika-config.xml vs. META-INF/services/...; The service provider
mechanism [1] makes it easy to add custom parser implementations without
having to maintain a separate copy of the full Tika configuration file. You
could for example create a my-custom-parsers.jar file with a
META-INF/services/o.a.tika.parser.Parser file that lists only your custom
parser classes. When you add that jar to the classpath, Tika would then
automatically pick up those parsers in addition to the standard parser
classes from the tika-parsers jar."

This was exactly what I tried, but it did not work.

I'm using Tika 1.1
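
One way to narrow down why a registration is not picked up is to check which
service files a given class loader can actually see. The following is a
minimal diagnostic sketch using only the JDK; it is not Tika's own loading
code, and the class name ServiceFileCheck and its method are hypothetical. It
assumes the parser jar is visible to the class loader passed in. Inside Solr,
jars under lib/ are loaded by Solr's own class loader, which may not be the
one Tika consults.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper: lists the provider class names that are registered
// under META-INF/services/<serviceName> and visible to a class loader.
public class ServiceFileCheck {

    public static List<String> registeredProviders(ClassLoader loader, String serviceName)
            throws IOException {
        List<String> names = new ArrayList<>();
        // A service may be registered in several jars, so collect all matching files.
        Enumeration<URL> files = loader.getResources("META-INF/services/" + serviceName);
        while (files.hasMoreElements()) {
            URL file = files.nextElement();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(file.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    int hash = line.indexOf('#'); // '#' starts a comment
                    if (hash >= 0) {
                        line = line.substring(0, hash);
                    }
                    line = line.trim();
                    if (!line.isEmpty()) {
                        names.add(line);
                    }
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Checks the loader that loaded this class; pass a different loader
        // to registeredProviders() to test another part of the application.
        ClassLoader loader = ServiceFileCheck.class.getClassLoader();
        for (String name : registeredProviders(loader, "org.apache.tika.parser.Parser")) {
            System.out.println(name);
        }
    }
}
```

If the custom parser class is missing from the printed list, the services
file is not reachable from that class loader, whatever the jar contains.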


Re: Adding Custom-Parser to Tika

Posted by Lance Norskog <go...@gmail.com>.
The doc is old. Tika hunts for parsers in the classpath now.

http://www.lucidimagination.com/search/link?url=https://issues.apache.org/jira/browse/SOLR-2116?focusedCommentId=12977072#action_12977072

On Fri, Jun 8, 2012 at 2:20 PM, Chris Hostetter
<ho...@fucit.org> wrote:
> You can specify a "tika.config" option pointing at your own
> tika-config.xml file that ExtractingRequestHandler will use to configure
> Tika with...
>
> http://wiki.apache.org/solr/ExtractingRequestHandler
>
> "The tika.config entry points to a file containing a Tika configuration.
> You would only need this if you have customized your own Tika
> configuration. The Tika config contains info about parsers, mime types,
> etc."
>
>
> -Hoss



-- 
Lance Norskog
goksron@gmail.com

RE: Adding Custom-Parser to Tika

Posted by Chris Hostetter <ho...@fucit.org>.
You can specify a "tika.config" option pointing at your own
tika-config.xml file that ExtractingRequestHandler will use to configure
Tika with...

http://wiki.apache.org/solr/ExtractingRequestHandler

"The tika.config entry points to a file containing a Tika configuration. 
You would only need this if you have customized your own Tika 
configuration. The Tika config contains info about parsers, mime types, 
etc."


-Hoss

RE: Adding Custom-Parser to Tika

Posted by sp...@gmx.eu.
The parser must be registered in the service registry
(META-INF/services/org.apache.tika.parser.Parser). Simply being on the
classpath is not enough.

> -----Original Message-----
> From: Lance Norskog [mailto:goksron@gmail.com] 
> Sent: Friday, June 8, 2012 22:38
> To: solr-user@lucene.apache.org
> Subject: Re: Adding Custom-Parser to Tika
> 
> Solr will find libs in top-level directory solr/lib (next to solr.xml)
> or a lib/ directory inside each core directory. You can put your new
> parser in a jar file in one of those places. Like this:
> 
> solr/
> solr/solr.xml
> solr/lib
> solr/lib/yourjar.jar
> solr/collection1
> solr/collection1/conf
> solr/collection1/lib
> solr/collection1/lib/yourjar.jar
> 
> On Fri, Jun 8, 2012 at 12:35 PM,  <sp...@gmx.eu> wrote:
> > Hi,
> >
> > I have written a new parser for Tika. The problem is that I have to edit
> > org.apache.tika.parser.Parser in the tika.jar, and I do not want to edit the
> > jar. Is there another way to register the new parser? It must work with a
> > plain AutoDetectParser, since that is used directly in other parsers (e.g.
> > RFC822Parser).
> >
> > Thank you.
> >
> 
> 
> 
> -- 
> Lance Norskog
> goksron@gmail.com
> 


Re: Adding Custom-Parser to Tika

Posted by Lance Norskog <go...@gmail.com>.
Solr will find libs in top-level directory solr/lib (next to solr.xml)
or a lib/ directory inside each core directory. You can put your new
parser in a jar file in one of those places. Like this:

solr/
solr/solr.xml
solr/lib
solr/lib/yourjar.jar
solr/collection1
solr/collection1/conf
solr/collection1/lib
solr/collection1/lib/yourjar.jar
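
Before debugging class loading, it can help to confirm that a jar placed in
one of these directories really carries the registration file. The sketch
below uses only the JDK and scans lib directories for jars that contain
META-INF/services/org.apache.tika.parser.Parser. The class name LibDirScan is
hypothetical, and the directory names in the usage line are taken from the
layout above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper: reports which jars in a lib directory contain a
// service registration file for Tika parsers.
public class LibDirScan {

    public static final String ENTRY = "META-INF/services/org.apache.tika.parser.Parser";

    // Returns the jars directly inside libDir that contain the given entry.
    public static List<Path> jarsWithEntry(Path libDir, String entryName) throws IOException {
        List<Path> hits = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jarFile = new JarFile(jar.toFile())) {
                    if (jarFile.getEntry(entryName) != null) {
                        hits.add(jar);
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        for (String dir : args) {
            Path libDir = Paths.get(dir);
            if (!Files.isDirectory(libDir)) {
                System.out.println(dir + ": not a directory");
                continue;
            }
            System.out.println(dir + ": " + jarsWithEntry(libDir, ENTRY));
        }
    }
}
```

Usage, assuming the layout above: java LibDirScan solr/lib solr/collection1/lib
A jar that does not show up here has no registration file, so the service
provider mechanism has nothing to find in it.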

On Fri, Jun 8, 2012 at 12:35 PM,  <sp...@gmx.eu> wrote:
> Hi,
>
> I have written a new parser for Tika. The problem is that I have to edit
> org.apache.tika.parser.Parser in the tika.jar, and I do not want to edit the
> jar. Is there another way to register the new parser? It must work with a
> plain AutoDetectParser, since that is used directly in other parsers (e.g.
> RFC822Parser).
>
> Thank you.
>



-- 
Lance Norskog
goksron@gmail.com