Posted to dev@tika.apache.org by Aditya Dhulipala <ad...@usc.edu> on 2015/07/23 22:03:41 UTC

How to:- Extending Tika within Solr

Hi,


I have implemented a new file-type parser for Tika. It parses a custom
file type (*.mx).


I would like my Solr instance to use my version of Tika with the mx parser.

I found this through a Google search:

https://lucidworks.com/blog/extending-apache-tika-capabilities/

But it seems to be over 5 years old, and the "download project" link is
broken.


Can anybody help me with this?


I tried replacing the tika-* jars within contrib/extraction/lib under
solr-root with my compiled tika-* jars, but that didn't work: Solr is still
using the old Tika binaries (i.e. without the .mx parser). I know that my
tika-* jars are working correctly, because I can run them in GUI mode and
parse a test .mx file.



Thanks!

-

Aditya

Re: How to:- Extending Tika within Solr

Posted by Jan Høydahl <ja...@cominvent.com>.
Moving discussion from dev to user-list (CC to Aditya in case you’re not on the user list)

Since you are defining a new parser, you should be able to simply drop your parser’s JAR(s)
on Solr’s classpath, without modifying core Tika at all, and the parser will be discovered
through the SPI mechanism.
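
SPI discovery only works if the JAR registers the parser class in a service file named META-INF/services/org.apache.tika.parser.Parser. As a rough sketch (not part of Tika or Solr), the small helper below lists what a given JAR registers there, which may help when a parser is silently ignored. The class name SpiRegistrationCheck and the example parser name com.example.MxParser are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: reports which parser classes a JAR registers for SPI discovery.
class SpiRegistrationCheck {
    // Standard ServiceLoader location for implementations of Tika's Parser interface.
    static final String SERVICE_FILE = "META-INF/services/org.apache.tika.parser.Parser";

    // Parses service-file text: one class name per line, '#' starts a comment.
    static List<String> parseServiceFile(String content) {
        List<String> classes = new ArrayList<>();
        for (String line : content.split("\\R")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            if (!line.isEmpty()) {
                classes.add(line);
            }
        }
        return classes;
    }

    // Returns the registered parser class names, or an empty list if the
    // JAR has no service file for Parser.
    static List<String> registeredParsers(Path jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            JarEntry entry = jarFile.getJarEntry(SERVICE_FILE);
            if (entry == null) {
                return List.of();
            }
            try (InputStream in = jarFile.getInputStream(entry)) {
                return parseServiceFile(new String(in.readAllBytes(), StandardCharsets.UTF_8));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SpiRegistrationCheck.java <parser.jar>");
            return;
        }
        List<String> parsers = registeredParsers(Path.of(args[0]));
        if (parsers.isEmpty()) {
            System.out.println("No " + SERVICE_FILE + " entries found");
        } else {
            parsers.forEach(System.out::println);
        }
    }
}
```

If the output does not include your parser class (for example com.example.MxParser), the JAR is probably missing the service file, and dropping it on the classpath alone will not be enough.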

You can quickly test this by posting your .mx file to Solr with the parameter &extractOnly=true; Solr will return the extracted XML.
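
A minimal sketch of such a test request, using only the JDK's HTTP client (Java 11+). It assumes Solr runs at http://localhost:8983/solr and that the extracting handler is mapped to /update/extract, as in the stock example configuration; the core name and file path are placeholders you pass in. The optional resource.name parameter gives Tika the file name as a hint for type detection, which may matter for a custom extension.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper: posts a file to Solr's extracting handler without indexing it.
class ExtractOnlyRequest {
    // Builds the request URL. The /update/extract path is an assumption based on
    // the stock example configuration; adjust it to match your solrconfig.xml.
    static URI extractUrl(String solrBase, String core, String fileName) {
        String name = URLEncoder.encode(fileName, StandardCharsets.UTF_8);
        return URI.create(solrBase + "/" + core
                + "/update/extract?extractOnly=true&resource.name=" + name);
    }

    // Sends the file as the raw request body and returns Solr's response text.
    static String extract(URI url, Path file) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(url)
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofFile(file))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java ExtractOnlyRequest.java <core> <file.mx>");
            return;
        }
        Path file = Path.of(args[1]);
        URI url = extractUrl("http://localhost:8983/solr", args[0],
                file.getFileName().toString());
        System.out.println(extract(url, file));
    }
}
```

If the response contains the content your parser produces, Solr has picked the parser up; if it contains an error or empty content, the parser was probably not found on the classpath.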

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 25 Jul 2015, at 12:45, Jan Høydahl <ja...@cominvent.com> wrote:
> 
> You can place a file called tika.config in your Solr core’s conf directory, and Solr’s
> ExtractingRequestHandler will parse it. In there you can define your custom new parser.
> 
> See https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 


Re: How to:- Extending Tika within Solr

Posted by Jan Høydahl <ja...@cominvent.com>.
You can place a file called tika.config in your Solr core’s conf directory, and Solr’s
ExtractingRequestHandler will parse it. In there you can define your custom new parser.

See https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
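
For illustration only, a Tika configuration file of this kind might look roughly like the sketch below. The parser class com.example.MxParser and the file name tika-config.xml are hypothetical, and the accepted elements can differ between Tika versions, so please check it against the version bundled with your Solr.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Keep Tika's built-in parsers available -->
    <parser class="org.apache.tika.parser.DefaultParser"/>
    <!-- Hypothetical custom parser for .mx files -->
    <parser class="com.example.MxParser"/>
  </parsers>
</properties>
```

As far as I understand the Solr Cell documentation, tika.config is also the name of the handler parameter that points to this file, so the handler definition in solrconfig.xml would need an entry along these lines (the handler name and the relative path are assumptions):

```xml
<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <!-- Path to the custom Tika configuration file -->
  <str name="tika.config">tika-config.xml</str>
</requestHandler>
```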

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
