Posted to user@tika.apache.org by James Brooking <j...@jambroo.com> on 2016/02/02 14:16:54 UTC
Fwd: Issues adding custom content-type
Hello Tika People,
I am trying to add a custom content-type to Tika and am finding it
difficult. I'm not sure whether the tutorial I am following is out of date,
but that could be the case.
I am using Tika 1.11, which I downloaded from here:
https://www.apache.org/dist/tika/tika-server-1.11.jar
Once I have this file I can successfully run it on my PC using:
java -jar tika-server-1.11.jar -h 0.0.0.0
I created a custom content-type like so:
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <mime-exclude>application/hello</mime-exclude>
    </parser>
    <parser class="org.apache.tika.parser.hello.HelloWorldParser">
      <mime>application/hello</mime>
    </parser>
  </parsers>
</properties>
This was saved into a file called parsers.xml.
Then I followed the example in
https://tika.apache.org/1.5/parser_guide.html#Create_your_Parser_class and
added the parser class.
My question is: what do I need to add to the "java -jar
tika-server-1.11.jar -h 0.0.0.0" command for it to load my custom parser?
Thanks in advance,
James Brooking
Re: Fwd: Issues adding custom content-type
Posted by James Brooking <j...@jambroo.com>.
Just to follow up on my issue: it seems to work now and shows Hello in my
list of content types. I think it could have been something really
simple, like a mistyped filename.
├── org
│ └── apache
│ └── tika
│ ├── mime
│ │ └── custom-mimetypes.xml
│ └── parser
│ └── hello
│ ├── HelloParser.class
│ └── HelloParser.java
└── tika-server-1.11.jar
I ran tika-server-1.11.jar using "java -classpath .:tika-server-1.11.jar
org.apache.tika.server.TikaServerCli -h 0.0.0.0" and it shows:
*application/hello*
when I go to /mime-types.
To get my parser listed at /parsers, do I need to add anything to my
java run script? At the bottom of this page
https://tika.apache.org/1.11/parser_guide.html#Add_your_MIME-Type they
say to explicitly tell your AutoDetectParser to include the new parser,
but I am unsure how to do that.
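A hedged note on that step: as far as I understand Tika 1.x, "explicitly tell your AutoDetectParser" refers to Java's service-provider mechanism. The default parser (which AutoDetectParser uses unless configured otherwise) discovers parsers listed, one class name per line, in files named META-INF/services/org.apache.tika.parser.Parser on the class path, and the directory listing above contains no such file. The sketch below is a hypothetical, JDK-only helper (ParserServiceCheck and isRegistered are invented names, not Tika API) that reports whether a given class is listed in any such file visible to a class loader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;

// Hypothetical diagnostic helper, not part of Tika: checks whether a parser
// class is listed in any Tika parser service file the class loader can see.
public class ParserServiceCheck {
    static final String SERVICE_FILE = "META-INF/services/org.apache.tika.parser.Parser";

    static boolean isRegistered(ClassLoader loader, String parserClass) throws IOException {
        // getResources returns every copy on the class path (directories and JARs).
        Enumeration<URL> files = loader.getResources(SERVICE_FILE);
        while (files.hasMoreElements()) {
            URL url = files.nextElement();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    // Service files allow '#' comments and surrounding whitespace.
                    int hash = line.indexOf('#');
                    String entry = (hash >= 0 ? line.substring(0, hash) : line).trim();
                    if (entry.equals(parserClass)) {
                        return true;
                    }
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        String name = args.length > 0 ? args[0] : "org.apache.tika.parser.hello.HelloParser";
        boolean listed = isRegistered(ParserServiceCheck.class.getClassLoader(), name);
        System.out.println(name + (listed ? " is listed in " : " is NOT listed in ") + SERVICE_FILE);
    }
}
```

Compiled into the "." directory, it could be run with "java -classpath .:tika-server-1.11.jar ParserServiceCheck". If it reports NOT listed, then under the same assumption the next thing to try is creating META-INF/services/org.apache.tika.parser.Parser under "." containing the single line org.apache.tika.parser.hello.HelloParser.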
James
On 02/02/2016 04:41 PM, James Brooking wrote:
> That ran the server fine but when I access
> http://localhost:9998/parsers HelloParser is not present. Still no
> errors or any output regarding it.
Re: Fwd: Issues adding custom content-type
Posted by James Brooking <j...@jambroo.com>.
That ran the server fine, but when I access http://localhost:9998/parsers
HelloParser is not present. There are still no errors or any output regarding it.
On Tue, Feb 2, 2016 at 4:33 PM, Nick Burch <ap...@gagravarr.org> wrote:
> On Tue, 2 Feb 2016, James Brooking wrote:
>
>> I tried to add a classpath attribute but that didn't seem to change
>> anything:
>> java -classpath "." -jar tika-server-1.11.jar -h 0.0.0.0
>>
>
> The -jar and -classpath options are sadly mutually incompatible
>
> Try with:
> -classpath .:tika-server-1.11.jar org.apache.tika.server.TikaServerCli
> -h 0.0.0.0
>
> Nick
>
Re: Fwd: Issues adding custom content-type
Posted by Nick Burch <ap...@gagravarr.org>.
On Tue, 2 Feb 2016, James Brooking wrote:
> I tried to add a classpath attribute but that didn't seem to change
> anything:
> java -classpath "." -jar tika-server-1.11.jar -h 0.0.0.0
The -jar and -classpath options are sadly mutually incompatible
Try with:
java -classpath .:tika-server-1.11.jar org.apache.tika.server.TikaServerCli -h 0.0.0.0
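To make the point concrete: when "-jar" is given, the JVM takes all user classes from that JAR and ignores "-classpath", so the class path has to name both the directory holding the custom classes and the server JAR, and the main class has to be given explicitly. One caveat: the ":" separator in ".:tika-server-1.11.jar" is the Unix one; on Windows it is ";". The hypothetical launcher below (TikaServerLauncher and buildCommand are invented names) builds the same command with File.pathSeparator so it works on either platform, assuming "java" is on the PATH.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher, not part of Tika: starts tika-server with an explicit
// class path instead of -jar, so classes under classesDir are also visible.
public class TikaServerLauncher {
    static List<String> buildCommand(String classesDir, String serverJar, String host) {
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
        String classpath = classesDir + File.pathSeparator + serverJar;
        return Arrays.asList(
                "java", "-classpath", classpath,
                "org.apache.tika.server.TikaServerCli", "-h", host);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        List<String> cmd = buildCommand(".", "tika-server-1.11.jar", "0.0.0.0");
        System.out.println("Running: " + String.join(" ", cmd));
        // inheritIO sends the server's log output to this console.
        Process server = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(server.waitFor());
    }
}
```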
Nick
Re: Fwd: Issues adding custom content-type
Posted by James Brooking <j...@jambroo.com>.
Hi Nick,
Thanks for taking the time to reply to my question.
You are right, I am a bit confused by the different approaches. Okay: if I
leave out the custom parser config and follow
https://tika.apache.org/1.11/parser_guide.html#Add_your_MIME-Type then I have
the following directory structure:
.
├── org
│ └── apache
│ └── tika
│ ├── mime
│ │ └── custom-mimetypes.xml
│ └── parser
│ └── hello
│ ├── HelloParser.class
│ └── HelloParser.java
└── tika-server-1.11.jar
I tried to add a classpath attribute but that didn't seem to change
anything:
java -classpath "." -jar tika-server-1.11.jar -h 0.0.0.0
The server is running, but HelloParser does not appear in the list of
parsers.
James
On Tue, Feb 2, 2016 at 2:23 PM, Nick Burch <ap...@gagravarr.org> wrote:
> On Tue, 2 Feb 2016, James Brooking wrote:
>
>> I created a custom content-type like so:
>> <?xml version="1.0" encoding="UTF-8"?>
>> <properties>
>> <parsers>
>> <parser class="org.apache.tika.parser.DefaultParser">
>> <mime-exclude>application/hello</mime-exclude>
>> </parser>
>> <parser class="org.apache.tika.parser.hello.HelloWorldParser">
>> <mime>application/hello</mime>
>> </parser>
>> </parsers>
>> </properties>
>>
>> This was saved into file called parsers.xml.
>>
>
> That's not a custom mime type / content type file, that seems to be a
> custom Tika XML file. You seem to be confusing several things...
>
> Firstly, to define the mime type - as explained in
> https://tika.apache.org/1.11/parser_guide.html#Add_your_MIME-Type it
> needs to be called custom-mimetypes.xml and stored in the directory
> org/apache/tika/mime/ somewhere on your classpath
>
> If you want to explicitly load a custom parser, rather than letting auto
> loading work for you (which the parser quick guide sets up), then you need
> to follow
> https://tika.apache.org/1.11/configuring.html#Configuring_Parsers
>
> Nick
>
Re: Fwd: Issues adding custom content-type
Posted by Nick Burch <ap...@gagravarr.org>.
On Tue, 2 Feb 2016, James Brooking wrote:
> I created a custom content-type like so:
> <?xml version="1.0" encoding="UTF-8"?>
> <properties>
> <parsers>
> <parser class="org.apache.tika.parser.DefaultParser">
> <mime-exclude>application/hello</mime-exclude>
> </parser>
> <parser class="org.apache.tika.parser.hello.HelloWorldParser">
> <mime>application/hello</mime>
> </parser>
> </parsers>
> </properties>
>
> This was saved into file called parsers.xml.
That's not a custom mime type / content type file; it seems to be a
custom Tika config XML file. You seem to be confusing several things...
Firstly, to define the mime type: as explained in
https://tika.apache.org/1.11/parser_guide.html#Add_your_MIME-Type the file needs
to be called custom-mimetypes.xml and stored in the directory
org/apache/tika/mime/ somewhere on your classpath.
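Since elsewhere in the thread the problem is put down to a possibly mistyped file name, one way to rule that out is to generate the file programmatically. The hypothetical helper below (WriteCustomMimeTypes is an invented name) writes a minimal custom-mimetypes.xml under a given class path root. The mime-info / mime-type / glob layout follows Tika's MIME type definition format as I understand it; the "*.hi" glob pattern is an assumption for illustration, so substitute the extension your files actually use.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Tika: writes a minimal custom-mimetypes.xml
// at the exact name and package path that Tika looks for on the class path.
public class WriteCustomMimeTypes {
    static final String RESOURCE = "org/apache/tika/mime/custom-mimetypes.xml";

    // The "*.hi" glob is an illustrative assumption; use your real extension.
    static final String XML = String.join("\n",
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
            "<mime-info>",
            "  <mime-type type=\"application/hello\">",
            "    <glob pattern=\"*.hi\"/>",
            "  </mime-type>",
            "</mime-info>",
            "");

    static Path write(Path classpathRoot) throws IOException {
        Path target = classpathRoot.resolve(RESOURCE);
        Files.createDirectories(target.getParent());
        Files.write(target, XML.getBytes(StandardCharsets.UTF_8));
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to ".", the directory that is later put on the class path.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Wrote " + write(root).toAbsolutePath());
    }
}
```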
If you want to explicitly load a custom parser, rather than letting
auto-loading work for you (which is what the parser quick guide sets up), you need
to follow
https://tika.apache.org/1.11/configuring.html#Configuring_Parsers
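If the explicit-config route is taken, note that the parsers.xml from the original post names org.apache.tika.parser.hello.HelloWorldParser, whereas the directory listings elsewhere in the thread show HelloParser.class. A class name in the config that matches nothing on the class path is easy to miss. The hypothetical helper below (CheckParserClasses is an invented name; it uses only the JDK's DOM parser, not Tika) reports every parser class named in such a file that a class loader cannot load.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical diagnostic, not part of Tika: reports <parser class="..."> entries
// in a tika-config style file whose classes cannot be loaded.
public class CheckParserClasses {
    static List<String> missingParserClasses(File config, ClassLoader loader) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(config);
        NodeList parsers = doc.getElementsByTagName("parser");
        List<String> missing = new ArrayList<>();
        for (int i = 0; i < parsers.getLength(); i++) {
            String name = ((Element) parsers.item(i)).getAttribute("class");
            try {
                // initialize=false: load the class without running static initializers.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError covers a class whose dependencies (e.g. Tika) are absent.
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        File config = new File(args.length > 0 ? args[0] : "parsers.xml");
        List<String> missing =
                missingParserClasses(config, CheckParserClasses.class.getClassLoader());
        if (missing.isEmpty()) {
            System.out.println("All parser classes in " + config + " can be loaded");
        } else {
            missing.forEach(n -> System.out.println("Cannot load: " + n));
        }
    }
}
```

It would be run with the server JAR on the class path, e.g. "java -classpath .:tika-server-1.11.jar CheckParserClasses parsers.xml"; without the JAR, even a correctly named HelloParser would be reported because its Tika superclass could not be found.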
Nick