Posted to user@tika.apache.org by James Brooking <j...@jambroo.com> on 2016/02/02 14:16:54 UTC

Fwd: Issues adding custom content-type

Hello Tika People,

I am trying to add a custom content-type to Tika and am finding it
difficult. I'm not sure whether the tutorial I am following is out of
date, but that could be the case.

I am using Tika 1.11, which I downloaded from here:
https://www.apache.org/dist/tika/tika-server-1.11.jar

Once I have this file I can successfully run it on my PC using:
java -jar tika-server-1.11.jar -h 0.0.0.0

I created a custom content-type like so:
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <mime-exclude>application/hello</mime-exclude>
    </parser>
    <parser class="org.apache.tika.parser.hello.HelloWorldParser">
      <mime>application/hello</mime>
    </parser>
  </parsers>
</properties>

This was saved into a file called parsers.xml.
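A config file like this only names parser classes, so one way to catch a mistyped name is to list the classes it references and check whether each compiled class is actually present. Note that this config refers to HelloWorldParser, while the directory tree later in the thread contains HelloParser.class. The sketch below is a minimal, JDK-only check; ParserConfigCheck is a hypothetical helper (not part of Tika), and it assumes parsers.xml and the compiled classes sit in the current directory.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Tika.
class ParserConfigCheck {

    /** Returns the class attribute of every <parser> element in a Tika config file. */
    static List<String> parserClasses(Path configFile) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(configFile.toFile());
        NodeList parsers = doc.getElementsByTagName("parser");
        List<String> classes = new ArrayList<>();
        for (int i = 0; i < parsers.getLength(); i++) {
            classes.add(((Element) parsers.item(i)).getAttribute("class"));
        }
        return classes;
    }

    /** Maps a class name such as a.b.C to the file a/b/C.class under a classpath root. */
    static Path classFile(Path classpathRoot, String className) {
        return classpathRoot.resolve(className.replace('.', '/') + ".class");
    }

    public static void main(String[] args) throws Exception {
        Path root = Paths.get(".");
        for (String cls : parserClasses(root.resolve("parsers.xml"))) {
            // Only the directory is checked, so classes that ship inside
            // tika-server-1.11.jar (e.g. DefaultParser) are reported as NOT FOUND.
            String status = Files.exists(classFile(root, cls)) ? "found" : "NOT FOUND";
            System.out.println(cls + " -> " + status);
        }
    }
}
```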

Then I followed the example in
https://tika.apache.org/1.5/parser_guide.html#Create_your_Parser_class and
added the parser class.

My question is: what do I need to add to the "java -jar
tika-server-1.11.jar -h 0.0.0.0" command for it to load my custom parser?

Thanks in advance,
James Brooking

Re: Fwd: Issues adding custom content-type

Posted by James Brooking <j...@jambroo.com>.
Just to follow up on my issue: it now seems to work, and application/hello
shows up in my list of content types. I think it could have been something
really simple, like a mistyped filename.

├── org
│   └── apache
│       └── tika
│           ├── mime
│           │   └── custom-mimetypes.xml
│           └── parser
│               └── hello
│                   ├── HelloParser.class
│                   └── HelloParser.java
└── tika-server-1.11.jar

I ran tika-server-1.11.jar using "java -classpath .:tika-server-1.11.jar 
org.apache.tika.server.TikaServerCli -h 0.0.0.0" and it shows:
*application/hello*
when I go to /mime-types.

To get my parser listed under /parsers, do I need to add anything to my
java run script? At the bottom of
https://tika.apache.org/1.11/parser_guide.html#Add_your_MIME-Type they
say to explicitly tell your AutoDetectParser to include the new parser,
but I am unsure how to do that.
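As far as I understand the Tika 1.x parser guide, that step means listing the parser's fully qualified class name in a Java service-loader file, META-INF/services/org.apache.tika.parser.Parser, somewhere on the classpath; DefaultParser discovers parsers through such files, and the parser class needs a public no-argument constructor. The directory tree above has no META-INF directory, which may be why HelloParser does not appear under /parsers. A minimal JDK-only sketch that writes the file, assuming the current directory is the classpath root and the class is org.apache.tika.parser.hello.HelloParser (taken from the tree); RegisterParser is a hypothetical helper name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;

// Hypothetical helper, not part of Tika.
class RegisterParser {

    // Service-loader file named after Tika's Parser interface.
    static final String SERVICE_FILE = "META-INF/services/org.apache.tika.parser.Parser";

    /** Writes a service file listing parserClass under the given classpath root. */
    static Path register(Path classpathRoot, String parserClass) throws IOException {
        Path serviceFile = classpathRoot.resolve(SERVICE_FILE);
        Files.createDirectories(serviceFile.getParent());
        // One fully qualified class name per line; overwrites any existing file.
        Files.write(serviceFile, Collections.singletonList(parserClass), StandardCharsets.UTF_8);
        return serviceFile;
    }

    public static void main(String[] args) throws IOException {
        // Assumes ./org/apache/tika/parser/hello/HelloParser.class exists.
        Path written = register(Paths.get("."), "org.apache.tika.parser.hello.HelloParser");
        System.out.println("Wrote " + written);
    }
}
```

After writing the file, restart the server with the -classpath command above and check /parsers again.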

James

On 02/02/2016 04:41 PM, James Brooking wrote:
> That ran the server fine but when I access 
> http://localhost:9998/parsers HelloParser is not present. Still no 
> errors or any output regarding it.
>
> On Tue, Feb 2, 2016 at 4:33 PM, Nick Burch <apache@gagravarr.org> wrote:
>
>     On Tue, 2 Feb 2016, James Brooking wrote:
>
>         I tried to add a classpath attribute but that didn't seem to
>         change anything:
>         java -classpath "." -jar tika-server-1.11.jar -h 0.0.0.0
>
>
>     The -jar and -classpath options are sadly mutually incompatible
>
>     Try with:
>       -classpath .:tika-server-1.11.jar org.apache.tika.server.TikaServerCli -h 0.0.0.0
>
>     Nick
>
>


Re: Fwd: Issues adding custom content-type

Posted by James Brooking <j...@jambroo.com>.
That ran the server fine but when I access http://localhost:9998/parsers
HelloParser is not present. Still no errors or any output regarding it.
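One way to check from code whether the server picked things up is to fetch its listing endpoints and search the text. A minimal JDK-only sketch, assuming tika-server is running on the default port 9998 and that /mime-types and /parsers answer a request with Accept: text/plain (I believe tika-server 1.x does, but treat that as an assumption); ServerListing is a hypothetical helper name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Scanner;

// Hypothetical helper, not part of Tika.
class ServerListing {

    /** Fetches a tika-server listing endpoint, asking for plain text. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("Accept", "text/plain");
        try (InputStream in = conn.getInputStream();
             Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            // "\\A" only matches at the start of input, so next() returns the whole body.
            return scanner.hasNext() ? scanner.next() : "";
        } finally {
            conn.disconnect();
        }
    }

    /** True if the listing text mentions the given name anywhere. */
    static boolean lists(String listing, String name) {
        return listing.contains(name);
    }

    public static void main(String[] args) throws IOException {
        String base = "http://localhost:9998";
        System.out.println("application/hello registered: "
                + lists(fetch(base + "/mime-types"), "application/hello"));
        System.out.println("HelloParser loaded: "
                + lists(fetch(base + "/parsers"), "HelloParser"));
    }
}
```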

On Tue, Feb 2, 2016 at 4:33 PM, Nick Burch <ap...@gagravarr.org> wrote:

> On Tue, 2 Feb 2016, James Brooking wrote:
>
>> I tried to add a classpath attribute but that didn't seem to change
>> anything:
>> java -classpath "." -jar tika-server-1.11.jar -h 0.0.0.0
>>
>
> The -jar and -classpath options are sadly mutually incompatible
>
> Try with:
>   -classpath .:tika-server-1.11.jar org.apache.tika.server.TikaServerCli -h 0.0.0.0
>
> Nick
>

Re: Fwd: Issues adding custom content-type

Posted by Nick Burch <ap...@gagravarr.org>.
On Tue, 2 Feb 2016, James Brooking wrote:
> I tried to add a classpath attribute but that didn't seem to change
> anything:
> java -classpath "." -jar tika-server-1.11.jar -h 0.0.0.0

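The -jar and -classpath options are sadly mutually incompatible: when you
use -jar, the jar is the source of all user classes and other class path
settings are ignored.

Try with:
   java -classpath .:tika-server-1.11.jar org.apache.tika.server.TikaServerCli -h 0.0.0.0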

Nick
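The same launch command can also be assembled programmatically. This sketch uses File.pathSeparator, so the classpath comes out as .:tika-server-1.11.jar on Linux/macOS and .;tika-server-1.11.jar on Windows, where ':' does not work as a classpath separator. ServerCommand is a hypothetical helper, and actually starting the process is only shown in a comment.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Tika.
class ServerCommand {

    /** Builds the tika-server launch command using -classpath instead of -jar. */
    static List<String> command(String classDir, String serverJar, String host) {
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
        String classpath = classDir + File.pathSeparator + serverJar;
        return Arrays.asList(
                "java", "-classpath", classpath,
                "org.apache.tika.server.TikaServerCli", "-h", host);
    }

    public static void main(String[] args) {
        List<String> cmd = command(".", "tika-server-1.11.jar", "0.0.0.0");
        System.out.println(String.join(" ", cmd));
        // To start the server from Java instead (blocks until it exits):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```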

Re: Fwd: Issues adding custom content-type

Posted by James Brooking <j...@jambroo.com>.
Hi Nick,

Thanks for taking the time to reply to my question.

You are right, I am a bit confused by the different approaches. Okay, if I
drop the custom parser config and follow
https://tika.apache.org/1.11/parser_guide.html#Add_your_MIME-Type instead, I
have the following directory structure:
.
├── org
│   └── apache
│       └── tika
│           ├── mime
│           │   └── custom-mimetypes.xml
│           └── parser
│               └── hello
│                   ├── HelloParser.class
│                   └── HelloParser.java
└── tika-server-1.11.jar

I tried to add a classpath attribute but that didn't seem to change
anything:
java -classpath "." -jar tika-server-1.11.jar -h 0.0.0.0

The server is functioning but when I go to the list of parsers HelloParser
is not there.

James

On Tue, Feb 2, 2016 at 2:23 PM, Nick Burch <ap...@gagravarr.org> wrote:

> On Tue, 2 Feb 2016, James Brooking wrote:
>
>> I created a custom content-type like so:
>> <?xml version="1.0" encoding="UTF-8"?>
>> <properties>
>>  <parsers>
>>    <parser class="org.apache.tika.parser.DefaultParser">
>>      <mime-exclude>application/hello</mime-exclude>
>>    </parser>
>>    <parser class="org.apache.tika.parser.hello.HelloWorldParser">
>>      <mime>application/hello</mime>
>>    </parser>
>>  </parsers>
>> </properties>
>>
>> This was saved into file called parsers.xml.
>>
>
> That's not a custom mime type / content type file, that seems to be a
> custom Tika XML file. You seem to be confusing several things...
>
> Firstly, to define the mime type - as explained in
> https://tika.apache.org/1.11/parser_guide.html#Add_your_MIME-Type it
> needs to be called custom-mimetypes.xml and stored in the directory
> org/apache/tika/mime/ somewhere on your classpath
>
> If you want to explicitly load a custom parser, rather than letting auto
> loading work for you (which the parser quick guide sets up), then you need
> to follow
> https://tika.apache.org/1.11/configuring.html#Configuring_Parsers
>
> Nick
>

Re: Fwd: Issues adding custom content-type

Posted by Nick Burch <ap...@gagravarr.org>.
On Tue, 2 Feb 2016, James Brooking wrote:
> I created a custom content-type like so:
> <?xml version="1.0" encoding="UTF-8"?>
> <properties>
>  <parsers>
>    <parser class="org.apache.tika.parser.DefaultParser">
>      <mime-exclude>application/hello</mime-exclude>
>    </parser>
>    <parser class="org.apache.tika.parser.hello.HelloWorldParser">
>      <mime>application/hello</mime>
>    </parser>
>  </parsers>
> </properties>
>
> This was saved into file called parsers.xml.

That's not a custom mime type / content type file; it seems to be a
custom Tika config XML file. You seem to be confusing several things...
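The two kinds of file can be told apart by their root element: a Tika config file like the parsers.xml above has a <properties> root, while a MIME types file such as custom-mimetypes.xml has a <mime-info> root. A small JDK-only sketch of that check; TikaXmlKind is a hypothetical helper name.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper, not part of Tika.
class TikaXmlKind {

    /** Classifies a Tika-related XML file by the name of its root element. */
    static String kind(Path xmlFile) throws Exception {
        String root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(xmlFile.toFile())
                .getDocumentElement()
                .getTagName();
        switch (root) {
            case "properties": return "Tika config file (parsers, detectors, ...)";
            case "mime-info":  return "MIME types file (e.g. custom-mimetypes.xml)";
            default:           return "unknown (root element <" + root + ">)";
        }
    }

    public static void main(String[] args) throws Exception {
        String[] files = args.length > 0 ? args : new String[] {"parsers.xml"};
        for (String name : files) {
            System.out.println(name + ": " + kind(Paths.get(name)));
        }
    }
}
```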

Firstly, to define the mime type, as explained in
https://tika.apache.org/1.11/parser_guide.html#Add_your_MIME-Type, it needs
to be called custom-mimetypes.xml and stored in the directory
org/apache/tika/mime/ somewhere on your classpath.
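A minimal JDK-only sketch that creates such a file in that location, assuming the current directory is the classpath root (as with the -classpath .:tika-server-1.11.jar command). The *.hi glob is an illustrative assumption; use whatever file extension your format actually has. CustomMimeTypes is a hypothetical helper name, and the values are inserted without XML escaping, which is fine for simple type names and globs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Tika.
class CustomMimeTypes {

    // Where the custom MIME types file goes, relative to a classpath root.
    static final String LOCATION = "org/apache/tika/mime/custom-mimetypes.xml";

    /** Writes a minimal custom-mimetypes.xml declaring one type with one filename glob. */
    static Path write(Path classpathRoot, String mimeType, String glob) throws IOException {
        // Values are not XML-escaped; keep them to plain type names and globs.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<mime-info>\n"
                + "  <mime-type type=\"" + mimeType + "\">\n"
                + "    <glob pattern=\"" + glob + "\"/>\n"
                + "  </mime-type>\n"
                + "</mime-info>\n";
        Path target = classpathRoot.resolve(LOCATION);
        Files.createDirectories(target.getParent());
        Files.write(target, xml.getBytes(StandardCharsets.UTF_8));
        return target;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Wrote " + write(Paths.get("."), "application/hello", "*.hi"));
    }
}
```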

If you want to explicitly load a custom parser, rather than letting
auto-loading work for you (which the parser quick guide sets up), then you
need to follow
https://tika.apache.org/1.11/configuring.html#Configuring_Parsers

Nick