Posted to user@tika.apache.org by Dmitry Minkovsky <dm...@gmail.com> on 2015/03/10 02:59:58 UTC

Facade uses the EmptyParser despite correct type detection

I am trying to use the Tika facade. Here's my test code:


Tika tika = new Tika();
Metadata md = new Metadata();

try {
    String content = tika.parseToString(src, md, 100000);

    System.out.println("Content length: " + content.length());

    for (String s: md.names()) {
        System.out.println(s + ": " + md.get(s));
    }
}
catch (TikaException e) { System.out.println(e); }


Here's the output:

> Content length: 0
> X-Parsed-By: org.apache.tika.parser.EmptyParser
> Content-Type: text/html

So:

* If Tika correctly identifies the input as text/html, why does it use the
EmptyParser?
* If I'm supposed to pass a parser, which parser should I pass for best
results, assuming that autodetection is successful, as it seems to be above?

Thank you,
Dmitry

Re: Facade uses the EmptyParser despite correct type detection

Posted by Dmitry Minkovsky <dm...@gmail.com>.

Pardon the interruption: I did not have tika-parsers on my classpath!

Thank you,
Dmitry
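
For readers who hit the same symptom: type detection is handled by tika-core,
while the format-specific parsers ship separately in tika-parsers, so the facade
can report Content-Type: text/html and still fall back to the EmptyParser when no
parser for that type is available. A rough way to check for this, sketched below
with plain reflection, is to test whether a parser class can be loaded at all.
The helper class ParserClasspathCheck and its method name are hypothetical and
not part of Tika; the only Tika name assumed is the HTML parser class,
org.apache.tika.parser.html.HtmlParser.

```java
// Hypothetical helper, not part of Tika: checks whether a class is loadable.
final class ParserClasspathCheck {

    // Returns true if the named class can be found by this class's loader.
    // The class is located but not initialized (second argument is false).
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false,
                    ParserClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies are missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the HTML parser lives under this name in the parsers jar.
        String htmlParser = "org.apache.tika.parser.html.HtmlParser";
        System.out.println(htmlParser + " available: "
                + isOnClasspath(htmlParser));
    }
}
```

If this reports that the class is unavailable, adding the tika-parsers artifact
next to tika-core is the likely fix, as it was in this thread.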
