Posted to dev@tika.apache.org by Nicholas DiPiazza <ni...@gmail.com> on 2020/06/26 18:30:27 UTC

Tika Server - Getting the log output with MDC to associate the file being parsed

I am happily using Tika Server to replace some in-memory usage of Apache
Tika that we have relied on for years.

I am stuck on one thing. I sent a file to the unpack endpoint, /unpack/all,
to be parsed.

I get back a zip file with the metadata and the extracted text. Great!

But some docs fail to parse, and I need to know why. For example, a
document might be encrypted.

For those, the response comes back as 422. What I really need is feedback
from the Tika server on why the parse failed, in particular the error
message.

Is there another endpoint I should be using?

-Nicholas DiPiazza

Re: Tika Server - Getting the log output with MDC to associate the file being parsed

Posted by Nicholas DiPiazza <ni...@gmail.com>.
Thanks Tim! That did the trick. I misread what that parameter meant
originally.

And double thanks for the /rmeta suggestion. That's a much better fit for
what I'm doing!

On Fri, Jun 26, 2020 at 2:23 PM Tim Allison <ta...@apache.org> wrote:

> Depends on what you're trying to do.  If you want all of the text+metadata
> out of your files, including embedded files, I'd use /rmeta.
>
> If you start tika-server with -s or --includeStack, /tika and /rmeta will
> return the full stacktrace.  I can't remember whether /unpack will or not.
>
> If you need the literal bytes from the embedded files, then /unpack is the
> right endpoint.
>
> If /unpack isn't returning the stacktrace when you start the server with
> the -s option, please report it.  That endpoint should work like /tika and
> /rmeta with the -s option.

Re: Tika Server - Getting the log output with MDC to associate the file being parsed

Posted by Tim Allison <ta...@apache.org>.
Depends on what you're trying to do.  If you want all of the text+metadata
out of your files, including embedded files, I'd use /rmeta.

If you start tika-server with -s or --includeStack, /tika and /rmeta will
return the full stacktrace.  I can't remember whether /unpack will or not.
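
As a rough illustration of the point above (not code from this thread), a
client could keep the response body of a failed request instead of
discarding it. This sketch assumes a server at http://localhost:9998 that was
started with -s; the function name and the injectable `opener` parameter are
made up for the example. What the body actually contains depends on the
server version and options.

```python
import urllib.error
import urllib.request

# Assumption: tika-server listening on its usual local address.
TIKA_URL = "http://localhost:9998"


def parse_or_explain(data, base_url=TIKA_URL, opener=urllib.request.urlopen):
    """PUT bytes to /tika and return (status, body).

    On an HTTP error such as 422, the body is whatever the server sent
    back, which should include the stack trace if the server was started
    with -s / --includeStack.
    """
    request = urllib.request.Request(
        base_url + "/tika",
        data=data,
        method="PUT",
        headers={"Accept": "text/plain"},
    )
    try:
        with opener(request) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # HTTPError is also a file-like response, so the body can be read.
        return err.code, err.read().decode("utf-8", "replace")
```

A caller could then log the returned body next to the file name whenever the
status is not 200.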

If you need the literal bytes from the embedded files, then /unpack is the
right endpoint.

If /unpack isn't returning the stacktrace when you start the server with
the -s option, please report it.  That endpoint should work like /tika and
/rmeta with the -s option.
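
For readers who want to try the /rmeta route, here is a minimal sketch under
stated assumptions, not an official client. /rmeta returns a JSON array with
one metadata object per document (the container first, then embedded files).
The sketch assumes a server at http://localhost:9998 and assumes that
exception details are reported under metadata keys starting with
"X-TIKA:EXCEPTION:"; check the keys your Tika version actually emits. The
function names are hypothetical.

```python
import json
import urllib.request

# Assumption: tika-server listening on its usual local address.
TIKA_URL = "http://localhost:9998"
# Assumption: prefix of the metadata keys that carry exception details.
EXCEPTION_PREFIX = "X-TIKA:EXCEPTION:"


def fetch_rmeta(data, base_url=TIKA_URL):
    """PUT bytes to /rmeta and return the decoded list of metadata objects."""
    request = urllib.request.Request(
        base_url + "/rmeta",
        data=data,
        method="PUT",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def collect_exceptions(records):
    """Return (index, key, value) for every exception entry in the records.

    Index 0 is the container document; higher indexes are embedded files.
    """
    found = []
    for index, record in enumerate(records):
        for key, value in record.items():
            if key.startswith(EXCEPTION_PREFIX):
                found.append((index, key, value))
    return found
```

Calling `collect_exceptions(fetch_rmeta(file_bytes))` would then give one
entry per reported failure, which can be associated with the file that was
sent.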
