Posted to dev@tika.apache.org by "Sergey Beryozkin (JIRA)" <ji...@apache.org> on 2014/01/19 19:21:20 UTC

[jira] [Comment Edited] (TIKA-1198) Consider optionally utilizing CXF JAX-RS Attachment support

    [ https://issues.apache.org/jira/browse/TIKA-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875956#comment-13875956 ] 

Sergey Beryozkin edited comment on TIKA-1198 at 1/19/14 6:20 PM:
-----------------------------------------------------------------

Dave, I missed your comment with the exception trace, sorry about that.

After seeing Jeremy's comment I tested the JAX-RS server, and I can confirm everything works as expected.

Note that "curl -T somefile targetURI" does not set Content-Type, which explains the exception you are seeing. TikaServer has two resource methods accepting PUT payloads on the same path: one specifically for multipart/form-data payloads, and another for all other payload types, which uses a wildcard to match every possible type. Thus the method with the more specific JAX-RS Consumes value (multipart/form-data) is chosen when no Content-Type is available. The error actually mentions an octet-stream: this is the default content type assigned to an individual multipart/form-data part.

Two fixes are possible:

1. Use the -H curl parameter. For example, I started a server (using the newly added -Pserver profile) and posted a pom.xml to it, adding '-H "Content-Type: text/xml"', and all worked fine. So the actual 'fix' is to update the docs and recommend setting Content-Type when no multiparts are used.

2. Have the TikaServer resource method that accepts multiparts listen on a unique path, say "http://localhost:9998/tika/form".

Option 2 is less 'disruptive', but option 1 is marginally cleaner IMHO, as clients PUT-ing something to the server are expected to set Content-Type.

I'm fine with implementing Option 2 too. Perhaps it can be done anyway, but users should still be encouraged to set content types: this can optimize the parsing, i.e. avoid doing the detection at the parser level and optionally use the Content-Type instead.

So, shall we add "/form" to the path of the resource method accepting multipart/form-data, or keep things as is?
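For clients that are not curl, the same idea as in option 1 could look roughly like the sketch below. It is only an illustration using the plain JDK HTTP client classes: the class name TikaPutClient and the method putWithContentType are made up for this example and are not part of Tika or CXF, and the target URI is whatever the server is actually listening on.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Tika: PUTs a file with an explicit Content-Type.
final class TikaPutClient {

    // Sends the file as the PUT body and returns the HTTP status code.
    // Setting Content-Type explicitly is the point of the example: it is the
    // Java equivalent of adding '-H "Content-Type: text/xml"' to the curl call.
    static int putWithContentType(String targetUri, Path file, String contentType)
            throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(targetUri).toURL().openConnection();
        try {
            conn.setRequestMethod("PUT");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", contentType);
            try (OutputStream out = conn.getOutputStream()) {
                Files.copy(file, out); // stream the file contents as the request body
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

A call such as putWithContentType("http://localhost:9998/tika", Path.of("pom.xml"), "text/xml") would then mirror the curl test above; the "/tika" path is an assumption inferred from the "/tika/form" example.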

Cheers, Sergey
 



> Consider optionally utilizing CXF JAX-RS Attachment support
> -----------------------------------------------------------
>
>                 Key: TIKA-1198
>                 URL: https://issues.apache.org/jira/browse/TIKA-1198
>             Project: Tika
>          Issue Type: Wish
>          Components: server
>            Reporter: Sergey Beryozkin
>            Priority: Minor
>
> CXF offers fairly extensive support for multiparts:
> http://cxf.apache.org/docs/jax-rs-multiparts.html
> Perhaps some of that can help the server offer more options for uploading/downloading files.


