Posted to dev@tika.apache.org by "Sergey Beryozkin (JIRA)" <ji...@apache.org> on 2014/01/19 19:21:20 UTC
[jira] [Comment Edited] (TIKA-1198) Consider optionally utilizing CXF JAX-RS Attachment support
[ https://issues.apache.org/jira/browse/TIKA-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875956#comment-13875956 ]
Sergey Beryozkin edited comment on TIKA-1198 at 1/19/14 6:20 PM:
-----------------------------------------------------------------
Dave, I missed your comment with the exception trace, sorry about that.
After seeing a comment from Jeremy I've tested the JAX-RS server and I can confirm all works as expected.
Note that "curl -T somefile targetURI" does not set Content-Type, which explains the exception you are seeing. TikaServer has two resource methods accepting PUT payloads on the same path: one specifically for multipart/form-data payloads, and another for all other payload types, which uses a wildcard to match all possible types. Thus the method with the more specific JAX-RS Consumes value (multipart/form-data) is chosen when no Content-Type is available. The error actually mentions an octet-stream; this is the default content type assigned to an individual multipart/form-data part.
Two fixes are possible:
1. Use the curl -H parameter. For example, I started a server (using the newly added -Pserver profile) and sent a pom.xml to it, adding '-H "Content-Type: text/xml"', and all worked fine. So the actual 'fix' is to update the docs and recommend setting Content-Type when no multiparts are used.
2. Have the TikaServer resource method accepting multiparts listen on a unique path, say "http://localhost:9998/tika/form".
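For clients other than curl, option 1 might look roughly like the sketch below. It is only an illustration under stated assumptions: the class name PutWithContentType and the helper names are made up here, and the tiny in-process server is a stand-in that echoes back the Content-Type it received so the example can run on its own. It is not TikaServer and says nothing about how TikaServer parses the payload.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical client sketch: PUT a payload with an explicit Content-Type.
final class PutWithContentType {

    /** PUTs the body to the target with the given Content-Type and returns the response body. */
    static String put(URL target, String contentType, byte[] body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) target.openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        // The equivalent of curl's -H "Content-Type: ..."
        conn.setRequestProperty("Content-Type", contentType);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    /**
     * Starts a throwaway local echo server (a stand-in, NOT TikaServer),
     * PUTs a small XML payload to it and returns the Content-Type the server saw.
     */
    static String demo() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/tika", exchange -> {
                exchange.getRequestBody().readAllBytes(); // drain the uploaded payload
                String seen = exchange.getRequestHeaders().getFirst("Content-Type");
                byte[] reply = String.valueOf(seen).getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, reply.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(reply);
                }
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                URL target = URI.create("http://127.0.0.1:" + port + "/tika").toURL();
                return put(target, "text/xml", "<project/>".getBytes(StandardCharsets.UTF_8));
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Server saw Content-Type: " + demo());
    }
}
```

To try it against a real server, one would call put(...) directly with the server's URL (for instance something like http://localhost:9998/tika, which is assumed here from the path mentioned in option 2) and the actual file bytes.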
Option 2 is less 'disruptive', but option 1 is marginally cleaner IMHO, as clients PUT-ing something to the server are expected to set Content-Type.
I'm fine with implementing option 2 too. Perhaps it can be done regardless, but users should still be encouraged to set content types: this can optimize the parsing, i.e. the parser can skip type detection and use the supplied Content-Type instead.
So, shall we add "/form" to the path of the resource method accepting multipart/form-data, or keep things as they are?
Cheers, Sergey
> Consider optionally utilizing CXF JAX-RS Attachment support
> -----------------------------------------------------------
>
> Key: TIKA-1198
> URL: https://issues.apache.org/jira/browse/TIKA-1198
> Project: Tika
> Issue Type: Wish
> Components: server
> Reporter: Sergey Beryozkin
> Priority: Minor
>
> CXF offers fairly extensive support for multiparts:
> http://cxf.apache.org/docs/jax-rs-multiparts.html
> Perhaps some of that can help the server offer more options for uploading/downloading files.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)