Posted to commits@stanbol.apache.org by "Rupert Westenthaler (Created) (JIRA)" <ji...@apache.org> on 2012/02/03 15:19:53 UTC

[jira] [Created] (STANBOL-481) Multi ContentPart RESTful API extensions

Multi ContentPart RESTful API extensions
----------------------------------------

                 Key: STANBOL-481
                 URL: https://issues.apache.org/jira/browse/STANBOL-481
             Project: Stanbol
          Issue Type: Sub-task
          Components: Enhancer
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler


Sub-task for the implementation of the RESTful API extensions related to multipart content items.

Copied from the main issue:

- query params:
Optional inputWithMetadata -> expects multipart/mime with 2 sections, of which the first is RDF
Optional outputWithContentParts[=<section-ordinal>] -> the result is multipart (instead of RDF), containing RDF as the first section and the parts in the second section. If there is more than one part, this second section is itself multipart. This argument may be repeated to request different sections.
Optional omitMetadata -> no metadata in the result. This only makes sense together with the outputContentParts argument; the result will correspond to the second section of the multipart returned without this argument.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (STANBOL-481) Multi ContentPart RESTful API extensions

Posted by "Rupert Westenthaler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STANBOL-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201322#comment-13201322 ] 

Rupert Westenthaler commented on STANBOL-481:
---------------------------------------------

Short Update:

I have written a small prototype using 

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpmime</artifactId>
</dependency>
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
</dependency>

I like this API much more than javax.mail, because it makes it easy to "stream" all the data directly to the OutputStream!

The prototype produces, for the ContentItem

uri: urn:test
content type : text/html
contentPart:
   uri: urn:test:text
   content type : text/plain

metadata:
   urn:test, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, urn:types:Document

the following Multipart Mime:

--OHPnQAMLYDkzNWjtO3eHRL_Gmkj3msRmPDuY
Content-Disposition: form-data; name="metadata"; filename="metadata"
Content-Type: application/rdf+xml; charset=UTF-8
Content-Transfer-Encoding: 7bit

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="urn:types:" > 
  <rdf:Description rdf:about="urn:test">
    <rdf:type rdf:resource="urn:types:Document"/>
  </rdf:Description>
</rdf:RDF>

--OHPnQAMLYDkzNWjtO3eHRL_Gmkj3msRmPDuY
Content-Disposition: form-data; name="content"
Content-Type: multipart/alternate; charset=UTF-8
Content-Transfer-Encoding: 7bit

--contentParts
Content-Disposition: form-data; name="urn:test_main"
Content-Type: text/html
Content-Transfer-Encoding: binary

<html><body>Test</body></html>
--contentParts
Content-Disposition: form-data; name="run:text:text"
Content-Type: text/plain
Content-Transfer-Encoding: binary

test
--contentParts--

--OHPnQAMLYDkzNWjtO3eHRL_Gmkj3msRmPDuY--

This should conform to the specification.


                

        

[jira] [Resolved] (STANBOL-481) Multi ContentPart RESTful API extensions

Posted by "Rupert Westenthaler (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STANBOL-481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rupert Westenthaler resolved STANBOL-481.
-----------------------------------------

    Resolution: Fixed

Finished implementation with #1243965.
Documentation for the new features can be found at [1] and in the inline RESTful service documentation of the Stanbol Web UI.



[1] http://stanbol.staging.apache.org/stanbol/docs/trunk/enhancer/enhancerrest.html#multi-part_contentitem_support
                

        

[jira] [Issue Comment Edited] (STANBOL-481) Multi ContentPart RESTful API extensions

Posted by "Rupert Westenthaler (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STANBOL-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203610#comment-13203610 ] 

Rupert Westenthaler edited comment on STANBOL-481 at 2/9/12 8:50 AM:
---------------------------------------------------------------------

This is a suggestion to change the original proposal for the RESTful API extensions to support ContentParts. Feedback is welcome.

## MultiPart ContentItem RESTful API

Users who pass parameters that cause Stanbol to send multiple ContentParts MUST either

 * NOT use an Accept header: in this case the result will default to "multipart/form-data", or
 * set the Accept header to "multipart/form-data"

Users who want to send multiple ContentItem parts (such as metadata and/or alternate content versions) MUST set the Content-Type header to "multipart/form-data" and follow the specification as defined in the above comments.

### Removal of "inputWithMetadata"

In my opinion this parameter is not needed.

Rationale: If the sent content is "multipart/form-data", content negotiation can be used to automatically detect that a ContentItem is uploaded. If the first part then has the name "metadata", this means that metadata is provided. If the first part is "content", then only content (but possibly in different alternate versions) is provided.

### Output ContentParts to Responses

__Original Proposal:__ Optional outputWithContentParts[=<section-ordinal>] -> the result is multipart (instead of RDF), containing RDF as the first section and the parts in the second section. If there is more than one part, this second section is itself multipart. This argument may be repeated to request different sections.

This proposal suggests using the index (an integer) to select the ContentParts to be included. However, it will be very hard for users to know such indexes, because they depend on the order in which components add ContentParts to the contentItem.

As an example: as a result of STANBOL-431 the EnhancementJobManager now adds an MGraph with the ExecutionMetadata to the ContentItem. Because the EnhancementJobManager does this before the first Engine is called, this ContentPart will always be the second contentPart (index '1') of a ContentItem. An alternate version (e.g. a "text/plain" version created by the MetaxaEngine) will most likely be found at index '2'.
As soon as this issue is implemented, users will be able to directly send alternate versions of the content in the request. This will again change the ordering, because such parts will be added before the ExecutionMetadata.

In addition, contentPart is a rather abstract concept. I would argue that most users are more interested in specifying which alternate versions of the original content should be included in the response. This calls for an interface that allows specifying a list of media types to be included, and maybe an additional switch to exclude the original (sent) content.

Based on [1] there is also a second type of contentPart that is typically identified by well-known URIs. Currently this is used for the ExecutionMetadata, but it could also be used to include the original responses of remote services such as Zemanta, opencalais ... For such contentParts users need a way to include them based on their URI.

Based on that I suggest the following API:

* __outputContentType=[mediaType]:__ This should include all Blobs with a media type that is compatible with the passed value (e.g. '*' ... all, 'text/*' ... all text versions, 'text/plain' ... only the plain text version of the sent content). May be used multiple times to pass several values.
* __omitParsed=[true/false]:__ Excludes all versions sent in the request from the response. Default is 'false'.
* __outputContentPart=[uri/'*']:__ This will include the ContentPart with that URI. The value '*' indicates that all contentParts (other than Blobs) should be included. May be used multiple times to pass several URIs.

NOTE: Even if only a single Blob is serialized in the response, the Multipart MIME part "content" will still be a "multipart/alternate", but then with only a single entry.


### Omit Metadata in the Response

No changes to the original proposal

* __omitMetadata=[true/false]:__ If enabled, no metadata is included in the result. This only makes sense together with the outputContentParts argument; the result will correspond to the second section of the multipart returned without this argument.

### Returning a single ContentPart

Requests that use an __"Accept"__ header AND __omitMetadata=true__ are interpreted like

* outputContentType={accept-header-value}

however, instead of using "multipart/form-data", the content parts of this request are serialized directly to the response.


[1] http://stanbol.staging.apache.org/stanbol/docs/trunk/enhancer/contentitem.html#contentparts

                

                  

        

[jira] [Issue Comment Edited] (STANBOL-481) Multi ContentPart RESTful API extensions

Posted by "Rupert Westenthaler (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STANBOL-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203610#comment-13203610 ] 

Rupert Westenthaler edited comment on STANBOL-481 at 2/14/12 2:01 PM:
----------------------------------------------------------------------

This is a suggestion to change the original proposal for the RESTful API extensions to support ContentParts. Feedback is welcome.

## MultiPart ContentItem RESTful API

Users who pass parameters that cause Stanbol to send multiple ContentParts MUST either

 * NOT use an Accept header: in this case the result will default to "multipart/form-data", or
 * set the Accept header to "multipart/form-data"

Users who want to send multiple ContentItem parts (such as metadata and/or alternate content versions) MUST set the Content-Type header to "multipart/form-data" and follow the specification as defined in the above comments.

### Removal of "inputWithMetadata"

In my opinion this parameter is not needed.

Rationale: If the sent content is "multipart/form-data", content negotiation can be used to automatically detect that a ContentItem is uploaded. If the first part then has the name "metadata", this means that metadata is provided. If the first part is "content", then only content (but possibly in different alternate versions) is provided.
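
The detection rule described in this rationale could look roughly like the following. This is only an illustrative Python sketch, not Stanbol code: the function name `classify_upload` and the labels it returns are hypothetical, and it assumes the part names have already been read from the request body in their original order.

```python
def classify_upload(content_type, part_names):
    """Classify an upload from its Content-Type header and ordered part names.

    Hypothetical helper: the returned labels are illustrative only.
    """
    # Ignore parameters such as "; boundary=contentItem".
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "multipart/form-data":
        # A plain request body is just the content, without ContentItem structure.
        return "plain-content"
    if not part_names:
        raise ValueError("multipart upload without any parts")
    first = part_names[0]
    if first == "metadata":
        return "content-item-with-metadata"
    if first == "content":
        return "content-item-content-only"
    raise ValueError(f"unexpected first part: {first!r}")
```

With such a check in place, a separate "inputWithMetadata" flag carries no extra information, which is the point of the rationale.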

### Output ContentParts to Responses

__Original Proposal:__ Optional outputWithContentParts[=<section-ordinal>] -> the result is multipart (instead of RDF), containing RDF as the first section and the parts in the second section. If there is more than one part, this second section is itself multipart. This argument may be repeated to request different sections.

This proposal suggests using the index (an integer) to select the ContentParts to be included. However, it will be very hard for users to know such indexes, because they depend on the order in which components add ContentParts to the contentItem.

As an example: as a result of STANBOL-431 the EnhancementJobManager now adds an MGraph with the ExecutionMetadata to the ContentItem. Because the EnhancementJobManager does this before the first Engine is called, this ContentPart will always be the second contentPart (index '1') of a ContentItem. An alternate version (e.g. a "text/plain" version created by the MetaxaEngine) will most likely be found at index '2'.
As soon as this issue is implemented, users will be able to directly send alternate versions of the content in the request. This will again change the ordering, because such parts will be added before the ExecutionMetadata.

In addition, contentPart is a rather abstract concept. I would argue that most users are more interested in specifying which alternate versions of the original content should be included in the response. This calls for an interface that allows specifying a list of media types to be included, and maybe an additional switch to exclude the original (sent) content.

Based on [1] there is also a second type of contentPart that is typically identified by well-known URIs. Currently this is used for the ExecutionMetadata, but it could also be used to include the original responses of remote services such as Zemanta, opencalais ... For such contentParts users need a way to include them based on their URI.

Based on that I suggest the following API:

* __outputContentType=[mediaType]:__ This should include all Blobs with a media type that is compatible with the passed value (e.g. '*' ... all, 'text/*' ... all text versions, 'text/plain' ... only the plain text version of the sent content). May be used multiple times to pass several values.
* __omitParsed=[true/false]:__ Excludes all versions sent in the request from the response. Default is 'false'.
* __outputContentPart=[uri/'*']:__ This will include the ContentPart with that URI. The value '*' indicates that all contentParts (other than Blobs) should be included. May be used multiple times to pass several URIs.
* __rdfFormat=[rdfMimeType]:__ This allows requests that result in multipart/form-data encoded responses to specify the RDF serialization format to use. Supported formats and defaults are the same as for normal Enhancer requests.

NOTE: Even if only a single Blob is serialized in the response, the Multipart MIME part "content" will still be a "multipart/alternate", but then with only a single entry.
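
To make the intended matching for outputContentType and omitParsed more concrete, here is a minimal Python sketch. It is an assumption about how the matching could work, not the actual implementation: the helper names, the dictionary layout of a Blob and the `parsed` flag (marking a version that was sent in the request) are all hypothetical.

```python
def media_type_matches(pattern, media_type):
    """Return True if media_type is compatible with pattern ('*', 'text/*', 'text/plain')."""
    # Drop parameters such as "; charset=UTF-8" before comparing.
    media_type = media_type.split(";", 1)[0].strip().lower()
    pattern = pattern.strip().lower()
    if pattern in ("*", "*/*"):
        return True
    p_main, _, p_sub = pattern.partition("/")
    m_main, _, m_sub = media_type.partition("/")
    return p_main == m_main and p_sub in ("*", m_sub)


def select_blobs(blobs, output_content_types, omit_parsed=False):
    """Return the URIs of all Blobs selected by the given parameter values."""
    selected = []
    for blob in blobs:
        if omit_parsed and blob["parsed"]:
            continue  # omitParsed=true: skip versions sent in the request
        if any(media_type_matches(p, blob["media_type"]) for p in output_content_types):
            selected.append(blob["uri"])
    return selected


# Hypothetical ContentItem with the original HTML and a converted plain text version.
blobs = [
    {"uri": "urn:test_main", "media_type": "text/html", "parsed": True},
    {"uri": "urn:test:text", "media_type": "text/plain; charset=UTF-8", "parsed": False},
]
```

For example, `select_blobs(blobs, ["text/*"], omit_parsed=True)` would select only the converted plain text version, because the HTML version was sent in the request.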


### Omit Metadata in the Response

No changes to the original proposal

* __omitMetadata=[true/false]:__ If enabled, no metadata is included in the result. This only makes sense together with the outputContentParts argument; the result will correspond to the second section of the multipart returned without this argument.

### Returning a single ContentPart

Requests that use an __"Accept"__ header AND __omitMetadata=true__ are interpreted like

* outputContentType={accept-header-value}

however, instead of using "multipart/form-data", the content parts of this request are serialized directly to the response.
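
From the client side, the two request styles could be built roughly as follows. This Python sketch only constructs the requests and does not send them. The endpoint URL and the helper name `build_enhancer_request` are assumptions for illustration; the query parameters are the ones proposed in this comment.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint of a local Stanbol instance; adjust to the real deployment.
ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhancer_request(content, content_type, params, accept=None):
    """Build (but do not send) a POST request to the Enhancer.

    params is a list of (name, value) pairs so that a parameter can be repeated.
    """
    headers = {"Content-Type": content_type}
    if accept is not None:
        headers["Accept"] = accept
    url = f"{ENHANCER_URL}?{urlencode(params)}"
    return Request(url, data=content, headers=headers, method="POST")


html = b"<html><body>Test</body></html>"

# Multipart response: metadata plus the plain text version and all other parts.
multipart_request = build_enhancer_request(
    html, "text/html",
    [("outputContentType", "text/plain"),
     ("outputContentPart", "*"),
     ("rdfFormat", "application/rdf+xml")],
    accept="multipart/form-data",
)

# Single content part: Accept header plus omitMetadata=true.
single_part_request = build_enhancer_request(
    html, "text/html", [("omitMetadata", "true")], accept="text/plain",
)
```

The second request relies on the rule above: the Accept value takes the role of the requested media type, and the response body is the matching content version itself rather than a multipart message.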


[1] http://stanbol.staging.apache.org/stanbol/docs/trunk/enhancer/contentitem.html#contentparts

                

                  

        

[jira] [Updated] (STANBOL-481) Multi ContentPart RESTful API extensions

Posted by "Rupert Westenthaler (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STANBOL-481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rupert Westenthaler updated STANBOL-481:
----------------------------------------

    Attachment: contentItemAsMultipartMime.txt

This provides detailed information on how ContentItems are serialized as Multipart MIME (see Attachment contentItemAsMultipartMime.txt).

NOTE that this describes all information of a ContentItem that can be serialized as Multipart MIME. Depending on the parameters provided with the request, some of this information might be filtered out.


## ContentItem Multipart MIME format 

* "contentItem" is now used as "boundary" for the different parts of the ContentItem
* Content-Type of the response is "multipart/form-data; boundary=contentItem; charset=UTF-8". Note the inclusion of the "boundary" parameter!

### Metadata:

* If present this will be the first Part.
* Content is the serialized RDF Graph of the metadata. Basically the data currently returned by the Enhancer.
* "metadata" is used as name for this part 
* the URI of the contentItem is used as fileName

### Content

* If present, this immediately follows the metadata. If metadata is present, this will be at the second position.
* The content is encoded as "multipart/alternate" - nested multipart.
* the boundary "contentParts" is used to separate different content versions
* the name is "content"
* no fileName is provided.

#### Content Versions

A single content is sent to the Enhancer, but it might be converted to other formats (e.g. "text/html" -> "text/plain") by EnhancementEngines. These alternate versions are added to the ContentItem as ContentParts of type Blob.

All such Blobs are serialized as individual parts within the "multipart/alternate" with the name "content". They use the following properties:

* name: the URI of the contentPart
* Content-Type and charset as specified by the Blob
* Content: 1:1 copy of the InputStream provided by the Blob

Note that the metadata might (or might not) provide additional information about the Content Versions. This will be represented by triples using the URI of the contentPart as Subject.

### Other Information

A ContentItem may also include content parts other than alternate versions of the content. These are used, for example, to store metadata about the enhancement process (ExecutionMetadata and ExecutionPlan). If such contentParts are RDF graphs (instances of Clerezza TripleCollection), they can also be added as parts of the content item.

* Other parts MUST always be placed at an index greater than that of the "content" part. This implies that their index is also greater than that of the "metadata" part.
* The URI of the content part is used as name
* The Content-Type - the RDF serialization format - will be the same as specified/used for the "metadata".
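
The layout described above can be illustrated with a small sketch that assembles the parts as text. This is not the prototype's code: the class name `ContentItemLayoutSketch` and its `render` method are hypothetical, bodies are plain Strings rather than streamed bytes, the `boundary=contentParts` parameter on the nested Content-Type is an assumption, and "other" parts such as the ExecutionMetadata are left out.

```java
import java.util.Map;

/** Illustrative sketch only; class and method names are hypothetical. */
final class ContentItemLayoutSketch {

    private static final String CRLF = "\r\n";

    /**
     * Renders the part layout as text.
     *
     * @param itemUri URI of the ContentItem, used as fileName of the "metadata" part
     * @param rdf     serialized metadata, or null to omit the "metadata" part
     * @param rdfType media type of the RDF serialization, e.g. "application/rdf+xml"
     * @param blobs   contentPart URI -> { Content-Type, body }, in ContentItem order
     */
    static String render(String itemUri, String rdf, String rdfType,
                         Map<String, String[]> blobs) {
        StringBuilder out = new StringBuilder();
        if (rdf != null) {
            // Metadata, if present, is always the first part.
            out.append("--contentItem").append(CRLF)
               .append("Content-Disposition: form-data; name=\"metadata\"; filename=\"")
               .append(itemUri).append("\"").append(CRLF)
               .append("Content-Type: ").append(rdfType).append("; charset=UTF-8").append(CRLF)
               .append(CRLF)
               .append(rdf).append(CRLF);
        }
        if (!blobs.isEmpty()) {
            // "content" follows the metadata and is itself a nested multipart.
            out.append("--contentItem").append(CRLF)
               .append("Content-Disposition: form-data; name=\"content\"").append(CRLF)
               .append("Content-Type: multipart/alternate; boundary=contentParts").append(CRLF)
               .append(CRLF);
            for (Map.Entry<String, String[]> blob : blobs.entrySet()) {
                // Each content version is named by the URI of its contentPart.
                out.append("--contentParts").append(CRLF)
                   .append("Content-Disposition: form-data; name=\"")
                   .append(blob.getKey()).append("\"").append(CRLF)
                   .append("Content-Type: ").append(blob.getValue()[0]).append(CRLF)
                   .append(CRLF)
                   .append(blob.getValue()[1]).append(CRLF);
            }
            out.append("--contentParts--").append(CRLF);
        }
        return out.append("--contentItem--").append(CRLF).toString();
    }
}
```

Passing a `LinkedHashMap` for `blobs` keeps the content versions in the order in which they were added to the ContentItem.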
                

[jira] [Issue Comment Edited] (STANBOL-481) Multi ContentPart RESTful API extensions

Posted by "Rupert Westenthaler (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STANBOL-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201322#comment-13201322 ] 

Rupert Westenthaler edited comment on STANBOL-481 at 2/7/12 1:29 PM:
---------------------------------------------------------------------

The current plan is to implement ContentItemWriter and ContentItemReader as MessageBodyWriter/Reader. This will allow a nice integration of ContentItem with JAX-RS resources.

This also means that the default Multipart MIME functionality of Jersey will NOT be used for ContentItems.

### Serialization:

I have implemented a prototype using the "org.apache.httpcomponents.httpmime" framework. It looks very promising because it natively supports streaming. In contrast to javax.mail, this has the big advantage that there should be no issues even if the ContentItem refers to very large content.

The dependencies are: 

    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpmime</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
    </dependency>

### Parsing

For parsing I have a working prototype that uses "FileUpload.getItemIterator(..)". This is also a streaming-based solution and works without any data being written to disk or kept in memory. See also [1].

The dependencies are: 

    <dependency>
        <groupId>commons-fileupload</groupId>
        <artifactId>commons-fileupload</artifactId>
    </dependency>
    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
    </dependency>

For details about the Multipart MIME format used for ContentItems, see the next comment.

[1]  http://commons.apache.org/fileupload/streaming.html

                
      was (Author: rwesten):
    Short Update:

I have written a small prototype using 

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpmime</artifactId>
</dependency>
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
</dependency>

I like this API much more than javax.mail, because it makes it easy to "stream" all the data directly to the OutputStream!

The prototype produces, for the ContentItem

uri: urn:test
content type : text/html
contentPart:
   uri: urn:test:text
   content type : text/plain

metadata:
   urn:test, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, urn:types:Document

the following Multipart Mime:

--OHPnQAMLYDkzNWjtO3eHRL_Gmkj3msRmPDuY
Content-Disposition: form-data; name="metadata"; filename="metadata"
Content-Type: application/rdf+xml; charset=UTF-8
Content-Transfer-Encoding: 7bit

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="urn:types:" > 
  <rdf:Description rdf:about="urn:test">
    <rdf:type rdf:resource="urn:types:Document"/>
  </rdf:Description>
</rdf:RDF>

--OHPnQAMLYDkzNWjtO3eHRL_Gmkj3msRmPDuY
Content-Disposition: form-data; name="content"
Content-Type: multipart/alternate; charset=UTF-8
Content-Transfer-Encoding: 7bit

--contentParts
Content-Disposition: form-data; name="urn:test_main"
Content-Type: text/html
Content-Transfer-Encoding: binary

<html><body>Test</body></html>
--contentParts
Content-Disposition: form-data; name="urn:test:text"
Content-Type: text/plain
Content-Transfer-Encoding: binary

test
--contentParts--

--OHPnQAMLYDkzNWjtO3eHRL_Gmkj3msRmPDuY--

This should be in conformance with the specification


                  

[jira] [Commented] (STANBOL-481) Multi ContentPart RESTful API extensions

Posted by "Rupert Westenthaler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STANBOL-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203610#comment-13203610 ] 

Rupert Westenthaler commented on STANBOL-481:
---------------------------------------------

This is a suggestion to change the original proposal for the RESTful API extensions that support ContentParts. Feedback is welcome.

## MultiPart ContentItem RESTful API


### Removal of "inputWithMetadata"

In my opinion this parameter is not needed.

Rationale: If the content sent is "multipart/form-data", content negotiation can be used to automatically detect that a ContentItem is uploaded. If the first part then has the name "metadata", this means that metadata is provided. If the first part is "content", then only content (possibly in different alternate versions) is provided.
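
A minimal sketch of this detection rule, assuming the part names have already been read from the request in order. The class `UploadInspector`, its `Kind` values and the `PLAIN_CONTENT` and `UNKNOWN` fallbacks are hypothetical and only illustrate the idea; they are not part of Stanbol.

```java
import java.util.List;
import java.util.Locale;

/** Illustrative sketch only; names are hypothetical, not Stanbol API. */
final class UploadInspector {

    enum Kind { PLAIN_CONTENT, METADATA_PROVIDED, CONTENT_ONLY, UNKNOWN }

    /**
     * @param contentType value of the request's Content-Type header (may be null)
     * @param partNames   names of the top-level multipart parts, in request order
     */
    static Kind classify(String contentType, List<String> partNames) {
        boolean multipart = contentType != null
                && contentType.trim().toLowerCase(Locale.ROOT).startsWith("multipart/form-data");
        if (!multipart) {
            // Not a serialized ContentItem: treat the body as the content itself.
            return Kind.PLAIN_CONTENT;
        }
        if (partNames.isEmpty()) {
            return Kind.UNKNOWN;
        }
        // Only the name of the FIRST part decides whether metadata is provided.
        switch (partNames.get(0)) {
            case "metadata":
                return Kind.METADATA_PROVIDED;
            case "content":
                return Kind.CONTENT_ONLY;
            default:
                // ASSUMPTION: the comment does not say how other names are handled.
                return Kind.UNKNOWN;
        }
    }
}
```

With a rule like this, no extra query parameter is required: the request body itself says whether metadata is included.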

### Output ContentParts to Responses

__Original Proposal:__ Optional outputWithContentParts[=<section-ordinal>] -> the result is multipart (instead of rdf) containing rdf as the first section and the parts in the second section; if there is more than one part, this second section is itself multipart. This argument might be repeated to request different sections.

This proposal suggests using the index (an integer) to select the ContentParts to be included. However, it will be very hard for users to know such indexes, because they depend on the order in which components add ContentParts to the ContentItem.

As an example: As a result of STANBOL-431 the EnhancementJobManager now adds an MGraph with the ExecutionMetadata to the ContentItem. Because the EnhancementJobManager does this before the first Engine is called, this ContentPart will always be the second contentPart (index '1') of a ContentItem. An alternate version (e.g. a "text/plain" version created by the MetaxaEngine) will most likely be found at index '2'.
As soon as this issue is implemented, users will be able to pass alternate versions of the content directly in the request. This will again change the ordering, because such parts will be added before the ExecutionMetadata.

In addition, contentPart is a rather abstract concept. I would argue that most users are more interested in specifying which alternate versions of the original content should be included in the response. This calls for an interface that allows specifying a list of MediaTypes to be included, and maybe an additional switch to exclude the original (passed) content.

Based on [1] there is also a second type of contentPart that is typically identified by well-known URIs. Currently this is used for the ExecutionMetadata, but it could also be used to include the original responses of remote services such as Zemanta, opencalais ... For this kind of contentPart, users need to be able to include contentParts based on their URI.

Based on that I suggest the following API:

* __outputContent=[mediaType]:__ This should include all Blobs with a mediaType that is compatible with the passed value (e.g. '*' ... all, 'text/*' ... all text versions, 'text/plain' ... only the plain text version of the passed content). May be used multiple times to pass several values.
* __omitParsed=[true/false]:__ Excludes all versions that were passed in with the request from the response. Default is 'false'.
* __outputContentPart=[uri/'*']:__ This will include the ContentPart with that URI. The value '*' indicates that all contentParts (other than Blobs) should be included. May be used multiple times to pass several URIs.

NOTE: Even if only a single Blob is serialized in the response, the Multipart MIME part "content" will still be a "multipart/alternate", but then with only a single entry.

### Omit Metadata in the Response

No changes to the original proposal

* __omitMetada=[true/false]:__ If enabled, no metadata is included in the result. This only makes sense together with the outputContentParts argument; the result will correspond to the second section of the multipart returned without this argument.


[1] http://stanbol.staging.apache.org/stanbol/docs/trunk/enhancer/contentitem.html#contentparts

                

[jira] [Issue Comment Edited] (STANBOL-481) Multi ContentPart RESTful API extensions

Posted by "Rupert Westenthaler (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STANBOL-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203610#comment-13203610 ] 

Rupert Westenthaler edited comment on STANBOL-481 at 2/8/12 2:16 PM:
---------------------------------------------------------------------

This is a suggestion to change the original proposal for the RESTful API extensions that support ContentParts. Feedback is welcome.

## MultiPart ContentItem RESTful API

Users that pass parameters causing Stanbol to send multiple ContentParts MUST either

 * NOT use an Accept header (in this case the result will default to "multipart/form-data"), or
 * set the Accept header to "multipart/form-data".

Users that want to pass multiple ContentItem parts (such as metadata and/or alternate content versions) MUST set the Content-Type header to "multipart/form-data" and follow the specification defined in the above comments.

### Removal of "inputWithMetadata"

In my opinion this parameter is not needed.

Rationale: If the content sent is "multipart/form-data", content negotiation can be used to automatically detect that a ContentItem is uploaded. If the first part then has the name "metadata", this means that metadata is provided. If the first part is "content", then only content (possibly in different alternate versions) is provided.

### Output ContentParts to Responses

__Original Proposal:__ Optional outputWithContentParts[=<section-ordinal>] -> the result is multipart (instead of rdf) containing rdf as the first section and the parts in the second section; if there is more than one part, this second section is itself multipart. This argument might be repeated to request different sections.

This proposal suggests using the index (an integer) to select the ContentParts to be included. However, it will be very hard for users to know such indexes, because they depend on the order in which components add ContentParts to the ContentItem.

As an example: As a result of STANBOL-431 the EnhancementJobManager now adds an MGraph with the ExecutionMetadata to the ContentItem. Because the EnhancementJobManager does this before the first Engine is called, this ContentPart will always be the second contentPart (index '1') of a ContentItem. An alternate version (e.g. a "text/plain" version created by the MetaxaEngine) will most likely be found at index '2'.
As soon as this issue is implemented, users will be able to pass alternate versions of the content directly in the request. This will again change the ordering, because such parts will be added before the ExecutionMetadata.

In addition, contentPart is a rather abstract concept. I would argue that most users are more interested in specifying which alternate versions of the original content should be included in the response. This calls for an interface that allows specifying a list of MediaTypes to be included, and maybe an additional switch to exclude the original (passed) content.

Based on [1] there is also a second type of contentPart that is typically identified by well-known URIs. Currently this is used for the ExecutionMetadata, but it could also be used to include the original responses of remote services such as Zemanta, opencalais ... For this kind of contentPart, users need to be able to include contentParts based on their URI.

Based on that I suggest the following API:

* __outputContent=[mediaType]:__ This should include all Blobs with a mediaType that is compatible with the passed value (e.g. '*' ... all, 'text/*' ... all text versions, 'text/plain' ... only the plain text version of the passed content). May be used multiple times to pass several values.
* __omitParsed=[true/false]:__ Excludes all versions that were passed in with the request from the response. Default is 'false'.
* __outputContentPart=[uri/'*']:__ This will include the ContentPart with that URI. The value '*' indicates that all contentParts (other than Blobs) should be included. May be used multiple times to pass several URIs.

NOTE: Even if only a single Blob is serialized in the response, the Multipart MIME part "content" will still be a "multipart/alternate", but then with only a single entry.
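
The media type matching suggested for outputContent could look roughly like the following sketch. It assumes a simple wildcard comparison on type and subtype that ignores parameters such as charset; `OutputContentFilter`, `isCompatible` and `select` are hypothetical names, and a real implementation would more likely rely on the JAX-RS MediaType class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative sketch only; names are hypothetical, not Stanbol API. */
final class OutputContentFilter {

    /** Splits "type/subtype; params" into { type, subtype }; a bare "*" means any type. */
    private static String[] typeAndSubtype(String mediaType) {
        String base = mediaType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        if (base.equals("*")) {
            return new String[] {"*", "*"};
        }
        int slash = base.indexOf('/');
        if (slash <= 0 || slash == base.length() - 1) {
            throw new IllegalArgumentException("Not a media type: " + mediaType);
        }
        return new String[] {base.substring(0, slash), base.substring(slash + 1)};
    }

    /** True if the Blob's media type is covered by the requested outputContent value. */
    static boolean isCompatible(String requested, String blobType) {
        String[] want = typeAndSubtype(requested);
        String[] have = typeAndSubtype(blobType);
        return (want[0].equals("*") || want[0].equals(have[0]))
            && (want[1].equals("*") || want[1].equals(have[1]));
    }

    /** Returns the URIs of all Blobs matching any requested value, in Blob order. */
    static List<String> select(List<String> requested, Map<String, String> blobTypesByUri) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, String> blob : blobTypesByUri.entrySet()) {
            for (String pattern : requested) {
                if (isCompatible(pattern, blob.getValue())) {
                    selected.add(blob.getKey());
                    break; // one match is enough; avoid adding the same Blob twice
                }
            }
        }
        return selected;
    }
}
```

Selecting by media type keeps requests stable even when the order of ContentParts changes, which is the main argument against index-based selection.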


### Omit Metadata in the Response

No changes to the original proposal

* __omitMetada=[true/false]:__ If enabled, no metadata is included in the result. This only makes sense together with the outputContentParts argument; the result will correspond to the second section of the multipart returned without this argument.

### Returning a single ContentPart

Requests that use an __"Accept"__ header AND __omitMetada=true__ are interpreted like

* outputContent={accept-header-value}

However, instead of using "multipart/form-data", the selected content part is serialized directly to the response.
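
How the Accept header and omitMetada could combine is sketched below. Several points are assumptions rather than part of the proposal: the request is assumed to already carry output parameters, an Accept value of "multipart/form-data" is assumed to keep the multipart response even with omitMetada=true, and the fallback to a plain RDF response is assumed to match the Enhancer's existing behaviour. `ResponseFormatSketch` and its `Format` values are hypothetical names.

```java
import java.util.Locale;

/** Illustrative sketch only; names are hypothetical, not Stanbol API. */
final class ResponseFormatSketch {

    enum Format { MULTIPART_FORM_DATA, SINGLE_CONTENT_PART, RDF_METADATA }

    /**
     * @param acceptHeader value of the Accept header, or null if none was sent
     * @param omitMetada   value of the omitMetada query parameter
     */
    static Format choose(String acceptHeader, boolean omitMetada) {
        String accept = acceptHeader == null
                ? "" : acceptHeader.trim().toLowerCase(Locale.ROOT);
        // No Accept header, or an explicit multipart/form-data: multipart response.
        if (accept.isEmpty() || accept.startsWith("multipart/form-data")) {
            return Format.MULTIPART_FORM_DATA;
        }
        // Any other Accept value plus omitMetada=true: behaves like
        // outputContent={accept-header-value}, serialized directly.
        if (omitMetada) {
            return Format.SINGLE_CONTENT_PART;
        }
        // ASSUMPTION: otherwise the Accept value selects the RDF serialization.
        return Format.RDF_METADATA;
    }
}
```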


[1] http://stanbol.staging.apache.org/stanbol/docs/trunk/enhancer/contentitem.html#contentparts

                