Posted to commits@stanbol.apache.org by "Rupert Westenthaler (Created) (JIRA)" <ji...@apache.org> on 2012/02/02 12:11:53 UTC

[jira] [Created] (STANBOL-478) Change Metaxa Engine to use ContentParts

Change Metaxa Engine to use ContentParts
----------------------------------------

                 Key: STANBOL-478
                 URL: https://issues.apache.org/jira/browse/STANBOL-478
             Project: Stanbol
          Issue Type: Improvement
          Components: Enhancer
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler


Instead of adding the text version of a ContentItem to the metadata, Metaxa should use the newly added ContentParts API.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (STANBOL-478) Change Metaxa Engine to create PlainText version as ContentPart and change other Engines to retrieve PlainText version from ContentPart

Posted by "Rupert Westenthaler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STANBOL-478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210201#comment-13210201 ] 

Rupert Westenthaler commented on STANBOL-478:
---------------------------------------------

regarding: "outputContent parameter is wrongly termed 'outputContentType'. "

Thanks for reporting. This is already corrected in [1]; I will update the integrated documentation soon.



[1] http://stanbol.staging.apache.org/stanbol/docs/trunk/enhancer/enhancerrest.html
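For illustration, here is a minimal Java sketch of a client request that uses the corrected parameter name. It builds an Enhancer request URI with `outputContent=text/plain`, asking the Enhancer to include content of that MIME type in its response. The endpoint `http://localhost:8080/enhancer` and the class and method names are assumptions for this example; the exact response format and any required `Accept` header should be checked against the documentation in [1].

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: only the parameter name "outputContent" comes from
// the discussion in this thread.
final class EnhancerRequest {
    // Assumed endpoint of a local Stanbol launcher; adjust to your deployment.
    static final String DEFAULT_ENDPOINT = "http://localhost:8080/enhancer";

    // Returns the endpoint URI with outputContent set to the given MIME type,
    // URL-encoded so that "text/plain" becomes "text%2Fplain".
    static URI withOutputContent(String endpoint, String mimeType) {
        String encoded = URLEncoder.encode(mimeType, StandardCharsets.UTF_8);
        return URI.create(endpoint + "?outputContent=" + encoded);
    }
}
```

For example, `EnhancerRequest.withOutputContent(EnhancerRequest.DEFAULT_ENDPOINT, "text/plain")` yields `http://localhost:8080/enhancer?outputContent=text%2Fplain`; the content to enhance would then be sent as the body of a POST to that URI.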
                
> Change Metaxa Engine to create PlainText version as ContentPart and change other Engines to retrieve PlainText version from ContentPart
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: STANBOL-478
>                 URL: https://issues.apache.org/jira/browse/STANBOL-478
>             Project: Stanbol
>          Issue Type: Improvement
>          Components: Enhancer
>            Reporter: Rupert Westenthaler
>            Assignee: Walter Kasper
>
> Instead of adding/reading the "text/plain" version of a ContentItem to/from the metadata of the ContentItem, the new ContentPart API should be used for that.
> This will require the Metaxa Engine to store literal values of all Triples with the ContentItem.getUri() as subject and
>     
>     http://www.semanticdesktop.org/ontologies/2007/01/19/nie#plainTextContent
> as property to a Blob and add this as ContentPart to the ContentItem.
> Other EnhancementEngines then need to search for a Blob with the MIME type "text/plain" instead of retrieving the plain text from the metadata.
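A minimal Java sketch of the flow described above, under stated assumptions: `Triple`, `SimpleBlob`, `SimpleContentItem`, and `PlainTextParts` are hypothetical stand-ins, not the Stanbol API (`ContentItem`, `Blob`). The Metaxa side collects the literal values of all triples whose subject is the item URI and whose predicate is `nie:plainTextContent` and stores them as a UTF-8 "text/plain" part; a consuming engine then looks the text up by MIME type instead of reading the metadata. Joining several literals with a newline is an assumption of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins for the Stanbol types; not the real API.
final class Triple {
    final String subject, predicate, literal;
    Triple(String subject, String predicate, String literal) {
        this.subject = subject; this.predicate = predicate; this.literal = literal;
    }
}

final class SimpleBlob {
    final String mimeType;
    final byte[] data;
    SimpleBlob(String mimeType, byte[] data) { this.mimeType = mimeType; this.data = data; }
}

final class SimpleContentItem {
    final String uri;
    final List<Triple> metadata = new ArrayList<>();
    // Content parts keyed by part URI, kept in insertion order.
    final Map<String, SimpleBlob> parts = new LinkedHashMap<>();
    SimpleContentItem(String uri) { this.uri = uri; }
}

final class PlainTextParts {
    static final String NIE_PLAIN_TEXT =
        "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#plainTextContent";

    // Metaxa side: join the literals of all <item> nie:plainTextContent triples
    // and store them as a UTF-8 "text/plain" part (nothing is added if none exist).
    static void addPlainTextPart(SimpleContentItem ci, String partUri) {
        List<String> texts = new ArrayList<>();
        for (Triple t : ci.metadata) {
            if (t.subject.equals(ci.uri) && t.predicate.equals(NIE_PLAIN_TEXT)) {
                texts.add(t.literal);
            }
        }
        if (!texts.isEmpty()) {
            byte[] bytes = String.join("\n", texts).getBytes(StandardCharsets.UTF_8);
            ci.parts.put(partUri, new SimpleBlob("text/plain", bytes));
        }
    }

    // Consumer side: return the content of the first "text/plain" part, if any.
    static Optional<String> getPlainText(SimpleContentItem ci) {
        for (SimpleBlob blob : ci.parts.values()) {
            if (blob.mimeType.equals("text/plain")) {
                return Optional.of(new String(blob.data, StandardCharsets.UTF_8));
            }
        }
        return Optional.empty();
    }
}
```

Because the lookup depends only on the MIME type, a consuming engine in this sketch does not need to know which engine produced the plain-text part.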


[jira] [Reopened] (STANBOL-478) Change Metaxa Engine to create PlainText version as ContentPart and change other Engines to retrieve PlainText version from ContentPart

Posted by "Walter Kasper (Reopened) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STANBOL-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Walter Kasper reopened STANBOL-478:
-----------------------------------

      Assignee: Walter Kasper  (was: Rupert Westenthaler)

For external clients that use Metaxa for text extraction, the text will no longer be visible or accessible in the metadata graph. There should at least be an option to include the text in the metadata, so that clients can retrieve it there with a simple SPARQL query.
                

[jira] [Commented] (STANBOL-478) Change Metaxa Engine to create PlainText version as ContentPart and change other Engines to retrieve PlainText version from ContentPart

Posted by "Walter Kasper (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STANBOL-478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210194#comment-13210194 ] 

Walter Kasper commented on STANBOL-478:
---------------------------------------

That multi-part version was not available in our Stanbol version from 20120213, and it would have broken our interfaces anyway. Today's version (20120217) seems to provide that API, but there is an error in the REST API description: the outputContent parameter is wrongly termed 'outputContentType'.

Best

Walter
                

[jira] [Commented] (STANBOL-478) Change Metaxa Engine to create PlainText version as ContentPart and change other Engines to retrieve PlainText version from ContentPart

Posted by "Rupert Westenthaler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STANBOL-478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210137#comment-13210137 ] 

Rupert Westenthaler commented on STANBOL-478:
---------------------------------------------

This could be implemented by using https://issues.apache.org/jira/browse/STANBOL-488 on a per-request basis.
                

[jira] [Resolved] (STANBOL-478) Change Metaxa Engine to create PlainText version as ContentPart and change other Engines to retrieve PlainText version from ContentPart

Posted by "Rupert Westenthaler (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STANBOL-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rupert Westenthaler resolved STANBOL-478.
-----------------------------------------

    Resolution: Fixed

Implemented with revision #1239618.
                

[jira] [Resolved] (STANBOL-478) Change Metaxa Engine to create PlainText version as ContentPart and change other Engines to retrieve PlainText version from ContentPart

Posted by "Walter Kasper (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STANBOL-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Walter Kasper resolved STANBOL-478.
-----------------------------------

    Resolution: Fixed

Added an option to include the extracted text directly in the metadata graph.
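A hypothetical Java sketch of what such an option could look like. The class name, the `includeTextInMetadata` flag, and its handling are assumptions for illustration; the actual Metaxa configuration property and its default are not stated in this thread. When the flag is set, the `nie:plainTextContent` triple is added to the metadata graph in addition to the "text/plain" content part, so an external client can read the text back with a SPARQL query against the returned metadata (for example, after loading it into its own RDF store, since the Enhancer itself provides no SPARQL endpoint).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an engine-level switch; not the real Metaxa code.
final class MetaxaTextOption {
    static final String NIE =
        "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#";

    final boolean includeTextInMetadata;

    MetaxaTextOption(boolean includeTextInMetadata) {
        this.includeTextInMetadata = includeTextInMetadata;
    }

    // Triples {subject, predicate, object} to add to the metadata graph for the
    // extracted text. The "text/plain" content part is assumed to be written
    // regardless of this option.
    List<String[]> metadataTriples(String itemUri, String text) {
        List<String[]> triples = new ArrayList<>();
        if (includeTextInMetadata) {
            triples.add(new String[] {itemUri, NIE + "plainTextContent", text});
        }
        return triples;
    }

    // SPARQL query an external client could run over the returned metadata.
    static String clientQuery(String itemUri) {
        return "PREFIX nie: <" + NIE + ">\n"
             + "SELECT ?text WHERE { <" + itemUri + "> nie:plainTextContent ?text }";
    }
}
```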
                

[jira] [Updated] (STANBOL-478) Change Metaxa Engine to create PlainText version as ContentPart and change other Engines to retrieve PlainText version from ContentPart

Posted by "Rupert Westenthaler (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STANBOL-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rupert Westenthaler updated STANBOL-478:
----------------------------------------

    Description: 
Instead of adding/reading the "text/plain" version of a ContentItem to/from the metadata of the ContentItem, the new ContentPart API should be used for that.

This will require the Metaxa Engine to store literal values of all Triples with the ContentItem.getUri() as subject and
    
    http://www.semanticdesktop.org/ontologies/2007/01/19/nie#plainTextContent

as property to a Blob and add this as ContentPart to the ContentItem.

Other EnhancementEngines then need to search for a Blob with the MIME type "text/plain" instead of retrieving the plain text from the metadata.

  was:Instead of adding the text version of a ContentItem to the metadata, Metaxa should use the newly added ContentParts API.

        Summary: Change Metaxa Engine to create PlainText version as ContentPart and change other Engines to retrieve PlainText version from ContentPart  (was: Change Metaxa Engine to use ContentParts)
    

[jira] [Issue Comment Edited] (STANBOL-478) Change Metaxa Engine to create PlainText version as ContentPart and change other Engines to retrieve PlainText version from ContentPart

Posted by "Rupert Westenthaler (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STANBOL-478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210137#comment-13210137 ] 

Rupert Westenthaler edited comment on STANBOL-478 at 2/17/12 9:02 AM:
----------------------------------------------------------------------

The on-demand inclusion of the text could be implemented by using STANBOL-488 (EnhancementProperties) on a per-request basis.

However, this would bypass the now-preferred multi-part content item API. With this functionality, the intended way to get the plain text content is to use the multi-part content item extension of the RESTful API to request specific content parts of ContentItems. I am currently working on detailed documentation for this, with many examples for typical use cases (including this one). The Web UI of the Enhancer already includes a description of the RESTful services.

In general, I do not understand the mention of SPARQL, because the Stanbol Enhancer does not store enhancement results and therefore cannot provide a SPARQL endpoint. If this refers to the SPARQL endpoint of the Contenthub, then you might want to have a look at STANBOL-471. This would allow a SPARQL endpoint on top of (S1), the store that contains enhanced content items.

As part of the work on STANBOL-471, I will also provide an LDPath wrapper for ContentItems. It will also have access to the "contents" of the ContentItem. A simple LDPath statement like

nie:plainTextContent = fn:content(".","text/plain");

would then take the content stored in the current ContentItem with the MIME type "text/plain" and index/store the value under the property nie:plainTextContent.
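As a rough illustration of what that rule would compute for one ContentItem, here is a hypothetical Java sketch. It is not LDPath and not the planned wrapper: `ContentFieldIndexer` and its nested `Part` type are stand-ins, and collecting every matching part (rather than only the first) is an assumption about how `fn:content` would behave.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Java equivalent of
//   nie:plainTextContent = fn:content(".", "text/plain");
// evaluated for a single ContentItem. Not LDPath itself.
final class ContentFieldIndexer {
    // Stand-in for a content part: MIME type plus raw bytes.
    static final class Part {
        final String mimeType;
        final byte[] data;
        Part(String mimeType, byte[] data) { this.mimeType = mimeType; this.data = data; }
    }

    // Builds an index document: every part with the requested MIME type
    // contributes its UTF-8 decoded content as a value of the given field.
    static Map<String, List<String>> index(List<Part> parts, String field, String mimeType) {
        List<String> values = new ArrayList<>();
        for (Part p : parts) {
            if (p.mimeType.equals(mimeType)) {
                values.add(new String(p.data, StandardCharsets.UTF_8));
            }
        }
        Map<String, List<String>> doc = new LinkedHashMap<>();
        if (!values.isEmpty()) {
            doc.put(field, values);
        }
        return doc;
    }
}
```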

best
Rupert
                
      was (Author: rwesten):
    This could be implemented by using https://issues.apache.org/jira/browse/STANBOL-488 on a per-request basis.
                  