Posted to dev@stanbol.apache.org by harish suvarna <hs...@gmail.com> on 2013/02/21 23:26:16 UTC

Re: Stanbol serialization

Thanks Rupert. Another related question: I have an engine that
returns json-ld with annotations. Is there a way I can output the same json-ld
without any modifications through Stanbol serialization? I.e., can the RESTful
API of the engine chain return the same json-ld produced by this
particular enhancement engine?

That means we may have to drop all the default namespace definitions and the
annotations by language detection that appear in the current serialization.
Rather than just dropping these, if I could pass through the json-ld returned
by the engine, that would be great.

-harish

On Sat, Jan 19, 2013 at 12:48 AM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> Hi
>
> I am not sure if I 100% understand your question but here are some
> links that might help
>
> [1] describes TopicAnnotations
> [2] describes how to use TopicAnnotations in Tag/Category suggestion use
> cases
>
> In summary, topics are annotated by
>
> * creating a fise:TextAnnotation that selects the part of the text
> to which the topics apply. Currently this is always the whole document
> (meaning a fise:TextAnnotation with no fise:start/fise:end values)
> * creating one fise:TopicAnnotation per suggested topic and linking
> those (by the dc:relation property) with the fise:TextAnnotation.
>
> hope this answers your question
>
> best
> Rupert
>
>
> [1]
> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetopicannotation
> [2]
> http://stanbol.apache.org/docs/trunk/enhancementusage.html#process-content-categorizations
>
> On Sat, Jan 19, 2013 at 2:18 AM, harish suvarna <hs...@gmail.com>
> wrote:
> > I am trying to output possible document categories from the enhancement
> > engine. Since the document categories are at the document level, they are
> > not associated with any content keywords, phrases, or selected text.
> > I see the code in the keywordlinking engine, which stores the identified
> > phrases/keywords in the graph. I am just wondering whether I can change the
> > Stanbol serialization code to accommodate some of my own namespaces in
> > json-ld and a special dictionary. For example, an article on President
> > Obama's peace initiatives may have 2 or 3 categories.
> >
> > Peace --> World Countries --> USA (--> indicates parent)
> > Accomplishments --> American Presidents --> Presidents --> World Politicians
> >
> > I would like to store these in json-ld as text annotations or topic
> > annotations (not sure which). The language detection is stored as a text
> > annotation. Can you please give me a hint on how I can output my own
> >
> > a. namespace MyOnt http://xxxxx.yyy.xom/ontologies
> > b. whether it should be text annotation or topic annotation
> > c. any other engine which stores document level meta data.
> >
> > I am trying to run the topic engine in Stanbol to understand how it
> > stores its results, but unfortunately the topic engine is not working
> > for me (I am posting another thread on that).
> >
> > {
> >       "@subject": "urn:enhancement-2ed240f7-91cc-cfcd-7097-2d9eea6c4d6d",
> >       "@type": [
> >         "enhancer:Enhancement",
> >         "enhancer:TextAnnotation"
> >       ],
> >       "dc:created": "2012-10-14T12:27:35.503Z",
> >       "dc:creator": "org.apache.stanbol.enhancer.engines.langdetect.MyKeyWordEngine",
> >       "enhancer:entity-type": [
> >         "MyOnt:USA",
> >         "MyOnt:World Countries",
> >         "MyOnt:Peace"
> >       ]
> >  },
> > {
> >       "@subject": "urn:enhancement-2ed240f7-91cc-cfcd-7097-2d9eea6c4d6d",
> >       "@type": [
> >         "enhancer:Enhancement",
> >         "enhancer:TextAnnotation"
> >       ],
> >       "dc:created": "2012-10-14T12:27:35.503Z",
> >       "dc:creator": "org.apache.stanbol.enhancer.engines.langdetect.MyKeyWordEngine",
> >       "enhancer:entity-type": [
> >         "MyOnt:World Politicians",
> >         "MyOnt:Presidents",
> >         "MyOnt:American Presidents",
> >         "MyOnt:Accomplishments"
> >       ]
> >  }
> >
> > --
> > Thanks
> > Harish
>
>
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>



-- 
Thanks
Harish

Re: Stanbol serialization

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Harish

While EnhancementEngines are allowed to make any changes to the
metadata of the ContentItem passed to them (including deleting
enhancements of other engines), they have no influence on the
serialization of the RDF graph representing the enhancement metadata.

Although it is not the intended use, you might be able to achieve what
you are trying to do with the multipart ContentItem extension. [1]
provides an example of that.

So assume that your EnhancementEngine serializes parts or all of the
enhancement metadata as JsonLD to a Blob and then adds this Blob as a
ContentPart to the ContentItem:

    @Reference
    ContentItemFactory ciFactory;

    //within the computeEnhancements method
    ContentItem ci; //the content item passed to the method

    ContentSink jsonLdSink;
    try {
        jsonLdSink = ciFactory.createContentSink(
            "application/ld+json; charset="+UTF8.name());
    } catch (IOException e) {
        throw new EngineException("Error while initialising Blob for " +
            "writing the JsonLD serialized enhancement results",e);
    }

    //Now you can obtain an OutputStream from the ContentSink
    //to stream the serialized JsonLD
    OutputStream out = jsonLdSink.getOutputStream();

    //close the output stream after serialization finishes

    //finally you need to create an URI for your ContentPart
    String random = randomUUID().toString(); // you can also use fixed URI
    UriRef jsonLdBlobUri = new UriRef("urn:{your-engine}:jsonld:"+random);

    //and add your Blob as ContentPart
    ci.addPart(jsonLdBlobUri, jsonLdSink.getBlob());
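
The step of streaming the serialized JsonLD and closing the stream
afterwards could look roughly like the sketch below. It is only an
illustration: JsonLdBlobWriter is a hypothetical helper (not part of
Stanbol), it assumes the JsonLD is already available as a String, and a
ByteArrayOutputStream stands in for the stream obtained from the
ContentSink.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the Stanbol API.
final class JsonLdBlobWriter {

    // Writes the JSON-LD text as UTF-8 and closes the stream afterwards,
    // even if the write fails (try-with-resources).
    static void write(OutputStream out, String jsonLd) {
        try (OutputStream o = out) {
            o.write(jsonLd.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException("Unable to write JSON-LD", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the OutputStream of the ContentSink (assumption).
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        write(sink, "{\"@id\": \"urn:example:doc\"}");
        System.out.println(sink.size() + " bytes written");
    }
}
```

Within an engine, the first argument would be the stream obtained from
the ContentSink, and the Blob should only be added as a ContentPart
after the stream has been closed.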

After that, a request like

    curl -v -X POST -H "Accept: application/ld+json" \
        -H "Content-type: text/plain; charset=UTF-8" \
        -T {text-file} "http://localhost:8080/enhancer?omitMetadata=true"

should return the data as serialized by your EnhancementEngine to the Blob.
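
For a Java client, the same request could be built with the JDK's
java.net.http API (Java 11 or later). This is a minimal sketch, not
tested against a running Stanbol instance; EnhancerRequestSketch is a
hypothetical name, and the URL and headers are simply those of the curl
call above.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side helper mirroring the curl request.
final class EnhancerRequestSketch {

    // Builds a POST request that sends plain text and asks for JSON-LD,
    // with omitMetadata=true as in the curl example.
    static HttpRequest build(String enhancerUrl, String text) {
        return HttpRequest.newBuilder(URI.create(enhancerUrl + "?omitMetadata=true"))
            .header("Accept", "application/ld+json")
            .header("Content-Type", "text/plain; charset=UTF-8")
            .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8))
            .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://localhost:8080/enhancer",
            "Some text to enhance.");
        // Only prints the request line; nothing is sent here.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending it would be a matter of passing the request to
HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).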

I have never tried to use this for custom serialized RDF, but I do
sometimes use it to access Apache Tika through the Stanbol Enhancer API,
and there is also an integration test for this API. So unless there is
a specific issue with RDF-specific media types in the Accept header, it
should work as described.

best
Rupert


[1] http://stanbol.apache.org/docs/trunk/components/enhancer/enhancerrest.html#example-2-directly-return-the-plain-text-version-of-parsed-content



--
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen