Posted to dev@stanbol.apache.org by Srecko Joksimovic <sr...@gmail.com> on 2012/01/09 23:31:10 UTC

RE: Taxonomy linking

Hello Rupert,

Could you please give me an example of annotating various types of
documents?

As I understood from
http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/metaxaengine.html

And

curl -i -X POST -H "Content-Type:text/html" -T testpage.html http://localhost:8080/engines

The MIME type should match the document type. But (maybe this is going to be
a stupid question)… when I annotated text, I called the method like this:

IOUtils.write(_string_to_annotate, out);
IOUtils.closeQuietly(out);

For a document of any type, I should probably convert the document content to
a byte array and then call a similar method?

I'm asking this because I didn't see a way to provide a document URL
and get the results back. I suppose that this would be the only way?

Best,

Srecko

Re: Taxonomy linking

Posted by srecko joksimovic <sr...@gmail.com>.

Thank you very much!

That was the information I needed.

Best

On Tue, Jan 10, 2012 at 8:11 AM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

>
> On 09.01.2012, at 23:31, Srecko Joksimovic wrote:
>
> > Hello Rupert,
> >
> > Could you please give me an example of annotating various types of
> > documents?
> > As I understood from
> > http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/metaxaengine.html
> >
> > And
> >
> > curl -i -X POST -H "Content-Type:text/html" -T testpage.html http://localhost:8080/engines
> >
> > The MIME type should match the document type. But (maybe this is going
> > to be a stupid question)… when I annotated text, I called the method
> > like this:
> > IOUtils.write(_string_to_annotate, out);
> > IOUtils.closeQuietly(out);
> >
> > For a document of any type, I should probably convert the document
> > content to a byte array and then call a similar method?
> > I’m asking this because I didn’t see a way to provide a document URL
> > and get the results back. I suppose that this would be the only way?
> >
>
> Generally, the MIME type of the content MUST match the value passed in the
> Content-Type header. The Metaxa engine may also have some way to detect
> the MIME type from the content, but I do not know whether this is the case.
>
> It is also true that for binary documents you need to use the byte-oriented
> methods of IOUtils. However, I would also consider streaming the data
> directly from the file to the OutputStream of the POST request, to avoid
> loading the whole content into memory.
>
> Note that for textual content you should also set the charset correctly.
> If you use a charset other than "UTF-8", you need to set it as a parameter
> of the "Content-Type" header, such as
>
>    Content-Type: text/plain; charset=UTF-16
>
> best
> Rupert
>
>

Re: Taxonomy linking

Posted by Rupert Westenthaler <ru...@gmail.com>.

On 09.01.2012, at 23:31, Srecko Joksimovic wrote:

> Hello Rupert,
>
> Could you please give me an example of annotating various types of documents?
> As I understood from http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/metaxaengine.html
>
> And
>
> curl -i -X POST -H "Content-Type:text/html" -T testpage.html http://localhost:8080/engines
>
> The MIME type should match the document type. But (maybe this is going to be a stupid question)… when I annotated text, I called the method like this:
> IOUtils.write(_string_to_annotate, out);
> IOUtils.closeQuietly(out);
>
> For a document of any type, I should probably convert the document content to a byte array and then call a similar method?
> I’m asking this because I didn’t see a way to provide a document URL and get the results back. I suppose that this would be the only way?
>

Generally, the MIME type of the content MUST match the value passed in the Content-Type header. The Metaxa engine may also have some way to detect the MIME type from the content, but I do not know whether this is the case.

It is also true that for binary documents you need to use the byte-oriented methods of IOUtils. However, I would also consider streaming the data directly from the file to the OutputStream of the POST request, to avoid loading the whole content into memory.
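
A minimal sketch of the streaming approach, using only the JDK rather than IOUtils. The class and method names (StreamingPost, streamFile, post) are made up for illustration, and the endpoint and content type are whatever the caller passes in; this is one possible way to do it, not something taken from the Stanbol code base.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class; the names are illustrative only.
final class StreamingPost {

    /** Copies the file to the stream in chunks; returns the number of bytes written. */
    static long streamFile(Path file, OutputStream out) throws IOException {
        return Files.copy(file, out);
    }

    /** POSTs the file to the endpoint and returns the HTTP status code. */
    static int post(URL endpoint, Path file, String contentType) throws IOException {
        HttpURLConnection con = (HttpURLConnection) endpoint.openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        // Must describe the actual content, e.g. "text/html" or "application/pdf".
        con.setRequestProperty("Content-Type", contentType);
        // Declaring the length up front stops HttpURLConnection from
        // buffering the whole request body in memory before sending it.
        con.setFixedLengthStreamingMode(Files.size(file));
        try (OutputStream out = con.getOutputStream()) {
            streamFile(file, out);
        }
        return con.getResponseCode();
    }
}
```

Assuming a local Stanbol instance as in the curl call quoted above, post(new URL("http://localhost:8080/engines"), Paths.get("testpage.html"), "text/html") should send the same request body as that command.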

Note that for textual content you should also set the charset correctly. If you use a charset other than "UTF-8", you need to set it as a parameter of the "Content-Type" header, such as

    Content-Type: text/plain; charset=UTF-16
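
A short sketch of how the header value and the bytes can be kept in agreement. The two-argument IOUtils.write(String, OutputStream) encodes with the platform default charset, so it is probably safer to pick the charset explicitly and derive both the header and the bytes from it. The class and method names below (TextContent, contentType, encode) are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
final class TextContent {

    /** Builds a Content-Type value such as "text/plain; charset=UTF-16". */
    static String contentType(String mimeType, Charset charset) {
        return mimeType + "; charset=" + charset.name();
    }

    /** Encodes the text with the same charset that is announced in the header. */
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }
}
```

With Commons IO, the three-argument IOUtils.write(text, out, "UTF-16") should have the same effect as writing the result of encode(text, StandardCharsets.UTF_16) to the stream.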

best
Rupert