Posted to dev@stanbol.apache.org by Srecko Joksimovic <sr...@gmail.com> on 2012/01/09 23:31:10 UTC
RE: Taxonomy linking
Hello Rupert,
Could you please give me an example of annotating various types of documents?
As I understood from
http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/metaxaengine.html
and
curl -i -X POST -H "Content-Type:text/html" -T testpage.html http://localhost:8080/engines
the MIME type should match the document type. But (maybe this is a stupid question): when I annotated text, I called a method like this:
IOUtils.write(_string_to_annotate, out);
IOUtils.closeQuietly(out);
For documents of any type, should I convert the document content to a byte array and then call a similar method?
I'm asking because I didn't see a way to provide a document URL and get the results back. I suppose this would be the only way?
Best,
Srecko
Re: Taxonomy linking
Posted by srecko joksimovic <sr...@gmail.com>.
Thank you very much!
That was the information I needed.
Best
On Tue, Jan 10, 2012 at 8:11 AM, Rupert Westenthaler <rupert.westenthaler@gmail.com> wrote:
Re: Taxonomy linking
Posted by Rupert Westenthaler <ru...@gmail.com>.
Generally, the MIME type of the content MUST match the value passed in the Content-Type header. The Metaxa engine may also have some way to detect the MIME type from the content, but I do not know whether this is the case.
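Because the client chooses the Content-Type value, one option is to derive it from the file name before posting. This is only an illustrative sketch: `ContentTypeGuess` and `contentTypeFor` are hypothetical names, it relies on the JDK's `URLConnection.guessContentTypeFromName`, which looks only at the file extension (it does not inspect the bytes and is not what the Metaxa engine does), and the `application/octet-stream` fallback is an assumption of this sketch rather than a Stanbol requirement.

```java
import java.net.URLConnection;

public class ContentTypeGuess {

    // Hypothetical helper: derive a Content-Type value from a file name.
    // guessContentTypeFromName consults the JDK's built-in file-name map,
    // so it only looks at the extension, never at the file's bytes.
    static String contentTypeFor(String fileName) {
        String guessed = URLConnection.guessContentTypeFromName(fileName);
        // Assumption of this sketch: fall back to a generic binary type
        // when the extension is not in the JDK's map.
        return guessed != null ? guessed : "application/octet-stream";
    }

    public static void main(String[] args) {
        // "testpage.html" is the file name used in the curl example in this thread.
        System.out.println(contentTypeFor("testpage.html"));
        System.out.println(contentTypeFor("report.unknownext"));
    }
}
```

If the guess is wrong for a given file, it is safer to set the header explicitly, since the engine trusts whatever value is sent.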
It is also true that for binary documents you need to use the byte-oriented methods of IOUtils. However, I would also consider "streaming" the data directly from the file to the OutputStream of the POST request, to avoid loading the whole content into memory.
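A minimal Java sketch of that streaming approach, assuming a plain `HttpURLConnection` rather than any Stanbol client library. The endpoint and `testpage.html` come from the curl command earlier in the thread; `StreamingPost` and `postFile` are hypothetical names. One detail matters here: `HttpURLConnection` buffers the whole request body in memory unless a streaming mode is set, so the sketch calls `setFixedLengthStreamingMode` with the file size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StreamingPost {

    // POSTs the file to the endpoint without reading it fully into memory.
    // Returns the response body, decoded as UTF-8 (an assumption of this sketch).
    // getInputStream() throws an IOException for 4xx/5xx responses.
    static String postFile(String endpoint, Path file, String contentType) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        // The header value must match the actual type of the bytes being sent.
        conn.setRequestProperty("Content-Type", contentType);
        // Without a streaming mode, HttpURLConnection buffers the entire body
        // to compute Content-Length; fixed-length mode sends it as it is written.
        conn.setFixedLengthStreamingMode(Files.size(file));
        try (OutputStream out = conn.getOutputStream()) {
            // Copies the file into the request body chunk by chunk.
            Files.copy(file, out);
        }
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String result = postFile("http://localhost:8080/engines",
                Paths.get("testpage.html"), "text/html");
        System.out.println(result);
    }
}
```

`setChunkedStreamingMode(0)` is an alternative when the length is not known in advance (for example when the data comes from another stream rather than a file). `readAllBytes` requires Java 9 or later.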
Note that for textual content you should also set the charset correctly. If you use a charset other than "UTF-8", you need to pass it as a parameter of the "Content-Type" header, such as
Content-Type: text/plain; charset=UTF-16
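A small sketch of keeping the header and the bytes consistent, assuming the text is encoded in Java before it is written to the request stream; `CharsetHeader`, `contentTypeFor` and `encode` are hypothetical helper names. The point is that one `Charset` object drives both the encoding and the `charset` parameter, so they cannot drift apart. (Commons IO's `IOUtils.write` also has overloads that take an explicit encoding, which avoids relying on the platform default.)

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetHeader {

    // Builds a Content-Type value whose charset parameter names the given charset.
    static String contentTypeFor(String mediaType, Charset charset) {
        return mediaType + "; charset=" + charset.name();
    }

    // Encodes the text with the same charset that is announced in the header.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        Charset cs = StandardCharsets.UTF_16;
        String header = contentTypeFor("text/plain", cs);
        byte[] body = encode("Paris", cs);
        System.out.println("Content-Type: " + header); // Content-Type: text/plain; charset=UTF-16
        System.out.println(body.length + " bytes");    // 12 bytes (2-byte BOM + 5 chars x 2 bytes)
    }
}
```

Java's UTF-16 encoder writes a big-endian byte-order mark, which is why five characters become twelve bytes; a server decoding the body as UTF-8 instead would produce garbage, which is exactly what the charset parameter prevents.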
best
Rupert