Posted to dev@stanbol.apache.org by Barnabas Szasz <bs...@gmail.com> on 2013/11/25 10:37:05 UTC

Stanbol and scraper integration

Dear Stanbol community,

I have a use case where I need to extract metadata from structured documents (mostly HTML or XML) in which the position of a value determines its meaning. Since entity recognition is part of the task (the extracted strings should be resolved against vocabularies), I am considering Stanbol for this job.
Now the question is where a scraper (like scraperwiki.com) would fit in such an architecture. Should I implement a wrapper for the scraper as an enhancement engine? In that case, if an engine in the chain adds an annotation to the document, would the next engine in the chain be able to do entity recognition on that annotation?
Or would you recommend a different approach?
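
To make the use case more concrete, here is a minimal Python sketch of what I mean by position-based extraction. The sample document, the field names and the extract_fields helper are only illustrative assumptions. They are not part of any existing scraper or of the Stanbol API. The extracted strings are what would then have to be resolved against vocabularies.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: each value gets its meaning from its position.
SAMPLE = """<record>
  <row><cell>Budapest</cell><cell>Hungary</cell></row>
  <row><cell>Madrid</cell><cell>Spain</cell></row>
</record>"""

# Hypothetical mapping from a cell's position within a row to a field name.
FIELDS_BY_POSITION = {0: "city", 1: "country"}


def extract_fields(xml_text):
    """Return one dict per row, keyed by the field name assigned to each position."""
    root = ET.fromstring(xml_text)
    records = []
    for row in root.findall("row"):
        record = {}
        for index, cell in enumerate(row.findall("cell")):
            name = FIELDS_BY_POSITION.get(index)
            if name is not None:  # ignore positions that have no assigned meaning
                record[name] = (cell.text or "").strip()
        records.append(record)
    return records


records = extract_fields(SAMPLE)
for record in records:
    print(record)
```

Each resulting dict holds plain strings labelled by field, so a later step could look up, for example, only the "city" values in a vocabulary.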

Thanks,
Barna

Re: Stanbol and scraper integration

Posted by Rafa Haro <rh...@apache.org>.

Hi Barnabas,

Could you provide more details about how your documents are structured
and what format the strings extracted by the scraper would have?

Regards,

Rafa
