Posted to dev@stanbol.apache.org by Alex Lopez <al...@flordeutopia.pt> on 2011/03/14 16:06:55 UTC

Categorization

Hi Stanbol devs,

We are working on a semantic app that should do content
categorization, among other things.

So I have some code that takes detected entities as input (in the form of
DBpedia URLs) and looks up both YAGO categories and Wikipedia categories
in DBpedia (the ones that used to be skos:subject and are now dct:subject).
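
A lookup of this kind can be illustrated with a minimal Python sketch. It
only builds the SPARQL query and request URL for the dct:subject values of
one resource and does not contact any server. The endpoint URL, the
"format" parameter and the function names are assumptions made for
illustration, not code from the original message; YAGO categories would
need a separate query.

```python
from urllib.parse import urlencode

# Assumption: the public DBpedia SPARQL endpoint; swap in your own if needed.
DBPEDIA_SPARQL = "https://dbpedia.org/sparql"


def build_category_query(resource_uri: str) -> str:
    """Return a SPARQL query selecting the dct:subject values of one resource."""
    # Reject characters that are not allowed inside a SPARQL IRI reference.
    if any(ch in resource_uri for ch in '<>"{}|\\^` \n\t'):
        raise ValueError(f"not a safe IRI: {resource_uri!r}")
    return (
        "PREFIX dct: <http://purl.org/dc/terms/>\n"
        "SELECT DISTINCT ?category WHERE {\n"
        f"  <{resource_uri}> dct:subject ?category .\n"
        "}"
    )


def build_request_url(resource_uri: str, endpoint: str = DBPEDIA_SPARQL) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    params = {
        "query": build_category_query(resource_uri),
        # Assumption: the endpoint honours this result-format parameter.
        "format": "application/sparql-results+json",
    }
    return endpoint + "?" + urlencode(params)
```

Keeping the query construction separate from the HTTP call makes it easier
to point the same code at another endpoint or vocabulary later.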

But of course this is tied to particular namespaces and services. I would
like to expand it and abstract away the particulars while retaining the
functionality, something like a wrapper:

input: resource/resources/text (raw content)
output: categories/topics

Keep in mind I'm not looking for categories of the sort
Person/Organisation, but rather of the sort Science/Jazz Musicians, etc.

I've been following Stanbol's dev list for some time, and I'm excited
about the possibilities. Maybe I can use some of it for this particular
requirement. Right now, I see it as a collection of services; for
example, I can see Zemanta doing what I want with DMOZ topics, included
as an engine. (Are there other engines doing something similar?)

But I wonder, is there a "central" place with methods/services doing
this for all implementations? Or perhaps I misunderstood what the
project is about...

kind of a getTopicsForResource(){
     getDMOZ();
     getWikipediaCategories();
     getFreebaseCategories();
     ...
}
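
The getTopicsForResource idea above could be sketched in Python as a small
aggregator over pluggable providers. This is only an illustration under
stated assumptions: the provider names and the callable signature are
hypothetical, and no real DMOZ, Wikipedia or Freebase client is implied.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical provider signature: takes a resource URI, yields topic labels.
TopicProvider = Callable[[str], Iterable[str]]


def get_topics_for_resource(
    resource: str, providers: Dict[str, TopicProvider]
) -> Dict[str, List[str]]:
    """Ask every registered provider for topics and group results by provider."""
    topics: Dict[str, List[str]] = {}
    for name, provider in providers.items():
        # dict.fromkeys drops duplicates while keeping first-seen order.
        topics[name] = list(dict.fromkeys(provider(resource)))
    return topics


if __name__ == "__main__":
    # Placeholder providers returning dummy values, for demonstration only.
    demo_providers = {
        "wikipedia": lambda uri: ["Jazz Musicians", "Jazz Musicians"],
        "dmoz": lambda uri: [],
    }
    print(get_topics_for_resource("http://dbpedia.org/resource/Miles_Davis",
                                  demo_providers))
```

Grouping the results by provider name keeps the source of each topic
visible, so callers can decide later how to merge or rank vocabularies.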

If not, what is a good place to look for this kind of functionality so I
can include it in my method?

I understand this is an incubating project, so perhaps this is a
planned feature. Do you have a roadmap? Is there an expected date for a
"first release" of Stanbol?

Thanks, and congratulations on an already impressive project! We'll
definitely be checking out more of it.

Regards,

Alex Lopez
Flor de Utopia

Re: Categorization

Posted by Alex Lopez <al...@flordeutopia.pt>.
Thank you Olivier,
we'll go on to test OpenCalais / Zemanta / SalsaDev separately for the 
time being.

On 16-03-2011 11:20, Olivier Grisel wrote:
> 2011/3/14 Alex Lopez<al...@flordeutopia.pt>:
>> I understand this is an incubating project, so perhaps this is a planned
>> feature. Do you have a roadmap? Is there an expected date for a "first
>> release" of Stanbol?
>
> Yes, this is a planned feature. The existing
> RelatedTopicEnhancementEngine is to be reimplemented to use the entity
> hub index and to build predefined topic indexes out of the DBpedia
> SKOS hierarchy and the full text of the related articles (to be able to
> perform similarity queries using the MoreLikeThis feature of Solr).
>
> We also need to extend the Stanbol vocabulary to handle topics that
> are not entities.
>
> In the meantime, you can use OpenCalais / Zemanta / SalsaDev directly.
>

Re: Categorization

Posted by Olivier Grisel <ol...@ensta.org>.
2011/3/14 Alex Lopez <al...@flordeutopia.pt>:
> I understand this is an incubating project, so perhaps this is a planned
> feature. Do you have a roadmap? Is there an expected date for a "first
> release" of Stanbol?

Yes, this is a planned feature. The existing
RelatedTopicEnhancementEngine is to be reimplemented to use the entity
hub index and to build predefined topic indexes out of the DBpedia
SKOS hierarchy and the full text of the related articles (to be able to
perform similarity queries using the MoreLikeThis feature of Solr).

We also need to extend the Stanbol vocabulary to handle topics that
are not entities.

In the meantime, you can use OpenCalais / Zemanta / SalsaDev directly.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
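
To illustrate the kind of MoreLikeThis similarity query mentioned in the
reply above, here is a hedged Python sketch that only builds a request URL
for Solr's MoreLikeThis handler. It is not the planned Stanbol
implementation. The core name "topics", the handler path "/mlt", the field
name "text" and the use of stream.body to pass raw text are all
assumptions, and the handler must be configured to accept content streams.

```python
from urllib.parse import urlencode


def build_mlt_url(
    text: str,
    solr_base: str = "http://localhost:8983/solr/topics",  # hypothetical core
    fields: tuple = ("text",),  # hypothetical full-text field(s)
    rows: int = 5,
) -> str:
    """Build a MoreLikeThis request URL that finds topic documents similar to text."""
    params = {
        "stream.body": text,          # raw text to compare against the index
        "mlt.fl": ",".join(fields),   # fields used to compute similarity
        "mlt.mintf": 1,               # minimum term frequency in the input
        "mlt.mindf": 1,               # minimum document frequency in the index
        "rows": rows,                 # number of similar documents to return
        "wt": "json",                 # response format
    }
    return f"{solr_base}/mlt?{urlencode(params)}"
```

Each returned document would correspond to one predefined topic, so the
top results can be read as candidate categories for the input text.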