Posted to dev@stanbol.apache.org by Rupert Westenthaler <ru...@gmail.com> on 2012/03/23 17:21:14 UTC

Fwd: [Dbp-spotlight-users] [GSoC 2012] Project Proposal for "Integrate DBpedia Spotlight as Enhancement Engine within Apache Stanbol" (Siwei Yu)

Hi Stanbol community

Let me forward this very good discussion and proposal for integrating DBpedia Spotlight with Apache Stanbol. 

Feedback is very welcome!

best
Rupert Westenthaler

> From: Pablo Mendes <pa...@gmail.com>
> Subject: Re: [Dbp-spotlight-users] [GSoC 2012] Project Proposal for "Integrate DBpedia Spotlight as Enhancement Engine within Apache Stanbol" (Siwei Yu)
> Date: 23 March 2012 16:02:24 CET
> To: Siwei Yu <ma...@gmail.com>
> Cc: Rupert Westenthaler <ru...@gmail.com>, dbp-spotlight-developers@lists.sourceforge.net
> 
> 
> Hi Siwei, (switching to dbp-spotlight-developers, so as to avoid spamming users in dbp-spotlight-users)
> Please see answers below.
> 
> On Fri, Mar 23, 2012 at 3:51 PM, Siwei Yu <ma...@gmail.com> wrote:
> Dear Pablo and Rupert,
> 
> I'm sorry to post an incomplete email just now. Please ignore the
> previous email.
> 
> No problem. I figured it was an accidental ctrl+enter.
>  
> 
> Thanks a lot for your instructions! According to your comments, let me
> summarise the current status of the service mapped to the four stages:
> (1) Spotting, (2) Candidate Selection, (3) Disambiguation, (4)
> Filtering
> /annotate: (1), (2), (3) first candidate, (4)
> /candidate: (1), (2), (3) all candidates
> /disambiguate: (3)
> /feedback: not implemented
> Please let me know if the previous summary is incorrect.
> 
> 
> Correct.
> 
>  
> 
> However, in Apache Stanbol each Enhancement Engine in an Enhancement
> Chain handles a single task (Rupert, is that right?). The functions of
> Enhancement Engines are not supposed to overlap.
> We need to adjust the services of DBpedia Spotlight as follows:
> /spot: (1), to be implemented in this project, for DBpediaSpotlightSpotEngine
> 
> 
> It is likely that we will implement /spot for release v0.6, which may happen before GSoC starts.
> 
> 
> /candidate: (2), to be refactored from current status, for
> DBpediaSpotlightCandidateEngine
> /disambiguate: (3), to be refactored from current status, for
> DBpediaSpotlightDisambiguateEngine
> 
> 
> We would probably provide a wrapper, rather than a refactored version.
> 
>  
> /filter: (4), to be implemented in this project, for
> DBpediaSpotlightFilterEngine
> As to /annotate, I think it's a complicated service which is not
> applicable for Apache Stanbol's "single task for each Enhancement
> Engine" requirement. But we can retain it for DBpedia Spotlight for
> other users (i.e. not for Apache Stanbol).
> 
> Sounds like /annotate would be an enhancement chain.
>  
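
The split discussed above, with one engine per stage and /annotate expressed as a chain over them, can be sketched as follows. This is a minimal illustration in Python, not Stanbol or DBpedia Spotlight code: the stage functions, the toy candidate index, and the score threshold are all hypothetical. Only the four stage names and the endpoint-to-stage mapping come from the thread.

```python
# Hypothetical sketch: four single-task stages composed into a chain.
# None of these functions are real Stanbol or DBpedia Spotlight APIs.

# Endpoint-to-stage mapping as summarised in the thread (current status).
CURRENT_ENDPOINTS = {
    "/annotate": [1, 2, 3, 4],   # stage (3) keeps only the first candidate
    "/candidate": [1, 2, 3],     # stage (3) keeps all candidates
    "/disambiguate": [3],
}

# Toy dictionary standing in for a real candidate index (assumption).
TOY_INDEX = {
    "Berlin": [("dbpedia:Berlin,_New_Hampshire", 0.1), ("dbpedia:Berlin", 0.9)],
    "Paris": [("dbpedia:Paris", 0.8), ("dbpedia:Paris,_Texas", 0.2)],
}


def spot(text):
    """Stage 1: find known surface forms in the text."""
    words = text.replace(".", " ").split()
    return [{"surface": w} for w in words if w in TOY_INDEX]


def select_candidates(annotations):
    """Stage 2: attach candidate resources to each spotted surface form."""
    return [dict(a, candidates=list(TOY_INDEX[a["surface"]])) for a in annotations]


def disambiguate(annotations):
    """Stage 3: rank the candidates of each annotation, best score first."""
    return [
        dict(a, candidates=sorted(a["candidates"], key=lambda c: c[1], reverse=True))
        for a in annotations
    ]


def filter_annotations(annotations, min_score=0.5):
    """Stage 4: keep the top candidate if its score passes the threshold."""
    kept = []
    for a in annotations:
        uri, score = a["candidates"][0]
        if score >= min_score:
            kept.append({"surface": a["surface"], "uri": uri, "score": score})
    return kept


def run_chain(text, stages):
    """Feed the output of each stage into the next one, in order."""
    data = text
    for stage in stages:
        data = stage(data)
    return data


# An /annotate-like chain is simply all four stages in order.
ANNOTATE_CHAIN = [spot, select_candidates, disambiguate, filter_annotations]

if __name__ == "__main__":
    print(run_chain("Berlin and Paris are capitals.", ANNOTATE_CHAIN))
```

Because each stage only consumes the previous stage's output, a user could in principle swap in a different spotter or filter without touching the other stages, which is the flexibility the one-engine-per-task design is after.
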
> The /feedback API could be interesting, which I'd like to try to
> implement. More details should be discussed beforehand. However, I'm
> not sure there's enough time to complete it in this two-month summer.
> 
> I don't feel like wrapping DBpedia Spotlight classes is enough for a summer-long coding project.
> You should include the /feedback in your project to make it stronger. 
> This API should take in feedback from any CMS, as Stanbol is CMS-agnostic.
> It should be able to store the feedback and later let engines query it, so that they can learn from their mistakes.
> You could think, for example, about filtering implementations that would use feedback data to stop making the same mistakes.
> This is potentially the most interesting part for this project idea.
> 
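
The feedback idea above (store corrections, let engines query them, stop repeating the same mistakes) could look roughly like this. It is a sketch under assumptions: the `FeedbackStore` class, its method names, and the in-memory storage are hypothetical, since no /feedback API had been defined at the time of this thread.

```python
# Hypothetical sketch of a feedback store and a feedback-aware filter.
# Class and method names are illustrative only; no /feedback API existed yet.
from collections import defaultdict


class FeedbackStore:
    """Keeps user corrections in memory, keyed by surface form."""

    def __init__(self):
        self._rejected = defaultdict(set)

    def reject(self, surface, uri):
        """Record that a user marked `uri` as wrong for `surface`."""
        self._rejected[surface].add(uri)

    def is_rejected(self, surface, uri):
        """Let an engine ask whether this annotation was reported as wrong."""
        return uri in self._rejected.get(surface, set())


def filter_with_feedback(annotations, store):
    """Drop annotations that users have previously rejected."""
    return [
        a for a in annotations
        if not store.is_rejected(a["surface"], a["uri"])
    ]


if __name__ == "__main__":
    store = FeedbackStore()
    store.reject("Paris", "dbpedia:Paris,_Texas")
    annotations = [
        {"surface": "Paris", "uri": "dbpedia:Paris,_Texas"},
        {"surface": "Paris", "uri": "dbpedia:Paris"},
    ]
    print(filter_with_feedback(annotations, store))
```

A real implementation would need persistent storage and a way to accept corrections from any CMS; the in-memory set here only illustrates the store-then-query flow.
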
>  
> 
> If the project scopes discussed above are generally OK, I'd like to
> think about the project plan and come up with a project proposal
> draft.
> 
> By the way, I have two small questions for DBpedia Spotlight Spotting
> and Enhancement Chain:
> 1. For Pablo, it's mentioned in [3] that there're three
> implementations for Spotting: Ling Pipe Spotter, Trie Spotter, Ling
> Pipe Chunk Spotter. How does /annotate determine which implementation
> is best for a service request? Can the user choose among
> them manually by sending different parameter(s)?
> 
> We also have by now 4 other implementations. We have to update the documentation.
> Please see: 
> http://www.wiwiss.fu-berlin.de/en/institute/pwo/bizer/research/publications/Mendes-Daiber-Rajapakse-Sasaki-Bizer-DBpediaSpotlight-LREC2012.pdf
>  
> 2. For Rupert, could you please show me some examples of Enhancement
> Chain? I've studied some Enhancement Engines here [1]. I can
> understand how an individual Enhancement Engine works and how to
> implement a new one. After studying [2], I find Enhancement Chain a
> little confusing. Could you please lead me to the source code of the
> implementation of a concrete Enhancement Chain? I want to know the
> data I/O interface from one Enhancement Engine to another. In other
> words, how does the output of an Enhancement Engine become the input of
> another one?
> 
> Best regards,
> Siwei Yu
> 
> [1] http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/list.html
> [2] http://incubator.apache.org/stanbol/docs/trunk/enhancer/chains/
> [3] http://wiki.dbpedia.org/spotlight/technicaldocumentation?v=3qy
> 
> > On Wed, Mar 21, 2012 at 4:27 PM, Rupert Westenthaler
> > <ru...@gmail.com> wrote:
> >>
> >> Hi Siwei Yu, Pablo
> >>
> >> see my comments inline. To make it more readable I also removed the
> >> parts of the mail that are not relevant to my comments.
> >>
> >> On Wed, Mar 21, 2012 at 12:01 AM, Pablo Mendes <pa...@gmail.com> wrote:
> >> > On Tue, Mar 20, 2012 at 4:24 PM, Siwei Yu <ma...@gmail.com> wrote:
> >> >> 2. Should I develop one Enhancement Engine containing three services,
> >> >> or three engines (i.e. each service as an engine)? It's maybe related
> >> >> to the service function granularity. What's your opinion?
> >> >
> >> >
> >> > We could have one engine for each task separately, and an enhancement chain
> >> > should connect them together. We should also introduce a REST API /spot for
> >> > (1). We could perhaps make /candidates implement only (2) and make /annotate
> >> > accept a &verbose=on to act like the current /candidates does.
> >> >
> >> > Besides all of this reorganization that has to happen, Rupert is the guy
> >> > from Stanbol that can help you position your application in that regard.
> >> >
> >>
> >> I fully agree with that.
> >>
> >> Having separate EnhancementEngines for spotting, candidate selection
> >> and disambiguation would provide a lot of additional flexibility to
> >> experienced Stanbol users as they could even use parts of the DBpedia
> >> Spotlight functionalities within their existing enhancement engines.
> >>
> >> The definition of a DBpedia Spotlight EnhancementChain ensures that
> >> typical users can use Spotlight without needing to know its inner
> >> workings. Users would just need to send enhancement requests to
> >> "http://{host}:{port}/enhancer/chain/dbpedia", assuming that the DBpedia
> >> Spotlight chain is called "dbpedia". It would even be possible to
> >> make the DBpedia Spotlight EnhancementChain the default
> >> enhancement chain, so that requests to "/enhancer" would be processed
> >> by it.
> >>
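
For illustration, a request to such a named chain could be prepared as below. This is a sketch, not tested against a running server: the host and port defaults are placeholders, "dbpedia" is the chain name assumed in the paragraph above, and the plain-text content type and the RDF/XML Accept header are assumptions about how the enhancer would be called.

```python
# Sketch: build (but do not send) an enhancement request for a named chain.
# Host, port and the chain name "dbpedia" are placeholders taken from the
# thread; the Content-Type and Accept values are assumptions.
import urllib.request


def chain_url(host, port, chain=None):
    """Return the enhancer URL; without a chain name the default chain is used."""
    base = f"http://{host}:{port}/enhancer"
    return f"{base}/chain/{chain}" if chain else base


def build_request(text, host="localhost", port=8080, chain="dbpedia"):
    """Prepare a POST carrying plain text to the chosen enhancement chain."""
    return urllib.request.Request(
        chain_url(host, port, chain),
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain", "Accept": "application/rdf+xml"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request("Berlin is the capital of Germany.")
    print(req.get_method(), req.full_url)
    # To actually send it against a running server:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

Passing `chain=None` yields the plain "/enhancer" URL, which corresponds to the case where the DBpedia Spotlight chain has been configured as the default.
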
> >> >>
> >> >> By the way, my name is Siwei Yu. I have good knowledge of semantic
> >> >> technologies, such as RDF, OWL, SPARQL. I'm also familiar with the
> >> >> mainstream Java based RDF/OWL processing tools like owlapi, Jena,
> >> >> Sesame, AllegroGraph. I have strong Java coding skills and good
> >> >> knowledge of software design patterns. My research background
> >> >> meets the requirements very well. I believe it'll be a wonderful
> >> >> summer working with the DBpedia Spotlight community.
> >> >
> >> >
> >> > It would be good if you leveraged some of your Semantic Web background in
> >> > your application. The idea of a /feedback API, which receives corrections
> >> > made by the users could fit well in this regard.
> >> >
> >>
> >> A feedback API is also something that would be interesting for the
> >> Stanbol Enhancer.
> >>
> >> best
> >> Rupert Westenthaler
> >>
> >> --
> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> | A-5500 Bischofshofen
> 


Re: [Dbp-spotlight-users] [GSoC 2012] Project Proposal for "Integrate DBpedia Spotlight as Enhancement Engine within Apache Stanbol" (Siwei Yu)

Posted by seralf <se...@gmail.com>.
looks very promising!

what about IKS?
http://wiki.iks-project.eu/index.php/Main_Page


