Posted to dev@stanbol.apache.org by kritarth anand <kr...@gmail.com> on 2012/07/25 11:50:49 UTC

Entity Disambiguation: Midterm

Hi all,

I would like to start interacting more with the Stanbol community by
sharing the first iteration of the Entity Disambiguation Engine. I would
really like you all to take a look at it and give me your valuable opinions.

https://github.com/kritarthanand/Disambiguation-Stanbol

The repo contains the engine's code. It is very easy to install; the
instructions are in the Readme file.

Besides the engine, the repo also contains my Mid Term Report, which briefly
describes the engine and discusses possible future algorithms that can be
used for Entity Disambiguation. Disambiguation is a complex problem, and we
should aim for a solution that is both efficient and performs well. Therefore
I would really like the Stanbol community to take part in the discussion
with enthusiasm.

Please share your views,


Kritarth

Re: Entity Disambiguation: Midterm

Posted by Anuj Kumar <an...@gmail.com>.
Thanks Kritarth, Rupert and Pablo. This brings in a lot of clarity.

Regards,
Anuj


Re: Entity Disambiguation: Midterm

Posted by "Pablo N. Mendes" <pa...@gmail.com>.
Hi all,
It will perhaps be useful to organize the discussion around methods, rather
than implementations. Talking about implementations may be especially
confusing because:
1) DBpedia Spotlight has DBpedia in the name. However, there are no
theoretical restrictions on the choice of KB, and no real technical
restrictions either, although in practice there might still be a few
hardcoded references in our codebase (which can easily be removed).
2) DBpedia Spotlight is an open source Scala/Java tool for you to install
and use in-house. However, it also offers a web service deployment for
demonstration, which for obvious reasons does not expose all of the
possible combinations of functionality that the underlying code can offer.

Similarly to Stanbol, DBpedia Spotlight also assumes very little about
vocabularies. If all you have are labels, you can use our CandidateSearcher
and use a measure of "default sense" to pick a URI. We've experimented with
p(URI) as the overall prominence of an entity in the KB. We've also looked
at p(URI|label) as a measure for finding the default sense (a URI) for a
given label (the phrase found in text).
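
To make the "default sense" idea concrete, here is a tiny, illustrative
Java sketch (not the actual CandidateSearcher code; the counts map is just
a stand-in for whatever occurrence statistics your KB or corpus provides)
that picks the most frequent URI for a label, i.e. an estimate of
p(URI|label):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class DefaultSense {

    // count(label, uri): how often this label was seen referring to this URI
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    public void addOccurrence(String label, String uri) {
        counts.computeIfAbsent(label, l -> new HashMap<>())
              .merge(uri, 1, Integer::sum);
    }

    // argmax over p(uri|label): the URI this label most often refers to.
    // p(uri) alone would instead use the total count of the URI over all labels.
    public String defaultSense(String label) {
        Map<String, Integer> byUri =
                counts.getOrDefault(label, Collections.emptyMap());
        return byUri.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    public static void main(String[] args) {
        DefaultSense ds = new DefaultSense();
        ds.addOccurrence("Washington", "dbpedia:Washington,_D.C.");
        ds.addOccurrence("Washington", "dbpedia:Washington,_D.C.");
        ds.addOccurrence("Washington", "dbpedia:George_Washington");
        System.out.println(ds.defaultSense("Washington")); // dbpedia:Washington,_D.C.
    }
}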

Now, if you have labels *and context*, you can do a lot more. We also offer
a ContextSearcher where given a label and a piece of text, one can obtain a
rank of the most likely URIs given that context. Comparisons are made based
on cosine similarity between vectors with tf*icf weights (modified tf*idf).
In practical terms, we search a Lucene index using a custom similarity
class. The task is to compare a vector made out of the input text with many
vectors representing each entity in your target KB. We call these entity
representations "context".

There are many ways to obtain context for entities at "training" time:
1) Lesk-style: perhaps the oldest technique, models context based on
"definitions" of each entity (dictionary style). If the incoming text
contains many terms in the definition of entities, then you assume that the
entity is close in meaning to the text, therefore is the right one to pick.
2) Shallow KB neighborhood: collects, for each entity, the labels of other
entities in the neighborhood based on the KB structure (this is what Rupert
mentioned). This is rather similar to Lesk-style, but has the cool feature
(in an RDF world) of not really requiring dictionary entries, but just
using the relationships in the KB to obtain more "words" (a small sketch
of this follows right after the list).
3) Occurrence/Mention-based: collects examples where the entity is known to
have occurred / been mentioned. These examples are paragraphs mentioning
the entity (and usually also other entities). So when the input text looks
like one of these paragraphs (rather, the aggregation of all these
paragraphs) for an entity, we assume that the entity is the right one to
pick.
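
Here is the small sketch promised under 2) above: building a bag of "words"
for an entity from its own label plus the labels of the entities it
references in the KB. The Map-based KB view is purely hypothetical, not a
Stanbol or Spotlight API:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class NeighborhoodContext {

    // Hypothetical, minimal view of a KB: uri -> label, uri -> referenced uris
    static List<String> contextWords(String uri,
                                     Map<String, String> labels,
                                     Map<String, List<String>> outgoingLinks) {
        List<String> words = new ArrayList<>();
        tokenize(labels.getOrDefault(uri, ""), words);              // own label
        for (String neighbour : outgoingLinks.getOrDefault(uri, List.of())) {
            tokenize(labels.getOrDefault(neighbour, ""), words);    // neighbour labels
        }
        return words;
    }

    static void tokenize(String label, List<String> out) {
        for (String t : label.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) out.add(t);
        }
    }

    public static void main(String[] args) {
        Map<String, String> labels = Map.of(
                "ex:Paris", "Paris", "ex:France", "France", "ex:Seine", "Seine");
        Map<String, List<String>> links = Map.of(
                "ex:Paris", List.of("ex:France", "ex:Seine"));
        System.out.println(contextWords("ex:Paris", labels, links)); // [paris, france, seine]
    }
}

The resulting word list is then treated exactly like the context of option 1
or 3, i.e. turned into a tf*icf vector as in the earlier sketch.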


For all three cases above, the model of the context is a vector of words
and can, therefore, use either Stanbol or DBpedia Spotlight's
implementations. Note that 3 will include both 1 and 2 (guaranteed for
Wikipedia, expected in general for most reasonable training data), and
that's why DBpedia Spotlight uses that by default. However, in practical
terms, all that DBpedia Spotlight asks for is "some text" that the user is
free to generate however he/she wants.

Besides the 3 methods above, there are other graph-based algorithms, joint
inference for collective disambiguation algorithms, and so on. But I have
omitted them for brevity, as they are not directly related to the questions
raised by this thread.

It would be interesting to compare 1, 2 and 3 so that users of Stanbol can
have an idea of the minimum accuracy to expect in different cases, and of
how it increases as more context is provided.
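
As a starting point for such a comparison, a minimal Java scoring sketch;
the gold-standard format is hypothetical, and each of the three methods
would simply be plugged in as the disambiguate function:

import java.util.List;
import java.util.function.BiFunction;

public class Evaluation {

    // One manually annotated test case: a mention, its surrounding text,
    // and the URI a human annotator picked for it.
    static class TestCase {
        final String mention, text, expectedUri;
        TestCase(String mention, String text, String expectedUri) {
            this.mention = mention;
            this.text = text;
            this.expectedUri = expectedUri;
        }
    }

    // Accuracy of one disambiguation method, passed in as a function
    // from (mention, surrounding text) to the chosen URI.
    static double accuracy(List<TestCase> gold,
                           BiFunction<String, String, String> disambiguate) {
        if (gold.isEmpty()) return 0.0;
        int correct = 0;
        for (TestCase tc : gold) {
            if (tc.expectedUri.equals(disambiguate.apply(tc.mention, tc.text))) {
                correct++;
            }
        }
        return (double) correct / gold.size();
    }
}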

Hope this helps.

Cheers,
Pablo

PS: I used "label" here where we usually use "surface form" in DBpedia
Spotlight. We consider "label" to be more like the "name" of an entity, or
the value for "rdfs:label", while "surface form" is any phrase used to
refer to an entity in text, even if it's not an rdfs:label. To keep it
simple, I also used "entity" where we usually talk about "resource" in
DBpedia Spotlight.





-- 
---
Pablo N. Mendes
http://pablomendes.com
Events: http://wole2012.eurecom.fr

Re: Entity Disambiguation: Midterm

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi,

Stanbol currently assumes very little about vocabularies. Basically you
need only a URI and a label to get an Entity suggested.

If you want to do some kind of disambiguation you will clearly need
more information about Entities.

Here the question is what kind of information the "Spotlight approach"
needs. AFAIK this approach is based on "surface forms" - labels used
to refer to an Entity - and "mentions" - sentences that mention an
Entity. Kritarth, please correct me if I get this wrong. But if this is
correct, users would need to provide "mentions" in order to use
DBpedia Spotlight-like disambiguation.

I think another rather typical kind of information would be the "semantic
context" - the other entities referenced by an Entity. Based on that one
can also do disambiguation (e.g. Solr MLT over the labels of the
semantic context with the labels of the current sentence, or MLT over
the URIs of the semantic context with the URIs of other extracted
Entities in the current sentence/text section of the whole document).
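
Purely as a sketch (untested, and the core and field names are made up):
assuming a Solr core with one document per entity, a "context_labels" field
holding the labels of its semantic context, and a MoreLikeThis handler
registered at /mlt (depending on the Solr version, stream.body may have to
be enabled), the lookup could look like this in Java:

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class MltDisambiguation {

    public static void main(String[] args) throws Exception {
        // Labels of the entities found in the current sentence, used as the
        // "document" for which Solr should find the most similar entities.
        String sentenceLabels = "Paris France Eiffel Tower Seine";

        String url = "http://localhost:8983/solr/entities/mlt"
                + "?mlt.fl=context_labels"    // field to compare against
                + "&mlt.mintf=1&mlt.mindf=1"  // keep even rare terms
                + "&fl=uri,score&rows=5"
                + "&stream.body="
                + URLEncoder.encode(sentenceLabels, StandardCharsets.UTF_8);

        HttpResponse<String> rsp = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(url)).GET().build(),
                HttpResponse.BodyHandlers.ofString());

        // The top hits are the entities whose semantic context looks most
        // like the labels occurring in the current sentence.
        System.out.println(rsp.body());
    }
}

The URI variant works the same way: index the context URIs in a separate
field and send the URIs of the other extracted entities as the stream body.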

best
Rupert




-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Entity Disambiguation: Midterm

Posted by kritarth anand <kr...@gmail.com>.
I was not sure if the Spotlight approach would work for all kinds of
vocabularies that Stanbol might have.

I was concerned that the vocabulary structure it assumes is satisfied by
DBpedia but might not be satisfied by a custom vocabulary we might have
in another deployment.


Re: Entity Disambiguation: Midterm

Posted by Anuj Kumar <an...@gmail.com>.
Hi Kritarth,

Thanks for the explanation. The Spotlight approach sounds good to me, but if
you have time, it would be good to compare it with the other two for the
purpose of this study.

On the third point, I am still not clear. Do you want to convey that
Spotlight's disambiguation algorithm can work only with DBpedia?

Regards,
Anuj


Re: Entity Disambiguation: Midterm

Posted by kritarth anand <kr...@gmail.com>.
Dear Anuj,

Sorry for the delayed reply.

1. The current implementation essentially does the following (a rough
sketch is given below):
      a. We find all the entities in the given paragraph.
      b. For each entity, we query DBpedia again, using a string of the
other entities as additional information.
      c. We then adjust the confidence values.
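
The rough sketch mentioned above (all names are placeholders, not the
actual engine classes):

import java.util.ArrayList;
import java.util.List;

public class DisambiguationFlow {

    // A suggestion as produced by the existing entity linking step
    static class Suggestion {
        String mention;     // the text span found in the paragraph
        String uri;         // candidate URI suggested for it
        double confidence;  // confidence assigned by the linking engine
    }

    // Hypothetical handle on the entity index (e.g. the dbpedia site)
    interface EntityIndex {
        double score(String uri, String contextQuery);
    }

    // a. all suggestions of the paragraph are collected (the input list)
    // b. for each one, the other mentions form an additional context query
    // c. the confidence is re-weighted with the resulting context score
    static void rescore(List<Suggestion> suggestions, EntityIndex index) {
        for (Suggestion s : suggestions) {
            List<String> others = new ArrayList<>();
            for (Suggestion o : suggestions) {
                if (o != s) others.add(o.mention);
            }
            double contextScore = index.score(s.uri, String.join(" ", others));
            s.confidence = s.confidence * contextScore; // simple re-weighting
        }
    }
}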

3. I'll answer this one first. I am not very sure what Stanbol expects from
a vocabulary. The other papers I had read made no assumptions about the
vocabulary, mainly because they were using Wikipedia, and I was unsure
whether that meant more flexibility. After the discussion with Pablo and
Rupert, I think it is the way to go.

2. I am inclined towards using the Spotlight approach, as it seems to be
better than the other two, and I would like your comments on whether it is
a good way to proceed.

Kritarth



Re: Entity Disambiguation: Midterm

Posted by Anuj Kumar <an...@gmail.com>.
Hi Kritarth,

Thanks for sharing the details. I have a few questions:

1. Can you elaborate on the current implementation? Is it using the existing
MLT feature?
2. Which one of the three algorithms are you planning to use?
3. On the Spotlight part, can you explain in more detail why you say "I am
not sure if we can play around that much with any vocabulary and not just
DBpedia."?

Also, there is a minor typo in the report under the Approach section: "Yhe
behavior can be explained as follows:"

Thanks,
Anuj
