Posted to dev@stanbol.apache.org by Cristian Petroaca <cr...@gmail.com> on 2013/09/01 19:56:13 UTC

Re: Relation extraction feature

Related to the Stanford Dependency Tree feature, this is what the output
from the tool looks like for the sentence "Mary and Tom met Danny today":


2013/8/30 Cristian Petroaca <cr...@gmail.com>

> Hi Rupert,
>
> Ok, so after looking at the JSON output from the Stanford NLP Server and
> the coref module I'm thinking I can represent the coreference information
> this way:
> Each "Token" or "Chunk" will contain an additional coref annotation with
> the following structure :
>
> "stanbol.enhancer.nlp.coref" {
>     "tag" : // does this need to exist?
>     "isRepresentative" : true/false, // whether this token or chunk is the
>                                      // representative mention in the chain
>     "mentions" : [ { "sentenceNo" : 1, // the sentence in which the mention is found
>                      "startWord" : 2,  // the first word making up the mention
>                      "endWord" : 3     // the last word making up the mention
>                    }, ...
>                  ],
>     "class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
> }
>
> The CorefTag should resemble this model.
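To make the model concrete, here is a filled-in example of the proposed annotation. All values are illustrative: it assumes a hypothetical text like "Mary met Tom. She greeted him.", with this annotation sitting on the token "She" and its single mention pointing back at the representative mention "Mary". The proposal does not fix whether indexes are 0- or 1-based, so treat the numbers as placeholders.

```json
"stanbol.enhancer.nlp.coref" : {
    "isRepresentative" : false,
    "mentions" : [ { "sentenceNo" : 1,
                     "startWord" : 0,
                     "endWord" : 0 } ],
    "class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
}
```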
>
> What do you think?
>
> Cristian
>
>
> 2013/8/24 Rupert Westenthaler <ru...@gmail.com>
>
>> Hi Cristian,
>>
>> you can not directly call StanfordNLP components from Stanbol, but you
>> have to extend the RESTful service to include the information you
>> need. The main reason for that is that the license of StanfordNLP is
>> not compatible with the Apache Software License. So Stanbol can not
>> directly link to the StanfordNLP API.
>>
>> You will need to
>>
>> 1. define an additional class {yourTag} extends Tag<{yourType}> class
>> in the o.a.s.enhancer.nlp module
>> 2. add JSON parsing and serialization support for this tag to the
>> o.a.s.enhancer.nlp.json module (see e.g. PosTagSupport as an example)
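Sketched in Java, step (1) might look like the following. This is a self-contained illustration only: the Tag base class here is a simplified stand-in for the real one in the o.a.s.enhancer.nlp module, and CorefTag with its fields is a hypothetical design, not an existing Stanbol class.

```java
import java.util.Collections;
import java.util.List;

// Stand-in for the Stanbol Tag base class (assumption: the real
// o.a.s.enhancer.nlp Tag<T> differs in detail).
abstract class Tag<T extends Tag<T>> {
    private final String tag;
    protected Tag(String tag) { this.tag = tag; }
    public String getTag() { return tag; }
}

// Hypothetical coreference tag marking a span as part of a coref chain.
class CorefTag extends Tag<CorefTag> {
    private final boolean representative;   // is this span the representative mention?
    private final List<int[]> mentions;     // {sentenceNo, startWord, endWord} triples

    CorefTag(String tag, boolean representative, List<int[]> mentions) {
        super(tag);
        this.representative = representative;
        this.mentions = Collections.unmodifiableList(mentions);
    }
    public boolean isRepresentative() { return representative; }
    public List<int[]> getMentions() { return mentions; }
}
```

Step (2) would then add a *Support class in the nlp-json module that serializes and parses these fields, analogous to PosTagSupport.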
>>
>> As (1) would be necessary anyway the only additional thing you need to
>> develop is (2). After that you can add {yourTag} instance to the
>> AnalyzedText in the StanfordNLP integration. The
>> RestfulNlpAnalysisEngine will parse them from the response. All
>> engines executed after the RestfulNlpAnalysisEngine will have access
>> to your annotations.
>>
>> If you have a design for {yourTag} - the model you would like to use
>> to represent your data - I can help with (1) and (2).
>>
>> best
>> Rupert
>>
>>
>> On Fri, Aug 23, 2013 at 5:11 PM, Cristian Petroaca
>> <cr...@gmail.com> wrote:
>> > Hi Rupert,
>> >
>> > Thanks for the info. Looking at the stanbol-stanfordnlp project I see
>> that
>> > the stanford nlp is not implemented as an EnhancementEngine but rather
>> it
>> > is used directly in a Jetty Server instance. How does that fit into the
>> > Stanbol stack? For example how can I call the StanfordNlpAnalyzer's
>> routine
>> > from my TripleExtractionEnhancementEngine which lives in the Stanbol
>> stack?
>> >
>> > Thanks,
>> > Cristian
>> >
>> >
>> > 2013/8/12 Rupert Westenthaler <ru...@gmail.com>
>> >
>> >> Hi Cristian,
>> >>
>> >> Sorry for the late response, but I was offline for the last two weeks
>> >>
>> >> On Fri, Aug 2, 2013 at 9:19 PM, Cristian Petroaca
>> >> <cr...@gmail.com> wrote:
>> >> > Hi Rupert,
>> >> >
>> >> > After doing some tests it seems that the Stanford NLP coreference
>> module
>> >> is
>> >> > much more accurate than the Open NLP one. So I decided to extend
>> Stanford
>> >> > NLP to add coreference there.
>> >>
>> >> The Stanford NLP integration is not part of the Stanbol codebase
>> >> because the licenses are not compatible.
>> >>
>> >> You can find the Stanford NLP integration on
>> >>
>> >>     https://github.com/westei/stanbol-stanfordnlp
>> >>
>> >> just create a fork and send pull requests.
>> >>
>> >>
>> >> > Could you add the necessary projects on the branch? And also remove
>> the
>> >> > Open NLP ones?
>> >> >
>> >>
>> >> Currently the branch
>> >>
>> >>
>> >>
>> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
>> >>
>> >> only contains the "nlp" and the "nlp-json" modules. IMO those should
>> >> be enough for adding coreference support.
>> >>
>> >> IMO you will need to
>> >>
>> >> * add an model for representing coreference to the nlp module
>> >> * add parsing and serializing support to the nlp-json module
>> >> * add the implementation to your fork of the stanbol-stanfordnlp
>> project
>> >>
>> >> best
>> >> Rupert
>> >>
>> >>
>> >>
>> >> > Thanks,
>> >> > Cristian
>> >> >
>> >> >
>> >> > 2013/7/5 Rupert Westenthaler <ru...@gmail.com>
>> >> >
>> >> >> Hi Cristian,
>> >> >>
>> >> >> I created the branch at
>> >> >>
>> >> >>
>> >> >>
>> >>
>> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
>> >> >>
>> >> >> ATM it contains only the "nlp" and "nlp-json" modules. Let me know if
>> >> >> you would like to have more
>> >> >>
>> >> >> best
>> >> >> Rupert
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Thu, Jul 4, 2013 at 10:14 AM, Cristian Petroaca
>> >> >> <cr...@gmail.com> wrote:
>> >> >> > Hi Rupert,
>> >> >> >
>> >> >> > I created jiras:
>> >> >> > https://issues.apache.org/jira/browse/STANBOL-1132 and
>> >> >> > https://issues.apache.org/jira/browse/STANBOL-1133. The original one
>> >> >> > is dependent upon these.
>> >> >> > Please let me know when I can start using the branch.
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Cristian
>> >> >> >
>> >> >> >
>> >> >> > 2013/6/27 Cristian Petroaca <cr...@gmail.com>
>> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> 2013/6/27 Rupert Westenthaler <ru...@gmail.com>
>> >> >> >>
>> >> >> >>> On Thu, Jun 27, 2013 at 3:12 PM, Cristian Petroaca
>> >> >> >>> <cr...@gmail.com> wrote:
>> >> >> >>> > Sorry, I meant the Stanbol NLP API, not Stanford in my
>> previous
>> >> >> e-mail.
>> >> >> >>> By
>> >> >> >>> > the way, does Open NLP have the ability to build dependency
>> trees?
>> >> >> >>> >
>> >> >> >>>
>> >> >> >>> AFAIK OpenNLP does not provide this feature.
>> >> >> >>>
>> >> >> >>
>> >> >> >> Then , since the Stanford NLP lib is also integrated into
>> Stanbol,
>> >> I'll
>> >> >> >> take a look at how I can extend its integration to include the
>> >> >> dependency
>> >> >> >> tree feature.
>> >> >> >>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>  >
>> >> >> >>> > 2013/6/23 Cristian Petroaca <cr...@gmail.com>
>> >> >> >>> >
>> >> >> >>> >> Hi Rupert,
>> >> >> >>> >>
>> >> >> >>> >> I created jira
>> >> https://issues.apache.org/jira/browse/STANBOL-1121.
>> >> >> >>> >> As you suggested I would start with extending the Stanford
>> NLP
>> >> with
>> >> >> >>> >> co-reference resolution but I think also with dependency
>> trees
>> >> >> because
>> >> >> >>> I
>> >> >> >>> >> also need to know the Subject of the sentence and the object
>> >> that it
>> >> >> >>> >> affects, right?
>> >> >> >>> >>
>> >> >> >>> >> Given that I need to extend the Stanford NLP API in Stanbol
>> for
>> >> >> >>> >> co-reference and dependency trees, how do I proceed with
>> this?
>> >> Do I
>> >> >> >>> create
>> >> >> >>> >> 2 new sub-tasks to the already opened Jira? After that can I
>> >> start
>> >> >> >>> >> implementing on my local copy of Stanbol and when I'm done
>> I'll
>> >> send
>> >> >> >>> you
>> >> >> >>> >> guys the patch for review?
>> >> >> >>> >>
>> >> >> >>>
>> >> >> >>> I would create two "New Feature" type Issues one for adding
>> support
>> >> >> >>> for "dependency trees" and the other for "co-reference"
>> support. You
>> >> >> >>> should also define "depends on" relations between STANBOL-1121
>> and
>> >> >> >>> those two new issues.
>> >> >> >>>
>> >> >> >>> Sub-task could also work, but as adding those features would be
>> also
>> >> >> >>> interesting for other things I would rather define them as
>> separate
>> >> >> >>> issues.
>> >> >> >>>
>> >> >> >>>
>> >> >> >> 2 New Features connected with the original jira it is then.
>> >> >> >>
>> >> >> >>
>> >> >> >>> If you would prefer to work in an own branch please tell me.
>> This
>> >> >> >>> could have the advantage that patches would not be affected by
>> >> changes
>> >> >> >>> in the trunk.
>> >> >> >>>
>> >> >> >>> Yes, a separate branch sounds good.
>> >> >> >>
>> >> >> >> best
>> >> >> >>> Rupert
>> >> >> >>>
>> >> >> >>> >> Regards,
>> >> >> >>> >> Cristian
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >> 2013/6/18 Rupert Westenthaler <rupert.westenthaler@gmail.com
>> >
>> >> >> >>> >>
>> >> >> >>> >>> On Mon, Jun 17, 2013 at 10:18 PM, Cristian Petroaca
>> >> >> >>> >>> <cr...@gmail.com> wrote:
>> >> >> >>> >>> > Hi Rupert,
>> >> >> >>> >>> >
>> >> >> >>> >>> > Agreed on the
>> >> >> >>> SettingAnnotation/ParticipantAnnotation/OccurentAnnotation
>> >> >> >>> >>> > data structure.
>> >> >> >>> >>> >
>> >> >> >>> >>> > Should I open up a Jira for all of this in order to
>> >> encapsulate
>> >> >> this
>> >> >> >>> >>> > information and establish the goals and these initial
>> steps
>> >> >> towards
>> >> >> >>> >>> these
>> >> >> >>> >>> > goals?
>> >> >> >>> >>>
>> >> >> >>> >>> Yes please. A JIRA issue for this work would be great.
>> >> >> >>> >>>
>> >> >> >>> >>> > How should I proceed further? Should I create some design
>> >> >> documents
>> >> >> >>> that
>> >> >> >>> >>> > need to be reviewed?
>> >> >> >>> >>>
>> >> >> >>> >>> Usually it is the best to write design related text
>> directly in
>> >> >> JIRA
>> >> >> >>> >>> by using Markdown [1] syntax. This will allow us later to
>> use
>> >> this
>> >> >> >>> >>> text directly for the documentation on the Stanbol Webpage.
>> >> >> >>> >>>
>> >> >> >>> >>> best
>> >> >> >>> >>> Rupert
>> >> >> >>> >>>
>> >> >> >>> >>>
>> >> >> >>> >>> [1] http://daringfireball.net/projects/markdown/
>> >> >> >>> >>> >
>> >> >> >>> >>> > Regards,
>> >> >> >>> >>> > Cristian
>> >> >> >>> >>> >
>> >> >> >>> >>> >
>> >> >> >>> >>> > 2013/6/17 Rupert Westenthaler <
>> rupert.westenthaler@gmail.com>
>> >> >> >>> >>> >
>> >> >> >>> >>> >> On Thu, Jun 13, 2013 at 8:22 PM, Cristian Petroaca
>> >> >> >>> >>> >> <cr...@gmail.com> wrote:
>> >> >> >>> >>> >> > HI Rupert,
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >> > First of all thanks for the detailed suggestions.
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >> > 2013/6/12 Rupert Westenthaler <
>> >> rupert.westenthaler@gmail.com>
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >> >> Hi Cristian, all
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> really interesting use case!
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> In this mail I will try to give some suggestions on
>> how
>> >> this
>> >> >> >>> could
>> >> >> >>> >>> >> >> work out. This suggestions are mainly based on
>> experiences
>> >> >> and
>> >> >> >>> >>> lessons
>> >> >> >>> >>> >> >> learned in the LIVE [2] project where we built an
>> >> information
>> >> >> >>> system
>> >> >> >>> >>> >> >> for the Olympic Games in Peking. While this Project
>> >> excluded
>> >> >> the
>> >> >> >>> >>> >> >> extraction of Events from unstructured text (because
>> the
>> >> >> Olympic
>> >> >> >>> >>> >> >> Information System was already providing event data
>> as XML
>> >> >> >>> messages)
>> >> >> >>> >> >> the semantic search capabilities of this system were very
>> >> >> >>> >> >> similar to the one described by your use case.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> IMHO you are not only trying to extract relations,
>> but a
>> >> >> formal
>> >> >> >>> >>> >> >> representation of the situation described by the
>> text. So
>> >> >> lets
>> >> >> >>> >>> assume
>> >> >> >>> >>> >> >> that the goal is to Annotate a Setting (or Situation)
>> >> >> described
>> >> >> >>> in
>> >> >> >>> >>> the
>> >> >> >>> >>> >> >> text - a fise:SettingAnnotation.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> The DOLCE foundational ontology [1] gives some
>> advices on
>> >> >> how to
>> >> >> >>> >>> model
>> >> >> >>> >>> >> >> those. The important relation for modeling this
>> >> >> Participation:
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >>     PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t))
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> where ..
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >>  * ED are Endurants (continuants): Endurants do have
>> an
>> >> >> >>> identity so
>> >> >> >>> >>> we
>> >> >> >>> >>> >> >> would typically refer to them as Entities referenced
>> by a
>> >> >> >>> setting.
>> >> >> >>> >>> >> >> Note that this includes physical, non-physical as
>> well as
>> >> >> >>> >>> >> >> social-objects.
>> >> >> >>> >>> >> >>  * PD are Perdurants (occurrents):  Perdurants are
>> >> entities
>> >> >> that
>> >> >> >>> >>> >> >> happen in time. This refers to Events, Activities ...
>> >> >> >>> >>> >> >>  * PC are Participation: It is an time indexed
>> relation
>> >> where
>> >> >> >>> >>> >> >> Endurants participate in Perdurants
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> Modeling this in RDF requires to define some
>> intermediate
>> >> >> >>> resources
>> >> >> >>> >>> >> >> because RDF does not allow for n-ary relations.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >>  * fise:SettingAnnotation: It is really handy to
>> define
>> >> one
>> >> >> >>> resource
>> >> >> >>> >>> >> >> being the context for all described data. I would call
>> >> this
>> >> >> >>> >>> >> >> "fise:SettingAnnotation" and define it as a
>> sub-concept to
>> >> >> >>> >>> >> >> fise:Enhancement. All further enhancement about the
>> >> extracted
>> >> >> >>> >>> Setting
>> >> >> >>> >>> >> >> would define a "fise:in-setting" relation to it.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >>  * fise:ParticipantAnnotation: Is used to annotate
>> that
>> >> >> >>> Endurant is
>> >> >> >>> >>> >> >> participating on a setting (fise:in-setting
>> >> >> >>> fise:SettingAnnotation).
>> >> >> >>> >>> >> >> The Endurant itself is described by existing
>> >> >> fise:TextAnnotaion
>> >> >> >>> (the
>> >> >> >>> >>> >> >> mentions) and fise:EntityAnnotation (suggested
>> Entities).
>> >> >> >>> Basically
>> >> >> >>> >>> >> >> the fise:ParticipantAnnotation will allow an
>> >> >> EnhancementEngine
>> >> >> >>> to
>> >> >> >>> >>> >> >> state that several mentions (in possible different
>> >> >> sentences) do
>> >> >> >>> >>> >> >> represent the same Endurant as participating in the
>> >> Setting.
>> >> >> In
>> >> >> >>> >>> >> >> addition it would be possible to use the dc:type
>> property
>> >> >> >>> (similar
>> >> >> >>> >>> as
>> >> >> >>> >>> >> >> for fise:TextAnnotation) to refer to the role(s) of an
>> >> >> >>> participant
>> >> >> >>> >>> >> >> (e.g. the set: Agent (intensionally performs an
>> action)
>> >> Cause
>> >> >> >>> >>> >> >> (unintentionally e.g. a mud slide), Patient (a passive
>> >> role
>> >> >> in
>> >> >> >>> an
>> >> >> >>> >>> >> >> activity) and Instrument (aids an process)), but I am
>> >> >> wondering
>> >> >> >>> if
>> >> >> >>> >>> one
>> >> >> >>> >>> >> >> could extract those information.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> * fise:OccurrentAnnotation: is used to annotate a
>> >> Perdurant
>> >> >> in
>> >> >> >>> the
>> >> >> >>> >>> >> >> context of the Setting. Also fise:OccurrentAnnotation
>> can
>> >> >> link
>> >> >> >>> to
>> >> >> >>> >>> >> >> fise:TextAnnotaion (typically verbs in the text
>> defining
>> >> the
>> >> >> >>> >>> >> >> perdurant) as well as fise:EntityAnnotation suggesting
>> >> well
>> >> >> >>> known
>> >> >> >>> >>> >> >> Events in a knowledge base (e.g. a Election in a
>> country,
>> >> or
>> >> >> an
>> >> >> >>> >>> >> >> upraising ...). In addition fise:OccurrentAnnotation
>> can
>> >> >> define
>> >> >> >>> >>> >> >> dc:has-participant links to
>> fise:ParticipantAnnotation. In
>> >> >> this
>> >> >> >>> case
>> >> >> >>> >> >> it is explicitly stated that an Endurant (the
>> >> >> >>> >> >> fise:ParticipantAnnotation) is involved in this Perdurant (the
>> >> >> >>> >> >> fise:OccurrentAnnotation). As Occurrences are temporally
>> >> >> indexed
>> >> >> >>> this
>> >> >> >>> >>> >> >> annotation should also support properties for
>> defining the
>> >> >> >>> >>> >> >> xsd:dateTime for the start/end.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> Indeed, an event based data structure makes a lot of
>> sense
>> >> >> with
>> >> >> >>> the
>> >> >> >>> >>> >> remark
>> >> >> >>> >>> >> > that you probably won't be able to always extract the
>> date
>> >> >> for a
>> >> >> >>> >>> given
>> >> >> >>> >>> >> > setting(situation).
>> >> >> >>> >>> >> > There are 2 thing which are unclear though.
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >> > 1. Perdurant : You could have situations in which the
>> >> object
>> >> >> upon
>> >> >> >>> >>> which
>> >> >> >>> >>> >> the
>> >> >> >>> >>> >> > Subject ( or Endurant ) is acting is not a transitory
>> >> object (
>> >> >> >>> such
>> >> >> >>> >>> as an
>> >> >> >>> >>> >> > event, activity ) but rather another Endurant. For
>> example
>> >> we
>> >> >> can
>> >> >> >>> >>> have
>> >> >> >>> >>> >> the
>> >> >> >>> >>> >> > phrase "USA invades Irak" where "USA" is the Endurant (
>> >> >> Subject )
>> >> >> >>> >>> which
>> >> >> >>> >>> >> > performs the action of "invading" on another Eundurant,
>> >> namely
>> >> >> >>> >>> "Irak".
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> By using CAOS, USA would be the Agent and Iraq the
>> Patient.
>> >> Both
>> >> >> >>> are
>> >> >> >>> >>> >> Endurants. The activity "invading" would be the
>> Perdurant. So
>> >> >> >>> ideally
>> >> >> >>> >>> >> you would have a  "fise:SettingAnnotation" with:
>> >> >> >>> >>> >>
>> >> >> >>> >>> >>   * fise:ParticipantAnnotation for USA with the dc:type
>> >> >> caos:Agent,
>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "USA" and a
>> >> >> >>> fise:EntityAnnotation
>> >> >> >>> >>> >> linking to dbpedia:United_States
>> >> >> >>> >>> >>   * fise:ParticipantAnnotation for Iraq with the dc:type
>> >> >> >>> caos:Patient,
>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "Irak" and a
>> >> >> >>> >>> >> fise:EntityAnnotation linking to  dbpedia:Iraq
>> >> >> >>> >>> >>   * fise:OccurrentAnnotation for "invades" with the
>> dc:type
>> >> >> >>> >>> >> caos:Activity, linking to a fise:TextAnnotation for
>> "invades"
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> > 2. Where does the verb, which links the Subject and the
>> >> Object
>> >> >> >>> come
>> >> >> >>> >>> into
>> >> >> >>> >>> >> > this? I imagined that the Endurant would have a
>> >> dc:"property"
>> >> >> >>> where
>> >> >> >>> >>> the
>> >> >> >>> >>> >> > property = verb which links to the Object in noun
>> form. For
>> >> >> >>> example
>> >> >> >>> >>> take
>> >> >> >>> >>> >> > again the sentence "USA invades Irak". You would have
>> the
>> >> >> "USA"
>> >> >> >>> >>> Entity
>> >> >> >>> >>> >> with
>> >> >> >>> >>> >> > dc:invader which points to the Object "Irak". The
>> Endurant
>> >> >> would
>> >> >> >>> >>> have as
>> >> >> >>> >>> >> > many dc:"property" elements as there are verbs which
>> link
>> >> it
>> >> >> to
>> >> >> >>> an
>> >> >> >>> >>> >> Object.
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> As explained above you would have a
>> fise:OccurrentAnnotation
>> >> >> that
>> >> >> >>> >> >> represents the Perdurant. The information that the activity
>> >> >> >>> >> >> mentioned in the text is "invades" would be captured by
>> >> >> >>> >> >> linking to a fise:TextAnnotation. If
>> >> >> >>> >>> >> you can also provide an Ontology for Tasks that defines
>> >> >> >>> >>> >> "myTasks:invade" the fise:OccurrentAnnotation could also
>> link
>> >> >> to an
>> >> >> >>> >>> >> fise:EntityAnnotation for this concept.
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> best
>> >> >> >>> >>> >> Rupert
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >> > ### Consuming the data:
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> I think this model should be sufficient for use-cases
>> as
>> >> >> >>> described
>> >> >> >>> >>> by
>> >> >> >>> >>> >> you.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> Users would be able to consume data on the setting
>> level.
>> >> >> This
>> >> >> >>> can
>> >> >> >>> >>> be
>> >> >> >>> >> >> done by simply retrieving all
>> fise:ParticipantAnnotation
>> >> as
>> >> >> >>> well as
>> >> >> >>> >>> >> >> fise:OccurrentAnnotation linked with a setting. BTW
>> this
>> >> was
>> >> >> the
>> >> >> >>> >>> >> >> approach used in LIVE [2] for semantic search. It
>> allows
>> >> >> >>> queries for
>> >> >> >>> >>> >> >> Settings that involve specific Entities e.g. you could
>> >> filter
>> >> >> >>> for
>> >> >> >>> >>> >> >> Settings that involve a {Person}, activities:Arrested
>> and
>> >> a
>> >> >> >>> specific
>> >> >> >>> >>> >> >> {Upraising}. However note that with this approach you
>> will
>> >> >> get
>> >> >> >>> >>> results
>> >> >> >>> >>> >> >> for Setting where the {Person} participated and an
>> other
>> >> >> person
>> >> >> >>> was
>> >> >> >>> >>> >> >> arrested.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> An other possibility would be to process enhancement
>> >> results
>> >> >> on
>> >> >> >>> the
>> >> >> >>> >> >> fise:OccurrentAnnotation. This would allow a much
>> >> higher
>> >> >> >>> >>> >> >> granularity level (e.g. it would allow to correctly
>> answer
>> >> >> the
>> >> >> >>> query
>> >> >> >>> >>> >> >> used as an example above). But I am wondering if the
>> >> quality
>> >> >> of
>> >> >> >>> the
>> >> >> >>> >>> >> >> Setting extraction will be sufficient for this. I have
>> >> also
>> >> >> >>> doubts
>> >> >> >>> >>> if
>> >> >> >>> >>> >> >> this can be still realized by using semantic indexing
>> to
>> >> >> Apache
>> >> >> >>> Solr
>> >> >> >>> >>> >> >> or if it would be better/necessary to store results
>> in a
>> >> >> >>> TripleStore
>> >> >> >>> >>> >> >> and using SPARQL for retrieval.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> The methodology and query language used by YAGO [3] is
>> >> also
>> >> >> very
>> >> >> >>> >>> >> >> relevant for this (especially note chapter 7 SPOTL(X)
>> >> >> >>> >>> Representation).
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> An other related Topic is the enrichment of Entities
>> >> >> (especially
>> >> >> >>> >>> >> >> Events) in knowledge bases based on Settings extracted
>> from
>> >> >> >>> >>> Documents.
>> >> >> >>> >> >> As per definition - in DOLCE - Perdurants are temporally
>> >> >> indexed.
>> >> >> >>> That
>> >> >> >>> >>> >> >> means that at the time when added to a knowledge base
>> they
>> >> >> might
>> >> >> >>> >>> still
>> >> >> >>> >>> >> >> be in process. So the creation, enriching and
>> refinement
>> >> of
>> >> >> such
>> >> >> >>> >> >> Entities in the knowledge base seems to be critical
>> for
>> >> a
>> >> >> >>> System
>> >> >> >>> >> >> like the one described in your use-case.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> On Tue, Jun 11, 2013 at 9:09 PM, Cristian Petroaca
>> >> >> >>> >>> >> >> <cr...@gmail.com> wrote:
>> >> >> >>> >>> >> >> >
>> >> >> >>> >>> >> >> > First of all I have to mention that I am new in the
>> >> field
>> >> >> of
>> >> >> >>> >>> semantic
>> >> >> >>> >>> >> >> > technologies, I've started to read about them in the
>> >> last
>> >> >> 4-5
>> >> >> >>> >>> >> >> months.Having
>> >> >> >>> >>> >> >> > said that I have a high level overview of what is a
>> good
>> >> >> >>> approach
>> >> >> >>> >>> to
>> >> >> >>> >>> >> >> solve
>> >> >> >>> >>> >> >> > this problem. There are a number of papers on the
>> >> internet
>> >> >> >>> which
>> >> >> >>> >>> >> describe
>> >> >> >>> >>> >> >> > what steps need to be taken such as : named entity
>> >> >> >>> recognition,
>> >> >> >>> >>> >> >> > co-reference resolution, pos tagging and others.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> The Stanbol NLP processing module currently only
>> supports
>> >> >> >>> sentence
>> >> >> >>> >>> >> >> detection, tokenization, POS tagging, Chunking, NER
>> and
>> >> >> lemma.
>> >> >> >>> Support
>> >> >> >>> >>> >> >> for co-reference resolution and dependency trees is
>> >> currently
>> >> >> >>> >>> missing.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> Stanford NLP is already integrated with Stanbol [4].
>> At
>> >> the
>> >> >> >>> moment
>> >> >> >>> >>> it
>> >> >> >>> >>> >> >> only supports English, but I do already work to
>> include
>> >> the
>> >> >> >>> other
>> >> >> >>> >> >> supported languages. Other NLP frameworks that are
>> already
>> >> >> >>> integrated
>> >> >> >>> >>> >> >> with Stanbol are Freeling [5] and Talismane [6]. But
>> note
>> >> >> that
>> >> >> >>> for
>> >> >> >>> >>> all
>> >> >> >>> >>> >> >> those the integration excludes support for
>> co-reference
>> >> and
>> >> >> >>> >>> dependency
>> >> >> >>> >>> >> >> trees.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> Anyways I am confident that one can implement a first
>> >> >> prototype
>> >> >> >>> by
>> >> >> >>> >>> >> >> only using Sentences and POS tags and - if available -
>> >> Chunks
>> >> >> >>> (e.g.
>> >> >> >>> >>> >> >> Noun phrases).
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> > I assume that in the Stanbol context, a feature like
>> >> Relation
>> >> >> >>> >>> extraction
>> >> >> >>> >>> >> > would be implemented as an EnhancementEngine?
>> >> >> >>> >>> >> > What kind of effort would be required for a
>> co-reference
>> >> >> >>> resolution
>> >> >> >>> >>> tool
>> >> >> >>> >>> >> > integration into Stanbol?
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> Yes in the end it would be an EnhancementEngine. But
>> before
>> >> we
>> >> >> can
>> >> >> >>> >>> >> build such an engine we would need to
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> * extend the Stanbol NLP processing API with Annotations
>> for
>> >> >> >>> >>> co-reference
>> >> >> >>> >>> >> * add support for JSON Serialisation/Parsing for those
>> >> >> annotation
>> >> >> >>> so
>> >> >> >>> >>> >> that the RESTful NLP Analysis Service can provide
>> >> co-reference
>> >> >> >>> >>> >> information
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> > At this moment I'll be focusing on 2 aspects:
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >> > 1. Determine the best data structure to encapsulate the
>> >> >> extracted
>> >> >> >>> >>> >> > information. I'll take a closer look at Dolce.
>> >> >> >>> >>> >>
>> >> >> >>> >> Don't make it too complex. Defining a proper structure to
>> >> >> represent
>> >> >> >>> >>> >> Events will only pay-off if we can also successfully
>> extract
>> >> >> such
>> >> >> >>> >> >> information from processed texts.
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> I would start with
>> >> >> >>> >>> >>
>> >> >> >>> >>> >>  * fise:SettingAnnotation
>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>> >> >> >>> >>> >>
>> >> >> >>> >>> >>  * fise:ParticipantAnnotation
>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
>> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
>> >> >> >>> >>> >>     * fise:suggestion {entityAnnotation} (multiple if
>> there
>> >> are
>> >> >> >>> more
>> >> >> >>> >>> >> suggestions)
>> >> >> >>> >>> >>     * dc:type one of fise:Agent, fise:Patient,
>> >> fise:Instrument,
>> >> >> >>> >>> fise:Cause
>> >> >> >>> >>> >>
>> >> >> >>> >>> >>  * fise:OccurrentAnnotation
>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
>> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
>> >> >> >>> >>> >>     * dc:type set to fise:Activity
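As an illustration only, the "USA invades Irak" example from earlier in the thread could be instantiated against this structure roughly as follows (Turtle sketch; the fise: namespace URI, the ex: instance identifiers, and the use of DC terms for dc:type are assumptions made for the sketch):

```turtle
@prefix fise: <http://fise.iks-project.eu/ontology/> .
@prefix dc:   <http://purl.org/dc/terms/> .
@prefix ex:   <urn:example:> .

ex:setting1 a fise:SettingAnnotation .

ex:participant-usa a fise:ParticipantAnnotation ;
    fise:inSetting  ex:setting1 ;
    fise:hasMention ex:textAnnotation-usa ;       # mention of "USA"
    fise:suggestion ex:entityAnnotation-usa ;     # e.g. dbpedia:United_States
    dc:type         fise:Agent .

ex:participant-iraq a fise:ParticipantAnnotation ;
    fise:inSetting  ex:setting1 ;
    fise:hasMention ex:textAnnotation-irak ;      # mention of "Irak"
    fise:suggestion ex:entityAnnotation-iraq ;    # e.g. dbpedia:Iraq
    dc:type         fise:Patient .

ex:occurrent-invades a fise:OccurrentAnnotation ;
    fise:inSetting  ex:setting1 ;
    fise:hasMention ex:textAnnotation-invades ;   # mention of "invades"
    dc:type         fise:Activity .
```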
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> If it turns out that we can extract more, we can add more
>> >> >> >>> structure to
>> >> >> >>> >>> >> those annotations. We might also think about using an own
>> >> >> namespace
>> >> >> >>> >>> >> for those extensions to the annotation structure.
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> > 2. Determine how should all of this be integrated into
>> >> >> Stanbol.
>> >> >> >>> >>> >>
>> >> >> >>> >> Just create an EventExtractionEngine and configure an
>> >> enhancement
>> >> >> >>> chain
>> >> >> >>> >>> >> that does NLP processing and EntityLinking.
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> You should have a look at
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> * SentimentSummarizationEngine [1] as it does a lot of
>> things
>> >> >> with
>> >> >> >>> NLP
>> >> >> >>> >>> >> processing results (e.g. connecting adjectives (via
>> verbs) to
>> >> >> >>> >> >> nouns/pronouns. So as long as we cannot use explicit
>> dependency
>> >> >> trees
>> >> >> >>> >> >> your code will need to do similar things with Nouns,
>> Pronouns
>> >> and
>> >> >> >>> >>> >> Verbs.
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> * Disambigutation-MLT engine, as it creates a Java
>> >> >> representation
>> >> >> >>> of
>> >> >> >>> >>> >> present fise:TextAnnotation and fise:EntityAnnotation
>> [2].
>> >> >> >>> Something
>> >> >> >>> >>> >> similar will also be required by the
>> EventExtractionEngine
>> >> for
>> >> >> fast
>> >> >> >>> >>> >> access to such annotations while iterating over the
>> >> Sentences of
>> >> >> >>> the
>> >> >> >>> >>> >> text.
>> >> >> >>> >>> >>
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> best
>> >> >> >>> >>> >> Rupert
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> [1]
>> >> >> >>> >>> >>
>> >> >> >>> >>>
>> >> >> >>>
>> >> >>
>> >>
>> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/sentiment-summarization/src/main/java/org/apache/stanbol/enhancer/engines/sentiment/summarize/SentimentSummarizationEngine.java
>> >> >> >>> >>> >> [2]
>> >> >> >>> >>> >>
>> >> >> >>> >>>
>> >> >> >>>
>> >> >>
>> >>
>> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/disambiguation-mlt/src/main/java/org/apache/stanbol/enhancer/engine/disambiguation/mlt/DisambiguationData.java
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >> > Thanks
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >> > Hope this helps to bootstrap this discussion
>> >> >> >>> >>> >> >> best
>> >> >> >>> >>> >> >> Rupert
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> --
>> >> >> >>> >>> >> >> | Rupert Westenthaler
>> >> >> rupert.westenthaler@gmail.com
>> >> >> >>> >>> >> >> | Bodenlehenstraße 11
>> >> >> >>> ++43-699-11108907
>> >> >> >>> >>> >> >> | A-5500 Bischofshofen
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >>
>> >> >> >>> >>> >>
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> --
>> >> >> >>> >>> >> | Rupert Westenthaler
>> >> rupert.westenthaler@gmail.com
>> >> >> >>> >>> >> | Bodenlehenstraße 11
>> >> >> >>> ++43-699-11108907
>> >> >> >>> >>> >> | A-5500 Bischofshofen
>> >> >> >>> >>> >>
>> >> >> >>> >>>
>> >> >> >>> >>>
>> >> >> >>> >>>
>> >> >> >>> >>> --
>> >> >> >>> >>> | Rupert Westenthaler
>> rupert.westenthaler@gmail.com
>> >> >> >>> >>> | Bodenlehenstraße 11
>> >> >> ++43-699-11108907
>> >> >> >>> >>> | A-5500 Bischofshofen
>> >> >> >>> >>>
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> --
>> >> >> >>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> >> >> >>> | Bodenlehenstraße 11
>> ++43-699-11108907
>> >> >> >>> | A-5500 Bischofshofen
>> >> >> >>>
>> >> >> >>
>> >> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> >> >> | Bodenlehenstraße 11                             ++43-699-11108907
>> >> >> | A-5500 Bischofshofen
>> >> >>
>> >>
>> >>
>> >>
>> >> --
>> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> >> | Bodenlehenstraße 11                             ++43-699-11108907
>> >> | A-5500 Bischofshofen
>> >>
>>
>>
>>
>> --
>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>
>
>

Re: Relation extraction feature

Posted by Cristian Petroaca <cr...@gmail.com>.
Ok. This means that I'll need to do a little refactoring on
the ValueTypeParser.parser() method to include a reference to the
AnalyzedText object coming from AnalyzedTextParser.


2013/9/16 Rupert Westenthaler <ru...@gmail.com>

> Hi Cristian
>
> If you have start/end and type of the referenced Span you can use the
> according
>
>     AnalysedText#add**
>
> e.g.
>
>     AnalysedText#addToken(start, end)
>     AnalysedText#addChunk(start, end)
>
> method and just use the returned instance. Those methods do all the
> magic: if the referenced Span does not yet exist (forward reference)
> they will create a new instance; if the Span already exists (backward
> reference) you will get the existing instance, including all the other
> annotations already parsed from the JSON. In the case of a forward
> reference, any further annotations will later be added to the Span you
> created in the same way.
>
> This behavior is also the reason why the constructors of the TokenImpl
> and ChunkImpl (and all other **Impl) are not public.
>
> A similar code can be found in the
>
>     AnalyzedTextParser#parseSpan(AnalysedText at, JsonNode node)
>
> method (o.a.s.enhancer.nlp.json module)
>
>
> So if you have a reference to a Span in your Java API:
>
> (1) parse the start/end/type of the reference
> (2) call add**(start, end) on the AnalysedText
> (3) add the returned Span to your set with references
>
> If you want your references to be sorted you should use NavigableSet
> instead of Set.
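The add-or-get behaviour described above can be sketched in a few lines. This is a minimal, self-contained illustration and not the real Stanbol API: the `SpanRegistry` class, its `key` encoding, and the simplified `Span` type (offsets only, no span type) are all assumptions made for the sketch.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch of the behavior of AnalysedText#addToken/addChunk as described
// above: the first call for a given start/end creates the span (forward
// reference); later calls return the same instance (backward reference).
class SpanRegistry {

    static final class Span {
        final int start, end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    // keyed by (start, end) so a lookup finds an already-registered span;
    // a NavigableMap also keeps the spans sorted by position
    private final NavigableMap<Long, Span> spans = new TreeMap<>();

    private static long key(int start, int end) {
        return ((long) start << 32) | (end & 0xFFFFFFFFL);
    }

    /** Returns the existing span for [start,end) or creates a new one. */
    Span addToken(int start, int end) {
        return spans.computeIfAbsent(key(start, end), k -> new Span(start, end));
    }
}
```

Parsing a reference then reduces to calling `addToken(start, end)` and adding the returned instance to the set of mentions, regardless of whether the referenced span was already parsed.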
>
> best
> Rupert
>
> On Sun, Sep 15, 2013 at 2:32 PM, Cristian Petroaca
> <cr...@gmail.com> wrote:
> > I've already started to implement the coreference bit first in the nlp
> > and nlp-json projects. There's one thing that I don't know how to
> > implement. The CorefTag class contains a Set<Span> mentions member
> > (representing the "mentions" array defined in an earlier mail), and in
> > the CorefTagSupport.parse() method I need to reconstruct the CorefTag
> > object from JSON. I can't figure out how to construct the
> > aforementioned member, which should contain references to mentions
> > that are Span objects found in the AnalyzedTextImpl. One problem is
> > that I don't have access to the AnalyzedTextImpl object, and even if I
> > did, there could be situations in which I am constructing a CorefTag
> > for a Span that contains mentions of other Spans which have not been
> > parsed yet and so don't exist in the AnalyzedTextImpl.
> >
> > One solution would be not to link to the actual Span references from
> > the AnalyzedTextImpl but to create new Span objects (ChunkImpl,
> > TokenImpl). That would require changing the ChunkImpl and TokenImpl
> > constructors from protected to public.
> >
> >
> > 2013/9/12 Rupert Westenthaler <ru...@gmail.com>
> >
> >> Hi Cristian,
> >>
> >> In fact I missed it. Sorry for that.
> >>
> >> I think the revised proposal looks like a good start. Usually one
> >> needs to make some adaptations when writing the actual code.
> >>
> >> If you have a first version attach it to an issue and I will commit it
> >> to the branch.
> >>
> >> best
> >> Rupert
> >>
> >>
> >> On Thu, Sep 12, 2013 at 9:04 AM, Cristian Petroaca
> >> <cr...@gmail.com> wrote:
> >> > Hi Rupert,
> >> >
> >> > This is a reminder in case you missed this e-mail.
> >> >
> >> > Cristian
> >> >
> >> >
> >> > 2013/9/3 Cristian Petroaca <cr...@gmail.com>
> >> >
> >> >> Ok, then to sum it up we would have :
> >> >>
> >> >> 1. Coref
> >> >>
> >> >> "stanbol.enhancer.nlp.coref" : {
> >> >>     "isRepresentative" : true/false, // whether this token or chunk is
> >> >>                                      // the representative mention in the chain
> >> >>     "mentions" : [ {
> >> >>         "type" : "Token", // type of element which refers to this token/chunk
> >> >>         "start" : 123,    // start index of the mentioning element
> >> >>         "end" : 130       // end index of the mentioning element
> >> >>     }, ... ],
> >> >>     "class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
> >> >> }
> >> >>
> >> >>
> >> >> 2. Dependency tree
> >> >>
> >> >> "stanbol.enhancer.nlp.dependency" : {
> >> >>     "relations" : [ {
> >> >>         "tag" : "nsubj",    // type of relation - Stanford NLP notation
> >> >>         "dep" : 12,         // type of relation - Stanbol NLP mapped value,
> >> >>                             // ordinal number in the Dependency enum
> >> >>         "role" : "gov/dep", // whether this token is the governor or the dependent
> >> >>         "type" : "Token",   // type of element with which this token is in relation
> >> >>         "start" : 123,      // start index of the related token
> >> >>         "end" : 130         // end index of the related token
> >> >>     }, ... ],
> >> >>     "class" : "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag"
> >> >> }
> >> >>
> >> >>
> >> >> 2013/9/2 Rupert Westenthaler <ru...@gmail.com>
> >> >>
> >> >>> Hi Cristian,
> >> >>>
> >> >>> let me provide some feedback to your proposals:
> >> >>>
> >> >>> ### Referring other Spans
> >> >>>
> >> >>> Both suggested annotations require to link other spans (Sentence,
> >> >>> Chunk or Token). For that we should introduce a JSON element used
> for
> >> >>> referring those elements and use it for all usages.
> >> >>>
> >> >>> In the java model this would allow you to have a reference to the
> >> >>> other Span (Sentence, Chunk, Token). In the serialized form you
> would
> >> >>> have JSON elements with the "type", "start" and "end" attributes as
> >> >>> those three uniquely identify any span.
> >> >>>
> >> >>> Here an example based on the "mention" attribute as defined by the
> >> >>> proposed "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
> >> >>>
> >> >>>     ...
> >> >>>     "mentions" : [ {
> >> >>>         "type" : "Token",
> >> >>>         "start": 123 ,
> >> >>>         "end": 130 } ,{
> >> >>>         "type" : "Token",
> >> >>>         "start": 157 ,
> >> >>>         "end": 165 }],
> >> >>>     ...
> >> >>>
> >> >>> Similar token links in
> >> >>> "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag" should
> also
> >> >>> use this model.
> >> >>>
> >> >>> ### Usage of Controlled Vocabularies
> >> >>>
> >> >>> In addition, the DependencyTag also seems to use a controlled
> >> >>> vocabulary (e.g. 'nsubj', 'conj_and' ...). In such cases the Stanbol
> >> >>> NLP module tries to define those in some kind of ontology. For POS
> >> >>> tags we use the OLIA ontology [1]. This is important, as most NLP
> >> >>> frameworks will use different strings and we need to unify those to
> >> >>> common IDs so that components that consume those data do not
> >> >>> depend on a specific NLP tool.
> >> >>>
> >> >>> Because the usage of ontologies within Java is not well supported,
> >> >>> the Stanbol NLP module defines Java enumerations for those
> >> >>> ontologies, such as the POS type enumeration [2].
> >> >>>
> >> >>> Both the Java Model as well as the JSON serialization do support
> both
> >> >>> (1) the lexical tag as used by the NLP tool and (2) the mapped
> >> >>> concept. In the Java API via two different methods and in the JSON
> >> >>> serialization via two separate keys.
> >> >>>
> >> >>> To make this more clear, here is an example of a POS annotation
> >> >>> for a proper noun.
> >> >>>
> >> >>>     "stanbol.enhancer.nlp.pos" : {
> >> >>>         "tag" : "PN",
> >> >>>         "pos" : 53,
> >> >>>         "class" : "org.apache.stanbol.enhancer.nlp.pos.PosTag",
> >> >>>         "prob" : 0.95
> >> >>>     }
> >> >>>
> >> >>> where
> >> >>>
> >> >>>     "tag" : "PN"
> >> >>>
> >> >>> is the lexical form as used by the NLP tool and
> >> >>>
> >> >>>     "pos" : 53
> >> >>>
> >> >>> refers to the ordinal number of the entry "ProperNoun" in the POS
> >> >>> enumeration
> >> >>>
> >> >>> IMO the "type" property of DependencyTag should use a similar
> design.
> >> >>>
> >> >>> best
> >> >>> Rupert
> >> >>>
> >> >>> [1] http://olia.nlp2rdf.org/
> >> >>> [2]
> >> >>>
> >>
> http://svn.apache.org/repos/asf/stanbol/trunk/enhancer/generic/nlp/src/main/java/org/apache/stanbol/enhancer/nlp/pos/Pos.java
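The dual representation described above (tool-specific lexical tag plus the ordinal of the mapped enum entry) can be sketched as below. The enum entries and the class shape are illustrative assumptions, not the real `Pos` enum or `PosTag` class from the Stanbol NLP module.

```java
// Sketch of a tag carrying both the lexical form used by the NLP tool
// ("tag" : "PN" in the JSON) and the mapped concept, serialized as the
// ordinal of the enum entry ("pos" : 53 in the real Pos enum).
class PosTagSketch {

    // Illustrative stand-in for the OLIA-based Pos enumeration; the real
    // enum has far more entries, so the ordinals differ.
    enum Pos { Noun, Verb, ProperNoun }

    final String tag;  // lexical form as used by the NLP tool
    final Pos pos;     // mapped concept from the enumeration

    PosTagSketch(String tag, Pos pos) { this.tag = tag; this.pos = pos; }

    /** The value that would be serialized under the "pos" key. */
    int posOrdinal() { return pos.ordinal(); }
}
```

Keeping both values means a consumer can fall back to the tool-specific string when no mapping exists, while tool-independent components only rely on the enum.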
> >> >>>
> >> >>> On Sun, Sep 1, 2013 at 8:09 PM, Cristian Petroaca
> >> >>> <cr...@gmail.com> wrote:
> >> >>> > Sorry, pressed sent too soon :).
> >> >>> >
> >> >>> > Continued :
> >> >>> >
> >> >>> > nsubj(met-4, Mary-1), conj_and(Mary-1, Tom-3), nsubj(met-4,
> Tom-3),
> >> >>> > root(ROOT-0, met-4), nn(today-6, Danny-5), tmod(met-4, today-6)]
> >> >>> >
> >> >>> > Given this, we can have for each "Token" an additional dependency
> >> >>> > annotation :
> >> >>> >
> >> >>> > "stanbol.enhancer.nlp.dependency" : {
> >> >>> > "tag" : //is it necessary?
> >> >>> > "relations" : [ { "type" : "nsubj", //type of relation
> >> >>> >   "role" : "gov/dep", //whether it is depender or the dependee
> >> >>> >   "dependencyValue" : "met", // the word with which the token has
> a
> >> >>> relation
> >> >>> >   "dependencyIndexInSentence" : "2" //the index of the dependency
> in
> >> the
> >> >>> > current sentence
> >> >>> > }
> >> >>> > ...
> >> >>> > ]
> >> >>> >                 "class" :
> >> >>> > "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag"
> >> >>> >         }
> >> >>> >
> >> >>> > 2013/9/1 Cristian Petroaca <cr...@gmail.com>
> >> >>> >
> >> >>> >> Related to the Stanford Dependency Tree Feature, this is the way
> the
> >> >>> >> output from the tool looks like for this sentence : "Mary and Tom
> >> met
> >> >>> Danny
> >> >>> >> today" :
> >> >>> >>
> >> >>> >>
> >> >>> >> 2013/8/30 Cristian Petroaca <cr...@gmail.com>
> >> >>> >>
> >> >>> >>> Hi Rupert,
> >> >>> >>>
> >> >>> >>> Ok, so after looking at the JSON output from the Stanford NLP
> >> Server
> >> >>> and
> >> >>> >>> the coref module I'm thinking I can represent the coreference
> >> >>> information
> >> >>> >>> this way:
> >> >>> >>> Each "Token" or "Chunk" will contain an additional coref
> annotation
> >> >>> with
> >> >>> >>> the following structure :
> >> >>> >>>
> >> >>> >>> "stanbol.enhancer.nlp.coref" {
> >> >>> >>>     "tag" : //does this need to exist?
> >> >>> >>>     "isRepresentative" : true/false, // whether this token or
> >> chunk is
> >> >>> >>> the representative mention in the chain
> >> >>> >>>     "mentions" : [ { "sentenceNo" : 1 //the sentence in which
> the
> >> >>> mention
> >> >>> >>> is found
> >> >>> >>>                            "startWord" : 2 //the first word
> making
> >> up
> >> >>> the
> >> >>> >>> mention
> >> >>> >>>                            "endWord" : 3 //the last word making
> up
> >> the
> >> >>> >>> mention
> >> >>> >>>                          }, ...
> >> >>> >>>                        ],
> >> >>> >>>     "class" : ""class" :
> >> >>> "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
> >> >>> >>> }
> >> >>> >>>
> >> >>> >>> The CorefTag should resemble this model.
> >> >>> >>>
> >> >>> >>> What do you think?
> >> >>> >>>
> >> >>> >>> Cristian
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> 2013/8/24 Rupert Westenthaler <ru...@gmail.com>
> >> >>> >>>
> >> >>> >>>> Hi Cristian,
> >> >>> >>>>
> >> >>> >>>> you can not directly call StanfordNLP components from Stanbol,
> but
> >> >>> you
> >> >>> >>>> have to extend the RESTful service to include the information
> you
> >> >>> >>>> need. The main reason for that is that the license of
> StanfordNLP
> >> is
> >> >>> >>>> not compatible with the Apache Software License. So Stanbol can
> >> not
> >> >>> >>>> directly link to the StanfordNLP API.
> >> >>> >>>>
> >> >>> >>>> You will need to
> >> >>> >>>>
> >> >>> >>>> 1. define an additional class {yourTag} extends Tag<{yourType}>
> >> class
> >> >>> >>>> in the o.a.s.enhancer.nlp module
> >> >>> >>>> 2. add JSON parsing and serialization support for this tag to
> the
> >> >>> >>>> o.a.s.enhancer.nlp.json module (see e.g. PosTagSupport as an
> >> example)
> >> >>> >>>>
> >> >>> >>>> As (1) would be necessary anyway the only additional thing you
> >> need
> >> >>> to
> >> >>> >>>> develop is (2). After that you can add {yourTag} instance to
> the
> >> >>> >>>> AnalyzedText in the StanfordNLP integration. The
> >> >>> >>>> RestfulNlpAnalysisEngine will parse them from the response. All
> >> >>> >>>> engines executed after the RestfulNlpAnalysisEngine will have
> >> access
> >> >>> >>>> to your annotations.
> >> >>> >>>>
> >> >>> >>>> If you have a design for {yourTag} - the model you would like
> to
> >> use
> >> >>> >>>> to represent your data - I can help with (1) and (2).
> >> >>> >>>>
> >> >>> >>>> best
> >> >>> >>>> Rupert
> >> >>> >>>>
> >> >>> >>>>
> >> >>> >>>> On Fri, Aug 23, 2013 at 5:11 PM, Cristian Petroaca
> >> >>> >>>> <cr...@gmail.com> wrote:
> >> >>> >>>> > Hi Rupert,
> >> >>> >>>> >
> >> >>> >>>> > Thanks for the info. Looking at the standbol-stanfordnlp
> >> project I
> >> >>> see
> >> >>> >>>> that
> >> >>> >>>> > the stanford nlp is not implemented as an EnhancementEngine
> but
> >> >>> rather
> >> >>> >>>> it
> >> >>> >>>> > is used directly in a Jetty Server instance. How does that
> fit
> >> >>> into the
> >> >>> >>>> > Stanbol stack? For example how can I call the
> >> StanfordNlpAnalyzer's
> >> >>> >>>> routine
> >> >>> >>>> > from my TripleExtractionEnhancementEngine which lives in the
> >> >>> Stanbol
> >> >>> >>>> stack?
> >> >>> >>>> >
> >> >>> >>>> > Thanks,
> >> >>> >>>> > Cristian
> >> >>> >>>> >
> >> >>> >>>> >
> >> >>> >>>> > 2013/8/12 Rupert Westenthaler <rupert.westenthaler@gmail.com
> >
> >> >>> >>>> >
> >> >>> >>>> >> Hi Cristian,
> >> >>> >>>> >>
> >> >>> >>>> >> Sorry for the late response, but I was offline for the last
> two
> >> >>> weeks
> >> >>> >>>> >>
> >> >>> >>>> >> On Fri, Aug 2, 2013 at 9:19 PM, Cristian Petroaca
> >> >>> >>>> >> <cr...@gmail.com> wrote:
> >> >>> >>>> >> > Hi Rupert,
> >> >>> >>>> >> >
> >> >>> >>>> >> > After doing some tests it seems that the Stanford NLP
> >> >>> coreference
> >> >>> >>>> module
> >> >>> >>>> >> is
> >> >>> >>>> >> > much more accurate than the Open NLP one.So I decided to
> >> extend
> >> >>> >>>> Stanford
> >> >>> >>>> >> > NLP to add coreference there.
> >> >>> >>>> >>
> >> >>> >>>> >> The Stanford NLP integration is not part of the Stanbol
> >> codebase
> >> >>> >>>> >> because the licenses are not compatible.
> >> >>> >>>> >>
> >> >>> >>>> >> You can find the Stanford NLP integration on
> >> >>> >>>> >>
> >> >>> >>>> >>     https://github.com/westei/stanbol-stanfordnlp
> >> >>> >>>> >>
> >> >>> >>>> >> just create a fork and send pull requests.
> >> >>> >>>> >>
> >> >>> >>>> >>
> >> >>> >>>> >> > Could you add the necessary projects on the branch? And
> also
> >> >>> remove
> >> >>> >>>> the
> >> >>> >>>> >> > Open NLP ones?
> >> >>> >>>> >> >
> >> >>> >>>> >>
> >> >>> >>>> >> Currently the branch
> >> >>> >>>> >>
> >> >>> >>>> >>
> >> >>> >>>> >>
> >> >>> >>>>
> >> >>>
> >>
> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
> >> >>> >>>> >>
> >> >>> >>>> >> only contains the "nlp" and the "nlp-json" modules. IMO
> those
> >> >>> should
> >> >>> >>>> >> be enough for adding coreference support.
> >> >>> >>>> >>
> >> >>> >>>> >> IMO you will need to
> >> >>> >>>> >>
> >> >>> >>>> >> * add an model for representing coreference to the nlp
> module
> >> >>> >>>> >> * add parsing and serializing support to the nlp-json module
> >> >>> >>>> >> * add the implementation to your fork of the
> >> stanbol-stanfordnlp
> >> >>> >>>> project
> >> >>> >>>> >>
> >> >>> >>>> >> best
> >> >>> >>>> >> Rupert
> >> >>> >>>> >>
> >> >>> >>>> >>
> >> >>> >>>> >>
> >> >>> >>>> >> > Thanks,
> >> >>> >>>> >> > Cristian
> >> >>> >>>> >> >
> >> >>> >>>> >> >
> >> >>> >>>> >> > 2013/7/5 Rupert Westenthaler <
> rupert.westenthaler@gmail.com>
> >> >>> >>>> >> >
> >> >>> >>>> >> >> Hi Cristian,
> >> >>> >>>> >> >>
> >> >>> >>>> >> >> I created the branch at
> >> >>> >>>> >> >>
> >> >>> >>>> >> >>
> >> >>> >>>> >> >>
> >> >>> >>>> >>
> >> >>> >>>>
> >> >>>
> >>
> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
> >> >>> >>>> >> >>
> >> >>> >>>> >> >> ATM in contains only the "nlp" and "nlp-json" module.
> Let me
> >> >>> know
> >> >>> >>>> if
> >> >>> >>>> >> >> you would like to have more
> >> >>> >>>> >> >>
> >> >>> >>>> >> >> best
> >> >>> >>>> >> >> Rupert
> >> >>> >>>> >> >>
> >> >>> >>>> >> >>
> >> >>> >>>> >> >>
> >> >>> >>>> >> >> On Thu, Jul 4, 2013 at 10:14 AM, Cristian Petroaca
> >> >>> >>>> >> >> <cr...@gmail.com> wrote:
> >> >>> >>>> >> >> > Hi Rupert,
> >> >>> >>>> >> >> >
> >> >>> >>>> >> >> > I created jiras :
> >> >>> >>>> https://issues.apache.org/jira/browse/STANBOL-1132 and
> >> >>> >>>> >> >> > https://issues.apache.org/jira/browse/STANBOL-1133.
> The
> >> >>> >>>> original one
> >> >>> >>>> >> in
> >> >>> >>>> >> >> > dependent upon these.
> >> >>> >>>> >> >> > Please let me know when I can start using the branch.
> >> >>> >>>> >> >> >
> >> >>> >>>> >> >> > Thanks,
> >> >>> >>>> >> >> > Cristian
> >> >>> >>>> >> >> >
> >> >>> >>>> >> >> >
> >> >>> >>>> >> >> > 2013/6/27 Cristian Petroaca <
> cristian.petroaca@gmail.com>
> >> >>> >>>> >> >> >
> >> >>> >>>> >> >> >>
> >> >>> >>>> >> >> >>
> >> >>> >>>> >> >> >>
> >> >>> >>>> >> >> >> 2013/6/27 Rupert Westenthaler <
> >> >>> rupert.westenthaler@gmail.com>
> >> >>> >>>> >> >> >>
> >> >>> >>>> >> >> >>> On Thu, Jun 27, 2013 at 3:12 PM, Cristian Petroaca
> >> >>> >>>> >> >> >>> <cr...@gmail.com> wrote:
> >> >>> >>>> >> >> >>> > Sorry, I meant the Stanbol NLP API, not Stanford
> in my
> >> >>> >>>> previous
> >> >>> >>>> >> >> e-mail.
> >> >>> >>>> >> >> >>> By
> >> >>> >>>> >> >> >>> > the way, does Open NLP have the ability to build
> >> >>> dependency
> >> >>> >>>> trees?
> >> >>> >>>> >> >> >>> >
> >> >>> >>>> >> >> >>>
> >> >>> >>>> >> >> >>> AFAIK OpenNLP does not provide this feature.
> >> >>> >>>> >> >> >>>
> >> >>> >>>> >> >> >>
> >> >>> >>>> >> >> >> Then , since the Stanford NLP lib is also integrated
> into
> >> >>> >>>> Stanbol,
> >> >>> >>>> >> I'll
> >> >>> >>>> >> >> >> take a look at how I can extend its integration to
> >> include
> >> >>> the
> >> >>> >>>> >> >> dependency
> >> >>> >>>> >> >> >> tree feature.
> >> >>> >>>> >> >> >>
> >> >>> >>>> >> >> >>>
> >> >>> >>>> >> >> >>>
> >> >>> >>>> >> >> >>  >
> >> >>> >>>> >> >> >>> > 2013/6/23 Cristian Petroaca <
> >> cristian.petroaca@gmail.com
> >> >>> >
> >> >>> >>>> >> >> >>> >
> >> >>> >>>> >> >> >>> >> Hi Rupert,
> >> >>> >>>> >> >> >>> >>
> >> >>> >>>> >> >> >>> >> I created jira
> >> >>> >>>> >> https://issues.apache.org/jira/browse/STANBOL-1121.
> >> >>> >>>> >> >> >>> >> As you suggested I would start with extending the
> >> >>> Stanford
> >> >>> >>>> NLP
> >> >>> >>>> >> with
> >> >>> >>>> >> >> >>> >> co-reference resolution but I think also with
> >> dependency
> >> >>> >>>> trees
> >> >>> >>>> >> >> because
> >> >>> >>>> >> >> >>> I
> >> >>> >>>> >> >> >>> >> also need to know the Subject of the sentence and
> the
> >> >>> object
> >> >>> >>>> >> that it
> >> >>> >>>> >> >> >>> >> affects, right?
> >> >>> >>>> >> >> >>> >>
> >> >>> >>>> >> >> >>> >> Given that I need to extend the Stanford NLP API
> in
> >> >>> Stanbol
> >> >>> >>>> for
> >> >>> >>>> >> >> >>> >> co-reference and dependency trees, how do I
> proceed
> >> with
> >> >>> >>>> this?
> >> >>> >>>> >> Do I
> >> >>> >>>> >> >> >>> create
> >> >>> >>>> >> >> >>> >> 2 new sub-tasks to the already opened Jira? After
> >> that
> >> >>> can I
> >> >>> >>>> >> start
> >> >>> >>>> >> >> >>> >> implementing on my local copy of Stanbol and when
> I'm
> >> >>> done
> >> >>> >>>> I'll
> >> >>> >>>> >> send
> >> >>> >>>> >> >> >>> you
> >> >>> >>>> >> >> >>> >> guys the patch fo review?
> >> >>> >>>> >> >> >>> >>
> >> >>> >>>> >> >> >>>
> >> >>> >>>> >> >> >>> I would create two "New Feature" type Issues one for
> >> adding
> >> >>> >>>> support
> >> >>> >>>> >> >> >>> for "dependency trees" and the other for
> "co-reference"
> >> >>> >>>> support. You
> >> >>> >>>> >> >> >>> should also define "depends on" relations between
> >> >>> STANBOL-1121
> >> >>> >>>> and
> >> >>> >>>> >> >> >>> those two new issues.
> >> >>> >>>> >> >> >>>
> >> >>> >>>> >> >> >>> Sub-task could also work, but as adding those
> features
> >> >>> would
> >> >>> >>>> be also
> >> >>> >>>> >> >> >>> interesting for other things I would rather define
> them
> >> as
> >> >>> >>>> separate
> >> >>> >>>> >> >> >>> issues.
> >> >>> >>>> >> >> >>>
> >> >>> >>>> >> >> >>>
> >> >>> >>>> >> >> >> 2 New Features connected with the original jira it is
> >> then.
> >> >>> >>>> >> >> >>
> >> >>> >>>> >> >> >>
> >> >>> >>>> >> >> >>> If you would prefer to work in an own branch please
> tell
> >> >>> me.
> >> >>> >>>> This
> >> >>> >>>> >> >> >>> could have the advantage that patches would not be
> >> >>> affected by
> >> >>> >>>> >> changes
> >> >>> >>>> >> >> >>> in the trunk.
> >> >>> >>>> >> >> >>>
> >> >>> >>>> >> >> >>> Yes, a separate branch sounds good.
> >> >>> >>>> >> >> >>
> >> >>> >>>> >> >> >> best
> >> >>> >>>> >> >> >>> Rupert
> >> >>> >>>> >> >> >>>
> >> >>> >>>> >> >> >>> >> Regards,
> >> >>> >>>> >> >> >>> >> Cristian
> >> >>> >>>> >> >> >>> >>
> >> >>> >>>> >> >> >>> >>
> >> >>> >>>> >> >> >>> >> 2013/6/18 Rupert Westenthaler <
> >> >>> >>>> rupert.westenthaler@gmail.com>
> >> >>> >>>> >> >> >>> >>
> >> >>> >>>> >> >> >>> >>> On Mon, Jun 17, 2013 at 10:18 PM, Cristian
> Petroaca
> >> >>> >>>> >> >> >>> >>> <cr...@gmail.com> wrote:
> >> >>> >>>> >> >> >>> >>> > Hi Rupert,
> >> >>> >>>> >> >> >>> >>> >
> >> >>> >>>> >> >> >>> >>> > Agreed on the
> >> >>> >>>> >> >> >>>
> >> SettingAnnotation/ParticipantAnnotation/OccurentAnnotation
> >> >>> >>>> >> >> >>> >>> > data structure.
> >> >>> >>>> >> >> >>> >>> >
> >> >>> >>>> >> >> >>> >>> > Should I open up a Jira for all of this in
> order
> >> to
> >> >>> >>>> >> encapsulate
> >> >>> >>>> >> >> this
> >> >>> >>>> >> >> >>> >>> > information and establish the goals and these
> >> initial
> >> >>> >>>> steps
> >> >>> >>>> >> >> towards
> >> >>> >>>> >> >> >>> >>> these
> >> >>> >>>> >> >> >>> >>> > goals?
> >> >>> >>>> >> >> >>> >>>
> >> >>> >>>> >> >> >>> >>> Yes please. A JIRA issue for this work would be
> >> great.
> >> >>> >>>> >> >> >>> >>>
> >> >>> >>>> >> >> >>> >>> > How should I proceed further? Should I create
> some
> >> >>> design
> >> >>> >>>> >> >> documents
> >> >>> >>>> >> >> >>> that
> >> >>> >>>> >> >> >>> >>> > need to be reviewed?
> >> >>> >>>> >> >> >>> >>>
> >> >>> >>>> >> >> >>> >>> Usually it is the best to write design related
> text
> >> >>> >>>> directly in
> >> >>> >>>> >> >> JIRA
> >> >>> >>>> >> >> >>> >>> by using Markdown [1] syntax. This will allow us
> >> later
> >> >>> to
> >> >>> >>>> use
> >> >>> >>>> >> this
> >> >>> >>>> >> >> >>> >>> text directly for the documentation on the
> Stanbol
> >> >>> Webpage.
> >> >>> >>>> >> >> >>> >>>
> >> >>> >>>> >> >> >>> >>> best
> >> >>> >>>> >> >> >>> >>> Rupert
> >> >>> >>>> >> >> >>> >>>
> >> >>> >>>> >> >> >>> >>>
> >> >>> >>>> >> >> >>> >>> [1] http://daringfireball.net/projects/markdown/
> >> >>> >>>> >> >> >>> >>> >
> >> >>> >>>> >> >> >>> >>> > Regards,
> >> >>> >>>> >> >> >>> >>> > Cristian
> >> >>> >>>> >> >> >>> >>> >
> >> >>> >>>> >> >> >>> >>> >
> >> >>> >>>> >> >> >>> >>> > 2013/6/17 Rupert Westenthaler <
> >> >>> >>>> rupert.westenthaler@gmail.com>
> >> >>> >>>> >> >> >>> >>> >
> >> >>> >>>> >> >> >>> >>> >> On Thu, Jun 13, 2013 at 8:22 PM, Cristian
> >> Petroaca
> >> >>> >>>> >> >> >>> >>> >> <cr...@gmail.com> wrote:
> >> >>> >>>> >> >> >>> >>> >> > HI Rupert,
> >> >>> >>>> >> >> >>> >>> >> >
> >> >>> >>>> >> >> >>> >>> >> > First of all thanks for the detailed
> >> suggestions.
> >> >>> >>>> >> >> >>> >>> >> >
> >> >>> >>>> >> >> >>> >>> >> > 2013/6/12 Rupert Westenthaler <
> >> >>> >>>> >> rupert.westenthaler@gmail.com>
> >> >>> >>>> >> >> >>> >>> >> >
> >> >>> >>>> >> >> >>> >>> >> >> Hi Cristian, all
> >> >>> >>>> >> >> >>> >>> >> >>
> >> >>> >>>> >> >> >>> >>> >> >> really interesting use case!
> >> >>> >>>> >> >> >>> >>> >> >>
> >> >>> >>>> >> >> >>> >>> >> >> In this mail I will try to give some
> >> suggestions
> >> >>> on
> >> >>> >>>> how
> >> >>> >>>> >> this
> >> >>> >>>> >> >> >>> could
> >> >>> >>>> >> >> >>> >>> >> >> work out. This suggestions are mainly
> based on
> >> >>> >>>> experiences
> >> >>> >>>> >> >> and
> >> >>> >>>> >> >> >>> >>> lessons
> >> >>> >>>> >> >> >>> >>> >> >> learned in the LIVE [2] project where we
> >> built an
> >> >>> >>>> >> information
> >> >>> >>>> >> >> >>> system
> >> >>> >>>> >> >> >>> >>> >> >> for the Olympic Games in Peking. While this
> >> >>> Project
> >> >>> >>>> >> excluded
> >> >>> >>>> >> >> the
> >> >>> >>>> >> >> >>> >>> >> >> extraction of Events from unstructured text
> >> >>> (because
> >> >>> >>>> the
> >> >>> >>>> >> >> Olympic
> >> >>> >>>> >> >> >>> >>> >> >> Information System was already providing
> event
> >> >>> data
> >> >>> >>>> as XML
> >> >>> >>>> >> >> >>> messages)
> >> >>> >>>> >> >> >>> >>> >> >> the semantic search capabilities of this
> >> system
> >> >>> >>>> where very
> >> >>> >>>> >> >> >>> similar
> >> >>> >>>> >> >> >>> >>> as
> >> >>> >>>> >> >> >>> >>> >> >> the one described by your use case.
> >> >>> >>>> >> >> >>> >>> >> >>
> >> >>> >>>> >> >> >>> >>> >> >> IMHO you are not only trying to extract
> >> >>> relations,
> >> >>> >>>> but a
> >> >>> >>>> >> >> formal
> >> >>> >>>> >> >> >>> >>> >> >> representation of the situation described
> by
> >> the
> >> >>> >>>> text. So
> >> >>> >>>> >> >> lets
> >> >>> >>>> >> >> >>> >>> assume
> >> >>> >>>> >> >> >>> >>> >> >> that the goal is to Annotate a Setting (or
> >> >>> Situation)
> >> >>> >>>> >> >> described
> >> >>> >>>> >> >> >>> in
> >> >>> >>>> >> >> >>> >>> the
> >> >>> >>>> >> >> >>> >>> >> >> text - a fise:SettingAnnotation.
> >> >>> >>>> >> >> >>> >>> >> >>
> >> >>> >>>> >> >> >>> >>> >> >> The DOLCE foundational ontology [1] gives
> some
> >> >>> >>>> advices on
> >> >>> >>>> >> >> how to
> >> >>> >>>> >> >> >>> >>> model
> >> >>> >>>> >> >> >>> >>> >> >> those. The important relation for modeling
> >> this
> >> >>> >>>> >> >> Participation:
> >> >>> >>>> >> >> >>> >>> >> >>
> >> >>> >>>> >> >> >>> >>> >> >>     PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t))
> >> >>> >>>> >> >> >>> >>> >> >>
> >> >>> >>>> >> >> >>> >>> >> >> where ..
> >> >>> >>>> >> >> >>> >>> >> >>
> >> >>> >>>> >> >> >>> >>> >> >>  * ED are Endurants (continuants):
> Endurants
> >> do
> >> >>> have
> >> >>> >>>> an
> >> >>> >>>> >> >> >>> identity so
> >> >>> >>>> >> >> >>> >>> we
> >> >>> >>>> >> >> >>> >>> >> >> would typically refer to them as Entities
> >> >>> referenced
> >> >>> >>>> by a
> >> >>> >>>> >> >> >>> setting.
> >> >>> >>>> >> >> >>> >>> >> >> Note that this includes physical,
> >> non-physical as
> >> >>> >>>> well as
> >> >>> >>>> >> >> >>> >>> >> >> social-objects.
> >> >>> >>>> >> >> >>> >>> >> >>  * PD are Perdurants (occurrents):
>  Perdurants
> >> >>> are
> >> >>> >>>> >> entities
> >> >>> >>>> >> >> that
> >> >>> >>>> >> >> >>> >>> >> >> happen in time. This refers to Events,
> >> >>> Activities ...
> >> >>> >>>> >> >> >>> >>> >> >>  * PC are Participation: It is an time
> indexed
> >> >>> >>>> relation
> >> >>> >>>> >> where
> >> >>> >>>> >> >> >>> >>> >> >> Endurants participate in Perdurants
> >> >>> >>>> >> >> >>> >>> >> >>
> >> >>> >>>> >> >> >>> >>> >> >> Modeling this in RDF requires to define
> some
> >> >>> >>>> intermediate
> >> >>> >>>> >> >> >>> resources
> >> >>> >>>> >> >> >>> >>> >> >> because RDF does not allow for n-ary
> >> relations.
> >> >>> >>>> >> >> >>> >>> >> >>
> >> >>> >>>> >> >> >>> >>> >> >>  * fise:SettingAnnotation: It is really
> handy
> >> to
> >> >>> >>>> define
> >> >>> >>>> >> one
> >> >>> >>>> >> >> >>> resource
> >> >>> >>>> >> >> >>> >>> >> >> being the context for all described data. I
> >> would
> >> >>> >>>> call
> >> >>> >>>> >> this
> >> >>> >>>> >> >> >>> >>> >> >> "fise:SettingAnnotation" and define it as a
> >> >>> >>>> sub-concept to
> >> >>> >>>> >> >> >>> >>> >> >> fise:Enhancement. All further enhancement
> >> about
> >> >>> the
> >> >>> >>>> >> extracted
> >> >>> >>>> >> >> >>> >>> Setting
> >> >>> >>>> >> >> >>> >>> >> >> would define a "fise:in-setting" relation
> to
> >> it.
> >> >>> >>>> >> >> >>> >>> >> >>
> >> >>> >>>> >> >> >>> >>> >> >>  * fise:ParticipantAnnotation: Is used to
> >> >>> annotate
> >> >>> >>>> that
> >> >>> >>>> >> >> >>> Endurant is
> >> >>> >>>> >> >> >>> >>> >> >> participating on a setting (fise:in-setting
> >> >>> >>>> >> >> >>> fise:SettingAnnotation).
> >> >>> >>>> >> >> >>> >>> >> >> The Endurant itself is described by
> existing
> >> >>> >>>> >> >> fise:TextAnnotaion
> >> >>> >>>> >> >> >>> (the
> >> >>> >>>> >> >> >>> >>> >> >> mentions) and fise:EntityAnnotation
> (suggested
> >> >>> >>>> Entities).
> >> >>> >>>> >> >> >>> Basically
> >> >>> >>>> >> >> >>> >>> >> >> the fise:ParticipantAnnotation will allow
> an
> >> >>> >>>> >> >> EnhancementEngine
> >> >>> >>>> >> >> >>> to
> >> >>> >>>> >> >> >>> >>> >> >> state that several mentions (in possible
> >> >>> different
> >> >>> >>>> >> >> sentences) do
> >> >>> >>>> >> >> >>> >>> >> >> represent the same Endurant as
> participating
> >> in
> >> >>> the
> >> >>> >>>> >> Setting.
> >> >>> >>>> >> >> In
> >> >>> >>>> >> >> >>> >>> >> >> addition it would be possible to use the
> >> dc:type
> >> >>> >>>> property
> >> >>> >>>> >> >> >>> (similar
> >> >>> >>>> >> >> >>> >>> as
> >> >>> >>>> >> >> >>> >>> >> >> for fise:TextAnnotation) to refer to the
> >> role(s)
> >> >>> of
> >> >>> >>>> an
> >> >>> >>>> >> >> >>> participant
> >> >>> >>>> >> >> >>> >>> >> >> (e.g. the set: Agent (intensionally
> performs
> >> an
> >> >>> >>>> action)
> >> >>> >>>> >> Cause
> >> >>> >>>> >> >> >>> >>> >> >> (unintentionally e.g. a mud slide),
> Patient (a
> >> >>> >>>> passive
> >> >>> >>>> >> role
> >> >>> >>>> >> >> in
> >> >>> >>>> >> >> >>> an
> >> >>> >>>> >> >> >>> >>> >> >> activity) and Instrument (aids an
> process)),
> >> but
> >> >>> I am
> >> >>> >>>> >> >> wondering
> >> >>> >>>> >> >> >>> if
> >> >>> >>>> >> >> >>> >>> one
> >> >>> >>>> >> >> >>> >>> >> >> could extract those information.
> >> >>> >>>> >> >> >>> >>> >> >>
>>> * fise:OccurrentAnnotation: is used to annotate a Perdurant in the
>>> context of the Setting. Also fise:OccurrentAnnotation can link to
>>> fise:TextAnnotation (typically verbs in the text defining the
>>> perdurant) as well as fise:EntityAnnotation suggesting well known
>>> Events in a knowledge base (e.g. an Election in a country, or an
>>> uprising ...). In addition fise:OccurrentAnnotation can define
>>> dc:has-participant links to fise:ParticipantAnnotation. In this case
>>> it is explicitly stated that an Endurant (the
>>> fise:ParticipantAnnotation) is involved in this Perdurant (the
>>> fise:OccurrentAnnotation). As Occurrences are temporally indexed,
>>> this annotation should also support properties for defining the
>>> xsd:dateTime for the start/end.
>>
>> Indeed, an event based data structure makes a lot of sense, with the
>> remark that you probably won't be able to always extract the date for
>> a given setting (situation). There are 2 things which are unclear
>> though.
>>
>> 1. Perdurant: You could have situations in which the object upon
>> which the Subject (or Endurant) is acting is not a transitory object
>> (such as an event or activity) but rather another Endurant. For
>> example we can have the phrase "USA invades Irak" where "USA" is the
>> Endurant (Subject) which performs the action of "invading" on another
>> Endurant, namely "Irak".
>
> By using CAOS, USA would be the Agent and Iraq the Patient. Both are
> Endurants. The activity "invading" would be the Perdurant. So ideally
> you would have a "fise:SettingAnnotation" with:
>
>   * fise:ParticipantAnnotation for USA with the dc:type caos:Agent,
> linking to a fise:TextAnnotation for "USA" and a fise:EntityAnnotation
> linking to dbpedia:United_States
>   * fise:ParticipantAnnotation for Iraq with the dc:type caos:Patient,
> linking to a fise:TextAnnotation for "Irak" and a
> fise:EntityAnnotation linking to dbpedia:Iraq
>   * fise:OccurrentAnnotation for "invades" with the dc:type
> caos:Activity, linking to a fise:TextAnnotation for "invades"
>
>> 2. Where does the verb, which links the Subject and the Object, come
>> into this? I imagined that the Endurant would have a dc:"property"
>> where the property = verb which links to the Object in noun form. For
>> example take again the sentence "USA invades Irak". You would have
>> the "USA" Entity with dc:invader which points to the Object "Irak".
>> The Endurant would have as many dc:"property" elements as there are
>> verbs which link it to an Object.
>
> As explained above you would have a fise:OccurrentAnnotation that
> represents the Perdurant. The information that the activity mentioned
> in the text is "invades" would be provided by linking to a
> fise:TextAnnotation. If you can also provide an Ontology for Tasks
> that defines "myTasks:invade", the fise:OccurrentAnnotation could
> also link to a fise:EntityAnnotation for this concept.
>
> best
> Rupert
>
>> ### Consuming the data:
>>>
>>> I think this model should be sufficient for use-cases as described
>>> by you.
>>>
>>> Users would be able to consume data on the setting level. This can
>>> be done by simply retrieving all fise:ParticipantAnnotation as well
>>> as fise:OccurrentAnnotation linked with a setting. BTW this was the
>>> approach used in LIVE [2] for semantic search. It allows queries for
>>> Settings that involve specific Entities, e.g. you could filter for
>>> Settings that involve a {Person}, activities:Arrested and a specific
>>> {Upraising}. However note that with this approach you will get
>>> results for Settings where the {Person} participated and another
>>> person was arrested.
>>>
>>> Another possibility would be to process enhancement results on the
>>> fise:OccurrentAnnotation. This would allow a much higher granularity
>>> level (e.g. it would allow to correctly answer the query used as an
>>> example above). But I am wondering if the quality of the Setting
>>> extraction will be sufficient for this. I also have doubts whether
>>> this can still be realized by using semantic indexing to Apache Solr
>>> or if it would be better/necessary to store results in a TripleStore
>>> and use SPARQL for retrieval.
>>>
>>> The methodology and query language used by YAGO [3] is also very
>>> relevant for this (especially note chapter 7, SPOTL(X)
>>> Representation).
>>>
>>> Another related topic is the enrichment of Entities (especially
>>> Events) in knowledge bases based on Settings extracted from
>>> Documents. As per definition - in DOLCE - Perdurants are temporally
>>> indexed. That means that at the time when added to a knowledge base
>>> they might still be in process. So the creation, enrichment and
>>> refinement of such Entities in the knowledge base seems to be
>>> critical for a system like described in your use-case.
>>>
>>> On Tue, Jun 11, 2013 at 9:09 PM, Cristian Petroaca
>>> <cr...@gmail.com> wrote:
>>>>
>>>> First of all I have to mention that I am new in the field of
>>>> semantic technologies; I've started to read about them in the last
>>>> 4-5 months. Having said that, I have a high level overview of what
>>>> is a good approach to solve this problem. There are a number of
>>>> papers on the internet which describe what steps need to be taken,
>>>> such as: named entity recognition, co-reference resolution, POS
>>>> tagging and others.
>>>
>>> The Stanbol NLP processing module currently only supports sentence
>>> detection, tokenization, POS tagging, Chunking, NER and lemma.
>>> Support for co-reference resolution and dependency trees is
>>> currently missing.
>>>
>>> Stanford NLP is already integrated with Stanbol [4]. At the moment
>>> it only supports English, but I do already work to include the other
>>> supported languages. Other NLP frameworks that are already
>>> integrated with Stanbol are Freeling [5] and Talismane [6]. But note
>>> that for all those the integration excludes support for co-reference
>>> and dependency trees.
>>>
>>> Anyways I am confident that one can implement a first prototype by
>>> only using Sentences and POS tags and - if available - Chunks (e.g.
>>> Noun phrases).
>>
>> I assume that in the Stanbol context, a feature like Relation
>> extraction would be implemented as an EnhancementEngine?
>> What kind of effort would be required for a co-reference resolution
>> tool integration into Stanbol?
>
> Yes in the end it would be an EnhancementEngine. But before we can
> build such an engine we would need to
>
> * extend the Stanbol NLP processing API with Annotations for
> co-reference
> * add support for JSON Serialisation/Parsing for those annotations so
> that the RESTful NLP Analysis Service can provide co-reference
> information
>
>> At this moment I'll be focusing on 2 aspects:
>>
>> 1. Determine the best data structure to encapsulate the extracted
>> information. I'll take a closer look at Dolce.
>
> Don't make it too complex. Defining a proper structure to represent
> Events will only pay off if we can also successfully extract such
> information from processed texts.
>
> I would start with
>
>  * fise:SettingAnnotation
>     * {fise:Enhancement} metadata
>
>  * fise:ParticipantAnnotation
>     * {fise:Enhancement} metadata
>     * fise:inSetting {settingAnnotation}
>     * fise:hasMention {textAnnotation}
>     * fise:suggestion {entityAnnotation} (multiple if there are more
> suggestions)
>     * dc:type one of fise:Agent, fise:Patient, fise:Instrument,
> fise:Cause
>
>  * fise:OccurrentAnnotation
>     * {fise:Enhancement} metadata
>     * fise:inSetting {settingAnnotation}
>     * fise:hasMention {textAnnotation}
>     * dc:type set to fise:Activity
>
> If it turns out that we can extract more, we can add more structure
> to those annotations. We might also think about using an own
> namespace for those extensions to the annotation structure.
>
>> 2. Determine how all of this should be integrated into Stanbol.
>
> Just create an EventExtractionEngine and configure an enhancement
> chain that does NLP processing and EntityLinking.
>
> You should have a look at
>
> * SentimentSummarizationEngine [1] as it does a lot of things with
> NLP processing results (e.g. connecting adjectives (via verbs) to
> nouns/pronouns). So as long as we can not use explicit dependency
> trees your code will need to do similar things with Nouns, Pronouns
> and Verbs.
>
> * Disambiguation-MLT engine, as it creates a Java representation of
> present fise:TextAnnotation and fise:EntityAnnotation [2]. Something
> similar will also be required by the EventExtractionEngine for fast
> access to such annotations while iterating over the Sentences of the
> text.
>
> best
> Rupert
>
> [1] https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/sentiment-summarization/src/main/java/org/apache/stanbol/enhancer/engines/sentiment/summarize/SentimentSummarizationEngine.java
> [2] https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/disambiguation-mlt/src/main/java/org/apache/stanbol/enhancer/engine/disambiguation/mlt/DisambiguationData.java
>
>> Thanks
>>
>>> Hope this helps to bootstrap this discussion
>>>
>>> best
>>> Rupert
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>

Re: Relation extraction feature

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Cristian

If you have the start/end and type of the referenced Span you can use the corresponding

    AnalysedText#add**

e.g.

    AnalysedText#addToken(start, end)
    AnalysedText#addChunk(start, end)

methods and just use the returned instance. Those methods do all the
magic: if the referenced Span does not yet exist (forward reference),
they will create a new instance. If the Span already exists (backward
reference), you will get the existing instance, including all the
other annotations already parsed from the JSON. In the case of a
forward reference, other annotations will later be added to the Span
created by you in the same way.

This behavior is also the reason why the constructors of the TokenImpl
and ChunkImpl (and all other **Impl) are not public.

Similar code can be found in the

    AnalyzedTextParser#parseSpan(AnalysedText at, JsonNode node)

method (o.a.s.enhancer.nlp.json module)


So if you have a reference to a Span in your Java API:

(1) parse the start/end/type of the reference
(2) call add**(start, end) on the AnalysedText
(3) add the returned Span to your set of references

If you want your references to be sorted you should use NavigableSet
instead of Set.
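The three steps above can be sketched as a small self-contained Java program. Note that the `AnalysedText` and `Token` classes below are minimal stand-ins written only for this example - the real classes live in the o.a.s.enhancer.nlp module and have richer APIs - but the get-or-create behavior of `addToken(start, end)` mirrors what is described above:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class SpanReferenceDemo {

    // A span is uniquely identified by its start/end offsets.
    static final class Token {
        final int start, end;
        String posTag; // an annotation attached later by another parser
        Token(int start, int end) { this.start = start; this.end = end; }
    }

    // Stand-in for Stanbol's AnalysedText container.
    static final class AnalysedText {
        private final NavigableMap<String, Token> tokens = new TreeMap<>();

        // Returns the existing Token for (start, end) if one was already
        // created (backward reference), otherwise creates a new one
        // (forward reference) - mirroring AnalysedText#addToken(start, end).
        Token addToken(int start, int end) {
            return tokens.computeIfAbsent(start + ":" + end,
                    k -> new Token(start, end));
        }
    }

    public static void main(String[] args) {
        AnalysedText at = new AnalysedText();

        // Forward reference: the coref "mentions" array refers to a span
        // that has not been parsed yet - a new instance is created.
        Token mention = at.addToken(123, 130);

        // Later the parser reaches the span itself; addToken returns the
        // *same* instance, so annotations added now are visible through
        // the earlier reference as well.
        Token parsed = at.addToken(123, 130);
        parsed.posTag = "PN";

        System.out.println(mention == parsed); // same instance
        System.out.println(mention.posTag);    // annotation is visible
    }
}
```

This get-or-create behavior is also why keeping the `TokenImpl`/`ChunkImpl` constructors non-public is safe: all creation funnels through one place, so forward and backward references always resolve to the same object.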

best
Rupert

On Sun, Sep 15, 2013 at 2:32 PM, Cristian Petroaca
<cr...@gmail.com> wrote:
> I've already started to implement the Coreference bit first in the nlp and
> nlp-json projects. There's one thing that I don't know how to implement.
> The CorefTag class contains a Set<Span> mentions member (represents the
> "mentions" array defined in an earlier mail) and in the
> CorefTagSupport.parse() method I need to reconstruct the CorefTag object
> from json. I can't figure out how can I construct the aforementioned member
> which should contain the references to mentions which are Span objects found
> in the AnalyzedTextImpl. One problem is I don't have access to the
> AnalyzedTextImpl object and even if I did there could be situations in
> which I am constructing a CorefTag for a Span which contains mentions to
> other Spans which have not been parsed yet and they don't exist in the
> AnalyzedTextImpl.
>
> One solution would be not to link with the actual Span references from the
> AnalyzedTextImpl but to create new Span Objects (ChunkImpl, TokenImpl).
> That would need the ChunkImpl and TokenImpl constructors to be changed from
> protected to public.
>
>
> 2013/9/12 Rupert Westenthaler <ru...@gmail.com>
>
>> Hi Cristian,
>>
>> In fact I missed it. Sorry for that.
>>
>> I think the revised proposal looks like a good start. Usually one
>> needs to make some adaptations when writing the actual code.
>>
>> If you have a first version attach it to an issue and I will commit it
>> to the branch.
>>
>> best
>> Rupert
>>
>>
>> On Thu, Sep 12, 2013 at 9:04 AM, Cristian Petroaca
>> <cr...@gmail.com> wrote:
>> > Hi Rupert,
>> >
>> > This is a reminder in case you missed this e-mail.
>> >
>> > Cristian
>> >
>> >
>> > 2013/9/3 Cristian Petroaca <cr...@gmail.com>
>> >
>> >> Ok, then to sum it up we would have :
>> >>
>> >> 1. Coref
>> >>
>> >> "stanbol.enhancer.nlp.coref" {
>> >>     "isRepresentative" : true/false, // whether this token or chunk is the representative mention in the chain
>> >>     "mentions" : [ { "type" : "Token", // type of element which refers to this token/chunk
>> >>                      "start" : 123,    // start index of the mentioning element
>> >>                      "end" : 130       // end index of the mentioning element
>> >>                    }, ...
>> >>                  ],
>> >>     "class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
>> >> }
>> >>
>> >> 2. Dependency tree
>> >>
>> >> "stanbol.enhancer.nlp.dependency" : {
>> >>     "relations" : [ { "tag" : "nsubj",    // type of relation - Stanford NLP notation
>> >>                       "dep" : 12,         // type of relation - Stanbol NLP mapped value - ordinal number in enum Dependency
>> >>                       "role" : "gov/dep", // whether this token is the governor (gov) or the dependent (dep)
>> >>                       "type" : "Token",   // type of element with which this token is in relation
>> >>                       "start" : 123,      // start index of the related token
>> >>                       "end" : 130         // end index of the related token
>> >>                     },
>> >>                     ...
>> >>                   ],
>> >>     "class" : "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag"
>> >> }
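To make the structures in this summary concrete, here is a rough sketch of how they could look as plain Java classes before being wired into the Stanbol NLP API. All class and field names below are assumptions derived from the JSON drafts in this thread, not the actual o.a.s.enhancer.nlp types:

```java
import java.util.ArrayList;
import java.util.List;

public class TagSketch {

    // Shared span-reference element: {"type", "start", "end"} uniquely
    // identifies any Sentence, Chunk or Token in the AnalysedText.
    static final class SpanRef {
        final String type; final int start; final int end;
        SpanRef(String type, int start, int end) {
            this.type = type; this.start = start; this.end = end;
        }
        String toJson() {
            return String.format(
                "{\"type\":\"%s\",\"start\":%d,\"end\":%d}", type, start, end);
        }
    }

    // Corresponds to the "stanbol.enhancer.nlp.coref" annotation.
    static final class CorefTag {
        final boolean representative;
        final List<SpanRef> mentions = new ArrayList<>();
        CorefTag(boolean representative) { this.representative = representative; }
    }

    // One entry of the "relations" array in "stanbol.enhancer.nlp.dependency".
    static final class DependencyRelation {
        final String tag;      // NLP-tool notation, e.g. "nsubj"
        final int dep;         // ordinal of the mapped enum entry
        final String role;     // "gov" or "dep"
        final SpanRef partner; // the related token
        DependencyRelation(String tag, int dep, String role, SpanRef partner) {
            this.tag = tag; this.dep = dep; this.role = role; this.partner = partner;
        }
    }

    public static void main(String[] args) {
        CorefTag coref = new CorefTag(true);
        coref.mentions.add(new SpanRef("Token", 123, 130));
        System.out.println(coref.mentions.get(0).toJson());

        DependencyRelation rel = new DependencyRelation(
                "nsubj", 12, "gov", new SpanRef("Token", 123, 130));
        System.out.println(rel.tag + " -> " + rel.partner.toJson());
    }
}
```

In the real module, CorefTag and DependencyTag would extend the generic `Tag` base class and the JSON mapping would live in dedicated `*Support` classes in o.a.s.enhancer.nlp.json, as Rupert describes later in the thread.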
>> >>
>> >>
>> >> 2013/9/2 Rupert Westenthaler <ru...@gmail.com>
>> >>
>> >>> Hi Cristian,
>> >>>
>> >>> let me provide some feedback to your proposals:
>> >>>
>> >>> ### Referring other Spans
>> >>>
>> >>> Both suggested annotations require linking to other spans (Sentence,
>> >>> Chunk or Token). For that we should introduce a JSON element used for
>> >>> referring to those elements and use it for all usages.
>> >>>
>> >>> In the java model this would allow you to have a reference to the
>> >>> other Span (Sentence, Chunk, Token). In the serialized form you would
>> >>> have JSON elements with the "type", "start" and "end" attributes as
>> >>> those three uniquely identify any span.
>> >>>
>> >>> Here an example based on the "mention" attribute as defined by the
>> >>> proposed "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
>> >>>
>> >>>     ...
>> >>>     "mentions" : [ {
>> >>>         "type" : "Token",
>> >>>         "start": 123 ,
>> >>>         "end": 130 } ,{
>> >>>         "type" : "Token",
>> >>>         "start": 157 ,
>> >>>         "end": 165 }],
>> >>>     ...
>> >>>
>> >>> Similar token links in
>> >>> "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag" should also
>> >>> use this model.
>> >>>
>> >>> ### Usage of Controlled Vocabularies
>> >>>
>> >>> In addition the DependencyTag also seems to use a controlled
>> >>> vocabulary (e.g. 'nsubj', 'conj_and' ...). In such cases the Stanbol
>> >>> NLP module tries to define those in some kind of Ontology. For POS
>> >>> tags we use the OLiA ontology [1]. This is important as most NLP
>> >>> frameworks will use different strings and we need to unify those to
>> >>> common IDs so that components that consume those data do not depend
>> >>> on a specific NLP tool.
>> >>>
>> >>> Because the usage of Ontologies within Java is not well supported,
>> >>> the Stanbol NLP module defines Java Enumerations for those
>> >>> Ontologies, such as the POS type enumeration [2].
>> >>>
>> >>> Both the Java model and the JSON serialization support (1) the
>> >>> lexical tag as used by the NLP tool and (2) the mapped concept - in
>> >>> the Java API via two different methods, and in the JSON
>> >>> serialization via two separate keys.
>> >>>
>> >>> To make this clearer, here is an example of a POS annotation for a
>> >>> proper noun.
>> >>>
>> >>>     "stanbol.enhancer.nlp.pos" : {
>> >>>         "tag" : "PN",
>> >>>         "pos" : 53,
>> >>>         "class" : "org.apache.stanbol.enhancer.nlp.pos.PosTag",
>> >>>         "prob" : 0.95
>> >>>     }
>> >>>
>> >>> where
>> >>>
>> >>>     "tag" : "PN"
>> >>>
>> >>> is the lexical form as used by the NLP tool and
>> >>>
>> >>>     "pos" : 53
>> >>>
>> >>> refers to the ordinal number of the entry "ProperNoun" in the POS
>> >>> enumeration
>> >>>
>> >>> IMO the "type" property of DependencyTag should use a similar design.
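A tiny sketch of that dual representation may help. The enum below is a stand-in written for this example, not the real o.a.s.enhancer.nlp.pos.Pos enumeration (whose entries and ordinals differ):

```java
public class DualTagDemo {

    // Stand-in for a mapped vocabulary like the Pos enumeration.
    enum Pos { Noun, Verb, Adjective, ProperNoun }

    // Serialize both forms: the lexical tag from the NLP tool plus the
    // ordinal of the mapped enum entry, mirroring {"tag": "PN", "pos": 53}.
    static String serialize(String lexicalTag, Pos mapped) {
        return String.format("{\"tag\":\"%s\",\"pos\":%d}",
                lexicalTag, mapped.ordinal());
    }

    public static void main(String[] args) {
        // Stanford NLP would emit "NNP"; another tool might emit "PN".
        // Both map to the same enum entry, so consumers that read the
        // "pos" key stay independent of the concrete NLP tool.
        System.out.println(serialize("NNP", Pos.ProperNoun));
    }
}
```

The same pattern would apply to a Dependency enumeration for the "dep" key of the DependencyTag: keep the tool-specific string in "tag" and the tool-independent ordinal in "dep".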
>> >>>
>> >>> best
>> >>> Rupert
>> >>>
>> >>> [1] http://olia.nlp2rdf.org/
>> >>> [2] http://svn.apache.org/repos/asf/stanbol/trunk/enhancer/generic/nlp/src/main/java/org/apache/stanbol/enhancer/nlp/pos/Pos.java
>> >>>
>> >>> On Sun, Sep 1, 2013 at 8:09 PM, Cristian Petroaca
>> >>> <cr...@gmail.com> wrote:
>> >>> > Sorry, pressed sent too soon :).
>> >>> >
>> >>> > Continued :
>> >>> >
>> >>> > nsubj(met-4, Mary-1), conj_and(Mary-1, Tom-3), nsubj(met-4, Tom-3),
>> >>> > root(ROOT-0, met-4), nn(today-6, Danny-5), tmod(met-4, today-6)]
>> >>> >
>> >>> > Given this, we can have for each "Token" an additional dependency
>> >>> > annotation :
>> >>> >
>> >>> > "stanbol.enhancer.nlp.dependency" : {
>> >>> >     "tag" : // is it necessary?
>> >>> >     "relations" : [ { "type" : "nsubj", // type of relation
>> >>> >                       "role" : "gov/dep", // whether it is the depender or the dependee
>> >>> >                       "dependencyValue" : "met", // the word with which the token has a relation
>> >>> >                       "dependencyIndexInSentence" : "2" // the index of the dependency in the current sentence
>> >>> >                     },
>> >>> >                     ...
>> >>> >                   ],
>> >>> >     "class" : "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag"
>> >>> > }
>> >>> >
>> >>> > 2013/9/1 Cristian Petroaca <cr...@gmail.com>
>> >>> >
>> >>> >> Related to the Stanford Dependency Tree Feature, this is the way the
>> >>> >> output from the tool looks like for this sentence : "Mary and Tom
>> met
>> >>> Danny
>> >>> >> today" :
>> >>> >>
>> >>> >>
>> >>> >> 2013/8/30 Cristian Petroaca <cr...@gmail.com>
>> >>> >>
>> >>> >>> Hi Rupert,
>> >>> >>>
>> >>> >>> Ok, so after looking at the JSON output from the Stanford NLP
>> >>> >>> Server and the coref module I'm thinking I can represent the
>> >>> >>> coreference information this way:
>> >>> >>> Each "Token" or "Chunk" will contain an additional coref
>> >>> >>> annotation with the following structure :
>> >>> >>>
>> >>> >>> "stanbol.enhancer.nlp.coref" {
>> >>> >>>     "tag" : // does this need to exist?
>> >>> >>>     "isRepresentative" : true/false, // whether this token or chunk is the representative mention in the chain
>> >>> >>>     "mentions" : [ { "sentenceNo" : 1, // the sentence in which the mention is found
>> >>> >>>                      "startWord" : 2,  // the first word making up the mention
>> >>> >>>                      "endWord" : 3     // the last word making up the mention
>> >>> >>>                    }, ...
>> >>> >>>                  ],
>> >>> >>>     "class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
>> >>> >>> }
>> >>> >>>
>> >>> >>> The CorefTag should resemble this model.
>> >>> >>>
>> >>> >>> What do you think?
>> >>> >>>
>> >>> >>> Cristian
>> >>> >>>
>> >>> >>>
>> >>> >>> 2013/8/24 Rupert Westenthaler <ru...@gmail.com>
>> >>> >>>
>> >>> >>>> Hi Cristian,
>> >>> >>>>
>> >>> >>>> you cannot directly call StanfordNLP components from Stanbol; you
>> >>> >>>> have to extend the RESTful service to include the information you
>> >>> >>>> need. The main reason for that is that the license of StanfordNLP
>> >>> >>>> is not compatible with the Apache Software License, so Stanbol
>> >>> >>>> cannot directly link to the StanfordNLP API.
>> >>> >>>>
>> >>> >>>> You will need to
>> >>> >>>>
>> >>> >>>> 1. define an additional class {yourTag} that extends Tag<{yourType}>
>> >>> >>>> in the o.a.s.enhancer.nlp module
>> >>> >>>> 2. add JSON parsing and serialization support for this tag to the
>> >>> >>>> o.a.s.enhancer.nlp.json module (see e.g. PosTagSupport as an example)
>> >>> >>>>
>> >>> >>>> As (1) would be necessary anyway, the only additional thing you
>> >>> >>>> need to develop is (2). After that you can add {yourTag} instances
>> >>> >>>> to the AnalyzedText in the StanfordNLP integration. The
>> >>> >>>> RestfulNlpAnalysisEngine will parse them from the response. All
>> >>> >>>> engines executed after the RestfulNlpAnalysisEngine will have
>> >>> >>>> access to your annotations.
>> >>> >>>>
>> >>> >>>> If you have a design for {yourTag} - the model you would like to
>> use
>> >>> >>>> to represent your data - I can help with (1) and (2).
>> >>> >>>>
>> >>> >>>> best
>> >>> >>>> Rupert
>> >>> >>>>
>> >>> >>>>
>> >>> >>>> On Fri, Aug 23, 2013 at 5:11 PM, Cristian Petroaca
>> >>> >>>> <cr...@gmail.com> wrote:
>> >>> >>>> > Hi Rupert,
>> >>> >>>> >
>> >>> >>>> > Thanks for the info. Looking at the stanbol-stanfordnlp project
>> >>> >>>> > I see that
>> >>> >>>> > the stanford nlp is not implemented as an EnhancementEngine but
>> >>> rather
>> >>> >>>> it
>> >>> >>>> > is used directly in a Jetty Server instance. How does that fit
>> >>> into the
>> >>> >>>> > Stanbol stack? For example how can I call the
>> StanfordNlpAnalyzer's
>> >>> >>>> routine
>> >>> >>>> > from my TripleExtractionEnhancementEngine which lives in the
>> >>> Stanbol
>> >>> >>>> stack?
>> >>> >>>> >
>> >>> >>>> > Thanks,
>> >>> >>>> > Cristian
>> >>> >>>> >
>> >>> >>>> >
>> >>> >>>> > 2013/8/12 Rupert Westenthaler <ru...@gmail.com>
>> >>> >>>> >
>> >>> >>>> >> Hi Cristian,
>> >>> >>>> >>
>> >>> >>>> >> Sorry for the late response, but I was offline for the last two
>> >>> weeks
>> >>> >>>> >>
>> >>> >>>> >> On Fri, Aug 2, 2013 at 9:19 PM, Cristian Petroaca
>> >>> >>>> >> <cr...@gmail.com> wrote:
>> >>> >>>> >> > Hi Rupert,
>> >>> >>>> >> >
>> >>> >>>> >> > After doing some tests it seems that the Stanford NLP
>> >>> coreference
>> >>> >>>> module
>> >>> >>>> >> is
>> >>> >>>> >> > much more accurate than the Open NLP one. So I decided to
>> >>> >>>> >> > extend Stanford NLP to add coreference there.
>> >>> >>>> >>
>> >>> >>>> >> The Stanford NLP integration is not part of the Stanbol
>> codebase
>> >>> >>>> >> because the licenses are not compatible.
>> >>> >>>> >>
>> >>> >>>> >> You can find the Stanford NLP integration on
>> >>> >>>> >>
>> >>> >>>> >>     https://github.com/westei/stanbol-stanfordnlp
>> >>> >>>> >>
>> >>> >>>> >> just create a fork and send pull requests.
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >> > Could you add the necessary projects on the branch? And also
>> >>> remove
>> >>> >>>> the
>> >>> >>>> >> > Open NLP ones?
>> >>> >>>> >> >
>> >>> >>>> >>
>> >>> >>>> >> Currently the branch
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>>
>> >>>
>> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
>> >>> >>>> >>
>> >>> >>>> >> only contains the "nlp" and the "nlp-json" modules. IMO those
>> >>> should
>> >>> >>>> >> be enough for adding coreference support.
>> >>> >>>> >>
>> >>> >>>> >> IMO you will need to
>> >>> >>>> >>
>> >>> >>>> >> * add a model for representing coreference to the nlp module
>> >>> >>>> >> * add parsing and serializing support to the nlp-json module
>> >>> >>>> >> * add the implementation to your fork of the
>> stanbol-stanfordnlp
>> >>> >>>> project
>> >>> >>>> >>
>> >>> >>>> >> best
>> >>> >>>> >> Rupert
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >>
>> >>> >>>> >> > Thanks,
>> >>> >>>> >> > Cristian
>> >>> >>>> >> >
>> >>> >>>> >> >
>> >>> >>>> >> > 2013/7/5 Rupert Westenthaler <ru...@gmail.com>
>> >>> >>>> >> >
>> >>> >>>> >> >> Hi Cristian,
>> >>> >>>> >> >>
>> >>> >>>> >> >> I created the branch at
>> >>> >>>> >> >>
>> >>> >>>> >> >>
>> >>> >>>> >> >>
>> >>> >>>> >>
>> >>> >>>>
>> >>>
>> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
>> >>> >>>> >> >>
>> >>> >>>> >> >> ATM in contains only the "nlp" and "nlp-json" module. Let me
>> >>> know
>> >>> >>>> if
>> >>> >>>> >> >> you would like to have more
>> >>> >>>> >> >>
>> >>> >>>> >> >> best
>> >>> >>>> >> >> Rupert
>> >>> >>>> >> >>
>> >>> >>>> >> >>
>> >>> >>>> >> >>
>> >>> >>>> >> >> On Thu, Jul 4, 2013 at 10:14 AM, Cristian Petroaca
>> >>> >>>> >> >> <cr...@gmail.com> wrote:
>> >>> >>>> >> >> > Hi Rupert,
>> >>> >>>> >> >> >
>> >>> >>>> >> >> > I created jiras :
>> >>> >>>> https://issues.apache.org/jira/browse/STANBOL-1132 and
>> >>> >>>> >> >> > https://issues.apache.org/jira/browse/STANBOL-1133. The
>> >>> >>>> original one
>> >>> >>>> >> in
>> >>> >>>> >> >> > dependent upon these.
>> >>> >>>> >> >> > Please let me know when I can start using the branch.
>> >>> >>>> >> >> >
>> >>> >>>> >> >> > Thanks,
>> >>> >>>> >> >> > Cristian
>> >>> >>>> >> >> >
>> >>> >>>> >> >> >
>> >>> >>>> >> >> > 2013/6/27 Cristian Petroaca <cr...@gmail.com>
>> >>> >>>> >> >> >
>> >>> >>>> >> >> >>
>> >>> >>>> >> >> >>
>> >>> >>>> >> >> >>
>> >>> >>>> >> >> >> 2013/6/27 Rupert Westenthaler <
>> >>> rupert.westenthaler@gmail.com>
>> >>> >>>> >> >> >>
>> >>> >>>> >> >> >>> On Thu, Jun 27, 2013 at 3:12 PM, Cristian Petroaca
>> >>> >>>> >> >> >>> <cr...@gmail.com> wrote:
>> >>> >>>> >> >> >>> > Sorry, I meant the Stanbol NLP API, not Stanford in my
>> >>> >>>> previous
>> >>> >>>> >> >> e-mail.
>> >>> >>>> >> >> >>> By
>> >>> >>>> >> >> >>> > the way, does Open NLP have the ability to build
>> >>> dependency
>> >>> >>>> trees?
>> >>> >>>> >> >> >>> >
>> >>> >>>> >> >> >>>
>> >>> >>>> >> >> >>> AFAIK OpenNLP does not provide this feature.
>> >>> >>>> >> >> >>>
>> >>> >>>> >> >> >>
>> >>> >>>> >> >> >> Then , since the Stanford NLP lib is also integrated into
>> >>> >>>> Stanbol,
>> >>> >>>> >> I'll
>> >>> >>>> >> >> >> take a look at how I can extend its integration to
>> include
>> >>> the
>> >>> >>>> >> >> dependency
>> >>> >>>> >> >> >> tree feature.
>> >>> >>>> >> >> >>
>> >>> >>>> >> >> >>>
>> >>> >>>> >> >> >>>
>> >>> >>>> >> >> >>  >
>> >>> >>>> >> >> >>> > 2013/6/23 Cristian Petroaca <
>> cristian.petroaca@gmail.com
>> >>> >
>> >>> >>>> >> >> >>> >
>> >>> >>>> >> >> >>> >> Hi Rupert,
>> >>> >>>> >> >> >>> >>
>> >>> >>>> >> >> >>> >> I created jira
>> >>> >>>> >> https://issues.apache.org/jira/browse/STANBOL-1121.
>> >>> >>>> >> >> >>> >> As you suggested I would start with extending the
>> >>> Stanford
>> >>> >>>> NLP
>> >>> >>>> >> with
>> >>> >>>> >> >> >>> >> co-reference resolution but I think also with
>> dependency
>> >>> >>>> trees
>> >>> >>>> >> >> because
>> >>> >>>> >> >> >>> I
>> >>> >>>> >> >> >>> >> also need to know the Subject of the sentence and the
>> >>> object
>> >>> >>>> >> that it
>> >>> >>>> >> >> >>> >> affects, right?
>> >>> >>>> >> >> >>> >>
>> >>> >>>> >> >> >>> >> Given that I need to extend the Stanford NLP API in
>> >>> Stanbol
>> >>> >>>> for
>> >>> >>>> >> >> >>> >> co-reference and dependency trees, how do I proceed
>> with
>> >>> >>>> this?
>> >>> >>>> >> Do I
>> >>> >>>> >> >> >>> create
>> >>> >>>> >> >> >>> >> 2 new sub-tasks to the already opened Jira? After
>> that
>> >>> can I
>> >>> >>>> >> start
>> >>> >>>> >> >> >>> >> implementing on my local copy of Stanbol and when I'm
>> >>> done
>> >>> >>>> I'll
>> >>> >>>> >> send
>> >>> >>>> >> >> >>> you
>> >>> >>>> >> >> >>> >> guys the patch for review?
>> >>> >>>> >> >> >>> >>
>> >>> >>>> >> >> >>>
>> >>> >>>> >> >> >>> I would create two "New Feature" type Issues one for
>> adding
>> >>> >>>> support
>> >>> >>>> >> >> >>> for "dependency trees" and the other for "co-reference"
>> >>> >>>> support. You
>> >>> >>>> >> >> >>> should also define "depends on" relations between
>> >>> STANBOL-1121
>> >>> >>>> and
>> >>> >>>> >> >> >>> those two new issues.
>> >>> >>>> >> >> >>>
>> >>> >>>> >> >> >>> Sub-task could also work, but as adding those features
>> >>> would
>> >>> >>>> be also
>> >>> >>>> >> >> >>> interesting for other things I would rather define them
>> as
>> >>> >>>> separate
>> >>> >>>> >> >> >>> issues.
>> >>> >>>> >> >> >>>
>> >>> >>>> >> >> >>>
>> >>> >>>> >> >> >> 2 New Features connected with the original jira it is
>> then.
>> >>> >>>> >> >> >>
>> >>> >>>> >> >> >>
>> >>> >>>> >> >> >>> If you would prefer to work in an own branch please tell
>> >>> me.
>> >>> >>>> This
>> >>> >>>> >> >> >>> could have the advantage that patches would not be
>> >>> affected by
>> >>> >>>> >> changes
>> >>> >>>> >> >> >>> in the trunk.
>> >>> >>>> >> >> >>>
>> >>> >>>> >> >> >>> Yes, a separate branch sounds good.
>> >>> >>>> >> >> >>
>> >>> >>>> >> >> >> best
>> >>> >>>> >> >> >>> Rupert
>> >>> >>>> >> >> >>>
>> >>> >>>> >> >> >>> >> Regards,
>> >>> >>>> >> >> >>> >> Cristian
>> >>> >>>> >> >> >>> >>
>> >>> >>>> >> >> >>> >>
>> >>> >>>> >> >> >>> >> 2013/6/18 Rupert Westenthaler <
>> >>> >>>> rupert.westenthaler@gmail.com>
>> >>> >>>> >> >> >>> >>
>> >>> >>>> >> >> >>> >>> On Mon, Jun 17, 2013 at 10:18 PM, Cristian Petroaca
>> >>> >>>> >> >> >>> >>> <cr...@gmail.com> wrote:
>> >>> >>>> >> >> >>> >>> > Hi Rupert,
>> >>> >>>> >> >> >>> >>> >
>> >>> >>>> >> >> >>> >>> > Agreed on the
>> >>> >>>> >> >> >>>
>> SettingAnnotation/ParticipantAnnotation/OccurrentAnnotation
>> >>> >>>> >> >> >>> >>> > data structure.
>> >>> >>>> >> >> >>> >>> >
>> >>> >>>> >> >> >>> >>> > Should I open up a Jira for all of this in order
>> to
>> >>> >>>> >> encapsulate
>> >>> >>>> >> >> this
>> >>> >>>> >> >> >>> >>> > information and establish the goals and these
>> initial
>> >>> >>>> steps
>> >>> >>>> >> >> towards
>> >>> >>>> >> >> >>> >>> these
>> >>> >>>> >> >> >>> >>> > goals?
>> >>> >>>> >> >> >>> >>>
>> >>> >>>> >> >> >>> >>> Yes please. A JIRA issue for this work would be
>> great.
>> >>> >>>> >> >> >>> >>>
>> >>> >>>> >> >> >>> >>> > How should I proceed further? Should I create some
>> >>> design
>> >>> >>>> >> >> documents
>> >>> >>>> >> >> >>> that
>> >>> >>>> >> >> >>> >>> > need to be reviewed?
>> >>> >>>> >> >> >>> >>>
>> >>> >>>> >> >> >>> >>> Usually it is the best to write design related text
>> >>> >>>> directly in
>> >>> >>>> >> >> JIRA
>> >>> >>>> >> >> >>> >>> by using Markdown [1] syntax. This will allow us
>> later
>> >>> to
>> >>> >>>> use
>> >>> >>>> >> this
>> >>> >>>> >> >> >>> >>> text directly for the documentation on the Stanbol
>> >>> Webpage.
>> >>> >>>> >> >> >>> >>>
>> >>> >>>> >> >> >>> >>> best
>> >>> >>>> >> >> >>> >>> Rupert
>> >>> >>>> >> >> >>> >>>
>> >>> >>>> >> >> >>> >>>
>> >>> >>>> >> >> >>> >>> [1] http://daringfireball.net/projects/markdown/
>> >>> >>>> >> >> >>> >>> >
>> >>> >>>> >> >> >>> >>> > Regards,
>> >>> >>>> >> >> >>> >>> > Cristian
>> >>> >>>> >> >> >>> >>> >
>> >>> >>>> >> >> >>> >>> >
>> >>> >>>> >> >> >>> >>> > 2013/6/17 Rupert Westenthaler <
>> >>> >>>> rupert.westenthaler@gmail.com>
>> >>> >>>> >> >> >>> >>> >
>> >>> >>>> >> >> >>> >>> >> On Thu, Jun 13, 2013 at 8:22 PM, Cristian
>> Petroaca
>> >>> >>>> >> >> >>> >>> >> <cr...@gmail.com> wrote:
>> >>> >>>> >> >> >>> >>> >> > HI Rupert,
>> >>> >>>> >> >> >>> >>> >> >
>> >>> >>>> >> >> >>> >>> >> > First of all thanks for the detailed
>> suggestions.
>> >>> >>>> >> >> >>> >>> >> >
>> >>> >>>> >> >> >>> >>> >> > 2013/6/12 Rupert Westenthaler <
>> >>> >>>> >> rupert.westenthaler@gmail.com>
>> >>> >>>> >> >> >>> >>> >> >
>> >>> >>>> >> >> >>> >>> >> >> Hi Cristian, all
>> >>> >>>> >> >> >>> >>> >> >>
>> >>> >>>> >> >> >>> >>> >> >> really interesting use case!
>> >>> >>>> >> >> >>> >>> >> >>
>> >>> >>>> >> >> >>> >>> >> >> In this mail I will try to give some
>> suggestions
>> >>> on
>> >>> >>>> how
>> >>> >>>> >> this
>> >>> >>>> >> >> >>> could
>> >>> >>>> >> >> >>> >>> >> >> work out. This suggestions are mainly based on
>> >>> >>>> experiences
>> >>> >>>> >> >> and
>> >>> >>>> >> >> >>> >>> lessons
>> >>> >>>> >> >> >>> >>> >> >> learned in the LIVE [2] project where we
>> built an
>> >>> >>>> >> information
>> >>> >>>> >> >> >>> system
>> >>> >>>> >> >> >>> >>> >> >> for the Olympic Games in Peking. While this
>> >>> Project
>> >>> >>>> >> excluded
>> >>> >>>> >> >> the
>> >>> >>>> >> >> >>> >>> >> >> extraction of Events from unstructured text
>> >>> (because
>> >>> >>>> the
>> >>> >>>> >> >> Olympic
>> >>> >>>> >> >> >>> >>> >> >> Information System was already providing event
>> >>> data
>> >>> >>>> as XML
>> >>> >>>> >> >> >>> messages)
>> >>> >>>> >> >> >>> >>> >> >> the semantic search capabilities of this
>> system
>> >>> >>>> where very
>> >>> >>>> >> >> >>> similar
>> >>> >>>> >> >> >>> >>> as
>> >>> >>>> >> >> >>> >>> >> >> the one described by your use case.
>> >>> >>>> >> >> >>> >>> >> >>
>> >>> >>>> >> >> >>> >>> >> >> IMHO you are not only trying to extract
>> >>> relations,
>> >>> >>>> but a
>> >>> >>>> >> >> formal
>> >>> >>>> >> >> >>> >>> >> >> representation of the situation described by
>> the
>> >>> >>>> text. So
>> >>> >>>> >> >> lets
>> >>> >>>> >> >> >>> >>> assume
>> >>> >>>> >> >> >>> >>> >> >> that the goal is to Annotate a Setting (or
>> >>> Situation)
>> >>> >>>> >> >> described
>> >>> >>>> >> >> >>> in
>> >>> >>>> >> >> >>> >>> the
>> >>> >>>> >> >> >>> >>> >> >> text - a fise:SettingAnnotation.
>> >>> >>>> >> >> >>> >>> >> >>
>> >>> >>>> >> >> >>> >>> >> >> The DOLCE foundational ontology [1] gives some
>> >>> >>>> advices on
>> >>> >>>> >> >> how to
>> >>> >>>> >> >> >>> >>> model
>> >>> >>>> >> >> >>> >>> >> >> those. The important relation for modeling
>> this
>> >>> >>>> >> >> Participation:
>> >>> >>>> >> >> >>> >>> >> >>
>> >>> >>>> >> >> >>> >>> >> >>     PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t))
>> >>> >>>> >> >> >>> >>> >> >>
>> >>> >>>> >> >> >>> >>> >> >> where ..
>> >>> >>>> >> >> >>> >>> >> >>
>> >>> >>>> >> >> >>> >>> >> >>  * ED are Endurants (continuants): Endurants
>> do
>> >>> have
>> >>> >>>> an
>> >>> >>>> >> >> >>> identity so
>> >>> >>>> >> >> >>> >>> we
>> >>> >>>> >> >> >>> >>> >> >> would typically refer to them as Entities
>> >>> referenced
>> >>> >>>> by a
>> >>> >>>> >> >> >>> setting.
>> >>> >>>> >> >> >>> >>> >> >> Note that this includes physical,
>> non-physical as
>> >>> >>>> well as
>> >>> >>>> >> >> >>> >>> >> >> social-objects.
>> >>> >>>> >> >> >>> >>> >> >>  * PD are Perdurants (occurrents):  Perdurants
>> >>> are
>> >>> >>>> >> entities
>> >>> >>>> >> >> that
>> >>> >>>> >> >> >>> >>> >> >> happen in time. This refers to Events,
>> >>> Activities ...
>> >>> >>>> >> >> >>> >>> >> >>  * PC are Participation: It is an time indexed
>> >>> >>>> relation
>> >>> >>>> >> where
>> >>> >>>> >> >> >>> >>> >> >> Endurants participate in Perdurants
>> >>> >>>> >> >> >>> >>> >> >>
>> >>> >>>> >> >> >>> >>> >> >> Modeling this in RDF requires to define some
>> >>> >>>> intermediate
>> >>> >>>> >> >> >>> resources
>> >>> >>>> >> >> >>> >>> >> >> because RDF does not allow for n-ary
>> relations.
>> >>> >>>> >> >> >>> >>> >> >>
>> >>> >>>> >> >> >>> >>> >> >>  * fise:SettingAnnotation: It is really handy
>> to
>> >>> >>>> define
>> >>> >>>> >> one
>> >>> >>>> >> >> >>> resource
>> >>> >>>> >> >> >>> >>> >> >> being the context for all described data. I
>> would
>> >>> >>>> call
>> >>> >>>> >> this
>> >>> >>>> >> >> >>> >>> >> >> "fise:SettingAnnotation" and define it as a
>> >>> >>>> sub-concept to
>> >>> >>>> >> >> >>> >>> >> >> fise:Enhancement. All further enhancement
>> about
>> >>> the
>> >>> >>>> >> extracted
>> >>> >>>> >> >> >>> >>> Setting
>> >>> >>>> >> >> >>> >>> >> >> would define a "fise:in-setting" relation to
>> it.
>> >>> >>>> >> >> >>> >>> >> >>
>> >>> >>>> >> >> >>> >>> >> >>  * fise:ParticipantAnnotation: Is used to
>> >>> annotate
>> >>> >>>> that
>> >>> >>>> >> >> >>> Endurant is
>> >>> >>>> >> >> >>> >>> >> >> participating on a setting (fise:in-setting
>> >>> >>>> >> >> >>> fise:SettingAnnotation).
>> >>> >>>> >> >> >>> >>> >> >> The Endurant itself is described by existing
>> >>> >>>> >> >> fise:TextAnnotaion
>> >>> >>>> >> >> >>> (the
>> >>> >>>> >> >> >>> >>> >> >> mentions) and fise:EntityAnnotation (suggested
>> >>> >>>> Entities).
>> >>> >>>> >> >> >>> Basically
>> >>> >>>> >> >> >>> >>> >> >> the fise:ParticipantAnnotation will allow an
>> >>> >>>> >> >> EnhancementEngine
>> >>> >>>> >> >> >>> to
>> >>> >>>> >> >> >>> >>> >> >> state that several mentions (in possible
>> >>> different
>> >>> >>>> >> >> sentences) do
>> >>> >>>> >> >> >>> >>> >> >> represent the same Endurant as participating
>> in
>> >>> the
>> >>> >>>> >> Setting.
>> >>> >>>> >> >> In
>> >>> >>>> >> >> >>> >>> >> >> addition it would be possible to use the
>> dc:type
>> >>> >>>> property
>> >>> >>>> >> >> >>> (similar
>> >>> >>>> >> >> >>> >>> as
>> >>> >>>> >> >> >>> >>> >> >> for fise:TextAnnotation) to refer to the
>> role(s)
>> >>> of
>> >>> >>>> an
>> >>> >>>> >> >> >>> participant
>> >>> >>>> >> >> >>> >>> >> >> (e.g. the set: Agent (intensionally performs
>> an
>> >>> >>>> action)
>> >>> >>>> >> Cause
>> >>> >>>> >> >> >>> >>> >> >> (unintentionally e.g. a mud slide), Patient (a
>> >>> >>>> passive
>> >>> >>>> >> role
>> >>> >>>> >> >> in
>> >>> >>>> >> >> >>> an
>> >>> >>>> >> >> >>> >>> >> >> activity) and Instrument (aids an process)),
>> but
>> >>> I am
>> >>> >>>> >> >> wondering
>> >>> >>>> >> >> >>> if
>> >>> >>>> >> >> >>> >>> one
>> >>> >>>> >> >> >>> >>> >> >> could extract those information.
>> >>> >>>> >> >> >>> >>> >> >>
>> >>> >>>> >> >> >>> >>> >> >> * fise:OccurrentAnnotation: is used to
>> annotate a
>> >>> >>>> >> Perdurant
>> >>> >>>> >> >> in
>> >>> >>>> >> >> >>> the
>> >>> >>>> >> >> >>> >>> >> >> context of the Setting. Also
>> >>> >>>> fise:OccurrentAnnotation can
>> >>> >>>> >> >> link
>> >>> >>>> >> >> >>> to
>> >>> >>>> >> >> >>> >>> >> >> fise:TextAnnotaion (typically verbs in the
>> text
>> >>> >>>> defining
>> >>> >>>> >> the
>> >>> >>>> >> >> >>> >>> >> >> perdurant) as well as fise:EntityAnnotation
>> >>> >>>> suggesting
>> >>> >>>> >> well
>> >>> >>>> >> >> >>> known
>> >>> >>>> >> >> >>> >>> >> >> Events in a knowledge base (e.g. a Election
>> in a
>> >>> >>>> country,
>> >>> >>>> >> or
>> >>> >>>> >> >> an
>> >>> >>>> >> >> >>> >>> >> >> upraising ...). In addition
>> >>> fise:OccurrentAnnotation
>> >>> >>>> can
>> >>> >>>> >> >> define
>> >>> >>>> >> >> >>> >>> >> >> dc:has-participant links to
>> >>> >>>> fise:ParticipantAnnotation. In
>> >>> >>>> >> >> this
>> >>> >>>> >> >> >>> case
>> >>> >>>> >> >> >>> >>> >> >> it is explicitly stated that an Endurant (the
>> >>> >>>> >> >> >>> >>> >> >> fise:ParticipantAnnotation) is involved in this
>> >>> >>>> Perdurant
>> >>> >>>> >> (the
>> >>> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation). As Occurrences are
>> >>> >>>> temporal
>> >>> >>>> >> >> indexed
>> >>> >>>> >> >> >>> this
>> >>> >>>> >> >> >>> >>> >> >> annotation should also support properties for
>> >>> >>>> defining the
>> >>> >>>> >> >> >>> >>> >> >> xsd:dateTime for the start/end.
>> >>> >>>> >> >> >>> >>> >> >>
>> >>> >>>> >> >> >>> >>> >> >>
>> >>> >>>> >> >> >>> >>> >> >> Indeed, an event based data structure makes a
>> >>> lot of
>> >>> >>>> sense
>> >>> >>>> >> >> with
>> >>> >>>> >> >> >>> the
>> >>> >>>> >> >> >>> >>> >> remark
>> >>> >>>> >> >> >>> >>> >> > that you probably won't be able to always
>> extract
>> >>> the
>> >>> >>>> date
>> >>> >>>> >> >> for a
>> >>> >>>> >> >> >>> >>> given
>> >>> >>>> >> >> >>> >>> >> > setting(situation).
>> >>> >>>> >> >> >>> >>> >> > There are 2 thing which are unclear though.
>> >>> >>>> >> >> >>> >>> >> >
>> >>> >>>> >> >> >>> >>> >> > 1. Perdurant : You could have situations in
>> which
>> >>> the
>> >>> >>>> >> object
>> >>> >>>> >> >> upon
>> >>> >>>> >> >> >>> >>> which
>> >>> >>>> >> >> >>> >>> >> the
>> >>> >>>> >> >> >>> >>> >> > Subject ( or Endurant ) is acting is not a
>> >>> transitory
>> >>> >>>> >> object (
>> >>> >>>> >> >> >>> such
>> >>> >>>> >> >> >>> >>> as an
>> >>> >>>> >> >> >>> >>> >> > event, activity ) but rather another Endurant.
>> For
>> >>> >>>> example
>> >>> >>>> >> we
>> >>> >>>> >> >> can
>> >>> >>>> >> >> >>> >>> have
>> >>> >>>> >> >> >>> >>> >> the
>> >>> >>>> >> >> >>> >>> >> > phrase "USA invades Irak" where "USA" is the
>> >>> Endurant
>> >>> >>>> (
>> >>> >>>> >> >> Subject )
>> >>> >>>> >> >> >>> >>> which
>> >>> >>>> >> >> >>> >>> >> > performs the action of "invading" on another
>> >>> >>>> Eundurant,
>> >>> >>>> >> namely
>> >>> >>>> >> >> >>> >>> "Irak".
>> >>> >>>> >> >> >>> >>> >> >
>> >>> >>>> >> >> >>> >>> >>
>> >>> >>>> >> >> >>> >>> >> By using CAOS, USA would be the Agent and Iraq
>> the
>> >>> >>>> Patient.
>> >>> >>>> >> Both
>> >>> >>>> >> >> >>> are
>> >>> >>>> >> >> >>> >>> >> Endurants. The activity "invading" would be the
>> >>> >>>> Perdurant. So
>> >>> >>>> >> >> >>> ideally
>> >>> >>>> >> >> >>> >>> >> you would have a  "fise:SettingAnnotation" with:
>> >>> >>>> >> >> >>> >>> >>
>> >>> >>>> >> >> >>> >>> >>   * fise:ParticipantAnnotation for USA with the
>> >>> dc:type
>> >>> >>>> >> >> caos:Agent,
>> >>> >>>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "USA" and a
>> >>> >>>> >> >> >>> fise:EntityAnnotation
>> >>> >>>> >> >> >>> >>> >> linking to dbpedia:United_States
>> >>> >>>> >> >> >>> >>> >>   * fise:ParticipantAnnotation for Iraq with the
>> >>> dc:type
>> >>> >>>> >> >> >>> caos:Patient,
>> >>> >>>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "Irak" and a
>> >>> >>>> >> >> >>> >>> >> fise:EntityAnnotation linking to  dbpedia:Iraq
>> >>> >>>> >> >> >>> >>> >>   * fise:OccurrentAnnotation for "invades" with
>> the
>> >>> >>>> dc:type
>> >>> >>>> >> >> >>> >>> >> caos:Activity, linking to a fise:TextAnnotation
>> for
>> >>> >>>> "invades"
>> >>> >>>> >> >> >>> >>> >>
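Serialized as RDF, the three annotations sketched in the bullets above could look roughly like the following Turtle. The URIs are invented placeholders and the exact fise:/caos: property names are assumptions, since this part of the vocabulary is only being proposed in this thread:

```turtle
@prefix fise: <http://fise.iks-project.eu/ontology/> .
@prefix dc:   <http://purl.org/dc/terms/> .
@prefix caos: <urn:example:caos#> .   # placeholder; the thread gives no namespace

<urn:enhancement:setting1> a fise:SettingAnnotation .

<urn:enhancement:participant1> a fise:ParticipantAnnotation ;
    fise:inSetting <urn:enhancement:setting1> ;
    dc:type caos:Agent ;                              # "USA"
    fise:hasMention <urn:textAnnotation:usa> ;
    fise:suggestion <urn:entityAnnotation:usa> .      # -> dbpedia:United_States

<urn:enhancement:participant2> a fise:ParticipantAnnotation ;
    fise:inSetting <urn:enhancement:setting1> ;
    dc:type caos:Patient ;                            # "Irak"
    fise:hasMention <urn:textAnnotation:irak> ;
    fise:suggestion <urn:entityAnnotation:iraq> .     # -> dbpedia:Iraq

<urn:enhancement:occurrent1> a fise:OccurrentAnnotation ;
    fise:inSetting <urn:enhancement:setting1> ;
    dc:type caos:Activity ;                           # "invades"
    fise:hasMention <urn:textAnnotation:invades> ;
    dc:has-participant <urn:enhancement:participant1> ,
                       <urn:enhancement:participant2> .
```

This makes the n-ary participation relation explicit: both Endurants and the Perdurant hang off the single setting resource, so a consumer can retrieve the whole situation with one query anchored on the SettingAnnotation.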
>> >>> >>>> >> >> >>> >>> >> > 2. Where does the verb, which links the Subject
>> >>> and
>> >>> >>>> the
>> >>> >>>> >> Object
>> >>> >>>> >> >> >>> come
>> >>> >>>> >> >> >>> >>> into
>> >>> >>>> >> >> >>> >>> >> > this? I imagined that the Endurant would have a
>> >>> >>>> >> dc:"property"
>> >>> >>>> >> >> >>> where
>> >>> >>>> >> >> >>> >>> the
>> >>> >>>> >> >> >>> >>> >> > property = verb which links to the Object in
>> noun
>> >>> >>>> form. For
>> >>> >>>> >> >> >>> example
>> >>> >>>> >> >> >>> >>> take
>> >>> >>>> >> >> >>> >>> >> > again the sentence "USA invades Irak". You
>> would
>> >>> have
>> >>> >>>> the
>> >>> >>>> >> >> "USA"
>> >>> >>>> >> >> >>> >>> Entity
>> >>> >>>> >> >> >>> >>> >> with
>> >>> >>>> >> >> >>> >>> >> > dc:invader which points to the Object "Irak".
>> The
>> >>> >>>> Endurant
>> >>> >>>> >> >> would
>> >>> >>>> >> >> >>> >>> have as
>> >>> >>>> >> >> >>> >>> >> > many dc:"property" elements as there are verbs
>> >>> which
>> >>> >>>> link
>> >>> >>>> >> it
>> >>> >>>> >> >> to
>> >>> >>>> >> >> >>> an
>> >>> >>>> >> >> >>> >>> >> Object.
>> >>> >>>> >> >> >>> >>> >>
>> >>> >>>> >> >> >>> >>> >> As explained above you would have a
>> >>> >>>> fise:OccurrentAnnotation
>> >>> >>>> >> >> that
>> >>> >>>> >> >> >>> >>> >> represents the Perdurant. The information that
>> the
>> >>> >>>> activity
>> >>> >>>> >> >> >>> mention in
>> >>> >>>> >> >> >>> >>> >> the text is "invades" would be by linking to a
>> >>> >>>> >> >> >>> fise:TextAnnotation. If
>> >>> >>>> >> >> >>> >>> >> you can also provide an Ontology for Tasks that
>> >>> defines
>> >>> >>>> >> >> >>> >>> >> "myTasks:invade" the fise:OccurrentAnnotation
>> could
>> >>> >>>> also link
>> >>> >>>> >> >> to an
>> >>> >>>> >> >> >>> >>> >> fise:EntityAnnotation for this concept.
>> >>> >>>> >> >> >>> >>> >>
>> >>> >>>> >> >> >>> >>> >> best
>> >>> >>>> >> >> >>> >>> >> Rupert
>> >>> >>>> >> >> >>> >>> >>
>> >>> >>>> >> >> >>> >>> >> >
>> >>> >>>> >> >> >>> >>> >> > ### Consuming the data:
>> >>> >>>> >> >> >>> >>> >> >>
>> >>> >>>> >> >> >>> >>> >> >> I think this model should be sufficient for
>> >>> >>>> use-cases as
>> >>> >>>> >> >> >>> described
>> >>> >>>> >> >> >>> >>> by
>> >>> >>>> >> >> >>> >>> >> you.
>> >>> >>>> >> >> >>> >>> >> >>
>> >>> >>>> >> >> >>> >>> >> >> Users would be able to consume data on the
>> >>> setting
>> >>> >>>> level.
>> >>> >>>> >> >> This
>> >>> >>>> >> >> >>> can
>> >>> >>>> >> >> >>> >>> be
>> >>> >>>> >> >> >>> >>> >> >> done by simply retrieving all
>> >>> >>>> fise:ParticipantAnnotation
>> >>> >>>> >> as
>> >>> >>>> >> >> >>> well as
>> >>> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation linked with a
>> setting.
>> >>> BTW
>> >>> >>>> this
>> >>> >>>> >> was
>> >>> >>>> >> >> the
>> >>> >>>> >> >> >>> >>> >> >> approach used in LIVE [2] for semantic
>> search. It
>> >>> >>>> allows
>> >>> >>>> >> >> >>> queries for
>> >>> >>>> >> >> >>> >>> >> >> Settings that involve specific Entities e.g.
>> you
>> >>> >>>> could
>> >>> >>>> >> filter
>> >>> >>>> >> >> >>> for
>> >>> >>>> >> >> >>> >>> >> >> Settings that involve a {Person},
>> >>> >>>> activities:Arrested and
>> >>> >>>> >> a
>> >>> >>>> >> >> >>> specific
>> >>> >>>> >> >> >>> >>> >> >> {Upraising}. However note that with this
>> approach
>> >>> >>>> you will
>> >>> >>>> >> >> get
>> >>> >>>> >> >> >>> >>> results
>> >>> >>>> >> >> >>> >>> >> >> for Setting where the {Person} participated
>> and
>> >>> an
>> >>> >>>> other
>> >>> >>>> >> >> person
>> >>> >>>> >> >> >>> was
>> >>> >>>> >> >> >>> >>> >> >> arrested.
>> >>> >>>> >> >> >>> >>> >> >>
>> >>> >>>> >> >> >>> >>> >> >> Another possibility would be to process
>> >>> enhancement
>> >>> >>>> >> results
>> >>> >>>> >> >> on
>> >>> >>>> >> >> >>> the
>> >>> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation. This would allow
>> a
>> >>> much
>> >>> >>>> >> higher
>> >>> >>>> >> >> >>> >>> >> >> granularity level (e.g. it would allow to
>> >>> correctly
>> >>> >>>> answer
>> >>> >>>> >> >> the
>> >>> >>>> >> >> >>> query
>> >>> >>>> >> >> >>> >>> >> >> used as an example above). But I am wondering
>> if
>> >>> the
>> >>> >>>> >> quality
>> >>> >>>> >> >> of
>> >>> >>>> >> >> >>> the
>> >>> >>>> >> >> >>> >>> >> >> Setting extraction will be sufficient for
>> this. I
>> >>> >>>> have
>> >>> >>>> >> also
>> >>> >>>> >> >> >>> doubts
>> >>> >>>> >> >> >>> >>> if
>> >>> >>>> >> >> >>> >>> >> >> this can be still realized by using semantic
>> >>> >>>> indexing to
>> >>> >>>> >> >> Apache
>> >>> >>>> >> >> >>> Solr
>> >>> >>>> >> >> >>> >>> >> >> or if it would be better/necessary to store
>> >>> results
>> >>> >>>> in a
>> >>> >>>> >> >> >>> TripleStore
>> >>> >>>> >> >> >>> >>> >> >> and using SPARQL for retrieval.
>> >>> >>>> >> >> >>> >>> >> >>
>> >>> >>>> >> >> >>> >>> >> >> The methodology and query language used by
>> YAGO
>> >>> [3]
>> >>> >>>> is
>> >>> >>>> >> also
>> >>> >>>> >> >> very
>> >>> >>>> >> >> >>> >>> >> >> relevant for this (especially note chapter 7
>> >>> SPOTL(X)
>> >>> >>>> >> >> >>> >>> Representation).
>> >>> >>>> >> >> >>> >>> >> >>
>> >>> >>>> >> >> >>> >>> >> >> Another related topic is the enrichment of
>> >>> Entities
>> >>> >>>> >> >> (especially
>> >>> >>>> >> >> >>> >>> >> >> Events) in knowledge bases based on Settings
>> >>> >>>> extracted
>> >>> >> from
>> >>> >>>> >> >> >>> >>> Documents.
>> >>> >>>> >> >> >>> >>> >> >> As per definition - in DOLCE - Perdurants are
>> >>> >>>> temporal
>> >>> >>>> >> >> indexed.
>> >>> >>>> >> >> >>> That
>> >>> >>>> >> >> >>> >>> >> >> means that at the time when added to a
>> knowledge
>> >>> >>>> base they
>> >>> >>>> >> >> might
>> >>> >>>> >> >> >>> >>> still
>> >>> >>>> >> >> >>> >>> >> >> be in process. So the creation, enriching and
>> >>> >>>> refinement
>> >>> >>>> >> of
>> >>> >>>> >> >> such
>> >>> >>>> >> >> >>> >>> >> >> Entities in the knowledge base seems to be
>> >>> >>>> critical for
>> >>> >>>> >> a
>> >>> >>>> >> >> >>> System
>> >>> >>>> >> >> >>> >>> >> >> like described in your use-case.
>> >>> >>>> >> >> >>> >>> >> >>
>> >>> >>>> >> >> >>> >>> >> >> On Tue, Jun 11, 2013 at 9:09 PM, Cristian
>> >>> Petroaca
>> >>> >>>> >> >> >>> >>> >> >> <cr...@gmail.com> wrote:
>> >>> >>>> >> >> >>> >>> >> >> >
>> >>> >>>> >> >> >>> >>> >> >> > First of all I have to mention that I am new
>> >>> in the
>> >>> >>>> >> field
>> >>> >>>> >> >> of
>> >>> >>>> >> >> >>> >>> semantic
>> >>> >>>> >> >> >>> >>> >> >> > technologies, I've started to read about
>> them
>> >>> in
>> >>> >>>> the
>> >>> >>>> >> last
>> >>> >>>> >> >> 4-5
>> >>> >>>> >> >> >>> >>> >> >> months.Having
>> >>> >>>> >> >> >>> >>> >> >> > said that I have a high level overview of
>> what
>> >>> is
>> >>> >>>> a good
>> >>> >>>> >> >> >>> approach
>> >>> >>>> >> >> >>> >>> to
>> >>> >>>> >> >> >>> >>> >> >> solve
>> >>> >>>> >> >> >>> >>> >> >> > this problem. There are a number of papers
>> on
>> >>> the
>> >>> >>>> >> internet
>> >>> >>>> >> >> >>> which
>> >>> >>>> >> >> >>> >>> >> describe
>> >>> >>>> >> >> >>> >>> >> >> > what steps need to be taken such as : named
>> >>> entity
>> >>> >>>> >> >> >>> recognition,
>> >>> >>>> >> >> >>> >>> >> >> > co-reference resolution, pos tagging and
>> >>> others.
>> >>> >>>> >> >> >>> >>> >> >>
>> >>> >>>> >> >> >>> >>> >> >> The Stanbol NLP processing module currently
>> only
>> >>> >>>> supports
>> >>> >>>> >> >> >>> sentence
>> >>> >>>> >> >> >>> >>> >> >> detection, tokenization, POS tagging,
>> Chunking,
>> >>> NER
>> >>> >>>> and
>> >>> >>>> >> >> lemma.
>> >>> >>>> >> >> >>> >>> support
>> >>> >>>> >> >> >>> >>> >> >> for co-reference resolution and dependency
>> trees
>> >>> is
>> >>> >>>> >> currently
>> >>> >>>> >> >> >>> >>> missing.
>> >>> >>>> >> >> >>> >>> >> >>
>> Stanford NLP is already integrated with Stanbol [4]. At the moment it
>> only supports English, but I do already work to include the other
>> supported languages. Other NLP frameworks that are already integrated
>> with Stanbol are Freeling [5] and Talismane [6]. But note that for all
>> those the integration excludes support for co-reference and dependency
>> trees.
>>
>> Anyways I am confident that one can implement a first prototype by
>> only using Sentences and POS tags and - if available - Chunks (e.g.
>> Noun phrases).
>>
>> > I assume that in the Stanbol context, a feature like Relation
>> > extraction would be implemented as an EnhancementEngine?
>> > What kind of effort would be required for a co-reference resolution
>> > tool integration into Stanbol?
>>
>> Yes, in the end it would be an EnhancementEngine. But before we can
>> build such an engine we would need to
>>
>> * extend the Stanbol NLP processing API with annotations for
>>   co-reference
>> * add support for JSON serialization/parsing for those annotations so
>>   that the RESTful NLP Analysis Service can provide co-reference
>>   information
>>
>> > At this moment I'll be focusing on 2 aspects:
>> >
>> > 1. Determine the best data structure to encapsulate the extracted
>> > information. I'll take a closer look at Dolce.
>>
>> Don't make it too complex. Defining a proper structure to represent
>> Events will only pay off if we can also successfully extract such
>> information from processed texts.
>>
>> I would start with
>>
>>  * fise:SettingAnnotation
>>     * {fise:Enhancement} metadata
>>
>>  * fise:ParticipantAnnotation
>>     * {fise:Enhancement} metadata
>>     * fise:inSetting {settingAnnotation}
>>     * fise:hasMention {textAnnotation}
>>     * fise:suggestion {entityAnnotation} (multiple if there are more
>>       suggestions)
>>     * dc:type one of fise:Agent, fise:Patient, fise:Instrument,
>>       fise:Cause
>>
>>  * fise:OccurrentAnnotation
>>     * {fise:Enhancement} metadata
>>     * fise:inSetting {settingAnnotation}
>>     * fise:hasMention {textAnnotation}
>>     * dc:type set to fise:Activity
>>
>> If it turns out that we can extract more, we can add more structure to
>> those annotations. We might also think about using an own namespace
>> for those extensions to the annotation structure.
>>
>> > 2. Determine how all of this should be integrated into Stanbol.
>>
>> Just create an EventExtractionEngine and configure an enhancement
>> chain that does NLP processing and EntityLinking.
>>
>> You should have a look at
>>
>> * the SentimentSummarizationEngine [1], as it does a lot of things
>>   with NLP processing results (e.g. connecting adjectives (via verbs)
>>   to nouns/pronouns). So as long as we can not use explicit dependency
>>   trees your code will need to do similar things with Nouns, Pronouns
>>   and Verbs.
>>
>> * the Disambiguation-MLT engine, as it creates a Java representation
>>   of present fise:TextAnnotation and fise:EntityAnnotation [2].
>>   Something similar will also be required by the EventExtractionEngine
>>   for fast access to such annotations while iterating over the
>>   Sentences of the text.
>>
>> best
>> Rupert
>>
>> [1] https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/sentiment-summarization/src/main/java/org/apache/stanbol/enhancer/engines/sentiment/summarize/SentimentSummarizationEngine.java
>> [2] https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/disambiguation-mlt/src/main/java/org/apache/stanbol/enhancer/engine/disambiguation/mlt/DisambiguationData.java
>>
>> > Thanks
>> >
>> >> Hope this helps to bootstrap this discussion
>> >> best
>> >> Rupert
>> >>
>> >> --
>> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> >> | Bodenlehenstraße 11                             ++43-699-11108907
>> >> | A-5500 Bischofshofen




Re: Relation extraction feature

Posted by Cristian Petroaca <cr...@gmail.com>.
I've already started to implement the coreference bit, first in the nlp
and nlp-json projects. There's one thing that I don't know how to
implement. The CorefTag class contains a Set<Span> mentions member (it
represents the "mentions" array defined in an earlier mail) and in the
CorefTagSupport.parse() method I need to reconstruct the CorefTag object
from JSON. I can't figure out how to construct the aforementioned
member, which should contain references to mentions which are Span
objects found in the AnalyzedTextImpl. One problem is that I don't have
access to the AnalyzedTextImpl object, and even if I did, there could be
situations in which I am constructing a CorefTag for a Span which
mentions other Spans that have not been parsed yet and therefore do not
exist in the AnalyzedTextImpl.

One solution would be not to link to the actual Span references from the
AnalyzedTextImpl but to create new Span objects (ChunkImpl, TokenImpl).
That would require the ChunkImpl and TokenImpl constructors to be
changed from protected to public.
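Another option would be to let the parsed CorefTag hold only lightweight
(type, start, end) descriptors and resolve them against the analysed
text lazily, once all spans have been parsed. A minimal sketch of that
idea - all names here (CorefSketch, SpanRef, SpanIndex) are hypothetical
stand-ins for illustration, not the actual Stanbol API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CorefSketch {

    /** Immutable descriptor as serialized in the "mentions" array. */
    public static final class SpanRef {
        public final String type; // "Token", "Chunk", ...
        public final int start;
        public final int end;
        public SpanRef(String type, int start, int end) {
            this.type = type; this.start = start; this.end = end;
        }
        public String key() { return type + ":" + start + ":" + end; }
    }

    /** Minimal registry standing in for the AnalyzedTextImpl span lookup. */
    public static final class SpanIndex {
        private final Map<String, String> spans = new TreeMap<>();
        public void register(SpanRef ref, String spanText) {
            spans.put(ref.key(), spanText);
        }
        /** Returns null while the referenced span is not parsed yet. */
        public String resolve(SpanRef ref) { return spans.get(ref.key()); }
    }

    /** CorefTag keeps descriptors only; resolution happens on demand. */
    public static List<String> resolveMentions(List<SpanRef> mentions,
                                               SpanIndex idx) {
        List<String> resolved = new ArrayList<>();
        for (SpanRef ref : mentions) {
            String span = idx.resolve(ref);
            resolved.add(span == null ? "<unresolved>" : span);
        }
        return resolved;
    }
}
```

With deferred resolution the parse code never needs the AnalyzedTextImpl
at parse time, and forward references to spans that are parsed later
resolve themselves once parsing is complete.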


2013/9/12 Rupert Westenthaler <ru...@gmail.com>

> Hi Cristian,
>
> In fact I missed it. Sorry for that.
>
> I think the revised proposal looks like a good start. Usually one
> needs make some adaptions when writing the actual code.
>
> If you have a first version attach it to an issue and I will commit it
> to the branch.
>
> best
> Rupert
>
>
> On Thu, Sep 12, 2013 at 9:04 AM, Cristian Petroaca
> <cr...@gmail.com> wrote:
> > Hi Rupert,
> >
> > This is a reminder in case you missed this e-mail.
> >
> > Cristian
> >
> >
> > 2013/9/3 Cristian Petroaca <cr...@gmail.com>
> >
> >> Ok, then to sum it up we would have :
> >>
> >> 1. Coref
> >>
> >> "stanbol.enhancer.nlp.coref" : {
> >>     "isRepresentative" : true/false, // whether this token or chunk is the representative mention in the chain
> >>     "mentions" : [ { "type" : "Token", // type of element which refers to this token/chunk
> >>                      "start" : 123, // start index of the mentioning element
> >>                      "end" : 130 // end index of the mentioning element
> >>                    }, ...
> >>                  ],
> >>     "class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
> >> }
> >>
> >>
> >> 2. Dependency tree
> >>
> >> "stanbol.enhancer.nlp.dependency" : {
> >>     "relations" : [ {
> >>         "tag" : "nsubj",  // type of relation - Stanford NLP notation
> >>         "dep" : 12,       // type of relation - Stanbol NLP mapped value - ordinal number in the Dependency enum
> >>         "role" : "gov/dep", // whether this token is the governor ("gov") or the dependent ("dep")
> >>         "type" : "Token", // type of element with which this token is in relation
> >>         "start" : 123,    // start index of the related token
> >>         "end" : 130       // end index of the related token
> >>     },
> >>     ...
> >>     ],
> >>     "class" : "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag"
> >> }
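The "tag"/"dep" pair proposed above follows the same dual representation
(lexical tag of the NLP tool plus the ordinal of a controlled
vocabulary) used for POS tags. A rough Java sketch of how such a mapping
could look - the enum entries and class names here are illustrative
only, not the actual Stanbol Dependency model:

```java
public class DependencySketch {

    /** Small illustrative excerpt of a controlled relation vocabulary. */
    public enum Dependency {
        NominalSubject("nsubj"),
        DirectObject("dobj"),
        Conjunct("conj"),
        TemporalModifier("tmod");

        private final String stanfordTag;
        Dependency(String stanfordTag) { this.stanfordTag = stanfordTag; }

        /** Maps a tool-specific tag string to the controlled entry. */
        public static Dependency fromTag(String tag) {
            for (Dependency d : values()) {
                if (d.stanfordTag.equals(tag)) return d;
            }
            return null; // unknown tags keep only their lexical form
        }
    }

    public static final class DependencyTag {
        public final String tag;        // lexical form, e.g. "nsubj"
        public final Dependency mapped; // may be null for unmapped tags
        public DependencyTag(String tag) {
            this.tag = tag;
            this.mapped = Dependency.fromTag(tag);
        }
        /** The value that would be serialized under the "dep" key. */
        public int depOrdinal() {
            return mapped == null ? -1 : mapped.ordinal();
        }
    }
}
```

Keeping both forms lets consumers that know the controlled vocabulary
use the mapped entry while others can still fall back to the raw tag.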
> >>
> >>
> >> 2013/9/2 Rupert Westenthaler <ru...@gmail.com>
> >>
> >>> Hi Cristian,
> >>>
> >>> let me provide some feedback to your proposals:
> >>>
> >>> ### Referring other Spans
> >>>
> >>> Both suggested annotations need to link to other spans (Sentence,
> >>> Chunk or Token). For that we should introduce a single JSON element
> >>> for referring to those elements and use it consistently.
> >>>
> >>> In the java model this would allow you to have a reference to the
> >>> other Span (Sentence, Chunk, Token). In the serialized form you would
> >>> have JSON elements with the "type", "start" and "end" attributes as
> >>> those three uniquely identify any span.
> >>>
> >>> Here an example based on the "mention" attribute as defined by the
> >>> proposed "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
> >>>
> >>>     ...
> >>>     "mentions" : [ {
> >>>         "type" : "Token",
> >>>         "start": 123 ,
> >>>         "end": 130 } ,{
> >>>         "type" : "Token",
> >>>         "start": 157 ,
> >>>         "end": 165 }],
> >>>     ...
> >>>
> >>> Similar token links in
> >>> "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag" should also
> >>> use this model.
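Because the ("type", "start", "end") triple uniquely identifies any
span, a reference object with plain value semantics is enough to match
a serialized link against a parsed span. A minimal sketch (hypothetical
class, not the Stanbol API):

```java
import java.util.Objects;

/**
 * Value object for the proposed span-reference JSON element: two
 * references are the same iff type, start and end all match.
 */
public final class SpanReference {
    public final String type; // "Sentence", "Chunk" or "Token"
    public final int start;   // character offset where the span starts
    public final int end;     // character offset where the span ends

    public SpanReference(String type, int start, int end) {
        this.type = type; this.start = start; this.end = end;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof SpanReference)) return false;
        SpanReference s = (SpanReference) o;
        return start == s.start && end == s.end && type.equals(s.type);
    }

    @Override public int hashCode() {
        return Objects.hash(type, start, end);
    }
}
```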
> >>>
> >>> ### Usage of Controlled Vocabularies
> >>>
> >>> In addition the DependencyTag also seems to use a controlled
> >>> vocabulary (e.g. 'nsubj', 'conj_and' ...). In such cases the Stanbol
> >>> NLP module tries to define those in some kind of ontology. For POS
> >>> tags we use the OLIA ontology [1]. This is important as most NLP
> >>> frameworks will use different strings and we need to unify those to
> >>> common IDs so that components that consume those data do not depend
> >>> on a specific NLP tool.
> >>>
> >>> Because the usage of ontologies within Java is not well supported,
> >>> the Stanbol NLP module defines Java enumerations for those
> >>> ontologies, such as the POS type enumeration [2].
> >>>
> >>> Both the Java model and the JSON serialization support (1) the
> >>> lexical tag as used by the NLP tool and (2) the mapped concept: in
> >>> the Java API via two different methods and in the JSON serialization
> >>> via two separate keys.
> >>>
> >>> To make this more clear, here is an example of a POS annotation for
> >>> a proper noun.
> >>>
> >>>     "stanbol.enhancer.nlp.pos" : {
> >>>         "tag" : "PN",
> >>>         "pos" : 53,
> >>>         "class" : "org.apache.stanbol.enhancer.nlp.pos.PosTag",
> >>>         "prob" : 0.95
> >>>     }
> >>>
> >>> where
> >>>
> >>>     "tag" : "PN"
> >>>
> >>> is the lexical form as used by the NLP tool and
> >>>
> >>>     "pos" : 53
> >>>
> >>> refers to the ordinal number of the entry "ProperNoun" in the POS
> >>> enumeration
> >>>
> >>> IMO the "type" property of DependencyTag should use a similar design.
> >>>
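The round trip between an enumeration entry and the ordinal serialized
under a key like "pos" can be sketched as follows. The tiny enum here is
a stand-in for illustration only - the ordinal 53 in the example above
is only meaningful against the full Pos enumeration [2]:

```java
public class OrdinalRoundTrip {

    /**
     * Tiny stand-in for the OLIA-based POS enumeration; the real one is
     * org.apache.stanbol.enhancer.nlp.pos.Pos and has many more entries.
     */
    public enum MiniPos { Noun, ProperNoun, Verb, Adjective }

    /** Serialize: enum entry -> ordinal written to the "pos" key. */
    public static int toOrdinal(MiniPos pos) {
        return pos.ordinal();
    }

    /** Parse: ordinal read from the "pos" key -> enum entry. */
    public static MiniPos fromOrdinal(int ordinal) {
        MiniPos[] values = MiniPos.values();
        if (ordinal < 0 || ordinal >= values.length) {
            throw new IllegalArgumentException("unknown ordinal " + ordinal);
        }
        return values[ordinal];
    }
}
```

Note that ordinals are only stable as long as the enumeration never
reorders or removes entries, which is a constraint this serialization
design implicitly relies on.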
> >>> best
> >>> Rupert
> >>>
> >>> [1] http://olia.nlp2rdf.org/
> >>> [2]
> >>>
> http://svn.apache.org/repos/asf/stanbol/trunk/enhancer/generic/nlp/src/main/java/org/apache/stanbol/enhancer/nlp/pos/Pos.java
> >>>
> >>> On Sun, Sep 1, 2013 at 8:09 PM, Cristian Petroaca
> >>> <cr...@gmail.com> wrote:
> >>> > Sorry, pressed sent too soon :).
> >>> >
> >>> > Continued :
> >>> >
> >>> > nsubj(met-4, Mary-1), conj_and(Mary-1, Tom-3), nsubj(met-4, Tom-3),
> >>> > root(ROOT-0, met-4), nn(today-6, Danny-5), tmod(met-4, today-6)]
> >>> >
> >>> > Given this, we can have for each "Token" an additional dependency
> >>> > annotation :
> >>> >
> >>> > "stanbol.enhancer.nlp.dependency" : {
> >>> >     "tag" : //is it necessary?
> >>> >     "relations" : [ { "type" : "nsubj", //type of relation
> >>> >                       "role" : "gov/dep", //whether it is the depender or the dependee
> >>> >                       "dependencyValue" : "met", // the word with which the token has a relation
> >>> >                       "dependencyIndexInSentence" : "2" //the index of the dependency in the current sentence
> >>> >                     },
> >>> >                     ...
> >>> >                   ],
> >>> >     "class" : "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag"
> >>> > }
> >>> >
> >>> > 2013/9/1 Cristian Petroaca <cr...@gmail.com>
> >>> >
> >>> >> Related to the Stanford Dependency Tree Feature, this is the way the
> >>> >> output from the tool looks like for this sentence : "Mary and Tom
> met
> >>> Danny
> >>> >> today" :
> >>> >>
> >>> >>
> >>> >> 2013/8/30 Cristian Petroaca <cr...@gmail.com>
> >>> >>
> >>> >>> Hi Rupert,
> >>> >>>
> >>> >>> Ok, so after looking at the JSON output from the Stanford NLP
> Server
> >>> and
> >>> >>> the coref module I'm thinking I can represent the coreference
> >>> information
> >>> >>> this way:
> >>> >>> Each "Token" or "Chunk" will contain an additional coref annotation
> >>> with
> >>> >>> the following structure :
> >>> >>>
> >>> >>> "stanbol.enhancer.nlp.coref" {
> >>> >>>     "tag" : //does this need to exist?
> >>> >>>     "isRepresentative" : true/false, // whether this token or chunk is the representative mention in the chain
> >>> >>>     "mentions" : [ { "sentenceNo" : 1, //the sentence in which the mention is found
> >>> >>>                      "startWord" : 2, //the first word making up the mention
> >>> >>>                      "endWord" : 3 //the last word making up the mention
> >>> >>>                    }, ...
> >>> >>>                  ],
> >>> >>>     "class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
> >>> >>> }
> >>> >>>
> >>> >>> The CorefTag should resemble this model.
> >>> >>>
> >>> >>> What do you think?
> >>> >>>
> >>> >>> Cristian
> >>> >>>
> >>> >>>
> >>> >>> 2013/8/24 Rupert Westenthaler <ru...@gmail.com>
> >>> >>>
> >>> >>>> Hi Cristian,
> >>> >>>>
> >>> >>>> you can not directly call StanfordNLP components from Stanbol, but
> >>> you
> >>> >>>> have to extend the RESTful service to include the information you
> >>> >>>> need. The main reason for that is that the license of StanfordNLP
> is
> >>> >>>> not compatible with the Apache Software License. So Stanbol can
> not
> >>> >>>> directly link to the StanfordNLP API.
> >>> >>>>
> >>> >>>> You will need to
> >>> >>>>
> >>> >>>> 1. define an additional class {yourTag} extends Tag<{yourType}>
> class
> >>> >>>> in the o.a.s.enhancer.nlp module
> >>> >>>> 2. add JSON parsing and serialization support for this tag to the
> >>> >>>> o.a.s.enhancer.nlp.json module (see e.g. PosTagSupport as an
> example)
> >>> >>>>
> >>> >>>> As (1) would be necessary anyway the only additional thing you
> need
> >>> to
> >>> >>>> develop is (2). After that you can add {yourTag} instance to the
> >>> >>>> AnalyzedText in the StanfornNLP integration. The
> >>> >>>> RestfulNlpAnalysisEngine will parse them from the response. All
> >>> >>>> engines executed after the RestfulNlpAnalysisEngine will have
> access
> >>> >>>> to your annotations.
> >>> >>>>
> >>> >>>> If you have a design for {yourTag} - the model you would like to
> use
> >>> >>>> to represent your data - I can help with (1) and (2).
> >>> >>>>
> >>> >>>> best
> >>> >>>> Rupert
> >>> >>>>
> >>> >>>>
> >>> >>>> On Fri, Aug 23, 2013 at 5:11 PM, Cristian Petroaca
> >>> >>>> <cr...@gmail.com> wrote:
> >>> >>>> > Hi Rupert,
> >>> >>>> >
> >>> >>>> > Thanks for the info. Looking at the stanbol-stanfordnlp project I
> >>> >>>> > see that Stanford NLP is not implemented as an EnhancementEngine but
> >>> >>>> > rather is used directly in a Jetty server instance. How does that
> >>> >>>> > fit into the Stanbol stack? For example, how can I call the
> >>> >>>> > StanfordNlpAnalyzer's routine from my
> >>> >>>> > TripleExtractionEnhancementEngine, which lives in the Stanbol stack?
> >>> >>>> >
> >>> >>>> > Thanks,
> >>> >>>> > Cristian
> >>> >>>> >
> >>> >>>> >
> >>> >>>> > 2013/8/12 Rupert Westenthaler <ru...@gmail.com>
> >>> >>>> >
> >>> >>>> >> Hi Cristian,
> >>> >>>> >>
> >>> >>>> >> Sorry for the late response, but I was offline for the last two
> >>> weeks
> >>> >>>> >>
> >>> >>>> >> On Fri, Aug 2, 2013 at 9:19 PM, Cristian Petroaca
> >>> >>>> >> <cr...@gmail.com> wrote:
> >>> >>>> >> > Hi Rupert,
> >>> >>>> >> >
> >>> >>>> >> > After doing some tests it seems that the Stanford NLP
> >>> coreference
> >>> >>>> module
> >>> >>>> >> is
> >>> >>>> >> > much more accurate than the Open NLP one.So I decided to
> extend
> >>> >>>> Stanford
> >>> >>>> >> > NLP to add coreference there.
> >>> >>>> >>
> >>> >>>> >> The Stanford NLP integration is not part of the Stanbol
> codebase
> >>> >>>> >> because the licenses are not compatible.
> >>> >>>> >>
> >>> >>>> >> You can find the Stanford NLP integration on
> >>> >>>> >>
> >>> >>>> >>     https://github.com/westei/stanbol-stanfordnlp
> >>> >>>> >>
> >>> >>>> >> just create a fork and send pull requests.
> >>> >>>> >>
> >>> >>>> >>
> >>> >>>> >> > Could you add the necessary projects on the branch? And also
> >>> remove
> >>> >>>> the
> >>> >>>> >> > Open NLP ones?
> >>> >>>> >> >
> >>> >>>> >>
> >>> >>>> >> Currently the branch
> >>> >>>> >>
> >>> >>>> >>
> >>> >>>> >>
> >>> >>>>
> >>>
> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
> >>> >>>> >>
> >>> >>>> >> only contains the "nlp" and the "nlp-json" modules. IMO those
> >>> should
> >>> >>>> >> be enough for adding coreference support.
> >>> >>>> >>
> >>> >>>> >> IMO you will need to
> >>> >>>> >>
> >>> >>>> >> * add an model for representing coreference to the nlp module
> >>> >>>> >> * add parsing and serializing support to the nlp-json module
> >>> >>>> >> * add the implementation to your fork of the
> stanbol-stanfordnlp
> >>> >>>> project
> >>> >>>> >>
> >>> >>>> >> best
> >>> >>>> >> Rupert
> >>> >>>> >>
> >>> >>>> >>
> >>> >>>> >>
> >>> >>>> >> > Thanks,
> >>> >>>> >> > Cristian
> >>> >>>> >> >
> >>> >>>> >> >
> >>> >>>> >> > 2013/7/5 Rupert Westenthaler <ru...@gmail.com>
> >>> >>>> >> >
> >>> >>>> >> >> Hi Cristian,
> >>> >>>> >> >>
> >>> >>>> >> >> I created the branch at
> >>> >>>> >> >>
> >>> >>>> >> >>
> >>> >>>> >> >>
> >>> >>>> >>
> >>> >>>>
> >>>
> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
> >>> >>>> >> >>
> >>> >>>> >> >> ATM in contains only the "nlp" and "nlp-json" module. Let me
> >>> know
> >>> >>>> if
> >>> >>>> >> >> you would like to have more
> >>> >>>> >> >>
> >>> >>>> >> >> best
> >>> >>>> >> >> Rupert
> >>> >>>> >> >>
> >>> >>>> >> >>
> >>> >>>> >> >>
> >>> >>>> >> >> On Thu, Jul 4, 2013 at 10:14 AM, Cristian Petroaca
> >>> >>>> >> >> <cr...@gmail.com> wrote:
> >>> >>>> >> >> > Hi Rupert,
> >>> >>>> >> >> >
> >>> >>>> >> >> > I created jiras :
> >>> >>>> https://issues.apache.org/jira/browse/STANBOL-1132and
> >>> >>>> >> >> > https://issues.apache.org/jira/browse/STANBOL-1133. The
> >>> >>>> original one
> >>> >>>> >> in
> >>> >>>> >> >> > dependent upon these.
> >>> >>>> >> >> > Please let me know when I can start using the branch.
> >>> >>>> >> >> >
> >>> >>>> >> >> > Thanks,
> >>> >>>> >> >> > Cristian
> >>> >>>> >> >> >
> >>> >>>> >> >> >
> >>> >>>> >> >> > 2013/6/27 Cristian Petroaca <cr...@gmail.com>
> >>> >>>> >> >> >
> >>> >>>> >> >> >>
> >>> >>>> >> >> >>
> >>> >>>> >> >> >>
> >>> >>>> >> >> >> 2013/6/27 Rupert Westenthaler <
> >>> rupert.westenthaler@gmail.com>
> >>> >>>> >> >> >>
> >>> >>>> >> >> >>> On Thu, Jun 27, 2013 at 3:12 PM, Cristian Petroaca
> >>> >>>> >> >> >>> <cr...@gmail.com> wrote:
> >>> >>>> >> >> >>> > Sorry, I meant the Stanbol NLP API, not Stanford in my
> >>> >>>> previous
> >>> >>>> >> >> e-mail.
> >>> >>>> >> >> >>> By
> >>> >>>> >> >> >>> > the way, does Open NLP have the ability to build
> >>> dependency
> >>> >>>> trees?
> >>> >>>> >> >> >>> >
> >>> >>>> >> >> >>>
> >>> >>>> >> >> >>> AFAIK OpenNLP does not provide this feature.
> >>> >>>> >> >> >>>
> >>> >>>> >> >> >>
> >>> >>>> >> >> >> Then , since the Stanford NLP lib is also integrated into
> >>> >>>> Stanbol,
> >>> >>>> >> I'll
> >>> >>>> >> >> >> take a look at how I can extend its integration to
> include
> >>> the
> >>> >>>> >> >> dependency
> >>> >>>> >> >> >> tree feature.
> >>> >>>> >> >> >>
> >>> >>>> >> >> >>>
> >>> >>>> >> >> >>>
> >>> >>>> >> >> >>  >
> >>> >>>> >> >> >>> > 2013/6/23 Cristian Petroaca <
> cristian.petroaca@gmail.com
> >>> >
> >>> >>>> >> >> >>> >
> >>> >>>> >> >> >>> >> Hi Rupert,
> >>> >>>> >> >> >>> >>
> >>> >>>> >> >> >>> >> I created jira
> >>> >>>> >> https://issues.apache.org/jira/browse/STANBOL-1121.
> >>> >>>> >> >> >>> >> As you suggested I would start with extending the
> >>> Stanford
> >>> >>>> NLP
> >>> >>>> >> with
> >>> >>>> >> >> >>> >> co-reference resolution but I think also with
> dependency
> >>> >>>> trees
> >>> >>>> >> >> because
> >>> >>>> >> >> >>> I
> >>> >>>> >> >> >>> >> also need to know the Subject of the sentence and the
> >>> object
> >>> >>>> >> that it
> >>> >>>> >> >> >>> >> affects, right?
> >>> >>>> >> >> >>> >>
> >>> >>>> >> >> >>> >> Given that I need to extend the Stanford NLP API in
> >>> Stanbol
> >>> >>>> for
> >>> >>>> >> >> >>> >> co-reference and dependency trees, how do I proceed
> with
> >>> >>>> this?
> >>> >>>> >> Do I
> >>> >>>> >> >> >>> create
> >>> >>>> >> >> >>> >> 2 new sub-tasks to the already opened Jira? After
> that
> >>> can I
> >>> >>>> >> start
> >>> >>>> >> >> >>> >> implementing on my local copy of Stanbol and when I'm
> >>> done
> >>> >>>> I'll
> >>> >>>> >> send
> >>> >>>> >> >> >>> you
> >>> >>>> >> >> >>> >> guys the patch fo review?
> >>> >>>> >> >> >>> >>
> >>> >>>> >> >> >>>
> >>> >>>> >> >> >>> I would create two "New Feature" type Issues one for
> adding
> >>> >>>> support
> >>> >>>> >> >> >>> for "dependency trees" and the other for "co-reference"
> >>> >>>> support. You
> >>> >>>> >> >> >>> should also define "depends on" relations between
> >>> STANBOL-1121
> >>> >>>> and
> >>> >>>> >> >> >>> those two new issues.
> >>> >>>> >> >> >>>
> >>> >>>> >> >> >>> Sub-task could also work, but as adding those features
> >>> would
> >>> >>>> be also
> >>> >>>> >> >> >>> interesting for other things I would rather define them
> as
> >>> >>>> separate
> >>> >>>> >> >> >>> issues.
> >>> >>>> >> >> >>>
> >>> >>>> >> >> >>>
> >>> >>>> >> >> >> 2 New Features connected with the original jira it is
> then.
> >>> >>>> >> >> >>
> >>> >>>> >> >> >>
> >>> >>>> >> >> >>> If you would prefer to work in an own branch please tell
> >>> me.
> >>> >>>> This
> >>> >>>> >> >> >>> could have the advantage that patches would not be
> >>> affected by
> >>> >>>> >> changes
> >>> >>>> >> >> >>> in the trunk.
> >>> >>>> >> >> >>>
> >>> >>>> >> >> >>> Yes, a separate branch sounds good.
> >>> >>>> >> >> >>
> >>> >>>> >> >> >> best
> >>> >>>> >> >> >>> Rupert
> >>> >>>> >> >> >>>
> >> Regards,
> >> Cristian
> >>
> >> 2013/6/18 Rupert Westenthaler <rupert.westenthaler@gmail.com>
> >>
> >>> On Mon, Jun 17, 2013 at 10:18 PM, Cristian Petroaca
> >>> <cr...@gmail.com> wrote:
> >>> > Hi Rupert,
> >>> >
> >>> > Agreed on the SettingAnnotation/ParticipantAnnotation/OccurrentAnnotation
> >>> > data structure.
> >>> >
> >>> > Should I open up a Jira for all of this in order to encapsulate
> >>> > this information and establish the goals and the initial steps
> >>> > towards these goals?
> >>>
> >>> Yes please. A JIRA issue for this work would be great.
> >>>
> >>> > How should I proceed further? Should I create some design
> >>> > documents that need to be reviewed?
> >>>
> >>> Usually it is best to write design related text directly in JIRA by
> >>> using Markdown [1] syntax. This will allow us later to use this text
> >>> directly for the documentation on the Stanbol webpage.
> >>>
> >>> best
> >>> Rupert
> >>>
> >>> [1] http://daringfireball.net/projects/markdown/
> >>>
> >>> > Regards,
> >>> > Cristian
> >>> >
> >>> > 2013/6/17 Rupert Westenthaler <rupert.westenthaler@gmail.com>
> >>> >
> >>> >> On Thu, Jun 13, 2013 at 8:22 PM, Cristian Petroaca
> >>> >> <cr...@gmail.com> wrote:
> >>> >> > HI Rupert,
> >>> >> >
> >>> >> > First of all thanks for the detailed suggestions.
> >>> >> >
> >>> >> > 2013/6/12 Rupert Westenthaler <rupert.westenthaler@gmail.com>
> >>> >> >
> >>> >> >> Hi Cristian, all
> >>> >> >>
> >>> >> >> really interesting use case!
> >>> >> >>
> >>> >> >> In this mail I will try to give some suggestions on how this
> >>> >> >> could work out. These suggestions are mainly based on
> >>> >> >> experiences and lessons learned in the LIVE [2] project where
> >>> >> >> we built an information system for the Olympic Games in
> >>> >> >> Peking. While this project excluded the extraction of Events
> >>> >> >> from unstructured text (because the Olympic Information System
> >>> >> >> was already providing event data as XML messages) the semantic
> >>> >> >> search capabilities of this system were very similar to the
> >>> >> >> ones described by your use case.
> >>> >> >>
> >>> >> >> IMHO you are not only trying to extract relations, but a
> >>> >> >> formal representation of the situation described by the text.
> >>> >> >> So let's assume that the goal is to annotate a Setting (or
> >>> >> >> Situation) described in the text - a fise:SettingAnnotation.
> >>> >> >>
> >>> >> >> The DOLCE foundational ontology [1] gives some advice on how
> >>> >> >> to model those. The important relation for modeling this is
> >>> >> >> Participation:
> >>> >> >>
> >>> >> >>     PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t))
> >>> >> >>
> >>> >> >> where ..
> >>> >> >>
> >>> >> >>  * ED are Endurants (continuants): Endurants do have an
> >>> >> >>    identity so we would typically refer to them as Entities
> >>> >> >>    referenced by a setting. Note that this includes physical
> >>> >> >>    and non-physical as well as social objects.
> >>> >> >>  * PD are Perdurants (occurrents): Perdurants are entities
> >>> >> >>    that happen in time. This refers to Events, Activities ...
> >>> >> >>  * PC is Participation: a time-indexed relation where
> >>> >> >>    Endurants participate in Perdurants.
> >>> >> >>
> >>> >> >> Modeling this in RDF requires defining some intermediate
> >>> >> >> resources because RDF does not allow for n-ary relations.
> >>> >> >>
> >>> >> >>  * fise:SettingAnnotation: It is really handy to define one
> >>> >> >>    resource being the context for all described data. I would
> >>> >> >>    call this "fise:SettingAnnotation" and define it as a
> >>> >> >>    sub-concept of fise:Enhancement. All further enhancements
> >>> >> >>    about the extracted Setting would define a
> >>> >> >>    "fise:in-setting" relation to it.
> >>> >> >>
> >>> >> >>  * fise:ParticipantAnnotation: Is used to annotate that an
> >>> >> >>    Endurant is participating in a setting (fise:in-setting
> >>> >> >>    fise:SettingAnnotation). The Endurant itself is described
> >>> >> >>    by existing fise:TextAnnotation (the mentions) and
> >>> >> >>    fise:EntityAnnotation (suggested Entities). Basically the
> >>> >> >>    fise:ParticipantAnnotation will allow an EnhancementEngine
> >>> >> >>    to state that several mentions (in possibly different
> >>> >> >>    sentences) do represent the same Endurant as participating
> >>> >> >>    in the Setting. In addition it would be possible to use the
> >>> >> >>    dc:type property (similar as for fise:TextAnnotation) to
> >>> >> >>    refer to the role(s) of a participant (e.g. the set: Agent
> >>> >> >>    (intensionally performs an action), Cause
> >>> >>>> >> >> >>> >>> >> >> (unintentionally e.g. a mud slide), Patient (a
> >>> >>>> passive
> >>> >>>> >> role
> >>> >>>> >> >> in
> >>> >>>> >> >> >>> an
> >>> >>>> >> >> >>> >>> >> >> activity) and Instrument (aids an process)),
> but
> >>> I am
> >>> >>>> >> >> wondering
> >>> >>>> >> >> >>> if
> >>> >>>> >> >> >>> >>> one
> >>> >>>> >> >> >>> >>> >> >> could extract those information.
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> * fise:OccurrentAnnotation: is used to
> annotate a
> >>> >>>> >> Perdurant
> >>> >>>> >> >> in
> >>> >>>> >> >> >>> the
> >>> >>>> >> >> >>> >>> >> >> context of the Setting. Also
> >>> >>>> fise:OccurrentAnnotation can
> >>> >>>> >> >> link
> >>> >>>> >> >> >>> to
> >>> >>>> >> >> >>> >>> >> >> fise:TextAnnotaion (typically verbs in the
> text
> >>> >>>> defining
> >>> >>>> >> the
> >>> >>>> >> >> >>> >>> >> >> perdurant) as well as fise:EntityAnnotation
> >>> >>>> suggesting
> >>> >>>> >> well
> >>> >>>> >> >> >>> known
> >>> >>>> >> >> >>> >>> >> >> Events in a knowledge base (e.g. a Election
> in a
> >>> >>>> country,
> >>> >>>> >> or
> >>> >>>> >> >> an
> >>> >>>> >> >> >>> >>> >> >> upraising ...). In addition
> >>> fise:OccurrentAnnotation
> >>> >>>> can
> >>> >>>> >> >> define
> >>> >>>> >> >> >>> >>> >> >> dc:has-participant links to
> >>> >>>> fise:ParticipantAnnotation. In
> >>> >>>> >> >> this
> >>> >>>> >> >> >>> case
> >>> >>>> >> >> >>> >>> >> >> it is explicitly stated that an Endurant (the
> >>> >>>> >> >> >>> >>> >> >> fise:ParticipantAnnotation) is involved in this
> >>> >>>> >> >> >>> >>> >> >> Perdurant (the
> >>> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation). As Occurrents are
> >>> >>>> >> >> >>> >>> >> >> temporally indexed, this
> >>> >>>> >> >> >>> >>> >> >> annotation should also support properties for
> >>> >>>> defining the
> >>> >>>> >> >> >>> >>> >> >> xsd:dateTime for the start/end.
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> Indeed, an event based data structure makes a
> >>> lot of
> >>> >>>> sense
> >>> >>>> >> >> with
> >>> >>>> >> >> >>> the
> >>> >>>> >> >> >>> >>> >> remark
> >>> >>>> >> >> >>> >>> >> > that you probably won't be able to always
> extract
> >>> the
> >>> >>>> date
> >>> >>>> >> >> for a
> >>> >>>> >> >> >>> >>> given
> >>> >>>> >> >> >>> >>> >> > setting(situation).
> >>> >>>> >> >> >>> >>> >> > There are 2 thing which are unclear though.
> >>> >>>> >> >> >>> >>> >> >
> >>> >>>> >> >> >>> >>> >> > 1. Perdurant : You could have situations in
> which
> >>> the
> >>> >>>> >> object
> >>> >>>> >> >> upon
> >>> >>>> >> >> >>> >>> which
> >>> >>>> >> >> >>> >>> >> the
> >>> >>>> >> >> >>> >>> >> > Subject ( or Endurant ) is acting is not a
> >>> transitory
> >>> >>>> >> object (
> >>> >>>> >> >> >>> such
> >>> >>>> >> >> >>> >>> as an
> >>> >>>> >> >> >>> >>> >> > event, activity ) but rather another Endurant.
> For
> >>> >>>> example
> >>> >>>> >> we
> >>> >>>> >> >> can
> >>> >>>> >> >> >>> >>> have
> >>> >>>> >> >> >>> >>> >> the
> >>> >>>> >> >> >>> >>> >> > phrase "USA invades Irak" where "USA" is the
> >>> Endurant
> >>> >>>> (
> >>> >>>> >> >> Subject )
> >>> >>>> >> >> >>> >>> which
> >>> >>>> >> >> >>> >>> >> > performs the action of "invading" on another
> >>> >>>> Eundurant,
> >>> >>>> >> namely
> >>> >>>> >> >> >>> >>> "Irak".
> >>> >>>> >> >> >>> >>> >> >
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> By using CAOS, USA would be the Agent and Iraq
> the
> >>> >>>> Patient.
> >>> >>>> >> Both
> >>> >>>> >> >> >>> are
> >>> >>>> >> >> >>> >>> >> Endurants. The activity "invading" would be the
> >>> >>>> Perdurant. So
> >>> >>>> >> >> >>> ideally
> >>> >>>> >> >> >>> >>> >> you would have a  "fise:SettingAnnotation" with:
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >>   * fise:ParticipantAnnotation for USA with the
> >>> dc:type
> >>> >>>> >> >> caos:Agent,
> >>> >>>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "USA" and a
> >>> >>>> >> >> >>> fise:EntityAnnotation
> >>> >>>> >> >> >>> >>> >> linking to dbpedia:United_States
> >>> >>>> >> >> >>> >>> >>   * fise:ParticipantAnnotation for Iraq with the
> >>> dc:type
> >>> >>>> >> >> >>> caos:Patient,
> >>> >>>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "Irak" and a
> >>> >>>> >> >> >>> >>> >> fise:EntityAnnotation linking to  dbpedia:Iraq
> >>> >>>> >> >> >>> >>> >>   * fise:OccurrentAnnotation for "invades" with
> the
> >>> >>>> dc:type
> >>> >>>> >> >> >>> >>> >> caos:Activity, linking to a fise:TextAnnotation
> for
> >>> >>>> "invades"
> >>> >>>> >> >> >>> >>> >>
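As a rough illustration, the three annotations sketched above for "USA invades Irak" could be serialized in Turtle along these lines. Note that this is only a sketch: the `caos:` namespace URI, the `urn:` identifiers, and the exact property spellings (`fise:in-setting`, `fise:hasMention`, `fise:suggestion`, `dc:has-participant`) are illustrative assumptions based on the names used in this thread, not a finalized vocabulary.

```turtle
@prefix fise:    <http://fise.iks-project.eu/ontology/> .
@prefix dc:      <http://purl.org/dc/terms/> .
@prefix caos:    <http://example.org/caos#> .         # hypothetical namespace
@prefix dbpedia: <http://dbpedia.org/resource/> .

<urn:enhancement:setting-1> a fise:SettingAnnotation .

<urn:enhancement:participant-usa> a fise:ParticipantAnnotation ;
    fise:in-setting <urn:enhancement:setting-1> ;
    dc:type caos:Agent ;
    fise:hasMention <urn:enhancement:text-usa> ;      # fise:TextAnnotation for "USA"
    fise:suggestion <urn:enhancement:entity-usa> .    # suggests dbpedia:United_States

<urn:enhancement:participant-iraq> a fise:ParticipantAnnotation ;
    fise:in-setting <urn:enhancement:setting-1> ;
    dc:type caos:Patient ;
    fise:hasMention <urn:enhancement:text-irak> ;     # fise:TextAnnotation for "Irak"
    fise:suggestion <urn:enhancement:entity-iraq> .   # suggests dbpedia:Iraq

<urn:enhancement:occurrent-invades> a fise:OccurrentAnnotation ;
    fise:in-setting <urn:enhancement:setting-1> ;
    dc:type caos:Activity ;
    fise:hasMention <urn:enhancement:text-invades> ;  # fise:TextAnnotation for "invades"
    dc:has-participant <urn:enhancement:participant-usa> ,
                       <urn:enhancement:participant-iraq> .
```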
> >>> >>>> >> >> >>> >>> >> > 2. Where does the verb, which links the Subject
> >>> and
> >>> >>>> the
> >>> >>>> >> Object
> >>> >>>> >> >> >>> come
> >>> >>>> >> >> >>> >>> into
> >>> >>>> >> >> >>> >>> >> > this? I imagined that the Endurant would have a
> >>> >>>> >> dc:"property"
> >>> >>>> >> >> >>> where
> >>> >>>> >> >> >>> >>> the
> >>> >>>> >> >> >>> >>> >> > property = verb which links to the Object in
> noun
> >>> >>>> form. For
> >>> >>>> >> >> >>> example
> >>> >>>> >> >> >>> >>> take
> >>> >>>> >> >> >>> >>> >> > again the sentence "USA invades Irak". You
> would
> >>> have
> >>> >>>> the
> >>> >>>> >> >> "USA"
> >>> >>>> >> >> >>> >>> Entity
> >>> >>>> >> >> >>> >>> >> with
> >>> >>>> >> >> >>> >>> >> > dc:invader which points to the Object "Irak".
> The
> >>> >>>> Endurant
> >>> >>>> >> >> would
> >>> >>>> >> >> >>> >>> have as
> >>> >>>> >> >> >>> >>> >> > many dc:"property" elements as there are verbs
> >>> which
> >>> >>>> link
> >>> >>>> >> it
> >>> >>>> >> >> to
> >>> >>>> >> >> >>> an
> >>> >>>> >> >> >>> >>> >> Object.
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> As explained above you would have a
> >>> >>>> fise:OccurrentAnnotation
> >>> >>>> >> >> that
> >>> >>>> >> >> >>> >>> >> represents the Perdurant. The information that
> the
> >>> >>>> activity
> >>> >>>> >> >> >>> mention in
> >>> >>>> >> >> >>> >>> >> the text is "invades" would be by linking to a
> >>> >>>> >> >> >>> fise:TextAnnotation. If
> >>> >>>> >> >> >>> >>> >> you can also provide an Ontology for Tasks that
> >>> defines
> >>> >>>> >> >> >>> >>> >> "myTasks:invade" the fise:OccurrentAnnotation
> could
> >>> >>>> also link
> >>> >>>> >> >> to an
> >>> >>>> >> >> >>> >>> >> fise:EntityAnnotation for this concept.
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> best
> >>> >>>> >> >> >>> >>> >> Rupert
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> >
> >>> >>>> >> >> >>> >>> >> > ### Consuming the data:
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> I think this model should be sufficient for
> >>> >>>> use-cases as
> >>> >>>> >> >> >>> described
> >>> >>>> >> >> >>> >>> by
> >>> >>>> >> >> >>> >>> >> you.
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> Users would be able to consume data on the
> >>> setting
> >>> >>>> level.
> >>> >>>> >> >> This
> >>> >>>> >> >> >>> can
> >>> >>>> >> >> >>> >>> be
> >>> >>>> >> >> >>> >>> >> >> done by simply retrieving all
> >>> >>>> fise:ParticipantAnnotation
> >>> >>>> >> as
> >>> >>>> >> >> >>> well as
> >>> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation linked with a
> setting.
> >>> BTW
> >>> >>>> this
> >>> >>>> >> was
> >>> >>>> >> >> the
> >>> >>>> >> >> >>> >>> >> >> approach used in LIVE [2] for semantic
> search. It
> >>> >>>> allows
> >>> >>>> >> >> >>> queries for
> >>> >>>> >> >> >>> >>> >> >> Settings that involve specific Entities e.g.
> you
> >>> >>>> could
> >>> >>>> >> filter
> >>> >>>> >> >> >>> for
> >>> >>>> >> >> >>> >>> >> >> Settings that involve a {Person},
> >>> >>>> activities:Arrested and
> >>> >>>> >> a
> >>> >>>> >> >> >>> specific
> >>> >>>> >> >> >>> >>> >> >> {Upraising}. However note that with this
> approach
> >>> >>>> you will
> >>> >>>> >> >> get
> >>> >>>> >> >> >>> >>> results
> >>> >>>> >> >> >>> >>> >> >> for Setting where the {Person} participated
> and
> >>> an
> >>> >>>> other
> >>> >>>> >> >> person
> >>> >>>> >> >> >>> was
> >>> >>>> >> >> >>> >>> >> >> arrested.
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> Another possibility would be to process
> >>> enhancement
> >>> >>>> >> results
> >>> >>>> >> >> on
> >>> >>>> >> >> >>> the
> >>> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation. This would allow a much
> >>> >>>> >> >> >>> >>> >> >> higher
> >>> >>>> >> >> >>> >>> >> >> granularity level (e.g. it would allow one to
> >>> >>>> >> >> >>> >>> >> >> correctly answer
> >>> >>>> >> >> the
> >>> >>>> >> >> >>> query
> >>> >>>> >> >> >>> >>> >> >> used as an example above). But I am wondering
> if
> >>> the
> >>> >>>> >> quality
> >>> >>>> >> >> of
> >>> >>>> >> >> >>> the
> >>> >>>> >> >> >>> >>> >> >> Setting extraction will be sufficient for
> this. I
> >>> >>>> have
> >>> >>>> >> also
> >>> >>>> >> >> >>> doubts
> >>> >>>> >> >> >>> >>> if
> >>> >>>> >> >> >>> >>> >> >> this can be still realized by using semantic
> >>> >>>> indexing to
> >>> >>>> >> >> Apache
> >>> >>>> >> >> >>> Solr
> >>> >>>> >> >> >>> >>> >> >> or if it would be better/necessary to store
> >>> results
> >>> >>>> in a
> >>> >>>> >> >> >>> TripleStore
> >>> >>>> >> >> >>> >>> >> >> and using SPARQL for retrieval.
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> The methodology and query language used by
> YAGO
> >>> [3]
> >>> >>>> is
> >>> >>>> >> also
> >>> >>>> >> >> very
> >>> >>>> >> >> >>> >>> >> >> relevant for this (especially note chapter 7
> >>> SPOTL(X)
> >>> >>>> >> >> >>> >>> Representation).
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> Another related topic is the enrichment of
> >>> Entities
> >>> >>>> >> >> (especially
> >>> >>>> >> >> >>> >>> >> >> Events) in knowledge bases based on Settings
> >>> >>>> extracted
> >>> >>>> >> form
> >>> >>>> >> >> >>> >>> Documents.
> >>> >>>> >> >> >>> >>> >> >> As per definition - in DOLCE - Perdurants are
> >>> >>>> >> >> >>> >>> >> >> temporally indexed.
> >>> >>>> >> >> >>> That
> >>> >>>> >> >> >>> >>> >> >> means that at the time when added to a
> knowledge
> >>> >>>> base they
> >>> >>>> >> >> might
> >>> >>>> >> >> >>> >>> still
> >>> >>>> >> >> >>> >>> >> >> be in process. So the creation, enriching and
> >>> >>>> refinement
> >>> >>>> >> of
> >>> >>>> >> >> such
> >>> >>>> >> >> >>> >>> >> >> Entities in the knowledge base seems to be
> >>> >>>> critical for
> >>> >>>> >> a
> >>> >>>> >> >> >>> System
> >>> >>>> >> >> >>> >>> >> >> like described in your use-case.
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> On Tue, Jun 11, 2013 at 9:09 PM, Cristian
> >>> Petroaca
> >>> >>>> >> >> >>> >>> >> >> <cr...@gmail.com> wrote:
> >>> >>>> >> >> >>> >>> >> >> >
> >>> >>>> >> >> >>> >>> >> >> > First of all I have to mention that I am new
> >>> in the
> >>> >>>> >> field
> >>> >>>> >> >> of
> >>> >>>> >> >> >>> >>> semantic
> >>> >>>> >> >> >>> >>> >> >> > technologies, I've started to read about
> them
> >>> in
> >>> >>>> the
> >>> >>>> >> last
> >>> >>>> >> >> 4-5
> >>> >>>> >> >> >>> >>> >> >> months.Having
> >>> >>>> >> >> >>> >>> >> >> > said that I have a high level overview of
> what
> >>> is
> >>> >>>> a good
> >>> >>>> >> >> >>> approach
> >>> >>>> >> >> >>> >>> to
> >>> >>>> >> >> >>> >>> >> >> solve
> >>> >>>> >> >> >>> >>> >> >> > this problem. There are a number of papers
> on
> >>> the
> >>> >>>> >> internet
> >>> >>>> >> >> >>> which
> >>> >>>> >> >> >>> >>> >> describe
> >>> >>>> >> >> >>> >>> >> >> > what steps need to be taken such as : named
> >>> entity
> >>> >>>> >> >> >>> recognition,
> >>> >>>> >> >> >>> >>> >> >> > co-reference resolution, pos tagging and
> >>> others.
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> The Stanbol NLP processing module currently
> only
> >>> >>>> supports
> >>> >>>> >> >> >>> sentence
> >>> >>>> >> >> >>> >>> >> >> detection, tokenization, POS tagging,
> Chunking,
> >>> NER
> >>> >>>> and
> >>> >>>> >> >> lemma.
> >>> >>>> >> >> >>> >>> support
> >>> >>>> >> >> >>> >>> >> >> for co-reference resolution and dependency
> trees
> >>> is
> >>> >>>> >> currently
> >>> >>>> >> >> >>> >>> missing.
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> Stanford NLP is already integrated with
> Stanbol
> >>> [4].
> >>> >>>> At
> >>> >>>> >> the
> >>> >>>> >> >> >>> moment
> >>> >>>> >> >> >>> >>> it
> >>> >>>> >> >> >>> >>> >> >> only supports English, but I do already work
> to
> >>> >>>> include
> >>> >>>> >> the
> >>> >>>> >> >> >>> other
> >>> >>>> >> >> >>> >>> >> >> supported languages. Other NLP framework that
> is
> >>> >>>> already
> >>> >>>> >> >> >>> integrated
> >>> >>>> >> >> >>> >>> >> >> with Stanbol are Freeling [5] and Talismane
> [6].
> >>> But
> >>> >>>> note
> >>> >>>> >> >> that
> >>> >>>> >> >> >>> for
> >>> >>>> >> >> >>> >>> all
> >>> >>>> >> >> >>> >>> >> >> those the integration excludes support for
> >>> >>>> co-reference
> >>> >>>> >> and
> >>> >>>> >> >> >>> >>> dependency
> >>> >>>> >> >> >>> >>> >> >> trees.
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> Anyways I am confident that one can implement
> a
> >>> first
> >>> >>>> >> >> prototype
> >>> >>>> >> >> >>> by
> >>> >>>> >> >> >>> >>> >> >> only using Sentences and POS tags and - if
> >>> available
> >>> >>>> -
> >>> >>>> >> Chunks
> >>> >>>> >> >> >>> (e.g.
> >>> >>>> >> >> >>> >>> >> >> Noun phrases).
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> > I assume that in the Stanbol context, a feature
> >>> like
> >>> >>>> >> Relation
> >>> >>>> >> >> >>> >>> extraction
> >>> >>>> >> >> >>> >>> >> > would be implemented as an EnhancementEngine?
> >>> >>>> >> >> >>> >>> >> > What kind of effort would be required for a
> >>> >>>> co-reference
> >>> >>>> >> >> >>> resolution
> >>> >>>> >> >> >>> >>> tool
> >>> >>>> >> >> >>> >>> >> > integration into Stanbol?
> >>> >>>> >> >> >>> >>> >> >
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> Yes in the end it would be an EnhancementEngine.
> But
> >>> >>>> before
> >>> >>>> >> we
> >>> >>>> >> >> can
> >>> >>>> >> >> >>> >>> >> build such an engine we would need to
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> * extend the Stanbol NLP processing API with
> >>> >>>> Annotations for
> >>> >>>> >> >> >>> >>> co-reference
> >>> >>>> >> >> >>> >>> >> * add support for JSON Serialisation/Parsing for
> >>> those
> >>> >>>> >> >> annotation
> >>> >>>> >> >> >>> so
> >>> >>>> >> >> >>> >>> >> that the RESTful NLP Analysis Service can provide
> >>> >>>> >> co-reference
> >>> >>>> >> >> >>> >>> >> information
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> > At this moment I'll be focusing on 2 aspects:
> >>> >>>> >> >> >>> >>> >> >
> >>> >>>> >> >> >>> >>> >> > 1. Determine the best data structure to
> >>> encapsulate
> >>> >>>> the
> >>> >>>> >> >> extracted
> >>> >>>> >> >> >>> >>> >> > information. I'll take a closer look at Dolce.
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> Don't make it too complex. Defining a proper
> >>> structure to
> >>> >>>> >> >> represent
> >>> >>>> >> >> >>> >>> >> Events will only pay-off if we can also
> successfully
> >>> >>>> extract
> >>> >>>> >> >> such
> >>> >>>> >> >> >>> >>> >> information form processed texts.
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> I would start with
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >>  * fise:SettingAnnotation
> >>> >>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >>  * fise:ParticipantAnnotation
> >>> >>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
> >>> >>>> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
> >>> >>>> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
> >>> >>>> >> >> >>> >>> >>     * fise:suggestion {entityAnnotation}
> (multiple
> >>> if
> >>> >>>> there
> >>> >>>> >> are
> >>> >>>> >> >> >>> more
> >>> >>>> >> >> >>> >>> >> suggestions)
> >>> >>>> >> >> >>> >>> >>     * dc:type one of fise:Agent, fise:Patient,
> >>> >>>> >> fise:Instrument,
> >>> >>>> >> >> >>> >>> fise:Cause
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >>  * fise:OccurrentAnnotation
> >>> >>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
> >>> >>>> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
> >>> >>>> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
> >>> >>>> >> >> >>> >>> >>     * dc:type set to fise:Activity
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> If it turns out that we can extract more, we can
> add
> >>> >>>> more
> >>> >>>> >> >> >>> structure to
> >>> >>>> >> >> >>> >>> >> those annotations. We might also think about
> using
> >>> an
> >>> >>>> own
> >>> >>>> >> >> namespace
> >>> >>>> >> >> >>> >>> >> for those extensions to the annotation structure.
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> > 2. Determine how should all of this be
> integrated
> >>> into
> >>> >>>> >> >> Stanbol.
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> Just create an EventExtractionEngine and
> configure a
> >>> >>>> >> enhancement
> >>> >>>> >> >> >>> chain
> >>> >>>> >> >> >>> >>> >> that does NLP processing and EntityLinking.
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> You should have a look at
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> * SentimentSummarizationEngine [1] as it does a
> lot
> >>> of
> >>> >>>> things
> >>> >>>> >> >> with
> >>> >>>> >> >> >>> NLP
> >>> >>>> >> >> >>> >>> >> processing results (e.g. connecting adjectives
> (via
> >>> >>>> verbs) to
> >>> >>>> >> >> >>> >>> >> nouns/pronouns. So as long we can not use
> explicit
> >>> >>>> dependency
> >>> >>>> >> >> trees
> >>> >>>> >> >> >>> >>> >> you code will need to do similar things with
> Nouns,
> >>> >>>> Pronouns
> >>> >>>> >> and
> >>> >>>> >> >> >>> >>> >> Verbs.
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> * Disambigutation-MLT engine, as it creates a
> Java
> >>> >>>> >> >> representation
> >>> >>>> >> >> >>> of
> >>> >>>> >> >> >>> >>> >> present fise:TextAnnotation and
> >>> fise:EntityAnnotation
> >>> >>>> [2].
> >>> >>>> >> >> >>> Something
> >>> >>>> >> >> >>> >>> >> similar will also be required by the
> >>> >>>> EventExtractionEngine
> >>> >>>> >> for
> >>> >>>> >> >> fast
> >>> >>>> >> >> >>> >>> >> access to such annotations while iterating over
> the
> >>> >>>> >> Sentences of
> >>> >>>> >> >> >>> the
> >>> >>>> >> >> >>> >>> >> text.
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> best
> >>> >>>> >> >> >>> >>> >> Rupert
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> [1]
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>>
> >>> >>>> >> >> >>>
> >>> >>>> >> >>
> >>> >>>> >>
> >>> >>>>
> >>>
> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/sentiment-summarization/src/main/java/org/apache/stanbol/enhancer/engines/sentiment/summarize/SentimentSummarizationEngine.java
> >>> >>>> >> >> >>> >>> >> [2]
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>>
> >>> >>>> >> >> >>>
> >>> >>>> >> >>
> >>> >>>> >>
> >>> >>>>
> >>>
> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/disambiguation-mlt/src/main/java/org/apache/stanbol/enhancer/engine/disambiguation/mlt/DisambiguationData.java
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> >
> >>> >>>> >> >> >>> >>> >> > Thanks
> >>> >>>> >> >> >>> >>> >> >
> >>> >>>> >> >> >>> >>> >> > Hope this helps to bootstrap this discussion
> >>> >>>> >> >> >>> >>> >> >> best
> >>> >>>> >> >> >>> >>> >> >> Rupert
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> --
> >>> >>>> >> >> >>> >>> >> >> | Rupert Westenthaler
> >>> >>>> >> >> rupert.westenthaler@gmail.com
> >>> >>>> >> >> >>> >>> >> >> | Bodenlehenstraße 11
> >>> >>>> >> >> >>> ++43-699-11108907
> >>> >>>> >> >> >>> >>> >> >> | A-5500 Bischofshofen
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> --
> >>> >>>> >> >> >>> >>> >> | Rupert Westenthaler
> >>> >>>> >> rupert.westenthaler@gmail.com
> >>> >>>> >> >> >>> >>> >> | Bodenlehenstraße 11
> >>> >>>> >> >> >>> ++43-699-11108907
> >>> >>>> >> >> >>> >>> >> | A-5500 Bischofshofen
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>>
> >>> >>>> >> >> >>> >>>
> >>> >>>> >> >> >>> >>>
> >>> >>>> >> >> >>> >>> --
> >>> >>>> >> >> >>> >>> | Rupert Westenthaler
> >>> >>>> rupert.westenthaler@gmail.com
> >>> >>>> >> >> >>> >>> | Bodenlehenstraße 11
> >>> >>>> >> >> ++43-699-11108907
> >>> >>>> >> >> >>> >>> | A-5500 Bischofshofen
> >>> >>>> >> >> >>> >>>
> >>> >>>> >> >> >>> >>
> >>> >>>> >> >> >>> >>
> >>> >>>> >> >> >>>
> >>> >>>> >> >> >>>
> >>> >>>> >> >> >>>
> >>> >>>> >> >> >>> --
> >>> >>>> >> >> >>> | Rupert Westenthaler
> >>> >>>> rupert.westenthaler@gmail.com
> >>> >>>> >> >> >>> | Bodenlehenstraße 11
> >>> >>>> ++43-699-11108907
> >>> >>>> >> >> >>> | A-5500 Bischofshofen
> >>> >>>> >> >> >>>
> >>> >>>> >> >> >>
> >>> >>>> >> >> >>
> >>> >>>> >> >>
> >>> >>>> >> >>
> >>> >>>> >> >>
> >>> >>>> >> >> --
> >>> >>>> >> >> | Rupert Westenthaler
> >>> rupert.westenthaler@gmail.com
> >>> >>>> >> >> | Bodenlehenstraße 11
> >>> >>>> ++43-699-11108907
> >>> >>>> >> >> | A-5500 Bischofshofen
> >>> >>>> >> >>
> >>> >>>> >>
> >>> >>>> >>
> >>> >>>> >>
> >>> >>>> >> --
> >>> >>>> >> | Rupert Westenthaler
> rupert.westenthaler@gmail.com
> >>> >>>> >> | Bodenlehenstraße 11
> >>> ++43-699-11108907
> >>> >>>> >> | A-5500 Bischofshofen
> >>> >>>> >>
> >>> >>>>
> >>> >>>>
> >>> >>>>
> >>> >>>> --
> >>> >>>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> >>> >>>> | Bodenlehenstraße 11
> ++43-699-11108907
> >>> >>>> | A-5500 Bischofshofen
> >>> >>>>
> >>> >>>
> >>> >>>
> >>> >>
> >>>
> >>>
> >>>
> >>> --
> >>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> >>> | Bodenlehenstraße 11                             ++43-699-11108907
> >>> | A-5500 Bischofshofen
> >>>
> >>
> >>
>
>
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>

Re: Relation extraction feature

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Cristian,

In fact I missed it. Sorry for that.

I think the revised proposal looks like a good start. Usually one
needs to make some adaptations when writing the actual code.

If you have a first version, attach it to an issue and I will commit it
to the branch.

best
Rupert


On Thu, Sep 12, 2013 at 9:04 AM, Cristian Petroaca
<cr...@gmail.com> wrote:
> Hi Rupert,
>
> This is a reminder in case you missed this e-mail.
>
> Cristian
>
>
> 2013/9/3 Cristian Petroaca <cr...@gmail.com>
>
>> Ok, then to sum it up we would have :
>>
>> 1. Coref
>>
>> "stanbol.enhancer.nlp.coref" : {
>>     "isRepresentative" : true/false, // whether this token or chunk is the
>> representative mention in the chain
>>     "mentions" : [ { "type" : "Token", // type of element which refers to this token/chunk
>>                      "start" : 123, // start index of the mentioning element
>>                      "end" : 130 // end index of the mentioning element
>>                    }, ...
>>                  ],
>>     "class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
>> }
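On the Java side, the coref proposal above could be sketched roughly as follows. All names here are illustrative stand-ins (the real classes would live in the o.a.s.enhancer.nlp module and extend `Tag`), and the character offsets assume the example text given in the comment; this is a sketch of the data shape, not the final Stanbol API.

```java
import java.util.List;

// Minimal sketch of the proposed coref annotation model (Java 16+ records).
public class CorefSketch {

    // A span reference: "type" + "start" + "end" uniquely identify a
    // Sentence, Chunk or Token in the analysed text.
    record SpanRef(String type, int start, int end) {}

    // Mirrors the JSON structure: the representative flag plus the other
    // mentions of the same coreference chain.
    record CorefTag(boolean isRepresentative, List<SpanRef> mentions) {}

    public static void main(String[] args) {
        // For "Mary and Tom met Danny today. They greeted him." a chain could
        // link the representative mention with the token "They" (chars 30-34).
        CorefTag tag = new CorefTag(true,
                List.of(new SpanRef("Token", 30, 34)));
        System.out.println("representative=" + tag.isRepresentative()
                + " mentions=" + tag.mentions().size());
    }
}
```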
>>
>>
>> 2. Dependency tree
>>
>> "stanbol.enhancer.nlp.dependency" : {
>>     "relations" : [ { "tag" : "nsubj", // type of relation - Stanford NLP notation
>>                       "dep" : 12, // type of relation - Stanbol NLP mapped value (ordinal in the Dependency enum)
>>                       "role" : "gov/dep", // whether this token is the governor or the dependent
>>                       "type" : "Token", // type of element with which this token is in relation
>>                       "start" : 123, // start index of the related token
>>                       "end" : 130 // end index of the related token
>>                     },
>>                     ...
>>                   ],
>>     "class" : "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag"
>> }
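The dependency proposal above could look roughly like this in Java. The `Relation` subset, the class names and the character offsets (for "Mary and Tom met Danny today") are illustrative assumptions, not the final enumeration or API:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the proposed DependencyTag model (Java 16+ records).
public class DependencyTagSketch {

    // Tiny stand-in for the enum that would mirror the mapped vocabulary;
    // the real enumeration would cover the full relation set.
    enum Relation { ROOT, NSUBJ, CONJ_AND, NN, TMOD }

    // Span reference: "type" + "start" + "end" uniquely identify a span.
    record SpanRef(String type, int start, int end) {}

    // One relation entry: the tool's lexical tag, the mapped enum value,
    // the role of this token, and the partner span.
    record DependencyRelation(String tag, Relation relation,
                              boolean isGovernor, SpanRef partner) {}

    public static void main(String[] args) {
        // Annotations for the token "Mary" in "Mary and Tom met Danny today".
        List<DependencyRelation> maryRelations = new ArrayList<>();
        maryRelations.add(new DependencyRelation("nsubj", Relation.NSUBJ,
                false, new SpanRef("Token", 13, 16)));   // partner: "met"
        maryRelations.add(new DependencyRelation("conj_and", Relation.CONJ_AND,
                true, new SpanRef("Token", 9, 12)));     // partner: "Tom"

        for (DependencyRelation r : maryRelations) {
            System.out.println(r.tag() + " -> " + r.relation().ordinal()
                    + " partner=" + r.partner().type()
                    + "[" + r.partner().start() + "," + r.partner().end() + "]");
        }
    }
}
```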
>>
>>
>> 2013/9/2 Rupert Westenthaler <ru...@gmail.com>
>>
>>> Hi Cristian,
>>>
>>> let me provide some feedback to your proposals:
>>>
>>> ### Referring other Spans
>>>
>>> Both suggested annotations require linking to other spans (Sentence,
>>> Chunk or Token). For that we should introduce a common JSON element for
>>> referring to those elements and use it for all such references.
>>>
>>> In the java model this would allow you to have a reference to the
>>> other Span (Sentence, Chunk, Token). In the serialized form you would
>>> have JSON elements with the "type", "start" and "end" attributes as
>>> those three uniquely identify any span.
>>>
>>> Here an example based on the "mention" attribute as defined by the
>>> proposed "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
>>>
>>>     ...
>>>     "mentions" : [ {
>>>         "type" : "Token",
>>>         "start": 123 ,
>>>         "end": 130 } ,{
>>>         "type" : "Token",
>>>         "start": 157 ,
>>>         "end": 165 }],
>>>     ...
>>>
>>> Similarly, token links in
>>> "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag" should also
>>> use this model.
>>>
>>> ### Usage of Controlled Vocabularies
>>>
>>> In addition, the DependencyTag also seems to use a controlled
>>> vocabulary (e.g. 'nsubj', 'conj_and' ...). In such cases the Stanbol
>>> NLP module tries to define those in some kind of ontology. For POS
>>> tags we use the OLIA ontology [1]. This is important as most NLP
>>> frameworks will use different strings, and we need to unify those to
>>> common IDs so that components that consume the data do not depend on
>>> a specific NLP tool.
>>>
>>> Because the usage of ontologies within Java is not well supported, the
>>> Stanbol NLP module defines Java enumerations for those ontologies, such
>>> as the POS type enumeration [2].
>>>
>>> Both the Java model and the JSON serialization support both
>>> (1) the lexical tag as used by the NLP tool and (2) the mapped
>>> concept: in the Java API via two different methods, and in the JSON
>>> serialization via two separate keys.
>>>
>>> To make this clearer, here is an example of a POS annotation for a
>>> proper noun.
>>>
>>>     "stanbol.enhancer.nlp.pos" : {
>>>         "tag" : "PN",
>>>         "pos" : 53,
>>>         "class" : "org.apache.stanbol.enhancer.nlp.pos.PosTag",
>>>         "prob" : 0.95
>>>     }
>>>
>>> where
>>>
>>>     "tag" : "PN"
>>>
>>> is the lexical form as used by the NLP tool and
>>>
>>>     "pos" : 53
>>>
>>> refers to the ordinal number of the entry "ProperNoun" in the POS
>>> enumeration
>>>
>>> IMO the "type" property of DependencyTag should use a similar design.
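The dual serialization described above ("tag" keeps the tool's string, "pos" carries the ordinal of the mapped enum entry) can be sketched as follows. The tiny `Pos` enum and the tagset mapping are stand-ins, so the ordinal printed here (2) differs from the real enumeration's 53; only the two-key pattern is the point:

```java
import java.util.Map;

// Sketch: resolving a tool-specific POS tag string both ways - to the
// shared enum (serialized as its ordinal) and back to the lexical form.
public class PosMappingSketch {

    // Tiny stand-in for the real o.a.s.enhancer.nlp.pos.Pos enumeration.
    enum Pos { Noun, Verb, ProperNoun }

    // Hypothetical tool-specific tagset where "PN" means proper noun.
    static final Map<String, Pos> TAGSET = Map.of(
            "NN", Pos.Noun, "VB", Pos.Verb, "PN", Pos.ProperNoun);

    public static void main(String[] args) {
        String tag = "PN";            // lexical form used by the NLP tool
        Pos mapped = TAGSET.get(tag); // unified concept
        // Both are serialized: "tag" keeps the tool's string, while
        // "pos" carries the ordinal of the mapped enum entry.
        System.out.println("{\"tag\":\"" + tag + "\",\"pos\":"
                + mapped.ordinal() + "}");
    }
}
```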
>>>
>>> best
>>> Rupert
>>>
>>> [1] http://olia.nlp2rdf.org/
>>> [2]
>>> http://svn.apache.org/repos/asf/stanbol/trunk/enhancer/generic/nlp/src/main/java/org/apache/stanbol/enhancer/nlp/pos/Pos.java
>>>
>>> On Sun, Sep 1, 2013 at 8:09 PM, Cristian Petroaca
>>> <cr...@gmail.com> wrote:
>>> > Sorry, pressed sent too soon :).
>>> >
>>> > Continued :
>>> >
>>> > nsubj(met-4, Mary-1), conj_and(Mary-1, Tom-3), nsubj(met-4, Tom-3),
>>> > root(ROOT-0, met-4), nn(today-6, Danny-5), tmod(met-4, today-6)]
>>> >
>>> > Given this, we can have for each "Token" an additional dependency
>>> > annotation :
>>> >
>>> > "stanbol.enhancer.nlp.dependency" : {
>>> > "tag" : //is it necessary?
>>> > "relations" : [ { "type" : "nsubj", //type of relation
>>> >   "role" : "gov/dep", //whether it is the governor or the dependent
>>> >   "dependencyValue" : "met", // the word with which the token has a
>>> relation
>>> >   "dependencyIndexInSentence" : "2" //the index of the dependency in the
>>> > current sentence
>>> > }
>>> > ...
>>> > ]
>>> >                 "class" :
>>> > "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag"
>>> >         }
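As an aside, the textual dependency output quoted above ("nsubj(met-4, Mary-1), ...") could be turned into simple triples with a sketch like the following. This is illustrative only: a real integration would consume Stanford's typed-dependencies API rather than the printed form, and the regex assumes tokens without embedded hyphens:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: parsing Stanford's printed dependency list into records.
public class DepOutputParser {

    record Dep(String relation, String governor, int govIndex,
               String dependent, int depIndex) {}

    // Matches entries like "nsubj(met-4, Mary-1)"; breaks on tokens that
    // themselves contain hyphens.
    static final Pattern DEP = Pattern.compile(
            "(\\w+)\\((\\S+)-(\\d+), (\\S+)-(\\d+)\\)");

    static List<Dep> parse(String output) {
        List<Dep> deps = new ArrayList<>();
        Matcher m = DEP.matcher(output);
        while (m.find()) {
            deps.add(new Dep(m.group(1),
                    m.group(2), Integer.parseInt(m.group(3)),
                    m.group(4), Integer.parseInt(m.group(5))));
        }
        return deps;
    }

    public static void main(String[] args) {
        String out = "nsubj(met-4, Mary-1), conj_and(Mary-1, Tom-3), "
                + "nsubj(met-4, Tom-3), root(ROOT-0, met-4), "
                + "nn(today-6, Danny-5), tmod(met-4, today-6)";
        for (Dep d : parse(out)) {
            System.out.println(d.relation() + ": " + d.governor()
                    + "(" + d.govIndex() + ") -> "
                    + d.dependent() + "(" + d.depIndex() + ")");
        }
    }
}
```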
>>> >
>>> > 2013/9/1 Cristian Petroaca <cr...@gmail.com>
>>> >
>>> >> Related to the Stanford Dependency Tree Feature, this is the way the
>>> >> output from the tool looks for this sentence : "Mary and Tom met
>>> Danny
>>> >> today" :
>>> >>
>>> >>
>>> >> 2013/8/30 Cristian Petroaca <cr...@gmail.com>
>>> >>
>>> >>> Hi Rupert,
>>> >>>
>>> >>> Ok, so after looking at the JSON output from the Stanford NLP Server
>>> and
>>> >>> the coref module I'm thinking I can represent the coreference
>>> information
>>> >>> this way:
>>> >>> Each "Token" or "Chunk" will contain an additional coref annotation
>>> with
>>> >>> the following structure :
>>> >>>
>>> >>> "stanbol.enhancer.nlp.coref" {
>>> >>>     "tag" : //does this need to exist?
>>> >>>     "isRepresentative" : true/false, // whether this token or chunk is
>>> >>> the representative mention in the chain
>>> >>>     "mentions" : [ { "sentenceNo" : 1 //the sentence in which the
>>> mention
>>> >>> is found
>>> >>>                            "startWord" : 2 //the first word making up
>>> the
>>> >>> mention
>>> >>>                            "endWord" : 3 //the last word making up the
>>> >>> mention
>>> >>>                          }, ...
>>> >>>                        ],
>>> >>>     "class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
>>> >>> }
>>> >>>
>>> >>> The CorefTag should resemble this model.
>>> >>>
>>> >>> What do you think?
>>> >>>
>>> >>> Cristian
>>> >>>
>>> >>>
>>> >>> 2013/8/24 Rupert Westenthaler <ru...@gmail.com>
>>> >>>
>>> >>>> Hi Cristian,
>>> >>>>
>>> >>>> you can not directly call StanfordNLP components from Stanbol, but
>>> you
>>> >>>> have to extend the RESTful service to include the information you
>>> >>>> need. The main reason for that is that the license of StanfordNLP is
>>> >>>> not compatible with the Apache Software License. So Stanbol can not
>>> >>>> directly link to the StanfordNLP API.
>>> >>>>
>>> >>>> You will need to
>>> >>>>
>>> >>>> 1. define an additional class {yourTag} extends Tag<{yourType}> class
>>> >>>> in the o.a.s.enhancer.nlp module
>>> >>>> 2. add JSON parsing and serialization support for this tag to the
>>> >>>> o.a.s.enhancer.nlp.json module (see e.g. PosTagSupport as an example)
>>> >>>>
>>> >>>> As (1) would be necessary anyway the only additional thing you need
>>> to
>>> >>>> develop is (2). After that you can add {yourTag} instance to the
>>> >>>> AnalyzedText in the StanfordNLP integration. The
>>> >>>> RestfulNlpAnalysisEngine will parse them from the response. All
>>> >>>> engines executed after the RestfulNlpAnalysisEngine will have access
>>> >>>> to your annotations.
>>> >>>>
>>> >>>> If you have a design for {yourTag} - the model you would like to use
>>> >>>> to represent your data - I can help with (1) and (2).
>>> >>>>
>>> >>>> best
>>> >>>> Rupert
>>> >>>>
>>> >>>>
>>> >>>> On Fri, Aug 23, 2013 at 5:11 PM, Cristian Petroaca
>>> >>>> <cr...@gmail.com> wrote:
>>> >>>> > Hi Rupert,
>>> >>>> >
>>> >>>> > Thanks for the info. Looking at the stanbol-stanfordnlp project I
>>> see
>>> >>>> that
>>> >>>> > the Stanford NLP is not implemented as an EnhancementEngine but
>>> rather
>>> >>>> it
>>> >>>> > is used directly in a Jetty Server instance. How does that fit
>>> into the
>>> >>>> > Stanbol stack? For example how can I call the StanfordNlpAnalyzer's
>>> >>>> routine
>>> >>>> > from my TripleExtractionEnhancementEngine which lives in the
>>> Stanbol
>>> >>>> stack?
>>> >>>> >
>>> >>>> > Thanks,
>>> >>>> > Cristian
>>> >>>> >
>>> >>>> >
>>> >>>> > 2013/8/12 Rupert Westenthaler <ru...@gmail.com>
>>> >>>> >
>>> >>>> >> Hi Cristian,
>>> >>>> >>
>>> >>>> >> Sorry for the late response, but I was offline for the last two
>>> weeks
>>> >>>> >>
>>> >>>> >> On Fri, Aug 2, 2013 at 9:19 PM, Cristian Petroaca
>>> >>>> >> <cr...@gmail.com> wrote:
>>> >>>> >> > Hi Rupert,
>>> >>>> >> >
>>> >>>> >> > After doing some tests it seems that the Stanford NLP
>>> coreference
>>> >>>> module
>>> >>>> >> is
>>> >>>> >> > much more accurate than the Open NLP one.So I decided to extend
>>> >>>> Stanford
>>> >>>> >> > NLP to add coreference there.
>>> >>>> >>
>>> >>>> >> The Stanford NLP integration is not part of the Stanbol codebase
>>> >>>> >> because the licenses are not compatible.
>>> >>>> >>
>>> >>>> >> You can find the Stanford NLP integration on
>>> >>>> >>
>>> >>>> >>     https://github.com/westei/stanbol-stanfordnlp
>>> >>>> >>
>>> >>>> >> just create a fork and send pull requests.
>>> >>>> >>
>>> >>>> >>
>>> >>>> >> > Could you add the necessary projects on the branch? And also
>>> remove
>>> >>>> the
>>> >>>> >> > Open NLP ones?
>>> >>>> >> >
>>> >>>> >>
>>> >>>> >> Currently the branch
>>> >>>> >>
>>> >>>> >>
>>> >>>> >>
>>> >>>>
>>> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
>>> >>>> >>
>>> >>>> >> only contains the "nlp" and the "nlp-json" modules. IMO those
>>> should
>>> >>>> >> be enough for adding coreference support.
>>> >>>> >>
>>> >>>> >> IMO you will need to
>>> >>>> >>
>>> >>>> >> * add an model for representing coreference to the nlp module
>>> >>>> >> * add parsing and serializing support to the nlp-json module
>>> >>>> >> * add the implementation to your fork of the stanbol-stanfordnlp
>>> >>>> project
>>> >>>> >>
>>> >>>> >> best
>>> >>>> >> Rupert
>>> >>>> >>
>>> >>>> >>
>>> >>>> >>
>>> >>>> >> > Thanks,
>>> >>>> >> > Cristian
>>> >>>> >> >
>>> >>>> >> >
>>> >>>> >> > 2013/7/5 Rupert Westenthaler <ru...@gmail.com>
>>> >>>> >> >
>>> >>>> >> >> Hi Cristian,
>>> >>>> >> >>
>>> >>>> >> >> I created the branch at
>>> >>>> >> >>
>>> >>>> >> >>
>>> >>>> >> >>
>>> >>>> >>
>>> >>>>
>>> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
>>> >>>> >> >>
>>> >>>> >> >> ATM in contains only the "nlp" and "nlp-json" module. Let me
>>> know
>>> >>>> if
>>> >>>> >> >> you would like to have more
>>> >>>> >> >>
>>> >>>> >> >> best
>>> >>>> >> >> Rupert
>>> >>>> >> >>
>>> >>>> >> >>
>>> >>>> >> >>
>>> >>>> >> >> On Thu, Jul 4, 2013 at 10:14 AM, Cristian Petroaca
>>> >>>> >> >> <cr...@gmail.com> wrote:
>>> >>>> >> >> > Hi Rupert,
>>> >>>> >> >> >
>>> >>>> >> >> > I created jiras :
>>> >>>> https://issues.apache.org/jira/browse/STANBOL-1132and
>>> >>>> >> >> > https://issues.apache.org/jira/browse/STANBOL-1133. The
>>> >>>> original one
>>> >>>> >> is
>>> >>>> >> >> > dependent upon these.
>>> >>>> >> >> > Please let me know when I can start using the branch.
>>> >>>> >> >> >
>>> >>>> >> >> > Thanks,
>>> >>>> >> >> > Cristian
>>> >>>> >> >> >
>>> >>>> >> >> >
>>> >>>> >> >> > 2013/6/27 Cristian Petroaca <cr...@gmail.com>
>>> >>>> >> >> >
>>> >>>> >> >> >>
>>> >>>> >> >> >>
>>> >>>> >> >> >>
>>> >>>> >> >> >> 2013/6/27 Rupert Westenthaler <
>>> rupert.westenthaler@gmail.com>
>>> >>>> >> >> >>
>>> >>>> >> >> >>> On Thu, Jun 27, 2013 at 3:12 PM, Cristian Petroaca
>>> >>>> >> >> >>> <cr...@gmail.com> wrote:
>>> >>>> >> >> >>> > Sorry, I meant the Stanbol NLP API, not Stanford in my
>>> >>>> previous
>>> >>>> >> >> e-mail.
>>> >>>> >> >> >>> By
>>> >>>> >> >> >>> > the way, does Open NLP have the ability to build
>>> dependency
>>> >>>> trees?
>>> >>>> >> >> >>> >
>>> >>>> >> >> >>>
>>> >>>> >> >> >>> AFAIK OpenNLP does not provide this feature.
>>> >>>> >> >> >>>
>>> >>>> >> >> >>
>>> >>>> >> >> >> Then , since the Stanford NLP lib is also integrated into
>>> >>>> Stanbol,
>>> >>>> >> I'll
>>> >>>> >> >> >> take a look at how I can extend its integration to include
>>> the
>>> >>>> >> >> dependency
>>> >>>> >> >> >> tree feature.
>>> >>>> >> >> >>
>>> >>>> >> >> >>>
>>> >>>> >> >> >>>
>>> >>>> >> >> >>  >
>>> >>>> >> >> >>> > 2013/6/23 Cristian Petroaca <cristian.petroaca@gmail.com
>>> >
>>> >>>> >> >> >>> >
>>> >>>> >> >> >>> >> Hi Rupert,
>>> >>>> >> >> >>> >>
>>> >>>> >> >> >>> >> I created jira
>>> >>>> >> https://issues.apache.org/jira/browse/STANBOL-1121.
>>> >>>> >> >> >>> >> As you suggested I would start with extending the
>>> Stanford
>>> >>>> NLP
>>> >>>> >> with
>>> >>>> >> >> >>> >> co-reference resolution but I think also with dependency
>>> >>>> trees
>>> >>>> >> >> because
>>> >>>> >> >> >>> I
>>> >>>> >> >> >>> >> also need to know the Subject of the sentence and the
>>> object
>>> >>>> >> that it
>>> >>>> >> >> >>> >> affects, right?
>>> >>>> >> >> >>> >>
>>> >>>> >> >> >>> >> Given that I need to extend the Stanford NLP API in
>>> Stanbol
>>> >>>> for
>>> >>>> >> >> >>> >> co-reference and dependency trees, how do I proceed with
>>> >>>> this?
>>> >>>> >> Do I
>>> >>>> >> >> >>> create
>>> >>>> >> >> >>> >> 2 new sub-tasks to the already opened Jira? After that
>>> can I
>>> >>>> >> start
>>> >>>> >> >> >>> >> implementing on my local copy of Stanbol and when I'm
>>> done
>>> >>>> I'll
>>> >>>> >> send
>>> >>>> >> >> >>> you
>>> >>>> >> >> >>> >> guys the patch fo review?
>>> >>>> >> >> >>> >>
>>> >>>> >> >> >>>
>>> >>>> >> >> >>> I would create two "New Feature" type Issues one for adding
>>> >>>> support
>>> >>>> >> >> >>> for "dependency trees" and the other for "co-reference"
>>> >>>> support. You
>>> >>>> >> >> >>> should also define "depends on" relations between
>>> STANBOL-1121
>>> >>>> and
>>> >>>> >> >> >>> those two new issues.
>>> >>>> >> >> >>>
>>> >>>> >> >> >>> Sub-task could also work, but as adding those features
>>> would
>>> >>>> be also
>>> >>>> >> >> >>> interesting for other things I would rather define them as
>>> >>>> separate
>>> >>>> >> >> >>> issues.
>>> >>>> >> >> >>>
>>> >>>> >> >> >>>
>>> >>>> >> >> >> 2 New Features connected with the original jira it is then.
>>> >>>> >> >> >>
>>> >>>> >> >> >>
>>> >>>> >> >> >>> If you would prefer to work in an own branch please tell
>>> me.
>>> >>>> This
>>> >>>> >> >> >>> could have the advantage that patches would not be
>>> affected by
>>> >>>> >> changes
>>> >>>> >> >> >>> in the trunk.
>>> >>>> >> >> >>>
>>> >>>> >> >> >>> Yes, a separate branch sounds good.
>>> >>>> >> >> >>
>>> >>>> >> >> >> best
>>> >>>> >> >> >>> Rupert
>>> >>>> >> >> >>>
>>> >>>> >> >> >>> >> Regards,
>>> >>>> >> >> >>> >> Cristian
>>> >>>> >> >> >>> >>
>>> >>>> >> >> >>> >>
>>> >>>> >> >> >>> >> 2013/6/18 Rupert Westenthaler <
>>> >>>> rupert.westenthaler@gmail.com>
>>> >>>> >> >> >>> >>
>>> >>>> >> >> >>> >>> On Mon, Jun 17, 2013 at 10:18 PM, Cristian Petroaca
>>> >>>> >> >> >>> >>> <cr...@gmail.com> wrote:
>>> >>>> >> >> >>> >>> > Hi Rupert,
>>> >>>> >> >> >>> >>> >
>>> >>>> >> >> >>> >>> > Agreed on the
>>> >>>> >> >> >>> SettingAnnotation/ParticipantAnnotation/OccurentAnnotation
>>> >>>> >> >> >>> >>> > data structure.
>>> >>>> >> >> >>> >>> >
>>> >>>> >> >> >>> >>> > Should I open up a Jira for all of this in order to
>>> >>>> >> encapsulate
>>> >>>> >> >> this
>>> >>>> >> >> >>> >>> > information and establish the goals and these initial
>>> >>>> steps
>>> >>>> >> >> towards
>>> >>>> >> >> >>> >>> these
>>> >>>> >> >> >>> >>> > goals?
>>> >>>> >> >> >>> >>>
>>> >>>> >> >> >>> >>> Yes please. A JIRA issue for this work would be great.
>>> >>>> >> >> >>> >>>
>>> >>>> >> >> >>> >>> > How should I proceed further? Should I create some
>>> design
>>> >>>> >> >> documents
>>> >>>> >> >> >>> that
>>> >>>> >> >> >>> >>> > need to be reviewed?
>>> >>>> >> >> >>> >>>
>>> >>>> >> >> >>> >>> Usually it is the best to write design related text
>>> >>>> directly in
>>> >>>> >> >> JIRA
>>> >>>> >> >> >>> >>> by using Markdown [1] syntax. This will allow us later
>>> to
>>> >>>> use
>>> >>>> >> this
>>> >>>> >> >> >>> >>> text directly for the documentation on the Stanbol
>>> Webpage.
>>> >>>> >> >> >>> >>>
>>> >>>> >> >> >>> >>> best
>>> >>>> >> >> >>> >>> Rupert
>>> >>>> >> >> >>> >>>
>>> >>>> >> >> >>> >>>
>>> >>>> >> >> >>> >>> [1] http://daringfireball.net/projects/markdown/
>>> >>>> >> >> >>> >>> >
>>> >>>> >> >> >>> >>> > Regards,
>>> >>>> >> >> >>> >>> > Cristian
>>> >>>> >> >> >>> >>> >
>>> >>>> >> >> >>> >>> >
>>> >>>> >> >> >>> >>> > 2013/6/17 Rupert Westenthaler <
>>> >>>> rupert.westenthaler@gmail.com>
>>> >>>> >> >> >>> >>> >
>>> >>>> >> >> >>> >>> >> On Thu, Jun 13, 2013 at 8:22 PM, Cristian Petroaca
>>> >>>> >> >> >>> >>> >> <cr...@gmail.com> wrote:
>>> >>>> >> >> >>> >>> >> > HI Rupert,
>>> >>>> >> >> >>> >>> >> >
>>> >>>> >> >> >>> >>> >> > First of all thanks for the detailed suggestions.
>>> >>>> >> >> >>> >>> >> >
>>> >>>> >> >> >>> >>> >> > 2013/6/12 Rupert Westenthaler <
>>> >>>> >> rupert.westenthaler@gmail.com>
>>> >>>> >> >> >>> >>> >> >
>>> >>>> >> >> >>> >>> >> >> Hi Cristian, all
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> really interesting use case!
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> In this mail I will try to give some suggestions
>>> on
>>> >>>> how
>>> >>>> >> this
>>> >>>> >> >> >>> could
>>> >>>> >> >> >>> >>> >> >> work out. These suggestions are mainly based on
>>> >>>> experiences
>>> >>>> >> >> and
>>> >>>> >> >> >>> >>> lessons
>>> >>>> >> >> >>> >>> >> >> learned in the LIVE [2] project where we built an
>>> >>>> >> information
>>> >>>> >> >> >>> system
>>> >>>> >> >> >>> >>> >> >> for the Olympic Games in Peking. While this
>>> Project
>>> >>>> >> excluded
>>> >>>> >> >> the
>>> >>>> >> >> >>> >>> >> >> extraction of Events from unstructured text
>>> (because
>>> >>>> the
>>> >>>> >> >> Olympic
>>> >>>> >> >> >>> >>> >> >> Information System was already providing event
>>> data
>>> >>>> as XML
>>> >>>> >> >> >>> messages)
>>> >>>> >> >> >>> >>> >> >> the semantic search capabilities of this system
>>> >>>> were very
>>> >>>> >> >> >>> similar
>>> >>>> >> >> >>> >>> to
>>> >>>> >> >> >>> >>> >> >> the one described by your use case.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> IMHO you are not only trying to extract
>>> relations,
>>> >>>> but a
>>> >>>> >> >> formal
>>> >>>> >> >> >>> >>> >> >> representation of the situation described by the
>>> >>>> text. So
>>> >>>> >> >> lets
>>> >>>> >> >> >>> >>> assume
>>> >>>> >> >> >>> >>> >> >> that the goal is to Annotate a Setting (or
>>> Situation)
>>> >>>> >> >> described
>>> >>>> >> >> >>> in
>>> >>>> >> >> >>> >>> the
>>> >>>> >> >> >>> >>> >> >> text - a fise:SettingAnnotation.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> The DOLCE foundational ontology [1] gives some
>>> >>>> advice on
>>> >>>> >> >> how to
>>> >>>> >> >> >>> >>> model
>>> >>>> >> >> >>> >>> >> >> those. The important relation for modeling this
>>> >>>> >> >> is Participation:
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >>     PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t))
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> where ..
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >>  * ED are Endurants (continuants): Endurants do
>>> have
>>> >>>> an
>>> >>>> >> >> >>> identity so
>>> >>>> >> >> >>> >>> we
>>> >>>> >> >> >>> >>> >> >> would typically refer to them as Entities
>>> referenced
>>> >>>> by a
>>> >>>> >> >> >>> setting.
>>> >>>> >> >> >>> >>> >> >> Note that this includes physical, non-physical as
>>> >>>> well as
>>> >>>> >> >> >>> >>> >> >> social-objects.
>>> >>>> >> >> >>> >>> >> >>  * PD are Perdurants (occurrents):  Perdurants
>>> are
>>> >>>> >> entities
>>> >>>> >> >> that
>>> >>>> >> >> >>> >>> >> >> happen in time. This refers to Events,
>>> Activities ...
>>> >>>> >> >> >>> >>> >> >>  * PC are Participation: It is an time indexed
>>> >>>> relation
>>> >>>> >> where
>>> >>>> >> >> >>> >>> >> >> Endurants participate in Perdurants
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> Modeling this in RDF requires to define some
>>> >>>> intermediate
>>> >>>> >> >> >>> resources
>>> >>>> >> >> >>> >>> >> >> because RDF does not allow for n-ary relations.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >>  * fise:SettingAnnotation: It is really handy to
>>> >>>> define
>>> >>>> >> one
>>> >>>> >> >> >>> resource
>>> >>>> >> >> >>> >>> >> >> being the context for all described data. I would
>>> >>>> call
>>> >>>> >> this
>>> >>>> >> >> >>> >>> >> >> "fise:SettingAnnotation" and define it as a
>>> >>>> sub-concept to
>>> >>>> >> >> >>> >>> >> >> fise:Enhancement. All further enhancement about
>>> the
>>> >>>> >> extracted
>>> >>>> >> >> >>> >>> Setting
>>> >>>> >> >> >>> >>> >> >> would define a "fise:in-setting" relation to it.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >>  * fise:ParticipantAnnotation: Is used to
>>> annotate
>>> >>>> that
>>> >>>> >> >> >>> Endurant is
>>> >>>> >> >> >>> >>> >> >> participating on a setting (fise:in-setting
>>> >>>> >> >> >>> fise:SettingAnnotation).
>>> >>>> >> >> >>> >>> >> >> The Endurant itself is described by existing
>>> >>>> >> >> fise:TextAnnotaion
>>> >>>> >> >> >>> (the
>>> >>>> >> >> >>> >>> >> >> mentions) and fise:EntityAnnotation (suggested
>>> >>>> Entities).
>>> >>>> >> >> >>> Basically
>>> >>>> >> >> >>> >>> >> >> the fise:ParticipantAnnotation will allow an
>>> >>>> >> >> EnhancementEngine
>>> >>>> >> >> >>> to
>>> >>>> >> >> >>> >>> >> >> state that several mentions (in possible
>>> different
>>> >>>> >> >> sentences) do
>>> >>>> >> >> >>> >>> >> >> represent the same Endurant as participating in
>>> the
>>> >>>> >> Setting.
>>> >>>> >> >> In
>>> >>>> >> >> >>> >>> >> >> addition it would be possible to use the dc:type
>>> >>>> property
>>> >>>> >> >> >>> (similar
>>> >>>> >> >> >>> >>> to
>>> >>>> >> >> >>> >>> >> >> for fise:TextAnnotation) to refer to the role(s)
>>> of
>>> >>>> an
>>> >>>> >> >> >>> participant
>>> >>>> >> >> >>> >>> >> >> (e.g. the set: Agent (intentionally performs an
>>> >>>> action)
>>> >>>> >> Cause
>>> >>>> >> >> >>> >>> >> >> (unintentionally e.g. a mud slide), Patient (a
>>> >>>> passive
>>> >>>> >> role
>>> >>>> >> >> in
>>> >>>> >> >> >>> an
>>> >>>> >> >> >>> >>> >> >> activity) and Instrument (aids an process)), but
>>> I am
>>> >>>> >> >> wondering
>>> >>>> >> >> >>> if
>>> >>>> >> >> >>> >>> one
>>> >>>> >> >> >>> >>> >> >> could extract those information.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> * fise:OccurrentAnnotation: is used to annotate a
>>> >>>> >> Perdurant
>>> >>>> >> >> in
>>> >>>> >> >> >>> the
>>> >>>> >> >> >>> >>> >> >> context of the Setting. Also
>>> >>>> fise:OccurrentAnnotation can
>>> >>>> >> >> link
>>> >>>> >> >> >>> to
>>> >>>> >> >> >>> >>> >> >> fise:TextAnnotaion (typically verbs in the text
>>> >>>> defining
>>> >>>> >> the
>>> >>>> >> >> >>> >>> >> >> perdurant) as well as fise:EntityAnnotation
>>> >>>> suggesting
>>> >>>> >> well
>>> >>>> >> >> >>> known
>>> >>>> >> >> >>> >>> >> >> Events in a knowledge base (e.g. an Election in a
>>> >>>> country,
>>> >>>> >> or
>>> >>>> >> >> an
>>> >>>> >> >> >>> >>> >> >> uprising ...). In addition
>>> fise:OccurrentAnnotation
>>> >>>> can
>>> >>>> >> >> define
>>> >>>> >> >> >>> >>> >> >> dc:has-participant links to
>>> >>>> fise:ParticipantAnnotation. In
>>> >>>> >> >> this
>>> >>>> >> >> >>> case
>>> >>>> >> >> >>> >>> >> >> it is explicitly stated that an Endurant (the
>>> >>>> >> >> >>> >>> >> >> fise:ParticipantAnnotation) is involved in this
>>> >>>> Perdurant
>>> >>>> >> (the
>>> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation). As Occurrences are
>>> >>>> temporally
>>> >>>> >> >> indexed
>>> >>>> >> >> >>> this
>>> >>>> >> >> >>> >>> >> >> annotation should also support properties for
>>> >>>> defining the
>>> >>>> >> >> >>> >>> >> >> xsd:dateTime for the start/end.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> Indeed, an event based data structure makes a
>>> lot of
>>> >>>> sense
>>> >>>> >> >> with
>>> >>>> >> >> >>> the
>>> >>>> >> >> >>> >>> >> remark
>>> >>>> >> >> >>> >>> >> > that you probably won't be able to always extract
>>> the
>>> >>>> date
>>> >>>> >> >> for a
>>> >>>> >> >> >>> >>> given
>>> >>>> >> >> >>> >>> >> > setting(situation).
>>> >>>> >> >> >>> >>> >> > There are 2 things which are unclear though.
>>> >>>> >> >> >>> >>> >> >
>>> >>>> >> >> >>> >>> >> > 1. Perdurant : You could have situations in which
>>> the
>>> >>>> >> object
>>> >>>> >> >> upon
>>> >>>> >> >> >>> >>> which
>>> >>>> >> >> >>> >>> >> the
>>> >>>> >> >> >>> >>> >> > Subject ( or Endurant ) is acting is not a
>>> transitory
>>> >>>> >> object (
>>> >>>> >> >> >>> such
>>> >>>> >> >> >>> >>> as an
>>> >>>> >> >> >>> >>> >> > event, activity ) but rather another Endurant. For
>>> >>>> example
>>> >>>> >> we
>>> >>>> >> >> can
>>> >>>> >> >> >>> >>> have
>>> >>>> >> >> >>> >>> >> the
>>> >>>> >> >> >>> >>> >> > phrase "USA invades Iraq" where "USA" is the
>>> Endurant
>>> >>>> (
>>> >>>> >> >> Subject )
>>> >>>> >> >> >>> >>> which
>>> >>>> >> >> >>> >>> >> > performs the action of "invading" on another
>>> >>>> Endurant,
>>> >>>> >> namely
>>> >>>> >> >> >>> >>> "Iraq".
>>> >>>> >> >> >>> >>> >> >
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> By using CAOS, USA would be the Agent and Iraq the
>>> >>>> Patient.
>>> >>>> >> Both
>>> >>>> >> >> >>> are
>>> >>>> >> >> >>> >>> >> Endurants. The activity "invading" would be the
>>> >>>> Perdurant. So
>>> >>>> >> >> >>> ideally
>>> >>>> >> >> >>> >>> >> you would have a  "fise:SettingAnnotation" with:
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >>   * fise:ParticipantAnnotation for USA with the
>>> dc:type
>>> >>>> >> >> caos:Agent,
>>> >>>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "USA" and a
>>> >>>> >> >> >>> fise:EntityAnnotation
>>> >>>> >> >> >>> >>> >> linking to dbpedia:United_States
>>> >>>> >> >> >>> >>> >>   * fise:ParticipantAnnotation for Iraq with the
>>> dc:type
>>> >>>> >> >> >>> caos:Patient,
>>> >>>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "Irak" and a
>>> >>>> >> >> >>> >>> >> fise:EntityAnnotation linking to  dbpedia:Iraq
>>> >>>> >> >> >>> >>> >>   * fise:OccurrentAnnotation for "invades" with the
>>> >>>> dc:type
>>> >>>> >> >> >>> >>> >> caos:Activity, linking to a fise:TextAnnotation for
>>> >>>> "invades"
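To make that concrete, here is a rough Python sketch of the statements those three annotations would boil down to (every URI below is invented for the example, and caos:/fise: are the shorthand prefixes used in this thread):

```python
# Illustrative triples for the "USA invades Iraq" setting, following the
# annotation structure sketched above. All subject URIs are made up.
setting = "urn:enhancement:setting-1"
triples = [
    # Participant: USA as the Agent
    ("urn:enhancement:participant-usa", "fise:in-setting", setting),
    ("urn:enhancement:participant-usa", "dc:type", "caos:Agent"),
    ("urn:enhancement:participant-usa", "fise:suggestion", "dbpedia:United_States"),
    # Participant: Iraq as the Patient
    ("urn:enhancement:participant-iraq", "fise:in-setting", setting),
    ("urn:enhancement:participant-iraq", "dc:type", "caos:Patient"),
    ("urn:enhancement:participant-iraq", "fise:suggestion", "dbpedia:Iraq"),
    # Occurrent: the "invades" activity, linking both participants
    ("urn:enhancement:occurrent-invade", "fise:in-setting", setting),
    ("urn:enhancement:occurrent-invade", "dc:type", "caos:Activity"),
    ("urn:enhancement:occurrent-invade", "dc:has-participant", "urn:enhancement:participant-usa"),
    ("urn:enhancement:occurrent-invade", "dc:has-participant", "urn:enhancement:participant-iraq"),
]

# A consumer can then answer "which Agents act in this setting?" with a scan:
agents = {s for s, p, o in triples if p == "dc:type" and o == "caos:Agent"}
print(agents)
```

In a real engine these statements would of course go into the enhancement graph rather than a Python list, but the shape of the data is the same.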
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> > 2. Where does the verb, which links the Subject
>>> and
>>> >>>> the
>>> >>>> >> Object
>>> >>>> >> >> >>> come
>>> >>>> >> >> >>> >>> into
>>> >>>> >> >> >>> >>> >> > this? I imagined that the Endurant would have a
>>> >>>> >> dc:"property"
>>> >>>> >> >> >>> where
>>> >>>> >> >> >>> >>> the
>>> >>>> >> >> >>> >>> >> > property = verb which links to the Object in noun
>>> >>>> form. For
>>> >>>> >> >> >>> example
>>> >>>> >> >> >>> >>> take
>>> >>>> >> >> >>> >>> >> > again the sentence "USA invades Iraq". You would
>>> have
>>> >>>> the
>>> >>>> >> >> "USA"
>>> >>>> >> >> >>> >>> Entity
>>> >>>> >> >> >>> >>> >> with
>>> >>>> >> >> >>> >>> >> > dc:invader which points to the Object "Iraq". The
>>> >>>> Endurant
>>> >>>> >> >> would
>>> >>>> >> >> >>> >>> have as
>>> >>>> >> >> >>> >>> >> > many dc:"property" elements as there are verbs
>>> which
>>> >>>> link
>>> >>>> >> it
>>> >>>> >> >> to
>>> >>>> >> >> >>> an
>>> >>>> >> >> >>> >>> >> Object.
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> As explained above you would have a
>>> >>>> fise:OccurrentAnnotation
>>> >>>> >> >> that
>>> >>>> >> >> >>> >>> >> represents the Perdurant. The information that the
>>> >>>> activity
>>> >>>> >> >> >>> mention in
>>> >>>> >> >> >>> >>> >> >> the text is "invades" would be captured by linking to a
>>> >>>> >> >> >>> fise:TextAnnotation. If
>>> >>>> >> >> >>> >>> >> you can also provide an Ontology for Tasks that
>>> defines
>>> >>>> >> >> >>> >>> >> "myTasks:invade" the fise:OccurrentAnnotation could
>>> >>>> also link
>>> >>>> >> >> to an
>>> >>>> >> >> >>> >>> >> fise:EntityAnnotation for this concept.
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> best
>>> >>>> >> >> >>> >>> >> Rupert
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> >
>>> >>>> >> >> >>> >>> >> > ### Consuming the data:
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> I think this model should be sufficient for
>>> >>>> use-cases as
>>> >>>> >> >> >>> described
>>> >>>> >> >> >>> >>> by
>>> >>>> >> >> >>> >>> >> you.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> Users would be able to consume data on the
>>> setting
>>> >>>> level.
>>> >>>> >> >> This
>>> >>>> >> >> >>> can
>>> >>>> >> >> >>> >>> be
>>> >>>> >> >> >>> >>> >> >> done by simply retrieving all
>>> >>>> fise:ParticipantAnnotation
>>> >>>> >> as
>>> >>>> >> >> >>> well as
>>> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation linked with a setting.
>>> BTW
>>> >>>> this
>>> >>>> >> was
>>> >>>> >> >> the
>>> >>>> >> >> >>> >>> >> >> approach used in LIVE [2] for semantic search. It
>>> >>>> allows
>>> >>>> >> >> >>> queries for
>>> >>>> >> >> >>> >>> >> >> Settings that involve specific Entities e.g. you
>>> >>>> could
>>> >>>> >> filter
>>> >>>> >> >> >>> for
>>> >>>> >> >> >>> >>> >> >> Settings that involve a {Person},
>>> >>>> activities:Arrested and
>>> >>>> >> a
>>> >>>> >> >> >>> specific
>>> >>>> >> >> >>> >>> >> >> {Uprising}. However note that with this approach
>>> >>>> you will
>>> >>>> >> >> get
>>> >>>> >> >> >>> >>> results
>>> >>>> >> >> >>> >>> >> >> for Settings where the {Person} participated and
>>> another
>>> >>>> >> >> person
>>> >>>> >> >> >>> was
>>> >>>> >> >> >>> >>> >> >> arrested.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> Another possibility would be to process
>>> enhancement
>>> >>>> >> results
>>> >>>> >> >> on
>>> >>>> >> >> >>> the
>>> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation. This would allow a
>>> much
>>> >>>> >> higher
>>> >>>> >> >> >>> >>> >> >> granularity level (e.g. it would allow to
>>> correctly
>>> >>>> answer
>>> >>>> >> >> the
>>> >>>> >> >> >>> query
>>> >>>> >> >> >>> >>> >> >> used as an example above). But I am wondering if
>>> the
>>> >>>> >> quality
>>> >>>> >> >> of
>>> >>>> >> >> >>> the
>>> >>>> >> >> >>> >>> >> >> Setting extraction will be sufficient for this. I
>>> >>>> also
>>> >>>> >> have
>>> >>>> >> >> >>> doubts
>>> >>>> >> >> >>> >>> whether
>>> >>>> >> >> >>> >>> >> >> this can still be realized by using semantic
>>> >>>> indexing to
>>> >>>> >> >> Apache
>>> >>>> >> >> >>> Solr
>>> >>>> >> >> >>> >>> >> >> or if it would be better/necessary to store
>>> results
>>> >>>> in a
>>> >>>> >> >> >>> TripleStore
>>> >>>> >> >> >>> >>> >> >> and using SPARQL for retrieval.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> The methodology and query language used by YAGO
>>> [3]
>>> >>>> is
>>> >>>> >> also
>>> >>>> >> >> very
>>> >>>> >> >> >>> >>> >> >> relevant for this (especially note chapter 7
>>> SPOTL(X)
>>> >>>> >> >> >>> >>> Representation).
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> Another related topic is the enrichment of
>>> Entities
>>> >>>> >> >> (especially
>>> >>>> >> >> >>> >>> >> >> Events) in knowledge bases based on Settings
>>> >>>> extracted
>>> >>>> >> form
>>> >>>> >> >> >>> >>> Documents.
>>> >>>> >> >> >>> >>> >> >> As per definition - in DOLCE - Perdurants are
>>> >>>> temporally
>>> >>>> >> >> indexed.
>>> >>>> >> >> >>> That
>>> >>>> >> >> >>> >>> >> >> means that at the time when added to a knowledge
>>> >>>> base they
>>> >>>> >> >> might
>>> >>>> >> >> >>> >>> still
>>> >>>> >> >> >>> >>> >> >> be in progress. So the creation, enriching and
>>> >>>> refinement
>>> >>>> >> of
>>> >>>> >> >> such
>>> >>>> >> >> >>> >>> >> >> Entities in the knowledge base seems to be
>>> >>>> critical for
>>> >>>> >> a
>>> >>>> >> >> >>> System
>>> >>>> >> >> >>> >>> >> >> as described in your use-case.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> On Tue, Jun 11, 2013 at 9:09 PM, Cristian
>>> Petroaca
>>> >>>> >> >> >>> >>> >> >> <cr...@gmail.com> wrote:
>>> >>>> >> >> >>> >>> >> >> >
>>> >>>> >> >> >>> >>> >> >> > First of all I have to mention that I am new
>>> in the
>>> >>>> >> field
>>> >>>> >> >> of
>>> >>>> >> >> >>> >>> semantic
>>> >>>> >> >> >>> >>> >> >> > technologies, I've started to read about them
>>> in
>>> >>>> the
>>> >>>> >> last
>>> >>>> >> >> 4-5
>>> >>>> >> >> >>> >>> >> >> months.Having
>>> >>>> >> >> >>> >>> >> >> > said that I have a high level overview of what
>>> is
>>> >>>> a good
>>> >>>> >> >> >>> approach
>>> >>>> >> >> >>> >>> to
>>> >>>> >> >> >>> >>> >> >> solve
>>> >>>> >> >> >>> >>> >> >> > this problem. There are a number of papers on
>>> the
>>> >>>> >> internet
>>> >>>> >> >> >>> which
>>> >>>> >> >> >>> >>> >> describe
>>> >>>> >> >> >>> >>> >> >> > what steps need to be taken such as : named
>>> entity
>>> >>>> >> >> >>> recognition,
>>> >>>> >> >> >>> >>> >> >> > co-reference resolution, pos tagging and
>>> others.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> The Stanbol NLP processing module currently only
>>> >>>> supports
>>> >>>> >> >> >>> sentence
>>> >>>> >> >> >>> >>> >> >> detection, tokenization, POS tagging, Chunking,
>>> NER
>>> >>>> and
>>> >>>> >> >> lemmatization. Support
>>> >>>> >> >> >>> >>> >> >> for co-reference resolution and dependency trees
>>> is
>>> >>>> >> currently
>>> >>>> >> >> >>> >>> missing.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> Stanford NLP is already integrated with Stanbol
>>> [4].
>>> >>>> At
>>> >>>> >> the
>>> >>>> >> >> >>> moment
>>> >>>> >> >> >>> >>> it
>>> >>>> >> >> >>> >>> >> >> only supports English, but I am already working to
>>> >>>> include
>>> >>>> >> the
>>> >>>> >> >> >>> other
>>> >>>> >> >> >>> >>> >> >> supported languages. Other NLP frameworks that are
>>> >>>> already
>>> >>>> >> >> >>> integrated
>>> >>>> >> >> >>> >>> >> >> with Stanbol are Freeling [5] and Talismane [6].
>>> But
>>> >>>> note
>>> >>>> >> >> that
>>> >>>> >> >> >>> for
>>> >>>> >> >> >>> >>> all
>>> >>>> >> >> >>> >>> >> >> those the integration excludes support for
>>> >>>> co-reference
>>> >>>> >> and
>>> >>>> >> >> >>> >>> dependency
>>> >>>> >> >> >>> >>> >> >> trees.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> Anyways I am confident that one can implement a
>>> first
>>> >>>> >> >> prototype
>>> >>>> >> >> >>> by
>>> >>>> >> >> >>> >>> >> >> only using Sentences and POS tags and - if
>>> available
>>> >>>> -
>>> >>>> >> Chunks
>>> >>>> >> >> >>> (e.g.
>>> >>>> >> >> >>> >>> >> >> Noun phrases).
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> > I assume that in the Stanbol context, a feature
>>> like
>>> >>>> >> Relation
>>> >>>> >> >> >>> >>> extraction
>>> >>>> >> >> >>> >>> >> > would be implemented as an EnhancementEngine?
>>> >>>> >> >> >>> >>> >> > What kind of effort would be required for a
>>> >>>> co-reference
>>> >>>> >> >> >>> resolution
>>> >>>> >> >> >>> >>> tool
>>> >>>> >> >> >>> >>> >> > integration into Stanbol?
>>> >>>> >> >> >>> >>> >> >
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> Yes in the end it would be an EnhancementEngine. But
>>> >>>> before
>>> >>>> >> we
>>> >>>> >> >> can
>>> >>>> >> >> >>> >>> >> build such an engine we would need to
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> * extend the Stanbol NLP processing API with
>>> >>>> Annotations for
>>> >>>> >> >> >>> >>> co-reference
>>> >>>> >> >> >>> >>> >> * add support for JSON Serialisation/Parsing for
>>> those
>>> >>>> >> >> annotation
>>> >>>> >> >> >>> so
>>> >>>> >> >> >>> >>> >> that the RESTful NLP Analysis Service can provide
>>> >>>> >> co-reference
>>> >>>> >> >> >>> >>> >> information
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> > At this moment I'll be focusing on 2 aspects:
>>> >>>> >> >> >>> >>> >> >
>>> >>>> >> >> >>> >>> >> > 1. Determine the best data structure to
>>> encapsulate
>>> >>>> the
>>> >>>> >> >> extracted
>>> >>>> >> >> >>> >>> >> > information. I'll take a closer look at Dolce.
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> Don't make it too complex. Defining a proper
>>> structure to
>>> >>>> >> >> represent
>>> >>>> >> >> >>> >>> >> Events will only pay-off if we can also successfully
>>> >>>> extract
>>> >>>> >> >> such
>>> >>>> >> >> >>> >>> >> >> information from processed texts.
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> I would start with
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >>  * fise:SettingAnnotation
>>> >>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >>  * fise:ParticipantAnnotation
>>> >>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>>> >>>> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
>>> >>>> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
>>> >>>> >> >> >>> >>> >>     * fise:suggestion {entityAnnotation} (multiple
>>> if
>>> >>>> there
>>> >>>> >> are
>>> >>>> >> >> >>> more
>>> >>>> >> >> >>> >>> >> suggestions)
>>> >>>> >> >> >>> >>> >>     * dc:type one of fise:Agent, fise:Patient,
>>> >>>> >> fise:Instrument,
>>> >>>> >> >> >>> >>> fise:Cause
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >>  * fise:OccurrentAnnotation
>>> >>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>>> >>>> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
>>> >>>> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
>>> >>>> >> >> >>> >>> >>     * dc:type set to fise:Activity
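A tiny Python sketch of a sanity check over that minimal model (property and role names as listed above; this is a prototype shape, not the final vocabulary):

```python
# Required properties per annotation type, following the minimal model above.
REQUIRED = {
    "fise:SettingAnnotation": set(),
    "fise:ParticipantAnnotation": {"fise:inSetting", "fise:hasMention", "dc:type"},
    "fise:OccurrentAnnotation": {"fise:inSetting", "fise:hasMention", "dc:type"},
}
PARTICIPANT_ROLES = {"fise:Agent", "fise:Patient", "fise:Instrument", "fise:Cause"}

def check(annotation_type, properties):
    """Return True if the annotation carries the required properties and,
    for participants, one of the allowed roles as dc:type."""
    if REQUIRED[annotation_type] - properties.keys():
        return False  # a required property is missing
    if annotation_type == "fise:ParticipantAnnotation":
        return properties["dc:type"] in PARTICIPANT_ROLES
    return True

ok = check("fise:ParticipantAnnotation", {
    "fise:inSetting": "urn:setting-1",
    "fise:hasMention": "urn:text-annotation-1",
    "fise:suggestion": "dbpedia:United_States",  # optional, may repeat
    "dc:type": "fise:Agent",
})
print(ok)  # True
```

If the model later grows (e.g. start/end xsd:dateTime on occurrents), only the REQUIRED table needs to change.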
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> If it turns out that we can extract more, we can add
>>> >>>> more
>>> >>>> >> >> >>> structure to
>>> >>>> >> >> >>> >>> >> those annotations. We might also think about using
>>> our
>>> >>>> own
>>> >>>> >> >> namespace
>>> >>>> >> >> >>> >>> >> for those extensions to the annotation structure.
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> > 2. Determine how should all of this be integrated
>>> into
>>> >>>> >> >> Stanbol.
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> Just create an EventExtractionEngine and configure an
>>> >>>> >> enhancement
>>> >>>> >> >> >>> chain
>>> >>>> >> >> >>> >>> >> that does NLP processing and EntityLinking.
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> You should have a look at
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> * SentimentSummarizationEngine [1] as it does a lot
>>> of
>>> >>>> things
>>> >>>> >> >> with
>>> >>>> >> >> >>> NLP
>>> >>>> >> >> >>> >>> >> processing results (e.g. connecting adjectives (via
>>> >>>> verbs) to
>>> >>>> >> >> >>> >>> >> >> nouns/pronouns). So as long as we cannot use explicit
>>> >>>> dependency
>>> >>>> >> >> trees
>>> >>>> >> >> >>> >>> >> >> your code will need to do similar things with Nouns,
>>> >>>> Pronouns
>>> >>>> >> and
>>> >>>> >> >> >>> >>> >> Verbs.
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> * Disambiguation-MLT engine, as it creates a Java
>>> >>>> >> >> representation
>>> >>>> >> >> >>> of
>>> >>>> >> >> >>> >>> >> present fise:TextAnnotation and
>>> fise:EntityAnnotation
>>> >>>> [2].
>>> >>>> >> >> >>> Something
>>> >>>> >> >> >>> >>> >> similar will also be required by the
>>> >>>> EventExtractionEngine
>>> >>>> >> for
>>> >>>> >> >> fast
>>> >>>> >> >> >>> >>> >> access to such annotations while iterating over the
>>> >>>> >> Sentences of
>>> >>>> >> >> >>> the
>>> >>>> >> >> >>> >>> >> text.
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> best
>>> >>>> >> >> >>> >>> >> Rupert
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> [1]
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>>
>>> >>>> >> >> >>>
>>> >>>> >> >>
>>> >>>> >>
>>> >>>>
>>> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/sentiment-summarization/src/main/java/org/apache/stanbol/enhancer/engines/sentiment/summarize/SentimentSummarizationEngine.java
>>> >>>> >> >> >>> >>> >> [2]
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>>
>>> >>>> >> >> >>>
>>> >>>> >> >>
>>> >>>> >>
>>> >>>>
>>> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/disambiguation-mlt/src/main/java/org/apache/stanbol/enhancer/engine/disambiguation/mlt/DisambiguationData.java
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> >
>>> >>>> >> >> >>> >>> >> > Thanks
>>> >>>> >> >> >>> >>> >> >
>>> >>>> >> >> >>> >>> >> > Hope this helps to bootstrap this discussion
>>> >>>> >> >> >>> >>> >> >> best
>>> >>>> >> >> >>> >>> >> >> Rupert
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> --
>>> >>>> >> >> >>> >>> >> >> | Rupert Westenthaler
>>> >>>> >> >> rupert.westenthaler@gmail.com
>>> >>>> >> >> >>> >>> >> >> | Bodenlehenstraße 11
>>> >>>> >> >> >>> ++43-699-11108907
>>> >>>> >> >> >>> >>> >> >> | A-5500 Bischofshofen
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>>
>>> >>>> >> >> >>> >>>
>>> >>>> >> >> >>> >>>
>>> >>>> >> >> >>> >>>
>>> >>>> >> >> >>> >>
>>> >>>> >> >> >>> >>
>>> >>>> >> >> >>>
>>> >>>> >> >> >>>
>>> >>>> >> >> >>>
>>> >>>> >> >> >>>
>>> >>>> >> >> >>
>>> >>>> >> >> >>
>>> >>>> >> >>
>>> >>>> >> >>
>>> >>>> >> >>
>>> >>>> >> >>
>>> >>>> >>
>>> >>>> >>
>>> >>>> >>
>>> >>>> >>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>
>>> >>>
>>> >>
>>>
>>>
>>>
>>>
>>
>>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Relation extraction feature

Posted by Cristian Petroaca <cr...@gmail.com>.
Hi Rupert,

This is a reminder in case you missed this e-mail.

Cristian


2013/9/3 Cristian Petroaca <cr...@gmail.com>

> Ok, then to sum it up we would have :
>
> 1. Coref
>
> "stanbol.enhancer.nlp.coref" : {
>     "isRepresentative" : true/false, // whether this token or chunk is the
> representative mention in the chain
>     "mentions" : [ { "type" : "Token", // type of element which refers to
> this token/chunk
>                      "start" : 123, // start index of the mentioning element
>                      "end" : 130 // end index of the mentioning element
>                    }, ...
>                  ],
>     "class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
> }
>
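To make the proposed model concrete, here is a minimal plain-Java sketch of the coreference tag and the span reference it needs. All class and field names (SpanRef, CorefSketch.CorefTag) are illustrative stand-ins, not the actual Stanbol NLP API:

```java
// Hedged sketch of the proposed coreference model in plain Java.
// SpanRef and CorefTag are illustrative names, not the actual Stanbol API.
import java.util.List;

public class CorefSketch {

    /** Reference to another span: "type", "start" and "end" uniquely identify it. */
    public static final class SpanRef {
        public final String type; // "Sentence", "Chunk" or "Token"
        public final int start;   // start character offset
        public final int end;     // end character offset
        public SpanRef(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    /** Coreference annotation attached to a Token or Chunk. */
    public static final class CorefTag {
        public final boolean representative; // representative mention of the chain?
        public final List<SpanRef> mentions; // the other mentions in the chain
        public CorefTag(boolean representative, List<SpanRef> mentions) {
            this.representative = representative;
            this.mentions = mentions;
        }
    }

    public static void main(String[] args) {
        CorefTag tag = new CorefTag(true, List.of(
                new SpanRef("Token", 123, 130),
                new SpanRef("Token", 157, 165)));
        System.out.println("representative=" + tag.representative
                + " mentions=" + tag.mentions.size());
    }
}
```

A serializer for this model would emit exactly the JSON shape above, one SpanRef per entry of the "mentions" array.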
>
> 2. Dependency tree
>
> "stanbol.enhancer.nlp.dependency" : {
>     "relations" : [ { "tag" : "nsubj", // type of relation - Stanford NLP notation
>                       "dep" : 12, // type of relation - Stanbol NLP mapped value - ordinal number in enum Dependency
>                       "role" : "gov/dep", // whether this token is the governor or the dependent
>                       "type" : "Token", // type of element with which this token is in relation
>                       "start" : 123, // start index of the related token
>                       "end" : 130 // end index of the related token
>                     },
>                     ...
>                   ],
>     "class" : "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag"
> }
>
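A corresponding sketch for the dependency annotation, again with illustrative names only; the GrammaticalRelation enum below is a stand-in for the proposed Dependency enumeration, so its ordinal values are not the real ones:

```java
// Hedged sketch of the proposed dependency relation, keeping both the lexical
// tag of the NLP tool and a mapped enum value. All names are illustrative.
public class DependencySketch {

    /** Tiny illustrative subset; ordinal() would be serialized as "dep". */
    public enum GrammaticalRelation { ROOT, NSUBJ, DOBJ, CONJ_AND, NN, TMOD }

    public static final class DependencyRelation {
        public final String tag;                   // e.g. "nsubj" (Stanford notation)
        public final GrammaticalRelation relation; // mapped Stanbol value
        public final boolean governor;             // "gov" if true, "dep" otherwise
        public final String spanType;              // type of the related span, e.g. "Token"
        public final int start, end;               // offsets of the related token

        public DependencyRelation(String tag, GrammaticalRelation relation,
                boolean governor, String spanType, int start, int end) {
            this.tag = tag;
            this.relation = relation;
            this.governor = governor;
            this.spanType = spanType;
            this.start = start;
            this.end = end;
        }
    }

    public static void main(String[] args) {
        // "Mary" is the nsubj dependent of "met" in "Mary and Tom met Danny today"
        DependencyRelation r = new DependencyRelation(
                "nsubj", GrammaticalRelation.NSUBJ, false, "Token", 13, 16);
        System.out.println(r.tag + " -> " + r.relation
                + " (dep=" + r.relation.ordinal() + ")");
    }
}
```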
>
> 2013/9/2 Rupert Westenthaler <ru...@gmail.com>
>
>> Hi Cristian,
>>
>> let me provide some feedback to your proposals:
>>
>> ### Referring other Spans
>>
>> Both suggested annotations need to link to other spans (Sentence,
>> Chunk or Token). For that we should introduce a common JSON element
>> for referring to those spans and use it everywhere such links occur.
>>
>> In the Java model this would allow you to hold a direct reference to
>> the other Span (Sentence, Chunk, Token). In the serialized form you
>> would have JSON elements with "type", "start" and "end" attributes,
>> as those three uniquely identify any span.
>>
>> Here an example based on the "mention" attribute as defined by the
>> proposed "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
>>
>>     ...
>>     "mentions" : [ {
>>         "type" : "Token",
>>         "start" : 123,
>>         "end" : 130 }, {
>>         "type" : "Token",
>>         "start" : 157,
>>         "end" : 165 } ],
>>     ...
>>
>> Similar token links in
>> "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag" should also
>> use this model.
>>
>> ### Usage of Controlled Vocabularies
>>
>> In addition the DependencyTag also seems to use a controlled
>> vocabulary (e.g. 'nsubj', 'conj_and' ...). In such cases the Stanbol
>> NLP module tries to define those in some kind of ontology. For POS
>> tags we use the OLIA ontology [1]. This is important as most NLP
>> frameworks use different strings, and we need to unify those to
>> common IDs so that components that consume the data do not depend on
>> a specific NLP tool.
>>
>> Because the usage of ontologies within Java is not well supported,
>> the Stanbol NLP module defines Java enumerations for those ontologies,
>> such as the POS type enumeration [2].
>>
>> Both the Java model and the JSON serialization support (1) the
>> lexical tag as used by the NLP tool and (2) the mapped concept: in
>> the Java API via two different methods, and in the JSON serialization
>> via two separate keys.
>>
>> To make this clearer, here is an example of a POS annotation for a
>> proper noun.
>>
>>     "stanbol.enhancer.nlp.pos" : {
>>         "tag" : "PN",
>>         "pos" : 53,
>>         "class" : "org.apache.stanbol.enhancer.nlp.pos.PosTag",
>>         "prob" : 0.95
>>     }
>>
>> where
>>
>>     "tag" : "PN"
>>
>> is the lexical form as used by the NLP tool and
>>
>>     "pos" : 53
>>
>> refers to the ordinal number of the entry "ProperNoun" in the POS
>> enumeration
>>
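The tag/ordinal dual representation can be sketched as follows; the tiny Pos enum here is only a stand-in for the real Stanbol Pos enumeration, so the ordinal values do not match the real ones (e.g. "pos" : 53 for ProperNoun in the example above):

```java
// Hedged sketch of the dual tag/ordinal representation for POS annotations.
// The Pos enum is an illustrative subset, not the real Stanbol enumeration.
public class PosMappingSketch {

    public enum Pos { Noun, ProperNoun, Verb } // illustrative subset

    /** Serialize both the lexical tag and the mapped enum ordinal. */
    public static String toJson(String tag, Pos pos, double prob) {
        return String.format(
                "{\"tag\":\"%s\",\"pos\":%d,"
                + "\"class\":\"org.apache.stanbol.enhancer.nlp.pos.PosTag\","
                + "\"prob\":%s}",
                tag, pos.ordinal(), prob);
    }

    /** Resolve the serialized ordinal back to the enum entry. */
    public static Pos fromOrdinal(int ordinal) {
        return Pos.values()[ordinal];
    }

    public static void main(String[] args) {
        String json = toJson("PN", Pos.ProperNoun, 0.95);
        System.out.println(json);
        System.out.println(fromOrdinal(1));
    }
}
```

Consumers that know the enumeration can rely on the ordinal, while the lexical "tag" stays available for tool-specific processing.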
>> IMO the "type" property of DependencyTag should use a similar design.
>>
>> best
>> Rupert
>>
>> [1] http://olia.nlp2rdf.org/
>> [2]
>> http://svn.apache.org/repos/asf/stanbol/trunk/enhancer/generic/nlp/src/main/java/org/apache/stanbol/enhancer/nlp/pos/Pos.java
>>
>> On Sun, Sep 1, 2013 at 8:09 PM, Cristian Petroaca
>> <cr...@gmail.com> wrote:
>> > Sorry, pressed sent too soon :).
>> >
>> > Continued :
>> >
>> > nsubj(met-4, Mary-1), conj_and(Mary-1, Tom-3), nsubj(met-4, Tom-3),
>> > root(ROOT-0, met-4), nn(today-6, Danny-5), tmod(met-4, today-6)]
>> >
>> > Given this, we can have for each "Token" an additional dependency
>> > annotation :
>> >
>> > "stanbol.enhancer.nlp.dependency" : {
>> > "tag" : // is it necessary?
>> > "relations" : [ { "type" : "nsubj", // type of relation
>> >                   "role" : "gov/dep", // whether it is the governor or the dependent
>> >                   "dependencyValue" : "met", // the word with which the token has a relation
>> >                   "dependencyIndexInSentence" : "2" // the index of the dependency in the current sentence
>> >                 },
>> >                 ...
>> >               ],
>> > "class" : "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag"
>> > }
>> >
>> > 2013/9/1 Cristian Petroaca <cr...@gmail.com>
>> >
>> >> Related to the Stanford Dependency Tree Feature, this is the way the
>> >> output from the tool looks like for this sentence : "Mary and Tom met
>> Danny
>> >> today" :
>> >>
>> >>
>> >> 2013/8/30 Cristian Petroaca <cr...@gmail.com>
>> >>
>> >>> Hi Rupert,
>> >>>
>> >>> Ok, so after looking at the JSON output from the Stanford NLP Server
>> and
>> >>> the coref module I'm thinking I can represent the coreference
>> information
>> >>> this way:
>> >>> Each "Token" or "Chunk" will contain an additional coref annotation
>> with
>> >>> the following structure :
>> >>>
>> >>> "stanbol.enhancer.nlp.coref" : {
>> >>>     "tag" : //does this need to exist?
>> >>>     "isRepresentative" : true/false, // whether this token or chunk is
>> >>> the representative mention in the chain
>> >>>     "mentions" : [ { "sentenceNo" : 1 //the sentence in which the
>> mention
>> >>> is found
>> >>>                            "startWord" : 2 //the first word making up
>> the
>> >>> mention
>> >>>                            "endWord" : 3 //the last word making up the
>> >>> mention
>> >>>                          }, ...
>> >>>                        ],
>> >>>     "class" :
>> "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
>> >>> }
>> >>>
>> >>> The CorefTag should resemble this model.
>> >>>
>> >>> What do you think?
>> >>>
>> >>> Cristian
>> >>>
>> >>>
>> >>> 2013/8/24 Rupert Westenthaler <ru...@gmail.com>
>> >>>
>> >>>> Hi Cristian,
>> >>>>
>> >>>> you can not directly call StanfordNLP components from Stanbol, but
>> you
>> >>>> have to extend the RESTful service to include the information you
>> >>>> need. The main reason for that is that the license of StanfordNLP is
>> >>>> not compatible with the Apache Software License. So Stanbol can not
>> >>>> directly link to the StanfordNLP API.
>> >>>>
>> >>>> You will need to
>> >>>>
>> >>>> 1. define an additional class {yourTag} extends Tag<{yourType}> class
>> >>>> in the o.a.s.enhancer.nlp module
>> >>>> 2. add JSON parsing and serialization support for this tag to the
>> >>>> o.a.s.enhancer.nlp.json module (see e.g. PosTagSupport as an example)
>> >>>>
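As a rough sketch of step (1), a new Tag subclass might look like the following; the Tag<T> base shown here is a simplified stand-in for the real Stanbol Tag base class, included only to make the example self-contained:

```java
// Hedged sketch of step (1): a new Tag subclass. Tag<T> below is a
// simplified stand-in for the real Stanbol base class, just to show the
// shape of the extension point.
public class TagSketch {

    /** Simplified stand-in for Stanbol's Tag base class. */
    public static abstract class Tag<T extends Tag<T>> {
        private final String tag;
        protected Tag(String tag) {
            if (tag == null) {
                throw new IllegalArgumentException("tag must not be null");
            }
            this.tag = tag;
        }
        public String getTag() {
            return tag;
        }
    }

    /** The new tag type, e.g. for coreference chains (illustrative). */
    public static final class CorefTag extends Tag<CorefTag> {
        private final boolean representative;
        public CorefTag(String tag, boolean representative) {
            super(tag);
            this.representative = representative;
        }
        public boolean isRepresentative() {
            return representative;
        }
    }

    public static void main(String[] args) {
        CorefTag t = new CorefTag("coref", true);
        System.out.println(t.getTag() + " representative=" + t.isRepresentative());
    }
}
```

Step (2) would then add a small serialization-support class so the nlp-json module can write and parse this tag alongside the existing ones.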
>> >>>> As (1) would be necessary anyway the only additional thing you need
>> to
>> >>>> develop is (2). After that you can add {yourTag} instance to the
>> >>>> AnalyzedText in the StanfornNLP integration. The
>> >>>> RestfulNlpAnalysisEngine will parse them from the response. All
>> >>>> engines executed after the RestfulNlpAnalysisEngine will have access
>> >>>> to your annotations.
>> >>>>
>> >>>> If you have a design for {yourTag} - the model you would like to use
>> >>>> to represent your data - I can help with (1) and (2).
>> >>>>
>> >>>> best
>> >>>> Rupert
>> >>>>
>> >>>>
>> >>>> On Fri, Aug 23, 2013 at 5:11 PM, Cristian Petroaca
>> >>>> <cr...@gmail.com> wrote:
>> >>>> > Hi Rupert,
>> >>>> >
>> >>>> > Thanks for the info. Looking at the stanbol-stanfordnlp project I
>> see
>> >>>> that
>> >>>> > the stanford nlp is not implemented as an EnhancementEngine but
>> rather
>> >>>> it
>> >>>> > is used directly in a Jetty Server instance. How does that fit
>> into the
>> >>>> > Stanbol stack? For example how can I call the StanfordNlpAnalyzer's
>> >>>> routine
>> >>>> > from my TripleExtractionEnhancementEngine which lives in the
>> Stanbol
>> >>>> stack?
>> >>>> >
>> >>>> > Thanks,
>> >>>> > Cristian
>> >>>> >
>> >>>> >
>> >>>> > 2013/8/12 Rupert Westenthaler <ru...@gmail.com>
>> >>>> >
>> >>>> >> Hi Cristian,
>> >>>> >>
>> >>>> >> Sorry for the late response, but I was offline for the last two
>> weeks
>> >>>> >>
>> >>>> >> On Fri, Aug 2, 2013 at 9:19 PM, Cristian Petroaca
>> >>>> >> <cr...@gmail.com> wrote:
>> >>>> >> > Hi Rupert,
>> >>>> >> >
>> >>>> >> > After doing some tests it seems that the Stanford NLP
>> coreference
>> >>>> module
>> >>>> >> is
>> >>>> >> > much more accurate than the Open NLP one.So I decided to extend
>> >>>> Stanford
>> >>>> >> > NLP to add coreference there.
>> >>>> >>
>> >>>> >> The Stanford NLP integration is not part of the Stanbol codebase
>> >>>> >> because the licenses are not compatible.
>> >>>> >>
>> >>>> >> You can find the Stanford NLP integration on
>> >>>> >>
>> >>>> >>     https://github.com/westei/stanbol-stanfordnlp
>> >>>> >>
>> >>>> >> just create a fork and send pull requests.
>> >>>> >>
>> >>>> >>
>> >>>> >> > Could you add the necessary projects on the branch? And also
>> remove
>> >>>> the
>> >>>> >> > Open NLP ones?
>> >>>> >> >
>> >>>> >>
>> >>>> >> Currently the branch
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>>
>> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
>> >>>> >>
>> >>>> >> only contains the "nlp" and the "nlp-json" modules. IMO those
>> should
>> >>>> >> be enough for adding coreference support.
>> >>>> >>
>> >>>> >> IMO you will need to
>> >>>> >>
>> >>>> >> * add an model for representing coreference to the nlp module
>> >>>> >> * add parsing and serializing support to the nlp-json module
>> >>>> >> * add the implementation to your fork of the stanbol-stanfordnlp
>> >>>> project
>> >>>> >>
>> >>>> >> best
>> >>>> >> Rupert
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> > Thanks,
>> >>>> >> > Cristian
>> >>>> >> >
>> >>>> >> >
>> >>>> >> > 2013/7/5 Rupert Westenthaler <ru...@gmail.com>
>> >>>> >> >
>> >>>> >> >> Hi Cristian,
>> >>>> >> >>
>> >>>> >> >> I created the branch at
>> >>>> >> >>
>> >>>> >> >>
>> >>>> >> >>
>> >>>> >>
>> >>>>
>> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
>> >>>> >> >>
>> >>>> >> >> ATM in contains only the "nlp" and "nlp-json" module. Let me
>> know
>> >>>> if
>> >>>> >> >> you would like to have more
>> >>>> >> >>
>> >>>> >> >> best
>> >>>> >> >> Rupert
>> >>>> >> >>
>> >>>> >> >>
>> >>>> >> >>
>> >>>> >> >> On Thu, Jul 4, 2013 at 10:14 AM, Cristian Petroaca
>> >>>> >> >> <cr...@gmail.com> wrote:
>> >>>> >> >> > Hi Rupert,
>> >>>> >> >> >
>> >>>> >> >> > I created jiras :
>> >>>> https://issues.apache.org/jira/browse/STANBOL-1132 and
>> >>>> >> >> > https://issues.apache.org/jira/browse/STANBOL-1133. The
>> >>>> original one
>> >>>> >> is
>> >>>> >> >> > dependent upon these.
>> >>>> >> >> > Please let me know when I can start using the branch.
>> >>>> >> >> >
>> >>>> >> >> > Thanks,
>> >>>> >> >> > Cristian
>> >>>> >> >> >
>> >>>> >> >> >
>> >>>> >> >> > 2013/6/27 Cristian Petroaca <cr...@gmail.com>
>> >>>> >> >> >
>> >>>> >> >> >>
>> >>>> >> >> >>
>> >>>> >> >> >>
>> >>>> >> >> >> 2013/6/27 Rupert Westenthaler <
>> rupert.westenthaler@gmail.com>
>> >>>> >> >> >>
>> >>>> >> >> >>> On Thu, Jun 27, 2013 at 3:12 PM, Cristian Petroaca
>> >>>> >> >> >>> <cr...@gmail.com> wrote:
>> >>>> >> >> >>> > Sorry, I meant the Stanbol NLP API, not Stanford in my
>> >>>> previous
>> >>>> >> >> e-mail.
>> >>>> >> >> >>> By
>> >>>> >> >> >>> > the way, does Open NLP have the ability to build
>> dependency
>> >>>> trees?
>> >>>> >> >> >>> >
>> >>>> >> >> >>>
>> >>>> >> >> >>> AFAIK OpenNLP does not provide this feature.
>> >>>> >> >> >>>
>> >>>> >> >> >>
>> >>>> >> >> >> Then , since the Stanford NLP lib is also integrated into
>> >>>> Stanbol,
>> >>>> >> I'll
>> >>>> >> >> >> take a look at how I can extend its integration to include
>> the
>> >>>> >> >> dependency
>> >>>> >> >> >> tree feature.
>> >>>> >> >> >>
>> >>>> >> >> >>>
>> >>>> >> >> >>>
>> >>>> >> >> >>  >
>> >>>> >> >> >>> > 2013/6/23 Cristian Petroaca <cristian.petroaca@gmail.com
>> >
>> >>>> >> >> >>> >
>> >>>> >> >> >>> >> Hi Rupert,
>> >>>> >> >> >>> >>
>> >>>> >> >> >>> >> I created jira
>> >>>> >> https://issues.apache.org/jira/browse/STANBOL-1121.
>> >>>> >> >> >>> >> As you suggested I would start with extending the
>> Stanford
>> >>>> NLP
>> >>>> >> with
>> >>>> >> >> >>> >> co-reference resolution but I think also with dependency
>> >>>> trees
>> >>>> >> >> because
>> >>>> >> >> >>> I
>> >>>> >> >> >>> >> also need to know the Subject of the sentence and the
>> object
>> >>>> >> that it
>> >>>> >> >> >>> >> affects, right?
>> >>>> >> >> >>> >>
>> >>>> >> >> >>> >> Given that I need to extend the Stanford NLP API in
>> Stanbol
>> >>>> for
>> >>>> >> >> >>> >> co-reference and dependency trees, how do I proceed with
>> >>>> this?
>> >>>> >> Do I
>> >>>> >> >> >>> create
>> >>>> >> >> >>> >> 2 new sub-tasks to the already opened Jira? After that
>> can I
>> >>>> >> start
>> >>>> >> >> >>> >> implementing on my local copy of Stanbol and when I'm
>> done
>> >>>> I'll
>> >>>> >> send
>> >>>> >> >> >>> you
>> >>>> >> >> >>> >> guys the patch fo review?
>> >>>> >> >> >>> >>
>> >>>> >> >> >>>
>> >>>> >> >> >>> I would create two "New Feature" type Issues one for adding
>> >>>> support
>> >>>> >> >> >>> for "dependency trees" and the other for "co-reference"
>> >>>> support. You
>> >>>> >> >> >>> should also define "depends on" relations between
>> STANBOL-1121
>> >>>> and
>> >>>> >> >> >>> those two new issues.
>> >>>> >> >> >>>
>> >>>> >> >> >>> Sub-task could also work, but as adding those features
>> would
>> >>>> be also
>> >>>> >> >> >>> interesting for other things I would rather define them as
>> >>>> separate
>> >>>> >> >> >>> issues.
>> >>>> >> >> >>>
>> >>>> >> >> >>>
>> >>>> >> >> >> 2 New Features connected with the original jira it is then.
>> >>>> >> >> >>
>> >>>> >> >> >>
>> >>>> >> >> >>> If you would prefer to work in an own branch please tell
>> me.
>> >>>> This
>> >>>> >> >> >>> could have the advantage that patches would not be
>> affected by
>> >>>> >> changes
>> >>>> >> >> >>> in the trunk.
>> >>>> >> >> >>>
>> >>>> >> >> >>> Yes, a separate branch sounds good.
>> >>>> >> >> >>
>> >>>> >> >> >> best
>> >>>> >> >> >>> Rupert
>> >>>> >> >> >>>
>> >>>> >> >> >>> >> Regards,
>> >>>> >> >> >>> >> Cristian
>> >>>> >> >> >>> >>
>> >>>> >> >> >>> >>
>> >>>> >> >> >>> >> 2013/6/18 Rupert Westenthaler <
>> >>>> rupert.westenthaler@gmail.com>
>> >>>> >> >> >>> >>
>> >>>> >> >> >>> >>> On Mon, Jun 17, 2013 at 10:18 PM, Cristian Petroaca
>> >>>> >> >> >>> >>> <cr...@gmail.com> wrote:
>> >>>> >> >> >>> >>> > Hi Rupert,
>> >>>> >> >> >>> >>> >
>> >>>> >> >> >>> >>> > Agreed on the
>> >>>> >> >> >>> SettingAnnotation/ParticipantAnnotation/OccurentAnnotation
>> >>>> >> >> >>> >>> > data structure.
>> >>>> >> >> >>> >>> >
>> >>>> >> >> >>> >>> > Should I open up a Jira for all of this in order to
>> >>>> >> encapsulate
>> >>>> >> >> this
>> >>>> >> >> >>> >>> > information and establish the goals and these initial
>> >>>> steps
>> >>>> >> >> towards
>> >>>> >> >> >>> >>> these
>> >>>> >> >> >>> >>> > goals?
>> >>>> >> >> >>> >>>
>> >>>> >> >> >>> >>> Yes please. A JIRA issue for this work would be great.
>> >>>> >> >> >>> >>>
>> >>>> >> >> >>> >>> > How should I proceed further? Should I create some
>> design
>> >>>> >> >> documents
>> >>>> >> >> >>> that
>> >>>> >> >> >>> >>> > need to be reviewed?
>> >>>> >> >> >>> >>>
>> >>>> >> >> >>> >>> Usually it is the best to write design related text
>> >>>> directly in
>> >>>> >> >> JIRA
>> >>>> >> >> >>> >>> by using Markdown [1] syntax. This will allow us later
>> to
>> >>>> use
>> >>>> >> this
>> >>>> >> >> >>> >>> text directly for the documentation on the Stanbol
>> Webpage.
>> >>>> >> >> >>> >>>
>> >>>> >> >> >>> >>> best
>> >>>> >> >> >>> >>> Rupert
>> >>>> >> >> >>> >>>
>> >>>> >> >> >>> >>>
>> >>>> >> >> >>> >>> [1] http://daringfireball.net/projects/markdown/
>> >>>> >> >> >>> >>> >
>> >>>> >> >> >>> >>> > Regards,
>> >>>> >> >> >>> >>> > Cristian
>> >>>> >> >> >>> >>> >
>> >>>> >> >> >>> >>> >
>> >>>> >> >> >>> >>> > 2013/6/17 Rupert Westenthaler <
>> >>>> rupert.westenthaler@gmail.com>
>> >>>> >> >> >>> >>> >
>> >>>> >> >> >>> >>> >> On Thu, Jun 13, 2013 at 8:22 PM, Cristian Petroaca
>> >>>> >> >> >>> >>> >> <cr...@gmail.com> wrote:
>> >>>> >> >> >>> >>> >> > HI Rupert,
>> >>>> >> >> >>> >>> >> >
>> >>>> >> >> >>> >>> >> > First of all thanks for the detailed suggestions.
>> >>>> >> >> >>> >>> >> >
>> >>>> >> >> >>> >>> >> > 2013/6/12 Rupert Westenthaler <
>> >>>> >> rupert.westenthaler@gmail.com>
>> >>>> >> >> >>> >>> >> >
>> >>>> >> >> >>> >>> >> >> Hi Cristian, all
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> >> really interesting use case!
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> >> In this mail I will try to give some suggestions
>> on
>> >>>> how
>> >>>> >> this
>> >>>> >> >> >>> could
>> >>>> >> >> >>> >>> >> >> work out. This suggestions are mainly based on
>> >>>> experiences
>> >>>> >> >> and
>> >>>> >> >> >>> >>> lessons
>> >>>> >> >> >>> >>> >> >> learned in the LIVE [2] project where we built an
>> >>>> >> information
>> >>>> >> >> >>> system
>> >>>> >> >> >>> >>> >> >> for the Olympic Games in Peking. While this
>> Project
>> >>>> >> excluded
>> >>>> >> >> the
>> >>>> >> >> >>> >>> >> >> extraction of Events from unstructured text
>> (because
>> >>>> the
>> >>>> >> >> Olympic
>> >>>> >> >> >>> >>> >> >> Information System was already providing event
>> data
>> >>>> as XML
>> >>>> >> >> >>> messages)
>> >>>> >> >> >>> >>> >> >> the semantic search capabilities of this system
>> >>>> where very
>> >>>> >> >> >>> similar
>> >>>> >> >> >>> >>> as
>> >>>> >> >> >>> >>> >> >> the one described by your use case.
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> >> IMHO you are not only trying to extract
>> relations,
>> >>>> but a
>> >>>> >> >> formal
>> >>>> >> >> >>> >>> >> >> representation of the situation described by the
>> >>>> text. So
>> >>>> >> >> lets
>> >>>> >> >> >>> >>> assume
>> >>>> >> >> >>> >>> >> >> that the goal is to Annotate a Setting (or
>> Situation)
>> >>>> >> >> described
>> >>>> >> >> >>> in
>> >>>> >> >> >>> >>> the
>> >>>> >> >> >>> >>> >> >> text - a fise:SettingAnnotation.
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> >> The DOLCE foundational ontology [1] gives some
>> >>>> advices on
>> >>>> >> >> how to
>> >>>> >> >> >>> >>> model
>> >>>> >> >> >>> >>> >> >> those. The important relation for modeling this
>> >>>> >> >> Participation:
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> >>     PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t))
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> >> where ..
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> >>  * ED are Endurants (continuants): Endurants do
>> have
>> >>>> an
>> >>>> >> >> >>> identity so
>> >>>> >> >> >>> >>> we
>> >>>> >> >> >>> >>> >> >> would typically refer to them as Entities
>> referenced
>> >>>> by a
>> >>>> >> >> >>> setting.
>> >>>> >> >> >>> >>> >> >> Note that this includes physical, non-physical as
>> >>>> well as
>> >>>> >> >> >>> >>> >> >> social-objects.
>> >>>> >> >> >>> >>> >> >>  * PD are Perdurants (occurrents):  Perdurants
>> are
>> >>>> >> entities
>> >>>> >> >> that
>> >>>> >> >> >>> >>> >> >> happen in time. This refers to Events,
>> Activities ...
>> >>>> >> >> >>> >>> >> >>  * PC are Participation: It is an time indexed
>> >>>> relation
>> >>>> >> where
>> >>>> >> >> >>> >>> >> >> Endurants participate in Perdurants
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> >> Modeling this in RDF requires to define some
>> >>>> intermediate
>> >>>> >> >> >>> resources
>> >>>> >> >> >>> >>> >> >> because RDF does not allow for n-ary relations.
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> >>  * fise:SettingAnnotation: It is really handy to
>> >>>> define
>> >>>> >> one
>> >>>> >> >> >>> resource
>> >>>> >> >> >>> >>> >> >> being the context for all described data. I would
>> >>>> call
>> >>>> >> this
>> >>>> >> >> >>> >>> >> >> "fise:SettingAnnotation" and define it as a
>> >>>> sub-concept to
>> >>>> >> >> >>> >>> >> >> fise:Enhancement. All further enhancement about
>> the
>> >>>> >> extracted
>> >>>> >> >> >>> >>> Setting
>> >>>> >> >> >>> >>> >> >> would define a "fise:in-setting" relation to it.
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> >>  * fise:ParticipantAnnotation: Is used to
>> annotate
>> >>>> that
>> >>>> >> >> >>> Endurant is
>> >>>> >> >> >>> >>> >> >> participating on a setting (fise:in-setting
>> >>>> >> >> >>> fise:SettingAnnotation).
>> >>>> >> >> >>> >>> >> >> The Endurant itself is described by existing
>> >>>> >> >> fise:TextAnnotaion
>> >>>> >> >> >>> (the
>> >>>> >> >> >>> >>> >> >> mentions) and fise:EntityAnnotation (suggested
>> >>>> Entities).
>> >>>> >> >> >>> Basically
>> >>>> >> >> >>> >>> >> >> the fise:ParticipantAnnotation will allow an
>> >>>> >> >> EnhancementEngine
>> >>>> >> >> >>> to
>> >>>> >> >> >>> >>> >> >> state that several mentions (in possible
>> different
>> >>>> >> >> sentences) do
>> >>>> >> >> >>> >>> >> >> represent the same Endurant as participating in
>> the
>> >>>> >> Setting.
>> >>>> >> >> In
>> >>>> >> >> >>> >>> >> >> addition it would be possible to use the dc:type
>> >>>> property
>> >>>> >> >> >>> (similar
>> >>>> >> >> >>> >>> as
>> >>>> >> >> >>> >>> >> >> for fise:TextAnnotation) to refer to the role(s)
>> of
>> >>>> an
>> >>>> >> >> >>> participant
>> >>>> >> >> >>> >>> >> >> (e.g. the set: Agent (intensionally performs an
>> >>>> action)
>> >>>> >> Cause
>> >>>> >> >> >>> >>> >> >> (unintentionally e.g. a mud slide), Patient (a
>> >>>> passive
>> >>>> >> role
>> >>>> >> >> in
>> >>>> >> >> >>> an
>> >>>> >> >> >>> >>> >> >> activity) and Instrument (aids an process)), but
>> I am
>> >>>> >> >> wondering
>> >>>> >> >> >>> if
>> >>>> >> >> >>> >>> one
>> >>>> >> >> >>> >>> >> >> could extract those information.
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> >> * fise:OccurrentAnnotation: is used to annotate a
>> >>>> >> Perdurant
>> >>>> >> >> in
>> >>>> >> >> >>> the
>> >>>> >> >> >>> >>> >> >> context of the Setting. Also
>> >>>> fise:OccurrentAnnotation can
>> >>>> >> >> link
>> >>>> >> >> >>> to
>> >>>> >> >> >>> >>> >> >> fise:TextAnnotaion (typically verbs in the text
>> >>>> defining
>> >>>> >> the
>> >>>> >> >> >>> >>> >> >> perdurant) as well as fise:EntityAnnotation
>> >>>> suggesting
>> >>>> >> well
>> >>>> >> >> >>> known
>> >>>> >> >> >>> >>> >> >> Events in a knowledge base (e.g. a Election in a
>> >>>> country,
>> >>>> >> or
>> >>>> >> >> an
>> >>>> >> >> >>> >>> >> >> upraising ...). In addition
>> fise:OccurrentAnnotation
>> >>>> can
>> >>>> >> >> define
>> >>>> >> >> >>> >>> >> >> dc:has-participant links to
>> >>>> fise:ParticipantAnnotation. In
>> >>>> >> >> this
>> >>>> >> >> >>> case
>> >>>> >> >> >>> >>> >> >> it is explicitly stated hat an Endurant (the
>> >>>> >> >> >>> >>> >> >> fise:ParticipantAnnotation) involved in this
>> >>>> Perdurant
>> >>>> >> (the
>> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation). As Occurrences are
>> >>>> temporally
>> >>>> >> >> indexed
>> >>>> >> >> >>> this
>> >>>> >> >> >>> >>> >> >> annotation should also support properties for
>> >>>> defining the
>> >>>> >> >> >>> >>> >> >> xsd:dateTime for the start/end.
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> >> Indeed, an event based data structure makes a
>> lot of
>> >>>> sense
>> >>>> >> >> with
>> >>>> >> >> >>> the
>> >>>> >> >> >>> >>> >> remark
>> >>>> >> >> >>> >>> >> > that you probably won't be able to always extract
>> the
>> >>>> date
>> >>>> >> >> for a
>> >>>> >> >> >>> >>> given
>> >>>> >> >> >>> >>> >> > setting(situation).
>> >>>> >> >> >>> >>> >> > There are 2 thing which are unclear though.
>> >>>> >> >> >>> >>> >> >
>> >>>> >> >> >>> >>> >> > 1. Perdurant : You could have situations in which
>> the
>> >>>> >> object
>> >>>> >> >> upon
>> >>>> >> >> >>> >>> which
>> >>>> >> >> >>> >>> >> the
>> >>>> >> >> >>> >>> >> > Subject ( or Endurant ) is acting is not a
>> transitory
>> >>>> >> object (
>> >>>> >> >> >>> such
>> >>>> >> >> >>> >>> as an
>> >>>> >> >> >>> >>> >> > event, activity ) but rather another Endurant. For
>> >>>> example
>> >>>> >> we
>> >>>> >> >> can
>> >>>> >> >> >>> >>> have
>> >>>> >> >> >>> >>> >> the
>> >>>> >> >> >>> >>> >> > phrase "USA invades Irak" where "USA" is the
>> Endurant
>> >>>> (
>> >>>> >> >> Subject )
>> >>>> >> >> >>> >>> which
>> >>>> >> >> >>> >>> >> > performs the action of "invading" on another
>> >>>> Endurant,
>> >>>> >> namely
>> >>>> >> >> >>> >>> "Irak".
>> >>>> >> >> >>> >>> >> >
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >> By using CAOS, USA would be the Agent and Iraq the
>> >>>> Patient.
>> >>>> >> Both
>> >>>> >> >> >>> are
>> >>>> >> >> >>> >>> >> Endurants. The activity "invading" would be the
>> >>>> Perdurant. So
>> >>>> >> >> >>> ideally
>> >>>> >> >> >>> >>> >> you would have a  "fise:SettingAnnotation" with:
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >>   * fise:ParticipantAnnotation for USA with the
>> dc:type
>> >>>> >> >> caos:Agent,
>> >>>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "USA" and a
>> >>>> >> >> >>> fise:EntityAnnotation
>> >>>> >> >> >>> >>> >> linking to dbpedia:United_States
>> >>>> >> >> >>> >>> >>   * fise:ParticipantAnnotation for Iraq with the
>> dc:type
>> >>>> >> >> >>> caos:Patient,
>> >>>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "Irak" and a
>> >>>> >> >> >>> >>> >> fise:EntityAnnotation linking to  dbpedia:Iraq
>> >>>> >> >> >>> >>> >>   * fise:OccurrentAnnotation for "invades" with the
>> >>>> dc:type
>> >>>> >> >> >>> >>> >> caos:Activity, linking to a fise:TextAnnotation for
>> >>>> "invades"
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >> > 2. Where does the verb, which links the Subject
>> and
>> >>>> the
>> >>>> >> Object
>> >>>> >> >> >>> come
>> >>>> >> >> >>> >>> into
>> >>>> >> >> >>> >>> >> > this? I imagined that the Endurant would have a
>> >>>> >> dc:"property"
>> >>>> >> >> >>> where
>> >>>> >> >> >>> >>> the
>> >>>> >> >> >>> >>> >> > property = verb which links to the Object in noun
>> >>>> form. For
>> >>>> >> >> >>> example
>> >>>> >> >> >>> >>> take
>> >>>> >> >> >>> >>> >> > again the sentence "USA invades Irak". You would
>> have
>> >>>> the
>> >>>> >> >> "USA"
>> >>>> >> >> >>> >>> Entity
>> >>>> >> >> >>> >>> >> with
>> >>>> >> >> >>> >>> >> > dc:invader which points to the Object "Irak". The
>> >>>> Endurant
>> >>>> >> >> would
>> >>>> >> >> >>> >>> have as
>> >>>> >> >> >>> >>> >> > many dc:"property" elements as there are verbs
>> which
>> >>>> link
>> >>>> >> it
>> >>>> >> >> to
>> >>>> >> >> >>> an
>> >>>> >> >> >>> >>> >> Object.
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >> As explained above you would have a
>> >>>> fise:OccurrentAnnotation
>> >>>> >> >> that
>> >>>> >> >> >>> >>> >> represents the Perdurant. The information that the
>> >>>> activity
>> >>>> >> >> >>> mention in
>> >>>> >> >> >>> >>> >> the text is "invades" would be by linking to a
>> >>>> >> >> >>> fise:TextAnnotation. If
>> >>>> >> >> >>> >>> >> you can also provide an Ontology for Tasks that
>> defines
>> >>>> >> >> >>> >>> >> "myTasks:invade" the fise:OccurrentAnnotation could
>> >>>> also link
>> >>>> >> >> to an
>> >>>> >> >> >>> >>> >> fise:EntityAnnotation for this concept.
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >> best
>> >>>> >> >> >>> >>> >> Rupert
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >> >
>> >>>> >> >> >>> >>> >> > ### Consuming the data:
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> >> I think this model should be sufficient for
>> >>>> use-cases as
>> >>>> >> >> >>> described
>> >>>> >> >> >>> >>> by
>> >>>> >> >> >>> >>> >> you.
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> >> Users would be able to consume data on the
>> setting
>> >>>> level.
>> >>>> >> >> This
>> >>>> >> >> >>> can
>> >>>> >> >> >>> >>> be
>> >>>> >> >> >>> >>> >> >> done by simply retrieving all
>> >>>> fise:ParticipantAnnotation
>> >>>> >> as
>> >>>> >> >> >>> well as
>> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation linked with a setting.
>> BTW
>> >>>> this
>> >>>> >> was
>> >>>> >> >> the
>> >>>> >> >> >>> >>> >> >> approach used in LIVE [2] for semantic search. It
>> >>>> allows
>> >>>> >> >> >>> queries for
>> >>>> >> >> >>> >>> >> >> Settings that involve specific Entities e.g. you
>> >>>> could
>> >>>> >> filter
>> >>>> >> >> >>> for
>> >>>> >> >> >>> >>> >> >> Settings that involve a {Person},
>> >>>> activities:Arrested and
>> >>>> >> a
>> >>>> >> >> >>> specific
>> >>>> >> >> >>> >>> >> >> {Upraising}. However note that with this approach
>> >>>> you will
>> >>>> >> >> get
>> >>>> >> >> >>> >>> results
>> >>>> >> >> >>> >>> >> >> for Settings where the {Person} participated and
>> >>>> >> >> >>> >>> >> >> another person was arrested.
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> >> Another possibility would be to process
>> enhancement
>> >>>> >> results
>> >>>> >> >> on
>> >>>> >> >> >>> the
>> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation. This would allow a
>> much
>> >>>> >> higher
>> >>>> >> >> >>> >>> >> >> granularity level (e.g. it would allow to
>> correctly
>> >>>> answer
>> >>>> >> >> the
>> >>>> >> >> >>> query
>> >>>> >> >> >>> >>> >> >> used as an example above). But I am wondering if
>> the
>> >>>> >> quality
>> >>>> >> >> of
>> >>>> >> >> >>> the
>> >>>> >> >> >>> >>> >> >> Setting extraction will be sufficient for this. I
>> >>>> have
>> >>>> >> also
>> >>>> >> >> >>> doubts
>> >>>> >> >> >>> >>> if
>> >>>> >> >> >>> >>> >> >> this can be still realized by using semantic
>> >>>> indexing to
>> >>>> >> >> Apache
>> >>>> >> >> >>> Solr
>> >>>> >> >> >>> >>> >> >> or if it would be better/necessary to store
>> results
>> >>>> in a
>> >>>> >> >> >>> TripleStore
>> >>>> >> >> >>> >>> >> >> and using SPARQL for retrieval.
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> >> The methodology and query language used by YAGO
>> [3]
>> >>>> is
>> >>>> >> also
>> >>>> >> >> very
>> >>>> >> >> >>> >>> >> >> relevant for this (especially note chapter 7
>> SPOTL(X)
>> >>>> >> >> >>> >>> Representation).
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> >> Another related topic is the enrichment of
>> Entities
>> >>>> >> >> (especially
>> >>>> >> >> >>> >>> >> >> Events) in knowledge bases based on Settings
>> >>>> extracted
>> >>>> >> form
>> >>>> >> >> >>> >>> Documents.
>> >>>> >> >> >>> >>> >> >> As per definition - in DOLCE - Perdurants are
>> >>>> temporally
>> >>>> >> >> indexed.
>> >>>> >> >> >>> That
>> >>>> >> >> >>> >>> >> >> means that at the time when added to a knowledge
>> >>>> base they
>> >>>> >> >> might
>> >>>> >> >> >>> >>> still
>> >>>> >> >> >>> >>> >> >> be in process. So the creation, enriching and
>> >>>> refinement
>> >>>> >> of
>> >>>> >> >> such
>> >>>> >> >> >>> >>> >> >> Entities in the knowledge base seems to be
>> >>>> critical for
>> >>>> >> a
>> >>>> >> >> >>> System
>> >>>> >> >> >>> >>> >> >> like described in your use-case.
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> >> On Tue, Jun 11, 2013 at 9:09 PM, Cristian
>> Petroaca
>> >>>> >> >> >>> >>> >> >> <cr...@gmail.com> wrote:
>> >>>> >> >> >>> >>> >> >> >
>> >>>> >> >> >>> >>> >> >> > First of all I have to mention that I am new
>> in the
>> >>>> >> field
>> >>>> >> >> of
>> >>>> >> >> >>> >>> semantic
>> >>>> >> >> >>> >>> >> >> > technologies, I've started to read about them
>> in
>> >>>> the
>> >>>> >> last
>> >>>> >> >> 4-5
>> >>>> >> >> >>> >>> >> >> months.Having
>> >>>> >> >> >>> >>> >> >> > said that I have a high level overview of what
>> is
>> >>>> a good
>> >>>> >> >> >>> approach
>> >>>> >> >> >>> >>> to
>> >>>> >> >> >>> >>> >> >> solve
>> >>>> >> >> >>> >>> >> >> > this problem. There are a number of papers on
>> the
>> >>>> >> internet
>> >>>> >> >> >>> which
>> >>>> >> >> >>> >>> >> describe
>> >>>> >> >> >>> >>> >> >> > what steps need to be taken such as : named
>> entity
>> >>>> >> >> >>> recognition,
>> >>>> >> >> >>> >>> >> >> > co-reference resolution, pos tagging and
>> others.
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> >> The Stanbol NLP processing module currently only
>> >>>> supports
>> >>>> >> >> >>> sentence
>> >>>> >> >> >>> >>> >> >> detection, tokenization, POS tagging, Chunking,
>> NER
>> >>>> and
>> >>>> >> >> lemmatization.
>> >>>> >> >> >>> >>> Support
>> >>>> >> >> >>> >>> >> >> for co-reference resolution and dependency trees
>> is
>> >>>> >> currently
>> >>>> >> >> >>> >>> missing.
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> >> Stanford NLP is already integrated with Stanbol
>> [4].
>> >>>> At
>> >>>> >> the
>> >>>> >> >> >>> moment
>> >>>> >> >> >>> >>> it
>> >>>> >> >> >>> >>> >> >> only supports English, but I do already work to
>> >>>> include
>> >>>> >> the
>> >>>> >> >> >>> other
>> >>>> >> >> >>> >>> >> >> supported languages. Other NLP frameworks that are
>> >>>> already
>> >>>> >> >> >>> integrated
>> >>>> >> >> >>> >>> >> >> with Stanbol are Freeling [5] and Talismane [6].
>> But
>> >>>> note
>> >>>> >> >> that
>> >>>> >> >> >>> for
>> >>>> >> >> >>> >>> all
>> >>>> >> >> >>> >>> >> >> those the integration excludes support for
>> >>>> co-reference
>> >>>> >> and
>> >>>> >> >> >>> >>> dependency
>> >>>> >> >> >>> >>> >> >> trees.
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> >> Anyways I am confident that one can implement a
>> first
>> >>>> >> >> prototype
>> >>>> >> >> >>> by
>> >>>> >> >> >>> >>> >> >> only using Sentences and POS tags and - if
>> available
>> >>>> -
>> >>>> >> Chunks
>> >>>> >> >> >>> (e.g.
>> >>>> >> >> >>> >>> >> >> Noun phrases).
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> > I assume that in the Stanbol context, a feature
>> like
>> >>>> >> Relation
>> >>>> >> >> >>> >>> extraction
>> >>>> >> >> >>> >>> >> > would be implemented as an EnhancementEngine?
>> >>>> >> >> >>> >>> >> > What kind of effort would be required for a
>> >>>> co-reference
>> >>>> >> >> >>> resolution
>> >>>> >> >> >>> >>> tool
>> >>>> >> >> >>> >>> >> > integration into Stanbol?
>> >>>> >> >> >>> >>> >> >
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >> Yes in the end it would be an EnhancementEngine. But
>> >>>> before
>> >>>> >> we
>> >>>> >> >> can
>> >>>> >> >> >>> >>> >> build such an engine we would need to
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >> * extend the Stanbol NLP processing API with
>> >>>> Annotations for
>> >>>> >> >> >>> >>> co-reference
>> >>>> >> >> >>> >>> >> * add support for JSON Serialisation/Parsing for
>> those
>> >>>> >> >> annotation
>> >>>> >> >> >>> so
>> >>>> >> >> >>> >>> >> that the RESTful NLP Analysis Service can provide
>> >>>> >> co-reference
>> >>>> >> >> >>> >>> >> information
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >> > At this moment I'll be focusing on 2 aspects:
>> >>>> >> >> >>> >>> >> >
>> >>>> >> >> >>> >>> >> > 1. Determine the best data structure to
>> encapsulate
>> >>>> the
>> >>>> >> >> extracted
>> >>>> >> >> >>> >>> >> > information. I'll take a closer look at Dolce.
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >> Don't make it too complex. Defining a proper
>> structure to
>> >>>> >> >> represent
>> >>>> >> >> >>> >>> >> Events will only pay-off if we can also successfully
>> >>>> extract
>> >>>> >> >> such
>> >>>> >> >> >>> >>> >> information form processed texts.
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >> I would start with
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >>  * fise:SettingAnnotation
>> >>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >>  * fise:ParticipantAnnotation
>> >>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>> >>>> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
>> >>>> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
>> >>>> >> >> >>> >>> >>     * fise:suggestion {entityAnnotation} (multiple
>> if
>> >>>> there
>> >>>> >> are
>> >>>> >> >> >>> more
>> >>>> >> >> >>> >>> >> suggestions)
>> >>>> >> >> >>> >>> >>     * dc:type one of fise:Agent, fise:Patient,
>> >>>> >> fise:Instrument,
>> >>>> >> >> >>> >>> fise:Cause
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >>  * fise:OccurrentAnnotation
>> >>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>> >>>> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
>> >>>> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
>> >>>> >> >> >>> >>> >>     * dc:type set to fise:Activity
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >> If it turns out that we can extract more, we can add
>> >>>> more
>> >>>> >> >> >>> structure to
>> >>>> >> >> >>> >>> >> those annotations. We might also think about using
>> an
>> >>>> own
>> >>>> >> >> namespace
>> >>>> >> >> >>> >>> >> for those extensions to the annotation structure.
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >> > 2. Determine how should all of this be integrated
>> into
>> >>>> >> >> Stanbol.
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >> Just create an EventExtractionEngine and configure an
>> >>>> >> enhancement
>> >>>> >> >> >>> chain
>> >>>> >> >> >>> >>> >> that does NLP processing and EntityLinking.
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >> You should have a look at
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >> * SentimentSummarizationEngine [1] as it does a lot
>> of
>> >>>> things
>> >>>> >> >> with
>> >>>> >> >> >>> NLP
>> >>>> >> >> >>> >>> >> processing results (e.g. connecting adjectives (via
>> >>>> verbs) to
>> >>>> >> >> >>> >>> >> nouns/pronouns. So as long we can not use explicit
>> >>>> dependency
>> >>>> >> >> trees
>> >>>> >> >> >>> >>> >> your code will need to do similar things with Nouns,
>> >>>> Pronouns
>> >>>> >> and
>> >>>> >> >> >>> >>> >> Verbs.
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >> * Disambigutation-MLT engine, as it creates a Java
>> >>>> >> >> representation
>> >>>> >> >> >>> of
>> >>>> >> >> >>> >>> >> present fise:TextAnnotation and
>> fise:EntityAnnotation
>> >>>> [2].
>> >>>> >> >> >>> Something
>> >>>> >> >> >>> >>> >> similar will also be required by the
>> >>>> EventExtractionEngine
>> >>>> >> for
>> >>>> >> >> fast
>> >>>> >> >> >>> >>> >> access to such annotations while iterating over the
>> >>>> >> Sentences of
>> >>>> >> >> >>> the
>> >>>> >> >> >>> >>> >> text.
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >> best
>> >>>> >> >> >>> >>> >> Rupert
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >> [1]
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>>
>> >>>> >> >> >>>
>> >>>> >> >>
>> >>>> >>
>> >>>>
>> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/sentiment-summarization/src/main/java/org/apache/stanbol/enhancer/engines/sentiment/summarize/SentimentSummarizationEngine.java
>> >>>> >> >> >>> >>> >> [2]
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>>
>> >>>> >> >> >>>
>> >>>> >> >>
>> >>>> >>
>> >>>>
>> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/disambiguation-mlt/src/main/java/org/apache/stanbol/enhancer/engine/disambiguation/mlt/DisambiguationData.java
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >> >
>> >>>> >> >> >>> >>> >> > Thanks
>> >>>> >> >> >>> >>> >> >
>> >>>> >> >> >>> >>> >> > Hope this helps to bootstrap this discussion
>> >>>> >> >> >>> >>> >> >> best
>> >>>> >> >> >>> >>> >> >> Rupert
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >> >> --
>> >>>> >> >> >>> >>> >> >> | Rupert Westenthaler
>> >>>> >> >> rupert.westenthaler@gmail.com
>> >>>> >> >> >>> >>> >> >> | Bodenlehenstraße 11
>> >>>> >> >> >>> ++43-699-11108907
>> >>>> >> >> >>> >>> >> >> | A-5500 Bischofshofen
>> >>>> >> >> >>> >>> >> >>
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>> >>
>> >>>> >> >> >>> >>>
>> >>>> >> >> >>> >>>
>> >>>> >> >> >>> >>>
>> >>>> >> >> >>> >>>
>> >>>> >> >> >>> >>
>> >>>> >> >> >>> >>
>> >>>> >> >> >>>
>> >>>> >> >> >>>
>> >>>> >> >> >>>
>> >>>> >> >> >>>
>> >>>> >> >> >>
>> >>>> >> >> >>
>> >>>> >> >>
>> >>>> >> >>
>> >>>> >> >>
>> >>>> >> >>
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>
>>
>>
>>
>>
>
>

Re: Relation extraction feature

Posted by Cristian Petroaca <cr...@gmail.com>.
OK, to sum it up we would have:

1. Coref

"stanbol.enhancer.nlp.coref" {
    "isRepresentative" : true/false, // whether this token or chunk is the
representative mention in the chain
    "mentions" : [ { "type" : "Token", // type of element which refers to
this token/chunk
 "start": 123 , // start index of the mentioning element
 "end": 130 // end index of the mentioning element
                    }, ...
                 ],
    "class" : ""class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
}
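
On the Java side, the proposal above could be sketched roughly as follows. This is only a minimal illustration of the proposed model; the class and field names (CorefSketch, MentionRef, example) are hypothetical and not existing Stanbol API:

```java
import java.util.ArrayList;
import java.util.List;

public class CorefSketch {

    // A reference to another span (Sentence, Chunk or Token), uniquely
    // identified by its type plus start/end character offsets.
    static class MentionRef {
        final String type; // "Sentence" | "Chunk" | "Token"
        final int start;   // start character offset of the mentioning span
        final int end;     // end character offset of the mentioning span
        MentionRef(String type, int start, int end) {
            this.type = type; this.start = start; this.end = end;
        }
    }

    // The coreference annotation attached to a Token or Chunk.
    static class CorefTag {
        final boolean representative; // is this the representative mention?
        final List<MentionRef> mentions = new ArrayList<>();
        CorefTag(boolean representative) { this.representative = representative; }
    }

    // Build the example chain from the JSON proposal above.
    static CorefTag example() {
        CorefTag tag = new CorefTag(true);
        tag.mentions.add(new MentionRef("Token", 123, 130));
        tag.mentions.add(new MentionRef("Token", 157, 165));
        return tag;
    }

    public static void main(String[] args) {
        CorefTag tag = example();
        System.out.println(tag.representative + " " + tag.mentions.size());
    }
}
```

The JSON serialization support (analogous to PosTagSupport) would then write one such object per annotated span.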


2. Dependency tree

"stanbol.enhancer.nlp.dependency" : {
"relations" : [ { "tag" : "nsubj", //type of relation - Stanford NLP
notation
                      "dep" : 12, // type of relation - Stanbol NLP mapped
value - ordinal number in enum Dependency
"role" : "gov/dep", // whether this token is the depender or the dependee
"type" : "Token", // type of element with which this token is in relation
"start" : 123, // start index of the relating token
"end" : 130 // end index of the relating token
},
...
]
"class" : "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag"
}
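
The "tag"/"dep" pair above could be backed by a mapping from the Stanford NLP relation strings to a unified enumeration, mirroring how POS tags are mapped. The sketch below is only an illustration under that assumption; the enum and its entries (GrammaticalRelation, DependencySketch.map) are hypothetical and not existing Stanbol API:

```java
public class DependencySketch {

    // A unified vocabulary for grammatical relations (small subset,
    // for illustration only).
    enum GrammaticalRelation { ROOT, NSUBJ, DOBJ, TMOD, CONJ }

    // Map the lexical Stanford NLP tag to the unified enum entry.
    static GrammaticalRelation map(String stanfordTag) {
        switch (stanfordTag) {
            case "nsubj":    return GrammaticalRelation.NSUBJ;
            case "dobj":     return GrammaticalRelation.DOBJ;
            case "tmod":     return GrammaticalRelation.TMOD;
            case "conj_and": return GrammaticalRelation.CONJ;
            default:         return GrammaticalRelation.ROOT;
        }
    }

    public static void main(String[] args) {
        // nsubj(met-4, Mary-1): "Mary" is the dependent of "met".
        // "tag" would serialize the string, "dep" the enum ordinal.
        GrammaticalRelation rel = map("nsubj");
        System.out.println(rel + " " + rel.ordinal());
    }
}
```

Keeping both the string and the ordinal in the JSON lets consumers that only know the unified vocabulary ignore tool-specific strings.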


2013/9/2 Rupert Westenthaler <ru...@gmail.com>

> Hi Cristian,
>
> let me provide some feedback to your proposals:
>
> ### Referring other Spans
>
> Both suggested annotations require linking to other spans (Sentence,
> Chunk or Token). For that we should introduce a JSON element for
> referring to those elements and use it consistently for all usages.
>
> In the java model this would allow you to have a reference to the
> other Span (Sentence, Chunk, Token). In the serialized form you would
> have JSON elements with the "type", "start" and "end" attributes as
> those three uniquely identify any span.
>
> Here is an example based on the "mention" attribute as defined by the
> proposed "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
>
>     ...
>     "mentions" : [ {
>         "type" : "Token",
>         "start": 123 ,
>         "end": 130 } ,{
>         "type" : "Token",
>         "start": 157 ,
>         "end": 165 }],
>     ...
>
> Similar token links in
> "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag" should also
> use this model.
>
> ### Usage of Controlled Vocabularies
>
> In addition the DependencyTag also seems to use a controlled
> vocabulary (e.g. 'nsubj', 'conj_and' ...). In such cases the Stanbol
> NLP module tries to define those in some kind of Ontology. For POS
> tags we use the OLIA ontology [1]. This is important as most NLP
> frameworks will use different strings and we need to unify those to
> common IDs so that components that consume those data do not depend on
> a specific NLP tool.
>
> Because the usage of Ontologies within Java is not well supported, the
> Stanbol NLP module defines Java Enumerations for those Ontologies, such
> as the POS type enumeration [2].
>
> Both the Java model and the JSON serialization support (1) the lexical
> tag as used by the NLP tool and (2) the mapped concept: in the Java API
> via two different methods, and in the JSON serialization via two
> separate keys.
>
> To make this clearer, here is an example of a POS annotation for a
> proper noun:
>
>     "stanbol.enhancer.nlp.pos" : {
>         "tag" : "PN",
>         "pos" : 53,
>         "class" : "org.apache.stanbol.enhancer.nlp.pos.PosTag",
>         "prob" : 0.95
>     }
>
> where
>
>     "tag" : "PN"
>
> is the lexical form as used by the NLP tool and
>
>     "pos" : 53
>
> refers to the ordinal number of the entry "ProperNoun" in the POS
> enumeration
>
> IMO the "type" property of DependencyTag should use a similar design.
>
> best
> Rupert
>
> [1] http://olia.nlp2rdf.org/
> [2]
> http://svn.apache.org/repos/asf/stanbol/trunk/enhancer/generic/nlp/src/main/java/org/apache/stanbol/enhancer/nlp/pos/Pos.java
>
> On Sun, Sep 1, 2013 at 8:09 PM, Cristian Petroaca
> <cr...@gmail.com> wrote:
> > Sorry, pressed sent too soon :).
> >
> > Continued :
> >
> > nsubj(met-4, Mary-1), conj_and(Mary-1, Tom-3), nsubj(met-4, Tom-3),
> > root(ROOT-0, met-4), nn(today-6, Danny-5), tmod(met-4, today-6)]
> >
> > Given this, we can have for each "Token" an additional dependency
> > annotation :
> >
> > "stanbol.enhancer.nlp.dependency" : {
> > "tag" : //is it necessary?
> > "relations" : [ { "type" : "nsubj", //type of relation
> >   "role" : "gov/dep", //whether it is depender or the dependee
> >   "dependencyValue" : "met", // the word with which the token has a
> relation
> >   "dependencyIndexInSentence" : "2" //the index of the dependency in the
> > current sentence
> > }
> > ...
> > ]
> >                 "class" :
> > "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag"
> >         }
> >
> > 2013/9/1 Cristian Petroaca <cr...@gmail.com>
> >
> >> Related to the Stanford Dependency Tree Feature, this is the way the
> >> output from the tool looks like for this sentence : "Mary and Tom met
> Danny
> >> today" :
> >>
> >>
> >> 2013/8/30 Cristian Petroaca <cr...@gmail.com>
> >>
> >>> Hi Rupert,
> >>>
> >>> Ok, so after looking at the JSON output from the Stanford NLP Server
> and
> >>> the coref module I'm thinking I can represent the coreference
> information
> >>> this way:
> >>> Each "Token" or "Chunk" will contain an additional coref annotation
> with
> >>> the following structure :
> >>>
> >>> "stanbol.enhancer.nlp.coref" {
> >>>     "tag" : //does this need to exist?
> >>>     "isRepresentative" : true/false, // whether this token or chunk is
> >>> the representative mention in the chain
> >>>     "mentions" : [ { "sentenceNo" : 1 //the sentence in which the
> mention
> >>> is found
> >>>                            "startWord" : 2 //the first word making up
> the
> >>> mention
> >>>                            "endWord" : 3 //the last word making up the
> >>> mention
> >>>                          }, ...
> >>>                        ],
> >>>     "class" : ""class" :
> "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
> >>> }
> >>>
> >>> The CorefTag should resemble this model.
> >>>
> >>> What do you think?
> >>>
> >>> Cristian
> >>>
> >>>
> >>> 2013/8/24 Rupert Westenthaler <ru...@gmail.com>
> >>>
> >>>> Hi Cristian,
> >>>>
> >>>> you can not directly call StanfordNLP components from Stanbol, but you
> >>>> have to extend the RESTful service to include the information you
> >>>> need. The main reason for that is that the license of StanfordNLP is
> >>>> not compatible with the Apache Software License. So Stanbol can not
> >>>> directly link to the StanfordNLP API.
> >>>>
> >>>> You will need to
> >>>>
> >>>> 1. define an additional class {yourTag} extends Tag<{yourType}> class
> >>>> in the o.a.s.enhancer.nlp module
> >>>> 2. add JSON parsing and serialization support for this tag to the
> >>>> o.a.s.enhancer.nlp.json module (see e.g. PosTagSupport as an example)
> >>>>
> >>>> As (1) would be necessary anyway the only additional thing you need to
> >>>> develop is (2). After that you can add {yourTag} instance to the
> >>>> AnalyzedText in the StanfornNLP integration. The
> >>>> RestfulNlpAnalysisEngine will parse them from the response. All
> >>>> engines executed after the RestfulNlpAnalysisEngine will have access
> >>>> to your annotations.
> >>>>
> >>>> If you have a design for {yourTag} - the model you would like to use
> >>>> to represent your data - I can help with (1) and (2).
> >>>>
> >>>> best
> >>>> Rupert
> >>>>
> >>>>
> >>>> On Fri, Aug 23, 2013 at 5:11 PM, Cristian Petroaca
> >>>> <cr...@gmail.com> wrote:
> >>>> > Hi Rupert,
> >>>> >
> >>>> > Thanks for the info. Looking at the standbol-stanfordnlp project I
> see
> >>>> that
> >>>> > the stanford nlp is not implemented as an EnhancementEngine but
> rather
> >>>> it
> >>>> > is used directly in a Jetty Server instance. How does that fit into
> the
> >>>> > Stanbol stack? For example how can I call the StanfordNlpAnalyzer's
> >>>> routine
> >>>> > from my TripleExtractionEnhancementEngine which lives in the Stanbol
> >>>> stack?
> >>>> >
> >>>> > Thanks,
> >>>> > Cristian
> >>>> >
> >>>> >
> >>>> > 2013/8/12 Rupert Westenthaler <ru...@gmail.com>
> >>>> >
> >>>> >> Hi Cristian,
> >>>> >>
> >>>> >> Sorry for the late response, but I was offline for the last two
> weeks
> >>>> >>
> >>>> >> On Fri, Aug 2, 2013 at 9:19 PM, Cristian Petroaca
> >>>> >> <cr...@gmail.com> wrote:
> >>>> >> > Hi Rupert,
> >>>> >> >
> >>>> >> > After doing some tests it seems that the Stanford NLP coreference
> >>>> module
> >>>> >> is
> >>>> >> > much more accurate than the Open NLP one.So I decided to extend
> >>>> Stanford
> >>>> >> > NLP to add coreference there.
> >>>> >>
> >>>> >> The Stanford NLP integration is not part of the Stanbol codebase
> >>>> >> because the licenses are not compatible.
> >>>> >>
> >>>> >> You can find the Stanford NLP integration on
> >>>> >>
> >>>> >>     https://github.com/westei/stanbol-stanfordnlp
> >>>> >>
> >>>> >> just create a fork and send pull requests.
> >>>> >>
> >>>> >>
> >>>> >> > Could you add the necessary projects on the branch? And also
> remove
> >>>> the
> >>>> >> > Open NLP ones?
> >>>> >> >
> >>>> >>
> >>>> >> Currently the branch
> >>>> >>
> >>>> >>
> >>>> >>
> >>>>
> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
> >>>> >>
> >>>> >> only contains the "nlp" and the "nlp-json" modules. IMO those
> should
> >>>> >> be enough for adding coreference support.
> >>>> >>
> >>>> >> IMO you will need to
> >>>> >>
> >>>> >> * add an model for representing coreference to the nlp module
> >>>> >> * add parsing and serializing support to the nlp-json module
> >>>> >> * add the implementation to your fork of the stanbol-stanfordnlp
> >>>> project
> >>>> >>
> >>>> >> best
> >>>> >> Rupert
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >> > Thanks,
> >>>> >> > Cristian
> >>>> >> >
> >>>> >> >
> >>>> >> > 2013/7/5 Rupert Westenthaler <ru...@gmail.com>
> >>>> >> >
> >>>> >> >> Hi Cristian,
> >>>> >> >>
> >>>> >> >> I created the branch at
> >>>> >> >>
> >>>> >> >>
> >>>> >> >>
> >>>> >>
> >>>>
> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
> >>>> >> >>
> >>>> >> >> ATM in contains only the "nlp" and "nlp-json" module. Let me
> know
> >>>> if
> >>>> >> >> you would like to have more
> >>>> >> >>
> >>>> >> >> best
> >>>> >> >> Rupert
> >>>> >> >>
> >>>> >> >>
> >>>> >> >>
> >>>> >> >> On Thu, Jul 4, 2013 at 10:14 AM, Cristian Petroaca
> >>>> >> >> <cr...@gmail.com> wrote:
> >>>> >> >> > Hi Rupert,
> >>>> >> >> >
> >>>> >> >> > I created jiras :
> >>>> https://issues.apache.org/jira/browse/STANBOL-1132and
> >>>> >> >> > https://issues.apache.org/jira/browse/STANBOL-1133. The
> >>>> original one
> >>>> >> in
> >>>> >> >> > dependent upon these.
> >>>> >> >> > Please let me know when I can start using the branch.
> >>>> >> >> >
> >>>> >> >> > Thanks,
> >>>> >> >> > Cristian
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> > 2013/6/27 Cristian Petroaca <cr...@gmail.com>
> >>>> >> >> >
> >>>> >> >> >>
> >>>> >> >> >>
> >>>> >> >> >>
> >>>> >> >> >> 2013/6/27 Rupert Westenthaler <rupert.westenthaler@gmail.com
> >
> >>>> >> >> >>
> >>>> >> >> >>> On Thu, Jun 27, 2013 at 3:12 PM, Cristian Petroaca
> >>>> >> >> >>> <cr...@gmail.com> wrote:
> >>>> >> >> >>> > Sorry, I meant the Stanbol NLP API, not Stanford in my
> >>>> previous
> >>>> >> >> e-mail.
> >>>> >> >> >>> By
> >>>> >> >> >>> > the way, does Open NLP have the ability to build
> dependency
> >>>> trees?
> >>>> >> >> >>> >
> >>>> >> >> >>>
> >>>> >> >> >>> AFAIK OpenNLP does not provide this feature.
> >>>> >> >> >>>
> >>>> >> >> >>
> >>>> >> >> >> Then , since the Stanford NLP lib is also integrated into
> >>>> Stanbol,
> >>>> >> I'll
> >>>> >> >> >> take a look at how I can extend its integration to include
> the
> >>>> >> >> dependency
> >>>> >> >> >> tree feature.
> >>>> >> >> >>
> >>>> >> >> >>>
> >>>> >> >> >>>
> >>>> >> >> >>  >
> >>>> >> >> >>> > 2013/6/23 Cristian Petroaca <cr...@gmail.com>
> >>>> >> >> >>> >
> >>>> >> >> >>> >> Hi Rupert,
> >>>> >> >> >>> >>
> >>>> >> >> >>> >> I created jira
> >>>> >> https://issues.apache.org/jira/browse/STANBOL-1121.
> >>>> >> >> >>> >> As you suggested I would start with extending the
> Stanford
> >>>> NLP
> >>>> >> with
> >>>> >> >> >>> >> co-reference resolution but I think also with dependency
> >>>> trees
> >>>> >> >> because
> >>>> >> >> >>> I
> >>>> >> >> >>> >> also need to know the Subject of the sentence and the
> object
> >>>> >> that it
> >>>> >> >> >>> >> affects, right?
> >>>> >> >> >>> >>
> >>>> >> >> >>> >> Given that I need to extend the Stanford NLP API in
> Stanbol
> >>>> for
> >>>> >> >> >>> >> co-reference and dependency trees, how do I proceed with
> >>>> this?
> >>>> >> Do I
> >>>> >> >> >>> create
> >>>> >> >> >>> >> 2 new sub-tasks to the already opened Jira? After that
> can I
> >>>> >> start
> >>>> >> >> >>> >> implementing on my local copy of Stanbol and when I'm
> done
> >>>> I'll
> >>>> >> send
> >>>> >> >> >>> you
> >>>> >> >> >>> >> guys the patch fo review?
> >>>> >> >> >>> >>
> >>>> >> >> >>>
> >>>> >> >> >>> I would create two "New Feature" type Issues one for adding
> >>>> support
> >>>> >> >> >>> for "dependency trees" and the other for "co-reference"
> >>>> support. You
> >>>> >> >> >>> should also define "depends on" relations between
> STANBOL-1121
> >>>> and
> >>>> >> >> >>> those two new issues.
> >>>> >> >> >>>
> >>>> >> >> >>> Sub-task could also work, but as adding those features would
> >>>> be also
> >>>> >> >> >>> interesting for other things I would rather define them as
> >>>> separate
> >>>> >> >> >>> issues.
> >>>> >> >> >>>
> >>>> >> >> >>>
> >>>> >> >> >> 2 New Features connected with the original jira it is then.
> >>>> >> >> >>
> >>>> >> >> >>
> >>>> >> >> >>> If you would prefer to work in an own branch please tell me.
> >>>> This
> >>>> >> >> >>> could have the advantage that patches would not be affected
> by
> >>>> >> changes
> >>>> >> >> >>> in the trunk.
> >>>> >> >> >>>
> >>>> >> >> >>> Yes, a separate branch sounds good.
> >>>> >> >> >>
> >>>> >> >> >> best
> >>>> >> >> >>> Rupert
> >>>> >> >> >>>
> >>>> >> >> >>> >> Regards,
> >>>> >> >> >>> >> Cristian
> >>>> >> >> >>> >>
> >>>> >> >> >>> >>
> >>>> >> >> >>> >> 2013/6/18 Rupert Westenthaler <
> >>>> rupert.westenthaler@gmail.com>
> >>>> >> >> >>> >>
> >>>> >> >> >>> >>> On Mon, Jun 17, 2013 at 10:18 PM, Cristian Petroaca
> >>>> >> >> >>> >>> <cr...@gmail.com> wrote:
> >>>> >> >> >>> >>> > Hi Rupert,
> >>>> >> >> >>> >>> >
> >>>> >> >> >>> >>> > Agreed on the
> >>>> >> >> >>> SettingAnnotation/ParticipantAnnotation/OccurentAnnotation
> >>>> >> >> >>> >>> > data structure.
> >>>> >> >> >>> >>> >
> >>>> >> >> >>> >>> > Should I open up a Jira for all of this in order to
> >>>> >> encapsulate
> >>>> >> >> this
> >>>> >> >> >>> >>> > information and establish the goals and these initial
> >>>> steps
> >>>> >> >> towards
> >>>> >> >> >>> >>> these
> >>>> >> >> >>> >>> > goals?
> >>>> >> >> >>> >>>
> >>>> >> >> >>> >>> Yes please. A JIRA issue for this work would be great.
> >>>> >> >> >>> >>>
> >>>> >> >> >>> >>> > How should I proceed further? Should I create some
> design
> >>>> >> >> documents
> >>>> >> >> >>> that
> >>>> >> >> >>> >>> > need to be reviewed?
> >>>> >> >> >>> >>>
> >>>> >> >> >>> >>> Usually it is the best to write design related text
> >>>> directly in
> >>>> >> >> JIRA
> >>>> >> >> >>> >>> by using Markdown [1] syntax. This will allow us later
> to
> >>>> use
> >>>> >> this
> >>>> >> >> >>> >>> text directly for the documentation on the Stanbol
> Webpage.
> >>>> >> >> >>> >>>
> >>>> >> >> >>> >>> best
> >>>> >> >> >>> >>> Rupert
> >>>> >> >> >>> >>>
> >>>> >> >> >>> >>>
> >>>> >> >> >>> >>> [1] http://daringfireball.net/projects/markdown/
> >>>> >> >> >>> >>> >
> >>>> >> >> >>> >>> > Regards,
> >>>> >> >> >>> >>> > Cristian
> >>>> >> >> >>> >>> >
> >>>> >> >> >>> >>> >
> >>>> >> >> >>> >>> > 2013/6/17 Rupert Westenthaler <
> >>>> rupert.westenthaler@gmail.com>
> >>>> >> >> >>> >>> >
> >>>> >> >> >>> >>> >> On Thu, Jun 13, 2013 at 8:22 PM, Cristian Petroaca
> >>>> >> >> >>> >>> >> <cr...@gmail.com> wrote:
> >>>> >> >> >>> >>> >> > HI Rupert,
> >>>> >> >> >>> >>> >> >
> >>>> >> >> >>> >>> >> > First of all thanks for the detailed suggestions.
> >>>> >> >> >>> >>> >> >
> >>>> >> >> >>> >>> >> > 2013/6/12 Rupert Westenthaler <
> >>>> >> rupert.westenthaler@gmail.com>
> >>>> >> >> >>> >>> >> >
> >>>> >> >> >>> >>> >> >> Hi Cristian, all
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> >> really interesting use case!
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> >> In this mail I will try to give some suggestions
> on
> >>>> how
> >>>> >> this
> >>>> >> >> >>> could
> >>>> >> >> >>> >>> >> >> work out. This suggestions are mainly based on
> >>>> experiences
> >>>> >> >> and
> >>>> >> >> >>> >>> lessons
> >>>> >> >> >>> >>> >> >> learned in the LIVE [2] project where we built an
> >>>> >> information
> >>>> >> >> >>> system
> >>>> >> >> >>> >>> >> >> for the Olympic Games in Peking. While this
> Project
> >>>> >> excluded
> >>>> >> >> the
> >>>> >> >> >>> >>> >> >> extraction of Events from unstructured text
> (because
> >>>> the
> >>>> >> >> Olympic
> >>>> >> >> >>> >>> >> >> Information System was already providing event
> data
> >>>> as XML
> >>>> >> >> >>> messages)
> >>>> >> >> >>> >>> >> >> the semantic search capabilities of this system
> >>>> where very
> >>>> >> >> >>> similar
> >>>> >> >> >>> >>> as
> >>>> >> >> >>> >>> >> >> the one described by your use case.
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> >> IMHO you are not only trying to extract relations,
> >>>> but a
> >>>> >> >> formal
> >>>> >> >> >>> >>> >> >> representation of the situation described by the
> >>>> text. So
> >>>> >> >> lets
> >>>> >> >> >>> >>> assume
> >>>> >> >> >>> >>> >> >> that the goal is to Annotate a Setting (or
> Situation)
> >>>> >> >> described
> >>>> >> >> >>> in
> >>>> >> >> >>> >>> the
> >>>> >> >> >>> >>> >> >> text - a fise:SettingAnnotation.
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> >> The DOLCE foundational ontology [1] gives some
> >>>> advices on
> >>>> >> >> how to
> >>>> >> >> >>> >>> model
> >>>> >> >> >>> >>> >> >> those. The important relation for modeling this
> >>>> >> >> Participation:
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> >>     PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t))
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> >> where ..
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> >>  * ED are Endurants (continuants): Endurants do
> have
> >>>> an
> >>>> >> >> >>> identity so
> >>>> >> >> >>> >>> we
> >>>> >> >> >>> >>> >> >> would typically refer to them as Entities
> referenced
> >>>> by a
> >>>> >> >> >>> setting.
> >>>> >> >> >>> >>> >> >> Note that this includes physical, non-physical as
> >>>> well as
> >>>> >> >> >>> >>> >> >> social-objects.
> >>>> >> >> >>> >>> >> >>  * PD are Perdurants (occurrents):  Perdurants are
> >>>> >> entities
> >>>> >> >> that
> >>>> >> >> >>> >>> >> >> happen in time. This refers to Events, Activities
> ...
> >>>> >> >> >>> >>> >> >>  * PC are Participation: It is an time indexed
> >>>> relation
> >>>> >> where
> >>>> >> >> >>> >>> >> >> Endurants participate in Perdurants
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> >> Modeling this in RDF requires to define some
> >>>> intermediate
> >>>> >> >> >>> resources
> >>>> >> >> >>> >>> >> >> because RDF does not allow for n-ary relations.
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> >>  * fise:SettingAnnotation: It is really handy to
> >>>> define
> >>>> >> one
> >>>> >> >> >>> resource
> >>>> >> >> >>> >>> >> >> being the context for all described data. I would
> >>>> call
> >>>> >> this
> >>>> >> >> >>> >>> >> >> "fise:SettingAnnotation" and define it as a
> >>>> sub-concept to
> >>>> >> >> >>> >>> >> >> fise:Enhancement. All further enhancement about
> the
> >>>> >> extracted
> >>>> >> >> >>> >>> Setting
> >>>> >> >> >>> >>> >> >> would define a "fise:in-setting" relation to it.
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> >>  * fise:ParticipantAnnotation: Is used to annotate
> >>>> that
> >>>> >> >> >>> Endurant is
> >>>> >> >> >>> >>> >> >> participating on a setting (fise:in-setting
> >>>> >> >> >>> fise:SettingAnnotation).
> >>>> >> >> >>> >>> >> >> The Endurant itself is described by existing
> >>>> >> >> fise:TextAnnotaion
> >>>> >> >> >>> (the
> >>>> >> >> >>> >>> >> >> mentions) and fise:EntityAnnotation (suggested
> >>>> Entities).
> >>>> >> >> >>> Basically
> >>>> >> >> >>> >>> >> >> the fise:ParticipantAnnotation will allow an
> >>>> >> >> EnhancementEngine
> >>>> >> >> >>> to
> >>>> >> >> >>> >>> >> >> state that several mentions (in possible different
> >>>> >> >> sentences) do
> >>>> >> >> >>> >>> >> >> represent the same Endurant as participating in
> the
> >>>> >> Setting.
> >>>> >> >> In
> >>>> >> >> >>> >>> >> >> addition it would be possible to use the dc:type
> >>>> property
> >>>> >> >> >>> (similar
> >>>> >> >> >>> >>> as
> >>>> >> >> >>> >>> >> >> for fise:TextAnnotation) to refer to the role(s)
> of
> >>>> an
> >>>> >> >> >>> participant
> >>>> >> >> >>> >>> >> >> (e.g. the set: Agent (intensionally performs an
> >>>> action)
> >>>> >> Cause
> >>>> >> >> >>> >>> >> >> (unintentionally e.g. a mud slide), Patient (a
> >>>> passive
> >>>> >> role
> >>>> >> >> in
> >>>> >> >> >>> an
> >>>> >> >> >>> >>> >> >> activity) and Instrument (aids an process)), but
> I am
> >>>> >> >> wondering
> >>>> >> >> >>> if
> >>>> >> >> >>> >>> one
> >>>> >> >> >>> >>> >> >> could extract those information.
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> >> * fise:OccurrentAnnotation: is used to annotate a
> >>>> >> Perdurant
> >>>> >> >> in
> >>>> >> >> >>> the
> >>>> >> >> >>> >>> >> >> context of the Setting. Also
> >>>> fise:OccurrentAnnotation can
> >>>> >> >> link
> >>>> >> >> >>> to
> >>>> >> >> >>> >>> >> >> fise:TextAnnotaion (typically verbs in the text
> >>>> defining
> >>>> >> the
> >>>> >> >> >>> >>> >> >> perdurant) as well as fise:EntityAnnotation
> >>>> suggesting
> >>>> >> well
> >>>> >> >> >>> known
> >>>> >> >> >>> >>> >> >> Events in a knowledge base (e.g. a Election in a
> >>>> country,
> >>>> >> or
> >>>> >> >> an
> >>>> >> >> >>> >>> >> >> upraising ...). In addition
> fise:OccurrentAnnotation
> >>>> can
> >>>> >> >> define
> >>>> >> >> >>> >>> >> >> dc:has-participant links to
> >>>> fise:ParticipantAnnotation. In
> >>>> >> >> this
> >>>> >> >> >>> case
> >>>> >> >> >>> >>> >> >> it is explicitly stated hat an Endurant (the
> >>>> >> >> >>> >>> >> >> fise:ParticipantAnnotation) involved in this
> >>>> Perturant
> >>>> >> (the
> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation). As Occurrences are
> >>>> temporal
> >>>> >> >> indexed
> >>>> >> >> >>> this
> >>>> >> >> >>> >>> >> >> annotation should also support properties for
> >>>> defining the
> >>>> >> >> >>> >>> >> >> xsd:dateTime for the start/end.
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> >> Indeed, an event based data structure makes a lot
> of
> >>>> sense
> >>>> >> >> with
> >>>> >> >> >>> the
> >>>> >> >> >>> >>> >> remark
> >>>> >> >> >>> >>> >> > that you probably won't be able to always extract
> the
> >>>> date
> >>>> >> >> for a
> >>>> >> >> >>> >>> given
> >>>> >> >> >>> >>> >> > setting(situation).
> >>>> >> >> >>> >>> >> > There are 2 thing which are unclear though.
> >>>> >> >> >>> >>> >> >
> >>>> >> >> >>> >>> >> > 1. Perdurant : You could have situations in which
> the
> >>>> >> object
> >>>> >> >> upon
> >>>> >> >> >>> >>> which
> >>>> >> >> >>> >>> >> the
> >>>> >> >> >>> >>> >> > Subject ( or Endurant ) is acting is not a
> transitory
> >>>> >> object (
> >>>> >> >> >>> such
> >>>> >> >> >>> >>> as an
> >>>> >> >> >>> >>> >> > event, activity ) but rather another Endurant. For
> >>>> example
> >>>> >> we
> >>>> >> >> can
> >>>> >> >> >>> >>> have
> >>>> >> >> >>> >>> >> the
> >>>> >> >> >>> >>> >> > phrase "USA invades Irak" where "USA" is the
> Endurant
> >>>> (
> >>>> >> >> Subject )
> >>>> >> >> >>> >>> which
> >>>> >> >> >>> >>> >> > performs the action of "invading" on another
> >>>> Eundurant,
> >>>> >> namely
> >>>> >> >> >>> >>> "Irak".
> >>>> >> >> >>> >>> >> >
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >> By using CAOS, USA would be the Agent and Iraq the
> >>>> Patient.
> >>>> >> Both
> >>>> >> >> >>> are
> >>>> >> >> >>> >>> >> Endurants. The activity "invading" would be the
> >>>> Perdurant. So
> >>>> >> >> >>> ideally
> >>>> >> >> >>> >>> >> you would have a  "fise:SettingAnnotation" with:
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >>   * fise:ParticipantAnnotation for USA with the
> dc:type
> >>>> >> >> caos:Agent,
> >>>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "USA" and a
> >>>> >> >> >>> fise:EntityAnnotation
> >>>> >> >> >>> >>> >> linking to dbpedia:United_States
> >>>> >> >> >>> >>> >>   * fise:ParticipantAnnotation for Iraq with the
> dc:type
> >>>> >> >> >>> caos:Patient,
> >>>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "Irak" and a
> >>>> >> >> >>> >>> >> fise:EntityAnnotation linking to  dbpedia:Iraq
> >>>> >> >> >>> >>> >>   * fise:OccurrentAnnotation for "invades" with the
> >>>> dc:type
> >>>> >> >> >>> >>> >> caos:Activity, linking to a fise:TextAnnotation for
> >>>> "invades"
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >> > 2. Where does the verb, which links the Subject and
> >>>> the
> >>>> >> Object
> >>>> >> >> >>> come
> >>>> >> >> >>> >>> into
> >>>> >> >> >>> >>> >> > this? I imagined that the Endurant would have a
> >>>> >> dc:"property"
> >>>> >> >> >>> where
> >>>> >> >> >>> >>> the
> >>>> >> >> >>> >>> >> > property = verb which links to the Object in noun
> >>>> form. For
> >>>> >> >> >>> example
> >>>> >> >> >>> >>> take
> >>>> >> >> >>> >>> >> > again the sentence "USA invades Irak". You would
> have
> >>>> the
> >>>> >> >> "USA"
> >>>> >> >> >>> >>> Entity
> >>>> >> >> >>> >>> >> with
> >>>> >> >> >>> >>> >> > dc:invader which points to the Object "Irak". The
> >>>> Endurant
> >>>> >> >> would
> >>>> >> >> >>> >>> have as
> >>>> >> >> >>> >>> >> > many dc:"property" elements as there are verbs
> which
> >>>> link
> >>>> >> it
> >>>> >> >> to
> >>>> >> >> >>> an
> >>>> >> >> >>> >>> >> Object.
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >> As explained above you would have a
> >>>> fise:OccurrentAnnotation
> >>>> >> >> that
> >>>> >> >> >>> >>> >> represents the Perdurant. The information that the
> >>>> activity
> >>>> >> >> >>> mention in
> >>>> >> >> >>> >>> >> the text is "invades" would be by linking to a
> >>>> >> >> >>> fise:TextAnnotation. If
> >>>> >> >> >>> >>> >> you can also provide an Ontology for Tasks that
> defines
> >>>> >> >> >>> >>> >> "myTasks:invade" the fise:OccurrentAnnotation could
> >>>> also link
> >>>> >> >> to an
> >>>> >> >> >>> >>> >> fise:EntityAnnotation for this concept.
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >> best
> >>>> >> >> >>> >>> >> Rupert
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >> >
> >>>> >> >> >>> >>> >> > ### Consuming the data:
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> >> I think this model should be sufficient for
> >>>> use-cases as
> >>>> >> >> >>> described
> >>>> >> >> >>> >>> by
> >>>> >> >> >>> >>> >> you.
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> >> Users would be able to consume data on the setting
> >>>> level.
> >>>> >> >> This
> >>>> >> >> >>> can
> >>>> >> >> >>> >>> be
> >>>> >> >> >>> >>> >> >> done my simple retrieving all
> >>>> fise:ParticipantAnnotation
> >>>> >> as
> >>>> >> >> >>> well as
> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation linked with a setting.
> BTW
> >>>> this
> >>>> >> was
> >>>> >> >> the
> >>>> >> >> >>> >>> >> >> approach used in LIVE [2] for semantic search. It
> >>>> allows
> >>>> >> >> >>> queries for
> >>>> >> >> >>> >>> >> >> Settings that involve specific Entities e.g. you
> >>>> could
> >>>> >> filter
> >>>> >> >> >>> for
> >>>> >> >> >>> >>> >> >> Settings that involve a {Person},
> >>>> activities:Arrested and
> >>>> >> a
> >>>> >> >> >>> specific
> >>>> >> >> >>> >>> >> >> {Upraising}. However note that with this approach
> >>>> you will
> >>>> >> >> get
> >>>> >> >> >>> >>> results
> >>>> >> >> >>> >>> >> >> for Setting where the {Person} participated and an
> >>>> other
> >>>> >> >> person
> >>>> >> >> >>> was
> >>>> >> >> >>> >>> >> >> arrested.
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> >> An other possibility would be to process
> enhancement
> >>>> >> results
> >>>> >> >> on
> >>>> >> >> >>> the
> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation. This would allow to a
> much
> >>>> >> higher
> >>>> >> >> >>> >>> >> >> granularity level (e.g. it would allow to
> correctly
> >>>> answer
> >>>> >> >> the
> >>>> >> >> >>> query
> >>>> >> >> >>> >>> >> >> used as an example above). But I am wondering if
> the
> >>>> >> quality
> >>>> >> >> of
> >>>> >> >> >>> the
> >>>> >> >> >>> >>> >> >> Setting extraction will be sufficient for this. I
> >>>> have
> >>>> >> also
> >>>> >> >> >>> doubts
> >>>> >> >> >>> >>> if
> >>>> >> >> >>> >>> >> >> this can be still realized by using semantic
> >>>> indexing to
> >>>> >> >> Apache
> >>>> >> >> >>> Solr
> >>>> >> >> >>> >>> >> >> or if it would be better/necessary to store
> results
> >>>> in a
> >>>> >> >> >>> TripleStore
> >>>> >> >> >>> >>> >> >> and using SPARQL for retrieval.
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> >> The methodology and query language used by YAGO
> [3]
> >>>> is
> >>>> >> also
> >>>> >> >> very
> >>>> >> >> >>> >>> >> >> relevant for this (especially note chapter 7
> SPOTL(X)
> >>>> >> >> >>> >>> Representation).
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> >> An other related Topic is the enrichment of
> Entities
> >>>> >> >> (especially
> >>>> >> >> >>> >>> >> >> Events) in knowledge bases based on Settings
> >>>> extracted
> >>>> >> form
> >>>> >> >> >>> >>> Documents.
> >>>> >> >> >>> >>> >> >> As per definition - in DOLCE - Perdurants are
> >>>> temporal
> >>>> >> >> indexed.
> >>>> >> >> >>> That
> >>>> >> >> >>> >>> >> >> means that at the time when added to a knowledge
> >>>> base they
> >>>> >> >> might
> >>>> >> >> >>> >>> still
> >>>> >> >> >>> >>> >> >> be in process. So the creation, enriching and
> >>>> refinement
> >>>> >> of
> >>>> >> >> such
> >>>> >> >> >>> >>> >> >> Entities in a the knowledge base seams to be
> >>>> critical for
> >>>> >> a
> >>>> >> >> >>> System
> >>>> >> >> >>> >>> >> >> like described in your use-case.
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> >> On Tue, Jun 11, 2013 at 9:09 PM, Cristian Petroaca
> >>>> >> >> >>> >>> >> >> <cr...@gmail.com> wrote:
> >>>> >> >> >>> >>> >> >> >
> >>>> >> >> >>> >>> >> >> > First of all I have to mention that I am new in
> the
> >>>> >> field
> >>>> >> >> of
> >>>> >> >> >>> >>> semantic
> >>>> >> >> >>> >>> >> >> > technologies, I've started to read about them in
> >>>> the
> >>>> >> last
> >>>> >> >> 4-5
> >>>> >> >> >>> >>> >> >> months.Having
> >>>> >> >> >>> >>> >> >> > said that I have a high level overview of what
> is
> >>>> a good
> >>>> >> >> >>> approach
> >>>> >> >> >>> >>> to
> >>>> >> >> >>> >>> >> >> solve
> >>>> >> >> >>> >>> >> >> > this problem. There are a number of papers on
> the
> >>>> >> internet
> >>>> >> >> >>> which
> >>>> >> >> >>> >>> >> describe
> >>>> >> >> >>> >>> >> >> > what steps need to be taken such as : named
> entity
> >>>> >> >> >>> recognition,
> >>>> >> >> >>> >>> >> >> > co-reference resolution, pos tagging and others.
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> >> The Stanbol NLP processing module currently only
> >>>> supports
> >>>> >> >> >>> sentence
> >>>> >> >> >>> >>> >> >> detection, tokenization, POS tagging, Chunking,
> NER
> >>>> and
> >>>> >> >> lemma.
> >>>> >> >> >>> >>> support
> >>>> >> >> >>> >>> >> >> for co-reference resolution and dependency trees
> is
> >>>> >> currently
> >>>> >> >> >>> >>> missing.
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> >> Stanford NLP is already integrated with Stanbol
> [4].
> >>>> At
> >>>> >> the
> >>>> >> >> >>> moment
> >>>> >> >> >>> >>> it
> >>>> >> >> >>> >>> >> >> only supports English, but I do already work to
> >>>> include
> >>>> >> the
> >>>> >> >> >>> other
> >>>> >> >> >>> >>> >> >> supported languages. Other NLP framework that is
> >>>> already
> >>>> >> >> >>> integrated
> >>>> >> >> >>> >>> >> >> with Stanbol are Freeling [5] and Talismane [6].
> But
> >>>> note
> >>>> >> >> that
> >>>> >> >> >>> for
> >>>> >> >> >>> >>> all
> >>>> >> >> >>> >>> >> >> those the integration excludes support for
> >>>> co-reference
> >>>> >> and
> >>>> >> >> >>> >>> dependency
> >>>> >> >> >>> >>> >> >> trees.
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> >> Anyways I am confident that one can implement a
> first
> >>>> >> >> prototype
> >>>> >> >> >>> by
> >>>> >> >> >>> >>> >> >> only using Sentences and POS tags and - if
> available
> >>>> -
> >>>> >> Chunks
> >>>> >> >> >>> (e.g.
> >>>> >> >> >>> >>> >> >> Noun phrases).
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> > I assume that in the Stanbol context, a feature
> like
> >>>> >> Relation
> >>>> >> >> >>> >>> extraction
> >>>> >> >> >>> >>> >> > would be implemented as an EnhancementEngine?
> >>>> >> >> >>> >>> >> > What kind of effort would be required for a
> >>>> co-reference
> >>>> >> >> >>> resolution
> >>>> >> >> >>> >>> tool
> >>>> >> >> >>> >>> >> > integration into Stanbol?
> >>>> >> >> >>> >>> >> >
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >> Yes in the end it would be an EnhancementEngine. But
> >>>> before
> >>>> >> we
> >>>> >> >> can
> >>>> >> >> >>> >>> >> build such an engine we would need to
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >> * extend the Stanbol NLP processing API with
> >>>> Annotations for
> >>>> >> >> >>> >>> co-reference
> >>>> >> >> >>> >>> >> * add support for JSON Serialisation/Parsing for
> those
> >>>> >> >> annotation
> >>>> >> >> >>> so
> >>>> >> >> >>> >>> >> that the RESTful NLP Analysis Service can provide
> >>>> >> co-reference
> >>>> >> >> >>> >>> >> information
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >> > At this moment I'll be focusing on 2 aspects:
> >>>> >> >> >>> >>> >> >
> >>>> >> >> >>> >>> >> > 1. Determine the best data structure to encapsulate
> >>>> the
> >>>> >> >> extracted
> >>>> >> >> >>> >>> >> > information. I'll take a closer look at Dolce.
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >> Don't make to to complex. Defining a proper
> structure to
> >>>> >> >> represent
> >>>> >> >> >>> >>> >> Events will only pay-off if we can also successfully
> >>>> extract
> >>>> >> >> such
> >>>> >> >> >>> >>> >> information form processed texts.
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >> I would start with
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >>  * fise:SettingAnnotation
> >>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >>  * fise:ParticipantAnnotation
> >>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
> >>>> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
> >>>> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
> >>>> >> >> >>> >>> >>     * fise:suggestion {entityAnnotation} (multiple if
> >>>> there
> >>>> >> are
> >>>> >> >> >>> more
> >>>> >> >> >>> >>> >> suggestions)
> >>>> >> >> >>> >>> >>     * dc:type one of fise:Agent, fise:Patient,
> >>>> >> fise:Instrument,
> >>>> >> >> >>> >>> fise:Cause
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >>  * fise:OccurrentAnnotation
> >>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
> >>>> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
> >>>> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
> >>>> >> >> >>> >>> >>     * dc:type set to fise:Activity
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >> If it turns out that we can extract more, we can add
> >>>> more
> >>>> >> >> >>> structure to
> >>>> >> >> >>> >>> >> those annotations. We might also think about using an
> >>>> own
> >>>> >> >> namespace
> >>>> >> >> >>> >>> >> for those extensions to the annotation structure.
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >> > 2. Determine how should all of this be integrated
> into
> >>>> >> >> Stanbol.
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >> Just create an EventExtractionEngine and configure a
> >>>> >> enhancement
> >>>> >> >> >>> chain
> >>>> >> >> >>> >>> >> that does NLP processing and EntityLinking.
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >> You should have a look at
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >> * SentimentSummarizationEngine [1] as it does a lot
> of
> >>>> things
> >>>> >> >> with
> >>>> >> >> >>> NLP
> >>>> >> >> >>> >>> >> processing results (e.g. connecting adjectives (via
> >>>> verbs) to
> >>>> >> >> >>> >>> >> nouns/pronouns. So as long we can not use explicit
> >>>> dependency
> >>>> >> >> trees
> >>>> >> >> >>> >>> >> you code will need to do similar things with Nouns,
> >>>> Pronouns
> >>>> >> and
> >>>> >> >> >>> >>> >> Verbs.
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >> * Disambigutation-MLT engine, as it creates a Java
> >>>> >> >> representation
> >>>> >> >> >>> of
> >>>> >> >> >>> >>> >> present fise:TextAnnotation and fise:EntityAnnotation
> >>>> [2].
> >>>> >> >> >>> Something
> >>>> >> >> >>> >>> >> similar will also be required by the
> >>>> EventExtractionEngine
> >>>> >> for
> >>>> >> >> fast
> >>>> >> >> >>> >>> >> access to such annotations while iterating over the
> >>>> >> Sentences of
> >>>> >> >> >>> the
> >>>> >> >> >>> >>> >> text.
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >> best
> >>>> >> >> >>> >>> >> Rupert
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >> [1]
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>>
> >>>> >> >> >>>
> >>>> >> >>
> >>>> >>
> >>>>
> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/sentiment-summarization/src/main/java/org/apache/stanbol/enhancer/engines/sentiment/summarize/SentimentSummarizationEngine.java
> >>>> >> >> >>> >>> >> [2]
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>>
> >>>> >> >> >>>
> >>>> >> >>
> >>>> >>
> >>>>
> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/disambiguation-mlt/src/main/java/org/apache/stanbol/enhancer/engine/disambiguation/mlt/DisambiguationData.java
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >> >
> >>>> >> >> >>> >>> >> > Thanks
> >>>> >> >> >>> >>> >> >
> >>>> >> >> >>> >>> >> > Hope this helps to bootstrap this discussion
> >>>> >> >> >>> >>> >> >> best
> >>>> >> >> >>> >>> >> >> Rupert
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >> >> --
> >>>> >> >> >>> >>> >> >> | Rupert Westenthaler
> >>>> >> >> rupert.westenthaler@gmail.com
> >>>> >> >> >>> >>> >> >> | Bodenlehenstraße 11
> >>>> >> >> >>> ++43-699-11108907
> >>>> >> >> >>> >>> >> >> | A-5500 Bischofshofen
> >>>> >> >> >>> >>> >> >>
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>> >>
> >>>> >> >> >>> >>>
> >>>> >> >> >>> >>>
> >>>> >> >> >>> >>>
> >>>> >> >> >>> >>>
> >>>> >> >> >>> >>
> >>>> >> >> >>> >>
> >>>> >> >> >>>
> >>>> >> >> >>>
> >>>> >> >> >>>
> >>>> >> >> >>>
> >>>> >> >> >>
> >>>> >> >> >>
> >>>> >> >>
> >>>> >> >>
> >>>> >> >>
> >>>> >> >>
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
>
>
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>

Re: Relation extraction feature

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Cristian,

let me provide some feedback to your proposals:

### Referring to other Spans

Both suggested annotations need to link to other spans (Sentence, Chunk
or Token). For that we should introduce a common JSON element for
referring to those spans and use it everywhere such links are needed.

In the Java model this would give you a direct reference to the other
Span (Sentence, Chunk, Token). In the serialized form you would have
JSON elements with "type", "start" and "end" attributes, as those three
uniquely identify any span.

Here is an example based on the "mentions" attribute as defined by the
proposed "org.apache.stanbol.enhancer.nlp.coref.CorefTag":

    ...
    "mentions" : [ {
        "type" : "Token",
        "start": 123 ,
        "end": 130 } ,{
        "type" : "Token",
        "start": 157 ,
        "end": 165 }],
    ...

Token links in
"org.apache.stanbol.enhancer.nlp.dependency.DependencyTag" should also
use this model.
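To make the span-reference element concrete, here is a minimal Java sketch of such a reference and its serialization. The class and method names (SpanRef, toJson) are illustrative assumptions, not the actual Stanbol API:

```java
// Minimal sketch of the proposed span-reference JSON element.
// A span is uniquely identified by its type plus start/end character offsets.
public class SpanRef {
    final String type; // "Sentence", "Chunk" or "Token"
    final int start;   // character offset where the span starts
    final int end;     // character offset where the span ends

    public SpanRef(String type, int start, int end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }

    // Serializes the reference in the form used by the example above,
    // e.g. {"type":"Token","start":123,"end":130}
    public String toJson() {
        return String.format("{\"type\":\"%s\",\"start\":%d,\"end\":%d}",
                type, start, end);
    }

    public static void main(String[] args) {
        System.out.println(new SpanRef("Token", 123, 130).toJson());
    }
}
```

Any annotation that needs to point at another span (coreference mentions, dependency-relation partners) could then reuse this one element instead of inventing per-annotation fields like "sentenceNo"/"startWord".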

### Usage of Controlled Vocabularies

In addition, the DependencyTag also seems to use a controlled
vocabulary (e.g. 'nsubj', 'conj_and' ...). In such cases the Stanbol
NLP module tries to define those values in some kind of Ontology. For
POS tags we use the OLIA ontology [1]. This is important as most NLP
frameworks use different strings, and we need to unify those to
common IDs so that components that consume the data do not depend on
a specific NLP tool.

Because working with Ontologies within Java is not well supported, the
Stanbol NLP module defines Java Enumerations for those Ontologies, such
as the POS type enumeration [2].

Both the Java model and the JSON serialization support (1) the lexical
tag as used by the NLP tool and (2) the mapped concept: in the Java
API via two different methods, and in the JSON serialization via two
separate keys.

To make this clearer, here is an example of a POS annotation for a proper noun.

    "stanbol.enhancer.nlp.pos" : {
        "tag" : "PN",
        "pos" : 53,
        "class" : "org.apache.stanbol.enhancer.nlp.pos.PosTag",
        "prob" : 0.95
    }

where

    "tag" : "PN"

is the lexical form as used by the NLP tool and

    "pos" : 53

refers to the ordinal number of the entry "ProperNoun" in the POS enumeration

IMO the "type" property of DependencyTag should use a similar design.
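A minimal sketch of this dual representation follows. The names here (PosMapping, TAG_MAP, annotate) are assumptions for illustration, and the toy Pos enum is only a subset, so its ordinals differ from the real Stanbol enumeration [2] (where ProperNoun is 53):

```java
import java.util.Map;

// Sketch of the dual representation: the lexical tag from the NLP tool
// plus the mapped concept from a controlled vocabulary, serialized under
// two separate JSON keys ("tag" and "pos").
public class PosMapping {
    // Toy stand-in for the real Pos enumeration; ordinals differ from [2].
    enum Pos { Noun, ProperNoun, Verb }

    // Tool-specific tag strings mapped to the shared vocabulary
    static final Map<String, Pos> TAG_MAP = Map.of(
            "NN", Pos.Noun,
            "PN", Pos.ProperNoun,
            "VB", Pos.Verb);

    // Builds the JSON annotation carrying both the lexical tag and the
    // ordinal of the mapped concept
    static String annotate(String tag, double prob) {
        Pos pos = TAG_MAP.get(tag);
        return String.format(
            "{\"tag\":\"%s\",\"pos\":%d," +
            "\"class\":\"org.apache.stanbol.enhancer.nlp.pos.PosTag\"," +
            "\"prob\":%s}",
            tag, pos.ordinal(), prob);
    }

    public static void main(String[] args) {
        System.out.println(annotate("PN", 0.95));
    }
}
```

A DependencyTag could follow the same pattern: keep the tool's relation string (e.g. "nsubj") under one key and the ordinal of a shared relation-type enumeration under another.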

best
Rupert

[1] http://olia.nlp2rdf.org/
[2] http://svn.apache.org/repos/asf/stanbol/trunk/enhancer/generic/nlp/src/main/java/org/apache/stanbol/enhancer/nlp/pos/Pos.java

On Sun, Sep 1, 2013 at 8:09 PM, Cristian Petroaca
<cr...@gmail.com> wrote:
> Sorry, pressed sent too soon :).
>
> Continued :
>
> nsubj(met-4, Mary-1), conj_and(Mary-1, Tom-3), nsubj(met-4, Tom-3),
> root(ROOT-0, met-4), nn(today-6, Danny-5), tmod(met-4, today-6)]
>
> Given this, we can have for each "Token" an additional dependency
> annotation :
>
> "stanbol.enhancer.nlp.dependency" : {
> "tag" : //is it necessary?
> "relations" : [ { "type" : "nsubj", //type of relation
>   "role" : "gov/dep", //whether it is depender or the dependee
>   "dependencyValue" : "met", // the word with which the token has a relation
>   "dependencyIndexInSentence" : "2" //the index of the dependency in the
> current sentence
> }
> ...
> ]
>                 "class" :
> "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag"
>         }
>
> 2013/9/1 Cristian Petroaca <cr...@gmail.com>
>
>> Related to the Stanford Dependency Tree Feature, this is the way the
>> output from the tool looks like for this sentence : "Mary and Tom met Danny
>> today" :
>>
>>
>> 2013/8/30 Cristian Petroaca <cr...@gmail.com>
>>
>>> Hi Rupert,
>>>
>>> Ok, so after looking at the JSON output from the Stanford NLP Server and
>>> the coref module I'm thinking I can represent the coreference information
>>> this way:
>>> Each "Token" or "Chunk" will contain an additional coref annotation with
>>> the following structure :
>>>
>>> "stanbol.enhancer.nlp.coref" {
>>>     "tag" : //does this need to exist?
>>>     "isRepresentative" : true/false, // whether this token or chunk is
>>> the representative mention in the chain
>>>     "mentions" : [ { "sentenceNo" : 1 //the sentence in which the mention
>>> is found
>>>                            "startWord" : 2 //the first word making up the
>>> mention
>>>                            "endWord" : 3 //the last word making up the
>>> mention
>>>                          }, ...
>>>                        ],
>>>     "class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
>>> }
>>>
>>> The CorefTag should resemble this model.
>>>
>>> What do you think?
>>>
>>> Cristian
>>>
>>>
>>> 2013/8/24 Rupert Westenthaler <ru...@gmail.com>
>>>
>>>> Hi Cristian,
>>>>
>>>> you can not directly call StanfordNLP components from Stanbol, but you
>>>> have to extend the RESTful service to include the information you
>>>> need. The main reason for that is that the license of StanfordNLP is
>>>> not compatible with the Apache Software License. So Stanbol can not
>>>> directly link to the StanfordNLP API.
>>>>
>>>> You will need to
>>>>
>>>> 1. define an additional class {yourTag} extends Tag<{yourType}> class
>>>> in the o.a.s.enhancer.nlp module
>>>> 2. add JSON parsing and serialization support for this tag to the
>>>> o.a.s.enhancer.nlp.json module (see e.g. PosTagSupport as an example)
>>>>
>>>> As (1) would be necessary anyway the only additional thing you need to
>>>> develop is (2). After that you can add {yourTag} instance to the
>>>> AnalyzedText in the StanfornNLP integration. The
>>>> RestfulNlpAnalysisEngine will parse them from the response. All
>>>> engines executed after the RestfulNlpAnalysisEngine will have access
>>>> to your annotations.
>>>>
>>>> If you have a design for {yourTag} - the model you would like to use
>>>> to represent your data - I can help with (1) and (2).
>>>>
>>>> best
>>>> Rupert
>>>>
>>>>
>>>> On Fri, Aug 23, 2013 at 5:11 PM, Cristian Petroaca
>>>> <cr...@gmail.com> wrote:
>>>> > Hi Rupert,
>>>> >
>>>> > Thanks for the info. Looking at the standbol-stanfordnlp project I see
>>>> that
>>>> > the stanford nlp is not implemented as an EnhancementEngine but rather
>>>> it
>>>> > is used directly in a Jetty Server instance. How does that fit into the
>>>> > Stanbol stack? For example how can I call the StanfordNlpAnalyzer's
>>>> routine
>>>> > from my TripleExtractionEnhancementEngine which lives in the Stanbol
>>>> stack?
>>>> >
>>>> > Thanks,
>>>> > Cristian
>>>> >
>>>> >
>>>> > 2013/8/12 Rupert Westenthaler <ru...@gmail.com>
>>>> >
>>>> >> Hi Cristian,
>>>> >>
>>>> >> Sorry for the late response, but I was offline for the last two weeks
>>>> >>
>>>> >> On Fri, Aug 2, 2013 at 9:19 PM, Cristian Petroaca
>>>> >> <cr...@gmail.com> wrote:
>>>> >> > Hi Rupert,
>>>> >> >
>>>> >> > After doing some tests it seems that the Stanford NLP coreference
>>>> module
>>>> >> is
>>>> >> > much more accurate than the Open NLP one.So I decided to extend
>>>> Stanford
>>>> >> > NLP to add coreference there.
>>>> >>
>>>> >> The Stanford NLP integration is not part of the Stanbol codebase
>>>> >> because the licenses are not compatible.
>>>> >>
>>>> >> You can find the Stanford NLP integration on
>>>> >>
>>>> >>     https://github.com/westei/stanbol-stanfordnlp
>>>> >>
>>>> >> just create a fork and send pull requests.
>>>> >>
>>>> >>
>>>> >> > Could you add the necessary projects on the branch? And also remove
>>>> the
>>>> >> > Open NLP ones?
>>>> >> >
>>>> >>
>>>> >> Currently the branch
>>>> >>
>>>> >>
>>>> >>
>>>> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
>>>> >>
>>>> >> only contains the "nlp" and the "nlp-json" modules. IMO those should
>>>> >> be enough for adding coreference support.
>>>> >>
>>>> >> IMO you will need to
>>>> >>
>>>> >> * add an model for representing coreference to the nlp module
>>>> >> * add parsing and serializing support to the nlp-json module
>>>> >> * add the implementation to your fork of the stanbol-stanfordnlp
>>>> project
>>>> >>
>>>> >> best
>>>> >> Rupert
>>>> >>
>>>> >>
>>>> >>
>>>> >> > Thanks,
>>>> >> > Cristian
>>>> >> >
>>>> >> >
>>>> >> > 2013/7/5 Rupert Westenthaler <ru...@gmail.com>
>>>> >> >
>>>> >> >> Hi Cristian,
>>>> >> >>
>>>> >> >> I created the branch at
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> >>
>>>> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
>>>> >> >>
>>>> >> >> ATM in contains only the "nlp" and "nlp-json" module. Let me know
>>>> if
>>>> >> >> you would like to have more
>>>> >> >>
>>>> >> >> best
>>>> >> >> Rupert
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> >> >> On Thu, Jul 4, 2013 at 10:14 AM, Cristian Petroaca
>>>> >> >> <cr...@gmail.com> wrote:
>>>> >> >> > Hi Rupert,
>>>> >> >> >
>>>> >> >> > I created jiras :
>>>> https://issues.apache.org/jira/browse/STANBOL-1132and
>>>> >> >> > https://issues.apache.org/jira/browse/STANBOL-1133. The
>>>> original one
>>>> >> in
>>>> >> >> > dependent upon these.
>>>> >> >> > Please let me know when I can start using the branch.
>>>> >> >> >
>>>> >> >> > Thanks,
>>>> >> >> > Cristian
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > 2013/6/27 Cristian Petroaca <cr...@gmail.com>
>>>> >> >> >
>>>> >> >> >>
>>>> >> >> >>
>>>> >> >> >>
>>>> >> >> >> 2013/6/27 Rupert Westenthaler <ru...@gmail.com>
>>>> >> >> >>
>>>> >> >> >>> On Thu, Jun 27, 2013 at 3:12 PM, Cristian Petroaca
>>>> >> >> >>> <cr...@gmail.com> wrote:
>>>> >> >> >>> > Sorry, I meant the Stanbol NLP API, not Stanford in my
>>>> previous
>>>> >> >> e-mail.
>>>> >> >> >>> By
>>>> >> >> >>> > the way, does Open NLP have the ability to build dependency
>>>> trees?
>>>> >> >> >>> >
>>>> >> >> >>>
>>>> >> >> >>> AFAIK OpenNLP does not provide this feature.
>>>> >> >> >>>
>>>> >> >> >>
>>>> >> >> >>> >> Then, since the Stanford NLP lib is also integrated into
>>>> Stanbol,
>>>> >> I'll
>>>> >> >> >> take a look at how I can extend its integration to include the
>>>> >> >> dependency
>>>> >> >> >> tree feature.
>>>> >> >> >>
>>>> >> >> >>>
>>>> >> >> >>>
>>>> >> >> >>  >
>>>> >> >> >>> > 2013/6/23 Cristian Petroaca <cr...@gmail.com>
>>>> >> >> >>> >
>>>> >> >> >>> >> Hi Rupert,
>>>> >> >> >>> >>
>>>> >> >> >>> >> I created jira
>>>> >> https://issues.apache.org/jira/browse/STANBOL-1121.
>>>> >> >> >>> >> As you suggested I would start with extending the Stanford
>>>> NLP
>>>> >> with
>>>> >> >> >>> >> co-reference resolution but I think also with dependency
>>>> trees
>>>> >> >> because
>>>> >> >> >>> I
>>>> >> >> >>> >> also need to know the Subject of the sentence and the object
>>>> >> that it
>>>> >> >> >>> >> affects, right?
>>>> >> >> >>> >>
>>>> >> >> >>> >> Given that I need to extend the Stanford NLP API in Stanbol
>>>> for
>>>> >> >> >>> >> co-reference and dependency trees, how do I proceed with
>>>> this?
>>>> >> Do I
>>>> >> >> >>> create
>>>> >> >> >>> >> 2 new sub-tasks to the already opened Jira? After that can I
>>>> >> start
>>>> >> >> >>> >> implementing on my local copy of Stanbol and when I'm done
>>>> I'll
>>>> >> send
>>>> >> >> >>> you
>>>> >> >> >>> >> guys the patch for review?
>>>> >> >> >>> >>
>>>> >> >> >>>
>>>> >> >> >>> I would create two "New Feature" type Issues one for adding
>>>> support
>>>> >> >> >>> for "dependency trees" and the other for "co-reference"
>>>> support. You
>>>> >> >> >>> should also define "depends on" relations between STANBOL-1121
>>>> and
>>>> >> >> >>> those two new issues.
>>>> >> >> >>>
>>>> >> >> >>> Sub-task could also work, but as adding those features would
>>>> be also
>>>> >> >> >>> interesting for other things I would rather define them as
>>>> separate
>>>> >> >> >>> issues.
>>>> >> >> >>>
>>>> >> >> >>>
>>>> >> >> >> 2 New Features connected with the original jira it is then.
>>>> >> >> >>
>>>> >> >> >>
>>>> >> >> >>> If you would prefer to work in an own branch please tell me.
>>>> This
>>>> >> >> >>> could have the advantage that patches would not be affected by
>>>> >> changes
>>>> >> >> >>> in the trunk.
>>>> >> >> >>>
>>>> >> >> >>> Yes, a separate branch sounds good.
>>>> >> >> >>
>>>> >> >> >> best
>>>> >> >> >>> Rupert
>>>> >> >> >>>
>>>> >> >> >>> >> Regards,
>>>> >> >> >>> >> Cristian
>>>> >> >> >>> >>
>>>> >> >> >>> >>
>>>> >> >> >>> >> 2013/6/18 Rupert Westenthaler <
>>>> rupert.westenthaler@gmail.com>
>>>> >> >> >>> >>
>>>> >> >> >>> >>> On Mon, Jun 17, 2013 at 10:18 PM, Cristian Petroaca
>>>> >> >> >>> >>> <cr...@gmail.com> wrote:
>>>> >> >> >>> >>> > Hi Rupert,
>>>> >> >> >>> >>> >
>>>> >> >> >>> >>> > Agreed on the
>>>> >> >> >>> SettingAnnotation/ParticipantAnnotation/OccurentAnnotation
>>>> >> >> >>> >>> > data structure.
>>>> >> >> >>> >>> >
>>>> >> >> >>> >>> > Should I open up a Jira for all of this in order to
>>>> >> encapsulate
>>>> >> >> this
>>>> >> >> >>> >>> > information and establish the goals and these initial
>>>> steps
>>>> >> >> towards
>>>> >> >> >>> >>> these
>>>> >> >> >>> >>> > goals?
>>>> >> >> >>> >>>
>>>> >> >> >>> >>> Yes please. A JIRA issue for this work would be great.
>>>> >> >> >>> >>>
>>>> >> >> >>> >>> > How should I proceed further? Should I create some design
>>>> >> >> documents
>>>> >> >> >>> that
>>>> >> >> >>> >>> > need to be reviewed?
>>>> >> >> >>> >>>
>>>> >> >> >>> >>> Usually it is the best to write design related text
>>>> directly in
>>>> >> >> JIRA
>>>> >> >> >>> >>> by using Markdown [1] syntax. This will allow us later to
>>>> use
>>>> >> this
>>>> >> >> >>> >>> text directly for the documentation on the Stanbol Webpage.
>>>> >> >> >>> >>>
>>>> >> >> >>> >>> best
>>>> >> >> >>> >>> Rupert
>>>> >> >> >>> >>>
>>>> >> >> >>> >>>
>>>> >> >> >>> >>> [1] http://daringfireball.net/projects/markdown/
>>>> >> >> >>> >>> >
>>>> >> >> >>> >>> > Regards,
>>>> >> >> >>> >>> > Cristian
>>>> >> >> >>> >>> >
>>>> >> >> >>> >>> >
>>>> >> >> >>> >>> > 2013/6/17 Rupert Westenthaler <
>>>> rupert.westenthaler@gmail.com>
>>>> >> >> >>> >>> >
>>>> >> >> >>> >>> >> On Thu, Jun 13, 2013 at 8:22 PM, Cristian Petroaca
>>>> >> >> >>> >>> >> <cr...@gmail.com> wrote:
>>>> >> >> >>> >>> >> > HI Rupert,
>>>> >> >> >>> >>> >> >
>>>> >> >> >>> >>> >> > First of all thanks for the detailed suggestions.
>>>> >> >> >>> >>> >> >
>>>> >> >> >>> >>> >> > 2013/6/12 Rupert Westenthaler <
>>>> >> rupert.westenthaler@gmail.com>
>>>> >> >> >>> >>> >> >
>>>> >> >> >>> >>> >> >> Hi Cristian, all
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> really interesting use case!
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> In this mail I will try to give some suggestions on
>>>> how
>>>> >> this
>>>> >> >> >>> could
>>>> >> >> >>> >>> >> >> work out. This suggestions are mainly based on
>>>> experiences
>>>> >> >> and
>>>> >> >> >>> >>> lessons
>>>> >> >> >>> >>> >> >> learned in the LIVE [2] project where we built an
>>>> >> information
>>>> >> >> >>> system
>>>> >> >> >>> >>> >> >> for the Olympic Games in Peking. While this Project
>>>> >> excluded
>>>> >> >> the
>>>> >> >> >>> >>> >> >> extraction of Events from unstructured text (because
>>>> the
>>>> >> >> Olympic
>>>> >> >> >>> >>> >> >> Information System was already providing event data
>>>> as XML
>>>> >> >> >>> messages)
>>>> >> >> >>> >>> >> >> the semantic search capabilities of this system
>>>> were very
>>>> >> >> >>> similar
>>>> >> >> >>> >>> to
>>>> >> >> >>> >>> >> >> the one described by your use case.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> IMHO you are not only trying to extract relations,
>>>> but a
>>>> >> >> formal
>>>> >> >> >>> >>> >> >> representation of the situation described by the
>>>> text. So
>>>> >> >> lets
>>>> >> >> >>> >>> assume
>>>> >> >> >>> >>> >> >> that the goal is to Annotate a Setting (or Situation)
>>>> >> >> described
>>>> >> >> >>> in
>>>> >> >> >>> >>> the
>>>> >> >> >>> >>> >> >> text - a fise:SettingAnnotation.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> The DOLCE foundational ontology [1] gives some
>>>> advice on
>>>> >> >> how to
>>>> >> >> >>> >>> model
>>>> >> >> >>> >>> >> >> those. The important relation for modeling this
>>>> >> >> is Participation:
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >>     PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t))
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> where ..
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >>  * ED are Endurants (continuants): Endurants do have
>>>> an
>>>> >> >> >>> identity so
>>>> >> >> >>> >>> we
>>>> >> >> >>> >>> >> >> would typically refer to them as Entities referenced
>>>> by a
>>>> >> >> >>> setting.
>>>> >> >> >>> >>> >> >> Note that this includes physical, non-physical as
>>>> well as
>>>> >> >> >>> >>> >> >> social-objects.
>>>> >> >> >>> >>> >> >>  * PD are Perdurants (occurrents):  Perdurants are
>>>> >> entities
>>>> >> >> that
>>>> >> >> >>> >>> >> >> happen in time. This refers to Events, Activities ...
>>>> >> >> >>> >>> >> >>  * PC are Participation: It is an time indexed
>>>> relation
>>>> >> where
>>>> >> >> >>> >>> >> >> Endurants participate in Perdurants
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> Modeling this in RDF requires to define some
>>>> intermediate
>>>> >> >> >>> resources
>>>> >> >> >>> >>> >> >> because RDF does not allow for n-ary relations.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >>  * fise:SettingAnnotation: It is really handy to
>>>> define
>>>> >> one
>>>> >> >> >>> resource
>>>> >> >> >>> >>> >> >> being the context for all described data. I would
>>>> call
>>>> >> this
>>>> >> >> >>> >>> >> >> "fise:SettingAnnotation" and define it as a
>>>> sub-concept to
>>>> >> >> >>> >>> >> >> fise:Enhancement. All further enhancement about the
>>>> >> extracted
>>>> >> >> >>> >>> Setting
>>>> >> >> >>> >>> >> >> would define a "fise:in-setting" relation to it.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >>  * fise:ParticipantAnnotation: Is used to annotate
>>>> that
>>>> >> >> >>> Endurant is
>>>> >> >> >>> >>> >> >> participating on a setting (fise:in-setting
>>>> >> >> >>> fise:SettingAnnotation).
>>>> >> >> >>> >>> >> >> The Endurant itself is described by existing
>>>> >> >> fise:TextAnnotaion
>>>> >> >> >>> (the
>>>> >> >> >>> >>> >> >> mentions) and fise:EntityAnnotation (suggested
>>>> Entities).
>>>> >> >> >>> Basically
>>>> >> >> >>> >>> >> >> the fise:ParticipantAnnotation will allow an
>>>> >> >> EnhancementEngine
>>>> >> >> >>> to
>>>> >> >> >>> >>> >> >> state that several mentions (in possible different
>>>> >> >> sentences) do
>>>> >> >> >>> >>> >> >> represent the same Endurant as participating in the
>>>> >> Setting.
>>>> >> >> In
>>>> >> >> >>> >>> >> >> addition it would be possible to use the dc:type
>>>> property
>>>> >> >> >>> (similar
>>>> >> >> >>> >>> to
>>>> >> >> >>> >>> >> >> for fise:TextAnnotation) to refer to the role(s) of
>>>> an
>>>> >> >> >>> participant
>>>> >> >> >>> >>> >> >> (e.g. the set: Agent (intentionally performs an
>>>> action)
>>>> >> Cause
>>>> >> >> >>> >>> >> >> (unintentionally e.g. a mud slide), Patient (a
>>>> passive
>>>> >> role
>>>> >> >> in
>>>> >> >> >>> an
>>>> >> >> >>> >>> >> >> activity) and Instrument (aids an process)), but I am
>>>> >> >> wondering
>>>> >> >> >>> if
>>>> >> >> >>> >>> one
>>>> >> >> >>> >>> >> >> could extract those information.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> * fise:OccurrentAnnotation: is used to annotate a
>>>> >> Perdurant
>>>> >> >> in
>>>> >> >> >>> the
>>>> >> >> >>> >>> >> >> context of the Setting. Also
>>>> fise:OccurrentAnnotation can
>>>> >> >> link
>>>> >> >> >>> to
>>>> >> >> >>> >>> >> >> fise:TextAnnotaion (typically verbs in the text
>>>> defining
>>>> >> the
>>>> >> >> >>> >>> >> >> perdurant) as well as fise:EntityAnnotation
>>>> suggesting
>>>> >> well
>>>> >> >> >>> known
>>>> >> >> >>> >>> >> >> Events in a knowledge base (e.g. a Election in a
>>>> country,
>>>> >> or
>>>> >> >> an
>>>> >> >> >>> >>> >> >> upraising ...). In addition fise:OccurrentAnnotation
>>>> can
>>>> >> >> define
>>>> >> >> >>> >>> >> >> dc:has-participant links to
>>>> fise:ParticipantAnnotation. In
>>>> >> >> this
>>>> >> >> >>> case
>>>> >> >> >>> >>> >> >> it is explicitly stated that an Endurant (the
>>>> >> >> >>> >>> >> >> fise:ParticipantAnnotation) is involved in this
>>>> Perdurant
>>>> >> (the
>>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation). As Occurrences are
>>>> temporally
>>>> >> >> indexed
>>>> >> >> >>> this
>>>> >> >> >>> >>> >> >> annotation should also support properties for
>>>> defining the
>>>> >> >> >>> >>> >> >> xsd:dateTime for the start/end.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> Indeed, an event based data structure makes a lot of
>>>> sense
>>>> >> >> with
>>>> >> >> >>> the
>>>> >> >> >>> >>> >> remark
>>>> >> >> >>> >>> >> > that you probably won't be able to always extract the
>>>> date
>>>> >> >> for a
>>>> >> >> >>> >>> given
>>>> >> >> >>> >>> >> > setting(situation).
>>>> >> >> >>> >>> >> > There are 2 things which are unclear though.
>>>> >> >> >>> >>> >> >
>>>> >> >> >>> >>> >> > 1. Perdurant : You could have situations in which the
>>>> >> object
>>>> >> >> upon
>>>> >> >> >>> >>> which
>>>> >> >> >>> >>> >> the
>>>> >> >> >>> >>> >> > Subject ( or Endurant ) is acting is not a transitory
>>>> >> object (
>>>> >> >> >>> such
>>>> >> >> >>> >>> as an
>>>> >> >> >>> >>> >> > event, activity ) but rather another Endurant. For
>>>> example
>>>> >> we
>>>> >> >> can
>>>> >> >> >>> >>> have
>>>> >> >> >>> >>> >> the
>>>> >> >> >>> >>> >> > phrase "USA invades Irak" where "USA" is the Endurant
>>>> (
>>>> >> >> Subject )
>>>> >> >> >>> >>> which
>>>> >> >> >>> >>> >> > performs the action of "invading" on another
>>>> Eundurant,
>>>> >> namely
>>>> >> >> >>> >>> "Irak".
>>>> >> >> >>> >>> >> >
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> By using CAOS, USA would be the Agent and Iraq the
>>>> Patient.
>>>> >> Both
>>>> >> >> >>> are
>>>> >> >> >>> >>> >> Endurants. The activity "invading" would be the
>>>> Perdurant. So
>>>> >> >> >>> ideally
>>>> >> >> >>> >>> >> you would have a  "fise:SettingAnnotation" with:
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >>   * fise:ParticipantAnnotation for USA with the dc:type
>>>> >> >> caos:Agent,
>>>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "USA" and a
>>>> >> >> >>> fise:EntityAnnotation
>>>> >> >> >>> >>> >> linking to dbpedia:United_States
>>>> >> >> >>> >>> >>   * fise:ParticipantAnnotation for Iraq with the dc:type
>>>> >> >> >>> caos:Patient,
>>>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "Irak" and a
>>>> >> >> >>> >>> >> fise:EntityAnnotation linking to  dbpedia:Iraq
>>>> >> >> >>> >>> >>   * fise:OccurrentAnnotation for "invades" with the
>>>> dc:type
>>>> >> >> >>> >>> >> caos:Activity, linking to a fise:TextAnnotation for
>>>> "invades"
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> > 2. Where does the verb, which links the Subject and
>>>> the
>>>> >> Object
>>>> >> >> >>> come
>>>> >> >> >>> >>> into
>>>> >> >> >>> >>> >> > this? I imagined that the Endurant would have a
>>>> >> dc:"property"
>>>> >> >> >>> where
>>>> >> >> >>> >>> the
>>>> >> >> >>> >>> >> > property = verb which links to the Object in noun
>>>> form. For
>>>> >> >> >>> example
>>>> >> >> >>> >>> take
>>>> >> >> >>> >>> >> > again the sentence "USA invades Irak". You would have
>>>> the
>>>> >> >> "USA"
>>>> >> >> >>> >>> Entity
>>>> >> >> >>> >>> >> with
>>>> >> >> >>> >>> >> > dc:invader which points to the Object "Irak". The
>>>> Endurant
>>>> >> >> would
>>>> >> >> >>> >>> have as
>>>> >> >> >>> >>> >> > many dc:"property" elements as there are verbs which
>>>> link
>>>> >> it
>>>> >> >> to
>>>> >> >> >>> an
>>>> >> >> >>> >>> >> Object.
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> As explained above you would have a
>>>> fise:OccurrentAnnotation
>>>> >> >> that
>>>> >> >> >>> >>> >> represents the Perdurant. The information that the
>>>> activity
>>>> >> >> >>> mention in
>>>> >> >> >>> >>> >> the text is "invades" would be by linking to a
>>>> >> >> >>> fise:TextAnnotation. If
>>>> >> >> >>> >>> >> you can also provide an Ontology for Tasks that defines
>>>> >> >> >>> >>> >> "myTasks:invade" the fise:OccurrentAnnotation could
>>>> also link
>>>> >> >> to an
>>>> >> >> >>> >>> >> fise:EntityAnnotation for this concept.
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> best
>>>> >> >> >>> >>> >> Rupert
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> >
>>>> >> >> >>> >>> >> > ### Consuming the data:
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> I think this model should be sufficient for
>>>> use-cases as
>>>> >> >> >>> described
>>>> >> >> >>> >>> by
>>>> >> >> >>> >>> >> you.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> Users would be able to consume data on the setting
>>>> level.
>>>> >> >> This
>>>> >> >> >>> can
>>>> >> >> >>> >>> be
>>>> >> >> >>> >>> >> >> done by simply retrieving all
>>>> fise:ParticipantAnnotation
>>>> >> as
>>>> >> >> >>> well as
>>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation linked with a setting. BTW
>>>> this
>>>> >> was
>>>> >> >> the
>>>> >> >> >>> >>> >> >> approach used in LIVE [2] for semantic search. It
>>>> allows
>>>> >> >> >>> queries for
>>>> >> >> >>> >>> >> >> Settings that involve specific Entities e.g. you
>>>> could
>>>> >> filter
>>>> >> >> >>> for
>>>> >> >> >>> >>> >> >> Settings that involve a {Person},
>>>> activities:Arrested and
>>>> >> a
>>>> >> >> >>> specific
>>>> >> >> >>> >>> >> >> {Upraising}. However note that with this approach
>>>> you will
>>>> >> >> get
>>>> >> >> >>> >>> results
>>>> >> >> >>> >>> >> >> for Settings where the {Person} participated and
>>>> another
>>>> >> >> person
>>>> >> >> >>> was
>>>> >> >> >>> >>> >> >> arrested.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> An other possibility would be to process enhancement
>>>> >> results
>>>> >> >> on
>>>> >> >> >>> the
>>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation. This would allow to a much
>>>> >> higher
>>>> >> >> >>> >>> >> >> granularity level (e.g. it would allow to correctly
>>>> answer
>>>> >> >> the
>>>> >> >> >>> query
>>>> >> >> >>> >>> >> >> used as an example above). But I am wondering if the
>>>> >> quality
>>>> >> >> of
>>>> >> >> >>> the
>>>> >> >> >>> >>> >> >> Setting extraction will be sufficient for this. I
>>>> have
>>>> >> also
>>>> >> >> >>> doubts
>>>> >> >> >>> >>> if
>>>> >> >> >>> >>> >> >> this can be still realized by using semantic
>>>> indexing to
>>>> >> >> Apache
>>>> >> >> >>> Solr
>>>> >> >> >>> >>> >> >> or if it would be better/necessary to store results
>>>> in a
>>>> >> >> >>> TripleStore
>>>> >> >> >>> >>> >> >> and using SPARQL for retrieval.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> The methodology and query language used by YAGO [3]
>>>> is
>>>> >> also
>>>> >> >> very
>>>> >> >> >>> >>> >> >> relevant for this (especially note chapter 7 SPOTL(X)
>>>> >> >> >>> >>> Representation).
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> An other related Topic is the enrichment of Entities
>>>> >> >> (especially
>>>> >> >> >>> >>> >> >> Events) in knowledge bases based on Settings
>>>> extracted
>>>> >> form
>>>> >> >> >>> >>> Documents.
>>>> >> >> >>> >>> >> >> As per definition - in DOLCE - Perdurants are
>>>> temporally
>>>> >> >> indexed.
>>>> >> >> >>> That
>>>> >> >> >>> >>> >> >> means that at the time when added to a knowledge
>>>> base they
>>>> >> >> might
>>>> >> >> >>> >>> still
>>>> >> >> >>> >>> >> >> be in process. So the creation, enriching and
>>>> refinement
>>>> >> of
>>>> >> >> such
>>>> >> >> >>> >>> >> >> Entities in a the knowledge base seams to be
>>>> critical for
>>>> >> a
>>>> >> >> >>> System
>>>> >> >> >>> >>> >> >> like described in your use-case.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> On Tue, Jun 11, 2013 at 9:09 PM, Cristian Petroaca
>>>> >> >> >>> >>> >> >> <cr...@gmail.com> wrote:
>>>> >> >> >>> >>> >> >> >
>>>> >> >> >>> >>> >> >> > First of all I have to mention that I am new in the
>>>> >> field
>>>> >> >> of
>>>> >> >> >>> >>> semantic
>>>> >> >> >>> >>> >> >> > technologies, I've started to read about them in
>>>> the
>>>> >> last
>>>> >> >> 4-5
>>>> >> >> >>> >>> >> >> months.Having
>>>> >> >> >>> >>> >> >> > said that I have a high level overview of what is
>>>> a good
>>>> >> >> >>> approach
>>>> >> >> >>> >>> to
>>>> >> >> >>> >>> >> >> solve
>>>> >> >> >>> >>> >> >> > this problem. There are a number of papers on the
>>>> >> internet
>>>> >> >> >>> which
>>>> >> >> >>> >>> >> describe
>>>> >> >> >>> >>> >> >> > what steps need to be taken such as : named entity
>>>> >> >> >>> recognition,
>>>> >> >> >>> >>> >> >> > co-reference resolution, pos tagging and others.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> The Stanbol NLP processing module currently only
>>>> supports
>>>> >> >> >>> sentence
>>>> >> >> >>> >>> >> >> detection, tokenization, POS tagging, Chunking, NER
>>>> and
>>>> >> >> lemma.
>>>> >> >> >>> >>> support
>>>> >> >> >>> >>> >> >> for co-reference resolution and dependency trees is
>>>> >> currently
>>>> >> >> >>> >>> missing.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> Stanford NLP is already integrated with Stanbol [4].
>>>> At
>>>> >> the
>>>> >> >> >>> moment
>>>> >> >> >>> >>> it
>>>> >> >> >>> >>> >> >> only supports English, but I do already work to
>>>> include
>>>> >> the
>>>> >> >> >>> other
>>>> >> >> >>> >>> >> >> supported languages. Other NLP frameworks that are
>>>> already
>>>> >> >> >>> integrated
>>>> >> >> >>> >>> >> >> with Stanbol are Freeling [5] and Talismane [6]. But
>>>> note
>>>> >> >> that
>>>> >> >> >>> for
>>>> >> >> >>> >>> all
>>>> >> >> >>> >>> >> >> those the integration excludes support for
>>>> co-reference
>>>> >> and
>>>> >> >> >>> >>> dependency
>>>> >> >> >>> >>> >> >> trees.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> Anyways I am confident that one can implement a first
>>>> >> >> prototype
>>>> >> >> >>> by
>>>> >> >> >>> >>> >> >> only using Sentences and POS tags and - if available
>>>> -
>>>> >> Chunks
>>>> >> >> >>> (e.g.
>>>> >> >> >>> >>> >> >> Noun phrases).
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> > I assume that in the Stanbol context, a feature like
>>>> >> Relation
>>>> >> >> >>> >>> extraction
>>>> >> >> >>> >>> >> > would be implemented as an EnhancementEngine?
>>>> >> >> >>> >>> >> > What kind of effort would be required for a
>>>> co-reference
>>>> >> >> >>> resolution
>>>> >> >> >>> >>> tool
>>>> >> >> >>> >>> >> > integration into Stanbol?
>>>> >> >> >>> >>> >> >
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> Yes in the end it would be an EnhancementEngine. But
>>>> before
>>>> >> we
>>>> >> >> can
>>>> >> >> >>> >>> >> build such an engine we would need to
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> * extend the Stanbol NLP processing API with
>>>> Annotations for
>>>> >> >> >>> >>> co-reference
>>>> >> >> >>> >>> >> * add support for JSON Serialisation/Parsing for those
>>>> >> >> annotation
>>>> >> >> >>> so
>>>> >> >> >>> >>> >> that the RESTful NLP Analysis Service can provide
>>>> >> co-reference
>>>> >> >> >>> >>> >> information
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> > At this moment I'll be focusing on 2 aspects:
>>>> >> >> >>> >>> >> >
>>>> >> >> >>> >>> >> > 1. Determine the best data structure to encapsulate
>>>> the
>>>> >> >> extracted
>>>> >> >> >>> >>> >> > information. I'll take a closer look at Dolce.
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> Don't make it too complex. Defining a proper structure to
>>>> >> >> represent
>>>> >> >> >>> >>> >> Events will only pay-off if we can also successfully
>>>> extract
>>>> >> >> such
>>>> >> >> >>> >>> >> information from processed texts.
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> I would start with
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >>  * fise:SettingAnnotation
>>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >>  * fise:ParticipantAnnotation
>>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>>>> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
>>>> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
>>>> >> >> >>> >>> >>     * fise:suggestion {entityAnnotation} (multiple if
>>>> there
>>>> >> are
>>>> >> >> >>> more
>>>> >> >> >>> >>> >> suggestions)
>>>> >> >> >>> >>> >>     * dc:type one of fise:Agent, fise:Patient,
>>>> >> fise:Instrument,
>>>> >> >> >>> >>> fise:Cause
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >>  * fise:OccurrentAnnotation
>>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>>>> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
>>>> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
>>>> >> >> >>> >>> >>     * dc:type set to fise:Activity
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> If it turns out that we can extract more, we can add
>>>> more
>>>> >> >> >>> structure to
>>>> >> >> >>> >>> >> those annotations. We might also think about using an
>>>> own
>>>> >> >> namespace
>>>> >> >> >>> >>> >> for those extensions to the annotation structure.
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> > 2. Determine how should all of this be integrated into
>>>> >> >> Stanbol.
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> Just create an EventExtractionEngine and configure an
>>>> >> enhancement
>>>> >> >> >>> chain
>>>> >> >> >>> >>> >> that does NLP processing and EntityLinking.
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> You should have a look at
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> * SentimentSummarizationEngine [1] as it does a lot of
>>>> things
>>>> >> >> with
>>>> >> >> >>> NLP
>>>> >> >> >>> >>> >> processing results (e.g. connecting adjectives (via
>>>> verbs) to
>>>> >> >> >>> >>> >> nouns/pronouns. So as long we can not use explicit
>>>> dependency
>>>> >> >> trees
>>>> >> >> >>> >>> >> your code will need to do similar things with Nouns,
>>>> Pronouns
>>>> >> and
>>>> >> >> >>> >>> >> Verbs.
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> * Disambiguation-MLT engine, as it creates a Java
>>>> >> >> representation
>>>> >> >> >>> of
>>>> >> >> >>> >>> >> present fise:TextAnnotation and fise:EntityAnnotation
>>>> [2].
>>>> >> >> >>> Something
>>>> >> >> >>> >>> >> similar will also be required by the
>>>> EventExtractionEngine
>>>> >> for
>>>> >> >> fast
>>>> >> >> >>> >>> >> access to such annotations while iterating over the
>>>> >> Sentences of
>>>> >> >> >>> the
>>>> >> >> >>> >>> >> text.
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> best
>>>> >> >> >>> >>> >> Rupert
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> [1]
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>>
>>>> >> >> >>>
>>>> >> >>
>>>> >>
>>>> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/sentiment-summarization/src/main/java/org/apache/stanbol/enhancer/engines/sentiment/summarize/SentimentSummarizationEngine.java
>>>> >> >> >>> >>> >> [2]
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>>
>>>> >> >> >>>
>>>> >> >>
>>>> >>
>>>> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/disambiguation-mlt/src/main/java/org/apache/stanbol/enhancer/engine/disambiguation/mlt/DisambiguationData.java
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> >
>>>> >> >> >>> >>> >> > Thanks
>>>> >> >> >>> >>> >> >
>>>> >> >> >>> >>> >> > Hope this helps to bootstrap this discussion
>>>> >> >> >>> >>> >> >> best
>>>> >> >> >>> >>> >> >> Rupert
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> --
>>>> >> >> >>> >>> >> >> | Rupert Westenthaler
>>>> >> >> rupert.westenthaler@gmail.com
>>>> >> >> >>> >>> >> >> | Bodenlehenstraße 11
>>>> >> >> >>> ++43-699-11108907
>>>> >> >> >>> >>> >> >> | A-5500 Bischofshofen
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> --
>>>> >> >> >>> >>> >> | Rupert Westenthaler
>>>> >> rupert.westenthaler@gmail.com
>>>> >> >> >>> >>> >> | Bodenlehenstraße 11
>>>> >> >> >>> ++43-699-11108907
>>>> >> >> >>> >>> >> | A-5500 Bischofshofen
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>>
>>>> >> >> >>> >>>
>>>> >> >> >>> >>>
>>>> >> >> >>> >>> --
>>>> >> >> >>> >>> | Rupert Westenthaler
>>>> rupert.westenthaler@gmail.com
>>>> >> >> >>> >>> | Bodenlehenstraße 11
>>>> >> >> ++43-699-11108907
>>>> >> >> >>> >>> | A-5500 Bischofshofen
>>>> >> >> >>> >>>
>>>> >> >> >>> >>
>>>> >> >> >>> >>
>>>> >> >> >>>
>>>> >> >> >>>
>>>> >> >> >>>
>>>> >> >> >>> --
>>>> >> >> >>> | Rupert Westenthaler
>>>> rupert.westenthaler@gmail.com
>>>> >> >> >>> | Bodenlehenstraße 11
>>>> ++43-699-11108907
>>>> >> >> >>> | A-5500 Bischofshofen
>>>> >> >> >>>
>>>> >> >> >>
>>>> >> >> >>
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> >> >> --
>>>> >> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>>>> >> >> | Bodenlehenstraße 11
>>>> ++43-699-11108907
>>>> >> >> | A-5500 Bischofshofen
>>>> >> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>>>> >> | Bodenlehenstraße 11                             ++43-699-11108907
>>>> >> | A-5500 Bischofshofen
>>>> >>
>>>>
>>>>
>>>>
>>>> --
>>>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>>>> | Bodenlehenstraße 11                             ++43-699-11108907
>>>> | A-5500 Bischofshofen
>>>>
>>>
>>>
>>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Relation extraction feature

Posted by Cristian Petroaca <cr...@gmail.com>.
Sorry, pressed send too soon :).

Continued :

[nsubj(met-4, Mary-1), conj_and(Mary-1, Tom-3), nsubj(met-4, Tom-3),
root(ROOT-0, met-4), nn(today-6, Danny-5), tmod(met-4, today-6)]

Given this, each "Token" can carry an additional dependency
annotation:

"stanbol.enhancer.nlp.dependency" : {
"tag" : //is it necessary?
"relations" : [ { "type" : "nsubj", //type of relation
  "role" : "gov/dep", //whether it is depender or the dependee
  "dependencyValue" : "met", // the word with which the token has a relation
  "dependencyIndexInSentence" : "2" //the index of the dependency in the
current sentence
}
...
]
                "class" :
"org.apache.stanbol.enhancer.nlp.dependency.DependencyTag"
        }
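To make the proposal above concrete, here is a minimal Java sketch of how such a tag could be modeled in the o.a.s.enhancer.nlp module. All class and member names below are assumptions for discussion, not the final Stanbol API:

```java
// Hypothetical sketch of the proposed per-Token dependency annotation.
// All names here are assumptions for discussion, not the final Stanbol NLP API.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DependencyTag {

    /** Role of the annotated token within a relation. */
    public enum Role { GOV, DEP }

    /** One grammatical relation, e.g. nsubj(met-4, Mary-1). */
    public static final class Relation {
        public final String type;      // e.g. "nsubj", "conj_and", "tmod"
        public final Role role;        // whether the token is the governor or the dependent
        public final String partner;   // the word the token is related to, e.g. "met"
        public final int partnerIndex; // index of that word in the sentence

        public Relation(String type, Role role, String partner, int partnerIndex) {
            this.type = type;
            this.role = role;
            this.partner = partner;
            this.partnerIndex = partnerIndex;
        }
    }

    private final List<Relation> relations = new ArrayList<>();

    public void addRelation(String type, Role role, String partner, int partnerIndex) {
        relations.add(new Relation(type, role, partner, partnerIndex));
    }

    public List<Relation> getRelations() {
        return Collections.unmodifiableList(relations);
    }
}
```

For "Mary" in the example sentence the tag would hold an nsubj relation with role DEP pointing at "met", plus the conj_and relation pointing at "Tom"; the JSON above would then just be the serialized form of such an object.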

2013/9/1 Cristian Petroaca <cr...@gmail.com>

> Related to the Stanford Dependency Tree Feature, this is the way the
> output from the tool looks like for this sentence : "Mary and Tom met Danny
> today" :
>
>
> 2013/8/30 Cristian Petroaca <cr...@gmail.com>
>
>> Hi Rupert,
>>
>> Ok, so after looking at the JSON output from the Stanford NLP Server and
>> the coref module I'm thinking I can represent the coreference information
>> this way:
>> Each "Token" or "Chunk" will contain an additional coref annotation with
>> the following structure :
>>
>> "stanbol.enhancer.nlp.coref" {
>>     "tag" : //does this need to exist?
>>     "isRepresentative" : true/false, // whether this token or chunk is
>> the representative mention in the chain
>>     "mentions" : [ { "sentenceNo" : 1 //the sentence in which the mention
>> is found
>>                            "startWord" : 2 //the first word making up the
>> mention
>>                            "endWord" : 3 //the last word making up the
>> mention
>>                          }, ...
>>                        ],
>>     "class" : ""class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
>> }
>>
>> The CorefTag should resemble this model.
>>
>> What do you think?
>>
>> Cristian
>>
>>
>> 2013/8/24 Rupert Westenthaler <ru...@gmail.com>
>>
>>> Hi Cristian,
>>>
>>> you can not directly call StanfordNLP components from Stanbol, but you
>>> have to extend the RESTful service to include the information you
>>> need. The main reason for that is that the license of StanfordNLP is
>>> not compatible with the Apache Software License. So Stanbol can not
>>> directly link to the StanfordNLP API.
>>>
>>> You will need to
>>>
>>> 1. define an additional class {yourTag} extends Tag<{yourType}> class
>>> in the o.a.s.enhancer.nlp module
>>> 2. add JSON parsing and serialization support for this tag to the
>>> o.a.s.enhancer.nlp.json module (see e.g. PosTagSupport as an example)
>>>
>>> As (1) would be necessary anyway the only additional thing you need to
>>> develop is (2). After that you can add {yourTag} instance to the
>>> AnalyzedText in the StanfornNLP integration. The
>>> RestfulNlpAnalysisEngine will parse them from the response. All
>>> engines executed after the RestfulNlpAnalysisEngine will have access
>>> to your annotations.
>>>
>>> If you have a design for {yourTag} - the model you would like to use
>>> to represent your data - I can help with (1) and (2).
>>>
>>> best
>>> Rupert
>>>
>>>
>>> On Fri, Aug 23, 2013 at 5:11 PM, Cristian Petroaca
>>> <cr...@gmail.com> wrote:
>>> > Hi Rupert,
>>> >
>>> > Thanks for the info. Looking at the stanbol-stanfordnlp project I see
>>> that
>>> > the Stanford NLP is not implemented as an EnhancementEngine but rather
>>> it
>>> > is used directly in a Jetty Server instance. How does that fit into the
>>> > Stanbol stack? For example how can I call the StanfordNlpAnalyzer's
>>> routine
>>> > from my TripleExtractionEnhancementEngine which lives in the Stanbol
>>> stack?
>>> >
>>> > Thanks,
>>> > Cristian
>>> >
>>> >
>>> > 2013/8/12 Rupert Westenthaler <ru...@gmail.com>
>>> >
>>> >> Hi Cristian,
>>> >>
>>> >> Sorry for the late response, but I was offline for the last two weeks
>>> >>
>>> >> On Fri, Aug 2, 2013 at 9:19 PM, Cristian Petroaca
>>> >> <cr...@gmail.com> wrote:
>>> >> > Hi Rupert,
>>> >> >
>>> >> > After doing some tests it seems that the Stanford NLP coreference
>>> >> > module is much more accurate than the OpenNLP one. So I decided to
>>> >> > extend Stanford NLP to add coreference there.
>>> >>
>>> >> The Stanford NLP integration is not part of the Stanbol codebase
>>> >> because the licenses are not compatible.
>>> >>
>>> >> You can find the Stanford NLP integration on
>>> >>
>>> >>     https://github.com/westei/stanbol-stanfordnlp
>>> >>
>>> >> just create a fork and send pull requests.
>>> >>
>>> >>
>>> >> > Could you add the necessary projects to the branch? And also remove
>>> >> > the OpenNLP ones?
>>> >> >
>>> >>
>>> >> Currently the branch
>>> >>
>>> >>
>>> >>
>>> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
>>> >>
>>> >> only contains the "nlp" and the "nlp-json" modules. IMO those should
>>> >> be enough for adding coreference support.
>>> >>
>>> >> IMO you will need to
>>> >>
>>> >> * add an model for representing coreference to the nlp module
>>> >> * add parsing and serializing support to the nlp-json module
>>> >> * add the implementation to your fork of the stanbol-stanfordnlp
>>> project
>>> >>
>>> >> best
>>> >> Rupert
>>> >>
>>> >>
>>> >>
>>> >> > Thanks,
>>> >> > Cristian
>>> >> >
>>> >> >
>>> >> > 2013/7/5 Rupert Westenthaler <ru...@gmail.com>
>>> >> >
>>> >> >> Hi Cristian,
>>> >> >>
>>> >> >> I created the branch at
>>> >> >>
>>> >> >>
>>> >> >>
>>> >>
>>> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
>>> >> >>
>>> >> >> ATM in contains only the "nlp" and "nlp-json" module. Let me know
>>> if
>>> >> >> you would like to have more
>>> >> >>
>>> >> >> best
>>> >> >> Rupert
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> On Thu, Jul 4, 2013 at 10:14 AM, Cristian Petroaca
>>> >> >> <cr...@gmail.com> wrote:
>>> >> >> > Hi Rupert,
>>> >> >> >
>>> >> >> > I created JIRAs
>>> >> >> > https://issues.apache.org/jira/browse/STANBOL-1132 and
>>> >> >> > https://issues.apache.org/jira/browse/STANBOL-1133. The original
>>> >> >> > one is dependent upon these.
>>> >> >> > Please let me know when I can start using the branch.
>>> >> >> >
>>> >> >> > Thanks,
>>> >> >> > Cristian
>>> >> >> >
>>> >> >> >
>>> >> >> > 2013/6/27 Cristian Petroaca <cr...@gmail.com>
>>> >> >> >
>>> >> >> >>
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> 2013/6/27 Rupert Westenthaler <ru...@gmail.com>
>>> >> >> >>
>>> >> >> >>> On Thu, Jun 27, 2013 at 3:12 PM, Cristian Petroaca
>>> >> >> >>> <cr...@gmail.com> wrote:
>>> >> >> >>> > Sorry, I meant the Stanbol NLP API, not Stanford, in my
>>> >> >> >>> > previous e-mail. By the way, does OpenNLP have the ability to
>>> >> >> >>> > build dependency trees?
>>> >> >> >>> >
>>> >> >> >>>
>>> >> >> >>> AFAIK OpenNLP does not provide this feature.
>>> >> >> >>>
>>> >> >> >>
>>> >> >> >> Then, since the Stanford NLP lib is also integrated into Stanbol,
>>> >> >> >> I'll take a look at how I can extend its integration to include
>>> >> >> >> the dependency tree feature.
>>> >> >> >>
>>> >> >> >>>
>>> >> >> >>>
>>> >> >> >>  >
>>> >> >> >>> > 2013/6/23 Cristian Petroaca <cr...@gmail.com>
>>> >> >> >>> >
>>> >> >> >>> >> Hi Rupert,
>>> >> >> >>> >>
>>> >> >> >>> >> I created jira
>>> >> https://issues.apache.org/jira/browse/STANBOL-1121.
>>> >> >> >>> >> As you suggested, I would start with extending Stanford NLP
>>> >> >> >>> >> with co-reference resolution, but I think also with dependency
>>> >> >> >>> >> trees, because I also need to know the Subject of the sentence
>>> >> >> >>> >> and the object that it affects, right?
>>> >> >> >>> >>
>>> >> >> >>> >> Given that I need to extend the Stanford NLP API in Stanbol
>>> >> >> >>> >> for co-reference and dependency trees, how do I proceed with
>>> >> >> >>> >> this? Do I create 2 new sub-tasks under the already opened
>>> >> >> >>> >> JIRA? After that, can I start implementing on my local copy of
>>> >> >> >>> >> Stanbol and, when I'm done, send you guys the patch for
>>> >> >> >>> >> review?
>>> >> >> >>> >>
>>> >> >> >>>
>>> >> >> >>> I would create two "New Feature" type issues, one for adding
>>> >> >> >>> support for "dependency trees" and the other for "co-reference"
>>> >> >> >>> support. You should also define "depends on" relations between
>>> >> >> >>> STANBOL-1121 and those two new issues.
>>> >> >> >>>
>>> >> >> >>> Sub-tasks could also work, but as adding those features would
>>> >> >> >>> also be interesting for other things, I would rather define them
>>> >> >> >>> as separate issues.
>>> >> >> >>>
>>> >> >> >>>
>>> >> >> >> 2 New Features connected with the original jira it is then.
>>> >> >> >>
>>> >> >> >>
>>> >> >> >>> If you would prefer to work in your own branch, please tell me.
>>> >> >> >>> This could have the advantage that patches would not be affected
>>> >> >> >>> by changes in the trunk.
>>> >> >> >>>
>>> >> >> >>> Yes, a separate branch sounds good.
>>> >> >> >>
>>> >> >> >> best
>>> >> >> >>> Rupert
>>> >> >> >>>
>>> >> >> >>> >> Regards,
>>> >> >> >>> >> Cristian
>>> >> >> >>> >>
>>> >> >> >>> >>
>>> >> >> >>> >> 2013/6/18 Rupert Westenthaler <
>>> rupert.westenthaler@gmail.com>
>>> >> >> >>> >>
>>> >> >> >>> >>> On Mon, Jun 17, 2013 at 10:18 PM, Cristian Petroaca
>>> >> >> >>> >>> <cr...@gmail.com> wrote:
>>> >> >> >>> >>> > Hi Rupert,
>>> >> >> >>> >>> >
>>> >> >> >>> >>> > Agreed on the
>>> >> >> >>> SettingAnnotation/ParticipantAnnotation/OccurentAnnotation
>>> >> >> >>> >>> > data structure.
>>> >> >> >>> >>> >
>>> >> >> >>> >>> > Should I open up a JIRA for all of this, in order to
>>> >> >> >>> >>> > encapsulate this information and establish the goals and
>>> >> >> >>> >>> > the initial steps towards them?
>>> >> >> >>> >>>
>>> >> >> >>> >>> Yes please. A JIRA issue for this work would be great.
>>> >> >> >>> >>>
>>> >> >> >>> >>> > How should I proceed further? Should I create some design
>>> >> >> documents
>>> >> >> >>> that
>>> >> >> >>> >>> > need to be reviewed?
>>> >> >> >>> >>>
>>> >> >> >>> >>> Usually it is best to write design-related text directly in
>>> >> >> >>> >>> JIRA using Markdown [1] syntax. This will allow us to later
>>> >> >> >>> >>> use this text directly for the documentation on the Stanbol
>>> >> >> >>> >>> webpage.
>>> >> >> >>> >>>
>>> >> >> >>> >>> best
>>> >> >> >>> >>> Rupert
>>> >> >> >>> >>>
>>> >> >> >>> >>>
>>> >> >> >>> >>> [1] http://daringfireball.net/projects/markdown/
>>> >> >> >>> >>> >
>>> >> >> >>> >>> > Regards,
>>> >> >> >>> >>> > Cristian
>>> >> >> >>> >>> >
>>> >> >> >>> >>> >
>>> >> >> >>> >>> > 2013/6/17 Rupert Westenthaler <
>>> rupert.westenthaler@gmail.com>
>>> >> >> >>> >>> >
>>> >> >> >>> >>> >> On Thu, Jun 13, 2013 at 8:22 PM, Cristian Petroaca
>>> >> >> >>> >>> >> <cr...@gmail.com> wrote:
>>> >> >> >>> >>> >> > HI Rupert,
>>> >> >> >>> >>> >> >
>>> >> >> >>> >>> >> > First of all thanks for the detailed suggestions.
>>> >> >> >>> >>> >> >
>>> >> >> >>> >>> >> > 2013/6/12 Rupert Westenthaler <
>>> >> rupert.westenthaler@gmail.com>
>>> >> >> >>> >>> >> >
>>> >> >> >>> >>> >> >> Hi Cristian, all
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> >> really interesting use case!
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> >> In this mail I will try to give some suggestions on how
>>> >> >> >>> >>> >> >> this could work out. These suggestions are mainly based
>>> >> >> >>> >>> >> >> on experiences and lessons learned in the LIVE [2]
>>> >> >> >>> >>> >> >> project, where we built an information system for the
>>> >> >> >>> >>> >> >> Olympic Games in Peking. While this project excluded the
>>> >> >> >>> >>> >> >> extraction of Events from unstructured text (because the
>>> >> >> >>> >>> >> >> Olympic Information System was already providing event
>>> >> >> >>> >>> >> >> data as XML messages), the semantic search capabilities
>>> >> >> >>> >>> >> >> of this system were very similar to the one described by
>>> >> >> >>> >>> >> >> your use case.
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> >> IMHO you are not only trying to extract relations, but a
>>> >> >> >>> >>> >> >> formal representation of the situation described by the
>>> >> >> >>> >>> >> >> text. So let's assume that the goal is to annotate a
>>> >> >> >>> >>> >> >> Setting (or Situation) described in the text - a
>>> >> >> >>> >>> >> >> fise:SettingAnnotation.
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> >> The DOLCE foundational ontology [1] gives some advice on
>>> >> >> >>> >>> >> >> how to model those. The important relation for modeling
>>> >> >> >>> >>> >> >> this is Participation:
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> >>     PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t))
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> >> where ..
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> >>  * ED are Endurants (continuants): Endurants have an
>>> >> >> >>> >>> >> >> identity, so we would typically refer to them as
>>> >> >> >>> >>> >> >> Entities referenced by a setting. Note that this
>>> >> >> >>> >>> >> >> includes physical, non-physical as well as social
>>> >> >> >>> >>> >> >> objects.
>>> >> >> >>> >>> >> >>  * PD are Perdurants (occurrents): Perdurants are
>>> >> >> >>> >>> >> >> entities that happen in time. This refers to Events,
>>> >> >> >>> >>> >> >> Activities ...
>>> >> >> >>> >>> >> >>  * PC is Participation: a time-indexed relation where
>>> >> >> >>> >>> >> >> Endurants participate in Perdurants
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> >> Modeling this in RDF requires defining some intermediate
>>> >> >> >>> >>> >> >> resources because RDF does not allow for n-ary
>>> >> >> >>> >>> >> >> relations.
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> >>  * fise:SettingAnnotation: It is really handy to define
>>> >> >> >>> >>> >> >> one resource being the context for all described data. I
>>> >> >> >>> >>> >> >> would call this "fise:SettingAnnotation" and define it
>>> >> >> >>> >>> >> >> as a sub-concept of fise:Enhancement. All further
>>> >> >> >>> >>> >> >> enhancements about the extracted Setting would define a
>>> >> >> >>> >>> >> >> "fise:in-setting" relation to it.
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> >>  * fise:ParticipantAnnotation: Is used to annotate that
>>> >> >> >>> >>> >> >> an Endurant is participating in a setting
>>> >> >> >>> >>> >> >> (fise:in-setting fise:SettingAnnotation). The Endurant
>>> >> >> >>> >>> >> >> itself is described by existing fise:TextAnnotation (the
>>> >> >> >>> >>> >> >> mentions) and fise:EntityAnnotation (suggested
>>> >> >> >>> >>> >> >> Entities). Basically the fise:ParticipantAnnotation will
>>> >> >> >>> >>> >> >> allow an EnhancementEngine to state that several
>>> >> >> >>> >>> >> >> mentions (in possibly different sentences) represent the
>>> >> >> >>> >>> >> >> same Endurant participating in the Setting. In addition
>>> >> >> >>> >>> >> >> it would be possible to use the dc:type property
>>> >> >> >>> >>> >> >> (similar to fise:TextAnnotation) to refer to the role(s)
>>> >> >> >>> >>> >> >> of a participant (e.g. the set: Agent (intentionally
>>> >> >> >>> >>> >> >> performs an action), Cause (unintentionally, e.g. a mud
>>> >> >> >>> >>> >> >> slide), Patient (a passive role in an activity) and
>>> >> >> >>> >>> >> >> Instrument (aids a process)), but I am wondering if one
>>> >> >> >>> >>> >> >> could extract that information.
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> >> * fise:OccurrentAnnotation: is used to annotate a
>>> >> >> >>> >>> >> >> Perdurant in the context of the Setting. A
>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation can also link to
>>> >> >> >>> >>> >> >> fise:TextAnnotation (typically verbs in the text
>>> >> >> >>> >>> >> >> defining the perdurant) as well as fise:EntityAnnotation
>>> >> >> >>> >>> >> >> suggesting well-known Events in a knowledge base (e.g.
>>> >> >> >>> >>> >> >> an election in a country, or an uprising ...). In
>>> >> >> >>> >>> >> >> addition fise:OccurrentAnnotation can define
>>> >> >> >>> >>> >> >> dc:has-participant links to fise:ParticipantAnnotation.
>>> >> >> >>> >>> >> >> In this case it is explicitly stated that an Endurant
>>> >> >> >>> >>> >> >> (the fise:ParticipantAnnotation) is involved in this
>>> >> >> >>> >>> >> >> Perdurant (the fise:OccurrentAnnotation). As Occurrents
>>> >> >> >>> >>> >> >> are temporally indexed, this annotation should also
>>> >> >> >>> >>> >> >> support properties for defining the xsd:dateTime for the
>>> >> >> >>> >>> >> >> start/end.
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> > Indeed, an event-based data structure makes a lot of
>>> >> >> >>> >>> >> > sense, with the remark that you probably won't always be
>>> >> >> >>> >>> >> > able to extract the date for a given setting (situation).
>>> >> >> >>> >>> >> > There are 2 things which are unclear though.
>>> >> >> >>> >>> >> >
>>> >> >> >>> >>> >> > 1. Perdurant: You could have situations in which the
>>> >> >> >>> >>> >> > object upon which the Subject (or Endurant) is acting is
>>> >> >> >>> >>> >> > not a transitory object (such as an event or activity)
>>> >> >> >>> >>> >> > but rather another Endurant. For example we can have the
>>> >> >> >>> >>> >> > phrase "USA invades Irak" where "USA" is the Endurant
>>> >> >> >>> >>> >> > (Subject) which performs the action of "invading" on
>>> >> >> >>> >>> >> > another Endurant, namely "Irak".
>>> >> >> >>> >>> >> >
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >> By using CAOS, USA would be the Agent and Iraq the
>>> >> >> >>> >>> >> Patient. Both are Endurants. The activity "invading" would
>>> >> >> >>> >>> >> be the Perdurant. So ideally you would have a
>>> >> >> >>> >>> >> "fise:SettingAnnotation" with:
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >>   * fise:ParticipantAnnotation for USA with the dc:type
>>> >> >> >>> >>> >> caos:Agent, linking to a fise:TextAnnotation for "USA" and
>>> >> >> >>> >>> >> a fise:EntityAnnotation linking to dbpedia:United_States
>>> >> >> >>> >>> >>   * fise:ParticipantAnnotation for Iraq with the dc:type
>>> >> >> >>> >>> >> caos:Patient, linking to a fise:TextAnnotation for "Irak"
>>> >> >> >>> >>> >> and a fise:EntityAnnotation linking to dbpedia:Iraq
>>> >> >> >>> >>> >>   * fise:OccurrentAnnotation for "invades" with the
>>> >> >> >>> >>> >> dc:type caos:Activity, linking to a fise:TextAnnotation
>>> >> >> >>> >>> >> for "invades"
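
Serialised as RDF, the three annotations above might look roughly like the following Turtle sketch. This is illustrative only: the fise:SettingAnnotation family of terms and the property names are proposals from this thread (not part of any released fise vocabulary), and the caos namespace URI is a placeholder.

```turtle
# Illustrative sketch; the caos prefix URI is a placeholder.
@prefix fise: <http://fise.iks-project.eu/ontology/> .
@prefix caos: <urn:example:caos#> .
@prefix dc:   <http://purl.org/dc/terms/> .
@prefix dbpedia: <http://dbpedia.org/resource/> .

<urn:setting-1> a fise:SettingAnnotation .

<urn:participant-usa> a fise:ParticipantAnnotation ;
    fise:inSetting <urn:setting-1> ;
    dc:type caos:Agent ;
    # links to the fise:TextAnnotation for "USA" and the
    # fise:EntityAnnotation suggesting dbpedia:United_States
    fise:hasMention <urn:textAnnotation-usa> ;
    fise:suggestion <urn:entityAnnotation-usa> .

<urn:participant-iraq> a fise:ParticipantAnnotation ;
    fise:inSetting <urn:setting-1> ;
    dc:type caos:Patient ;
    fise:hasMention <urn:textAnnotation-irak> .

<urn:occurrent-invades> a fise:OccurrentAnnotation ;
    fise:inSetting <urn:setting-1> ;
    dc:type caos:Activity ;
    fise:hasMention <urn:textAnnotation-invades> ;
    dc:has-participant <urn:participant-usa> , <urn:participant-iraq> .
```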
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >> > 2. Where does the verb which links the Subject and the
>>> >> >> >>> >>> >> > Object come into this? I imagined that the Endurant would
>>> >> >> >>> >>> >> > have a dc:"property" where the property = the verb which
>>> >> >> >>> >>> >> > links to the Object in noun form. For example, take again
>>> >> >> >>> >>> >> > the sentence "USA invades Irak". You would have the "USA"
>>> >> >> >>> >>> >> > Entity with dc:invader which points to the Object "Irak".
>>> >> >> >>> >>> >> > The Endurant would have as many dc:"property" elements as
>>> >> >> >>> >>> >> > there are verbs which link it to an Object.
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >> As explained above, you would have a
>>> >> >> >>> >>> >> fise:OccurrentAnnotation that represents the Perdurant. The
>>> >> >> >>> >>> >> information that the activity mentioned in the text is
>>> >> >> >>> >>> >> "invades" would be provided by linking to a
>>> >> >> >>> >>> >> fise:TextAnnotation. If you can also provide an Ontology
>>> >> >> >>> >>> >> for Tasks that defines "myTasks:invade", the
>>> >> >> >>> >>> >> fise:OccurrentAnnotation could also link to a
>>> >> >> >>> >>> >> fise:EntityAnnotation for this concept.
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >> best
>>> >> >> >>> >>> >> Rupert
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >> >
>>> >> >> >>> >>> >> > ### Consuming the data:
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> >> I think this model should be sufficient for
>>> use-cases as
>>> >> >> >>> described
>>> >> >> >>> >>> by
>>> >> >> >>> >>> >> you.
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> >> Users would be able to consume data on the setting
>>> >> >> >>> >>> >> >> level. This can be done by simply retrieving all
>>> >> >> >>> >>> >> >> fise:ParticipantAnnotation as well as
>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation resources linked with a
>>> >> >> >>> >>> >> >> setting. BTW this was the approach used in LIVE [2] for
>>> >> >> >>> >>> >> >> semantic search. It allows queries for Settings that
>>> >> >> >>> >>> >> >> involve specific Entities, e.g. you could filter for
>>> >> >> >>> >>> >> >> Settings that involve a {Person}, activities:Arrested
>>> >> >> >>> >>> >> >> and a specific {Upraising}. However note that with this
>>> >> >> >>> >>> >> >> approach you will also get results for Settings where
>>> >> >> >>> >>> >> >> the {Person} participated but another person was
>>> >> >> >>> >>> >> >> arrested.
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> >> Another possibility would be to process enhancement
>>> >> >> >>> >>> >> >> results on the fise:OccurrentAnnotation level. This
>>> >> >> >>> >>> >> >> would allow a much higher granularity (e.g. it would
>>> >> >> >>> >>> >> >> allow to correctly answer the query used as an example
>>> >> >> >>> >>> >> >> above). But I am wondering if the quality of the Setting
>>> >> >> >>> >>> >> >> extraction will be sufficient for this. I also have
>>> >> >> >>> >>> >> >> doubts whether this can still be realized by using
>>> >> >> >>> >>> >> >> semantic indexing to Apache Solr, or if it would be
>>> >> >> >>> >>> >> >> better/necessary to store results in a TripleStore and
>>> >> >> >>> >>> >> >> use SPARQL for retrieval.
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> >> The methodology and query language used by YAGO [3] are
>>> >> >> >>> >>> >> >> also very relevant for this (especially note chapter 7,
>>> >> >> >>> >>> >> >> SPOTL(X) Representation).
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> >> Another related topic is the enrichment of Entities
>>> >> >> >>> >>> >> >> (especially Events) in knowledge bases based on Settings
>>> >> >> >>> >>> >> >> extracted from Documents. Per definition - in DOLCE -
>>> >> >> >>> >>> >> >> Perdurants are temporally indexed. That means that at
>>> >> >> >>> >>> >> >> the time they are added to a knowledge base they might
>>> >> >> >>> >>> >> >> still be in process. So the creation, enrichment and
>>> >> >> >>> >>> >> >> refinement of such Entities in the knowledge base seems
>>> >> >> >>> >>> >> >> to be critical for a system like the one described in
>>> >> >> >>> >>> >> >> your use-case.
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> >> On Tue, Jun 11, 2013 at 9:09 PM, Cristian Petroaca
>>> >> >> >>> >>> >> >> <cr...@gmail.com> wrote:
>>> >> >> >>> >>> >> >> >
>>> >> >> >>> >>> >> >> > First of all I have to mention that I am new to the
>>> >> >> >>> >>> >> >> > field of semantic technologies; I've started to read
>>> >> >> >>> >>> >> >> > about them in the last 4-5 months. Having said that, I
>>> >> >> >>> >>> >> >> > have a high-level overview of what a good approach to
>>> >> >> >>> >>> >> >> > solve this problem is. There are a number of papers on
>>> >> >> >>> >>> >> >> > the internet which describe what steps need to be
>>> >> >> >>> >>> >> >> > taken, such as: named entity recognition, co-reference
>>> >> >> >>> >>> >> >> > resolution, POS tagging and others.
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> >> The Stanbol NLP processing module currently only
>>> >> >> >>> >>> >> >> supports sentence detection, tokenization, POS tagging,
>>> >> >> >>> >>> >> >> chunking, NER and lemmas. Support for co-reference
>>> >> >> >>> >>> >> >> resolution and dependency trees is currently missing.
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> >> Stanford NLP is already integrated with Stanbol [4]. At
>>> >> >> >>> >>> >> >> the moment it only supports English, but I am already
>>> >> >> >>> >>> >> >> working to include the other supported languages. Other
>>> >> >> >>> >>> >> >> NLP frameworks that are already integrated with Stanbol
>>> >> >> >>> >>> >> >> are Freeling [5] and Talismane [6]. But note that for
>>> >> >> >>> >>> >> >> all of those the integration excludes support for
>>> >> >> >>> >>> >> >> co-reference and dependency trees.
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> >> Anyway, I am confident that one can implement a first
>>> >> >> >>> >>> >> >> prototype by only using Sentences and POS tags and - if
>>> >> >> >>> >>> >> >> available - Chunks (e.g. Noun phrases).
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> > I assume that in the Stanbol context a feature like
>>> >> >> >>> >>> >> > relation extraction would be implemented as an
>>> >> >> >>> >>> >> > EnhancementEngine? What kind of effort would be required
>>> >> >> >>> >>> >> > to integrate a co-reference resolution tool into Stanbol?
>>> >> >> >>> >>> >> >
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >> Yes, in the end it would be an EnhancementEngine. But
>>> >> >> >>> >>> >> before we can build such an engine we would need to
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >> * extend the Stanbol NLP processing API with Annotations
>>> >> >> >>> >>> >> for co-reference
>>> >> >> >>> >>> >> * add support for JSON serialisation/parsing for those
>>> >> >> >>> >>> >> annotations so that the RESTful NLP Analysis Service can
>>> >> >> >>> >>> >> provide co-reference information
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >> > At this moment I'll be focusing on 2 aspects:
>>> >> >> >>> >>> >> >
>>> >> >> >>> >>> >> > 1. Determine the best data structure to encapsulate the
>>> >> >> >>> >>> >> > extracted information. I'll take a closer look at DOLCE.
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >> Don't make it too complex. Defining a proper structure to
>>> >> >> >>> >>> >> represent Events will only pay off if we can also
>>> >> >> >>> >>> >> successfully extract such information from processed texts.
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >> I would start with
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >>  * fise:SettingAnnotation
>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >>  * fise:ParticipantAnnotation
>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>>> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
>>> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
>>> >> >> >>> >>> >>     * fise:suggestion {entityAnnotation} (multiple if
>>> >> >> >>> >>> >> there are more suggestions)
>>> >> >> >>> >>> >>     * dc:type one of fise:Agent, fise:Patient,
>>> >> >> >>> >>> >> fise:Instrument, fise:Cause
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >>  * fise:OccurrentAnnotation
>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>>> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
>>> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
>>> >> >> >>> >>> >>     * dc:type set to fise:Activity
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >> If it turns out that we can extract more, we can add more
>>> >> >> >>> >>> >> structure to those annotations. We might also think about
>>> >> >> >>> >>> >> using our own namespace for those extensions to the
>>> >> >> >>> >>> >> annotation structure.
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >> > 2. Determine how should all of this be integrated into
>>> >> >> Stanbol.
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >> Just create an EventExtractionEngine and configure an
>>> >> >> >>> >>> >> enhancement chain that does NLP processing and
>>> >> >> >>> >>> >> EntityLinking.
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >> You should have a look at
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >> * SentimentSummarizationEngine [1], as it does a lot of
>>> >> >> >>> >>> >> things with NLP processing results (e.g. connecting
>>> >> >> >>> >>> >> adjectives (via verbs) to nouns/pronouns). So as long as we
>>> >> >> >>> >>> >> cannot use explicit dependency trees, your code will need
>>> >> >> >>> >>> >> to do similar things with Nouns, Pronouns and Verbs.
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >> * Disambiguation-MLT engine, as it creates a Java
>>> >> >> >>> >>> >> representation of present fise:TextAnnotation and
>>> >> >> >>> >>> >> fise:EntityAnnotation [2]. Something similar will also be
>>> >> >> >>> >>> >> required by the EventExtractionEngine for fast access to
>>> >> >> >>> >>> >> such annotations while iterating over the Sentences of the
>>> >> >> >>> >>> >> text.
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >> best
>>> >> >> >>> >>> >> Rupert
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >> [1]
>>> >> >> >>> >>> >>
>>> >> >> >>> >>>
>>> >> >> >>>
>>> >> >>
>>> >>
>>> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/sentiment-summarization/src/main/java/org/apache/stanbol/enhancer/engines/sentiment/summarize/SentimentSummarizationEngine.java
>>> >> >> >>> >>> >> [2]
>>> >> >> >>> >>> >>
>>> >> >> >>> >>>
>>> >> >> >>>
>>> >> >>
>>> >>
>>> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/disambiguation-mlt/src/main/java/org/apache/stanbol/enhancer/engine/disambiguation/mlt/DisambiguationData.java
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >> >
>>> >> >> >>> >>> >> > Thanks
>>> >> >> >>> >>> >> >
>>> >> >> >>> >>> >> > Hope this helps to bootstrap this discussion
>>> >> >> >>> >>> >> >> best
>>> >> >> >>> >>> >> >> Rupert
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >> >> --
>>> >> >> >>> >>> >> >> | Rupert Westenthaler
>>> >> >> rupert.westenthaler@gmail.com
>>> >> >> >>> >>> >> >> | Bodenlehenstraße 11
>>> >> >> >>> ++43-699-11108907
>>> >> >> >>> >>> >> >> | A-5500 Bischofshofen
>>> >> >> >>> >>> >> >>
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >>
>>> >> >> >>> >>> >> --
>>> >> >> >>> >>> >> | Rupert Westenthaler
>>> >> rupert.westenthaler@gmail.com
>>> >> >> >>> >>> >> | Bodenlehenstraße 11
>>> >> >> >>> ++43-699-11108907
>>> >> >> >>> >>> >> | A-5500 Bischofshofen
>>> >> >> >>> >>> >>
>>> >> >> >>> >>>
>>> >> >> >>> >>>
>>> >> >> >>> >>>
>>> >> >> >>> >>> --
>>> >> >> >>> >>> | Rupert Westenthaler
>>> rupert.westenthaler@gmail.com
>>> >> >> >>> >>> | Bodenlehenstraße 11
>>> >> >> ++43-699-11108907
>>> >> >> >>> >>> | A-5500 Bischofshofen
>>> >> >> >>> >>>
>>> >> >> >>> >>
>>> >> >> >>> >>
>>> >> >> >>>
>>> >> >> >>>
>>> >> >> >>>
>>> >> >> >>> --
>>> >> >> >>> | Rupert Westenthaler
>>> rupert.westenthaler@gmail.com
>>> >> >> >>> | Bodenlehenstraße 11
>>> ++43-699-11108907
>>> >> >> >>> | A-5500 Bischofshofen
>>> >> >> >>>
>>> >> >> >>
>>> >> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> --
>>> >> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>>> >> >> | Bodenlehenstraße 11
>>> ++43-699-11108907
>>> >> >> | A-5500 Bischofshofen
>>> >> >>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>>> >> | Bodenlehenstraße 11                             ++43-699-11108907
>>> >> | A-5500 Bischofshofen
>>> >>
>>>
>>>
>>>
>>> --
>>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>>> | Bodenlehenstraße 11                             ++43-699-11108907
>>> | A-5500 Bischofshofen
>>>
>>
>>
>