Posted to dev@stanbol.apache.org by Cristian Petroaca <cr...@gmail.com> on 2014/04/28 15:58:36 UTC

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Hi,

I've started to implement the dbpedia properties logic and I'd like to get
some feedback on my approach.
I want to take each named entity (NER) found in the text and look it up in
the dbpedia data so that I can read certain dbpedia properties for it.
My plan is to take the text of the NER_ANNOTATION chunk and search for it in
the Entityhub (which, from what I saw, is configured with dbpedia data by
default). I haven't run a query to actually fetch the data yet, but before I
continue I'd like to ask: is this the way to go?
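
To make the question concrete, here is a minimal sketch of the lookup step,
assuming an in-memory map in place of the Entityhub. The class
EntityLookupSketch and its methods are hypothetical and are not Entityhub
API; the sample values are the dbpprop:industry and dbpprop:service values
for Microsoft quoted further down in this thread.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory stand-in for an entity lookup; this is NOT the Entityhub API. */
class EntityLookupSketch {

    // normalised label -> (property name -> values)
    private static final Map<String, Map<String, List<String>>> INDEX = new HashMap<>();

    static {
        // Sample data only, copied from the Microsoft example in this thread.
        Map<String, List<String>> microsoft = new HashMap<>();
        microsoft.put("industry", Arrays.asList("Software"));
        microsoft.put("service", Arrays.asList("Online Service Providers"));
        INDEX.put(normalise("Microsoft"), microsoft);
    }

    /** Trims, collapses whitespace and lower-cases the mention so it matches the index keys. */
    static String normalise(String mention) {
        return mention.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    /** Returns the property map of the entity whose label equals the mention, if there is one. */
    static Optional<Map<String, List<String>>> lookup(String mention) {
        return Optional.ofNullable(INDEX.get(normalise(mention)));
    }

    /** Returns the values of one property for the mention, or an empty list if nothing is known. */
    static List<String> values(String mention, String property) {
        return lookup(mention)
                .map(props -> props.getOrDefault(property, Collections.<String>emptyList()))
                .orElse(Collections.<String>emptyList());
    }
}
```

In the real engine the map would be replaced by a label query against the
Entityhub; the sketch only shows the shape of the step (chunk text in,
property values out).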

Thanks,
Cristian


2014-03-28 15:12 GMT+02:00 Cristian Petroaca <cr...@gmail.com>:

> Examples :
>
> 1. Group membership :
>     a. Spatial membership :
>
>         "Microsoft announced its 2013 earnings. <coref>The Redmond-based
> company</coref> made huge profits."
>
>     b. Organisational membership :
>
>        "Mick Jagger started a new solo album. <coref>The Rolling Stones
> singer</coref> did not say what the theme will be."
>
> 2. Functional membership :
>
>    "Allianz announced its 2013 earnings. <coref>The financial services
> company</coref> made a huge profit."
>
> 3. If none of the rules above matched for the current NER but the
> yago:class which matched has more than 2 nouns, then we also consider this
> a good co-reference, though perhaps with a lower confidence.
>
>    "Boris Becker will take part in a demonstrative tennis match.
> <coref>The former tennis player</coref> will play again after 10 years."
>
>
> 2014-03-28 12:22 GMT+02:00 Rupert Westenthaler <
> rupert.westenthaler@gmail.com>:
>
>> Hi Cristian, all
>>
>> Looks good to me, but I am not sure if I got everything. If you could
>> provide example texts where those rules apply it would make it much
>> easier to understand.
>>
>> Instead of using dbpedia properties you should define your own domain
>> model (ontology). You can then align the dbpedia properties to your
>> model. This will allow you to apply this approach to knowledge
>> bases other than dbpedia as well.
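
One way to picture this alignment is a small lookup table from
domain-model relations to knowledge-base properties. The sketch below is
only an illustration: the Relation enum and the class name are
hypothetical, and the property names are the ones Cristian lists for
organisations in the rules quoted below.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

class DomainModelAlignmentSketch {

    /** Hypothetical domain-model relations, named after the rule groups in this thread. */
    enum Relation { SPATIAL, ORGANISATIONAL, FUNCTIONAL }

    /** Alignment of dbpedia properties for organisations; another knowledge base would supply its own map. */
    static Map<Relation, List<String>> dbpediaOrganisationAlignment() {
        Map<Relation, List<String>> alignment = new EnumMap<>(Relation.class);
        alignment.put(Relation.SPATIAL,
                Arrays.asList("foundationPlace", "locationCity", "location", "hometown"));
        alignment.put(Relation.FUNCTIONAL,
                Arrays.asList("service", "industry", "genre"));
        // ORGANISATIONAL is left out on purpose: the thread marks it as open ("organisation ?").
        return alignment;
    }

    /** Returns the aligned properties for a relation, or an empty list if none are aligned. */
    static List<String> propertiesFor(Map<Relation, List<String>> alignment, Relation relation) {
        return alignment.getOrDefault(relation, Collections.<String>emptyList());
    }
}
```

The matching rules would then be written against Relation only, and
switching knowledge bases would mean swapping the alignment map.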
>>
>> For people new to this thread: The above message adds to the
>> suggestion first made by Cristian on 4th February. Also the following
>> 4 messages (until 7th Feb) provide additional context.
>>
>> best
>> Rupert
>>
>>
>> On Fri, Mar 28, 2014 at 9:23 AM, Cristian Petroaca
>> <cr...@gmail.com> wrote:
>> > Hi guys,
>> >
>> > After Rupert's last suggestions related to this enhancement engine I
>> > devised a more comprehensive algorithm for matching the noun phrases
>> > against the NER properties. Please take a look and let me know what you
>> > think. Thanks.
>> >
>> > The following rules will be applied to every noun phrase in order to find
>> > co-references:
>> >
>> > 1. For each NER prior to the current noun phrase in the text match the
>> > yago:class label to the contents of the noun phrase.
>> >
>> > For the NERs which have a yago:class which matches, apply:
>> >
>> > 2. Group membership rules :
>> >
>> >     a. spatial membership : the NER is part of a Location. If the noun
>> > phrase contains a LOCATION or a demonym then check any location properties
>> > of the matching NER.
>> >
>> >     If matching NER is a :
>> >     - person, match against :birthPlace, :region, :nationality
>> >     - organisation, match against :foundationPlace, :locationCity,
>> > :location, :hometown
>> >     - place, match against :country, :subdivisionName, :location,
>> >
>> >     Ex: The Italian President, The Redmond-based company
>> >
>> >     b. organisational membership : the NER is part of an Organisation. If
>> > the noun phrase contains an ORGANISATION then check the following
>> > properties of the matching NER:
>> >
>> >     If matching NER is :
>> >     - person, match against :occupation, :associatedActs
>> >     - organisation ?
>> >     - location ?
>> >
>> > Ex: The Microsoft executive, The Pink Floyd singer
>> >
>> > 3. Functional description rule: the noun phrase describes what the NER does
>> > conceptually.
>> > If there are no NERs in the noun phrase then match the following properties
>> > of the matching NER to the contents of the noun phrase (aside from the
>> > nouns which are part of the yago:class) :
>> >
>> >    If NER is a:
>> >    - person ?
>> >    - organisation, match against :service, :industry, :genre
>> >    - location ?
>> >
>> > Ex:  The software company.
>> >
>> > 4. If no matches were found for the current NER with rules 2 or 3 then if
>> > the yago:class which matched has more than 2 nouns then we also consider
>> > this a good co-reference but with a lower confidence maybe.
>> >
>> > Ex: The former tennis player, the theoretical physicist.
>> >
>> > 5. Based on the number of nouns which matched we create a confidence level.
>> > The number of matched nouns cannot be lower than 2 and we must have a
>> > yago:class match.
>> >
>> > For all NERs which got to this point, select the closest ones in the text
>> > to the noun phrase which matched against the same properties (yago:class
>> > and dbpedia) and mark them as co-references.
>> >
>> > Note: all noun phrases need to be lemmatized before all of this in case
>> > there are any plurals.
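
The rules above can be sketched roughly as follows. This is a simplified
illustration and not the engine's implementation: the class
CorefRuleSketch, the Candidate type and the confidence numbers are all
hypothetical. It assumes property values are single lower-cased lemmas,
covers only rules 1, 2a, 3 and 4, and reads rule 4 as "two or more class
nouns", which is what the "tennis player" example suggests.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class CorefRuleSketch {

    /** Hypothetical candidate: an earlier named entity plus data read from the knowledge base. */
    static final class Candidate {
        final String name;
        final Set<String> classNouns;     // nouns of the yago:class label, e.g. {company}
        final Set<String> spatialValues;  // values of location properties, e.g. {redmond}
        final Set<String> functionValues; // values of :service, :industry, :genre

        Candidate(String name, Set<String> classNouns,
                  Set<String> spatialValues, Set<String> functionValues) {
            this.name = name;
            this.classNouns = classNouns;
            this.spatialValues = spatialValues;
            this.functionValues = functionValues;
        }
    }

    static Set<String> setOf(String... words) {
        return new HashSet<>(Arrays.asList(words));
    }

    /** Lower-cases the phrase and splits it on anything that is not a letter. */
    static Set<String> words(String phrase) {
        Set<String> result = new HashSet<>();
        for (String w : phrase.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!w.isEmpty()) {
                result.add(w);
            }
        }
        return result;
    }

    /** Counts how many of the given values occur as words of the noun phrase. */
    static int overlap(Set<String> phraseWords, Set<String> values) {
        int hits = 0;
        for (String v : values) {
            if (phraseWords.contains(v)) {
                hits++;
            }
        }
        return hits;
    }

    /** Returns 0.0 for "no co-reference"; the non-zero values are arbitrary illustration numbers. */
    static double confidence(String nounPhrase, Candidate c) {
        Set<String> np = words(nounPhrase);
        int classHits = overlap(np, c.classNouns);
        if (classHits == 0) {
            return 0.0; // rule 1: a yago:class match is mandatory
        }
        int propertyHits = overlap(np, c.spatialValues) + overlap(np, c.functionValues);
        if (propertyHits > 0) {
            // rules 2a and 3: class match plus at least one property match (so >= 2 matched nouns)
            return Math.min(1.0, 0.6 + 0.1 * (classHits + propertyHits));
        }
        // rule 4: class-only match, accepted with lower confidence for multi-noun classes
        return classHits >= 2 ? 0.4 : 0.0;
    }
}
```

With a Microsoft candidate (class noun "company", spatial value
"redmond"), "The Redmond-based company" scores higher than a class-only
match such as "The former tennis player" for a candidate whose class
nouns are "tennis" and "player", while a bare "The company" is rejected.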
>> >
>> >
>> > 2014-03-25 20:50 GMT+02:00 Cristian Petroaca <
>> cristian.petroaca@gmail.com>:
>> >
>> >> That worked. Thanks.
>> >>
>> >> So, there are no exceptions during the startup of the launcher.
>> >> The component tab in the felix console shows 6 WeightedChains the first
>> >> time, including the default one, but after my changes and a restart there
>> >> are only 5 - the default one is missing altogether.
>> >>
>> >>
>> >> 2014-03-24 20:18 GMT+02:00 Rupert Westenthaler <
>> >> rupert.westenthaler@gmail.com>:
>> >>
>> >> Hi Cristian,
>> >>>
>> >>> I do see the same problem since last Friday. The solution mentioned
>> >>> in [1] works for me.
>> >>>
>> >>>     mvn -Djsse.enableSNIExtension=false {goals}
>> >>>
>> >>> No idea why https connections to github currently cause this. I
>> >>> could not find anything related via Google. So I suggest using the
>> >>> system property for now. If this persists for longer we can adapt the
>> >>> build files accordingly.
>> >>>
>> >>> best
>> >>> Rupert
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> [1] http://stackoverflow.com/questions/7615645/ssl-handshake-alert-unrecognized-name-error-since-upgrade-to-java-1-7-0
>> >>>
>> >>> On Mon, Mar 24, 2014 at 7:01 PM, Cristian Petroaca
>> >>> <cr...@gmail.com> wrote:
>> >>> > I did a clean on the whole project and now I wanted to do another "mvn
>> >>> > clean install" but I am getting this :
>> >>> >
>> >>> > "[INFO] ------------------------------------------------------------------------
>> >>> > [ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (download) on project org.apache.stanbol.data.opennlp.lang.es: An Ant BuildException has occured: The following error occurred while executing this line:
>> >>> > [ERROR] C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\download_models.xml:33: Failed to copy https://github.com/utcompling/OpenNLP-Models/raw/58ef0c60031403e66e47ae35edaf58d3478b67af/models/es/opennlp-es-maxent-pos-es.bin to C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\downloads\resources\org\apache\stanbol\data\opennlp\es-pos-maxent.bin due to javax.net.ssl.SSLProtocolException handshake alert : unrecognized_name"
>> >>> >
>> >>> >
>> >>> >
>> >>> > 2014-03-20 11:25 GMT+02:00 Rupert Westenthaler <
>> >>> > rupert.westenthaler@gmail.com>:
>> >>> >
>> >>> >> Hi Cristian,
>> >>> >>
>> >>> >> On Thu, Mar 20, 2014 at 10:00 AM, Cristian Petroaca
>> >>> >> <cr...@gmail.com> wrote:
>> >>> >> >
>> >>> >> > stanbol.enhancer.chain.weighted.chain=["tika;optional","langdetect","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-ner","dbpediaLinking","entityhubExtraction","dbpedia-dereference","pos-chunker"]
>> >>> >> > service.ranking=I"-2147483648"
>> >>> >> > stanbol.enhancer.chain.name="default"
>> >>> >>
>> >>> >> Does look fine to me. Do you see any exception during the startup of
>> >>> >> the launcher? Can you check the status of this component in the
>> >>> >> component tab of the felix web console [1] (search for
>> >>> >> "org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain")? If
>> >>> >> you have multiple you can find the correct one by comparing the
>> >>> >> "Properties" with those in the configuration file.
>> >>> >>
>> >>> >> I guess that the corresponding service is in the 'unsatisfied' state,
>> >>> >> as you do not see it in the web interface. But if this is the case you
>> >>> >> should also see the corresponding exception in the log. You can also
>> >>> >> manually stop/start the component. In this case the exception should be
>> >>> >> re-thrown and you do not need to search the log for it.
>> >>> >>
>> >>> >> best
>> >>> >> Rupert
>> >>> >>
>> >>> >>
>> >>> >> [1] http://localhost:8080/system/console/components
>> >>> >>
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > 2014-03-20 7:39 GMT+02:00 Rupert Westenthaler <
>> >>> >> rupert.westenthaler@gmail.com
>> >>> >> >>:
>> >>> >> >
>> >>> >> >> Hi Cristian,
>> >>> >> >>
>> >>> >> >> you cannot send attachments to the list. Please copy the contents
>> >>> >> >> directly into the mail
>> >>> >> >>
>> >>> >> >> thx
>> >>> >> >> Rupert
>> >>> >> >>
>> >>> >> >> On Wed, Mar 19, 2014 at 9:20 PM, Cristian Petroaca
>> >>> >> >> <cr...@gmail.com> wrote:
>> >>> >> >> > The config attached.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > 2014-03-19 9:09 GMT+02:00 Rupert Westenthaler
>> >>> >> >> > <ru...@gmail.com>:
>> >>> >> >> >
>> >>> >> >> >> Hi Cristian,
>> >>> >> >> >>
>> >>> >> >> >> can you provide the contents of the chain after your modifications?
>> >>> >> >> >> Would be interesting to test why the chain is no longer active after
>> >>> >> >> >> the restart.
>> >>> >> >> >>
>> >>> >> >> >> You can find the config file in the 'stanbol/fileinstall' folder.
>> >>> >> >> >>
>> >>> >> >> >> best
>> >>> >> >> >> Rupert
>> >>> >> >> >>
>> >>> >> >> >> On Tue, Mar 18, 2014 at 8:24 PM, Cristian Petroaca
>> >>> >> >> >> <cr...@gmail.com> wrote:
>> >>> >> >> >> > Related to the default chain selection rules : before restart I had
>> >>> >> >> >> > a chain with the name 'default', as in I could access it via
>> >>> >> >> >> > enhancer/chain/default.
>> >>> >> >> >> > Then I just added another engine to the 'default' chain. I assumed
>> >>> >> >> >> > that after the restart the chain with the 'default' name would be
>> >>> >> >> >> > persisted. So the first rule should have been applied after the
>> >>> >> >> >> > restart as well. But instead I cannot reach it via
>> >>> >> >> >> > enhancer/chain/default anymore so it's gone.
>> >>> >> >> >> > Anyway, this is not a big deal, it's not blocking me in any way, I
>> >>> >> >> >> > just wanted to understand where the problem is.
>> >>> >> >> >> >
>> >>> >> >> >> >
>> >>> >> >> >> > 2014-03-18 7:15 GMT+02:00 Rupert Westenthaler
>> >>> >> >> >> > <rupert.westenthaler@gmail.com
>> >>> >> >> >> >>:
>> >>> >> >> >> >
>> >>> >> >> >> >> Hi Cristian
>> >>> >> >> >> >>
>> >>> >> >> >> >> On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca
>> >>> >> >> >> >> <cr...@gmail.com> wrote:
>> >>> >> >> >> >> > 1. Updated to the latest code and it's gone. Cool
>> >>> >> >> >> >> >
>> >>> >> >> >> >> > 2. I start the stable launcher -> create a new instance of the
>> >>> >> >> >> >> > PosChunkerEngine -> add it to the default chain. At this point
>> >>> >> >> >> >> > everything looks good and works ok.
>> >>> >> >> >> >> > After I restart the server the default chain is gone and instead I
>> >>> >> >> >> >> > see this in the enhancement chains page : all-active (default,
>> >>> >> >> >> >> > id: 149, ranking: 0, impl: AllActiveEnginesChain ). all-active did
>> >>> >> >> >> >> > not contain the 'default' word before the restart.
>> >>> >> >> >> >> >
>> >>> >> >> >> >>
>> >>> >> >> >> >> Please note the default chain selection rules as described at [1].
>> >>> >> >> >> >> You can also access chains under '/enhancer/chain/{chain-name}'
>> >>> >> >> >> >>
>> >>> >> >> >> >> best
>> >>> >> >> >> >> Rupert
>> >>> >> >> >> >>
>> >>> >> >> >> >> [1] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain
>> >>> >> >> >> >>
>> >>> >> >> >> >> > It looks like the config files are exactly what I need. Thanks.
>> >>> >> >> >> >> >
>> >>> >> >> >> >> >
>> >>> >> >> >> >> > 2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <
>> >>> >> >> >> >> rupert.westenthaler@gmail.com
>> >>> >> >> >> >> >>:
>> >>> >> >> >> >> >
>> >>> >> >> >> >> >> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
>> >>> >> >> >> >> >> <cr...@gmail.com> wrote:
>> >>> >> >> >> >> >> > Thanks Rupert.
>> >>> >> >> >> >> >> >
>> >>> >> >> >> >> >> > A couple more questions/issues :
>> >>> >> >> >> >> >> >
>> >>> >> >> >> >> >> > 1. Whenever I start the stanbol server I'm seeing this in the
>> >>> >> >> >> >> >> > console output :
>> >>> >> >> >> >> >> >
>> >>> >> >> >> >> >>
>> >>> >> >> >> >> >> This should be fixed with STANBOL-1278 [1] [2]
>> >>> >> >> >> >> >>
>> >>> >> >> >> >> >> > 2. Whenever I restart the server the Weighted Chains get messed
>> >>> >> >> >> >> >> > up. I usually use the 'default' chain and add my engine to it so
>> >>> >> >> >> >> >> > there are 11 engines in it. After the restart this chain now
>> >>> >> >> >> >> >> > contains around 23 engines in total.
>> >>> >> >> >> >> >>
>> >>> >> >> >> >> >> I was not able to replicate this. What I tried was
>> >>> >> >> >> >> >>
>> >>> >> >> >> >> >> (1) start up the stable launcher
>> >>> >> >> >> >> >> (2) add an additional engine to the default chain
>> >>> >> >> >> >> >> (3) restart the launcher
>> >>> >> >> >> >> >>
>> >>> >> >> >> >> >> The default chain was not changed after (2) and (3). So I would
>> >>> >> >> >> >> >> need further information to find out why this is happening.
>> >>> >> >> >> >> >>
>> >>> >> >> >> >> >> Generally it is better to create your own chain instance than to
>> >>> >> >> >> >> >> modify one that is provided by the default configuration. I would
>> >>> >> >> >> >> >> also recommend that you keep your test configuration in text files
>> >>> >> >> >> >> >> and copy those to the 'stanbol/fileinstall' folder. Doing so saves
>> >>> >> >> >> >> >> you from manually re-entering the configuration after a software
>> >>> >> >> >> >> >> update. The production-mode section [3] provides information on
>> >>> >> >> >> >> >> how to do that.
>> >>> >> >> >> >> >>
>> >>> >> >> >> >> >> best
>> >>> >> >> >> >> >> Rupert
>> >>> >> >> >> >> >>
>> >>> >> >> >> >> >> [1] https://issues.apache.org/jira/browse/STANBOL-1278
>> >>> >> >> >> >> >> [2] http://svn.apache.org/r1576623
>> >>> >> >> >> >> >> [3] http://stanbol.apache.org/docs/trunk/production-mode
>> >>> >> >> >> >> >>
>> >>> >> >> >> >> >> > ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Error starting slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\startup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar (org.osgi.framework.BundleException: Unresolved constraint in bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0))))
>> >>> >> >> >> >> >> > org.osgi.framework.BundleException: Unresolved constraint in bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0)))
>> >>> >> >> >> >> >> >         at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
>> >>> >> >> >> >> >> >         at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
>> >>> >> >> >> >> >> >         at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
>> >>> >> >> >> >> >> >         at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264)
>> >>> >> >> >> >> >> >         at java.lang.Thread.run(Unknown Source)
>> >>> >> >> >> >> >> >
>> >>> >> >> >> >> >> > Despite this the server starts fine and I can use the enhancer
>> >>> >> >> >> >> >> > fine. Do you guys see this as well?
>> >>> >> >> >> >> >> >
>> >>> >> >> >> >> >> >
>> >>> >> >> >> >> >> > 2. Whenever I restart the server the Weighted Chains get messed
>> >>> >> >> >> >> >> > up. I usually use the 'default' chain and add my engine to it so
>> >>> >> >> >> >> >> > there are 11 engines in it. After the restart this chain now
>> >>> >> >> >> >> >> > contains around 23 engines in total.
>> >>> >> >> >> >> >> >
>> >>> >> >> >> >> >> >
>> >>> >> >> >> >> >> >
>> >>> >> >> >> >> >> >
>> >>> >> >> >> >> >> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <
>> >>> >> >> >> >> >> rupert.westenthaler@gmail.com
>> >>> >> >> >> >> >> >>:
>> >>> >> >> >> >> >> >
>> >>> >> >> >> >> >> >> Hi Cristian,
>> >>> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> NER Annotations are typically available as both
>> >>> >> >> >> >> >> >> NlpAnnotations.NER_ANNOTATION and fise:TextAnnotation [1] in the
>> >>> >> >> >> >> >> >> enhancement metadata. As you are already accessing the
>> >>> >> >> >> >> >> >> AnalyzedText I would prefer using the
>> >>> >> >> >> >> >> >> NlpAnnotations.NER_ANNOTATION.
>> >>> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> best
>> >>> >> >> >> >> >> >> Rupert
>> >>> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> [1] http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
>> >>> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
>> >>> >> >> >> >> >> >> <cr...@gmail.com> wrote:
>> >>> >> >> >> >> >> >> > Thanks.
>> >>> >> >> >> >> >> >> > I assume I should get the Named entities using the same but
>> >>> >> >> >> >> >> >> > with NlpAnnotations.NER_ANNOTATION?
>> >>> >> >> >> >> >> >> >
>> >>> >> >> >> >> >> >> >
>> >>> >> >> >> >> >> >> >
>> >>> >> >> >> >> >> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
>> >>> >> >> >> >> >> >> > rupert.westenthaler@gmail.com>:
>> >>> >> >> >> >> >> >> >
>> >>> >> >> >> >> >> >> >> Hallo Cristian,
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >> NounPhrases are not added to the RDF enhancement results. You
>> >>> >> >> >> >> >> >> >> need to use the AnalyzedText ContentPart [1]
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >> here is some demo code you can use in the computeEnhancements
>> >>> >> >> >> >> >> >> >> method
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >>         AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
>> >>> >> >> >> >> >> >> >>         Iterator<? extends Section> sections = at.getSentences();
>> >>> >> >> >> >> >> >> >>         if(!sections.hasNext()){ //process as single sentence
>> >>> >> >> >> >> >> >> >>             sections = Collections.singleton(at).iterator();
>> >>> >> >> >> >> >> >> >>         }
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >>         while(sections.hasNext()){
>> >>> >> >> >> >> >> >> >>             Section section = sections.next();
>> >>> >> >> >> >> >> >> >>             Iterator<Span> chunks = section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
>> >>> >> >> >> >> >> >> >>             while(chunks.hasNext()){
>> >>> >> >> >> >> >> >> >>                 Span chunk = chunks.next();
>> >>> >> >> >> >> >> >> >>                 Value<PhraseTag> phrase = chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
>> >>> >> >> >> >> >> >> >>                 //skip chunks that carry no phrase annotation
>> >>> >> >> >> >> >> >> >>                 if(phrase != null && phrase.value().getCategory() == LexicalCategory.Noun){
>> >>> >> >> >> >> >> >> >>                     log.info(" - NounPhrase [{},{}] {}", new Object[]{
>> >>> >> >> >> >> >> >> >>                         chunk.getStart(),chunk.getEnd(),chunk.getSpan()});
>> >>> >> >> >> >> >> >> >>                 }
>> >>> >> >> >> >> >> >> >>             }
>> >>> >> >> >> >> >> >> >>         }
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >> hope this helps
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >> best
>> >>> >> >> >> >> >> >> >> Rupert
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >> [1] http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
>> >>> >> >> >> >> >> >> >> <cr...@gmail.com> wrote:
>> >>> >> >> >> >> >> >> >> > I started to implement the engine and I'm having problems with
>> >>> >> >> >> >> >> >> >> > getting results for noun phrases. I modified the "default"
>> >>> >> >> >> >> >> >> >> > weighted chain to also include the PosChunkerEngine and ran a
>> >>> >> >> >> >> >> >> >> > sample text : "Angela Merkel visited China. The German
>> >>> >> >> >> >> >> >> >> > chancellor met with various people". I expected that the RDF
>> >>> >> >> >> >> >> >> >> > XML output would contain some info about the noun phrases but
>> >>> >> >> >> >> >> >> >> > I cannot see any.
>> >>> >> >> >> >> >> >> >> > Could you point me to the correct way to generate the noun
>> >>> >> >> >> >> >> >> >> > phrases?
>> >>> >> >> >> >> >> >> >> >
>> >>> >> >> >> >> >> >> >> > Thanks,
>> >>> >> >> >> >> >> >> >> > Cristian
>> >>> >> >> >> >> >> >> >> >
>> >>> >> >> >> >> >> >> >> >
>> >>> >> >> >> >> >> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <
>> >>> >> >> >> >> >> >> >> cristian.petroaca@gmail.com>:
>> >>> >> >> >> >> >> >> >> >
>> >>> >> >> >> >> >> >> >> >> Opened https://issues.apache.org/jira/browse/STANBOL-1279
>> >>> >> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <
>> >>> >> >> >> >> >> >> >> cristian.petroaca@gmail.com>
>> >>> >> >> >> >> >> >> >> >> :
>> >>> >> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >> >> Hi Rupert,
>> >>> >> >> >> >> >> >> >> >>>
>> >>> >> >> >> >> >> >> >> >>> The "spatial" dimension is a good idea. I'll also take a look
>> >>> >> >> >> >> >> >> >> >>> at Yago.
>> >>> >> >> >> >> >> >> >> >>>
>> >>> >> >> >> >> >> >> >> >>> I will create a Jira with what we talked about here. It will
>> >>> >> >> >> >> >> >> >> >>> probably have just a draft-like description for now and will
>> >>> >> >> >> >> >> >> >> >>> be updated as I go along.
>> >>> >> >> >> >> >> >> >> >>>
>> >>> >> >> >> >> >> >> >> >>> Thanks,
>> >>> >> >> >> >> >> >> >> >>> Cristian
>> >>> >> >> >> >> >> >> >> >>>
>> >>> >> >> >> >> >> >> >> >>>
>> >>> >> >> >> >> >> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert
>> Westenthaler <
>> >>> >> >> >> >> >> >> >> >>> rupert.westenthaler@gmail.com>:
>> >>> >> >> >> >> >> >> >> >>>
>> >>> >> >> >> >> >> >> >> >>> Hi Cristian,
>> >>> >> >> >> >> >> >> >> >>>>
>> >>> >> >> >> >> >> >> >> >>>> definitely an interesting approach. You should have a look at
>> >>> >> >> >> >> >> >> >> >>>> Yago2 [1]. As far as I can remember the Yago taxonomy is much
>> >>> >> >> >> >> >> >> >> >>>> better structured than the one used by dbpedia. Mapping
>> >>> >> >> >> >> >> >> >> >>>> suggestions of dbpedia to concepts in Yago2 is easy as both
>> >>> >> >> >> >> >> >> >> >>>> dbpedia and yago2 provide mappings [2] and [3]
>> >>> >> >> >> >> >> >> >> >>>>
>> >>> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>> >>> >> >> >> >> >> >> >> >>>> >>
>> >>> >> >> >> >> >> >> >> >>>> >> "Microsoft posted its 2013 earnings. The Redmond's company
>> >>> >> >> >> >> >> >> >> >>>> >> made a huge profit".
>> >>> >> >> >> >> >> >> >> >>>>
>> >>> >> >> >> >> >> >> >> >>>> That's actually a very good example. Spatial contexts are very
>> >>> >> >> >> >> >> >> >> >>>> important as they tend to be often used for referencing. So I
>> >>> >> >> >> >> >> >> >> >>>> would suggest to specially treat the spatial context. For
>> >>> >> >> >> >> >> >> >> >>>> spatial Entities (like a City) this is easy, but even for
>> >>> >> >> >> >> >> >> >> >>>> others (like a Person, Company) you could use relations to
>> >>> >> >> >> >> >> >> >> >>>> spatial entities to define their spatial context. This context
>> >>> >> >> >> >> >> >> >> >>>> could then be used to correctly link "The Redmond's company"
>> >>> >> >> >> >> >> >> >> >>>> to "Microsoft".
>> >>> >> >> >> >> >> >> >> >>>>
>> >>> >> >> >> >> >> >> >> >>>> In addition I would suggest to use the "spatial" context of
>> >>> >> >> >> >> >> >> >> >>>> each entity (basically relation to entities that are cities,
>> >>> >> >> >> >> >> >> >> >>>> regions, countries) as a separate dimension, because those are
>> >>> >> >> >> >> >> >> >> >>>> very often used for coreferences.
>> >>> >> >> >> >> >> >> >> >>>>
>> >>> >> >> >> >> >> >> >> >>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
>> >>> >> >> >> >> >> >> >> >>>> [2] http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
>> >>> >> >> >> >> >> >> >> >>>> [3] http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
>> >>> >> >> >> >> >> >> >> >>>>
>> >>> >> >> >> >> >> >> >> >>>>
>> >>> >> >> >> >> >> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian
>> >>> Petroaca
>> >>> >> >> >> >> >> >> >> >>>> <cr...@gmail.com> wrote:
>> >>> >> >> >> >> >> >> >> >>>> > There are several dbpedia categories for each entity, in this
>> >>> >> >> >> >> >> >> >> >>>> > case for Microsoft we have :
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
>> >>> >> >> >> >> >> >> >> >>>> > category:Microsoft
>> >>> >> >> >> >> >> >> >> >>>> > category:Software_companies_of_the_United_States
>> >>> >> >> >> >> >> >> >> >>>> > category:Software_companies_based_in_Washington_(state)
>> >>> >> >> >> >> >> >> >> >>>> > category:Companies_established_in_1975
>> >>> >> >> >> >> >> >> >> >>>> > category:1975_establishments_in_the_United_States
>> >>> >> >> >> >> >> >> >> >>>> > category:Companies_based_in_Redmond,_Washington
>> >>> >> >> >> >> >> >> >> >>>> > category:Multinational_companies_headquartered_in_the_United_States
>> >>> >> >> >> >> >> >> >> >>>> > category:Cloud_computing_providers
>> >>> >> >> >> >> >> >> >> >>>> > category:Companies_in_the_Dow_Jones_Industrial_Average
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >> >> >>>> > So we also have "Companies based in Redmond, Washington" which
>> >>> >> >> >> >> >> >> >> >>>> > could be matched.
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >> >> >>>> > There is still other contextual information from dbpedia which
>> >>> >> >> >> >> >> >> >> >>>> > can be used.
>> >>> >> >> >> >> >> >> >> >>>> > For example for an Organization we could also include :
>> >>> >> >> >> >> >> >> >> >>>> > dbpprop:industry = Software
>> >>> >> >> >> >> >> >> >> >>>> > dbpprop:service = Online Service Providers
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >> >> >>>> > and for a Person (that's for Barack Obama) :
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >> >> >>>> > dbpedia-owl:profession:
>> >>> >> >> >> >> >> >> >> >>>> >     dbpedia:Author
>> >>> >> >> >> >> >> >> >> >>>> >     dbpedia:Constitutional_law
>> >>> >> >> >> >> >> >> >> >>>> >     dbpedia:Lawyer
>> >>> >> >> >> >> >> >> >> >>>> >     dbpedia:Community_organizing
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >> >> >>>> > I'd like to continue investigating this as I think that it may
>> >>> >> >> >> >> >> >> >> >>>> > have some value in increasing the number of coreference
>> >>> >> >> >> >> >> >> >> >>>> > resolutions and I'd like to concentrate more on precision
>> >>> >> >> >> >> >> >> >> >>>> > rather than recall since we already have a set of coreferences
>> >>> >> >> >> >> >> >> >> >>>> > detected by the stanford nlp tool and this would be an
>> >>> >> >> >> >> >> >> >> >>>> > addition to that (at least this is how I would like to use it).
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >> >> >>>> > Is it ok if I track this by opening a
>> jira? I
>> >>> >> could
>> >>> >> >> >> >> >> >> >> >>>> > update
>> >>> >> >> >> >> it
>> >>> >> >> >> >> >> to
>> >>> >> >> >> >> >> >> >> show
>> >>> >> >> >> >> >> >> >> >>>> my
>> >>> >> >> >> >> >> >> >> >>>> > progress and also my conclusions and if it
>> >>> turns
>> >>> >> out
>> >>> >> >> >> >> >> >> >> >>>> > that
>> >>> >> >> >> >> it
>> >>> >> >> >> >> >> was
>> >>> >> >> >> >> >> >> a
>> >>> >> >> >> >> >> >> >> bad
>> >>> >> >> >> >> >> >> >> >>>> idea
>> >>> >> >> >> >> >> >> >> >>>> > then that's the situation at least I'll
>> end up
>> >>> >> with
>> >>> >> >> >> >> >> >> >> >>>> > more
>> >>> >> >> >> >> >> >> knowledge
>> >>> >> >> >> >> >> >> >> >>>> about
>> >>> >> >> >> >> >> >> >> >>>> > Stanbol in the end :).
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
>> >>> >> >> >> >> >> >> >> >>>> > <rh...@apache.org>:
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >> >> >>>> >> Hi Cristian,
>> >>> >> >> >> >> >> >> >> >>>> >>
>> >>> >> >> >> >> >> >> >> >>>> >> The approach sounds nice. I don't want
>> to be
>> >>> the
>> >>> >> >> >> >> >> >> >> >>>> >> devil's
>> >>> >> >> >> >> >> >> advocate
>> >>> >> >> >> >> >> >> >> but
>> >>> >> >> >> >> >> >> >> >>>> I'm
>> >>> >> >> >> >> >> >> >> >>>> >> just not sure about the recall using the
>> >>> dbpedia
>> >>> >> >> >> >> categories
>> >>> >> >> >> >> >> >> >> feature.
>> >>> >> >> >> >> >> >> >> >>>> For
>> >>> >> >> >> >> >> >> >> >>>> >> example, your sentence could be also
>> >>> "Microsoft
>> >>> >> >> posted
>> >>> >> >> >> >> >> >> >> >>>> >> its
>> >>> >> >> >> >> >> 2013
>> >>> >> >> >> >> >> >> >> >>>> earnings.
>> >>> >> >> >> >> >> >> >> >>>> >> The Redmond's company made a huge
>> profit".
>> >>> So,
>> >>> >> maybe
>> >>> >> >> >> >> >> including
>> >>> >> >> >> >> >> >> more
>> >>> >> >> >> >> >> >> >> >>>> >> contextual information from dbpedia could
>> >>> >> increase
>> >>> >> >> the
>> >>> >> >> >> >> recall
>> >>> >> >> >> >> >> >> but
>> >>> >> >> >> >> >> >> >> of
>> >>> >> >> >> >> >> >> >> >>>> course
>> >>> >> >> >> >> >> >> >> >>>> >> will reduce the precision.
>> >>> >> >> >> >> >> >> >> >>>> >>
>> >>> >> >> >> >> >> >> >> >>>> >> Cheers,
>> >>> >> >> >> >> >> >> >> >>>> >> Rafa
>> >>> >> >> >> >> >> >> >> >>>> >>
>> >>> >> >> >> >> >> >> >> >>>> >> El 04/02/14 09:50, Cristian Petroaca
>> >>> escribió:
>> >>> >> >> >> >> >> >> >> >>>> >>
>> >>> >> >> >> >> >> >> >> >>>> >>  Back with a more detailed description
>> of the
>> >>> >> steps
>> >>> >> >> >> >> >> >> >> >>>> >> for
>> >>> >> >> >> >> >> making
>> >>> >> >> >> >> >> >> this
>> >>> >> >> >> >> >> >> >> >>>> kind of
>> >>> >> >> >> >> >> >> >> >>>> >>> coreference work.
>> >>> >> >> >> >> >> >> >> >>>> >>>
>> >>> >> >> >> >> >> >> >> >>>> >>> I will be using references to the
>> following
>> >>> >> text in
>> >>> >> >> >> >> >> >> >> >>>> >>> the
>> >>> >> >> >> >> >> steps
>> >>> >> >> >> >> >> >> >> below
>> >>> >> >> >> >> >> >> >> >>>> in
>> >>> >> >> >> >> >> >> >> >>>> >>> order to make things clearer :
>> "Microsoft
>> >>> posted
>> >>> >> >> its
>> >>> >> >> >> >> >> >> >> >>>> >>> 2013
>> >>> >> >> >> >> >> >> >> earnings.
>> >>> >> >> >> >> >> >> >> >>>> The
>> >>> >> >> >> >> >> >> >> >>>> >>> software company made a huge profit."
>> >>> >> >> >> >> >> >> >> >>>> >>>
>> >>> >> >> >> >> >> >> >> >>>> >>> 1. For every noun phrase in the text
>> which
>> >>> has :
>> >>> >> >> >> >> >> >> >> >>>> >>>      a. a determinate pos which implies
>> >>> >> reference
>> >>> >> >> to
>> >>> >> >> >> >> >> >> >> >>>> >>> an
>> >>> >> >> >> >> >> entity
>> >>> >> >> >> >> >> >> >> local
>> >>> >> >> >> >> >> >> >> >>>> to
>> >>> >> >> >> >> >> >> >> >>>> >>> the
>> >>> >> >> >> >> >> >> >> >>>> >>> text, such as "the, this, these") but
>> not
>> >>> >> "another,
>> >>> >> >> >> >> every",
>> >>> >> >> >> >> >> etc
>> >>> >> >> >> >> >> >> >> which
>> >>> >> >> >> >> >> >> >> >>>> >>> implies a reference to an entity
>> outside of
>> >>> the
>> >>> >> >> text.
>> >>> >> >> >> >> >> >> >> >>>> >>>      b. having at least another noun
>> aside
>> >>> from
>> >>> >> the
>> >>> >> >> >> >> >> >> >> >>>> >>> main
>> >>> >> >> >> >> >> >> required
>> >>> >> >> >> >> >> >> >> >>>> noun
>> >>> >> >> >> >> >> >> >> >>>> >>> which
>> >>> >> >> >> >> >> >> >> >>>> >>> further describes it. For example I
>> will not
>> >>> >> count
>> >>> >> >> >> >> >> >> >> >>>> >>> "The
>> >>> >> >> >> >> >> >> company"
>> >>> >> >> >> >> >> >> >> as
>> >>> >> >> >> >> >> >> >> >>>> being
>> >>> >> >> >> >> >> >> >> >>>> >>> a
>> >>> >> >> >> >> >> >> >> >>>> >>> legitimate candidate since this could
>> >>> create a
>> >>> >> lot
>> >>> >> >> of
>> >>> >> >> >> >> false
>> >>> >> >> >> >> >> >> >> >>>> positives by
>> >>> >> >> >> >> >> >> >> >>>> >>> considering the double meaning of some
>> words
>> >>> >> such
>> >>> >> >> as
>> >>> >> >> >> >> >> >> >> >>>> >>> "in
>> >>> >> >> >> >> the
>> >>> >> >> >> >> >> >> >> company
>> >>> >> >> >> >> >> >> >> >>>> of
>> >>> >> >> >> >> >> >> >> >>>> >>> good people".
>> >>> >> >> >> >> >> >> >> >>>> >>> "The software company" is a good
>> candidate
>> >>> >> since we
>> >>> >> >> >> >> >> >> >> >>>> >>> also
>> >>> >> >> >> >> >> have
>> >>> >> >> >> >> >> >> >> >>>> "software".
>> >>> >> >> >> >> >> >> >> >>>> >>>
>> >>> >> >> >> >> >> >> >> >>>> >>> 2. match the nouns in the noun phrase
>> to the
>> >>> >> >> contents
>> >>> >> >> >> >> >> >> >> >>>> >>> of
>> >>> >> >> >> >> the
>> >>> >> >> >> >> >> >> >> dbpedia
>> >>> >> >> >> >> >> >> >> >>>> >>> categories of each named entity found
>> prior
>> >>> to
>> >>> >> the
>> >>> >> >> >> >> location
>> >>> >> >> >> >> >> of
>> >>> >> >> >> >> >> >> the
>> >>> >> >> >> >> >> >> >> >>>> noun
>> >>> >> >> >> >> >> >> >> >>>> >>> phrase in the text.
>> >>> >> >> >> >> >> >> >> >>>> >>> The dbpedia categories are in the
>> following
>> >>> >> format
>> >>> >> >> >> >> >> >> >> >>>> >>> (for
>> >>> >> >> >> >> >> >> Microsoft
>> >>> >> >> >> >> >> >> >> for
>> >>> >> >> >> >> >> >> >> >>>> >>> example) : "Software companies of the
>> United
>> >>> >> >> States".
>> >>> >> >> >> >> >> >> >> >>>> >>>   So we try to match "software company"
>> with
>> >>> >> that.
>> >>> >> >> >> >> >> >> >> >>>> >>> First, as you can see, the main noun in
>> the
>> >>> >> dbpedia
>> >>> >> >> >> >> category
>> >>> >> >> >> >> >> >> has a
>> >>> >> >> >> >> >> >> >> >>>> plural
>> >>> >> >> >> >> >> >> >> >>>> >>> form and it's the same for all
>> categories
>> >>> which
>> >>> >> I
>> >>> >> >> >> >> >> >> >> >>>> >>> saw. I
>> >>> >> >> >> >> >> don't
>> >>> >> >> >> >> >> >> >> know
>> >>> >> >> >> >> >> >> >> >>>> if
>> >>> >> >> >> >> >> >> >> >>>> >>> there's an easier way to do this but I
>> >>> thought
>> >>> >> of
>> >>> >> >> >> >> applying a
>> >>> >> >> >> >> >> >> >> >>>> lemmatizer on
>> >>> >> >> >> >> >> >> >> >>>> >>> the category and the noun phrase in
>> order
>> >>> for
>> >>> >> them
>> >>> >> >> to
>> >>> >> >> >> >> have a
>> >>> >> >> >> >> >> >> >> common
>> >>> >> >> >> >> >> >> >> >>>> >>> denominator. This also works if the noun
>> >>> phrase
>> >>> >> >> itself
>> >>> >> >> >> >> has a
>> >>> >> >> >> >> >> >> plural
>> >>> >> >> >> >> >> >> >> >>>> form.
>> >>> >> >> >> >> >> >> >> >>>> >>>
>> >>> >> >> >> >> >> >> >> >>>> >>> Second, I'll need to use for comparison
>> >>> only the
>> >>> >> >> >> >> >> >> >> >>>> >>> words in
>> >>> >> >> >> >> >> the
>> >>> >> >> >> >> >> >> >> >>>> category
>> >>> >> >> >> >> >> >> >> >>>> >>> which are themselves nouns and not
>> >>> prepositions
>> >>> >> or
>> >>> >> >> >> >> >> determiners
>> >>> >> >> >> >> >> >> >> such
>> >>> >> >> >> >> >> >> >> >>>> as "of
>> >>> >> >> >> >> >> >> >> >>>> >>> the". This means that I need to pos tag
>> the
>> >>> >> >> categories
>> >>> >> >> >> >> >> contents
>> >>> >> >> >> >> >> >> as
>> >>> >> >> >> >> >> >> >> >>>> well.
>> >>> >> >> >> >> >> >> >> >>>> >>> I was thinking of running the pos and
>> lemma
>> >>> on
>> >>> >> the
>> >>> >> >> >> >> dbpedia
>> >>> >> >> >> >> >> >> >> >>>> categories when
>> >>> >> >> >> >> >> >> >> >>>> >>> building the dbpedia backed entity hub
>> and
>> >>> >> storing
>> >>> >> >> >> >> >> >> >> >>>> >>> them
>> >>> >> >> >> >> for
>> >>> >> >> >> >> >> >> later
>> >>> >> >> >> >> >> >> >> >>>> use - I
>> >>> >> >> >> >> >> >> >> >>>> >>> don't know how feasible this is at the
>> >>> moment.
>> >>> >> >> >> >> >> >> >> >>>> >>>
>> >>> >> >> >> >> >> >> >> >>>> >>> After this I can compare each noun in
>> the
>> >>> noun
>> >>> >> >> phrase
>> >>> >> >> >> >> with
>> >>> >> >> >> >> >> the
>> >>> >> >> >> >> >> >> >> >>>> equivalent
>> >>> >> >> >> >> >> >> >> >>>> >>> nouns in the categories and based on the
>> >>> number
>> >>> >> of
>> >>> >> >> >> >> matches I
>> >>> >> >> >> >> >> >> can
>> >>> >> >> >> >> >> >> >> >>>> create a
>> >>> >> >> >> >> >> >> >> >>>> >>> confidence level.
>> >>> >> >> >> >> >> >> >> >>>> >>>
>> >>> >> >> >> >> >> >> >> >>>> >>> 3. match the noun of the noun phrase
>> with
>> >>> the
>> >>> >> >> >> >> >> >> >> >>>> >>> rdf:type
>> >>> >> >> >> >> from
>> >>> >> >> >> >> >> >> >> dbpedia
>> >>> >> >> >> >> >> >> >> >>>> of the
>> >>> >> >> >> >> >> >> >> >>>> >>> named entity. If this matches increase
>> the
>> >>> >> >> confidence
>> >>> >> >> >> >> level.
>> >>> >> >> >> >> >> >> >> >>>> >>>
>> >>> >> >> >> >> >> >> >> >>>> >>> 4. If there are multiple named entities
>> >>> which
>> >>> >> can
>> >>> >> >> >> >> >> >> >> >>>> >>> match a
>> >>> >> >> >> >> >> >> certain
>> >>> >> >> >> >> >> >> >> >>>> noun
>> >>> >> >> >> >> >> >> >> >>>> >>> phrase then link the noun phrase with
>> the
>> >>> >> closest
>> >>> >> >> >> >> >> >> >> >>>> >>> named
>> >>> >> >> >> >> >> entity
>> >>> >> >> >> >> >> >> >> prior
>> >>> >> >> >> >> >> >> >> >>>> to it
>> >>> >> >> >> >> >> >> >> >>>> >>> in the text.
>> >>> >> >> >> >> >> >> >> >>>> >>>
>> >>> >> >> >> >> >> >> >> >>>> >>> What do you think?
>> >>> >> >> >> >> >> >> >> >>>> >>>
>> >>> >> >> >> >> >> >> >> >>>> >>> Cristian
>> >>> >> >> >> >> >> >> >> >>>> >>>
>> >>> >> >> >> >> >> >> >> >>>> >>> 2014-01-31 Cristian Petroaca <
>> >>> >> >> >> >> cristian.petroaca@gmail.com>:
>> >>> >> >> >> >> >> >> >> >>>> >>>
>> >>> >> >> >> >> >> >> >> >>>> >>>  Hi Rafa,
>> >>> >> >> >> >> >> >> >> >>>> >>>>
>> >>> >> >> >> >> >> >> >> >>>> >>>> I don't yet have a concrete heuristic
>> but
>> >>> I'm
>> >>> >> >> >> >> >> >> >> >>>> >>>> working on
>> >>> >> >> >> >> >> it.
>> >>> >> >> >> >> >> >> I'll
>> >>> >> >> >> >> >> >> >> >>>> provide
>> >>> >> >> >> >> >> >> >> >>>> >>>> it here so that you guys can give me a
>> >>> >> feedback on
>> >>> >> >> >> >> >> >> >> >>>> >>>> it.
>> >>> >> >> >> >> >> >> >> >>>> >>>>
>> >>> >> >> >> >> >> >> >> >>>> >>>> What are "locality" features?
>> >>> >> >> >> >> >> >> >> >>>> >>>>
>> >>> >> >> >> >> >> >> >> >>>> >>>> I looked at Bart and other coref tools
>> >>> such as
>> >>> >> >> >> >> >> >> >> >>>> >>>> ArkRef
>> >>> >> >> >> >> and
>> >>> >> >> >> >> >> >> >> >>>> CherryPicker
>> >>> >> >> >> >> >> >> >> >>>> >>>> and
>> >>> >> >> >> >> >> >> >> >>>> >>>> they don't provide such a coreference.
>> >>> >> >> >> >> >> >> >> >>>> >>>>
>> >>> >> >> >> >> >> >> >> >>>> >>>> Cristian
>> >>> >> >> >> >> >> >> >> >>>> >>>>
>> >>> >> >> >> >> >> >> >> >>>> >>>>
>> >>> >> >> >> >> >> >> >> >>>> >>>> 2014-01-30 Rafa Haro <rharo@apache.org
>> >:
>> >>> >> >> >> >> >> >> >> >>>> >>>>
>> >>> >> >> >> >> >> >> >> >>>> >>>> Hi Cristian,
>> >>> >> >> >> >> >> >> >> >>>> >>>>
>> >>> >> >> >> >> >> >> >> >>>> >>>>> Without having more details about your
>> >>> >> concrete
>> >>> >> >> >> >> heuristic,
>> >>> >> >> >> >> >> >> in my
>> >>> >> >> >> >> >> >> >> >>>> honest
>> >>> >> >> >> >> >> >> >> >>>> >>>>> opinion, such approach could produce a
>> >>> lot of
>> >>> >> >> false
>> >>> >> >> >> >> >> >> positives. I
>> >>> >> >> >> >> >> >> >> >>>> don't
>> >>> >> >> >> >> >> >> >> >>>> >>>>> know
>> >>> >> >> >> >> >> >> >> >>>> >>>>> if you are planning to use some
>> "locality"
>> >>> >> >> features
>> >>> >> >> >> >> >> >> >> >>>> >>>>> to
>> >>> >> >> >> >> >> detect
>> >>> >> >> >> >> >> >> >> such
>> >>> >> >> >> >> >> >> >> >>>> >>>>> coreferences but you need to take into
>> >>> account
>> >>> >> >> that
>> >>> >> >> >> >> >> >> >> >>>> >>>>> it
>> >>> >> >> >> >> is
>> >>> >> >> >> >> >> >> quite
>> >>> >> >> >> >> >> >> >> >>>> usual
>> >>> >> >> >> >> >> >> >> >>>> >>>>> that
>> >>> >> >> >> >> >> >> >> >>>> >>>>> coreferenced mentions can occur even
>> in
>> >>> >> >> different
>> >>> >> >> >> >> >> >> paragraphs.
>> >>> >> >> >> >> >> >> >> >>>> Although
>> >>> >> >> >> >> >> >> >> >>>> >>>>> I'm
>> >>> >> >> >> >> >> >> >> >>>> >>>>> not an expert in Natural Language
>> >>> >> Understanding,
>> >>> >> >> I
>> >>> >> >> >> >> would
>> >>> >> >> >> >> >> say
>> >>> >> >> >> >> >> >> it
>> >>> >> >> >> >> >> >> >> is
>> >>> >> >> >> >> >> >> >> >>>> quite
>> >>> >> >> >> >> >> >> >> >>>> >>>>> difficult to get decent
>> precision/recall
>> >>> rates
>> >>> >> >> for
>> >>> >> >> >> >> >> >> coreferencing
>> >>> >> >> >> >> >> >> >> >>>> using
>> >>> >> >> >> >> >> >> >> >>>> >>>>> fixed rules. Maybe you can give a try
>> to
>> >>> >> others
>> >>> >> >> >> >> >> >> >> >>>> >>>>> tools
>> >>> >> >> >> >> like
>> >>> >> >> >> >> >> >> BART
>> >>> >> >> >> >> >> >> >> (
>> >>> >> >> >> >> >> >> >> >>>> >>>>> http://www.bart-coref.org/).
>> >>> >> >> >> >> >> >> >> >>>> >>>>>
>> >>> >> >> >> >> >> >> >> >>>> >>>>> Cheers,
>> >>> >> >> >> >> >> >> >> >>>> >>>>> Rafa Haro
>> >>> >> >> >> >> >> >> >> >>>> >>>>>
>> >>> >> >> >> >> >> >> >> >>>> >>>>> El 30/01/14 10:33, Cristian Petroaca
>> >>> escribió:
>> >>> >> >> >> >> >> >> >> >>>> >>>>>
>> >>> >> >> >> >> >> >> >> >>>> >>>>>   Hi,
>> >>> >> >> >> >> >> >> >> >>>> >>>>>
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> One of the necessary steps for
>> >>> implementing
>> >>> >> the
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Event
>> >>> >> >> >> >> >> >> >> extraction
>> >>> >> >> >> >> >> >> >> >>>> Engine
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> feature :
>> >>> >> >> >> >> >> >>
>> https://issues.apache.org/jira/browse/STANBOL-1121 is
>> >>> >> >> >> >> >> >> >> >>>> to
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> have
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> coreference resolution in the given
>> text.
>> >>> >> This
>> >>> >> >> is
>> >>> >> >> >> >> >> provided
>> >>> >> >> >> >> >> >> now
>> >>> >> >> >> >> >> >> >> >>>> via the
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> stanford-nlp project but as far as I
>> saw
>> >>> this
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> module
>> >>> >> >> >> >> is
>> >>> >> >> >> >> >> >> >> performing
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> mostly
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> pronominal (He, She) or nominal
>> (Barack
>> >>> Obama
>> >>> >> and
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Mr.
>> >>> >> >> >> >> >> Obama)
>> >>> >> >> >> >> >> >> >> >>>> coreference
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> resolution.
>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> In order to get more coreferences
>> from
>> >>> the
>> >>> >> text
>> >>> >> >> I
>> >>> >> >> >> >> thought
>> >>> >> >> >> >> >> of
>> >>> >> >> >> >> >> >> >> >>>> creating
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> some
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> logic that would detect this kind of
>> >>> >> >> coreference :
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> "Apple reaches new profit heights.
>> The
>> >>> >> software
>> >>> >> >> >> >> company
>> >>> >> >> >> >> >> just
>> >>> >> >> >> >> >> >> >> >>>> announced
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> its
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> 2013 earnings."
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Here "The software company" obviously
>> >>> refers
>> >>> >> to
>> >>> >> >> >> >> "Apple".
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> So I'd like to detect coreferences of
>> >>> Named
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Entities
>> >>> >> >> >> >> >> which
>> >>> >> >> >> >> >> >> are
>> >>> >> >> >> >> >> >> >> of
>> >>> >> >> >> >> >> >> >> >>>> the
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> rdf:type of the Named Entity , in
>> this
>> >>> case
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> "company"
>> >>> >> >> >> >> and
>> >>> >> >> >> >> >> >> also
>> >>> >> >> >> >> >> >> >> >>>> have
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> attributes which can be found in the
>> >>> dbpedia
>> >>> >> >> >> >> categories
>> >>> >> >> >> >> >> of
>> >>> >> >> >> >> >> >> the
>> >>> >> >> >> >> >> >> >> >>>> named
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> entity, in this case "software".
>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> The detection of coreferences such as
>> >>> "The
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> software
>> >>> >> >> >> >> >> >> company" in
>> >>> >> >> >> >> >> >> >> >>>> the
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> text
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> would also be done by either using
>> the
>> >>> new
>> >>> >> Pos
>> >>> >> >> Tag
>> >>> >> >> >> >> Based
>> >>> >> >> >> >> >> >> Phrase
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> extraction
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Engine (noun phrases) or by using a
>> >>> >> dependency
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> tree of
>> >>> >> >> >> >> >> the
>> >>> >> >> >> >> >> >> >> >>>> sentence and
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> picking up only subjects or objects.
>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> At this point I'd like to know if
>> this
>> >>> kind
>> >>> >> of
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> logic
>> >>> >> >> >> >> >> would
>> >>> >> >> >> >> >> >> be
>> >>> >> >> >> >> >> >> >> >>>> useful
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> as a
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> separate Enhancement Engine (in case
>> the
>> >>> >> >> precision
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> and
>> >>> >> >> >> >> >> >> recall
>> >>> >> >> >> >> >> >> >> are
>> >>> >> >> >> >> >> >> >> >>>> good
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> enough) in Stanbol?
>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Thanks,
>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Cristian
>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>> >>> >> >> >> >> >> >> >> >>>> >>
>> >>> >> >> >> >> >> >> >> >>>>
>> >>> >> >> >> >> >> >> >> >>>>
>> >>> >> >> >> >> >> >> >> >>>>
>> >>> >> >> >> >> >> >> >> >>>> --
>> >>> >> >> >> >> >> >> >> >>>> | Rupert Westenthaler
>> >>> >> >> >> >> rupert.westenthaler@gmail.com
>> >>> >> >> >> >> >> >> >> >>>> | Bodenlehenstraße 11
>> >>> >> >> >> >> >> >> ++43-699-11108907
>> >>> >> >> >> >> >> >> >> >>>> | A-5500 Bischofshofen

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Rupert Westenthaler <ru...@gmail.com>.

Hi Cristian,

I see several possible solutions:

1. The indexing tool does support LDPath. That means you can import
all the required RDF files and use LDPath to append the labels of the
Yago Types directly to the dbpedia entities.  This would avoid
additional lookups to retrieve the types, but would also increase the
size of the index a lot.
2. You could also index the Yago Types and use an additional Entityhub
lookup to retrieve them. In this case you should first collect all
types referenced by Entities in the processed text and in a second
step retrieve the labels. While this means additional lookups, it will
only load the labels for a type once. In addition you could use a
cache for types.
3. Your engine could use LDPath to retrieve the types. This would
require indexing the data as in option (2) and using an LDPath
statement similar to the one in (1). It would be the slowest solution
(as it requires an additional lookup for every extracted entity) but
would require the least code.
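
To make option (2) more concrete, here is a minimal sketch in Python of
the "collect first, then look up each type once" idea. It is not Stanbol
code: the `lookup` callable is a hypothetical stand-in for whatever
Entityhub query ends up being used, and all function names are made up
for illustration.

```python
from typing import Callable, Dict, Iterable, List, Set


def collect_type_uris(entity_types: Dict[str, List[str]]) -> Set[str]:
    """Return the union of all type URIs referenced by the entities of one text."""
    uris: Set[str] = set()
    for types in entity_types.values():
        uris.update(types)
    return uris


def resolve_labels(
    type_uris: Iterable[str],
    lookup: Callable[[str], List[str]],
    cache: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Fetch the labels of each type at most once.

    `lookup` is a hypothetical placeholder for the additional Entityhub
    lookup; `cache` can be shared across texts so known types are reused.
    """
    labels: Dict[str, List[str]] = {}
    for uri in sorted(set(type_uris)):
        if uri not in cache:
            cache[uri] = lookup(uri)  # the only place a lookup happens
        labels[uri] = cache[uri]
    return labels
```

With a shared cache, a type that appears on many entities, or in many
texts, costs a single lookup.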

regarding:

On Wed, May 7, 2014 at 9:02 AM, Cristian Petroaca
<cr...@gmail.com> wrote:
> I can get the labels from one of the yago downloads here :
> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoMultilingualClassLabels.txt.
> I'll need another yago download file to map the yago wordnet classes to
> dbpedia uris. That could be done via a script maybe.

I hope there is also an RDF file with those labels. In that case you
just need to add it to the resource/rdfdata directory.
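
If no RDF file with the labels exists, a small script could produce one.
The following Python sketch assumes the input has already been reduced to
tab-separated "class URI, label" rows; that layout is an assumption, the
real YAGO label file may look different and would need its own parsing.
The function names are made up for illustration.

```python
from typing import Iterable, Iterator

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text: str) -> str:
    """Escape the characters that must not appear raw in an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def labels_to_ntriples(rows: Iterable[str], lang: str = "en") -> Iterator[str]:
    """Yield one rdfs:label triple per well-formed "uri<TAB>label" row.

    The two-column, tab-separated layout is an assumption made for this
    sketch. Rows that do not have exactly two columns are skipped.
    """
    for line in rows:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        parts = line.split("\t")
        if len(parts) != 2:
            continue
        uri, label = parts
        yield f'<{uri}> <{RDFS_LABEL}> "{escape_literal(label)}"@{lang} .'
```

The resulting lines can be written to an .nt file and placed next to the
other RDF dumps before indexing.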

best
Rupert

-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                              ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO ..........................................................................
| http://redlink.co/

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Cristian Petroaca <cr...@gmail.com>.

Hi Rupert,

Looking into the yago_types.nt file, which assigns yago classes to dbpedia
entities, I realized that there are no yago class labels present; I only
have the class uri, like : <http://dbpedia/..something../President1829302/.
I also need the class labels so that I can compare them to the noun token's
string from the text.
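
As a rough illustration of that comparison, here is a Python sketch. It
uses a naive plural-stripping rule in place of a real lemmatizer, which
is a simplification for the example only; the function names are
hypothetical and not part of Stanbol.

```python
from typing import Iterable


def normalize(token: str) -> str:
    """Lowercase a token and strip a simple English plural ending.

    This is a crude stand-in for a lemmatizer, used only for illustration.
    """
    token = token.lower()
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def label_matches(label: str, nouns: Iterable[str]) -> bool:
    """Return True if any noun from the noun phrase occurs in the class label."""
    label_tokens = {normalize(t) for t in label.replace("_", " ").split()}
    return any(normalize(n) in label_tokens for n in nouns)
```

For example, the label "President" matches the noun "president", and a
label such as "Software companies" matches the noun "company".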

I can get the labels from one of the yago downloads here :
http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoMultilingualClassLabels.txt.
I'll need another yago download file to map the yago wordnet classes to
dbpedia uris. That could be done via a script maybe.

Once I have the dbpedia_yago_class_uri -> label file is it possible to
integrate this data in the dbpedia index and later be able to query the
labels from the 'dbpedia' Site? As you can see the file won't refer to the
actual dbpedia entity but to the yago class as being the subject in the
triple. So how would that work in the dbpedia indexing process? What should
I change in the mappings.txt file? Briefly looking through the mappings.txt
file this looks similar to how the skos categories are indexed.

Other than that, I saw that someone will be working on integrating YAGO as
part of Gsoc 2014. So maybe waiting for that is an option too but I don't
know what the extent of the integration will be.

Thanks,
Cristi


2014-04-29 11:06 GMT+03:00 Cristian Petroaca <cr...@gmail.com>:

> Ok, I think I kind of figured it out. If I want to use the dbpedia data
> index I need to use the SiteManager to get the Site with id = "dbpedia".
> Then I can query the Site directly.
>
> I have some additional questions though :
> 1. In my particular case I want to be able to also get the yago class of
> the given entity. These properties come with yago-types.nt file from
> dbpedia and this file is not present in the entityhub dbpedia data fetch
> scripts here :
> https://svn.apache.org/repos/asf/stanbol/trunk/entityhub/indexing/dbpedia/dbpedia-3.8/fetch_data_en_int.sh.
> Also this file comes with dbpedia 3.9. This means that I need to rebuild
> the dbpedia index data with 3.9 and the new yago-types.nt file. Is this
> correct?
>
> 2. I also need to be able to get some specific dbpedia properties from the
> index, such as dbpedia-owl:locationCity and others for a given entity. At
> the moment these are not available when doing a query on the dbpedia Site.
> I suppose I need to place them in
> https://svn.apache.org/repos/asf/stanbol/trunk/entityhub/indexing/dbpedia/src/main/resources/indexing/config/mappings.txt and do a rebuild of the dbpedia index?
>
> Thanks.
> Cristian
>
>
> 2014-04-28 16:58 GMT+03:00 Cristian Petroaca <cr...@gmail.com>
> :
>
>> Hi,
>>
>> I've started to implement the dbpedia properties logic and I'd like to
>> get some feedback on some things that I am doing :
>> I want to get a NER from the text and search for it in the dbpedia data
>> so that I can get certain dbpedia properties.
>> The way I'm trying to do this is by getting the NER_ANNOTATION chunk's
>> text and search that in the Entityhub ( which from what I saw is by default
>> configured with dbpedia data). I haven't yet performed a query to actually
>> get the data but before I continue I'd like to ask if this is the way to go?
>>
>> Thanks,
>> Cristian
>>
>>
>> 2014-03-28 15:12 GMT+02:00 Cristian Petroaca <cristian.petroaca@gmail.com
>> >:
>>
>>> Examples :
>>>
>>> 1. Group membership :
>>>     a. Spatial membership :
>>>
>>>         "Microsoft announced its 2013 earnings. <coref>The Richmond-based
>>> company</coref> made huge profits."
>>>
>>>     b. Organisational membership :
>>>
>>>        "Mick Jagger started a new solo album. <coref>The Rolling Stones
>>> singer</coref> did not say what the theme will be."
>>>
>>> 2. Functional membership :
>>>
>>>    "Allianz announced its 2013 earnings. <coref>The financial services
>>> company</coref> made a huge profit."
>>>
>>> 3.  If no matches were found for the current NER with rules from above
>>> then if the yago:class which matched has more than 2 nouns then we also
>>> consider this a good co-reference but with a lower confidence maybe.
>>>
>>>    "Boris Becker will take part in a demonstrative tennis match.
>>> <coref>The former tennis player</coref> will play again after 10 years."
>>>
>>>
>>> 2014-03-28 12:22 GMT+02:00 Rupert Westenthaler <
>>> rupert.westenthaler@gmail.com>:
>>>
>>>> Hi Cristian, all
>>>>
>>>> Looks good to me, but I am not sure if I got everything. If you could
>>>> provide example texts where those rules apply it would make it much
>>>> easier to understand.
>>>>
>>>> Instead of using dbpedia properties you should define your own domain
>>>> model (ontology). You can then align the dbpedia properties to your
>>>> model. This will allow it to apply this approach also to knowledge
>>>> bases other than dbpedia.
>>>>
>>>> For people new to this thread: The above message adds to the
>>>> suggestion first made by Cristian on 4th February. Also the following
>>>> 4 messages (until 7th Feb) provide additional context.
>>>>
>>>> best
>>>> Rupert
>>>>
>>>>
>>>> On Fri, Mar 28, 2014 at 9:23 AM, Cristian Petroaca
>>>> <cr...@gmail.com> wrote:
>>>> > Hi guys,
>>>> >
>>>> > After Rupert's last suggestions related to this enhancement engine I
>>>> > devised a more comprehensive algorithm for matching the noun phrases
>>>> > against the NER properties. Please take a look and let me know what you
>>>> > think. Thanks.
>>>> >
>>>> > The following rules will be applied to every noun phrase in order to
>>>> find
>>>> > co-references:
>>>> >
>>>> > 1. For each NER prior to the current noun phrase in the text match the
>>>> > yago:class label to the contents of the noun phrase.
>>>> >
>>>> > For the NERs which have a yago:class which matches, apply:
>>>> >
>>>> > 2. Group membership rules :
>>>> >
>>>> >     a. spatial membership : the NER is part of a Location. If the noun
>>>> > phrase contains a LOCATION or a demonym then check any location
>>>> properties
>>>> > of the matching NER.
>>>> >
>>>> >     If matching NER is a :
>>>> >     - person, match against :birthPlace, :region, :nationality
>>>> >     - organisation, match against :foundationPlace, :locationCity,
>>>> > :location, :hometown
>>>> >     - place, match against :country, :subdivisionName, :location,
>>>> >
>>>> >     Ex: The Italian President, The Richmond-based company
>>>> >
>>>> >     b. organisational membership : the NER is part of an
>>>> Organisation. If
>>>> > the noun phrase contains an ORGANISATION then check the following
>>>> > properties of the matching NER:
>>>> >
>>>> >     If matching NER is :
>>>> >     - person, match against :occupation, :associatedActs
>>>> >     - organisation ?
>>>> >     - location ?
>>>> >
>>>> > Ex: The Microsoft executive, The Pink Floyd singer
>>>> >
>>>> > 3. Functional description rule: the noun phrase describes what the
>>>> NER does
>>>> > conceptually.
>>>> > If there are no NERs in the noun phrase then match the following
>>>> properties
>>>> > of the matching NER to the contents of the noun phrase (aside from the
>>>> > nouns which are part of the yago:class) :
>>>> >
>>>> >    If NER is a:
>>>> >    - person ?
>>>> >    - organisation, match against :service, :industry, :genre
>>>> >    - location ?
>>>> >
>>>> > Ex:  The software company.
>>>> >
>>>> > 4. If no matches were found for the current NER with rules 2 or 3
>>>> then if
>>>> > the yago:class which matched has more than 2 nouns then we also
>>>> consider
>>>> > this a good co-reference but with a lower confidence maybe.
>>>> >
>>>> > Ex: The former tennis player, the theoretical physicist.
>>>> >
>>>> > 5. Based on the number of nouns which matched we create a confidence
>>>> level.
>>>> > The number of matched nouns cannot be lower than 2 and we must have a
>>>> > yago:class match.
>>>> >
>>>> > For all NERs which got to this point, select the closest ones in the
>>>> text
>>>> > to the noun phrase which matched against the same properties
>>>> (yago:class
>>>> > and dbpedia) and mark them as co-references.
>>>> >
>>>> > Note: all noun phrases need to be lemmatized before all of this in
>>>> case
>>>> > there are any plurals.
>>>> >
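The rules above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class `CorefRuleSketch`, the enum `NerType`, and both method names are hypothetical and are not part of Stanbol. The property names come from rule 2a; the confidence formula (the fraction of the noun phrase's nouns that matched) is an assumption, since the message only says that confidence depends on the number of matched nouns.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of rules 2a and 5; none of these names are Stanbol API. */
public class CorefRuleSketch {

    enum NerType { PERSON, ORGANISATION, PLACE }

    /** Rule 2a: the location properties to check, chosen by the type of the matching NER. */
    static List<String> spatialProperties(NerType type) {
        switch (type) {
            case PERSON:
                return Arrays.asList("birthPlace", "region", "nationality");
            case ORGANISATION:
                return Arrays.asList("foundationPlace", "locationCity", "location", "hometown");
            case PLACE:
                return Arrays.asList("country", "subdivisionName", "location");
            default:
                return Collections.emptyList();
        }
    }

    /**
     * Rule 5: a confidence value derived from the number of matched nouns.
     * A yago:class match and at least 2 matched nouns are required (from the rules);
     * the ratio used as the score is an assumption made for this sketch.
     */
    static double confidence(boolean yagoClassMatched, int matchedNouns, int totalNouns) {
        if (!yagoClassMatched || matchedNouns < 2 || totalNouns <= 0) {
            return 0.0;
        }
        return Math.min(1.0, (double) matchedNouns / totalNouns);
    }

    public static void main(String[] args) {
        System.out.println(spatialProperties(NerType.ORGANISATION));
        System.out.println(confidence(true, 2, 3));
    }
}
```

A score of 0.0 here simply means "not a co-reference candidate"; a real engine would probably also weight which rule (2, 3 or 4) produced the match.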
>>>> >
>>>> > 2014-03-25 20:50 GMT+02:00 Cristian Petroaca <
>>>> cristian.petroaca@gmail.com>:
>>>> >
>>>> >> That worked. Thanks.
>>>> >>
>>>> >> So, there are no exceptions during the startup of the launcher.
>>>> >> The component tab in the felix console shows 6 WeightedChains the
>>>> first
>>>> >> time, including the default one but after my changes and a restart
>>>> there
>>>> >> are only 5 - the default one is missing altogether.
>>>> >>
>>>> >>
>>>> >> 2014-03-24 20:18 GMT+02:00 Rupert Westenthaler <
>>>> >> rupert.westenthaler@gmail.com>:
>>>> >>
>>>> >> Hi Cristian,
>>>> >>>
>>>> >>> I do see the same problem since last Friday. The solution as
>>>> >>> mentioned by [1] works for me.
>>>> >>>
>>>> >>>     mvn -Djsse.enableSNIExtension=false {goals}
>>>> >>>
>>>> >>> No idea why https connections to github do currently cause this. I
>>>> >>> could not find anything related via Google. So I suggest to use the
>>>> >>> system property for now. If this persists for longer we can adapt
>>>> the
>>>> >>> build files accordingly.
>>>> >>>
>>>> >>> best
>>>> >>> Rupert
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> [1]
>>>> >>>
>>>> http://stackoverflow.com/questions/7615645/ssl-handshake-alert-unrecognized-name-error-since-upgrade-to-java-1-7-0
>>>> >>>
>>>> >>> On Mon, Mar 24, 2014 at 7:01 PM, Cristian Petroaca
>>>> >>> <cr...@gmail.com> wrote:
>>>> >>> > I did a clean on the whole project and now I wanted to do another
>>>> "mvn
>>>> >>> > clean install" but I am getting this :
>>>> >>> >
>>>> >>> > "[INFO] ------------------------------------------------------------------------
>>>> >>> > [ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (download) on project org.apache.stanbol.data.opennlp.lang.es: An Ant BuildException has occured: The following error occurred while executing this line:
>>>> >>> > [ERROR] C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\download_models.xml:33: Failed to copy https://github.com/utcompling/OpenNLP-Models/raw/58ef0c60031403e66e47ae35edaf58d3478b67af/models/es/opennlp-es-maxent-pos-es.bin to C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\downloads\resources\org\apache\stanbol\data\opennlp\es-pos-maxent.bin due to javax.net.ssl.SSLProtocolException handshake alert : unrecognized_name"
>>>> >>> >
>>>> >>> >
>>>> >>> >
>>>> >>> > 2014-03-20 11:25 GMT+02:00 Rupert Westenthaler <
>>>> >>> > rupert.westenthaler@gmail.com>:
>>>> >>> >
>>>> >>> >> Hi Cristian,
>>>> >>> >>
>>>> >>> >> On Thu, Mar 20, 2014 at 10:00 AM, Cristian Petroaca
>>>> >>> >> <cr...@gmail.com> wrote:
>>>> >>> >> >
>>>> >>> >>
>>>> >>>
>>>> stanbol.enhancer.chain.weighted.chain=["tika;optional","langdetect","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-ner","dbpediaLinking","entityhubExtraction","dbpedia-dereference","pos-chunker"]
>>>> >>> >> > service.ranking=I"-2147483648"
>>>> >>> >> > stanbol.enhancer.chain.name="default"
>>>> >>> >>
>>>> >>> >> Does look fine to me. Do you see any exception during the
>>>> startup of
>>>> >>> >> the launcher. Can you check the status of this component in the
>>>> >>> >> component tab of the felix web console [1] (search for
>>>> >>> >>
>>>> "org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain"). If
>>>> >>> >> you have multiple you can find the correct one by comparing the
>>>> >>> >> "Properties" with those in the configuration file.
>>>> >>> >>
>>>> >>> >> I guess that the corresponding service is in the 'unsatisfied' state as
>>>> you do
>>>> >>> >> not see it in the web interface. But if this is the case you
>>>> should
>>>> >>> >> also see the according exception in the log. You can also
>>>> manually
>>>> >>> >> stop/start the component. In this case the exception should be
>>>> >>> >> re-thrown and you do not need to search the log for it.
>>>> >>> >>
>>>> >>> >> best
>>>> >>> >> Rupert
>>>> >>> >>
>>>> >>> >>
>>>> >>> >> [1] http://localhost:8080/system/console/components
>>>> >>> >>
>>>> >>> >> >
>>>> >>> >> >
>>>> >>> >> >
>>>> >>> >> > 2014-03-20 7:39 GMT+02:00 Rupert Westenthaler <
>>>> >>> >> rupert.westenthaler@gmail.com
>>>> >>> >> >>:
>>>> >>> >> >
>>>> >>> >> >> Hi Cristian,
>>>> >>> >> >>
>>>> >>> >> >> you can not send attachments to the list. Please copy the
>>>> contents
>>>> >>> >> >> directly to the mail
>>>> >>> >> >>
>>>> >>> >> >> thx
>>>> >>> >> >> Rupert
>>>> >>> >> >>
>>>> >>> >> >> On Wed, Mar 19, 2014 at 9:20 PM, Cristian Petroaca
>>>> >>> >> >> <cr...@gmail.com> wrote:
>>>> >>> >> >> > The config attached.
>>>> >>> >> >> >
>>>> >>> >> >> >
>>>> >>> >> >> > 2014-03-19 9:09 GMT+02:00 Rupert Westenthaler
>>>> >>> >> >> > <ru...@gmail.com>:
>>>> >>> >> >> >
>>>> >>> >> >> >> Hi Cristian,
>>>> >>> >> >> >>
>>>> >>> >> >> >> can you provide the contents of the chain after your
>>>> >>> modifications?
>>>> >>> >> >> >> Would be interesting to test why the chain is no longer
>>>> active
>>>> >>> after
>>>> >>> >> >> >> the restart.
>>>> >>> >> >> >>
>>>> >>> >> >> >> You can find the config file in the 'stanbol/fileinstall'
>>>> folder.
>>>> >>> >> >> >>
>>>> >>> >> >> >> best
>>>> >>> >> >> >> Rupert
>>>> >>> >> >> >>
>>>> >>> >> >> >> On Tue, Mar 18, 2014 at 8:24 PM, Cristian Petroaca
>>>> >>> >> >> >> <cr...@gmail.com> wrote:
>>>> >>> >> >> >> > Related to the default chain selection rules : before
>>>> restart I
>>>> >>> >> had a
>>>> >>> >> >> >> > chain
>>>> >>> >> >> >> > with the name 'default' as in I could access it via
>>>> >>> >> >> >> > enhancer/chain/default.
>>>> >>> >> >> >> > Then I just added another engine to the 'default' chain.
>>>> I
>>>> >>> assumed
>>>> >>> >> >> that
>>>> >>> >> >> >> > after the restart the chain with the 'default' name
>>>> would be
>>>> >>> >> >> persisted.
>>>> >>> >> >> >> > So
>>>> >>> >> >> >> > the first rule should have been applied after the
>>>> restart as
>>>> >>> well.
>>>> >>> >> But
>>>> >>> >> >> >> > instead I cannot reach it via enhancer/chain/default
>>>> anymore
>>>> >>> so its
>>>> >>> >> >> >> > gone.
>>>> >>> >> >> >> > Anyway, this is not a big deal, it's not blocking me in
>>>> any
>>>> >>> way, I
>>>> >>> >> >> just
>>>> >>> >> >> >> > wanted to understand where the problem is.
>>>> >>> >> >> >> >
>>>> >>> >> >> >> >
>>>> >>> >> >> >> > 2014-03-18 7:15 GMT+02:00 Rupert Westenthaler
>>>> >>> >> >> >> > <rupert.westenthaler@gmail.com
>>>> >>> >> >> >> >>:
>>>> >>> >> >> >> >
>>>> >>> >> >> >> >> Hi Cristian
>>>> >>> >> >> >> >>
>>>> >>> >> >> >> >> On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca
>>>> >>> >> >> >> >> <cr...@gmail.com> wrote:
>>>> >>> >> >> >> >> > 1. Updated to the latest code and it's gone. Cool
>>>> >>> >> >> >> >> >
>>>> >>> >> >> >> >> > 2. I start the stable launcher -> create a new
>>>> instance of
>>>> >>> the
>>>> >>> >> >> >> >> > PosChunkerEngine -> add it to the default chain. At
>>>> this
>>>> >>> point
>>>> >>> >> >> >> >> > everything
>>>> >>> >> >> >> >> > looks good and works ok.
>>>> >>> >> >> >> >> > After I restart the server the default chain is gone
>>>> and
>>>> >>> >> instead I
>>>> >>> >> >> >> >> > see
>>>> >>> >> >> >> >> this
>>>> >>> >> >> >> >> > in the enhancement chains page : all-active (default,
>>>> id:
>>>> >>> 149,
>>>> >>> >> >> >> >> > ranking:
>>>> >>> >> >> >> >> 0,
>>>> >>> >> >> >> >> > impl: AllActiveEnginesChain ). all-active did not
>>>> contain
>>>> >>> the
>>>> >>> >> >> >> >> > 'default'
>>>> >>> >> >> >> >> > word before the restart.
>>>> >>> >> >> >> >> >
>>>> >>> >> >> >> >>
>>>> >>> >> >> >> >> Please note the default chain selection rules as
>>>> described at
>>>> >>> [1].
>>>> >>> >> >> You
>>>> >>> >> >> >> >> can also access chains under
>>>> >>> '/enhancer/chain/{chain-name}'
>>>> >>> >> >> >> >>
>>>> >>> >> >> >> >> best
>>>> >>> >> >> >> >> Rupert
>>>> >>> >> >> >> >>
>>>> >>> >> >> >> >> [1]
>>>> >>> >> >> >> >>
>>>> >>> >> >> >> >>
>>>> >>> >> >>
>>>> >>> >>
>>>> >>>
>>>> http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain
>>>> >>> >> >> >> >>
>>>> >>> >> >> >> >> > It looks like the config files are exactly what I
>>>> need.
>>>> >>> Thanks.
>>>> >>> >> >> >> >> >
>>>> >>> >> >> >> >> >
>>>> >>> >> >> >> >> > 2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <
>>>> >>> >> >> >> >> rupert.westenthaler@gmail.com
>>>> >>> >> >> >> >> >>:
>>>> >>> >> >> >> >> >
>>>> >>> >> >> >> >> >> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
>>>> >>> >> >> >> >> >> <cr...@gmail.com> wrote:
>>>> >>> >> >> >> >> >> > Thanks Rupert.
>>>> >>> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> > A couple more questions/issues :
>>>> >>> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> > 1. Whenever I start the stanbol server I'm seeing
>>>> this
>>>> >>> in the
>>>> >>> >> >> >> >> >> > console
>>>> >>> >> >> >> >> >> > output :
>>>> >>> >> >> >> >> >> >
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >> >> This should be fixed with STANBOL-1278 [1] [2]
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >> >> > 2. Whenever I restart the server the Weighted
>>>> Chains get
>>>> >>> >> messed
>>>> >>> >> >> >> >> >> > up. I
>>>> >>> >> >> >> >> >> > usually use the 'default' chain and add my engine
>>>> to it
>>>> >>> so
>>>> >>> >> there
>>>> >>> >> >> >> >> >> > are
>>>> >>> >> >> >> >> 11
>>>> >>> >> >> >> >> >> > engines in it. After the restart this chain now
>>>> contains
>>>> >>> >> around
>>>> >>> >> >> 23
>>>> >>> >> >> >> >> >> engines
>>>> >>> >> >> >> >> >> > in total.
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >> >> I was not able to replicate this. What I tried was
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >> >> (1) start up the stable launcher
>>>> >>> >> >> >> >> >> (2) add an additional engine to the default chain
>>>> >>> >> >> >> >> >> (3) restart the launcher
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >> >> The default chain was not changed after (2) and (3).
>>>> So I
>>>> >>> would
>>>> >>> >> >> need
>>>> >>> >> >> >> >> >> further information for knowing why this is
>>>> happening.
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >> >> Generally it is better to create your own chain
>>>> instance than
>>>> >>> >> >> modifying
>>>> >>> >> >> >> >> >> one that is provided by the default configuration. I
>>>> would
>>>> >>> also
>>>> >>> >> >> >> >> >> recommend that you keep your test configuration in
>>>> text
>>>> >>> files
>>>> >>> >> and
>>>> >>> >> >> to
>>>> >>> >> >> >> >> >> copy those to the 'stanbol/fileinstall' folder.
>>>> Doing so
>>>> >>> >> prevents
>>>> >>> >> >> you
>>>> >>> >> >> >> >> >> from manually entering the configuration after a
>>>> software
>>>> >>> >> update.
>>>> >>> >> >> >> >> >> The
>>>> >>> >> >> >> >> >> production-mode section [3] provides information on
>>>> how to
>>>> >>> do
>>>> >>> >> >> that.
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >> >> best
>>>> >>> >> >> >> >> >> Rupert
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >> >> [1]
>>>> https://issues.apache.org/jira/browse/STANBOL-1278
>>>> >>> >> >> >> >> >> [2] http://svn.apache.org/r1576623
>>>> >>> >> >> >> >> >> [3]
>>>> http://stanbol.apache.org/docs/trunk/production-mode
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >> >> > ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Error starting slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\startup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar (org.osgi.framework.BundleException: Unresolved constraint in bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0))))
>>>> >>> >> >> >> >> >> > org.osgi.framework.BundleException: Unresolved constraint in bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0)))
>>>> >>> >> >> >> >> >> >         at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
>>>> >>> >> >> >> >> >> >         at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
>>>> >>> >> >> >> >> >> >         at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
>>>> >>> >> >> >> >> >> >         at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264)
>>>> >>> >> >> >> >> >> >         at java.lang.Thread.run(Unknown Source)
>>>> >>> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> > Despite this the server starts fine and I can
>>>> use the
>>>> >>> >> >> enhancer
>>>> >>> >> >> >> >> fine.
>>>> >>> >> >> >> >> >> Do
>>>> >>> >> >> >> >> >> > you guys see this as well?
>>>> >>> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> > 2. Whenever I restart the server the Weighted
>>>> Chains get
>>>> >>> >> messed
>>>> >>> >> >> >> >> >> > up. I
>>>> >>> >> >> >> >> >> > usually use the 'default' chain and add my engine
>>>> to it
>>>> >>> so
>>>> >>> >> there
>>>> >>> >> >> >> >> >> > are
>>>> >>> >> >> >> >> 11
>>>> >>> >> >> >> >> >> > engines in it. After the restart this chain now
>>>> contains
>>>> >>> >> around
>>>> >>> >> >> 23
>>>> >>> >> >> >> >> >> engines
>>>> >>> >> >> >> >> >> > in total.
>>>> >>> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <
>>>> >>> >> >> >> >> >> rupert.westenthaler@gmail.com
>>>> >>> >> >> >> >> >> >>:
>>>> >>> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >> Hi Cristian,
>>>> >>> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> NER Annotations are typically available as both
>>>> >>> >> >> >> >> >> >> NlpAnnotations.NER_ANNOTATION and
>>>>  fise:TextAnnotation
>>>> >>> [1]
>>>> >>> >> in
>>>> >>> >> >> the
>>>> >>> >> >> >> >> >> >> enhancement metadata. As you are already
>>>> accessing the
>>>> >>> >> >> >> >> >> >> AnalysedText I
>>>> >>> >> >> >> >> >> >> would prefer using the
>>>>  NlpAnnotations.NER_ANNOTATION.
>>>> >>> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> best
>>>> >>> >> >> >> >> >> >> Rupert
>>>> >>> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> [1]
>>>> >>> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >>
>>>> >>> >> >> >> >>
>>>> >>> >> >>
>>>> >>> >>
>>>> >>>
>>>> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
>>>> >>> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian
>>>> Petroaca
>>>> >>> >> >> >> >> >> >> <cr...@gmail.com> wrote:
>>>> >>> >> >> >> >> >> >> > Thanks.
>>>> >>> >> >> >> >> >> >> > I assume I should get the Named entities using
>>>> the
>>>> >>> same
>>>> >>> >> but
>>>> >>> >> >> >> >> >> >> > with
>>>> >>> >> >> >> >> >> >> > NlpAnnotations.NER_ANNOTATION?
>>>> >>> >> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
>>>> >>> >> >> >> >> >> >> > rupert.westenthaler@gmail.com>:
>>>> >>> >> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >> >> Hallo Cristian,
>>>> >>> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >> NounPhrases are not added to the RDF
>>>> enhancement
>>>> >>> results.
>>>> >>> >> >> You
>>>> >>> >> >> >> >> need to
>>>> >>> >> >> >> >> >> >> >> use the AnalyzedText ContentPart [1]
>>>> >>> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >> here is some demo code you can use in the
>>>> >>> >> computeEnhancement
>>>> >>> >> >> >> >> method
>>>> >>> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >>         AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
>>>> >>> >> >> >> >> >> >> >>         Iterator<? extends Section> sections = at.getSentences();
>>>> >>> >> >> >> >> >> >> >>         if(!sections.hasNext()){ //process as single sentence
>>>> >>> >> >> >> >> >> >> >>             sections = Collections.singleton(at).iterator();
>>>> >>> >> >> >> >> >> >> >>         }
>>>> >>> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >>         while(sections.hasNext()){
>>>> >>> >> >> >> >> >> >> >>             Section section = sections.next();
>>>> >>> >> >> >> >> >> >> >>             Iterator<Span> chunks = section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
>>>> >>> >> >> >> >> >> >> >>             while(chunks.hasNext()){
>>>> >>> >> >> >> >> >> >> >>                 Span chunk = chunks.next();
>>>> >>> >> >> >> >> >> >> >>                 Value<PhraseTag> phrase = chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
>>>> >>> >> >> >> >> >> >> >>                 if(phrase.value().getCategory() == LexicalCategory.Noun){
>>>> >>> >> >> >> >> >> >> >>                     log.info(" - NounPhrase [{},{}] {}", new Object[]{
>>>> >>> >> >> >> >> >> >> >>                         chunk.getStart(),chunk.getEnd(),chunk.getSpan()});
>>>> >>> >> >> >> >> >> >> >>                 }
>>>> >>> >> >> >> >> >> >> >>             }
>>>> >>> >> >> >> >> >> >> >>         }
>>>> >>> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >> hope this helps
>>>> >>> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >> best
>>>> >>> >> >> >> >> >> >> >> Rupert
>>>> >>> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >> [1]
>>>> >>> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >>
>>>> >>> >> >> >> >>
>>>> >>> >> >>
>>>> >>> >>
>>>> >>>
>>>> http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
>>>> >>> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian
>>>> Petroaca
>>>> >>> >> >> >> >> >> >> >> <cr...@gmail.com> wrote:
>>>> >>> >> >> >> >> >> >> >> > I started to implement the engine and I'm
>>>> having
>>>> >>> >> problems
>>>> >>> >> >> >> >> >> >> >> > with
>>>> >>> >> >> >> >> >> getting
>>>> >>> >> >> >> >> >> >> >> > results for noun phrases. I modified the
>>>> "default"
>>>> >>> >> >> weighted
>>>> >>> >> >> >> >> chain
>>>> >>> >> >> >> >> >> to
>>>> >>> >> >> >> >> >> >> also
>>>> >>> >> >> >> >> >> >> >> > include the PosChunkerEngine and ran a
>>>> sample text
>>>> >>> :
>>>> >>> >> >> "Angela
>>>> >>> >> >> >> >> Merkel
>>>> >>> >> >> >> >> >> >> >> visited
>>>> >>> >> >> >> >> >> >> >> > China. The german chancellor met with various
>>>> >>> people".
>>>> >>> >> I
>>>> >>> >> >> >> >> expected
>>>> >>> >> >> >> >> >> that
>>>> >>> >> >> >> >> >> >> >> the
>>>> >>> >> >> >> >> >> >> >> > RDF XML output would contain some info about
>>>> the
>>>> >>> noun
>>>> >>> >> >> >> >> >> >> >> > phrases
>>>> >>> >> >> >> >> but I
>>>> >>> >> >> >> >> >> >> >> cannot
>>>> >>> >> >> >> >> >> >> >> > see any.
>>>> >>> >> >> >> >> >> >> >> > Could you point me to the correct way to
>>>> generate
>>>> >>> the
>>>> >>> >> noun
>>>> >>> >> >> >> >> phrases?
>>>> >>> >> >> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >> >> > Thanks,
>>>> >>> >> >> >> >> >> >> >> > Cristian
>>>> >>> >> >> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca
>>>> <
>>>> >>> >> >> >> >> >> >> >> cristian.petroaca@gmail.com>:
>>>> >>> >> >> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >> >> >> Opened
>>>> >>> >> >> https://issues.apache.org/jira/browse/STANBOL-1279
>>>> >>> >> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian
>>>> Petroaca <
>>>> >>> >> >> >> >> >> >> >> cristian.petroaca@gmail.com>
>>>> >>> >> >> >> >> >> >> >> >> :
>>>> >>> >> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >> >> Hi Rupert,
>>>> >>> >> >> >> >> >> >> >> >>>
>>>> >>> >> >> >> >> >> >> >> >>> The "spatial" dimension is a good idea.
>>>> I'll also
>>>> >>> >> take a
>>>> >>> >> >> >> >> >> >> >> >>> look
>>>> >>> >> >> >> >> at
>>>> >>> >> >> >> >> >> >> Yago.
>>>> >>> >> >> >> >> >> >> >> >>>
>>>> >>> >> >> >> >> >> >> >> >>> I will create a Jira with what we talked
>>>> about
>>>> >>> here.
>>>> >>> >> It
>>>> >>> >> >> >> >> >> >> >> >>> will
>>>> >>> >> >> >> >> >> >> probably
>>>> >>> >> >> >> >> >> >> >> >>> have just a draft-like description for now
>>>> and
>>>> >>> will
>>>> >>> >> be
>>>> >>> >> >> >> >> >> >> >> >>> updated
>>>> >>> >> >> >> >> >> as I
>>>> >>> >> >> >> >> >> >> go
>>>> >>> >> >> >> >> >> >> >> >>> along.
>>>> >>> >> >> >> >> >> >> >> >>>
>>>> >>> >> >> >> >> >> >> >> >>> Thanks,
>>>> >>> >> >> >> >> >> >> >> >>> Cristian
>>>> >>> >> >> >> >> >> >> >> >>>
>>>> >>> >> >> >> >> >> >> >> >>>
>>>> >>> >> >> >> >> >> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert
>>>> Westenthaler <
>>>> >>> >> >> >> >> >> >> >> >>> rupert.westenthaler@gmail.com>:
>>>> >>> >> >> >> >> >> >> >> >>>
>>>> >>> >> >> >> >> >> >> >> >>> Hi Cristian,
>>>> >>> >> >> >> >> >> >> >> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>> definitely an interesting approach. You
>>>> should
>>>> >>> have
>>>> >>> >> a
>>>> >>> >> >> >> >> >> >> >> >>>> look at
>>>> >>> >> >> >> >> >> Yago2
>>>> >>> >> >> >> >> >> >> >> >>>> [1]. As far as I can remember the Yago
>>>> taxonomy
>>>> >>> is
>>>> >>> >> much
>>>> >>> >> >> >> >> better
>>>> >>> >> >> >> >> >> >> >> >>>> structured than the one used by dbpedia.
>>>> Mapping
>>>> >>> >> >> >> >> >> >> >> >>>> suggestions of
>>>> >>> >> >> >> >> >> >> dbpedia
>>>> >>> >> >> >> >> >> >> >> >>>> to concepts in Yago2 is easy as both
>>>> dbpedia and
>>>> >>> >> yago2
>>>> >>> >> >> do
>>>> >>> >> >> >> >> >> provide
>>>> >>> >> >> >> >> >> >> >> >>>> mappings [2] and [3]
>>>> >>> >> >> >> >> >> >> >> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
>>>> >>> >> >> >> >> >> >> >> >>>> > <rh...@apache.org>:
>>>> >>> >> >> >> >> >> >> >> >>>> >>
>>>> >>> >> >> >> >> >> >> >> >>>> >> "Microsoft posted its 2013 earnings.
>>>> The
>>>> >>> >> Redmond's
>>>> >>> >> >> >> >> >> >> >> >>>> >> company
>>>> >>> >> >> >> >> >> made
>>>> >>> >> >> >> >> >> >> a
>>>> >>> >> >> >> >> >> >> >> >>>> >> huge profit".
>>>> >>> >> >> >> >> >> >> >> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>> That's actually a very good example.
>>>> Spatial
>>>> >>> contexts
>>>> >>> >> >> are
>>>> >>> >> >> >> >> >> >> >> >>>> very
>>>> >>> >> >> >> >> >> >> >> >>>> important as they tend to be often used
>>>> for
>>>> >>> >> >> referencing.
>>>> >>> >> >> >> >> >> >> >> >>>> So I
>>>> >>> >> >> >> >> >> would
>>>> >>> >> >> >> >> >> >> >> >>>> suggest to specially treat the spatial
>>>> context.
>>>> >>> For
>>>> >>> >> >> >> >> >> >> >> >>>> spatial
>>>> >>> >> >> >> >> >> >> Entities
>>>> >>> >> >> >> >> >> >> >> >>>> (like a City) this is easy, but even for
>>>> other
>>>> >>> >> (like a
>>>> >>> >> >> >> >> Person,
>>>> >>> >> >> >> >> >> >> >> >>>> Company) you could use relations to
>>>> spatial
>>>> >>> entities
>>>> >>> >> >> >> >> >> >> >> >>>> to define
>>>> >>> >> >> >> >> >> their
>>>> >>> >> >> >> >> >> >> >> >>>> spatial context. This context could then
>>>> be
>>>> >>> used to
>>>> >>> >> >> >> >> >> >> >> >>>> correctly
>>>> >>> >> >> >> >> >> link
>>>> >>> >> >> >> >> >> >> >> >>>> "The Redmond's company" to "Microsoft".
>>>> >>> >> >> >> >> >> >> >> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>> In addition I would suggest to use the
>>>> "spatial"
>>>> >>> >> >> context
>>>> >>> >> >> >> >> >> >> >> >>>> of
>>>> >>> >> >> >> >> each
>>>> >>> >> >> >> >> >> >> >> >>>> entity (basically relation to entities
>>>> that are
>>>> >>> >> cities,
>>>> >>> >> >> >> >> regions,
>>>> >>> >> >> >> >> >> >> >> >>>> countries) as a separate dimension,
>>>> because
>>>> >>> those
>>>> >>> >> are
>>>> >>> >> >> >> >> >> >> >> >>>> very
>>>> >>> >> >> >> >> often
>>>> >>> >> >> >> >> >> >> used
>>>> >>> >> >> >> >> >> >> >> >>>> for coreferences.
>>>> >>> >> >> >> >> >> >> >> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>> [1]
>>>> http://www.mpi-inf.mpg.de/yago-naga/yago/
>>>> >>> >> >> >> >> >> >> >> >>>> [2]
>>>> >>> >> >> >> >> >> >> >> >>>>
>>>> >>> >> >> http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
>>>> >>> >> >> >> >> >> >> >> >>>> [3]
>>>> >>> >> >> >> >> >> >> >> >>>>
>>>> >>> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >>
>>>> >>> >> >> >> >>
>>>> >>> >> >>
>>>> >>> >>
>>>> >>>
>>>> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
>>>> >>> >> >> >> >> >> >> >> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian
>>>> >>> Petroaca
>>>> >>> >> >> >> >> >> >> >> >>>> <cr...@gmail.com> wrote:
>>>> >>> >> >> >> >> >> >> >> >>>> > There are several dbpedia categories
>>>> for each
>>>> >>> >> entity,
>>>> >>> >> >> >> >> >> >> >> >>>> > in
>>>> >>> >> >> >> >> this
>>>> >>> >> >> >> >> >> >> case
>>>> >>> >> >> >> >> >> >> >> for
>>>> >>> >> >> >> >> >> >> >> >>>> > Microsoft we have :
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> category:Companies_in_the_NASDAQ-100_Index
>>>> >>> >> >> >> >> >> >> >> >>>> > category:Microsoft
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> category:Software_companies_of_the_United_States
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> category:Software_companies_based_in_Washington_(state)
>>>> >>> >> >> >> >> >> >> >> >>>> > category:Companies_established_in_1975
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> category:1975_establishments_in_the_United_States
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> category:Companies_based_in_Redmond,_Washington
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >>
>>>> >>> >> >>
>>>> category:Multinational_companies_headquartered_in_the_United_States
>>>> >>> >> >> >> >> >> >> >> >>>> > category:Cloud_computing_providers
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> category:Companies_in_the_Dow_Jones_Industrial_Average
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> >> >> >> >> >> >>>> > So we also have "Companies based in
>>>> >>> >> >> Redmond, Washington"
>>>> >>> >> >> >> >> which
>>>> >>> >> >> >> >> >> >> could
>>>> >>> >> >> >> >> >> >> >> be
>>>> >>> >> >> >> >> >> >> >> >>>> > matched.
>>>> >>> >> >> >> >> >> >> >> >>>> >
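To make the category match concrete, here is a minimal Java sketch that counts the words a noun phrase shares with a dbpedia category label. Everything in it is an assumption made for illustration: the class `CategoryOverlapSketch` and its methods are hypothetical, the suffix stripping is a crude stand-in for real lemmatization, and dropping tokens shorter than three letters (plus "the") is an arbitrary stop-word rule. It is not how Stanbol or the proposed engine does it.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: word overlap between a noun phrase and a category label. */
public class CategoryOverlapSketch {

    /** Crude singularisation standing in for a real lemmatizer (assumption). */
    static String normalise(String token) {
        if (token.endsWith("ies") && token.length() > 4) {
            return token.substring(0, token.length() - 3) + "y";   // companies -> company
        }
        if (token.endsWith("s") && !token.endsWith("ss") && token.length() > 3) {
            return token.substring(0, token.length() - 1);         // providers -> provider
        }
        return token;
    }

    /** Lower-cases, splits on anything that is not a letter, drops short tokens and "the". */
    static Set<String> tokens(String text) {
        Set<String> out = new LinkedHashSet<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (raw.length() < 3 || raw.equals("the")) {
                continue;
            }
            out.add(normalise(raw));
        }
        return out;
    }

    /** Number of normalised words shared by the noun phrase and the category label. */
    static int overlap(String nounPhrase, String categoryLabel) {
        Set<String> shared = tokens(nounPhrase);
        shared.retainAll(tokens(categoryLabel));
        return shared.size();
    }

    public static void main(String[] args) {
        String phrase = "The Redmond's company";
        System.out.println(overlap(phrase, "Companies_based_in_Redmond,_Washington"));
        System.out.println(overlap(phrase, "Companies_established_in_1975"));
    }
}
```

With the categories listed above, the Redmond category shares two words with the phrase ("redmond" and "company"), while most of the others share only "company", which is why the spatial category is the more useful signal here.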
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> >> >> >> >> >> >>>> > There is still other contextual
>>>> information
>>>> >>> from
>>>> >>> >> >> >> >> >> >> >> >>>> > dbpedia
>>>> >>> >> >> >> >> which
>>>> >>> >> >> >> >> >> >> can
>>>> >>> >> >> >> >> >> >> >> be
>>>> >>> >> >> >> >> >> >> >> >>>> used.
>>>> >>> >> >> >> >> >> >> >> >>>> > For example for an Organization we
>>>> could also
>>>> >>> >> >> include :
>>>> >>> >> >> >> >> >> >> >> >>>> > dbpprop:industry = Software
>>>> >>> >> >> >> >> >> >> >> >>>> > dbpprop:service = Online Service
>>>> Providers
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> >> >> >> >> >> >>>> > and for a Person (that's for Barack
>>>> Obama) :
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> >> >> >> >> >> >>>> > dbpedia-owl:profession:
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>>  dbpedia:Author
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> >> >> >> >> >> >>>> > dbpedia:Constitutional_law
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>>  dbpedia:Lawyer
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> >> >> >> >> >> >>>> > dbpedia:Community_organizing
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> >> >> >> >> >> >>>> > I'd like to continue investigating this
>>>> >>> >> >> >> >> >> >> >> >>>> > as I think that it may have some value in
>>>> >>> >> >> >> >> >> >> >> >>>> > increasing the number of coreference
>>>> >>> >> >> >> >> >> >> >> >>>> > resolutions and I'd like to concentrate
>>>> >>> >> >> >> >> >> >> >> >>>> > more on precision rather than recall since
>>>> >>> >> >> >> >> >> >> >> >>>> > we already have a set of coreferences
>>>> >>> >> >> >> >> >> >> >> >>>> > detected by the stanford nlp tool and this
>>>> >>> >> >> >> >> >> >> >> >>>> > would be as an addition to that (at least
>>>> >>> >> >> >> >> >> >> >> >>>> > this is how I would like to use it).
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> >> >> >> >> >> >>>> > Is it ok if I track this by opening a
>>>> >>> >> >> >> >> >> >> >> >>>> > jira? I could update it to show my
>>>> >>> >> >> >> >> >> >> >> >>>> > progress and also my conclusions and if it
>>>> >>> >> >> >> >> >> >> >> >>>> > turns out that it was a bad idea then
>>>> >>> >> >> >> >> >> >> >> >>>> > that's the situation at least I'll end up
>>>> >>> >> >> >> >> >> >> >> >>>> > with more knowledge about Stanbol in the
>>>> >>> >> >> >> >> >> >> >> >>>> > end :).
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
>>>> >>> >> >> >> >> >> >> >> >>>> > <rh...@apache.org>:
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> >> >> >> >> >> >>>> >> Hi Cristian,
>>>> >>> >> >> >> >> >> >> >> >>>> >>
>>>> >>> >> >> >> >> >> >> >> >>>> >> The approach sounds nice. I don't want to
>>>> >>> >> >> >> >> >> >> >> >>>> >> be the devil's advocate but I'm just not
>>>> >>> >> >> >> >> >> >> >> >>>> >> sure about the recall using the dbpedia
>>>> >>> >> >> >> >> >> >> >> >>>> >> categories feature. For example, your
>>>> >>> >> >> >> >> >> >> >> >>>> >> sentence could also be "Microsoft posted
>>>> >>> >> >> >> >> >> >> >> >>>> >> its 2013 earnings. The Redmond's company
>>>> >>> >> >> >> >> >> >> >> >>>> >> made a huge profit". So, maybe including
>>>> >>> >> >> >> >> >> >> >> >>>> >> more contextual information from dbpedia
>>>> >>> >> >> >> >> >> >> >> >>>> >> could increase the recall but of course
>>>> >>> >> >> >> >> >> >> >> >>>> >> will reduce the precision.
>>>> >>> >> >> >> >> >> >> >> >>>> >>
>>>> >>> >> >> >> >> >> >> >> >>>> >> Cheers,
>>>> >>> >> >> >> >> >> >> >> >>>> >> Rafa
>>>> >>> >> >> >> >> >> >> >> >>>> >>
>>>> >>> >> >> >> >> >> >> >> >>>> >> On 04/02/14 09:50, Cristian Petroaca wrote:
>>>> >>> >> >> >> >> >> >> >> >>>> >>
>>>> >>> >> >> >> >> >> >> >> >>>> >>> Back with a more detailed description of
>>>> >>> >> >> >> >> >> >> >> >>>> >>> the steps for making this kind of
>>>> >>> >> >> >> >> >> >> >> >>>> >>> coreference work.
>>>> >>> >> >> >> >> >> >> >> >>>> >>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>> I will be using references to the
>>>> >>> >> >> >> >> >> >> >> >>>> >>> following text in the steps below in
>>>> >>> >> >> >> >> >> >> >> >>>> >>> order to make things clearer : "Microsoft
>>>> >>> >> >> >> >> >> >> >> >>>> >>> posted its 2013 earnings. The software
>>>> >>> >> >> >> >> >> >> >> >>>> >>> company made a huge profit."
>>>> >>> >> >> >> >> >> >> >> >>>> >>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>> 1. For every noun phrase in the text
>>>> >>> >> >> >> >> >> >> >> >>>> >>> which has :
>>>> >>> >> >> >> >> >> >> >> >>>> >>>      a. a determinate pos which implies
>>>> >>> >> >> >> >> >> >> >> >>>> >>> reference to an entity local to the text,
>>>> >>> >> >> >> >> >> >> >> >>>> >>> such as "the, this, these" but not
>>>> >>> >> >> >> >> >> >> >> >>>> >>> "another, every", etc which implies a
>>>> >>> >> >> >> >> >> >> >> >>>> >>> reference to an entity outside of the
>>>> >>> >> >> >> >> >> >> >> >>>> >>> text.
>>>> >>> >> >> >> >> >> >> >> >>>> >>>      b. having at least another noun
>>>> >>> >> >> >> >> >> >> >> >>>> >>> aside from the main required noun which
>>>> >>> >> >> >> >> >> >> >> >>>> >>> further describes it. For example I will
>>>> >>> >> >> >> >> >> >> >> >>>> >>> not count "The company" as being a
>>>> >>> >> >> >> >> >> >> >> >>>> >>> legitimate candidate since this could
>>>> >>> >> >> >> >> >> >> >> >>>> >>> create a lot of false positives by
>>>> >>> >> >> >> >> >> >> >> >>>> >>> considering the double meaning of some
>>>> >>> >> >> >> >> >> >> >> >>>> >>> words such as "in the company of good
>>>> >>> >> >> >> >> >> >> >> >>>> >>> people".
>>>> >>> >> >> >> >> >> >> >> >>>> >>> "The software company" is a good
>>>> >>> >> >> >> >> >> >> >> >>>> >>> candidate since we also have "software".
>>>> >>> >> >> >> >> >> >> >> >>>> >>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>> 2. match the nouns in the noun phrase to
>>>> >>> >> >> >> >> >> >> >> >>>> >>> the contents of the dbpedia categories of
>>>> >>> >> >> >> >> >> >> >> >>>> >>> each named entity found prior to the
>>>> >>> >> >> >> >> >> >> >> >>>> >>> location of the noun phrase in the text.
>>>> >>> >> >> >> >> >> >> >> >>>> >>> The dbpedia categories are in the
>>>> >>> >> >> >> >> >> >> >> >>>> >>> following format (for Microsoft for
>>>> >>> >> >> >> >> >> >> >> >>>> >>> example) : "Software companies of the
>>>> >>> >> >> >> >> >> >> >> >>>> >>> United States".
>>>> >>> >> >> >> >> >> >> >> >>>> >>> So we try to match "software company"
>>>> >>> >> >> >> >> >> >> >> >>>> >>> with that.
>>>> >>> >> >> >> >> >> >> >> >>>> >>> First, as you can see, the main noun in
>>>> >>> >> >> >> >> >> >> >> >>>> >>> the dbpedia category has a plural form
>>>> >>> >> >> >> >> >> >> >> >>>> >>> and it's the same for all categories
>>>> >>> >> >> >> >> >> >> >> >>>> >>> which I saw. I don't know if there's an
>>>> >>> >> >> >> >> >> >> >> >>>> >>> easier way to do this but I thought of
>>>> >>> >> >> >> >> >> >> >> >>>> >>> applying a lemmatizer on the category and
>>>> >>> >> >> >> >> >> >> >> >>>> >>> the noun phrase in order for them to have
>>>> >>> >> >> >> >> >> >> >> >>>> >>> a common denominator. This also works if
>>>> >>> >> >> >> >> >> >> >> >>>> >>> the noun phrase itself has a plural form.
>>>> >>> >> >> >> >> >> >> >> >>>> >>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>> Second, I'll need to use for comparison
>>>> >>> >> >> >> >> >> >> >> >>>> >>> only the words in the category which are
>>>> >>> >> >> >> >> >> >> >> >>>> >>> themselves nouns and not prepositions or
>>>> >>> >> >> >> >> >> >> >> >>>> >>> determiners such as "of the". This means
>>>> >>> >> >> >> >> >> >> >> >>>> >>> that I need to pos tag the categories
>>>> >>> >> >> >> >> >> >> >> >>>> >>> contents as well.
>>>> >>> >> >> >> >> >> >> >> >>>> >>> I was thinking of running the pos and
>>>> >>> >> >> >> >> >> >> >> >>>> >>> lemma on the dbpedia categories when
>>>> >>> >> >> >> >> >> >> >> >>>> >>> building the dbpedia backed entity hub
>>>> >>> >> >> >> >> >> >> >> >>>> >>> and storing them for later use - I don't
>>>> >>> >> >> >> >> >> >> >> >>>> >>> know how feasible this is at the moment.
>>>> >>> >> >> >> >> >> >> >> >>>> >>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>> After this I can compare each noun in the
>>>> >>> >> >> >> >> >> >> >> >>>> >>> noun phrase with the equivalent nouns in
>>>> >>> >> >> >> >> >> >> >> >>>> >>> the categories and based on the number of
>>>> >>> >> >> >> >> >> >> >> >>>> >>> matches I can create a confidence level.
>>>> >>> >> >> >> >> >> >> >> >>>> >>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>> 3. match the noun of the noun phrase with
>>>> >>> >> >> >> >> >> >> >> >>>> >>> the rdf:type from dbpedia of the named
>>>> >>> >> >> >> >> >> >> >> >>>> >>> entity. If this matches increase the
>>>> >>> >> >> >> >> >> >> >> >>>> >>> confidence level.
>>>> >>> >> >> >> >> >> >> >> >>>> >>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>> 4. If there are multiple named entities
>>>> >>> >> >> >> >> >> >> >> >>>> >>> which can match a certain noun phrase
>>>> >>> >> >> >> >> >> >> >> >>>> >>> then link the noun phrase with the
>>>> >>> >> >> >> >> >> >> >> >>>> >>> closest named entity prior to it in the
>>>> >>> >> >> >> >> >> >> >> >>>> >>> text.
>>>> >>> >> >> >> >> >> >> >> >>>> >>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>> What do you think?
>>>> >>> >> >> >> >> >> >> >> >>>> >>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>> Cristian
>>>> >>> >> >> >> >> >> >> >> >>>> >>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>> 2014-01-31 Cristian Petroaca <cristian.petroaca@gmail.com>:
>>>> >>> >> >> >> >> >> >> >> >>>> >>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>> Hi Rafa,
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>> I don't yet have a concrete heuristic
>>>> >>> >> >> >> >> >> >> >> >>>> >>>> but I'm working on it. I'll provide it
>>>> >>> >> >> >> >> >> >> >> >>>> >>>> here so that you guys can give me
>>>> >>> >> >> >> >> >> >> >> >>>> >>>> feedback on it.
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>> What are "locality" features?
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>> I looked at Bart and other coref tools
>>>> >>> >> >> >> >> >> >> >> >>>> >>>> such as ArkRef and CherryPicker and they
>>>> >>> >> >> >> >> >> >> >> >>>> >>>> don't provide such a coreference.
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>> Cristian
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>> 2014-01-30 Rafa Haro <rharo@apache.org>:
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>> Hi Cristian,
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>> Without having more details about your
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>> concrete heuristic, in my honest
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>> opinion, such an approach could produce
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>> a lot of false positives. I don't know
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>> if you are planning to use some
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>> "locality" features to detect such
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>> coreferences but you need to take into
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>> account that it is quite usual that
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>> coreferenced mentions can occur even in
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>> different paragraphs. Although I'm not
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>> an expert in Natural Language
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>> Understanding, I would say it is quite
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>> difficult to get decent precision/recall
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>> rates for coreferencing using fixed
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>> rules. Maybe you can give a try to other
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>> tools like BART
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>> (http://www.bart-coref.org/).
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>> Cheers,
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>> Rafa Haro
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>> On 30/01/14 10:33, Cristian Petroaca wrote:
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Hi,
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> One of the necessary steps for
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> implementing the Event extraction
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Engine feature :
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> https://issues.apache.org/jira/browse/STANBOL-1121
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> is to have coreference resolution in
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> the given text. This is provided now
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> via the stanford-nlp project but as far
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> as I saw this module is performing
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> mostly pronominal (He, She) or nominal
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> (Barack Obama and Mr. Obama)
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> coreference resolution.
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> In order to get more coreferences from
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> the text I thought of creating some
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> logic that would detect this kind of
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> coreference :
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> "Apple reaches new profit heights. The
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> software company just announced its
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> 2013 earnings."
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Here "The software company" obviously
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> refers to "Apple".
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> So I'd like to detect coreferences of
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Named Entities which are of the
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> rdf:type of the Named Entity, in this
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> case "company", and also have
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> attributes which can be found in the
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> dbpedia categories of the named entity,
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> in this case "software".
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> The detection of coreferences such as
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> "The software company" in the text
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> would also be done by either using the
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> new Pos Tag Based Phrase extraction
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Engine (noun phrases) or by using a
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> dependency tree of the sentence and
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> picking up only subjects or objects.
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> At this point I'd like to know if this
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> kind of logic would be useful as a
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> separate Enhancement Engine (in case
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> the precision and recall are good
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> enough) in Stanbol?
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Thanks,
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Cristian
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>
>>>> >>> >> >> >> >> >> >> >> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>> --
>>>> >>> >> >> >> >> >> >> >> >>>> | Rupert Westenthaler
>>>> >>> >> >> >> >> rupert.westenthaler@gmail.com
>>>> >>> >> >> >> >> >> >> >> >>>> | Bodenlehenstraße 11
>>>> >>> >> >> >> >> >> >> ++43-699-11108907
>>>> >>> >> >> >> >> >> >> >> >>>> | A-5500 Bischofshofen
>>>> >>> >> >> >> >> >> >> >> >>>>

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Cristian Petroaca <cr...@gmail.com>.

Ok, I think I kind of figured it out. If I want to use the dbpedia data
index I need to use the SiteManager to get the Site with id = "dbpedia".
Then I can query the Site directly.
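
For quick experiments outside the OSGi container, the same kind of lookup
can also be tried over the Entityhub REST interface rather than through the
SiteManager Java API. This is a different route from the one described
above. A minimal sketch, assuming a default launcher on localhost:8080 and
the /entityhub/site/{siteId}/find endpoint with "name" and "limit" form
parameters; the helper name build_find_request is made up for illustration,
and the request is only built here, not sent:

```python
from urllib.parse import urlencode


def build_find_request(ner_text, site_id="dbpedia",
                       base_url="http://localhost:8080", limit=5):
    """Build (url, form_body) for a label search on one Entityhub site.

    Assumptions: the launcher address and the find endpoint with its
    'name' and 'limit' parameters; adjust them to the actual setup.
    """
    url = "{0}/entityhub/site/{1}/find".format(base_url.rstrip("/"), site_id)
    # urlencode escapes spaces and special characters in the NER text.
    body = urlencode({"name": ner_text, "limit": limit})
    return url, body


url, body = build_find_request("Microsoft")
print(url)
print(body)
```

The returned pair can be handed to any HTTP client as a POST with a
form-encoded body; the text of the NER_ANNOTATION chunk goes in as ner_text.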

I have some additional questions though :
1. In my particular case I also want to get the yago class of the given
entity. These properties come with the yago-types.nt file from dbpedia, and
this file is not fetched by the entityhub dbpedia data fetch script here:
https://svn.apache.org/repos/asf/stanbol/trunk/entityhub/indexing/dbpedia/dbpedia-3.8/fetch_data_en_int.sh
Also, this file comes with dbpedia 3.9. This means that I need to rebuild
the dbpedia index data with 3.9 and the new yago-types.nt file. Is this
correct?

2. I also need to be able to get some specific dbpedia properties from the
index, such as dbpedia-owl:locationCity and others, for a given entity. At
the moment these are not available when doing a query on the dbpedia Site.
I suppose I need to add them to
https://svn.apache.org/repos/asf/stanbol/trunk/entityhub/indexing/dbpedia/src/main/resources/indexing/config/mappings.txt
and do a rebuild of the dbpedia index?
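
The yago class and the location properties asked about above would feed
matching logic roughly like the following. This is only a sketch of the
yago:class check and the spatial membership rule from the 28 March mail
quoted below, under several assumptions: yago class URIs look like
http://dbpedia.org/class/yago/TennisPlayer110701180 (CamelCase label plus a
numeric id; the ids used here are only illustrative), the entity is a plain
dict of property values already fetched from the index, and all function
names are made up. No lemmatization is done.

```python
import re

# Location properties per NER type, as listed in the spatial membership rule.
SPATIAL_PROPS = {
    "person": ["birthPlace", "region", "nationality"],
    "organisation": ["foundationPlace", "locationCity", "location", "hometown"],
    "place": ["country", "subdivisionName", "location"],
}


def yago_class_words(uri):
    """Split the local name of a yago class URI into lowercase words."""
    local = re.sub(r"\d+$", "", uri.rsplit("/", 1)[-1])
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]+|[a-z]+", local)
    return [p.lower() for p in parts]


def tokens(phrase):
    """Lowercase word tokens; hyphenated words are split ("Redmond-based")."""
    return re.findall(r"[a-z0-9]+", phrase.lower())


def class_matches(noun_phrase, yago_uri):
    """True if every word of the yago class label occurs in the noun phrase."""
    words = yago_class_words(yago_uri)
    return bool(words) and set(words) <= set(tokens(noun_phrase))


def spatial_matches(noun_phrase, entity):
    """True if a location property value of the entity occurs in the phrase."""
    np_tokens = set(tokens(noun_phrase))
    for prop in SPATIAL_PROPS.get(entity["type"], []):
        for value in entity.get(prop, []):
            value_tokens = set(tokens(value))
            if value_tokens and value_tokens <= np_tokens:
                return True
    return False


# Hypothetical, hand-written entity data standing in for an index result.
microsoft = {
    "type": "organisation",
    "yago": "http://dbpedia.org/class/yago/Company108058098",
    "locationCity": ["Redmond"],
}
phrase = "The Redmond-based company"
print(class_matches(phrase, microsoft["yago"]))
print(spatial_matches(phrase, microsoft))
```

Without locationCity in the index the second check has nothing to compare
against, which is why the mappings matter for this rule.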

Thanks.
Cristian


2014-04-28 16:58 GMT+03:00 Cristian Petroaca <cr...@gmail.com>:

> Hi,
>
> I've started to implement the dbpedia properties logic and I'd like to get
> some feedback on some things that I am doing :
> I want to get a NER from the text and search for it in the dbpedia data so
> that I can get certain dbpedia properties.
> The way I'm trying to do this is by getting the NER_ANNOTATION chunk's
> text and searching for it in the Entityhub (which from what I saw is by
> default configured with dbpedia data). I haven't yet performed a query to
> actually get the data, but before I continue I'd like to ask if this is
> the way to go?
>
> Thanks,
> Cristian
>
>
> 2014-03-28 15:12 GMT+02:00 Cristian Petroaca <cr...@gmail.com>:
>
>> Examples :
>>
>> 1. Group membership :
>>     a. Spatial membership :
>>
>>         "Microsoft announced its 2013 earnings. <coref>The Redmond-based
>> company</coref> made huge profits."
>>
>>     b. Organisational membership :
>>
>>        "Mick Jagger started a new solo album. <coref>The Rolling Stones
>> singer</coref> did not say what the theme will be."
>>
>> 2. Functional membership :
>>
>>    "Allianz announced its 2013 earnings. <coref>The financial services
>> company</coref> made a huge profit."
>>
>> 3. If no matches were found for the current NER with the rules above,
>> but the yago:class which matched has more than 2 nouns, then we also
>> consider this a good co-reference, though with a lower confidence.
>>
>>    "Boris Becker will take part in a demonstrative tennis match.
>> <coref>The former tennis player</coref> will play again after 10 years."
>>
>>
>> 2014-03-28 12:22 GMT+02:00 Rupert Westenthaler <
>> rupert.westenthaler@gmail.com>:
>>
>>> Hi Cristian, all
>>>
>>> Looks good to me, but I am not sure if I got everything. If you could
>>> provide example texts where those rules apply it would make it much
>>> easier to understand.
>>>
>>> Instead of using dbpedia properties you should define your own domain
>>> model (ontology). You can then align the dbpedia properties to your
>>> model. This will allow this approach to be applied to knowledge
>>> bases other than dbpedia as well.
>>>
>>> For people new to this thread: The above message adds to the
>>> suggestion first made by Cristian on 4th February. Also the following
>>> 4 messages (until 7th Feb) provide additional context.
>>>
>>> best
>>> Rupert
>>>
>>>
>>> On Fri, Mar 28, 2014 at 9:23 AM, Cristian Petroaca
>>> <cr...@gmail.com> wrote:
>>> > Hi guys,
>>> >
>>> > After Rupert's last suggestions related to this enhancement engine I
>>> > devised a more comprehensive algorithm for matching the noun phrases
>>> > against the NER properties. Please take a look and let me know what
>>> > you think. Thanks.
>>> >
>>> > The following rules will be applied to every noun phrase in order
>>> > to find co-references:
>>> >
>>> > 1. For each NER prior to the current noun phrase in the text match the
>>> > yago:class label to the contents of the noun phrase.
>>> >
>>> > For the NERs which have a yago:class which matches, apply:
>>> >
>>> > 2. Group membership rules :
>>> >
>>> >     a. spatial membership : the NER is part of a Location. If the
>>> > noun phrase contains a LOCATION or a demonym then check any location
>>> > properties of the matching NER.
>>> >
>>> >     If the matching NER is a :
>>> >     - person, match against :birthPlace, :region, :nationality
>>> >     - organisation, match against :foundationPlace, :locationCity,
>>> > :location, :hometown
>>> >     - place, match against :country, :subdivisionName, :location
>>> >
>>> >     Ex: The Italian President, The Richmond-based company
>>> >
>>> >     b. organisational membership : the NER is part of an Organisation.
>>> > If the noun phrase contains an ORGANISATION then check the following
>>> > properties of the matching NER:
>>> >
>>> >     If matching NER is :
>>> >     - person, match against :occupation, :associatedActs
>>> >     - organisation ?
>>> >     - location ?
>>> >
>>> > Ex: The Microsoft executive, The Pink Floyd singer
>>> >
>>> > 3. Functional description rule: the noun phrase describes what the
>>> > NER does conceptually.
>>> > If there are no NERs in the noun phrase then match the following
>>> > properties of the matching NER to the contents of the noun phrase
>>> > (aside from the nouns which are part of the yago:class) :
>>> >
>>> >    If NER is a:
>>> >    - person ?
>>> >    - organisation, match against :service, :industry, :genre
>>> >    - location ?
>>> >
>>> > Ex:  The software company.
>>> >
>>> > 4. If no matches were found for the current NER with rules 2 or 3,
>>> > but the yago:class which matched has more than 2 nouns, then we also
>>> > consider this a good co-reference, though with a lower confidence.
>>> >
>>> > Ex: The former tennis player, the theoretical physicist.
>>> >
>>> > 5. Based on the number of nouns which matched we create a confidence
>>> > level. The number of matched nouns cannot be lower than 2 and we must
>>> > have a yago:class match.
>>> >
>>> > For all NERs which got to this point, select the closest ones in the
>>> > text to the noun phrase which matched against the same properties
>>> > (yago:class and dbpedia) and mark them as co-references.
>>> >
>>> > Note: all noun phrases need to be lemmatized before all of this in case
>>> > there are any plurals.
>>> >
>>> >
>>> > 2014-03-25 20:50 GMT+02:00 Cristian Petroaca <
>>> cristian.petroaca@gmail.com>:
>>> >
>>> >> That worked. Thanks.
>>> >>
>>> >> So, there are no exceptions during the startup of the launcher.
>>> >> The component tab in the felix console shows 6 WeightedChains the
>>> >> first time, including the default one, but after my changes and a
>>> >> restart there are only 5 - the default one is missing altogether.
>>> >>
>>> >>
>>> >> 2014-03-24 20:18 GMT+02:00 Rupert Westenthaler <
>>> >> rupert.westenthaler@gmail.com>:
>>> >>
>>> >> Hi Cristian,
>>> >>>
>>> >>> I have been seeing the same problem since last Friday. The solution
>>> >>> mentioned in [1] works for me.
>>> >>>
>>> >>>     mvn -Djsse.enableSNIExtension=false {goals}
>>> >>>
>>> >>> No idea why https connections to github currently cause this. I
>>> >>> could not find anything related via Google. So I suggest using the
>>> >>> system property for now. If this persists for longer we can adapt the
>>> >>> build files accordingly.
>>> >>>
>>> >>> best
>>> >>> Rupert
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> [1]
>>> >>>
>>> http://stackoverflow.com/questions/7615645/ssl-handshake-alert-unrecognized-name-error-since-upgrade-to-java-1-7-0
>>> >>>
>>> >>> On Mon, Mar 24, 2014 at 7:01 PM, Cristian Petroaca
>>> >>> <cr...@gmail.com> wrote:
>>> >>> > I did a clean on the whole project and now I wanted to do another
>>> >>> > "mvn clean install" but I am getting this :
>>> >>> >
>>> >>> > "[INFO]
>>> >>> >
>>> ------------------------------------------------------------------------
>>> >>> > [ERROR] Failed to execute goal
>>> >>> > org.apache.maven.plugins:maven-antrun-plugin:1.6:
>>> >>> > run (download) on project org.apache.stanbol.data.opennlp.lang.es:
>>> An
>>> >>> Ant
>>> >>> > BuildE
>>> >>> > xception has occured: The following error occurred while executing
>>> this
>>> >>> > line:
>>> >>> > [ERROR]
>>> >>> >
>>> C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\download_models.xml:3
>>> >>> > 3: Failed to copy
>>> >>> > https://github.com/utcompling/OpenNLP-Models/raw/58ef0c6003140
>>> >>> > 3e66e47ae35edaf58d3478b67af/models/es/opennlp-es-maxent-pos-es.bin
>>> to
>>> >>> > C:\Data\Pr
>>> >>> >
>>> >>>
>>> ojects\Stanbol\main\data\opennlp\lang\es\downloads\resources\org\apache\stanbol\
>>> >>> > data\opennlp\es-pos-maxent.bin due to
>>> javax.net.ssl.SSLProtocolException
>>> >>> > handshake alert : unrecognized_name"
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > 2014-03-20 11:25 GMT+02:00 Rupert Westenthaler <
>>> >>> > rupert.westenthaler@gmail.com>:
>>> >>> >
>>> >>> >> Hi Cristian,
>>> >>> >>
>>> >>> >> On Thu, Mar 20, 2014 at 10:00 AM, Cristian Petroaca
>>> >>> >> <cr...@gmail.com> wrote:
>>> >>> >> >
>>> >>> >>
>>> >>>
>>> stanbol.enhancer.chain.weighted.chain=["tika;optional","langdetect","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-ner","dbpediaLinking","entityhubExtraction","dbpedia-dereference","pos-chunker"]
>>> >>> >> > service.ranking=I"-2147483648"
>>> >>> >> > stanbol.enhancer.chain.name="default"
>>> >>> >>
>>> >>> >> Looks fine to me. Do you see any exception during the startup of
>>> >>> >> the launcher? Can you check the status of this component in the
>>> >>> >> component tab of the felix web console [1] (search for
>>> >>> >> "org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain")?
>>> >>> >> If you have multiple, you can find the correct one by comparing the
>>> >>> >> "Properties" with those in the configuration file.
>>> >>> >>
>>> >>> >> I guess that the corresponding service is in the 'unsatisfied'
>>> >>> >> state, as you do not see it in the web interface. If this is the
>>> >>> >> case you should also see the corresponding exception in the log.
>>> >>> >> You can also manually stop/start the component. In this case the
>>> >>> >> exception should be re-thrown and you do not need to search the
>>> >>> >> log for it.
>>> >>> >>
>>> >>> >> best
>>> >>> >> Rupert
>>> >>> >>
>>> >>> >>
>>> >>> >> [1] http://localhost:8080/system/console/components
>>> >>> >>
>>> >>> >> >
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > 2014-03-20 7:39 GMT+02:00 Rupert Westenthaler <
>>> >>> >> rupert.westenthaler@gmail.com
>>> >>> >> >>:
>>> >>> >> >
>>> >>> >> >> Hi Cristian,
>>> >>> >> >>
>>> >>> >> >> you can not send attachments to the list. Please copy the
>>> contents
>>> >>> >> >> directly to the mail
>>> >>> >> >>
>>> >>> >> >> thx
>>> >>> >> >> Rupert
>>> >>> >> >>
>>> >>> >> >> On Wed, Mar 19, 2014 at 9:20 PM, Cristian Petroaca
>>> >>> >> >> <cr...@gmail.com> wrote:
>>> >>> >> >> > The config attached.
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > 2014-03-19 9:09 GMT+02:00 Rupert Westenthaler
>>> >>> >> >> > <ru...@gmail.com>:
>>> >>> >> >> >
>>> >>> >> >> >> Hi Cristian,
>>> >>> >> >> >>
>>> >>> >> >> >> can you provide the contents of the chain after your modifications?
>>> >>> >> >> >> Would be interesting to test why the chain is no longer active after
>>> >>> >> >> >> the restart.
>>> >>> >> >> >>
>>> >>> >> >> >> You can find the config file in the 'stanbol/fileinstall' folder.
>>> >>> >> >> >>
>>> >>> >> >> >> best
>>> >>> >> >> >> Rupert
>>> >>> >> >> >>
>>> >>> >> >> >> On Tue, Mar 18, 2014 at 8:24 PM, Cristian Petroaca
>>> >>> >> >> >> <cr...@gmail.com> wrote:
>>> >>> >> >> >> > Related to the default chain selection rules : before restart I had a
>>> >>> >> >> >> > chain with the name 'default' as in I could access it via
>>> >>> >> >> >> > enhancer/chain/default.
>>> >>> >> >> >> > Then I just added another engine to the 'default' chain. I assumed that
>>> >>> >> >> >> > after the restart the chain with the 'default' name would be persisted.
>>> >>> >> >> >> > So the first rule should have been applied after the restart as well.
>>> >>> >> >> >> > But instead I cannot reach it via enhancer/chain/default anymore so it's
>>> >>> >> >> >> > gone.
>>> >>> >> >> >> > Anyway, this is not a big deal, it's not blocking me in any way, I just
>>> >>> >> >> >> > wanted to understand where the problem is.
>>> >>> >> >> >> >
>>> >>> >> >> >> >
>>> >>> >> >> >> > 2014-03-18 7:15 GMT+02:00 Rupert Westenthaler
>>> >>> >> >> >> > <rupert.westenthaler@gmail.com
>>> >>> >> >> >> >>:
>>> >>> >> >> >> >
>>> >>> >> >> >> >> Hi Cristian
>>> >>> >> >> >> >>
>>> >>> >> >> >> >> On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca
>>> >>> >> >> >> >> <cr...@gmail.com> wrote:
>>> >>> >> >> >> >> > 1. Updated to the latest code and it's gone. Cool
>>> >>> >> >> >> >> >
>>> >>> >> >> >> >> > 2. I start the stable launcher -> create a new instance of the
>>> >>> >> >> >> >> > PosChunkerEngine -> add it to the default chain. At this point
>>> >>> >> >> >> >> > everything looks good and works ok.
>>> >>> >> >> >> >> > After I restart the server the default chain is gone and instead I
>>> >>> >> >> >> >> > see this in the enhancement chains page : all-active (default,
>>> >>> >> >> >> >> > id: 149, ranking: 0, impl: AllActiveEnginesChain ). all-active did
>>> >>> >> >> >> >> > not contain the 'default' word before the restart.
>>> >>> >> >> >> >> >
>>> >>> >> >> >> >>
>>> >>> >> >> >> >> Please note the default chain selection rules as described at [1].
>>> >>> >> >> >> >> You can also access chains under '/enhancer/chain/{chain-name}'
>>> >>> >> >> >> >>
>>> >>> >> >> >> >> best
>>> >>> >> >> >> >> Rupert
>>> >>> >> >> >> >>
>>> >>> >> >> >> >> [1]
>>> >>> >> >> >> >>
>>> >>> >> >> >> >>
>>> >>> >> >>
>>> >>> >>
>>> >>>
>>> http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain
>>> >>> >> >> >> >>
>>> >>> >> >> >> >> > It looks like the config files are exactly what I need.
>>> >>> Thanks.
>>> >>> >> >> >> >> >
>>> >>> >> >> >> >> >
>>> >>> >> >> >> >> > 2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <
>>> >>> >> >> >> >> rupert.westenthaler@gmail.com
>>> >>> >> >> >> >> >>:
>>> >>> >> >> >> >> >
>>> >>> >> >> >> >> >> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
>>> >>> >> >> >> >> >> <cr...@gmail.com> wrote:
>>> >>> >> >> >> >> >> > Thanks Rupert.
>>> >>> >> >> >> >> >> >
>>> >>> >> >> >> >> >> > A couple more questions/issues :
>>> >>> >> >> >> >> >> >
>>> >>> >> >> >> >> >> > 1. Whenever I start the stanbol server I'm seeing
>>> this
>>> >>> in the
>>> >>> >> >> >> >> >> > console
>>> >>> >> >> >> >> >> > output :
>>> >>> >> >> >> >> >> >
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >> This should be fixed with STANBOL-1278 [1] [2]
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >> > 2. Whenever I restart the server the Weighted Chains get messed
>>> >>> >> >> >> >> >> > up. I usually use the 'default' chain and add my engine to it so
>>> >>> >> >> >> >> >> > there are 11 engines in it. After the restart this chain now
>>> >>> >> >> >> >> >> > contains around 23 engines in total.
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >> I was not able to replicate this. What I tried was
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >> (1) start up the stable launcher
>>> >>> >> >> >> >> >> (2) add an additional engine to the default chain
>>> >>> >> >> >> >> >> (3) restart the launcher
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >> The default chain was not changed after (2) and (3). So I would
>>> >>> >> >> >> >> >> need further information to know why this is happening.
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >> Generally it is better to create your own chain instance than to
>>> >>> >> >> >> >> >> modify one that is provided by the default configuration. I would
>>> >>> >> >> >> >> >> also recommend that you keep your test configuration in text files
>>> >>> >> >> >> >> >> and copy those to the 'stanbol/fileinstall' folder. Doing so saves
>>> >>> >> >> >> >> >> you from manually entering the configuration after a software
>>> >>> >> >> >> >> >> update. The production-mode section [3] provides information on how
>>> >>> >> >> >> >> >> to do that.
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >> best
>>> >>> >> >> >> >> >> Rupert
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >> [1]
>>> https://issues.apache.org/jira/browse/STANBOL-1278
>>> >>> >> >> >> >> >> [2] http://svn.apache.org/r1576623
>>> >>> >> >> >> >> >> [3]
>>> http://stanbol.apache.org/docs/trunk/production-mode
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >> > ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Error starting
>>> >>> >> >> >> >> >> > slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\startup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar
>>> >>> >> >> >> >> >> > (org.osgi.framework.BundleException: Unresolved constraint in bundle
>>> >>> >> >> >> >> >> > org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing
>>> >>> >> >> >> >> >> > requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0))))
>>> >>> >> >> >> >> >> > org.osgi.framework.BundleException: Unresolved constraint in bundle
>>> >>> >> >> >> >> >> > org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing
>>> >>> >> >> >> >> >> > requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0)))
>>> >>> >> >> >> >> >> >         at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
>>> >>> >> >> >> >> >> >         at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
>>> >>> >> >> >> >> >> >         at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
>>> >>> >> >> >> >> >> >         at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264)
>>> >>> >> >> >> >> >> >         at java.lang.Thread.run(Unknown Source)
>>> >>> >> >> >> >> >> >
>>> >>> >> >> >> >> >> > Despite this the server starts fine and I can use the enhancer
>>> >>> >> >> >> >> >> > fine. Do you guys see this as well?
>>> >>> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >
>>> >>> >> >> >> >> >> > 2. Whenever I restart the server the Weighted Chains get messed
>>> >>> >> >> >> >> >> > up. I usually use the 'default' chain and add my engine to it so
>>> >>> >> >> >> >> >> > there are 11 engines in it. After the restart this chain now
>>> >>> >> >> >> >> >> > contains around 23 engines in total.
>>> >>> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >
>>> >>> >> >> >> >> >> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <
>>> >>> >> >> >> >> >> rupert.westenthaler@gmail.com
>>> >>> >> >> >> >> >> >>:
>>> >>> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >> Hi Cristian,
>>> >>> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> NER Annotations are typically available as both
>>> >>> >> >> >> >> >> >> NlpAnnotations.NER_ANNOTATION and fise:TextAnnotation [1] in the
>>> >>> >> >> >> >> >> >> enhancement metadata. As you are already accessing the
>>> >>> >> >> >> >> >> >> AnalysedText I would prefer using the
>>> >>> >> >> >> >> >> >> NlpAnnotations.NER_ANNOTATION.
>>> >>> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> best
>>> >>> >> >> >> >> >> >> Rupert
>>> >>> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> [1]
>>> >>> >> >> >> >> >> >>
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >>
>>> >>> >> >> >> >>
>>> >>> >> >>
>>> >>> >>
>>> >>>
>>> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
>>> >>> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
>>> >>> >> >> >> >> >> >> <cr...@gmail.com> wrote:
>>> >>> >> >> >> >> >> >> > Thanks.
>>> >>> >> >> >> >> >> >> > I assume I should get the Named entities using
>>> the
>>> >>> same
>>> >>> >> but
>>> >>> >> >> >> >> >> >> > with
>>> >>> >> >> >> >> >> >> > NlpAnnotations.NER_ANNOTATION?
>>> >>> >> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
>>> >>> >> >> >> >> >> >> > rupert.westenthaler@gmail.com>:
>>> >>> >> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >> >> Hallo Cristian,
>>> >>> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >> NounPhrases are not added to the RDF enhancement results. You
>>> >>> >> >> >> >> >> >> >> need to use the AnalyzedText ContentPart [1]
>>> >>> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >> here is some demo code you can use in the computeEnhancement
>>> >>> >> >> >> >> >> >> >> method
>>> >>> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >>         AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
>>> >>> >> >> >> >> >> >> >>         Iterator<? extends Section> sections = at.getSentences();
>>> >>> >> >> >> >> >> >> >>         if(!sections.hasNext()){ //process as single sentence
>>> >>> >> >> >> >> >> >> >>             sections = Collections.singleton(at).iterator();
>>> >>> >> >> >> >> >> >> >>         }
>>> >>> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >>         while(sections.hasNext()){
>>> >>> >> >> >> >> >> >> >>             Section section = sections.next();
>>> >>> >> >> >> >> >> >> >>             Iterator<Span> chunks =
>>> >>> >> >> >> >> >> >> >>                 section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
>>> >>> >> >> >> >> >> >> >>             while(chunks.hasNext()){
>>> >>> >> >> >> >> >> >> >>                 Span chunk = chunks.next();
>>> >>> >> >> >> >> >> >> >>                 Value<PhraseTag> phrase =
>>> >>> >> >> >> >> >> >> >>                     chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
>>> >>> >> >> >> >> >> >> >>                 if(phrase.value().getCategory() == LexicalCategory.Noun){
>>> >>> >> >> >> >> >> >> >>                     log.info(" - NounPhrase [{},{}] {}", new Object[]{
>>> >>> >> >> >> >> >> >> >>                         chunk.getStart(),chunk.getEnd(),chunk.getSpan()});
>>> >>> >> >> >> >> >> >> >>                 }
>>> >>> >> >> >> >> >> >> >>             }
>>> >>> >> >> >> >> >> >> >>         }
>>> >>> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >> hope this helps
>>> >>> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >> best
>>> >>> >> >> >> >> >> >> >> Rupert
>>> >>> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >> [1]
>>> >>> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >>
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >>
>>> >>> >> >> >> >>
>>> >>> >> >>
>>> >>> >>
>>> >>>
>>> http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
>>> >>> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian
>>> Petroaca
>>> >>> >> >> >> >> >> >> >> <cr...@gmail.com> wrote:
>>> >>> >> >> >> >> >> >> >> > I started to implement the engine and I'm having problems with
>>> >>> >> >> >> >> >> >> >> > getting results for noun phrases. I modified the "default"
>>> >>> >> >> >> >> >> >> >> > weighted chain to also include the PosChunkerEngine and ran a
>>> >>> >> >> >> >> >> >> >> > sample text : "Angela Merkel visted China. The german chancellor
>>> >>> >> >> >> >> >> >> >> > met with various people". I expected that the RDF XML output
>>> >>> >> >> >> >> >> >> >> > would contain some info about the noun phrases but I cannot see
>>> >>> >> >> >> >> >> >> >> > any.
>>> >>> >> >> >> >> >> >> >> > Could you point me to the correct way to generate the noun
>>> >>> >> >> >> >> >> >> >> > phrases?
>>> >>> >> >> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >> >> > Thanks,
>>> >>> >> >> >> >> >> >> >> > Cristian
>>> >>> >> >> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <
>>> >>> >> >> >> >> >> >> >> cristian.petroaca@gmail.com>:
>>> >>> >> >> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >> >> >> Opened
>>> >>> >> >> https://issues.apache.org/jira/browse/STANBOL-1279
>>> >>> >> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca
>>> <
>>> >>> >> >> >> >> >> >> >> cristian.petroaca@gmail.com>
>>> >>> >> >> >> >> >> >> >> >> :
>>> >>> >> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >> >> Hi Rupert,
>>> >>> >> >> >> >> >> >> >> >>>
>>> >>> >> >> >> >> >> >> >> >>> The "spatial" dimension is a good idea. I'll also take a look
>>> >>> >> >> >> >> >> >> >> >>> at Yago.
>>> >>> >> >> >> >> >> >> >> >>>
>>> >>> >> >> >> >> >> >> >> >>> I will create a Jira with what we talked about here. It will
>>> >>> >> >> >> >> >> >> >> >>> probably have just a draft-like description for now and will
>>> >>> >> >> >> >> >> >> >> >>> be updated as I go along.
>>> >>> >> >> >> >> >> >> >> >>>
>>> >>> >> >> >> >> >> >> >> >>> Thanks,
>>> >>> >> >> >> >> >> >> >> >>> Cristian
>>> >>> >> >> >> >> >> >> >> >>>
>>> >>> >> >> >> >> >> >> >> >>>
>>> >>> >> >> >> >> >> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert
>>> Westenthaler <
>>> >>> >> >> >> >> >> >> >> >>> rupert.westenthaler@gmail.com>:
>>> >>> >> >> >> >> >> >> >> >>>
>>> >>> >> >> >> >> >> >> >> >>> Hi Cristian,
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> >> >> >> >> >> >>>> definitely an interesting approach. You should have a look at
>>> >>> >> >> >> >> >> >> >> >>>> Yago2 [1]. As far as I can remember the Yago taxonomy is much
>>> >>> >> >> >> >> >> >> >> >>>> better structured than the one used by dbpedia. Mapping
>>> >>> >> >> >> >> >> >> >> >>>> suggestions of dbpedia to concepts in Yago2 is easy as both
>>> >>> >> >> >> >> >> >> >> >>>> dbpedia and yago2 do provide mappings [2] and [3]
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
>>> >>> >> >> >> >> >> >> >> >>>> > <rh...@apache.org>:
>>> >>> >> >> >> >> >> >> >> >>>> >>
>>> >>> >> >> >> >> >> >> >> >>>> >> "Microsoft posted its 2013 earnings. The
>>> >>> >> Redmond's
>>> >>> >> >> >> >> >> >> >> >>>> >> company
>>> >>> >> >> >> >> >> made
>>> >>> >> >> >> >> >> >> a
>>> >>> >> >> >> >> >> >> >> >>>> >> huge profit".
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> >> >> >> >> >> >>>> That's actually a very good example. Spatial contexts are very
>>> >>> >> >> >> >> >> >> >> >>>> important as they tend to be often used for referencing. So I
>>> >>> >> >> >> >> >> >> >> >>>> would suggest to specially treat the spatial context. For
>>> >>> >> >> >> >> >> >> >> >>>> spatial Entities (like a City) this is easy, but even for
>>> >>> >> >> >> >> >> >> >> >>>> others (like a Person, Company) you could use relations to
>>> >>> >> >> >> >> >> >> >> >>>> spatial entities to define their spatial context. This context
>>> >>> >> >> >> >> >> >> >> >>>> could then be used to correctly link "The Redmond's company"
>>> >>> >> >> >> >> >> >> >> >>>> to "Microsoft".
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> >> >> >> >> >> >>>> In addition I would suggest to use the "spatial" context of
>>> >>> >> >> >> >> >> >> >> >>>> each entity (basically relations to entities that are cities,
>>> >>> >> >> >> >> >> >> >> >>>> regions, countries) as a separate dimension, because those are
>>> >>> >> >> >> >> >> >> >> >>>> very often used for coreferences.
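
A minimal sketch of the "spatial context as a separate dimension" idea, assuming
the related places of each candidate entity have already been collected into a
map. SpatialContextSketch, words and resolve are hypothetical names, not Stanbol
APIs, and the place data in main is hand-written for illustration only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper, not part of Stanbol. */
class SpatialContextSketch {

    /** Splits a phrase into lower-case words, dropping a possessive "'s". */
    static Set<String> words(String phrase) {
        String cleaned = phrase.toLowerCase(Locale.ROOT).replaceAll("'s\\b", "");
        Set<String> result = new HashSet<>();
        for (String token : cleaned.split("[^a-z]+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    /**
     * Returns the first entity (in map iteration order) whose spatial context
     * contains a place name that also occurs as a word in the noun phrase.
     */
    static Optional<String> resolve(String nounPhrase,
                                    Map<String, Set<String>> spatialContext) {
        Set<String> phraseWords = words(nounPhrase);
        for (Map.Entry<String, Set<String>> entry : spatialContext.entrySet()) {
            for (String place : entry.getValue()) {
                if (phraseWords.contains(place.toLowerCase(Locale.ROOT))) {
                    return Optional.of(entry.getKey());
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Illustrative data only; a real engine would read it from the entity data.
        Map<String, Set<String>> context = new LinkedHashMap<>();
        context.put("Microsoft", new HashSet<>(Arrays.asList("Redmond", "Washington")));
        context.put("Allianz", new HashSet<>(Arrays.asList("Munich")));
        System.out.println(resolve("The Redmond's company", context));
    }
}
```

Only single-word place names are matched here; multi-word places such as
"United States" would need phrase matching rather than a word lookup.
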
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> >> >> >> >> >> >>>> [1]
>>> http://www.mpi-inf.mpg.de/yago-naga/yago/
>>> >>> >> >> >> >> >> >> >> >>>> [2]
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
>>> >>> >> >> >> >> >> >> >> >>>> [3]
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >>
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >>
>>> >>> >> >> >> >>
>>> >>> >> >>
>>> >>> >>
>>> >>>
>>> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> >> >> >> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian
>>> >>> Petroaca
>>> >>> >> >> >> >> >> >> >> >>>> <cr...@gmail.com> wrote:
>>> >>> >> >> >> >> >> >> >> >>>> > There are several dbpedia categories for each entity, in this
>>> >>> >> >> >> >> >> >> >> >>>> > case for Microsoft we have :
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
>>> >>> >> >> >> >> >> >> >> >>>> > category:Microsoft
>>> >>> >> >> >> >> >> >> >> >>>> > category:Software_companies_of_the_United_States
>>> >>> >> >> >> >> >> >> >> >>>> > category:Software_companies_based_in_Washington_(state)
>>> >>> >> >> >> >> >> >> >> >>>> > category:Companies_established_in_1975
>>> >>> >> >> >> >> >> >> >> >>>> > category:1975_establishments_in_the_United_States
>>> >>> >> >> >> >> >> >> >> >>>> > category:Companies_based_in_Redmond,_Washington
>>> >>> >> >> >> >> >> >> >> >>>> > category:Multinational_companies_headquartered_in_the_United_States
>>> >>> >> >> >> >> >> >> >> >>>> > category:Cloud_computing_providers
>>> >>> >> >> >> >> >> >> >> >>>> > category:Companies_in_the_Dow_Jones_Industrial_Average
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >> >> >>>> > So we also have "Companies based in Redmond, Washington" which
>>> >>> >> >> >> >> >> >> >> >>>> > could be matched.
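
A minimal sketch of how a noun phrase could be compared against category labels
like the ones listed above, assuming plain string handling in place of POS
tagging and a real lemmatizer. CategoryMatchSketch, singular, contentWords and
matches are hypothetical names, not Stanbol APIs; the stop-word list and the
plural stripping are deliberately naive.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper, not part of Stanbol. */
class CategoryMatchSketch {

    // Small illustrative stop-word list; a real engine would rely on POS tags.
    private static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("the", "this", "these", "of", "in", "a", "an"));

    /** Very naive singularisation standing in for a real lemmatizer. */
    static String singular(String word) {
        if (word.length() > 3 && word.endsWith("ies")) {
            return word.substring(0, word.length() - 3) + "y";
        }
        if (word.length() > 2 && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    /** Lower-cases, drops possessives and stop words, singularises the rest. */
    static Set<String> contentWords(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT)
                .replace('_', ' ')
                .replaceAll("'s\\b", "");
        Set<String> words = new LinkedHashSet<>();
        for (String token : cleaned.split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                words.add(singular(token));
            }
        }
        return words;
    }

    /** True if every content word of the noun phrase occurs in the category label. */
    static boolean matches(String nounPhrase, String categoryLabel) {
        Set<String> phraseWords = contentWords(nounPhrase);
        return !phraseWords.isEmpty()
                && contentWords(categoryLabel).containsAll(phraseWords);
    }

    public static void main(String[] args) {
        String category = "Companies_based_in_Redmond,_Washington";
        System.out.println(matches("The Redmond's company", category));
        System.out.println(matches("The financial services company", category));
    }
}
```

Requiring every content word of the phrase to appear in the label favours
precision over recall; matching on any shared word would find more candidates
but also more false positives.
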
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >> >> >>>> > There is still other contextual information from dbpedia which
>>> >>> >> >> >> >> >> >> >> >>>> > can be used.
>>> >>> >> >> >> >> >> >> >> >>>> > For example for an Organization we could also include :
>>> >>> >> >> >> >> >> >> >> >>>> > dbpprop:industry = Software
>>> >>> >> >> >> >> >> >> >> >>>> > dbpprop:service = Online Service Providers
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >> >> >>>> > and for a Person (that's for Barack Obama) :
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >> >> >>>> > dbpedia-owl:profession:
>>> >>> >> >> >> >> >> >> >> >>>> >     dbpedia:Author
>>> >>> >> >> >> >> >> >> >> >>>> >     dbpedia:Constitutional_law
>>> >>> >> >> >> >> >> >> >> >>>> >     dbpedia:Lawyer
>>> >>> >> >> >> >> >> >> >> >>>> >     dbpedia:Community_organizing
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >> >> >>>> > I'd like to continue investigating this as I think that it may
>>> >>> >> >> >> >> >> >> >> >>>> > have some value in increasing the number of coreference
>>> >>> >> >> >> >> >> >> >> >>>> > resolutions, and I'd like to concentrate more on precision
>>> >>> >> >> >> >> >> >> >> >>>> > rather than recall since we already have a set of coreferences
>>> >>> >> >> >> >> >> >> >> >>>> > detected by the stanford nlp tool and this would be an
>>> >>> >> >> >> >> >> >> >> >>>> > addition to that (at least this is how I would like to use it).
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >> >> >>>> > Is it ok if I track this by opening a jira? I could update it
>>> >>> >> >> >> >> >> >> >> >>>> > to show my progress and also my conclusions, and if it turns
>>> >>> >> >> >> >> >> >> >> >>>> > out that it was a bad idea then so be it; at least I'll end up
>>> >>> >> >> >> >> >> >> >> >>>> > with more knowledge about Stanbol in the end :).
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
>>> >>> >> >> >> >> >> >> >> >>>> > <rh...@apache.org>:
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >> >> >>>> >> Hi Cristian,
>>> >>> >> >> >> >> >> >> >> >>>> >>
>>> >>> >> >> >> >> >> >> >> >>>> >> The approach sounds nice. I don't want to be the devil's
>>> >>> >> >> >> >> >> >> >> >>>> >> advocate but I'm just not sure about the recall using the
>>> >>> >> >> >> >> >> >> >> >>>> >> dbpedia categories feature. For example, your sentence could
>>> >>> >> >> >> >> >> >> >> >>>> >> be also "Microsoft posted its 2013 earnings. The Redmond's
>>> >>> >> >> >> >> >> >> >> >>>> >> company made a huge profit". So, maybe including more
>>> >>> >> >> >> >> >> >> >> >>>> >> contextual information from dbpedia could increase the recall
>>> >>> >> >> >> >> >> >> >> >>>> >> but of course will reduce the precision.
>>> >>> >> >> >> >> >> >> >> >>>> >>
>>> >>> >> >> >> >> >> >> >> >>>> >> Cheers,
>>> >>> >> >> >> >> >> >> >> >>>> >> Rafa
>>> >>> >> >> >> >> >> >> >> >>>> >>
>>> >>> >> >> >> >> >> >> >> >>>> >> El 04/02/14 09:50, Cristian Petroaca
>>> >>> escribió:
>>> >>> >> >> >> >> >> >> >> >>>> >>
>>> >>> >> >> >> >> >> >> >> >>>> >>> Back with a more detailed description of the steps for making
>>> >>> >> >> >> >> >> >> >> >>>> >>> this kind of coreference work.
>>> >>> >> >> >> >> >> >> >> >>>> >>>
>>> >>> >> >> >> >> >> >> >> >>>> >>> I will be using references to the following text in the steps
>>> >>> >> >> >> >> >> >> >> >>>> >>> below in order to make things clearer : "Microsoft posted its
>>> >>> >> >> >> >> >> >> >> >>>> >>> 2013 earnings. The software company made a huge profit."
>>> >>> >> >> >> >> >> >> >> >>>> >>>
>>> >>> >> >> >> >> >> >> >> >>>> >>> 1. For every noun phrase in the text which has :
>>> >>> >> >> >> >> >> >> >> >>>> >>>      a. a determiner pos which implies a reference to an
>>> >>> >> >> >> >> >> >> >> >>>> >>> entity local to the text (such as "the, this, these"), but not
>>> >>> >> >> >> >> >> >> >> >>>> >>> "another, every", etc which imply a reference to an entity
>>> >>> >> >> >> >> >> >> >> >>>> >>> outside of the text.
>>> >>> >> >> >> >> >> >> >> >>>> >>>      b. at least one other noun aside from the main required
>>> >>> >> >> >> >> >> >> >> >>>> >>> noun which further describes it. For example I will not count
>>> >>> >> >> >> >> >> >> >> >>>> >>> "The company" as being a legitimate candidate since this could
>>> >>> >> >> >> >> >> >> >> >>>> >>> create a lot of false positives by considering the double
>>> >>> >> >> >> >> >> >> >> >>>> >>> meaning of some words such as "in the company of good people".
>>> >>> >> >> >> >> >> >> >> >>>> >>> "The software company" is a good candidate since we also have
>>> >>> >> >> >> >> >> >> >> >>>> >>> "software".
>>> >>> >> >> >> >> >> >> >> >>>> >>>
>>> >>> >> >> >> >> >> >> >> >>>> >>> 2. match the nouns in the noun phrase to the contents
>>> >>> >> >> >> >> >> >> >> >>>> >>> of the dbpedia categories of each named entity found
>>> >>> >> >> >> >> >> >> >> >>>> >>> prior to the location of the noun phrase in the text.
>>> >>> >> >> >> >> >> >> >> >>>> >>> The dbpedia categories are in the following format
>>> >>> >> >> >> >> >> >> >> >>>> >>> (for Microsoft for example) : "Software companies of
>>> >>> >> >> >> >> >> >> >> >>>> >>> the United States".
>>> >>> >> >> >> >> >> >> >> >>>> >>>   So we try to match "software company" with that.
>>> >>> >> >> >> >> >> >> >> >>>> >>> First, as you can see, the main noun in the dbpedia
>>> >>> >> >> >> >> >> >> >> >>>> >>> category has a plural form and it's the same for all
>>> >>> >> >> >> >> >> >> >> >>>> >>> categories which I saw. I don't know if there's an
>>> >>> >> >> >> >> >> >> >> >>>> >>> easier way to do this but I thought of applying a
>>> >>> >> >> >> >> >> >> >> >>>> >>> lemmatizer on the category and the noun phrase in
>>> >>> >> >> >> >> >> >> >> >>>> >>> order for them to have a common denominator. This also
>>> >>> >> >> >> >> >> >> >> >>>> >>> works if the noun phrase itself has a plural form.
>>> >>> >> >> >> >> >> >> >> >>>> >>>
>>> >>> >> >> >> >> >> >> >> >>>> >>> Second, I'll need to use for comparison only the words
>>> >>> >> >> >> >> >> >> >> >>>> >>> in the category which are themselves nouns and not
>>> >>> >> >> >> >> >> >> >> >>>> >>> prepositions or determiners such as "of the". This
>>> >>> >> >> >> >> >> >> >> >>>> >>> means that I need to POS tag the contents of the
>>> >>> >> >> >> >> >> >> >> >>>> >>> categories as well.
>>> >>> >> >> >> >> >> >> >> >>>> >>> I was thinking of running the POS tagger and the
>>> >>> >> >> >> >> >> >> >> >>>> >>> lemmatizer on the dbpedia categories when building
>>> >>> >> >> >> >> >> >> >> >>>> >>> the dbpedia backed entity hub and storing the results
>>> >>> >> >> >> >> >> >> >> >>>> >>> for later use - I don't know how feasible this is at
>>> >>> >> >> >> >> >> >> >> >>>> >>> the moment.
>>> >>> >> >> >> >> >> >> >> >>>> >>>
>>> >>> >> >> >> >> >> >> >> >>>> >>> After this I can compare each noun in the noun phrase
>>> >>> >> >> >> >> >> >> >> >>>> >>> with the equivalent nouns in the categories and based
>>> >>> >> >> >> >> >> >> >> >>>> >>> on the number of matches I can create a confidence
>>> >>> >> >> >> >> >> >> >> >>>> >>> level.
>>> >>> >> >> >> >> >> >> >> >>>> >>>
>>> >>> >> >> >> >> >> >> >> >>>> >>> 3. match the noun of the noun phrase with the rdf:type
>>> >>> >> >> >> >> >> >> >> >>>> >>> from dbpedia of the named entity. If this matches
>>> >>> >> >> >> >> >> >> >> >>>> >>> increase the confidence level.
>>> >>> >> >> >> >> >> >> >> >>>> >>>
>>> >>> >> >> >> >> >> >> >> >>>> >>> 4. If there are multiple named entities which can
>>> >>> >> >> >> >> >> >> >> >>>> >>> match a certain noun phrase then link the noun phrase
>>> >>> >> >> >> >> >> >> >> >>>> >>> with the closest named entity prior to it in the text.
>>> >>> >> >> >> >> >> >> >> >>>> >>>
>>> >>> >> >> >> >> >> >> >> >>>> >>> What do you think?
>>> >>> >> >> >> >> >> >> >> >>>> >>>
>>> >>> >> >> >> >> >> >> >> >>>> >>> Cristian
>>> >>> >> >> >> >> >> >> >> >>>> >>>
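Steps 2 to 4 of the heuristic in the message above can be sketched roughly as follows. This is only an illustration under loud assumptions: the suffix-stripping `lemma` function and the `STOP_WORDS` set are crude stand-ins for the real lemmatizer and POS tagger discussed in the thread, the `TYPE_BONUS` and `threshold` values are made up, and the function names and the entity dictionary layout are hypothetical (they are not Stanbol or Entityhub APIs).

```python
import re

# Assumption: a tiny stop list stands in for POS tagging, which would
# normally drop prepositions and determiners such as "of the".
STOP_WORDS = {"of", "the", "in", "a", "an", "and", "by", "for", "to"}

# Assumption: arbitrary bonus added when the rdf:type also matches (step 3).
TYPE_BONUS = 0.25


def lemma(word):
    """Naive stand-in for a lemmatizer: strips common English plural endings."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w


def content_lemmas(text):
    """Lower-cased, lemmatized words of `text`, minus the stop words."""
    words = re.findall(r"[A-Za-z]+", text)
    return {lemma(w) for w in words if w.lower() not in STOP_WORDS}


def confidence(noun_phrase, categories, rdf_types):
    """Share of noun-phrase words found in the best-matching category
    (step 2), raised by TYPE_BONUS if a word matches an rdf:type (step 3)."""
    np_lemmas = content_lemmas(noun_phrase)
    if not np_lemmas:
        return 0.0
    best = 0.0
    for category in categories:
        overlap = np_lemmas & content_lemmas(category)
        best = max(best, len(overlap) / len(np_lemmas))
    type_lemmas = set()
    for rdf_type in rdf_types:
        type_lemmas |= content_lemmas(rdf_type)
    if np_lemmas & type_lemmas:
        best = min(1.0, best + TYPE_BONUS)
    return best


def resolve(noun_phrase, np_offset, entities, threshold=0.5):
    """Step 4: among entities located before the noun phrase that reach the
    threshold, return the one closest to it in the text, or None."""
    candidates = [
        e for e in entities
        if e["offset"] < np_offset
        and confidence(noun_phrase, e["categories"], e["types"]) >= threshold
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e["offset"])
```

Each entity is expected to be a dictionary with `label`, `offset`, `categories` and `types` keys, filled in by hand or from whatever lookup is used. Taking the best single category rather than the union of all categories is a design choice made here to avoid accidental matches spread over unrelated categories; the thread itself does not settle that point.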
>>> >>> >> >> >> >> >> >> >> >>>> >>> 2014-01-31 Cristian Petroaca <cristian.petroaca@gmail.com>:
>>> >>> >> >> >> >> >> >> >> >>>> >>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>  Hi Rafa,
>>> >>> >> >> >> >> >> >> >> >>>> >>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>> I don't yet have a concrete heuristic but I'm working
>>> >>> >> >> >> >> >> >> >> >>>> >>>> on it. I'll provide it here so that you guys can give
>>> >>> >> >> >> >> >> >> >> >>>> >>>> me feedback on it.
>>> >>> >> >> >> >> >> >> >> >>>> >>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>> What are "locality" features?
>>> >>> >> >> >> >> >> >> >> >>>> >>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>> I looked at Bart and other coref tools such as ArkRef
>>> >>> >> >> >> >> >> >> >> >>>> >>>> and CherryPicker and they don't provide such a
>>> >>> >> >> >> >> >> >> >> >>>> >>>> coreference.
>>> >>> >> >> >> >> >> >> >> >>>> >>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>> Cristian
>>> >>> >> >> >> >> >> >> >> >>>> >>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>> 2014-01-30 Rafa Haro <rharo@apache.org>:
>>> >>> >> >> >> >> >> >> >> >>>> >>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>> Hi Cristian,
>>> >>> >> >> >> >> >> >> >> >>>> >>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>>> Without having more details about your concrete
>>> >>> >> >> >> >> >> >> >> >>>> >>>>> heuristic, in my honest opinion, such an approach
>>> >>> >> >> >> >> >> >> >> >>>> >>>>> could produce a lot of false positives. I don't know
>>> >>> >> >> >> >> >> >> >> >>>> >>>>> if you are planning to use some "locality" features
>>> >>> >> >> >> >> >> >> >> >>>> >>>>> to detect such coreferences but you need to take into
>>> >>> >> >> >> >> >> >> >> >>>> >>>>> account that it is quite usual for coreferenced
>>> >>> >> >> >> >> >> >> >> >>>> >>>>> mentions to occur even in different paragraphs.
>>> >>> >> >> >> >> >> >> >> >>>> >>>>> Although I'm not an expert in Natural Language
>>> >>> >> >> >> >> >> >> >> >>>> >>>>> Understanding, I would say it is quite difficult to
>>> >>> >> >> >> >> >> >> >> >>>> >>>>> get decent precision/recall rates for coreferencing
>>> >>> >> >> >> >> >> >> >> >>>> >>>>> using fixed rules. Maybe you can give a try to other
>>> >>> >> >> >> >> >> >> >> >>>> >>>>> tools like BART (http://www.bart-coref.org/).
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>>> Cheers,
>>> >>> >> >> >> >> >> >> >> >>>> >>>>> Rafa Haro
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>>> On 30/01/14 10:33, Cristian Petroaca wrote:
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>   Hi,
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> One of the necessary steps for implementing the Event
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> extraction Engine feature :
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> https://issues.apache.org/jira/browse/STANBOL-1121
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> is to have coreference resolution in the given text.
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> This is provided now via the stanford-nlp project but
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> as far as I saw this module is performing mostly
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> pronominal (He, She) or nominal (Barack Obama and
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Mr. Obama) coreference resolution.
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> In order to get more coreferences from the text I
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> thought of creating some logic that would detect this
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> kind of coreference :
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> "Apple reaches new profit heights. The software
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> company just announced its 2013 earnings."
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Here "The software company" obviously refers to
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> "Apple".
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> So I'd like to detect coreferences of Named Entities
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> which are of the rdf:type of the Named Entity, in
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> this case "company", and also have attributes which
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> can be found in the dbpedia categories of the named
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> entity, in this case "software".
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> The detection of coreferences such as "The software
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> company" in the text would also be done by either
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> using the new Pos Tag Based Phrase extraction Engine
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> (noun phrases) or by using a dependency tree of the
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> sentence and picking up only subjects or objects.
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> At this point I'd like to know if this kind of logic
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> would be useful as a separate Enhancement Engine (in
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> case the precision and recall are good enough) in
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Stanbol?
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Thanks,
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Cristian
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
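The noun phrase side of the proposal above (picking candidates such as "The software company" from POS tags) might look roughly like the sketch below. It assumes the text has already been tokenized and tagged with Penn Treebank style tags by some other component; the function name `noun_phrase_candidates` and the `min_nouns` parameter are hypothetical, and the chunk pattern (a determiner followed by adjectives and nouns) is a simplification rather than what the Stanbol phrase extraction engine actually does. The `min_nouns=2` default reflects the requirement from this thread that a candidate have at least one other noun besides the main one.

```python
def noun_phrase_candidates(tagged_tokens, min_nouns=2):
    """Return (start_index, phrase) pairs for chunks shaped like
    DT (JJ | NN*)* NN* that contain at least `min_nouns` nouns.

    `tagged_tokens` is a list of (word, tag) pairs using Penn Treebank
    tags; proper nouns (NNP, NNPS) count as nouns here.
    """
    candidates = []
    i = 0
    n = len(tagged_tokens)
    while i < n:
        _, tag = tagged_tokens[i]
        if tag != "DT":
            i += 1
            continue
        # Extend the chunk over adjectives and nouns after the determiner.
        j = i + 1
        while j < n and (tagged_tokens[j][1] == "JJ"
                         or tagged_tokens[j][1].startswith("NN")):
            j += 1
        # Trim trailing adjectives so that the chunk ends in a noun.
        end = j
        while end > i + 1 and not tagged_tokens[end - 1][1].startswith("NN"):
            end -= 1
        chunk = tagged_tokens[i:end]
        nouns = [w for w, t in chunk if t.startswith("NN")]
        if len(nouns) >= min_nouns:
            candidates.append((i, " ".join(w for w, _ in chunk)))
        i = max(end, i + 1)
    return candidates
```

With this filter "The software company" is kept because it has two nouns, while "the company" in "in the company of good people" is dropped because it has only one.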
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> >> >> >> >> >> >>>> --
>>> >>> >> >> >> >> >> >> >> >>>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>>> >>> >> >> >> >> >> >> >> >>>> | Bodenlehenstraße 11                             ++43-699-11108907
>>> >>> >> >> >> >> >> >> >> >>>> | A-5500 Bischofshofen
>>> >>> >> >> >> >> >> >> >> >>>>