Posted to dev@stanbol.apache.org by Cristian Petroaca <cr...@gmail.com> on 2014/02/04 09:50:53 UTC

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Back with a more detailed description of the steps for making this kind of
coreference work.

I will be using references to the following text in the steps below in
order to make things clearer : "Microsoft posted its 2013 earnings. The
software company made a huge profit."

1. For every noun phrase in the text which has:
    a. a determiner pos which implies a reference to an entity local to the
text (such as "the, this, these"), but not "another, every", etc., which
imply a reference to an entity outside of the text.
    b. at least one other noun aside from the main required noun which
further describes it. For example I will not count "The company" as a
legitimate candidate since this could create a lot of false positives by
considering the double meaning of some words, such as "in the company of
good people".
"The software company" is a good candidate since we also have "software".

2. Match the nouns in the noun phrase against the contents of the dbpedia
categories of each named entity found prior to the location of the noun
phrase in the text.
The dbpedia categories are in the following format (for Microsoft, for
example): "Software companies of the United States".
So we try to match "software company" with that.
First, as you can see, the main noun in the dbpedia category has a plural
form, and it's the same for all categories which I saw. I don't know if
there's an easier way to do this, but I thought of applying a lemmatizer on
the category and the noun phrase in order for them to have a common
denominator. This also works if the noun phrase itself has a plural form.

Second, I'll need to use for comparison only the words in the category
which are themselves nouns, and not prepositions or determiners such as "of
the". This means that I need to pos tag the categories' contents as well.
I was thinking of running the pos tagger and lemmatizer on the dbpedia
categories when building the dbpedia backed entity hub and storing the
results for later use - I don't know how feasible this is at the moment.

After this I can compare each noun in the noun phrase with the equivalent
nouns in the categories and, based on the number of matches, create a
confidence level.

3. Match the noun of the noun phrase with the rdf:type from dbpedia of the
named entity. If this matches, increase the confidence level.

4. If there are multiple named entities which can match a certain noun
phrase then link the noun phrase with the closest named entity prior to it
in the text.
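
A minimal, self-contained sketch of what the matching in steps 2-4 could look like (plain
Java, no Stanbol APIs; the lemmatize() helper is only a stand-in for a real lemmatizer and
the 0.25 type bonus is an arbitrary illustrative value):

import java.util.*;

public class CategoryMatchSketch {

    /** Placeholder lemmatizer; a real implementation would reuse the lemmas
     *  produced by the NLP processing chain. */
    static String lemmatize(String word) {
        String w = word.toLowerCase();
        if (w.endsWith("ies")) return w.substring(0, w.length() - 3) + "y";
        if (w.endsWith("s")) return w.substring(0, w.length() - 1);
        return w;
    }

    /** Step 2: fraction of noun-phrase nouns found (as lemmas) in the category label,
     *  plus a bonus (step 3) if the head noun matches the rdf:type label. */
    static double confidence(List<String> phraseNouns, String categoryLabel, String typeLabel) {
        Set<String> categoryLemmas = new HashSet<>();
        for (String token : categoryLabel.split("\\s+")) {
            categoryLemmas.add(lemmatize(token)); // ideally only the pos-tagged noun tokens
        }
        int matches = 0;
        for (String noun : phraseNouns) {
            if (categoryLemmas.contains(lemmatize(noun))) matches++;
        }
        double conf = (double) matches / phraseNouns.size();
        String headNoun = phraseNouns.get(phraseNouns.size() - 1); // assume the last noun is the head
        if (lemmatize(headNoun).equals(lemmatize(typeLabel))) {
            conf = Math.min(1.0, conf + 0.25);
        }
        return conf;
    }

    public static void main(String[] args) {
        // "The software company" against the Microsoft category and type from the example above;
        // step 4 would then pick the closest preceding named entity above some threshold.
        double c = confidence(Arrays.asList("software", "company"),
                "Software companies of the United States", "Company");
        System.out.println("confidence = " + c);
    }
}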

What do you think?

Cristian

2014-01-31 Cristian Petroaca <cr...@gmail.com>:

> Hi Rafa,
>
> I don't yet have a concrete heuristic but I'm working on it. I'll provide
> it here so that you guys can give me feedback on it.
>
> What are "locality" features?
>
> I looked at Bart and other coref tools such as ArkRef and CherryPicker and
> they don't provide such a coreference.
>
> Cristian
>
>
> 2014-01-30 Rafa Haro <rh...@apache.org>:
>
> Hi Cristian,
>>
>> Without having more details about your concrete heuristic, in my honest
>> opinion, such an approach could produce a lot of false positives. I don't know
>> if you are planning to use some "locality" features to detect such
>> coreferences, but you need to take into account that it is quite usual for
>> coreferenced mentions to occur even in different paragraphs. Although I'm
>> not an expert in Natural Language Understanding, I would say it is quite
>> difficult to get decent precision/recall rates for coreferencing using
>> fixed rules. Maybe you can give a try to other tools like BART (
>> http://www.bart-coref.org/).
>>
>> Cheers,
>> Rafa Haro
>>
>> El 30/01/14 10:33, Cristian Petroaca escribió:
>>
>>  Hi,
>>>
>>> One of the necessary steps for implementing the Event extraction Engine
>>> feature : https://issues.apache.org/jira/browse/STANBOL-1121 is to have
>>> coreference resolution in the given text. This is provided now via the
>>> stanford-nlp project, but as far as I saw this module is performing mostly
>>> pronominal (He, She) or nominal (Barack Obama and Mr. Obama) coreference
>>> resolution.
>>>
>>> In order to get more coreferences from the text I thought of creating some
>>> logic that would detect this kind of coreference:
>>> "Apple reaches new profit heights. The software company just announced
>>> its 2013 earnings."
>>> Here "The software company" obviously refers to "Apple".
>>> So I'd like to detect coreferences of Named Entities where the noun phrase
>>> matches the rdf:type of the Named Entity, in this case "company", and also
>>> has attributes which can be found in the dbpedia categories of the named
>>> entity, in this case "software".
>>>
>>> The detection of coreferences such as "The software company" in the text
>>> would also be done by either using the new Pos Tag Based Phrase
>>> extraction
>>> Engine (noun phrases) or by using a dependency tree of the sentence and
>>> picking up only subjects or objects.
>>>
>>> At this point I'd like to know if this kind of logic would be useful as a
>>> separate Enhancement Engine (in case the precision and recall are good
>>> enough) in Stanbol?
>>>
>>> Thanks,
>>> Cristian
>>>
>>>
>>
>

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Cristian,

I see several possible solutions:

1. The indexing tool does support LDPath. That means you can import
all the required RDF files and use LDPath to append the labels of the
Yago Types directly to the dbpedia entities.  This would prevent
additional lookups to retrieve the types, but also increase the size
of the index a lot.
2. You could also index the Yago Types and use an additional Entityhub
lookup to retrieve them. In this case you should first collect all
types referenced by Entities in the processed text and in a second
step retrieve the labels. While this means additional lookups, it will
only load the labels for a type once. In addition you could use a
cache for types.
3. Your engine could use LDPath to retrieve the types. This would
require indexing the data as in option (2) and using an LDPath
statement similar to (1). It would be the slowest solution (as it
requires an additional lookup for every extracted entity) but requires
the least code.
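
For illustration, an LDPath mapping along the lines of (1)/(3) might look roughly like the
following (the field name is arbitrary, and this simple rdf:type path would collect the labels
of all types, not only the Yago classes, so an additional filter would be needed in practice):

    typeLabels = <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>/<http://www.w3.org/2000/01/rdf-schema#label> :: xsd:string ;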


regarding:

On Wed, May 7, 2014 at 9:02 AM, Cristian Petroaca
<cr...@gmail.com> wrote:
> I can get the labels from one of the yago downloads here :
> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoMultilingualClassLabels.txt.
> I'll need another yago download file to map the yago wordnet classes to
> dbpedia uris. That could be done via a script maybe.

I hope there is also an RDF file with those labels. In that case you
just need to add it to the resource/rdfdata directory.

best
Rupert



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                              ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO ..........................................................................
| http://redlink.co/

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Cristian Petroaca <cr...@gmail.com>.
Hi Rupert,

Looking into the yago_types.nt file which assigns yago classes to dbpedia
entities, I realized that there are no yago class labels present; I just
have the class uri, like <http://dbpedia/..something../President1829302/.
I also need the class labels so that I can compare them to the noun token's
string from the text.

I can get the labels from one of the yago downloads here :
http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoMultilingualClassLabels.txt.
I'll need another yago download file to map the yago wordnet classes to
dbpedia uris. That could be done via a script maybe.
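
A rough sketch of what such a join script could look like (plain Java; the tab-separated
layout of the two input files, their names, and the column positions are assumptions that
would need to be checked against the actual YAGO downloads):

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;

public class YagoLabelJoin {
    public static void main(String[] args) throws IOException {
        // yago class -> label (assumed: one mapping per line, tab separated)
        Map<String, String> labels = new HashMap<>();
        for (String line : Files.readAllLines(Paths.get("yagoClassLabels.tsv"), StandardCharsets.UTF_8)) {
            String[] cols = line.split("\t");
            if (cols.length >= 2) {
                labels.put(cols[0], cols[1]);
            }
        }
        // yago class -> dbpedia class uri (assumed layout); join and write rdfs:label
        // triples so the result could be dropped into the indexing rdfdata directory
        try (BufferedWriter out = Files.newBufferedWriter(
                Paths.get("dbpedia_yago_class_labels.nt"), StandardCharsets.UTF_8)) {
            for (String line : Files.readAllLines(Paths.get("yagoDBpediaClasses.tsv"), StandardCharsets.UTF_8)) {
                String[] cols = line.split("\t");
                if (cols.length >= 2 && labels.containsKey(cols[0])) {
                    out.write("<" + cols[1] + "> <http://www.w3.org/2000/01/rdf-schema#label> \""
                            + labels.get(cols[0]).replace("\"", "\\\"") + "\" .");
                    out.newLine();
                }
            }
        }
    }
}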

Once I have the dbpedia_yago_class_uri -> label file is it possible to
integrate this data in the dbpedia index and later be able to query the
labels from the 'dbpedia' Site? As you can see the file won't refer to the
actual dbpedia entity but to the yago class as being the subject in the
triple. So how would that work in the dbpedia indexing process? What should
I change in the mappings.txt file? Briefly looking through the mappings.txt
file this looks similar to how the skos categories are indexed.

Other than that, I saw that someone will be working on integrating YAGO as
part of Gsoc 2014. So maybe waiting for that is an option too but I don't
know what the extent of the integration will be.

Thanks,
Cristi


2014-04-29 11:06 GMT+03:00 Cristian Petroaca <cr...@gmail.com>:

> Ok, I think I kind of figured it out. If I want to use the dbpedia data
> index I need to use the SiteManager to get the Site with id = "dbpedia".
> Then I can query the Site directly.
>
> I have some additional questions though :
> 1. In my particular case I want to be able to also get the yago class of
> the given entity. These properties come with yago-types.nt file from
> dbpedia and this file is not present in the entityhub dbpedia data fetch
> scripts here :
> https://svn.apache.org/repos/asf/stanbol/trunk/entityhub/indexing/dbpedia/dbpedia-3.8/fetch_data_en_int.sh.
> Also this file comes with dbpedia 3.9. This means that I need to rebuild
> the dbpedia index data with 3.9 and the new yago-types.nt file. Is this
> correct?
>
> 2. I also need to be able to get some specific dbpedia properties from the
> index, such as dbpedia-owl:locationCity and others for a given entity. At
> the moment these are not available when doing a query on the dbpedia Site.
> I suppose I need to place them in
> https://svn.apache.org/repos/asf/stanbol/trunk/entityhub/indexing/dbpedia/src/main/resources/indexing/config/mappings.txt
> and do a rebuild of the dbpedia index?
>
> Thanks.
> Cristian
>
>
> 2014-04-28 16:58 GMT+03:00 Cristian Petroaca <cr...@gmail.com>
> :
>
>> Hi,
>>
>> I've started to implement the dbpedia properties logic and I'd like to
>> get some feedback on some things that I am doing :
>> I want to get a NER from the text and search for it in the dbpedia data
>> so that I can get certain dbpedia properties.
>> The way I'm trying to do this is by getting the NER_ANNOTATION chunk's
>> text and searching for that in the Entityhub (which from what I saw is by default
>> configured with dbpedia data). I haven't yet performed a query to actually
>> get the data, but before I continue I'd like to ask if this is the way to go?
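
For reference, a rough sketch of such a lookup. The Entityhub class, package and method
names below (SiteManager, Site, FieldQuery, TextConstraint, findEntities) are written from
memory as assumptions and would need to be checked against the current servicesapi; the
@Reference injection assumes the engine runs as an OSGi component:

import org.apache.felix.scr.annotations.Reference;
import org.apache.stanbol.entityhub.servicesapi.model.Entity;
import org.apache.stanbol.entityhub.servicesapi.query.FieldQuery;
import org.apache.stanbol.entityhub.servicesapi.query.QueryResultList;
import org.apache.stanbol.entityhub.servicesapi.query.TextConstraint;
import org.apache.stanbol.entityhub.servicesapi.site.Site;
import org.apache.stanbol.entityhub.servicesapi.site.SiteManager;

public class DbpediaLookupSketch {

    @Reference
    private SiteManager siteManager; // injected by the OSGi runtime

    /** Looks up the NER surface form (e.g. "Microsoft") on the 'dbpedia' Site
     *  and returns the matching entities, so their properties can be inspected. */
    QueryResultList<Entity> lookup(String nerText) {
        Site dbpedia = siteManager.getSite("dbpedia");
        FieldQuery query = dbpedia.getQueryFactory().createFieldQuery();
        query.setConstraint("http://www.w3.org/2000/01/rdf-schema#label",
                new TextConstraint(nerText));
        query.setLimit(5);
        return dbpedia.findEntities(query);
    }
}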
>>
>> Thanks,
>> Cristian
>>
>>
>> 2014-03-28 15:12 GMT+02:00 Cristian Petroaca <cristian.petroaca@gmail.com
>> >:
>>
>>> Examples :
>>>
>>> 1. Group membership :
>>>     a. Spatial membership :
>>>
>>>         "Microsoft announced its 2013 earnings. <coref>The Redmond-based
>>> company</coref> made huge profits."
>>>
>>>     b. Organisational membership :
>>>
>>>        "Mick Jagger started a new solo album. <coref>The Rolling Stones
>>> singer</coref> did not say what the theme will be."
>>>
>>> 2. Functional membership :
>>>
>>>    "Allianz announced its 2013 earnings. <coref>The financial services
>>> company</coref> made a huge profit."
>>>
>>> 3.  If no matches were found for the current NER with the rules from above,
>>> then if the yago:class which matched has more than 2 nouns we also
>>> consider this a good co-reference, but with a lower confidence maybe.
>>>
>>>    "Boris Becker will take part in a demonstrative tennis match.
>>> <coref>The former tennis player</coref> will play again after 10 years."
>>>
>>>
>>> 2014-03-28 12:22 GMT+02:00 Rupert Westenthaler <
>>> rupert.westenthaler@gmail.com>:
>>>
>>>> Hi Cristian, all
>>>>
>>>> Looks good to me, but I am not sure if I got everything. If you could
>>>> provide example texts where those rules apply it would make it much
>>>> easier to understand.
>>>>
>>>> Instead of using dbpedia properties you should define your own domain
>>>> model (ontology). You can then align the dbpedia properties to your
>>>> model. This will allow applying this approach also to knowledge
>>>> bases other than dbpedia.
>>>>
>>>> For people new to this thread: The above message adds to the
>>>> suggestion first made by Cristian on 4th February. Also the following
>>>> 4 messages (until 7th Feb) provide additional context.
>>>>
>>>> best
>>>> Rupert
>>>>
>>>>
>>>> On Fri, Mar 28, 2014 at 9:23 AM, Cristian Petroaca
>>>> <cr...@gmail.com> wrote:
>>>> > Hi guys,
>>>> >
>>>> > After Rupert's last suggestions related to this enhancement engine I
>>>> > devised a more comprehensive algorithm for matching the noun phrases
>>>> > against the NER properties. Please take a look and let me know what you
>>>> > think. Thanks.
>>>> >
>>>> > The following rules will be applied to every noun phrase in order to find
>>>> > co-references:
>>>> >
>>>> > 1. For each NER prior to the current noun phrase in the text match the
>>>> > yago:class label to the contents of the noun phrase.
>>>> >
>>>> > For the NERs which have a yago:class which matches, apply:
>>>> >
>>>> > 2. Group membership rules :
>>>> >
>>>> >     a. spatial membership : the NER is part of a Location. If the noun
>>>> > phrase contains a LOCATION or a demonym then check any location properties
>>>> > of the matching NER.
>>>> >
>>>> >     If the matching NER is a :
>>>> >     - person, match against :birthPlace, :region, :nationality
>>>> >     - organisation, match against :foundationPlace, :locationCity,
>>>> > :location, :hometown
>>>> >     - place, match against :country, :subdivisionName, :location
>>>> >
>>>> >     Ex: The Italian President, The Redmond-based company
>>>> >
>>>> >     b. organisational membership : the NER is part of an Organisation. If
>>>> > the noun phrase contains an ORGANISATION then check the following
>>>> > properties of the matching NER:
>>>> >
>>>> >     If the matching NER is :
>>>> >     - person, match against :occupation, :associatedActs
>>>> >     - organisation ?
>>>> >     - location ?
>>>> >
>>>> > Ex: The Microsoft executive, The Pink Floyd singer
>>>> >
>>>> > 3. Functional description rule: the noun phrase describes what the NER does
>>>> > conceptually.
>>>> > If there are no NERs in the noun phrase then match the following properties
>>>> > of the matching NER to the contents of the noun phrase (aside from the
>>>> > nouns which are part of the yago:class) :
>>>> >
>>>> >    If the NER is a:
>>>> >    - person ?
>>>> >    - organisation, match against :service, :industry, :genre
>>>> >    - location ?
>>>> >
>>>> > Ex: The software company.
>>>> >
>>>> > 4. If no matches were found for the current NER with rules 2 or 3 then, if
>>>> > the yago:class which matched has more than 2 nouns, we also consider
>>>> > this a good co-reference but with a lower confidence maybe.
>>>> >
>>>> > Ex: The former tennis player, the theoretical physicist.
>>>> >
>>>> > 5. Based on the number of nouns which matched we create a confidence level.
>>>> > The number of matched nouns cannot be lower than 2 and we must have a
>>>> > yago:class match.
>>>> >
>>>> > For all NERs which got to this point, select the closest ones in the text
>>>> > to the noun phrase which matched against the same properties (yago:class
>>>> > and dbpedia) and mark them as co-references.
>>>> >
>>>> > Note: all noun phrases need to be lemmatized before all of this in case
>>>> > there are any plurals.
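
To make rule 2a above concrete, a minimal plain-Java sketch of the per-type property
selection; the property local names are the ones listed in the rules, and resolving them
against the dbpedia-owl namespace (as well as the person/organisation/place type detection)
is left out:

import java.util.*;

public class SpatialMembershipProperties {

    // dbpedia properties to check when the noun phrase contains a LOCATION or a demonym,
    // keyed by the type of the matching NER (rule 2a)
    static final Map<String, List<String>> SPATIAL_PROPS = new HashMap<>();
    static {
        SPATIAL_PROPS.put("person", Arrays.asList("birthPlace", "region", "nationality"));
        SPATIAL_PROPS.put("organisation",
                Arrays.asList("foundationPlace", "locationCity", "location", "hometown"));
        SPATIAL_PROPS.put("place", Arrays.asList("country", "subdivisionName", "location"));
    }

    /** Returns the property local names whose values should be compared against
     *  the LOCATION / demonym found in the noun phrase. */
    static List<String> propertiesFor(String nerType) {
        return SPATIAL_PROPS.getOrDefault(nerType, Collections.emptyList());
    }

    public static void main(String[] args) {
        // "The Redmond-based company" -> the matching NER "Microsoft" is an organisation
        System.out.println(propertiesFor("organisation"));
    }
}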
>>>> >
>>>> >
>>>> > 2014-03-25 20:50 GMT+02:00 Cristian Petroaca <cristian.petroaca@gmail.com>:
>>>> >
>>>> >> That worked. Thanks.
>>>> >>
>>>> >> So, there are no exceptions during the startup of the launcher.
>>>> >> The component tab in the felix console shows 6 WeightedChains the first
>>>> >> time, including the default one but after my changes and a restart there
>>>> >> are only 5 - the default one is missing altogether.
>>>> >>
>>>> >>
>>>> >> 2014-03-24 20:18 GMT+02:00 Rupert Westenthaler <
>>>> >> rupert.westenthaler@gmail.com>:
>>>> >>
>>>> >> Hi Cristian,
>>>> >>>
>>>> >>> I do see the same problem since last Friday. The solution as mentioned
>>>> >>> by [1] works for me.
>>>> >>>
>>>> >>>     mvn -Djsse.enableSNIExtension=false {goals}
>>>> >>>
>>>> >>> No idea why https connections to github do currently cause this. I
>>>> >>> could not find anything related via Google. So I suggest to use the
>>>> >>> system property for now. If this persists for longer we can adapt the
>>>> >>> build files accordingly.
>>>> >>>
>>>> >>> best
>>>> >>> Rupert
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> [1]
>>>> >>>
>>>> http://stackoverflow.com/questions/7615645/ssl-handshake-alert-unrecognized-name-error-since-upgrade-to-java-1-7-0
>>>> >>>
>>>> >>> On Mon, Mar 24, 2014 at 7:01 PM, Cristian Petroaca
>>>> >>> <cr...@gmail.com> wrote:
>>>> >>> > I did a clean on the whole project and now I wanted to do another "mvn
>>>> >>> > clean install" but I am getting this :
>>>> >>> >
>>>> >>> > "[INFO]
>>>> >>> > ------------------------------------------------------------------------
>>>> >>> > [ERROR] Failed to execute goal
>>>> >>> > org.apache.maven.plugins:maven-antrun-plugin:1.6:run (download) on project
>>>> >>> > org.apache.stanbol.data.opennlp.lang.es: An Ant BuildException has occured:
>>>> >>> > The following error occurred while executing this line:
>>>> >>> > [ERROR] C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\download_models.xml:33:
>>>> >>> > Failed to copy
>>>> >>> > https://github.com/utcompling/OpenNLP-Models/raw/58ef0c60031403e66e47ae35edaf58d3478b67af/models/es/opennlp-es-maxent-pos-es.bin
>>>> >>> > to C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\downloads\resources\org\apache\stanbol\data\opennlp\es-pos-maxent.bin
>>>> >>> > due to javax.net.ssl.SSLProtocolException handshake alert : unrecognized_name"
>>>> >>> >
>>>> >>> >
>>>> >>> >
>>>> >>> > 2014-03-20 11:25 GMT+02:00 Rupert Westenthaler <
>>>> >>> > rupert.westenthaler@gmail.com>:
>>>> >>> >
>>>> >>> >> Hi Cristian,
>>>> >>> >>
>>>> >>> >> On Thu, Mar 20, 2014 at 10:00 AM, Cristian Petroaca
>>>> >>> >> <cr...@gmail.com> wrote:
>>>> >>> >> >
>>>> >>> >>
>>>> >>>
>>>> stanbol.enhancer.chain.weighted.chain=["tika;optional","langdetect","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-ner","dbpediaLinking","entityhubExtraction","dbpedia-dereference","pos-chunker"]
>>>> >>> >> > service.ranking=I"-2147483648"
>>>> >>> >> > stanbol.enhancer.chain.name="default"
>>>> >>> >>
>>>> >>> >> Does look fine to me. Do you see any exception during the startup of
>>>> >>> >> the launcher. Can you check the status of this component in the
>>>> >>> >> component tab of the felix web console [1] (search for
>>>> >>> >> "org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain"). If
>>>> >>> >> you have multiple you can find the correct one by comparing the
>>>> >>> >> "Properties" with those in the configuration file.
>>>> >>> >>
>>>> >>> >> I guess that the according service is in the 'unsatisfied' state as
>>>> >>> >> you do not see it in the web interface. But if this is the case you
>>>> >>> >> should also see the according exception in the log. You can also
>>>> >>> >> manually stop/start the component. In this case the exception should be
>>>> >>> >> re-thrown and you do not need to search the log for it.
>>>> >>> >>
>>>> >>> >> best
>>>> >>> >> Rupert
>>>> >>> >>
>>>> >>> >>
>>>> >>> >> [1] http://localhost:8080/system/console/components
>>>> >>> >>
>>>> >>> >> >
>>>> >>> >> >
>>>> >>> >> >
>>>> >>> >> > 2014-03-20 7:39 GMT+02:00 Rupert Westenthaler <
>>>> >>> >> rupert.westenthaler@gmail.com
>>>> >>> >> >>:
>>>> >>> >> >
>>>> >>> >> >> Hi Cristian,
>>>> >>> >> >>
>>>> >>> >> >> you can not send attachments to the list. Please copy the contents
>>>> >>> >> >> directly to the mail
>>>> >>> >> >>
>>>> >>> >> >> thx
>>>> >>> >> >> Rupert
>>>> >>> >> >>
>>>> >>> >> >> On Wed, Mar 19, 2014 at 9:20 PM, Cristian Petroaca
>>>> >>> >> >> <cr...@gmail.com> wrote:
>>>> >>> >> >> > The config attached.
>>>> >>> >> >> >
>>>> >>> >> >> >
>>>> >>> >> >> > 2014-03-19 9:09 GMT+02:00 Rupert Westenthaler
>>>> >>> >> >> > <ru...@gmail.com>:
>>>> >>> >> >> >
>>>> >>> >> >> >> Hi Cristian,
>>>> >>> >> >> >>
>>>> >>> >> >> >> can you provide the contents of the chain after your modifications?
>>>> >>> >> >> >> Would be interesting to test why the chain is no longer active after
>>>> >>> >> >> >> the restart.
>>>> >>> >> >> >>
>>>> >>> >> >> >> You can find the config file in the 'stanbol/fileinstall' folder.
>>>> >>> >> >> >>
>>>> >>> >> >> >> best
>>>> >>> >> >> >> Rupert
>>>> >>> >> >> >>
>>>> >>> >> >> >> On Tue, Mar 18, 2014 at 8:24 PM, Cristian Petroaca
>>>> >>> >> >> >> <cr...@gmail.com> wrote:
>>>> >>> >> >> >> > Related to the default chain selection rules : before restart I had a
>>>> >>> >> >> >> > chain with the name 'default' as in I could access it via
>>>> >>> >> >> >> > enhancer/chain/default.
>>>> >>> >> >> >> > Then I just added another engine to the 'default' chain. I assumed that
>>>> >>> >> >> >> > after the restart the chain with the 'default' name would be persisted.
>>>> >>> >> >> >> > So the first rule should have been applied after the restart as well. But
>>>> >>> >> >> >> > instead I cannot reach it via enhancer/chain/default anymore so it's gone.
>>>> >>> >> >> >> > Anyway, this is not a big deal, it's not blocking me in any way, I just
>>>> >>> >> >> >> > wanted to understand where the problem is.
>>>> >>> >> >> >> >
>>>> >>> >> >> >> >
>>>> >>> >> >> >> > 2014-03-18 7:15 GMT+02:00 Rupert Westenthaler
>>>> >>> >> >> >> > <rupert.westenthaler@gmail.com>:
>>>> >>> >> >> >> >
>>>> >>> >> >> >> >> Hi Cristian
>>>> >>> >> >> >> >>
>>>> >>> >> >> >> >> On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca
>>>> >>> >> >> >> >> <cr...@gmail.com> wrote:
>>>> >>> >> >> >> >> > 1. Updated to the latest code and it's gone. Cool
>>>> >>> >> >> >> >> >
>>>> >>> >> >> >> >> > 2. I start the stable launcher -> create a new instance of the
>>>> >>> >> >> >> >> > PosChunkerEngine -> add it to the default chain. At this point
>>>> >>> >> >> >> >> > everything looks good and works ok.
>>>> >>> >> >> >> >> > After I restart the server the default chain is gone and instead I see
>>>> >>> >> >> >> >> > this in the enhancement chains page : all-active (default, id: 149,
>>>> >>> >> >> >> >> > ranking: 0, impl: AllActiveEnginesChain ). all-active did not contain
>>>> >>> >> >> >> >> > the 'default' word before the restart.
>>>> >>> >> >> >> >> >
>>>> >>> >> >> >> >>
>>>> >>> >> >> >> >> Please note the default chain selection rules as described at [1]. You
>>>> >>> >> >> >> >> can also access chains under '/enhancer/chain/{chain-name}'
>>>> >>> >> >> >> >>
>>>> >>> >> >> >> >> best
>>>> >>> >> >> >> >> Rupert
>>>> >>> >> >> >> >>
>>>> >>> >> >> >> >> [1]
>>>> >>> >> >> >> >> http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain
>>>> >>> >> >> >> >>
>>>> >>> >> >> >> >> > It looks like the config files are exactly what I need. Thanks.
>>>> >>> >> >> >> >> >
>>>> >>> >> >> >> >> >
>>>> >>> >> >> >> >> > 2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <
>>>> >>> >> >> >> >> > rupert.westenthaler@gmail.com>:
>>>> >>> >> >> >> >> >
>>>> >>> >> >> >> >> >> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
>>>> >>> >> >> >> >> >> <cr...@gmail.com> wrote:
>>>> >>> >> >> >> >> >> > Thanks Rupert.
>>>> >>> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> > A couple more questions/issues :
>>>> >>> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> > 1. Whenever I start the stanbol server I'm seeing this in the console
>>>> >>> >> >> >> >> >> > output :
>>>> >>> >> >> >> >> >> >
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >> >> This should be fixed with STANBOL-1278 [1] [2]
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >> >> > 2. Whenever I restart the server the Weighted Chains get messed up. I
>>>> >>> >> >> >> >> >> > usually use the 'default' chain and add my engine to it so there are 11
>>>> >>> >> >> >> >> >> > engines in it. After the restart this chain now contains around 23
>>>> >>> >> >> >> >> >> > engines in total.
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >> >> I was not able to replicate this. What I tried was
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >> >> (1) start up the stable launcher
>>>> >>> >> >> >> >> >> (2) add an additional engine to the default chain
>>>> >>> >> >> >> >> >> (3) restart the launcher
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >> >> The default chain was not changed after (2) and (3). So I would need
>>>> >>> >> >> >> >> >> further information for knowing why this is happening.
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >> >> Generally it is better to create your own chain instance instead of
>>>> >>> >> >> >> >> >> modifying one that is provided by the default configuration. I would also
>>>> >>> >> >> >> >> >> recommend that you keep your test configuration in text files and
>>>> >>> >> >> >> >> >> copy those to the 'stanbol/fileinstall' folder. Doing so prevents you
>>>> >>> >> >> >> >> >> from manually entering the configuration after a software update. The
>>>> >>> >> >> >> >> >> production-mode section [3] provides information on how to do that.
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >> >> best
>>>> >>> >> >> >> >> >> Rupert
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >> >> [1] https://issues.apache.org/jira/browse/STANBOL-1278
>>>> >>> >> >> >> >> >> [2] http://svn.apache.org/r1576623
>>>> >>> >> >> >> >> >> [3] http://stanbol.apache.org/docs/trunk/production-mode
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >> >> > ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Error
>>>> >>> >> >> >> >> >> > starting
>>>> >>> >> >> >> >> >> > slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\startup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar
>>>> >>> >> >> >> >> >> > (org.osgi.framework.BundleException: Unresolved constraint in bundle
>>>> >>> >> >> >> >> >> > org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve
>>>> >>> >> >> >> >> >> > 153.0: missing requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0))))
>>>> >>> >> >> >> >> >> > org.osgi.framework.BundleException: Unresolved constraint in bundle
>>>> >>> >> >> >> >> >> > org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve
>>>> >>> >> >> >> >> >> > 153.0: missing requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0)))
>>>> >>> >> >> >> >> >> >         at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
>>>> >>> >> >> >> >> >> >         at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
>>>> >>> >> >> >> >> >> >         at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
>>>> >>> >> >> >> >> >> >         at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264)
>>>> >>> >> >> >> >> >> >         at java.lang.Thread.run(Unknown Source)
>>>> >>> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> > Despite this the server starts fine and I can use the enhancer fine. Do
>>>> >>> >> >> >> >> >> > you guys see this as well?
>>>> >>> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> > 2. Whenever I restart the server the Weighted Chains get messed up. I
>>>> >>> >> >> >> >> >> > usually use the 'default' chain and add my engine to it so there are 11
>>>> >>> >> >> >> >> >> > engines in it. After the restart this chain now contains around 23
>>>> >>> >> >> >> >> >> > engines in total.
>>>> >>> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <
>>>> >>> >> >> >> >> >> > rupert.westenthaler@gmail.com>:
>>>> >>> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >> Hi Cristian,
>>>> >>> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> NER Annotations are typically available as both
>>>> >>> >> >> >> >> >> >> NlpAnnotations.NER_ANNOTATION and fise:TextAnnotation [1] in the
>>>> >>> >> >> >> >> >> >> enhancement metadata. As you are already accessing the AnalyzedText I
>>>> >>> >> >> >> >> >> >> would prefer using the NlpAnnotations.NER_ANNOTATION.
>>>> >>> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> best
>>>> >>> >> >> >> >> >> >> Rupert
>>>> >>> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> [1]
>>>> >>> >> >> >> >> >> >> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
>>>> >>> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
>>>> >>> >> >> >> >> >> >> <cr...@gmail.com> wrote:
>>>> >>> >> >> >> >> >> >> > Thanks.
>>>> >>> >> >> >> >> >> >> > I assume I should get the Named entities using the same but with
>>>> >>> >> >> >> >> >> >> > NlpAnnotations.NER_ANNOTATION?
>>>> >>> >> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
>>>> >>> >> >> >> >> >> >> > rupert.westenthaler@gmail.com>:
>>>> >>> >> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >> >> Hallo Cristian,
>>>> >>> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >> NounPhrases are not added to the RDF enhancement results. You need to
>>>> >>> >> >> >> >> >> >> >> use the AnalyzedText ContentPart [1]
>>>> >>> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >> here is some demo code you can use in the computeEnhancement method
>>>> >>> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >>     AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
>>>> >>> >> >> >> >> >> >> >>     Iterator<? extends Section> sections = at.getSentences();
>>>> >>> >> >> >> >> >> >> >>     if(!sections.hasNext()){ //process as single sentence
>>>> >>> >> >> >> >> >> >> >>         sections = Collections.singleton(at).iterator();
>>>> >>> >> >> >> >> >> >> >>     }
>>>> >>> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >>     while(sections.hasNext()){
>>>> >>> >> >> >> >> >> >> >>         Section section = sections.next();
>>>> >>> >> >> >> >> >> >> >>         Iterator<Span> chunks =
>>>> >>> >> >> >> >> >> >> >>             section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
>>>> >>> >> >> >> >> >> >> >>         while(chunks.hasNext()){
>>>> >>> >> >> >> >> >> >> >>             Span chunk = chunks.next();
>>>> >>> >> >> >> >> >> >> >>             Value<PhraseTag> phrase =
>>>> >>> >> >> >> >> >> >> >>                 chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
>>>> >>> >> >> >> >> >> >> >>             if(phrase.value().getCategory() == LexicalCategory.Noun){
>>>> >>> >> >> >> >> >> >> >>                 log.info(" - NounPhrase [{},{}] {}", new Object[]{
>>>> >>> >> >> >> >> >> >> >>                         chunk.getStart(),chunk.getEnd(),chunk.getSpan()});
>>>> >>> >> >> >> >> >> >> >>             }
>>>> >>> >> >> >> >> >> >> >>         }
>>>> >>> >> >> >> >> >> >> >>     }
>>>> >>> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >> hope this helps
>>>> >>> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >> best
>>>> >>> >> >> >> >> >> >> >> Rupert
>>>> >>> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >> [1]
>>>> >>> >> >> >> >> >> >> >> http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
>>>> >>> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
>>>> >>> >> >> >> >> >> >> >> <cr...@gmail.com> wrote:
>>>> >>> >> >> >> >> >> >> >> > I started to implement the engine and I'm having problems with getting
>>>> >>> >> >> >> >> >> >> >> > results for noun phrases. I modified the "default" weighted chain to also
>>>> >>> >> >> >> >> >> >> >> > include the PosChunkerEngine and ran a sample text : "Angela Merkel visited
>>>> >>> >> >> >> >> >> >> >> > China. The German chancellor met with various people". I expected that the
>>>> >>> >> >> >> >> >> >> >> > RDF XML output would contain some info about the noun phrases but I cannot
>>>> >>> >> >> >> >> >> >> >> > see any.
>>>> >>> >> >> >> >> >> >> >> > Could you point me to the correct way to generate the noun phrases?
>>>> >>> >> >> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >> >> > Thanks,
>>>> >>> >> >> >> >> >> >> >> > Cristian
>>>> >>> >> >> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <
>>>> >>> >> >> >> >> >> >> >> > cristian.petroaca@gmail.com>:
>>>> >>> >> >> >> >> >> >> >> >
>>>> >>> >> >> >> >> >> >> >> >> Opened https://issues.apache.org/jira/browse/STANBOL-1279
>>>> >>> >> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <
>>>> >>> >> >> >> >> >> >> >> >> cristian.petroaca@gmail.com>:
>>>> >>> >> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >> >>> Hi Rupert,
>>>> >>> >> >> >> >> >> >> >> >>>
>>>> >>> >> >> >> >> >> >> >> >>> The "spatial" dimension is a good idea. I'll also take a look at Yago.
>>>> >>> >> >> >> >> >> >> >> >>>
>>>> >>> >> >> >> >> >> >> >> >>> I will create a Jira with what we talked about here. It will probably
>>>> >>> >> >> >> >> >> >> >> >>> have just a draft-like description for now and will be updated as I go
>>>> >>> >> >> >> >> >> >> >> >>> along.
>>>> >>> >> >> >> >> >> >> >> >>>
>>>> >>> >> >> >> >> >> >> >> >>> Thanks,
>>>> >>> >> >> >> >> >> >> >> >>> Cristian
>>>> >>> >> >> >> >> >> >> >> >>>
>>>> >>> >> >> >> >> >> >> >> >>>
>>>> >>> >> >> >> >> >> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <
>>>> >>> >> >> >> >> >> >> >> >>> rupert.westenthaler@gmail.com>:
>>>> >>> >> >> >> >> >> >> >> >>>
>>>> >>> >> >> >> >> >> >> >> >>>> Hi Cristian,
>>>> >>> >> >> >> >> >> >> >> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>> definitely an interesting approach. You should have a look at Yago2
>>>> >>> >> >> >> >> >> >> >> >>>> [1]. As far as I can remember the Yago taxonomy is much better
>>>> >>> >> >> >> >> >> >> >> >>>> structured than the one used by dbpedia. Mapping suggestions of dbpedia
>>>> >>> >> >> >> >> >> >> >> >>>> to concepts in Yago2 is easy as both dbpedia and yago2 do provide
>>>> >>> >> >> >> >> >> >> >> >>>> mappings [2] and [3]
>>>> >>> >> >> >> >> >> >> >> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>>>> >>> >> >> >> >> >> >> >> >>>> >>
>>>> >>> >> >> >> >> >> >> >> >>>> >> "Microsoft posted its 2013 earnings. The Redmond's company made a
>>>> >>> >> >> >> >> >> >> >> >>>> >> huge profit".
>>>> >>> >> >> >> >> >> >> >> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>> That's actually a very good example. Spatial contexts are very
>>>> >>> >> >> >> >> >> >> >> >>>> important as they tend to be often used for referencing. So I would
>>>> >>> >> >> >> >> >> >> >> >>>> suggest to specially treat the spatial context. For spatial Entities
>>>> >>> >> >> >> >> >> >> >> >>>> (like a City) this is easy, but even for others (like a Person,
>>>> >>> >> >> >> >> >> >> >> >>>> Company) you could use relations to spatial entities to define their
>>>> >>> >> >> >> >> >> >> >> >>>> spatial context. This context could then be used to correctly link
>>>> >>> >> >> >> >> >> >> >> >>>> "The Redmond's company" to "Microsoft".
>>>> >>> >> >> >> >> >> >> >> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>> In addition I would suggest to use the "spatial" context of each
>>>> >>> >> >> >> >> >> >> >> >>>> entity (basically relations to entities that are cities, regions,
>>>> >>> >> >> >> >> >> >> >> >>>> countries) as a separate dimension, because those are very often used
>>>> >>> >> >> >> >> >> >> >> >>>> for coreferences.
>>>> >>> >> >> >> >> >> >> >> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
>>>> >>> >> >> >> >> >> >> >> >>>> [2] http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
>>>> >>> >> >> >> >> >> >> >> >>>> [3] http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
>>>> >>> >> >> >> >> >> >> >> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca
>>>> >>> >> >> >> >> >> >> >> >>>> <cr...@gmail.com> wrote:
>>>> >>> >> >> >> >> >> >> >> >>>> > There are several dbpedia categories for each entity, in this case for
>>>> >>> >> >> >> >> >> >> >> >>>> > Microsoft we have :
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> >> >> >> >> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
>>>> >>> >> >> >> >> >> >> >> >>>> > category:Microsoft
>>>> >>> >> >> >> >> >> >> >> >>>> > category:Software_companies_of_the_United_States
>>>> >>> >> >> >> >> >> >> >> >>>> > category:Software_companies_based_in_Washington_(state)
>>>> >>> >> >> >> >> >> >> >> >>>> > category:Companies_established_in_1975
>>>> >>> >> >> >> >> >> >> >> >>>> > category:1975_establishments_in_the_United_States
>>>> >>> >> >> >> >> >> >> >> >>>> > category:Companies_based_in_Redmond,_Washington
>>>> >>> >> >> >> >> >> >> >> >>>> > category:Multinational_companies_headquartered_in_the_United_States
>>>> >>> >> >> >> >> >> >> >> >>>> > category:Cloud_computing_providers
>>>> >>> >> >> >> >> >> >> >> >>>> > category:Companies_in_the_Dow_Jones_Industrial_Average
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> >> >> >> >> >> >>>> > So we also have "Companies based in Redmond, Washington" which could be
>>>> >>> >> >> >> >> >> >> >> >>>> > matched.
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> >> >> >> >> >> >>>> > There is still other contextual information from dbpedia which can be
>>>> >>> >> >> >> >> >> >> >> >>>> > used. For example for an Organization we could also include :
>>>> >>> >> >> >> >> >> >> >> >>>> > dbpprop:industry = Software
>>>> >>> >> >> >> >> >> >> >> >>>> > dbpprop:service = Online Service Providers
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> >> >> >> >> >> >>>> > and for a Person (that's for Barack Obama) :
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> >> >> >> >> >> >>>> > dbpedia-owl:profession:
>>>> >>> >> >> >> >> >> >> >> >>>> >                dbpedia:Author
>>>> >>> >> >> >> >> >> >> >> >>>> >                dbpedia:Constitutional_law
>>>> >>> >> >> >> >> >> >> >> >>>> >                dbpedia:Lawyer
>>>> >>> >> >> >> >> >> >> >> >>>> >                dbpedia:Community_organizing
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> >> >> >> >> >> >>>> > I'd like to continue investigating this as I think that it may have some
>>>> >>> >> >> >> >> >> >> >> >>>> > value in increasing the number of coreference resolutions and I'd like to
>>>> >>> >> >> >> >> >> >> >> >>>> > concentrate more on precision rather than recall since we already have a
>>>> >>> >> >> >> >> >> >> >> >>>> > set of coreferences detected by the stanford nlp tool and this would be as
>>>> >>> >> >> >> >> >> >> >> >>>> > an addition to that (at least this is how I would like to use it).
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> >> >> >> >> >> >>>> > Is it ok if I track this by opening a jira? I could update it to show my
>>>> >>> >> >> >> >> >> >> >> >>>> > progress and also my conclusions, and if it turns out that it was a bad
>>>> >>> >> >> >> >> >> >> >> >>>> > idea then that's the situation; at least I'll end up with more knowledge
>>>> >>> >> >> >> >> >> >> >> >>>> > about Stanbol in the end :).
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>>>> >>> >> >> >> >> >> >> >> >>>> >
>>>> >>> >> >> >> >> >> >> >> >>>> >> Hi Cristian,
>>>> >>> >> >> >> >> >> >> >> >>>> >>
>>>> >>> >> >> >> >> >> >> >> >>>> >> The approach sounds nice. I don't want to be the devil's advocate but
>>>> >>> >> >> >> >> >> >> >> >>>> >> I'm just not sure about the recall using the dbpedia categories feature.
>>>> >>> >> >> >> >> >> >> >> >>>> >> For example, your sentence could also be "Microsoft posted its 2013
>>>> >>> >> >> >> >> >> >> >> >>>> >> earnings. The Redmond's company made a huge profit". So, maybe including
>>>> >>> >> >> >> >> >> >> >> >>>> >> more contextual information from dbpedia could increase the recall but
>>>> >>> >> >> >> >> >> >> >> >>>> >> of course will reduce the precision.
>>>> >>> >> >> >> >> >> >> >> >>>> >>
>>>> >>> >> >> >> >> >> >> >> >>>> >> Cheers,
>>>> >>> >> >> >> >> >> >> >> >>>> >> Rafa
>>>> >>> >> >> >> >> >> >> >> >>>> >>
>>>> >>> >> the
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Event
>>>> >>> >> >> >> >> >> >> >> extraction
>>>> >>> >> >> >> >> >> >> >> >>>> Engine
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> feature :
>>>> >>> >> >> >> >> >> >>
>>>> https://issues.apache.org/jira/browse/STANBOL-1121is
>>>> >>> >> >> >> >> >> >> >> >>>> to
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> have
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> coreference resolution in the
>>>> given text.
>>>> >>> >> This
>>>> >>> >> >> is
>>>> >>> >> >> >> >> >> provided
>>>> >>> >> >> >> >> >> >> now
>>>> >>> >> >> >> >> >> >> >> >>>> via the
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> stanford-nlp project but as far as
>>>> I saw
>>>> >>> this
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> module
>>>> >>> >> >> >> >> is
>>>> >>> >> >> >> >> >> >> >> performing
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> mostly
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> pronomial (He, She) or nominal
>>>> (Barack
>>>> >>> Obama
>>>> >>> >> and
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Mr.
>>>> >>> >> >> >> >> >> Obama)
>>>> >>> >> >> >> >> >> >> >> >>>> coreference
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> resolution.
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> In order to get more coreferences
>>>> from
>>>> >>> the
>>>> >>> >> text
>>>> >>> >> >> I
>>>> >>> >> >> >> >> though
>>>> >>> >> >> >> >> >> of
>>>> >>> >> >> >> >> >> >> >> >>>> creating
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> some
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> logic that would detect this kind
>>>> of
>>>> >>> >> >> coreference :
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> "Apple reaches new profit heights.
>>>> The
>>>> >>> >> software
>>>> >>> >> >> >> >> company
>>>> >>> >> >> >> >> >> just
>>>> >>> >> >> >> >> >> >> >> >>>> announced
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> its
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> 2013 earnings."
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Here "The software company"
>>>> obviously
>>>> >>> refers
>>>> >>> >> to
>>>> >>> >> >> >> >> "Apple".
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> So I'd like to detect coreferences
>>>> of
>>>> >>> Named
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Entities
>>>> >>> >> >> >> >> >> which
>>>> >>> >> >> >> >> >> >> are
>>>> >>> >> >> >> >> >> >> >> of
>>>> >>> >> >> >> >> >> >> >> >>>> the
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> rdf:type of the Named Entity , in
>>>> this
>>>> >>> case
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> "company"
>>>> >>> >> >> >> >> and
>>>> >>> >> >> >> >> >> >> also
>>>> >>> >> >> >> >> >> >> >> >>>> have
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> attributes which can be found in
>>>> the
>>>> >>> dbpedia
>>>> >>> >> >> >> >> categories
>>>> >>> >> >> >> >> >> of
>>>> >>> >> >> >> >> >> >> the
>>>> >>> >> >> >> >> >> >> >> >>>> named
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> entity, in this case "software".
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> The detection of coreferences such
>>>> as
>>>> >>> "The
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> software
>>>> >>> >> >> >> >> >> >> company" in
>>>> >>> >> >> >> >> >> >> >> >>>> the
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> text
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> would also be done by either using
>>>> the
>>>> >>> new
>>>> >>> >> Pos
>>>> >>> >> >> Tag
>>>> >>> >> >> >> >> Based
>>>> >>> >> >> >> >> >> >> Phrase
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> extraction
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Engine (noun phrases) or by using a
>>>> >>> >> dependency
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> tree of
>>>> >>> >> >> >> >> >> the
>>>> >>> >> >> >> >> >> >> >> >>>> sentence and
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> picking up only subjects or
>>>> objects.
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> At this point I'd like to know if
>>>> this
>>>> >>> kind
>>>> >>> >> of
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> logic
>>>> >>> >> >> >> >> >> would
>>>> >>> >> >> >> >> >> >> be
>>>> >>> >> >> >> >> >> >> >> >>>> useful
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> as a
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> separate Enhancement Engine (in
>>>> case the
>>>> >>> >> >> precision
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> and
>>>> >>> >> >> >> >> >> >> recall
>>>> >>> >> >> >> >> >> >> >> are
>>>> >>> >> >> >> >> >> >> >> >>>> good
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> enough) in Stanbol?
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Thanks,
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Cristian
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>>> >>> >> >> >> >> >> >> >> >>>> >>
>>>> >>> >> >> >> >> >> >> >> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>> --
>>>> >>> >> >> >> >> >> >> >> >>>> | Rupert Westenthaler
>>>> >>> >> >> >> >> rupert.westenthaler@gmail.com
>>>> >>> >> >> >> >> >> >> >> >>>> | Bodenlehenstraße 11
>>>> >>> >> >> >> >> >> >> ++43-699-11108907
>>>> >>> >> >> >> >> >> >> >> >>>> | A-5500 Bischofshofen
>>>> >>> >> >> >> >> >> >> >> >>>>
>>>> >>> >> >> >> >> >> >> >> >>>
>>>> >>> >> >> >> >> >> >> >> >>>
>>>> >>> >> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> >> --
>>>> >>> >> >> >> >> >> >> >> | Rupert Westenthaler
>>>> >>> >> >> >> >> >> >> >> rupert.westenthaler@gmail.com
>>>> >>> >> >> >> >> >> >> >> | Bodenlehenstraße 11
>>>> >>> >> >> >> >> ++43-699-11108907
>>>> >>> >> >> >> >> >> >> >> | A-5500 Bischofshofen
>>>> >>> >> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >> >> --
>>>> >>> >> >> >> >> >> >> | Rupert Westenthaler
>>>> >>> >> >> rupert.westenthaler@gmail.com
>>>> >>> >> >> >> >> >> >> | Bodenlehenstraße 11
>>>> >>> >> >> >> >> >> >> ++43-699-11108907
>>>> >>> >> >> >> >> >> >> | A-5500 Bischofshofen
>>>> >>> >> >> >> >> >> >>
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >> >> --
>>>> >>> >> >> >> >> >> | Rupert Westenthaler
>>>> >>> >> rupert.westenthaler@gmail.com
>>>> >>> >> >> >> >> >> | Bodenlehenstraße 11
>>>> >>> >> >> ++43-699-11108907
>>>> >>> >> >> >> >> >> | A-5500 Bischofshofen
>>>> >>> >> >> >> >> >>
>>>> >>> >> >> >> >>
>>>> >>> >> >> >> >>
>>>> >>> >> >> >> >>
>>>> >>> >> >> >> >> --
>>>> >>> >> >> >> >> | Rupert Westenthaler
>>>> >>> rupert.westenthaler@gmail.com
>>>> >>> >> >> >> >> | Bodenlehenstraße 11
>>>> >>> >> ++43-699-11108907
>>>> >>> >> >> >> >> | A-5500 Bischofshofen
>>>> >>> >> >> >> >>
>>>> >>> >> >> >>
>>>> >>> >> >> >>
>>>> >>> >> >> >>
>>>> >>> >> >> >> --
>>>> >>> >> >> >> | Rupert Westenthaler
>>>> rupert.westenthaler@gmail.com
>>>> >>> >> >> >> | Bodenlehenstraße 11
>>>> >>> ++43-699-11108907
>>>> >>> >> >> >> | A-5500 Bischofshofen
>>>> >>> >> >> >
>>>> >>> >> >> >
>>>> >>> >> >>
>>>> >>> >> >>
>>>> >>> >> >>
>>>> >>> >> >> --
>>>> >>> >> >> | Rupert Westenthaler
>>>> rupert.westenthaler@gmail.com
>>>> >>> >> >> | Bodenlehenstraße 11
>>>> ++43-699-11108907
>>>> >>> >> >> | A-5500 Bischofshofen
>>>> >>> >> >>
>>>> >>> >>
>>>> >>> >>
>>>> >>> >>
>>>> >>> >> --
>>>> >>> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>>>> >>> >> | Bodenlehenstraße 11
>>>> ++43-699-11108907
>>>> >>> >> | A-5500 Bischofshofen
>>>> >>> >>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> --
>>>> >>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>>>> >>> | Bodenlehenstraße 11                             ++43-699-11108907
>>>> >>> | A-5500 Bischofshofen
>>>> >>>
>>>> >>
>>>> >>
>>>>
>>>>
>>>>
>>>> --
>>>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>>>> | Bodenlehenstraße 11                             ++43-699-11108907
>>>> | A-5500 Bischofshofen
>>>>
>>>
>>>
>>
>

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Cristian Petroaca <cr...@gmail.com>.
Ok, I think I kind of figured it out. If I want to use the dbpedia data
index I need to use the SiteManager to get the Site with id = "dbpedia".
Then I can query the Site directly.
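
For my own reference, the lookup I have in mind looks roughly like this (just a sketch: the method names getSite(), getQueryFactory(), findEntities() and the TextConstraint usage are my assumptions from reading the entityhub servicesapi, so they may need adjusting):

    import org.apache.stanbol.entityhub.servicesapi.model.Entity;
    import org.apache.stanbol.entityhub.servicesapi.query.FieldQuery;
    import org.apache.stanbol.entityhub.servicesapi.query.QueryResultList;
    import org.apache.stanbol.entityhub.servicesapi.query.TextConstraint;
    import org.apache.stanbol.entityhub.servicesapi.site.Site;
    import org.apache.stanbol.entityhub.servicesapi.site.SiteManager;

    // the SiteManager is expected to be injected via OSGi (e.g. an @Reference in the engine)
    public Entity lookupEntity(SiteManager siteManager, String nerSurfaceForm) {
        Site dbpedia = siteManager.getSite("dbpedia");
        FieldQuery query = dbpedia.getQueryFactory().createFieldQuery();
        // constrain on rdfs:label using the surface form of the named entity
        query.setConstraint("http://www.w3.org/2000/01/rdf-schema#label",
                new TextConstraint(nerSurfaceForm));
        query.setLimit(1);
        QueryResultList<Entity> results = dbpedia.findEntities(query);
        return results.isEmpty() ? null : results.iterator().next();
    }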

I have some additional questions though :
1. In my particular case I want to be able to also get the yago class of
the given entity. These properties come with the yago-types.nt file from
dbpedia and this file is not present in the entityhub dbpedia data fetch
scripts here :
https://svn.apache.org/repos/asf/stanbol/trunk/entityhub/indexing/dbpedia/dbpedia-3.8/fetch_data_en_int.sh.
Also this file comes with dbpedia 3.9. This means that I need to rebuild
the dbpedia index data with 3.9 and the new yago-types.nt file. Is this
correct?

2. I also need to be able to get some specific dbpedia properties from the
index, such as dbpedia-owl:locationCity and others for a given entity. At
the moment these are not available when doing a query on the dbpedia Site.
I suppose I need to place them in
https://svn.apache.org/repos/asf/stanbol/trunk/entityhub/indexing/dbpedia/src/main/resources/indexing/config/mappings.txt and
do a rebuild of the dbpedia index?
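
If both rebuilds work, reading the new values from a returned entity should then be simple. A rough sketch (the Representation accessors getReferences()/getReference() and the exact field URIs are assumptions on my side):

    import java.util.Iterator;
    import org.apache.stanbol.entityhub.servicesapi.model.Reference;
    import org.apache.stanbol.entityhub.servicesapi.model.Representation;

    public void printSpatialAndYagoInfo(Representation rep) {
        // dbpedia-owl:locationCity - only available once it is added to mappings.txt
        Iterator<Reference> cities = rep.getReferences("http://dbpedia.org/ontology/locationCity");
        while (cities.hasNext()) {
            System.out.println("locationCity: " + cities.next().getReference());
        }
        // rdf:type values; the yago classes should show up here once yago-types.nt is indexed
        Iterator<Reference> types = rep.getReferences("http://www.w3.org/1999/02/22-rdf-syntax-ns#type");
        while (types.hasNext()) {
            String type = types.next().getReference();
            if (type.startsWith("http://dbpedia.org/class/yago/")) {
                System.out.println("yago class: " + type);
            }
        }
    }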

Thanks.
Cristian


2014-04-28 16:58 GMT+03:00 Cristian Petroaca <cr...@gmail.com>:

> Hi,
>
> I've started to implement the dbpedia properties logic and I'd like to get
> some feedback on some things that I am doing :
> I want to get a NER from the text and search for it in the dbpedia data so
> that I can get certain dbpedia properties.
> The way I'm trying to do this is by getting the NER_ANNOTATION chunk's
> text and searching for that in the Entityhub (which from what I saw is by default
> configured with dbpedia data). I haven't yet performed a query to actually
> get the data but before I continue I'd like to ask if this is the way to go?
>
> Thanks,
> Cristian
>
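
For the NER_ANNOTATION lookup described in the quoted message above, the chunk text could be extracted roughly like this (only a sketch, following the pattern of the AnalysedText demo code; the NerTag accessor names are assumptions on my side):

    // iterate the Chunk spans of the AnalysedText and collect the surface form of each NER
    Iterator<Span> chunks = at.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
    while (chunks.hasNext()) {
        Span chunk = chunks.next();
        Value<NerTag> ner = chunk.getAnnotation(NlpAnnotations.NER_ANNOTATION);
        if (ner != null) {
            String surfaceForm = chunk.getSpan();   // e.g. "Microsoft"
            // ner.value().getType() should give the dc:type of the entity if needed
            // surfaceForm is what would be passed to the dbpedia Site query sketched above
        }
    }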
>
> 2014-03-28 15:12 GMT+02:00 Cristian Petroaca <cr...@gmail.com>
> :
>
>> Examples :
>>
>> 1. Group membership :
>>     a. Spatial membership :
>>
>>         "Microsoft anounced its 2013 earnings. <coref>The Richmond-based
>> company</coref> made huge profits."
>>
>>     b. Organisational membership :
>>
>>        "Mick Jagger started a new solo album. <coref>The Rolling Stones
>> singer</coref> did not say what the theme will be."
>>
>> 2. Functional membership :
>>
>>    "Allianz announced its 2013 earnings. <coref>The financial services
>> company</coref> made a huge profit."
>>
>> 3. If no matches were found for the current NER with the rules above, but
>> the yago:class which matched has more than 2 nouns, then we also consider
>> this a good co-reference, though maybe with a lower confidence.
>>
>>    "Boris Becker will take part in a demonstrative tennis match.
>> <coref>The former tennis player</coref> will play again after 10 years."
>>
>>
>> 2014-03-28 12:22 GMT+02:00 Rupert Westenthaler <
>> rupert.westenthaler@gmail.com>:
>>
>>> Hi Cristian, all
>>>
>>> Looks good to me, but I am not sure if I got everything. If you could
>>> provide example texts where those rules apply it would make it much
>>> easier to understand.
>>>
>>> Instead of using dbpedia properties you should define your own domain
>>> model (ontology). You can then align the dbpedia properties to your
>>> model. This will allow it to apply this approach also to knowledge
>>> bases other than dbpedia.
>>>
>>> For people new to this thread: The above message adds to the
>>> suggestion first made by Cristian on 4th February. Also the following
>>> 4 messages (until 7th Feb) provide additional context.
>>>
>>> best
>>> Rupert
>>>
>>>
>>> On Fri, Mar 28, 2014 at 9:23 AM, Cristian Petroaca
>>> <cr...@gmail.com> wrote:
>>> > Hi guys,
>>> >
>>> > After Rupert's last suggestions related to this enhancement engine I
>>> > devised a more comprehensive algorithm for matching the noun phrases
>>> > against the NER properties. Please take a look and let me know what you
>>> > think. Thanks.
>>> >
>>> > The following rules will be applied to every noun phrase in order to
>>> find
>>> > co-references:
>>> >
>>> > 1. For each NER prior to the current noun phrase in the text match the
>>> > yago:class label to the contents of the noun phrase.
>>> >
>>> > For the NERs which have a yago:class which matches, apply:
>>> >
>>> > 2. Group membership rules :
>>> >
>>> >     a. spatial membership : the NER is part of a Location. If the noun
>>> > phrase contains a LOCATION or a demonym then check any location
>>> properties
>>> > of the matching NER.
>>> >
>>> >     If matching NER is a :
>>> >     - person, match against :birthPlace, :region, :nationality
>>> >     - organisation, match against :foundationPlace, :locationCity,
>>> > :location, :hometown
>>> >     - place, match against :country, :subdivisionName, :location,
>>> >
>>> >     Ex: The Italian President, The Redmond-based company
>>> >
>>> >     b. organisational membership : the NER is part of an Organisation.
>>> If
>>> > the noun phrase contains an ORGANISATION then check the following
>>> > properties of the matching NER:
>>> >
>>> >     If matching NER is :
>>> >     - person, match against :occupation, :associatedActs
>>> >     - organisation ?
>>> >     - location ?
>>> >
>>> > Ex: The Microsoft executive, The Pink Floyd singer
>>> >
>>> > 3. Functional description rule: the noun phrase describes what the NER
>>> does
>>> > conceptually.
>>> > If there are no NERs in the noun phrase then match the following
>>> properties
>>> > of the matching NER to the contents of the noun phrase (aside from the
>>> > nouns which are part of the yago:class) :
>>> >
>>> >    If NER is a:
>>> >    - person ?
>>> >    - organisation : match against :service, :industry, :genre
>>> >    - location ?
>>> >
>>> > Ex:  The software company.
>>> >
>>> > 4. If no matches were found for the current NER with rules 2 or 3, but the
>>> > yago:class which matched has more than 2 nouns, then we also consider this
>>> > a good co-reference, though maybe with a lower confidence.
>>> >
>>> > Ex: The former tennis player, the theoretical physicist.
>>> >
>>> > 5. Based on the number of nouns which matched we create a confidence
>>> level.
>>> > The number of matched nouns cannot be lower than 2 and we must have a
>>> > yago:class match.
>>> >
>>> > For all NERs which got to this point, select the closest ones in the
>>> text
>>> > to the noun phrase which matched against the same properties
>>> (yago:class
>>> > and dbpedia) and mark them as co-references.
>>> >
>>> > Note: all noun phrases need to be lemmatized before all of this in case
>>> > there are any plurals.
>>> >
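
To make rule 5 concrete, a rough sketch of the selection step (ScoredCandidate is a hypothetical helper, not an existing Stanbol class; it captures the reading that at least 2 matched nouns plus a yago:class match are required and that the closest preceding NER wins):

    // hypothetical value object for a NER that already passed the yago:class match (rule 1)
    class ScoredCandidate {
        final String entityUri;   // URI of the matched named entity
        final int matchedNouns;   // nouns matched against the yago:class label + dbpedia properties
        final int distance;       // distance (e.g. in tokens) between the NER and the noun phrase
        ScoredCandidate(String entityUri, int matchedNouns, int distance) {
            this.entityUri = entityUri;
            this.matchedNouns = matchedNouns;
            this.distance = distance;
        }
    }

    // rule 5: require at least 2 matched nouns, then prefer the closest preceding NER;
    // the confidence of the created co-reference would grow with matchedNouns
    static ScoredCandidate selectCoreference(java.util.List<ScoredCandidate> candidates) {
        ScoredCandidate best = null;
        for (ScoredCandidate c : candidates) {
            if (c.matchedNouns < 2) {
                continue; // below the minimum evidence required by rule 5
            }
            if (best == null || c.distance < best.distance) {
                best = c;
            }
        }
        return best; // null means no co-reference is created for this noun phrase
    }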
>>> >
>>> > 2014-03-25 20:50 GMT+02:00 Cristian Petroaca <
>>> cristian.petroaca@gmail.com>:
>>> >
>>> >> That worked. Thanks.
>>> >>
>>> >> So, there are no exceptions during the startup of the launcher.
>>> >> The component tab in the felix console shows 6 WeightedChains the
>>> first
>>> >> time, including the default one but after my changes and a restart
>>> there
>>> >> are only 5 - the default one is missing altogether.
>>> >>
>>> >>
>>> >> 2014-03-24 20:18 GMT+02:00 Rupert Westenthaler <
>>> >> rupert.westenthaler@gmail.com>:
>>> >>
>>> >> Hi Cristian,
>>> >>>
>>> >>> I do see the same problem since last Friday. The solution as mentioned
>>> >>> by [1] works for me.
>>> >>>
>>> >>>     mvn -Djsse.enableSNIExtension=false {goals}
>>> >>>
>>> >>> No idea why https connections to github currently cause this. I
>>> >>> could not find anything related via Google. So I suggest to use the
>>> >>> system property for now. If this persists for longer we can adapt the
>>> >>> build files accordingly.
>>> >>>
>>> >>> best
>>> >>> Rupert
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> [1]
>>> >>>
>>> http://stackoverflow.com/questions/7615645/ssl-handshake-alert-unrecognized-name-error-since-upgrade-to-java-1-7-0
>>> >>>
>>> >>> On Mon, Mar 24, 2014 at 7:01 PM, Cristian Petroaca
>>> >>> <cr...@gmail.com> wrote:
>>> >>> > I did a clean on the whole project and now I wanted to do another "mvn
>>> >>> > clean install" but I am getting this :
>>> >>> >
>>> >>> > "[INFO]
>>> >>> > ------------------------------------------------------------------------
>>> >>> > [ERROR] Failed to execute goal
>>> >>> > org.apache.maven.plugins:maven-antrun-plugin:1.6:run (download) on project
>>> >>> > org.apache.stanbol.data.opennlp.lang.es: An Ant BuildException has occured:
>>> >>> > The following error occurred while executing this line:
>>> >>> > [ERROR] C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\download_models.xml:33:
>>> >>> > Failed to copy
>>> >>> > https://github.com/utcompling/OpenNLP-Models/raw/58ef0c60031403e66e47ae35edaf58d3478b67af/models/es/opennlp-es-maxent-pos-es.bin
>>> >>> > to C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\downloads\resources\org\apache\stanbol\data\opennlp\es-pos-maxent.bin
>>> >>> > due to javax.net.ssl.SSLProtocolException handshake alert : unrecognized_name"
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > 2014-03-20 11:25 GMT+02:00 Rupert Westenthaler <
>>> >>> > rupert.westenthaler@gmail.com>:
>>> >>> >
>>> >>> >> Hi Cristian,
>>> >>> >>
>>> >>> >> On Thu, Mar 20, 2014 at 10:00 AM, Cristian Petroaca
>>> >>> >> <cr...@gmail.com> wrote:
>>> >>> >> >
>>> >>> >>
>>> >>>
>>> stanbol.enhancer.chain.weighted.chain=["tika;optional","langdetect","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-ner","dbpediaLinking","entityhubExtraction","dbpedia-dereference","pos-chunker"]
>>> >>> >> > service.ranking=I"-2147483648"
>>> >>> >> > stanbol.enhancer.chain.name="default"
>>> >>> >>
>>> >>> >> Does look fine to me. Do you see any exception during the startup
>>> of
>>> >>> >> the launcher. Can you check the status of this component in the
>>> >>> >> component tab of the felix web console [1] (search for
>>> >>> >> "org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain").
>>> If
>>> >>> >> you have multiple you can find the correct one by comparing the
>>> >>> >> "Properties" with those in the configuration file.
>>> >>> >>
>>> >>> >> I guess that the corresponding service is in the 'unsatisfied' state as you
>>> do
>>> >>> >> not see it in the web interface. But if this is the case you
>>> should
>>> >>> >> also see the according exception in the log. You can also manually
>>> >>> >> stop/start the component. In this case the exception should be
>>> >>> >> re-thrown and you do not need to search the log for it.
>>> >>> >>
>>> >>> >> best
>>> >>> >> Rupert
>>> >>> >>
>>> >>> >>
>>> >>> >> [1] http://localhost:8080/system/console/components
>>> >>> >>
>>> >>> >> >
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > 2014-03-20 7:39 GMT+02:00 Rupert Westenthaler <
>>> >>> >> rupert.westenthaler@gmail.com
>>> >>> >> >>:
>>> >>> >> >
>>> >>> >> >> Hi Cristian,
>>> >>> >> >>
>>> >>> >> >> you can not send attachments to the list. Please copy the
>>> contents
>>> >>> >> >> directly to the mail
>>> >>> >> >>
>>> >>> >> >> thx
>>> >>> >> >> Rupert
>>> >>> >> >>
>>> >>> >> >> On Wed, Mar 19, 2014 at 9:20 PM, Cristian Petroaca
>>> >>> >> >> <cr...@gmail.com> wrote:
>>> >>> >> >> > The config attached.
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> > 2014-03-19 9:09 GMT+02:00 Rupert Westenthaler
>>> >>> >> >> > <ru...@gmail.com>:
>>> >>> >> >> >
>>> >>> >> >> >> Hi Cristian,
>>> >>> >> >> >>
>>> >>> >> >> >> can you provide the contents of the chain after your
>>> >>> modifications?
>>> >>> >> >> >> Would be interesting to test why the chain is no longer
>>> active
>>> >>> after
>>> >>> >> >> >> the restart.
>>> >>> >> >> >>
>>> >>> >> >> >> You can find the config file in the 'stanbol/fileinstall'
>>> folder.
>>> >>> >> >> >>
>>> >>> >> >> >> best
>>> >>> >> >> >> Rupert
>>> >>> >> >> >>
>>> >>> >> >> >> On Tue, Mar 18, 2014 at 8:24 PM, Cristian Petroaca
>>> >>> >> >> >> <cr...@gmail.com> wrote:
>>> >>> >> >> >> > Related to the default chain selection rules : before
>>> restart I
>>> >>> >> had a
>>> >>> >> >> >> > chain
>>> >>> >> >> >> > with the name 'default' as in I could access it via
>>> >>> >> >> >> > enhancer/chain/default.
>>> >>> >> >> >> > Then I just added another engine to the 'default' chain. I
>>> >>> assumed
>>> >>> >> >> that
>>> >>> >> >> >> > after the restart the chain with the 'default' name would
>>> be
>>> >>> >> >> persisted.
>>> >>> >> >> >> > So
>>> >>> >> >> >> > the first rule should have been applied after the restart
>>> as
>>> >>> well.
>>> >>> >> But
>>> >>> >> >> >> > instead I cannot reach it via enhancer/chain/default
>>> anymore
>>> >>> so its
>>> >>> >> >> >> > gone.
>>> >>> >> >> >> > Anyway, this is not a big deal, it's not blocking me in
>>> any
>>> >>> way, I
>>> >>> >> >> just
>>> >>> >> >> >> > wanted to understand where the problem is.
>>> >>> >> >> >> >
>>> >>> >> >> >> >
>>> >>> >> >> >> > 2014-03-18 7:15 GMT+02:00 Rupert Westenthaler
>>> >>> >> >> >> > <rupert.westenthaler@gmail.com
>>> >>> >> >> >> >>:
>>> >>> >> >> >> >
>>> >>> >> >> >> >> Hi Cristian
>>> >>> >> >> >> >>
>>> >>> >> >> >> >> On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca
>>> >>> >> >> >> >> <cr...@gmail.com> wrote:
>>> >>> >> >> >> >> > 1. Updated to the latest code and it's gone. Cool
>>> >>> >> >> >> >> >
>>> >>> >> >> >> >> > 2. I start the stable launcher -> create a new
>>> instance of
>>> >>> the
>>> >>> >> >> >> >> > PosChunkerEngine -> add it to the default chain. At
>>> this
>>> >>> point
>>> >>> >> >> >> >> > everything
>>> >>> >> >> >> >> > looks good and works ok.
>>> >>> >> >> >> >> > After I restart the server the default chain is gone
>>> and
>>> >>> >> instead I
>>> >>> >> >> >> >> > see
>>> >>> >> >> >> >> this
>>> >>> >> >> >> >> > in the enhancement chains page : all-active (default,
>>> id:
>>> >>> 149,
>>> >>> >> >> >> >> > ranking:
>>> >>> >> >> >> >> 0,
>>> >>> >> >> >> >> > impl: AllActiveEnginesChain ). all-active did not
>>> contain
>>> >>> the
>>> >>> >> >> >> >> > 'default'
>>> >>> >> >> >> >> > word before the restart.
>>> >>> >> >> >> >> >
>>> >>> >> >> >> >>
>>> >>> >> >> >> >> Please note the default chain selection rules as
>>> described at
>>> >>> [1].
>>> >>> >> >> You
>>> >>> >> >> >> >> can also access chains chains under
>>> >>> '/enhancer/chain/{chain-name}'
>>> >>> >> >> >> >>
>>> >>> >> >> >> >> best
>>> >>> >> >> >> >> Rupert
>>> >>> >> >> >> >>
>>> >>> >> >> >> >> [1]
>>> >>> >> >> >> >>
>>> >>> >> >> >> >>
>>> >>> >> >>
>>> >>> >>
>>> >>>
>>> http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain
>>> >>> >> >> >> >>
>>> >>> >> >> >> >> > It looks like the config files are exactly what I need.
>>> >>> Thanks.
>>> >>> >> >> >> >> >
>>> >>> >> >> >> >> >
>>> >>> >> >> >> >> > 2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <
>>> >>> >> >> >> >> rupert.westenthaler@gmail.com
>>> >>> >> >> >> >> >>:
>>> >>> >> >> >> >> >
>>> >>> >> >> >> >> >> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
>>> >>> >> >> >> >> >> <cr...@gmail.com> wrote:
>>> >>> >> >> >> >> >> > Thanks Rupert.
>>> >>> >> >> >> >> >> >
>>> >>> >> >> >> >> >> > A couple more questions/issues :
>>> >>> >> >> >> >> >> >
>>> >>> >> >> >> >> >> > 1. Whenever I start the stanbol server I'm seeing
>>> this
>>> >>> in the
>>> >>> >> >> >> >> >> > console
>>> >>> >> >> >> >> >> > output :
>>> >>> >> >> >> >> >> >
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >> This should be fixed with STANBOL-1278 [1] [2]
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >> > 2. Whenever I restart the server the Weighted
>>> Chains get
>>> >>> >> messed
>>> >>> >> >> >> >> >> > up. I
>>> >>> >> >> >> >> >> > usually use the 'default' chain and add my engine
>>> to it
>>> >>> so
>>> >>> >> there
>>> >>> >> >> >> >> >> > are
>>> >>> >> >> >> >> 11
>>> >>> >> >> >> >> >> > engines in it. After the restart this chain now
>>> contains
>>> >>> >> around
>>> >>> >> >> 23
>>> >>> >> >> >> >> >> engines
>>> >>> >> >> >> >> >> > in total.
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >> I was not able to replicate this. What I tried was
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >> (1) start up the stable launcher
>>> >>> >> >> >> >> >> (2) add an additional engine to the default chain
>>> >>> >> >> >> >> >> (3) restart the launcher
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >> The default chain was not changed after (2) and (3).
>>> So I
>>> >>> would
>>> >>> >> >> need
>>> >>> >> >> >> >> >> further information for knowing why this is happening.
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >> Generally it is better to create your own chain instance
>>> >>> >> >> >> >> >> instead of modifying one that is provided by the default
>>> >>> >> >> >> >> >> configuration. I would also recommend that you keep your
>>> >>> >> >> >> >> >> test configuration in text files and to copy those to the
>>> >>> >> >> >> >> >> 'stanbol/fileinstall' folder. Doing so prevents you from
>>> >>> >> >> >> >> >> manually entering the configuration after a software update.
>>> >>> >> >> >> >> >> The production-mode section [3] provides information on how
>>> >>> >> >> >> >> >> to do that.
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >> best
>>> >>> >> >> >> >> >> Rupert
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >> [1]
>>> https://issues.apache.org/jira/browse/STANBOL-1278
>>> >>> >> >> >> >> >> [2] http://svn.apache.org/r1576623
>>> >>> >> >> >> >> >> [3]
>>> http://stanbol.apache.org/docs/trunk/production-mode
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >> > ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Error starting
>>> >>> >> >> >> >> >> > slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\startup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar
>>> >>> >> >> >> >> >> > (org.osgi.framework.BundleException: Unresolved constraint in bundle
>>> >>> >> >> >> >> >> > org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing
>>> >>> >> >> >> >> >> > requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0))))
>>> >>> >> >> >> >> >> > org.osgi.framework.BundleException: Unresolved constraint in bundle
>>> >>> >> >> >> >> >> > org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing
>>> >>> >> >> >> >> >> > requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0)))
>>> >>> >> >> >> >> >> >         at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
>>> >>> >> >> >> >> >> >         at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
>>> >>> >> >> >> >> >> >         at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
>>> >>> >> >> >> >> >> >         at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264)
>>> >>> >> >> >> >> >> >         at java.lang.Thread.run(Unknown Source)
>>> >>> >> >> >> >> >> >
>>> >>> >> >> >> >> >> > Despite this, the server starts fine and I can
>>> use the
>>> >>> >> >> enhancer
>>> >>> >> >> >> >> fine.
>>> >>> >> >> >> >> >> Do
>>> >>> >> >> >> >> >> > you guys see this as well?
>>> >>> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >
>>> >>> >> >> >> >> >> > 2. Whenever I restart the server the Weighted
>>> Chains get
>>> >>> >> messed
>>> >>> >> >> >> >> >> > up. I
>>> >>> >> >> >> >> >> > usually use the 'default' chain and add my engine
>>> to it
>>> >>> so
>>> >>> >> there
>>> >>> >> >> >> >> >> > are
>>> >>> >> >> >> >> 11
>>> >>> >> >> >> >> >> > engines in it. After the restart this chain now
>>> contains
>>> >>> >> around
>>> >>> >> >> 23
>>> >>> >> >> >> >> >> engines
>>> >>> >> >> >> >> >> > in total.
>>> >>> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >
>>> >>> >> >> >> >> >> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <
>>> >>> >> >> >> >> >> rupert.westenthaler@gmail.com
>>> >>> >> >> >> >> >> >>:
>>> >>> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >> Hi Cristian,
>>> >>> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> NER Annotations are typically available as both
>>> >>> >> >> >> >> >> >> NlpAnnotations.NER_ANNOTATION and
>>>  fise:TextAnnotation
>>> >>> [1]
>>> >>> >> in
>>> >>> >> >> the
>>> >>> >> >> >> >> >> >> enhancement metadata. As you are already accessing
>>> the
>>> >>> >> >> >> >> >> >> AnayzedText I
>>> >>> >> >> >> >> >> >> would prefer using the
>>>  NlpAnnotations.NER_ANNOTATION.
>>> >>> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> best
>>> >>> >> >> >> >> >> >> Rupert
>>> >>> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> [1]
>>> >>> >> >> >> >> >> >>
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >>
>>> >>> >> >> >> >>
>>> >>> >> >>
>>> >>> >>
>>> >>>
>>> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
>>> >>> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
>>> >>> >> >> >> >> >> >> <cr...@gmail.com> wrote:
>>> >>> >> >> >> >> >> >> > Thanks.
>>> >>> >> >> >> >> >> >> > I assume I should get the Named entities using
>>> the
>>> >>> same
>>> >>> >> but
>>> >>> >> >> >> >> >> >> > with
>>> >>> >> >> >> >> >> >> > NlpAnnotations.NER_ANNOTATION?
>>> >>> >> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
>>> >>> >> >> >> >> >> >> > rupert.westenthaler@gmail.com>:
>>> >>> >> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >> >> Hallo Cristian,
>>> >>> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >> NounPhrases are not added to the RDF enhancement
>>> >>> results.
>>> >>> >> >> You
>>> >>> >> >> >> >> need to
>>> >>> >> >> >> >> >> >> >> use the AnalyzedText ContentPart [1]
>>> >>> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >> here is some demo code you can use in the
>>> >>> >> computeEnhancement
>>> >>> >> >> >> >> method
>>> >>> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >>         AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
>>> >>> >> >> >> >> >> >> >>         Iterator<? extends Section> sections = at.getSentences();
>>> >>> >> >> >> >> >> >> >>         if(!sections.hasNext()){ //process as single sentence
>>> >>> >> >> >> >> >> >> >>             sections = Collections.singleton(at).iterator();
>>> >>> >> >> >> >> >> >> >>         }
>>> >>> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >>         while(sections.hasNext()){
>>> >>> >> >> >> >> >> >> >>             Section section = sections.next();
>>> >>> >> >> >> >> >> >> >>             Iterator<Span> chunks =
>>> >>> >> >> >> >> >> >> >>                 section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
>>> >>> >> >> >> >> >> >> >>             while(chunks.hasNext()){
>>> >>> >> >> >> >> >> >> >>                 Span chunk = chunks.next();
>>> >>> >> >> >> >> >> >> >>                 Value<PhraseTag> phrase =
>>> >>> >> >> >> >> >> >> >>                     chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
>>> >>> >> >> >> >> >> >> >>                 if(phrase.value().getCategory() == LexicalCategory.Noun){
>>> >>> >> >> >> >> >> >> >>                     log.info(" - NounPhrase [{},{}] {}", new Object[]{
>>> >>> >> >> >> >> >> >> >>                             chunk.getStart(),chunk.getEnd(),chunk.getSpan()});
>>> >>> >> >> >> >> >> >> >>                 }
>>> >>> >> >> >> >> >> >> >>             }
>>> >>> >> >> >> >> >> >> >>         }
>>> >>> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >> hope this helps
>>> >>> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >> best
>>> >>> >> >> >> >> >> >> >> Rupert
>>> >>> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >> [1]
>>> >>> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >>
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >>
>>> >>> >> >> >> >>
>>> >>> >> >>
>>> >>> >>
>>> >>>
>>> http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
>>> >>> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian
>>> Petroaca
>>> >>> >> >> >> >> >> >> >> <cr...@gmail.com> wrote:
>>> >>> >> >> >> >> >> >> >> > I started to implement the engine and I'm
>>> having
>>> >>> >> problems
>>> >>> >> >> >> >> >> >> >> > with
>>> >>> >> >> >> >> >> getting
>>> >>> >> >> >> >> >> >> >> > results for noun phrases. I modified the
>>> "default"
>>> >>> >> >> weighted
>>> >>> >> >> >> >> chain
>>> >>> >> >> >> >> >> to
>>> >>> >> >> >> >> >> >> also
>>> >>> >> >> >> >> >> >> >> > include the PosChunkerEngine and ran a sample
>>> text
>>> >>> :
>>> >>> >> >> "Angela
>>> >>> >> >> >> >> Merkel
>>> >>> >> >> >> >> >> >> >> visted
>>> >>> >> >> >> >> >> >> >> > China. The german chancellor met with various
>>> >>> people".
>>> >>> >> I
>>> >>> >> >> >> >> expected
>>> >>> >> >> >> >> >> that
>>> >>> >> >> >> >> >> >> >> the
>>> >>> >> >> >> >> >> >> >> > RDF XML output would contain some info about
>>> the
>>> >>> noun
>>> >>> >> >> >> >> >> >> >> > phrases
>>> >>> >> >> >> >> but I
>>> >>> >> >> >> >> >> >> >> cannot
>>> >>> >> >> >> >> >> >> >> > see any.
>>> >>> >> >> >> >> >> >> >> > Could you point me to the correct way to
>>> generate
>>> >>> the
>>> >>> >> noun
>>> >>> >> >> >> >> phrases?
>>> >>> >> >> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >> >> > Thanks,
>>> >>> >> >> >> >> >> >> >> > Cristian
>>> >>> >> >> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <
>>> >>> >> >> >> >> >> >> >> cristian.petroaca@gmail.com>:
>>> >>> >> >> >> >> >> >> >> >
>>> >>> >> >> >> >> >> >> >> >> Opened
>>> >>> >> >> https://issues.apache.org/jira/browse/STANBOL-1279
>>> >>> >> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca
>>> <
>>> >>> >> >> >> >> >> >> >> cristian.petroaca@gmail.com>
>>> >>> >> >> >> >> >> >> >> >> :
>>> >>> >> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >> >> Hi Rupert,
>>> >>> >> >> >> >> >> >> >> >>>
>>> >>> >> >> >> >> >> >> >> >>> The "spatial" dimension is a good idea.
>>> I'll also
>>> >>> >> take a
>>> >>> >> >> >> >> >> >> >> >>> look
>>> >>> >> >> >> >> at
>>> >>> >> >> >> >> >> >> Yago.
>>> >>> >> >> >> >> >> >> >> >>>
>>> >>> >> >> >> >> >> >> >> >>> I will create a Jira with what we talked
>>> about
>>> >>> here.
>>> >>> >> It
>>> >>> >> >> >> >> >> >> >> >>> will
>>> >>> >> >> >> >> >> >> probably
>>> >>> >> >> >> >> >> >> >> >>> have just a draft-like description for now
>>> and
>>> >>> will
>>> >>> >> be
>>> >>> >> >> >> >> >> >> >> >>> updated
>>> >>> >> >> >> >> >> as I
>>> >>> >> >> >> >> >> >> go
>>> >>> >> >> >> >> >> >> >> >>> along.
>>> >>> >> >> >> >> >> >> >> >>>
>>> >>> >> >> >> >> >> >> >> >>> Thanks,
>>> >>> >> >> >> >> >> >> >> >>> Cristian
>>> >>> >> >> >> >> >> >> >> >>>
>>> >>> >> >> >> >> >> >> >> >>>
>>> >>> >> >> >> >> >> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert
>>> Westenthaler <
>>> >>> >> >> >> >> >> >> >> >>> rupert.westenthaler@gmail.com>:
>>> >>> >> >> >> >> >> >> >> >>>
>>> >>> >> >> >> >> >> >> >> >>> Hi Cristian,
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> >> >> >> >> >> >>>> definitely an interesting approach. You
>>> should
>>> >>> have
>>> >>> >> a
>>> >>> >> >> >> >> >> >> >> >>>> look at
>>> >>> >> >> >> >> >> Yago2
>>> >>> >> >> >> >> >> >> >> >>>> [1]. As far as I can remember the Yago
>>> taxonomy
>>> >>> is
>>> >>> >> much
>>> >>> >> >> >> >> better
>>> >>> >> >> >> >> >> >> >> >>>> structured as the one used by dbpedia.
>>> Mapping
>>> >>> >> >> >> >> >> >> >> >>>> suggestions of
>>> >>> >> >> >> >> >> >> dbpedia
>>> >>> >> >> >> >> >> >> >> >>>> to concepts in Yago2 is easy as both
>>> dbpedia and
>>> >>> >> yago2
>>> >>> >> >> do
>>> >>> >> >> >> >> >> provide
>>> >>> >> >> >> >> >> >> >> >>>> mappings [2] and [3]
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
>>> >>> >> >> >> >> >> >> >> >>>> > <rh...@apache.org>:
>>> >>> >> >> >> >> >> >> >> >>>> >>
>>> >>> >> >> >> >> >> >> >> >>>> >> "Microsoft posted its 2013 earnings. The
>>> >>> >> Redmond's
>>> >>> >> >> >> >> >> >> >> >>>> >> company
>>> >>> >> >> >> >> >> made
>>> >>> >> >> >> >> >> >> a
>>> >>> >> >> >> >> >> >> >> >>>> >> huge profit".
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> >> >> >> >> >> >>>> That's actually a very good example. Spatial
>>> >>> contexts
>>> >>> >> >> are
>>> >>> >> >> >> >> >> >> >> >>>> very
>>> >>> >> >> >> >> >> >> >> >>>> important as they tend to be often used for
>>> >>> >> >> referencing.
>>> >>> >> >> >> >> >> >> >> >>>> So I
>>> >>> >> >> >> >> >> would
>>> >>> >> >> >> >> >> >> >> >>>> suggest to specially treat the spatial
>>> context.
>>> >>> For
>>> >>> >> >> >> >> >> >> >> >>>> spatial
>>> >>> >> >> >> >> >> >> Entities
>>> >>> >> >> >> >> >> >> >> >>>> (like a City) this is easy, but even for
>>> other
>>> >>> >> (like a
>>> >>> >> >> >> >> Person,
>>> >>> >> >> >> >> >> >> >> >>>> Company) you could use relations to spatial
>>> >>> entities
>>> >>> >> >> >> >> >> >> >> >>>> define
>>> >>> >> >> >> >> >> their
>>> >>> >> >> >> >> >> >> >> >>>> spatial context. This context could than be
>>> >>> used to
>>> >>> >> >> >> >> >> >> >> >>>> correctly
>>> >>> >> >> >> >> >> link
>>> >>> >> >> >> >> >> >> >> >>>> "The Redmond's company" to "Microsoft".
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> >> >> >> >> >> >>>> In addition I would suggest to use the
>>> "spatial"
>>> >>> >> >> context
>>> >>> >> >> >> >> >> >> >> >>>> of
>>> >>> >> >> >> >> each
>>> >>> >> >> >> >> >> >> >> >>>> entity (basically relation to entities
>>> that are
>>> >>> >> cities,
>>> >>> >> >> >> >> regions,
>>> >>> >> >> >> >> >> >> >> >>>> countries) as a separate dimension, because
>>> >>> those
>>> >>> >> are
>>> >>> >> >> >> >> >> >> >> >>>> very
>>> >>> >> >> >> >> often
>>> >>> >> >> >> >> >> >> used
>>> >>> >> >> >> >> >> >> >> >>>> for coreferences.
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> >> >> >> >> >> >>>> [1]
>>> http://www.mpi-inf.mpg.de/yago-naga/yago/
>>> >>> >> >> >> >> >> >> >> >>>> [2]
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
>>> >>> >> >> >> >> >> >> >> >>>> [3]
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >>
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >>
>>> >>> >> >> >> >>
>>> >>> >> >>
>>> >>> >>
>>> >>>
>>> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> >> >> >> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian
>>> >>> Petroaca
>>> >>> >> >> >> >> >> >> >> >>>> <cr...@gmail.com> wrote:
>>> >>> >> >> >> >> >> >> >> >>>> > There are several dbpedia categories for
>>> each
>>> >>> >> entity,
>>> >>> >> >> >> >> >> >> >> >>>> > in
>>> >>> >> >> >> >> this
>>> >>> >> >> >> >> >> >> case
>>> >>> >> >> >> >> >> >> >> for
>>> >>> >> >> >> >> >> >> >> >>>> > Microsoft we have :
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> category:Companies_in_the_NASDAQ-100_Index
>>> >>> >> >> >> >> >> >> >> >>>> > category:Microsoft
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> category:Software_companies_of_the_United_States
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> category:Software_companies_based_in_Washington_(state)
>>> >>> >> >> >> >> >> >> >> >>>> > category:Companies_established_in_1975
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> category:1975_establishments_in_the_United_States
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> category:Companies_based_in_Redmond,_Washington
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >>
>>> >>> >> >>
>>> category:Multinational_companies_headquartered_in_the_United_States
>>> >>> >> >> >> >> >> >> >> >>>> > category:Cloud_computing_providers
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> category:Companies_in_the_Dow_Jones_Industrial_Average
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >> >> >>>> > So we also have "Companies based in
>>> >>> >> >> Redmont,Washington"
>>> >>> >> >> >> >> which
>>> >>> >> >> >> >> >> >> could
>>> >>> >> >> >> >> >> >> >> be
>>> >>> >> >> >> >> >> >> >> >>>> > matched.
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >> >> >>>> > There is still other contextual
>>> information
>>> >>> from
>>> >>> >> >> >> >> >> >> >> >>>> > dbpedia
>>> >>> >> >> >> >> which
>>> >>> >> >> >> >> >> >> can
>>> >>> >> >> >> >> >> >> >> be
>>> >>> >> >> >> >> >> >> >> >>>> used.
>>> >>> >> >> >> >> >> >> >> >>>> > For example for an Organization we could
>>> also
>>> >>> >> >> include :
>>> >>> >> >> >> >> >> >> >> >>>> > dbpprop:industry = Software
>>> >>> >> >> >> >> >> >> >> >>>> > dbpprop:service = Online Service
>>> Providers
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >> >> >>>> > and for a Person (that's for Barack
>>> Obama) :
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >> >> >>>> > dbpedia-owl:profession:
>>> >>> >> >> >> >> >> >> >> >>>> >
>>>  dbpedia:Author
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >> >> >>>> > dbpedia:Constitutional_law
>>> >>> >> >> >> >> >> >> >> >>>> >
>>>  dbpedia:Lawyer
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >> >> >>>> > dbpedia:Community_organizing
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >> >> >>>> > I'd like to continue investigating this
>>> as I
>>> >>> think
>>> >>> >> >> that
>>> >>> >> >> >> >> >> >> >> >>>> > it
>>> >>> >> >> >> >> may
>>> >>> >> >> >> >> >> >> have
>>> >>> >> >> >> >> >> >> >> >>>> some
>>> >>> >> >> >> >> >> >> >> >>>> > value in increasing the number of
>>> coreference
>>> >>> >> >> >> >> >> >> >> >>>> > resolutions
>>> >>> >> >> >> >> and
>>> >>> >> >> >> >> >> I'd
>>> >>> >> >> >> >> >> >> >> like
>>> >>> >> >> >> >> >> >> >> >>>> to
>>> >>> >> >> >> >> >> >> >> >>>> > concentrate more on precision rather than
>>> >>> recall
>>> >>> >> >> since
>>> >>> >> >> >> >> >> >> >> >>>> > we
>>> >>> >> >> >> >> >> already
>>> >>> >> >> >> >> >> >> >> have
>>> >>> >> >> >> >> >> >> >> >>>> a
>>> >>> >> >> >> >> >> >> >> >>>> > set of coreferences detected by the
>>> stanford
>>> >>> nlp
>>> >>> >> tool
>>> >>> >> >> >> >> >> >> >> >>>> > and
>>> >>> >> >> >> >> this
>>> >>> >> >> >> >> >> >> would
>>> >>> >> >> >> >> >> >> >> >>>> be as
>>> >>> >> >> >> >> >> >> >> >>>> > an addition to that (at least this is
>>> how I
>>> >>> would
>>> >>> >> >> like
>>> >>> >> >> >> >> >> >> >> >>>> > to
>>> >>> >> >> >> >> use
>>> >>> >> >> >> >> >> >> it).
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >> >> >>>> > Is it ok if I track this by opening a
>>> jira? I
>>> >>> >> could
>>> >>> >> >> >> >> >> >> >> >>>> > update
>>> >>> >> >> >> >> it
>>> >>> >> >> >> >> >> to
>>> >>> >> >> >> >> >> >> >> show
>>> >>> >> >> >> >> >> >> >> >>>> my
>>> >>> >> >> >> >> >> >> >> >>>> > progress and also my conclusions and if
>>> it
>>> >>> turns
>>> >>> >> out
>>> >>> >> >> >> >> >> >> >> >>>> > that
>>> >>> >> >> >> >> it
>>> >>> >> >> >> >> >> was
>>> >>> >> >> >> >> >> >> a
>>> >>> >> >> >> >> >> >> >> bad
>>> >>> >> >> >> >> >> >> >> >>>> idea
>>> >>> >> >> >> >> >> >> >> >>>> > then that's the situation at least I'll
>>> end up
>>> >>> >> with
>>> >>> >> >> >> >> >> >> >> >>>> > more
>>> >>> >> >> >> >> >> >> knowledge
>>> >>> >> >> >> >> >> >> >> >>>> about
>>> >>> >> >> >> >> >> >> >> >>>> > Stanbol in the end :).
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
>>> >>> >> >> >> >> >> >> >> >>>> > <rh...@apache.org>:
>>> >>> >> >> >> >> >> >> >> >>>> >
>>> >>> >> >> >> >> >> >> >> >>>> >> Hi Cristian,
>>> >>> >> >> >> >> >> >> >> >>>> >>
>>> >>> >> >> >> >> >> >> >> >>>> >> The approach sounds nice. I don't want
>>> to be
>>> >>> the
>>> >>> >> >> >> >> >> >> >> >>>> >> devil's
>>> >>> >> >> >> >> >> >> advocate
>>> >>> >> >> >> >> >> >> >> but
>>> >>> >> >> >> >> >> >> >> >>>> I'm
>>> >>> >> >> >> >> >> >> >> >>>> >> just not sure about the recall using the
>>> >>> dbpedia
>>> >>> >> >> >> >> categories
>>> >>> >> >> >> >> >> >> >> feature.
>>> >>> >> >> >> >> >> >> >> >>>> For
>>> >>> >> >> >> >> >> >> >> >>>> >> example, your sentence could be also
>>> >>> "Microsoft
>>> >>> >> >> posted
>>> >>> >> >> >> >> >> >> >> >>>> >> its
>>> >>> >> >> >> >> >> 2013
>>> >>> >> >> >> >> >> >> >> >>>> earnings.
>>> >>> >> >> >> >> >> >> >> >>>> >> The Redmond's company made a huge
>>> profit".
>>> >>> So,
>>> >>> >> maybe
>>> >>> >> >> >> >> >> including
>>> >>> >> >> >> >> >> >> more
>>> >>> >> >> >> >> >> >> >> >>>> >> contextual information from dbpedia
>>> could
>>> >>> >> increase
>>> >>> >> >> the
>>> >>> >> >> >> >> recall
>>> >>> >> >> >> >> >> >> but
>>> >>> >> >> >> >> >> >> >> of
>>> >>> >> >> >> >> >> >> >> >>>> course
>>> >>> >> >> >> >> >> >> >> >>>> >> will reduce the precision.
>>> >>> >> >> >> >> >> >> >> >>>> >>
>>> >>> >> >> >> >> >> >> >> >>>> >> Cheers,
>>> >>> >> >> >> >> >> >> >> >>>> >> Rafa
>>> >>> >> >> >> >> >> >> >> >>>> >>
>>> >>> >> >> >> >> >> >> >> >>>> >>>>> I'm
>>> >>> >> >> >> >> >> >> >> >>>> >>>>> not an expert in Natural Language
>>> >>> >> Understanding,
>>> >>> >> >> I
>>> >>> >> >> >> >> would
>>> >>> >> >> >> >> >> say
>>> >>> >> >> >> >> >> >> it
>>> >>> >> >> >> >> >> >> >> is
>>> >>> >> >> >> >> >> >> >> >>>> quite
>>> >>> >> >> >> >> >> >> >> >>>> >>>>> difficult to get decent
>>> precision/recall
>>> >>> rates
>>> >>> >> >> for
>>> >>> >> >> >> >> >> >> coreferencing
>>> >>> >> >> >> >> >> >> >> >>>> using
>>> >>> >> >> >> >> >> >> >> >>>> >>>>> fixed rules. Maybe you can give a
>>> try to
>>> >>> >> others
>>> >>> >> >> >> >> >> >> >> >>>> >>>>> tools
>>> >>> >> >> >> >> like
>>> >>> >> >> >> >> >> >> BART
>>> >>> >> >> >> >> >> >> >> (
>>> >>> >> >> >> >> >> >> >> >>>> >>>>> http://www.bart-coref.org/).
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>>> Cheers,
>>> >>> >> >> >> >> >> >> >> >>>> >>>>> Rafa Haro
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>>> El 30/01/14 10:33, Cristian Petroaca
>>> >>> escribió:
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>   Hi,
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> One of the necessary steps for
>>> >>> implementing
>>> >>> >> the
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Event
>>> >>> >> >> >> >> >> >> >> extraction
>>> >>> >> >> >> >> >> >> >> >>>> Engine
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> feature :
>>> >>> >> >> >> >> >> >>
>>> https://issues.apache.org/jira/browse/STANBOL-1121is
>>> >>> >> >> >> >> >> >> >> >>>> to
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> have
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> coreference resolution in the given
>>> text.
>>> >>> >> This
>>> >>> >> >> is
>>> >>> >> >> >> >> >> provided
>>> >>> >> >> >> >> >> >> now
>>> >>> >> >> >> >> >> >> >> >>>> via the
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> stanford-nlp project but as far as
>>> I saw
>>> >>> this
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> module
>>> >>> >> >> >> >> is
>>> >>> >> >> >> >> >> >> >> performing
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> mostly
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> pronomial (He, She) or nominal
>>> (Barack
>>> >>> Obama
>>> >>> >> and
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Mr.
>>> >>> >> >> >> >> >> Obama)
>>> >>> >> >> >> >> >> >> >> >>>> coreference
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> resolution.
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> In order to get more coreferences
>>> from
>>> >>> the
>>> >>> >> text
>>> >>> >> >> I
>>> >>> >> >> >> >> though
>>> >>> >> >> >> >> >> of
>>> >>> >> >> >> >> >> >> >> >>>> creating
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> some
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> logic that would detect this kind of
>>> >>> >> >> coreference :
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> "Apple reaches new profit heights.
>>> The
>>> >>> >> software
>>> >>> >> >> >> >> company
>>> >>> >> >> >> >> >> just
>>> >>> >> >> >> >> >> >> >> >>>> announced
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> its
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> 2013 earnings."
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Here "The software company"
>>> obviously
>>> >>> refers
>>> >>> >> to
>>> >>> >> >> >> >> "Apple".
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> So I'd like to detect coreferences
>>> of
>>> >>> Named
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Entities
>>> >>> >> >> >> >> >> which
>>> >>> >> >> >> >> >> >> are
>>> >>> >> >> >> >> >> >> >> of
>>> >>> >> >> >> >> >> >> >> >>>> the
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> rdf:type of the Named Entity , in
>>> this
>>> >>> case
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> "company"
>>> >>> >> >> >> >> and
>>> >>> >> >> >> >> >> >> also
>>> >>> >> >> >> >> >> >> >> >>>> have
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> attributes which can be found in the
>>> >>> dbpedia
>>> >>> >> >> >> >> categories
>>> >>> >> >> >> >> >> of
>>> >>> >> >> >> >> >> >> the
>>> >>> >> >> >> >> >> >> >> >>>> named
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> entity, in this case "software".
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> The detection of coreferences such
>>> as
>>> >>> "The
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> software
>>> >>> >> >> >> >> >> >> company" in
>>> >>> >> >> >> >> >> >> >> >>>> the
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> text
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> would also be done by either using
>>> the
>>> >>> new
>>> >>> >> Pos
>>> >>> >> >> Tag
>>> >>> >> >> >> >> Based
>>> >>> >> >> >> >> >> >> Phrase
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> extraction
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Engine (noun phrases) or by using a
>>> >>> >> dependency
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> tree of
>>> >>> >> >> >> >> >> the
>>> >>> >> >> >> >> >> >> >> >>>> sentence and
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> picking up only subjects or objects.
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> At this point I'd like to know if
>>> this
>>> >>> kind
>>> >>> >> of
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> logic
>>> >>> >> >> >> >> >> would
>>> >>> >> >> >> >> >> >> be
>>> >>> >> >> >> >> >> >> >> >>>> useful
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> as a
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> separate Enhancement Engine (in
>>> case the
>>> >>> >> >> precision
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> and
>>> >>> >> >> >> >> >> >> recall
>>> >>> >> >> >> >> >> >> >> are
>>> >>> >> >> >> >> >> >> >> >>>> good
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> enough) in Stanbol?
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Thanks,
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>> Cristian
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>>>>>
>>> >>> >> >> >> >> >> >> >> >>>> >>
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> >> >> >> >> >> >>>> --
>>> >>> >> >> >> >> >> >> >> >>>> | Rupert Westenthaler
>>> >>> >> >> >> >> rupert.westenthaler@gmail.com
>>> >>> >> >> >> >> >> >> >> >>>> | Bodenlehenstraße 11
>>> >>> >> >> >> >> >> >> ++43-699-11108907
>>> >>> >> >> >> >> >> >> >> >>>> | A-5500 Bischofshofen
>>> >>> >> >> >> >> >> >> >> >>>>
>>> >>> >> >> >> >> >> >> >> >>>
>>> >>> >> >> >> >> >> >> >> >>>
>>> >>> >> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> >> --
>>> >>> >> >> >> >> >> >> >> | Rupert Westenthaler
>>> >>> >> >> >> >> >> >> >> rupert.westenthaler@gmail.com
>>> >>> >> >> >> >> >> >> >> | Bodenlehenstraße 11
>>> >>> >> >> >> >> ++43-699-11108907
>>> >>> >> >> >> >> >> >> >> | A-5500 Bischofshofen
>>> >>> >> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >>
>>> >>> >> >> >> >> >> >> --
>>> >>> >> >> >> >> >> >> | Rupert Westenthaler
>>> >>> >> >> rupert.westenthaler@gmail.com
>>> >>> >> >> >> >> >> >> | Bodenlehenstraße 11
>>> >>> >> >> >> >> >> >> ++43-699-11108907
>>> >>> >> >> >> >> >> >> | A-5500 Bischofshofen
>>> >>> >> >> >> >> >> >>
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >> >> --
>>> >>> >> >> >> >> >> | Rupert Westenthaler
>>> >>> >> rupert.westenthaler@gmail.com
>>> >>> >> >> >> >> >> | Bodenlehenstraße 11
>>> >>> >> >> ++43-699-11108907
>>> >>> >> >> >> >> >> | A-5500 Bischofshofen
>>> >>> >> >> >> >> >>
>>> >>> >> >> >> >>
>>> >>> >> >> >> >>
>>> >>> >> >> >> >>
>>> >>> >> >> >> >> --
>>> >>> >> >> >> >> | Rupert Westenthaler
>>> >>> rupert.westenthaler@gmail.com
>>> >>> >> >> >> >> | Bodenlehenstraße 11
>>> >>> >> ++43-699-11108907
>>> >>> >> >> >> >> | A-5500 Bischofshofen
>>> >>> >> >> >> >>
>>> >>> >> >> >>
>>> >>> >> >> >>
>>> >>> >> >> >>
>>> >>> >> >> >> --
>>> >>> >> >> >> | Rupert Westenthaler
>>> rupert.westenthaler@gmail.com
>>> >>> >> >> >> | Bodenlehenstraße 11
>>> >>> ++43-699-11108907
>>> >>> >> >> >> | A-5500 Bischofshofen
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >> >> --
>>> >>> >> >> | Rupert Westenthaler
>>> rupert.westenthaler@gmail.com
>>> >>> >> >> | Bodenlehenstraße 11
>>> ++43-699-11108907
>>> >>> >> >> | A-5500 Bischofshofen
>>> >>> >> >>
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>> >> --
>>> >>> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>>> >>> >> | Bodenlehenstraße 11
>>> ++43-699-11108907
>>> >>> >> | A-5500 Bischofshofen
>>> >>> >>
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>>> >>> | Bodenlehenstraße 11                             ++43-699-11108907
>>> >>> | A-5500 Bischofshofen
>>> >>>
>>> >>
>>> >>
>>>
>>>
>>>
>>> --
>>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>>> | Bodenlehenstraße 11                             ++43-699-11108907
>>> | A-5500 Bischofshofen
>>>
>>
>>
>

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Cristian Petroaca <cr...@gmail.com>.
Hi,

I've started to implement the dbpedia properties logic and I'd like to get
some feedback on my approach:
I want to get a NER (named entity) from the text and search for it in the
dbpedia data so that I can get certain dbpedia properties.
The way I'm trying to do this is by getting the NER_ANNOTATION chunk's text
and searching for that in the Entityhub (which, from what I saw, is by default
configured with dbpedia data). I haven't yet performed a query to actually
get the data, but before I continue I'd like to ask if this is the way to go?
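
Roughly, this is what I have in mind inside the computeEnhancement method -
just a sketch for now. I'm assuming getEnclosed() can also be called directly
on the AnalysedText, and lookupEntity() is a made-up placeholder for the
actual Entityhub query which I still need to figure out:

    AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
    Iterator<Span> chunks = at.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
    while(chunks.hasNext()){
        Span chunk = chunks.next();
        // NER results should be available on chunks as NlpAnnotations.NER_ANNOTATION
        Value<NerTag> ner = chunk.getAnnotation(NlpAnnotations.NER_ANNOTATION);
        if(ner != null){
            String nerText = chunk.getSpan();
            // made-up helper: query the Entityhub (default dbpedia index) by label
            // and read the dbpedia properties of the best matching entity
            Representation entity = lookupEntity(nerText);
        }
    }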

Thanks,
Cristian


2014-03-28 15:12 GMT+02:00 Cristian Petroaca <cr...@gmail.com>:

> Examples :
>
> 1. Group membership :
>     a. Spatial membership :
>
>         "Microsoft announced its 2013 earnings. <coref>The Richmond-based
> company</coref> made huge profits."
>
>     b. Organisational membership :
>
>        "Mick Jagger started a new solo album. <coref>The Rolling Stones
> singer</coref> did not say what the theme will be."
>
> 2. Functional membership :
>
>    "Allianz announced its 2013 earnings. <coref>The financial services
> company</coref> made a huge profit."
>
> 3. If no matches were found for the current NER with the rules above, but
> the yago:class which matched has more than 2 nouns, then we also
> consider this a good co-reference, though maybe with a lower confidence.
>
>    "Boris Becker will take part in a demonstrative tennis match.
> <coref>The former tennis player</coref> will play again after 10 years."
>
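> In rough pseudo-Java the spatial check for 1.a would look something like the
> following (the helper names are made up, and a demonym -> place mapping, e.g.
> "Italian" -> "Italy", would also be needed):
>
>     // noun phrase "The Richmond-based company", candidate NER from earlier in the text
>     String place = findLocationToken(nounPhrase);         // a LOCATION inside the phrase
>     if(place == null){
>         place = demonymToPlace(findDemonym(nounPhrase));  // or a demonym like "Italian"
>     }
>     // compare against the location valued properties of the NER
>     // (:birthPlace, :foundationPlace, :locationCity, :location, ...)
>     if(place != null && locationProperties(ner).contains(place)){
>         markAsCandidateCoref(ner, nounPhrase);
>     }
>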
>
> 2014-03-28 12:22 GMT+02:00 Rupert Westenthaler <
> rupert.westenthaler@gmail.com>:
>
>> Hi Cristian, all
>>
>> Looks good to me, but I am not sure if I got everything. If you could
>> provide example texts where those rules apply it would make it much
>> easier to understand.
>>
>> Instead of using dbpedia properties you should define your own domain
>> model (ontology). You can then align the dbpedia properties to your
>> model. This will allow you to apply this approach also to knowledge
>> bases other than dbpedia.
>>
>> For people new to this thread: The above message adds to the
>> suggestion first made by Cristian on 4th February. Also the following
>> 4 messages (until 7th Feb) provide additional context.
>>
>> best
>> Rupert
>>
>>
>> On Fri, Mar 28, 2014 at 9:23 AM, Cristian Petroaca
>> <cr...@gmail.com> wrote:
>> > Hi guys,
>> >
>> > After Rupert's last suggestions related to this enhancement engine I
>> > devised a more comprehensive algorithm for matching the noun phrases
>> > against the NER properties. Please take a look and let me know what you
>> > think. Thanks.
>> >
>> > The following rules will be applied to every noun phrase in order to find
>> > co-references:
>> >
>> > 1. For each NER prior to the current noun phrase in the text match the
>> > yago:class label to the contents of the noun phrase.
>> >
>> > For the NERs which have a yago:class which matches, apply:
>> >
>> > 2. Group membership rules :
>> >
>> >     a. spatial membership : the NER is part of a Location. If the noun
>> > phrase contains a LOCATION or a demonym then check any location properties
>> > of the matching NER.
>> >
>> >     If the matching NER is a :
>> >     - person, match against :birthPlace, :region, :nationality
>> >     - organisation, match against :foundationPlace, :locationCity,
>> > :location, :hometown
>> >     - place, match against :country, :subdivisionName, :location
>> >
>> >     Ex: The Italian President, The Richmond-based company
>> >
>> >     b. organisational membership : the NER is part of an Organisation. If
>> > the noun phrase contains an ORGANISATION then check the following
>> > properties of the matching NER:
>> >
>> >     If the matching NER is a :
>> >     - person, match against :occupation, :associatedActs
>> >     - organisation ?
>> >     - location ?
>> >
>> > Ex: The Microsoft executive, The Pink Floyd singer
>> >
>> > 3. Functional description rule: the noun phrase describes what the NER does
>> > conceptually.
>> > If there are no NERs in the noun phrase then match the following properties
>> > of the matching NER to the contents of the noun phrase (aside from the
>> > nouns which are part of the yago:class) :
>> >
>> >    If the NER is a:
>> >    - person ?
>> >    - organisation : match against :service, :industry, :genre
>> >    - location ?
>> >
>> > Ex: The software company.
>> >
>> > 4. If no matches were found for the current NER with rules 2 or 3, but
>> > the yago:class which matched has more than 2 nouns, then we also consider
>> > this a good co-reference, though maybe with a lower confidence.
>> >
>> > Ex: The former tennis player, the theoretical physicist.
>> >
>> > 5. Based on the number of nouns which matched we create a confidence
>> > level. The number of matched nouns cannot be lower than 2 and we must
>> > have a yago:class match.
>> >
>> > For all NERs which got to this point, select the closest ones in the text
>> > to the noun phrase which matched against the same properties (yago:class
>> > and dbpedia) and mark them as co-references.
>> >
>> > Note: all noun phrases need to be lemmatized before all of this in case
>> > there are any plurals.
>> >
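>> > In very rough pseudo-Java the whole flow would be something like this (all
>> > the helper names and types here are made up, just to show where each rule
>> > applies):
>> >
>> >     for(NamedEntity ner : nersBefore(nounPhrase)){
>> >         int matches = matchNouns(nounPhrase, yagoClassLabel(ner));        // rule 1
>> >         if(matches == 0) continue;
>> >         matches += matchNouns(nounPhrase, locationProperties(ner));       // rule 2.a
>> >         matches += matchNouns(nounPhrase, organisationProperties(ner));   // rule 2.b
>> >         matches += matchNouns(nounPhrase, functionalProperties(ner));     // rule 3
>> >         // rule 4 would go here: a yago:class-only match with more than 2
>> >         // nouns is kept as a candidate with a lower confidence
>> >         if(matches >= 2){                                                 // rule 5
>> >             double confidence = (double) matches / nounCount(nounPhrase);
>> >             candidates.add(new Candidate(ner, confidence));
>> >         }
>> >     }
>> >     // of the candidates which matched against the same properties, keep the
>> >     // one closest to the noun phrase in the text and mark it as a co-reference
>> >     markCoreference(closest(candidates, nounPhrase), nounPhrase);
>> >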
>> >
>> > 2014-03-25 20:50 GMT+02:00 Cristian Petroaca <
>> cristian.petroaca@gmail.com>:
>> >
>> >> That worked. Thanks.
>> >>
>> >> So, there are no exceptions during the startup of the launcher.
>> >> The component tab in the felix console shows 6 WeightedChains the first
>> >> time, including the default one but after my changes and a restart
>> there
>> >> are only 5 - the default one is missing altogether.
>> >>
>> >>
>> >> 2014-03-24 20:18 GMT+02:00 Rupert Westenthaler <
>> >> rupert.westenthaler@gmail.com>:
>> >>
>> >> Hi Cristian,
>> >>>
>> >>> I do see the same problem since last Friday. The solution mentioned
>> >>> in [1] works for me.
>> >>>
>> >>>     mvn -Djsse.enableSNIExtension=false {goals}
>> >>>
>> >>> No idea why https connections to github currently cause this. I
>> >>> could not find anything related via Google. So I suggest to use the
>> >>> system property for now. If this persists for longer we can adapt the
>> >>> build files accordingly.
>> >>>
>> >>> best
>> >>> Rupert
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> [1]
>> >>>
>> http://stackoverflow.com/questions/7615645/ssl-handshake-alert-unrecognized-name-error-since-upgrade-to-java-1-7-0
>> >>>
>> >>> On Mon, Mar 24, 2014 at 7:01 PM, Cristian Petroaca
>> >>> <cr...@gmail.com> wrote:
>> >>> > I did a clean on the whole project and now I wanted to do another
>> "mvn
>> >>> > clean install" but I am getting this :
>> >>> >
>> >>> > "[INFO] ------------------------------------------------------------------------
>> >>> > [ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run
>> >>> > (download) on project org.apache.stanbol.data.opennlp.lang.es: An Ant BuildException
>> >>> > has occured: The following error occurred while executing this line:
>> >>> > [ERROR] C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\download_models.xml:33:
>> >>> > Failed to copy
>> >>> > https://github.com/utcompling/OpenNLP-Models/raw/58ef0c60031403e66e47ae35edaf58d3478b67af/models/es/opennlp-es-maxent-pos-es.bin
>> >>> > to C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\downloads\resources\org\apache\stanbol\data\opennlp\es-pos-maxent.bin
>> >>> > due to javax.net.ssl.SSLProtocolException handshake alert : unrecognized_name"
>> >>> >
>> >>> >
>> >>> >
>> >>> > 2014-03-20 11:25 GMT+02:00 Rupert Westenthaler <
>> >>> > rupert.westenthaler@gmail.com>:
>> >>> >
>> >>> >> Hi Cristian,
>> >>> >>
>> >>> >> On Thu, Mar 20, 2014 at 10:00 AM, Cristian Petroaca
>> >>> >> <cr...@gmail.com> wrote:
>> >>> >> >
>> >>> >>
>> >>>
>> stanbol.enhancer.chain.weighted.chain=["tika;optional","langdetect","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-ner","dbpediaLinking","entityhubExtraction","dbpedia-dereference","pos-chunker"]
>> >>> >> > service.ranking=I"-2147483648"
>> >>> >> > stanbol.enhancer.chain.name="default"
>> >>> >>
>> >>> >> Does look fine to me. Do you see any exception during the startup of
>> >>> >> the launcher? Can you check the status of this component in the
>> >>> >> component tab of the felix web console [1] (search for
>> >>> >> "org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain")? If
>> >>> >> you have multiple you can find the correct one by comparing the
>> >>> >> "Properties" with those in the configuration file.
>> >>> >>
>> >>> >> I guess that the corresponding service is in the 'unsatisfied' state as
>> >>> >> you do not see it in the web interface. But if this is the case you should
>> >>> >> also see the corresponding exception in the log. You can also manually
>> >>> >> stop/start the component. In this case the exception should be
>> >>> >> re-thrown and you do not need to search the log for it.
>> >>> >>
>> >>> >> best
>> >>> >> Rupert
>> >>> >>
>> >>> >>
>> >>> >> [1] http://localhost:8080/system/console/components
>> >>> >>
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > 2014-03-20 7:39 GMT+02:00 Rupert Westenthaler <
>> >>> >> rupert.westenthaler@gmail.com
>> >>> >> >>:
>> >>> >> >
>> >>> >> >> Hi Cristian,
>> >>> >> >>
>> >>> >> >> you can not send attachments to the list. Please copy the
>> contents
>> >>> >> >> directly to the mail
>> >>> >> >>
>> >>> >> >> thx
>> >>> >> >> Rupert
>> >>> >> >>
>> >>> >> >> On Wed, Mar 19, 2014 at 9:20 PM, Cristian Petroaca
>> >>> >> >> <cr...@gmail.com> wrote:
>> >>> >> >> > The config attached.
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > 2014-03-19 9:09 GMT+02:00 Rupert Westenthaler
>> >>> >> >> > <ru...@gmail.com>:
>> >>> >> >> >
>> >>> >> >> >> Hi Cristian,
>> >>> >> >> >>
>> >>> >> >> >> can you provide the contents of the chain after your
>> >>> modifications?
>> >>> >> >> >> Would be interesting to test why the chain is no longer
>> active
>> >>> after
>> >>> >> >> >> the restart.
>> >>> >> >> >>
>> >>> >> >> >> You can find the config file in the 'stanbol/fileinstall'
>> folder.
>> >>> >> >> >>
>> >>> >> >> >> best
>> >>> >> >> >> Rupert
>> >>> >> >> >>
>> >>> >> >> >> On Tue, Mar 18, 2014 at 8:24 PM, Cristian Petroaca
>> >>> >> >> >> <cr...@gmail.com> wrote:
>> >>> >> >> >> > Related to the default chain selection rules : before
>> restart I
>> >>> >> had a
>> >>> >> >> >> > chain
>> >>> >> >> >> > with the name 'default' as in I could access it via
>> >>> >> >> >> > enhancer/chain/default.
>> >>> >> >> >> > Then I just added another engine to the 'default' chain. I
>> >>> assumed
>> >>> >> >> that
>> >>> >> >> >> > after the restart the chain with the 'default' name would
>> be
>> >>> >> >> persisted.
>> >>> >> >> >> > So
>> >>> >> >> >> > the first rule should have been applied after the restart
>> as
>> >>> well.
>> >>> >> But
>> >>> >> >> >> > instead I cannot reach it via enhancer/chain/default
>> anymore
>> >>> so its
>> >>> >> >> >> > gone.
>> >>> >> >> >> > Anyway, this is not a big deal, it's not blocking me in any
>> >>> way, I
>> >>> >> >> just
>> >>> >> >> >> > wanted to understand where the problem is.
>> >>> >> >> >> >
>> >>> >> >> >> >
>> >>> >> >> >> > 2014-03-18 7:15 GMT+02:00 Rupert Westenthaler
>> >>> >> >> >> > <rupert.westenthaler@gmail.com
>> >>> >> >> >> >>:
>> >>> >> >> >> >
>> >>> >> >> >> >> Hi Cristian
>> >>> >> >> >> >>
>> >>> >> >> >> >> On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca
>> >>> >> >> >> >> <cr...@gmail.com> wrote:
>> >>> >> >> >> >> > 1. Updated to the latest code and it's gone. Cool
>> >>> >> >> >> >> >
>> >>> >> >> >> >> > 2. I start the stable launcher -> create a new instance
>> of
>> >>> the
>> >>> >> >> >> >> > PosChunkerEngine -> add it to the default chain. At this
>> >>> point
>> >>> >> >> >> >> > everything
>> >>> >> >> >> >> > looks good and works ok.
>> >>> >> >> >> >> > After I restart the server the default chain is gone and
>> >>> >> instead I
>> >>> >> >> >> >> > see
>> >>> >> >> >> >> this
>> >>> >> >> >> >> > in the enhancement chains page : all-active (default,
>> id:
>> >>> 149,
>> >>> >> >> >> >> > ranking:
>> >>> >> >> >> >> 0,
>> >>> >> >> >> >> > impl: AllActiveEnginesChain ). all-active did not
>> contain
>> >>> the
>> >>> >> >> >> >> > 'default'
>> >>> >> >> >> >> > word before the restart.
>> >>> >> >> >> >> >
>> >>> >> >> >> >>
>> >>> >> >> >> >> Please note the default chain selection rules as
>> described at
>> >>> [1].
>> >>> >> >> You
>> >>> >> >> >> >> can also access chains under
>> >>> '/enhancer/chain/{chain-name}'
>> >>> >> >> >> >>
>> >>> >> >> >> >> best
>> >>> >> >> >> >> Rupert
>> >>> >> >> >> >>
>> >>> >> >> >> >> [1]
>> >>> >> >> >> >>
>> >>> >> >> >> >>
>> >>> >> >>
>> >>> >>
>> >>>
>> http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain
>> >>> >> >> >> >>
>> >>> >> >> >> >> > It looks like the config files are exactly what I need.
>> >>> Thanks.
>> >>> >> >> >> >> >
>> >>> >> >> >> >> >
>> >>> >> >> >> >> > 2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <
>> >>> >> >> >> >> rupert.westenthaler@gmail.com
>> >>> >> >> >> >> >>:
>> >>> >> >> >> >> >
>> >>> >> >> >> >> >> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
>> >>> >> >> >> >> >> <cr...@gmail.com> wrote:
>> >>> >> >> >> >> >> > Thanks Rupert.
>> >>> >> >> >> >> >> >
>> >>> >> >> >> >> >> > A couple more questions/issues :
>> >>> >> >> >> >> >> >
>> >>> >> >> >> >> >> > 1. Whenever I start the stanbol server I'm seeing
>> this
>> >>> in the
>> >>> >> >> >> >> >> > console
>> >>> >> >> >> >> >> > output :
>> >>> >> >> >> >> >> >
>> >>> >> >> >> >> >>
>> >>> >> >> >> >> >> This should be fixed with STANBOL-1278 [1] [2]
>> >>> >> >> >> >> >>
>> >>> >> >> >> >> >> > 2. Whenever I restart the server the Weighted Chains
>> get
>> >>> >> messed
>> >>> >> >> >> >> >> > up. I
>> >>> >> >> >> >> >> > usually use the 'default' chain and add my engine to
>> it
>> >>> so
>> >>> >> there
>> >>> >> >> >> >> >> > are
>> >>> >> >> >> >> 11
>> >>> >> >> >> >> >> > engines in it. After the restart this chain now
>> contains
>> >>> >> around
>> >>> >> >> 23
>> >>> >> >> >> >> >> engines
>> >>> >> >> >> >> >> > in total.
>> >>> >> >> >> >> >>
>> >>> >> >> >> >> >> I was not able to replicate this. What I tried was
>> >>> >> >> >> >> >>
>> >>> >> >> >> >> >> (1) start up the stable launcher
>> >>> >> >> >> >> >> (2) add an additional engine to the default chain
>> >>> >> >> >> >> >> (3) restart the launcher
>> >>> >> >> >> >> >>
>> >>> >> >> >> >> >> The default chain was not changed after (2) and (3).
>> So I
>> >>> would
>> >>> >> >> need
>> >>> >> >> >> >> >> further information for knowing why this is happening.
>> >>> >> >> >> >> >>
>> >>> >> >> >> >> >> Generally it is better to create you own chain
>> instance as
>> >>> >> >> modifying
>> >>> >> >> >> >> >> one that is provided by the default configuration. I
>> would
>> >>> also
>> >>> >> >> >> >> >> recommend that you keep your test configuration in text
>> >>> files
>> >>> >> and
>> >>> >> >> to
>> >>> >> >> >> >> >> copy those to the 'stanbol/fileinstall' folder. Doing
>> so
>> >>> >> prevent
>> >>> >> >> you
>> >>> >> >> >> >> >> from manually entering the configuration after a
>> software
>> >>> >> update.
>> >>> >> >> >> >> >> The
>> >>> >> >> >> >> >> production-mode section [3] provides information on
>> how to
>> >>> do
>> >>> >> >> that.
>> >>> >> >> >> >> >>
>> >>> >> >> >> >> >> best
>> >>> >> >> >> >> >> Rupert
>> >>> >> >> >> >> >>
>> >>> >> >> >> >> >> [1] https://issues.apache.org/jira/browse/STANBOL-1278
>> >>> >> >> >> >> >> [2] http://svn.apache.org/r1576623
>> >>> >> >> >> >> >> [3]
>> http://stanbol.apache.org/docs/trunk/production-mode
>> >>> >> >> >> >> >>
>> >>> >> >> >> >> >> > ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Error starting
>> >>> >> >> >> >> >> > slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\startup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar
>> >>> >> >> >> >> >> > (org.osgi.framework.BundleException: Unresolved constraint in bundle
>> >>> >> >> >> >> >> > org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing
>> >>> >> >> >> >> >> > requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0))))
>> >>> >> >> >> >> >> > org.osgi.framework.BundleException: Unresolved constraint in bundle
>> >>> >> >> >> >> >> > org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing
>> >>> >> >> >> >> >> > requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0)))
>> >>> >> >> >> >> >> >         at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
>> >>> >> >> >> >> >> >         at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
>> >>> >> >> >> >> >> >         at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
>> >>> >> >> >> >> >> >         at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264)
>> >>> >> >> >> >> >> >         at java.lang.Thread.run(Unknown Source)
>> >>> >> >> >> >> >> >
>> >>> >> >> >> >> >> > Despite of this the server starts fine and I can use
>> the
>> >>> >> >> enhancer
>> >>> >> >> >> >> fine.
>> >>> >> >> >> >> >> Do
>> >>> >> >> >> >> >> > you guys see this as well?
>> >>> >> >> >> >> >> >
>> >>> >> >> >> >> >> >
>> >>> >> >> >> >> >> > 2. Whenever I restart the server the Weighted Chains
>> get
>> >>> >> messed
>> >>> >> >> >> >> >> > up. I
>> >>> >> >> >> >> >> > usually use the 'default' chain and add my engine to
>> it
>> >>> so
>> >>> >> there
>> >>> >> >> >> >> >> > are
>> >>> >> >> >> >> 11
>> >>> >> >> >> >> >> > engines in it. After the restart this chain now
>> contains
>> >>> >> around
>> >>> >> >> 23
>> >>> >> >> >> >> >> engines
>> >>> >> >> >> >> >> > in total.
>> >>> >> >> >> >> >> >
>> >>> >> >> >> >> >> >
>> >>> >> >> >> >> >> >
>> >>> >> >> >> >> >> >
>> >>> >> >> >> >> >> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <
>> >>> >> >> >> >> >> rupert.westenthaler@gmail.com
>> >>> >> >> >> >> >> >>:
>> >>> >> >> >> >> >> >
>> >>> >> >> >> >> >> >> Hi Cristian,
>> >>> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> NER Annotations are typically available as both
>> >>> >> >> >> >> >> >> NlpAnnotations.NER_ANNOTATION and
>>  fise:TextAnnotation
>> >>> [1]
>> >>> >> in
>> >>> >> >> the
>> >>> >> >> >> >> >> >> enhancement metadata. As you are already accessing
>> the
>> >>> >> >> >> >> >> >> AnayzedText I
>> >>> >> >> >> >> >> >> would prefer using the
>>  NlpAnnotations.NER_ANNOTATION.
>> >>> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> best
>> >>> >> >> >> >> >> >> Rupert
>> >>> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> [1]
>> >>> >> >> >> >> >> >>
>> >>> >> >> >> >> >>
>> >>> >> >> >> >>
>> >>> >> >> >> >>
>> >>> >> >>
>> >>> >>
>> >>>
>> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
>> >>> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
>> >>> >> >> >> >> >> >> <cr...@gmail.com> wrote:
>> >>> >> >> >> >> >> >> > Thanks.
>> >>> >> >> >> >> >> >> > I assume I should get the Named entities using the
>> >>> same
>> >>> >> but
>> >>> >> >> >> >> >> >> > with
>> >>> >> >> >> >> >> >> > NlpAnnotations.NER_ANNOTATION?
>> >>> >> >> >> >> >> >> >
>> >>> >> >> >> >> >> >> >
>> >>> >> >> >> >> >> >> >
>> >>> >> >> >> >> >> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
>> >>> >> >> >> >> >> >> > rupert.westenthaler@gmail.com>:
>> >>> >> >> >> >> >> >> >
>> >>> >> >> >> >> >> >> >> Hallo Cristian,
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >> NounPhrases are not added to the RDF enhancement
>> >>> results.
>> >>> >> >> You
>> >>> >> >> >> >> need to
>> >>> >> >> >> >> >> >> >> use the AnalyzedText ContentPart [1]
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >> here is some demo code you can use in the
>> >>> >> computeEnhancement
>> >>> >> >> >> >> method
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >>         AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
>> >>> >> >> >> >> >> >> >>         Iterator<? extends Section> sections = at.getSentences();
>> >>> >> >> >> >> >> >> >>         if(!sections.hasNext()){ //process as single sentence
>> >>> >> >> >> >> >> >> >>             sections = Collections.singleton(at).iterator();
>> >>> >> >> >> >> >> >> >>         }
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >>         while(sections.hasNext()){
>> >>> >> >> >> >> >> >> >>             Section section = sections.next();
>> >>> >> >> >> >> >> >> >>             Iterator<Span> chunks =
>> >>> >> >> >> >> >> >> >>                 section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
>> >>> >> >> >> >> >> >> >>             while(chunks.hasNext()){
>> >>> >> >> >> >> >> >> >>                 Span chunk = chunks.next();
>> >>> >> >> >> >> >> >> >>                 Value<PhraseTag> phrase =
>> >>> >> >> >> >> >> >> >>                     chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
>> >>> >> >> >> >> >> >> >>                 if(phrase.value().getCategory() == LexicalCategory.Noun){
>> >>> >> >> >> >> >> >> >>                     log.info(" - NounPhrase [{},{}] {}", new Object[]{
>> >>> >> >> >> >> >> >> >>                         chunk.getStart(),chunk.getEnd(),chunk.getSpan()});
>> >>> >> >> >> >> >> >> >>                 }
>> >>> >> >> >> >> >> >> >>             }
>> >>> >> >> >> >> >> >> >>         }
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >> hope this helps
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >> best
>> >>> >> >> >> >> >> >> >> Rupert
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >> [1]
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >>
>> >>> >> >> >> >> >>
>> >>> >> >> >> >>
>> >>> >> >> >> >>
>> >>> >> >>
>> >>> >>
>> >>>
>> http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
>> >>> >> >> >> >> >> >> >> <cr...@gmail.com> wrote:
>> >>> >> >> >> >> >> >> >> > I started to implement the engine and I'm
>> having
>> >>> >> problems
>> >>> >> >> >> >> >> >> >> > with
>> >>> >> >> >> >> >> getting
>> >>> >> >> >> >> >> >> >> > results for noun phrases. I modified the
>> "default"
>> >>> >> >> weighted
>> >>> >> >> >> >> chain
>> >>> >> >> >> >> >> to
>> >>> >> >> >> >> >> >> also
>> >>> >> >> >> >> >> >> >> > include the PosChunkerEngine and ran a sample
>> text
>> >>> :
>> >>> >> >> "Angela
>> >>> >> >> >> >> Merkel
>> >>> >> >> >> >> >> >> >> visted
>> >>> >> >> >> >> >> >> >> > China. The german chancellor met with various
>> >>> people".
>> >>> >> I
>> >>> >> >> >> >> expected
>> >>> >> >> >> >> >> that
>> >>> >> >> >> >> >> >> >> the
>> >>> >> >> >> >> >> >> >> > RDF XML output would contain some info about
>> the
>> >>> noun
>> >>> >> >> >> >> >> >> >> > phrases
>> >>> >> >> >> >> but I
>> >>> >> >> >> >> >> >> >> cannot
>> >>> >> >> >> >> >> >> >> > see any.
>> >>> >> >> >> >> >> >> >> > Could you point me to the correct way to
>> generate
>> >>> the
>> >>> >> noun
>> >>> >> >> >> >> phrases?
>> >>> >> >> >> >> >> >> >> >
>> >>> >> >> >> >> >> >> >> > Thanks,
>> >>> >> >> >> >> >> >> >> > Cristian
>> >>> >> >> >> >> >> >> >> >
>> >>> >> >> >> >> >> >> >> >
>> >>> >> >> >> >> >> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <
>> >>> >> >> >> >> >> >> >> cristian.petroaca@gmail.com>:
>> >>> >> >> >> >> >> >> >> >
>> >>> >> >> >> >> >> >> >> >> Opened
>> >>> >> >> https://issues.apache.org/jira/browse/STANBOL-1279
>> >>> >> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <
>> >>> >> >> >> >> >> >> >> cristian.petroaca@gmail.com>
>> >>> >> >> >> >> >> >> >> >> :
>> >>> >> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >> >> >> Hi Rupert,
>> >>> >> >> >> >> >> >> >> >>>
>> >>> >> >> >> >> >> >> >> >>> The "spatial" dimension is a good idea. I'll
>> also
>> >>> >> take a
>> >>> >> >> >> >> >> >> >> >>> look
>> >>> >> >> >> >> at
>> >>> >> >> >> >> >> >> Yago.
>> >>> >> >> >> >> >> >> >> >>>
>> >>> >> >> >> >> >> >> >> >>> I will create a Jira with what we talked
>> about
>> >>> here.
>> >>> >> It
>> >>> >> >> >> >> >> >> >> >>> will
>> >>> >> >> >> >> >> >> probably
>> >>> >> >> >> >> >> >> >> >>> have just a draft-like description for now
>> and
>> >>> will
>> >>> >> be
>> >>> >> >> >> >> >> >> >> >>> updated
>> >>> >> >> >> >> >> as I
>> >>> >> >> >> >> >> >> go
>> >>> >> >> >> >> >> >> >> >>> along.
>> >>> >> >> >> >> >> >> >> >>>
>> >>> >> >> >> >> >> >> >> >>> Thanks,
>> >>> >> >> >> >> >> >> >> >>> Cristian
>> >>> >> >> >> >> >> >> >> >>>
>> >>> >> >> >> >> >> >> >> >>>
>> >>> >> >> >> >> >> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert
>> Westenthaler <
>> >>> >> >> >> >> >> >> >> >>> rupert.westenthaler@gmail.com>:
>> >>> >> >> >> >> >> >> >> >>>
>> >>> >> >> >> >> >> >> >> >>> Hi Cristian,
>> >>> >> >> >> >> >> >> >> >>>>
>> >>> >> >> >> >> >> >> >> >>>> definitely an interesting approach. You
>> should
>> >>> have
>> >>> >> a
>> >>> >> >> >> >> >> >> >> >>>> look at
>> >>> >> >> >> >> >> Yago2
>> >>> >> >> >> >> >> >> >> >>>> [1]. As far as I can remember the Yago
>> taxonomy
>> >>> is
>> >>> >> much
>> >>> >> >> >> >> better
>> >>> >> >> >> >> >> >> >> >>>> structured as the one used by dbpedia.
>> Mapping
>> >>> >> >> >> >> >> >> >> >>>> suggestions of
>> >>> >> >> >> >> >> >> dbpedia
>> >>> >> >> >> >> >> >> >> >>>> to concepts in Yago2 is easy as both
>> dbpedia and
>> >>> >> yago2
>> >>> >> >> do
>> >>> >> >> >> >> >> provide
>> >>> >> >> >> >> >> >> >> >>>> mappings [2] and [3]
>> >>> >> >> >> >> >> >> >> >>>>
>> >>> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
>> >>> >> >> >> >> >> >> >> >>>> > <rh...@apache.org>:
>> >>> >> >> >> >> >> >> >> >>>> >>
>> >>> >> >> >> >> >> >> >> >>>> >> "Microsoft posted its 2013 earnings. The
>> >>> >> Redmond's
>> >>> >> >> >> >> >> >> >> >>>> >> company
>> >>> >> >> >> >> >> made
>> >>> >> >> >> >> >> >> a
>> >>> >> >> >> >> >> >> >> >>>> >> huge profit".
>> >>> >> >> >> >> >> >> >> >>>>
>> >>> >> >> >> >> >> >> >> >>>> Thats actually a very good example. Spatial
>> >>> contexts
>> >>> >> >> are
>> >>> >> >> >> >> >> >> >> >>>> very
>> >>> >> >> >> >> >> >> >> >>>> important as they tend to be often used for
>> >>> >> >> referencing.
>> >>> >> >> >> >> >> >> >> >>>> So I
>> >>> >> >> >> >> >> would
>> >>> >> >> >> >> >> >> >> >>>> suggest to specially treat the spatial
>> context.
>> >>> For
>> >>> >> >> >> >> >> >> >> >>>> spatial
>> >>> >> >> >> >> >> >> Entities
>> >>> >> >> >> >> >> >> >> >>>> (like a City) this is easy, but even for
>> other
>> >>> >> (like a
>> >>> >> >> >> >> Person,
>> >>> >> >> >> >> >> >> >> >>>> Company) you could use relations to spatial
>> >>> entities
>> >>> >> >> >> >> >> >> >> >>>> define
>> >>> >> >> >> >> >> their
>> >>> >> >> >> >> >> >> >> >>>> spatial context. This context could than be
>> >>> used to
>> >>> >> >> >> >> >> >> >> >>>> correctly
>> >>> >> >> >> >> >> link
>> >>> >> >> >> >> >> >> >> >>>> "The Redmond's company" to "Microsoft".
>> >>> >> >> >> >> >> >> >> >>>>
>> >>> >> >> >> >> >> >> >> >>>> In addition I would suggest to use the
>> "spatial"
>> >>> >> >> context
>> >>> >> >> >> >> >> >> >> >>>> of
>> >>> >> >> >> >> each
>> >>> >> >> >> >> >> >> >> >>>> entity (basically relation to entities that
>> are
>> >>> >> cities,
>> >>> >> >> >> >> regions,
>> >>> >> >> >> >> >> >> >> >>>> countries) as a separate dimension, because
>> >>> those
>> >>> >> are
>> >>> >> >> >> >> >> >> >> >>>> very
>> >>> >> >> >> >> often
>> >>> >> >> >> >> >> >> used
>> >>> >> >> >> >> >> >> >> >>>> for coreferences.
>> >>> >> >> >> >> >> >> >> >>>>
>> >>> >> >> >> >> >> >> >> >>>> [1]
>> http://www.mpi-inf.mpg.de/yago-naga/yago/
>> >>> >> >> >> >> >> >> >> >>>> [2]
>> >>> >> >> >> >> >> >> >> >>>>
>> >>> >> >> http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
>> >>> >> >> >> >> >> >> >> >>>> [3]
>> >>> >> >> >> >> >> >> >> >>>>
>> >>> >> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >>
>> >>> >> >> >> >> >>
>> >>> >> >> >> >>
>> >>> >> >> >> >>
>> >>> >> >>
>> >>> >>
>> >>>
>> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
>> >>> >> >> >> >> >> >> >> >>>>
>> >>> >> >> >> >> >> >> >> >>>>
>> >>> >> >> >> >> >> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian
>> >>> Petroaca
>> >>> >> >> >> >> >> >> >> >>>> <cr...@gmail.com> wrote:
>> >>> >> >> >> >> >> >> >> >>>> > There are several dbpedia categories for
>> each
>> >>> >> entity,
>> >>> >> >> >> >> >> >> >> >>>> > in
>> >>> >> >> >> >> this
>> >>> >> >> >> >> >> >> case
>> >>> >> >> >> >> >> >> >> for
>> >>> >> >> >> >> >> >> >> >>>> > Microsoft we have :
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
>> >>> >> >> >> >> >> >> >> >>>> > category:Microsoft
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> category:Software_companies_of_the_United_States
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> category:Software_companies_based_in_Washington_(state)
>> >>> >> >> >> >> >> >> >> >>>> > category:Companies_established_in_1975
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> category:1975_establishments_in_the_United_States
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> category:Companies_based_in_Redmond,_Washington
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >>
>> >>> >> >> >> >> >> >>
>> >>> >> >>
>> category:Multinational_companies_headquartered_in_the_United_States
>> >>> >> >> >> >> >> >> >> >>>> > category:Cloud_computing_providers
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> category:Companies_in_the_Dow_Jones_Industrial_Average
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >> >> >>>> > So we also have "Companies based in
>> >>> >> >> Redmont,Washington"
>> >>> >> >> >> >> which
>> >>> >> >> >> >> >> >> could
>> >>> >> >> >> >> >> >> >> be
>> >>> >> >> >> >> >> >> >> >>>> > matched.
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >> >> >>>> > There is still other contextual
>> information
>> >>> from
>> >>> >> >> >> >> >> >> >> >>>> > dbpedia
>> >>> >> >> >> >> which
>> >>> >> >> >> >> >> >> can
>> >>> >> >> >> >> >> >> >> be
>> >>> >> >> >> >> >> >> >> >>>> used.
>> >>> >> >> >> >> >> >> >> >>>> > For example for an Organization we could
>> also
>> >>> >> >> include :
>> >>> >> >> >> >> >> >> >> >>>> > dbpprop:industry = Software
>> >>> >> >> >> >> >> >> >> >>>> > dbpprop:service = Online Service Providers
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >> >> >>>> > and for a Person (that's for Barack
>> Obama) :
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >> >> >>>> > dbpedia-owl:profession:
>> >>> >> >> >> >> >> >> >> >>>> >
>>  dbpedia:Author
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >> >> >>>> > dbpedia:Constitutional_law
>> >>> >> >> >> >> >> >> >> >>>> >
>>  dbpedia:Lawyer
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >> >> >>>> > dbpedia:Community_organizing
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >> >> >>>> > I'd like to continue investigating this
>> as I
>> >>> think
>> >>> >> >> that
>> >>> >> >> >> >> >> >> >> >>>> > it
>> >>> >> >> >> >> may
>> >>> >> >> >> >> >> >> have
>> >>> >> >> >> >> >> >> >> >>>> some
>> >>> >> >> >> >> >> >> >> >>>> > value in increasing the number of
>> coreference
>> >>> >> >> >> >> >> >> >> >>>> > resolutions
>> >>> >> >> >> >> and
>> >>> >> >> >> >> >> I'd
>> >>> >> >> >> >> >> >> >> like
>> >>> >> >> >> >> >> >> >> >>>> to
>> >>> >> >> >> >> >> >> >> >>>> > concentrate more on precision rather than
>> >>> recall
>> >>> >> >> since
>> >>> >> >> >> >> >> >> >> >>>> > we
>> >>> >> >> >> >> >> already
>> >>> >> >> >> >> >> >> >> have
>> >>> >> >> >> >> >> >> >> >>>> a
>> >>> >> >> >> >> >> >> >> >>>> > set of coreferences detected by the
>> stanford
>> >>> nlp
>> >>> >> tool
>> >>> >> >> >> >> >> >> >> >>>> > and
>> >>> >> >> >> >> this
>> >>> >> >> >> >> >> >> would
>> >>> >> >> >> >> >> >> >> >>>> be as
>> >>> >> >> >> >> >> >> >> >>>> > an addition to that (at least this is how
>> I
>> >>> would
>> >>> >> >> like
>> >>> >> >> >> >> >> >> >> >>>> > to
>> >>> >> >> >> >> use
>> >>> >> >> >> >> >> >> it).
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >> >> >>>> > Is it ok if I track this by opening a
>> jira? I
>> >>> >> could
>> >>> >> >> >> >> >> >> >> >>>> > update
>> >>> >> >> >> >> it
>> >>> >> >> >> >> >> to
>> >>> >> >> >> >> >> >> >> show
>> >>> >> >> >> >> >> >> >> >>>> my
>> >>> >> >> >> >> >> >> >> >>>> > progress and also my conclusions and if it
>> >>> turns
>> >>> >> out
>> >>> >> >> >> >> >> >> >> >>>> > that
>> >>> >> >> >> >> it
>> >>> >> >> >> >> >> was
>> >>> >> >> >> >> >> >> a
>> >>> >> >> >> >> >> >> >> bad
>> >>> >> >> >> >> >> >> >> >>>> idea
>> >>> >> >> >> >> >> >> >> >>>> > then that's the situation at least I'll
>> end up
>> >>> >> with
>> >>> >> >> >> >> >> >> >> >>>> > more
>> >>> >> >> >> >> >> >> knowledge
>> >>> >> >> >> >> >> >> >> >>>> about
>> >>> >> >> >> >> >> >> >> >>>> > Stanbol in the end :).
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
>> >>> >> >> >> >> >> >> >> >>>> > <rh...@apache.org>:
>> >>> >> >> >> >> >> >> >> >>>> >
>> >>> >> >> >> >> >> >> >> >>>> >> Hi Cristian,
>> >>> >> >> >> >> >> >> >> >>>> >>
>> >>> >> >> >> >> >> >> >> >>>> >> The approach sounds nice. I don't want
>> to be
>> >>> the
>> >>> >> >> >> >> >> >> >> >>>> >> devil's
>> >>> >> >> >> >> >> >> advocate
>> >>> >> >> >> >> >> >> >> but
>> >>> >> >> >> >> >> >> >> >>>> I'm
>> >>> >> >> >> >> >> >> >> >>>> >> just not sure about the recall using the
>> >>> dbpedia
>> >>> >> >> >> >> categories
>> >>> >> >> >> >> >> >> >> feature.
>> >>> >> >> >> >> >> >> >> >>>> For
>> >>> >> >> >> >> >> >> >> >>>> >> example, your sentence could be also
>> >>> "Microsoft
>> >>> >> >> posted
>> >>> >> >> >> >> >> >> >> >>>> >> its
>> >>> >> >> >> >> >> 2013
>> >>> >> >> >> >> >> >> >> >>>> earnings.
>> >>> >> >> >> >> >> >> >> >>>> >> The Redmond's company made a huge
>> profit".
>> >>> So,
>> >>> >> maybe
>> >>> >> >> >> >> >> including
>> >>> >> >> >> >> >> >> more
>> >>> >> >> >> >> >> >> >> >>>> >> contextual information from dbpedia could
>> >>> >> increase
>> >>> >> >> the
>> >>> >> >> >> >> recall
>> >>> >> >> >> >> >> >> but
>> >>> >> >> >> >> >> >> >> of
>> >>> >> >> >> >> >> >> >> >>>> course
>> >>> >> >> >> >> >> >> >> >>>> >> will reduce the precision.
>> >>> >> >> >> >> >> >> >> >>>> >>
>> >>> >> >> >> >> >> >> >> >>>> >> Cheers,
>> >>> >> >> >> >> >> >> >> >>>> >> Rafa
>> >>> >> >> >> >> >> >> >> >>>> >>
>> >>> >> >> >> >> >> >> >> >>>> >> El 04/02/14 09:50, Cristian Petroaca
>> >>> escribió:
>> >>> >> >> >> >> >> >> >> >>>> >>
>> >>> >> >> >> >> >> >> >> >>>> >>  Back with a more detailed description
>> of the
>> >>> >> steps
>> >>> >> >> >> >> >> >> >> >>>> >> for
>> >>> >> >> >> >> >> making
>> >>> >> >> >> >> >> >> this
>> >>> >> >> >> >> >> >> >> >>>> kind of
>> >>> >> >> >> >> >> >> >> >>>> >>> coreference work.
>> >>> >> >> >> >> >> >> >> >>>> >>>
>> >>> >> >> >> >> >> >> >> >>>> >>> I will be using references to the
>> following
>> >>> >> text in
>> >>> >> >> >> >> >> >> >> >>>> >>> the
>> >>> >> >> >> >> >> steps
>> >>> >> >> >> >> >> >> >> below
>> >>> >> >> >> >> >> >> >> >>>> in
>> >>> >> >> >> >> >> >> >> >>>> >>> order to make things clearer :
>> "Microsoft
>> >>> posted
>> >>> >> >> its
>> >>> >> >> >> >> >> >> >> >>>> >>> 2013
>> >>> >> >> >> >> >> >> >> earnings.
>> >>> >> >> >> >> >> >> >> >>>> The
>> >>> >> >> >> >> >> >> >> >>>> >>> software company made a huge profit."
>> >>> >> >> >> >> >> >> >> >>>> >>>
>> >>> >> >> >> >> >> >> >> >>>> >>> 1. For every noun phrase in the text
>> which
>> >>> has :
>> >>> >> >> >> >> >> >> >> >>>> >>>      a. a determinate pos which implies
>> >>> >> reference
>> >>> >> >> to
>> >>> >> >> >> >> >> >> >> >>>> >>> an
>> >>> >> >> >> >> >> entity
>> >>> >> >> >> >> >> >> >> local
>> >>> >> >> >> >> >> >> >> >>>> to
>> >>> >> >> >> >> >> >> >> >>>> >>> the
>> >>> >> >> >> >> >> >> >> >>>> >>> text, such as "the, this, these") but
>> not
>> >>> >> "another,
>> >>> >> >> >> >> every",
>> >>> >> >> >> >> >> etc
>> >>> >> >> >> >> >> >> >> which
>> >>> >> >> >> >> >> >> >> >>>> >>> implies a reference to an entity
>> outside of
>> >>> the
>> >>> >> >> text.
>> >>> >> >> >> >> >> >> >> >>>> >>>      b. having at least another noun
>> aside
>> >>> from
>> >>> >> the
>> >>> >> >> >> >> >> >> >> >>>> >>> main
>> >>> >> >> >> >> >> >> required
>> >>> >> >> >> >> >> >> >> >>>> noun
>> >>> >> >> >> >> >> >> >> >>>> >>> which
>> >>> >> >> >> >> >> >> >> >>>> >>> further describes it. For example I
>> will not
>> >>> >> count
>> >>> >> >> >> >> >> >> >> >>>> >>> "The
>> >>> >> >> >> >> >> >> company"
>> >>> >> >> >> >> >> >> >> as
>> >>> >> >> >> >> >> >> >> >>>> being
>> >>> >> >> >> >> >> >> >> >>>> >>> a
>> >>> >> >> >> >> >> >> >> >>>> >>> legitimate candidate since this could
>> >>> create a
>> >>> >> lot
>> >>> >> >> of
>> >>> >> >> >> >> false
>> >>> >> >> >> >> >> >> >> >>>> positives by
>> >>> >> >> >> >> >> >> >> >>>> >>> considering the double meaning of some
>> words
>> >>> >> such
>> >>> >> >> as
>> >>> >> >> >> >> >> >> >> >>>> >>> "in
>> >>> >> >> >> >> the
>> >>> >> >> >> >> >> >> >> company
>> >>> >> >> >> >> >> >> >> >>>> of
>> >>> >> >> >> >> >> >> >> >>>> >>> good people".
>> >>> >> >> >> >> >> >> >> >>>> >>> "The software company" is a good
>> candidate
>> >>> >> since we
>> >>> >> >> >> >> >> >> >> >>>> >>> also
>> >>> >> >> >> >> >> have
>> >>> >> >> >> >> >> >> >> >>>> "software".
>> >>> >> >> >> >> >> >> >> >>>> >>>
>> >>> >> >> >> >> >> >> >> >>>> >>> 2. match the nouns in the noun phrase
>> to the
>> >>> >> >> contents
>> >>> >> >> >> >> >> >> >> >>>> >>> of
>> >>> >> >> >> >> the
>> >>> >> >> >> >> >> >> >> dbpedia
>> >>> >> >> >> >> >> >> >> >>>> >>> categories of each named entity found
>> prior
>> >>> to
>> >>> >> the
>> >>> >> >> >> >> location
>> >>> >> >> >> >> >> of
>> >>> >> >> >> >> >> >> the
>> >>> >> >> >> >> >> >> >> >>>> noun
>> >>> >> >> >> >> >> >> >> >>>> >>> phrase in the text.
>> >>> >> >> >> >> >> >> >> >>>> >>> The dbpedia categories are in the
>> following
>> >>> >> format
>> >>> >> >> >> >> >> >> >> >>>> >>> (for
>> >>> >> >> >> >> >> >> Microsoft
>> >>> >> >> >> >> >> >> >> for
>> >>> >> >> >> >> >> >> >> >>>> >>> example) : "Software companies of the
>> United
>> >>> >> >> States".
>> >>> >> >> >> >> >> >> >> >>>> >>>   So we try to match "software company"
>> with
>> >>> >> that.
>> >>> >> >> >> >> >> >> >> >>>> >>> First, as you can see, the main noun in
>> the
>> >>> >> dbpedia
>> >>> >> >> >> >> category
>> >>> >> >> >> >> >> >> has a
>> >>> >> >> >> >> >> >> >> >>>> plural
>> >>> >> >> >> >> >> >> >> >>>> >>> form and it's the same for all
>> categories
>> >>> which
>> >>> >> I
>> >>> >> >> >> >> >> >> >> >>>> >>> saw. I
>> >>> >> >> >> >> >> don't
>> >>> >> >> >> >> >> >> >> know
>> >>> >> >> >> >> >> >> >> >>>> if
>> >>> >> >> >> >> >> >> >> >>>> >>> there's an easier way to do this but I
>> >>> thought
>> >>> >> of
>> >>> >> >> >> >> applying a
>> >>> >> >> >> >> >> >> >> >>>> lemmatizer on
>> >>> >> >> >> >> >> >> >> >>>> >>> the category and the noun phrase in
>> order
>> >>> for
>> >>> >> them
>> >>> >> >> to
>> >>> >> >> >> >> have a
>> >>> >> >> >> >> >> >> >> common

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Cristian Petroaca <cr...@gmail.com>.
Examples :

1. Group membership :
    a. Spatial membership :

        "Microsoft anounced its 2013 earnings. <coref>The Richmond-based
company</coref> made huge profits."

    b. Organisational membership :

       "Mick Jagger started a new solo album. <coref>The Rolling Stones
singer</coref> did not say what the theme will be."

2. Functional membership :

   "Allianz announced its 2013 earnings. <coref>The financial services
company</coref> made a huge profit."

3. If no matches were found for the current NER with the rules above, but the
yago:class which matched has more than 2 nouns, then we also consider this a
co-reference, though with a lower confidence.

   "Boris Becker will take part in a demonstrative tennis match. <coref>The
former tennis player</coref> will play again after 10 years."
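
Roughly, the matching behind these examples could look like the sketch below
(plain Java, just an illustration - the method and its inputs are hypothetical
helpers, not the Entityhub API). The idea: no overlap with the yago:class label
means no candidate at all, and every additional noun of the phrase that shows up
among the entity's dbpedia property values raises the confidence.

    import java.util.HashSet;
    import java.util.Set;

    public class CorefScorer {

        // All inputs are lemmatized, lower-cased nouns.
        public static double score(Set<String> phraseNouns,            // nouns of the noun phrase
                                    Set<String> yagoClassNouns,         // nouns of the matched yago:class label
                                    Set<String> entityPropertyValues) { // values of the dbpedia properties checked by the rules
            Set<String> common = new HashSet<>(phraseNouns);
            common.retainAll(yagoClassNouns);
            if (common.isEmpty()) {
                return 0;                     // no yago:class match -> not a candidate
            }
            double confidence = common.size();
            for (String noun : phraseNouns) {
                if (!yagoClassNouns.contains(noun) && entityPropertyValues.contains(noun)) {
                    confidence += 1;          // extra noun explained by a dbpedia property
                }
            }
            return confidence;
        }
    }

For "The software company" against Microsoft this would give "company" from the
yago:class plus "software" from dbpprop:industry, so two matched nouns.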


2014-03-28 12:22 GMT+02:00 Rupert Westenthaler <
rupert.westenthaler@gmail.com>:

> Hi Cristian, all
>
> Looks good to me, but I am not sure if I got everything. If you could
> provide example texts where those rules apply it would make it much
> easier to understand.
>
> Instead of using dbpedia properties you should define your own domain
> model (ontology). You can then align the dbpedia properties to your
> model. This will also allow the approach to be applied to knowledge
> bases other than dbpedia.
>
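> For example (just a sketch with made-up names, nothing that exists in
> Stanbol yet), the alignment could be a simple lookup from your own
> relations to the dbpedia properties, so that supporting another
> knowledge base only means providing another mapping:
>
>     import java.util.List;
>     import java.util.Map;
>
>     public class CorefOntologyMapping {
>
>         // abstract relations the engine reasons about (made-up names)
>         enum Relation { SPATIAL_CONTEXT, ORG_MEMBERSHIP, FUNCTIONAL_ROLE }
>
>         // alignment of those relations to dbpedia properties, per entity type
>         static final Map<String, Map<Relation, List<String>>> DBPEDIA = Map.of(
>             "person", Map.of(
>                 Relation.SPATIAL_CONTEXT, List.of("dbpedia-owl:birthPlace", "dbpprop:nationality"),
>                 Relation.ORG_MEMBERSHIP,  List.of("dbpedia-owl:occupation", "dbpedia-owl:associatedActs")),
>             "organisation", Map.of(
>                 Relation.SPATIAL_CONTEXT, List.of("dbpedia-owl:foundationPlace", "dbpedia-owl:locationCity"),
>                 Relation.FUNCTIONAL_ROLE, List.of("dbpprop:industry", "dbpprop:service", "dbpedia-owl:genre")));
>     }
>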
> For people new to this thread: The above message adds to the
> suggestion first made by Cristian on 4th February. Also the following
> 4 messages (until 7th Feb) provide additional context.
>
> best
> Rupert
>
>
> On Fri, Mar 28, 2014 at 9:23 AM, Cristian Petroaca
> <cr...@gmail.com> wrote:
> > Hi guys,
> >
> > After Rupert's last suggestions related to this enhancement engine I
> > devised a more comprehensive algorithm for matching the noun phrases
> > against the NER properties. Please take a look and let me know what you
> > think. Thanks.
> >
> > The following rules will be applied to every noun phrase in order to find
> > co-references:
> >
> > 1. For each NER prior to the current noun phrase in the text match the
> > yago:class label to the contents of the noun phrase.
> >
> > For the NERs which have a yago:class which matches, apply:
> >
> > 2. Group membership rules :
> >
> >     a. spatial membership : the NER is part of a Location. If the noun
> > phrase contains a LOCATION or a demonym then check any location
> properties
> > of the matching NER.
> >
> >     If matching NER is a :
> >     - person, match against :birthPlace, :region, :nationality
> >     - organisation, match against :foundationPlace, :locationCity,
> > :location, :hometown
> >     - place, match against :country, :subdivisionName, :location,
> >
> >     Ex: The Italian President, The Richmond-based company
> >
> >     b. organisational membership : the NER is part of an Organisation. If
> > the noun phrase contains an ORGANISATION then check the following
> > properties of the matching NER:
> >
> >     If matching NER is :
> >     - person, match against :occupation, :associatedActs
> >     - organisation ?
> >     - location ?
> >
> > Ex: The Microsoft executive, The Pink Floyd singer
> >
> > 3. Functional description rule: the noun phrase describes what the NER
> does
> > conceptually.
> > If there are no NERs in the noun phrase then match the following
> properties
> > of the matching NER to the contents of the noun phrase (aside from the
> > nouns which are part of the yago:class) :
> >
> >    If NER is a:
> >    - person ?
> >    - organisation : , match against :service, :industry, :genre
> >    - location ?
> >
> > Ex:  The software company.
> >
> > 4. If no matches were found for the current NER with rules 2 or 3 then if
> > the yago:class which matched has more than 2 nouns then we also consider
> > this a good co-reference but with a lower confidence maybe.
> >
> > Ex: The former tennis player, the theoretical physicist.
> >
> > 5. Based on the number of nouns which matched we create a confidence
> level.
> > The number of matched nouns cannot be lower than 2 and we must have a
> > yago:class match.
> >
> > For all NERs which got to this point, select the closest ones in the text
> > to the noun phrase which matched against the same properties (yago:class
> > and dbpedia) and mark them as co-references.
> >
> > Note: all noun phrases need to be lemmatized before all of this in case
> > there are any plurals.
> >
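> > Just to illustrate the final selection (a rough sketch, hypothetical
> > types, not Stanbol API): among all NERs that passed the rules above, the
> > one closest before the noun phrase wins.
> >
> >     import java.util.List;
> >
> >     public class ClosestMatchSelector {
> >
> >         // hypothetical holder for a scored NER candidate
> >         record Candidate(String entityUri, int offset, double confidence) {}
> >
> >         // candidates are assumed to have confidence > 0 and at least 2 matched nouns
> >         static Candidate selectClosest(List<Candidate> candidates, int nounPhraseOffset) {
> >             Candidate best = null;
> >             for (Candidate c : candidates) {
> >                 if (c.offset() >= nounPhraseOffset) {
> >                     continue;               // only NERs prior to the noun phrase count
> >                 }
> >                 if (best == null || c.offset() > best.offset()) {
> >                     best = c;               // closer to the noun phrase wins
> >                 }
> >             }
> >             return best;
> >         }
> >     }
> >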
> >
> > 2014-03-25 20:50 GMT+02:00 Cristian Petroaca <
> cristian.petroaca@gmail.com>:
> >
> >> That worked. Thanks.
> >>
> >> So, there are no exceptions during the startup of the launcher.
> >> The component tab in the felix console shows 6 WeightedChains the first
> >> time, including the default one but after my changes and a restart there
> >> are only 5 - the default one is missing altogether.
> >>
> >>
> >> 2014-03-24 20:18 GMT+02:00 Rupert Westenthaler <
> >> rupert.westenthaler@gmail.com>:
> >>
> >> Hi Cristian,
> >>>
> >>> I do see the same problem since last Friday. The solution as mentions
> >>> by [1] works for me.
> >>>
> >>>     mvn -Djsse.enableSNIExtension=false {goals}
> >>>
> >>> No idea why https connections to github currently cause this. I
> >>> could not find anything related via Google. So I suggest to use the
> >>> system property for now. If this persists for longer we can adapt the
> >>> build files accordingly.
> >>>
> >>> best
> >>> Rupert
> >>>
> >>>
> >>>
> >>>
> >>> [1]
> >>>
> http://stackoverflow.com/questions/7615645/ssl-handshake-alert-unrecognized-name-error-since-upgrade-to-java-1-7-0
> >>>
> >>> On Mon, Mar 24, 2014 at 7:01 PM, Cristian Petroaca
> >>> <cr...@gmail.com> wrote:
> >>> > I did a clean on the whole project and now I wanted to do another
> "mvn
> >>> > clean install" but I am getting this :
> >>> >
> >>> > "[INFO]
> >>> >
> ------------------------------------------------------------------------
> >>> > [ERROR] Failed to execute goal
> >>> > org.apache.maven.plugins:maven-antrun-plugin:1.6:
> >>> > run (download) on project org.apache.stanbol.data.opennlp.lang.es:
> An
> >>> Ant
> >>> > BuildE
> >>> > xception has occured: The following error occurred while executing
> this
> >>> > line:
> >>> > [ERROR]
> >>> >
> C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\download_models.xml:3
> >>> > 3: Failed to copy
> >>> > https://github.com/utcompling/OpenNLP-Models/raw/58ef0c6003140
> >>> > 3e66e47ae35edaf58d3478b67af/models/es/opennlp-es-maxent-pos-es.bin to
> >>> > C:\Data\Pr
> >>> >
> >>>
> ojects\Stanbol\main\data\opennlp\lang\es\downloads\resources\org\apache\stanbol\
> >>> > data\opennlp\es-pos-maxent.bin due to
> javax.net.ssl.SSLProtocolException
> >>> > handshake alert : unrecognized_name"
> >>> >
> >>> >
> >>> >
> >>> > 2014-03-20 11:25 GMT+02:00 Rupert Westenthaler <
> >>> > rupert.westenthaler@gmail.com>:
> >>> >
> >>> >> Hi Cristian,
> >>> >>
> >>> >> On Thu, Mar 20, 2014 at 10:00 AM, Cristian Petroaca
> >>> >> <cr...@gmail.com> wrote:
> >>> >> >
> >>> >>
> >>>
> stanbol.enhancer.chain.weighted.chain=["tika;optional","langdetect","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-ner","dbpediaLinking","entityhubExtraction","dbpedia-dereference","pos-chunker"]
> >>> >> > service.ranking=I"-2147483648"
> >>> >> > stanbol.enhancer.chain.name="default"
> >>> >>
> >>> >> Does look fine to me. Do you see any exception during the startup of
> >>> >> the launcher. Can you check the status of this component in the
> >>> >> component tab of the felix web console [1] (search for
> >>> >> "org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain"). If
> >>> >> you have multiple you can find the correct one by comparing the
> >>> >> "Properties" with those in the configuration file.
> >>> >>
> >>> >> I guess that the according service is in the 'unsatisfied' as you do
> >>> >> not see it in the web interface. But if this is the case you should
> >>> >> also see the according exception in the log. You can also manually
> >>> >> stop/start the component. In this case the exception should be
> >>> >> re-thrown and you do not need to search the log for it.
> >>> >>
> >>> >> best
> >>> >> Rupert
> >>> >>
> >>> >>
> >>> >> [1] http://localhost:8080/system/console/components
> >>> >>
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> > 2014-03-20 7:39 GMT+02:00 Rupert Westenthaler <
> >>> >> rupert.westenthaler@gmail.com
> >>> >> >>:
> >>> >> >
> >>> >> >> Hi Cristian,
> >>> >> >>
> >>> >> >> you can not send attachments to the list. Please copy the
> contents
> >>> >> >> directly to the mail
> >>> >> >>
> >>> >> >> thx
> >>> >> >> Rupert
> >>> >> >>
> >>> >> >> On Wed, Mar 19, 2014 at 9:20 PM, Cristian Petroaca
> >>> >> >> <cr...@gmail.com> wrote:
> >>> >> >> > The config attached.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > 2014-03-19 9:09 GMT+02:00 Rupert Westenthaler
> >>> >> >> > <ru...@gmail.com>:
> >>> >> >> >
> >>> >> >> >> Hi Cristian,
> >>> >> >> >>
> >>> >> >> >> can you provide the contents of the chain after your
> >>> modifications?
> >>> >> >> >> Would be interesting to test why the chain is no longer active
> >>> after
> >>> >> >> >> the restart.
> >>> >> >> >>
> >>> >> >> >> You can find the config file in the 'stanbol/fileinstall'
> folder.
> >>> >> >> >>
> >>> >> >> >> best
> >>> >> >> >> Rupert
> >>> >> >> >>
> >>> >> >> >> On Tue, Mar 18, 2014 at 8:24 PM, Cristian Petroaca
> >>> >> >> >> <cr...@gmail.com> wrote:
> >>> >> >> >> > Related to the default chain selection rules : before
> restart I
> >>> >> had a
> >>> >> >> >> > chain
> >>> >> >> >> > with the name 'default' as in I could access it via
> >>> >> >> >> > enhancer/chain/default.
> >>> >> >> >> > Then I just added another engine to the 'default' chain. I
> >>> assumed
> >>> >> >> that
> >>> >> >> >> > after the restart the chain with the 'default' name would be
> >>> >> >> persisted.
> >>> >> >> >> > So
> >>> >> >> >> > the first rule should have been applied after the restart as
> >>> well.
> >>> >> But
> >>> >> >> >> > instead I cannot reach it via enhancer/chain/default anymore
> >>> so its
> >>> >> >> >> > gone.
> >>> >> >> >> > Anyway, this is not a big deal, it's not blocking me in any
> >>> way, I
> >>> >> >> just
> >>> >> >> >> > wanted to understand where the problem is.
> >>> >> >> >> >
> >>> >> >> >> >
> >>> >> >> >> > 2014-03-18 7:15 GMT+02:00 Rupert Westenthaler
> >>> >> >> >> > <rupert.westenthaler@gmail.com
> >>> >> >> >> >>:
> >>> >> >> >> >
> >>> >> >> >> >> Hi Cristian
> >>> >> >> >> >>
> >>> >> >> >> >> On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca
> >>> >> >> >> >> <cr...@gmail.com> wrote:
> >>> >> >> >> >> > 1. Updated to the latest code and it's gone. Cool
> >>> >> >> >> >> >
> >>> >> >> >> >> > 2. I start the stable launcher -> create a new instance
> of
> >>> the
> >>> >> >> >> >> > PosChunkerEngine -> add it to the default chain. At this
> >>> point
> >>> >> >> >> >> > everything
> >>> >> >> >> >> > looks good and works ok.
> >>> >> >> >> >> > After I restart the server the default chain is gone and
> >>> >> instead I
> >>> >> >> >> >> > see
> >>> >> >> >> >> this
> >>> >> >> >> >> > in the enhancement chains page : all-active (default, id:
> >>> 149,
> >>> >> >> >> >> > ranking:
> >>> >> >> >> >> 0,
> >>> >> >> >> >> > impl: AllActiveEnginesChain ). all-active did not contain
> >>> the
> >>> >> >> >> >> > 'default'
> >>> >> >> >> >> > word before the restart.
> >>> >> >> >> >> >
> >>> >> >> >> >>
> >>> >> >> >> >> Please note the default chain selection rules as described
> at
> >>> [1].
> >>> >> >> You
> >>> >> >> >> >> can also access chains chains under
> >>> '/enhancer/chain/{chain-name}'
> >>> >> >> >> >>
> >>> >> >> >> >> best
> >>> >> >> >> >> Rupert
> >>> >> >> >> >>
> >>> >> >> >> >> [1]
> >>> >> >> >> >>
> >>> >> >> >> >>
> >>> >> >>
> >>> >>
> >>>
> http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain
> >>> >> >> >> >>
> >>> >> >> >> >> > It looks like the config files are exactly what I need.
> >>> Thanks.
> >>> >> >> >> >> >
> >>> >> >> >> >> >
> >>> >> >> >> >> > 2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <
> >>> >> >> >> >> rupert.westenthaler@gmail.com
> >>> >> >> >> >> >>:
> >>> >> >> >> >> >
> >>> >> >> >> >> >> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
> >>> >> >> >> >> >> <cr...@gmail.com> wrote:
> >>> >> >> >> >> >> > Thanks Rupert.
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >> > A couple more questions/issues :
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >> > 1. Whenever I start the stanbol server I'm seeing this
> >>> in the
> >>> >> >> >> >> >> > console
> >>> >> >> >> >> >> > output :
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >>
> >>> >> >> >> >> >> This should be fixed with STANBOL-1278 [1] [2]
> >>> >> >> >> >> >>
> >>> >> >> >> >> >> > 2. Whenever I restart the server the Weighted Chains
> get
> >>> >> messed
> >>> >> >> >> >> >> > up. I
> >>> >> >> >> >> >> > usually use the 'default' chain and add my engine to
> it
> >>> so
> >>> >> there
> >>> >> >> >> >> >> > are
> >>> >> >> >> >> 11
> >>> >> >> >> >> >> > engines in it. After the restart this chain now
> contains
> >>> >> around
> >>> >> >> 23
> >>> >> >> >> >> >> engines
> >>> >> >> >> >> >> > in total.
> >>> >> >> >> >> >>
> >>> >> >> >> >> >> I was not able to replicate this. What I tried was
> >>> >> >> >> >> >>
> >>> >> >> >> >> >> (1) start up the stable launcher
> >>> >> >> >> >> >> (2) add an additional engine to the default chain
> >>> >> >> >> >> >> (3) restart the launcher
> >>> >> >> >> >> >>
> >>> >> >> >> >> >> The default chain was not changed after (2) and (3). So
> I
> >>> would
> >>> >> >> need
> >>> >> >> >> >> >> further information for knowing why this is happening.
> >>> >> >> >> >> >>
> >>> >> >> >> >> >> Generally it is better to create you own chain instance
> as
> >>> >> >> modifying
> >>> >> >> >> >> >> one that is provided by the default configuration. I
> would
> >>> also
> >>> >> >> >> >> >> recommend that you keep your test configuration in text
> >>> files
> >>> >> and
> >>> >> >> to
> >>> >> >> >> >> >> copy those to the 'stanbol/fileinstall' folder. Doing so
> >>> >> prevent
> >>> >> >> you
> >>> >> >> >> >> >> from manually entering the configuration after a
> software
> >>> >> update.
> >>> >> >> >> >> >> The
> >>> >> >> >> >> >> production-mode section [3] provides information on how
> to
> >>> do
> >>> >> >> that.
> >>> >> >> >> >> >>
> >>> >> >> >> >> >> best
> >>> >> >> >> >> >> Rupert
> >>> >> >> >> >> >>
> >>> >> >> >> >> >> [1] https://issues.apache.org/jira/browse/STANBOL-1278
> >>> >> >> >> >> >> [2] http://svn.apache.org/r1576623
> >>> >> >> >> >> >> [3]
> http://stanbol.apache.org/docs/trunk/production-mode
> >>> >> >> >> >> >>
> >>> >> >> >> >> >> > ERROR: Bundle
> >>> org.apache.stanbol.enhancer.engine.topic.web
> >>> >> >> [153]:
> >>> >> >> >> >> Error
> >>> >> >> >> >> >> > starting
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >>
> >>> >> >> >> >>
> >>> >> >> >> >>
> >>> >> >>
> >>> >>
> >>>
> slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\star
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >> >
> >>> >> >>
> >>> tup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar
> >>> >> >> >> >> >> > (org.osgi
> >>> >> >> >> >> >> > .framework.BundleException: Unresolved constraint in
> >>> bundle
> >>> >> >> >> >> >> > org.apache.stanbol.e
> >>> >> >> >> >> >> > nhancer.engine.topic.web [153]: Unable to resolve
> 153.0:
> >>> >> missing
> >>> >> >> >> >> >> > requirement [15
> >>> >> >> >> >> >> > 3.0] package; (&(package=javax.ws.rs
> >>> >> >> >> >> >> )(version>=0.0.0)(!(version>=2.0.0))))
> >>> >> >> >> >> >> > org.osgi.framework.BundleException: Unresolved
> >>> constraint in
> >>> >> >> >> >> >> > bundle
> >>> >> >> >> >> >> > org.apache.s
> >>> >> >> >> >> >> > tanbol.enhancer.engine.topic.web [153]: Unable to
> resolve
> >>> >> 153.0:
> >>> >> >> >> >> missing
> >>> >> >> >> >> >> > require
> >>> >> >> >> >> >> > ment [153.0] package; (&(package=javax.ws.rs
> >>> >> >> >> >> >> > )(version>=0.0.0)(!(version>=2.0.0))
> >>> >> >> >> >> >> > )
> >>> >> >> >> >> >> >         at
> >>> >> >> >> >> >>
> >>> org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
> >>> >> >> >> >> >> >         at
> >>> >> >> >> >>
> org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
> >>> >> >> >> >> >> >         at
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >> >
> >>> >> >>
> >>> org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >> >         at
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >> >
> >>> >> >>
> >>> org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264
> >>> >> >> >> >> >> > )
> >>> >> >> >> >> >> >         at java.lang.Thread.run(Unknown Source)
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >> > Despite of this the server starts fine and I can use
> the
> >>> >> >> enhancer
> >>> >> >> >> >> fine.
> >>> >> >> >> >> >> Do
> >>> >> >> >> >> >> > you guys see this as well?
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >> > 2. Whenever I restart the server the Weighted Chains
> get
> >>> >> messed
> >>> >> >> >> >> >> > up. I
> >>> >> >> >> >> >> > usually use the 'default' chain and add my engine to
> it
> >>> so
> >>> >> there
> >>> >> >> >> >> >> > are
> >>> >> >> >> >> 11
> >>> >> >> >> >> >> > engines in it. After the restart this chain now
> contains
> >>> >> around
> >>> >> >> 23
> >>> >> >> >> >> >> engines
> >>> >> >> >> >> >> > in total.
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <
> >>> >> >> >> >> >> rupert.westenthaler@gmail.com
> >>> >> >> >> >> >> >>:
> >>> >> >> >> >> >> >
> >>> >> >> >> >> >> >> Hi Cristian,
> >>> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> NER Annotations are typically available as both
> >>> >> >> >> >> >> >> NlpAnnotations.NER_ANNOTATION and
>  fise:TextAnnotation
> >>> [1]
> >>> >> in
> >>> >> >> the
> >>> >> >> >> >> >> >> enhancement metadata. As you are already accessing
> the
> >>> >> >> >> >> >> >> AnayzedText I
> >>> >> >> >> >> >> >> would prefer using the
>  NlpAnnotations.NER_ANNOTATION.
> >>> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> best
> >>> >> >> >> >> >> >> Rupert
> >>> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> [1]
> >>> >> >> >> >> >> >>
> >>> >> >> >> >> >>
> >>> >> >> >> >>
> >>> >> >> >> >>
> >>> >> >>
> >>> >>
> >>>
> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
> >>> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
> >>> >> >> >> >> >> >> <cr...@gmail.com> wrote:
> >>> >> >> >> >> >> >> > Thanks.
> >>> >> >> >> >> >> >> > I assume I should get the Named entities using the
> >>> same
> >>> >> but
> >>> >> >> >> >> >> >> > with
> >>> >> >> >> >> >> >> > NlpAnnotations.NER_ANNOTATION?
> >>> >> >> >> >> >> >> >
> >>> >> >> >> >> >> >> >
> >>> >> >> >> >> >> >> >
> >>> >> >> >> >> >> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
> >>> >> >> >> >> >> >> > rupert.westenthaler@gmail.com>:
> >>> >> >> >> >> >> >> >
> >>> >> >> >> >> >> >> >> Hallo Cristian,
> >>> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> >> NounPhrases are not added to the RDF enhancement
> >>> results.
> >>> >> >> You
> >>> >> >> >> >> need to
> >>> >> >> >> >> >> >> >> use the AnalyzedText ContentPart [1]
> >>> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> >> here is some demo code you can use in the
> >>> >> computeEnhancement
> >>> >> >> >> >> method
> >>> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> >>         AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
> >>> >> >> >> >> >> >> >>         Iterator<? extends Section> sections = at.getSentences();
> >>> >> >> >> >> >> >> >>         if(!sections.hasNext()){ //process as single sentence
> >>> >> >> >> >> >> >> >>             sections = Collections.singleton(at).iterator();
> >>> >> >> >> >> >> >> >>         }
> >>> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> >>         while(sections.hasNext()){
> >>> >> >> >> >> >> >> >>             Section section = sections.next();
> >>> >> >> >> >> >> >> >>             Iterator<Span> chunks = section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
> >>> >> >> >> >> >> >> >>             while(chunks.hasNext()){
> >>> >> >> >> >> >> >> >>                 Span chunk = chunks.next();
> >>> >> >> >> >> >> >> >>                 Value<PhraseTag> phrase = chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
> >>> >> >> >> >> >> >> >>                 if(phrase.value().getCategory() == LexicalCategory.Noun){
> >>> >> >> >> >> >> >> >>                     log.info(" - NounPhrase [{},{}] {}", new Object[]{
> >>> >> >> >> >> >> >> >>                             chunk.getStart(),chunk.getEnd(),chunk.getSpan()});
> >>> >> >> >> >> >> >> >>                 }
> >>> >> >> >> >> >> >> >>             }
> >>> >> >> >> >> >> >> >>         }
> >>> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> >> hope this helps
> >>> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> >> best
> >>> >> >> >> >> >> >> >> Rupert
> >>> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> >> [1]
> >>> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >>
> >>> >> >> >> >> >>
> >>> >> >> >> >>
> >>> >> >> >> >>
> >>> >> >>
> >>> >>
> >>>
> http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
> >>> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
> >>> >> >> >> >> >> >> >> <cr...@gmail.com> wrote:
> >>> >> >> >> >> >> >> >> > I started to implement the engine and I'm having
> >>> >> problems
> >>> >> >> >> >> >> >> >> > with
> >>> >> >> >> >> >> getting
> >>> >> >> >> >> >> >> >> > results for noun phrases. I modified the
> "default"
> >>> >> >> weighted
> >>> >> >> >> >> chain
> >>> >> >> >> >> >> to
> >>> >> >> >> >> >> >> also
> >>> >> >> >> >> >> >> >> > include the PosChunkerEngine and ran a sample
> text
> >>> :
> >>> >> >> "Angela
> >>> >> >> >> >> Merkel
> >>> >> >> >> >> >> >> >> visted
> >>> >> >> >> >> >> >> >> > China. The german chancellor met with various
> >>> people".
> >>> >> I
> >>> >> >> >> >> expected
> >>> >> >> >> >> >> that
> >>> >> >> >> >> >> >> >> the
> >>> >> >> >> >> >> >> >> > RDF XML output would contain some info about the
> >>> noun
> >>> >> >> >> >> >> >> >> > phrases
> >>> >> >> >> >> but I
> >>> >> >> >> >> >> >> >> cannot
> >>> >> >> >> >> >> >> >> > see any.
> >>> >> >> >> >> >> >> >> > Could you point me to the correct way to
> generate
> >>> the
> >>> >> noun
> >>> >> >> >> >> phrases?
> >>> >> >> >> >> >> >> >> >
> >>> >> >> >> >> >> >> >> > Thanks,
> >>> >> >> >> >> >> >> >> > Cristian
> >>> >> >> >> >> >> >> >> >
> >>> >> >> >> >> >> >> >> >
> >>> >> >> >> >> >> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <
> >>> >> >> >> >> >> >> >> cristian.petroaca@gmail.com>:
> >>> >> >> >> >> >> >> >> >
> >>> >> >> >> >> >> >> >> >> Opened
> >>> >> >> https://issues.apache.org/jira/browse/STANBOL-1279
> >>> >> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <
> >>> >> >> >> >> >> >> >> cristian.petroaca@gmail.com>
> >>> >> >> >> >> >> >> >> >> :
> >>> >> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >> >> >> Hi Rupert,
> >>> >> >> >> >> >> >> >> >>>
> >>> >> >> >> >> >> >> >> >>> The "spatial" dimension is a good idea. I'll
> also
> >>> >> take a
> >>> >> >> >> >> >> >> >> >>> look
> >>> >> >> >> >> at
> >>> >> >> >> >> >> >> Yago.
> >>> >> >> >> >> >> >> >> >>>
> >>> >> >> >> >> >> >> >> >>> I will create a Jira with what we talked about
> >>> here.
> >>> >> It
> >>> >> >> >> >> >> >> >> >>> will
> >>> >> >> >> >> >> >> probably
> >>> >> >> >> >> >> >> >> >>> have just a draft-like description for now and
> >>> will
> >>> >> be
> >>> >> >> >> >> >> >> >> >>> updated
> >>> >> >> >> >> >> as I
> >>> >> >> >> >> >> >> go
> >>> >> >> >> >> >> >> >> >>> along.
> >>> >> >> >> >> >> >> >> >>>
> >>> >> >> >> >> >> >> >> >>> Thanks,
> >>> >> >> >> >> >> >> >> >>> Cristian
> >>> >> >> >> >> >> >> >> >>>
> >>> >> >> >> >> >> >> >> >>>
> >>> >> >> >> >> >> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert
> Westenthaler <
> >>> >> >> >> >> >> >> >> >>> rupert.westenthaler@gmail.com>:
> >>> >> >> >> >> >> >> >> >>>
> >>> >> >> >> >> >> >> >> >>> Hi Cristian,
> >>> >> >> >> >> >> >> >> >>>>
> >>> >> >> >> >> >> >> >> >>>> definitely an interesting approach. You
> should
> >>> have
> >>> >> a
> >>> >> >> >> >> >> >> >> >>>> look at
> >>> >> >> >> >> >> Yago2
> >>> >> >> >> >> >> >> >> >>>> [1]. As far as I can remember the Yago
> taxonomy
> >>> is
> >>> >> much
> >>> >> >> >> >> better
> >>> >> >> >> >> >> >> >> >>>> structured as the one used by dbpedia.
> Mapping
> >>> >> >> >> >> >> >> >> >>>> suggestions of
> >>> >> >> >> >> >> >> dbpedia
> >>> >> >> >> >> >> >> >> >>>> to concepts in Yago2 is easy as both dbpedia
> and
> >>> >> yago2
> >>> >> >> do
> >>> >> >> >> >> >> provide
> >>> >> >> >> >> >> >> >> >>>> mappings [2] and [3]
> >>> >> >> >> >> >> >> >> >>>>
> >>> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
> >>> >> >> >> >> >> >> >> >>>> > <rh...@apache.org>:
> >>> >> >> >> >> >> >> >> >>>> >>
> >>> >> >> >> >> >> >> >> >>>> >> "Microsoft posted its 2013 earnings. The
> >>> >> Redmond's
> >>> >> >> >> >> >> >> >> >>>> >> company
> >>> >> >> >> >> >> made
> >>> >> >> >> >> >> >> a
> >>> >> >> >> >> >> >> >> >>>> >> huge profit".
> >>> >> >> >> >> >> >> >> >>>>
> >>> >> >> >> >> >> >> >> >>>> Thats actually a very good example. Spatial
> >>> contexts
> >>> >> >> are
> >>> >> >> >> >> >> >> >> >>>> very
> >>> >> >> >> >> >> >> >> >>>> important as they tend to be often used for
> >>> >> >> referencing.
> >>> >> >> >> >> >> >> >> >>>> So I
> >>> >> >> >> >> >> would
> >>> >> >> >> >> >> >> >> >>>> suggest to specially treat the spatial
> context.
> >>> For
> >>> >> >> >> >> >> >> >> >>>> spatial
> >>> >> >> >> >> >> >> Entities
> >>> >> >> >> >> >> >> >> >>>> (like a City) this is easy, but even for
> other
> >>> >> (like a
> >>> >> >> >> >> Person,
> >>> >> >> >> >> >> >> >> >>>> Company) you could use relations to spatial
> >>> entities
> >>> >> >> >> >> >> >> >> >>>> define
> >>> >> >> >> >> >> their
> >>> >> >> >> >> >> >> >> >>>> spatial context. This context could than be
> >>> used to
> >>> >> >> >> >> >> >> >> >>>> correctly
> >>> >> >> >> >> >> link
> >>> >> >> >> >> >> >> >> >>>> "The Redmond's company" to "Microsoft".
> >>> >> >> >> >> >> >> >> >>>>
> >>> >> >> >> >> >> >> >> >>>> In addition I would suggest to use the
> "spatial"
> >>> >> >> context
> >>> >> >> >> >> >> >> >> >>>> of
> >>> >> >> >> >> each
> >>> >> >> >> >> >> >> >> >>>> entity (basically relation to entities that
> are
> >>> >> cities,
> >>> >> >> >> >> regions,
> >>> >> >> >> >> >> >> >> >>>> countries) as a separate dimension, because
> >>> those
> >>> >> are
> >>> >> >> >> >> >> >> >> >>>> very
> >>> >> >> >> >> often
> >>> >> >> >> >> >> >> used
> >>> >> >> >> >> >> >> >> >>>> for coreferences.
> >>> >> >> >> >> >> >> >> >>>>
> >>> >> >> >> >> >> >> >> >>>> [1]
> http://www.mpi-inf.mpg.de/yago-naga/yago/
> >>> >> >> >> >> >> >> >> >>>> [2]
> >>> >> >> >> >> >> >> >> >>>>
> >>> >> >> http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
> >>> >> >> >> >> >> >> >> >>>> [3]
> >>> >> >> >> >> >> >> >> >>>>
> >>> >> >> >> >> >> >> >>
> >>> >> >> >> >> >> >>
> >>> >> >> >> >> >>
> >>> >> >> >> >>
> >>> >> >> >> >>
> >>> >> >>
> >>> >>
> >>>
> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
> >>> >> >> >> >> >> >> >> >>>>
> >>> >> >> >> >> >> >> >> >>>>
> >>> >> >> >> >> >> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian
> >>> Petroaca
> >>> >> >> >> >> >> >> >> >>>> <cr...@gmail.com> wrote:
> >>> >> >> >> >> >> >> >> >>>> > There are several dbpedia categories for
> each
> >>> >> entity,
> >>> >> >> >> >> >> >> >> >>>> > in
> >>> >> >> >> >> this
> >>> >> >> >> >> >> >> case
> >>> >> >> >> >> >> >> >> for
> >>> >> >> >> >> >> >> >> >>>> > Microsoft we have :
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
> >>> >> >> >> >> >> >> >> >>>> > category:Microsoft
> >>> >> >> >> >> >> >> >> >>>> >
> >>> category:Software_companies_of_the_United_States
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> category:Software_companies_based_in_Washington_(state)
> >>> >> >> >> >> >> >> >> >>>> > category:Companies_established_in_1975
> >>> >> >> >> >> >> >> >> >>>> >
> >>> category:1975_establishments_in_the_United_States
> >>> >> >> >> >> >> >> >> >>>> >
> >>> category:Companies_based_in_Redmond,_Washington
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >>
> >>> >> >> >> >> >> >>
> >>> >> >>
> category:Multinational_companies_headquartered_in_the_United_States
> >>> >> >> >> >> >> >> >> >>>> > category:Cloud_computing_providers
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> category:Companies_in_the_Dow_Jones_Industrial_Average
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >> >> >>>> > So we also have "Companies based in
> >>> >> >> Redmont,Washington"
> >>> >> >> >> >> which
> >>> >> >> >> >> >> >> could
> >>> >> >> >> >> >> >> >> be
> >>> >> >> >> >> >> >> >> >>>> > matched.
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >> >> >>>> > There is still other contextual information
> >>> from
> >>> >> >> >> >> >> >> >> >>>> > dbpedia
> >>> >> >> >> >> which
> >>> >> >> >> >> >> >> can
> >>> >> >> >> >> >> >> >> be
> >>> >> >> >> >> >> >> >> >>>> used.
> >>> >> >> >> >> >> >> >> >>>> > For example for an Organization we could
> also
> >>> >> >> include :
> >>> >> >> >> >> >> >> >> >>>> > dbpprop:industry = Software
> >>> >> >> >> >> >> >> >> >>>> > dbpprop:service = Online Service Providers
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >> >> >>>> > and for a Person (that's for Barack Obama)
> :
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >> >> >>>> > dbpedia-owl:profession:
> >>> >> >> >> >> >> >> >> >>>> >
>  dbpedia:Author
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >> >> >>>> > dbpedia:Constitutional_law
> >>> >> >> >> >> >> >> >> >>>> >
>  dbpedia:Lawyer
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >> >> >>>> > dbpedia:Community_organizing
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >> >> >>>> > I'd like to continue investigating this as
> I
> >>> think
> >>> >> >> that
> >>> >> >> >> >> >> >> >> >>>> > it
> >>> >> >> >> >> may
> >>> >> >> >> >> >> >> have
> >>> >> >> >> >> >> >> >> >>>> some
> >>> >> >> >> >> >> >> >> >>>> > value in increasing the number of
> coreference
> >>> >> >> >> >> >> >> >> >>>> > resolutions
> >>> >> >> >> >> and
> >>> >> >> >> >> >> I'd
> >>> >> >> >> >> >> >> >> like
> >>> >> >> >> >> >> >> >> >>>> to
> >>> >> >> >> >> >> >> >> >>>> > concentrate more on precision rather than
> >>> recall
> >>> >> >> since
> >>> >> >> >> >> >> >> >> >>>> > we
> >>> >> >> >> >> >> already
> >>> >> >> >> >> >> >> >> have
> >>> >> >> >> >> >> >> >> >>>> a
> >>> >> >> >> >> >> >> >> >>>> > set of coreferences detected by the
> stanford
> >>> nlp
> >>> >> tool
> >>> >> >> >> >> >> >> >> >>>> > and
> >>> >> >> >> >> this
> >>> >> >> >> >> >> >> would
> >>> >> >> >> >> >> >> >> >>>> be as
> >>> >> >> >> >> >> >> >> >>>> > an addition to that (at least this is how I
> >>> would
> >>> >> >> like
> >>> >> >> >> >> >> >> >> >>>> > to
> >>> >> >> >> >> use
> >>> >> >> >> >> >> >> it).
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >> >> >>>> > Is it ok if I track this by opening a
> jira? I
> >>> >> could
> >>> >> >> >> >> >> >> >> >>>> > update
> >>> >> >> >> >> it
> >>> >> >> >> >> >> to
> >>> >> >> >> >> >> >> >> show
> >>> >> >> >> >> >> >> >> >>>> my
> >>> >> >> >> >> >> >> >> >>>> > progress and also my conclusions and if it
> >>> turns
> >>> >> out
> >>> >> >> >> >> >> >> >> >>>> > that
> >>> >> >> >> >> it
> >>> >> >> >> >> >> was
> >>> >> >> >> >> >> >> a
> >>> >> >> >> >> >> >> >> bad
> >>> >> >> >> >> >> >> >> >>>> idea
> >>> >> >> >> >> >> >> >> >>>> > then that's the situation at least I'll
> end up
> >>> >> with
> >>> >> >> >> >> >> >> >> >>>> > more
> >>> >> >> >> >> >> >> knowledge
> >>> >> >> >> >> >> >> >> >>>> about
> >>> >> >> >> >> >> >> >> >>>> > Stanbol in the end :).
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
> >>> >> >> >> >> >> >> >> >>>> > <rh...@apache.org>:
> >>> >> >> >> >> >> >> >> >>>> >
> >>> >> >> >> >> >> >> >> >>>> >> Hi Cristian,
> >>> >> >> >> >> >> >> >> >>>> >>
> >>> >> >> >> >> >> >> >> >>>> >> The approach sounds nice. I don't want to
> be
> >>> the
> >>> >> >> >> >> >> >> >> >>>> >> devil's
> >>> >> >> >> >> >> >> advocate
> >>> >> >> >> >> >> >> >> but
> >>> >> >> >> >> >> >> >> >>>> I'm
> >>> >> >> >> >> >> >> >> >>>> >> just not sure about the recall using the
> >>> dbpedia
> >>> >> >> >> >> categories
> >>> >> >> >> >> >> >> >> feature.
> >>> >> >> >> >> >> >> >> >>>> For
> >>> >> >> >> >> >> >> >> >>>> >> example, your sentence could be also
> >>> "Microsoft
> >>> >> >> posted
> >>> >> >> >> >> >> >> >> >>>> >> its
> >>> >> >> >> >> >> 2013
> >>> >> >> >> >> >> >> >> >>>> earnings.
> >>> >> >> >> >> >> >> >> >>>> >> The Redmond's company made a huge profit".
> >>> So,
> >>> >> maybe
> >>> >> >> >> >> >> including
> >>> >> >> >> >> >> >> more
> >>> >> >> >> >> >> >> >> >>>> >> contextual information from dbpedia could
> >>> >> increase
> >>> >> >> the
> >>> >> >> >> >> recall
> >>> >> >> >> >> >> >> but
> >>> >> >> >> >> >> >> >> of
> >>> >> >> >> >> >> >> >> >>>> course
> >>> >> >> >> >> >> >> >> >>>> >> will reduce the precision.
> >>> >> >> >> >> >> >> >> >>>> >>
> >>> >> >> >> >> >> >> >> >>>> >> Cheers,
> >>> >> >> >> >> >> >> >> >>>> >> Rafa
> >>> >> >> >> >> >> >> >> >>>> >>
> >>> >> >> >> >> >> >> >> >>>> >> El 04/02/14 09:50, Cristian Petroaca
> >>> escribió:
> >>> >> >> >> >> >> >> >> >>>> >>
> >>> >> >> >> >> >> >> >> >>>> >>  Back with a more detailed description of
> the
> >>> >> steps
> >>> >> >> >> >> >> >> >> >>>> >> for
> >>> >> >> >> >> >> making
> >>> >> >> >> >> >> >> this
> >>> >> >> >> >> >> >> >> >>>> kind of
> >>> >> >> >> >> >> >> >> >>>> >>> coreference work.
> >>> >> >> >> >> >> >> >> >>>> >>>
> >>> >> >> >> >> >> >> >> >>>> >>> I will be using references to the
> following
> >>> >> text in
> >>> >> >> >> >> >> >> >> >>>> >>> the
> >>> >> >> >> >> >> steps
> >>> >> >> >> >> >> >> >> below
> >>> >> >> >> >> >> >> >> >>>> in
> >>> >> >> >> >> >> >> >> >>>> >>> order to make things clearer : "Microsoft
> >>> posted
> >>> >> >> its
> >>> >> >> >> >> >> >> >> >>>> >>> 2013
> >>> >> >> >> >> >> >> >> earnings.
> >>> >> >> >> >> >> >> >> >>>> The
> >>> >> >> >> >> >> >> >> >>>> >>> software company made a huge profit."
> >>> >> >> >> >> >> >> >> >>>> >>>
> >>> >> >> >> >> >> >> >> >>>> >>> 1. For every noun phrase in the text
> which
> >>> has :
> >>> >> >> >> >> >> >> >> >>>> >>>      a. a determinate pos which implies
> >>> >> reference
> >>> >> >> to
> >>> >> >> >> >> >> >> >> >>>> >>> an
> >>> >> >> >> >> >> entity
> >>> >> >> >> >> >> >> >> local
> >>> >> >> >> >> >> >> >> >>>> to
> >>> >> >> >> >> >> >> >> >>>> >>> the
> >>> >> >> >> >> >> >> >> >>>> >>> text, such as "the, this, these") but not
> >>> >> "another,
> >>> >> >> >> >> every",
> >>> >> >> >> >> >> etc
> >>> >> >> >> >> >> >> >> which
> >>> >> >> >> >> >> >> >> >>>> >>> implies a reference to an entity outside
> of
> >>> the
> >>> >> >> text.
> >>> >> >> >> >> >> >> >> >>>> >>>      b. having at least another noun
> aside
> >>> from
> >>> >> the
> >>> >> >> >> >> >> >> >> >>>> >>> main
> >>> >> >> >> >> >> >> required
> >>> >> >> >> >> >> >> >> >>>> noun
> >>> >> >> >> >> >> >> >> >>>> >>> which
> >>> >> >> >> >> >> >> >> >>>> >>> further describes it. For example I will
> not
> >>> >> count
> >>> >> >> >> >> >> >> >> >>>> >>> "The
> >>> >> >> >> >> >> >> company"
> >>> >> >> >> >> >> >> >> as
> >>> >> >> >> >> >> >> >> >>>> being
> >>> >> >> >> >> >> >> >> >>>> >>> a
> >>> >> >> >> >> >> >> >> >>>> >>> legitimate candidate since this could
> >>> create a
> >>> >> lot
> >>> >> >> of
> >>> >> >> >> >> false
> >>> >> >> >> >> >> >> >> >>>> positives by
> >>> >> >> >> >> >> >> >> >>>> >>> considering the double meaning of some
> words
> >>> >> such
> >>> >> >> as
> >>> >> >> >> >> >> >> >> >>>> >>> "in
> >>> >> >> >> >> the
> >>> >> >> >> >> >> >> >> company
> >>> >> >> >> >> >> >> >> >>>> of
> >>> >> >> >> >> >> >> >> >>>> >>> good people".
> >>> >> >> >> >> >> >> >> >>>> >>> "The software company" is a good
> candidate
> >>> >> since we
> >>> >> >> >> >> >> >> >> >>>> >>> also
> >>> >> >> >> >> >> have
> >>> >> >> >> >> >> >> >> >>>> "software".
> >>> >> >> >> >> >> >> >> >>>> >>>
> >>> >> >> >> >> >> >> >> >>>> >>> 2. match the nouns in the noun phrase to
> the
> >>> >> >> contents
> >>> >> >> >> >> >> >> >> >>>> >>> of
> >>> >> >> >> >> the
> >>> >> >> >> >> >> >> >> dbpedia
> >>> >> >> >> >> >> >> >> >>>> >>> categories of each named entity found
> prior
> >>> to
> >>> >> the
> >>> >> >> >> >> location
> >>> >> >> >> >> >> of
> >>> >> >> >> >> >> >> the
> >>> >> >> >> >> >> >> >> >>>> noun
> >>> >> >> >> >> >> >> >> >>>> >>> phrase in the text.
> >>> >> >> >> >> >> >> >> >>>> >>> The dbpedia categories are in the
> following
> >>> >> format
> >>> >> >> >> >> >> >> >> >>>> >>> (for
> >>> >> >> >> >> >> >> Microsoft
> >>> >> >> >> >> >> >> >> for
> >>> >> >> >> >> >> >> >> >>>> >>> example) : "Software companies of the
> United
> >>> >> >> States".
> >>> >> >> >> >> >> >> >> >>>> >>>   So we try to match "software company"
> with
> >>> >> that.
> >>> >> >> >> >> >> >> >> >>>> >>> First, as you can see, the main noun in
> the
> >>> >> dbpedia
> >>> >> >> >> >> category
> >>> >> >> >> >> >> >> has a
> >>> >> >> >> >> >> >> >> >>>> plural
> >>> >> >> >> >> >> >> >> >>>> >>> form and it's the same for all categories
> >>> which
> >>> >> I
> >>> >> >> >> >> >> >> >> >>>> >>> saw. I
> >>> >> >> >> >> >> don't
> >>> >> >> >> >> >> >> >> know
> >>> >> >> >> >> >> >> >> >>>> if
> >>> >> >> >> >> >> >> >> >>>> >>> there's an easier way to do this but I
> >>> thought
> >>> >> of
> >>> >> >> >> >> applying a
> >>> >> >> >> >> >> >> >> >>>> lemmatizer on
> >>> >> >> >> >> >> >> >> >>>> >>> the category and the noun phrase in order
> >>> for
> >>> >> them
> >>> >> >> to
> >>> >> >> >> >> have a
> >>> >> >> >> >> >> >> >> common
> >>> >> >> >> >> >> >> >> >>>> >>> denominator.This also works if the noun
> >>> phrase
> >>> >> >> itself
> >>> >> >> >> >> has a
> >>> >> >> >> >> >> >> plural
> >>> >> >> >> >> >> >> >> >>>> form.
> >>> >> >> >> >> >> >> >> >>>> >>>
> >>> >> >> >> >> >> >> >> >>>> >>> Second, I'll need to use for comparison
> >>> only the
> >>> >> >> >> >> >> >> >> >>>> >>> words in
> >>> >> >> >> >> >> the
> >>> >> >> >> >> >> >> >> >>>> category
> >>> >> >> >> >> >> >> >> >>>> >>> which are themselves nouns and not
> >>> prepositions
> >>> >> or
> >>> >> >> >> >> >> determiners
> >>> >> >> >> >> >> >> >> such
> >>> >> >> >> >> >> >> >> >>>> as "of
> >>> >> >> >> >> >> >> >> >>>> >>> the".This means that I need to pos tag
> the
> >>> >> >> categories
> >>> >> >> >> >> >> contents
> >>> >> >> >> >> >> >> as
> >>> >> >> >> >> >> >> >> >>>> well.
> >>> >> >> >> >> >> >> >> >>>> >>> I was thinking of running the pos and
> lemma
> >>> on
> >>> >> the
> >>> >> >> >> >> dbpedia
> >>> >> >> >> >> >> >> >> >>>> categories when
> >>> >> >> >> >> >> >> >> >>>> >>> building the dbpedia backed entity hub
> and
> >>> >> storing
> >>> >> >> >> >> >> >> >> >>>> >>> them
> >>> >> >> >> >> for
> >>> >> >> >> >> >> >> later
> >>> >> >> >> >> >> >> >> >>>> use - I
> >>> >> >> >> >> >> >> >> >>>> >>> don't know how feasible this is at the
> >>> moment.
> >>> >> >> >> >> >> >> >> >>>> >>>
> >>> >> >> >> >> >> >> >> >>>> >>> After this I can compare each noun in the
> >>> noun
> >>> >> >> phrase
> >>> >> >> >> >> with
> >>> >> >> >> >> >> the
> >>> >> >> >> >> >> >> >> >>>> equivalent
> >>> >> >> >> >> >> >> >> >>>> >>> nouns in the categories and based on the
> >>> number
> >>> >> of
> >>> >> >> >> >> matches I
> >>> >> >> >> >> >> >> can
> >>> >> >> >> >> >> >> >> >>>> create a
> >>> >> >> >> >> >> >> >> >>>> >>> confidence level.
> >>> >> >> >> >> >> >> >> >>>> >>>
> >>> >> >> >> >> >> >> >> >>>> >>> 3. match the noun of the noun phrase with
> >>> the
> >>> >> >> >> >> >> >> >> >>>> >>> rdf:type
> >>> >> >> >> >> from
> >>> >> >> >> >> >> >> >> dbpedia
> >>> >> >> >> >> >> >> >> >>>> of the
> >>> >> >> >> >> >> >> >> >>>> >>> named entity. If this matches increase
> the
> >>> >> >> confidence
> >>> >> >> >> >> level.
> >>> >> >> >> >> >> >> >> >>>> >>>
> >>> >> >> >> >> >> >> >> >>>> >>> 4. If there are multiple named entities
> >>> which
> >>> >> can
> >>> >> >> >> >> >> >> >> >>>> >>> match a
> >>> >> >> >> >> >> >> certain
> >>> >> >> >> >> >> >> >> >>>> noun
> >>> >> >> >> >> >> >> >> >>>> >>> phrase then link the noun phrase with the
> >>> >> closest
> >>> >> >> >> >> >> >> >> >>>> >>> named
> >>> >> >> >> >> >> entity
> >>> >> >> >> >> >> >> >> prior
> >>> >> >> >> >> >> >> >> >>>> to it
> >>> >> >> >> >> >> >> >> >>>> >>> in the text.
> >>> >> >> >> >> >> >> >> >>>> >>>
> >>> >> >> >> >> >> >> >> >>>> >>> What do you think?
> >>> >> >> >> >> >> >> >> >>>> >>>
> >>> >> >> >> >> >> >> >> >>>> >>> Cristian
> >>> >> >> >> >> >> >> >> >>>> >>>
> >>> >> >> >> >> >> >> >> >>>> >>> 2014-01-31 Cristian Petroaca <
> >>> >> >> >> >> cristian.petroaca@gmail.com>:
> >>> >> >> >> >> >> >> >> >>>> >>>
> >>> >> >> >> >> >> >> >> >>>> >>>  Hi Rafa,
> >>> >> >> >> >> >> >> >> >>>> >>>>
> >>> >> >> >> >> >> >> >> >>>> >>>> I don't yet have a concrete heursitic
> but
> >>> I'm
> >>> >> >> >> >> >> >> >> >>>> >>>> working on
> >>> >> >> >> >> >> it.
> >>> >> >> >> >> >> >> I'll
> >>> >> >> >> >> >> >> >> >>>> provide
> >>> >> >> >> >> >> >> >> >>>> >>>> it here so that you guys can give me a
> >>> >> feedback on
> >>> >> >> >> >> >> >> >> >>>> >>>> it.
> >>> >> >> >> >> >> >> >> >>>> >>>>
> >>> >> >> >> >> >> >> >> >>>> >>>> What are "locality" features?
> >>> >> >> >> >> >> >> >> >>>> >>>>

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Cristian, all

Looks good to me, but I am not sure I got everything. If you could
provide example texts where those rules apply, it would be much easier
to understand.

Instead of using dbpedia properties directly you should define your own
domain model (ontology). You can then align the dbpedia properties to
your model. This makes it possible to apply the approach to knowledge
bases other than dbpedia as well.
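
To illustrate the alignment idea, here is a minimal sketch of how such a
mapping could be kept inside the engine (the coref:* keys are made-up
placeholder names for the own model; the values are only the
dbpedia-owl:/dbpprop: property names already mentioned in this thread):

    import java.util.*;

    public class CorefPropertyAlignment {
        // Hypothetical alignment of an own coreference model to dbpedia
        // properties. To support another knowledge base only the values
        // of this map would need to change.
        static final Map<String, List<String>> ALIGNMENT = new HashMap<String, List<String>>();
        static {
            ALIGNMENT.put("coref:spatialContext", Arrays.asList(
                    "dbpedia-owl:birthPlace", "dbpedia-owl:locationCity",
                    "dbpedia-owl:country"));
            ALIGNMENT.put("coref:functionalContext", Arrays.asList(
                    "dbpprop:industry", "dbpprop:service", "dbpedia-owl:genre"));
        }
    }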

For people new to this thread: The above message adds to the
suggestion first made by Cristian on 4th February. Also the following
4 messages (until 7th Feb) provide additional context.

best
Rupert


On Fri, Mar 28, 2014 at 9:23 AM, Cristian Petroaca
<cr...@gmail.com> wrote:
> Hi guys,
>
> After Rupert's last suggestions related to this enhancement engine, I
> devised a more comprehensive algorithm for matching the noun phrases
> against the NER properties. Please take a look and let me know what you
> think. Thanks.
>
> The following rules will be applied to every noun phrase in order to find
> co-references:
>
> 1. For each NER prior to the current noun phrase in the text match the
> yago:class label to the contents of the noun phrase.
>
> For the NERs which have a yago:class which matches, apply:
>
> 2. Group membership rules :
>
>     a. spatial membership : the NER is part of a Location. If the noun
> phrase contains a LOCATION or a demonym then check any location properties
> of the matching NER.
>
>     If matching NER is a :
>     - person, match against :birthPlace, :region, :nationality
>     - organisation, match against :foundationPlace, :locationCity,
> :location, :hometown
>     - place, match against :country, :subdivisionName, :location,
>
>     Ex: The Italian President, The Richmond-based company
>
>     b. organisational membership : the NER is part of an Organisation. If
> the noun phrase contains an ORGANISATION then check the following
> properties of the matching NER:
>
>     If matching NER is :
>     - person, match against :occupation, :associatedActs
>     - organisation ?
>     - location ?
>
> Ex: The Microsoft executive, The Pink Floyd singer
>
> 3. Functional description rule: the noun phrase describes what the NER does
> conceptually.
> If there are no NERs in the noun phrase then match the following properties
> of the matching NER to the contents of the noun phrase (aside from the
> nouns which are part of the yago:class) :
>
>    If NER is a:
>    - person ?
>    - organisation : , match against :service, :industry, :genre
>    - location ?
>
> Ex:  The software company.
>
> 4. If no matches were found for the current NER with rules 2 or 3 then if
> the yago:class which matched has more than 2 nouns then we also consider
> this a good co-reference but with a lower confidence maybe.
>
> Ex: The former tennis player, the theoretical physicist.
>
> 5. Based on the number of nouns which matched we create a confidence level.
> The number of matched nouns cannot be lower than 2 and we must have a
> yago:class match.
>
> For all NERs which got to this point, select the closest ones in the text
> to the noun phrase which matched against the same properties (yago:class
> and dbpedia) and mark them as co-references.
>
> Note: all noun phrases need to be lemmatized before all of this in case
> there are any plurals.
>
>
> 2014-03-25 20:50 GMT+02:00 Cristian Petroaca <cr...@gmail.com>:
>
>> That worked. Thanks.
>>
>> So, there are no exceptions during the startup of the launcher.
>> The component tab in the felix console shows 6 WeightedChains the first
>> time, including the default one but after my changes and a restart there
>> are only 5 - the default one is missing altogether.
>>
>>
>> 2014-03-24 20:18 GMT+02:00 Rupert Westenthaler <
>> rupert.westenthaler@gmail.com>:
>>
>> Hi Cristian,
>>>
>>> I have seen the same problem since last Friday. The solution mentioned
>>> in [1] works for me.
>>>
>>>     mvn -Djsse.enableSNIExtension=false {goals}
>>>
>>> No idea why https connections to github currently cause this. I
>>> could not find anything related via Google. So I suggest using the
>>> system property for now. If this persists for longer we can adapt the
>>> build files accordingly.
>>>
>>> best
>>> Rupert
>>>
>>>
>>>
>>>
>>> [1]
>>> http://stackoverflow.com/questions/7615645/ssl-handshake-alert-unrecognized-name-error-since-upgrade-to-java-1-7-0
>>>
>>> On Mon, Mar 24, 2014 at 7:01 PM, Cristian Petroaca
>>> <cr...@gmail.com> wrote:
>>> > I did a clean on the whole project and now I wanted to do another "mvn
>>> > clean install" but I am getting this :
>>> >
>>> > "[INFO]
>>> > ------------------------------------------------------------------------
>>> > [ERROR] Failed to execute goal
>>> > org.apache.maven.plugins:maven-antrun-plugin:1.6:
>>> > run (download) on project org.apache.stanbol.data.opennlp.lang.es: An
>>> Ant
>>> > BuildE
>>> > xception has occured: The following error occurred while executing this
>>> > line:
>>> > [ERROR]
>>> > C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\download_models.xml:3
>>> > 3: Failed to copy
>>> > https://github.com/utcompling/OpenNLP-Models/raw/58ef0c6003140
>>> > 3e66e47ae35edaf58d3478b67af/models/es/opennlp-es-maxent-pos-es.bin to
>>> > C:\Data\Pr
>>> >
>>> ojects\Stanbol\main\data\opennlp\lang\es\downloads\resources\org\apache\stanbol\
>>> > data\opennlp\es-pos-maxent.bin due to javax.net.ssl.SSLProtocolException
>>> > handshake alert : unrecognized_name"
>>> >
>>> >
>>> >
>>> > 2014-03-20 11:25 GMT+02:00 Rupert Westenthaler <
>>> > rupert.westenthaler@gmail.com>:
>>> >
>>> >> Hi Cristian,
>>> >>
>>> >> On Thu, Mar 20, 2014 at 10:00 AM, Cristian Petroaca
>>> >> <cr...@gmail.com> wrote:
>>> >> >
>>> >> > stanbol.enhancer.chain.weighted.chain=["tika;optional","langdetect","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-ner","dbpediaLinking","entityhubExtraction","dbpedia-dereference","pos-chunker"]
>>> >> > service.ranking=I"-2147483648"
>>> >> > stanbol.enhancer.chain.name="default"
>>> >>
>>> >> Does look fine to me. Do you see any exception during the startup of
>>> >> the launcher? Can you check the status of this component in the
>>> >> component tab of the felix web console [1] (search for
>>> >> "org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain"). If
>>> >> you have multiple you can find the correct one by comparing the
>>> >> "Properties" with those in the configuration file.
>>> >>
>>> >> I guess that the service is in the 'unsatisfied' state, as you do
>>> >> not see it in the web interface. But if this is the case you should
>>> >> also see the corresponding exception in the log. You can also manually
>>> >> stop/start the component. In this case the exception should be
>>> >> re-thrown and you do not need to search the log for it.
>>> >>
>>> >> best
>>> >> Rupert
>>> >>
>>> >>
>>> >> [1] http://localhost:8080/system/console/components
>>> >>
>>> >> >
>>> >> >
>>> >> >
>>> >> > 2014-03-20 7:39 GMT+02:00 Rupert Westenthaler <
>>> >> rupert.westenthaler@gmail.com
>>> >> >>:
>>> >> >
>>> >> >> Hi Cristian,
>>> >> >>
>>> >> >> you can not send attachments to the list. Please copy the contents
>>> >> >> directly to the mail
>>> >> >>
>>> >> >> thx
>>> >> >> Rupert
>>> >> >>
>>> >> >> On Wed, Mar 19, 2014 at 9:20 PM, Cristian Petroaca
>>> >> >> <cr...@gmail.com> wrote:
>>> >> >> > The config attached.
>>> >> >> >
>>> >> >> >
>>> >> >> > 2014-03-19 9:09 GMT+02:00 Rupert Westenthaler
>>> >> >> > <ru...@gmail.com>:
>>> >> >> >
>>> >> >> >> Hi Cristian,
>>> >> >> >>
>>> >> >> >> can you provide the contents of the chain after your
>>> modifications?
>>> >> >> >> Would be interesting to test why the chain is no longer active
>>> after
>>> >> >> >> the restart.
>>> >> >> >>
>>> >> >> >> You can find the config file in the 'stanbol/fileinstall' folder.
>>> >> >> >>
>>> >> >> >> best
>>> >> >> >> Rupert
>>> >> >> >>
>>> >> >> >> On Tue, Mar 18, 2014 at 8:24 PM, Cristian Petroaca
>>> >> >> >> <cr...@gmail.com> wrote:
>>> >> >> >> > Related to the default chain selection rules : before restart I
>>> >> had a
>>> >> >> >> > chain
>>> >> >> >> > with the name 'default' as in I could access it via
>>> >> >> >> > enhancer/chain/default.
>>> >> >> >> > Then I just added another engine to the 'default' chain. I
>>> assumed
>>> >> >> that
>>> >> >> >> > after the restart the chain with the 'default' name would be
>>> >> >> persisted.
>>> >> >> >> > So
>>> >> >> >> > the first rule should have been applied after the restart as
>>> well.
>>> >> But
>>> >> >> >> > instead I cannot reach it via enhancer/chain/default anymore
>>> so its
>>> >> >> >> > gone.
>>> >> >> >> > Anyway, this is not a big deal, it's not blocking me in any
>>> way, I
>>> >> >> just
>>> >> >> >> > wanted to understand where the problem is.
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> > 2014-03-18 7:15 GMT+02:00 Rupert Westenthaler
>>> >> >> >> > <rupert.westenthaler@gmail.com
>>> >> >> >> >>:
>>> >> >> >> >
>>> >> >> >> >> Hi Cristian
>>> >> >> >> >>
>>> >> >> >> >> On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca
>>> >> >> >> >> <cr...@gmail.com> wrote:
>>> >> >> >> >> > 1. Updated to the latest code and it's gone. Cool
>>> >> >> >> >> >
>>> >> >> >> >> > 2. I start the stable launcher -> create a new instance of
>>> the
>>> >> >> >> >> > PosChunkerEngine -> add it to the default chain. At this
>>> point
>>> >> >> >> >> > everything
>>> >> >> >> >> > looks good and works ok.
>>> >> >> >> >> > After I restart the server the default chain is gone and
>>> >> instead I
>>> >> >> >> >> > see
>>> >> >> >> >> this
>>> >> >> >> >> > in the enhancement chains page : all-active (default, id:
>>> 149,
>>> >> >> >> >> > ranking:
>>> >> >> >> >> 0,
>>> >> >> >> >> > impl: AllActiveEnginesChain ). all-active did not contain
>>> the
>>> >> >> >> >> > 'default'
>>> >> >> >> >> > word before the restart.
>>> >> >> >> >> >
>>> >> >> >> >>
>>> >> >> >> >> Please note the default chain selection rules as described at
>>> [1].
>>> >> >> You
>>> >> >> >> >> can also access chains chains under
>>> '/enhancer/chain/{chain-name}'
>>> >> >> >> >>
>>> >> >> >> >> best
>>> >> >> >> >> Rupert
>>> >> >> >> >>
>>> >> >> >> >> [1]
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >>
>>> >>
>>> http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain
>>> >> >> >> >>
>>> >> >> >> >> > It looks like the config files are exactly what I need.
>>> Thanks.
>>> >> >> >> >> >
>>> >> >> >> >> >
>>> >> >> >> >> > 2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <
>>> >> >> >> >> rupert.westenthaler@gmail.com
>>> >> >> >> >> >>:
>>> >> >> >> >> >
>>> >> >> >> >> >> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
>>> >> >> >> >> >> <cr...@gmail.com> wrote:
>>> >> >> >> >> >> > Thanks Rupert.
>>> >> >> >> >> >> >
>>> >> >> >> >> >> > A couple more questions/issues :
>>> >> >> >> >> >> >
>>> >> >> >> >> >> > 1. Whenever I start the stanbol server I'm seeing this
>>> in the
>>> >> >> >> >> >> > console
>>> >> >> >> >> >> > output :
>>> >> >> >> >> >> >
>>> >> >> >> >> >>
>>> >> >> >> >> >> This should be fixed with STANBOL-1278 [1] [2]
>>> >> >> >> >> >>
>>> >> >> >> >> >> > 2. Whenever I restart the server the Weighted Chains get
>>> >> messed
>>> >> >> >> >> >> > up. I
>>> >> >> >> >> >> > usually use the 'default' chain and add my engine to it
>>> so
>>> >> there
>>> >> >> >> >> >> > are
>>> >> >> >> >> 11
>>> >> >> >> >> >> > engines in it. After the restart this chain now contains
>>> >> around
>>> >> >> 23
>>> >> >> >> >> >> engines
>>> >> >> >> >> >> > in total.
>>> >> >> >> >> >>
>>> >> >> >> >> >> I was not able to replicate this. What I tried was
>>> >> >> >> >> >>
>>> >> >> >> >> >> (1) start up the stable launcher
>>> >> >> >> >> >> (2) add an additional engine to the default chain
>>> >> >> >> >> >> (3) restart the launcher
>>> >> >> >> >> >>
>>> >> >> >> >> >> The default chain was not changed after (2) and (3). So I
>>> would
>>> >> >> need
>>> >> >> >> >> >> further information for knowing why this is happening.
>>> >> >> >> >> >>
>>> >> >> >> >> >> Generally it is better to create you own chain instance as
>>> >> >> modifying
>>> >> >> >> >> >> one that is provided by the default configuration. I would
>>> also
>>> >> >> >> >> >> recommend that you keep your test configuration in text
>>> files
>>> >> and
>>> >> >> to
>>> >> >> >> >> >> copy those to the 'stanbol/fileinstall' folder. Doing so
>>> >> prevent
>>> >> >> you
>>> >> >> >> >> >> from manually entering the configuration after a software
>>> >> update.
>>> >> >> >> >> >> The
>>> >> >> >> >> >> production-mode section [3] provides information on how to
>>> do
>>> >> >> that.
>>> >> >> >> >> >>
>>> >> >> >> >> >> best
>>> >> >> >> >> >> Rupert
>>> >> >> >> >> >>
>>> >> >> >> >> >> [1] https://issues.apache.org/jira/browse/STANBOL-1278
>>> >> >> >> >> >> [2] http://svn.apache.org/r1576623
>>> >> >> >> >> >> [3] http://stanbol.apache.org/docs/trunk/production-mode
>>> >> >> >> >> >>
>>> >> >> >> >> >> > ERROR: Bundle
>>> org.apache.stanbol.enhancer.engine.topic.web
>>> >> >> [153]:
>>> >> >> >> >> Error
>>> >> >> >> >> >> > starting
>>> >> >> >> >> >> >
>>> >> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >>
>>> >>
>>> slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\star
>>> >> >> >> >> >> >
>>> >> >> >> >> >> >
>>> >> >>
>>> tup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar
>>> >> >> >> >> >> > (org.osgi
>>> >> >> >> >> >> > .framework.BundleException: Unresolved constraint in
>>> bundle
>>> >> >> >> >> >> > org.apache.stanbol.e
>>> >> >> >> >> >> > nhancer.engine.topic.web [153]: Unable to resolve 153.0:
>>> >> missing
>>> >> >> >> >> >> > requirement [15
>>> >> >> >> >> >> > 3.0] package; (&(package=javax.ws.rs
>>> >> >> >> >> >> )(version>=0.0.0)(!(version>=2.0.0))))
>>> >> >> >> >> >> > org.osgi.framework.BundleException: Unresolved
>>> constraint in
>>> >> >> >> >> >> > bundle
>>> >> >> >> >> >> > org.apache.s
>>> >> >> >> >> >> > tanbol.enhancer.engine.topic.web [153]: Unable to resolve
>>> >> 153.0:
>>> >> >> >> >> missing
>>> >> >> >> >> >> > require
>>> >> >> >> >> >> > ment [153.0] package; (&(package=javax.ws.rs
>>> >> >> >> >> >> > )(version>=0.0.0)(!(version>=2.0.0))
>>> >> >> >> >> >> > )
>>> >> >> >> >> >> >         at
>>> >> >> >> >> >>
>>> org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
>>> >> >> >> >> >> >         at
>>> >> >> >> >> org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
>>> >> >> >> >> >> >         at
>>> >> >> >> >> >> >
>>> >> >> >> >> >> >
>>> >> >>
>>> org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
>>> >> >> >> >> >> >
>>> >> >> >> >> >> >         at
>>> >> >> >> >> >> >
>>> >> >> >> >> >> >
>>> >> >>
>>> org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264
>>> >> >> >> >> >> > )
>>> >> >> >> >> >> >         at java.lang.Thread.run(Unknown Source)
>>> >> >> >> >> >> >
>>> >> >> >> >> >> > Despite of this the server starts fine and I can use the
>>> >> >> enhancer
>>> >> >> >> >> fine.
>>> >> >> >> >> >> Do
>>> >> >> >> >> >> > you guys see this as well?
>>> >> >> >> >> >> >
>>> >> >> >> >> >> >
>>> >> >> >> >> >> > 2. Whenever I restart the server the Weighted Chains get
>>> >> messed
>>> >> >> >> >> >> > up. I
>>> >> >> >> >> >> > usually use the 'default' chain and add my engine to it
>>> so
>>> >> there
>>> >> >> >> >> >> > are
>>> >> >> >> >> 11
>>> >> >> >> >> >> > engines in it. After the restart this chain now contains
>>> >> around
>>> >> >> 23
>>> >> >> >> >> >> engines
>>> >> >> >> >> >> > in total.
>>> >> >> >> >> >> >
>>> >> >> >> >> >> >
>>> >> >> >> >> >> >
>>> >> >> >> >> >> >
>>> >> >> >> >> >> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <
>>> >> >> >> >> >> rupert.westenthaler@gmail.com
>>> >> >> >> >> >> >>:
>>> >> >> >> >> >> >
>>> >> >> >> >> >> >> Hi Cristian,
>>> >> >> >> >> >> >>
>>> >> >> >> >> >> >> NER Annotations are typically available as both
>>> >> >> >> >> >> >> NlpAnnotations.NER_ANNOTATION and  fise:TextAnnotation
>>> [1]
>>> >> in
>>> >> >> the
>>> >> >> >> >> >> >> enhancement metadata. As you are already accessing the
>>> >> >> >> >> >> >> AnayzedText I
>>> >> >> >> >> >> >> would prefer using the  NlpAnnotations.NER_ANNOTATION.
>>> >> >> >> >> >> >>
>>> >> >> >> >> >> >> best
>>> >> >> >> >> >> >> Rupert
>>> >> >> >> >> >> >>
>>> >> >> >> >> >> >> [1]
>>> >> >> >> >> >> >>
>>> >> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >>
>>> >>
>>> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
>>> >> >> >> >> >> >>
>>> >> >> >> >> >> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
>>> >> >> >> >> >> >> <cr...@gmail.com> wrote:
>>> >> >> >> >> >> >> > Thanks.
>>> >> >> >> >> >> >> > I assume I should get the Named entities using the
>>> same
>>> >> but
>>> >> >> >> >> >> >> > with
>>> >> >> >> >> >> >> > NlpAnnotations.NER_ANNOTATION?
>>> >> >> >> >> >> >> >
>>> >> >> >> >> >> >> >
>>> >> >> >> >> >> >> >
>>> >> >> >> >> >> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
>>> >> >> >> >> >> >> > rupert.westenthaler@gmail.com>:
>>> >> >> >> >> >> >> >
>>> >> >> >> >> >> >> >> Hallo Cristian,
>>> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >> >> NounPhrases are not added to the RDF enhancement
>>> results.
>>> >> >> You
>>> >> >> >> >> need to
>>> >> >> >> >> >> >> >> use the AnalyzedText ContentPart [1]
>>> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >> >> here is some demo code you can use in the
>>> >> computeEnhancement
>>> >> >> >> >> method
>>> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >> >> >>         AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
>>> >> >> >> >> >> >> >> >>         Iterator<? extends Section> sections = at.getSentences();
>>> >> >> >> >> >> >> >> >>         if(!sections.hasNext()){ //process as single sentence
>>> >> >> >> >> >> >> >> >>             sections = Collections.singleton(at).iterator();
>>> >> >> >> >> >> >> >> >>         }
>>> >> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >> >> >>         while(sections.hasNext()){
>>> >> >> >> >> >> >> >> >>             Section section = sections.next();
>>> >> >> >> >> >> >> >> >>             Iterator<Span> chunks = section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
>>> >> >> >> >> >> >> >> >>             while(chunks.hasNext()){
>>> >> >> >> >> >> >> >> >>                 Span chunk = chunks.next();
>>> >> >> >> >> >> >> >> >>                 Value<PhraseTag> phrase = chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
>>> >> >> >> >> >> >> >> >>                 if(phrase.value().getCategory() == LexicalCategory.Noun){
>>> >> >> >> >> >> >> >> >>                     log.info(" - NounPhrase [{},{}] {}", new Object[]{
>>> >> >> >> >> >> >> >> >>                             chunk.getStart(),chunk.getEnd(),chunk.getSpan()});
>>> >> >> >> >> >> >> >> >>                 }
>>> >> >> >> >> >> >> >> >>             }
>>> >> >> >> >> >> >> >> >>         }
>>> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >> >> hope this helps
>>> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >> >> best
>>> >> >> >> >> >> >> >> Rupert
>>> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >> >> [1]
>>> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >>
>>> >> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >>
>>> >>
>>> http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
>>> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
>>> >> >> >> >> >> >> >> <cr...@gmail.com> wrote:
>>> >> >> >> >> >> >> >> > I started to implement the engine and I'm having
>>> >> problems
>>> >> >> >> >> >> >> >> > with
>>> >> >> >> >> >> getting
>>> >> >> >> >> >> >> >> > results for noun phrases. I modified the "default"
>>> >> >> weighted
>>> >> >> >> >> chain
>>> >> >> >> >> >> to
>>> >> >> >> >> >> >> also
>>> >> >> >> >> >> >> >> > include the PosChunkerEngine and ran a sample text
>>> :
>>> >> >> "Angela
>>> >> >> >> >> Merkel
>>> >> >> >> >> >> >> >> visted
>>> >> >> >> >> >> >> >> > China. The german chancellor met with various
>>> people".
>>> >> I
>>> >> >> >> >> expected
>>> >> >> >> >> >> that
>>> >> >> >> >> >> >> >> the
>>> >> >> >> >> >> >> >> > RDF XML output would contain some info about the
>>> noun
>>> >> >> >> >> >> >> >> > phrases
>>> >> >> >> >> but I
>>> >> >> >> >> >> >> >> cannot
>>> >> >> >> >> >> >> >> > see any.
>>> >> >> >> >> >> >> >> > Could you point me to the correct way to generate
>>> the
>>> >> noun
>>> >> >> >> >> phrases?
>>> >> >> >> >> >> >> >> >
>>> >> >> >> >> >> >> >> > Thanks,
>>> >> >> >> >> >> >> >> > Cristian
>>> >> >> >> >> >> >> >> >
>>> >> >> >> >> >> >> >> >
>>> >> >> >> >> >> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <
>>> >> >> >> >> >> >> >> cristian.petroaca@gmail.com>:
>>> >> >> >> >> >> >> >> >
>>> >> >> >> >> >> >> >> >> Opened
>>> >> >> https://issues.apache.org/jira/browse/STANBOL-1279
>>> >> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <
>>> >> >> >> >> >> >> >> cristian.petroaca@gmail.com>
>>> >> >> >> >> >> >> >> >> :
>>> >> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >> >> >> Hi Rupert,
>>> >> >> >> >> >> >> >> >>>
>>> >> >> >> >> >> >> >> >>> The "spatial" dimension is a good idea. I'll also
>>> >> take a
>>> >> >> >> >> >> >> >> >>> look
>>> >> >> >> >> at
>>> >> >> >> >> >> >> Yago.
>>> >> >> >> >> >> >> >> >>>
>>> >> >> >> >> >> >> >> >>> I will create a Jira with what we talked about
>>> here.
>>> >> It
>>> >> >> >> >> >> >> >> >>> will
>>> >> >> >> >> >> >> probably
>>> >> >> >> >> >> >> >> >>> have just a draft-like description for now and
>>> will
>>> >> be
>>> >> >> >> >> >> >> >> >>> updated
>>> >> >> >> >> >> as I
>>> >> >> >> >> >> >> go
>>> >> >> >> >> >> >> >> >>> along.
>>> >> >> >> >> >> >> >> >>>
>>> >> >> >> >> >> >> >> >>> Thanks,
>>> >> >> >> >> >> >> >> >>> Cristian
>>> >> >> >> >> >> >> >> >>>
>>> >> >> >> >> >> >> >> >>>
>>> >> >> >> >> >> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <
>>> >> >> >> >> >> >> >> >>> rupert.westenthaler@gmail.com>:
>>> >> >> >> >> >> >> >> >>>
>>> >> >> >> >> >> >> >> >>> Hi Cristian,
>>> >> >> >> >> >> >> >> >>>>
>>> >> >> >> >> >> >> >> >>>> definitely an interesting approach. You should
>>> have
>>> >> a
>>> >> >> >> >> >> >> >> >>>> look at
>>> >> >> >> >> >> Yago2
>>> >> >> >> >> >> >> >> >>>> [1]. As far as I can remember the Yago taxonomy
>>> is
>>> >> much
>>> >> >> >> >> better
>>> >> >> >> >> >> >> >> >>>> structured as the one used by dbpedia. Mapping
>>> >> >> >> >> >> >> >> >>>> suggestions of
>>> >> >> >> >> >> >> dbpedia
>>> >> >> >> >> >> >> >> >>>> to concepts in Yago2 is easy as both dbpedia and
>>> >> yago2
>>> >> >> do
>>> >> >> >> >> >> provide
>>> >> >> >> >> >> >> >> >>>> mappings [2] and [3]
>>> >> >> >> >> >> >> >> >>>>
>>> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
>>> >> >> >> >> >> >> >> >>>> > <rh...@apache.org>:
>>> >> >> >> >> >> >> >> >>>> >>
>>> >> >> >> >> >> >> >> >>>> >> "Microsoft posted its 2013 earnings. The
>>> >> Redmond's
>>> >> >> >> >> >> >> >> >>>> >> company
>>> >> >> >> >> >> made
>>> >> >> >> >> >> >> a
>>> >> >> >> >> >> >> >> >>>> >> huge profit".
>>> >> >> >> >> >> >> >> >>>>
>>> >> >> >> >> >> >> >> >>>> Thats actually a very good example. Spatial
>>> contexts
>>> >> >> are
>>> >> >> >> >> >> >> >> >>>> very
>>> >> >> >> >> >> >> >> >>>> important as they tend to be often used for
>>> >> >> referencing.
>>> >> >> >> >> >> >> >> >>>> So I
>>> >> >> >> >> >> would
>>> >> >> >> >> >> >> >> >>>> suggest to specially treat the spatial context.
>>> For
>>> >> >> >> >> >> >> >> >>>> spatial
>>> >> >> >> >> >> >> Entities
>>> >> >> >> >> >> >> >> >>>> (like a City) this is easy, but even for other
>>> >> (like a
>>> >> >> >> >> Person,
>>> >> >> >> >> >> >> >> >>>> Company) you could use relations to spatial
>>> entities
>>> >> >> >> >> >> >> >> >>>> define
>>> >> >> >> >> >> their
>>> >> >> >> >> >> >> >> >>>> spatial context. This context could than be
>>> used to
>>> >> >> >> >> >> >> >> >>>> correctly
>>> >> >> >> >> >> link
>>> >> >> >> >> >> >> >> >>>> "The Redmond's company" to "Microsoft".
>>> >> >> >> >> >> >> >> >>>>
>>> >> >> >> >> >> >> >> >>>> In addition I would suggest to use the "spatial"
>>> >> >> context
>>> >> >> >> >> >> >> >> >>>> of
>>> >> >> >> >> each
>>> >> >> >> >> >> >> >> >>>> entity (basically relation to entities that are
>>> >> cities,
>>> >> >> >> >> regions,
>>> >> >> >> >> >> >> >> >>>> countries) as a separate dimension, because
>>> those
>>> >> are
>>> >> >> >> >> >> >> >> >>>> very
>>> >> >> >> >> often
>>> >> >> >> >> >> >> used
>>> >> >> >> >> >> >> >> >>>> for coreferences.
>>> >> >> >> >> >> >> >> >>>>
>>> >> >> >> >> >> >> >> >>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
>>> >> >> >> >> >> >> >> >>>> [2]
>>> >> >> >> >> >> >> >> >>>>
>>> >> >> http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
>>> >> >> >> >> >> >> >> >>>> [3]
>>> >> >> >> >> >> >> >> >>>>
>>> >> >> >> >> >> >> >>
>>> >> >> >> >> >> >>
>>> >> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >>
>>> >>
>>> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
>>> >> >> >> >> >> >> >> >>>>
>>> >> >> >> >> >> >> >> >>>>
>>> >> >> >> >> >> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian
>>> Petroaca
>>> >> >> >> >> >> >> >> >>>> <cr...@gmail.com> wrote:
>>> >> >> >> >> >> >> >> >>>> > There are several dbpedia categories for each
>>> >> entity,
>>> >> >> >> >> >> >> >> >>>> > in
>>> >> >> >> >> this
>>> >> >> >> >> >> >> case
>>> >> >> >> >> >> >> >> for
>>> >> >> >> >> >> >> >> >>>> > Microsoft we have :
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
>>> >> >> >> >> >> >> >> >>>> > category:Microsoft
>>> >> >> >> >> >> >> >> >>>> >
>>> category:Software_companies_of_the_United_States
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> category:Software_companies_based_in_Washington_(state)
>>> >> >> >> >> >> >> >> >>>> > category:Companies_established_in_1975
>>> >> >> >> >> >> >> >> >>>> >
>>> category:1975_establishments_in_the_United_States
>>> >> >> >> >> >> >> >> >>>> >
>>> category:Companies_based_in_Redmond,_Washington
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >>
>>> >> >> >> >> >> >>
>>> >> >> category:Multinational_companies_headquartered_in_the_United_States
>>> >> >> >> >> >> >> >> >>>> > category:Cloud_computing_providers
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> category:Companies_in_the_Dow_Jones_Industrial_Average
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >> >>>> > So we also have "Companies based in
>>> >> >> Redmont,Washington"
>>> >> >> >> >> which
>>> >> >> >> >> >> >> could
>>> >> >> >> >> >> >> >> be
>>> >> >> >> >> >> >> >> >>>> > matched.
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >> >>>> > There is still other contextual information
>>> from
>>> >> >> >> >> >> >> >> >>>> > dbpedia
>>> >> >> >> >> which
>>> >> >> >> >> >> >> can
>>> >> >> >> >> >> >> >> be
>>> >> >> >> >> >> >> >> >>>> used.
>>> >> >> >> >> >> >> >> >>>> > For example for an Organization we could also
>>> >> >> include :
>>> >> >> >> >> >> >> >> >>>> > dbpprop:industry = Software
>>> >> >> >> >> >> >> >> >>>> > dbpprop:service = Online Service Providers
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >> >>>> > and for a Person (that's for Barack Obama) :
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >> >>>> > dbpedia-owl:profession:
>>> >> >> >> >> >> >> >> >>>> >                                dbpedia:Author
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >> >>>> > dbpedia:Constitutional_law
>>> >> >> >> >> >> >> >> >>>> >                                dbpedia:Lawyer
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >> >>>> > dbpedia:Community_organizing
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >> >>>> > I'd like to continue investigating this as I
>>> think
>>> >> >> that
>>> >> >> >> >> >> >> >> >>>> > it
>>> >> >> >> >> may
>>> >> >> >> >> >> >> have
>>> >> >> >> >> >> >> >> >>>> some
>>> >> >> >> >> >> >> >> >>>> > value in increasing the number of coreference
>>> >> >> >> >> >> >> >> >>>> > resolutions
>>> >> >> >> >> and
>>> >> >> >> >> >> I'd
>>> >> >> >> >> >> >> >> like
>>> >> >> >> >> >> >> >> >>>> to
>>> >> >> >> >> >> >> >> >>>> > concentrate more on precision rather than
>>> recall
>>> >> >> since
>>> >> >> >> >> >> >> >> >>>> > we
>>> >> >> >> >> >> already
>>> >> >> >> >> >> >> >> have
>>> >> >> >> >> >> >> >> >>>> a
>>> >> >> >> >> >> >> >> >>>> > set of coreferences detected by the stanford
>>> nlp
>>> >> tool
>>> >> >> >> >> >> >> >> >>>> > and
>>> >> >> >> >> this
>>> >> >> >> >> >> >> would
>>> >> >> >> >> >> >> >> >>>> be as
>>> >> >> >> >> >> >> >> >>>> > an addition to that (at least this is how I
>>> would
>>> >> >> like
>>> >> >> >> >> >> >> >> >>>> > to
>>> >> >> >> >> use
>>> >> >> >> >> >> >> it).
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >> >>>> > Is it ok if I track this by opening a jira? I
>>> >> could
>>> >> >> >> >> >> >> >> >>>> > update
>>> >> >> >> >> it
>>> >> >> >> >> >> to
>>> >> >> >> >> >> >> >> show
>>> >> >> >> >> >> >> >> >>>> my
>>> >> >> >> >> >> >> >> >>>> > progress and also my conclusions and if it
>>> turns
>>> >> out
>>> >> >> >> >> >> >> >> >>>> > that
>>> >> >> >> >> it
>>> >> >> >> >> >> was
>>> >> >> >> >> >> >> a
>>> >> >> >> >> >> >> >> bad
>>> >> >> >> >> >> >> >> >>>> idea
>>> >> >> >> >> >> >> >> >>>> > then that's the situation at least I'll end up
>>> >> with
>>> >> >> >> >> >> >> >> >>>> > more
>>> >> >> >> >> >> >> knowledge
>>> >> >> >> >> >> >> >> >>>> about
>>> >> >> >> >> >> >> >> >>>> > Stanbol in the end :).
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
>>> >> >> >> >> >> >> >> >>>> > <rh...@apache.org>:
>>> >> >> >> >> >> >> >> >>>> >
>>> >> >> >> >> >> >> >> >>>> >> Hi Cristian,
>>> >> >> >> >> >> >> >> >>>> >>
>>> >> >> >> >> >> >> >> >>>> >> The approach sounds nice. I don't want to be
>>> the
>>> >> >> >> >> >> >> >> >>>> >> devil's
>>> >> >> >> >> >> >> advocate
>>> >> >> >> >> >> >> >> but
>>> >> >> >> >> >> >> >> >>>> I'm
>>> >> >> >> >> >> >> >> >>>> >> just not sure about the recall using the
>>> dbpedia
>>> >> >> >> >> categories
>>> >> >> >> >> >> >> >> feature.
>>> >> >> >> >> >> >> >> >>>> For
>>> >> >> >> >> >> >> >> >>>> >> example, your sentence could be also
>>> "Microsoft
>>> >> >> posted
>>> >> >> >> >> >> >> >> >>>> >> its
>>> >> >> >> >> >> 2013
>>> >> >> >> >> >> >> >> >>>> earnings.
>>> >> >> >> >> >> >> >> >>>> >> The Redmond's company made a huge profit".
>>> So,
>>> >> maybe
>>> >> >> >> >> >> including
>>> >> >> >> >> >> >> more
>>> >> >> >> >> >> >> >> >>>> >> contextual information from dbpedia could
>>> >> increase
>>> >> >> the
>>> >> >> >> >> recall
>>> >> >> >> >> >> >> but
>>> >> >> >> >> >> >> >> of
>>> >> >> >> >> >> >> >> >>>> course
>>> >> >> >> >> >> >> >> >>>> >> will reduce the precision.
>>> >> >> >> >> >> >> >> >>>> >>
>>> >> >> >> >> >> >> >> >>>> >> Cheers,
>>> >> >> >> >> >> >> >> >>>> >> Rafa
>>> >> >> >> >> >> >> >> >>>> >>
>>> >> >> >> >> >> >> >> >>>> >> El 04/02/14 09:50, Cristian Petroaca
>>> escribió:
>>> >> >> >> >> >> >> >> >>>> >>
>>> >> >> >> >> >> >> >> >>>> >>  Back with a more detailed description of the
>>> >> steps
>>> >> >> >> >> >> >> >> >>>> >> for
>>> >> >> >> >> >> making
>>> >> >> >> >> >> >> this
>>> >> >> >> >> >> >> >> >>>> kind of
>>> >> >> >> >> >> >> >> >>>> >>> coreference work.
>>> >> >> >> >> >> >> >> >>>> >>>
>>> >> >> >> >> >> >> >> >>>> >>> I will be using references to the following
>>> >> text in
>>> >> >> >> >> >> >> >> >>>> >>> the
>>> >> >> >> >> >> steps
>>> >> >> >> >> >> >> >> below
>>> >> >> >> >> >> >> >> >>>> in
>>> >> >> >> >> >> >> >> >>>> >>> order to make things clearer : "Microsoft
>>> posted
>>> >> >> its
>>> >> >> >> >> >> >> >> >>>> >>> 2013
>>> >> >> >> >> >> >> >> earnings.
>>> >> >> >> >> >> >> >> >>>> The
>>> >> >> >> >> >> >> >> >>>> >>> software company made a huge profit."
>>> >> >> >> >> >> >> >> >>>> >>>
>>> >> >> >> >> >> >> >> >>>> >>> 1. For every noun phrase in the text which
>>> has :
>>> >> >> >> >> >> >> >> >>>> >>>      a. a determinate pos which implies
>>> >> reference
>>> >> >> to
>>> >> >> >> >> >> >> >> >>>> >>> an
>>> >> >> >> >> >> entity
>>> >> >> >> >> >> >> >> local
>>> >> >> >> >> >> >> >> >>>> to
>>> >> >> >> >> >> >> >> >>>> >>> the
>>> >> >> >> >> >> >> >> >>>> >>> text, such as "the, this, these") but not
>>> >> "another,
>>> >> >> >> >> every",
>>> >> >> >> >> >> etc
>>> >> >> >> >> >> >> >> which
>>> >> >> >> >> >> >> >> >>>> >>> implies a reference to an entity outside of
>>> the
>>> >> >> text.
>>> >> >> >> >> >> >> >> >>>> >>>      b. having at least another noun aside
>>> from
>>> >> the
>>> >> >> >> >> >> >> >> >>>> >>> main
>>> >> >> >> >> >> >> required
>>> >> >> >> >> >> >> >> >>>> noun
>>> >> >> >> >> >> >> >> >>>> >>> which
>>> >> >> >> >> >> >> >> >>>> >>> further describes it. For example I will not
>>> >> count
>>> >> >> >> >> >> >> >> >>>> >>> "The
>>> >> >> >> >> >> >> company"
>>> >> >> >> >> >> >> >> as
>>> >> >> >> >> >> >> >> >>>> being
>>> >> >> >> >> >> >> >> >>>> >>> a
>>> >> >> >> >> >> >> >> >>>> >>> legitimate candidate since this could
>>> create a
>>> >> lot
>>> >> >> of
>>> >> >> >> >> false
>>> >> >> >> >> >> >> >> >>>> positives by
>>> >> >> >> >> >> >> >> >>>> >>> considering the double meaning of some words
>>> >> such
>>> >> >> as
>>> >> >> >> >> >> >> >> >>>> >>> "in
>>> >> >> >> >> the
>>> >> >> >> >> >> >> >> company
>>> >> >> >> >> >> >> >> >>>> of
>>> >> >> >> >> >> >> >> >>>> >>> good people".
>>> >> >> >> >> >> >> >> >>>> >>> "The software company" is a good candidate
>>> >> since we
>>> >> >> >> >> >> >> >> >>>> >>> also
>>> >> >> >> >> >> have
>>> >> >> >> >> >> >> >> >>>> "software".
>>> >> >> >> >> >> >> >> >>>> >>>
>>> >> >> >> >> >> >> >> >>>> >>> 2. match the nouns in the noun phrase to the
>>> >> >> contents
>>> >> >> >> >> >> >> >> >>>> >>> of
>>> >> >> >> >> the
>>> >> >> >> >> >> >> >> dbpedia
>>> >> >> >> >> >> >> >> >>>> >>> categories of each named entity found prior
>>> to
>>> >> the
>>> >> >> >> >> location
>>> >> >> >> >> >> of
>>> >> >> >> >> >> >> the
>>> >> >> >> >> >> >> >> >>>> noun
>>> >> >> >> >> >> >> >> >>>> >>> phrase in the text.
>>> >> >> >> >> >> >> >> >>>> >>> The dbpedia categories are in the following
>>> >> format
>>> >> >> >> >> >> >> >> >>>> >>> (for
>>> >> >> >> >> >> >> Microsoft
>>> >> >> >> >> >> >> >> for
>>> >> >> >> >> >> >> >> >>>> >>> example) : "Software companies of the United
>>> >> >> States".
>>> >> >> >> >> >> >> >> >>>> >>>   So we try to match "software company" with
>>> >> that.
>>> >> >> >> >> >> >> >> >>>> >>> First, as you can see, the main noun in the
>>> >> dbpedia
>>> >> >> >> >> category
>>> >> >> >> >> >> >> has a
>>> >> >> >> >> >> >> >> >>>> plural
>>> >> >> >> >> >> >> >> >>>> >>> form and it's the same for all categories
>>> which
>>> >> I
>>> >> >> >> >> >> >> >> >>>> >>> saw. I
>>> >> >> >> >> >> don't
>>> >> >> >> >> >> >> >> know
>>> >> >> >> >> >> >> >> >>>> if
>>> >> >> >> >> >> >> >> >>>> >>> there's an easier way to do this but I
>>> thought
>>> >> of
>>> >> >> >> >> applying a
>>> >> >> >> >> >> >> >> >>>> lemmatizer on
>>> >> >> >> >> >> >> >> >>>> >>> the category and the noun phrase in order
>>> for
>>> >> them
>>> >> >> to
>>> >> >> >> >> have a
>>> >> >> >> >> >> >> >> common
>>> >> >> >> >> >> >> >> >>>> >>> denominator.This also works if the noun
>>> phrase
>>> >> >> itself
>>> >> >> >> >> has a
>>> >> >> >> >> >> >> plural
>>> >> >> >> >> >> >> >> >>>> form.
>>> >> >> >> >> >> >> >> >>>> >>>
>>> >> >> >> >> >> >> >> >>>> >>> Second, I'll need to use for comparison
>>> only the
>>> >> >> >> >> >> >> >> >>>> >>> words in
>>> >> >> >> >> >> the
>>> >> >> >> >> >> >> >> >>>> category
>>> >> >> >> >> >> >> >> >>>> >>> which are themselves nouns and not
>>> prepositions
>>> >> or
>>> >> >> >> >> >> determiners
>>> >> >> >> >> >> >> >> such
>>> >> >> >> >> >> >> >> >>>> as "of
>>> >> >> >> >> >> >> >> >>>> >>> the".This means that I need to pos tag the
>>> >> >> categories
>>> >> >> >> >> >> contents
>>> >> >> >> >> >> >> as
>>> >> >> >> >> >> >> >> >>>> well.
>>> >> >> >> >> >> >> >> >>>> >>> I was thinking of running the pos and lemma
>>> on
>>> >> the
>>> >> >> >> >> dbpedia
>>> >> >> >> >> >> >> >> >>>> categories when
>>> >> >> >> >> >> >> >> >>>> >>> building the dbpedia backed entity hub and
>>> >> storing
>>> >> >> >> >> >> >> >> >>>> >>> them
>>> >> >> >> >> for
>>> >> >> >> >> >> >> later
>>> >> >> >> >> >> >> >> >>>> use - I
>>> >> >> >> >> >> >> >> >>>> >>> don't know how feasible this is at the
>>> moment.
>>> >> >> >> >> >> >> >> >>>> >>>
>>> >> >> >> >> >> >> >> >>>> >>> After this I can compare each noun in the
>>> noun
>>> >> >> phrase
>>> >> >> >> >> with
>>> >> >> >> >> >> the
>>> >> >> >> >> >> >> >> >>>> equivalent
>>> >> >> >> >> >> >> >> >>>> >>> nouns in the categories and based on the
>>> number
>>> >> of
>>> >> >> >> >> matches I
>>> >> >> >> >> >> >> can
>>> >> >> >> >> >> >> >> >>>> create a
>>> >> >> >> >> >> >> >> >>>> >>> confidence level.
>>> >> >> >> >> >> >> >> >>>> >>>
>>> >> >> >> >> >> >> >> >>>> >>> 3. match the noun of the noun phrase with
>>> the
>>> >> >> >> >> >> >> >> >>>> >>> rdf:type
>>> >> >> >> >> from
>>> >> >> >> >> >> >> >> dbpedia
>>> >> >> >> >> >> >> >> >>>> of the
>>> >> >> >> >> >> >> >> >>>> >>> named entity. If this matches increase the
>>> >> >> confidence
>>> >> >> >> >> level.
>>> >> >> >> >> >> >> >> >>>> >>>
>>> >> >> >> >> >> >> >> >>>> >>> 4. If there are multiple named entities
>>> which
>>> >> can
>>> >> >> >> >> >> >> >> >>>> >>> match a
>>> >> >> >> >> >> >> certain
>>> >> >> >> >> >> >> >> >>>> noun
>>> >> >> >> >> >> >> >> >>>> >>> phrase then link the noun phrase with the
>>> >> closest
>>> >> >> >> >> >> >> >> >>>> >>> named
>>> >> >> >> >> >> entity
>>> >> >> >> >> >> >> >> prior
>>> >> >> >> >> >> >> >> >>>> to it
>>> >> >> >> >> >> >> >> >>>> >>> in the text.
>>> >> >> >> >> >> >> >> >>>> >>>
>>> >> >> >> >> >> >> >> >>>> >>> What do you think?
>>> >> >> >> >> >> >> >> >>>> >>>
>>> >> >> >> >> >> >> >> >>>> >>> Cristian
>>> >> >> >> >> >> >> >> >>>> >>>
>>> >> >> >> >> >> >> >> >>>> >>> 2014-01-31 Cristian Petroaca <
>>> >> >> >> >> cristian.petroaca@gmail.com>:
>>> >> >> >> >> >> >> >> >>>> >>>
>>> >> >> >> >> >> >> >> >>>> >>>  Hi Rafa,
>>> >> >> >> >> >> >> >> >>>> >>>>
>>> >> >> >> >> >> >> >> >>>> >>>> I don't yet have a concrete heursitic but
>>> I'm
>>> >> >> >> >> >> >> >> >>>> >>>> working on
>>> >> >> >> >> >> it.
>>> >> >> >> >> >> >> I'll
>>> >> >> >> >> >> >> >> >>>> provide
>>> >> >> >> >> >> >> >> >>>> >>>> it here so that you guys can give me a
>>> >> feedback on
>>> >> >> >> >> >> >> >> >>>> >>>> it.
>>> >> >> >> >> >> >> >> >>>> >>>>
>>> >> >> >> >> >> >> >> >>>> >>>> What are "locality" features?
>>> >> >> >> >> >> >> >> >>>> >>>>
>>> >> >> >> >> >> >> >> >>>> >>>> I looked at Bart and other coref tools
>>> such as
>>> >> >> >> >> >> >> >> >>>> >>>> ArkRef
>>> >> >> >> >> and
>>> >> >> >> >> >> >> >> >>>> CherryPicker
>>> >> >> >> >> >> >> >> >>>> >>>> and
>>> >> >> >> >> >> >> >> >>>> >>>> they don't provide such a coreference.
>>> >> >> >> >> >> >> >> >>>> >>>>
>>> >> >> >> >> >> >> >> >>>> >>>> Cristian
>>> >> >> >> >> >> >> >> >>>> >>>>
>>> >> >> >> >> >> >> >> >>>> >>>>
>>> >> >> >> >> >> >> >> >>>> >>>> 2014-01-30 Rafa Haro <rh...@apache.org>:
>>> >> >> >> >> >> >> >> >>>> >>>>
>>> >> >> >> >> >> >> >> >>>> >>>> Hi Cristian,
>>> >> >> >> >> >> >> >> >>>> >>>>
>>> >> >> >> >> >> >> >> >>>> >>>>> Without having more details about your
>>> >> concrete
>>> >> >> >> >> heuristic,
>>> >> >> >> >> >> >> in my
>>> >> >> >> >> >> >> >> >>>> honest
>>> >> >> >> >> >> >> >> >>>> >>>>> opinion, such approach could produce a
>>> lot of
>>> >> >> false
>>> >> >> >> >> >> >> positives. I
>>> >> >> >> >> >> >> >> >>>> don't
>>> >> >> >> >> >> >> >> >>>> >>>>> know
>>> >> >> >> >> >> >> >> >>>> >>>>> if you are planning to use some "locality"
>>> >> >> features
>>> >> >> >> >> >> >> >> >>>> >>>>> to
>>> >> >> >> >> >> detect
>>> >> >> >> >> >> >> >> such
>>> >> >> >> >> >> >> >> >>>> >>>>> coreferences but you need to take into
>>> account
>>> >> >> that
>>> >> >> >> >> >> >> >> >>>> >>>>> it
>>> >> >> >> >> is
>>> >> >> >> >> >> >> quite
>>> >> >> >> >> >> >> >> >>>> usual
>>> >> >> >> >> >> >> >> >>>> >>>>> that
>>> >> >> >> >> >> >> >> >>>> >>>>> coreferenced mentions can occurs even in
>>> >> >> different
>>> >> >> >> >> >> >> paragraphs.
>>> >> >> >> >> >> >> >> >>>> Although
>>> >> >> >> >> >> >> >> >>>> >>>>> I'm
>>> >> >> >> >> >> >> >> >>>> >>>>> not an expert in Natural Language
>>> >> Understanding,
>>> >> >> I
>>> >> >> >> >> would
>>> >> >> >> >> >> say
>>> >> >> >> >> >> >> it
>>> >> >> >> >> >> >> >> is
>>> >> >> >> >> >> >> >> >>>> quite
>>> >> >> >> >> >> >> >> >>>> >>>>> difficult to get decent precision/recall
>>> rates
>>> >> >> for
>>> >> >> >> >> >> >> coreferencing
>>> >> >> >> >> >> >> >> >>>> using
>>> >> >> >> >> >> >> >> >>>> >>>>> fixed rules. Maybe you can give a try to
>>> >> others
>>> >> >> >> >> >> >> >> >>>> >>>>> tools
>>> >> >> >> >> like
>>> >> >> >> >> >> >> BART
>>> >> >> >> >> >> >> >> (
>>> >> >> >> >> >> >> >> >>>> >>>>> http://www.bart-coref.org/).
>>> >> >> >> >> >> >> >> >>>> >>>>>
>>> >> >> >> >> >> >> >> >>>> >>>>> Cheers,
>>> >> >> >> >> >> >> >> >>>> >>>>> Rafa Haro
>>> >> >> >> >> >> >> >> >>>> >>>>>
>>> >> >> >> >> >> >> >> >>>> >>>>> El 30/01/14 10:33, Cristian Petroaca
>>> escribió:
>>> >> >> >> >> >> >> >> >>>> >>>>>
>>> >> >> >> >> >> >> >> >>>> >>>>>   Hi,
>>> >> >> >> >> >> >> >> >>>> >>>>>
>>> >> >> >> >> >> >> >> >>>> >>>>>> One of the necessary steps for
>>> implementing
>>> >> the
>>> >> >> >> >> >> >> >> >>>> >>>>>> Event
>>> >> >> >> >> >> >> >> extraction
>>> >> >> >> >> >> >> >> >>>> Engine
>>> >> >> >> >> >> >> >> >>>> >>>>>> feature :
>>> >> >> >> >> >> >> https://issues.apache.org/jira/browse/STANBOL-1121is
>>> >> >> >> >> >> >> >> >>>> to
>>> >> >> >> >> >> >> >> >>>> >>>>>> have
>>> >> >> >> >> >> >> >> >>>> >>>>>> coreference resolution in the given text.
>>> >> This
>>> >> >> is
>>> >> >> >> >> >> provided
>>> >> >> >> >> >> >> now
>>> >> >> >> >> >> >> >> >>>> via the
>>> >> >> >> >> >> >> >> >>>> >>>>>> stanford-nlp project but as far as I saw
>>> this
>>> >> >> >> >> >> >> >> >>>> >>>>>> module
>>> >> >> >> >> is
>>> >> >> >> >> >> >> >> performing
>>> >> >> >> >> >> >> >> >>>> >>>>>> mostly
>>> >> >> >> >> >> >> >> >>>> >>>>>> pronomial (He, She) or nominal (Barack
>>> Obama
>>> >> and
>>> >> >> >> >> >> >> >> >>>> >>>>>> Mr.
>>> >> >> >> >> >> Obama)
>>> >> >> >> >> >> >> >> >>>> coreference
>>> >> >> >> >> >> >> >> >>>> >>>>>> resolution.
>>> >> >> >> >> >> >> >> >>>> >>>>>>
>>> >> >> >> >> >> >> >> >>>> >>>>>> In order to get more coreferences from
>>> the
>>> >> text
>>> >> >> I
>>> >> >> >> >> though
>>> >> >> >> >> >> of
>>> >> >> >> >> >> >> >> >>>> creating
>>> >> >> >> >> >> >> >> >>>> >>>>>> some
>>> >> >> >> >> >> >> >> >>>> >>>>>> logic that would detect this kind of
>>> >> >> coreference :
>>> >> >> >> >> >> >> >> >>>> >>>>>> "Apple reaches new profit heights. The
>>> >> software
>>> >> >> >> >> company
>>> >> >> >> >> >> just
>>> >> >> >> >> >> >> >> >>>> announced
>>> >> >> >> >> >> >> >> >>>> >>>>>> its
>>> >> >> >> >> >> >> >> >>>> >>>>>> 2013 earnings."
>>> >> >> >> >> >> >> >> >>>> >>>>>> Here "The software company" obviously
>>> refers
>>> >> to
>>> >> >> >> >> "Apple".
>>> >> >> >> >> >> >> >> >>>> >>>>>> So I'd like to detect coreferences of
>>> Named
>>> >> >> >> >> >> >> >> >>>> >>>>>> Entities
>>> >> >> >> >> >> which
>>> >> >> >> >> >> >> are
>>> >> >> >> >> >> >> >> of
>>> >> >> >> >> >> >> >> >>>> the
>>> >> >> >> >> >> >> >> >>>> >>>>>> rdf:type of the Named Entity , in this
>>> case
>>> >> >> >> >> >> >> >> >>>> >>>>>> "company"
>>> >> >> >> >> and
>>> >> >> >> >> >> >> also
>>> >> >> >> >> >> >> >> >>>> have
>>> >> >> >> >> >> >> >> >>>> >>>>>> attributes which can be found in the
>>> dbpedia
>>> >> >> >> >> categories
>>> >> >> >> >> >> of
>>> >> >> >> >> >> >> the
>>> >> >> >> >> >> >> >> >>>> named
>>> >> >> >> >> >> >> >> >>>> >>>>>> entity, in this case "software".
>>> >> >> >> >> >> >> >> >>>> >>>>>>
>>> >> >> >> >> >> >> >> >>>> >>>>>> The detection of coreferences such as
>>> "The
>>> >> >> >> >> >> >> >> >>>> >>>>>> software
>>> >> >> >> >> >> >> company" in
>>> >> >> >> >> >> >> >> >>>> the
>>> >> >> >> >> >> >> >> >>>> >>>>>> text
>>> >> >> >> >> >> >> >> >>>> >>>>>> would also be done by either using the
>>> new
>>> >> Pos
>>> >> >> Tag
>>> >> >> >> >> Based
>>> >> >> >> >> >> >> Phrase
>>> >> >> >> >> >> >> >> >>>> >>>>>> extraction
>>> >> >> >> >> >> >> >> >>>> >>>>>> Engine (noun phrases) or by using a
>>> >> dependency
>>> >> >> >> >> >> >> >> >>>> >>>>>> tree of
>>> >> >> >> >> >> the
>>> >> >> >> >> >> >> >> >>>> sentence and
>>> >> >> >> >> >> >> >> >>>> >>>>>> picking up only subjects or objects.
>>> >> >> >> >> >> >> >> >>>> >>>>>>
>>> >> >> >> >> >> >> >> >>>> >>>>>> At this point I'd like to know if this
>>> kind
>>> >> of
>>> >> >> >> >> >> >> >> >>>> >>>>>> logic
>>> >> >> >> >> >> would
>>> >> >> >> >> >> >> be
>>> >> >> >> >> >> >> >> >>>> useful
>>> >> >> >> >> >> >> >> >>>> >>>>>> as a
>>> >> >> >> >> >> >> >> >>>> >>>>>> separate Enhancement Engine (in case the
>>> >> >> precision
>>> >> >> >> >> >> >> >> >>>> >>>>>> and
>>> >> >> >> >> >> >> recall
>>> >> >> >> >> >> >> >> are
>>> >> >> >> >> >> >> >> >>>> good
>>> >> >> >> >> >> >> >> >>>> >>>>>> enough) in Stanbol?
>>> >> >> >> >> >> >> >> >>>> >>>>>>
>>> >> >> >> >> >> >> >> >>>> >>>>>> Thanks,
>>> >> >> >> >> >> >> >> >>>> >>>>>> Cristian
>>> >> >> >> >> >> >> >> >>>> >>>>>>
>>> >> >> >> >> >> >> >> >>>> >>>>>>
>>> >> >> >> >> >> >> >> >>>> >>>>>>
>>> >> >> >> >> >> >> >> >>>> >>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Cristian Petroaca <cr...@gmail.com>.
Hi guys,

After Rupert's last suggestions related to this enhancement engine, I
devised a more comprehensive algorithm for matching the noun phrases
against the NER properties. Please take a look and let me know what you
think. Thanks.

The following rules will be applied to every noun phrase in order to find
co-references (rough code sketches for some of the rules are given after
the list):

1. For each NER prior to the current noun phrase in the text match the
yago:class label to the contents of the noun phrase.

For the NERs which have a yago:class which matches, apply:

2. Group membership rules:

    a. spatial membership: the NER is part of a Location. If the noun
phrase contains a LOCATION or a demonym, check the location properties
of the matching NER.

    If the matching NER is a:
    - person, match against :birthPlace, :region, :nationality
    - organisation, match against :foundationPlace, :locationCity,
:location, :hometown
    - place, match against :country, :subdivisionName, :location

    Ex: The Italian President, The Richmond-based company

    b. organisational membership: the NER is part of an Organisation. If
the noun phrase contains an ORGANISATION, check the following
properties of the matching NER:

    If the matching NER is a:
    - person, match against :occupation, :associatedActs
    - organisation ?
    - location ?

Ex: The Microsoft executive, The Pink Floyd singer

3. Functional description rule: the noun phrase describes what the NER does
conceptually.
If there are no NERs in the noun phrase, match the following properties
of the matching NER against the contents of the noun phrase (aside from
the nouns which are part of the yago:class):

   If the NER is a:
   - person ?
   - organisation, match against :service, :industry, :genre
   - location ?

Ex: The software company.

4. If no matches were found for the current NER with rules 2 or 3, but the
yago:class which matched contains more than 2 nouns, we still consider this
a good co-reference, possibly with a lower confidence.

Ex: The former tennis player, the theoretical physicist.

5. Based on the number of nouns which matched, we compute a confidence
level. The number of matched nouns cannot be lower than 2 and there must
be a yago:class match.

Of all NERs which got to this point, select the ones closest in the text
to the noun phrase which matched against the same properties (yago:class
and dbpedia) and mark them as co-references.

Note: all noun phrases need to be lemmatized before any of this is applied,
in case they contain plurals.
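
To make the rules above a bit more concrete, here is a minimal, illustrative
Java sketch of how they could fit together. It is not engine code: all type
and helper names (NerMention, resolve, ...) are made up, and it assumes the
yago:class label nouns and the relevant dbpedia property values have already
been fetched (e.g. from the dereferenced Entityhub data) and lemmatized.

import java.util.*;

public class CorefRuleSketch {

    /** Minimal stand-in for a NER found earlier in the text (hypothetical). */
    static class NerMention {
        int offset;                                  // start offset of the NER in the text
        Set<String> yagoClassNouns = new HashSet<>(); // lemmatized nouns of the yago:class label
        Map<String, Set<String>> props = new HashMap<>(); // dbpedia property -> lemmatized value nouns
    }

    /** Returns the best matching antecedent for a noun phrase, or null. */
    static NerMention resolve(List<String> phraseNouns, int phraseOffset,
                              List<NerMention> previousNers) {
        NerMention best = null;
        int bestScore = 0;
        for (NerMention ner : previousNers) {
            if (ner.offset >= phraseOffset) {
                continue;                             // only NERs prior to the noun phrase
            }
            Set<String> classMatches = new HashSet<>(phraseNouns);
            classMatches.retainAll(ner.yagoClassNouns);   // rule 1: yago:class match
            if (classMatches.isEmpty()) {
                continue;
            }
            int score = classMatches.size();
            for (Set<String> values : ner.props.values()) {   // rules 2 and 3
                for (String noun : phraseNouns) {
                    if (values.contains(noun)) {
                        score++;
                    }
                }
            }
            // rule 4: accept a pure yago:class match if the label has more than 2 nouns;
            // rule 5: otherwise require at least 2 matched nouns in total
            if (score < 2 && ner.yagoClassNouns.size() <= 2) {
                continue;
            }
            // rule 5: on equal score prefer the NER closest to the noun phrase
            if (score > bestScore
                    || (score == bestScore && best != null && ner.offset > best.offset)) {
                best = ner;
                bestScore = score;
            }
        }
        return best;
    }
}

Applied to the earlier example, "The software company" following a Microsoft
NER would match on both the yago:class label and dbpprop:industry = Software.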


2014-03-25 20:50 GMT+02:00 Cristian Petroaca <cr...@gmail.com>:

> That worked. Thanks.
>
> So, there are no exceptions during the startup of the launcher.
> The component tab in the felix console shows 6 WeightedChains the first
> time, including the default one but after my changes and a restart there
> are only 5 - the default one is missing altogether.
>
>
> 2014-03-24 20:18 GMT+02:00 Rupert Westenthaler <
> rupert.westenthaler@gmail.com>:
>
> Hi Cristian,
>>
>> I do see the same problem since last Friday. The solution as mentions
>> by [1] works for me.
>>
>>     mvn -Djsse.enableSNIExtension=false {goals}
>>
>> No Idea why https connections to github do currently cause this. I
>> could not find anything related via Google. So I suggest to use the
>> system property for now. If this persists for longer we can adapt the
>> build files accordingly.
>>
>> best
>> Rupert
>>
>>
>>
>>
>> [1]
>> http://stackoverflow.com/questions/7615645/ssl-handshake-alert-unrecognized-name-error-since-upgrade-to-java-1-7-0
>>
>> On Mon, Mar 24, 2014 at 7:01 PM, Cristian Petroaca
>> <cr...@gmail.com> wrote:
>> > I did a clean on the whole project and now I wanted to do another "mvn
>> > clean install" but I am getting this :
>> >
>> > "[INFO]
>> > ------------------------------------------------------------------------
>> > [ERROR] Failed to execute goal
>> > org.apache.maven.plugins:maven-antrun-plugin:1.6:
>> > run (download) on project org.apache.stanbol.data.opennlp.lang.es: An
>> Ant
>> > BuildE
>> > xception has occured: The following error occurred while executing this
>> > line:
>> > [ERROR]
>> > C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\download_models.xml:3
>> > 3: Failed to copy
>> > https://github.com/utcompling/OpenNLP-Models/raw/58ef0c6003140
>> > 3e66e47ae35edaf58d3478b67af/models/es/opennlp-es-maxent-pos-es.bin to
>> > C:\Data\Pr
>> >
>> ojects\Stanbol\main\data\opennlp\lang\es\downloads\resources\org\apache\stanbol\
>> > data\opennlp\es-pos-maxent.bin due to javax.net.ssl.SSLProtocolException
>> > handshake alert : unrecognized_name"
>> >
>> >
>> >
>> > 2014-03-20 11:25 GMT+02:00 Rupert Westenthaler <
>> > rupert.westenthaler@gmail.com>:
>> >
>> >> Hi Cristian,
>> >>
>> >> On Thu, Mar 20, 2014 at 10:00 AM, Cristian Petroaca
>> >> <cr...@gmail.com> wrote:
>> >> >
>> >>
>> stanbol.enhancer.chain.weighted.chain=["tika;optional","langdetect","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-ner","dbpediaLinking","entityhubExtraction","dbpedia-dereference","pos-chunker"]
>> >> > service.ranking=I"-2147483648"
>> >> > stanbol.enhancer.chain.name="default"
>> >>
>> >> Does look fine to me. Do you see any exception during the startup of
>> >> the launcher. Can you check the status of this component in the
>> >> component tab of the felix web console [1] (search for
>> >> "org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain"). If
>> >> you have multiple you can find the correct one by comparing the
>> >> "Properties" with those in the configuration file.
>> >>
>> >> I guess that the according service is in the 'unsatisfied' as you do
>> >> not see it in the web interface. But if this is the case you should
>> >> also see the according exception in the log. You can also manually
>> >> stop/start the component. In this case the exception should be
>> >> re-thrown and you do not need to search the log for it.
>> >>
>> >> best
>> >> Rupert
>> >>
>> >>
>> >> [1] http://localhost:8080/system/console/components
>> >>
>> >> >
>> >> >
>> >> >
>> >> > 2014-03-20 7:39 GMT+02:00 Rupert Westenthaler <
>> >> rupert.westenthaler@gmail.com
>> >> >>:
>> >> >
>> >> >> Hi Cristian,
>> >> >>
>> >> >> you can not send attachments to the list. Please copy the contents
>> >> >> directly to the mail
>> >> >>
>> >> >> thx
>> >> >> Rupert
>> >> >>
>> >> >> On Wed, Mar 19, 2014 at 9:20 PM, Cristian Petroaca
>> >> >> <cr...@gmail.com> wrote:
>> >> >> > The config attached.
>> >> >> >
>> >> >> >
>> >> >> > 2014-03-19 9:09 GMT+02:00 Rupert Westenthaler
>> >> >> > <ru...@gmail.com>:
>> >> >> >
>> >> >> >> Hi Cristian,
>> >> >> >>
>> >> >> >> can you provide the contents of the chain after your
>> modifications?
>> >> >> >> Would be interesting to test why the chain is no longer active
>> after
>> >> >> >> the restart.
>> >> >> >>
>> >> >> >> You can find the config file in the 'stanbol/fileinstall' folder.
>> >> >> >>
>> >> >> >> best
>> >> >> >> Rupert
>> >> >> >>
>> >> >> >> On Tue, Mar 18, 2014 at 8:24 PM, Cristian Petroaca
>> >> >> >> <cr...@gmail.com> wrote:
>> >> >> >> > Related to the default chain selection rules : before restart I
>> >> had a
>> >> >> >> > chain
>> >> >> >> > with the name 'default' as in I could access it via
>> >> >> >> > enhancer/chain/default.
>> >> >> >> > Then I just added another engine to the 'default' chain. I
>> assumed
>> >> >> that
>> >> >> >> > after the restart the chain with the 'default' name would be
>> >> >> persisted.
>> >> >> >> > So
>> >> >> >> > the first rule should have been applied after the restart as
>> well.
>> >> But
>> >> >> >> > instead I cannot reach it via enhancer/chain/default anymore
>> so its
>> >> >> >> > gone.
>> >> >> >> > Anyway, this is not a big deal, it's not blocking me in any
>> way, I
>> >> >> just
>> >> >> >> > wanted to understand where the problem is.
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > 2014-03-18 7:15 GMT+02:00 Rupert Westenthaler
>> >> >> >> > <rupert.westenthaler@gmail.com
>> >> >> >> >>:
>> >> >> >> >
>> >> >> >> >> Hi Cristian
>> >> >> >> >>
>> >> >> >> >> On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca
>> >> >> >> >> <cr...@gmail.com> wrote:
>> >> >> >> >> > 1. Updated to the latest code and it's gone. Cool
>> >> >> >> >> >
>> >> >> >> >> > 2. I start the stable launcher -> create a new instance of
>> the
>> >> >> >> >> > PosChunkerEngine -> add it to the default chain. At this
>> point
>> >> >> >> >> > everything
>> >> >> >> >> > looks good and works ok.
>> >> >> >> >> > After I restart the server the default chain is gone and
>> >> instead I
>> >> >> >> >> > see
>> >> >> >> >> this
>> >> >> >> >> > in the enhancement chains page : all-active (default, id:
>> 149,
>> >> >> >> >> > ranking:
>> >> >> >> >> 0,
>> >> >> >> >> > impl: AllActiveEnginesChain ). all-active did not contain
>> the
>> >> >> >> >> > 'default'
>> >> >> >> >> > word before the restart.
>> >> >> >> >> >
>> >> >> >> >>
>> >> >> >> >> Please note the default chain selection rules as described at
>> [1].
>> >> >> You
>> >> >> >> >> can also access chains chains under
>> '/enhancer/chain/{chain-name}'
>> >> >> >> >>
>> >> >> >> >> best
>> >> >> >> >> Rupert
>> >> >> >> >>
>> >> >> >> >> [1]
>> >> >> >> >>
>> >> >> >> >>
>> >> >>
>> >>
>> http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain
>> >> >> >> >>
>> >> >> >> >> > It looks like the config files are exactly what I need.
>> Thanks.
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> > 2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <
>> >> >> >> >> rupert.westenthaler@gmail.com
>> >> >> >> >> >>:
>> >> >> >> >> >
>> >> >> >> >> >> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
>> >> >> >> >> >> <cr...@gmail.com> wrote:
>> >> >> >> >> >> > Thanks Rupert.
>> >> >> >> >> >> >
>> >> >> >> >> >> > A couple more questions/issues :
>> >> >> >> >> >> >
>> >> >> >> >> >> > 1. Whenever I start the stanbol server I'm seeing this
>> in the
>> >> >> >> >> >> > console
>> >> >> >> >> >> > output :
>> >> >> >> >> >> >
>> >> >> >> >> >>
>> >> >> >> >> >> This should be fixed with STANBOL-1278 [1] [2]
>> >> >> >> >> >>
>> >> >> >> >> >> > 2. Whenever I restart the server the Weighted Chains get
>> >> messed
>> >> >> >> >> >> > up. I
>> >> >> >> >> >> > usually use the 'default' chain and add my engine to it
>> so
>> >> there
>> >> >> >> >> >> > are
>> >> >> >> >> 11
>> >> >> >> >> >> > engines in it. After the restart this chain now contains
>> >> around
>> >> >> 23
>> >> >> >> >> >> engines
>> >> >> >> >> >> > in total.
>> >> >> >> >> >>
>> >> >> >> >> >> I was not able to replicate this. What I tried was
>> >> >> >> >> >>
>> >> >> >> >> >> (1) start up the stable launcher
>> >> >> >> >> >> (2) add an additional engine to the default chain
>> >> >> >> >> >> (3) restart the launcher
>> >> >> >> >> >>
>> >> >> >> >> >> The default chain was not changed after (2) and (3). So I
>> would
>> >> >> need
>> >> >> >> >> >> further information for knowing why this is happening.
>> >> >> >> >> >>
>> >> >> >> >> >> Generally it is better to create you own chain instance as
>> >> >> modifying
>> >> >> >> >> >> one that is provided by the default configuration. I would
>> also
>> >> >> >> >> >> recommend that you keep your test configuration in text
>> files
>> >> and
>> >> >> to
>> >> >> >> >> >> copy those to the 'stanbol/fileinstall' folder. Doing so
>> >> prevent
>> >> >> you
>> >> >> >> >> >> from manually entering the configuration after a software
>> >> update.
>> >> >> >> >> >> The
>> >> >> >> >> >> production-mode section [3] provides information on how to
>> do
>> >> >> that.
>> >> >> >> >> >>
>> >> >> >> >> >> best
>> >> >> >> >> >> Rupert
>> >> >> >> >> >>
>> >> >> >> >> >> [1] https://issues.apache.org/jira/browse/STANBOL-1278
>> >> >> >> >> >> [2] http://svn.apache.org/r1576623
>> >> >> >> >> >> [3] http://stanbol.apache.org/docs/trunk/production-mode
>> >> >> >> >> >>
>> >> >> >> >> >> > ERROR: Bundle
>> org.apache.stanbol.enhancer.engine.topic.web
>> >> >> [153]:
>> >> >> >> >> Error
>> >> >> >> >> >> > starting
>> >> >> >> >> >> >
>> >> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >>
>> >>
>> slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\star
>> >> >> >> >> >> >
>> >> >> >> >> >> >
>> >> >>
>> tup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar
>> >> >> >> >> >> > (org.osgi
>> >> >> >> >> >> > .framework.BundleException: Unresolved constraint in
>> bundle
>> >> >> >> >> >> > org.apache.stanbol.e
>> >> >> >> >> >> > nhancer.engine.topic.web [153]: Unable to resolve 153.0:
>> >> missing
>> >> >> >> >> >> > requirement [15
>> >> >> >> >> >> > 3.0] package; (&(package=javax.ws.rs
>> >> >> >> >> >> )(version>=0.0.0)(!(version>=2.0.0))))
>> >> >> >> >> >> > org.osgi.framework.BundleException: Unresolved
>> constraint in
>> >> >> >> >> >> > bundle
>> >> >> >> >> >> > org.apache.s
>> >> >> >> >> >> > tanbol.enhancer.engine.topic.web [153]: Unable to resolve
>> >> 153.0:
>> >> >> >> >> missing
>> >> >> >> >> >> > require
>> >> >> >> >> >> > ment [153.0] package; (&(package=javax.ws.rs
>> >> >> >> >> >> > )(version>=0.0.0)(!(version>=2.0.0))
>> >> >> >> >> >> > )
>> >> >> >> >> >> >         at
>> >> >> >> >> >>
>> org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
>> >> >> >> >> >> >         at
>> >> >> >> >> org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
>> >> >> >> >> >> >         at
>> >> >> >> >> >> >
>> >> >> >> >> >> >
>> >> >>
>> org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
>> >> >> >> >> >> >
>> >> >> >> >> >> >         at
>> >> >> >> >> >> >
>> >> >> >> >> >> >
>> >> >>
>> org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264
>> >> >> >> >> >> > )
>> >> >> >> >> >> >         at java.lang.Thread.run(Unknown Source)
>> >> >> >> >> >> >
>> >> >> >> >> >> > Despite of this the server starts fine and I can use the
>> >> >> enhancer
>> >> >> >> >> fine.
>> >> >> >> >> >> Do
>> >> >> >> >> >> > you guys see this as well?
>> >> >> >> >> >> >
>> >> >> >> >> >> >
>> >> >> >> >> >> > 2. Whenever I restart the server the Weighted Chains get
>> >> messed
>> >> >> >> >> >> > up. I
>> >> >> >> >> >> > usually use the 'default' chain and add my engine to it
>> so
>> >> there
>> >> >> >> >> >> > are
>> >> >> >> >> 11
>> >> >> >> >> >> > engines in it. After the restart this chain now contains
>> >> around
>> >> >> 23
>> >> >> >> >> >> engines
>> >> >> >> >> >> > in total.
>> >> >> >> >> >> >
>> >> >> >> >> >> >
>> >> >> >> >> >> >
>> >> >> >> >> >> >
>> >> >> >> >> >> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <
>> >> >> >> >> >> rupert.westenthaler@gmail.com
>> >> >> >> >> >> >>:
>> >> >> >> >> >> >
>> >> >> >> >> >> >> Hi Cristian,
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> NER Annotations are typically available as both
>> >> >> >> >> >> >> NlpAnnotations.NER_ANNOTATION and  fise:TextAnnotation
>> [1]
>> >> in
>> >> >> the
>> >> >> >> >> >> >> enhancement metadata. As you are already accessing the
>> >> >> >> >> >> >> AnayzedText I
>> >> >> >> >> >> >> would prefer using the  NlpAnnotations.NER_ANNOTATION.
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> best
>> >> >> >> >> >> >> Rupert
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> [1]
>> >> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >>
>> >>
>> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
>> >> >> >> >> >> >> <cr...@gmail.com> wrote:
>> >> >> >> >> >> >> > Thanks.
>> >> >> >> >> >> >> > I assume I should get the Named entities using the
>> same
>> >> but
>> >> >> >> >> >> >> > with
>> >> >> >> >> >> >> > NlpAnnotations.NER_ANNOTATION?
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
>> >> >> >> >> >> >> > rupert.westenthaler@gmail.com>:
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> Hallo Cristian,
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> NounPhrases are not added to the RDF enhancement
>> results.
>> >> >> You
>> >> >> >> >> need to
>> >> >> >> >> >> >> >> use the AnalyzedText ContentPart [1]
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> here is some demo code you can use in the
>> >> computeEnhancement
>> >> >> >> >> method
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >>         AnalysedText at =
>> >> >> >> >> >> >> >> NlpEngineHelper.getAnalysedText(this,
>> >> >> >> >> ci,
>> >> >> >> >> >> >> true);
>> >> >> >> >> >> >> >>         Iterator<? extends Section> sections =
>> >> >> >> >> >> >> >> at.getSentences();
>> >> >> >> >> >> >> >>         if(!sections.hasNext()){ //process as single
>> >> >> sentence
>> >> >> >> >> >> >> >>             sections =
>> >> Collections.singleton(at).iterator();
>> >> >> >> >> >> >> >>         }
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >>         while(sections.hasNext()){
>> >> >> >> >> >> >> >>             Section section = sections.next();
>> >> >> >> >> >> >> >>             Iterator<Span> chunks =
>> >> >> >> >> >> >> >> section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
>> >> >> >> >> >> >> >>             while(chunks.hasNext()){
>> >> >> >> >> >> >> >>                 Span chunk = chunks.next();
>> >> >> >> >> >> >> >>                 Value<PhraseTag> phrase =
>> >> >> >> >> >> >> >>
>> chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
>> >> >> >> >> >> >> >>                 if(phrase.value().getCategory() ==
>> >> >> >> >> >> >> LexicalCategory.Noun){
>> >> >> >> >> >> >> >>                     log.info(" - NounPhrase [{},{}]
>> {}",
>> >> >> new
>> >> >> >> >> >> Object[]{
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> chunk.getStart(),chunk.getEnd(),chunk.getSpan()});
>> >> >> >> >> >> >> >>                 }
>> >> >> >> >> >> >> >>             }
>> >> >> >> >> >> >> >>         }
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> hope this helps
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> best
>> >> >> >> >> >> >> >> Rupert
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> [1]
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >>
>> >>
>> http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
>> >> >> >> >> >> >> >> <cr...@gmail.com> wrote:
>> >> >> >> >> >> >> >> > I started to implement the engine and I'm having
>> >> problems
>> >> >> >> >> >> >> >> > with
>> >> >> >> >> >> getting
>> >> >> >> >> >> >> >> > results for noun phrases. I modified the "default"
>> >> >> weighted
>> >> >> >> >> chain
>> >> >> >> >> >> to
>> >> >> >> >> >> >> also
>> >> >> >> >> >> >> >> > include the PosChunkerEngine and ran a sample text
>> :
>> >> >> "Angela
>> >> >> >> >> Merkel
>> >> >> >> >> >> >> >> visted
>> >> >> >> >> >> >> >> > China. The german chancellor met with various
>> people".
>> >> I
>> >> >> >> >> expected
>> >> >> >> >> >> that
>> >> >> >> >> >> >> >> the
>> >> >> >> >> >> >> >> > RDF XML output would contain some info about the
>> noun
>> >> >> >> >> >> >> >> > phrases
>> >> >> >> >> but I
>> >> >> >> >> >> >> >> cannot
>> >> >> >> >> >> >> >> > see any.
>> >> >> >> >> >> >> >> > Could you point me to the correct way to generate
>> the
>> >> noun
>> >> >> >> >> phrases?
>> >> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> > Thanks,
>> >> >> >> >> >> >> >> > Cristian
>> >> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <
>> >> >> >> >> >> >> >> cristian.petroaca@gmail.com>:
>> >> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> >> Opened
>> >> >> https://issues.apache.org/jira/browse/STANBOL-1279
>> >> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <
>> >> >> >> >> >> >> >> cristian.petroaca@gmail.com>
>> >> >> >> >> >> >> >> >> :
>> >> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> >> Hi Rupert,
>> >> >> >> >> >> >> >> >>>
>> >> >> >> >> >> >> >> >>> The "spatial" dimension is a good idea. I'll also
>> >> take a
>> >> >> >> >> >> >> >> >>> look
>> >> >> >> >> at
>> >> >> >> >> >> >> Yago.
>> >> >> >> >> >> >> >> >>>
>> >> >> >> >> >> >> >> >>> I will create a Jira with what we talked about
>> here.
>> >> It
>> >> >> >> >> >> >> >> >>> will
>> >> >> >> >> >> >> probably
>> >> >> >> >> >> >> >> >>> have just a draft-like description for now and
>> will
>> >> be
>> >> >> >> >> >> >> >> >>> updated
>> >> >> >> >> >> as I
>> >> >> >> >> >> >> go
>> >> >> >> >> >> >> >> >>> along.
>> >> >> >> >> >> >> >> >>>
>> >> >> >> >> >> >> >> >>> Thanks,
>> >> >> >> >> >> >> >> >>> Cristian
>> >> >> >> >> >> >> >> >>>
>> >> >> >> >> >> >> >> >>>
>> >> >> >> >> >> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <
>> >> >> >> >> >> >> >> >>> rupert.westenthaler@gmail.com>:
>> >> >> >> >> >> >> >> >>>
>> >> >> >> >> >> >> >> >>> Hi Cristian,
>> >> >> >> >> >> >> >> >>>>
>> >> >> >> >> >> >> >> >>>> definitely an interesting approach. You should
>> have
>> >> a
>> >> >> >> >> >> >> >> >>>> look at
>> >> >> >> >> >> Yago2
>> >> >> >> >> >> >> >> >>>> [1]. As far as I can remember the Yago taxonomy
>> is
>> >> much
>> >> >> >> >> better
>> >> >> >> >> >> >> >> >>>> structured as the one used by dbpedia. Mapping
>> >> >> >> >> >> >> >> >>>> suggestions of
>> >> >> >> >> >> >> dbpedia
>> >> >> >> >> >> >> >> >>>> to concepts in Yago2 is easy as both dbpedia and
>> >> yago2
>> >> >> do
>> >> >> >> >> >> provide
>> >> >> >> >> >> >> >> >>>> mappings [2] and [3]
>> >> >> >> >> >> >> >> >>>>
>> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
>> >> >> >> >> >> >> >> >>>> > <rh...@apache.org>:
>> >> >> >> >> >> >> >> >>>> >>
>> >> >> >> >> >> >> >> >>>> >> "Microsoft posted its 2013 earnings. The
>> >> Redmond's
>> >> >> >> >> >> >> >> >>>> >> company
>> >> >> >> >> >> made
>> >> >> >> >> >> >> a
>> >> >> >> >> >> >> >> >>>> >> huge profit".
>> >> >> >> >> >> >> >> >>>>
>> >> >> >> >> >> >> >> >>>> Thats actually a very good example. Spatial
>> contexts
>> >> >> are
>> >> >> >> >> >> >> >> >>>> very
>> >> >> >> >> >> >> >> >>>> important as they tend to be often used for
>> >> >> referencing.
>> >> >> >> >> >> >> >> >>>> So I
>> >> >> >> >> >> would
>> >> >> >> >> >> >> >> >>>> suggest to specially treat the spatial context.
>> For
>> >> >> >> >> >> >> >> >>>> spatial
>> >> >> >> >> >> >> Entities
>> >> >> >> >> >> >> >> >>>> (like a City) this is easy, but even for other
>> >> (like a
>> >> >> >> >> Person,
>> >> >> >> >> >> >> >> >>>> Company) you could use relations to spatial
>> entities
>> >> >> >> >> >> >> >> >>>> define
>> >> >> >> >> >> their
>> >> >> >> >> >> >> >> >>>> spatial context. This context could than be
>> used to
>> >> >> >> >> >> >> >> >>>> correctly
>> >> >> >> >> >> link
>> >> >> >> >> >> >> >> >>>> "The Redmond's company" to "Microsoft".
>> >> >> >> >> >> >> >> >>>>
>> >> >> >> >> >> >> >> >>>> In addition I would suggest to use the "spatial"
>> >> >> context
>> >> >> >> >> >> >> >> >>>> of
>> >> >> >> >> each
>> >> >> >> >> >> >> >> >>>> entity (basically relation to entities that are
>> >> cities,
>> >> >> >> >> regions,
>> >> >> >> >> >> >> >> >>>> countries) as a separate dimension, because
>> those
>> >> are
>> >> >> >> >> >> >> >> >>>> very
>> >> >> >> >> often
>> >> >> >> >> >> >> used
>> >> >> >> >> >> >> >> >>>> for coreferences.
>> >> >> >> >> >> >> >> >>>>
>> >> >> >> >> >> >> >> >>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
>> >> >> >> >> >> >> >> >>>> [2]
>> >> >> >> >> >> >> >> >>>>
>> >> >> http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
>> >> >> >> >> >> >> >> >>>> [3]
>> >> >> >> >> >> >> >> >>>>
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >>
>> >>
>> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
>> >> >> >> >> >> >> >> >>>>
>> >> >> >> >> >> >> >> >>>>
>> >> >> >> >> >> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian
>> Petroaca
>> >> >> >> >> >> >> >> >>>> <cr...@gmail.com> wrote:
>> >> >> >> >> >> >> >> >>>> > There are several dbpedia categories for each
>> >> entity,
>> >> >> >> >> >> >> >> >>>> > in
>> >> >> >> >> this
>> >> >> >> >> >> >> case
>> >> >> >> >> >> >> >> for
>> >> >> >> >> >> >> >> >>>> > Microsoft we have :
>> >> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
>> >> >> >> >> >> >> >> >>>> > category:Microsoft
>> >> >> >> >> >> >> >> >>>> >
>> category:Software_companies_of_the_United_States
>> >> >> >> >> >> >> >> >>>> >
>> >> >> category:Software_companies_based_in_Washington_(state)
>> >> >> >> >> >> >> >> >>>> > category:Companies_established_in_1975
>> >> >> >> >> >> >> >> >>>> >
>> category:1975_establishments_in_the_United_States
>> >> >> >> >> >> >> >> >>>> >
>> category:Companies_based_in_Redmond,_Washington
>> >> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >>
>> >> >> >> >> >> >>
>> >> >> category:Multinational_companies_headquartered_in_the_United_States
>> >> >> >> >> >> >> >> >>>> > category:Cloud_computing_providers
>> >> >> >> >> >> >> >> >>>> >
>> >> >> category:Companies_in_the_Dow_Jones_Industrial_Average
>> >> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >> >> >>>> > So we also have "Companies based in
>> >> >> Redmont,Washington"
>> >> >> >> >> which
>> >> >> >> >> >> >> could
>> >> >> >> >> >> >> >> be
>> >> >> >> >> >> >> >> >>>> > matched.
>> >> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >> >> >>>> > There is still other contextual information
>> from
>> >> >> >> >> >> >> >> >>>> > dbpedia
>> >> >> >> >> which
>> >> >> >> >> >> >> can
>> >> >> >> >> >> >> >> be
>> >> >> >> >> >> >> >> >>>> used.
>> >> >> >> >> >> >> >> >>>> > For example for an Organization we could also
>> >> >> include :
>> >> >> >> >> >> >> >> >>>> > dbpprop:industry = Software
>> >> >> >> >> >> >> >> >>>> > dbpprop:service = Online Service Providers
>> >> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >> >> >>>> > and for a Person (that's for Barack Obama) :
>> >> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >> >> >>>> > dbpedia-owl:profession:
>> >> >> >> >> >> >> >> >>>> >                                dbpedia:Author
>> >> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >> >> >>>> > dbpedia:Constitutional_law
>> >> >> >> >> >> >> >> >>>> >                                dbpedia:Lawyer
>> >> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >> >> >>>> > dbpedia:Community_organizing
>> >> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >> >> >>>> > I'd like to continue investigating this as I
>> think
>> >> >> that
>> >> >> >> >> >> >> >> >>>> > it
>> >> >> >> >> may
>> >> >> >> >> >> >> have
>> >> >> >> >> >> >> >> >>>> some
>> >> >> >> >> >> >> >> >>>> > value in increasing the number of coreference
>> >> >> >> >> >> >> >> >>>> > resolutions
>> >> >> >> >> and
>> >> >> >> >> >> I'd
>> >> >> >> >> >> >> >> like
>> >> >> >> >> >> >> >> >>>> to
>> >> >> >> >> >> >> >> >>>> > concentrate more on precision rather than
>> recall
>> >> >> since
>> >> >> >> >> >> >> >> >>>> > we
>> >> >> >> >> >> already
>> >> >> >> >> >> >> >> have
>> >> >> >> >> >> >> >> >>>> a
>> >> >> >> >> >> >> >> >>>> > set of coreferences detected by the stanford
>> nlp
>> >> tool
>> >> >> >> >> >> >> >> >>>> > and
>> >> >> >> >> this
>> >> >> >> >> >> >> would
>> >> >> >> >> >> >> >> >>>> be as
>> >> >> >> >> >> >> >> >>>> > an addition to that (at least this is how I
>> would
>> >> >> like
>> >> >> >> >> >> >> >> >>>> > to
>> >> >> >> >> use
>> >> >> >> >> >> >> it).
>> >> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >> >> >>>> > Is it ok if I track this by opening a jira? I
>> >> could
>> >> >> >> >> >> >> >> >>>> > update
>> >> >> >> >> it
>> >> >> >> >> >> to
>> >> >> >> >> >> >> >> show
>> >> >> >> >> >> >> >> >>>> my
>> >> >> >> >> >> >> >> >>>> > progress and also my conclusions and if it
>> turns
>> >> out
>> >> >> >> >> >> >> >> >>>> > that
>> >> >> >> >> it
>> >> >> >> >> >> was
>> >> >> >> >> >> >> a
>> >> >> >> >> >> >> >> bad
>> >> >> >> >> >> >> >> >>>> idea
>> >> >> >> >> >> >> >> >>>> > then that's the situation at least I'll end up
>> >> with
>> >> >> >> >> >> >> >> >>>> > more
>> >> >> >> >> >> >> knowledge
>> >> >> >> >> >> >> >> >>>> about
>> >> >> >> >> >> >> >> >>>> > Stanbol in the end :).
>> >> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
>> >> >> >> >> >> >> >> >>>> > <rh...@apache.org>:
>> >> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >> >> >>>> >> Hi Cristian,
>> >> >> >> >> >> >> >> >>>> >>
>> >> >> >> >> >> >> >> >>>> >> The approach sounds nice. I don't want to be
>> the
>> >> >> >> >> >> >> >> >>>> >> devil's
>> >> >> >> >> >> >> advocate
>> >> >> >> >> >> >> >> but
>> >> >> >> >> >> >> >> >>>> I'm
>> >> >> >> >> >> >> >> >>>> >> just not sure about the recall using the
>> dbpedia
>> >> >> >> >> categories
>> >> >> >> >> >> >> >> feature.
>> >> >> >> >> >> >> >> >>>> For
>> >> >> >> >> >> >> >> >>>> >> example, your sentence could be also
>> "Microsoft
>> >> >> posted
>> >> >> >> >> >> >> >> >>>> >> its
>> >> >> >> >> >> 2013
>> >> >> >> >> >> >> >> >>>> earnings.
>> >> >> >> >> >> >> >> >>>> >> The Redmond's company made a huge profit".
>> So,
>> >> maybe
>> >> >> >> >> >> including
>> >> >> >> >> >> >> more
>> >> >> >> >> >> >> >> >>>> >> contextual information from dbpedia could
>> >> increase
>> >> >> the
>> >> >> >> >> recall
>> >> >> >> >> >> >> but
>> >> >> >> >> >> >> >> of
>> >> >> >> >> >> >> >> >>>> course
>> >> >> >> >> >> >> >> >>>> >> will reduce the precision.
>> >> >> >> >> >> >> >> >>>> >>
>> >> >> >> >> >> >> >> >>>> >> Cheers,
>> >> >> >> >> >> >> >> >>>> >> Rafa
>> >> >> >> >> >> >> >> >>>> >>
>> >> >> >> >> >> >> >> >>>> >> El 04/02/14 09:50, Cristian Petroaca
>> escribió:
>> >> >> >> >> >> >> >> >>>> >>
>> >> >> >> >> >> >> >> >>>> >>  Back with a more detailed description of the
>> >> steps
>> >> >> >> >> >> >> >> >>>> >> for
>> >> >> >> >> >> making
>> >> >> >> >> >> >> this
>> >> >> >> >> >> >> >> >>>> kind of
>> >> >> >> >> >> >> >> >>>> >>> coreference work.
>> >> >> >> >> >> >> >> >>>> >>>
>> >> >> >> >> >> >> >> >>>> >>> I will be using references to the following
>> >> text in
>> >> >> >> >> >> >> >> >>>> >>> the
>> >> >> >> >> >> steps
>> >> >> >> >> >> >> >> below
>> >> >> >> >> >> >> >> >>>> in
>> >> >> >> >> >> >> >> >>>> >>> order to make things clearer : "Microsoft
>> posted
>> >> >> its
>> >> >> >> >> >> >> >> >>>> >>> 2013
>> >> >> >> >> >> >> >> earnings.
>> >> >> >> >> >> >> >> >>>> The
>> >> >> >> >> >> >> >> >>>> >>> software company made a huge profit."
>> >> >> >> >> >> >> >> >>>> >>>
>> >> >> >> >> >> >> >> >>>> >>> 1. For every noun phrase in the text which
>> has :
>> >> >> >> >> >> >> >> >>>> >>>      a. a determinate pos which implies
>> >> reference
>> >> >> to
>> >> >> >> >> >> >> >> >>>> >>> an
>> >> >> >> >> >> entity
>> >> >> >> >> >> >> >> local
>> >> >> >> >> >> >> >> >>>> to
>> >> >> >> >> >> >> >> >>>> >>> the
>> >> >> >> >> >> >> >> >>>> >>> text, such as "the, this, these") but not
>> >> "another,
>> >> >> >> >> every",
>> >> >> >> >> >> etc
>> >> >> >> >> >> >> >> which
>> >> >> >> >> >> >> >> >>>> >>> implies a reference to an entity outside of
>> the
>> >> >> text.
>> >> >> >> >> >> >> >> >>>> >>>      b. having at least another noun aside
>> from
>> >> the
>> >> >> >> >> >> >> >> >>>> >>> main
>> >> >> >> >> >> >> required
>> >> >> >> >> >> >> >> >>>> noun
>> >> >> >> >> >> >> >> >>>> >>> which
>> >> >> >> >> >> >> >> >>>> >>> further describes it. For example I will not
>> >> count
>> >> >> >> >> >> >> >> >>>> >>> "The
>> >> >> >> >> >> >> company"
>> >> >> >> >> >> >> >> as
>> >> >> >> >> >> >> >> >>>> being
>> >> >> >> >> >> >> >> >>>> >>> a
>> >> >> >> >> >> >> >> >>>> >>> legitimate candidate since this could
>> create a
>> >> lot
>> >> >> of
>> >> >> >> >> false
>> >> >> >> >> >> >> >> >>>> positives by
>> >> >> >> >> >> >> >> >>>> >>> considering the double meaning of some words
>> >> such
>> >> >> as
>> >> >> >> >> >> >> >> >>>> >>> "in
>> >> >> >> >> the
>> >> >> >> >> >> >> >> company
>> >> >> >> >> >> >> >> >>>> of
>> >> >> >> >> >> >> >> >>>> >>> good people".
>> >> >> >> >> >> >> >> >>>> >>> "The software company" is a good candidate
>> >> since we
>> >> >> >> >> >> >> >> >>>> >>> also
>> >> >> >> >> >> have
>> >> >> >> >> >> >> >> >>>> "software".
>> >> >> >> >> >> >> >> >>>> >>>
>> >> >> >> >> >> >> >> >>>> >>> 2. match the nouns in the noun phrase to the
>> >> >> contents
>> >> >> >> >> >> >> >> >>>> >>> of
>> >> >> >> >> the
>> >> >> >> >> >> >> >> dbpedia
>> >> >> >> >> >> >> >> >>>> >>> categories of each named entity found prior
>> to
>> >> the
>> >> >> >> >> location
>> >> >> >> >> >> of
>> >> >> >> >> >> >> the
>> >> >> >> >> >> >> >> >>>> noun
>> >> >> >> >> >> >> >> >>>> >>> phrase in the text.
>> >> >> >> >> >> >> >> >>>> >>> The dbpedia categories are in the following
>> >> format
>> >> >> >> >> >> >> >> >>>> >>> (for
>> >> >> >> >> >> >> Microsoft
>> >> >> >> >> >> >> >> for
>> >> >> >> >> >> >> >> >>>> >>> example) : "Software companies of the United
>> >> >> States".
>> >> >> >> >> >> >> >> >>>> >>>   So we try to match "software company" with
>> >> that.
>> >> >> >> >> >> >> >> >>>> >>> First, as you can see, the main noun in the
>> >> dbpedia
>> >> >> >> >> category
>> >> >> >> >> >> >> has a
>> >> >> >> >> >> >> >> >>>> plural
>> >> >> >> >> >> >> >> >>>> >>> form and it's the same for all categories
>> which
>> >> I
>> >> >> >> >> >> >> >> >>>> >>> saw. I
>> >> >> >> >> >> don't
>> >> >> >> >> >> >> >> know
>> >> >> >> >> >> >> >> >>>> if
>> >> >> >> >> >> >> >> >>>> >>> there's an easier way to do this but I
>> thought
>> >> of
>> >> >> >> >> applying a
>> >> >> >> >> >> >> >> >>>> lemmatizer on
>> >> >> >> >> >> >> >> >>>> >>> the category and the noun phrase in order
>> for
>> >> them
>> >> >> to
>> >> >> >> >> have a
>> >> >> >> >> >> >> >> common
>> >> >> >> >> >> >> >> >>>> >>> denominator.This also works if the noun
>> phrase
>> >> >> itself
>> >> >> >> >> has a
>> >> >> >> >> >> >> plural
>> >> >> >> >> >> >> >> >>>> form.
>> >> >> >> >> >> >> >> >>>> >>>
>> >> >> >> >> >> >> >> >>>> >>> Second, I'll need to use for comparison
>> only the
>> >> >> >> >> >> >> >> >>>> >>> words in
>> >> >> >> >> >> the
>> >> >> >> >> >> >> >> >>>> category
>> >> >> >> >> >> >> >> >>>> >>> which are themselves nouns and not
>> prepositions
>> >> or
>> >> >> >> >> >> determiners
>> >> >> >> >> >> >> >> such
>> >> >> >> >> >> >> >> >>>> as "of
>> >> >> >> >> >> >> >> >>>> >>> the".This means that I need to pos tag the
>> >> >> categories
>> >> >> >> >> >> contents
>> >> >> >> >> >> >> as
>> >> >> >> >> >> >> >> >>>> well.
>> >> >> >> >> >> >> >> >>>> >>> I was thinking of running the pos and lemma
>> on
>> >> the
>> >> >> >> >> dbpedia
>> >> >> >> >> >> >> >> >>>> categories when
>> >> >> >> >> >> >> >> >>>> >>> building the dbpedia backed entity hub and
>> >> storing
>> >> >> >> >> >> >> >> >>>> >>> them
>> >> >> >> >> for
>> >> >> >> >> >> >> later
>> >> >> >> >> >> >> >> >>>> use - I
>> >> >> >> >> >> >> >> >>>> >>> don't know how feasible this is at the
>> moment.
>> >> >> >> >> >> >> >> >>>> >>>
>> >> >> >> >> >> >> >> >>>> >>> After this I can compare each noun in the
>> noun
>> >> >> phrase
>> >> >> >> >> with
>> >> >> >> >> >> the
>> >> >> >> >> >> >> >> >>>> equivalent
>> >> >> >> >> >> >> >> >>>> >>> nouns in the categories and based on the
>> number
>> >> of
>> >> >> >> >> matches I
>> >> >> >> >> >> >> can
>> >> >> >> >> >> >> >> >>>> create a
>> >> >> >> >> >> >> >> >>>> >>> confidence level.
>> >> >> >> >> >> >> >> >>>> >>>
>> >> >> >> >> >> >> >> >>>> >>> 3. match the noun of the noun phrase with
>> the
>> >> >> >> >> >> >> >> >>>> >>> rdf:type
>> >> >> >> >> from
>> >> >> >> >> >> >> >> dbpedia
>> >> >> >> >> >> >> >> >>>> of the
>> >> >> >> >> >> >> >> >>>> >>> named entity. If this matches increase the
>> >> >> confidence
>> >> >> >> >> level.
>> >> >> >> >> >> >> >> >>>> >>>
>> >> >> >> >> >> >> >> >>>> >>> 4. If there are multiple named entities
>> which
>> >> can
>> >> >> >> >> >> >> >> >>>> >>> match a
>> >> >> >> >> >> >> certain
>> >> >> >> >> >> >> >> >>>> noun
>> >> >> >> >> >> >> >> >>>> >>> phrase then link the noun phrase with the
>> >> closest
>> >> >> >> >> >> >> >> >>>> >>> named
>> >> >> >> >> >> entity
>> >> >> >> >> >> >> >> prior
>> >> >> >> >> >> >> >> >>>> to it
>> >> >> >> >> >> >> >> >>>> >>> in the text.
>> >> >> >> >> >> >> >> >>>> >>>
>> >> >> >> >> >> >> >> >>>> >>> What do you think?
>> >> >> >> >> >> >> >> >>>> >>>
>> >> >> >> >> >> >> >> >>>> >>> Cristian
>> >> >> >> >> >> >> >> >>>> >>>
>> >> >> >> >> >> >> >> >>>> >>> 2014-01-31 Cristian Petroaca <
>> >> >> >> >> cristian.petroaca@gmail.com>:
>> >> >> >> >> >> >> >> >>>> >>>
>> >> >> >> >> >> >> >> >>>> >>>  Hi Rafa,
>> >> >> >> >> >> >> >> >>>> >>>>
>> >> >> >> >> >> >> >> >>>> >>>> I don't yet have a concrete heursitic but
>> I'm
>> >> >> >> >> >> >> >> >>>> >>>> working on
>> >> >> >> >> >> it.
>> >> >> >> >> >> >> I'll
>> >> >> >> >> >> >> >> >>>> provide
>> >> >> >> >> >> >> >> >>>> >>>> it here so that you guys can give me a
>> >> feedback on
>> >> >> >> >> >> >> >> >>>> >>>> it.
>> >> >> >> >> >> >> >> >>>> >>>>
>> >> >> >> >> >> >> >> >>>> >>>> What are "locality" features?
>> >> >> >> >> >> >> >> >>>> >>>>
>> >> >> >> >> >> >> >> >>>> >>>> I looked at Bart and other coref tools
>> such as
>> >> >> >> >> >> >> >> >>>> >>>> ArkRef
>> >> >> >> >> and
>> >> >> >> >> >> >> >> >>>> CherryPicker
>> >> >> >> >> >> >> >> >>>> >>>> and
>> >> >> >> >> >> >> >> >>>> >>>> they don't provide such a coreference.
>> >> >> >> >> >> >> >> >>>> >>>>
>> >> >> >> >> >> >> >> >>>> >>>> Cristian
>> >> >> >> >> >> >> >> >>>> >>>>
>> >> >> >> >> >> >> >> >>>> >>>>
>> >> >> >> >> >> >> >> >>>> >>>> 2014-01-30 Rafa Haro <rh...@apache.org>:
>> >> >> >> >> >> >> >> >>>> >>>>
>> >> >> >> >> >> >> >> >>>> >>>> Hi Cristian,
>> >> >> >> >> >> >> >> >>>> >>>>
>> >> >> >> >> >> >> >> >>>> >>>>> Without having more details about your
>> >> concrete
>> >> >> >> >> heuristic,
>> >> >> >> >> >> >> in my
>> >> >> >> >> >> >> >> >>>> honest
>> >> >> >> >> >> >> >> >>>> >>>>> opinion, such approach could produce a
>> lot of
>> >> >> false
>> >> >> >> >> >> >> positives. I
>> >> >> >> >> >> >> >> >>>> don't
>> >> >> >> >> >> >> >> >>>> >>>>> know
>> >> >> >> >> >> >> >> >>>> >>>>> if you are planning to use some "locality"
>> >> >> features
>> >> >> >> >> >> >> >> >>>> >>>>> to
>> >> >> >> >> >> detect
>> >> >> >> >> >> >> >> such
>> >> >> >> >> >> >> >> >>>> >>>>> coreferences but you need to take into
>> account
>> >> >> that
>> >> >> >> >> >> >> >> >>>> >>>>> it
>> >> >> >> >> is
>> >> >> >> >> >> >> quite
>> >> >> >> >> >> >> >> >>>> usual
>> >> >> >> >> >> >> >> >>>> >>>>> that
>> >> >> >> >> >> >> >> >>>> >>>>> coreferenced mentions can occurs even in
>> >> >> different
>> >> >> >> >> >> >> paragraphs.
>> >> >> >> >> >> >> >> >>>> Although
>> >> >> >> >> >> >> >> >>>> >>>>> I'm
>> >> >> >> >> >> >> >> >>>> >>>>> not an expert in Natural Language
>> >> Understanding,
>> >> >> I
>> >> >> >> >> would
>> >> >> >> >> >> say
>> >> >> >> >> >> >> it
>> >> >> >> >> >> >> >> is
>> >> >> >> >> >> >> >> >>>> quite
>> >> >> >> >> >> >> >> >>>> >>>>> difficult to get decent precision/recall
>> rates
>> >> >> for
>> >> >> >> >> >> >> coreferencing
>> >> >> >> >> >> >> >> >>>> using
>> >> >> >> >> >> >> >> >>>> >>>>> fixed rules. Maybe you can give a try to
>> >> others
>> >> >> >> >> >> >> >> >>>> >>>>> tools
>> >> >> >> >> like
>> >> >> >> >> >> >> BART
>> >> >> >> >> >> >> >> (
>> >> >> >> >> >> >> >> >>>> >>>>> http://www.bart-coref.org/).
>> >> >> >> >> >> >> >> >>>> >>>>>
>> >> >> >> >> >> >> >> >>>> >>>>> Cheers,
>> >> >> >> >> >> >> >> >>>> >>>>> Rafa Haro
>> >> >> >> >> >> >> >> >>>> >>>>>
>> >> >> >> >> >> >> >> >>>> >>>>> El 30/01/14 10:33, Cristian Petroaca
>> escribió:
>> >> >> >> >> >> >> >> >>>> >>>>>
>> >> >> >> >> >> >> >> >>>> >>>>>   Hi,
>> >> >> >> >> >> >> >> >>>> >>>>>
>> >> >> >> >> >> >> >> >>>> >>>>>> One of the necessary steps for
>> implementing
>> >> the
>> >> >> >> >> >> >> >> >>>> >>>>>> Event
>> >> >> >> >> >> >> >> extraction
>> >> >> >> >> >> >> >> >>>> Engine
>> >> >> >> >> >> >> >> >>>> >>>>>> feature :
>> >> >> >> >> >> >> https://issues.apache.org/jira/browse/STANBOL-1121is
>> >> >> >> >> >> >> >> >>>> to
>> >> >> >> >> >> >> >> >>>> >>>>>> have
>> >> >> >> >> >> >> >> >>>> >>>>>> coreference resolution in the given text.
>> >> This
>> >> >> is
>> >> >> >> >> >> provided
>> >> >> >> >> >> >> now
>> >> >> >> >> >> >> >> >>>> via the
>> >> >> >> >> >> >> >> >>>> >>>>>> stanford-nlp project but as far as I saw
>> this
>> >> >> >> >> >> >> >> >>>> >>>>>> module
>> >> >> >> >> is
>> >> >> >> >> >> >> >> performing
>> >> >> >> >> >> >> >> >>>> >>>>>> mostly
>> >> >> >> >> >> >> >> >>>> >>>>>> pronomial (He, She) or nominal (Barack
>> Obama
>> >> and
>> >> >> >> >> >> >> >> >>>> >>>>>> Mr.
>> >> >> >> >> >> Obama)
>> >> >> >> >> >> >> >> >>>> coreference
>> >> >> >> >> >> >> >> >>>> >>>>>> resolution.
>> >> >> >> >> >> >> >> >>>> >>>>>>
>> >> >> >> >> >> >> >> >>>> >>>>>> In order to get more coreferences from
>> the
>> >> text
>> >> >> I
>> >> >> >> >> though
>> >> >> >> >> >> of
>> >> >> >> >> >> >> >> >>>> creating
>> >> >> >> >> >> >> >> >>>> >>>>>> some
>> >> >> >> >> >> >> >> >>>> >>>>>> logic that would detect this kind of
>> >> >> coreference :
>> >> >> >> >> >> >> >> >>>> >>>>>> "Apple reaches new profit heights. The
>> >> software
>> >> >> >> >> company
>> >> >> >> >> >> just
>> >> >> >> >> >> >> >> >>>> announced
>> >> >> >> >> >> >> >> >>>> >>>>>> its
>> >> >> >> >> >> >> >> >>>> >>>>>> 2013 earnings."
>> >> >> >> >> >> >> >> >>>> >>>>>> Here "The software company" obviously
>> refers
>> >> to
>> >> >> >> >> "Apple".
>> >> >> >> >> >> >> >> >>>> >>>>>> So I'd like to detect coreferences of
>> Named
>> >> >> >> >> >> >> >> >>>> >>>>>> Entities
>> >> >> >> >> >> which
>> >> >> >> >> >> >> are
>> >> >> >> >> >> >> >> of
>> >> >> >> >> >> >> >> >>>> the
>> >> >> >> >> >> >> >> >>>> >>>>>> rdf:type of the Named Entity , in this
>> case
>> >> >> >> >> >> >> >> >>>> >>>>>> "company"
>> >> >> >> >> and
>> >> >> >> >> >> >> also
>> >> >> >> >> >> >> >> >>>> have
>> >> >> >> >> >> >> >> >>>> >>>>>> attributes which can be found in the
>> dbpedia
>> >> >> >> >> categories
>> >> >> >> >> >> of
>> >> >> >> >> >> >> the
>> >> >> >> >> >> >> >> >>>> named
>> >> >> >> >> >> >> >> >>>> >>>>>> entity, in this case "software".
>> >> >> >> >> >> >> >> >>>> >>>>>>
>> >> >> >> >> >> >> >> >>>> >>>>>> The detection of coreferences such as
>> "The
>> >> >> >> >> >> >> >> >>>> >>>>>> software
>> >> >> >> >> >> >> company" in
>> >> >> >> >> >> >> >> >>>> the
>> >> >> >> >> >> >> >> >>>> >>>>>> text
>> >> >> >> >> >> >> >> >>>> >>>>>> would also be done by either using the
>> new
>> >> Pos
>> >> >> Tag
>> >> >> >> >> Based
>> >> >> >> >> >> >> Phrase
>> >> >> >> >> >> >> >> >>>> >>>>>> extraction
>> >> >> >> >> >> >> >> >>>> >>>>>> Engine (noun phrases) or by using a
>> >> dependency
>> >> >> >> >> >> >> >> >>>> >>>>>> tree of
>> >> >> >> >> >> the
>> >> >> >> >> >> >> >> >>>> sentence and
>> >> >> >> >> >> >> >> >>>> >>>>>> picking up only subjects or objects.
>> >> >> >> >> >> >> >> >>>> >>>>>>
>> >> >> >> >> >> >> >> >>>> >>>>>> At this point I'd like to know if this
>> kind
>> >> of
>> >> >> >> >> >> >> >> >>>> >>>>>> logic
>> >> >> >> >> >> would
>> >> >> >> >> >> >> be
>> >> >> >> >> >> >> >> >>>> useful
>> >> >> >> >> >> >> >> >>>> >>>>>> as a
>> >> >> >> >> >> >> >> >>>> >>>>>> separate Enhancement Engine (in case the
>> >> >> precision
>> >> >> >> >> >> >> >> >>>> >>>>>> and
>> >> >> >> >> >> >> recall
>> >> >> >> >> >> >> >> are
>> >> >> >> >> >> >> >> >>>> good
>> >> >> >> >> >> >> >> >>>> >>>>>> enough) in Stanbol?
>> >> >> >> >> >> >> >> >>>> >>>>>>
>> >> >> >> >> >> >> >> >>>> >>>>>> Thanks,
>> >> >> >> >> >> >> >> >>>> >>>>>> Cristian
>> >> >> >> >> >> >> >> >>>> >>>>>>
>> >> >> >> >> >> >> >> >>>> >>>>>>
>> >> >> >> >> >> >> >> >>>> >>>>>>
>> >> >> >> >> >> >> >> >>>> >>
>> >> >> >> >> >> >> >> >>>>
>> >> >> >> >> >> >> >> >>>>
>> >> >> >> >> >> >> >> >>>>
>> >> >> >> >> >> >> >> >>>> --
>> >> >> >> >> >> >> >> >>>> | Rupert Westenthaler
>> >> >> >> >> rupert.westenthaler@gmail.com
>> >> >> >> >> >> >> >> >>>> | Bodenlehenstraße 11
>> >> >> >> >> >> >> ++43-699-11108907
>> >> >> >> >> >> >> >> >>>> | A-5500 Bischofshofen
>> >> >> >> >> >> >> >> >>>>
>> >> >> >> >> >> >> >> >>>
>> >> >> >> >> >> >> >> >>>
>> >> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> --
>> >> >> >> >> >> >> >> | Rupert Westenthaler
>> >> >> >> >> >> >> >> rupert.westenthaler@gmail.com
>> >> >> >> >> >> >> >> | Bodenlehenstraße 11
>> >> >> >> >> ++43-699-11108907
>> >> >> >> >> >> >> >> | A-5500 Bischofshofen
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >>
>> >> >> >> >> >> >>
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> --
>> >> >> >> >> >> >> | Rupert Westenthaler
>> >> >> rupert.westenthaler@gmail.com
>> >> >> >> >> >> >> | Bodenlehenstraße 11
>> >> >> >> >> >> >> ++43-699-11108907
>> >> >> >> >> >> >> | A-5500 Bischofshofen
>> >> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> --
>> >> >> >> >> >> | Rupert Westenthaler
>> >> rupert.westenthaler@gmail.com
>> >> >> >> >> >> | Bodenlehenstraße 11
>> >> >> ++43-699-11108907
>> >> >> >> >> >> | A-5500 Bischofshofen
>> >> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> --
>> >> >> >> >> | Rupert Westenthaler
>> rupert.westenthaler@gmail.com
>> >> >> >> >> | Bodenlehenstraße 11
>> >> ++43-699-11108907
>> >> >> >> >> | A-5500 Bischofshofen
>> >> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> --
>> >> >> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> >> >> >> | Bodenlehenstraße 11
>> ++43-699-11108907
>> >> >> >> | A-5500 Bischofshofen
>> >> >> >
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> >> >> | Bodenlehenstraße 11                             ++43-699-11108907
>> >> >> | A-5500 Bischofshofen
>> >> >>
>> >>
>> >>
>> >>
>> >> --
>> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> >> | Bodenlehenstraße 11                             ++43-699-11108907
>> >> | A-5500 Bischofshofen
>> >>
>>
>>
>>
>> --
>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>
>
>

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Cristian Petroaca <cr...@gmail.com>.
That worked. Thanks.

So, there are no exceptions during the startup of the launcher.
The component tab in the felix console shows 6 WeightedChains the first
time, including the default one, but after my changes and a restart there
are only 5 - the default one is missing altogether.


2014-03-24 20:18 GMT+02:00 Rupert Westenthaler <
rupert.westenthaler@gmail.com>:

> Hi Cristian,
>
> I do see the same problem since last Friday. The solution as mentions
> by [1] works for me.
>
>     mvn -Djsse.enableSNIExtension=false {goals}
>
> No Idea why https connections to github do currently cause this. I
> could not find anything related via Google. So I suggest to use the
> system property for now. If this persists for longer we can adapt the
> build files accordingly.
>
> best
> Rupert
>
>
>
>
> [1]
> http://stackoverflow.com/questions/7615645/ssl-handshake-alert-unrecognized-name-error-since-upgrade-to-java-1-7-0
>
> On Mon, Mar 24, 2014 at 7:01 PM, Cristian Petroaca
> <cr...@gmail.com> wrote:
> > I did a clean on the whole project and now I wanted to do another "mvn
> > clean install" but I am getting this :
> >
> > "[INFO]
> > ------------------------------------------------------------------------
> > [ERROR] Failed to execute goal
> > org.apache.maven.plugins:maven-antrun-plugin:1.6:
> > run (download) on project org.apache.stanbol.data.opennlp.lang.es: An
> Ant
> > BuildE
> > xception has occured: The following error occurred while executing this
> > line:
> > [ERROR]
> > C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\download_models.xml:3
> > 3: Failed to copy
> > https://github.com/utcompling/OpenNLP-Models/raw/58ef0c6003140
> > 3e66e47ae35edaf58d3478b67af/models/es/opennlp-es-maxent-pos-es.bin to
> > C:\Data\Pr
> >
> ojects\Stanbol\main\data\opennlp\lang\es\downloads\resources\org\apache\stanbol\
> > data\opennlp\es-pos-maxent.bin due to javax.net.ssl.SSLProtocolException
> > handshake alert : unrecognized_name"
> >
> >
> >
> > 2014-03-20 11:25 GMT+02:00 Rupert Westenthaler <
> > rupert.westenthaler@gmail.com>:
> >
> >> Hi Cristian,
> >>
> >> On Thu, Mar 20, 2014 at 10:00 AM, Cristian Petroaca
> >> <cr...@gmail.com> wrote:
> >> >
> >>
> stanbol.enhancer.chain.weighted.chain=["tika;optional","langdetect","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-ner","dbpediaLinking","entityhubExtraction","dbpedia-dereference","pos-chunker"]
> >> > service.ranking=I"-2147483648"
> >> > stanbol.enhancer.chain.name="default"
> >>
> >> Does look fine to me. Do you see any exception during the startup of
> >> the launcher. Can you check the status of this component in the
> >> component tab of the felix web console [1] (search for
> >> "org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain"). If
> >> you have multiple you can find the correct one by comparing the
> >> "Properties" with those in the configuration file.
> >>
> >> I guess that the according service is in the 'unsatisfied' as you do
> >> not see it in the web interface. But if this is the case you should
> >> also see the according exception in the log. You can also manually
> >> stop/start the component. In this case the exception should be
> >> re-thrown and you do not need to search the log for it.
> >>
> >> best
> >> Rupert
> >>
> >>
> >> [1] http://localhost:8080/system/console/components
> >>
> >> >
> >> >
> >> >
> >> > 2014-03-20 7:39 GMT+02:00 Rupert Westenthaler <
> >> rupert.westenthaler@gmail.com
> >> >>:
> >> >
> >> >> Hi Cristian,
> >> >>
> >> >> you can not send attachments to the list. Please copy the contents
> >> >> directly to the mail
> >> >>
> >> >> thx
> >> >> Rupert
> >> >>
> >> >> On Wed, Mar 19, 2014 at 9:20 PM, Cristian Petroaca
> >> >> <cr...@gmail.com> wrote:
> >> >> > The config attached.
> >> >> >
> >> >> >
> >> >> > 2014-03-19 9:09 GMT+02:00 Rupert Westenthaler
> >> >> > <ru...@gmail.com>:
> >> >> >
> >> >> >> Hi Cristian,
> >> >> >>
> >> >> >> can you provide the contents of the chain after your
> modifications?
> >> >> >> Would be interesting to test why the chain is no longer active
> after
> >> >> >> the restart.
> >> >> >>
> >> >> >> You can find the config file in the 'stanbol/fileinstall' folder.
> >> >> >>
> >> >> >> best
> >> >> >> Rupert
> >> >> >>
> >> >> >> On Tue, Mar 18, 2014 at 8:24 PM, Cristian Petroaca
> >> >> >> <cr...@gmail.com> wrote:
> >> >> >> > Related to the default chain selection rules : before restart I
> >> had a
> >> >> >> > chain
> >> >> >> > with the name 'default' as in I could access it via
> >> >> >> > enhancer/chain/default.
> >> >> >> > Then I just added another engine to the 'default' chain. I
> assumed
> >> >> that
> >> >> >> > after the restart the chain with the 'default' name would be
> >> >> persisted.
> >> >> >> > So
> >> >> >> > the first rule should have been applied after the restart as
> well.
> >> But
> >> >> >> > instead I cannot reach it via enhancer/chain/default anymore so
> its
> >> >> >> > gone.
> >> >> >> > Anyway, this is not a big deal, it's not blocking me in any
> way, I
> >> >> just
> >> >> >> > wanted to understand where the problem is.
> >> >> >> >
> >> >> >> >
> >> >> >> > 2014-03-18 7:15 GMT+02:00 Rupert Westenthaler
> >> >> >> > <rupert.westenthaler@gmail.com
> >> >> >> >>:
> >> >> >> >
> >> >> >> >> Hi Cristian
> >> >> >> >>
> >> >> >> >> On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca
> >> >> >> >> <cr...@gmail.com> wrote:
> >> >> >> >> > 1. Updated to the latest code and it's gone. Cool
> >> >> >> >> >
> >> >> >> >> > 2. I start the stable launcher -> create a new instance of
> the
> >> >> >> >> > PosChunkerEngine -> add it to the default chain. At this
> point
> >> >> >> >> > everything
> >> >> >> >> > looks good and works ok.
> >> >> >> >> > After I restart the server the default chain is gone and
> >> instead I
> >> >> >> >> > see
> >> >> >> >> this
> >> >> >> >> > in the enhancement chains page : all-active (default, id:
> 149,
> >> >> >> >> > ranking:
> >> >> >> >> 0,
> >> >> >> >> > impl: AllActiveEnginesChain ). all-active did not contain the
> >> >> >> >> > 'default'
> >> >> >> >> > word before the restart.
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >> Please note the default chain selection rules as described at
> [1].
> >> >> You
> >> >> >> >> can also access chains chains under
> '/enhancer/chain/{chain-name}'
> >> >> >> >>
> >> >> >> >> best
> >> >> >> >> Rupert
> >> >> >> >>
> >> >> >> >> [1]
> >> >> >> >>
> >> >> >> >>
> >> >>
> >>
> http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain
> >> >> >> >>
> >> >> >> >> > It looks like the config files are exactly what I need.
> Thanks.
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> > 2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <
> >> >> >> >> rupert.westenthaler@gmail.com
> >> >> >> >> >>:
> >> >> >> >> >
> >> >> >> >> >> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
> >> >> >> >> >> <cr...@gmail.com> wrote:
> >> >> >> >> >> > Thanks Rupert.
> >> >> >> >> >> >
> >> >> >> >> >> > A couple more questions/issues :
> >> >> >> >> >> >
> >> >> >> >> >> > 1. Whenever I start the stanbol server I'm seeing this in
> the
> >> >> >> >> >> > console
> >> >> >> >> >> > output :
> >> >> >> >> >> >
> >> >> >> >> >>
> >> >> >> >> >> This should be fixed with STANBOL-1278 [1] [2]
> >> >> >> >> >>
> >> >> >> >> >> > 2. Whenever I restart the server the Weighted Chains get
> >> messed
> >> >> >> >> >> > up. I
> >> >> >> >> >> > usually use the 'default' chain and add my engine to it so
> >> there
> >> >> >> >> >> > are
> >> >> >> >> 11
> >> >> >> >> >> > engines in it. After the restart this chain now contains
> >> around
> >> >> 23
> >> >> >> >> >> engines
> >> >> >> >> >> > in total.
> >> >> >> >> >>
> >> >> >> >> >> I was not able to replicate this. What I tried was
> >> >> >> >> >>
> >> >> >> >> >> (1) start up the stable launcher
> >> >> >> >> >> (2) add an additional engine to the default chain
> >> >> >> >> >> (3) restart the launcher
> >> >> >> >> >>
> >> >> >> >> >> The default chain was not changed after (2) and (3). So I
> would
> >> >> need
> >> >> >> >> >> further information for knowing why this is happening.
> >> >> >> >> >>
> >> >> >> >> >> Generally it is better to create you own chain instance as
> >> >> modifying
> >> >> >> >> >> one that is provided by the default configuration. I would
> also
> >> >> >> >> >> recommend that you keep your test configuration in text
> files
> >> and
> >> >> to
> >> >> >> >> >> copy those to the 'stanbol/fileinstall' folder. Doing so
> >> prevent
> >> >> you
> >> >> >> >> >> from manually entering the configuration after a software
> >> update.
> >> >> >> >> >> The
> >> >> >> >> >> production-mode section [3] provides information on how to
> do
> >> >> that.
> >> >> >> >> >>
> >> >> >> >> >> best
> >> >> >> >> >> Rupert
> >> >> >> >> >>
> >> >> >> >> >> [1] https://issues.apache.org/jira/browse/STANBOL-1278
> >> >> >> >> >> [2] http://svn.apache.org/r1576623
> >> >> >> >> >> [3] http://stanbol.apache.org/docs/trunk/production-mode
> >> >> >> >> >>
> >> >> >> >> >> > ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web
> >> >> [153]:
> >> >> >> >> Error
> >> >> >> >> >> > starting
> >> >> >> >> >> >
> >> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >>
> >>
> slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\star
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >>
> tup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar
> >> >> >> >> >> > (org.osgi
> >> >> >> >> >> > .framework.BundleException: Unresolved constraint in
> bundle
> >> >> >> >> >> > org.apache.stanbol.e
> >> >> >> >> >> > nhancer.engine.topic.web [153]: Unable to resolve 153.0:
> >> missing
> >> >> >> >> >> > requirement [15
> >> >> >> >> >> > 3.0] package; (&(package=javax.ws.rs
> >> >> >> >> >> )(version>=0.0.0)(!(version>=2.0.0))))
> >> >> >> >> >> > org.osgi.framework.BundleException: Unresolved constraint
> in
> >> >> >> >> >> > bundle
> >> >> >> >> >> > org.apache.s
> >> >> >> >> >> > tanbol.enhancer.engine.topic.web [153]: Unable to resolve
> >> 153.0:
> >> >> >> >> missing
> >> >> >> >> >> > require
> >> >> >> >> >> > ment [153.0] package; (&(package=javax.ws.rs
> >> >> >> >> >> > )(version>=0.0.0)(!(version>=2.0.0))
> >> >> >> >> >> > )
> >> >> >> >> >> >         at
> >> >> >> >> >>
> org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
> >> >> >> >> >> >         at
> >> >> >> >> org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
> >> >> >> >> >> >         at
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
> >> >> >> >> >> >
> >> >> >> >> >> >         at
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264
> >> >> >> >> >> > )
> >> >> >> >> >> >         at java.lang.Thread.run(Unknown Source)
> >> >> >> >> >> >
> >> >> >> >> >> > Despite of this the server starts fine and I can use the
> >> >> enhancer
> >> >> >> >> fine.
> >> >> >> >> >> Do
> >> >> >> >> >> > you guys see this as well?
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >> > 2. Whenever I restart the server the Weighted Chains get
> >> messed
> >> >> >> >> >> > up. I
> >> >> >> >> >> > usually use the 'default' chain and add my engine to it so
> >> there
> >> >> >> >> >> > are
> >> >> >> >> 11
> >> >> >> >> >> > engines in it. After the restart this chain now contains
> >> around
> >> >> 23
> >> >> >> >> >> engines
> >> >> >> >> >> > in total.
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <
> >> >> >> >> >> rupert.westenthaler@gmail.com
> >> >> >> >> >> >>:
> >> >> >> >> >> >
> >> >> >> >> >> >> Hi Cristian,
> >> >> >> >> >> >>
> >> >> >> >> >> >> NER Annotations are typically available as both
> >> >> >> >> >> >> NlpAnnotations.NER_ANNOTATION and  fise:TextAnnotation
> [1]
> >> in
> >> >> the
> >> >> >> >> >> >> enhancement metadata. As you are already accessing the
> >> >> >> >> >> >> AnayzedText I
> >> >> >> >> >> >> would prefer using the  NlpAnnotations.NER_ANNOTATION.
> >> >> >> >> >> >>
> >> >> >> >> >> >> best
> >> >> >> >> >> >> Rupert
> >> >> >> >> >> >>
> >> >> >> >> >> >> [1]
> >> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >>
> >>
> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
> >> >> >> >> >> >>
> >> >> >> >> >> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
> >> >> >> >> >> >> <cr...@gmail.com> wrote:
> >> >> >> >> >> >> > Thanks.
> >> >> >> >> >> >> > I assume I should get the Named entities using the same
> >> but
> >> >> >> >> >> >> > with
> >> >> >> >> >> >> > NlpAnnotations.NER_ANNOTATION?
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
> >> >> >> >> >> >> > rupert.westenthaler@gmail.com>:
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> Hallo Cristian,
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> NounPhrases are not added to the RDF enhancement
> results.
> >> >> You
> >> >> >> >> need to
> >> >> >> >> >> >> >> use the AnalyzedText ContentPart [1]
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> here is some demo code you can use in the
> >> computeEnhancement
> >> >> >> >> method
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >>         AnalysedText at =
> >> >> >> >> >> >> >> NlpEngineHelper.getAnalysedText(this,
> >> >> >> >> ci,
> >> >> >> >> >> >> true);
> >> >> >> >> >> >> >>         Iterator<? extends Section> sections =
> >> >> >> >> >> >> >> at.getSentences();
> >> >> >> >> >> >> >>         if(!sections.hasNext()){ //process as single
> >> >> sentence
> >> >> >> >> >> >> >>             sections =
> >> Collections.singleton(at).iterator();
> >> >> >> >> >> >> >>         }
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >>         while(sections.hasNext()){
> >> >> >> >> >> >> >>             Section section = sections.next();
> >> >> >> >> >> >> >>             Iterator<Span> chunks =
> >> >> >> >> >> >> >> section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
> >> >> >> >> >> >> >>             while(chunks.hasNext()){
> >> >> >> >> >> >> >>                 Span chunk = chunks.next();
> >> >> >> >> >> >> >>                 Value<PhraseTag> phrase =
> >> >> >> >> >> >> >> chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
> >> >> >> >> >> >> >>                 if(phrase.value().getCategory() ==
> >> >> >> >> >> >> LexicalCategory.Noun){
> >> >> >> >> >> >> >>                     log.info(" - NounPhrase [{},{}]
> {}",
> >> >> new
> >> >> >> >> >> Object[]{
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> chunk.getStart(),chunk.getEnd(),chunk.getSpan()});
> >> >> >> >> >> >> >>                 }
> >> >> >> >> >> >> >>             }
> >> >> >> >> >> >> >>         }
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> hope this helps
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> best
> >> >> >> >> >> >> >> Rupert
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> [1]
> >> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >>
> >>
> http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
> >> >> >> >> >> >> >> <cr...@gmail.com> wrote:
> >> >> >> >> >> >> >> > I started to implement the engine and I'm having
> >> problems
> >> >> >> >> >> >> >> > with
> >> >> >> >> >> getting
> >> >> >> >> >> >> >> > results for noun phrases. I modified the "default"
> >> >> weighted
> >> >> >> >> chain
> >> >> >> >> >> to
> >> >> >> >> >> >> also
> >> >> >> >> >> >> >> > include the PosChunkerEngine and ran a sample text :
> >> >> "Angela
> >> >> >> >> Merkel
> >> >> >> >> >> >> >> visted
> >> >> >> >> >> >> >> > China. The german chancellor met with various
> people".
> >> I
> >> >> >> >> expected
> >> >> >> >> >> that
> >> >> >> >> >> >> >> the
> >> >> >> >> >> >> >> > RDF XML output would contain some info about the
> noun
> >> >> >> >> >> >> >> > phrases
> >> >> >> >> but I
> >> >> >> >> >> >> >> cannot
> >> >> >> >> >> >> >> > see any.
> >> >> >> >> >> >> >> > Could you point me to the correct way to generate
> the
> >> noun
> >> >> >> >> phrases?
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > Thanks,
> >> >> >> >> >> >> >> > Cristian
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <
> >> >> >> >> >> >> >> cristian.petroaca@gmail.com>:
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> >> Opened
> >> >> https://issues.apache.org/jira/browse/STANBOL-1279
> >> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <
> >> >> >> >> >> >> >> cristian.petroaca@gmail.com>
> >> >> >> >> >> >> >> >> :
> >> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> >> Hi Rupert,
> >> >> >> >> >> >> >> >>>
> >> >> >> >> >> >> >> >>> The "spatial" dimension is a good idea. I'll also
> >> take a
> >> >> >> >> >> >> >> >>> look
> >> >> >> >> at
> >> >> >> >> >> >> Yago.
> >> >> >> >> >> >> >> >>>
> >> >> >> >> >> >> >> >>> I will create a Jira with what we talked about
> here.
> >> It
> >> >> >> >> >> >> >> >>> will
> >> >> >> >> >> >> probably
> >> >> >> >> >> >> >> >>> have just a draft-like description for now and
> will
> >> be
> >> >> >> >> >> >> >> >>> updated
> >> >> >> >> >> as I
> >> >> >> >> >> >> go
> >> >> >> >> >> >> >> >>> along.
> >> >> >> >> >> >> >> >>>
> >> >> >> >> >> >> >> >>> Thanks,
> >> >> >> >> >> >> >> >>> Cristian
> >> >> >> >> >> >> >> >>>
> >> >> >> >> >> >> >> >>>
> >> >> >> >> >> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <
> >> >> >> >> >> >> >> >>> rupert.westenthaler@gmail.com>:
> >> >> >> >> >> >> >> >>>
> >> >> >> >> >> >> >> >>> Hi Cristian,
> >> >> >> >> >> >> >> >>>>
> >> >> >> >> >> >> >> >>>> definitely an interesting approach. You should
> have
> >> a
> >> >> >> >> >> >> >> >>>> look at
> >> >> >> >> >> Yago2
> >> >> >> >> >> >> >> >>>> [1]. As far as I can remember the Yago taxonomy
> is
> >> much
> >> >> >> >> better
> >> >> >> >> >> >> >> >>>> structured as the one used by dbpedia. Mapping
> >> >> >> >> >> >> >> >>>> suggestions of
> >> >> >> >> >> >> dbpedia
> >> >> >> >> >> >> >> >>>> to concepts in Yago2 is easy as both dbpedia and
> >> yago2
> >> >> do
> >> >> >> >> >> provide
> >> >> >> >> >> >> >> >>>> mappings [2] and [3]
> >> >> >> >> >> >> >> >>>>
> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
> >> >> >> >> >> >> >> >>>> > <rh...@apache.org>:
> >> >> >> >> >> >> >> >>>> >>
> >> >> >> >> >> >> >> >>>> >> "Microsoft posted its 2013 earnings. The
> >> Redmond's
> >> >> >> >> >> >> >> >>>> >> company
> >> >> >> >> >> made
> >> >> >> >> >> >> a
> >> >> >> >> >> >> >> >>>> >> huge profit".
> >> >> >> >> >> >> >> >>>>
> >> >> >> >> >> >> >> >>>> Thats actually a very good example. Spatial
> contexts
> >> >> are
> >> >> >> >> >> >> >> >>>> very
> >> >> >> >> >> >> >> >>>> important as they tend to be often used for
> >> >> referencing.
> >> >> >> >> >> >> >> >>>> So I
> >> >> >> >> >> would
> >> >> >> >> >> >> >> >>>> suggest to specially treat the spatial context.
> For
> >> >> >> >> >> >> >> >>>> spatial
> >> >> >> >> >> >> Entities
> >> >> >> >> >> >> >> >>>> (like a City) this is easy, but even for other
> >> (like a
> >> >> >> >> Person,
> >> >> >> >> >> >> >> >>>> Company) you could use relations to spatial
> entities
> >> >> >> >> >> >> >> >>>> define
> >> >> >> >> >> their
> >> >> >> >> >> >> >> >>>> spatial context. This context could than be used
> to
> >> >> >> >> >> >> >> >>>> correctly
> >> >> >> >> >> link
> >> >> >> >> >> >> >> >>>> "The Redmond's company" to "Microsoft".
> >> >> >> >> >> >> >> >>>>
> >> >> >> >> >> >> >> >>>> In addition I would suggest to use the "spatial"
> >> >> context
> >> >> >> >> >> >> >> >>>> of
> >> >> >> >> each
> >> >> >> >> >> >> >> >>>> entity (basically relation to entities that are
> >> cities,
> >> >> >> >> regions,
> >> >> >> >> >> >> >> >>>> countries) as a separate dimension, because those
> >> are
> >> >> >> >> >> >> >> >>>> very
> >> >> >> >> often
> >> >> >> >> >> >> used
> >> >> >> >> >> >> >> >>>> for coreferences.
> >> >> >> >> >> >> >> >>>>
> >> >> >> >> >> >> >> >>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
> >> >> >> >> >> >> >> >>>> [2]
> >> >> >> >> >> >> >> >>>>
> >> >> http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
> >> >> >> >> >> >> >> >>>> [3]
> >> >> >> >> >> >> >> >>>>
> >> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >>
> >>
> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
> >> >> >> >> >> >> >> >>>>
> >> >> >> >> >> >> >> >>>>
> >> >> >> >> >> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian
> Petroaca
> >> >> >> >> >> >> >> >>>> <cr...@gmail.com> wrote:
> >> >> >> >> >> >> >> >>>> > There are several dbpedia categories for each
> >> entity,
> >> >> >> >> >> >> >> >>>> > in
> >> >> >> >> this
> >> >> >> >> >> >> case
> >> >> >> >> >> >> >> for
> >> >> >> >> >> >> >> >>>> > Microsoft we have :
> >> >> >> >> >> >> >> >>>> >
> >> >> >> >> >> >> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
> >> >> >> >> >> >> >> >>>> > category:Microsoft
> >> >> >> >> >> >> >> >>>> >
> category:Software_companies_of_the_United_States
> >> >> >> >> >> >> >> >>>> >
> >> >> category:Software_companies_based_in_Washington_(state)
> >> >> >> >> >> >> >> >>>> > category:Companies_established_in_1975
> >> >> >> >> >> >> >> >>>> >
> category:1975_establishments_in_the_United_States
> >> >> >> >> >> >> >> >>>> > category:Companies_based_in_Redmond,_Washington
> >> >> >> >> >> >> >> >>>> >
> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> category:Multinational_companies_headquartered_in_the_United_States
> >> >> >> >> >> >> >> >>>> > category:Cloud_computing_providers
> >> >> >> >> >> >> >> >>>> >
> >> >> category:Companies_in_the_Dow_Jones_Industrial_Average
> >> >> >> >> >> >> >> >>>> >
> >> >> >> >> >> >> >> >>>> > So we also have "Companies based in
> >> >> Redmont,Washington"
> >> >> >> >> which
> >> >> >> >> >> >> could
> >> >> >> >> >> >> >> be
> >> >> >> >> >> >> >> >>>> > matched.
> >> >> >> >> >> >> >> >>>> >
> >> >> >> >> >> >> >> >>>> >
> >> >> >> >> >> >> >> >>>> > There is still other contextual information
> from
> >> >> >> >> >> >> >> >>>> > dbpedia
> >> >> >> >> which
> >> >> >> >> >> >> can
> >> >> >> >> >> >> >> be
> >> >> >> >> >> >> >> >>>> used.
> >> >> >> >> >> >> >> >>>> > For example for an Organization we could also
> >> >> include :
> >> >> >> >> >> >> >> >>>> > dbpprop:industry = Software
> >> >> >> >> >> >> >> >>>> > dbpprop:service = Online Service Providers
> >> >> >> >> >> >> >> >>>> >
> >> >> >> >> >> >> >> >>>> > and for a Person (that's for Barack Obama) :
> >> >> >> >> >> >> >> >>>> >
> >> >> >> >> >> >> >> >>>> > dbpedia-owl:profession:
> >> >> >> >> >> >> >> >>>> >                                dbpedia:Author
> >> >> >> >> >> >> >> >>>> >
> >> >> >> >> >> >> >> >>>> > dbpedia:Constitutional_law
> >> >> >> >> >> >> >> >>>> >                                dbpedia:Lawyer
> >> >> >> >> >> >> >> >>>> >
> >> >> >> >> >> >> >> >>>> > dbpedia:Community_organizing
> >> >> >> >> >> >> >> >>>> >
> >> >> >> >> >> >> >> >>>> > I'd like to continue investigating this as I
> think
> >> >> that
> >> >> >> >> >> >> >> >>>> > it
> >> >> >> >> may
> >> >> >> >> >> >> have
> >> >> >> >> >> >> >> >>>> some
> >> >> >> >> >> >> >> >>>> > value in increasing the number of coreference
> >> >> >> >> >> >> >> >>>> > resolutions
> >> >> >> >> and
> >> >> >> >> >> I'd
> >> >> >> >> >> >> >> like
> >> >> >> >> >> >> >> >>>> to
> >> >> >> >> >> >> >> >>>> > concentrate more on precision rather than
> recall
> >> >> since
> >> >> >> >> >> >> >> >>>> > we
> >> >> >> >> >> already
> >> >> >> >> >> >> >> have
> >> >> >> >> >> >> >> >>>> a
> >> >> >> >> >> >> >> >>>> > set of coreferences detected by the stanford
> nlp
> >> tool
> >> >> >> >> >> >> >> >>>> > and
> >> >> >> >> this
> >> >> >> >> >> >> would
> >> >> >> >> >> >> >> >>>> be as
> >> >> >> >> >> >> >> >>>> > an addition to that (at least this is how I
> would
> >> >> like
> >> >> >> >> >> >> >> >>>> > to
> >> >> >> >> use
> >> >> >> >> >> >> it).
> >> >> >> >> >> >> >> >>>> >
> >> >> >> >> >> >> >> >>>> > Is it ok if I track this by opening a jira? I
> >> could
> >> >> >> >> >> >> >> >>>> > update
> >> >> >> >> it
> >> >> >> >> >> to
> >> >> >> >> >> >> >> show
> >> >> >> >> >> >> >> >>>> my
> >> >> >> >> >> >> >> >>>> > progress and also my conclusions and if it
> turns
> >> out
> >> >> >> >> >> >> >> >>>> > that
> >> >> >> >> it
> >> >> >> >> >> was
> >> >> >> >> >> >> a
> >> >> >> >> >> >> >> bad
> >> >> >> >> >> >> >> >>>> idea
> >> >> >> >> >> >> >> >>>> > then that's the situation at least I'll end up
> >> with
> >> >> >> >> >> >> >> >>>> > more
> >> >> >> >> >> >> knowledge
> >> >> >> >> >> >> >> >>>> about
> >> >> >> >> >> >> >> >>>> > Stanbol in the end :).
> >> >> >> >> >> >> >> >>>> >
> >> >> >> >> >> >> >> >>>> >
> >> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
> >> >> >> >> >> >> >> >>>> > <rh...@apache.org>:
> >> >> >> >> >> >> >> >>>> >
> >> >> >> >> >> >> >> >>>> >> Hi Cristian,
> >> >> >> >> >> >> >> >>>> >>
> >> >> >> >> >> >> >> >>>> >> The approach sounds nice. I don't want to be
> the
> >> >> >> >> >> >> >> >>>> >> devil's
> >> >> >> >> >> >> advocate
> >> >> >> >> >> >> >> but
> >> >> >> >> >> >> >> >>>> I'm
> >> >> >> >> >> >> >> >>>> >> just not sure about the recall using the
> dbpedia
> >> >> >> >> categories
> >> >> >> >> >> >> >> feature.
> >> >> >> >> >> >> >> >>>> For
> >> >> >> >> >> >> >> >>>> >> example, your sentence could be also
> "Microsoft
> >> >> posted
> >> >> >> >> >> >> >> >>>> >> its
> >> >> >> >> >> 2013
> >> >> >> >> >> >> >> >>>> earnings.
> >> >> >> >> >> >> >> >>>> >> The Redmond's company made a huge profit". So,
> >> maybe
> >> >> >> >> >> including
> >> >> >> >> >> >> more
> >> >> >> >> >> >> >> >>>> >> contextual information from dbpedia could
> >> increase
> >> >> the
> >> >> >> >> recall
> >> >> >> >> >> >> but
> >> >> >> >> >> >> >> of
> >> >> >> >> >> >> >> >>>> course
> >> >> >> >> >> >> >> >>>> >> will reduce the precision.
> >> >> >> >> >> >> >> >>>> >>
> >> >> >> >> >> >> >> >>>> >> Cheers,
> >> >> >> >> >> >> >> >>>> >> Rafa
> >> >> >> >> >> >> >> >>>> >>
> >> >> >> >> >> >> >> >>>> >> El 04/02/14 09:50, Cristian Petroaca escribió:
> >> >> >> >> >> >> >> >>>> >>
> >> >> >> >> >> >> >> >>>> >>  Back with a more detailed description of the
> >> steps
> >> >> >> >> >> >> >> >>>> >> for
> >> >> >> >> >> making
> >> >> >> >> >> >> this
> >> >> >> >> >> >> >> >>>> kind of
> >> >> >> >> >> >> >> >>>> >>> coreference work.
> >> >> >> >> >> >> >> >>>> >>>
> >> >> >> >> >> >> >> >>>> >>> I will be using references to the following
> >> text in
> >> >> >> >> >> >> >> >>>> >>> the
> >> >> >> >> >> steps
> >> >> >> >> >> >> >> below
> >> >> >> >> >> >> >> >>>> in
> >> >> >> >> >> >> >> >>>> >>> order to make things clearer : "Microsoft
> posted
> >> >> its
> >> >> >> >> >> >> >> >>>> >>> 2013
> >> >> >> >> >> >> >> earnings.
> >> >> >> >> >> >> >> >>>> The
> >> >> >> >> >> >> >> >>>> >>> software company made a huge profit."
> >> >> >> >> >> >> >> >>>> >>>
> >> >> >> >> >> >> >> >>>> >>> 1. For every noun phrase in the text which
> has :
> >> >> >> >> >> >> >> >>>> >>>      a. a determinate pos which implies
> >> reference
> >> >> to
> >> >> >> >> >> >> >> >>>> >>> an
> >> >> >> >> >> entity
> >> >> >> >> >> >> >> local
> >> >> >> >> >> >> >> >>>> to
> >> >> >> >> >> >> >> >>>> >>> the
> >> >> >> >> >> >> >> >>>> >>> text, such as "the, this, these") but not
> >> "another,
> >> >> >> >> every",
> >> >> >> >> >> etc
> >> >> >> >> >> >> >> which
> >> >> >> >> >> >> >> >>>> >>> implies a reference to an entity outside of
> the
> >> >> text.
> >> >> >> >> >> >> >> >>>> >>>      b. having at least another noun aside
> from
> >> the
> >> >> >> >> >> >> >> >>>> >>> main
> >> >> >> >> >> >> required
> >> >> >> >> >> >> >> >>>> noun
> >> >> >> >> >> >> >> >>>> >>> which
> >> >> >> >> >> >> >> >>>> >>> further describes it. For example I will not
> >> count
> >> >> >> >> >> >> >> >>>> >>> "The
> >> >> >> >> >> >> company"
> >> >> >> >> >> >> >> as
> >> >> >> >> >> >> >> >>>> being
> >> >> >> >> >> >> >> >>>> >>> a
> >> >> >> >> >> >> >> >>>> >>> legitimate candidate since this could create
> a
> >> lot
> >> >> of
> >> >> >> >> false
> >> >> >> >> >> >> >> >>>> positives by
> >> >> >> >> >> >> >> >>>> >>> considering the double meaning of some words
> >> such
> >> >> as
> >> >> >> >> >> >> >> >>>> >>> "in
> >> >> >> >> the
> >> >> >> >> >> >> >> company
> >> >> >> >> >> >> >> >>>> of
> >> >> >> >> >> >> >> >>>> >>> good people".
> >> >> >> >> >> >> >> >>>> >>> "The software company" is a good candidate
> >> since we
> >> >> >> >> >> >> >> >>>> >>> also
> >> >> >> >> >> have
> >> >> >> >> >> >> >> >>>> "software".
> >> >> >> >> >> >> >> >>>> >>>
> >> >> >> >> >> >> >> >>>> >>> 2. match the nouns in the noun phrase to the
> >> >> contents
> >> >> >> >> >> >> >> >>>> >>> of
> >> >> >> >> the
> >> >> >> >> >> >> >> dbpedia
> >> >> >> >> >> >> >> >>>> >>> categories of each named entity found prior
> to
> >> the
> >> >> >> >> location
> >> >> >> >> >> of
> >> >> >> >> >> >> the
> >> >> >> >> >> >> >> >>>> noun
> >> >> >> >> >> >> >> >>>> >>> phrase in the text.
> >> >> >> >> >> >> >> >>>> >>> The dbpedia categories are in the following
> >> format
> >> >> >> >> >> >> >> >>>> >>> (for
> >> >> >> >> >> >> Microsoft
> >> >> >> >> >> >> >> for
> >> >> >> >> >> >> >> >>>> >>> example) : "Software companies of the United
> >> >> States".
> >> >> >> >> >> >> >> >>>> >>>   So we try to match "software company" with
> >> that.
> >> >> >> >> >> >> >> >>>> >>> First, as you can see, the main noun in the
> >> dbpedia
> >> >> >> >> category
> >> >> >> >> >> >> has a
> >> >> >> >> >> >> >> >>>> plural
> >> >> >> >> >> >> >> >>>> >>> form and it's the same for all categories
> which
> >> I
> >> >> >> >> >> >> >> >>>> >>> saw. I
> >> >> >> >> >> don't
> >> >> >> >> >> >> >> know
> >> >> >> >> >> >> >> >>>> if
> >> >> >> >> >> >> >> >>>> >>> there's an easier way to do this but I
> thought
> >> of
> >> >> >> >> applying a
> >> >> >> >> >> >> >> >>>> lemmatizer on
> >> >> >> >> >> >> >> >>>> >>> the category and the noun phrase in order for
> >> them
> >> >> to
> >> >> >> >> have a
> >> >> >> >> >> >> >> common
> >> >> >> >> >> >> >> >>>> >>> denominator.This also works if the noun
> phrase
> >> >> itself
> >> >> >> >> has a
> >> >> >> >> >> >> plural
> >> >> >> >> >> >> >> >>>> form.
> >> >> >> >> >> >> >> >>>> >>>
> >> >> >> >> >> >> >> >>>> >>> Second, I'll need to use for comparison only
> the
> >> >> >> >> >> >> >> >>>> >>> words in
> >> >> >> >> >> the
> >> >> >> >> >> >> >> >>>> category
> >> >> >> >> >> >> >> >>>> >>> which are themselves nouns and not
> prepositions
> >> or
> >> >> >> >> >> determiners
> >> >> >> >> >> >> >> such
> >> >> >> >> >> >> >> >>>> as "of
> >> >> >> >> >> >> >> >>>> >>> the".This means that I need to pos tag the
> >> >> categories
> >> >> >> >> >> contents
> >> >> >> >> >> >> as
> >> >> >> >> >> >> >> >>>> well.
> >> >> >> >> >> >> >> >>>> >>> I was thinking of running the pos and lemma
> on
> >> the
> >> >> >> >> dbpedia
> >> >> >> >> >> >> >> >>>> categories when
> >> >> >> >> >> >> >> >>>> >>> building the dbpedia backed entity hub and
> >> storing
> >> >> >> >> >> >> >> >>>> >>> them
> >> >> >> >> for
> >> >> >> >> >> >> later
> >> >> >> >> >> >> >> >>>> use - I
> >> >> >> >> >> >> >> >>>> >>> don't know how feasible this is at the
> moment.
> >> >> >> >> >> >> >> >>>> >>>
> >> >> >> >> >> >> >> >>>> >>> After this I can compare each noun in the
> noun
> >> >> phrase
> >> >> >> >> with
> >> >> >> >> >> the
> >> >> >> >> >> >> >> >>>> equivalent
> >> >> >> >> >> >> >> >>>> >>> nouns in the categories and based on the
> number
> >> of
> >> >> >> >> matches I
> >> >> >> >> >> >> can
> >> >> >> >> >> >> >> >>>> create a
> >> >> >> >> >> >> >> >>>> >>> confidence level.
> >> >> >> >> >> >> >> >>>> >>>
> >> >> >> >> >> >> >> >>>> >>> 3. match the noun of the noun phrase with the
> >> >> >> >> >> >> >> >>>> >>> rdf:type
> >> >> >> >> from
> >> >> >> >> >> >> >> dbpedia
> >> >> >> >> >> >> >> >>>> of the
> >> >> >> >> >> >> >> >>>> >>> named entity. If this matches increase the
> >> >> confidence
> >> >> >> >> level.
> >> >> >> >> >> >> >> >>>> >>>
> >> >> >> >> >> >> >> >>>> >>> 4. If there are multiple named entities which
> >> can
> >> >> >> >> >> >> >> >>>> >>> match a
> >> >> >> >> >> >> certain
> >> >> >> >> >> >> >> >>>> noun
> >> >> >> >> >> >> >> >>>> >>> phrase then link the noun phrase with the
> >> closest
> >> >> >> >> >> >> >> >>>> >>> named
> >> >> >> >> >> entity
> >> >> >> >> >> >> >> prior
> >> >> >> >> >> >> >> >>>> to it
> >> >> >> >> >> >> >> >>>> >>> in the text.
> >> >> >> >> >> >> >> >>>> >>>
> >> >> >> >> >> >> >> >>>> >>> What do you think?
> >> >> >> >> >> >> >> >>>> >>>
> >> >> >> >> >> >> >> >>>> >>> Cristian
> >> >> >> >> >> >> >> >>>> >>>
> >> >> >> >> >> >> >> >>>> >>> 2014-01-31 Cristian Petroaca <
> >> >> >> >> cristian.petroaca@gmail.com>:
> >> >> >> >> >> >> >> >>>> >>>
> >> >> >> >> >> >> >> >>>> >>>  Hi Rafa,
> >> >> >> >> >> >> >> >>>> >>>>
> >> >> >> >> >> >> >> >>>> >>>> I don't yet have a concrete heursitic but
> I'm
> >> >> >> >> >> >> >> >>>> >>>> working on
> >> >> >> >> >> it.
> >> >> >> >> >> >> I'll
> >> >> >> >> >> >> >> >>>> provide
> >> >> >> >> >> >> >> >>>> >>>> it here so that you guys can give me a
> >> feedback on
> >> >> >> >> >> >> >> >>>> >>>> it.
> >> >> >> >> >> >> >> >>>> >>>>
> >> >> >> >> >> >> >> >>>> >>>> What are "locality" features?
> >> >> >> >> >> >> >> >>>> >>>>
> >> >> >> >> >> >> >> >>>> >>>> I looked at Bart and other coref tools such
> as
> >> >> >> >> >> >> >> >>>> >>>> ArkRef
> >> >> >> >> and
> >> >> >> >> >> >> >> >>>> CherryPicker
> >> >> >> >> >> >> >> >>>> >>>> and
> >> >> >> >> >> >> >> >>>> >>>> they don't provide such a coreference.
> >> >> >> >> >> >> >> >>>> >>>>
> >> >> >> >> >> >> >> >>>> >>>> Cristian
> >> >> >> >> >> >> >> >>>> >>>>
> >> >> >> >> >> >> >> >>>> >>>>
> >> >> >> >> >> >> >> >>>> >>>> 2014-01-30 Rafa Haro <rh...@apache.org>:
> >> >> >> >> >> >> >> >>>> >>>>
> >> >> >> >> >> >> >> >>>> >>>> Hi Cristian,
> >> >> >> >> >> >> >> >>>> >>>>
> >> >> >> >> >> >> >> >>>> >>>>> Without having more details about your
> >> concrete
> >> >> >> >> heuristic,
> >> >> >> >> >> >> in my
> >> >> >> >> >> >> >> >>>> honest
> >> >> >> >> >> >> >> >>>> >>>>> opinion, such approach could produce a lot
> of
> >> >> false
> >> >> >> >> >> >> positives. I
> >> >> >> >> >> >> >> >>>> don't
> >> >> >> >> >> >> >> >>>> >>>>> know
> >> >> >> >> >> >> >> >>>> >>>>> if you are planning to use some "locality"
> >> >> features
> >> >> >> >> >> >> >> >>>> >>>>> to
> >> >> >> >> >> detect
> >> >> >> >> >> >> >> such
> >> >> >> >> >> >> >> >>>> >>>>> coreferences but you need to take into
> account
> >> >> that
> >> >> >> >> >> >> >> >>>> >>>>> it
> >> >> >> >> is
> >> >> >> >> >> >> quite
> >> >> >> >> >> >> >> >>>> usual
> >> >> >> >> >> >> >> >>>> >>>>> that
> >> >> >> >> >> >> >> >>>> >>>>> coreferenced mentions can occurs even in
> >> >> different
> >> >> >> >> >> >> paragraphs.
> >> >> >> >> >> >> >> >>>> Although
> >> >> >> >> >> >> >> >>>> >>>>> I'm
> >> >> >> >> >> >> >> >>>> >>>>> not an expert in Natural Language
> >> Understanding,
> >> >> I
> >> >> >> >> would
> >> >> >> >> >> say
> >> >> >> >> >> >> it
> >> >> >> >> >> >> >> is
> >> >> >> >> >> >> >> >>>> quite
> >> >> >> >> >> >> >> >>>> >>>>> difficult to get decent precision/recall
> rates
> >> >> for
> >> >> >> >> >> >> coreferencing
> >> >> >> >> >> >> >> >>>> using
> >> >> >> >> >> >> >> >>>> >>>>> fixed rules. Maybe you can give a try to
> >> others
> >> >> >> >> >> >> >> >>>> >>>>> tools
> >> >> >> >> like
> >> >> >> >> >> >> BART
> >> >> >> >> >> >> >> (
> >> >> >> >> >> >> >> >>>> >>>>> http://www.bart-coref.org/).
> >> >> >> >> >> >> >> >>>> >>>>>
> >> >> >> >> >> >> >> >>>> >>>>> Cheers,
> >> >> >> >> >> >> >> >>>> >>>>> Rafa Haro
> >> >> >> >> >> >> >> >>>> >>>>>
> >> >> >> >> >> >> >> >>>> >>>>> El 30/01/14 10:33, Cristian Petroaca
> escribió:
> >> >> >> >> >> >> >> >>>> >>>>>
> >> >> >> >> >> >> >> >>>> >>>>>   Hi,
> >> >> >> >> >> >> >> >>>> >>>>>
> >> >> >> >> >> >> >> >>>> >>>>>> One of the necessary steps for
> implementing
> >> the
> >> >> >> >> >> >> >> >>>> >>>>>> Event
> >> >> >> >> >> >> >> extraction
> >> >> >> >> >> >> >> >>>> Engine
> >> >> >> >> >> >> >> >>>> >>>>>> feature :
> >> >> >> >> >> >> https://issues.apache.org/jira/browse/STANBOL-1121is
> >> >> >> >> >> >> >> >>>> to
> >> >> >> >> >> >> >> >>>> >>>>>> have
> >> >> >> >> >> >> >> >>>> >>>>>> coreference resolution in the given text.
> >> This
> >> >> is
> >> >> >> >> >> provided
> >> >> >> >> >> >> now
> >> >> >> >> >> >> >> >>>> via the
> >> >> >> >> >> >> >> >>>> >>>>>> stanford-nlp project but as far as I saw
> this
> >> >> >> >> >> >> >> >>>> >>>>>> module
> >> >> >> >> is
> >> >> >> >> >> >> >> performing
> >> >> >> >> >> >> >> >>>> >>>>>> mostly
> >> >> >> >> >> >> >> >>>> >>>>>> pronomial (He, She) or nominal (Barack
> Obama
> >> and
> >> >> >> >> >> >> >> >>>> >>>>>> Mr.
> >> >> >> >> >> Obama)
> >> >> >> >> >> >> >> >>>> coreference
> >> >> >> >> >> >> >> >>>> >>>>>> resolution.
> >> >> >> >> >> >> >> >>>> >>>>>>
> >> >> >> >> >> >> >> >>>> >>>>>> In order to get more coreferences from the
> >> text
> >> >> I
> >> >> >> >> though
> >> >> >> >> >> of
> >> >> >> >> >> >> >> >>>> creating
> >> >> >> >> >> >> >> >>>> >>>>>> some
> >> >> >> >> >> >> >> >>>> >>>>>> logic that would detect this kind of
> >> >> coreference :
> >> >> >> >> >> >> >> >>>> >>>>>> "Apple reaches new profit heights. The
> >> software
> >> >> >> >> company
> >> >> >> >> >> just
> >> >> >> >> >> >> >> >>>> announced
> >> >> >> >> >> >> >> >>>> >>>>>> its
> >> >> >> >> >> >> >> >>>> >>>>>> 2013 earnings."
> >> >> >> >> >> >> >> >>>> >>>>>> Here "The software company" obviously
> refers
> >> to
> >> >> >> >> "Apple".
> >> >> >> >> >> >> >> >>>> >>>>>> So I'd like to detect coreferences of
> Named
> >> >> >> >> >> >> >> >>>> >>>>>> Entities
> >> >> >> >> >> which
> >> >> >> >> >> >> are
> >> >> >> >> >> >> >> of
> >> >> >> >> >> >> >> >>>> the
> >> >> >> >> >> >> >> >>>> >>>>>> rdf:type of the Named Entity , in this
> case
> >> >> >> >> >> >> >> >>>> >>>>>> "company"
> >> >> >> >> and
> >> >> >> >> >> >> also
> >> >> >> >> >> >> >> >>>> have
> >> >> >> >> >> >> >> >>>> >>>>>> attributes which can be found in the
> dbpedia
> >> >> >> >> categories
> >> >> >> >> >> of
> >> >> >> >> >> >> the
> >> >> >> >> >> >> >> >>>> named
> >> >> >> >> >> >> >> >>>> >>>>>> entity, in this case "software".
> >> >> >> >> >> >> >> >>>> >>>>>>
> >> >> >> >> >> >> >> >>>> >>>>>> The detection of coreferences such as "The
> >> >> >> >> >> >> >> >>>> >>>>>> software
> >> >> >> >> >> >> company" in
> >> >> >> >> >> >> >> >>>> the
> >> >> >> >> >> >> >> >>>> >>>>>> text
> >> >> >> >> >> >> >> >>>> >>>>>> would also be done by either using the new
> >> Pos
> >> >> Tag
> >> >> >> >> Based
> >> >> >> >> >> >> Phrase
> >> >> >> >> >> >> >> >>>> >>>>>> extraction
> >> >> >> >> >> >> >> >>>> >>>>>> Engine (noun phrases) or by using a
> >> dependency
> >> >> >> >> >> >> >> >>>> >>>>>> tree of
> >> >> >> >> >> the
> >> >> >> >> >> >> >> >>>> sentence and
> >> >> >> >> >> >> >> >>>> >>>>>> picking up only subjects or objects.
> >> >> >> >> >> >> >> >>>> >>>>>>
> >> >> >> >> >> >> >> >>>> >>>>>> At this point I'd like to know if this
> kind
> >> of
> >> >> >> >> >> >> >> >>>> >>>>>> logic
> >> >> >> >> >> would
> >> >> >> >> >> >> be
> >> >> >> >> >> >> >> >>>> useful
> >> >> >> >> >> >> >> >>>> >>>>>> as a
> >> >> >> >> >> >> >> >>>> >>>>>> separate Enhancement Engine (in case the
> >> >> precision
> >> >> >> >> >> >> >> >>>> >>>>>> and
> >> >> >> >> >> >> recall
> >> >> >> >> >> >> >> are
> >> >> >> >> >> >> >> >>>> good
> >> >> >> >> >> >> >> >>>> >>>>>> enough) in Stanbol?
> >> >> >> >> >> >> >> >>>> >>>>>>
> >> >> >> >> >> >> >> >>>> >>>>>> Thanks,
> >> >> >> >> >> >> >> >>>> >>>>>> Cristian
> >> >> >> >> >> >> >> >>>> >>>>>>
> >> >> >> >> >> >> >> >>>> >>>>>>
> >> >> >> >> >> >> >> >>>> >>>>>>
> >> >> >> >> >> >> >> >>>> >>
> >> >> >> >> >> >> >> >>>>
> >> >> >> >> >> >> >> >>>>
> >> >> >> >> >> >> >> >>>>
> >> >> >> >> >> >> >> >>>> --
> >> >> >> >> >> >> >> >>>> | Rupert Westenthaler
> >> >> >> >> rupert.westenthaler@gmail.com
> >> >> >> >> >> >> >> >>>> | Bodenlehenstraße 11
> >> >> >> >> >> >> ++43-699-11108907
> >> >> >> >> >> >> >> >>>> | A-5500 Bischofshofen
> >> >> >> >> >> >> >> >>>>
> >> >> >> >> >> >> >> >>>
> >> >> >> >> >> >> >> >>>
> >> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> --
> >> >> >> >> >> >> >> | Rupert Westenthaler
> >> >> >> >> >> >> >> rupert.westenthaler@gmail.com
> >> >> >> >> >> >> >> | Bodenlehenstraße 11
> >> >> >> >> ++43-699-11108907
> >> >> >> >> >> >> >> | A-5500 Bischofshofen
> >> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >> >> --
> >> >> >> >> >> >> | Rupert Westenthaler
> >> >> rupert.westenthaler@gmail.com
> >> >> >> >> >> >> | Bodenlehenstraße 11
> >> >> >> >> >> >> ++43-699-11108907
> >> >> >> >> >> >> | A-5500 Bischofshofen
> >> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >> --
> >> >> >> >> >> | Rupert Westenthaler
> >> rupert.westenthaler@gmail.com
> >> >> >> >> >> | Bodenlehenstraße 11
> >> >> ++43-699-11108907
> >> >> >> >> >> | A-5500 Bischofshofen
> >> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> --
> >> >> >> >> | Rupert Westenthaler
> rupert.westenthaler@gmail.com
> >> >> >> >> | Bodenlehenstraße 11
> >> ++43-699-11108907
> >> >> >> >> | A-5500 Bischofshofen
> >> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> >> >> >> | Bodenlehenstraße 11
> ++43-699-11108907
> >> >> >> | A-5500 Bischofshofen
> >> >> >
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> >> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> >> | A-5500 Bischofshofen
> >> >>
> >>
> >>
> >>
> >> --
> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> | A-5500 Bischofshofen
> >>
>
>
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Cristian,

I have been seeing the same problem since last Friday. The solution
mentioned in [1] works for me:

    mvn -Djsse.enableSNIExtension=false {goals}

No idea why https connections to GitHub currently cause this; I could
not find anything related via Google. So I suggest using the system
property for now. If the problem persists we can adapt the build files
accordingly.
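
If you don't want to append the property to every invocation, here is a
quick sketch of the same workaround set once via MAVEN_OPTS (assuming a
bash-like shell; on Windows 'set MAVEN_OPTS=...' does the same):

    export MAVEN_OPTS="$MAVEN_OPTS -Djsse.enableSNIExtension=false"
    # subsequent builds pick the property up from the environment
    mvn clean install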

best
Rupert




[1] http://stackoverflow.com/questions/7615645/ssl-handshake-alert-unrecognized-name-error-since-upgrade-to-java-1-7-0

On Mon, Mar 24, 2014 at 7:01 PM, Cristian Petroaca
<cr...@gmail.com> wrote:
> I did a clean on the whole project and now I wanted to do another "mvn
> clean install" but I am getting this :
>
> "[INFO]
> ------------------------------------------------------------------------
> [ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run
> (download) on project org.apache.stanbol.data.opennlp.lang.es: An Ant
> BuildException has occured: The following error occurred while executing this line:
> [ERROR] C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\download_models.xml:33:
> Failed to copy
> https://github.com/utcompling/OpenNLP-Models/raw/58ef0c60031403e66e47ae35edaf58d3478b67af/models/es/opennlp-es-maxent-pos-es.bin
> to
> C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\downloads\resources\org\apache\stanbol\data\opennlp\es-pos-maxent.bin
> due to javax.net.ssl.SSLProtocolException handshake alert : unrecognized_name"
>
>
>
> 2014-03-20 11:25 GMT+02:00 Rupert Westenthaler <
> rupert.westenthaler@gmail.com>:
>
>> Hi Cristian,
>>
>> On Thu, Mar 20, 2014 at 10:00 AM, Cristian Petroaca
>> <cr...@gmail.com> wrote:
>> >
>> stanbol.enhancer.chain.weighted.chain=["tika;optional","langdetect","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-ner","dbpediaLinking","entityhubExtraction","dbpedia-dereference","pos-chunker"]
>> > service.ranking=I"-2147483648"
>> > stanbol.enhancer.chain.name="default"
>>
>> Does look fine to me. Do you see any exception during the startup of
>> the launcher. Can you check the status of this component in the
>> component tab of the felix web console [1] (search for
>> "org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain"). If
>> you have multiple you can find the correct one by comparing the
>> "Properties" with those in the configuration file.
>>
>> I guess that the according service is in the 'unsatisfied' as you do
>> not see it in the web interface. But if this is the case you should
>> also see the according exception in the log. You can also manually
>> stop/start the component. In this case the exception should be
>> re-thrown and you do not need to search the log for it.
>>
>> best
>> Rupert
>>
>>
>> [1] http://localhost:8080/system/console/components
>>
>> >
>> >
>> >
>> > 2014-03-20 7:39 GMT+02:00 Rupert Westenthaler <
>> rupert.westenthaler@gmail.com
>> >>:
>> >
>> >> Hi Cristian,
>> >>
>> >> you can not send attachments to the list. Please copy the contents
>> >> directly to the mail
>> >>
>> >> thx
>> >> Rupert
>> >>
>> >> On Wed, Mar 19, 2014 at 9:20 PM, Cristian Petroaca
>> >> <cr...@gmail.com> wrote:
>> >> > The config attached.
>> >> >
>> >> >
>> >> > 2014-03-19 9:09 GMT+02:00 Rupert Westenthaler
>> >> > <ru...@gmail.com>:
>> >> >
>> >> >> Hi Cristian,
>> >> >>
>> >> >> can you provide the contents of the chain after your modifications?
>> >> >> Would be interesting to test why the chain is no longer active after
>> >> >> the restart.
>> >> >>
>> >> >> You can find the config file in the 'stanbol/fileinstall' folder.
>> >> >>
>> >> >> best
>> >> >> Rupert
>> >> >>
>> >> >> On Tue, Mar 18, 2014 at 8:24 PM, Cristian Petroaca
>> >> >> <cr...@gmail.com> wrote:
>> >> >> > Related to the default chain selection rules : before restart I
>> had a
>> >> >> > chain
>> >> >> > with the name 'default' as in I could access it via
>> >> >> > enhancer/chain/default.
>> >> >> > Then I just added another engine to the 'default' chain. I assumed
>> >> that
>> >> >> > after the restart the chain with the 'default' name would be
>> >> persisted.
>> >> >> > So
>> >> >> > the first rule should have been applied after the restart as well.
>> But
>> >> >> > instead I cannot reach it via enhancer/chain/default anymore so its
>> >> >> > gone.
>> >> >> > Anyway, this is not a big deal, it's not blocking me in any way, I
>> >> just
>> >> >> > wanted to understand where the problem is.
>> >> >> >
>> >> >> >
>> >> >> > 2014-03-18 7:15 GMT+02:00 Rupert Westenthaler
>> >> >> > <rupert.westenthaler@gmail.com
>> >> >> >>:
>> >> >> >
>> >> >> >> Hi Cristian
>> >> >> >>
>> >> >> >> On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca
>> >> >> >> <cr...@gmail.com> wrote:
>> >> >> >> > 1. Updated to the latest code and it's gone. Cool
>> >> >> >> >
>> >> >> >> > 2. I start the stable launcher -> create a new instance of the
>> >> >> >> > PosChunkerEngine -> add it to the default chain. At this point
>> >> >> >> > everything
>> >> >> >> > looks good and works ok.
>> >> >> >> > After I restart the server the default chain is gone and
>> instead I
>> >> >> >> > see
>> >> >> >> this
>> >> >> >> > in the enhancement chains page : all-active (default, id: 149,
>> >> >> >> > ranking:
>> >> >> >> 0,
>> >> >> >> > impl: AllActiveEnginesChain ). all-active did not contain the
>> >> >> >> > 'default'
>> >> >> >> > word before the restart.
>> >> >> >> >
>> >> >> >>
>> >> >> >> Please note the default chain selection rules as described at [1].
>> >> You
>> >> >> >> can also access chains chains under '/enhancer/chain/{chain-name}'
>> >> >> >>
>> >> >> >> best
>> >> >> >> Rupert
>> >> >> >>
>> >> >> >> [1]
>> >> >> >>
>> >> >> >>
>> >>
>> http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain
>> >> >> >>
>> >> >> >> > It looks like the config files are exactly what I need. Thanks.
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > 2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <
>> >> >> >> rupert.westenthaler@gmail.com
>> >> >> >> >>:
>> >> >> >> >
>> >> >> >> >> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
>> >> >> >> >> <cr...@gmail.com> wrote:
>> >> >> >> >> > Thanks Rupert.
>> >> >> >> >> >
>> >> >> >> >> > A couple more questions/issues :
>> >> >> >> >> >
>> >> >> >> >> > 1. Whenever I start the stanbol server I'm seeing this in the
>> >> >> >> >> > console
>> >> >> >> >> > output :
>> >> >> >> >> >
>> >> >> >> >>
>> >> >> >> >> This should be fixed with STANBOL-1278 [1] [2]
>> >> >> >> >>
>> >> >> >> >> > 2. Whenever I restart the server the Weighted Chains get
>> messed
>> >> >> >> >> > up. I
>> >> >> >> >> > usually use the 'default' chain and add my engine to it so
>> there
>> >> >> >> >> > are
>> >> >> >> 11
>> >> >> >> >> > engines in it. After the restart this chain now contains
>> around
>> >> 23
>> >> >> >> >> engines
>> >> >> >> >> > in total.
>> >> >> >> >>
>> >> >> >> >> I was not able to replicate this. What I tried was
>> >> >> >> >>
>> >> >> >> >> (1) start up the stable launcher
>> >> >> >> >> (2) add an additional engine to the default chain
>> >> >> >> >> (3) restart the launcher
>> >> >> >> >>
>> >> >> >> >> The default chain was not changed after (2) and (3). So I would
>> >> need
>> >> >> >> >> further information for knowing why this is happening.
>> >> >> >> >>
>> >> >> >> >> Generally it is better to create you own chain instance as
>> >> modifying
>> >> >> >> >> one that is provided by the default configuration. I would also
>> >> >> >> >> recommend that you keep your test configuration in text files
>> and
>> >> to
>> >> >> >> >> copy those to the 'stanbol/fileinstall' folder. Doing so
>> prevent
>> >> you
>> >> >> >> >> from manually entering the configuration after a software
>> update.
>> >> >> >> >> The
>> >> >> >> >> production-mode section [3] provides information on how to do
>> >> that.
>> >> >> >> >>
>> >> >> >> >> best
>> >> >> >> >> Rupert
>> >> >> >> >>
>> >> >> >> >> [1] https://issues.apache.org/jira/browse/STANBOL-1278
>> >> >> >> >> [2] http://svn.apache.org/r1576623
>> >> >> >> >> [3] http://stanbol.apache.org/docs/trunk/production-mode
>> >> >> >> >>
>> >> >> >> >> > ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web
>> >> [153]:
>> >> >> >> Error
>> >> >> >> >> > starting
>> >> >> >> >> >
>> >> >> >> >>
>> >> >> >>
>> >> >> >>
>> >>
>> slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\star
>> >> >> >> >> >
>> >> >> >> >> >
>> >> tup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar
>> >> >> >> >> > (org.osgi
>> >> >> >> >> > .framework.BundleException: Unresolved constraint in bundle
>> >> >> >> >> > org.apache.stanbol.e
>> >> >> >> >> > nhancer.engine.topic.web [153]: Unable to resolve 153.0:
>> missing
>> >> >> >> >> > requirement [15
>> >> >> >> >> > 3.0] package; (&(package=javax.ws.rs
>> >> >> >> >> )(version>=0.0.0)(!(version>=2.0.0))))
>> >> >> >> >> > org.osgi.framework.BundleException: Unresolved constraint in
>> >> >> >> >> > bundle
>> >> >> >> >> > org.apache.s
>> >> >> >> >> > tanbol.enhancer.engine.topic.web [153]: Unable to resolve
>> 153.0:
>> >> >> >> missing
>> >> >> >> >> > require
>> >> >> >> >> > ment [153.0] package; (&(package=javax.ws.rs
>> >> >> >> >> > )(version>=0.0.0)(!(version>=2.0.0))
>> >> >> >> >> > )
>> >> >> >> >> >         at
>> >> >> >> >> org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
>> >> >> >> >> >         at
>> >> >> >> org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
>> >> >> >> >> >         at
>> >> >> >> >> >
>> >> >> >> >> >
>> >> org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
>> >> >> >> >> >
>> >> >> >> >> >         at
>> >> >> >> >> >
>> >> >> >> >> >
>> >> org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264
>> >> >> >> >> > )
>> >> >> >> >> >         at java.lang.Thread.run(Unknown Source)
>> >> >> >> >> >
>> >> >> >> >> > Despite of this the server starts fine and I can use the
>> >> enhancer
>> >> >> >> fine.
>> >> >> >> >> Do
>> >> >> >> >> > you guys see this as well?
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> > 2. Whenever I restart the server the Weighted Chains get
>> messed
>> >> >> >> >> > up. I
>> >> >> >> >> > usually use the 'default' chain and add my engine to it so
>> there
>> >> >> >> >> > are
>> >> >> >> 11
>> >> >> >> >> > engines in it. After the restart this chain now contains
>> around
>> >> 23
>> >> >> >> >> engines
>> >> >> >> >> > in total.
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <
>> >> >> >> >> rupert.westenthaler@gmail.com
>> >> >> >> >> >>:
>> >> >> >> >> >
>> >> >> >> >> >> Hi Cristian,
>> >> >> >> >> >>
>> >> >> >> >> >> NER Annotations are typically available as both
>> >> >> >> >> >> NlpAnnotations.NER_ANNOTATION and  fise:TextAnnotation [1]
>> in
>> >> the
>> >> >> >> >> >> enhancement metadata. As you are already accessing the
>> >> >> >> >> >> AnayzedText I
>> >> >> >> >> >> would prefer using the  NlpAnnotations.NER_ANNOTATION.
>> >> >> >> >> >>
>> >> >> >> >> >> best
>> >> >> >> >> >> Rupert
>> >> >> >> >> >>
>> >> >> >> >> >> [1]
>> >> >> >> >> >>
>> >> >> >> >>
>> >> >> >>
>> >> >> >>
>> >>
>> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
>> >> >> >> >> >>
>> >> >> >> >> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
>> >> >> >> >> >> <cr...@gmail.com> wrote:
>> >> >> >> >> >> > Thanks.
>> >> >> >> >> >> > I assume I should get the Named entities using the same
>> but
>> >> >> >> >> >> > with
>> >> >> >> >> >> > NlpAnnotations.NER_ANNOTATION?
>> >> >> >> >> >> >
>> >> >> >> >> >> >
>> >> >> >> >> >> >
>> >> >> >> >> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
>> >> >> >> >> >> > rupert.westenthaler@gmail.com>:
>> >> >> >> >> >> >
>> >> >> >> >> >> >> Hallo Cristian,
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> NounPhrases are not added to the RDF enhancement results.
>> >> You
>> >> >> >> need to
>> >> >> >> >> >> >> use the AnalyzedText ContentPart [1]
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> here is some demo code you can use in the
>> computeEnhancement
>> >> >> >> method
>> >> >> >> >> >> >>
>> >> >> >> >> >> >>         AnalysedText at =
>> >> >> >> >> >> >> NlpEngineHelper.getAnalysedText(this,
>> >> >> >> ci,
>> >> >> >> >> >> true);
>> >> >> >> >> >> >>         Iterator<? extends Section> sections =
>> >> >> >> >> >> >> at.getSentences();
>> >> >> >> >> >> >>         if(!sections.hasNext()){ //process as single
>> >> sentence
>> >> >> >> >> >> >>             sections =
>> Collections.singleton(at).iterator();
>> >> >> >> >> >> >>         }
>> >> >> >> >> >> >>
>> >> >> >> >> >> >>         while(sections.hasNext()){
>> >> >> >> >> >> >>             Section section = sections.next();
>> >> >> >> >> >> >>             Iterator<Span> chunks =
>> >> >> >> >> >> >> section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
>> >> >> >> >> >> >>             while(chunks.hasNext()){
>> >> >> >> >> >> >>                 Span chunk = chunks.next();
>> >> >> >> >> >> >>                 Value<PhraseTag> phrase =
>> >> >> >> >> >> >> chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
>> >> >> >> >> >> >>                 if(phrase.value().getCategory() ==
>> >> >> >> >> >> LexicalCategory.Noun){
>> >> >> >> >> >> >>                     log.info(" - NounPhrase [{},{}] {}",
>> >> new
>> >> >> >> >> Object[]{
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> chunk.getStart(),chunk.getEnd(),chunk.getSpan()});
>> >> >> >> >> >> >>                 }
>> >> >> >> >> >> >>             }
>> >> >> >> >> >> >>         }
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> hope this helps
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> best
>> >> >> >> >> >> >> Rupert
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> [1]
>> >> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >>
>> >> >> >>
>> >> >> >>
>> >>
>> http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
>> >> >> >> >> >> >> <cr...@gmail.com> wrote:
>> >> >> >> >> >> >> > I started to implement the engine and I'm having
>> problems
>> >> >> >> >> >> >> > with
>> >> >> >> >> getting
>> >> >> >> >> >> >> > results for noun phrases. I modified the "default"
>> >> weighted
>> >> >> >> chain
>> >> >> >> >> to
>> >> >> >> >> >> also
>> >> >> >> >> >> >> > include the PosChunkerEngine and ran a sample text :
>> >> "Angela
>> >> >> >> Merkel
>> >> >> >> >> >> >> visted
>> >> >> >> >> >> >> > China. The german chancellor met with various people".
>> I
>> >> >> >> expected
>> >> >> >> >> that
>> >> >> >> >> >> >> the
>> >> >> >> >> >> >> > RDF XML output would contain some info about the noun
>> >> >> >> >> >> >> > phrases
>> >> >> >> but I
>> >> >> >> >> >> >> cannot
>> >> >> >> >> >> >> > see any.
>> >> >> >> >> >> >> > Could you point me to the correct way to generate the
>> noun
>> >> >> >> phrases?
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > Thanks,
>> >> >> >> >> >> >> > Cristian
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <
>> >> >> >> >> >> >> cristian.petroaca@gmail.com>:
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> Opened
>> >> https://issues.apache.org/jira/browse/STANBOL-1279
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <
>> >> >> >> >> >> >> cristian.petroaca@gmail.com>
>> >> >> >> >> >> >> >> :
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> Hi Rupert,
>> >> >> >> >> >> >> >>>
>> >> >> >> >> >> >> >>> The "spatial" dimension is a good idea. I'll also
>> take a
>> >> >> >> >> >> >> >>> look
>> >> >> >> at
>> >> >> >> >> >> Yago.
>> >> >> >> >> >> >> >>>
>> >> >> >> >> >> >> >>> I will create a Jira with what we talked about here.
>> It
>> >> >> >> >> >> >> >>> will
>> >> >> >> >> >> probably
>> >> >> >> >> >> >> >>> have just a draft-like description for now and will
>> be
>> >> >> >> >> >> >> >>> updated
>> >> >> >> >> as I
>> >> >> >> >> >> go
>> >> >> >> >> >> >> >>> along.
>> >> >> >> >> >> >> >>>
>> >> >> >> >> >> >> >>> Thanks,
>> >> >> >> >> >> >> >>> Cristian
>> >> >> >> >> >> >> >>>
>> >> >> >> >> >> >> >>>
>> >> >> >> >> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <
>> >> >> >> >> >> >> >>> rupert.westenthaler@gmail.com>:
>> >> >> >> >> >> >> >>>
>> >> >> >> >> >> >> >>> Hi Cristian,
>> >> >> >> >> >> >> >>>>
>> >> >> >> >> >> >> >>>> definitely an interesting approach. You should have
>> a
>> >> >> >> >> >> >> >>>> look at
>> >> >> >> >> Yago2
>> >> >> >> >> >> >> >>>> [1]. As far as I can remember the Yago taxonomy is
>> much
>> >> >> >> better
>> >> >> >> >> >> >> >>>> structured as the one used by dbpedia. Mapping
>> >> >> >> >> >> >> >>>> suggestions of
>> >> >> >> >> >> dbpedia
>> >> >> >> >> >> >> >>>> to concepts in Yago2 is easy as both dbpedia and
>> yago2
>> >> do
>> >> >> >> >> provide
>> >> >> >> >> >> >> >>>> mappings [2] and [3]
>> >> >> >> >> >> >> >>>>
>> >> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
>> >> >> >> >> >> >> >>>> > <rh...@apache.org>:
>> >> >> >> >> >> >> >>>> >>
>> >> >> >> >> >> >> >>>> >> "Microsoft posted its 2013 earnings. The
>> Redmond's
>> >> >> >> >> >> >> >>>> >> company
>> >> >> >> >> made
>> >> >> >> >> >> a
>> >> >> >> >> >> >> >>>> >> huge profit".
>> >> >> >> >> >> >> >>>>
>> >> >> >> >> >> >> >>>> Thats actually a very good example. Spatial contexts
>> >> are
>> >> >> >> >> >> >> >>>> very
>> >> >> >> >> >> >> >>>> important as they tend to be often used for
>> >> referencing.
>> >> >> >> >> >> >> >>>> So I
>> >> >> >> >> would
>> >> >> >> >> >> >> >>>> suggest to specially treat the spatial context. For
>> >> >> >> >> >> >> >>>> spatial
>> >> >> >> >> >> Entities
>> >> >> >> >> >> >> >>>> (like a City) this is easy, but even for other
>> (like a
>> >> >> >> Person,
>> >> >> >> >> >> >> >>>> Company) you could use relations to spatial entities
>> >> >> >> >> >> >> >>>> define
>> >> >> >> >> their
>> >> >> >> >> >> >> >>>> spatial context. This context could than be used to
>> >> >> >> >> >> >> >>>> correctly
>> >> >> >> >> link
>> >> >> >> >> >> >> >>>> "The Redmond's company" to "Microsoft".
>> >> >> >> >> >> >> >>>>
>> >> >> >> >> >> >> >>>> In addition I would suggest to use the "spatial"
>> >> context
>> >> >> >> >> >> >> >>>> of
>> >> >> >> each
>> >> >> >> >> >> >> >>>> entity (basically relation to entities that are
>> cities,
>> >> >> >> regions,
>> >> >> >> >> >> >> >>>> countries) as a separate dimension, because those
>> are
>> >> >> >> >> >> >> >>>> very
>> >> >> >> often
>> >> >> >> >> >> used
>> >> >> >> >> >> >> >>>> for coreferences.
>> >> >> >> >> >> >> >>>>
>> >> >> >> >> >> >> >>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
>> >> >> >> >> >> >> >>>> [2]
>> >> >> >> >> >> >> >>>>
>> >> http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
>> >> >> >> >> >> >> >>>> [3]
>> >> >> >> >> >> >> >>>>
>> >> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >>
>> >> >> >>
>> >> >> >>
>> >>
>> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
>> >> >> >> >> >> >> >>>>
>> >> >> >> >> >> >> >>>>
>> >> >> >> >> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca
>> >> >> >> >> >> >> >>>> <cr...@gmail.com> wrote:
>> >> >> >> >> >> >> >>>> > There are several dbpedia categories for each
>> entity,
>> >> >> >> >> >> >> >>>> > in
>> >> >> >> this
>> >> >> >> >> >> case
>> >> >> >> >> >> >> for
>> >> >> >> >> >> >> >>>> > Microsoft we have :
>> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
>> >> >> >> >> >> >> >>>> > category:Microsoft
>> >> >> >> >> >> >> >>>> > category:Software_companies_of_the_United_States
>> >> >> >> >> >> >> >>>> >
>> >> category:Software_companies_based_in_Washington_(state)
>> >> >> >> >> >> >> >>>> > category:Companies_established_in_1975
>> >> >> >> >> >> >> >>>> > category:1975_establishments_in_the_United_States
>> >> >> >> >> >> >> >>>> > category:Companies_based_in_Redmond,_Washington
>> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> category:Multinational_companies_headquartered_in_the_United_States
>> >> >> >> >> >> >> >>>> > category:Cloud_computing_providers
>> >> >> >> >> >> >> >>>> >
>> >> category:Companies_in_the_Dow_Jones_Industrial_Average
>> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >> >>>> > So we also have "Companies based in
>> >> Redmont,Washington"
>> >> >> >> which
>> >> >> >> >> >> could
>> >> >> >> >> >> >> be
>> >> >> >> >> >> >> >>>> > matched.
>> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >> >>>> > There is still other contextual information from
>> >> >> >> >> >> >> >>>> > dbpedia
>> >> >> >> which
>> >> >> >> >> >> can
>> >> >> >> >> >> >> be
>> >> >> >> >> >> >> >>>> used.
>> >> >> >> >> >> >> >>>> > For example for an Organization we could also
>> >> include :
>> >> >> >> >> >> >> >>>> > dbpprop:industry = Software
>> >> >> >> >> >> >> >>>> > dbpprop:service = Online Service Providers
>> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >> >>>> > and for a Person (that's for Barack Obama) :
>> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >> >>>> > dbpedia-owl:profession:
>> >> >> >> >> >> >> >>>> >                                dbpedia:Author
>> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >> >>>> > dbpedia:Constitutional_law
>> >> >> >> >> >> >> >>>> >                                dbpedia:Lawyer
>> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >> >>>> > dbpedia:Community_organizing
>> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >> >>>> > I'd like to continue investigating this as I think
>> >> that
>> >> >> >> >> >> >> >>>> > it
>> >> >> >> may
>> >> >> >> >> >> have
>> >> >> >> >> >> >> >>>> some
>> >> >> >> >> >> >> >>>> > value in increasing the number of coreference
>> >> >> >> >> >> >> >>>> > resolutions
>> >> >> >> and
>> >> >> >> >> I'd
>> >> >> >> >> >> >> like
>> >> >> >> >> >> >> >>>> to
>> >> >> >> >> >> >> >>>> > concentrate more on precision rather than recall
>> >> since
>> >> >> >> >> >> >> >>>> > we
>> >> >> >> >> already
>> >> >> >> >> >> >> have
>> >> >> >> >> >> >> >>>> a
>> >> >> >> >> >> >> >>>> > set of coreferences detected by the stanford nlp
>> tool
>> >> >> >> >> >> >> >>>> > and
>> >> >> >> this
>> >> >> >> >> >> would
>> >> >> >> >> >> >> >>>> be as
>> >> >> >> >> >> >> >>>> > an addition to that (at least this is how I would
>> >> like
>> >> >> >> >> >> >> >>>> > to
>> >> >> >> use
>> >> >> >> >> >> it).
>> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >> >>>> > Is it ok if I track this by opening a jira? I
>> could
>> >> >> >> >> >> >> >>>> > update
>> >> >> >> it
>> >> >> >> >> to
>> >> >> >> >> >> >> show
>> >> >> >> >> >> >> >>>> my
>> >> >> >> >> >> >> >>>> > progress and also my conclusions and if it turns
>> out
>> >> >> >> >> >> >> >>>> > that
>> >> >> >> it
>> >> >> >> >> was
>> >> >> >> >> >> a
>> >> >> >> >> >> >> bad
>> >> >> >> >> >> >> >>>> idea
>> >> >> >> >> >> >> >>>> > then that's the situation at least I'll end up
>> with
>> >> >> >> >> >> >> >>>> > more
>> >> >> >> >> >> knowledge
>> >> >> >> >> >> >> >>>> about
>> >> >> >> >> >> >> >>>> > Stanbol in the end :).
>> >> >> >> >> >> >> >>>> >
>> >> >> >> >> >> >> >>>> >



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Cristian Petroaca <cr...@gmail.com>.
I did a clean on the whole project and now I wanted to do another "mvn
clean install", but I am getting this:

"[INFO]
------------------------------------------------------------------------
[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-antrun-plugin:1.6:
run (download) on project org.apache.stanbol.data.opennlp.lang.es: An Ant
BuildE
xception has occured: The following error occurred while executing this
line:
[ERROR]
C:\Data\Projects\Stanbol\main\data\opennlp\lang\es\download_models.xml:3
3: Failed to copy
https://github.com/utcompling/OpenNLP-Models/raw/58ef0c6003140
3e66e47ae35edaf58d3478b67af/models/es/opennlp-es-maxent-pos-es.bin to
C:\Data\Pr
ojects\Stanbol\main\data\opennlp\lang\es\downloads\resources\org\apache\stanbol\
data\opennlp\es-pos-maxent.bin due to javax.net.ssl.SSLProtocolException
handshake alert : unrecognized_name"
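
(Note: the "unrecognized_name" handshake alert is typically the JDK 7 SNI
issue with the server hosting the model files. Assuming that is the cause
here, a workaround that is often suggested is to disable the SNI extension
for the Maven JVM before the build, e.g. on Windows:

    set MAVEN_OPTS=-Djsse.enableSNIExtension=false
    mvn clean install

This is a possible workaround only, not a verified fix for this particular
build.)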



2014-03-20 11:25 GMT+02:00 Rupert Westenthaler <rupert.westenthaler@gmail.com>:

> Hi Cristian,
>
> On Thu, Mar 20, 2014 at 10:00 AM, Cristian Petroaca
> <cr...@gmail.com> wrote:
> >
> stanbol.enhancer.chain.weighted.chain=["tika;optional","langdetect","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-ner","dbpediaLinking","entityhubExtraction","dbpedia-dereference","pos-chunker"]
> > service.ranking=I"-2147483648"
> > stanbol.enhancer.chain.name="default"
>
> That does look fine to me. Do you see any exception during the startup
> of the launcher? Can you check the status of this component in the
> Components tab of the Felix web console [1] (search for
> "org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain")? If
> there are multiple instances you can find the correct one by comparing
> the "Properties" with those in the configuration file.
>
> I guess that the corresponding service is in the 'unsatisfied' state, as
> you do not see it in the web interface. If this is the case you should
> also see the corresponding exception in the log. You can also manually
> stop/start the component; in that case the exception should be re-thrown
> and you do not need to search the log for it.
>
> best
> Rupert
>
>
> [1] http://localhost:8080/system/console/components
>

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Cristian,

On Thu, Mar 20, 2014 at 10:00 AM, Cristian Petroaca
<cr...@gmail.com> wrote:
> stanbol.enhancer.chain.weighted.chain=["tika;optional","langdetect","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-ner","dbpediaLinking","entityhubExtraction","dbpedia-dereference","pos-chunker"]
> service.ranking=I"-2147483648"
> stanbol.enhancer.chain.name="default"

That does look fine to me. Do you see any exception during the startup of
the launcher? Can you check the status of this component in the Components
tab of the Felix web console [1] (search for
"org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain")? If there
are multiple instances you can find the correct one by comparing the
"Properties" with those in the configuration file.

I guess that the corresponding service is in the 'unsatisfied' state, as
you do not see it in the web interface. If this is the case you should
also see the corresponding exception in the log. You can also manually
stop/start the component; in that case the exception should be re-thrown
and you do not need to search the log for it.

best
Rupert


[1] http://localhost:8080/system/console/components
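
As a side note: such a chain configuration can also be kept as a plain text
config file in the 'stanbol/fileinstall' folder (see the production-mode
docs), so it does not have to be re-entered after a software update. A
minimal sketch, assuming the WeightedChain PID above and an illustrative
file name such as
org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain-default.config:

    stanbol.enhancer.chain.name="default"
    stanbol.enhancer.chain.weighted.chain=["tika;optional","langdetect","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-ner","dbpediaLinking","entityhubExtraction","dbpedia-dereference","pos-chunker"]
    service.ranking=I"-2147483648"

The exact file naming convention should be checked against the fileinstall
documentation.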

-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Cristian Petroaca <cr...@gmail.com>.
stanbol.enhancer.chain.weighted.chain=["tika;optional","langdetect","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-ner","dbpediaLinking","entityhubExtraction","dbpedia-dereference","pos-chunker"]
service.ranking=I"-2147483648"
stanbol.enhancer.chain.name="default"
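
As a quick sanity check after a restart, something along these lines could
be used to see whether the chain still answers. This is just a sketch: it
assumes the default launcher on localhost:8080 and that the enhancer accepts
plain text POSTed to /enhancer/chain/{chain-name}; the ChainCheck class name
and the sample sentence are made up for illustration.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ChainCheck {
    public static void main(String[] args) throws Exception {
        // POST a small text sample to the chain endpoint and print the
        // HTTP status code. Chain name and port are assumptions.
        URL url = new URL("http://localhost:8080/enhancer/chain/default");
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "text/plain; charset=UTF-8");
        con.setRequestProperty("Accept", "application/rdf+xml");
        byte[] body = "Angela Merkel visited China."
                .getBytes(StandardCharsets.UTF_8);
        try (OutputStream out = con.getOutputStream()) {
            out.write(body);
        }
        // A 200 means the chain answered with enhancement results; a 404
        // would suggest the chain with that name is no longer active.
        System.out.println("HTTP " + con.getResponseCode());
    }
}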



2014-03-20 7:39 GMT+02:00 Rupert Westenthaler <rupert.westenthaler@gmail.com
>:

> Hi Cristian,
>
> you can not send attachments to the list. Please copy the contents
> directly to the mail
>
> thx
> Rupert
>
> On Wed, Mar 19, 2014 at 9:20 PM, Cristian Petroaca
> <cr...@gmail.com> wrote:
> > The config attached.
> >
> >
> > 2014-03-19 9:09 GMT+02:00 Rupert Westenthaler
> > <ru...@gmail.com>:
> >
> >> Hi Cristian,
> >>
> >> can you provide the contents of the chain after your modifications?
> >> Would be interesting to test why the chain is no longer active after
> >> the restart.
> >>
> >> You can find the config file in the 'stanbol/fileinstall' folder.
> >>
> >> best
> >> Rupert
> >>
> >> On Tue, Mar 18, 2014 at 8:24 PM, Cristian Petroaca
> >> <cr...@gmail.com> wrote:
> >> > Related to the default chain selection rules : before restart I had a
> >> > chain
> >> > with the name 'default' as in I could access it via
> >> > enhancer/chain/default.
> >> > Then I just added another engine to the 'default' chain. I assumed
> that
> >> > after the restart the chain with the 'default' name would be
> persisted.
> >> > So
> >> > the first rule should have been applied after the restart as well. But
> >> > instead I cannot reach it via enhancer/chain/default anymore so its
> >> > gone.
> >> > Anyway, this is not a big deal, it's not blocking me in any way, I
> just
> >> > wanted to understand where the problem is.
> >> >
> >> >
> >> > 2014-03-18 7:15 GMT+02:00 Rupert Westenthaler
> >> > <rupert.westenthaler@gmail.com
> >> >>:
> >> >
> >> >> Hi Cristian
> >> >>
> >> >> On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca
> >> >> <cr...@gmail.com> wrote:
> >> >> > 1. Updated to the latest code and it's gone. Cool
> >> >> >
> >> >> > 2. I start the stable launcher -> create a new instance of the
> >> >> > PosChunkerEngine -> add it to the default chain. At this point
> >> >> > everything
> >> >> > looks good and works ok.
> >> >> > After I restart the server the default chain is gone and instead I
> >> >> > see
> >> >> this
> >> >> > in the enhancement chains page : all-active (default, id: 149,
> >> >> > ranking:
> >> >> 0,
> >> >> > impl: AllActiveEnginesChain ). all-active did not contain the
> >> >> > 'default'
> >> >> > word before the restart.
> >> >> >
> >> >>
> >> >> Please note the default chain selection rules as described at [1].
> You
> >> >> can also access chains chains under '/enhancer/chain/{chain-name}'
> >> >>
> >> >> best
> >> >> Rupert
> >> >>
> >> >> [1]
> >> >>
> >> >>
> http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain
> >> >>
> >> >> > It looks like the config files are exactly what I need. Thanks.
> >> >> >
> >> >> >
> >> >> > 2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <
> >> >> rupert.westenthaler@gmail.com
> >> >> >>:
> >> >> >
> >> >> >> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
> >> >> >> <cr...@gmail.com> wrote:
> >> >> >> > Thanks Rupert.
> >> >> >> >
> >> >> >> > A couple more questions/issues :
> >> >> >> >
> >> >> >> > 1. Whenever I start the stanbol server I'm seeing this in the
> >> >> >> > console
> >> >> >> > output :
> >> >> >> >
> >> >> >>
> >> >> >> This should be fixed with STANBOL-1278 [1] [2]
> >> >> >>
> >> >> >> > 2. Whenever I restart the server the Weighted Chains get messed
> >> >> >> > up. I
> >> >> >> > usually use the 'default' chain and add my engine to it so there
> >> >> >> > are
> >> >> 11
> >> >> >> > engines in it. After the restart this chain now contains around
> 23
> >> >> >> engines
> >> >> >> > in total.
> >> >> >>
> >> >> >> I was not able to replicate this. What I tried was
> >> >> >>
> >> >> >> (1) start up the stable launcher
> >> >> >> (2) add an additional engine to the default chain
> >> >> >> (3) restart the launcher
> >> >> >>
> >> >> >> The default chain was not changed after (2) and (3). So I would
> need
> >> >> >> further information for knowing why this is happening.
> >> >> >>
> >> >> >> Generally it is better to create you own chain instance as
> modifying
> >> >> >> one that is provided by the default configuration. I would also
> >> >> >> recommend that you keep your test configuration in text files and
> to
> >> >> >> copy those to the 'stanbol/fileinstall' folder. Doing so prevent
> you
> >> >> >> from manually entering the configuration after a software update.
> >> >> >> The
> >> >> >> production-mode section [3] provides information on how to do
> that.
> >> >> >>
> >> >> >> best
> >> >> >> Rupert
> >> >> >>
> >> >> >> [1] https://issues.apache.org/jira/browse/STANBOL-1278
> >> >> >> [2] http://svn.apache.org/r1576623
> >> >> >> [3] http://stanbol.apache.org/docs/trunk/production-mode
> >> >> >>
> >> >> >> > ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web
> [153]:
> >> >> Error
> >> >> >> > starting
> >> >> >> >
> >> >> >>
> >> >>
> >> >>
> slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\star
> >> >> >> >
> >> >> >> >
> tup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar
> >> >> >> > (org.osgi
> >> >> >> > .framework.BundleException: Unresolved constraint in bundle
> >> >> >> > org.apache.stanbol.e
> >> >> >> > nhancer.engine.topic.web [153]: Unable to resolve 153.0: missing
> >> >> >> > requirement [15
> >> >> >> > 3.0] package; (&(package=javax.ws.rs
> >> >> >> )(version>=0.0.0)(!(version>=2.0.0))))
> >> >> >> > org.osgi.framework.BundleException: Unresolved constraint in
> >> >> >> > bundle
> >> >> >> > org.apache.s
> >> >> >> > tanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0:
> >> >> missing
> >> >> >> > require
> >> >> >> > ment [153.0] package; (&(package=javax.ws.rs
> >> >> >> > )(version>=0.0.0)(!(version>=2.0.0))
> >> >> >> > )
> >> >> >> >         at
> >> >> >> org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
> >> >> >> >         at
> >> >> org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
> >> >> >> >         at
> >> >> >> >
> >> >> >> >
> org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
> >> >> >> >
> >> >> >> >         at
> >> >> >> >
> >> >> >> >
> org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264
> >> >> >> > )
> >> >> >> >         at java.lang.Thread.run(Unknown Source)
> >> >> >> >
> >> >> >> > Despite of this the server starts fine and I can use the
> enhancer
> >> >> fine.
> >> >> >> Do
> >> >> >> > you guys see this as well?
> >> >> >> >
> >> >> >> >
> >> >> >> > 2. Whenever I restart the server the Weighted Chains get messed
> >> >> >> > up. I
> >> >> >> > usually use the 'default' chain and add my engine to it so there
> >> >> >> > are
> >> >> 11
> >> >> >> > engines in it. After the restart this chain now contains around
> 23
> >> >> >> engines
> >> >> >> > in total.
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <
> >> >> >> rupert.westenthaler@gmail.com
> >> >> >> >>:
> >> >> >> >
> >> >> >> >> Hi Cristian,
> >> >> >> >>
> >> >> >> >> NER Annotations are typically available as both
> >> >> >> >> NlpAnnotations.NER_ANNOTATION and  fise:TextAnnotation [1] in
> the
> >> >> >> >> enhancement metadata. As you are already accessing the
> >> >> >> >> AnayzedText I
> >> >> >> >> would prefer using the  NlpAnnotations.NER_ANNOTATION.
> >> >> >> >>
> >> >> >> >> best
> >> >> >> >> Rupert
> >> >> >> >>
> >> >> >> >> [1]
> >> >> >> >>
> >> >> >>
> >> >>
> >> >>
> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
> >> >> >> >>
> >> >> >> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
> >> >> >> >> <cr...@gmail.com> wrote:
> >> >> >> >> > Thanks.
> >> >> >> >> > I assume I should get the Named entities using the same but
> >> >> >> >> > with
> >> >> >> >> > NlpAnnotations.NER_ANNOTATION?
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
> >> >> >> >> > rupert.westenthaler@gmail.com>:
> >> >> >> >> >
> >> >> >> >> >> Hallo Cristian,
> >> >> >> >> >>
> >> >> >> >> >> NounPhrases are not added to the RDF enhancement results.
> You
> >> >> need to
> >> >> >> >> >> use the AnalyzedText ContentPart [1]
> >> >> >> >> >>
> >> >> >> >> >> here is some demo code you can use in the computeEnhancement
> >> >> method
> >> >> >> >> >>
> >> >> >> >> >>         AnalysedText at =
> >> >> >> >> >> NlpEngineHelper.getAnalysedText(this,
> >> >> ci,
> >> >> >> >> true);
> >> >> >> >> >>         Iterator<? extends Section> sections =
> >> >> >> >> >> at.getSentences();
> >> >> >> >> >>         if(!sections.hasNext()){ //process as single
> sentence
> >> >> >> >> >>             sections = Collections.singleton(at).iterator();
> >> >> >> >> >>         }
> >> >> >> >> >>
> >> >> >> >> >>         while(sections.hasNext()){
> >> >> >> >> >>             Section section = sections.next();
> >> >> >> >> >>             Iterator<Span> chunks =
> >> >> >> >> >> section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
> >> >> >> >> >>             while(chunks.hasNext()){
> >> >> >> >> >>                 Span chunk = chunks.next();
> >> >> >> >> >>                 Value<PhraseTag> phrase =
> >> >> >> >> >> chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
> >> >> >> >> >>                 if(phrase.value().getCategory() ==
> >> >> >> >> LexicalCategory.Noun){
> >> >> >> >> >>                     log.info(" - NounPhrase [{},{}] {}",
> new
> >> >> >> Object[]{
> >> >> >> >> >>
> >> >> >> >> >> chunk.getStart(),chunk.getEnd(),chunk.getSpan()});
> >> >> >> >> >>                 }
> >> >> >> >> >>             }
> >> >> >> >> >>         }
> >> >> >> >> >>
> >> >> >> >> >> hope this helps
> >> >> >> >> >>
> >> >> >> >> >> best
> >> >> >> >> >> Rupert
> >> >> >> >> >>
> >> >> >> >> >> [1]
> >> >> >> >> >>
> >> >> >> >>
> >> >> >>
> >> >>
> >> >>
> http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
> >> >> >> >> >>
> >> >> >> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
> >> >> >> >> >> <cr...@gmail.com> wrote:
> >> >> >> >> >> > I started to implement the engine and I'm having problems
> >> >> >> >> >> > with
> >> >> >> getting
> >> >> >> >> >> > results for noun phrases. I modified the "default"
> weighted
> >> >> chain
> >> >> >> to
> >> >> >> >> also
> >> >> >> >> >> > include the PosChunkerEngine and ran a sample text :
> "Angela
> >> >> Merkel
> >> >> >> >> >> visted
> >> >> >> >> >> > China. The german chancellor met with various people". I
> >> >> expected
> >> >> >> that
> >> >> >> >> >> the
> >> >> >> >> >> > RDF XML output would contain some info about the noun
> >> >> >> >> >> > phrases
> >> >> but I
> >> >> >> >> >> cannot
> >> >> >> >> >> > see any.
> >> >> >> >> >> > Could you point me to the correct way to generate the noun
> >> >> phrases?
> >> >> >> >> >> >
> >> >> >> >> >> > Thanks,
> >> >> >> >> >> > Cristian
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <
> >> >> >> >> >> cristian.petroaca@gmail.com>:
> >> >> >> >> >> >
> >> >> >> >> >> >> Opened
> https://issues.apache.org/jira/browse/STANBOL-1279
> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <
> >> >> >> >> >> cristian.petroaca@gmail.com>
> >> >> >> >> >> >> :
> >> >> >> >> >> >>
> >> >> >> >> >> >> Hi Rupert,
> >> >> >> >> >> >>>
> >> >> >> >> >> >>> The "spatial" dimension is a good idea. I'll also take a
> >> >> >> >> >> >>> look
> >> >> at
> >> >> >> >> Yago.
> >> >> >> >> >> >>>
> >> >> >> >> >> >>> I will create a Jira with what we talked about here. It
> >> >> >> >> >> >>> will
> >> >> >> >> probably
> >> >> >> >> >> >>> have just a draft-like description for now and will be
> >> >> >> >> >> >>> updated
> >> >> >> as I
> >> >> >> >> go
> >> >> >> >> >> >>> along.
> >> >> >> >> >> >>>
> >> >> >> >> >> >>> Thanks,
> >> >> >> >> >> >>> Cristian
> >> >> >> >> >> >>>
> >> >> >> >> >> >>>
> >> >> >> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <
> >> >> >> >> >> >>> rupert.westenthaler@gmail.com>:
> >> >> >> >> >> >>>
> >> >> >> >> >> >>> Hi Cristian,
> >> >> >> >> >> >>>>
> >> >> >> >> >> >>>> definitely an interesting approach. You should have a
> >> >> >> >> >> >>>> look at
> >> >> >> Yago2
> >> >> >> >> >> >>>> [1]. As far as I can remember the Yago taxonomy is much
> >> >> better
> >> >> >> >> >> >>>> structured as the one used by dbpedia. Mapping
> >> >> >> >> >> >>>> suggestions of
> >> >> >> >> dbpedia
> >> >> >> >> >> >>>> to concepts in Yago2 is easy as both dbpedia and yago2
> do
> >> >> >> provide
> >> >> >> >> >> >>>> mappings [2] and [3]
> >> >> >> >> >> >>>>
> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
> >> >> >> >> >> >>>> > <rh...@apache.org>:
> >> >> >> >> >> >>>> >>
> >> >> >> >> >> >>>> >> "Microsoft posted its 2013 earnings. The Redmond's
> >> >> >> >> >> >>>> >> company
> >> >> >> made
> >> >> >> >> a
> >> >> >> >> >> >>>> >> huge profit".
> >> >> >> >> >> >>>>
> >> >> >> >> >> >>>> Thats actually a very good example. Spatial contexts
> are
> >> >> >> >> >> >>>> very
> >> >> >> >> >> >>>> important as they tend to be often used for
> referencing.
> >> >> >> >> >> >>>> So I
> >> >> >> would
> >> >> >> >> >> >>>> suggest to specially treat the spatial context. For
> >> >> >> >> >> >>>> spatial
> >> >> >> >> Entities
> >> >> >> >> >> >>>> (like a City) this is easy, but even for other (like a
> >> >> Person,
> >> >> >> >> >> >>>> Company) you could use relations to spatial entities
> >> >> >> >> >> >>>> define
> >> >> >> their
> >> >> >> >> >> >>>> spatial context. This context could than be used to
> >> >> >> >> >> >>>> correctly
> >> >> >> link
> >> >> >> >> >> >>>> "The Redmond's company" to "Microsoft".
> >> >> >> >> >> >>>>
> >> >> >> >> >> >>>> In addition I would suggest to use the "spatial"
> context
> >> >> >> >> >> >>>> of
> >> >> each
> >> >> >> >> >> >>>> entity (basically relation to entities that are cities,
> >> >> regions,
> >> >> >> >> >> >>>> countries) as a separate dimension, because those are
> >> >> >> >> >> >>>> very
> >> >> often
> >> >> >> >> used
> >> >> >> >> >> >>>> for coreferences.
> >> >> >> >> >> >>>>
> >> >> >> >> >> >>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
> >> >> >> >> >> >>>> [2]
> >> >> >> >> >> >>>>
> http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
> >> >> >> >> >> >>>> [3]
> >> >> >> >> >> >>>>
> >> >> >> >> >>
> >> >> >> >>
> >> >> >>
> >> >>
> >> >>
> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
> >> >> >> >> >> >>>>
> >> >> >> >> >> >>>>
> >> >> >> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca
> >> >> >> >> >> >>>> <cr...@gmail.com> wrote:
> >> >> >> >> >> >>>> > There are several dbpedia categories for each entity,
> >> >> >> >> >> >>>> > in
> >> >> this
> >> >> >> >> case
> >> >> >> >> >> for
> >> >> >> >> >> >>>> > Microsoft we have :
> >> >> >> >> >> >>>> >
> >> >> >> >> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
> >> >> >> >> >> >>>> > category:Microsoft
> >> >> >> >> >> >>>> > category:Software_companies_of_the_United_States
> >> >> >> >> >> >>>> >
> category:Software_companies_based_in_Washington_(state)
> >> >> >> >> >> >>>> > category:Companies_established_in_1975
> >> >> >> >> >> >>>> > category:1975_establishments_in_the_United_States
> >> >> >> >> >> >>>> > category:Companies_based_in_Redmond,_Washington
> >> >> >> >> >> >>>> >
> >> >> >> >>
> >> >> >> >>
> category:Multinational_companies_headquartered_in_the_United_States
> >> >> >> >> >> >>>> > category:Cloud_computing_providers
> >> >> >> >> >> >>>> >
> category:Companies_in_the_Dow_Jones_Industrial_Average
> >> >> >> >> >> >>>> >
> >> >> >> >> >> >>>> > So we also have "Companies based in
> Redmont,Washington"
> >> >> which
> >> >> >> >> could
> >> >> >> >> >> be
> >> >> >> >> >> >>>> > matched.
> >> >> >> >> >> >>>> >
> >> >> >> >> >> >>>> >
> >> >> >> >> >> >>>> > There is still other contextual information from
> >> >> >> >> >> >>>> > dbpedia
> >> >> which
> >> >> >> >> can
> >> >> >> >> >> be
> >> >> >> >> >> >>>> used.
> >> >> >> >> >> >>>> > For example for an Organization we could also
> include :
> >> >> >> >> >> >>>> > dbpprop:industry = Software
> >> >> >> >> >> >>>> > dbpprop:service = Online Service Providers
> >> >> >> >> >> >>>> >
> >> >> >> >> >> >>>> > and for a Person (that's for Barack Obama) :
> >> >> >> >> >> >>>> >
> >> >> >> >> >> >>>> > dbpedia-owl:profession:
> >> >> >> >> >> >>>> >                                dbpedia:Author
> >> >> >> >> >> >>>> >
> >> >> >> >> >> >>>> > dbpedia:Constitutional_law
> >> >> >> >> >> >>>> >                                dbpedia:Lawyer
> >> >> >> >> >> >>>> >
> >> >> >> >> >> >>>> > dbpedia:Community_organizing
> >> >> >> >> >> >>>> >
> >> >> >> >> >> >>>> > I'd like to continue investigating this as I think
> that
> >> >> >> >> >> >>>> > it
> >> >> may
> >> >> >> >> have
> >> >> >> >> >> >>>> some
> >> >> >> >> >> >>>> > value in increasing the number of coreference
> >> >> >> >> >> >>>> > resolutions
> >> >> and
> >> >> >> I'd
> >> >> >> >> >> like
> >> >> >> >> >> >>>> to
> >> >> >> >> >> >>>> > concentrate more on precision rather than recall
> since
> >> >> >> >> >> >>>> > we
> >> >> >> already
> >> >> >> >> >> have
> >> >> >> >> >> >>>> a
> >> >> >> >> >> >>>> > set of coreferences detected by the stanford nlp tool
> >> >> >> >> >> >>>> > and
> >> >> this
> >> >> >> >> would
> >> >> >> >> >> >>>> be as
> >> >> >> >> >> >>>> > an addition to that (at least this is how I would
> like
> >> >> >> >> >> >>>> > to
> >> >> use
> >> >> >> >> it).
> >> >> >> >> >> >>>> >
> >> >> >> >> >> >>>> > Is it ok if I track this by opening a jira? I could
> >> >> >> >> >> >>>> > update
> >> >> it
> >> >> >> to
> >> >> >> >> >> show
> >> >> >> >> >> >>>> my
> >> >> >> >> >> >>>> > progress and also my conclusions and if it turns out
> >> >> >> >> >> >>>> > that
> >> >> it
> >> >> >> was
> >> >> >> >> a
> >> >> >> >> >> bad
> >> >> >> >> >> >>>> idea
> >> >> >> >> >> >>>> > then that's the situation at least I'll end up with
> >> >> >> >> >> >>>> > more
> >> >> >> >> knowledge
> >> >> >> >> >> >>>> about
> >> >> >> >> >> >>>> > Stanbol in the end :).
> >> >> >> >> >> >>>> >
> >> >> >> >> >> >>>> >
> >> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro
> >> >> >> >> >> >>>> > <rh...@apache.org>:
> >> >> >> >> >> >>>> >
> >> >> >> >> >> >>>> >> Hi Cristian,
> >> >> >> >> >> >>>> >>
> >> >> >> >> >> >>>> >> The approach sounds nice. I don't want to be the
> >> >> >> >> >> >>>> >> devil's
> >> >> >> >> advocate
> >> >> >> >> >> but
> >> >> >> >> >> >>>> I'm
> >> >> >> >> >> >>>> >> just not sure about the recall using the dbpedia
> >> >> categories
> >> >> >> >> >> feature.
> >> >> >> >> >> >>>> For
> >> >> >> >> >> >>>> >> example, your sentence could be also "Microsoft
> posted
> >> >> >> >> >> >>>> >> its
> >> >> >> 2013
> >> >> >> >> >> >>>> earnings.
> >> >> >> >> >> >>>> >> The Redmond's company made a huge profit". So, maybe
> >> >> >> including
> >> >> >> >> more
> >> >> >> >> >> >>>> >> contextual information from dbpedia could increase
> the
> >> >> recall
> >> >> >> >> but
> >> >> >> >> >> of
> >> >> >> >> >> >>>> course
> >> >> >> >> >> >>>> >> will reduce the precision.
> >> >> >> >> >> >>>> >>
> >> >> >> >> >> >>>> >> Cheers,
> >> >> >> >> >> >>>> >> Rafa
> >> >> >> >> >> >>>> >>
> >> >> >> >> >> >>>> >> El 04/02/14 09:50, Cristian Petroaca escribió:
> >> >> >> >> >> >>>> >>
> >> >> >> >> >> >>>> >>  Back with a more detailed description of the steps
> >> >> >> >> >> >>>> >> for
> >> >> >> making
> >> >> >> >> this
> >> >> >> >> >> >>>> kind of
> >> >> >> >> >> >>>> >>> coreference work.
> >> >> >> >> >> >>>> >>>
> >> >> >> >> >> >>>> >>> I will be using references to the following text in
> >> >> >> >> >> >>>> >>> the
> >> >> >> steps
> >> >> >> >> >> below
> >> >> >> >> >> >>>> in
> >> >> >> >> >> >>>> >>> order to make things clearer : "Microsoft posted
> its
> >> >> >> >> >> >>>> >>> 2013
> >> >> >> >> >> earnings.
> >> >> >> >> >> >>>> The
> >> >> >> >> >> >>>> >>> software company made a huge profit."
> >> >> >> >> >> >>>> >>>
> >> >> >> >> >> >>>> >>> 1. For every noun phrase in the text which has :
> >> >> >> >> >> >>>> >>>      a. a determinate pos which implies reference
> to
> >> >> >> >> >> >>>> >>> an
> >> >> >> entity
> >> >> >> >> >> local
> >> >> >> >> >> >>>> to
> >> >> >> >> >> >>>> >>> the
> >> >> >> >> >> >>>> >>> text, such as "the, this, these") but not "another,
> >> >> every",
> >> >> >> etc
> >> >> >> >> >> which
> >> >> >> >> >> >>>> >>> implies a reference to an entity outside of the
> text.
> >> >> >> >> >> >>>> >>>      b. having at least another noun aside from the
> >> >> >> >> >> >>>> >>> main
> >> >> >> >> required
> >> >> >> >> >> >>>> noun
> >> >> >> >> >> >>>> >>> which
> >> >> >> >> >> >>>> >>> further describes it. For example I will not count
> >> >> >> >> >> >>>> >>> "The
> >> >> >> >> company"
> >> >> >> >> >> as
> >> >> >> >> >> >>>> being
> >> >> >> >> >> >>>> >>> a

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Cristian,

you cannot send attachments to the list. Please copy the contents
directly into the mail.

thx
Rupert

On Wed, Mar 19, 2014 at 9:20 PM, Cristian Petroaca
<cr...@gmail.com> wrote:
> The config attached.

-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Cristian Petroaca <cr...@gmail.com>.
The config attached.


2014-03-19 9:09 GMT+02:00 Rupert Westenthaler <rupert.westenthaler@gmail.com
>:

> Hi Cristian,
>
> can you provide the contents of the chain after your modifications?
> Would be interesting to test why the chain is no longer active after
> the restart.
>
> You can find the config file in the 'stanbol/fileinstall' folder.
>
> best
> Rupert
>
> On Tue, Mar 18, 2014 at 8:24 PM, Cristian Petroaca
> <cr...@gmail.com> wrote:
> > Related to the default chain selection rules : before restart I had a chain
> > with the name 'default' as in I could access it via enhancer/chain/default.
> > Then I just added another engine to the 'default' chain. I assumed that
> > after the restart the chain with the 'default' name would be persisted. So
> > the first rule should have been applied after the restart as well. But
> > instead I cannot reach it via enhancer/chain/default anymore, so it's gone.
> > Anyway, this is not a big deal, it's not blocking me in any way, I just
> > wanted to understand where the problem is.
> >
> >
> > 2014-03-18 7:15 GMT+02:00 Rupert Westenthaler <
> rupert.westenthaler@gmail.com
> >>:
> >
> >> Hi Cristian
> >>
> >> On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca
> >> <cr...@gmail.com> wrote:
> >> > 1. Updated to the latest code and it's gone. Cool
> >> >
> >> > 2. I start the stable launcher -> create a new instance of the
> >> > PosChunkerEngine -> add it to the default chain. At this point everything
> >> > looks good and works ok.
> >> > After I restart the server the default chain is gone and instead I see this
> >> > in the enhancement chains page : all-active (default, id: 149, ranking: 0,
> >> > impl: AllActiveEnginesChain ). all-active did not contain the 'default'
> >> > word before the restart.
> >> >
> >>
> >> Please note the default chain selection rules as described at [1]. You
> >> can also access chains under '/enhancer/chain/{chain-name}'
> >>
> >> best
> >> Rupert
> >>
> >> [1]
> >>
> http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain
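For illustration, a minimal client-side sketch of posting text to such a chain endpoint; the localhost:8080 base URL, the "default" chain name and the RDF/XML Accept header are assumptions for a local stable launcher rather than details confirmed in this thread:

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.Scanner;

    public class ChainClient {
        public static void main(String[] args) throws Exception {
            // assumption: local stable launcher on port 8080 and a chain named "default"
            URL url = new URL("http://localhost:8080/enhancer/chain/default");
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setRequestMethod("POST");
            con.setDoOutput(true);
            con.setRequestProperty("Content-Type", "text/plain; charset=UTF-8");
            con.setRequestProperty("Accept", "application/rdf+xml");
            byte[] body = "Microsoft posted its 2013 earnings. The software company made a huge profit."
                    .getBytes(StandardCharsets.UTF_8);
            try (OutputStream out = con.getOutputStream()) {
                out.write(body);
            }
            // print the enhancement results returned by the chain
            try (Scanner in = new Scanner(con.getInputStream(), "UTF-8")) {
                while (in.hasNextLine()) {
                    System.out.println(in.nextLine());
                }
            }
        }
    }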
> >>
> >> > It looks like the config files are exactly what I need. Thanks.
> >> >
> >> >
> >> > 2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <
> >> rupert.westenthaler@gmail.com
> >> >>:
> >> >
> >> >> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
> >> >> <cr...@gmail.com> wrote:
> >> >> > Thanks Rupert.
> >> >> >
> >> >> > A couple more questions/issues :
> >> >> >
> >> >> > 1. Whenever I start the stanbol server I'm seeing this in the
> console
> >> >> > output :
> >> >> >
> >> >>
> >> >> This should be fixed with STANBOL-1278 [1] [2]
> >> >>
> >> >> > 2. Whenever I restart the server the Weighted Chains get messed up. I
> >> >> > usually use the 'default' chain and add my engine to it so there are 11
> >> >> > engines in it. After the restart this chain now contains around 23
> >> >> > engines in total.
> >> >>
> >> >> I was not able to replicate this. What I tried was
> >> >>
> >> >> (1) start up the stable launcher
> >> >> (2) add an additional engine to the default chain
> >> >> (3) restart the launcher
> >> >>
> >> >> The default chain was not changed after (2) and (3). So I would need
> >> >> further information for knowing why this is happening.
> >> >>
> >> >> Generally it is better to create your own chain instance rather than modifying
> >> >> one that is provided by the default configuration. I would also
> >> >> recommend that you keep your test configuration in text files and to
> >> >> copy those to the 'stanbol/fileinstall' folder. Doing so prevents you
> >> >> from manually entering the configuration after a software update. The
> >> >> production-mode section [3] provides information on how to do that.
> >> >>
> >> >> best
> >> >> Rupert
> >> >>
> >> >> [1] https://issues.apache.org/jira/browse/STANBOL-1278
> >> >> [2] http://svn.apache.org/r1576623
> >> >> [3] http://stanbol.apache.org/docs/trunk/production-mode
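For illustration, a sketch of what such a dropped-in chain configuration could look like. The file name (something like org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain-coref.config placed in 'stanbol/fileinstall'), the property keys and the engine names below are assumptions recalled from the chain documentation, not values from this thread, and should be verified against the production-mode guide [3] before use:

    stanbol.enhancer.chain.name="coref-test"
    stanbol.enhancer.chain.weighted.chain=["langdetect","opennlp-sentence","opennlp-token","opennlp-pos","pos-chunker","coref-engine"]

Here "coref-engine" is purely a placeholder for the coreference engine discussed in this thread.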
> >> >>
> >> >> > ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Error starting
> >> >> > slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\startup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar
> >> >> > (org.osgi.framework.BundleException: Unresolved constraint in bundle
> >> >> > org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing
> >> >> > requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0))))
> >> >> > org.osgi.framework.BundleException: Unresolved constraint in bundle
> >> >> > org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing
> >> >> > requirement [153.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0)))
> >> >> >         at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
> >> >> >         at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
> >> >> >         at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
> >> >> >         at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264)
> >> >> >         at java.lang.Thread.run(Unknown Source)
> >> >> >
> >> >> > Despite this, the server starts fine and I can use the enhancer fine.
> >> >> > Do you guys see this as well?
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <
> >> >> rupert.westenthaler@gmail.com
> >> >> >>:
> >> >> >
> >> >> >> Hi Cristian,
> >> >> >>
> >> >> >> NER Annotations are typically available as both
> >> >> >> NlpAnnotations.NER_ANNOTATION and fise:TextAnnotation [1] in the
> >> >> >> enhancement metadata. As you are already accessing the AnalyzedText I
> >> >> >> would prefer using the NlpAnnotations.NER_ANNOTATION.
> >> >> >>
> >> >> >> best
> >> >> >> Rupert
> >> >> >>
> >> >> >> [1]
> >> >> >>
> >> >>
> >>
> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
> >> >> >>
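A small sketch of that NlpAnnotations.NER_ANNOTATION route, written in the same style as the chunk-iteration demo code quoted further down (it would live in the engine's computeEnhancement method); whether every named entity is exposed as a Chunk span and the exact NerTag accessors are assumptions to double-check against the NLP module API:

        // sketch: collect named-entity chunks from the AnalysedText content part
        AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
        Iterator<Span> spans = at.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
        while(spans.hasNext()){
            Span span = spans.next();
            Value<NerTag> ner = span.getAnnotation(NlpAnnotations.NER_ANNOTATION);
            if(ner != null){ // only chunks that carry a NER annotation
                log.info(" - NamedEntity [{},{}] {} (type: {})", new Object[]{
                        span.getStart(), span.getEnd(), span.getSpan(), ner.value().getType()});
            }
        }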
> >> >> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
> >> >> >> <cr...@gmail.com> wrote:
> >> >> >> > Thanks.
> >> >> >> > I assume I should get the Named entities using the same but with
> >> >> >> > NlpAnnotations.NER_ANNOTATION?
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
> >> >> >> > rupert.westenthaler@gmail.com>:
> >> >> >> >
> >> >> >> >> Hallo Cristian,
> >> >> >> >>
> >> >> >> >> NounPhrases are not added to the RDF enhancement results. You
> >> need to
> >> >> >> >> use the AnalyzedText ContentPart [1]
> >> >> >> >>
> >> >> >> >> here is some demo code you can use in the computeEnhancement
> >> method
> >> >> >> >>
> >> >> >> >>         // get the AnalysedText content part of the ContentItem
> >> >> >> >>         AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
> >> >> >> >>         Iterator<? extends Section> sections = at.getSentences();
> >> >> >> >>         if(!sections.hasNext()){ //process as single sentence
> >> >> >> >>             sections = Collections.singleton(at).iterator();
> >> >> >> >>         }
> >> >> >> >>
> >> >> >> >>         while(sections.hasNext()){
> >> >> >> >>             Section section = sections.next();
> >> >> >> >>             // iterate over the chunks (phrases) within the sentence
> >> >> >> >>             Iterator<Span> chunks = section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
> >> >> >> >>             while(chunks.hasNext()){
> >> >> >> >>                 Span chunk = chunks.next();
> >> >> >> >>                 Value<PhraseTag> phrase = chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
> >> >> >> >>                 // guard against chunks without a phrase annotation
> >> >> >> >>                 if(phrase != null && phrase.value().getCategory() == LexicalCategory.Noun){
> >> >> >> >>                     log.info(" - NounPhrase [{},{}] {}", new Object[]{
> >> >> >> >>                             chunk.getStart(),chunk.getEnd(),chunk.getSpan()});
> >> >> >> >>                 }
> >> >> >> >>             }
> >> >> >> >>         }
> >> >> >> >>
> >> >> >> >> hope this helps
> >> >> >> >>
> >> >> >> >> best
> >> >> >> >> Rupert
> >> >> >> >>
> >> >> >> >> [1]
> >> >> >> >>
> >> >> >>
> >> >>
> >>
> http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
> >> >> >> >>
> >> >> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
> >> >> >> >> <cr...@gmail.com> wrote:
> >> >> >> >> > I started to implement the engine and I'm having problems with getting
> >> >> >> >> > results for noun phrases. I modified the "default" weighted chain to also
> >> >> >> >> > include the PosChunkerEngine and ran a sample text : "Angela Merkel visited
> >> >> >> >> > China. The German chancellor met with various people". I expected that the
> >> >> >> >> > RDF XML output would contain some info about the noun phrases but I cannot
> >> >> >> >> > see any.
> >> >> >> >> > Could you point me to the correct way to generate the noun phrases?
> >> >> >> >> >
> >> >> >> >> > Thanks,
> >> >> >> >> > Cristian
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <
> >> >> >> >> cristian.petroaca@gmail.com>:
> >> >> >> >> >
> >> >> >> >> >> Opened https://issues.apache.org/jira/browse/STANBOL-1279
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <
> >> >> >> >> cristian.petroaca@gmail.com>
> >> >> >> >> >> :
> >> >> >> >> >>
> >> >> >> >> >> Hi Rupert,
> >> >> >> >> >>>
> >> >> >> >> >>> The "spatial" dimension is a good idea. I'll also take a
> look
> >> at
> >> >> >> Yago.
> >> >> >> >> >>>
> >> >> >> >> >>> I will create a Jira with what we talked about here. It
> will
> >> >> >> probably
> >> >> >> >> >>> have just a draft-like description for now and will be
> updated
> >> >> as I
> >> >> >> go
> >> >> >> >> >>> along.
> >> >> >> >> >>>
> >> >> >> >> >>> Thanks,
> >> >> >> >> >>> Cristian
> >> >> >> >> >>>
> >> >> >> >> >>>
> >> >> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <
> >> >> >> >> >>> rupert.westenthaler@gmail.com>:
> >> >> >> >> >>>
> >> >> >> >> >>> Hi Cristian,
> >> >> >> >> >>>>
> >> >> >> >>>> definitely an interesting approach. You should have a look at Yago2
> >> >> >> >>>> [1]. As far as I can remember the Yago taxonomy is much better
> >> >> >> >>>> structured than the one used by dbpedia. Mapping suggestions of dbpedia
> >> >> >> >>>> to concepts in Yago2 is easy as both dbpedia and yago2 do provide
> >> >> >> >>>> mappings [2] and [3]
> >> >> >> >> >>>>
> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rharo@apache.org
> >:
> >> >> >> >> >>>> >>
> >> >> >> >> >>>> >> "Microsoft posted its 2013 earnings. The Redmond's
> company
> >> >> made
> >> >> >> a
> >> >> >> >> >>>> >> huge profit".
> >> >> >> >> >>>>
> >> >> >> >>>> That's actually a very good example. Spatial contexts are very
> >> >> >> >>>> important as they tend to be often used for referencing. So I would
> >> >> >> >>>> suggest to specially treat the spatial context. For spatial Entities
> >> >> >> >>>> (like a City) this is easy, but even for others (like a Person,
> >> >> >> >>>> Company) you could use relations to spatial entities to define their
> >> >> >> >>>> spatial context. This context could then be used to correctly link
> >> >> >> >>>> "The Redmond's company" to "Microsoft".
> >> >> >> >> >>>>
> >> >> >> >> >>>> In addition I would suggest to use the "spatial" context
> of
> >> each
> >> >> >> >> >>>> entity (basically relation to entities that are cities,
> >> regions,
> >> >> >> >> >>>> countries) as a separate dimension, because those are very
> >> often
> >> >> >> used
> >> >> >> >> >>>> for coreferences.
> >> >> >> >> >>>>
> >> >> >> >> >>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
> >> >> >> >> >>>> [2]
> http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
> >> >> >> >> >>>> [3]
> >> >> >> >> >>>>
> >> >> >> >>
> >> >> >>
> >> >>
> >>
> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
> >> >> >> >> >>>>
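As a rough illustration of this "spatial dimension", a small helper that checks whether a noun phrase mentions one of the place labels associated with a candidate entity; how that label set is built from dbpedia (location relations, categories such as Companies_based_in_Redmond,_Washington, ...) is deliberately left open here and would be the real work:

    import java.util.Locale;
    import java.util.Set;

    public class SpatialContextMatcher {

        /** True if any token of the noun phrase matches one of the entity's
         *  spatial-context labels (labels assumed lower-cased), e.g.
         *  "The Redmond's company" against {"redmond", "washington"}. */
        public static boolean matchesSpatialContext(String nounPhrase, Set<String> spatialLabels) {
            for (String token : nounPhrase.toLowerCase(Locale.ENGLISH).split("[^\\p{L}]+")) {
                if (spatialLabels.contains(token)) {
                    return true;
                }
            }
            return false;
        }
    }

A match like this would only raise the confidence of linking "The Redmond's company" to Microsoft; it is one signal next to the category and rdf:type checks discussed in this thread.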
> >> >> >> >> >>>>
> >> >> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca
> >> >> >> >> >>>> <cr...@gmail.com> wrote:
> >> >> >> >>>> > There are several dbpedia categories for each entity, in this case for
> >> >> >> >>>> > Microsoft we have :
> >> >> >> >> >>>> >
> >> >> >> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
> >> >> >> >> >>>> > category:Microsoft
> >> >> >> >> >>>> > category:Software_companies_of_the_United_States
> >> >> >> >> >>>> > category:Software_companies_based_in_Washington_(state)
> >> >> >> >> >>>> > category:Companies_established_in_1975
> >> >> >> >> >>>> > category:1975_establishments_in_the_United_States
> >> >> >> >> >>>> > category:Companies_based_in_Redmond,_Washington
> >> >> >> >> >>>> >
> >> >> >>
> category:Multinational_companies_headquartered_in_the_United_States
> >> >> >> >> >>>> > category:Cloud_computing_providers
> >> >> >> >> >>>> > category:Companies_in_the_Dow_Jones_Industrial_Average
> >> >> >> >> >>>> >
> >> >> >> >>>> > So we also have "Companies based in Redmond, Washington" which could
> >> >> >> >>>> > be matched.
> >> >> >> >> >>>> >
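For illustration, a sketch of how such category matching could be scored: compare the (lemmatized) nouns of the noun phrase against the tokens of a dbpedia category and turn the overlap into a confidence value. The tiny lemma() stub stands in for the real lemmatizer and POS filtering discussed later in the thread and is only an assumption:

    import java.util.HashSet;
    import java.util.List;
    import java.util.Locale;
    import java.util.Set;

    public class CategoryMatcher {

        /** Fraction of noun-phrase nouns that also occur in the category, e.g. the
         *  nouns ["software", "company"] against "Software companies of the United States". */
        public static double matchConfidence(List<String> phraseNouns, String category) {
            Set<String> categoryTokens = new HashSet<>();
            for (String token : category.toLowerCase(Locale.ENGLISH).split("[\\s_,]+")) {
                categoryTokens.add(lemma(token));
            }
            int matches = 0;
            for (String noun : phraseNouns) {
                if (categoryTokens.contains(lemma(noun.toLowerCase(Locale.ENGLISH)))) {
                    matches++;
                }
            }
            return phraseNouns.isEmpty() ? 0.0 : (double) matches / phraseNouns.size();
        }

        /** Placeholder lemmatizer: strips a plural "-ies"/"-s" ending; the engine
         *  would use the lemmas produced by the NLP pipeline instead. */
        private static String lemma(String token) {
            if (token.endsWith("ies")) {
                return token.substring(0, token.length() - 3) + "y";
            }
            if (token.endsWith("s") && token.length() > 3) {
                return token.substring(0, token.length() - 1);
            }
            return token;
        }
    }

With this, "software company" scores 1.0 against "Software companies of the United States", while it only partially matches less specific categories such as "Companies established in 1975".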
> >> >> >> >> >>>> >
> >> >> >> >>>> > There is still other contextual information from dbpedia which can be
> >> >> >> >>>> > used.
> >> >> >> >>>> > For example for an Organization we could also include :
> >> >> >> >>>> > dbpprop:industry = Software
> >> >> >> >>>> > dbpprop:service = Online Service Providers
> >> >> >> >>>> >
> >> >> >> >>>> > and for a Person (that's for Barack Obama) :
> >> >> >> >>>> >
> >> >> >> >>>> > dbpedia-owl:profession:
> >> >> >> >>>> >     dbpedia:Author
> >> >> >> >>>> >     dbpedia:Constitutional_law
> >> >> >> >>>> >     dbpedia:Lawyer
> >> >> >> >>>> >     dbpedia:Community_organizing
> >> >> >> >> >>>> >
> >> >> >> >>>> > I'd like to continue investigating this as I think that it may have
> >> >> >> >>>> > some value in increasing the number of coreference resolutions, and I'd
> >> >> >> >>>> > like to concentrate more on precision rather than recall since we
> >> >> >> >>>> > already have a set of coreferences detected by the stanford nlp tool
> >> >> >> >>>> > and this would be an addition to that (at least this is how I would
> >> >> >> >>>> > like to use it).
> >> >> >> >>>> >
> >> >> >> >>>> > Is it ok if I track this by opening a jira? I could update it to show
> >> >> >> >>>> > my progress and also my conclusions, and if it turns out that it was a
> >> >> >> >>>> > bad idea then that's the situation; at least I'll end up with more
> >> >> >> >>>> > knowledge about Stanbol in the end :).
> >> >> >> >> >>>> >
> >> >> >> >> >>>> >
> >> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rharo@apache.org
> >:
> >> >> >> >> >>>> >
> >> >> >> >> >>>> >> Hi Cristian,
> >> >> >> >> >>>> >>
> >> >> >> >>>> >> The approach sounds nice. I don't want to be the devil's advocate but
> >> >> >> >>>> >> I'm just not sure about the recall using the dbpedia categories feature.
> >> >> >> >>>> >> For example, your sentence could also be "Microsoft posted its 2013
> >> >> >> >>>> >> earnings. The Redmond's company made a huge profit". So, maybe including
> >> >> >> >>>> >> more contextual information from dbpedia could increase the recall but
> >> >> >> >>>> >> of course will reduce the precision.
> >> >> >> >> >>>> >>
> >> >> >> >> >>>> >> Cheers,
> >> >> >> >> >>>> >> Rafa
> >> >> >> >> >>>> >>
> >> >> >> >> >>>> >> El 04/02/14 09:50, Cristian Petroaca escribió:
> >> >> >> >> >>>> >>
> >> >> >> >>>> >>  Back with a more detailed description of the steps for making this
> >> >> >> >>>> >>> kind of coreference work.
> >> >> >> >>>> >>>
> >> >> >> >>>> >>> I will be using references to the following text in the steps below in
> >> >> >> >>>> >>> order to make things clearer : "Microsoft posted its 2013 earnings. The
> >> >> >> >>>> >>> software company made a huge profit."
> >> >> >> >> >>>> >>>
> >> >> >> >>>> >>> 1. For every noun phrase in the text which has :
> >> >> >> >>>> >>>      a. a determinate pos which implies reference to an entity local to the
> >> >> >> >>>> >>> text, such as "the, this, these", but not "another, every", etc which
> >> >> >> >>>> >>> implies a reference to an entity outside of the text.
> >> >> >> >>>> >>>      b. having at least another noun aside from the main required noun which
> >> >> >> >>>> >>> further describes it. For example I will not count "The company" as being a
> >> >> >> >>>> >>> legitimate candidate since this could create a lot of false positives by
> >> >> >> >>>> >>> considering the double meaning of some words such as "in the company of
> >> >> >> >>>> >>> good people".
> >> >> >> >>>> >>> "The software company" is a good candidate since we also have "software".
> >> >> >> >> >>>> >>>
> >> >> >> >> >>>> >>> 2. match the nouns in the noun phrase to the contents
> of
> >> the
> >> >> >> >> dbpedia
> >> >> >> >> >>>> >>> categories of each named entity found prior to the
> >> location
> >> >> of
> >> >> >> the
> >> >> >> >> >>>> noun
> >> >> >> >> >>>> >>> phrase in the text.
> >> >> >> >> >>>> >>> The dbpedia categories are in the following format
> (for
> >> >> >> Microsoft
> >> >> >> >> for
> >> >> >> >> >>>> >>> example) : "Software companies of the United States".
> >> >> >> >> >>>> >>>   So we try to match "software company" with that.
> >> >> >> >> >>>> >>> First, as you can see, the main noun in the dbpedia
> >> category
> >> >> >> has a
> >> >> >> >> >>>> plural
> >> >> >> >> >>>> >>> form and it's the same for all categories which I
> saw. I
> >> >> don't
> >> >> >> >> know
> >> >> >> >> >>>> if
> >> >> >> >> >>>> >>> there's an easier way to do this but I thought of
> >> applying a
> >> >> >> >> >>>> lemmatizer on
> >> >> >> >> >>>> >>> the category and the noun phrase in order for them to
> >> have a
> >> >> >> >> common
> >> >> >> >> >>>> >>> denominator.This also works if the noun phrase itself
> >> has a
> >> >> >> plural
> >> >> >> >> >>>> form.
> >> >> >> >> >>>> >>>
> >> >> >> >> >>>> >>> Second, I'll need to use for comparison only the
> words in
> >> >> the
> >> >> >> >> >>>> category
> >> >> >> >> >>>> >>> which are themselves nouns and not prepositions or
> >> >> determiners
> >> >> >> >> such
> >> >> >> >> >>>> as "of
> >> >> >> >> >>>> >>> the".This means that I need to pos tag the categories
> >> >> contents
> >> >> >> as
> >> >> >> >> >>>> well.
> >> >> >> >> >>>> >>> I was thinking of running the pos and lemma on the
> >> dbpedia
> >> >> >> >> >>>> categories when
> >> >> >> >> >>>> >>> building the dbpedia backed entity hub and storing
> them
> >> for
> >> >> >> later
> >> >> >> >> >>>> use - I
> >> >> >> >> >>>> >>> don't know how feasible this is at the moment.
> >> >> >> >> >>>> >>>
> >> >> >> >> >>>> >>> After this I can compare each noun in the noun phrase
> >> with
> >> >> the
> >> >> >> >> >>>> equivalent
> >> >> >> >> >>>> >>> nouns in the categories and based on the number of
> >> matches I
> >> >> >> can
> >> >> >> >> >>>> create a
> >> >> >> >> >>>> >>> confidence level.
> >> >> >> >> >>>> >>>
> >> >> >> >> >>>> >>> 3. match the noun of the noun phrase with the rdf:type
> >> from
> >> >> >> >> dbpedia
> >> >> >> >> >>>> of the
> >> >> >> >> >>>> >>> named entity. If this matches increase the confidence
> >> level.
> >> >> >> >> >>>> >>>
> >> >> >> >> >>>> >>> 4. If there are multiple named entities which can
> match a
> >> >> >> certain
> >> >> >> >> >>>> noun
> >> >> >> >> >>>> >>> phrase then link the noun phrase with the closest
> named
> >> >> entity
> >> >> >> >> prior
> >> >> >> >> >>>> to it
> >> >> >> >> >>>> >>> in the text.
> >> >> >> >> >>>> >>>
> >> >> >> >> >>>> >>> What do you think?
> >> >> >> >> >>>> >>>
> >> >> >> >> >>>> >>> Cristian
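A rough, purely illustrative sketch of how steps 2 and 3 of the quoted proposal could be put together. The lemma() helper and the stop-word set below are only placeholders for the lemmatizer and POS-based filtering that the NLP chain would actually provide, and the class is not part of any existing Stanbol module:

    import java.util.*;

    // Illustrative only: match a noun phrase ("software company") against a
    // dbpedia category label ("Software companies of the United States") by
    // comparing lemmatized nouns and returning a simple confidence value.
    public class CategoryMatcher {

        // toy lemmatizer placeholder; real code would reuse the lemmas of the NLP chain
        static String lemma(String word) {
            String w = word.toLowerCase(Locale.ENGLISH);
            if (w.endsWith("ies")) return w.substring(0, w.length() - 3) + "y";
            if (w.endsWith("s")) return w.substring(0, w.length() - 1);
            return w;
        }

        // stands in for POS-based removal of prepositions/determiners in the label
        static final Set<String> STOP = new HashSet<>(Arrays.asList("of", "the", "in", "based"));

        // fraction of noun-phrase nouns that also occur (as lemmas) in the category label
        static double match(List<String> nounPhraseNouns, String categoryLabel) {
            Set<String> categoryLemmas = new HashSet<>();
            for (String token : categoryLabel.split("[\\s_,]+")) {
                if (!STOP.contains(token.toLowerCase(Locale.ENGLISH))) {
                    categoryLemmas.add(lemma(token));
                }
            }
            int hits = 0;
            for (String noun : nounPhraseNouns) {
                if (categoryLemmas.contains(lemma(noun))) hits++;
            }
            return nounPhraseNouns.isEmpty() ? 0d : (double) hits / nounPhraseNouns.size();
        }

        public static void main(String[] args) {
            System.out.println(match(Arrays.asList("software", "company"),
                    "Software companies of the United States")); // prints 1.0
        }
    }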
> >> >> >> >> >>>> >>>
> >> >> >> >> >>>> >>> 2014-01-31 Cristian Petroaca <
> >> cristian.petroaca@gmail.com>:
> >> >> >> >> >>>> >>>
> >> >> >> >> >>>> >>>  Hi Rafa,
> >> >> >> >> >>>> >>>>
> >> >> >> >> >>>> >>>> I don't yet have a concrete heursitic but I'm
> working on
> >> >> it.
> >> >> >> I'll
> >> >> >> >> >>>> provide
> >> >> >> >> >>>> >>>> it here so that you guys can give me a feedback on
> it.
> >> >> >> >> >>>> >>>>
> >> >> >> >> >>>> >>>> What are "locality" features?
> >> >> >> >> >>>> >>>>
> >> >> >> >> >>>> >>>> I looked at Bart and other coref tools such as ArkRef
> >> and
> >> >> >> >> >>>> CherryPicker
> >> >> >> >> >>>> >>>> and
> >> >> >> >> >>>> >>>> they don't provide such a coreference.
> >> >> >> >> >>>> >>>>
> >> >> >> >> >>>> >>>> Cristian
> >> >> >> >> >>>> >>>>
> >> >> >> >> >>>> >>>>
> >> >> >> >> >>>> >>>> 2014-01-30 Rafa Haro <rh...@apache.org>:
> >> >> >> >> >>>> >>>>
> >> >> >> >> >>>> >>>> Hi Cristian,
> >> >> >> >> >>>> >>>>
> >> >> >> >> >>>> >>>>> Without having more details about your concrete
> >> heuristic,
> >> >> >> in my
> >> >> >> >> >>>> honest
> >> >> >> >> >>>> >>>>> opinion, such approach could produce a lot of false
> >> >> >> positives. I
> >> >> >> >> >>>> don't
> >> >> >> >> >>>> >>>>> know
> >> >> >> >> >>>> >>>>> if you are planning to use some "locality" features
> to
> >> >> detect
> >> >> >> >> such
> >> >> >> >> >>>> >>>>> coreferences but you need to take into account that
> it
> >> is
> >> >> >> quite
> >> >> >> >> >>>> usual
> >> >> >> >> >>>> >>>>> that
> >> >> >> >> >>>> >>>>> coreferenced mentions can occurs even in different
> >> >> >> paragraphs.
> >> >> >> >> >>>> Although
> >> >> >> >> >>>> >>>>> I'm
> >> >> >> >> >>>> >>>>> not an expert in Natural Language Understanding, I
> >> would
> >> >> say
> >> >> >> it
> >> >> >> >> is
> >> >> >> >> >>>> quite
> >> >> >> >> >>>> >>>>> difficult to get decent precision/recall rates for
> >> >> >> coreferencing
> >> >> >> >> >>>> using
> >> >> >> >> >>>> >>>>> fixed rules. Maybe you can give a try to others
> tools
> >> like
> >> >> >> BART
> >> >> >> >> (
> >> >> >> >> >>>> >>>>> http://www.bart-coref.org/).
> >> >> >> >> >>>> >>>>>
> >> >> >> >> >>>> >>>>> Cheers,
> >> >> >> >> >>>> >>>>> Rafa Haro
> >> >> >> >> >>>> >>>>>
> >> >> >> >> >>>> >>>>> El 30/01/14 10:33, Cristian Petroaca escribió:
> >> >> >> >> >>>> >>>>>
> >> >> >> >> >>>> >>>>>   Hi,
> >> >> >> >> >>>> >>>>>
> >> >> >> >> >>>> >>>>>> One of the necessary steps for implementing the
> Event
> >> >> >> >> extraction
> >> >> >> >> >>>> Engine
> >> >> >> >> >>>> >>>>>> feature :
> >> >> >> https://issues.apache.org/jira/browse/STANBOL-1121is
> >> >> >> >> >>>> to
> >> >> >> >> >>>> >>>>>> have
> >> >> >> >> >>>> >>>>>> coreference resolution in the given text. This is
> >> >> provided
> >> >> >> now
> >> >> >> >> >>>> via the
> >> >> >> >> >>>> >>>>>> stanford-nlp project but as far as I saw this
> module
> >> is
> >> >> >> >> performing
> >> >> >> >> >>>> >>>>>> mostly
> >> >> >> >> >>>> >>>>>> pronomial (He, She) or nominal (Barack Obama and
> Mr.
> >> >> Obama)
> >> >> >> >> >>>> coreference
> >> >> >> >> >>>> >>>>>> resolution.
> >> >> >> >> >>>> >>>>>>
> >> >> >> >> >>>> >>>>>> In order to get more coreferences from the text I
> >> though
> >> >> of
> >> >> >> >> >>>> creating
> >> >> >> >> >>>> >>>>>> some
> >> >> >> >> >>>> >>>>>> logic that would detect this kind of coreference :
> >> >> >> >> >>>> >>>>>> "Apple reaches new profit heights. The software
> >> company
> >> >> just
> >> >> >> >> >>>> announced
> >> >> >> >> >>>> >>>>>> its
> >> >> >> >> >>>> >>>>>> 2013 earnings."
> >> >> >> >> >>>> >>>>>> Here "The software company" obviously refers to
> >> "Apple".
> >> >> >> >> >>>> >>>>>> So I'd like to detect coreferences of Named
> Entities
> >> >> which
> >> >> >> are
> >> >> >> >> of
> >> >> >> >> >>>> the
> >> >> >> >> >>>> >>>>>> rdf:type of the Named Entity , in this case
> "company"
> >> and
> >> >> >> also
> >> >> >> >> >>>> have
> >> >> >> >> >>>> >>>>>> attributes which can be found in the dbpedia
> >> categories
> >> >> of
> >> >> >> the
> >> >> >> >> >>>> named
> >> >> >> >> >>>> >>>>>> entity, in this case "software".
> >> >> >> >> >>>> >>>>>>
> >> >> >> >> >>>> >>>>>> The detection of coreferences such as "The software
> >> >> >> company" in
> >> >> >> >> >>>> the
> >> >> >> >> >>>> >>>>>> text
> >> >> >> >> >>>> >>>>>> would also be done by either using the new Pos Tag
> >> Based
> >> >> >> Phrase
> >> >> >> >> >>>> >>>>>> extraction
> >> >> >> >> >>>> >>>>>> Engine (noun phrases) or by using a dependency
> tree of
> >> >> the
> >> >> >> >> >>>> sentence and
> >> >> >> >> >>>> >>>>>> picking up only subjects or objects.
> >> >> >> >> >>>> >>>>>>
> >> >> >> >> >>>> >>>>>> At this point I'd like to know if this kind of
> logic
> >> >> would
> >> >> >> be
> >> >> >> >> >>>> useful
> >> >> >> >> >>>> >>>>>> as a
> >> >> >> >> >>>> >>>>>> separate Enhancement Engine (in case the precision
> and
> >> >> >> recall
> >> >> >> >> are
> >> >> >> >> >>>> good
> >> >> >> >> >>>> >>>>>> enough) in Stanbol?
> >> >> >> >> >>>> >>>>>>
> >> >> >> >> >>>> >>>>>> Thanks,
> >> >> >> >> >>>> >>>>>> Cristian
> >> >> >> >> >>>> >>>>>>
> >> >> >> >> >>>> >>>>>>
> >> >> >> >> >>>> >>>>>>
> >> >> >> >> >>>> >>
> >> >> >> >> >>>>
> >> >> >> >> >>>>
> >> >> >> >> >>>>
> >> >> >> >> >>>> --
> >> >> >> >> >>>> | Rupert Westenthaler
> >> rupert.westenthaler@gmail.com
> >> >> >> >> >>>> | Bodenlehenstraße 11
> >> >> >> ++43-699-11108907
> >> >> >> >> >>>> | A-5500 Bischofshofen
> >> >> >> >> >>>>
> >> >> >> >> >>>
> >> >> >> >> >>>
> >> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> --
> >> >> >> >> | Rupert Westenthaler
> rupert.westenthaler@gmail.com
> >> >> >> >> | Bodenlehenstraße 11
> >> ++43-699-11108907
> >> >> >> >> | A-5500 Bischofshofen
> >> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> >> >> >> | Bodenlehenstraße 11
> ++43-699-11108907
> >> >> >> | A-5500 Bischofshofen
> >> >> >>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> >> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> >> | A-5500 Bischofshofen
> >> >>
> >>
> >>
> >>
> >> --
> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> | A-5500 Bischofshofen
> >>
>
>
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Cristian,

Can you provide the contents of the chain after your modifications?
It would be interesting to test why the chain is no longer active after
the restart.

You can find the config file in the 'stanbol/fileinstall' folder.
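For example (writing this from memory, so the factory PID and the property
names should be double checked against the production-mode documentation), a
chain kept in 'stanbol/fileinstall' is just a config file such as

    stanbol/fileinstall/org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain-mychain.config

with content along the lines of

    stanbol.enhancer.chain.name="mychain"
    stanbol.enhancer.chain.weighted.chain=["langdetect","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-chunker"]

(the engine names are only placeholders for whatever your chain uses). Such a
file survives restarts and software updates, so you do not have to re-enter
the configuration in the Felix console.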

best
Rupert

On Tue, Mar 18, 2014 at 8:24 PM, Cristian Petroaca
<cr...@gmail.com> wrote:
> Related to the default chain selection rules : before restart I had a chain
> with the name 'default' as in I could access it via enhancer/chain/default.
> Then I just added another engine to the 'default' chain. I assumed that
> after the restart the chain with the 'default' name would be persisted. So
> the first rule should have been applied after the restart as well. But
> instead I cannot reach it via enhancer/chain/default anymore so its gone.
> Anyway, this is not a big deal, it's not blocking me in any way, I just
> wanted to understand where the problem is.
>
>
> 2014-03-18 7:15 GMT+02:00 Rupert Westenthaler <rupert.westenthaler@gmail.com
>>:
>
>> Hi Cristian
>>
>> On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca
>> <cr...@gmail.com> wrote:
>> > 1. Updated to the latest code and it's gone. Cool
>> >
>> > 2. I start the stable launcher -> create a new instance of the
>> > PosChunkerEngine -> add it to the default chain. At this point everything
>> > looks good and works ok.
>> > After I restart the server the default chain is gone and instead I see
>> this
>> > in the enhancement chains page : all-active (default, id: 149, ranking:
>> 0,
>> > impl: AllActiveEnginesChain ). all-active did not contain the 'default'
>> > word before the restart.
>> >
>>
>> Please note the default chain selection rules as described at [1]. You
>> can also access chains chains under '/enhancer/chain/{chain-name}'
>>
>> best
>> Rupert
>>
>> [1]
>> http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain
>>
>> > It looks like the config files are exactly what I need. Thanks.
>> >
>> >
>> > 2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <
>> rupert.westenthaler@gmail.com
>> >>:
>> >
>> >> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
>> >> <cr...@gmail.com> wrote:
>> >> > Thanks Rupert.
>> >> >
>> >> > A couple more questions/issues :
>> >> >
>> >> > 1. Whenever I start the stanbol server I'm seeing this in the console
>> >> > output :
>> >> >
>> >>
>> >> This should be fixed with STANBOL-1278 [1] [2]
>> >>
>> >> > 2. Whenever I restart the server the Weighted Chains get messed up. I
>> >> > usually use the 'default' chain and add my engine to it so there are
>> 11
>> >> > engines in it. After the restart this chain now contains around 23
>> >> engines
>> >> > in total.
>> >>
>> >> I was not able to replicate this. What I tried was
>> >>
>> >> (1) start up the stable launcher
>> >> (2) add an additional engine to the default chain
>> >> (3) restart the launcher
>> >>
>> >> The default chain was not changed after (2) and (3). So I would need
>> >> further information for knowing why this is happening.
>> >>
>> >> Generally it is better to create you own chain instance as modifying
>> >> one that is provided by the default configuration. I would also
>> >> recommend that you keep your test configuration in text files and to
>> >> copy those to the 'stanbol/fileinstall' folder. Doing so prevent you
>> >> from manually entering the configuration after a software update. The
>> >> production-mode section [3] provides information on how to do that.
>> >>
>> >> best
>> >> Rupert
>> >>
>> >> [1] https://issues.apache.org/jira/browse/STANBOL-1278
>> >> [2] http://svn.apache.org/r1576623
>> >> [3] http://stanbol.apache.org/docs/trunk/production-mode
>> >>
>> >> > ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web [153]:
>> Error
>> >> > starting
>> >> >
>> >>
>>  slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\star
>> >> > tup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar
>> >> > (org.osgi
>> >> > .framework.BundleException: Unresolved constraint in bundle
>> >> > org.apache.stanbol.e
>> >> > nhancer.engine.topic.web [153]: Unable to resolve 153.0: missing
>> >> > requirement [15
>> >> > 3.0] package; (&(package=javax.ws.rs
>> >> )(version>=0.0.0)(!(version>=2.0.0))))
>> >> > org.osgi.framework.BundleException: Unresolved constraint in bundle
>> >> > org.apache.s
>> >> > tanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0:
>> missing
>> >> > require
>> >> > ment [153.0] package; (&(package=javax.ws.rs
>> >> > )(version>=0.0.0)(!(version>=2.0.0))
>> >> > )
>> >> >         at
>> >> org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
>> >> >         at
>> org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
>> >> >         at
>> >> > org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
>> >> >
>> >> >         at
>> >> > org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264
>> >> > )
>> >> >         at java.lang.Thread.run(Unknown Source)
>> >> >
>> >> > Despite of this the server starts fine and I can use the enhancer
>> fine.
>> >> Do
>> >> > you guys see this as well?
>> >> >
>> >> >
>> >> > 2. Whenever I restart the server the Weighted Chains get messed up. I
>> >> > usually use the 'default' chain and add my engine to it so there are
>> 11
>> >> > engines in it. After the restart this chain now contains around 23
>> >> engines
>> >> > in total.
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <
>> >> rupert.westenthaler@gmail.com
>> >> >>:
>> >> >
>> >> >> Hi Cristian,
>> >> >>
>> >> >> NER Annotations are typically available as both
>> >> >> NlpAnnotations.NER_ANNOTATION and  fise:TextAnnotation [1] in the
>> >> >> enhancement metadata. As you are already accessing the AnayzedText I
>> >> >> would prefer using the  NlpAnnotations.NER_ANNOTATION.
>> >> >>
>> >> >> best
>> >> >> Rupert
>> >> >>
>> >> >> [1]
>> >> >>
>> >>
>> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
>> >> >>
>> >> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
>> >> >> <cr...@gmail.com> wrote:
>> >> >> > Thanks.
>> >> >> > I assume I should get the Named entities using the same but with
>> >> >> > NlpAnnotations.NER_ANNOTATION?
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
>> >> >> > rupert.westenthaler@gmail.com>:
>> >> >> >
>> >> >> >> Hallo Cristian,
>> >> >> >>
>> >> >> >> NounPhrases are not added to the RDF enhancement results. You
>> need to
>> >> >> >> use the AnalyzedText ContentPart [1]
>> >> >> >>
>> >> >> >> here is some demo code you can use in the computeEnhancement
>> method
>> >> >> >>
>> >> >> >>         AnalysedText at = NlpEngineHelper.getAnalysedText(this,
>> ci,
>> >> >> true);
>> >> >> >>         Iterator<? extends Section> sections = at.getSentences();
>> >> >> >>         if(!sections.hasNext()){ //process as single sentence
>> >> >> >>             sections = Collections.singleton(at).iterator();
>> >> >> >>         }
>> >> >> >>
>> >> >> >>         while(sections.hasNext()){
>> >> >> >>             Section section = sections.next();
>> >> >> >>             Iterator<Span> chunks =
>> >> >> >> section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
>> >> >> >>             while(chunks.hasNext()){
>> >> >> >>                 Span chunk = chunks.next();
>> >> >> >>                 Value<PhraseTag> phrase =
>> >> >> >> chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
>> >> >> >>                 if(phrase.value().getCategory() ==
>> >> >> LexicalCategory.Noun){
>> >> >> >>                     log.info(" - NounPhrase [{},{}] {}", new
>> >> Object[]{
>> >> >> >>
>> >> >> >> chunk.getStart(),chunk.getEnd(),chunk.getSpan()});
>> >> >> >>                 }
>> >> >> >>             }
>> >> >> >>         }
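The quoted snippet above, expanded a bit into a sketch that adds a null check on the phrase annotation and the NlpAnnotations.NER_ANNOTATION lookup that comes up later in this thread. Class and package names are written from memory and should be checked against the enhancer.nlp module:

    // sketch only: same loop as the demo code above, plus null checks and a NER lookup
    AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
    Iterator<? extends Section> sections = at.getSentences();
    if (!sections.hasNext()) { // no sentence annotations -> process the whole text as one section
        sections = Collections.singleton(at).iterator();
    }
    while (sections.hasNext()) {
        Section section = sections.next();
        Iterator<Span> chunks = section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
        while (chunks.hasNext()) {
            Span chunk = chunks.next();
            // noun phrases: Chunk spans carrying a PHRASE_ANNOTATION of category Noun
            Value<PhraseTag> phrase = chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
            if (phrase != null && phrase.value().getCategory() == LexicalCategory.Noun) {
                log.info(" - NounPhrase [{},{}] {}", new Object[]{
                        chunk.getStart(), chunk.getEnd(), chunk.getSpan()});
            }
            // named entity mentions: Chunk spans carrying a NER_ANNOTATION (as far as I remember)
            Value<NerTag> ner = chunk.getAnnotation(NlpAnnotations.NER_ANNOTATION);
            if (ner != null) {
                log.info(" - NamedEntity [{},{}] {} ({})", new Object[]{
                        chunk.getStart(), chunk.getEnd(), chunk.getSpan(), ner.value().getType()});
            }
        }
    }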
>> >> >> >>
>> >> >> >> hope this helps
>> >> >> >>
>> >> >> >> best
>> >> >> >> Rupert
>> >> >> >>
>> >> >> >> [1]
>> >> >> >>
>> >> >>
>> >>
>> http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
>> >> >> >>
>> >> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
>> >> >> >> <cr...@gmail.com> wrote:
>> >> >> >> > I started to implement the engine and I'm having problems with
>> >> getting
>> >> >> >> > results for noun phrases. I modified the "default" weighted
>> chain
>> >> to
>> >> >> also
>> >> >> >> > include the PosChunkerEngine and ran a sample text : "Angela
>> Merkel
>> >> >> >> visted
>> >> >> >> > China. The german chancellor met with various people". I
>> expected
>> >> that
>> >> >> >> the
>> >> >> >> > RDF XML output would contain some info about the noun phrases
>> but I
>> >> >> >> cannot
>> >> >> >> > see any.
>> >> >> >> > Could you point me to the correct way to generate the noun
>> phrases?
>> >> >> >> >
>> >> >> >> > Thanks,
>> >> >> >> > Cristian
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <
>> >> >> >> cristian.petroaca@gmail.com>:
>> >> >> >> >
>> >> >> >> >> Opened https://issues.apache.org/jira/browse/STANBOL-1279
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <
>> >> >> >> cristian.petroaca@gmail.com>
>> >> >> >> >> :
>> >> >> >> >>
>> >> >> >> >> Hi Rupert,
>> >> >> >> >>>
>> >> >> >> >>> The "spatial" dimension is a good idea. I'll also take a look
>> at
>> >> >> Yago.
>> >> >> >> >>>
>> >> >> >> >>> I will create a Jira with what we talked about here. It will
>> >> >> probably
>> >> >> >> >>> have just a draft-like description for now and will be updated
>> >> as I
>> >> >> go
>> >> >> >> >>> along.
>> >> >> >> >>>
>> >> >> >> >>> Thanks,
>> >> >> >> >>> Cristian
>> >> >> >> >>>
>> >> >> >> >>>
>> >> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <
>> >> >> >> >>> rupert.westenthaler@gmail.com>:
>> >> >> >> >>>
>> >> >> >> >>> Hi Cristian,
>> >> >> >> >>>>
>> >> >> >> >>>> definitely an interesting approach. You should have a look at
>> >> Yago2
>> >> >> >> >>>> [1]. As far as I can remember the Yago taxonomy is much
>> better
>> >> >> >> >>>> structured as the one used by dbpedia. Mapping suggestions of
>> >> >> dbpedia
>> >> >> >> >>>> to concepts in Yago2 is easy as both dbpedia and yago2 do
>> >> provide
>> >> >> >> >>>> mappings [2] and [3]
>> >> >> >> >>>>
>> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>> >> >> >> >>>> >>
>> >> >> >> >>>> >> "Microsoft posted its 2013 earnings. The Redmond's company
>> >> made
>> >> >> a
>> >> >> >> >>>> >> huge profit".
>> >> >> >> >>>>
>> >> >> >> >>>> Thats actually a very good example. Spatial contexts are very
>> >> >> >> >>>> important as they tend to be often used for referencing. So I
>> >> would
>> >> >> >> >>>> suggest to specially treat the spatial context. For spatial
>> >> >> Entities
>> >> >> >> >>>> (like a City) this is easy, but even for other (like a
>> Person,
>> >> >> >> >>>> Company) you could use relations to spatial entities define
>> >> their
>> >> >> >> >>>> spatial context. This context could than be used to correctly
>> >> link
>> >> >> >> >>>> "The Redmond's company" to "Microsoft".
>> >> >> >> >>>>
>> >> >> >> >>>> In addition I would suggest to use the "spatial" context of
>> each
>> >> >> >> >>>> entity (basically relation to entities that are cities,
>> regions,
>> >> >> >> >>>> countries) as a separate dimension, because those are very
>> often
>> >> >> used
>> >> >> >> >>>> for coreferences.
>> >> >> >> >>>>
>> >> >> >> >>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
>> >> >> >> >>>> [2] http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
>> >> >> >> >>>> [3]
>> >> >> >> >>>>
>> >> >> >>
>> >> >>
>> >>
>> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
>> >> >> >> >>>>
>> >> >> >> >>>>
>> >> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca
>> >> >> >> >>>> <cr...@gmail.com> wrote:
>> >> >> >> >>>> > There are several dbpedia categories for each entity, in
>> this
>> >> >> case
>> >> >> >> for
>> >> >> >> >>>> > Microsoft we have :
>> >> >> >> >>>> >
>> >> >> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
>> >> >> >> >>>> > category:Microsoft
>> >> >> >> >>>> > category:Software_companies_of_the_United_States
>> >> >> >> >>>> > category:Software_companies_based_in_Washington_(state)
>> >> >> >> >>>> > category:Companies_established_in_1975
>> >> >> >> >>>> > category:1975_establishments_in_the_United_States
>> >> >> >> >>>> > category:Companies_based_in_Redmond,_Washington
>> >> >> >> >>>> >
>> >> >> category:Multinational_companies_headquartered_in_the_United_States
>> >> >> >> >>>> > category:Cloud_computing_providers
>> >> >> >> >>>> > category:Companies_in_the_Dow_Jones_Industrial_Average
>> >> >> >> >>>> >
>> >> >> >> >>>> > So we also have "Companies based in Redmont,Washington"
>> which
>> >> >> could
>> >> >> >> be
>> >> >> >> >>>> > matched.
>> >> >> >> >>>> >
>> >> >> >> >>>> >
>> >> >> >> >>>> > There is still other contextual information from dbpedia
>> which
>> >> >> can
>> >> >> >> be
>> >> >> >> >>>> used.
>> >> >> >> >>>> > For example for an Organization we could also include :
>> >> >> >> >>>> > dbpprop:industry = Software
>> >> >> >> >>>> > dbpprop:service = Online Service Providers
>> >> >> >> >>>> >
>> >> >> >> >>>> > and for a Person (that's for Barack Obama) :
>> >> >> >> >>>> >
>> >> >> >> >>>> > dbpedia-owl:profession:
>> >> >> >> >>>> >                                dbpedia:Author
>> >> >> >> >>>> >                                dbpedia:Constitutional_law
>> >> >> >> >>>> >                                dbpedia:Lawyer
>> >> >> >> >>>> >                                dbpedia:Community_organizing
>> >> >> >> >>>> >
>> >> >> >> >>>> > I'd like to continue investigating this as I think that it
>> may
>> >> >> have
>> >> >> >> >>>> some
>> >> >> >> >>>> > value in increasing the number of coreference resolutions
>> and
>> >> I'd
>> >> >> >> like
>> >> >> >> >>>> to
>> >> >> >> >>>> > concentrate more on precision rather than recall since we
>> >> already
>> >> >> >> have
>> >> >> >> >>>> a
>> >> >> >> >>>> > set of coreferences detected by the stanford nlp tool and
>> this
>> >> >> would
>> >> >> >> >>>> be as
>> >> >> >> >>>> > an addition to that (at least this is how I would like to
>> use
>> >> >> it).
>> >> >> >> >>>> >
>> >> >> >> >>>> > Is it ok if I track this by opening a jira? I could update
>> it
>> >> to
>> >> >> >> show
>> >> >> >> >>>> my
>> >> >> >> >>>> > progress and also my conclusions and if it turns out that
>> it
>> >> was
>> >> >> a
>> >> >> >> bad
>> >> >> >> >>>> idea
>> >> >> >> >>>> > then that's the situation at least I'll end up with more
>> >> >> knowledge
>> >> >> >> >>>> about
>> >> >> >> >>>> > Stanbol in the end :).
>> >> >> >> >>>> >
>> >> >> >> >>>> >
>> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>> >> >> >> >>>> >
>> >> >> >> >>>> >> Hi Cristian,
>> >> >> >> >>>> >>
>> >> >> >> >>>> >> The approach sounds nice. I don't want to be the devil's
>> >> >> advocate
>> >> >> >> but
>> >> >> >> >>>> I'm
>> >> >> >> >>>> >> just not sure about the recall using the dbpedia
>> categories
>> >> >> >> feature.
>> >> >> >> >>>> For
>> >> >> >> >>>> >> example, your sentence could be also "Microsoft posted its
>> >> 2013
>> >> >> >> >>>> earnings.
>> >> >> >> >>>> >> The Redmond's company made a huge profit". So, maybe
>> >> including
>> >> >> more
>> >> >> >> >>>> >> contextual information from dbpedia could increase the
>> recall
>> >> >> but
>> >> >> >> of
>> >> >> >> >>>> course
>> >> >> >> >>>> >> will reduce the precision.
>> >> >> >> >>>> >>
>> >> >> >> >>>> >> Cheers,
>> >> >> >> >>>> >> Rafa
>> >> >> >> >>>> >>
>> >> >> >> >>>> >> El 04/02/14 09:50, Cristian Petroaca escribió:
>> >> >> >> >>>> >>
>> >> >> >> >>>> >>  Back with a more detailed description of the steps for
>> >> making
>> >> >> this
>> >> >> >> >>>> kind of
>> >> >> >> >>>> >>> coreference work.
>> >> >> >> >>>> >>>
>> >> >> >> >>>> >>> I will be using references to the following text in the
>> >> steps
>> >> >> >> below
>> >> >> >> >>>> in
>> >> >> >> >>>> >>> order to make things clearer : "Microsoft posted its 2013
>> >> >> >> earnings.
>> >> >> >> >>>> The
>> >> >> >> >>>> >>> software company made a huge profit."
>> >> >> >> >>>> >>>
>> >> >> >> >>>> >>> 1. For every noun phrase in the text which has :
>> >> >> >> >>>> >>>      a. a determinate pos which implies reference to an
>> >> entity
>> >> >> >> local
>> >> >> >> >>>> to
>> >> >> >> >>>> >>> the
>> >> >> >> >>>> >>> text, such as "the, this, these") but not "another,
>> every",
>> >> etc
>> >> >> >> which
>> >> >> >> >>>> >>> implies a reference to an entity outside of the text.
>> >> >> >> >>>> >>>      b. having at least another noun aside from the main
>> >> >> required
>> >> >> >> >>>> noun
>> >> >> >> >>>> >>> which
>> >> >> >> >>>> >>> further describes it. For example I will not count "The
>> >> >> company"
>> >> >> >> as
>> >> >> >> >>>> being
>> >> >> >> >>>> >>> a
>> >> >> >> >>>> >>> legitimate candidate since this could create a lot of
>> false
>> >> >> >> >>>> positives by
>> >> >> >> >>>> >>> considering the double meaning of some words such as "in
>> the
>> >> >> >> company
>> >> >> >> >>>> of
>> >> >> >> >>>> >>> good people".
>> >> >> >> >>>> >>> "The software company" is a good candidate since we also
>> >> have
>> >> >> >> >>>> "software".
>> >> >> >> >>>> >>>
>> >> >> >> >>>> >>> 2. match the nouns in the noun phrase to the contents of
>> the
>> >> >> >> dbpedia
>> >> >> >> >>>> >>> categories of each named entity found prior to the
>> location
>> >> of
>> >> >> the
>> >> >> >> >>>> noun
>> >> >> >> >>>> >>> phrase in the text.
>> >> >> >> >>>> >>> The dbpedia categories are in the following format (for
>> >> >> Microsoft
>> >> >> >> for
>> >> >> >> >>>> >>> example) : "Software companies of the United States".
>> >> >> >> >>>> >>>   So we try to match "software company" with that.
>> >> >> >> >>>> >>> First, as you can see, the main noun in the dbpedia
>> category
>> >> >> has a
>> >> >> >> >>>> plural
>> >> >> >> >>>> >>> form and it's the same for all categories which I saw. I
>> >> don't
>> >> >> >> know
>> >> >> >> >>>> if
>> >> >> >> >>>> >>> there's an easier way to do this but I thought of
>> applying a
>> >> >> >> >>>> lemmatizer on
>> >> >> >> >>>> >>> the category and the noun phrase in order for them to
>> have a
>> >> >> >> common
>> >> >> >> >>>> >>> denominator.This also works if the noun phrase itself
>> has a
>> >> >> plural
>> >> >> >> >>>> form.
>> >> >> >> >>>> >>>
>> >> >> >> >>>> >>> Second, I'll need to use for comparison only the words in
>> >> the
>> >> >> >> >>>> category
>> >> >> >> >>>> >>> which are themselves nouns and not prepositions or
>> >> determiners
>> >> >> >> such
>> >> >> >> >>>> as "of
>> >> >> >> >>>> >>> the".This means that I need to pos tag the categories
>> >> contents
>> >> >> as
>> >> >> >> >>>> well.
>> >> >> >> >>>> >>> I was thinking of running the pos and lemma on the
>> dbpedia
>> >> >> >> >>>> categories when
>> >> >> >> >>>> >>> building the dbpedia backed entity hub and storing them
>> for
>> >> >> later
>> >> >> >> >>>> use - I
>> >> >> >> >>>> >>> don't know how feasible this is at the moment.
>> >> >> >> >>>> >>>
>> >> >> >> >>>> >>> After this I can compare each noun in the noun phrase
>> with
>> >> the
>> >> >> >> >>>> equivalent
>> >> >> >> >>>> >>> nouns in the categories and based on the number of
>> matches I
>> >> >> can
>> >> >> >> >>>> create a
>> >> >> >> >>>> >>> confidence level.
>> >> >> >> >>>> >>>
>> >> >> >> >>>> >>> 3. match the noun of the noun phrase with the rdf:type
>> from
>> >> >> >> dbpedia
>> >> >> >> >>>> of the
>> >> >> >> >>>> >>> named entity. If this matches increase the confidence
>> level.
>> >> >> >> >>>> >>>
>> >> >> >> >>>> >>> 4. If there are multiple named entities which can match a
>> >> >> certain
>> >> >> >> >>>> noun
>> >> >> >> >>>> >>> phrase then link the noun phrase with the closest named
>> >> entity
>> >> >> >> prior
>> >> >> >> >>>> to it
>> >> >> >> >>>> >>> in the text.
>> >> >> >> >>>> >>>
>> >> >> >> >>>> >>> What do you think?
>> >> >> >> >>>> >>>
>> >> >> >> >>>> >>> Cristian
>> >> >> >> >>>> >>>
>> >> >> >> >>>> >>> 2014-01-31 Cristian Petroaca <
>> cristian.petroaca@gmail.com>:
>> >> >> >> >>>> >>>
>> >> >> >> >>>> >>>  Hi Rafa,
>> >> >> >> >>>> >>>>
>> >> >> >> >>>> >>>> I don't yet have a concrete heursitic but I'm working on
>> >> it.
>> >> >> I'll
>> >> >> >> >>>> provide
>> >> >> >> >>>> >>>> it here so that you guys can give me a feedback on it.
>> >> >> >> >>>> >>>>
>> >> >> >> >>>> >>>> What are "locality" features?
>> >> >> >> >>>> >>>>
>> >> >> >> >>>> >>>> I looked at Bart and other coref tools such as ArkRef
>> and
>> >> >> >> >>>> CherryPicker
>> >> >> >> >>>> >>>> and
>> >> >> >> >>>> >>>> they don't provide such a coreference.
>> >> >> >> >>>> >>>>
>> >> >> >> >>>> >>>> Cristian
>> >> >> >> >>>> >>>>
>> >> >> >> >>>> >>>>
>> >> >> >> >>>> >>>> 2014-01-30 Rafa Haro <rh...@apache.org>:
>> >> >> >> >>>> >>>>
>> >> >> >> >>>> >>>> Hi Cristian,
>> >> >> >> >>>> >>>>
>> >> >> >> >>>> >>>>> Without having more details about your concrete
>> heuristic,
>> >> >> in my
>> >> >> >> >>>> honest
>> >> >> >> >>>> >>>>> opinion, such approach could produce a lot of false
>> >> >> positives. I
>> >> >> >> >>>> don't
>> >> >> >> >>>> >>>>> know
>> >> >> >> >>>> >>>>> if you are planning to use some "locality" features to
>> >> detect
>> >> >> >> such
>> >> >> >> >>>> >>>>> coreferences but you need to take into account that it
>> is
>> >> >> quite
>> >> >> >> >>>> usual
>> >> >> >> >>>> >>>>> that
>> >> >> >> >>>> >>>>> coreferenced mentions can occurs even in different
>> >> >> paragraphs.
>> >> >> >> >>>> Although
>> >> >> >> >>>> >>>>> I'm
>> >> >> >> >>>> >>>>> not an expert in Natural Language Understanding, I
>> would
>> >> say
>> >> >> it
>> >> >> >> is
>> >> >> >> >>>> quite
>> >> >> >> >>>> >>>>> difficult to get decent precision/recall rates for
>> >> >> coreferencing
>> >> >> >> >>>> using
>> >> >> >> >>>> >>>>> fixed rules. Maybe you can give a try to others tools
>> like
>> >> >> BART
>> >> >> >> (
>> >> >> >> >>>> >>>>> http://www.bart-coref.org/).
>> >> >> >> >>>> >>>>>
>> >> >> >> >>>> >>>>> Cheers,
>> >> >> >> >>>> >>>>> Rafa Haro
>> >> >> >> >>>> >>>>>
>> >> >> >> >>>> >>>>> El 30/01/14 10:33, Cristian Petroaca escribió:
>> >> >> >> >>>> >>>>>
>> >> >> >> >>>> >>>>>   Hi,
>> >> >> >> >>>> >>>>>
>> >> >> >> >>>> >>>>>> One of the necessary steps for implementing the Event
>> >> >> >> extraction
>> >> >> >> >>>> Engine
>> >> >> >> >>>> >>>>>> feature :
>> >> >> https://issues.apache.org/jira/browse/STANBOL-1121is
>> >> >> >> >>>> to
>> >> >> >> >>>> >>>>>> have
>> >> >> >> >>>> >>>>>> coreference resolution in the given text. This is
>> >> provided
>> >> >> now
>> >> >> >> >>>> via the
>> >> >> >> >>>> >>>>>> stanford-nlp project but as far as I saw this module
>> is
>> >> >> >> performing
>> >> >> >> >>>> >>>>>> mostly
>> >> >> >> >>>> >>>>>> pronomial (He, She) or nominal (Barack Obama and Mr.
>> >> Obama)
>> >> >> >> >>>> coreference
>> >> >> >> >>>> >>>>>> resolution.
>> >> >> >> >>>> >>>>>>
>> >> >> >> >>>> >>>>>> In order to get more coreferences from the text I
>> though
>> >> of
>> >> >> >> >>>> creating
>> >> >> >> >>>> >>>>>> some
>> >> >> >> >>>> >>>>>> logic that would detect this kind of coreference :
>> >> >> >> >>>> >>>>>> "Apple reaches new profit heights. The software
>> company
>> >> just
>> >> >> >> >>>> announced
>> >> >> >> >>>> >>>>>> its
>> >> >> >> >>>> >>>>>> 2013 earnings."
>> >> >> >> >>>> >>>>>> Here "The software company" obviously refers to
>> "Apple".
>> >> >> >> >>>> >>>>>> So I'd like to detect coreferences of Named Entities
>> >> which
>> >> >> are
>> >> >> >> of
>> >> >> >> >>>> the
>> >> >> >> >>>> >>>>>> rdf:type of the Named Entity , in this case "company"
>> and
>> >> >> also
>> >> >> >> >>>> have
>> >> >> >> >>>> >>>>>> attributes which can be found in the dbpedia
>> categories
>> >> of
>> >> >> the
>> >> >> >> >>>> named
>> >> >> >> >>>> >>>>>> entity, in this case "software".
>> >> >> >> >>>> >>>>>>
>> >> >> >> >>>> >>>>>> The detection of coreferences such as "The software
>> >> >> company" in
>> >> >> >> >>>> the
>> >> >> >> >>>> >>>>>> text
>> >> >> >> >>>> >>>>>> would also be done by either using the new Pos Tag
>> Based
>> >> >> Phrase
>> >> >> >> >>>> >>>>>> extraction
>> >> >> >> >>>> >>>>>> Engine (noun phrases) or by using a dependency tree of
>> >> the
>> >> >> >> >>>> sentence and
>> >> >> >> >>>> >>>>>> picking up only subjects or objects.
>> >> >> >> >>>> >>>>>>
>> >> >> >> >>>> >>>>>> At this point I'd like to know if this kind of logic
>> >> would
>> >> >> be
>> >> >> >> >>>> useful
>> >> >> >> >>>> >>>>>> as a
>> >> >> >> >>>> >>>>>> separate Enhancement Engine (in case the precision and
>> >> >> recall
>> >> >> >> are
>> >> >> >> >>>> good
>> >> >> >> >>>> >>>>>> enough) in Stanbol?
>> >> >> >> >>>> >>>>>>
>> >> >> >> >>>> >>>>>> Thanks,
>> >> >> >> >>>> >>>>>> Cristian
>> >> >> >> >>>> >>>>>>
>> >> >> >> >>>> >>>>>>
>> >> >> >> >>>> >>>>>>
>> >> >> >> >>>> >>
>> >> >> >> >>>>
>> >> >> >> >>>>
>> >> >> >> >>>>
>> >> >> >> >>>> --
>> >> >> >> >>>> | Rupert Westenthaler
>> rupert.westenthaler@gmail.com
>> >> >> >> >>>> | Bodenlehenstraße 11
>> >> >> ++43-699-11108907
>> >> >> >> >>>> | A-5500 Bischofshofen
>> >> >> >> >>>>
>> >> >> >> >>>
>> >> >> >> >>>
>> >> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> --
>> >> >> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> >> >> >> | Bodenlehenstraße 11
>> ++43-699-11108907
>> >> >> >> | A-5500 Bischofshofen
>> >> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> >> >> | Bodenlehenstraße 11                             ++43-699-11108907
>> >> >> | A-5500 Bischofshofen
>> >> >>
>> >>
>> >>
>> >>
>> >> --
>> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> >> | Bodenlehenstraße 11                             ++43-699-11108907
>> >> | A-5500 Bischofshofen
>> >>
>>
>>
>>
>> --
>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Cristian Petroaca <cr...@gmail.com>.
Related to the default chain selection rules: before the restart I had a chain
named 'default', meaning I could access it via enhancer/chain/default.
Then I just added another engine to the 'default' chain. I assumed that
after the restart the chain with the 'default' name would be persisted, so
the first rule should have applied after the restart as well. But
instead I cannot reach it via enhancer/chain/default anymore, so it's gone.
Anyway, this is not a big deal and it's not blocking me in any way; I just
wanted to understand where the problem is.
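
For what it's worth, a quick way to check whether a chain is still registered after a restart (assuming the default launcher port 8080; adjust as needed):

    curl -i -X POST -H "Content-Type: text/plain" \
         -H "Accept: application/rdf+xml" \
         --data "Microsoft posted its 2013 earnings." \
         http://localhost:8080/enhancer/chain/default

A 404 response there means no chain is registered under the 'default' name anymore.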


2014-03-18 7:15 GMT+02:00 Rupert Westenthaler <rupert.westenthaler@gmail.com
>:

> Hi Cristian
>
> On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca
> <cr...@gmail.com> wrote:
> > 1. Updated to the latest code and it's gone. Cool
> >
> > 2. I start the stable launcher -> create a new instance of the
> > PosChunkerEngine -> add it to the default chain. At this point everything
> > looks good and works ok.
> > After I restart the server the default chain is gone and instead I see
> this
> > in the enhancement chains page : all-active (default, id: 149, ranking:
> 0,
> > impl: AllActiveEnginesChain ). all-active did not contain the 'default'
> > word before the restart.
> >
>
> Please note the default chain selection rules as described at [1]. You
> can also access chains chains under '/enhancer/chain/{chain-name}'
>
> best
> Rupert
>
> [1]
> http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain
>
> > It looks like the config files are exactly what I need. Thanks.
> >
> >
> > 2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <
> rupert.westenthaler@gmail.com
> >>:
> >
> >> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
> >> <cr...@gmail.com> wrote:
> >> > Thanks Rupert.
> >> >
> >> > A couple more questions/issues :
> >> >
> >> > 1. Whenever I start the stanbol server I'm seeing this in the console
> >> > output :
> >> >
> >>
> >> This should be fixed with STANBOL-1278 [1] [2]
> >>
> >> > 2. Whenever I restart the server the Weighted Chains get messed up. I
> >> > usually use the 'default' chain and add my engine to it so there are
> 11
> >> > engines in it. After the restart this chain now contains around 23
> >> engines
> >> > in total.
> >>
> >> I was not able to replicate this. What I tried was
> >>
> >> (1) start up the stable launcher
> >> (2) add an additional engine to the default chain
> >> (3) restart the launcher
> >>
> >> The default chain was not changed after (2) and (3). So I would need
> >> further information for knowing why this is happening.
> >>
> >> Generally it is better to create you own chain instance as modifying
> >> one that is provided by the default configuration. I would also
> >> recommend that you keep your test configuration in text files and to
> >> copy those to the 'stanbol/fileinstall' folder. Doing so prevent you
> >> from manually entering the configuration after a software update. The
> >> production-mode section [3] provides information on how to do that.
> >>
> >> best
> >> Rupert
> >>
> >> [1] https://issues.apache.org/jira/browse/STANBOL-1278
> >> [2] http://svn.apache.org/r1576623
> >> [3] http://stanbol.apache.org/docs/trunk/production-mode
> >>
> >> > ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web [153]:
> Error
> >> > starting
> >> >
> >>
>  slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\star
> >> > tup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar
> >> > (org.osgi
> >> > .framework.BundleException: Unresolved constraint in bundle
> >> > org.apache.stanbol.e
> >> > nhancer.engine.topic.web [153]: Unable to resolve 153.0: missing
> >> > requirement [15
> >> > 3.0] package; (&(package=javax.ws.rs
> >> )(version>=0.0.0)(!(version>=2.0.0))))
> >> > org.osgi.framework.BundleException: Unresolved constraint in bundle
> >> > org.apache.s
> >> > tanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0:
> missing
> >> > require
> >> > ment [153.0] package; (&(package=javax.ws.rs
> >> > )(version>=0.0.0)(!(version>=2.0.0))
> >> > )
> >> >         at
> >> org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
> >> >         at
> org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
> >> >         at
> >> > org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
> >> >
> >> >         at
> >> > org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264
> >> > )
> >> >         at java.lang.Thread.run(Unknown Source)
> >> >
> >> > Despite of this the server starts fine and I can use the enhancer
> fine.
> >> Do
> >> > you guys see this as well?
> >> >
> >> >
> >> > 2. Whenever I restart the server the Weighted Chains get messed up. I
> >> > usually use the 'default' chain and add my engine to it so there are
> 11
> >> > engines in it. After the restart this chain now contains around 23
> >> engines
> >> > in total.
> >> >
> >> >
> >> >
> >> >
> >> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <
> >> rupert.westenthaler@gmail.com
> >> >>:
> >> >
> >> >> Hi Cristian,
> >> >>
> >> >> NER Annotations are typically available as both
> >> >> NlpAnnotations.NER_ANNOTATION and  fise:TextAnnotation [1] in the
> >> >> enhancement metadata. As you are already accessing the AnayzedText I
> >> >> would prefer using the  NlpAnnotations.NER_ANNOTATION.
> >> >>
> >> >> best
> >> >> Rupert
> >> >>
> >> >> [1]
> >> >>
> >>
> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
> >> >>
> >> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
> >> >> <cr...@gmail.com> wrote:
> >> >> > Thanks.
> >> >> > I assume I should get the Named entities using the same but with
> >> >> > NlpAnnotations.NER_ANNOTATION?
> >> >> >
> >> >> >
> >> >> >
> >> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
> >> >> > rupert.westenthaler@gmail.com>:
> >> >> >
> >> >> >> Hallo Cristian,
> >> >> >>
> >> >> >> NounPhrases are not added to the RDF enhancement results. You
> need to
> >> >> >> use the AnalyzedText ContentPart [1]
> >> >> >>
> >> >> >> here is some demo code you can use in the computeEnhancement
> method
> >> >> >>
> >> >> >>         AnalysedText at = NlpEngineHelper.getAnalysedText(this,
> ci,
> >> >> true);
> >> >> >>         Iterator<? extends Section> sections = at.getSentences();
> >> >> >>         if(!sections.hasNext()){ //process as single sentence
> >> >> >>             sections = Collections.singleton(at).iterator();
> >> >> >>         }
> >> >> >>
> >> >> >>         while(sections.hasNext()){
> >> >> >>             Section section = sections.next();
> >> >> >>             Iterator<Span> chunks =
> >> >> >> section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
> >> >> >>             while(chunks.hasNext()){
> >> >> >>                 Span chunk = chunks.next();
> >> >> >>                 Value<PhraseTag> phrase =
> >> >> >> chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
> >> >> >>                 if(phrase.value().getCategory() ==
> >> >> LexicalCategory.Noun){
> >> >> >>                     log.info(" - NounPhrase [{},{}] {}", new
> >> Object[]{
> >> >> >>
> >> >> >> chunk.getStart(),chunk.getEnd(),chunk.getSpan()});
> >> >> >>                 }
> >> >> >>             }
> >> >> >>         }
> >> >> >>
> >> >> >> hope this helps
> >> >> >>
> >> >> >> best
> >> >> >> Rupert
> >> >> >>
> >> >> >> [1]
> >> >> >>
> >> >>
> >>
> http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
> >> >> >>
> >> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
> >> >> >> <cr...@gmail.com> wrote:
> >> >> >> > I started to implement the engine and I'm having problems with
> >> getting
> >> >> >> > results for noun phrases. I modified the "default" weighted
> chain
> >> to
> >> >> also
> >> >> >> > include the PosChunkerEngine and ran a sample text : "Angela
> Merkel
> >> >> >> visted
> >> >> >> > China. The german chancellor met with various people". I
> expected
> >> that
> >> >> >> the
> >> >> >> > RDF XML output would contain some info about the noun phrases
> but I
> >> >> >> cannot
> >> >> >> > see any.
> >> >> >> > Could you point me to the correct way to generate the noun
> phrases?
> >> >> >> >
> >> >> >> > Thanks,
> >> >> >> > Cristian
> >> >> >> >
> >> >> >> >
> >> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <
> >> >> >> cristian.petroaca@gmail.com>:
> >> >> >> >
> >> >> >> >> Opened https://issues.apache.org/jira/browse/STANBOL-1279
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <
> >> >> >> cristian.petroaca@gmail.com>
> >> >> >> >> :
> >> >> >> >>
> >> >> >> >> Hi Rupert,
> >> >> >> >>>
> >> >> >> >>> The "spatial" dimension is a good idea. I'll also take a look
> at
> >> >> Yago.
> >> >> >> >>>
> >> >> >> >>> I will create a Jira with what we talked about here. It will
> >> >> probably
> >> >> >> >>> have just a draft-like description for now and will be updated
> >> as I
> >> >> go
> >> >> >> >>> along.
> >> >> >> >>>
> >> >> >> >>> Thanks,
> >> >> >> >>> Cristian
> >> >> >> >>>
> >> >> >> >>>
> >> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <
> >> >> >> >>> rupert.westenthaler@gmail.com>:
> >> >> >> >>>
> >> >> >> >>> Hi Cristian,
> >> >> >> >>>>
> >> >> >> >>>> definitely an interesting approach. You should have a look at
> >> Yago2
> >> >> >> >>>> [1]. As far as I can remember the Yago taxonomy is much
> better
> >> >> >> >>>> structured as the one used by dbpedia. Mapping suggestions of
> >> >> dbpedia
> >> >> >> >>>> to concepts in Yago2 is easy as both dbpedia and yago2 do
> >> provide
> >> >> >> >>>> mappings [2] and [3]
> >> >> >> >>>>
> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
> >> >> >> >>>> >>
> >> >> >> >>>> >> "Microsoft posted its 2013 earnings. The Redmond's company
> >> made
> >> >> a
> >> >> >> >>>> >> huge profit".
> >> >> >> >>>>
> >> >> >> >>>> Thats actually a very good example. Spatial contexts are very
> >> >> >> >>>> important as they tend to be often used for referencing. So I
> >> would
> >> >> >> >>>> suggest to specially treat the spatial context. For spatial
> >> >> Entities
> >> >> >> >>>> (like a City) this is easy, but even for other (like a
> Person,
> >> >> >> >>>> Company) you could use relations to spatial entities define
> >> their
> >> >> >> >>>> spatial context. This context could than be used to correctly
> >> link
> >> >> >> >>>> "The Redmond's company" to "Microsoft".
> >> >> >> >>>>
> >> >> >> >>>> In addition I would suggest to use the "spatial" context of
> each
> >> >> >> >>>> entity (basically relation to entities that are cities,
> regions,
> >> >> >> >>>> countries) as a separate dimension, because those are very
> often
> >> >> used
> >> >> >> >>>> for coreferences.
> >> >> >> >>>>
> >> >> >> >>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
> >> >> >> >>>> [2] http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
> >> >> >> >>>> [3]
> >> >> >> >>>>
> >> >> >>
> >> >>
> >>
> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
> >> >> >> >>>>
> >> >> >> >>>>
> >> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca
> >> >> >> >>>> <cr...@gmail.com> wrote:
> >> >> >> >>>> > There are several dbpedia categories for each entity, in
> this
> >> >> case
> >> >> >> for
> >> >> >> >>>> > Microsoft we have :
> >> >> >> >>>> >
> >> >> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
> >> >> >> >>>> > category:Microsoft
> >> >> >> >>>> > category:Software_companies_of_the_United_States
> >> >> >> >>>> > category:Software_companies_based_in_Washington_(state)
> >> >> >> >>>> > category:Companies_established_in_1975
> >> >> >> >>>> > category:1975_establishments_in_the_United_States
> >> >> >> >>>> > category:Companies_based_in_Redmond,_Washington
> >> >> >> >>>> >
> >> >> category:Multinational_companies_headquartered_in_the_United_States
> >> >> >> >>>> > category:Cloud_computing_providers
> >> >> >> >>>> > category:Companies_in_the_Dow_Jones_Industrial_Average
> >> >> >> >>>> >
> >> >> >> >>>> > So we also have "Companies based in Redmont,Washington"
> which
> >> >> could
> >> >> >> be
> >> >> >> >>>> > matched.
> >> >> >> >>>> >
> >> >> >> >>>> >
> >> >> >> >>>> > There is still other contextual information from dbpedia
> which
> >> >> can
> >> >> >> be
> >> >> >> >>>> used.
> >> >> >> >>>> > For example for an Organization we could also include :
> >> >> >> >>>> > dbpprop:industry = Software
> >> >> >> >>>> > dbpprop:service = Online Service Providers
> >> >> >> >>>> >
> >> >> >> >>>> > and for a Person (that's for Barack Obama) :
> >> >> >> >>>> >
> >> >> >> >>>> > dbpedia-owl:profession:
> >> >> >> >>>> >                                dbpedia:Author
> >> >> >> >>>> >                                dbpedia:Constitutional_law
> >> >> >> >>>> >                                dbpedia:Lawyer
> >> >> >> >>>> >                                dbpedia:Community_organizing
> >> >> >> >>>> >
> >> >> >> >>>> > I'd like to continue investigating this as I think that it
> may
> >> >> have
> >> >> >> >>>> some
> >> >> >> >>>> > value in increasing the number of coreference resolutions
> and
> >> I'd
> >> >> >> like
> >> >> >> >>>> to
> >> >> >> >>>> > concentrate more on precision rather than recall since we
> >> already
> >> >> >> have
> >> >> >> >>>> a
> >> >> >> >>>> > set of coreferences detected by the stanford nlp tool and
> this
> >> >> would
> >> >> >> >>>> be as
> >> >> >> >>>> > an addition to that (at least this is how I would like to
> use
> >> >> it).
> >> >> >> >>>> >
> >> >> >> >>>> > Is it ok if I track this by opening a jira? I could update
> it
> >> to
> >> >> >> show
> >> >> >> >>>> my
> >> >> >> >>>> > progress and also my conclusions and if it turns out that
> it
> >> was
> >> >> a
> >> >> >> bad
> >> >> >> >>>> idea
> >> >> >> >>>> > then that's the situation at least I'll end up with more
> >> >> knowledge
> >> >> >> >>>> about
> >> >> >> >>>> > Stanbol in the end :).
> >> >> >> >>>> >
> >> >> >> >>>> >
> >> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
> >> >> >> >>>> >
> >> >> >> >>>> >> Hi Cristian,
> >> >> >> >>>> >>
> >> >> >> >>>> >> The approach sounds nice. I don't want to be the devil's
> >> >> advocate
> >> >> >> but
> >> >> >> >>>> I'm
> >> >> >> >>>> >> just not sure about the recall using the dbpedia
> categories
> >> >> >> feature.
> >> >> >> >>>> For
> >> >> >> >>>> >> example, your sentence could be also "Microsoft posted its
> >> 2013
> >> >> >> >>>> earnings.
> >> >> >> >>>> >> The Redmond's company made a huge profit". So, maybe
> >> including
> >> >> more
> >> >> >> >>>> >> contextual information from dbpedia could increase the
> recall
> >> >> but
> >> >> >> of
> >> >> >> >>>> course
> >> >> >> >>>> >> will reduce the precision.
> >> >> >> >>>> >>
> >> >> >> >>>> >> Cheers,
> >> >> >> >>>> >> Rafa
> >> >> >> >>>> >>
> >> >> >> >>>> >> El 04/02/14 09:50, Cristian Petroaca escribió:
> >> >> >> >>>> >>
> >> >> >> >>>> >>  Back with a more detailed description of the steps for
> >> making
> >> >> this
> >> >> >> >>>> kind of
> >> >> >> >>>> >>> coreference work.
> >> >> >> >>>> >>>
> >> >> >> >>>> >>> I will be using references to the following text in the
> >> steps
> >> >> >> below
> >> >> >> >>>> in
> >> >> >> >>>> >>> order to make things clearer : "Microsoft posted its 2013
> >> >> >> earnings.
> >> >> >> >>>> The
> >> >> >> >>>> >>> software company made a huge profit."
> >> >> >> >>>> >>>
> >> >> >> >>>> >>> 1. For every noun phrase in the text which has :
> >> >> >> >>>> >>>      a. a determinate pos which implies reference to an
> >> entity
> >> >> >> local
> >> >> >> >>>> to
> >> >> >> >>>> >>> the
> >> >> >> >>>> >>> text, such as "the, this, these") but not "another,
> every",
> >> etc
> >> >> >> which
> >> >> >> >>>> >>> implies a reference to an entity outside of the text.
> >> >> >> >>>> >>>      b. having at least another noun aside from the main
> >> >> required
> >> >> >> >>>> noun
> >> >> >> >>>> >>> which
> >> >> >> >>>> >>> further describes it. For example I will not count "The
> >> >> company"
> >> >> >> as
> >> >> >> >>>> being
> >> >> >> >>>> >>> a
> >> >> >> >>>> >>> legitimate candidate since this could create a lot of
> false
> >> >> >> >>>> positives by
> >> >> >> >>>> >>> considering the double meaning of some words such as "in
> the
> >> >> >> company
> >> >> >> >>>> of
> >> >> >> >>>> >>> good people".
> >> >> >> >>>> >>> "The software company" is a good candidate since we also
> >> have
> >> >> >> >>>> "software".
> >> >> >> >>>> >>>
> >> >> >> >>>> >>> 2. match the nouns in the noun phrase to the contents of
> the
> >> >> >> dbpedia
> >> >> >> >>>> >>> categories of each named entity found prior to the
> location
> >> of
> >> >> the
> >> >> >> >>>> noun
> >> >> >> >>>> >>> phrase in the text.
> >> >> >> >>>> >>> The dbpedia categories are in the following format (for
> >> >> Microsoft
> >> >> >> for
> >> >> >> >>>> >>> example) : "Software companies of the United States".
> >> >> >> >>>> >>>   So we try to match "software company" with that.
> >> >> >> >>>> >>> First, as you can see, the main noun in the dbpedia
> category
> >> >> has a
> >> >> >> >>>> plural
> >> >> >> >>>> >>> form and it's the same for all categories which I saw. I
> >> don't
> >> >> >> know
> >> >> >> >>>> if
> >> >> >> >>>> >>> there's an easier way to do this but I thought of
> applying a
> >> >> >> >>>> lemmatizer on
> >> >> >> >>>> >>> the category and the noun phrase in order for them to
> have a
> >> >> >> common
> >> >> >> >>>> >>> denominator.This also works if the noun phrase itself
> has a
> >> >> plural
> >> >> >> >>>> form.
> >> >> >> >>>> >>>
> >> >> >> >>>> >>> Second, I'll need to use for comparison only the words in
> >> the
> >> >> >> >>>> category
> >> >> >> >>>> >>> which are themselves nouns and not prepositions or
> >> determiners
> >> >> >> such
> >> >> >> >>>> as "of
> >> >> >> >>>> >>> the".This means that I need to pos tag the categories
> >> contents
> >> >> as
> >> >> >> >>>> well.
> >> >> >> >>>> >>> I was thinking of running the pos and lemma on the
> dbpedia
> >> >> >> >>>> categories when
> >> >> >> >>>> >>> building the dbpedia backed entity hub and storing them
> for
> >> >> later
> >> >> >> >>>> use - I
> >> >> >> >>>> >>> don't know how feasible this is at the moment.
> >> >> >> >>>> >>>
> >> >> >> >>>> >>> After this I can compare each noun in the noun phrase
> with
> >> the
> >> >> >> >>>> equivalent
> >> >> >> >>>> >>> nouns in the categories and based on the number of
> matches I
> >> >> can
> >> >> >> >>>> create a
> >> >> >> >>>> >>> confidence level.
> >> >> >> >>>> >>>
> >> >> >> >>>> >>> 3. match the noun of the noun phrase with the rdf:type
> from
> >> >> >> dbpedia
> >> >> >> >>>> of the
> >> >> >> >>>> >>> named entity. If this matches increase the confidence
> level.
> >> >> >> >>>> >>>
> >> >> >> >>>> >>> 4. If there are multiple named entities which can match a
> >> >> certain
> >> >> >> >>>> noun
> >> >> >> >>>> >>> phrase then link the noun phrase with the closest named
> >> entity
> >> >> >> prior
> >> >> >> >>>> to it
> >> >> >> >>>> >>> in the text.
> >> >> >> >>>> >>>
> >> >> >> >>>> >>> What do you think?
> >> >> >> >>>> >>>
> >> >> >> >>>> >>> Cristian
> >> >> >> >>>> >>>
> >> >> >> >>>> >>> 2014-01-31 Cristian Petroaca <
> cristian.petroaca@gmail.com>:
> >> >> >> >>>> >>>
> >> >> >> >>>> >>>  Hi Rafa,
> >> >> >> >>>> >>>>
> >> >> >> >>>> >>>> I don't yet have a concrete heursitic but I'm working on
> >> it.
> >> >> I'll
> >> >> >> >>>> provide
> >> >> >> >>>> >>>> it here so that you guys can give me a feedback on it.
> >> >> >> >>>> >>>>
> >> >> >> >>>> >>>> What are "locality" features?
> >> >> >> >>>> >>>>
> >> >> >> >>>> >>>> I looked at Bart and other coref tools such as ArkRef
> and
> >> >> >> >>>> CherryPicker
> >> >> >> >>>> >>>> and
> >> >> >> >>>> >>>> they don't provide such a coreference.
> >> >> >> >>>> >>>>
> >> >> >> >>>> >>>> Cristian
> >> >> >> >>>> >>>>
> >> >> >> >>>> >>>>
> >> >> >> >>>> >>>> 2014-01-30 Rafa Haro <rh...@apache.org>:
> >> >> >> >>>> >>>>
> >> >> >> >>>> >>>> Hi Cristian,
> >> >> >> >>>> >>>>
> >> >> >> >>>> >>>>> Without having more details about your concrete
> heuristic,
> >> >> in my
> >> >> >> >>>> honest
> >> >> >> >>>> >>>>> opinion, such approach could produce a lot of false
> >> >> positives. I
> >> >> >> >>>> don't
> >> >> >> >>>> >>>>> know
> >> >> >> >>>> >>>>> if you are planning to use some "locality" features to
> >> detect
> >> >> >> such
> >> >> >> >>>> >>>>> coreferences but you need to take into account that it
> is
> >> >> quite
> >> >> >> >>>> usual
> >> >> >> >>>> >>>>> that
> >> >> >> >>>> >>>>> coreferenced mentions can occurs even in different
> >> >> paragraphs.
> >> >> >> >>>> Although
> >> >> >> >>>> >>>>> I'm
> >> >> >> >>>> >>>>> not an expert in Natural Language Understanding, I
> would
> >> say
> >> >> it
> >> >> >> is
> >> >> >> >>>> quite
> >> >> >> >>>> >>>>> difficult to get decent precision/recall rates for
> >> >> coreferencing
> >> >> >> >>>> using
> >> >> >> >>>> >>>>> fixed rules. Maybe you can give a try to others tools
> like
> >> >> BART
> >> >> >> (
> >> >> >> >>>> >>>>> http://www.bart-coref.org/).
> >> >> >> >>>> >>>>>
> >> >> >> >>>> >>>>> Cheers,
> >> >> >> >>>> >>>>> Rafa Haro
> >> >> >> >>>> >>>>>
> >> >> >> >>>> >>>>> El 30/01/14 10:33, Cristian Petroaca escribió:
> >> >> >> >>>> >>>>>
> >> >> >> >>>> >>>>>   Hi,
> >> >> >> >>>> >>>>>
> >> >> >> >>>> >>>>>> One of the necessary steps for implementing the Event
> >> >> >> extraction
> >> >> >> >>>> Engine
> >> >> >> >>>> >>>>>> feature :
> >> >> https://issues.apache.org/jira/browse/STANBOL-1121is
> >> >> >> >>>> to
> >> >> >> >>>> >>>>>> have
> >> >> >> >>>> >>>>>> coreference resolution in the given text. This is
> >> provided
> >> >> now
> >> >> >> >>>> via the
> >> >> >> >>>> >>>>>> stanford-nlp project but as far as I saw this module
> is
> >> >> >> performing
> >> >> >> >>>> >>>>>> mostly
> >> >> >> >>>> >>>>>> pronomial (He, She) or nominal (Barack Obama and Mr.
> >> Obama)
> >> >> >> >>>> coreference
> >> >> >> >>>> >>>>>> resolution.
> >> >> >> >>>> >>>>>>
> >> >> >> >>>> >>>>>> In order to get more coreferences from the text I
> though
> >> of
> >> >> >> >>>> creating
> >> >> >> >>>> >>>>>> some
> >> >> >> >>>> >>>>>> logic that would detect this kind of coreference :
> >> >> >> >>>> >>>>>> "Apple reaches new profit heights. The software
> company
> >> just
> >> >> >> >>>> announced
> >> >> >> >>>> >>>>>> its
> >> >> >> >>>> >>>>>> 2013 earnings."
> >> >> >> >>>> >>>>>> Here "The software company" obviously refers to
> "Apple".
> >> >> >> >>>> >>>>>> So I'd like to detect coreferences of Named Entities
> >> which
> >> >> are
> >> >> >> of
> >> >> >> >>>> the
> >> >> >> >>>> >>>>>> rdf:type of the Named Entity , in this case "company"
> and
> >> >> also
> >> >> >> >>>> have
> >> >> >> >>>> >>>>>> attributes which can be found in the dbpedia
> categories
> >> of
> >> >> the
> >> >> >> >>>> named
> >> >> >> >>>> >>>>>> entity, in this case "software".
> >> >> >> >>>> >>>>>>
> >> >> >> >>>> >>>>>> The detection of coreferences such as "The software
> >> >> company" in
> >> >> >> >>>> the
> >> >> >> >>>> >>>>>> text
> >> >> >> >>>> >>>>>> would also be done by either using the new Pos Tag
> Based
> >> >> Phrase
> >> >> >> >>>> >>>>>> extraction
> >> >> >> >>>> >>>>>> Engine (noun phrases) or by using a dependency tree of
> >> the
> >> >> >> >>>> sentence and
> >> >> >> >>>> >>>>>> picking up only subjects or objects.
> >> >> >> >>>> >>>>>>
> >> >> >> >>>> >>>>>> At this point I'd like to know if this kind of logic
> >> would
> >> >> be
> >> >> >> >>>> useful
> >> >> >> >>>> >>>>>> as a
> >> >> >> >>>> >>>>>> separate Enhancement Engine (in case the precision and
> >> >> recall
> >> >> >> are
> >> >> >> >>>> good
> >> >> >> >>>> >>>>>> enough) in Stanbol?
> >> >> >> >>>> >>>>>>
> >> >> >> >>>> >>>>>> Thanks,
> >> >> >> >>>> >>>>>> Cristian
> >> >> >> >>>> >>>>>>
> >> >> >> >>>> >>>>>>
> >> >> >> >>>> >>>>>>
> >> >> >> >>>> >>
> >> >> >> >>>>
> >> >> >> >>>>
> >> >> >> >>>>
> >> >> >> >>>> --
> >> >> >> >>>> | Rupert Westenthaler
> rupert.westenthaler@gmail.com
> >> >> >> >>>> | Bodenlehenstraße 11
> >> >> ++43-699-11108907
> >> >> >> >>>> | A-5500 Bischofshofen
> >> >> >> >>>>
> >> >> >> >>>
> >> >> >> >>>
> >> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> >> >> >> | Bodenlehenstraße 11
> ++43-699-11108907
> >> >> >> | A-5500 Bischofshofen
> >> >> >>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> >> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> >> | A-5500 Bischofshofen
> >> >>
> >>
> >>
> >>
> >> --
> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> | A-5500 Bischofshofen
> >>
>
>
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Cristian

On Mon, Mar 17, 2014 at 9:43 PM, Cristian Petroaca
<cr...@gmail.com> wrote:
> 1. Updated to the latest code and it's gone. Cool
>
> 2. I start the stable launcher -> create a new instance of the
> PosChunkerEngine -> add it to the default chain. At this point everything
> looks good and works ok.
> After I restart the server the default chain is gone and instead I see this
> in the enhancement chains page : all-active (default, id: 149, ranking: 0,
> impl: AllActiveEnginesChain ). all-active did not contain the 'default'
> word before the restart.
>

Please note the default chain selection rules as described at [1]. You
can also access chains under '/enhancer/chain/{chain-name}'.
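
For example, assuming the launcher runs locally on the default port 8080,
something like the following should send a text to a specific chain
(adjust host, port and chain name as needed):

    curl -X POST -H "Content-Type: text/plain" \
         -H "Accept: application/rdf+xml" \
         --data "Microsoft posted its 2013 earnings." \
         http://localhost:8080/enhancer/chain/default

The Accept header only controls the serialization of the returned
enhancement results.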

best
Rupert

[1] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/chains/#default-chain

> It looks like the config files are exactly what I need. Thanks.
>
>
> 2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <rupert.westenthaler@gmail.com
>>:
>
>> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
>> <cr...@gmail.com> wrote:
>> > Thanks Rupert.
>> >
>> > A couple more questions/issues :
>> >
>> > 1. Whenever I start the stanbol server I'm seeing this in the console
>> > output :
>> >
>>
>> This should be fixed with STANBOL-1278 [1] [2]
>>
>> > 2. Whenever I restart the server the Weighted Chains get messed up. I
>> > usually use the 'default' chain and add my engine to it so there are 11
>> > engines in it. After the restart this chain now contains around 23
>> engines
>> > in total.
>>
>> I was not able to replicate this. What I tried was
>>
>> (1) start up the stable launcher
>> (2) add an additional engine to the default chain
>> (3) restart the launcher
>>
>> The default chain was not changed after (2) and (3). So I would need
>> further information for knowing why this is happening.
>>
>> Generally it is better to create you own chain instance as modifying
>> one that is provided by the default configuration. I would also
>> recommend that you keep your test configuration in text files and to
>> copy those to the 'stanbol/fileinstall' folder. Doing so prevent you
>> from manually entering the configuration after a software update. The
>> production-mode section [3] provides information on how to do that.
>>
>> best
>> Rupert
>>
>> [1] https://issues.apache.org/jira/browse/STANBOL-1278
>> [2] http://svn.apache.org/r1576623
>> [3] http://stanbol.apache.org/docs/trunk/production-mode
>>
>> > ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Error
>> > starting
>> >
>>  slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\star
>> > tup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar
>> > (org.osgi
>> > .framework.BundleException: Unresolved constraint in bundle
>> > org.apache.stanbol.e
>> > nhancer.engine.topic.web [153]: Unable to resolve 153.0: missing
>> > requirement [15
>> > 3.0] package; (&(package=javax.ws.rs
>> )(version>=0.0.0)(!(version>=2.0.0))))
>> > org.osgi.framework.BundleException: Unresolved constraint in bundle
>> > org.apache.s
>> > tanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing
>> > require
>> > ment [153.0] package; (&(package=javax.ws.rs
>> > )(version>=0.0.0)(!(version>=2.0.0))
>> > )
>> >         at
>> org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
>> >         at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
>> >         at
>> > org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
>> >
>> >         at
>> > org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264
>> > )
>> >         at java.lang.Thread.run(Unknown Source)
>> >
>> > Despite of this the server starts fine and I can use the enhancer fine.
>> Do
>> > you guys see this as well?
>> >
>> >
>> > 2. Whenever I restart the server the Weighted Chains get messed up. I
>> > usually use the 'default' chain and add my engine to it so there are 11
>> > engines in it. After the restart this chain now contains around 23
>> engines
>> > in total.
>> >
>> >
>> >
>> >
>> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <
>> rupert.westenthaler@gmail.com
>> >>:
>> >
>> >> Hi Cristian,
>> >>
>> >> NER Annotations are typically available as both
>> >> NlpAnnotations.NER_ANNOTATION and  fise:TextAnnotation [1] in the
>> >> enhancement metadata. As you are already accessing the AnayzedText I
>> >> would prefer using the  NlpAnnotations.NER_ANNOTATION.
>> >>
>> >> best
>> >> Rupert
>> >>
>> >> [1]
>> >>
>> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
>> >>
>> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
>> >> <cr...@gmail.com> wrote:
>> >> > Thanks.
>> >> > I assume I should get the Named entities using the same but with
>> >> > NlpAnnotations.NER_ANNOTATION?
>> >> >
>> >> >
>> >> >
>> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
>> >> > rupert.westenthaler@gmail.com>:
>> >> >
>> >> >> Hallo Cristian,
>> >> >>
>> >> >> NounPhrases are not added to the RDF enhancement results. You need to
>> >> >> use the AnalyzedText ContentPart [1]
>> >> >>
>> >> >> here is some demo code you can use in the computeEnhancement method
>> >> >>
>> >> >>         AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci,
>> >> true);
>> >> >>         Iterator<? extends Section> sections = at.getSentences();
>> >> >>         if(!sections.hasNext()){ //process as single sentence
>> >> >>             sections = Collections.singleton(at).iterator();
>> >> >>         }
>> >> >>
>> >> >>         while(sections.hasNext()){
>> >> >>             Section section = sections.next();
>> >> >>             Iterator<Span> chunks =
>> >> >> section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
>> >> >>             while(chunks.hasNext()){
>> >> >>                 Span chunk = chunks.next();
>> >> >>                 Value<PhraseTag> phrase =
>> >> >> chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
>> >> >>                 if(phrase.value().getCategory() ==
>> >> LexicalCategory.Noun){
>> >> >>                     log.info(" - NounPhrase [{},{}] {}", new
>> Object[]{
>> >> >>
>> >> >> chunk.getStart(),chunk.getEnd(),chunk.getSpan()});
>> >> >>                 }
>> >> >>             }
>> >> >>         }
>> >> >>
>> >> >> hope this helps
>> >> >>
>> >> >> best
>> >> >> Rupert
>> >> >>
>> >> >> [1]
>> >> >>
>> >>
>> http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
>> >> >>
>> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
>> >> >> <cr...@gmail.com> wrote:
>> >> >> > I started to implement the engine and I'm having problems with
>> getting
>> >> >> > results for noun phrases. I modified the "default" weighted chain
>> to
>> >> also
>> >> >> > include the PosChunkerEngine and ran a sample text : "Angela Merkel
>> >> >> visted
>> >> >> > China. The german chancellor met with various people". I expected
>> that
>> >> >> the
>> >> >> > RDF XML output would contain some info about the noun phrases but I
>> >> >> cannot
>> >> >> > see any.
>> >> >> > Could you point me to the correct way to generate the noun phrases?
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Cristian
>> >> >> >
>> >> >> >
>> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <
>> >> >> cristian.petroaca@gmail.com>:
>> >> >> >
>> >> >> >> Opened https://issues.apache.org/jira/browse/STANBOL-1279
>> >> >> >>
>> >> >> >>
>> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <
>> >> >> cristian.petroaca@gmail.com>
>> >> >> >> :
>> >> >> >>
>> >> >> >> Hi Rupert,
>> >> >> >>>
>> >> >> >>> The "spatial" dimension is a good idea. I'll also take a look at
>> >> Yago.
>> >> >> >>>
>> >> >> >>> I will create a Jira with what we talked about here. It will
>> >> probably
>> >> >> >>> have just a draft-like description for now and will be updated
>> as I
>> >> go
>> >> >> >>> along.
>> >> >> >>>
>> >> >> >>> Thanks,
>> >> >> >>> Cristian
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <
>> >> >> >>> rupert.westenthaler@gmail.com>:
>> >> >> >>>
>> >> >> >>> Hi Cristian,
>> >> >> >>>>
>> >> >> >>>> definitely an interesting approach. You should have a look at
>> Yago2
>> >> >> >>>> [1]. As far as I can remember the Yago taxonomy is much better
>> >> >> >>>> structured as the one used by dbpedia. Mapping suggestions of
>> >> dbpedia
>> >> >> >>>> to concepts in Yago2 is easy as both dbpedia and yago2 do
>> provide
>> >> >> >>>> mappings [2] and [3]
>> >> >> >>>>
>> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>> >> >> >>>> >>
>> >> >> >>>> >> "Microsoft posted its 2013 earnings. The Redmond's company
>> made
>> >> a
>> >> >> >>>> >> huge profit".
>> >> >> >>>>
>> >> >> >>>> Thats actually a very good example. Spatial contexts are very
>> >> >> >>>> important as they tend to be often used for referencing. So I
>> would
>> >> >> >>>> suggest to specially treat the spatial context. For spatial
>> >> Entities
>> >> >> >>>> (like a City) this is easy, but even for other (like a Person,
>> >> >> >>>> Company) you could use relations to spatial entities define
>> their
>> >> >> >>>> spatial context. This context could than be used to correctly
>> link
>> >> >> >>>> "The Redmond's company" to "Microsoft".
>> >> >> >>>>
>> >> >> >>>> In addition I would suggest to use the "spatial" context of each
>> >> >> >>>> entity (basically relation to entities that are cities, regions,
>> >> >> >>>> countries) as a separate dimension, because those are very often
>> >> used
>> >> >> >>>> for coreferences.
>> >> >> >>>>
>> >> >> >>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
>> >> >> >>>> [2] http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
>> >> >> >>>> [3]
>> >> >> >>>>
>> >> >>
>> >>
>> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca
>> >> >> >>>> <cr...@gmail.com> wrote:
>> >> >> >>>> > There are several dbpedia categories for each entity, in this
>> >> case
>> >> >> for
>> >> >> >>>> > Microsoft we have :
>> >> >> >>>> >
>> >> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
>> >> >> >>>> > category:Microsoft
>> >> >> >>>> > category:Software_companies_of_the_United_States
>> >> >> >>>> > category:Software_companies_based_in_Washington_(state)
>> >> >> >>>> > category:Companies_established_in_1975
>> >> >> >>>> > category:1975_establishments_in_the_United_States
>> >> >> >>>> > category:Companies_based_in_Redmond,_Washington
>> >> >> >>>> >
>> >> category:Multinational_companies_headquartered_in_the_United_States
>> >> >> >>>> > category:Cloud_computing_providers
>> >> >> >>>> > category:Companies_in_the_Dow_Jones_Industrial_Average
>> >> >> >>>> >
>> >> >> >>>> > So we also have "Companies based in Redmont,Washington" which
>> >> could
>> >> >> be
>> >> >> >>>> > matched.
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> > There is still other contextual information from dbpedia which
>> >> can
>> >> >> be
>> >> >> >>>> used.
>> >> >> >>>> > For example for an Organization we could also include :
>> >> >> >>>> > dbpprop:industry = Software
>> >> >> >>>> > dbpprop:service = Online Service Providers
>> >> >> >>>> >
>> >> >> >>>> > and for a Person (that's for Barack Obama) :
>> >> >> >>>> >
>> >> >> >>>> > dbpedia-owl:profession:
>> >> >> >>>> >                                dbpedia:Author
>> >> >> >>>> >                                dbpedia:Constitutional_law
>> >> >> >>>> >                                dbpedia:Lawyer
>> >> >> >>>> >                                dbpedia:Community_organizing
>> >> >> >>>> >
>> >> >> >>>> > I'd like to continue investigating this as I think that it may
>> >> have
>> >> >> >>>> some
>> >> >> >>>> > value in increasing the number of coreference resolutions and
>> I'd
>> >> >> like
>> >> >> >>>> to
>> >> >> >>>> > concentrate more on precision rather than recall since we
>> already
>> >> >> have
>> >> >> >>>> a
>> >> >> >>>> > set of coreferences detected by the stanford nlp tool and this
>> >> would
>> >> >> >>>> be as
>> >> >> >>>> > an addition to that (at least this is how I would like to use
>> >> it).
>> >> >> >>>> >
>> >> >> >>>> > Is it ok if I track this by opening a jira? I could update it
>> to
>> >> >> show
>> >> >> >>>> my
>> >> >> >>>> > progress and also my conclusions and if it turns out that it
>> was
>> >> a
>> >> >> bad
>> >> >> >>>> idea
>> >> >> >>>> > then that's the situation at least I'll end up with more
>> >> knowledge
>> >> >> >>>> about
>> >> >> >>>> > Stanbol in the end :).
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>> >> >> >>>> >
>> >> >> >>>> >> Hi Cristian,
>> >> >> >>>> >>
>> >> >> >>>> >> The approach sounds nice. I don't want to be the devil's
>> >> advocate
>> >> >> but
>> >> >> >>>> I'm
>> >> >> >>>> >> just not sure about the recall using the dbpedia categories
>> >> >> feature.
>> >> >> >>>> For
>> >> >> >>>> >> example, your sentence could be also "Microsoft posted its
>> 2013
>> >> >> >>>> earnings.
>> >> >> >>>> >> The Redmond's company made a huge profit". So, maybe
>> including
>> >> more
>> >> >> >>>> >> contextual information from dbpedia could increase the recall
>> >> but
>> >> >> of
>> >> >> >>>> course
>> >> >> >>>> >> will reduce the precision.
>> >> >> >>>> >>
>> >> >> >>>> >> Cheers,
>> >> >> >>>> >> Rafa
>> >> >> >>>> >>
>> >> >> >>>> >> El 04/02/14 09:50, Cristian Petroaca escribió:
>> >> >> >>>> >>
>> >> >> >>>> >>  Back with a more detailed description of the steps for
>> making
>> >> this
>> >> >> >>>> kind of
>> >> >> >>>> >>> coreference work.
>> >> >> >>>> >>>
>> >> >> >>>> >>> I will be using references to the following text in the
>> steps
>> >> >> below
>> >> >> >>>> in
>> >> >> >>>> >>> order to make things clearer : "Microsoft posted its 2013
>> >> >> earnings.
>> >> >> >>>> The
>> >> >> >>>> >>> software company made a huge profit."
>> >> >> >>>> >>>
>> >> >> >>>> >>> 1. For every noun phrase in the text which has :
>> >> >> >>>> >>>      a. a determinate pos which implies reference to an
>> entity
>> >> >> local
>> >> >> >>>> to
>> >> >> >>>> >>> the
>> >> >> >>>> >>> text, such as "the, this, these") but not "another, every",
>> etc
>> >> >> which
>> >> >> >>>> >>> implies a reference to an entity outside of the text.
>> >> >> >>>> >>>      b. having at least another noun aside from the main
>> >> required
>> >> >> >>>> noun
>> >> >> >>>> >>> which
>> >> >> >>>> >>> further describes it. For example I will not count "The
>> >> company"
>> >> >> as
>> >> >> >>>> being
>> >> >> >>>> >>> a
>> >> >> >>>> >>> legitimate candidate since this could create a lot of false
>> >> >> >>>> positives by
>> >> >> >>>> >>> considering the double meaning of some words such as "in the
>> >> >> company
>> >> >> >>>> of
>> >> >> >>>> >>> good people".
>> >> >> >>>> >>> "The software company" is a good candidate since we also
>> have
>> >> >> >>>> "software".
>> >> >> >>>> >>>
>> >> >> >>>> >>> 2. match the nouns in the noun phrase to the contents of the
>> >> >> dbpedia
>> >> >> >>>> >>> categories of each named entity found prior to the location
>> of
>> >> the
>> >> >> >>>> noun
>> >> >> >>>> >>> phrase in the text.
>> >> >> >>>> >>> The dbpedia categories are in the following format (for
>> >> Microsoft
>> >> >> for
>> >> >> >>>> >>> example) : "Software companies of the United States".
>> >> >> >>>> >>>   So we try to match "software company" with that.
>> >> >> >>>> >>> First, as you can see, the main noun in the dbpedia category
>> >> has a
>> >> >> >>>> plural
>> >> >> >>>> >>> form and it's the same for all categories which I saw. I
>> don't
>> >> >> know
>> >> >> >>>> if
>> >> >> >>>> >>> there's an easier way to do this but I thought of applying a
>> >> >> >>>> lemmatizer on
>> >> >> >>>> >>> the category and the noun phrase in order for them to have a
>> >> >> common
>> >> >> >>>> >>> denominator.This also works if the noun phrase itself has a
>> >> plural
>> >> >> >>>> form.
>> >> >> >>>> >>>
>> >> >> >>>> >>> Second, I'll need to use for comparison only the words in
>> the
>> >> >> >>>> category
>> >> >> >>>> >>> which are themselves nouns and not prepositions or
>> determiners
>> >> >> such
>> >> >> >>>> as "of
>> >> >> >>>> >>> the".This means that I need to pos tag the categories
>> contents
>> >> as
>> >> >> >>>> well.
>> >> >> >>>> >>> I was thinking of running the pos and lemma on the dbpedia
>> >> >> >>>> categories when
>> >> >> >>>> >>> building the dbpedia backed entity hub and storing them for
>> >> later
>> >> >> >>>> use - I
>> >> >> >>>> >>> don't know how feasible this is at the moment.
>> >> >> >>>> >>>
>> >> >> >>>> >>> After this I can compare each noun in the noun phrase with
>> the
>> >> >> >>>> equivalent
>> >> >> >>>> >>> nouns in the categories and based on the number of matches I
>> >> can
>> >> >> >>>> create a
>> >> >> >>>> >>> confidence level.
>> >> >> >>>> >>>
>> >> >> >>>> >>> 3. match the noun of the noun phrase with the rdf:type from
>> >> >> dbpedia
>> >> >> >>>> of the
>> >> >> >>>> >>> named entity. If this matches increase the confidence level.
>> >> >> >>>> >>>
>> >> >> >>>> >>> 4. If there are multiple named entities which can match a
>> >> certain
>> >> >> >>>> noun
>> >> >> >>>> >>> phrase then link the noun phrase with the closest named
>> entity
>> >> >> prior
>> >> >> >>>> to it
>> >> >> >>>> >>> in the text.
>> >> >> >>>> >>>
>> >> >> >>>> >>> What do you think?
>> >> >> >>>> >>>
>> >> >> >>>> >>> Cristian
>> >> >> >>>> >>>
>> >> >> >>>> >>> 2014-01-31 Cristian Petroaca <cr...@gmail.com>:
>> >> >> >>>> >>>
>> >> >> >>>> >>>  Hi Rafa,
>> >> >> >>>> >>>>
>> >> >> >>>> >>>> I don't yet have a concrete heursitic but I'm working on
>> it.
>> >> I'll
>> >> >> >>>> provide
>> >> >> >>>> >>>> it here so that you guys can give me a feedback on it.
>> >> >> >>>> >>>>
>> >> >> >>>> >>>> What are "locality" features?
>> >> >> >>>> >>>>
>> >> >> >>>> >>>> I looked at Bart and other coref tools such as ArkRef and
>> >> >> >>>> CherryPicker
>> >> >> >>>> >>>> and
>> >> >> >>>> >>>> they don't provide such a coreference.
>> >> >> >>>> >>>>
>> >> >> >>>> >>>> Cristian
>> >> >> >>>> >>>>
>> >> >> >>>> >>>>
>> >> >> >>>> >>>> 2014-01-30 Rafa Haro <rh...@apache.org>:
>> >> >> >>>> >>>>
>> >> >> >>>> >>>> Hi Cristian,
>> >> >> >>>> >>>>
>> >> >> >>>> >>>>> Without having more details about your concrete heuristic,
>> >> in my
>> >> >> >>>> honest
>> >> >> >>>> >>>>> opinion, such approach could produce a lot of false
>> >> positives. I
>> >> >> >>>> don't
>> >> >> >>>> >>>>> know
>> >> >> >>>> >>>>> if you are planning to use some "locality" features to
>> detect
>> >> >> such
>> >> >> >>>> >>>>> coreferences but you need to take into account that it is
>> >> quite
>> >> >> >>>> usual
>> >> >> >>>> >>>>> that
>> >> >> >>>> >>>>> coreferenced mentions can occurs even in different
>> >> paragraphs.
>> >> >> >>>> Although
>> >> >> >>>> >>>>> I'm
>> >> >> >>>> >>>>> not an expert in Natural Language Understanding, I would
>> say
>> >> it
>> >> >> is
>> >> >> >>>> quite
>> >> >> >>>> >>>>> difficult to get decent precision/recall rates for
>> >> coreferencing
>> >> >> >>>> using
>> >> >> >>>> >>>>> fixed rules. Maybe you can give a try to others tools like
>> >> BART
>> >> >> (
>> >> >> >>>> >>>>> http://www.bart-coref.org/).
>> >> >> >>>> >>>>>
>> >> >> >>>> >>>>> Cheers,
>> >> >> >>>> >>>>> Rafa Haro
>> >> >> >>>> >>>>>
>> >> >> >>>> >>>>> El 30/01/14 10:33, Cristian Petroaca escribió:
>> >> >> >>>> >>>>>
>> >> >> >>>> >>>>>   Hi,
>> >> >> >>>> >>>>>
>> >> >> >>>> >>>>>> One of the necessary steps for implementing the Event
>> >> >> extraction
>> >> >> >>>> Engine
>> >> >> >>>> >>>>>> feature :
>> >> https://issues.apache.org/jira/browse/STANBOL-1121is
>> >> >> >>>> to
>> >> >> >>>> >>>>>> have
>> >> >> >>>> >>>>>> coreference resolution in the given text. This is
>> provided
>> >> now
>> >> >> >>>> via the
>> >> >> >>>> >>>>>> stanford-nlp project but as far as I saw this module is
>> >> >> performing
>> >> >> >>>> >>>>>> mostly
>> >> >> >>>> >>>>>> pronomial (He, She) or nominal (Barack Obama and Mr.
>> Obama)
>> >> >> >>>> coreference
>> >> >> >>>> >>>>>> resolution.
>> >> >> >>>> >>>>>>
>> >> >> >>>> >>>>>> In order to get more coreferences from the text I though
>> of
>> >> >> >>>> creating
>> >> >> >>>> >>>>>> some
>> >> >> >>>> >>>>>> logic that would detect this kind of coreference :
>> >> >> >>>> >>>>>> "Apple reaches new profit heights. The software company
>> just
>> >> >> >>>> announced
>> >> >> >>>> >>>>>> its
>> >> >> >>>> >>>>>> 2013 earnings."
>> >> >> >>>> >>>>>> Here "The software company" obviously refers to "Apple".
>> >> >> >>>> >>>>>> So I'd like to detect coreferences of Named Entities
>> which
>> >> are
>> >> >> of
>> >> >> >>>> the
>> >> >> >>>> >>>>>> rdf:type of the Named Entity , in this case "company" and
>> >> also
>> >> >> >>>> have
>> >> >> >>>> >>>>>> attributes which can be found in the dbpedia categories
>> of
>> >> the
>> >> >> >>>> named
>> >> >> >>>> >>>>>> entity, in this case "software".
>> >> >> >>>> >>>>>>
>> >> >> >>>> >>>>>> The detection of coreferences such as "The software
>> >> company" in
>> >> >> >>>> the
>> >> >> >>>> >>>>>> text
>> >> >> >>>> >>>>>> would also be done by either using the new Pos Tag Based
>> >> Phrase
>> >> >> >>>> >>>>>> extraction
>> >> >> >>>> >>>>>> Engine (noun phrases) or by using a dependency tree of
>> the
>> >> >> >>>> sentence and
>> >> >> >>>> >>>>>> picking up only subjects or objects.
>> >> >> >>>> >>>>>>
>> >> >> >>>> >>>>>> At this point I'd like to know if this kind of logic
>> would
>> >> be
>> >> >> >>>> useful
>> >> >> >>>> >>>>>> as a
>> >> >> >>>> >>>>>> separate Enhancement Engine (in case the precision and
>> >> recall
>> >> >> are
>> >> >> >>>> good
>> >> >> >>>> >>>>>> enough) in Stanbol?
>> >> >> >>>> >>>>>>
>> >> >> >>>> >>>>>> Thanks,
>> >> >> >>>> >>>>>> Cristian
>> >> >> >>>> >>>>>>
>> >> >> >>>> >>>>>>
>> >> >> >>>> >>>>>>
>> >> >> >>>> >>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>> --
>> >> >> >>>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> >> >> >>>> | Bodenlehenstraße 11
>> >> ++43-699-11108907
>> >> >> >>>> | A-5500 Bischofshofen
>> >> >> >>>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> >> >> | Bodenlehenstraße 11                             ++43-699-11108907
>> >> >> | A-5500 Bischofshofen
>> >> >>
>> >>
>> >>
>> >>
>> >> --
>> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> >> | Bodenlehenstraße 11                             ++43-699-11108907
>> >> | A-5500 Bischofshofen
>> >>
>>
>>
>>
>> --
>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Cristian Petroaca <cr...@gmail.com>.
1. Updated to the latest code and it's gone. Cool

2. I start the stable launcher -> create a new instance of the
PosChunkerEngine -> add it to the default chain. At this point everything
looks good and works ok.
After I restart the server, the default chain is gone and instead I see this
in the enhancement chains page: all-active (default, id: 149, ranking: 0,
impl: AllActiveEnginesChain). all-active did not contain the 'default'
word before the restart.

It looks like the config files are exactly what I need. Thanks.


2014-03-17 9:26 GMT+02:00 Rupert Westenthaler <rupert.westenthaler@gmail.com
>:

> On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
> <cr...@gmail.com> wrote:
> > Thanks Rupert.
> >
> > A couple more questions/issues :
> >
> > 1. Whenever I start the stanbol server I'm seeing this in the console
> > output :
> >
>
> This should be fixed with STANBOL-1278 [1] [2]
>
> > 2. Whenever I restart the server the Weighted Chains get messed up. I
> > usually use the 'default' chain and add my engine to it so there are 11
> > engines in it. After the restart this chain now contains around 23
> engines
> > in total.
>
> I was not able to replicate this. What I tried was
>
> (1) start up the stable launcher
> (2) add an additional engine to the default chain
> (3) restart the launcher
>
> The default chain was not changed after (2) and (3). So I would need
> further information for knowing why this is happening.
>
> Generally it is better to create you own chain instance as modifying
> one that is provided by the default configuration. I would also
> recommend that you keep your test configuration in text files and to
> copy those to the 'stanbol/fileinstall' folder. Doing so prevent you
> from manually entering the configuration after a software update. The
> production-mode section [3] provides information on how to do that.
>
> best
> Rupert
>
> [1] https://issues.apache.org/jira/browse/STANBOL-1278
> [2] http://svn.apache.org/r1576623
> [3] http://stanbol.apache.org/docs/trunk/production-mode
>
> > ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Error
> > starting
> >
>  slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\star
> > tup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar
> > (org.osgi
> > .framework.BundleException: Unresolved constraint in bundle
> > org.apache.stanbol.e
> > nhancer.engine.topic.web [153]: Unable to resolve 153.0: missing
> > requirement [15
> > 3.0] package; (&(package=javax.ws.rs
> )(version>=0.0.0)(!(version>=2.0.0))))
> > org.osgi.framework.BundleException: Unresolved constraint in bundle
> > org.apache.s
> > tanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing
> > require
> > ment [153.0] package; (&(package=javax.ws.rs
> > )(version>=0.0.0)(!(version>=2.0.0))
> > )
> >         at
> org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
> >         at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
> >         at
> > org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
> >
> >         at
> > org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264
> > )
> >         at java.lang.Thread.run(Unknown Source)
> >
> > Despite of this the server starts fine and I can use the enhancer fine.
> Do
> > you guys see this as well?
> >
> >
> > 2. Whenever I restart the server the Weighted Chains get messed up. I
> > usually use the 'default' chain and add my engine to it so there are 11
> > engines in it. After the restart this chain now contains around 23
> engines
> > in total.
> >
> >
> >
> >
> > 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <
> rupert.westenthaler@gmail.com
> >>:
> >
> >> Hi Cristian,
> >>
> >> NER Annotations are typically available as both
> >> NlpAnnotations.NER_ANNOTATION and  fise:TextAnnotation [1] in the
> >> enhancement metadata. As you are already accessing the AnayzedText I
> >> would prefer using the  NlpAnnotations.NER_ANNOTATION.
> >>
> >> best
> >> Rupert
> >>
> >> [1]
> >>
> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
> >>
> >> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
> >> <cr...@gmail.com> wrote:
> >> > Thanks.
> >> > I assume I should get the Named entities using the same but with
> >> > NlpAnnotations.NER_ANNOTATION?
> >> >
> >> >
> >> >
> >> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
> >> > rupert.westenthaler@gmail.com>:
> >> >
> >> >> Hallo Cristian,
> >> >>
> >> >> NounPhrases are not added to the RDF enhancement results. You need to
> >> >> use the AnalyzedText ContentPart [1]
> >> >>
> >> >> here is some demo code you can use in the computeEnhancement method
> >> >>
> >> >>         AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci,
> >> true);
> >> >>         Iterator<? extends Section> sections = at.getSentences();
> >> >>         if(!sections.hasNext()){ //process as single sentence
> >> >>             sections = Collections.singleton(at).iterator();
> >> >>         }
> >> >>
> >> >>         while(sections.hasNext()){
> >> >>             Section section = sections.next();
> >> >>             Iterator<Span> chunks =
> >> >> section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
> >> >>             while(chunks.hasNext()){
> >> >>                 Span chunk = chunks.next();
> >> >>                 Value<PhraseTag> phrase =
> >> >> chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
> >> >>                 if(phrase.value().getCategory() ==
> >> LexicalCategory.Noun){
> >> >>                     log.info(" - NounPhrase [{},{}] {}", new
> Object[]{
> >> >>
> >> >> chunk.getStart(),chunk.getEnd(),chunk.getSpan()});
> >> >>                 }
> >> >>             }
> >> >>         }
> >> >>
> >> >> hope this helps
> >> >>
> >> >> best
> >> >> Rupert
> >> >>
> >> >> [1]
> >> >>
> >>
> http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
> >> >>
> >> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
> >> >> <cr...@gmail.com> wrote:
> >> >> > I started to implement the engine and I'm having problems with
> getting
> >> >> > results for noun phrases. I modified the "default" weighted chain
> to
> >> also
> >> >> > include the PosChunkerEngine and ran a sample text : "Angela Merkel
> >> >> visted
> >> >> > China. The german chancellor met with various people". I expected
> that
> >> >> the
> >> >> > RDF XML output would contain some info about the noun phrases but I
> >> >> cannot
> >> >> > see any.
> >> >> > Could you point me to the correct way to generate the noun phrases?
> >> >> >
> >> >> > Thanks,
> >> >> > Cristian
> >> >> >
> >> >> >
> >> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <
> >> >> cristian.petroaca@gmail.com>:
> >> >> >
> >> >> >> Opened https://issues.apache.org/jira/browse/STANBOL-1279
> >> >> >>
> >> >> >>
> >> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <
> >> >> cristian.petroaca@gmail.com>
> >> >> >> :
> >> >> >>
> >> >> >> Hi Rupert,
> >> >> >>>
> >> >> >>> The "spatial" dimension is a good idea. I'll also take a look at
> >> Yago.
> >> >> >>>
> >> >> >>> I will create a Jira with what we talked about here. It will
> >> probably
> >> >> >>> have just a draft-like description for now and will be updated
> as I
> >> go
> >> >> >>> along.
> >> >> >>>
> >> >> >>> Thanks,
> >> >> >>> Cristian
> >> >> >>>
> >> >> >>>
> >> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <
> >> >> >>> rupert.westenthaler@gmail.com>:
> >> >> >>>
> >> >> >>> Hi Cristian,
> >> >> >>>>
> >> >> >>>> definitely an interesting approach. You should have a look at
> Yago2
> >> >> >>>> [1]. As far as I can remember the Yago taxonomy is much better
> >> >> >>>> structured as the one used by dbpedia. Mapping suggestions of
> >> dbpedia
> >> >> >>>> to concepts in Yago2 is easy as both dbpedia and yago2 do
> provide
> >> >> >>>> mappings [2] and [3]
> >> >> >>>>
> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
> >> >> >>>> >>
> >> >> >>>> >> "Microsoft posted its 2013 earnings. The Redmond's company
> made
> >> a
> >> >> >>>> >> huge profit".
> >> >> >>>>
> >> >> >>>> Thats actually a very good example. Spatial contexts are very
> >> >> >>>> important as they tend to be often used for referencing. So I
> would
> >> >> >>>> suggest to specially treat the spatial context. For spatial
> >> Entities
> >> >> >>>> (like a City) this is easy, but even for other (like a Person,
> >> >> >>>> Company) you could use relations to spatial entities define
> their
> >> >> >>>> spatial context. This context could than be used to correctly
> link
> >> >> >>>> "The Redmond's company" to "Microsoft".
> >> >> >>>>
> >> >> >>>> In addition I would suggest to use the "spatial" context of each
> >> >> >>>> entity (basically relation to entities that are cities, regions,
> >> >> >>>> countries) as a separate dimension, because those are very often
> >> used
> >> >> >>>> for coreferences.
> >> >> >>>>
> >> >> >>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
> >> >> >>>> [2] http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
> >> >> >>>> [3]
> >> >> >>>>
> >> >>
> >>
> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
> >> >> >>>>
> >> >> >>>>
> >> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca
> >> >> >>>> <cr...@gmail.com> wrote:
> >> >> >>>> > There are several dbpedia categories for each entity, in this
> >> case
> >> >> for
> >> >> >>>> > Microsoft we have :
> >> >> >>>> >
> >> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
> >> >> >>>> > category:Microsoft
> >> >> >>>> > category:Software_companies_of_the_United_States
> >> >> >>>> > category:Software_companies_based_in_Washington_(state)
> >> >> >>>> > category:Companies_established_in_1975
> >> >> >>>> > category:1975_establishments_in_the_United_States
> >> >> >>>> > category:Companies_based_in_Redmond,_Washington
> >> >> >>>> >
> >> category:Multinational_companies_headquartered_in_the_United_States
> >> >> >>>> > category:Cloud_computing_providers
> >> >> >>>> > category:Companies_in_the_Dow_Jones_Industrial_Average
> >> >> >>>> >
> >> >> >>>> > So we also have "Companies based in Redmont,Washington" which
> >> could
> >> >> be
> >> >> >>>> > matched.
> >> >> >>>> >
> >> >> >>>> >
> >> >> >>>> > There is still other contextual information from dbpedia which
> >> can
> >> >> be
> >> >> >>>> used.
> >> >> >>>> > For example for an Organization we could also include :
> >> >> >>>> > dbpprop:industry = Software
> >> >> >>>> > dbpprop:service = Online Service Providers
> >> >> >>>> >
> >> >> >>>> > and for a Person (that's for Barack Obama) :
> >> >> >>>> >
> >> >> >>>> > dbpedia-owl:profession:
> >> >> >>>> >                                dbpedia:Author
> >> >> >>>> >                                dbpedia:Constitutional_law
> >> >> >>>> >                                dbpedia:Lawyer
> >> >> >>>> >                                dbpedia:Community_organizing
> >> >> >>>> >
> >> >> >>>> > I'd like to continue investigating this as I think that it may
> >> have
> >> >> >>>> some
> >> >> >>>> > value in increasing the number of coreference resolutions and
> I'd
> >> >> like
> >> >> >>>> to
> >> >> >>>> > concentrate more on precision rather than recall since we
> already
> >> >> have
> >> >> >>>> a
> >> >> >>>> > set of coreferences detected by the stanford nlp tool and this
> >> would
> >> >> >>>> be as
> >> >> >>>> > an addition to that (at least this is how I would like to use
> >> it).
> >> >> >>>> >
> >> >> >>>> > Is it ok if I track this by opening a jira? I could update it
> to
> >> >> show
> >> >> >>>> my
> >> >> >>>> > progress and also my conclusions and if it turns out that it
> was
> >> a
> >> >> bad
> >> >> >>>> idea
> >> >> >>>> > then that's the situation at least I'll end up with more
> >> knowledge
> >> >> >>>> about
> >> >> >>>> > Stanbol in the end :).
> >> >> >>>> >
> >> >> >>>> >
> >> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
> >> >> >>>> >
> >> >> >>>> >> Hi Cristian,
> >> >> >>>> >>
> >> >> >>>> >> The approach sounds nice. I don't want to be the devil's
> >> advocate
> >> >> but
> >> >> >>>> I'm
> >> >> >>>> >> just not sure about the recall using the dbpedia categories
> >> >> feature.
> >> >> >>>> For
> >> >> >>>> >> example, your sentence could be also "Microsoft posted its
> 2013
> >> >> >>>> earnings.
> >> >> >>>> >> The Redmond's company made a huge profit". So, maybe
> including
> >> more
> >> >> >>>> >> contextual information from dbpedia could increase the recall
> >> but
> >> >> of
> >> >> >>>> course
> >> >> >>>> >> will reduce the precision.
> >> >> >>>> >>
> >> >> >>>> >> Cheers,
> >> >> >>>> >> Rafa
> >> >> >>>> >>
> >> >> >>>> >> El 04/02/14 09:50, Cristian Petroaca escribió:
> >> >> >>>> >>
> >> >> >>>> >>  Back with a more detailed description of the steps for
> making
> >> this
> >> >> >>>> kind of
> >> >> >>>> >>> coreference work.
> >> >> >>>> >>>
> >> >> >>>> >>> I will be using references to the following text in the
> steps
> >> >> below
> >> >> >>>> in
> >> >> >>>> >>> order to make things clearer : "Microsoft posted its 2013
> >> >> earnings.
> >> >> >>>> The
> >> >> >>>> >>> software company made a huge profit."
> >> >> >>>> >>>
> >> >> >>>> >>> 1. For every noun phrase in the text which has :
> >> >> >>>> >>>      a. a determinate pos which implies reference to an
> entity
> >> >> local
> >> >> >>>> to
> >> >> >>>> >>> the
> >> >> >>>> >>> text, such as "the, this, these") but not "another, every",
> etc
> >> >> which
> >> >> >>>> >>> implies a reference to an entity outside of the text.
> >> >> >>>> >>>      b. having at least another noun aside from the main
> >> required
> >> >> >>>> noun
> >> >> >>>> >>> which
> >> >> >>>> >>> further describes it. For example I will not count "The
> >> company"
> >> >> as
> >> >> >>>> being
> >> >> >>>> >>> a
> >> >> >>>> >>> legitimate candidate since this could create a lot of false
> >> >> >>>> positives by
> >> >> >>>> >>> considering the double meaning of some words such as "in the
> >> >> company
> >> >> >>>> of
> >> >> >>>> >>> good people".
> >> >> >>>> >>> "The software company" is a good candidate since we also
> have
> >> >> >>>> "software".
> >> >> >>>> >>>
> >> >> >>>> >>> 2. match the nouns in the noun phrase to the contents of the
> >> >> dbpedia
> >> >> >>>> >>> categories of each named entity found prior to the location
> of
> >> the
> >> >> >>>> noun
> >> >> >>>> >>> phrase in the text.
> >> >> >>>> >>> The dbpedia categories are in the following format (for
> >> Microsoft
> >> >> for
> >> >> >>>> >>> example) : "Software companies of the United States".
> >> >> >>>> >>>   So we try to match "software company" with that.
> >> >> >>>> >>> First, as you can see, the main noun in the dbpedia category
> >> has a
> >> >> >>>> plural
> >> >> >>>> >>> form and it's the same for all categories which I saw. I
> don't
> >> >> know
> >> >> >>>> if
> >> >> >>>> >>> there's an easier way to do this but I thought of applying a
> >> >> >>>> lemmatizer on
> >> >> >>>> >>> the category and the noun phrase in order for them to have a
> >> >> common
> >> >> >>>> >>> denominator.This also works if the noun phrase itself has a
> >> plural
> >> >> >>>> form.
> >> >> >>>> >>>
> >> >> >>>> >>> Second, I'll need to use for comparison only the words in
> the
> >> >> >>>> category
> >> >> >>>> >>> which are themselves nouns and not prepositions or
> determiners
> >> >> such
> >> >> >>>> as "of
> >> >> >>>> >>> the".This means that I need to pos tag the categories
> contents
> >> as
> >> >> >>>> well.
> >> >> >>>> >>> I was thinking of running the pos and lemma on the dbpedia
> >> >> >>>> categories when
> >> >> >>>> >>> building the dbpedia backed entity hub and storing them for
> >> later
> >> >> >>>> use - I
> >> >> >>>> >>> don't know how feasible this is at the moment.
> >> >> >>>> >>>
> >> >> >>>> >>> After this I can compare each noun in the noun phrase with
> the
> >> >> >>>> equivalent
> >> >> >>>> >>> nouns in the categories and based on the number of matches I
> >> can
> >> >> >>>> create a
> >> >> >>>> >>> confidence level.
> >> >> >>>> >>>
> >> >> >>>> >>> 3. match the noun of the noun phrase with the rdf:type from
> >> >> dbpedia
> >> >> >>>> of the
> >> >> >>>> >>> named entity. If this matches increase the confidence level.
> >> >> >>>> >>>
> >> >> >>>> >>> 4. If there are multiple named entities which can match a
> >> certain
> >> >> >>>> noun
> >> >> >>>> >>> phrase then link the noun phrase with the closest named
> entity
> >> >> prior
> >> >> >>>> to it
> >> >> >>>> >>> in the text.
> >> >> >>>> >>>
> >> >> >>>> >>> What do you think?
> >> >> >>>> >>>
> >> >> >>>> >>> Cristian
> >> >> >>>> >>>
> >> >> >>>> >>> 2014-01-31 Cristian Petroaca <cr...@gmail.com>:
> >> >> >>>> >>>
> >> >> >>>> >>>  Hi Rafa,
> >> >> >>>> >>>>
> >> >> >>>> >>>> I don't yet have a concrete heursitic but I'm working on
> it.
> >> I'll
> >> >> >>>> provide
> >> >> >>>> >>>> it here so that you guys can give me a feedback on it.
> >> >> >>>> >>>>
> >> >> >>>> >>>> What are "locality" features?
> >> >> >>>> >>>>
> >> >> >>>> >>>> I looked at Bart and other coref tools such as ArkRef and
> >> >> >>>> CherryPicker
> >> >> >>>> >>>> and
> >> >> >>>> >>>> they don't provide such a coreference.
> >> >> >>>> >>>>
> >> >> >>>> >>>> Cristian
> >> >> >>>> >>>>
> >> >> >>>> >>>>
> >> >> >>>> >>>> 2014-01-30 Rafa Haro <rh...@apache.org>:
> >> >> >>>> >>>>
> >> >> >>>> >>>> Hi Cristian,
> >> >> >>>> >>>>
> >> >> >>>> >>>>> Without having more details about your concrete heuristic,
> >> in my
> >> >> >>>> honest
> >> >> >>>> >>>>> opinion, such approach could produce a lot of false
> >> positives. I
> >> >> >>>> don't
> >> >> >>>> >>>>> know
> >> >> >>>> >>>>> if you are planning to use some "locality" features to
> detect
> >> >> such
> >> >> >>>> >>>>> coreferences but you need to take into account that it is
> >> quite
> >> >> >>>> usual
> >> >> >>>> >>>>> that
> >> >> >>>> >>>>> coreferenced mentions can occurs even in different
> >> paragraphs.
> >> >> >>>> Although
> >> >> >>>> >>>>> I'm
> >> >> >>>> >>>>> not an expert in Natural Language Understanding, I would
> say
> >> it
> >> >> is
> >> >> >>>> quite
> >> >> >>>> >>>>> difficult to get decent precision/recall rates for
> >> coreferencing
> >> >> >>>> using
> >> >> >>>> >>>>> fixed rules. Maybe you can give a try to others tools like
> >> BART
> >> >> (
> >> >> >>>> >>>>> http://www.bart-coref.org/).
> >> >> >>>> >>>>>
> >> >> >>>> >>>>> Cheers,
> >> >> >>>> >>>>> Rafa Haro
> >> >> >>>> >>>>>
> >> >> >>>> >>>>> El 30/01/14 10:33, Cristian Petroaca escribió:
> >> >> >>>> >>>>>
> >> >> >>>> >>>>>   Hi,
> >> >> >>>> >>>>>
> >> >> >>>> >>>>>> One of the necessary steps for implementing the Event
> >> >> extraction
> >> >> >>>> Engine
> >> >> >>>> >>>>>> feature :
> >> https://issues.apache.org/jira/browse/STANBOL-1121is
> >> >> >>>> to
> >> >> >>>> >>>>>> have
> >> >> >>>> >>>>>> coreference resolution in the given text. This is
> provided
> >> now
> >> >> >>>> via the
> >> >> >>>> >>>>>> stanford-nlp project but as far as I saw this module is
> >> >> performing
> >> >> >>>> >>>>>> mostly
> >> >> >>>> >>>>>> pronomial (He, She) or nominal (Barack Obama and Mr.
> Obama)
> >> >> >>>> coreference
> >> >> >>>> >>>>>> resolution.
> >> >> >>>> >>>>>>
> >> >> >>>> >>>>>> In order to get more coreferences from the text I though
> of
> >> >> >>>> creating
> >> >> >>>> >>>>>> some
> >> >> >>>> >>>>>> logic that would detect this kind of coreference :
> >> >> >>>> >>>>>> "Apple reaches new profit heights. The software company
> just
> >> >> >>>> announced
> >> >> >>>> >>>>>> its
> >> >> >>>> >>>>>> 2013 earnings."
> >> >> >>>> >>>>>> Here "The software company" obviously refers to "Apple".
> >> >> >>>> >>>>>> So I'd like to detect coreferences of Named Entities
> which
> >> are
> >> >> of
> >> >> >>>> the
> >> >> >>>> >>>>>> rdf:type of the Named Entity , in this case "company" and
> >> also
> >> >> >>>> have
> >> >> >>>> >>>>>> attributes which can be found in the dbpedia categories
> of
> >> the
> >> >> >>>> named
> >> >> >>>> >>>>>> entity, in this case "software".
> >> >> >>>> >>>>>>
> >> >> >>>> >>>>>> The detection of coreferences such as "The software
> >> company" in
> >> >> >>>> the
> >> >> >>>> >>>>>> text
> >> >> >>>> >>>>>> would also be done by either using the new Pos Tag Based
> >> Phrase
> >> >> >>>> >>>>>> extraction
> >> >> >>>> >>>>>> Engine (noun phrases) or by using a dependency tree of
> the
> >> >> >>>> sentence and
> >> >> >>>> >>>>>> picking up only subjects or objects.
> >> >> >>>> >>>>>>
> >> >> >>>> >>>>>> At this point I'd like to know if this kind of logic
> would
> >> be
> >> >> >>>> useful
> >> >> >>>> >>>>>> as a
> >> >> >>>> >>>>>> separate Enhancement Engine (in case the precision and
> >> recall
> >> >> are
> >> >> >>>> good
> >> >> >>>> >>>>>> enough) in Stanbol?
> >> >> >>>> >>>>>>
> >> >> >>>> >>>>>> Thanks,
> >> >> >>>> >>>>>> Cristian
> >> >> >>>> >>>>>>
> >> >> >>>> >>>>>>
> >> >> >>>> >>>>>>
> >> >> >>>> >>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>> --
> >> >> >>>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> >> >> >>>> | Bodenlehenstraße 11
> >> ++43-699-11108907
> >> >> >>>> | A-5500 Bischofshofen
> >> >> >>>>
> >> >> >>>
> >> >> >>>
> >> >> >>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> >> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> >> | A-5500 Bischofshofen
> >> >>
> >>
> >>
> >>
> >> --
> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> | A-5500 Bischofshofen
> >>
>
>
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Rupert Westenthaler <ru...@gmail.com>.
On Sat, Mar 15, 2014 at 8:34 PM, Cristian Petroaca
<cr...@gmail.com> wrote:
> Thanks Rupert.
>
> A couple more questions/issues :
>
> 1. Whenever I start the stanbol server I'm seeing this in the console
> output :
>

This should be fixed with STANBOL-1278 [1] [2]

> 2. Whenever I restart the server the Weighted Chains get messed up. I
> usually use the 'default' chain and add my engine to it so there are 11
> engines in it. After the restart this chain now contains around 23 engines
> in total.

I was not able to replicate this. What I tried was

(1) start up the stable launcher
(2) add an additional engine to the default chain
(3) restart the launcher

The default chain was not changed after (2) and (3), so I would need
further information to understand why this is happening.

Generally it is better to create your own chain instance rather than
modifying one that is provided by the default configuration. I would also
recommend that you keep your test configuration in text files and copy
those to the 'stanbol/fileinstall' folder. Doing so prevents you from
having to re-enter the configuration manually after a software update. The
production-mode section [3] provides information on how to do that.
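
For example (just a sketch; the exact factory PID and the property names
are best copied from an existing chain configuration in the Felix Web
Console), a file named something like

    org.apache.stanbol.enhancer.chain.weighted.impl.WeightedChain-mychain.config

with content along the lines of

    stanbol.enhancer.chain.name="mychain"
    stanbol.enhancer.chain.weighted.chain=["langdetect","opennlp-pos","opennlp-chunker","myCorefEngine"]

placed in the 'stanbol/fileinstall' folder would re-create the chain after
a restart or update, so it does not have to be entered manually again. The
engine names above are only placeholders for whatever engines you want in
the chain.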

best
Rupert

[1] https://issues.apache.org/jira/browse/STANBOL-1278
[2] http://svn.apache.org/r1576623
[3] http://stanbol.apache.org/docs/trunk/production-mode

> ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Error
> starting
>  slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\star
> tup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar
> (org.osgi
> .framework.BundleException: Unresolved constraint in bundle
> org.apache.stanbol.e
> nhancer.engine.topic.web [153]: Unable to resolve 153.0: missing
> requirement [15
> 3.0] package; (&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0))))
> org.osgi.framework.BundleException: Unresolved constraint in bundle
> org.apache.s
> tanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0: missing
> require
> ment [153.0] package; (&(package=javax.ws.rs
> )(version>=0.0.0)(!(version>=2.0.0))
> )
>         at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
>         at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
>         at
> org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
>
>         at
> org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264
> )
>         at java.lang.Thread.run(Unknown Source)
>
> Despite of this the server starts fine and I can use the enhancer fine. Do
> you guys see this as well?
>
>
> 2. Whenever I restart the server the Weighted Chains get messed up. I
> usually use the 'default' chain and add my engine to it so there are 11
> engines in it. After the restart this chain now contains around 23 engines
> in total.
>
>
>
>
> 2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <rupert.westenthaler@gmail.com
>>:
>
>> Hi Cristian,
>>
>> NER Annotations are typically available as both
>> NlpAnnotations.NER_ANNOTATION and  fise:TextAnnotation [1] in the
>> enhancement metadata. As you are already accessing the AnayzedText I
>> would prefer using the  NlpAnnotations.NER_ANNOTATION.
>>
>> best
>> Rupert
>>
>> [1]
>> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
>>
>> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
>> <cr...@gmail.com> wrote:
>> > Thanks.
>> > I assume I should get the Named entities using the same but with
>> > NlpAnnotations.NER_ANNOTATION?
>> >
>> >
>> >
>> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
>> > rupert.westenthaler@gmail.com>:
>> >
>> >> Hallo Cristian,
>> >>
>> >> NounPhrases are not added to the RDF enhancement results. You need to
>> >> use the AnalyzedText ContentPart [1]
>> >>
>> >> here is some demo code you can use in the computeEnhancement method
>> >>
>> >>         AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci,
>> true);
>> >>         Iterator<? extends Section> sections = at.getSentences();
>> >>         if(!sections.hasNext()){ //process as single sentence
>> >>             sections = Collections.singleton(at).iterator();
>> >>         }
>> >>
>> >>         while(sections.hasNext()){
>> >>             Section section = sections.next();
>> >>             Iterator<Span> chunks =
>> >> section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
>> >>             while(chunks.hasNext()){
>> >>                 Span chunk = chunks.next();
>> >>                 Value<PhraseTag> phrase =
>> >> chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
>> >>                 if(phrase.value().getCategory() ==
>> LexicalCategory.Noun){
>> >>                     log.info(" - NounPhrase [{},{}] {}", new Object[]{
>> >>
>> >> chunk.getStart(),chunk.getEnd(),chunk.getSpan()});
>> >>                 }
>> >>             }
>> >>         }
>> >>
>> >> hope this helps
>> >>
>> >> best
>> >> Rupert
>> >>
>> >> [1]
>> >>
>> http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
>> >>
>> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
>> >> <cr...@gmail.com> wrote:
>> >> > I started to implement the engine and I'm having problems with getting
>> >> > results for noun phrases. I modified the "default" weighted chain to
>> also
>> >> > include the PosChunkerEngine and ran a sample text : "Angela Merkel
>> >> visted
>> >> > China. The german chancellor met with various people". I expected that
>> >> the
>> >> > RDF XML output would contain some info about the noun phrases but I
>> >> cannot
>> >> > see any.
>> >> > Could you point me to the correct way to generate the noun phrases?
>> >> >
>> >> > Thanks,
>> >> > Cristian
>> >> >
>> >> >
>> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <
>> >> cristian.petroaca@gmail.com>:
>> >> >
>> >> >> Opened https://issues.apache.org/jira/browse/STANBOL-1279
>> >> >>
>> >> >>
>> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <
>> >> cristian.petroaca@gmail.com>
>> >> >> :
>> >> >>
>> >> >> Hi Rupert,
>> >> >>>
>> >> >>> The "spatial" dimension is a good idea. I'll also take a look at
>> Yago.
>> >> >>>
>> >> >>> I will create a Jira with what we talked about here. It will
>> probably
>> >> >>> have just a draft-like description for now and will be updated as I
>> go
>> >> >>> along.
>> >> >>>
>> >> >>> Thanks,
>> >> >>> Cristian
>> >> >>>
>> >> >>>
>> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <
>> >> >>> rupert.westenthaler@gmail.com>:
>> >> >>>
>> >> >>> Hi Cristian,
>> >> >>>>
>> >> >>>> definitely an interesting approach. You should have a look at Yago2
>> >> >>>> [1]. As far as I can remember the Yago taxonomy is much better
>> >> >>>> structured as the one used by dbpedia. Mapping suggestions of
>> dbpedia
>> >> >>>> to concepts in Yago2 is easy as both dbpedia and yago2 do provide
>> >> >>>> mappings [2] and [3]
>> >> >>>>
>> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>> >> >>>> >>
>> >> >>>> >> "Microsoft posted its 2013 earnings. The Redmond's company made
>> a
>> >> >>>> >> huge profit".
>> >> >>>>
>> >> >>>> Thats actually a very good example. Spatial contexts are very
>> >> >>>> important as they tend to be often used for referencing. So I would
>> >> >>>> suggest to specially treat the spatial context. For spatial
>> Entities
>> >> >>>> (like a City) this is easy, but even for other (like a Person,
>> >> >>>> Company) you could use relations to spatial entities define their
>> >> >>>> spatial context. This context could than be used to correctly link
>> >> >>>> "The Redmond's company" to "Microsoft".
>> >> >>>>
>> >> >>>> In addition I would suggest to use the "spatial" context of each
>> >> >>>> entity (basically relation to entities that are cities, regions,
>> >> >>>> countries) as a separate dimension, because those are very often
>> used
>> >> >>>> for coreferences.
>> >> >>>>
>> >> >>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
>> >> >>>> [2] http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
>> >> >>>> [3]
>> >> >>>>
>> >>
>> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
>> >> >>>>
>> >> >>>>
>> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca
>> >> >>>> <cr...@gmail.com> wrote:
>> >> >>>> > There are several dbpedia categories for each entity, in this
>> case
>> >> for
>> >> >>>> > Microsoft we have :
>> >> >>>> >
>> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
>> >> >>>> > category:Microsoft
>> >> >>>> > category:Software_companies_of_the_United_States
>> >> >>>> > category:Software_companies_based_in_Washington_(state)
>> >> >>>> > category:Companies_established_in_1975
>> >> >>>> > category:1975_establishments_in_the_United_States
>> >> >>>> > category:Companies_based_in_Redmond,_Washington
>> >> >>>> >
>> category:Multinational_companies_headquartered_in_the_United_States
>> >> >>>> > category:Cloud_computing_providers
>> >> >>>> > category:Companies_in_the_Dow_Jones_Industrial_Average
>> >> >>>> >
>> >> >>>> > So we also have "Companies based in Redmont,Washington" which
>> could
>> >> be
>> >> >>>> > matched.
>> >> >>>> >
>> >> >>>> >
>> >> >>>> > There is still other contextual information from dbpedia which
>> can
>> >> be
>> >> >>>> used.
>> >> >>>> > For example for an Organization we could also include :
>> >> >>>> > dbpprop:industry = Software
>> >> >>>> > dbpprop:service = Online Service Providers
>> >> >>>> >
>> >> >>>> > and for a Person (that's for Barack Obama) :
>> >> >>>> >
>> >> >>>> > dbpedia-owl:profession:
>> >> >>>> >                                dbpedia:Author
>> >> >>>> >                                dbpedia:Constitutional_law
>> >> >>>> >                                dbpedia:Lawyer
>> >> >>>> >                                dbpedia:Community_organizing
>> >> >>>> >
>> >> >>>> > I'd like to continue investigating this as I think that it may
>> have
>> >> >>>> some
>> >> >>>> > value in increasing the number of coreference resolutions and I'd
>> >> like
>> >> >>>> to
>> >> >>>> > concentrate more on precision rather than recall since we already
>> >> have
>> >> >>>> a
>> >> >>>> > set of coreferences detected by the stanford nlp tool and this
>> would
>> >> >>>> be as
>> >> >>>> > an addition to that (at least this is how I would like to use
>> it).
>> >> >>>> >
>> >> >>>> > Is it ok if I track this by opening a jira? I could update it to
>> >> show
>> >> >>>> my
>> >> >>>> > progress and also my conclusions and if it turns out that it was
>> a
>> >> bad
>> >> >>>> idea
>> >> >>>> > then that's the situation at least I'll end up with more
>> knowledge
>> >> >>>> about
>> >> >>>> > Stanbol in the end :).
>> >> >>>> >
>> >> >>>> >
>> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>> >> >>>> >
>> >> >>>> >> Hi Cristian,
>> >> >>>> >>
>> >> >>>> >> The approach sounds nice. I don't want to be the devil's
>> advocate
>> >> but
>> >> >>>> I'm
>> >> >>>> >> just not sure about the recall using the dbpedia categories
>> >> feature.
>> >> >>>> For
>> >> >>>> >> example, your sentence could be also "Microsoft posted its 2013
>> >> >>>> earnings.
>> >> >>>> >> The Redmond's company made a huge profit". So, maybe including
>> more
>> >> >>>> >> contextual information from dbpedia could increase the recall
>> but
>> >> of
>> >> >>>> course
>> >> >>>> >> will reduce the precision.
>> >> >>>> >>
>> >> >>>> >> Cheers,
>> >> >>>> >> Rafa
>> >> >>>> >>
>> >> >>>> >> El 04/02/14 09:50, Cristian Petroaca escribió:
>> >> >>>> >>
>> >> >>>> >>  Back with a more detailed description of the steps for making
>> this
>> >> >>>> kind of
>> >> >>>> >>> coreference work.
>> >> >>>> >>>
>> >> >>>> >>> I will be using references to the following text in the steps
>> >> below
>> >> >>>> in
>> >> >>>> >>> order to make things clearer : "Microsoft posted its 2013
>> >> earnings.
>> >> >>>> The
>> >> >>>> >>> software company made a huge profit."
>> >> >>>> >>>
>> >> >>>> >>> 1. For every noun phrase in the text which has :
>> >> >>>> >>>      a. a determinate pos which implies reference to an entity
>> >> local
>> >> >>>> to
>> >> >>>> >>> the
>> >> >>>> >>> text, such as "the, this, these") but not "another, every", etc
>> >> which
>> >> >>>> >>> implies a reference to an entity outside of the text.
>> >> >>>> >>>      b. having at least another noun aside from the main
>> required
>> >> >>>> noun
>> >> >>>> >>> which
>> >> >>>> >>> further describes it. For example I will not count "The
>> company"
>> >> as
>> >> >>>> being
>> >> >>>> >>> a
>> >> >>>> >>> legitimate candidate since this could create a lot of false
>> >> >>>> positives by
>> >> >>>> >>> considering the double meaning of some words such as "in the
>> >> company
>> >> >>>> of
>> >> >>>> >>> good people".
>> >> >>>> >>> "The software company" is a good candidate since we also have
>> >> >>>> "software".
>> >> >>>> >>>
>> >> >>>> >>> 2. match the nouns in the noun phrase to the contents of the
>> >> dbpedia
>> >> >>>> >>> categories of each named entity found prior to the location of
>> the
>> >> >>>> noun
>> >> >>>> >>> phrase in the text.
>> >> >>>> >>> The dbpedia categories are in the following format (for
>> Microsoft
>> >> for
>> >> >>>> >>> example) : "Software companies of the United States".
>> >> >>>> >>>   So we try to match "software company" with that.
>> >> >>>> >>> First, as you can see, the main noun in the dbpedia category
>> has a
>> >> >>>> plural
>> >> >>>> >>> form and it's the same for all categories which I saw. I don't
>> >> know
>> >> >>>> if
>> >> >>>> >>> there's an easier way to do this but I thought of applying a
>> >> >>>> lemmatizer on
>> >> >>>> >>> the category and the noun phrase in order for them to have a
>> >> common
>> >> >>>> >>> denominator.This also works if the noun phrase itself has a
>> plural
>> >> >>>> form.
>> >> >>>> >>>
>> >> >>>> >>> Second, I'll need to use for comparison only the words in the
>> >> >>>> category
>> >> >>>> >>> which are themselves nouns and not prepositions or determiners
>> >> such
>> >> >>>> as "of
>> >> >>>> >>> the".This means that I need to pos tag the categories contents
>> as
>> >> >>>> well.
>> >> >>>> >>> I was thinking of running the pos and lemma on the dbpedia
>> >> >>>> categories when
>> >> >>>> >>> building the dbpedia backed entity hub and storing them for
>> later
>> >> >>>> use - I
>> >> >>>> >>> don't know how feasible this is at the moment.
>> >> >>>> >>>
>> >> >>>> >>> After this I can compare each noun in the noun phrase with the
>> >> >>>> equivalent
>> >> >>>> >>> nouns in the categories and based on the number of matches I
>> can
>> >> >>>> create a
>> >> >>>> >>> confidence level.
>> >> >>>> >>>
>> >> >>>> >>> 3. match the noun of the noun phrase with the rdf:type from
>> >> dbpedia
>> >> >>>> of the
>> >> >>>> >>> named entity. If this matches increase the confidence level.
>> >> >>>> >>>
>> >> >>>> >>> 4. If there are multiple named entities which can match a
>> certain
>> >> >>>> noun
>> >> >>>> >>> phrase then link the noun phrase with the closest named entity
>> >> prior
>> >> >>>> to it
>> >> >>>> >>> in the text.
>> >> >>>> >>>
>> >> >>>> >>> What do you think?
>> >> >>>> >>>
>> >> >>>> >>> Cristian
>> >> >>>> >>>
>> >> >>>> >>> 2014-01-31 Cristian Petroaca <cr...@gmail.com>:
>> >> >>>> >>>
>> >> >>>> >>>  Hi Rafa,
>> >> >>>> >>>>
>> >> >>>> >>>> I don't yet have a concrete heursitic but I'm working on it.
>> I'll
>> >> >>>> provide
>> >> >>>> >>>> it here so that you guys can give me a feedback on it.
>> >> >>>> >>>>
>> >> >>>> >>>> What are "locality" features?
>> >> >>>> >>>>
>> >> >>>> >>>> I looked at Bart and other coref tools such as ArkRef and
>> >> >>>> CherryPicker
>> >> >>>> >>>> and
>> >> >>>> >>>> they don't provide such a coreference.
>> >> >>>> >>>>
>> >> >>>> >>>> Cristian
>> >> >>>> >>>>
>> >> >>>> >>>>
>> >> >>>> >>>> 2014-01-30 Rafa Haro <rh...@apache.org>:
>> >> >>>> >>>>
>> >> >>>> >>>> Hi Cristian,
>> >> >>>> >>>>
>> >> >>>> >>>>> Without having more details about your concrete heuristic,
>> in my
>> >> >>>> honest
>> >> >>>> >>>>> opinion, such approach could produce a lot of false
>> positives. I
>> >> >>>> don't
>> >> >>>> >>>>> know
>> >> >>>> >>>>> if you are planning to use some "locality" features to detect
>> >> such
>> >> >>>> >>>>> coreferences but you need to take into account that it is
>> quite
>> >> >>>> usual
>> >> >>>> >>>>> that
>> >> >>>> >>>>> coreferenced mentions can occurs even in different
>> paragraphs.
>> >> >>>> Although
>> >> >>>> >>>>> I'm
>> >> >>>> >>>>> not an expert in Natural Language Understanding, I would say
>> it
>> >> is
>> >> >>>> quite
>> >> >>>> >>>>> difficult to get decent precision/recall rates for
>> coreferencing
>> >> >>>> using
>> >> >>>> >>>>> fixed rules. Maybe you can give a try to others tools like
>> BART
>> >> (
>> >> >>>> >>>>> http://www.bart-coref.org/).
>> >> >>>> >>>>>
>> >> >>>> >>>>> Cheers,
>> >> >>>> >>>>> Rafa Haro
>> >> >>>> >>>>>
>> >> >>>> >>>>> El 30/01/14 10:33, Cristian Petroaca escribió:
>> >> >>>> >>>>>
>> >> >>>> >>>>>   Hi,
>> >> >>>> >>>>>
>> >> >>>> >>>>>> One of the necessary steps for implementing the Event
>> >> extraction
>> >> >>>> Engine
>> >> >>>> >>>>>> feature :
>> https://issues.apache.org/jira/browse/STANBOL-1121is
>> >> >>>> to
>> >> >>>> >>>>>> have
>> >> >>>> >>>>>> coreference resolution in the given text. This is provided
>> now
>> >> >>>> via the
>> >> >>>> >>>>>> stanford-nlp project but as far as I saw this module is
>> >> performing
>> >> >>>> >>>>>> mostly
>> >> >>>> >>>>>> pronomial (He, She) or nominal (Barack Obama and Mr. Obama)
>> >> >>>> coreference
>> >> >>>> >>>>>> resolution.
>> >> >>>> >>>>>>
>> >> >>>> >>>>>> In order to get more coreferences from the text I though of
>> >> >>>> creating
>> >> >>>> >>>>>> some
>> >> >>>> >>>>>> logic that would detect this kind of coreference :
>> >> >>>> >>>>>> "Apple reaches new profit heights. The software company just
>> >> >>>> announced
>> >> >>>> >>>>>> its
>> >> >>>> >>>>>> 2013 earnings."
>> >> >>>> >>>>>> Here "The software company" obviously refers to "Apple".
>> >> >>>> >>>>>> So I'd like to detect coreferences of Named Entities which
>> are
>> >> of
>> >> >>>> the
>> >> >>>> >>>>>> rdf:type of the Named Entity , in this case "company" and
>> also
>> >> >>>> have
>> >> >>>> >>>>>> attributes which can be found in the dbpedia categories of
>> the
>> >> >>>> named
>> >> >>>> >>>>>> entity, in this case "software".
>> >> >>>> >>>>>>
>> >> >>>> >>>>>> The detection of coreferences such as "The software
>> company" in
>> >> >>>> the
>> >> >>>> >>>>>> text
>> >> >>>> >>>>>> would also be done by either using the new Pos Tag Based
>> Phrase
>> >> >>>> >>>>>> extraction
>> >> >>>> >>>>>> Engine (noun phrases) or by using a dependency tree of the
>> >> >>>> sentence and
>> >> >>>> >>>>>> picking up only subjects or objects.
>> >> >>>> >>>>>>
>> >> >>>> >>>>>> At this point I'd like to know if this kind of logic would
>> be
>> >> >>>> useful
>> >> >>>> >>>>>> as a
>> >> >>>> >>>>>> separate Enhancement Engine (in case the precision and
>> recall
>> >> are
>> >> >>>> good
>> >> >>>> >>>>>> enough) in Stanbol?
>> >> >>>> >>>>>>
>> >> >>>> >>>>>> Thanks,
>> >> >>>> >>>>>> Cristian
>> >> >>>> >>>>>>
>> >> >>>> >>>>>>
>> >> >>>> >>>>>>
>> >> >>>> >>
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>> --
>> >> >>>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> >> >>>> | Bodenlehenstraße 11
>> ++43-699-11108907
>> >> >>>> | A-5500 Bischofshofen
>> >> >>>>
>> >> >>>
>> >> >>>
>> >> >>
>> >>
>> >>
>> >>
>> >> --
>> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> >> | Bodenlehenstraße 11                             ++43-699-11108907
>> >> | A-5500 Bischofshofen
>> >>
>>
>>
>>
>> --
>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Cristian Petroaca <cr...@gmail.com>.
Thanks Rupert.

A couple more questions/issues :

1. Whenever I start the stanbol server I'm seeing this in the console
output :

ERROR: Bundle org.apache.stanbol.enhancer.engine.topic.web [153]: Error starting
slinginstall:c:\Data\Projects\Stanbol\main\launchers\stable\target\stanbol\startup\35\org.apache.stanbol.enhancer.engine.topic.web-1.0.0-SNAPSHOT.jar
(org.osgi.framework.BundleException: Unresolved constraint in bundle
org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0:
missing requirement [153.0] package;
(&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0))))
org.osgi.framework.BundleException: Unresolved constraint in bundle
org.apache.stanbol.enhancer.engine.topic.web [153]: Unable to resolve 153.0:
missing requirement [153.0] package;
(&(package=javax.ws.rs)(version>=0.0.0)(!(version>=2.0.0)))
        at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
        at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
        at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1156)
        at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:264)
        at java.lang.Thread.run(Unknown Source)
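
(Decoding the filter in that message, just my reading of it: the bundle seems to declare an
Import-Package for javax.ws.rs restricted to versions below 2.0, roughly equivalent to

    Import-Package: javax.ws.rs;version="[0.0.0,2.0.0)"

so it would need a pre-2.0 JAX-RS API bundle to be present in order to resolve.)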

Despite this, the server starts fine and I can use the enhancer without problems. Do
you guys see this as well?


2. Whenever I restart the server the Weighted Chains get messed up. I
usually use the 'default' chain and add my engine to it, so there are 11
engines in it. After the restart this chain contains around 23 engines
in total.




2014-03-11 9:47 GMT+02:00 Rupert Westenthaler <rupert.westenthaler@gmail.com
>:

> Hi Cristian,
>
> NER Annotations are typically available as both
> NlpAnnotations.NER_ANNOTATION and  fise:TextAnnotation [1] in the
> enhancement metadata. As you are already accessing the AnayzedText I
> would prefer using the  NlpAnnotations.NER_ANNOTATION.
>
> best
> Rupert
>
> [1]
> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
>
> On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
> <cr...@gmail.com> wrote:
> > Thanks.
> > I assume I should get the Named entities using the same but with
> > NlpAnnotations.NER_ANNOTATION?
> >
> >
> >
> > 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
> > rupert.westenthaler@gmail.com>:
> >
> >> Hallo Cristian,
> >>
> >> NounPhrases are not added to the RDF enhancement results. You need to
> >> use the AnalyzedText ContentPart [1]
> >>
> >> here is some demo code you can use in the computeEnhancement method
> >>
> >>         AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci,
> true);
> >>         Iterator<? extends Section> sections = at.getSentences();
> >>         if(!sections.hasNext()){ //process as single sentence
> >>             sections = Collections.singleton(at).iterator();
> >>         }
> >>
> >>         while(sections.hasNext()){
> >>             Section section = sections.next();
> >>             Iterator<Span> chunks =
> >> section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
> >>             while(chunks.hasNext()){
> >>                 Span chunk = chunks.next();
> >>                 Value<PhraseTag> phrase =
> >> chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
> >>                 if(phrase.value().getCategory() ==
> LexicalCategory.Noun){
> >>                     log.info(" - NounPhrase [{},{}] {}", new Object[]{
> >>
> >> chunk.getStart(),chunk.getEnd(),chunk.getSpan()});
> >>                 }
> >>             }
> >>         }
> >>
> >> hope this helps
> >>
> >> best
> >> Rupert
> >>
> >> [1]
> >>
> http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
> >>
> >> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
> >> <cr...@gmail.com> wrote:
> >> > I started to implement the engine and I'm having problems with getting
> >> > results for noun phrases. I modified the "default" weighted chain to
> also
> >> > include the PosChunkerEngine and ran a sample text : "Angela Merkel
> >> visted
> >> > China. The german chancellor met with various people". I expected that
> >> the
> >> > RDF XML output would contain some info about the noun phrases but I
> >> cannot
> >> > see any.
> >> > Could you point me to the correct way to generate the noun phrases?
> >> >
> >> > Thanks,
> >> > Cristian
> >> >
> >> >
> >> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <
> >> cristian.petroaca@gmail.com>:
> >> >
> >> >> Opened https://issues.apache.org/jira/browse/STANBOL-1279
> >> >>
> >> >>
> >> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <
> >> cristian.petroaca@gmail.com>
> >> >> :
> >> >>
> >> >> Hi Rupert,
> >> >>>
> >> >>> The "spatial" dimension is a good idea. I'll also take a look at
> Yago.
> >> >>>
> >> >>> I will create a Jira with what we talked about here. It will
> probably
> >> >>> have just a draft-like description for now and will be updated as I
> go
> >> >>> along.
> >> >>>
> >> >>> Thanks,
> >> >>> Cristian
> >> >>>
> >> >>>
> >> >>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <
> >> >>> rupert.westenthaler@gmail.com>:
> >> >>>
> >> >>> Hi Cristian,
> >> >>>>
> >> >>>> definitely an interesting approach. You should have a look at Yago2
> >> >>>> [1]. As far as I can remember the Yago taxonomy is much better
> >> >>>> structured as the one used by dbpedia. Mapping suggestions of
> dbpedia
> >> >>>> to concepts in Yago2 is easy as both dbpedia and yago2 do provide
> >> >>>> mappings [2] and [3]
> >> >>>>
> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
> >> >>>> >>
> >> >>>> >> "Microsoft posted its 2013 earnings. The Redmond's company made
> a
> >> >>>> >> huge profit".
> >> >>>>
> >> >>>> Thats actually a very good example. Spatial contexts are very
> >> >>>> important as they tend to be often used for referencing. So I would
> >> >>>> suggest to specially treat the spatial context. For spatial
> Entities
> >> >>>> (like a City) this is easy, but even for other (like a Person,
> >> >>>> Company) you could use relations to spatial entities define their
> >> >>>> spatial context. This context could than be used to correctly link
> >> >>>> "The Redmond's company" to "Microsoft".
> >> >>>>
> >> >>>> In addition I would suggest to use the "spatial" context of each
> >> >>>> entity (basically relation to entities that are cities, regions,
> >> >>>> countries) as a separate dimension, because those are very often
> used
> >> >>>> for coreferences.
> >> >>>>
> >> >>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
> >> >>>> [2] http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
> >> >>>> [3]
> >> >>>>
> >>
> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
> >> >>>>
> >> >>>>
> >> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca
> >> >>>> <cr...@gmail.com> wrote:
> >> >>>> > There are several dbpedia categories for each entity, in this
> case
> >> for
> >> >>>> > Microsoft we have :
> >> >>>> >
> >> >>>> > category:Companies_in_the_NASDAQ-100_Index
> >> >>>> > category:Microsoft
> >> >>>> > category:Software_companies_of_the_United_States
> >> >>>> > category:Software_companies_based_in_Washington_(state)
> >> >>>> > category:Companies_established_in_1975
> >> >>>> > category:1975_establishments_in_the_United_States
> >> >>>> > category:Companies_based_in_Redmond,_Washington
> >> >>>> >
> category:Multinational_companies_headquartered_in_the_United_States
> >> >>>> > category:Cloud_computing_providers
> >> >>>> > category:Companies_in_the_Dow_Jones_Industrial_Average
> >> >>>> >
> >> >>>> > So we also have "Companies based in Redmont,Washington" which
> could
> >> be
> >> >>>> > matched.
> >> >>>> >
> >> >>>> >
> >> >>>> > There is still other contextual information from dbpedia which
> can
> >> be
> >> >>>> used.
> >> >>>> > For example for an Organization we could also include :
> >> >>>> > dbpprop:industry = Software
> >> >>>> > dbpprop:service = Online Service Providers
> >> >>>> >
> >> >>>> > and for a Person (that's for Barack Obama) :
> >> >>>> >
> >> >>>> > dbpedia-owl:profession:
> >> >>>> >                                dbpedia:Author
> >> >>>> >                                dbpedia:Constitutional_law
> >> >>>> >                                dbpedia:Lawyer
> >> >>>> >                                dbpedia:Community_organizing
> >> >>>> >
> >> >>>> > I'd like to continue investigating this as I think that it may
> have
> >> >>>> some
> >> >>>> > value in increasing the number of coreference resolutions and I'd
> >> like
> >> >>>> to
> >> >>>> > concentrate more on precision rather than recall since we already
> >> have
> >> >>>> a
> >> >>>> > set of coreferences detected by the stanford nlp tool and this
> would
> >> >>>> be as
> >> >>>> > an addition to that (at least this is how I would like to use
> it).
> >> >>>> >
> >> >>>> > Is it ok if I track this by opening a jira? I could update it to
> >> show
> >> >>>> my
> >> >>>> > progress and also my conclusions and if it turns out that it was
> a
> >> bad
> >> >>>> idea
> >> >>>> > then that's the situation at least I'll end up with more
> knowledge
> >> >>>> about
> >> >>>> > Stanbol in the end :).
> >> >>>> >
> >> >>>> >
> >> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
> >> >>>> >
> >> >>>> >> Hi Cristian,
> >> >>>> >>
> >> >>>> >> The approach sounds nice. I don't want to be the devil's
> advocate
> >> but
> >> >>>> I'm
> >> >>>> >> just not sure about the recall using the dbpedia categories
> >> feature.
> >> >>>> For
> >> >>>> >> example, your sentence could be also "Microsoft posted its 2013
> >> >>>> earnings.
> >> >>>> >> The Redmond's company made a huge profit". So, maybe including
> more
> >> >>>> >> contextual information from dbpedia could increase the recall
> but
> >> of
> >> >>>> course
> >> >>>> >> will reduce the precision.
> >> >>>> >>
> >> >>>> >> Cheers,
> >> >>>> >> Rafa
> >> >>>> >>
> >> >>>> >> El 04/02/14 09:50, Cristian Petroaca escribió:
> >> >>>> >>
> >> >>>> >>  Back with a more detailed description of the steps for making
> this
> >> >>>> kind of
> >> >>>> >>> coreference work.
> >> >>>> >>>
> >> >>>> >>> I will be using references to the following text in the steps
> >> below
> >> >>>> in
> >> >>>> >>> order to make things clearer : "Microsoft posted its 2013
> >> earnings.
> >> >>>> The
> >> >>>> >>> software company made a huge profit."
> >> >>>> >>>
> >> >>>> >>> 1. For every noun phrase in the text which has :
> >> >>>> >>>      a. a determinate pos which implies reference to an entity
> >> local
> >> >>>> to
> >> >>>> >>> the
> >> >>>> >>> text, such as "the, this, these") but not "another, every", etc
> >> which
> >> >>>> >>> implies a reference to an entity outside of the text.
> >> >>>> >>>      b. having at least another noun aside from the main
> required
> >> >>>> noun
> >> >>>> >>> which
> >> >>>> >>> further describes it. For example I will not count "The
> company"
> >> as
> >> >>>> being
> >> >>>> >>> a
> >> >>>> >>> legitimate candidate since this could create a lot of false
> >> >>>> positives by
> >> >>>> >>> considering the double meaning of some words such as "in the
> >> company
> >> >>>> of
> >> >>>> >>> good people".
> >> >>>> >>> "The software company" is a good candidate since we also have
> >> >>>> "software".
> >> >>>> >>>
> >> >>>> >>> 2. match the nouns in the noun phrase to the contents of the
> >> dbpedia
> >> >>>> >>> categories of each named entity found prior to the location of
> the
> >> >>>> noun
> >> >>>> >>> phrase in the text.
> >> >>>> >>> The dbpedia categories are in the following format (for
> Microsoft
> >> for
> >> >>>> >>> example) : "Software companies of the United States".
> >> >>>> >>>   So we try to match "software company" with that.
> >> >>>> >>> First, as you can see, the main noun in the dbpedia category
> has a
> >> >>>> plural
> >> >>>> >>> form and it's the same for all categories which I saw. I don't
> >> know
> >> >>>> if
> >> >>>> >>> there's an easier way to do this but I thought of applying a
> >> >>>> lemmatizer on
> >> >>>> >>> the category and the noun phrase in order for them to have a
> >> common
> >> >>>> >>> denominator.This also works if the noun phrase itself has a
> plural
> >> >>>> form.
> >> >>>> >>>
> >> >>>> >>> Second, I'll need to use for comparison only the words in the
> >> >>>> category
> >> >>>> >>> which are themselves nouns and not prepositions or determiners
> >> such
> >> >>>> as "of
> >> >>>> >>> the".This means that I need to pos tag the categories contents
> as
> >> >>>> well.
> >> >>>> >>> I was thinking of running the pos and lemma on the dbpedia
> >> >>>> categories when
> >> >>>> >>> building the dbpedia backed entity hub and storing them for
> later
> >> >>>> use - I
> >> >>>> >>> don't know how feasible this is at the moment.
> >> >>>> >>>
> >> >>>> >>> After this I can compare each noun in the noun phrase with the
> >> >>>> equivalent
> >> >>>> >>> nouns in the categories and based on the number of matches I
> can
> >> >>>> create a
> >> >>>> >>> confidence level.
> >> >>>> >>>
> >> >>>> >>> 3. match the noun of the noun phrase with the rdf:type from
> >> dbpedia
> >> >>>> of the
> >> >>>> >>> named entity. If this matches increase the confidence level.
> >> >>>> >>>
> >> >>>> >>> 4. If there are multiple named entities which can match a
> certain
> >> >>>> noun
> >> >>>> >>> phrase then link the noun phrase with the closest named entity
> >> prior
> >> >>>> to it
> >> >>>> >>> in the text.
> >> >>>> >>>
> >> >>>> >>> What do you think?
> >> >>>> >>>
> >> >>>> >>> Cristian
> >> >>>> >>>
> >> >>>> >>> 2014-01-31 Cristian Petroaca <cr...@gmail.com>:
> >> >>>> >>>
> >> >>>> >>>  Hi Rafa,
> >> >>>> >>>>
> >> >>>> >>>> I don't yet have a concrete heursitic but I'm working on it.
> I'll
> >> >>>> provide
> >> >>>> >>>> it here so that you guys can give me a feedback on it.
> >> >>>> >>>>
> >> >>>> >>>> What are "locality" features?
> >> >>>> >>>>
> >> >>>> >>>> I looked at Bart and other coref tools such as ArkRef and
> >> >>>> CherryPicker
> >> >>>> >>>> and
> >> >>>> >>>> they don't provide such a coreference.
> >> >>>> >>>>
> >> >>>> >>>> Cristian
> >> >>>> >>>>
> >> >>>> >>>>
> >> >>>> >>>> 2014-01-30 Rafa Haro <rh...@apache.org>:
> >> >>>> >>>>
> >> >>>> >>>> Hi Cristian,
> >> >>>> >>>>
> >> >>>> >>>>> Without having more details about your concrete heuristic,
> in my
> >> >>>> honest
> >> >>>> >>>>> opinion, such approach could produce a lot of false
> positives. I
> >> >>>> don't
> >> >>>> >>>>> know
> >> >>>> >>>>> if you are planning to use some "locality" features to detect
> >> such
> >> >>>> >>>>> coreferences but you need to take into account that it is
> quite
> >> >>>> usual
> >> >>>> >>>>> that
> >> >>>> >>>>> coreferenced mentions can occurs even in different
> paragraphs.
> >> >>>> Although
> >> >>>> >>>>> I'm
> >> >>>> >>>>> not an expert in Natural Language Understanding, I would say
> it
> >> is
> >> >>>> quite
> >> >>>> >>>>> difficult to get decent precision/recall rates for
> coreferencing
> >> >>>> using
> >> >>>> >>>>> fixed rules. Maybe you can give a try to others tools like
> BART
> >> (
> >> >>>> >>>>> http://www.bart-coref.org/).
> >> >>>> >>>>>
> >> >>>> >>>>> Cheers,
> >> >>>> >>>>> Rafa Haro
> >> >>>> >>>>>
> >> >>>> >>>>> El 30/01/14 10:33, Cristian Petroaca escribió:
> >> >>>> >>>>>
> >> >>>> >>>>>   Hi,
> >> >>>> >>>>>
> >> >>>> >>>>>> One of the necessary steps for implementing the Event
> >> extraction
> >> >>>> Engine
> >> >>>> >>>>>> feature :
> https://issues.apache.org/jira/browse/STANBOL-1121is
> >> >>>> to
> >> >>>> >>>>>> have
> >> >>>> >>>>>> coreference resolution in the given text. This is provided
> now
> >> >>>> via the
> >> >>>> >>>>>> stanford-nlp project but as far as I saw this module is
> >> performing
> >> >>>> >>>>>> mostly
> >> >>>> >>>>>> pronomial (He, She) or nominal (Barack Obama and Mr. Obama)
> >> >>>> coreference
> >> >>>> >>>>>> resolution.
> >> >>>> >>>>>>
> >> >>>> >>>>>> In order to get more coreferences from the text I though of
> >> >>>> creating
> >> >>>> >>>>>> some
> >> >>>> >>>>>> logic that would detect this kind of coreference :
> >> >>>> >>>>>> "Apple reaches new profit heights. The software company just
> >> >>>> announced
> >> >>>> >>>>>> its
> >> >>>> >>>>>> 2013 earnings."
> >> >>>> >>>>>> Here "The software company" obviously refers to "Apple".
> >> >>>> >>>>>> So I'd like to detect coreferences of Named Entities which
> are
> >> of
> >> >>>> the
> >> >>>> >>>>>> rdf:type of the Named Entity , in this case "company" and
> also
> >> >>>> have
> >> >>>> >>>>>> attributes which can be found in the dbpedia categories of
> the
> >> >>>> named
> >> >>>> >>>>>> entity, in this case "software".
> >> >>>> >>>>>>
> >> >>>> >>>>>> The detection of coreferences such as "The software
> company" in
> >> >>>> the
> >> >>>> >>>>>> text
> >> >>>> >>>>>> would also be done by either using the new Pos Tag Based
> Phrase
> >> >>>> >>>>>> extraction
> >> >>>> >>>>>> Engine (noun phrases) or by using a dependency tree of the
> >> >>>> sentence and
> >> >>>> >>>>>> picking up only subjects or objects.
> >> >>>> >>>>>>
> >> >>>> >>>>>> At this point I'd like to know if this kind of logic would
> be
> >> >>>> useful
> >> >>>> >>>>>> as a
> >> >>>> >>>>>> separate Enhancement Engine (in case the precision and
> recall
> >> are
> >> >>>> good
> >> >>>> >>>>>> enough) in Stanbol?
> >> >>>> >>>>>>
> >> >>>> >>>>>> Thanks,
> >> >>>> >>>>>> Cristian
> >> >>>> >>>>>>
> >> >>>> >>>>>>
> >> >>>> >>>>>>
> >> >>>> >>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> --
> >> >>>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> >> >>>> | Bodenlehenstraße 11
> ++43-699-11108907
> >> >>>> | A-5500 Bischofshofen
> >> >>>>
> >> >>>
> >> >>>
> >> >>
> >>
> >>
> >>
> >> --
> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> | A-5500 Bischofshofen
> >>
>
>
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Cristian,

NER annotations are typically available as both
NlpAnnotations.NER_ANNOTATION and fise:TextAnnotation [1] in the
enhancement metadata. As you are already accessing the AnalysedText I
would prefer using the NlpAnnotations.NER_ANNOTATION.
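
For reference, here is a minimal sketch (untested) of how that could look, following
the same pattern as the noun phrase demo code quoted below. It assumes the NER
engines attach NerTag values (org.apache.stanbol.enhancer.nlp.ner.NerTag) to Chunk
spans of the AnalysedText:

        AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
        //Chunk spans carrying a NER_ANNOTATION represent detected named entities
        Iterator<Span> spans = at.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
        while(spans.hasNext()){
            Span span = spans.next();
            Value<NerTag> ner = span.getAnnotation(NlpAnnotations.NER_ANNOTATION);
            if(ner != null){ //not every chunk is a named entity
                log.info(" - NamedEntity [{},{}] {} (type: {})", new Object[]{
                        span.getStart(), span.getEnd(), span.getSpan(),
                        ner.value().getType()});
            }
        }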

best
Rupert

[1] http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation

On Mon, Mar 10, 2014 at 10:07 PM, Cristian Petroaca
<cr...@gmail.com> wrote:
> Thanks.
> I assume I should get the Named entities using the same but with
> NlpAnnotations.NER_ANNOTATION?
>
>
>
> 2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
> rupert.westenthaler@gmail.com>:
>
>> Hallo Cristian,
>>
>> NounPhrases are not added to the RDF enhancement results. You need to
>> use the AnalyzedText ContentPart [1]
>>
>> here is some demo code you can use in the computeEnhancement method
>>
>>         AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
>>         Iterator<? extends Section> sections = at.getSentences();
>>         if(!sections.hasNext()){ //process as single sentence
>>             sections = Collections.singleton(at).iterator();
>>         }
>>
>>         while(sections.hasNext()){
>>             Section section = sections.next();
>>             Iterator<Span> chunks =
>> section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
>>             while(chunks.hasNext()){
>>                 Span chunk = chunks.next();
>>                 Value<PhraseTag> phrase =
>> chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
>>                 if(phrase.value().getCategory() == LexicalCategory.Noun){
>>                     log.info(" - NounPhrase [{},{}] {}", new Object[]{
>>
>> chunk.getStart(),chunk.getEnd(),chunk.getSpan()});
>>                 }
>>             }
>>         }
>>
>> hope this helps
>>
>> best
>> Rupert
>>
>> [1]
>> http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
>>
>> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
>> <cr...@gmail.com> wrote:
>> > I started to implement the engine and I'm having problems with getting
>> > results for noun phrases. I modified the "default" weighted chain to also
>> > include the PosChunkerEngine and ran a sample text : "Angela Merkel
>> visted
>> > China. The german chancellor met with various people". I expected that
>> the
>> > RDF XML output would contain some info about the noun phrases but I
>> cannot
>> > see any.
>> > Could you point me to the correct way to generate the noun phrases?
>> >
>> > Thanks,
>> > Cristian
>> >
>> >
>> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <
>> cristian.petroaca@gmail.com>:
>> >
>> >> Opened https://issues.apache.org/jira/browse/STANBOL-1279
>> >>
>> >>
>> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <
>> cristian.petroaca@gmail.com>
>> >> :
>> >>
>> >> Hi Rupert,
>> >>>
>> >>> The "spatial" dimension is a good idea. I'll also take a look at Yago.
>> >>>
>> >>> I will create a Jira with what we talked about here. It will probably
>> >>> have just a draft-like description for now and will be updated as I go
>> >>> along.
>> >>>
>> >>> Thanks,
>> >>> Cristian
>> >>>
>> >>>
>> >>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <
>> >>> rupert.westenthaler@gmail.com>:
>> >>>
>> >>> Hi Cristian,
>> >>>>
>> >>>> definitely an interesting approach. You should have a look at Yago2
>> >>>> [1]. As far as I can remember the Yago taxonomy is much better
>> >>>> structured as the one used by dbpedia. Mapping suggestions of dbpedia
>> >>>> to concepts in Yago2 is easy as both dbpedia and yago2 do provide
>> >>>> mappings [2] and [3]
>> >>>>
>> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>> >>>> >>
>> >>>> >> "Microsoft posted its 2013 earnings. The Redmond's company made a
>> >>>> >> huge profit".
>> >>>>
>> >>>> Thats actually a very good example. Spatial contexts are very
>> >>>> important as they tend to be often used for referencing. So I would
>> >>>> suggest to specially treat the spatial context. For spatial Entities
>> >>>> (like a City) this is easy, but even for other (like a Person,
>> >>>> Company) you could use relations to spatial entities define their
>> >>>> spatial context. This context could than be used to correctly link
>> >>>> "The Redmond's company" to "Microsoft".
>> >>>>
>> >>>> In addition I would suggest to use the "spatial" context of each
>> >>>> entity (basically relation to entities that are cities, regions,
>> >>>> countries) as a separate dimension, because those are very often used
>> >>>> for coreferences.
>> >>>>
>> >>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
>> >>>> [2] http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
>> >>>> [3]
>> >>>>
>> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
>> >>>>
>> >>>>
>> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca
>> >>>> <cr...@gmail.com> wrote:
>> >>>> > There are several dbpedia categories for each entity, in this case
>> for
>> >>>> > Microsoft we have :
>> >>>> >
>> >>>> > category:Companies_in_the_NASDAQ-100_Index
>> >>>> > category:Microsoft
>> >>>> > category:Software_companies_of_the_United_States
>> >>>> > category:Software_companies_based_in_Washington_(state)
>> >>>> > category:Companies_established_in_1975
>> >>>> > category:1975_establishments_in_the_United_States
>> >>>> > category:Companies_based_in_Redmond,_Washington
>> >>>> > category:Multinational_companies_headquartered_in_the_United_States
>> >>>> > category:Cloud_computing_providers
>> >>>> > category:Companies_in_the_Dow_Jones_Industrial_Average
>> >>>> >
>> >>>> > So we also have "Companies based in Redmont,Washington" which could
>> be
>> >>>> > matched.
>> >>>> >
>> >>>> >
>> >>>> > There is still other contextual information from dbpedia which can
>> be
>> >>>> used.
>> >>>> > For example for an Organization we could also include :
>> >>>> > dbpprop:industry = Software
>> >>>> > dbpprop:service = Online Service Providers
>> >>>> >
>> >>>> > and for a Person (that's for Barack Obama) :
>> >>>> >
>> >>>> > dbpedia-owl:profession:
>> >>>> >                                dbpedia:Author
>> >>>> >                                dbpedia:Constitutional_law
>> >>>> >                                dbpedia:Lawyer
>> >>>> >                                dbpedia:Community_organizing
>> >>>> >
>> >>>> > I'd like to continue investigating this as I think that it may have
>> >>>> some
>> >>>> > value in increasing the number of coreference resolutions and I'd
>> like
>> >>>> to
>> >>>> > concentrate more on precision rather than recall since we already
>> have
>> >>>> a
>> >>>> > set of coreferences detected by the stanford nlp tool and this would
>> >>>> be as
>> >>>> > an addition to that (at least this is how I would like to use it).
>> >>>> >
>> >>>> > Is it ok if I track this by opening a jira? I could update it to
>> show
>> >>>> my
>> >>>> > progress and also my conclusions and if it turns out that it was a
>> bad
>> >>>> idea
>> >>>> > then that's the situation at least I'll end up with more knowledge
>> >>>> about
>> >>>> > Stanbol in the end :).
>> >>>> >
>> >>>> >
>> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>> >>>> >
>> >>>> >> Hi Cristian,
>> >>>> >>
>> >>>> >> The approach sounds nice. I don't want to be the devil's advocate
>> but
>> >>>> I'm
>> >>>> >> just not sure about the recall using the dbpedia categories
>> feature.
>> >>>> For
>> >>>> >> example, your sentence could be also "Microsoft posted its 2013
>> >>>> earnings.
>> >>>> >> The Redmond's company made a huge profit". So, maybe including more
>> >>>> >> contextual information from dbpedia could increase the recall but
>> of
>> >>>> course
>> >>>> >> will reduce the precision.
>> >>>> >>
>> >>>> >> Cheers,
>> >>>> >> Rafa
>> >>>> >>
>> >>>> >> El 04/02/14 09:50, Cristian Petroaca escribió:
>> >>>> >>
>> >>>> >>  Back with a more detailed description of the steps for making this
>> >>>> kind of
>> >>>> >>> coreference work.
>> >>>> >>>
>> >>>> >>> I will be using references to the following text in the steps
>> below
>> >>>> in
>> >>>> >>> order to make things clearer : "Microsoft posted its 2013
>> earnings.
>> >>>> The
>> >>>> >>> software company made a huge profit."
>> >>>> >>>
>> >>>> >>> 1. For every noun phrase in the text which has :
>> >>>> >>>      a. a determinate pos which implies reference to an entity
>> local
>> >>>> to
>> >>>> >>> the
>> >>>> >>> text, such as "the, this, these") but not "another, every", etc
>> which
>> >>>> >>> implies a reference to an entity outside of the text.
>> >>>> >>>      b. having at least another noun aside from the main required
>> >>>> noun
>> >>>> >>> which
>> >>>> >>> further describes it. For example I will not count "The company"
>> as
>> >>>> being
>> >>>> >>> a
>> >>>> >>> legitimate candidate since this could create a lot of false
>> >>>> positives by
>> >>>> >>> considering the double meaning of some words such as "in the
>> company
>> >>>> of
>> >>>> >>> good people".
>> >>>> >>> "The software company" is a good candidate since we also have
>> >>>> "software".
>> >>>> >>>
>> >>>> >>> 2. match the nouns in the noun phrase to the contents of the
>> dbpedia
>> >>>> >>> categories of each named entity found prior to the location of the
>> >>>> noun
>> >>>> >>> phrase in the text.
>> >>>> >>> The dbpedia categories are in the following format (for Microsoft
>> for
>> >>>> >>> example) : "Software companies of the United States".
>> >>>> >>>   So we try to match "software company" with that.
>> >>>> >>> First, as you can see, the main noun in the dbpedia category has a
>> >>>> plural
>> >>>> >>> form and it's the same for all categories which I saw. I don't
>> know
>> >>>> if
>> >>>> >>> there's an easier way to do this but I thought of applying a
>> >>>> lemmatizer on
>> >>>> >>> the category and the noun phrase in order for them to have a
>> common
>> >>>> >>> denominator.This also works if the noun phrase itself has a plural
>> >>>> form.
>> >>>> >>>
>> >>>> >>> Second, I'll need to use for comparison only the words in the
>> >>>> category
>> >>>> >>> which are themselves nouns and not prepositions or determiners
>> such
>> >>>> as "of
>> >>>> >>> the".This means that I need to pos tag the categories contents as
>> >>>> well.
>> >>>> >>> I was thinking of running the pos and lemma on the dbpedia
>> >>>> categories when
>> >>>> >>> building the dbpedia backed entity hub and storing them for later
>> >>>> use - I
>> >>>> >>> don't know how feasible this is at the moment.
>> >>>> >>>
>> >>>> >>> After this I can compare each noun in the noun phrase with the
>> >>>> equivalent
>> >>>> >>> nouns in the categories and based on the number of matches I can
>> >>>> create a
>> >>>> >>> confidence level.
>> >>>> >>>
>> >>>> >>> 3. match the noun of the noun phrase with the rdf:type from
>> dbpedia
>> >>>> of the
>> >>>> >>> named entity. If this matches increase the confidence level.
>> >>>> >>>
>> >>>> >>> 4. If there are multiple named entities which can match a certain
>> >>>> noun
>> >>>> >>> phrase then link the noun phrase with the closest named entity
>> prior
>> >>>> to it
>> >>>> >>> in the text.
>> >>>> >>>
>> >>>> >>> What do you think?
>> >>>> >>>
>> >>>> >>> Cristian
>> >>>> >>>
>> >>>> >>> 2014-01-31 Cristian Petroaca <cr...@gmail.com>:
>> >>>> >>>
>> >>>> >>>  Hi Rafa,
>> >>>> >>>>
>> >>>> >>>> I don't yet have a concrete heursitic but I'm working on it. I'll
>> >>>> provide
>> >>>> >>>> it here so that you guys can give me a feedback on it.
>> >>>> >>>>
>> >>>> >>>> What are "locality" features?
>> >>>> >>>>
>> >>>> >>>> I looked at Bart and other coref tools such as ArkRef and
>> >>>> CherryPicker
>> >>>> >>>> and
>> >>>> >>>> they don't provide such a coreference.
>> >>>> >>>>
>> >>>> >>>> Cristian
>> >>>> >>>>
>> >>>> >>>>
>> >>>> >>>> 2014-01-30 Rafa Haro <rh...@apache.org>:
>> >>>> >>>>
>> >>>> >>>> Hi Cristian,
>> >>>> >>>>
>> >>>> >>>>> Without having more details about your concrete heuristic, in my
>> >>>> honest
>> >>>> >>>>> opinion, such approach could produce a lot of false positives. I
>> >>>> don't
>> >>>> >>>>> know
>> >>>> >>>>> if you are planning to use some "locality" features to detect
>> such
>> >>>> >>>>> coreferences but you need to take into account that it is quite
>> >>>> usual
>> >>>> >>>>> that
>> >>>> >>>>> coreferenced mentions can occurs even in different paragraphs.
>> >>>> Although
>> >>>> >>>>> I'm
>> >>>> >>>>> not an expert in Natural Language Understanding, I would say it
>> is
>> >>>> quite
>> >>>> >>>>> difficult to get decent precision/recall rates for coreferencing
>> >>>> using
>> >>>> >>>>> fixed rules. Maybe you can give a try to others tools like BART
>> (
>> >>>> >>>>> http://www.bart-coref.org/).
>> >>>> >>>>>
>> >>>> >>>>> Cheers,
>> >>>> >>>>> Rafa Haro
>> >>>> >>>>>
>> >>>> >>>>> El 30/01/14 10:33, Cristian Petroaca escribió:
>> >>>> >>>>>
>> >>>> >>>>>   Hi,
>> >>>> >>>>>
>> >>>> >>>>>> One of the necessary steps for implementing the Event
>> extraction
>> >>>> Engine
>> >>>> >>>>>> feature : https://issues.apache.org/jira/browse/STANBOL-1121is
>> >>>> to
>> >>>> >>>>>> have
>> >>>> >>>>>> coreference resolution in the given text. This is provided now
>> >>>> via the
>> >>>> >>>>>> stanford-nlp project but as far as I saw this module is
>> performing
>> >>>> >>>>>> mostly
>> >>>> >>>>>> pronomial (He, She) or nominal (Barack Obama and Mr. Obama)
>> >>>> coreference
>> >>>> >>>>>> resolution.
>> >>>> >>>>>>
>> >>>> >>>>>> In order to get more coreferences from the text I though of
>> >>>> creating
>> >>>> >>>>>> some
>> >>>> >>>>>> logic that would detect this kind of coreference :
>> >>>> >>>>>> "Apple reaches new profit heights. The software company just
>> >>>> announced
>> >>>> >>>>>> its
>> >>>> >>>>>> 2013 earnings."
>> >>>> >>>>>> Here "The software company" obviously refers to "Apple".
>> >>>> >>>>>> So I'd like to detect coreferences of Named Entities which are
>> of
>> >>>> the
>> >>>> >>>>>> rdf:type of the Named Entity , in this case "company" and also
>> >>>> have
>> >>>> >>>>>> attributes which can be found in the dbpedia categories of the
>> >>>> named
>> >>>> >>>>>> entity, in this case "software".
>> >>>> >>>>>>
>> >>>> >>>>>> The detection of coreferences such as "The software company" in
>> >>>> the
>> >>>> >>>>>> text
>> >>>> >>>>>> would also be done by either using the new Pos Tag Based Phrase
>> >>>> >>>>>> extraction
>> >>>> >>>>>> Engine (noun phrases) or by using a dependency tree of the
>> >>>> sentence and
>> >>>> >>>>>> picking up only subjects or objects.
>> >>>> >>>>>>
>> >>>> >>>>>> At this point I'd like to know if this kind of logic would be
>> >>>> useful
>> >>>> >>>>>> as a
>> >>>> >>>>>> separate Enhancement Engine (in case the precision and recall
>> are
>> >>>> good
>> >>>> >>>>>> enough) in Stanbol?
>> >>>> >>>>>>
>> >>>> >>>>>> Thanks,
>> >>>> >>>>>> Cristian
>> >>>> >>>>>>
>> >>>> >>>>>>
>> >>>> >>>>>>
>> >>>> >>
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> >>>> | Bodenlehenstraße 11                             ++43-699-11108907
>> >>>> | A-5500 Bischofshofen
>> >>>>
>> >>>
>> >>>
>> >>
>>
>>
>>
>> --
>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Cristian Petroaca <cr...@gmail.com>.
Thanks.
I assume I should get the Named Entities using the same approach but with
NlpAnnotations.NER_ANNOTATION?



2014-03-10 13:29 GMT+02:00 Rupert Westenthaler <
rupert.westenthaler@gmail.com>:

> Hallo Cristian,
>
> NounPhrases are not added to the RDF enhancement results. You need to
> use the AnalyzedText ContentPart [1]
>
> here is some demo code you can use in the computeEnhancement method
>
>         AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
>         Iterator<? extends Section> sections = at.getSentences();
>         if(!sections.hasNext()){ //process as single sentence
>             sections = Collections.singleton(at).iterator();
>         }
>
>         while(sections.hasNext()){
>             Section section = sections.next();
>             Iterator<Span> chunks =
> section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
>             while(chunks.hasNext()){
>                 Span chunk = chunks.next();
>                 Value<PhraseTag> phrase =
> chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
>                 if(phrase.value().getCategory() == LexicalCategory.Noun){
>                     log.info(" - NounPhrase [{},{}] {}", new Object[]{
>
> chunk.getStart(),chunk.getEnd(),chunk.getSpan()});
>                 }
>             }
>         }
>
> hope this helps
>
> best
> Rupert
>
> [1]
> http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
>
> On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
> <cr...@gmail.com> wrote:
> > I started to implement the engine and I'm having problems with getting
> > results for noun phrases. I modified the "default" weighted chain to also
> > include the PosChunkerEngine and ran a sample text : "Angela Merkel
> visted
> > China. The german chancellor met with various people". I expected that
> the
> > RDF XML output would contain some info about the noun phrases but I
> cannot
> > see any.
> > Could you point me to the correct way to generate the noun phrases?
> >
> > Thanks,
> > Cristian
> >
> >
> > 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <
> cristian.petroaca@gmail.com>:
> >
> >> Opened https://issues.apache.org/jira/browse/STANBOL-1279
> >>
> >>
> >> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <
> cristian.petroaca@gmail.com>
> >> :
> >>
> >> Hi Rupert,
> >>>
> >>> The "spatial" dimension is a good idea. I'll also take a look at Yago.
> >>>
> >>> I will create a Jira with what we talked about here. It will probably
> >>> have just a draft-like description for now and will be updated as I go
> >>> along.
> >>>
> >>> Thanks,
> >>> Cristian
> >>>
> >>>
> >>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <
> >>> rupert.westenthaler@gmail.com>:
> >>>
> >>> Hi Cristian,
> >>>>
> >>>> definitely an interesting approach. You should have a look at Yago2
> >>>> [1]. As far as I can remember the Yago taxonomy is much better
> >>>> structured as the one used by dbpedia. Mapping suggestions of dbpedia
> >>>> to concepts in Yago2 is easy as both dbpedia and yago2 do provide
> >>>> mappings [2] and [3]
> >>>>
> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
> >>>> >>
> >>>> >> "Microsoft posted its 2013 earnings. The Redmond's company made a
> >>>> >> huge profit".
> >>>>
> >>>> Thats actually a very good example. Spatial contexts are very
> >>>> important as they tend to be often used for referencing. So I would
> >>>> suggest to specially treat the spatial context. For spatial Entities
> >>>> (like a City) this is easy, but even for other (like a Person,
> >>>> Company) you could use relations to spatial entities define their
> >>>> spatial context. This context could than be used to correctly link
> >>>> "The Redmond's company" to "Microsoft".
> >>>>
> >>>> In addition I would suggest to use the "spatial" context of each
> >>>> entity (basically relation to entities that are cities, regions,
> >>>> countries) as a separate dimension, because those are very often used
> >>>> for coreferences.
> >>>>
> >>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
> >>>> [2] http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
> >>>> [3]
> >>>>
> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
> >>>>
> >>>>
> >>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca
> >>>> <cr...@gmail.com> wrote:
> >>>> > There are several dbpedia categories for each entity, in this case
> for
> >>>> > Microsoft we have :
> >>>> >
> >>>> > category:Companies_in_the_NASDAQ-100_Index
> >>>> > category:Microsoft
> >>>> > category:Software_companies_of_the_United_States
> >>>> > category:Software_companies_based_in_Washington_(state)
> >>>> > category:Companies_established_in_1975
> >>>> > category:1975_establishments_in_the_United_States
> >>>> > category:Companies_based_in_Redmond,_Washington
> >>>> > category:Multinational_companies_headquartered_in_the_United_States
> >>>> > category:Cloud_computing_providers
> >>>> > category:Companies_in_the_Dow_Jones_Industrial_Average
> >>>> >
> >>>> > So we also have "Companies based in Redmont,Washington" which could
> be
> >>>> > matched.
> >>>> >
> >>>> >
> >>>> > There is still other contextual information from dbpedia which can
> be
> >>>> used.
> >>>> > For example for an Organization we could also include :
> >>>> > dbpprop:industry = Software
> >>>> > dbpprop:service = Online Service Providers
> >>>> >
> >>>> > and for a Person (that's for Barack Obama) :
> >>>> >
> >>>> > dbpedia-owl:profession:
> >>>> >                                dbpedia:Author
> >>>> >                                dbpedia:Constitutional_law
> >>>> >                                dbpedia:Lawyer
> >>>> >                                dbpedia:Community_organizing
> >>>> >
> >>>> > I'd like to continue investigating this as I think that it may have
> >>>> some
> >>>> > value in increasing the number of coreference resolutions and I'd
> like
> >>>> to
> >>>> > concentrate more on precision rather than recall since we already
> have
> >>>> a
> >>>> > set of coreferences detected by the stanford nlp tool and this would
> >>>> be as
> >>>> > an addition to that (at least this is how I would like to use it).
> >>>> >
> >>>> > Is it ok if I track this by opening a jira? I could update it to
> show
> >>>> my
> >>>> > progress and also my conclusions and if it turns out that it was a
> bad
> >>>> idea
> >>>> > then that's the situation at least I'll end up with more knowledge
> >>>> about
> >>>> > Stanbol in the end :).
> >>>> >
> >>>> >
> >>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
> >>>> >
> >>>> >> Hi Cristian,
> >>>> >>
> >>>> >> The approach sounds nice. I don't want to be the devil's advocate
> but
> >>>> I'm
> >>>> >> just not sure about the recall using the dbpedia categories
> feature.
> >>>> For
> >>>> >> example, your sentence could be also "Microsoft posted its 2013
> >>>> earnings.
> >>>> >> The Redmond's company made a huge profit". So, maybe including more
> >>>> >> contextual information from dbpedia could increase the recall but
> of
> >>>> course
> >>>> >> will reduce the precision.
> >>>> >>
> >>>> >> Cheers,
> >>>> >> Rafa
> >>>> >>
> >>>> >> El 04/02/14 09:50, Cristian Petroaca escribió:
> >>>> >>
> >>>> >>  Back with a more detailed description of the steps for making this
> >>>> kind of
> >>>> >>> coreference work.
> >>>> >>>
> >>>> >>> I will be using references to the following text in the steps
> below
> >>>> in
> >>>> >>> order to make things clearer : "Microsoft posted its 2013
> earnings.
> >>>> The
> >>>> >>> software company made a huge profit."
> >>>> >>>
> >>>> >>> 1. For every noun phrase in the text which has :
> >>>> >>>      a. a determinate pos which implies reference to an entity
> local
> >>>> to
> >>>> >>> the
> >>>> >>> text, such as "the, this, these") but not "another, every", etc
> which
> >>>> >>> implies a reference to an entity outside of the text.
> >>>> >>>      b. having at least another noun aside from the main required
> >>>> noun
> >>>> >>> which
> >>>> >>> further describes it. For example I will not count "The company"
> as
> >>>> being
> >>>> >>> a
> >>>> >>> legitimate candidate since this could create a lot of false
> >>>> positives by
> >>>> >>> considering the double meaning of some words such as "in the
> company
> >>>> of
> >>>> >>> good people".
> >>>> >>> "The software company" is a good candidate since we also have
> >>>> "software".
> >>>> >>>
> >>>> >>> 2. match the nouns in the noun phrase to the contents of the
> dbpedia
> >>>> >>> categories of each named entity found prior to the location of the
> >>>> noun
> >>>> >>> phrase in the text.
> >>>> >>> The dbpedia categories are in the following format (for Microsoft
> for
> >>>> >>> example) : "Software companies of the United States".
> >>>> >>>   So we try to match "software company" with that.
> >>>> >>> First, as you can see, the main noun in the dbpedia category has a
> >>>> plural
> >>>> >>> form and it's the same for all categories which I saw. I don't
> know
> >>>> if
> >>>> >>> there's an easier way to do this but I thought of applying a
> >>>> lemmatizer on
> >>>> >>> the category and the noun phrase in order for them to have a
> common
> >>>> >>> denominator.This also works if the noun phrase itself has a plural
> >>>> form.
> >>>> >>>
> >>>> >>> Second, I'll need to use for comparison only the words in the
> >>>> category
> >>>> >>> which are themselves nouns and not prepositions or determiners
> such
> >>>> as "of
> >>>> >>> the".This means that I need to pos tag the categories contents as
> >>>> well.
> >>>> >>> I was thinking of running the pos and lemma on the dbpedia
> >>>> categories when
> >>>> >>> building the dbpedia backed entity hub and storing them for later
> >>>> use - I
> >>>> >>> don't know how feasible this is at the moment.
> >>>> >>>
> >>>> >>> After this I can compare each noun in the noun phrase with the
> >>>> equivalent
> >>>> >>> nouns in the categories and based on the number of matches I can
> >>>> create a
> >>>> >>> confidence level.
> >>>> >>>
> >>>> >>> 3. match the noun of the noun phrase with the rdf:type from
> dbpedia
> >>>> of the
> >>>> >>> named entity. If this matches increase the confidence level.
> >>>> >>>
> >>>> >>> 4. If there are multiple named entities which can match a certain
> >>>> noun
> >>>> >>> phrase then link the noun phrase with the closest named entity
> prior
> >>>> to it
> >>>> >>> in the text.
> >>>> >>>
> >>>> >>> What do you think?
> >>>> >>>
> >>>> >>> Cristian
> >>>> >>>
> >>>> >>> 2014-01-31 Cristian Petroaca <cr...@gmail.com>:
> >>>> >>>
> >>>> >>>  Hi Rafa,
> >>>> >>>>
> >>>> >>>> I don't yet have a concrete heursitic but I'm working on it. I'll
> >>>> provide
> >>>> >>>> it here so that you guys can give me a feedback on it.
> >>>> >>>>
> >>>> >>>> What are "locality" features?
> >>>> >>>>
> >>>> >>>> I looked at Bart and other coref tools such as ArkRef and
> >>>> CherryPicker
> >>>> >>>> and
> >>>> >>>> they don't provide such a coreference.
> >>>> >>>>
> >>>> >>>> Cristian
> >>>> >>>>
> >>>> >>>>
> >>>> >>>> 2014-01-30 Rafa Haro <rh...@apache.org>:
> >>>> >>>>
> >>>> >>>> Hi Cristian,
> >>>> >>>>
> >>>> >>>>> Without having more details about your concrete heuristic, in my
> >>>> honest
> >>>> >>>>> opinion, such approach could produce a lot of false positives. I
> >>>> don't
> >>>> >>>>> know
> >>>> >>>>> if you are planning to use some "locality" features to detect
> such
> >>>> >>>>> coreferences but you need to take into account that it is quite
> >>>> usual
> >>>> >>>>> that
> >>>> >>>>> coreferenced mentions can occurs even in different paragraphs.
> >>>> Although
> >>>> >>>>> I'm
> >>>> >>>>> not an expert in Natural Language Understanding, I would say it
> is
> >>>> quite
> >>>> >>>>> difficult to get decent precision/recall rates for coreferencing
> >>>> using
> >>>> >>>>> fixed rules. Maybe you can give a try to others tools like BART
> (
> >>>> >>>>> http://www.bart-coref.org/).
> >>>> >>>>>
> >>>> >>>>> Cheers,
> >>>> >>>>> Rafa Haro
> >>>> >>>>>
> >>>> >>>>> El 30/01/14 10:33, Cristian Petroaca escribió:
> >>>> >>>>>
> >>>> >>>>>   Hi,
> >>>> >>>>>
> >>>> >>>>>> One of the necessary steps for implementing the Event
> extraction
> >>>> Engine
> >>>> >>>>>> feature : https://issues.apache.org/jira/browse/STANBOL-1121is
> >>>> to
> >>>> >>>>>> have
> >>>> >>>>>> coreference resolution in the given text. This is provided now
> >>>> via the
> >>>> >>>>>> stanford-nlp project but as far as I saw this module is
> performing
> >>>> >>>>>> mostly
> >>>> >>>>>> pronomial (He, She) or nominal (Barack Obama and Mr. Obama)
> >>>> coreference
> >>>> >>>>>> resolution.
> >>>> >>>>>>
> >>>> >>>>>> In order to get more coreferences from the text I though of
> >>>> creating
> >>>> >>>>>> some
> >>>> >>>>>> logic that would detect this kind of coreference :
> >>>> >>>>>> "Apple reaches new profit heights. The software company just
> >>>> announced
> >>>> >>>>>> its
> >>>> >>>>>> 2013 earnings."
> >>>> >>>>>> Here "The software company" obviously refers to "Apple".
> >>>> >>>>>> So I'd like to detect coreferences of Named Entities which are
> of
> >>>> the
> >>>> >>>>>> rdf:type of the Named Entity , in this case "company" and also
> >>>> have
> >>>> >>>>>> attributes which can be found in the dbpedia categories of the
> >>>> named
> >>>> >>>>>> entity, in this case "software".
> >>>> >>>>>>
> >>>> >>>>>> The detection of coreferences such as "The software company" in
> >>>> the
> >>>> >>>>>> text
> >>>> >>>>>> would also be done by either using the new Pos Tag Based Phrase
> >>>> >>>>>> extraction
> >>>> >>>>>> Engine (noun phrases) or by using a dependency tree of the
> >>>> sentence and
> >>>> >>>>>> picking up only subjects or objects.
> >>>> >>>>>>
> >>>> >>>>>> At this point I'd like to know if this kind of logic would be
> >>>> useful
> >>>> >>>>>> as a
> >>>> >>>>>> separate Enhancement Engine (in case the precision and recall
> are
> >>>> good
> >>>> >>>>>> enough) in Stanbol?
> >>>> >>>>>>
> >>>> >>>>>> Thanks,
> >>>> >>>>>> Cristian
> >>>> >>>>>>
> >>>> >>>>>>
> >>>> >>>>>>
> >>>> >>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> >>>> | Bodenlehenstraße 11                             ++43-699-11108907
> >>>> | A-5500 Bischofshofen
> >>>>
> >>>
> >>>
> >>
>
>
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hello Cristian,

NounPhrases are not added to the RDF enhancement results. You need to
use the AnalyzedText ContentPart [1]

Here is some demo code you can use in the computeEnhancement method:

        AnalysedText at = NlpEngineHelper.getAnalysedText(this, ci, true);
        Iterator<? extends Section> sections = at.getSentences();
        if(!sections.hasNext()){ //no sentences detected: process the whole text as a single section
            sections = Collections.singleton(at).iterator();
        }

        while(sections.hasNext()){
            Section section = sections.next();
            //iterate over the chunks (phrases) enclosed by this section
            Iterator<Span> chunks = section.getEnclosed(EnumSet.of(SpanTypeEnum.Chunk));
            while(chunks.hasNext()){
                Span chunk = chunks.next();
                Value<PhraseTag> phrase = chunk.getAnnotation(NlpAnnotations.PHRASE_ANNOTATION);
                //not every chunk carries a phrase annotation, so guard against null
                if(phrase != null && phrase.value().getCategory() == LexicalCategory.Noun){
                    log.info(" - NounPhrase [{},{}] {}", new Object[]{
                            chunk.getStart(), chunk.getEnd(), chunk.getSpan()});
                }
            }
        }

hope this helps

best
Rupert

[1] http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
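
For illustration, here is also a rough, self-contained sketch of the category-matching
step discussed earlier in this thread (comparing the nouns of a noun phrase against the
lemmatized tokens of the entity's dbpedia category labels and deriving a confidence from
the overlap). The lemmatize() stub and the scoring below are purely illustrative
placeholders, not existing Stanbol APIs:

    import java.util.*;

    public class CategoryMatchSketch {

        //very naive lemmatizer stand-in: a real implementation would use the
        //lemmas produced by the NLP chain instead of this plural stripping
        static String lemmatize(String word) {
            String w = word.toLowerCase(Locale.ENGLISH);
            if (w.endsWith("ies")) {
                return w.substring(0, w.length() - 3) + "y"; //companies -> company
            }
            return w.endsWith("s") ? w.substring(0, w.length() - 1) : w;
        }

        //fraction of the noun phrase nouns that also occur (as lemmas) in one of
        //the entity's dbpedia category labels; in the real heuristic only the
        //noun tokens of each category (found via POS tagging) would be kept
        static double matchConfidence(List<String> nounPhraseNouns, List<String> categoryLabels) {
            Set<String> categoryLemmas = new HashSet<String>();
            for (String label : categoryLabels) {
                for (String token : label.split("[_\\s]+")) {
                    categoryLemmas.add(lemmatize(token));
                }
            }
            int matches = 0;
            for (String noun : nounPhraseNouns) {
                if (categoryLemmas.contains(lemmatize(noun))) {
                    matches++;
                }
            }
            return nounPhraseNouns.isEmpty() ? 0.0 : (double) matches / nounPhraseNouns.size();
        }

        public static void main(String[] args) {
            //"The software company" -> nouns: software, company
            List<String> nouns = Arrays.asList("software", "company");
            List<String> categories = Arrays.asList(
                    "Software_companies_of_the_United_States",
                    "Companies_based_in_Redmond,_Washington");
            System.out.println(matchConfidence(nouns, categories)); //prints 1.0
        }
    }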

On Sun, Mar 9, 2014 at 6:07 PM, Cristian Petroaca
<cr...@gmail.com> wrote:
> I started to implement the engine and I'm having problems with getting
> results for noun phrases. I modified the "default" weighted chain to also
> include the PosChunkerEngine and ran a sample text : "Angela Merkel visited
> China. The German chancellor met with various people". I expected that the
> RDF XML output would contain some info about the noun phrases but I cannot
> see any.
> Could you point me to the correct way to generate the noun phrases?
>
> Thanks,
> Cristian
>
>
> 2014-02-09 14:15 GMT+02:00 Cristian Petroaca <cr...@gmail.com>:
>
>> Opened https://issues.apache.org/jira/browse/STANBOL-1279
>>
>>
>> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <cr...@gmail.com>
>> :
>>
>> Hi Rupert,
>>>
>>> The "spatial" dimension is a good idea. I'll also take a look at Yago.
>>>
>>> I will create a Jira with what we talked about here. It will probably
>>> have just a draft-like description for now and will be updated as I go
>>> along.
>>>
>>> Thanks,
>>> Cristian
>>>
>>>
>>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <
>>> rupert.westenthaler@gmail.com>:
>>>
>>> Hi Cristian,
>>>>
>>>> definitely an interesting approach. You should have a look at Yago2
>>>> [1]. As far as I can remember the Yago taxonomy is much better
>>>> structured as the one used by dbpedia. Mapping suggestions of dbpedia
>>>> to concepts in Yago2 is easy as both dbpedia and yago2 do provide
>>>> mappings [2] and [3]
>>>>
>>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>>>> >>
>>>> >> "Microsoft posted its 2013 earnings. The Redmond's company made a
>>>> >> huge profit".
>>>>
>>>> Thats actually a very good example. Spatial contexts are very
>>>> important as they tend to be often used for referencing. So I would
>>>> suggest to specially treat the spatial context. For spatial Entities
>>>> (like a City) this is easy, but even for other (like a Person,
>>>> Company) you could use relations to spatial entities define their
>>>> spatial context. This context could than be used to correctly link
>>>> "The Redmond's company" to "Microsoft".
>>>>
>>>> In addition I would suggest to use the "spatial" context of each
>>>> entity (basically relation to entities that are cities, regions,
>>>> countries) as a separate dimension, because those are very often used
>>>> for coreferences.
>>>>
>>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
>>>> [2] http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
>>>> [3]
>>>> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
>>>>
>>>>
>>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca
>>>> <cr...@gmail.com> wrote:
>>>> > There are several dbpedia categories for each entity, in this case for
>>>> > Microsoft we have :
>>>> >
>>>> > category:Companies_in_the_NASDAQ-100_Index
>>>> > category:Microsoft
>>>> > category:Software_companies_of_the_United_States
>>>> > category:Software_companies_based_in_Washington_(state)
>>>> > category:Companies_established_in_1975
>>>> > category:1975_establishments_in_the_United_States
>>>> > category:Companies_based_in_Redmond,_Washington
>>>> > category:Multinational_companies_headquartered_in_the_United_States
>>>> > category:Cloud_computing_providers
>>>> > category:Companies_in_the_Dow_Jones_Industrial_Average
>>>> >
>>>> > So we also have "Companies based in Redmont,Washington" which could be
>>>> > matched.
>>>> >
>>>> >
>>>> > There is still other contextual information from dbpedia which can be
>>>> used.
>>>> > For example for an Organization we could also include :
>>>> > dbpprop:industry = Software
>>>> > dbpprop:service = Online Service Providers
>>>> >
>>>> > and for a Person (that's for Barack Obama) :
>>>> >
>>>> > dbpedia-owl:profession:
>>>> >                                dbpedia:Author
>>>> >                                dbpedia:Constitutional_law
>>>> >                                dbpedia:Lawyer
>>>> >                                dbpedia:Community_organizing
>>>> >
>>>> > I'd like to continue investigating this as I think that it may have
>>>> some
>>>> > value in increasing the number of coreference resolutions and I'd like
>>>> to
>>>> > concentrate more on precision rather than recall since we already have
>>>> a
>>>> > set of coreferences detected by the stanford nlp tool and this would
>>>> be as
>>>> > an addition to that (at least this is how I would like to use it).
>>>> >
>>>> > Is it ok if I track this by opening a jira? I could update it to show
>>>> my
>>>> > progress and also my conclusions and if it turns out that it was a bad
>>>> idea
>>>> > then that's the situation at least I'll end up with more knowledge
>>>> about
>>>> > Stanbol in the end :).
>>>> >
>>>> >
>>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>>>> >
>>>> >> Hi Cristian,
>>>> >>
>>>> >> The approach sounds nice. I don't want to be the devil's advocate but
>>>> I'm
>>>> >> just not sure about the recall using the dbpedia categories feature.
>>>> For
>>>> >> example, your sentence could be also "Microsoft posted its 2013
>>>> earnings.
>>>> >> The Redmond's company made a huge profit". So, maybe including more
>>>> >> contextual information from dbpedia could increase the recall but of
>>>> course
>>>> >> will reduce the precision.
>>>> >>
>>>> >> Cheers,
>>>> >> Rafa
>>>> >>
>>>> >> El 04/02/14 09:50, Cristian Petroaca escribió:
>>>> >>
>>>> >>  Back with a more detailed description of the steps for making this
>>>> kind of
>>>> >>> coreference work.
>>>> >>>
>>>> >>> I will be using references to the following text in the steps below
>>>> in
>>>> >>> order to make things clearer : "Microsoft posted its 2013 earnings.
>>>> The
>>>> >>> software company made a huge profit."
>>>> >>>
>>>> >>> 1. For every noun phrase in the text which has :
>>>> >>>      a. a determinate pos which implies reference to an entity local
>>>> to
>>>> >>> the
>>>> >>> text, such as "the, this, these") but not "another, every", etc which
>>>> >>> implies a reference to an entity outside of the text.
>>>> >>>      b. having at least another noun aside from the main required
>>>> noun
>>>> >>> which
>>>> >>> further describes it. For example I will not count "The company" as
>>>> being
>>>> >>> a
>>>> >>> legitimate candidate since this could create a lot of false
>>>> positives by
>>>> >>> considering the double meaning of some words such as "in the company
>>>> of
>>>> >>> good people".
>>>> >>> "The software company" is a good candidate since we also have
>>>> "software".
>>>> >>>
>>>> >>> 2. match the nouns in the noun phrase to the contents of the dbpedia
>>>> >>> categories of each named entity found prior to the location of the
>>>> noun
>>>> >>> phrase in the text.
>>>> >>> The dbpedia categories are in the following format (for Microsoft for
>>>> >>> example) : "Software companies of the United States".
>>>> >>>   So we try to match "software company" with that.
>>>> >>> First, as you can see, the main noun in the dbpedia category has a
>>>> plural
>>>> >>> form and it's the same for all categories which I saw. I don't know
>>>> if
>>>> >>> there's an easier way to do this but I thought of applying a
>>>> lemmatizer on
>>>> >>> the category and the noun phrase in order for them to have a common
>>>> >>> denominator.This also works if the noun phrase itself has a plural
>>>> form.
>>>> >>>
>>>> >>> Second, I'll need to use for comparison only the words in the
>>>> category
>>>> >>> which are themselves nouns and not prepositions or determiners such
>>>> as "of
>>>> >>> the".This means that I need to pos tag the categories contents as
>>>> well.
>>>> >>> I was thinking of running the pos and lemma on the dbpedia
>>>> categories when
>>>> >>> building the dbpedia backed entity hub and storing them for later
>>>> use - I
>>>> >>> don't know how feasible this is at the moment.
>>>> >>>
>>>> >>> After this I can compare each noun in the noun phrase with the
>>>> equivalent
>>>> >>> nouns in the categories and based on the number of matches I can
>>>> create a
>>>> >>> confidence level.
>>>> >>>
>>>> >>> 3. match the noun of the noun phrase with the rdf:type from dbpedia
>>>> of the
>>>> >>> named entity. If this matches increase the confidence level.
>>>> >>>
>>>> >>> 4. If there are multiple named entities which can match a certain
>>>> noun
>>>> >>> phrase then link the noun phrase with the closest named entity prior
>>>> to it
>>>> >>> in the text.
>>>> >>>
>>>> >>> What do you think?
>>>> >>>
>>>> >>> Cristian
>>>> >>>
>>>> >>> 2014-01-31 Cristian Petroaca <cr...@gmail.com>:
>>>> >>>
>>>> >>>  Hi Rafa,
>>>> >>>>
>>>> >>>> I don't yet have a concrete heursitic but I'm working on it. I'll
>>>> provide
>>>> >>>> it here so that you guys can give me a feedback on it.
>>>> >>>>
>>>> >>>> What are "locality" features?
>>>> >>>>
>>>> >>>> I looked at Bart and other coref tools such as ArkRef and
>>>> CherryPicker
>>>> >>>> and
>>>> >>>> they don't provide such a coreference.
>>>> >>>>
>>>> >>>> Cristian
>>>> >>>>
>>>> >>>>
>>>> >>>> 2014-01-30 Rafa Haro <rh...@apache.org>:
>>>> >>>>
>>>> >>>> Hi Cristian,
>>>> >>>>
>>>> >>>>> Without having more details about your concrete heuristic, in my
>>>> honest
>>>> >>>>> opinion, such approach could produce a lot of false positives. I
>>>> don't
>>>> >>>>> know
>>>> >>>>> if you are planning to use some "locality" features to detect such
>>>> >>>>> coreferences but you need to take into account that it is quite
>>>> usual
>>>> >>>>> that
>>>> >>>>> coreferenced mentions can occurs even in different paragraphs.
>>>> Although
>>>> >>>>> I'm
>>>> >>>>> not an expert in Natural Language Understanding, I would say it is
>>>> quite
>>>> >>>>> difficult to get decent precision/recall rates for coreferencing
>>>> using
>>>> >>>>> fixed rules. Maybe you can give a try to others tools like BART (
>>>> >>>>> http://www.bart-coref.org/).
>>>> >>>>>
>>>> >>>>> Cheers,
>>>> >>>>> Rafa Haro
>>>> >>>>>
>>>> >>>>> El 30/01/14 10:33, Cristian Petroaca escribió:
>>>> >>>>>
>>>> >>>>>   Hi,
>>>> >>>>>
>>>> >>>>>> One of the necessary steps for implementing the Event extraction
>>>> Engine
>>>> >>>>>> feature : https://issues.apache.org/jira/browse/STANBOL-1121 is
>>>> to
>>>> >>>>>> have
>>>> >>>>>> coreference resolution in the given text. This is provided now
>>>> via the
>>>> >>>>>> stanford-nlp project but as far as I saw this module is performing
>>>> >>>>>> mostly
>>>> >>>>>> pronomial (He, She) or nominal (Barack Obama and Mr. Obama)
>>>> coreference
>>>> >>>>>> resolution.
>>>> >>>>>>
>>>> >>>>>> In order to get more coreferences from the text I though of
>>>> creating
>>>> >>>>>> some
>>>> >>>>>> logic that would detect this kind of coreference :
>>>> >>>>>> "Apple reaches new profit heights. The software company just
>>>> announced
>>>> >>>>>> its
>>>> >>>>>> 2013 earnings."
>>>> >>>>>> Here "The software company" obviously refers to "Apple".
>>>> >>>>>> So I'd like to detect coreferences of Named Entities which are of
>>>> the
>>>> >>>>>> rdf:type of the Named Entity , in this case "company" and also
>>>> have
>>>> >>>>>> attributes which can be found in the dbpedia categories of the
>>>> named
>>>> >>>>>> entity, in this case "software".
>>>> >>>>>>
>>>> >>>>>> The detection of coreferences such as "The software company" in
>>>> the
>>>> >>>>>> text
>>>> >>>>>> would also be done by either using the new Pos Tag Based Phrase
>>>> >>>>>> extraction
>>>> >>>>>> Engine (noun phrases) or by using a dependency tree of the
>>>> sentence and
>>>> >>>>>> picking up only subjects or objects.
>>>> >>>>>>
>>>> >>>>>> At this point I'd like to know if this kind of logic would be
>>>> useful
>>>> >>>>>> as a
>>>> >>>>>> separate Enhancement Engine (in case the precision and recall are
>>>> good
>>>> >>>>>> enough) in Stanbol?
>>>> >>>>>>
>>>> >>>>>> Thanks,
>>>> >>>>>> Cristian
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>
>>>>
>>>>
>>>>
>>>> --
>>>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>>>> | Bodenlehenstraße 11                             ++43-699-11108907
>>>> | A-5500 Bischofshofen
>>>>
>>>
>>>
>>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Cristian Petroaca <cr...@gmail.com>.
I started to implement the engine and I'm having problems with getting
results for noun phrases. I modified the "default" weighted chain to also
include the PosChunkerEngine and ran a sample text : "Angela Merkel visited
China. The German chancellor met with various people". I expected that the
RDF XML output would contain some info about the noun phrases but I cannot
see any.
Could you point me to the correct way to generate the noun phrases?

Thanks,
Cristian


2014-02-09 14:15 GMT+02:00 Cristian Petroaca <cr...@gmail.com>:

> Opened https://issues.apache.org/jira/browse/STANBOL-1279
>
>
> 2014-02-07 10:53 GMT+02:00 Cristian Petroaca <cr...@gmail.com>
> :
>
> Hi Rupert,
>>
>> The "spatial" dimension is a good idea. I'll also take a look at Yago.
>>
>> I will create a Jira with what we talked about here. It will probably
>> have just a draft-like description for now and will be updated as I go
>> along.
>>
>> Thanks,
>> Cristian
>>
>>
>> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <
>> rupert.westenthaler@gmail.com>:
>>
>> Hi Cristian,
>>>
>>> definitely an interesting approach. You should have a look at Yago2
>>> [1]. As far as I can remember the Yago taxonomy is much better
>>> structured as the one used by dbpedia. Mapping suggestions of dbpedia
>>> to concepts in Yago2 is easy as both dbpedia and yago2 do provide
>>> mappings [2] and [3]
>>>
>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>>> >>
>>> >> "Microsoft posted its 2013 earnings. The Redmond's company made a
>>> >> huge profit".
>>>
>>> Thats actually a very good example. Spatial contexts are very
>>> important as they tend to be often used for referencing. So I would
>>> suggest to specially treat the spatial context. For spatial Entities
>>> (like a City) this is easy, but even for other (like a Person,
>>> Company) you could use relations to spatial entities define their
>>> spatial context. This context could than be used to correctly link
>>> "The Redmond's company" to "Microsoft".
>>>
>>> In addition I would suggest to use the "spatial" context of each
>>> entity (basically relation to entities that are cities, regions,
>>> countries) as a separate dimension, because those are very often used
>>> for coreferences.
>>>
>>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
>>> [2] http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
>>> [3]
>>> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
>>>
>>>
>>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca
>>> <cr...@gmail.com> wrote:
>>> > There are several dbpedia categories for each entity, in this case for
>>> > Microsoft we have :
>>> >
>>> > category:Companies_in_the_NASDAQ-100_Index
>>> > category:Microsoft
>>> > category:Software_companies_of_the_United_States
>>> > category:Software_companies_based_in_Washington_(state)
>>> > category:Companies_established_in_1975
>>> > category:1975_establishments_in_the_United_States
>>> > category:Companies_based_in_Redmond,_Washington
>>> > category:Multinational_companies_headquartered_in_the_United_States
>>> > category:Cloud_computing_providers
>>> > category:Companies_in_the_Dow_Jones_Industrial_Average
>>> >
>>> > So we also have "Companies based in Redmont,Washington" which could be
>>> > matched.
>>> >
>>> >
>>> > There is still other contextual information from dbpedia which can be
>>> used.
>>> > For example for an Organization we could also include :
>>> > dbpprop:industry = Software
>>> > dbpprop:service = Online Service Providers
>>> >
>>> > and for a Person (that's for Barack Obama) :
>>> >
>>> > dbpedia-owl:profession:
>>> >                                dbpedia:Author
>>> >                                dbpedia:Constitutional_law
>>> >                                dbpedia:Lawyer
>>> >                                dbpedia:Community_organizing
>>> >
>>> > I'd like to continue investigating this as I think that it may have
>>> some
>>> > value in increasing the number of coreference resolutions and I'd like
>>> to
>>> > concentrate more on precision rather than recall since we already have
>>> a
>>> > set of coreferences detected by the stanford nlp tool and this would
>>> be as
>>> > an addition to that (at least this is how I would like to use it).
>>> >
>>> > Is it ok if I track this by opening a jira? I could update it to show
>>> my
>>> > progress and also my conclusions and if it turns out that it was a bad
>>> idea
>>> > then that's the situation at least I'll end up with more knowledge
>>> about
>>> > Stanbol in the end :).
>>> >
>>> >
>>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>>> >
>>> >> Hi Cristian,
>>> >>
>>> >> The approach sounds nice. I don't want to be the devil's advocate but
>>> I'm
>>> >> just not sure about the recall using the dbpedia categories feature.
>>> For
>>> >> example, your sentence could be also "Microsoft posted its 2013
>>> earnings.
>>> >> The Redmond's company made a huge profit". So, maybe including more
>>> >> contextual information from dbpedia could increase the recall but of
>>> course
>>> >> will reduce the precision.
>>> >>
>>> >> Cheers,
>>> >> Rafa
>>> >>
>>> >> El 04/02/14 09:50, Cristian Petroaca escribió:
>>> >>
>>> >>  Back with a more detailed description of the steps for making this
>>> kind of
>>> >>> coreference work.
>>> >>>
>>> >>> I will be using references to the following text in the steps below
>>> in
>>> >>> order to make things clearer : "Microsoft posted its 2013 earnings.
>>> The
>>> >>> software company made a huge profit."
>>> >>>
>>> >>> 1. For every noun phrase in the text which has :
>>> >>>      a. a determinate pos which implies reference to an entity local
>>> to
>>> >>> the
>>> >>> text, such as "the, this, these") but not "another, every", etc which
>>> >>> implies a reference to an entity outside of the text.
>>> >>>      b. having at least another noun aside from the main required
>>> noun
>>> >>> which
>>> >>> further describes it. For example I will not count "The company" as
>>> being
>>> >>> a
>>> >>> legitimate candidate since this could create a lot of false
>>> positives by
>>> >>> considering the double meaning of some words such as "in the company
>>> of
>>> >>> good people".
>>> >>> "The software company" is a good candidate since we also have
>>> "software".
>>> >>>
>>> >>> 2. match the nouns in the noun phrase to the contents of the dbpedia
>>> >>> categories of each named entity found prior to the location of the
>>> noun
>>> >>> phrase in the text.
>>> >>> The dbpedia categories are in the following format (for Microsoft for
>>> >>> example) : "Software companies of the United States".
>>> >>>   So we try to match "software company" with that.
>>> >>> First, as you can see, the main noun in the dbpedia category has a
>>> plural
>>> >>> form and it's the same for all categories which I saw. I don't know
>>> if
>>> >>> there's an easier way to do this but I thought of applying a
>>> lemmatizer on
>>> >>> the category and the noun phrase in order for them to have a common
>>> >>> denominator.This also works if the noun phrase itself has a plural
>>> form.
>>> >>>
>>> >>> Second, I'll need to use for comparison only the words in the
>>> category
>>> >>> which are themselves nouns and not prepositions or determiners such
>>> as "of
>>> >>> the".This means that I need to pos tag the categories contents as
>>> well.
>>> >>> I was thinking of running the pos and lemma on the dbpedia
>>> categories when
>>> >>> building the dbpedia backed entity hub and storing them for later
>>> use - I
>>> >>> don't know how feasible this is at the moment.
>>> >>>
>>> >>> After this I can compare each noun in the noun phrase with the
>>> equivalent
>>> >>> nouns in the categories and based on the number of matches I can
>>> create a
>>> >>> confidence level.
>>> >>>
>>> >>> 3. match the noun of the noun phrase with the rdf:type from dbpedia
>>> of the
>>> >>> named entity. If this matches increase the confidence level.
>>> >>>
>>> >>> 4. If there are multiple named entities which can match a certain
>>> noun
>>> >>> phrase then link the noun phrase with the closest named entity prior
>>> to it
>>> >>> in the text.
>>> >>>
>>> >>> What do you think?
>>> >>>
>>> >>> Cristian
>>> >>>
>>> >>> 2014-01-31 Cristian Petroaca <cr...@gmail.com>:
>>> >>>
>>> >>>  Hi Rafa,
>>> >>>>
>>> >>>> I don't yet have a concrete heursitic but I'm working on it. I'll
>>> provide
>>> >>>> it here so that you guys can give me a feedback on it.
>>> >>>>
>>> >>>> What are "locality" features?
>>> >>>>
>>> >>>> I looked at Bart and other coref tools such as ArkRef and
>>> CherryPicker
>>> >>>> and
>>> >>>> they don't provide such a coreference.
>>> >>>>
>>> >>>> Cristian
>>> >>>>
>>> >>>>
>>> >>>> 2014-01-30 Rafa Haro <rh...@apache.org>:
>>> >>>>
>>> >>>> Hi Cristian,
>>> >>>>
>>> >>>>> Without having more details about your concrete heuristic, in my
>>> honest
>>> >>>>> opinion, such approach could produce a lot of false positives. I
>>> don't
>>> >>>>> know
>>> >>>>> if you are planning to use some "locality" features to detect such
>>> >>>>> coreferences but you need to take into account that it is quite
>>> usual
>>> >>>>> that
>>> >>>>> coreferenced mentions can occurs even in different paragraphs.
>>> Although
>>> >>>>> I'm
>>> >>>>> not an expert in Natural Language Understanding, I would say it is
>>> quite
>>> >>>>> difficult to get decent precision/recall rates for coreferencing
>>> using
>>> >>>>> fixed rules. Maybe you can give a try to others tools like BART (
>>> >>>>> http://www.bart-coref.org/).
>>> >>>>>
>>> >>>>> Cheers,
>>> >>>>> Rafa Haro
>>> >>>>>
>>> >>>>> El 30/01/14 10:33, Cristian Petroaca escribió:
>>> >>>>>
>>> >>>>>   Hi,
>>> >>>>>
>>> >>>>>> One of the necessary steps for implementing the Event extraction
>>> Engine
>>> >>>>>> feature : https://issues.apache.org/jira/browse/STANBOL-1121 is
>>> to
>>> >>>>>> have
>>> >>>>>> coreference resolution in the given text. This is provided now
>>> via the
>>> >>>>>> stanford-nlp project but as far as I saw this module is performing
>>> >>>>>> mostly
>>> >>>>>> pronomial (He, She) or nominal (Barack Obama and Mr. Obama)
>>> coreference
>>> >>>>>> resolution.
>>> >>>>>>
>>> >>>>>> In order to get more coreferences from the text I though of
>>> creating
>>> >>>>>> some
>>> >>>>>> logic that would detect this kind of coreference :
>>> >>>>>> "Apple reaches new profit heights. The software company just
>>> announced
>>> >>>>>> its
>>> >>>>>> 2013 earnings."
>>> >>>>>> Here "The software company" obviously refers to "Apple".
>>> >>>>>> So I'd like to detect coreferences of Named Entities which are of
>>> the
>>> >>>>>> rdf:type of the Named Entity , in this case "company" and also
>>> have
>>> >>>>>> attributes which can be found in the dbpedia categories of the
>>> named
>>> >>>>>> entity, in this case "software".
>>> >>>>>>
>>> >>>>>> The detection of coreferences such as "The software company" in
>>> the
>>> >>>>>> text
>>> >>>>>> would also be done by either using the new Pos Tag Based Phrase
>>> >>>>>> extraction
>>> >>>>>> Engine (noun phrases) or by using a dependency tree of the
>>> sentence and
>>> >>>>>> picking up only subjects or objects.
>>> >>>>>>
>>> >>>>>> At this point I'd like to know if this kind of logic would be
>>> useful
>>> >>>>>> as a
>>> >>>>>> separate Enhancement Engine (in case the precision and recall are
>>> good
>>> >>>>>> enough) in Stanbol?
>>> >>>>>>
>>> >>>>>> Thanks,
>>> >>>>>> Cristian
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>
>>>
>>>
>>>
>>> --
>>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>>> | Bodenlehenstraße 11                             ++43-699-11108907
>>> | A-5500 Bischofshofen
>>>
>>
>>
>

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Cristian Petroaca <cr...@gmail.com>.
Opened https://issues.apache.org/jira/browse/STANBOL-1279


2014-02-07 10:53 GMT+02:00 Cristian Petroaca <cr...@gmail.com>:

> Hi Rupert,
>
> The "spatial" dimension is a good idea. I'll also take a look at Yago.
>
> I will create a Jira with what we talked about here. It will probably have
> just a draft-like description for now and will be updated as I go along.
>
> Thanks,
> Cristian
>
>
> 2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <
> rupert.westenthaler@gmail.com>:
>
> Hi Cristian,
>>
>> definitely an interesting approach. You should have a look at Yago2
>> [1]. As far as I can remember the Yago taxonomy is much better
>> structured as the one used by dbpedia. Mapping suggestions of dbpedia
>> to concepts in Yago2 is easy as both dbpedia and yago2 do provide
>> mappings [2] and [3]
>>
>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>> >>
>> >> "Microsoft posted its 2013 earnings. The Redmond's company made a
>> >> huge profit".
>>
>> Thats actually a very good example. Spatial contexts are very
>> important as they tend to be often used for referencing. So I would
>> suggest to specially treat the spatial context. For spatial Entities
>> (like a City) this is easy, but even for other (like a Person,
>> Company) you could use relations to spatial entities define their
>> spatial context. This context could than be used to correctly link
>> "The Redmond's company" to "Microsoft".
>>
>> In addition I would suggest to use the "spatial" context of each
>> entity (basically relation to entities that are cities, regions,
>> countries) as a separate dimension, because those are very often used
>> for coreferences.
>>
>> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
>> [2] http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
>> [3]
>> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
>>
>>
>> On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca
>> <cr...@gmail.com> wrote:
>> > There are several dbpedia categories for each entity, in this case for
>> > Microsoft we have :
>> >
>> > category:Companies_in_the_NASDAQ-100_Index
>> > category:Microsoft
>> > category:Software_companies_of_the_United_States
>> > category:Software_companies_based_in_Washington_(state)
>> > category:Companies_established_in_1975
>> > category:1975_establishments_in_the_United_States
>> > category:Companies_based_in_Redmond,_Washington
>> > category:Multinational_companies_headquartered_in_the_United_States
>> > category:Cloud_computing_providers
>> > category:Companies_in_the_Dow_Jones_Industrial_Average
>> >
>> > So we also have "Companies based in Redmont,Washington" which could be
>> > matched.
>> >
>> >
>> > There is still other contextual information from dbpedia which can be
>> used.
>> > For example for an Organization we could also include :
>> > dbpprop:industry = Software
>> > dbpprop:service = Online Service Providers
>> >
>> > and for a Person (that's for Barack Obama) :
>> >
>> > dbpedia-owl:profession:
>> >                                dbpedia:Author
>> >                                dbpedia:Constitutional_law
>> >                                dbpedia:Lawyer
>> >                                dbpedia:Community_organizing
>> >
>> > I'd like to continue investigating this as I think that it may have some
>> > value in increasing the number of coreference resolutions and I'd like
>> to
>> > concentrate more on precision rather than recall since we already have a
>> > set of coreferences detected by the stanford nlp tool and this would be
>> as
>> > an addition to that (at least this is how I would like to use it).
>> >
>> > Is it ok if I track this by opening a jira? I could update it to show my
>> > progress and also my conclusions and if it turns out that it was a bad
>> idea
>> > then that's the situation at least I'll end up with more knowledge about
>> > Stanbol in the end :).
>> >
>> >
>> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>> >
>> >> Hi Cristian,
>> >>
>> >> The approach sounds nice. I don't want to be the devil's advocate but
>> I'm
>> >> just not sure about the recall using the dbpedia categories feature.
>> For
>> >> example, your sentence could be also "Microsoft posted its 2013
>> earnings.
>> >> The Redmond's company made a huge profit". So, maybe including more
>> >> contextual information from dbpedia could increase the recall but of
>> course
>> >> will reduce the precision.
>> >>
>> >> Cheers,
>> >> Rafa
>> >>
>> >> El 04/02/14 09:50, Cristian Petroaca escribió:
>> >>
>> >>  Back with a more detailed description of the steps for making this
>> kind of
>> >>> coreference work.
>> >>>
>> >>> I will be using references to the following text in the steps below in
>> >>> order to make things clearer : "Microsoft posted its 2013 earnings.
>> The
>> >>> software company made a huge profit."
>> >>>
>> >>> 1. For every noun phrase in the text which has :
>> >>>      a. a determinate pos which implies reference to an entity local
>> to
>> >>> the
>> >>> text, such as "the, this, these") but not "another, every", etc which
>> >>> implies a reference to an entity outside of the text.
>> >>>      b. having at least another noun aside from the main required noun
>> >>> which
>> >>> further describes it. For example I will not count "The company" as
>> being
>> >>> a
>> >>> legitimate candidate since this could create a lot of false positives
>> by
>> >>> considering the double meaning of some words such as "in the company
>> of
>> >>> good people".
>> >>> "The software company" is a good candidate since we also have
>> "software".
>> >>>
>> >>> 2. match the nouns in the noun phrase to the contents of the dbpedia
>> >>> categories of each named entity found prior to the location of the
>> noun
>> >>> phrase in the text.
>> >>> The dbpedia categories are in the following format (for Microsoft for
>> >>> example) : "Software companies of the United States".
>> >>>   So we try to match "software company" with that.
>> >>> First, as you can see, the main noun in the dbpedia category has a
>> plural
>> >>> form and it's the same for all categories which I saw. I don't know if
>> >>> there's an easier way to do this but I thought of applying a
>> lemmatizer on
>> >>> the category and the noun phrase in order for them to have a common
>> >>> denominator.This also works if the noun phrase itself has a plural
>> form.
>> >>>
>> >>> Second, I'll need to use for comparison only the words in the category
>> >>> which are themselves nouns and not prepositions or determiners such
>> as "of
>> >>> the".This means that I need to pos tag the categories contents as
>> well.
>> >>> I was thinking of running the pos and lemma on the dbpedia categories
>> when
>> >>> building the dbpedia backed entity hub and storing them for later use
>> - I
>> >>> don't know how feasible this is at the moment.
>> >>>
>> >>> After this I can compare each noun in the noun phrase with the
>> equivalent
>> >>> nouns in the categories and based on the number of matches I can
>> create a
>> >>> confidence level.
>> >>>
>> >>> 3. match the noun of the noun phrase with the rdf:type from dbpedia
>> of the
>> >>> named entity. If this matches increase the confidence level.
>> >>>
>> >>> 4. If there are multiple named entities which can match a certain noun
>> >>> phrase then link the noun phrase with the closest named entity prior
>> to it
>> >>> in the text.
>> >>>
>> >>> What do you think?
>> >>>
>> >>> Cristian
>> >>>
>> >>> 2014-01-31 Cristian Petroaca <cr...@gmail.com>:
>> >>>
>> >>>  Hi Rafa,
>> >>>>
>> >>>> I don't yet have a concrete heursitic but I'm working on it. I'll
>> provide
>> >>>> it here so that you guys can give me a feedback on it.
>> >>>>
>> >>>> What are "locality" features?
>> >>>>
>> >>>> I looked at Bart and other coref tools such as ArkRef and
>> CherryPicker
>> >>>> and
>> >>>> they don't provide such a coreference.
>> >>>>
>> >>>> Cristian
>> >>>>
>> >>>>
>> >>>> 2014-01-30 Rafa Haro <rh...@apache.org>:
>> >>>>
>> >>>> Hi Cristian,
>> >>>>
>> >>>>> Without having more details about your concrete heuristic, in my
>> honest
>> >>>>> opinion, such approach could produce a lot of false positives. I
>> don't
>> >>>>> know
>> >>>>> if you are planning to use some "locality" features to detect such
>> >>>>> coreferences but you need to take into account that it is quite
>> usual
>> >>>>> that
>> >>>>> coreferenced mentions can occurs even in different paragraphs.
>> Although
>> >>>>> I'm
>> >>>>> not an expert in Natural Language Understanding, I would say it is
>> quite
>> >>>>> difficult to get decent precision/recall rates for coreferencing
>> using
>> >>>>> fixed rules. Maybe you can give a try to others tools like BART (
>> >>>>> http://www.bart-coref.org/).
>> >>>>>
>> >>>>> Cheers,
>> >>>>> Rafa Haro
>> >>>>>
>> >>>>> El 30/01/14 10:33, Cristian Petroaca escribió:
>> >>>>>
>> >>>>>   Hi,
>> >>>>>
>> >>>>>> One of the necessary steps for implementing the Event extraction
>> Engine
>> >>>>>> feature : https://issues.apache.org/jira/browse/STANBOL-1121 is to
>> >>>>>> have
>> >>>>>> coreference resolution in the given text. This is provided now via
>> the
>> >>>>>> stanford-nlp project but as far as I saw this module is performing
>> >>>>>> mostly
>> >>>>>> pronomial (He, She) or nominal (Barack Obama and Mr. Obama)
>> coreference
>> >>>>>> resolution.
>> >>>>>>
>> >>>>>> In order to get more coreferences from the text I though of
>> creating
>> >>>>>> some
>> >>>>>> logic that would detect this kind of coreference :
>> >>>>>> "Apple reaches new profit heights. The software company just
>> announced
>> >>>>>> its
>> >>>>>> 2013 earnings."
>> >>>>>> Here "The software company" obviously refers to "Apple".
>> >>>>>> So I'd like to detect coreferences of Named Entities which are of
>> the
>> >>>>>> rdf:type of the Named Entity , in this case "company" and also have
>> >>>>>> attributes which can be found in the dbpedia categories of the
>> named
>> >>>>>> entity, in this case "software".
>> >>>>>>
>> >>>>>> The detection of coreferences such as "The software company" in the
>> >>>>>> text
>> >>>>>> would also be done by either using the new Pos Tag Based Phrase
>> >>>>>> extraction
>> >>>>>> Engine (noun phrases) or by using a dependency tree of the
>> sentence and
>> >>>>>> picking up only subjects or objects.
>> >>>>>>
>> >>>>>> At this point I'd like to know if this kind of logic would be
>> useful
>> >>>>>> as a
>> >>>>>> separate Enhancement Engine (in case the precision and recall are
>> good
>> >>>>>> enough) in Stanbol?
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> Cristian
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>
>>
>>
>>
>> --
>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>
>
>

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Cristian Petroaca <cr...@gmail.com>.
Hi Rupert,

The "spatial" dimension is a good idea. I'll also take a look at Yago.

I will create a Jira with what we talked about here. It will probably have
just a draft-like description for now and will be updated as I go along.

Thanks,
Cristian


2014-02-06 15:39 GMT+02:00 Rupert Westenthaler <
rupert.westenthaler@gmail.com>:

> Hi Cristian,
>
> definitely an interesting approach. You should have a look at Yago2
> [1]. As far as I can remember the Yago taxonomy is much better
> structured than the one used by dbpedia. Mapping suggestions of dbpedia
> to concepts in Yago2 is easy as both dbpedia and yago2 do provide
> mappings [2] and [3]
>
> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
> >>
> >> "Microsoft posted its 2013 earnings. The Redmond's company made a
> >> huge profit".
>
> That's actually a very good example. Spatial contexts are very
> important as they tend to be often used for referencing. So I would
> suggest to specially treat the spatial context. For spatial Entities
> (like a City) this is easy, but even for other (like a Person,
> Company) you could use relations to spatial entities to define their
> spatial context. This context could then be used to correctly link
> "The Redmond's company" to "Microsoft".
>
> In addition I would suggest to use the "spatial" context of each
> entity (basically relation to entities that are cities, regions,
> countries) as a separate dimension, because those are very often used
> for coreferences.
>
> [1] http://www.mpi-inf.mpg.de/yago-naga/yago/
> [2] http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
> [3]
> http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
>
>
> On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca
> <cr...@gmail.com> wrote:
> > There are several dbpedia categories for each entity, in this case for
> > Microsoft we have :
> >
> > category:Companies_in_the_NASDAQ-100_Index
> > category:Microsoft
> > category:Software_companies_of_the_United_States
> > category:Software_companies_based_in_Washington_(state)
> > category:Companies_established_in_1975
> > category:1975_establishments_in_the_United_States
> > category:Companies_based_in_Redmond,_Washington
> > category:Multinational_companies_headquartered_in_the_United_States
> > category:Cloud_computing_providers
> > category:Companies_in_the_Dow_Jones_Industrial_Average
> >
> > So we also have "Companies based in Redmont,Washington" which could be
> > matched.
> >
> >
> > There is still other contextual information from dbpedia which can be
> used.
> > For example for an Organization we could also include :
> > dbpprop:industry = Software
> > dbpprop:service = Online Service Providers
> >
> > and for a Person (that's for Barack Obama) :
> >
> > dbpedia-owl:profession:
> >                                dbpedia:Author
> >                                dbpedia:Constitutional_law
> >                                dbpedia:Lawyer
> >                                dbpedia:Community_organizing
> >
> > I'd like to continue investigating this as I think that it may have some
> > value in increasing the number of coreference resolutions and I'd like to
> > concentrate more on precision rather than recall since we already have a
> > set of coreferences detected by the stanford nlp tool and this would be
> as
> > an addition to that (at least this is how I would like to use it).
> >
> > Is it ok if I track this by opening a jira? I could update it to show my
> > progress and also my conclusions and if it turns out that it was a bad
> idea
> > then that's the situation at least I'll end up with more knowledge about
> > Stanbol in the end :).
> >
> >
> > 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
> >
> >> Hi Cristian,
> >>
> >> The approach sounds nice. I don't want to be the devil's advocate but
> I'm
> >> just not sure about the recall using the dbpedia categories feature. For
> >> example, your sentence could be also "Microsoft posted its 2013
> earnings.
> >> The Redmond's company made a huge profit". So, maybe including more
> >> contextual information from dbpedia could increase the recall but of
> course
> >> will reduce the precision.
> >>
> >> Cheers,
> >> Rafa
> >>
> >> El 04/02/14 09:50, Cristian Petroaca escribió:
> >>
> >>  Back with a more detailed description of the steps for making this
> kind of
> >>> coreference work.
> >>>
> >>> I will be using references to the following text in the steps below in
> >>> order to make things clearer : "Microsoft posted its 2013 earnings. The
> >>> software company made a huge profit."
> >>>
> >>> 1. For every noun phrase in the text which has :
> >>>      a. a determinate pos which implies reference to an entity local to
> >>> the
> >>> text, such as "the, this, these") but not "another, every", etc which
> >>> implies a reference to an entity outside of the text.
> >>>      b. having at least another noun aside from the main required noun
> >>> which
> >>> further describes it. For example I will not count "The company" as
> being
> >>> a
> >>> legitimate candidate since this could create a lot of false positives
> by
> >>> considering the double meaning of some words such as "in the company of
> >>> good people".
> >>> "The software company" is a good candidate since we also have
> "software".
> >>>
> >>> 2. match the nouns in the noun phrase to the contents of the dbpedia
> >>> categories of each named entity found prior to the location of the noun
> >>> phrase in the text.
> >>> The dbpedia categories are in the following format (for Microsoft for
> >>> example) : "Software companies of the United States".
> >>>   So we try to match "software company" with that.
> >>> First, as you can see, the main noun in the dbpedia category has a
> plural
> >>> form and it's the same for all categories which I saw. I don't know if
> >>> there's an easier way to do this but I thought of applying a
> lemmatizer on
> >>> the category and the noun phrase in order for them to have a common
> >>> denominator.This also works if the noun phrase itself has a plural
> form.
> >>>
> >>> Second, I'll need to use for comparison only the words in the category
> >>> which are themselves nouns and not prepositions or determiners such as
> "of
> >>> the".This means that I need to pos tag the categories contents as well.
> >>> I was thinking of running the pos and lemma on the dbpedia categories
> when
> >>> building the dbpedia backed entity hub and storing them for later use
> - I
> >>> don't know how feasible this is at the moment.
> >>>
> >>> After this I can compare each noun in the noun phrase with the
> equivalent
> >>> nouns in the categories and based on the number of matches I can
> create a
> >>> confidence level.
> >>>
> >>> 3. match the noun of the noun phrase with the rdf:type from dbpedia of
> the
> >>> named entity. If this matches increase the confidence level.
> >>>
> >>> 4. If there are multiple named entities which can match a certain noun
> >>> phrase then link the noun phrase with the closest named entity prior
> to it
> >>> in the text.
> >>>
> >>> What do you think?
> >>>
> >>> Cristian
> >>>
> >>> 2014-01-31 Cristian Petroaca <cr...@gmail.com>:
> >>>
> >>>  Hi Rafa,
> >>>>
> >>>> I don't yet have a concrete heursitic but I'm working on it. I'll
> provide
> >>>> it here so that you guys can give me a feedback on it.
> >>>>
> >>>> What are "locality" features?
> >>>>
> >>>> I looked at Bart and other coref tools such as ArkRef and CherryPicker
> >>>> and
> >>>> they don't provide such a coreference.
> >>>>
> >>>> Cristian
> >>>>
> >>>>
> >>>> 2014-01-30 Rafa Haro <rh...@apache.org>:
> >>>>
> >>>> Hi Cristian,
> >>>>
> >>>>> Without having more details about your concrete heuristic, in my
> honest
> >>>>> opinion, such approach could produce a lot of false positives. I
> don't
> >>>>> know
> >>>>> if you are planning to use some "locality" features to detect such
> >>>>> coreferences but you need to take into account that it is quite usual
> >>>>> that
> >>>>> coreferenced mentions can occurs even in different paragraphs.
> Although
> >>>>> I'm
> >>>>> not an expert in Natural Language Understanding, I would say it is
> quite
> >>>>> difficult to get decent precision/recall rates for coreferencing
> using
> >>>>> fixed rules. Maybe you can give a try to others tools like BART (
> >>>>> http://www.bart-coref.org/).
> >>>>>
> >>>>> Cheers,
> >>>>> Rafa Haro
> >>>>>
> >>>>> El 30/01/14 10:33, Cristian Petroaca escribió:
> >>>>>
> >>>>>   Hi,
> >>>>>
> >>>>>> One of the necessary steps for implementing the Event extraction
> Engine
> >>>>>> feature : https://issues.apache.org/jira/browse/STANBOL-1121 is to
> >>>>>> have
> >>>>>> coreference resolution in the given text. This is provided now via
> the
> >>>>>> stanford-nlp project but as far as I saw this module is performing
> >>>>>> mostly
> >>>>>> pronomial (He, She) or nominal (Barack Obama and Mr. Obama)
> coreference
> >>>>>> resolution.
> >>>>>>
> >>>>>> In order to get more coreferences from the text I though of creating
> >>>>>> some
> >>>>>> logic that would detect this kind of coreference :
> >>>>>> "Apple reaches new profit heights. The software company just
> announced
> >>>>>> its
> >>>>>> 2013 earnings."
> >>>>>> Here "The software company" obviously refers to "Apple".
> >>>>>> So I'd like to detect coreferences of Named Entities which are of
> the
> >>>>>> rdf:type of the Named Entity , in this case "company" and also have
> >>>>>> attributes which can be found in the dbpedia categories of the named
> >>>>>> entity, in this case "software".
> >>>>>>
> >>>>>> The detection of coreferences such as "The software company" in the
> >>>>>> text
> >>>>>> would also be done by either using the new Pos Tag Based Phrase
> >>>>>> extraction
> >>>>>> Engine (noun phrases) or by using a dependency tree of the sentence
> and
> >>>>>> picking up only subjects or objects.
> >>>>>>
> >>>>>> At this point I'd like to know if this kind of logic would be useful
> >>>>>> as a
> >>>>>> separate Enhancement Engine (in case the precision and recall are
> good
> >>>>>> enough) in Stanbol?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Cristian
> >>>>>>
> >>>>>>
> >>>>>>
> >>
>
>
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Cristian,

definitely an interesting approach. You should have a look at Yago2
[1]. As far as I can remember, the Yago taxonomy is much better
structured than the one used by dbpedia. Mapping suggestions of dbpedia
to concepts in Yago2 is easy, as both dbpedia and yago2 provide
mappings [2] and [3].

> 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>>
>> "Microsoft posted its 2013 earnings. The Redmond's company made a
>> huge profit".

That's actually a very good example. Spatial contexts are very
important, as they tend to be used often for referencing, so I would
suggest treating the spatial context specially. For spatial entities
(like a City) this is easy, but even for others (like a Person or a
Company) you could use relations to spatial entities to define their
spatial context. This context could then be used to correctly link
"The Redmond's company" to "Microsoft".

In addition I would suggest using the "spatial" context of each
entity (basically its relations to entities that are cities, regions or
countries) as a separate dimension, because those are very often used
for coreferences.

[1] http://www.mpi-inf.mpg.de/yago-naga/yago/
[2] http://downloads.dbpedia.org/3.9/links/yago_links.nt.bz2
[3] http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoDBpediaInstances.ttl.7z
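
To make that a bit more concrete, here is a small, self-contained sketch of how such a
pre-computed spatial context could be checked against a modifier like "Redmond's". The
chosen properties mentioned in the comments and the hard-coded map are purely
illustrative, not existing Stanbol or dbpedia indexing code:

    import java.util.*;

    public class SpatialContextSketch {

        //spatial context labels per entity, e.g. collected at indexing time from
        //properties such as dbpedia-owl:location or the "based in ..." categories
        //(which properties to use is an open question, this map is just an example)
        static final Map<String, Set<String>> SPATIAL_CONTEXT = new HashMap<String, Set<String>>();
        static {
            SPATIAL_CONTEXT.put("dbpedia:Microsoft",
                    new HashSet<String>(Arrays.asList("redmond", "washington", "united states")));
        }

        //true if one of the noun phrase modifiers names a place from the entity's
        //spatial context, e.g. "Redmond's" in "The Redmond's company"
        static boolean spatialMatch(String entityUri, List<String> nounPhraseModifiers) {
            Set<String> places = SPATIAL_CONTEXT.get(entityUri);
            if (places == null) {
                return false;
            }
            for (String modifier : nounPhraseModifiers) {
                //strip a possessive marker such as "Redmond's" -> "redmond"
                String normalized = modifier.toLowerCase(Locale.ENGLISH).replaceAll("'s$", "");
                if (places.contains(normalized)) {
                    return true;
                }
            }
            return false;
        }

        public static void main(String[] args) {
            System.out.println(spatialMatch("dbpedia:Microsoft", Arrays.asList("Redmond's"))); //true
        }
    }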


On Thu, Feb 6, 2014 at 10:33 AM, Cristian Petroaca
<cr...@gmail.com> wrote:
> There are several dbpedia categories for each entity, in this case for
> Microsoft we have :
>
> category:Companies_in_the_NASDAQ-100_Index
> category:Microsoft
> category:Software_companies_of_the_United_States
> category:Software_companies_based_in_Washington_(state)
> category:Companies_established_in_1975
> category:1975_establishments_in_the_United_States
> category:Companies_based_in_Redmond,_Washington
> category:Multinational_companies_headquartered_in_the_United_States
> category:Cloud_computing_providers
> category:Companies_in_the_Dow_Jones_Industrial_Average
>
> So we also have "Companies based in Redmond, Washington" which could be
> matched.
>
>
> There is still other contextual information from dbpedia which can be used.
> For example for an Organization we could also include :
> dbpprop:industry = Software
> dbpprop:service = Online Service Providers
>
> and for a Person (that's for Barack Obama) :
>
> dbpedia-owl:profession:
>                                dbpedia:Author
>                                dbpedia:Constitutional_law
>                                dbpedia:Lawyer
>                                dbpedia:Community_organizing
>
> I'd like to continue investigating this as I think that it may have some
> value in increasing the number of coreference resolutions and I'd like to
> concentrate more on precision rather than recall since we already have a
> set of coreferences detected by the stanford nlp tool and this would be as
> an addition to that (at least this is how I would like to use it).
>
> Is it ok if I track this by opening a jira? I could update it to show my
> progress and also my conclusions and if it turns out that it was a bad idea
> then that's the situation at least I'll end up with more knowledge about
> Stanbol in the end :).
>
>
> 2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:
>
>> Hi Cristian,
>>
>> The approach sounds nice. I don't want to be the devil's advocate but I'm
>> just not sure about the recall using the dbpedia categories feature. For
>> example, your sentence could be also "Microsoft posted its 2013 earnings.
>> The Redmond's company made a huge profit". So, maybe including more
>> contextual information from dbpedia could increase the recall but of course
>> will reduce the precision.
>>
>> Cheers,
>> Rafa
>>
>> El 04/02/14 09:50, Cristian Petroaca escribió:
>>
>>  Back with a more detailed description of the steps for making this kind of
>>> coreference work.
>>>
>>> I will be using references to the following text in the steps below in
>>> order to make things clearer : "Microsoft posted its 2013 earnings. The
>>> software company made a huge profit."
>>>
>>> 1. For every noun phrase in the text which has :
>>>      a. a determinate pos which implies reference to an entity local to
>>> the
>>> text, such as "the, this, these") but not "another, every", etc which
>>> implies a reference to an entity outside of the text.
>>>      b. having at least another noun aside from the main required noun
>>> which
>>> further describes it. For example I will not count "The company" as being
>>> a
>>> legitimate candidate since this could create a lot of false positives by
>>> considering the double meaning of some words such as "in the company of
>>> good people".
>>> "The software company" is a good candidate since we also have "software".
>>>
>>> 2. match the nouns in the noun phrase to the contents of the dbpedia
>>> categories of each named entity found prior to the location of the noun
>>> phrase in the text.
>>> The dbpedia categories are in the following format (for Microsoft for
>>> example) : "Software companies of the United States".
>>>   So we try to match "software company" with that.
>>> First, as you can see, the main noun in the dbpedia category has a plural
>>> form and it's the same for all categories which I saw. I don't know if
>>> there's an easier way to do this but I thought of applying a lemmatizer on
>>> the category and the noun phrase in order for them to have a common
>>> denominator.This also works if the noun phrase itself has a plural form.
>>>
>>> Second, I'll need to use for comparison only the words in the category
>>> which are themselves nouns and not prepositions or determiners such as "of
>>> the".This means that I need to pos tag the categories contents as well.
>>> I was thinking of running the pos and lemma on the dbpedia categories when
>>> building the dbpedia backed entity hub and storing them for later use - I
>>> don't know how feasible this is at the moment.
>>>
>>> After this I can compare each noun in the noun phrase with the equivalent
>>> nouns in the categories and based on the number of matches I can create a
>>> confidence level.
>>>
>>> 3. match the noun of the noun phrase with the rdf:type from dbpedia of the
>>> named entity. If this matches increase the confidence level.
>>>
>>> 4. If there are multiple named entities which can match a certain noun
>>> phrase then link the noun phrase with the closest named entity prior to it
>>> in the text.
>>>
>>> What do you think?
>>>
>>> Cristian
>>>
>>> 2014-01-31 Cristian Petroaca <cr...@gmail.com>:
>>>
>>>  Hi Rafa,
>>>>
>>>> I don't yet have a concrete heursitic but I'm working on it. I'll provide
>>>> it here so that you guys can give me a feedback on it.
>>>>
>>>> What are "locality" features?
>>>>
>>>> I looked at Bart and other coref tools such as ArkRef and CherryPicker
>>>> and
>>>> they don't provide such a coreference.
>>>>
>>>> Cristian
>>>>
>>>>
>>>> 2014-01-30 Rafa Haro <rh...@apache.org>:
>>>>
>>>> Hi Cristian,
>>>>
>>>>> Without having more details about your concrete heuristic, in my honest
>>>>> opinion, such approach could produce a lot of false positives. I don't
>>>>> know
>>>>> if you are planning to use some "locality" features to detect such
>>>>> coreferences but you need to take into account that it is quite usual
>>>>> that
>>>>> coreferenced mentions can occurs even in different paragraphs. Although
>>>>> I'm
>>>>> not an expert in Natural Language Understanding, I would say it is quite
>>>>> difficult to get decent precision/recall rates for coreferencing using
>>>>> fixed rules. Maybe you can try other tools like BART (
>>>>> http://www.bart-coref.org/).
>>>>>
>>>>> Cheers,
>>>>> Rafa Haro
>>>>>
>>>>> El 30/01/14 10:33, Cristian Petroaca escribió:
>>>>>
>>>>>   Hi,
>>>>>
>>>>>> One of the necessary steps for implementing the Event extraction Engine
>>>>>> feature : https://issues.apache.org/jira/browse/STANBOL-1121 is to
>>>>>> have
>>>>>> coreference resolution in the given text. This is provided now via the
>>>>>> stanford-nlp project but as far as I saw this module is performing
>>>>>> mostly
>>>>>> pronominal (He, She) or nominal (Barack Obama and Mr. Obama) coreference
>>>>>> resolution.
>>>>>>
>>>>>> In order to get more coreferences from the text I thought of creating
>>>>>> some
>>>>>> logic that would detect this kind of coreference :
>>>>>> "Apple reaches new profit heights. The software company just announced
>>>>>> its
>>>>>> 2013 earnings."
>>>>>> Here "The software company" obviously refers to "Apple".
>>>>>> So I'd like to detect coreferences of Named Entities which are of the
>>>>>> rdf:type of the Named Entity , in this case "company" and also have
>>>>>> attributes which can be found in the dbpedia categories of the named
>>>>>> entity, in this case "software".
>>>>>>
>>>>>> The detection of coreferences such as "The software company" in the
>>>>>> text
>>>>>> would also be done by either using the new Pos Tag Based Phrase
>>>>>> extraction
>>>>>> Engine (noun phrases) or by using a dependency tree of the sentence and
>>>>>> picking up only subjects or objects.
>>>>>>
>>>>>> At this point I'd like to know if this kind of logic would be useful
>>>>>> as a
>>>>>> separate Enhancement Engine (in case the precision and recall are good
>>>>>> enough) in Stanbol?
>>>>>>
>>>>>> Thanks,
>>>>>> Cristian
>>>>>>
>>>>>>
>>>>>>
>>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Cristian Petroaca <cr...@gmail.com>.
There are several dbpedia categories for each entity, in this case for
Microsoft we have :

category:Companies_in_the_NASDAQ-100_Index
category:Microsoft
category:Software_companies_of_the_United_States
category:Software_companies_based_in_Washington_(state)
category:Companies_established_in_1975
category:1975_establishments_in_the_United_States
category:Companies_based_in_Redmond,_Washington
category:Multinational_companies_headquartered_in_the_United_States
category:Cloud_computing_providers
category:Companies_in_the_Dow_Jones_Industrial_Average

So we also have "Companies based in Redmond, Washington" which could be
matched.
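
To make the matching a bit more concrete, here is a rough sketch of
what I have in mind (plain Java, not the actual engine code; the
naiveLemma() below is only a stand-in for the real lemma and POS
information that would come from the NLP chain):

import java.util.*;

public class CategoryMatcher {

    private static final Set<String> STOPWORDS = new HashSet<>(
            Arrays.asList("of", "the", "in", "based", "and"));

    // very naive lemma: lowercase and strip a plural ending
    static String naiveLemma(String word) {
        String w = word.toLowerCase(Locale.ENGLISH);
        if (w.endsWith("ies")) return w.substring(0, w.length() - 3) + "y";
        if (w.endsWith("s") && w.length() > 3) return w.substring(0, w.length() - 1);
        return w;
    }

    // content word lemmas of a noun phrase or of a category label
    static Set<String> contentLemmas(String text) {
        Set<String> lemmas = new HashSet<>();
        for (String token : text.replace('_', ' ').split("[^A-Za-z]+")) {
            if (token.isEmpty() || STOPWORDS.contains(token.toLowerCase(Locale.ENGLISH))) {
                continue;
            }
            lemmas.add(naiveLemma(token));
        }
        return lemmas;
    }

    // fraction of the noun phrase's content words found in the category
    static double confidence(String nounPhrase, String categoryLabel) {
        Set<String> phrase = contentLemmas(nounPhrase);
        Set<String> category = contentLemmas(categoryLabel);
        if (phrase.isEmpty()) return 0.0;
        int matches = 0;
        for (String lemma : phrase) {
            if (category.contains(lemma)) matches++;
        }
        return (double) matches / phrase.size();
    }

    public static void main(String[] args) {
        // 1.0 - both "software" and "company" match
        System.out.println(confidence("The software company",
                "Software_companies_of_the_United_States"));
        // 0.5 - only "company" matches
        System.out.println(confidence("The software company",
                "Companies_based_in_Redmond,_Washington"));
    }
}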


There is still other contextual information from dbpedia which can be used.
For example for an Organization we could also include :
dbpprop:industry = Software
dbpprop:service = Online Service Providers

and for a Person (that's for Barack Obama) :

dbpedia-owl:profession:
                               dbpedia:Author
                               dbpedia:Constitutional_law
                               dbpedia:Lawyer
                               dbpedia:Community_organizing
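
As a sketch of what I mean by including this information (again plain
Java; the property names and values are just the ones from the
examples above, everything else is illustrative): the values of such
properties could simply be folded into the same bag of terms the
categories already provide, so the matching itself stays the same:

import java.util.*;

public class EntityContextTerms {

    // collect lower-cased tokens from all context property values
    public static Set<String> contextTerms(Map<String, List<String>> properties) {
        Set<String> terms = new HashSet<>();
        for (List<String> values : properties.values()) {
            for (String value : values) {
                for (String token : value.replace('_', ' ').split("[^A-Za-z]+")) {
                    if (!token.isEmpty()) {
                        terms.add(token.toLowerCase(Locale.ENGLISH));
                    }
                }
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<String, List<String>> microsoft = new HashMap<>();
        microsoft.put("dcterms:subject", Arrays.asList(
                "Software_companies_of_the_United_States",
                "Companies_based_in_Redmond,_Washington"));
        microsoft.put("dbpprop:industry", Arrays.asList("Software"));
        microsoft.put("dbpprop:service", Arrays.asList("Online Service Providers"));
        // prints the combined term set used for matching noun phrases
        System.out.println(contextTerms(microsoft));
    }
}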

I'd like to continue investigating this as I think it may have some value
in increasing the number of coreference resolutions. I'd like to
concentrate more on precision than on recall, since we already have a set
of coreferences detected by the Stanford NLP tool and this would be an
addition to that (at least this is how I would like to use it).
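
The way I picture that combination (just an illustration; the types and
the threshold value are made up): the category/property based links
would be added on top of the existing coreference set, and only when
they clear a fairly high confidence threshold, so they never override
what the Stanford tool already resolved:

import java.util.*;

public class AdditiveCorefMerger {

    /** candidate link between a noun phrase (by start offset) and a named entity */
    public static final class Link {
        final int nounPhraseStart;
        final String entity;
        final double confidence;
        Link(int nounPhraseStart, String entity, double confidence) {
            this.nounPhraseStart = nounPhraseStart;
            this.entity = entity;
            this.confidence = confidence;
        }
        @Override public String toString() {
            return entity + "@" + nounPhraseStart + " (" + confidence + ")";
        }
    }

    public static List<Link> merge(List<Link> existing, List<Link> candidates,
                                   double threshold) {
        List<Link> merged = new ArrayList<>(existing);
        Set<Integer> covered = new HashSet<>();
        for (Link link : existing) {
            covered.add(link.nounPhraseStart);
        }
        for (Link candidate : candidates) {
            // additive only: skip mentions the existing tool already resolved
            if (candidate.confidence >= threshold
                    && covered.add(candidate.nounPhraseStart)) {
                merged.add(candidate);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Link> stanford = Arrays.asList(new Link(0, "Microsoft", 1.0));
        List<Link> categoryBased = Arrays.asList(
                new Link(35, "Microsoft", 0.9), new Link(60, "Microsoft", 0.4));
        // keeps only the 0.9 candidate
        System.out.println(merge(stanford, categoryBased, 0.8));
    }
}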

Is it ok if I track this by opening a jira? I could update it to show my
progress and my conclusions, and if it turns out to have been a bad idea,
then so be it - at least I'll end up with more knowledge about Stanbol in
the end :).


2014-02-05 15:39 GMT+02:00 Rafa Haro <rh...@apache.org>:

> Hi Cristian,
>
> The approach sounds nice. I don't want to be the devil's advocate but I'm
> just not sure about the recall using the dbpedia categories feature. For
> example, your sentence could also be "Microsoft posted its 2013 earnings.
> The Redmond's company made a huge profit". So, maybe including more
> contextual information from dbpedia could increase the recall but of course
> will reduce the precision.
>
> Cheers,
> Rafa
>
> El 04/02/14 09:50, Cristian Petroaca escribió:
>
>  Back with a more detailed description of the steps for making this kind of
>> coreference work.
>>
>> I will be using references to the following text in the steps below in
>> order to make things clearer : "Microsoft posted its 2013 earnings. The
>> software company made a huge profit."
>>
>> 1. For every noun phrase in the text which has :
>>      a. a determinate pos which implies reference to an entity local to
>> the
>> text, such as "the, this, these") but not "another, every", etc which
>> implies a reference to an entity outside of the text.
>>      b. having at least another noun aside from the main required noun
>> which
>> further describes it. For example I will not count "The company" as being
>> a
>> legitimate candidate since this could create a lot of false positives by
>> considering the double meaning of some words such as "in the company of
>> good people".
>> "The software company" is a good candidate since we also have "software".
>>
>> 2. match the nouns in the noun phrase to the contents of the dbpedia
>> categories of each named entity found prior to the location of the noun
>> phrase in the text.
>> The dbpedia categories are in the following format (for Microsoft for
>> example) : "Software companies of the United States".
>>   So we try to match "software company" with that.
>> First, as you can see, the main noun in the dbpedia category has a plural
>> form and it's the same for all categories which I saw. I don't know if
>> there's an easier way to do this but I thought of applying a lemmatizer on
>> the category and the noun phrase in order for them to have a common
>> denominator.This also works if the noun phrase itself has a plural form.
>>
>> Second, I'll need to use for comparison only the words in the category
>> which are themselves nouns and not prepositions or determiners such as "of
>> the".This means that I need to pos tag the categories contents as well.
>> I was thinking of running the pos and lemma on the dbpedia categories when
>> building the dbpedia backed entity hub and storing them for later use - I
>> don't know how feasible this is at the moment.
>>
>> After this I can compare each noun in the noun phrase with the equivalent
>> nouns in the categories and based on the number of matches I can create a
>> confidence level.
>>
>> 3. match the noun of the noun phrase with the rdf:type from dbpedia of the
>> named entity. If this matches increase the confidence level.
>>
>> 4. If there are multiple named entities which can match a certain noun
>> phrase then link the noun phrase with the closest named entity prior to it
>> in the text.
>>
>> What do you think?
>>
>> Cristian
>>
>> 2014-01-31 Cristian Petroaca <cr...@gmail.com>:
>>
>>  Hi Rafa,
>>>
>>> I don't yet have a concrete heursitic but I'm working on it. I'll provide
>>> it here so that you guys can give me a feedback on it.
>>>
>>> What are "locality" features?
>>>
>>> I looked at Bart and other coref tools such as ArkRef and CherryPicker
>>> and
>>> they don't provide such a coreference.
>>>
>>> Cristian
>>>
>>>
>>> 2014-01-30 Rafa Haro <rh...@apache.org>:
>>>
>>> Hi Cristian,
>>>
>>>> Without having more details about your concrete heuristic, in my honest
>>>> opinion, such approach could produce a lot of false positives. I don't
>>>> know
>>>> if you are planning to use some "locality" features to detect such
>>>> coreferences but you need to take into account that it is quite usual
>>>> that
>>>> coreferenced mentions can occurs even in different paragraphs. Although
>>>> I'm
>>>> not an expert in Natural Language Understanding, I would say it is quite
>>>> difficult to get decent precision/recall rates for coreferencing using
>>>> fixed rules. Maybe you can try other tools like BART (
>>>> http://www.bart-coref.org/).
>>>>
>>>> Cheers,
>>>> Rafa Haro
>>>>
>>>> El 30/01/14 10:33, Cristian Petroaca escribió:
>>>>
>>>>   Hi,
>>>>
>>>>> One of the necessary steps for implementing the Event extraction Engine
>>>>> feature : https://issues.apache.org/jira/browse/STANBOL-1121 is to
>>>>> have
>>>>> coreference resolution in the given text. This is provided now via the
>>>>> stanford-nlp project but as far as I saw this module is performing
>>>>> mostly
>>>>> pronominal (He, She) or nominal (Barack Obama and Mr. Obama) coreference
>>>>> resolution.
>>>>>
>>>>> In order to get more coreferences from the text I thought of creating
>>>>> some
>>>>> logic that would detect this kind of coreference :
>>>>> "Apple reaches new profit heights. The software company just announced
>>>>> its
>>>>> 2013 earnings."
>>>>> Here "The software company" obviously refers to "Apple".
>>>>> So I'd like to detect coreferences of Named Entities which are of the
>>>>> rdf:type of the Named Entity , in this case "company" and also have
>>>>> attributes which can be found in the dbpedia categories of the named
>>>>> entity, in this case "software".
>>>>>
>>>>> The detection of coreferences such as "The software company" in the
>>>>> text
>>>>> would also be done by either using the new Pos Tag Based Phrase
>>>>> extraction
>>>>> Engine (noun phrases) or by using a dependency tree of the sentence and
>>>>> picking up only subjects or objects.
>>>>>
>>>>> At this point I'd like to know if this kind of logic would be useful
>>>>> as a
>>>>> separate Enhancement Engine (in case the precision and recall are good
>>>>> enough) in Stanbol?
>>>>>
>>>>> Thanks,
>>>>> Cristian
>>>>>
>>>>>
>>>>>
>

Re: Named entity coref resolution based on dbpedia categories and rdf:type

Posted by Rafa Haro <rh...@apache.org>.
Hi Cristian,

The approach sounds nice. I don't want to be the devil's advocate but 
I'm just not sure about the recall using the dbpedia categories feature. 
For example, your sentence could also be "Microsoft posted its 2013 
earnings. The Redmond's company made a huge profit". So, maybe including 
more contextual information from dbpedia could increase the recall but 
of course will reduce the precision.

Cheers,
Rafa

El 04/02/14 09:50, Cristian Petroaca escribió:
> Back with a more detailed description of the steps for making this kind of
> coreference work.
>
> I will be using references to the following text in the steps below in
> order to make things clearer : "Microsoft posted its 2013 earnings. The
> software company made a huge profit."
>
> 1. For every noun phrase in the text which has :
>      a. a determinate pos which implies reference to an entity local to the
> text, such as "the, this, these") but not "another, every", etc which
> implies a reference to an entity outside of the text.
>      b. having at least another noun aside from the main required noun which
> further describes it. For example I will not count "The company" as being a
> legitimate candidate since this could create a lot of false positives by
> considering the double meaning of some words such as "in the company of
> good people".
> "The software company" is a good candidate since we also have "software".
>
> 2. match the nouns in the noun phrase to the contents of the dbpedia
> categories of each named entity found prior to the location of the noun
> phrase in the text.
> The dbpedia categories are in the following format (for Microsoft for
> example) : "Software companies of the United States".
>   So we try to match "software company" with that.
> First, as you can see, the main noun in the dbpedia category has a plural
> form and it's the same for all categories which I saw. I don't know if
> there's an easier way to do this but I thought of applying a lemmatizer on
> the category and the noun phrase in order for them to have a common
> denominator.This also works if the noun phrase itself has a plural form.
>
> Second, I'll need to use for comparison only the words in the category
> which are themselves nouns and not prepositions or determiners such as "of
> the".This means that I need to pos tag the categories contents as well.
> I was thinking of running the pos and lemma on the dbpedia categories when
> building the dbpedia backed entity hub and storing them for later use - I
> don't know how feasible this is at the moment.
>
> After this I can compare each noun in the noun phrase with the equivalent
> nouns in the categories and based on the number of matches I can create a
> confidence level.
>
> 3. match the noun of the noun phrase with the rdf:type from dbpedia of the
> named entity. If this matches increase the confidence level.
>
> 4. If there are multiple named entities which can match a certain noun
> phrase then link the noun phrase with the closest named entity prior to it
> in the text.
>
> What do you think?
>
> Cristian
>
> 2014-01-31 Cristian Petroaca <cr...@gmail.com>:
>
>> Hi Rafa,
>>
>> I don't yet have a concrete heursitic but I'm working on it. I'll provide
>> it here so that you guys can give me a feedback on it.
>>
>> What are "locality" features?
>>
>> I looked at Bart and other coref tools such as ArkRef and CherryPicker and
>> they don't provide such a coreference.
>>
>> Cristian
>>
>>
>> 2014-01-30 Rafa Haro <rh...@apache.org>:
>>
>> Hi Cristian,
>>> Without having more details about your concrete heuristic, in my honest
>>> opinion, such approach could produce a lot of false positives. I don't know
>>> if you are planning to use some "locality" features to detect such
>>> coreferences but you need to take into account that it is quite usual that
>>> coreferenced mentions can occurs even in different paragraphs. Although I'm
>>> not an expert in Natural Language Understanding, I would say it is quite
>>> difficult to get decent precision/recall rates for coreferencing using
>>> fixed rules. Maybe you can try other tools like BART (
>>> http://www.bart-coref.org/).
>>>
>>> Cheers,
>>> Rafa Haro
>>>
>>> El 30/01/14 10:33, Cristian Petroaca escribió:
>>>
>>>   Hi,
>>>> One of the necessary steps for implementing the Event extraction Engine
>>>> feature : https://issues.apache.org/jira/browse/STANBOL-1121 is to have
>>>> coreference resolution in the given text. This is provided now via the
>>>> stanford-nlp project but as far as I saw this module is performing mostly
>>>> pronominal (He, She) or nominal (Barack Obama and Mr. Obama) coreference
>>>> resolution.
>>>>
>>>> In order to get more coreferences from the text I thought of creating some
>>>> logic that would detect this kind of coreference :
>>>> "Apple reaches new profit heights. The software company just announced
>>>> its
>>>> 2013 earnings."
>>>> Here "The software company" obviously refers to "Apple".
>>>> So I'd like to detect coreferences of Named Entities which are of the
>>>> rdf:type of the Named Entity , in this case "company" and also have
>>>> attributes which can be found in the dbpedia categories of the named
>>>> entity, in this case "software".
>>>>
>>>> The detection of coreferences such as "The software company" in the text
>>>> would also be done by either using the new Pos Tag Based Phrase
>>>> extraction
>>>> Engine (noun phrases) or by using a dependency tree of the sentence and
>>>> picking up only subjects or objects.
>>>>
>>>> At this point I'd like to know if this kind of logic would be useful as a
>>>> separate Enhancement Engine (in case the precision and recall are good
>>>> enough) in Stanbol?
>>>>
>>>> Thanks,
>>>> Cristian
>>>>
>>>>