Posted to user@uima.apache.org by Nicolas Hernandez <ni...@gmail.com> on 2011/11/02 09:45:46 UTC

Re: Consuming RDF ontologies as dictionaries

Hi Spico,

Yes, this is certainly possible with UIMA. The sandbox currently
provides components (such as the Dictionary Annotator and the Concept
Mapper Annotator) that aim to recognize text forms in a text from
dictionaries.

Personally, I was not satisfied with the current solutions: they were
either too simple (no features can be associated with an entry of the
DictionaryAnnotator) or too complex for me to set up (the Concept
Mapper Annotator is based on a tokenizer that was different from
mine).

Based on previous work by Jerome Rocheteau, I developed a simple
Dictionary Annotator with the following features:
  * a dictionary is a UIMA resource (one instance can be shared by
multiple annotators) [1]
  * the dictionary design is abstract enough to allow several implementations
  * right now it comes with one dictionary format implementation:
CSV (one column is the entry and the others are feature values), but
XML RDF would be an easy one to add
  * the dictionary entries are character strings, stored as a prefix
tree of characters so that recognition is fast
  * it is not type system dependent
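
To illustrate the CSV and prefix-tree points in the list above, here is a
minimal sketch in Python (not the actual annotator, and not UIMA API). The
class name, the "entry" column name and the sample rows are all
assumptions made for the example.

```python
import csv
import io


class PrefixTreeDictionary:
    """Character-level prefix tree mapping dictionary entries to features."""

    def __init__(self):
        self.root = {}

    def add(self, entry, features):
        node = self.root
        for ch in entry:
            node = node.setdefault(ch, {})
        node[None] = features  # the None key marks the end of an entry

    @classmethod
    def from_csv(cls, handle, entry_column="entry"):
        """One column holds the entry, the other columns are feature values."""
        tree = cls()
        for row in csv.DictReader(handle):
            entry = row.pop(entry_column)
            tree.add(entry, row)
        return tree

    def find_all(self, text):
        """Return (begin, end, features) for the longest match at each offset."""
        matches = []
        i = 0
        while i < len(text):
            node, longest = self.root, None
            for j in range(i, len(text)):
                node = node.get(text[j])
                if node is None:
                    break
                if None in node:
                    longest = (i, j + 1, node[None])
            if longest:
                matches.append(longest)
                i = longest[1]  # continue after the matched entry
            else:
                i += 1
        return matches


# Hypothetical sample dictionary: entry column plus one feature column.
SAMPLE_CSV = "entry,type\nVolvo,Competitor\nRenault,Competitor\nclass A,Car_Model\n"
dictionary = PrefixTreeDictionary.from_csv(io.StringIO(SAMPLE_CSV))

text = "Volvo answers Renault with a new class A."
matches = dictionary.find_all(text)
```

Matching here is purely character based, so no tokenizer is involved and
no word boundaries are checked; a real annotator would probably want both
options.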

It would not cost too much to add an extension to deal with XML RDF
(only a parser and the connector to the data structure). I have
planned to open the code soon, but I can make it available sooner if
you are interested in participating.
Anyway, I would be interested to know a bit more about how you want to
use your RDF format (what the entries are, what the values are...).
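
As a rough idea of what the "parser" part could look like, the sketch
below (Python, standard library only) turns a small RDF/XML document into
the same kind of rows a CSV dictionary would provide. It assumes a flat
RDF/XML layout in which each resource is an rdf:Description carrying an
rdfs:label and an rdf:type; the example.org URIs and the function name are
invented for the illustration, and a real ontology would likely need a
proper RDF library.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical sample data; the example.org URIs are placeholders.
SAMPLE_RDF = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://example.org/onto#Volvo">
    <rdfs:label>Volvo</rdfs:label>
    <rdf:type rdf:resource="http://example.org/onto#Competitor"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/onto#ClassA">
    <rdfs:label>class A</rdfs:label>
    <rdf:type rdf:resource="http://example.org/onto#Car_Model"/>
  </rdf:Description>
</rdf:RDF>
"""


def rdf_to_entries(rdf_xml):
    """Return one row per label: the entry string plus its feature values."""
    root = ET.fromstring(rdf_xml)
    entries = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        uri = desc.get(f"{{{RDF_NS}}}about")
        type_el = desc.find(f"{{{RDF_NS}}}type")
        type_uri = type_el.get(f"{{{RDF_NS}}}resource") if type_el is not None else None
        for label in desc.findall(f"{{{RDFS_NS}}}label"):
            entries.append({"entry": label.text.strip(), "uri": uri, "type": type_uri})
    return entries


entries = rdf_to_entries(SAMPLE_RDF)
```

Each row keeps the resource URI as a feature, so a recognized text form
can be linked back to the ontology.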

Best regards

/Nicolas

[1] http://uima.apache.org/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.aae.accessing_external_resource_files



On Mon, Oct 31, 2011 at 9:41 PM, Alexander Klenner
<al...@scai.fraunhofer.de> wrote:
> Hi Florin,
>
> I think what you are looking for is a UIMA type system that corresponds to your specific RDF ontologies (URIs). As far as I know, you must implement this type system by hand (experienced UIMA users, please correct me if I am wrong here...).
>
> There is an RDF CAS Consumer to be found in the UIMA sandbox:
>
> http://uima.apache.org/sandbox.html#rdfcas.consumer
>
> that does it the other way round: an existing type system in a CAS is converted to RDF triplestore format. However, the URIs created from the type system change from one run to another for the same artefact, which makes them not really usable in a larger RDF context. But maybe this could be a starting point for further investigation...
>
> Cheers,
>
> Alex
>
>
>
> --
> Dipl. Bioinformatiker Alexander G. Klenner
> Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI)
> Schloss Birlinghoven, D-53754 Sankt Augustin
> Tel.: +49 - 2241 - 14 - 2736
> E-mail: alexander.garvin.klenner@scai.fraunhofer.de
> Internet: http://www.scai.fraunhofer.de
>
>
> ----- Original Message -----
> From: "Spico Florin" <sp...@gmail.com>
> To: user@uima.apache.org
> Sent: Monday, 31 October 2011 16:48:18
> Subject: Consuming RDF ontologies as dictionaries
>
> Hello!
>  I'm a newbie to UIMA. I would like to know if it is possible to create a
> dictionary (vocabulary) from an RDF triplestore. I would like to use UIMA
> to classify the words contained in a text by using a given ontology
> stored in a triplestore.
> How can I use UIMA in this particular use case?
>  I look forward to your answers.
>  Thank you.
>  Regards,
>  Florin
>



-- 
Dr. Nicolas Hernandez
Associate Professor (Maître de Conférences)
Université de Nantes - LINA CNRS
http://enicolashernandez.blogspot.com
http://www.univ-nantes.fr/hernandez-n
+33 (0)2 51 12 58 55
+33 (0)2 40 30 60 67

Re: Consuming RDF ontologies as dictionaries

Posted by Michael Tanenblatt <sl...@park-slope.net>.
Just FYI regarding ConceptMapper: one of the key design points of ConceptMapper is that *any* tokenizer annotator can be used--the one supplied is just an example--and you can set it up so that that tokenizer is also used to tokenize your dictionary, to minimize missed matches due to differing tokenizations between the input text and dictionaries.
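
The point about sharing one tokenizer between text and dictionary can be
sketched as follows. This is plain Python, not ConceptMapper code: the
regex tokenizer, the function names and the sample entries are
assumptions chosen only to show why shared tokenization avoids missed
matches.

```python
import re


def tokenize(text):
    """Hypothetical tokenizer: lowercased word tokens with character offsets."""
    return [(m.group().lower(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def build_index(entries):
    """Tokenize the dictionary entries with the same tokenizer as the text."""
    index = {}
    for entry in entries:
        key = tuple(tok for tok, _, _ in tokenize(entry))
        if key:
            index[key] = entry
    return index


def match(text, index):
    """Greedy longest-first lookup of token sequences; returns (begin, end, entry)."""
    tokens = tokenize(text)
    max_len = max((len(k) for k in index), default=0)
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tok for tok, _, _ in tokens[i:i + n])
            if key in index:
                found.append((tokens[i][1], tokens[i + n - 1][2], index[key]))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return found


# Hypothetical entries; note the hyphen and case differences in the text.
index = build_index(["Mercedes-Benz", "class A"])
found = match("The new MERCEDES BENZ Class-A was shown.", index)
```

Because both sides go through `tokenize`, "Mercedes-Benz" in the
dictionary and "MERCEDES BENZ" in the text reduce to the same token
sequence and still match.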



Re: Consuming RDF ontologies as dictionaries

Posted by Spico Florin <sp...@gmail.com>.
Hello!
  Thank you for your answers. I have an RDF with the entities organized in
hierarchies:

Organization
  |
Competitor
    |- Volvo
    |- Renault
    |- Mercedes

Each competitor competes on different car models:
   Car_Model
      |-class A
      |-class C
  etc.

Here is my use case:

1. I store the above information in RDF.
2. Given a text document, I would like to use a UIMA Annotator to
identify the competitors and the car classes that are mentioned in the
given text.
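
One possible reading of this use case, sketched in Python outside of
UIMA: the hierarchy is held as simple triples, the instances become
dictionary entries typed by their parent class, and the text is scanned
for those entries. The triple layout, predicate names and function names
are assumptions for illustration; in UIMA the resulting tuples would
become annotations in the CAS.

```python
import re

# Hypothetical triples mirroring the hierarchy described above.
TRIPLES = [
    ("Competitor", "subClassOf", "Organization"),
    ("Volvo", "type", "Competitor"),
    ("Renault", "type", "Competitor"),
    ("Mercedes", "type", "Competitor"),
    ("class A", "type", "Car_Model"),
    ("class C", "type", "Car_Model"),
]


def build_lexicon(triples):
    """Map each instance label to the class it belongs to."""
    return {s: o for s, p, o in triples if p == "type"}


def annotate(text, lexicon):
    """Return (begin, end, type, covered_text) for every entry found in the text."""
    # Longest entries first, so the alternation prefers the longer entry.
    alternatives = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(e) for e in alternatives) + r")\b")
    return [(m.start(), m.end(), lexicon[m.group()], m.group())
            for m in pattern.finditer(text)]


lexicon = build_lexicon(TRIPLES)
text = "Renault and Volvo both target the class C segment."
annotations = annotate(text, lexicon)
```

The matching is exact and case sensitive, which is the simplest choice
rather than a recommendation.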


I look forward to your input and suggestions on how to apply UIMA to the
above scenario.

Thank you in advance,

Regards,
    Florin


