
Posted to user@uima.apache.org by Igor Sominsky <so...@gmail.com> on 2008/05/15 18:24:21 UTC

proposal for a new testing and evaluation component

My group would like to offer the following UIMA component, Common Feature Extractor (CFE), as an open source offering into the UIMA sandbox, assuming there is interest from the community:

CFE enables configuration-driven extraction of feature values from UIMA annotations contained in a CAS. The extracted information can be used for statistical analysis, evaluation of performance metrics, regression testing and machine-learning-related processing.

CFE provides a flexible yet powerful language, FESL (Feature Extraction Specification Language), for working with the UIMA CAS to enable the collection and classification of the resulting data. FESL is a declarative XML-based language that expresses semantic rules for feature extraction. The rules guide the feature extraction in a completely generalized way, while CFE provides methods for subsequent processing to format the output of the extraction as needed for downstream use. The destination of the output (CAS, external file, database, etc.) is defined by the particular application in which CFE is used. CFE can be implemented as either a TAE or a CAS Consumer, depending on the needs of the application.

FESL rules offer a flexible and powerful way of defining multi-parameter criteria for the specific information to be extracted from the CAS. Such criteria can be customized by:

1. the type of the UIMA annotation object that contains the feature of interest
2. a surrounding (enclosing) annotation type and the relative location of the object within the enclosure, which limits the extraction to the boundaries of a certain UIMA type
3. the "path" to the feature from the annotation object
4. the type and value of the feature itself
5. the values of any public Java get-style methods (methods that accept no parameters and return a value) implemented by the underlying class of the feature
6. the location of the object or the feature on a specific path (for cases where annotations need to be selected or bypassed if they are features of other UIMA annotation types)

The feature values can be evaluated by conditional expressions stated in FESL. In particular, a feature value can be tested for whether it:

1. is of a certain type
2. belongs to a specific set of values (vocabulary)
3. belongs to a range of numeric values (inclusive or exclusive)
4. matches certain bits of a bit mask (integer values only)
5. matches a Java regular expression pattern

These expressions can be combined in disjunctive normal form (an OR of AND-groups), which gives a powerful and flexible way of defining fairly complex criteria for the extraction of a required annotation and/or its value.
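
To make the idea of conditions in disjunctive normal form concrete, here is a minimal Java sketch. It is not CFE code and does not use FESL syntax; the class and method names (FeatureConditions, inVocabulary, inRange, matchesBits, matchesRegex, evaluateDnf) are hypothetical, and the bit-mask test assumes "match" means that all bits of the mask are set.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical sketch: none of these names come from CFE or FESL.
class FeatureConditions {

    // Value belongs to a fixed vocabulary.
    static Predicate<Object> inVocabulary(Set<String> vocabulary) {
        return v -> v != null && vocabulary.contains(v.toString());
    }

    // Numeric value lies in [low, high] (inclusive) or (low, high) (exclusive).
    static Predicate<Object> inRange(double low, double high, boolean inclusive) {
        return v -> {
            if (!(v instanceof Number)) return false;
            double d = ((Number) v).doubleValue();
            return inclusive ? (d >= low && d <= high) : (d > low && d < high);
        };
    }

    // Assumption: "matching" a mask means every bit of the mask is set.
    static Predicate<Object> matchesBits(int mask) {
        return v -> v instanceof Integer && (((Integer) v) & mask) == mask;
    }

    // The whole value matches a Java regular expression.
    static Predicate<Object> matchesRegex(String regex) {
        Pattern p = Pattern.compile(regex);
        return v -> v != null && p.matcher(v.toString()).matches();
    }

    // Disjunctive normal form: OR over groups, AND within each group.
    static boolean evaluateDnf(List<List<Predicate<Object>>> groups, Object value) {
        return groups.stream()
                .anyMatch(group -> group.stream().allMatch(p -> p.test(value)));
    }

    public static void main(String[] args) {
        // (value in {NN, NNS}) OR (value matches VB.* AND value in {VBD, VBZ})
        List<List<Predicate<Object>>> rule = List.of(
                List.of(inVocabulary(Set.of("NN", "NNS"))),
                List.of(matchesRegex("VB.*"), inVocabulary(Set.of("VBD", "VBZ"))));
        System.out.println(evaluateDnf(rule, "NN"));   // true
        System.out.println(evaluateDnf(rule, "VBG"));  // false
    }
}
```

Each inner list is one AND-group, so adding an alternative criterion means adding another inner list rather than rewriting the existing ones.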

FESL itself is defined in XSD and is integrated with EMF for syntax validation and automated code generation.

CFE has been successfully used in several internal projects for evaluation of performance metrics and machine learning.

CFE is described in more detail in the paper "CFE - a system for testing, evaluation and machine learning of UIMA based applications", by I. Sominsky, A. Coden and M. Tanenblatt, which will be presented at the UIMA for NLP workshop as part of the LREC 2008 conference in Marrakech, Morocco.

Igor Sominsky

sominsky@gmail.com

Re: proposal for a new testing and evaluation component

Posted by Igor Sominsky <so...@gmail.com>.

CFE does not provide this functionality. It only extracts feature values.

--Igor

----- Original Message ----- 
From: "Thilo Goetz" <tw...@gmx.de>
To: <ui...@incubator.apache.org>
Sent: Wednesday, May 21, 2008 4:41 AM
Subject: Re: proposal for a new testing and evaluation component


> Andrew Borthwick wrote:
>> Does this provide functionality similar to GATE's JAPE regular expression
>> language, i.e. could I use CFE to create new UIMA annotations as the 
>> result
>> of regular expressions over other UIMA annotations?
>
> I don't think so.
>
>>
>> If not, does anything like this exist for UIMA right now or is anything 
>> in
>> the works?
>
> I know of several proprietary ones, but nothing open source.  It
> would be nice to have something like Jape in UIMA.
>
> --Thilo
>

Re: proposal for a new testing and evaluation component

Posted by Christian Mauceri <ma...@hermeneute.com>.

Hi,

Indeed, any new, encapsulated tool in UIMA is interesting for the
community, but what makes UIMA unique is not whether these tools exist
but rather that UIMA makes their seamless integration within a single
framework possible, no matter what languages they are written in. So I
do not see why the existence of a regexp tool in UIMA should be a
criterion in deciding whether to adopt UIMA. If people need a regexp
tool in a UIMA application, they just have to pick one and write the
thin layer of code to encapsulate it. You can use mine
<http://code.google.com/p/digital-philology/source/>, available under
the Apache License, but there are many others around.
Once again, the big advantages of UIMA are interoperability, language
independence and theory neutrality. You can, for instance, sketch a
module in Perl, integrate it into your processing pipeline and, when
you are happy with it, seamlessly rewrite it in another language.
Finally, UIMA can deal with objects other than text (images, for
instance), and for this very reason it must be kept at a high level; it
is up to the community to provide modules. You can, for instance, use
GATE <http://gate.ac.uk/sale/tao/index.html#x1-38600016> in UIMA, but
you can also integrate OpenNLP: here
<http://uima.lti.cs.cmu.edu:8080/UCR/pages/static/osnlp/OpenNLPReadme.html>
are example UIMA wrappers for the OpenNLP tools.
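
As a rough illustration of how thin such a wrapping layer can be, the sketch below runs a java.util.regex pattern over a document text and turns each match into a typed span. It deliberately avoids the UIMA API: Span and RegexSpanSketch are hypothetical stand-ins, and in a real annotator each span would instead become an annotation added to the CAS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for an annotator; not part of UIMA or any regexp tool.
class RegexSpanSketch {

    // Minimal picture of an annotation: a type name plus character offsets.
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + ")";
        }
    }

    // Run the pattern over the text and report every match as a span.
    static List<Span> annotate(String text, String typeName, Pattern pattern) {
        List<Span> spans = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            spans.add(new Span(typeName, m.start(), m.end()));
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Call 555-1234 or 555-9876.";
        // The type name and the pattern are illustrative only.
        List<Span> spans = annotate(text, "PhoneNumber", Pattern.compile("\\d{3}-\\d{4}"));
        System.out.println(spans);  // [PhoneNumber[5,13), PhoneNumber[17,25)]
    }
}
```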

Regards.

Andrew Borthwick wrote:
> Frank,
>
> Your ANTLR-based approach sounds interesting.  I'd like to see the paper and
> I'd be interested in seeing some demo code too.
>
> This whole area of needing JAPE-like functionality for UIMA is a critical
> issue, as far as I'm concerned.  The lack of support for writing regex's
> over annotations was one of two key reasons that my company decided to go
> with GATE over UIMA a year ago (the other was that GATE has the ability to
> read in an html document and convert the html into GATE annotations, which
> is a key feature for working with web documents).
>
> Although we like UIMA's infrastructure and array of ML toolkits, we would
> need to see some kind of solid regex functionality before we could consider
> starting to develop in UIMA.
>
> Regards,
> Andrew Borthwick
>

Re: proposal for a new testing and evaluation component

Posted by Andrew Borthwick <an...@corp.spock.com>.

Frank,

Your ANTLR-based approach sounds interesting.  I'd like to see the paper and
I'd be interested in seeing some demo code too.

This whole area of needing JAPE-like functionality for UIMA is a critical
issue, as far as I'm concerned.  The lack of support for writing regex's
over annotations was one of two key reasons that my company decided to go
with GATE over UIMA a year ago (the other was that GATE has the ability to
read in an html document and convert the html into GATE annotations, which
is a key feature for working with web documents).

Although we like UIMA's infrastructure and array of ML toolkits, we would
need to see some kind of solid regex functionality before we could consider
starting to develop in UIMA.

Regards,
Andrew Borthwick

On Wed, May 21, 2008 at 9:51 AM, Frank Schilder <
frank.schilder@thomsonreuters.com> wrote:

>
>
> >>
> >>>
> >>> If not, does anything like this exist for UIMA right now or is anything
> in
> >>> the works?
> >>
> >> I know of several proprietary ones, but nothing open source.  It
> >> would be nice to have something like Jape in UIMA.
> >>
> >
> > well, I wrote an annotator that uses Jape.
> >
>
> We have been using ANTLR (www.antlr.org) for writing grammars that detect,
> for example, temporal and monetary expressions. The integration of an ANTLR
> lexer and parser into UIMA was fairly straightforward. We based our
> integration on a posting that explains the interfacing of StAX with ANTLR
> http://www.antlr.org/wiki/display/ANTLR3/Interfacing+StAX+to+ANTLR
>
> ANTLR grammars are written in EBNF and can be compiled into different
> programming languages (e.g. Java, C, C#). The ANTLR grammar can also
> contain
> Java code, if you want to manipulate other objects (e.g. adding annotations
> to the CAS) while parsing the input.
>
> You can write an ANTLR grammar, add Java code to it and compile everything
> into a Java class. This Java class can then be used by your AE in UIMA.
>
> We experimented with lexers and parsers in ANTLR:
>
> 1) a lexer in ANTLR can be set to be a scanner that scans an input string
> for expressions defined within EBNF
> 2) a parser expects a stream of ANTLR tokens. A stream of ANTLR tokens can
> be constructed from UIMA annotations (see integration of StAX events into
> ANTLR). Such a grammar can detect more complex structures consisting of
> basic (UIMA) annotations.
>
>
> The grammar formalism used by ANTLR is LL(*) which is more flexible than
> LL(k). We found the grammars we wrote are much faster than the Jape
> grammars
> we also used within UIMA. You're more constrained by the LL(*) formalism in
> writing rules, but ANTLRworks is a useful GUI development environment that
> alerts you to ambiguous rules.
> http://www.antlr.org/works/index.html
>
> BTW: This work will also be discussed as part of our paper at the LREC UIMA
> workshop next week.
> http://watchtower.coling.uni-jena.de/~coling/uimaws_lrec2008/
>
> Frank
>
>
>
>


-- 
Andrew Borthwick, Ph.D. | SPOCK Networks
Spock is Hiring!
www.spock.com/jobs
P.S. We pay a $5,000 referral fee for anyone we hire

Re: proposal for a new testing and evaluation component

Posted by Frank Schilder <fr...@thomsonreuters.com>.

  
>> 
>>> 
>>> If not, does anything like this exist for UIMA right now or is anything in
>>> the works?
>> 
>> I know of several proprietary ones, but nothing open source.  It
>> would be nice to have something like Jape in UIMA.
>> 
> 
> well, I wrote an annotator that uses Jape.
> 

We have been using ANTLR (www.antlr.org) for writing grammars that detect,
for example, temporal and monetary expressions. The integration of an ANTLR
lexer and parser into UIMA was fairly straightforward. We based our
integration on a posting that explains the interfacing of StAX with ANTLR
http://www.antlr.org/wiki/display/ANTLR3/Interfacing+StAX+to+ANTLR

ANTLR grammars are written in EBNF and can be compiled into different
programming languages (e.g. Java, C, C#). The ANTLR grammar can also contain
Java code, if you want to manipulate other objects (e.g. adding annotations
to the CAS) while parsing the input.

You can write an ANTLR grammar, add Java code to it and compile everything
into a Java class. This Java class can then be used by your AE in UIMA.

We experimented with lexers and parsers in ANTLR:

1) a lexer in ANTLR can be set to be a scanner that scans an input string
for expressions defined within EBNF
2) a parser expects a stream of ANTLR tokens. A stream of ANTLR tokens can
be constructed from UIMA annotations (see integration of StAX events into
ANTLR). Such a grammar can detect more complex structures consisting of
basic (UIMA) annotations.
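
The second setup (a parser consuming a token stream built from annotations) can be pictured with the sketch below. It is not ANTLR-generated code and uses neither the ANTLR nor the UIMA API: Span, toTokenStream and findMoney are hypothetical names, and the hand-written matcher merely stands in for what a grammar rule such as "money : CURRENCY NUMBER UNIT? ;" would express.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-ins: these are not UIMA or ANTLR classes.
class AnnotationParserSketch {

    // Minimal picture of an annotation: a type name plus character offsets.
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + ")";
        }
    }

    // Step 1: order the annotations by offset so that they form a token stream.
    static List<Span> toTokenStream(List<Span> annotations) {
        List<Span> tokens = new ArrayList<>(annotations);
        tokens.sort(Comparator.comparingInt((Span s) -> s.begin));
        return tokens;
    }

    // Step 2: scan the stream for CURRENCY NUMBER, optionally followed by UNIT,
    // and emit one MONEY span covering each match.
    static List<Span> findMoney(List<Span> tokens) {
        List<Span> found = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            boolean starts = i + 1 < tokens.size()
                    && tokens.get(i).type.equals("CURRENCY")
                    && tokens.get(i + 1).type.equals("NUMBER");
            if (starts) {
                int last = i + 1;
                if (last + 1 < tokens.size() && tokens.get(last + 1).type.equals("UNIT")) {
                    last++;
                }
                found.add(new Span("MONEY", tokens.get(i).begin, tokens.get(last).end));
                i = last + 1;
            } else {
                i++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Annotations over the text "paid $ 5 million today", given out of order.
        List<Span> annotations = List.of(
                new Span("UNIT", 9, 16),
                new Span("WORD", 0, 4),
                new Span("NUMBER", 7, 8),
                new Span("CURRENCY", 5, 6),
                new Span("WORD", 17, 22));
        System.out.println(findMoney(toTokenStream(annotations)));  // [MONEY[5,16)]
    }
}
```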


The grammar formalism used by ANTLR is LL(*) which is more flexible than
LL(k). We found the grammars we wrote are much faster than the Jape grammars
we also used within UIMA. You're more constrained by the LL(*) formalism in
writing rules, but ANTLRworks is a useful GUI development environment that
alerts you to ambiguous rules.
http://www.antlr.org/works/index.html

BTW: This work will also be discussed as part of our paper at the LREC UIMA
workshop next week.
http://watchtower.coling.uni-jena.de/~coling/uimaws_lrec2008/

Frank





Re: proposal for a new testing and evaluation component

Posted by Roberto Franchini <ro...@gmail.com>.

On Wed, May 21, 2008 at 10:41 AM, Thilo Goetz <tw...@gmx.de> wrote:
> Andrew Borthwick wrote:
>>
>> Does this provide functionality similar to GATE's JAPE regular expression
>> language, i.e. could I use CFE to create new UIMA annotations as the
>> result
>> of regular expressions over other UIMA annotations?
>
> I don't think so.
>
>>
>> If not, does anything like this exist for UIMA right now or is anything in
>> the works?
>
> I know of several proprietary ones, but nothing open source.  It
> would be nice to have something like Jape in UIMA.
>

well, I wrote an annotator that uses Jape.

The annotator can use batch grammars and grammars precompiled with
japec. It works this way:
- there is an annotation mapper that takes each UIMA annotation and
maps it to a GATE one
- the JAPE grammars are executed: grammars can update existing
annotations and create new ones in the GATE world
- the annotation mapper updates the existing annotations and creates
the new ones in the UIMA world

There are some limitations:
- it's impossible to create (in JAPE) an annotation that references
another annotation, which is easy to do in UIMA (pseudo code):
Lemma lemma = new Lemma(cas);
Token token = new Token(cas);
token.setLemma(lemma);
- the annotator is packaged as a PEAR that includes ALL the GATE jars...
- if the annotator is deployed in a web context, only the precompiled
grammars work. I think it's a class loading problem: the PEAR is
loaded by one class loader, while the UIMA framework is deployed inside
a web context that is under another class loader, and so on.
- performance: the reverse mapping from GATE to UIMA is slow: updating
the existing annotations means scanning all the annotations in the CAS
and checking each feature to see whether it has changed (if the grammar
doesn't update anything, the updates could be skipped)

I would like to release the annotator as open source, but at the moment
I don't have permission to do that.

Still, the best solution would be a JAPE clone, or something better,
that uses UIMA directly.
I want to take a look at the BSFAnnotator to see whether it could be useful.

cheers,
Roberto

-- 
Roberto Franchini
CELI s.r.l. (http://www.celi.it) - C.so Moncalieri 21 - 10131 Torino - ITALY
Tel +39-011-6600814 - Fax +39-011-6600687
jabber:ro.franchini@gmail.com skype:ro.franchini

Re: proposal for a new testing and evaluation component

Posted by Thilo Goetz <tw...@gmx.de>.

Andrew Borthwick wrote:
> Does this provide functionality similar to GATE's JAPE regular expression
> language, i.e. could I use CFE to create new UIMA annotations as the result
> of regular expressions over other UIMA annotations?

I don't think so.

> 
> If not, does anything like this exist for UIMA right now or is anything in
> the works?

I know of several proprietary ones, but nothing open source.  It
would be nice to have something like Jape in UIMA.

--Thilo

Re: proposal for a new testing and evaluation component

Posted by Andrew Borthwick <an...@corp.spock.com>.

Does this provide functionality similar to GATE's JAPE regular expression
language, i.e. could I use CFE to create new UIMA annotations as the result
of regular expressions over other UIMA annotations?

If not, does anything like this exist for UIMA right now or is anything in
the works?

Thanks,
Andrew Borthwick

On Thu, May 15, 2008 at 11:08 AM, Thilo Goetz <tw...@gmx.de> wrote:

> Igor Sominsky wrote:
>
>> The vocabulary can be either fully embedded in the configuration file or
>> referenced by a URI. Any UIMA annotation feature, or the result of a
>> get-style method (getCoveredText, for instance), can be tested for
>> membership in the list, so that it is included in or excluded from the
>> extraction.
>>
>> I am not sure I understand your second question correctly, but let me
>> try to answer it. CFE implements the extraction process in two steps. In
>> the first step, an annotation that represents a certain concept is located.
>> It can be a single-word annotation (uima.tt.TokenAnnotation, for instance)
>> or a custom annotation type that contains a group of words in its
>> properties (an FSArray, for instance). In any case, your concept must be
>> represented by a single annotation. In the second step, annotations that
>> are in a certain context (defined by the configuration file) of your
>> concept annotation are located. For example, the configuration file could
>> specify that features be extracted from the 5 annotations to the left of
>> the annotation that represents the concept (say, a particular word). The
>> annotations located in the second step are the annotations from which the
>> features are extracted. I hope I got your question right.
>>
>
> Perfectly, thank you.  You answered the question that I was
> trying to ask.
>
> This would be our missing link to Apache Mahout.  At the moment,
> their input formats are still moving targets afaict.  Once they've
> settled down, we can generate input data for Mahout and use
> their machine learning algorithms.
>
> --Thilo


-- 
Andrew Borthwick, Ph.D. | SPOCK Networks
Spock is Hiring!
www.spock.com/jobs
P.S. We pay a $5,000 referral fee for anyone we hire

Re: proposal for a new testing and evaluation component

Posted by Igor Sominsky <so...@gmail.com>.

Abhay,
To help you, please describe how you plan to use CFE. You may also take
a look at the tests for usage examples.

Regards
Igor

On Apr 20, 2012, at 12:35 AM, Abhay Chaware
<ab...@kpitcummins.com> wrote:

> I am trying to use CFE with UIMA, but I cannot find a good tutorial or
> documentation. I have gone through the Apache UIMA CFE documentation, but I
> am not able to proceed further. Can you please help?
>
> -abhay
>
>

Re: proposal for a new testing and evaluation component

Posted by Abhay Chaware <ab...@kpitcummins.com>.

I am trying to use CFE with UIMA, but I cannot find a good tutorial or
documentation. I have gone through the Apache UIMA CFE documentation, but I
am not able to proceed further. Can you please help?

-abhay

Re: proposal for a new testing and evaluation component

Posted by Thilo Goetz <tw...@gmx.de>.

Igor Sominsky wrote:
> The vocabulary can be either fully embedded in the configuration file
> or referenced by a URI. Any UIMA annotation feature, or the result of a
> get-style method (getCoveredText, for instance), can be tested for
> membership in the list, so that it is included in or excluded from the
> extraction.
> 
> I am not sure I understand your second question correctly, but let me
> try to answer it. CFE implements the extraction process in two steps. In
> the first step, an annotation that represents a certain concept is
> located. It can be a single-word annotation (uima.tt.TokenAnnotation, for
> instance) or a custom annotation type that contains a group of words
> in its properties (an FSArray, for instance). In any case, your concept
> must be represented by a single annotation. In the second step,
> annotations that are in a certain context (defined by the configuration
> file) of your concept annotation are located. For example, the
> configuration file could specify that features be extracted from the 5
> annotations to the left of the annotation that represents the concept
> (say, a particular word). The annotations located in the second step
> are the annotations from which the features are extracted. I hope I got
> your question right.
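
The two-step process quoted above can be pictured with the following Java sketch. It is only an illustration under simplifying assumptions, not CFE code: Token, locateConcept and leftContextFeatures are hypothetical names, the "concept" is simply a token with a given text, and the extracted feature is a part-of-speech tag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these names are not part of CFE or UIMA.
class ContextWindowSketch {

    // A token annotation reduced to its text and one feature (a POS tag).
    static final class Token {
        final String text;
        final String pos;

        Token(String text, String pos) {
            this.text = text;
            this.pos = pos;
        }
    }

    // Step 1: locate the annotations that represent the concept
    // (here: every token whose text equals the given word).
    static List<Integer> locateConcept(List<Token> tokens, String word) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).text.equals(word)) {
                hits.add(i);
            }
        }
        return hits;
    }

    // Step 2: extract a feature from up to `window` tokens to the left
    // of the concept annotation, in document order.
    static List<String> leftContextFeatures(List<Token> tokens, int conceptIndex, int window) {
        List<String> features = new ArrayList<>();
        int start = Math.max(0, conceptIndex - window);
        for (int i = start; i < conceptIndex; i++) {
            features.add(tokens.get(i).pos);
        }
        return features;
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
                new Token("the", "DT"),
                new Token("old", "JJ"),
                new Token("bank", "NN"),
                new Token("closed", "VBD"));
        for (int index : locateConcept(tokens, "bank")) {
            // A window of 5, as in the example above; only 2 tokens exist to the left.
            System.out.println(leftContextFeatures(tokens, index, 5));  // [DT, JJ]
        }
    }
}
```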

Perfectly, thank you.  You answered the question that I was
trying to ask.

This would be our missing link to Apache Mahout.  At the moment,
their input formats are still moving targets afaict.  Once they've
settled down, we can generate input data for Mahout and use
their machine learning algorithms.

--Thilo

> 
> Igor
> 

Re: proposal for a new testing and evaluation component

Posted by Igor Sominsky <so...@gmail.com>.

The vocabulary can be either fully embedded in the configuration file or 
referenced by a URI. Any UIMA annotation feature, or the result of a get-style 
method (getCoveredText, for instance), can be tested for membership in the 
list, so it can be included in or excluded from the extraction.
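
As a rough illustration of the membership test described above (not CFE's actual implementation or FESL syntax), the sketch below checks whether a token's covered text occurs in a vocabulary. The class name VocabularyFeature, the method contains, and the case-insensitive matching are assumptions made for this example only; the entries could just as well be loaded from a file or URI.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch; none of these names come from CFE or FESL.
final class VocabularyFeature {
    private final Set<String> vocabulary;

    // Build the vocabulary from a list of entries. Entries are trimmed and
    // lower-cased, and blank entries are dropped (an assumption of this sketch).
    VocabularyFeature(List<String> entries) {
        this.vocabulary = entries.stream()
                .map(e -> e.trim().toLowerCase(Locale.ROOT))
                .filter(e -> !e.isEmpty())
                .collect(Collectors.toSet());
    }

    // The binary feature: does the covered text occur in the vocabulary?
    boolean contains(String coveredText) {
        return vocabulary.contains(coveredText.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        VocabularyFeature lastNames =
                new VocabularyFeature(List.of("Sominsky", "Goetz", "Schor"));
        for (String token : List.of("Goetz", "wrote", "schor")) {
            System.out.println(token + " -> " + lastNames.contains(token));
        }
    }
}
```

In a real pipeline the string passed to contains would come from the annotation itself, for example from its covered text.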

I am not sure I understand your second question correctly, but let me try 
to answer it. CFE implements the extraction process in two steps. In the first 
step, an annotation that represents a certain concept is located. It can be a 
single-word annotation (uima.tt.TokenAnnotation, for instance) or a custom 
annotation type that holds a group of words in its properties (in an FSArray, 
for instance). In either case, your concept must be represented by a single 
annotation. In the second step, annotations that are in a certain context 
(defined by the configuration file) of your concept annotation are located. For 
example, the configuration file could specify extracting features from the 5 
annotations to the left of the annotation that represents the concept 
(let's say a particular word). The annotations located in the second 
step are the ones the features are extracted from. I hope I 
got your question right.
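
The two steps above can be sketched in plain Java. This is only an illustration under stated assumptions, not CFE code: Token is a hypothetical stand-in for a UIMA annotation, and findConcept, leftContext and tokenize are names invented for this example. In CFE the concept and the window would be specified in the configuration file rather than passed as arguments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the two-step process; not a CFE or UIMA class.
final class LeftContextExtractor {

    // Minimal stand-in for an annotation: character offsets plus covered text.
    static final class Token {
        final int begin;
        final int end;
        final String text;

        Token(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    // Split on whitespace, recording offsets, to get tokens in document order.
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("\\S+").matcher(text);
        while (m.find()) {
            tokens.add(new Token(m.start(), m.end(), m.group()));
        }
        return tokens;
    }

    // Step 1: locate the first token that represents the concept; -1 if absent.
    static int findConcept(List<Token> tokens, String conceptWord) {
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).text.equals(conceptWord)) {
                return i;
            }
        }
        return -1;
    }

    // Step 2: collect the covered text of up to `window` tokens to the left
    // of the concept, in document order.
    static List<String> leftContext(List<Token> tokens, int conceptIndex, int window) {
        List<String> features = new ArrayList<>();
        if (conceptIndex < 0) {
            return features;
        }
        int start = Math.max(0, conceptIndex - window);
        for (int i = start; i < conceptIndex; i++) {
            features.add(tokens.get(i).text);
        }
        return features;
    }

    public static void main(String[] args) {
        List<Token> tokens = tokenize("the quick brown fox jumps over the lazy dog");
        int concept = findConcept(tokens, "over");
        System.out.println(leftContext(tokens, concept, 5));
    }
}
```

When fewer than five tokens precede the concept, the sketch simply returns what is available; how CFE itself handles that boundary case is not stated in this thread.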

Igor

----- Original Message ----- 
From: "Thilo Goetz" <tw...@gmx.de>
To: <ui...@incubator.apache.org>
Sent: Thursday, May 15, 2008 1:07 PM
Subject: Re: proposal for a new testing and evaluation component


> Cool, we absolutely need this!  I was actually about to
> write something like this myself, but now I think I can
> wait a little longer :-)
>
> I have quite a few questions on this, here are just some
> of them:
>
> Can you integrate external resources in the process?  For
> example, I might have a list of last names, and a feature
> might be if a token occurs in that list or not.
>
> I'd like to apply this to learning for individual words
> or word windows.  Is that possible with/supported by your
> tool?
>
> --Thilo

Re: proposal for a new testing and evaluation component

Posted by Thilo Goetz <tw...@gmx.de>.

Cool, we absolutely need this!  I was actually about to
write something like this myself, but now I think I can
wait a little longer :-)

I have quite a few questions on this, here are just some
of them:

Can you integrate external resources in the process?  For
example, I might have a list of last names, and a feature
might be if a token occurs in that list or not.

I'd like to apply this to learning for individual words
or word windows.  Is that possible with/supported by your
tool?

--Thilo

Re: proposal for a new testing and evaluation component

Posted by Marshall Schor <ms...@schor.com>.

I'm starting a vote on this on uima-dev mailing list.

-Marshall

Marshall Schor wrote:
> Michael Tanenblatt wrote:
>> So, is there any actual interest in accepting this into the sandbox?
>> Discussions died down with no resolution.
>>
>> ...m
> Yes, please submit a Jira issue with an attachment and a checksum
> for it.  Then we'll call an official vote on the uima-dev list.
> -Marshall

Re: proposal for a new testing and evaluation component

Posted by Marshall Schor <ms...@schor.com>.

Michael Tanenblatt wrote:
> So, is there any actual interest in accepting this into the sandbox? 
> Discussions died down with no resolution.
>
> ...m
Yes, please submit a Jira issue with an attachment and a checksum 
for it.  Then we'll call an official vote on the uima-dev list. 

-Marshall

Re: proposal for a new testing and evaluation component

Posted by Michael Tanenblatt <sl...@park-slope.net>.

So, is there any actual interest in accepting this into the sandbox?  
Discussions died down with no resolution.

...m

