Posted to user@uima.apache.org by Richard Eckart de Castilho <re...@apache.org> on 2018/12/15 22:58:29 UTC

Announcing DKPro cassis - UIMA CAS implementation in Python with XMI support

Hi all,

over at DKPro, a new library has been created which might be of interest to some of the people on this list.
I am quoting the description from its GitHub repo below in the mail. More information can be found at

  https://github.com/dkpro/dkpro-cassis

The library is also distributed via PyPI
 
  https://pypi.org/project/dkpro-cassis/

Cheers,

-- Richard

----

# dkpro-cassis

DKPro cassis (pronunciation: [ka.sis]) provides a pure-Python implementation of the Common Analysis System (CAS) as defined by the UIMA framework. The CAS is a data structure representing an object to be enriched with annotations (the so-called Subject of Analysis, SofA for short).

This library enables Python programs to create and manipulate CAS objects and their associated type systems, and to load and save CAS objects in the CAS XMI XML representation. In particular, this can ease the integration of Python-based Natural Language Processing libraries (e.g. spaCy or NLTK) and Machine Learning libraries (e.g. scikit-learn or Keras) into UIMA-based text analysis workflows.
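To give a rough idea of what the XMI representation looks like, the following sketch reads a small hand-written XMI fragment using only the Python standard library. It does not use cassis and does not show its API. The fragment is simplified, and the `custom` namespace and its `Token` type are made up for illustration; XMI files written by UIMA typically contain more elements.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified XMI fragment. The "custom" namespace and the
# Token type are hypothetical and exist only for this illustration.
XMI = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
         xmlns:cas="http:///uima/cas.ecore"
         xmlns:custom="http:///example/custom.ecore"
         xmi:version="2.0">
  <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView"
            mimeType="text" sofaString="Joe waited for the train ."/>
  <custom:Token xmi:id="2" sofa="1" begin="0" end="3"/>
  <custom:Token xmi:id="3" sofa="1" begin="4" end="10"/>
  <cas:View sofa="1" members="2 3"/>
</xmi:XMI>
"""

# Prefix-to-URI map used to resolve the prefixed names in find/findall.
NS = {
    "cas": "http:///uima/cas.ecore",
    "custom": "http:///example/custom.ecore",
}

root = ET.fromstring(XMI)

# The Sofa element carries the document text as an XML attribute.
sofa = root.find("cas:Sofa", NS)
text = sofa.get("sofaString")

# Each annotation stores its features (here: begin/end offsets) as attributes.
tokens = [
    text[int(t.get("begin")):int(t.get("end"))]
    for t in root.findall("custom:Token", NS)
]
print(tokens)
```

Note that all feature values arrive as strings and have to be converted (here with `int`) before they can be used as offsets.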

An example of cassis in action is the spacy recommender for INCEpTION, which wraps the spaCy NLP library as a web service that can be used in conjunction with the INCEpTION text annotation platform to automatically generate annotation suggestions.

## Features

Currently supported features are:

• Text SofAs
• Deserializing/serializing UIMA CAS from/to XMI
• Deserializing/serializing type systems from/to XML
• Selecting annotations, selecting covered annotations, adding annotations
• Type inheritance
• Multiple SofA support
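To illustrate what selecting annotations, selecting covered annotations and type inheritance mean conceptually, here is a minimal sketch in plain Python. It is not the cassis API: `SimpleCas`, `Annotation`, `Token` and `Sentence` are hypothetical names invented for this example, and type inheritance is modelled simply through Python subclassing.

```python
import re
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical base type: a span over the SofA text."""
    begin: int
    end: int


class Token(Annotation):
    """Hypothetical subtype of Annotation."""


class Sentence(Annotation):
    """Hypothetical subtype of Annotation."""


class SimpleCas:
    """Toy stand-in for a CAS with a single text SofA."""

    def __init__(self, sofa_string):
        self.sofa_string = sofa_string
        self._annotations = []

    def add(self, annotation):
        self._annotations.append(annotation)

    def select(self, type_):
        # isinstance also matches subtypes, which gives type inheritance.
        found = [a for a in self._annotations if isinstance(a, type_)]
        return sorted(found, key=lambda a: (a.begin, -a.end))

    def select_covered(self, type_, covering):
        # Annotations of type_ lying fully inside the covering annotation.
        return [
            a for a in self.select(type_)
            if a is not covering
            and covering.begin <= a.begin and a.end <= covering.end
        ]

    def covered_text(self, annotation):
        return self.sofa_string[annotation.begin:annotation.end]


cas = SimpleCas("Joe waited for the train . The train was late .")

# Whitespace tokenization, just to obtain some offsets.
for match in re.finditer(r"\S+", cas.sofa_string):
    cas.add(Token(match.start(), match.end()))

first = Sentence(0, 26)
second = Sentence(27, 47)
cas.add(first)
cas.add(second)

covered = [cas.covered_text(t) for t in cas.select_covered(Token, second)]
print(covered)
```

Selecting by the base type, `cas.select(Annotation)`, returns both tokens and sentences in this sketch, whereas `cas.select(Sentence)` returns only the two sentences.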

Some features are still under development, e.g.

• feature encoding as XML elements (right now only XML attributes work)
• proper type checking
• XML/XMI schema validation
• type unmarshalling from string to the actual type specified in the type system
• reference, array and list features