
Posted to announce@apache.org by Jaroslaw Cwiklik <cw...@apache.org> on 2017/08/31 13:48:09 UTC

[ANNOUNCE] Apache UIMA DUCC 2.2.1 released

The Apache UIMA team is pleased to announce the release of the Apache
UIMA-DUCC version 2.2.1.

The Unstructured Information Management Architecture (UIMA) is a component
framework supporting development, discovery, composition, and deployment of
multi-modal analytics tasked with the analysis of unstructured information.

Apache UIMA is an Apache-licensed open source implementation of the UIMA
specification, which is being developed by a technical committee within
OASIS, a standards organization. The implementation comprises an SDK and
tooling for composing and running analytic components written in Java and
C++, with some support for Perl, Python and TCL.

DUCC stands for Distributed UIMA Cluster Computing. DUCC is a cluster
management system providing tooling, management, and scheduling facilities
to automate the scale-out of applications written to the UIMA framework.
Core UIMA provides a generalized framework for applications that process
unstructured information such as human language, but does not provide a
scale-out mechanism. UIMA-AS provides a scale-out mechanism to distribute
UIMA pipelines over a cluster of computing resources, but does not provide
job or cluster management of the resources.
DUCC defines a formal job model that closely maps to a standard UIMA
pipeline. Around this job model DUCC provides cluster management services
to automate the scale-out of UIMA pipelines over computing clusters.


This release contains a number of improvements and bug fixes. Notable
updates in this release include:

- The userid of a privileged DUCC installation does not have to be "ducc"
- ducc-mon login can be used on systems where users do not have password
  logins
- The DUCC head-node daemons may be moved to another host without breaking
  working applications
- The deployment descriptor for a UIMA-AS service can be loaded from the
  classpath
- Interactive applications run correctly with viaducc (fixed lost inputs)
- Files created by DUCC jobs inherit the permissions of the launching
  shell's umask
- The DUCC performance breakdown for scaled synchronous pipelines is now
  correct
- Fixed Javadoc method headers so that DUCC builds with Java 8
- Fixed the wait logic of JP (Job Process) communication threads when the
  JD (Job Driver) returns no work
- Fixed GC statistics sometimes being unavailable from a remote JP

For a complete list of bugs and improvements included in this release
please see
https://uima.apache.org/d/uima-ducc-2.2.1/issuesFixed/jira-report.html

-- Jerry Cwiklik, for the Apache UIMA development team

Re: uniqueID() function

Posted by Marshall Schor <ms...@schor.com>.

Hi,

The uniqueId() function you found is, as you have noticed, not actually a
method. Instead, it is special syntax that was supported by the
feature-value-path mechanism.

I think this is not what you're looking for.

The best thing for you to do is to design your type system as follows:

1) separate the types into those which you want to store in the database, and
others.  Examples of others might be things like "temporary" types, or types
which are in some sense "derived" and not worth the redundant storage in the DB.

2) for those types you want to store in the DB, add a feature; let's call it
db_unique_id.  Its value can be whatever kind makes the most sense, for
example an integer or a string.

3) Then arrange your code to "set" this when the feature structure is created.
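Steps 2 and 3 can be sketched in plain Java as follows. This is a minimal sketch under stated assumptions, not a UIMA API: the class DbIdAssigner, the per-document key it is constructed with, and the "key/counter" string format are all hypothetical. The UIMA call that would store each value on the new feature structure (for example FeatureStructure.setStringValue on the db_unique_id feature) is left as a comment so the example runs without UIMA on the classpath.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper that produces values for a db_unique_id feature.
 * One instance is created per document; each call to next() returns an id
 * that is unique within that document, and the document key (assumed to be
 * unique, e.g. a report number) makes it unique across documents and CASes.
 */
final class DbIdAssigner {
    private final String documentKey;
    // AtomicLong keeps the counter safe if several threads create FSs for one document.
    private final AtomicLong counter = new AtomicLong();

    DbIdAssigner(String documentKey) {
        if (documentKey == null || documentKey.isEmpty()) {
            throw new IllegalArgumentException("documentKey must be non-empty");
        }
        this.documentKey = documentKey;
    }

    /** Returns the next id for this document: "key/1", "key/2", ... */
    String next() {
        return documentKey + "/" + counter.incrementAndGet();
    }

    public static void main(String[] args) {
        DbIdAssigner ids = new DbIdAssigner("report-17");
        // In an annotator, each value would be set on the db_unique_id feature
        // right after the feature structure is created (assumed usage).
        System.out.println(ids.next()); // report-17/1
        System.out.println(ids.next()); // report-17/2
    }
}
```

Setting the value at creation time, as step 3 suggests, means the id travels with the feature structure and does not depend on how the CAS is later serialized or split.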

---------------

Having said that, there is a more-or-less unique "id" for every feature
structure in the CAS.  Of course, it is not unique across CASes.  Given a
feature structure myFeatureStructure, you can access it using

myFeatureStructure.hashCode()

In UIMA v3, the equivalent is myFeatureStructure._id()
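Because that in-CAS id is only unique within one CAS, one option for a database row key is to pair it with something that identifies the document. The sketch below assumes such a document key is available to the pipeline (for example from the collection reader); the class DbRowKey and the "documentKey#fsId" format are hypothetical. The int stands in for myFeatureStructure.hashCode() (v2) or _id() (v3), and such ids may not be stable if the same document is processed again, which is one reason the dedicated db_unique_id feature above may be preferable.

```java
/**
 * Hypothetical composite row key: (document key, in-CAS feature structure id).
 * The fsId alone is only unique within one CAS; pairing it with a key that
 * identifies the document makes the combination unique across CASes.
 */
final class DbRowKey {
    final String documentKey;
    final int fsId;

    DbRowKey(String documentKey, int fsId) {
        // '#' is the separator, so it must not appear in the document key.
        if (documentKey.indexOf('#') >= 0) {
            throw new IllegalArgumentException("documentKey must not contain '#'");
        }
        this.documentKey = documentKey;
        this.fsId = fsId;
    }

    /** Serializes as "documentKey#fsId" for use as a single DB column value. */
    String format() {
        return documentKey + "#" + fsId;
    }

    /** Inverse of format(); splits on the last '#'. */
    static DbRowKey parse(String s) {
        int sep = s.lastIndexOf('#');
        if (sep < 0) {
            throw new IllegalArgumentException("missing '#': " + s);
        }
        return new DbRowKey(s.substring(0, sep), Integer.parseInt(s.substring(sep + 1)));
    }

    public static void main(String[] args) {
        // 42 is a stand-in for myFeatureStructure.hashCode() or _id().
        DbRowKey key = new DbRowKey("report-17", 42);
        String column = key.format();
        System.out.println(column);                      // report-17#42
        System.out.println(DbRowKey.parse(column).fsId); // 42
    }
}
```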

-Marshall

On 11/1/2017 1:55 PM, Kameron Cole wrote:
> Hello
>
> I am trying to use the uniqueId() function, and find some examples.
> Basically I want to use the CAS unique ID as the unique id Feature for an
> annotation. For example, a police report Annotation would have a Feature
> reportid, which would leverage the uniqueId() .  The ultimate purpose is to
> send the CAS to a database table, and use the reportid as the row's unique
> ID.
>
> I can't find any information on it, except here:
>
> http://uima.apache.org/d/uimaj-2.4.2/apidocs/org/apache/uima/cas/FeatureValuePath.html
>
> Contains CAS Type and Feature objects to represent a feature path of the
> form feature1/.../featureN. Each part that is enclosed within / is referred
> to as "path snippet" below. Also contains the necessary evaluation logic to
> yield the value of the feature path. For leaf snippets, the following
> "special features" are defined:
>       coveredText() can be accessed using evaluateAsString
>       typeName() can be accessed using evaluateAsString
>       fsId() can be accessed using evaluateAsInt. Its result can be used to
>       retrieve an FS from the current LowLevel-CAS.
>       uniqueId() can be accessed using evaluateAsInt. Its result can be
>       used to uniquely identify an FS for a document (even if the document
>       is split over several CAS chunks)
>
> This is deprecated, and replaced with
>
> http://uima.apache.org/d/uimaj-2.4.2/apidocs/org/apache/uima/cas/FeaturePath.html
>
> However, FeaturePath does not have the uniqueID() method
>
> The feature path syntax also allows some built-in functions on the last
> feature path element. Built-in functions are added with a ":" followed by
> the function name. E.g. "/my/path:fsId()". The allowed built-in functions
> are:
>       coveredText()
>       fsId()
>       typeName()
> Built-in functions are only evaluated if getValueAsString() is called.
>
> At least, I don't get it. Can I get an example?  Thanks

uniqueID() function

Posted by Kameron Cole <ka...@us.ibm.com>.

Hello

I am trying to use the uniqueId() function, and find some examples.
Basically I want to use the CAS unique ID as the unique id Feature for an
annotation. For example, a police report Annotation would have a Feature
reportid, which would leverage the uniqueId() .  The ultimate purpose is to
send the CAS to a database table, and use the reportid as the row's unique
ID.

I can't find any information on it, except here:

http://uima.apache.org/d/uimaj-2.4.2/apidocs/org/apache/uima/cas/FeatureValuePath.html

Contains CAS Type and Feature objects to represent a feature path of the
form feature1/.../featureN. Each part that is enclosed within / is referred
to as "path snippet" below. Also contains the necessary evaluation logic to
yield the value of the feature path. For leaf snippets, the following
"special features" are defined:
      coveredText() can be accessed using evaluateAsString
      typeName() can be accessed using evaluateAsString
      fsId() can be accessed using evaluateAsInt. Its result can be used to
      retrieve an FS from the current LowLevel-CAS.
      uniqueId() can be accessed using evaluateAsInt. Its result can be
      used to uniquely identify an FS for a document (even if the document
      is split over several CAS chunks)

This is deprecated, and replaced with

http://uima.apache.org/d/uimaj-2.4.2/apidocs/org/apache/uima/cas/FeaturePath.html

However, FeaturePath does not have the uniqueID() method

The feature path syntax also allows some built-in functions on the last
feature path element. Built-in functions are added with a ":" followed by
the function name. E.g. "/my/path:fsId()". The allowed built-in functions
are:
      coveredText()
      fsId()
      typeName()
Built-in functions are only evaluated if getValueAsString() is called.

At least, I don't get it. Can I get an example?  Thanks