Posted to dev@uima.apache.org by Pablo Duboue <pa...@gmail.com> on 2023/01/06 14:07:12 UTC

UIMA-CPP update

Hello,
Things are looking good so far with the UIMA-CPP update. Until now I had been
compiling it on an old machine to avoid compiler issues.

I have now moved to another machine running Debian testing with GCC 12 and,
after fiddling with some deprecation issues, it compiles again using the
system libraries for Xerces and APR. All the tests pass, so things are looking
good.

It also opens up the possibility of building a Debian package for it, which
can be very handy for Docker installations.

This is what remains (from my perspective):

* Compiling with ActiveMQ (and test it)
* Testing the JNI bit
* Windows build

And these potential extensions:

* Drop UIMACPP_HOME and move to system-wide pkg-config
* Debian package
* GitHub actions for multi-platform compilation
* Mac OS X binaries (which were experimental before)

I still haven't hit any killer issue or bug but it might be lurking,
particularly with the Windows build (I know close to nothing about Windows
building).

But things are looking good and I'm very happy about it.

P

Re: UIMA-CPP update

Posted by Eddie Epstein <ea...@gmail.com>.
Hi Pablo,

Good you are confused about my comments. I'll try to explain my reasoning.

> Much more work, but possibly much more user friendly, would be to add an
> > annotator controller mechanism with RPC support around cassis, and add
> > delta-CAS support to cassis serialization.
> >
>
> I found this part really confusing, I don't understand why you are bringing
> dkpro-cassis into the discussion.


I have in mind that Python is a popular language for analytic processing, or
for being a front end for such processing. Perhaps these analytics could
benefit from using a CAS for all the things a CAS is good for, including
interchange with other UIMA analytics. Python programmers have long
complained about the non-Pythonic nature of the SWIG-based interface to a
CAS, and my assumption is that the cassis interface would be preferred.


> I'm also thinking node.js and PHP are suitable targets, both
> languages I'm involved in NLP efforts [1][2]).
>

As far as I understand, PHP and node.js are not used for deep analytics,
but they are popular for client interfaces and traditional backend services.
It would be awkward for PHP clients to use a CAS interface. Makes more
sense for a PHP client to have a much simpler, likely custom interface
to a CAS-based analytic, perhaps using a PHP to CAS translator on the
analysis service, and for that the SWIG interface to uima-cpp makes sense.
I'll be happy to learn how wrong I am :)

Eddie

Re: UIMA-CPP update

Posted by Richard Eckart de Castilho <re...@apache.org>.
> On 2. Feb 2023, at 12:26, Pablo Duboue <pa...@gmail.com> wrote:
> 
> From my perspective, dkpro-cassis exists to help Apache UIMA Java users
> access Python annotators.

That is one use-case. But people are creative and find others ;)

Another "popular" use-case is to use it to perform data conversions
from some format foo to UIMA CAS XMI or vice versa. Even to convert
between CASes in different type systems. Such conversion code can be
way more convenient to write in Python than in Java.

E.g. the INCEpTION annotation tool can import UIMA CAS files and when
people need to import data from some custom format that INCEpTION does
not support, I recommend people to write a converter using cassis.

Similarly, when people want to analyze XMI CAS data, cassis can be a 
helpful tool to load the data and pack it into pandas frames or other
similar Python frameworks and to visualize it using Streamlit or
similar frameworks that are more easily accessible in Python than in Java.
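
A rough sketch of that load-and-tabulate step follows. To stay
self-contained it does not use cassis: it is a stand-in that reads a small,
hand-written XMI sample with Python's standard `xml.etree` module. The
`demo:Token` type, its namespace URI, the sample text and the
`annotation_rows` helper are all made up for illustration; with cassis the
loading would go through its own XMI and type system loaders instead.

```python
import xml.etree.ElementTree as ET

XMI_NS = "http://www.omg.org/XMI"
CAS_NS = "http:///uima/cas.ecore"

# Simplified, hand-written sample; real CAS XMI files carry more metadata.
# The demo:Token type and its namespace are hypothetical.
SAMPLE_XMI = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
         xmlns:cas="http:///uima/cas.ecore"
         xmlns:demo="http:///org/example/demo.ecore"
         xmi:version="2.0">
  <cas:NULL xmi:id="0"/>
  <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView"
            mimeType="text/plain" sofaString="UIMA runs in C++ too"/>
  <demo:Token xmi:id="2" sofa="1" begin="0" end="4"/>
  <demo:Token xmi:id="3" sofa="1" begin="13" end="16"/>
  <cas:View sofa="1" members="2 3"/>
</xmi:XMI>
"""


def annotation_rows(xmi_text):
    """Return one dict per element that carries begin/end offsets."""
    root = ET.fromstring(xmi_text)
    # Map each Sofa's xmi:id to its document text.
    sofas = {
        sofa.get(f"{{{XMI_NS}}}id"): sofa.get("sofaString", "")
        for sofa in root.iter(f"{{{CAS_NS}}}Sofa")
    }
    rows = []
    for elem in root:
        begin, end = elem.get("begin"), elem.get("end")
        if begin is None or end is None:
            continue  # not an annotation (e.g. cas:NULL, cas:Sofa, cas:View)
        b, e = int(begin), int(end)
        text = sofas.get(elem.get("sofa"), "")
        # Plain slicing is fine for this ASCII sample; offsets in real data
        # may need extra care for characters outside the BMP.
        rows.append({
            "type": elem.tag.split("}")[-1],
            "begin": b,
            "end": e,
            "text": text[b:e],
        })
    return rows


rows = annotation_rows(SAMPLE_XMI)
for row in rows:
    print(row)
```

Each row is a plain dict, so a list like this can be handed to
`pandas.DataFrame(rows)` for further analysis or plotting.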

Cheers,

-- Richard

Re: UIMA-CPP update

Posted by Pablo Duboue <pa...@gmail.com>.
Hi!

I have some great news, but I'll send it in a longer email.

On Sun, Jun 4, 2023 at 9:26 AM Richard Eckart de Castilho <re...@apache.org>
wrote:

> Hi,
>
> > On 10. Apr 2023, at 10:32, Pablo Duboue <pa...@gmail.com> wrote:
> >
> > Last but not least, I have been working on my fork [8]. I see most
> > commits have JIRA issues associated with them. When the time is ready
> > I make a pull request from my project? What would you suggest?
>
> I would say best work directly on the official repo instead of working on
> your own branch. Activity is best visible when performed on the main repo.
>

Very good point. It turns out I had not registered with GitBox yet; I have
now done so and just got access to the repo.


> We do not use JIRA anymore but have switched to using GitHub issues
> exclusively.
>

This is great news!


> The uimacpp repo does not seem to have issues enabled atm because it was
> scheduled for going out of business. However, you can enable those by
> setting
> the `github/features/issues` key in the `.asf.yaml` file. See here
>
>   https://github.com/apache/uima-uimaj/blob/main/.asf.yaml


Yes, thanks for that. I have enabled it now.


> As for things like timelines and such - I usually work without those ;)
> I do work as work comes in and every once in a while, I do a release.
> If you wish to do release planning, I'd say that's completely up to you.
>

Very sane approach.

P

Re: UIMA-CPP update

Posted by Richard Eckart de Castilho <re...@apache.org>.
Hi,

> On 10. Apr 2023, at 10:32, Pablo Duboue <pa...@gmail.com> wrote:
> 
> Last but not least, I have been working on my fork [8]. I see most
> commits have JIRA issues associated with them. When the time is ready
> I make a pull request from my project? What would you suggest?

I would say best work directly on the official repo instead of working on your own branch. Activity is best visible when performed on the main repo.

We do not use JIRA anymore but have switched to using GitHub issues exclusively.

The uimacpp repo does not seem to have issues enabled atm because it was
scheduled for going out of business. However, you can enable those by setting
the `github/features/issues` key in the `.asf.yaml` file. See here

  https://github.com/apache/uima-uimaj/blob/main/.asf.yaml

As for things like timelines and such - I usually work without those ;)
I do work as work comes in and every once in a while, I do a release.
If you wish to do release planning, I'd say that's completely up to you.

Cheers,

-- Richard

Re: UIMA-CPP update

Posted by Pablo Duboue <pa...@gmail.com>.
On Thu, Apr 6, 2023 at 11:47 AM Richard Eckart de Castilho
<re...@apache.org> wrote:
>
> Hi Pablo,
>
> how are things going? Are there any blockers you might need help with?

Hello!

All is good. Too good actually (since last time we talked I started 3
new contracts, trying to see if I can make one of them somewhat
related to UIMA-CPP).

Yes, there are a number of things that'd be good to discuss here:

* The main new work I have done is to write a pkg-config file for
UIMA-CPP that is fed from automake (more on this below).
* I looked at the existing documentation, the README.4bin [1] file
says that only linux is supported (the exact wording is "MacOS and
Windows versions are delayed pending user requests"). If that is the
case, we are much closer to making a release and I can reuse that
wording on the README.md.
* Went through the bug reports [2]. Most of them relate to the MacOS
version, to the framework failing to compile on newer systems, or to
UIMA-AS, which we're not working on (feel free to assign the bugs to
me, or let me know if I can do that myself and I'll figure out JIRA a
bit).
* JIRA and the code in the main branch point to a 3.0.0 release. I
have scaled that back to a 2.5.0 release, as I'm concerned a 3.0.0
version number would lead users to assume it is on par with UIMA3.
* The Release Notes [3] contain the following bit:

>> Installing UIMACPP SDK as a system-wide shared library is discouraged since we do not
>> have support for parallel versions.  The include directory does not have version number and
>>  there cannot be multiple versions of executables runAECpp and deployCppService.

As I'm working on getting UIMA-CPP to build with system dependencies,
this is a good moment to talk about the changes to the world since
UIMA CPP came to be:

Most programs these days run in containers. It seems to me that it
makes more sense to ship a UIMA-CPP that will run on a particular
Linux install and/or a base Docker image.

That way we ship the UIMA-CPP binary library, include files, etc.
installed as system files. There is no longer any need for UIMACPP_HOME
to be defined, and we can use existing techniques (like pkg-config [4])
to find dependencies and compiler command-line arguments. This allows
us to do things like

  g++  `pkg-config uima --cflags` -c DaveAnnotator.cpp

This changes the concept of what the SDK _is_, as it is now tied to
different underlying releases of the OS. Maybe we can use github
actions [5] to support a few distros without extra effort. But let's
discuss what other people think and where to take it.
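
One way to picture a pkg-config file that is fed from the build system:
the repository carries a template with `@VAR@` placeholders, and configure
fills them in at build time. The Python sketch below only mimics that
substitution step so that the resulting file format is visible. It is not
the project's actual template, and the template contents, the `-luima`
library name, the `/opt/uimacpp` prefix and the `render_pc` helper are
assumptions made for illustration.

```python
import re

# Hypothetical template; the real file in the repository may differ.
# ${...} variables are left for pkg-config to expand at query time.
PC_TEMPLATE = """\
prefix=@prefix@
exec_prefix=${prefix}
libdir=${exec_prefix}/lib
includedir=${prefix}/include

Name: uima
Description: UIMA C++ framework (illustrative metadata)
Version: @VERSION@
Cflags: -I${includedir}
Libs: -L${libdir} -luima
"""


def render_pc(template, values):
    """Replace every @NAME@ placeholder with values[NAME].

    A missing name raises KeyError, so an unfilled placeholder cannot
    silently end up in the generated file.
    """
    def substitute(match):
        return values[match.group(1)]

    return re.sub(r"@([A-Za-z_][A-Za-z0-9_]*)@", substitute, template)


pc_text = render_pc(PC_TEMPLATE, {"prefix": "/opt/uimacpp", "VERSION": "2.5.0"})
print(pc_text)
```

With a file like this saved as `uima.pc` on pkg-config's search path, a
query such as `pkg-config uima --cflags` would be expected to print the
`Cflags` value with its variables expanded.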

I'm also thinking about how useful this release will be in a world
where UIMA2 has been retired. So I'm thinking about adding a
non-trivial example wrapping the ONNX C++ runtime [6]. That could be
fun (and useful) (and related to some of my current contracts) but
will further delay the release.

For a future release, I really want to work on aggregate AEs support.
There was an earlier comment when discussing the retirement of
UIMA-CPP in that regard [7] and it has stayed with me. But that'd be
too much for this release.

There is a bit of busy work ahead cleaning up the existing
documentation files. That also needs to be factored in for a timeline
for the release.

Last but not least, I have been working on my fork [8]. I see most
commits have JIRA issues associated with them. When the time is ready
I make a pull request from my project? What would you suggest?

P



[1] https://github.com/apache/uima-uimacpp/blob/main/README.4bin
[2] https://issues.apache.org/jira/browse/UIMA-6175?jql=project%20%3D%20UIMA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20%3D%20%22C%2B%2B%20Framework%22
[3] https://github.com/apache/uima-uimacpp/blob/main/RELEASE_NOTES.html
 (yes, that is a MS Word HTML file, I'll change it to markdown for the
release.)
[4] https://people.freedesktop.org/~dbn/pkg-config-guide.html
[5] https://docs.github.com/en/actions
[6] https://onnxruntime.ai/docs/build/inferencing.html#supported-architectures-and-build-environments
[7] https://lists.apache.org/thread/f1r3sghgn2oqhvzz27y26zg6j3olv8qq
[8] https://github.com/DrDub/uima-uimacpp

Re: UIMA-CPP update

Posted by Richard Eckart de Castilho <re...@apache.org>.
Hi Pablo,

how are things going? Are there any blockers you might need help with?

Cheers,

-- Richard


Re: UIMA-CPP update

Posted by Pablo Duboue <pa...@gmail.com>.
On Wed, Mar 1, 2023 at 9:49 PM Richard Eckart de Castilho <re...@apache.org>
wrote:

>
> > On 1. Mar 2023, at 23:06, Pablo Duboue <pa...@gmail.com> wrote:
> >
> > ICLA and CCLA submitted.
>
> Great. I believe we should get notified about the arrival of the ICLA
> assuming that you mentioned "UIMA" in the "notify project" line of the
> ICLA.
>

All good, I did include UIMA and I got an email acknowledging
receipt of it. Things are moving.

P

Re: UIMA-CPP update

Posted by Richard Eckart de Castilho <re...@apache.org>.
> On 1. Mar 2023, at 23:06, Pablo Duboue <pa...@gmail.com> wrote:
> 
> ICLA and CCLA submitted.

Great. I believe we should get notified about the arrival of the ICLA
assuming that you mentioned "UIMA" in the "notify project" line of the
ICLA.

Cheers,

-- Richard


Re: UIMA-CPP update

Posted by Pablo Duboue <pa...@gmail.com>.
ICLA and CCLA submitted.

Looking at https://infra.apache.org/new-committers-guide.html

P

On Wed, Mar 1, 2023 at 5:46 AM Pablo Duboue <pa...@gmail.com> wrote:

>
>
> On Tue, Feb 28, 2023 at 11:32 PM Richard Eckart de Castilho <
> rec@apache.org> wrote:
>
>> Hi Pablo,
>>
>
> Hi!
>
> I am sorry, I have been teaching this month and was hosed. Thanks so much
> for the ping.
>
>> have you been able yet to check if you already have an ICLA on file?
>
>
> I believe I filed one with Marshall, but let's do it again just to make
> sure. I'm employed by my own company so a CCLA wouldn't be needed but
> filing one might be good in case I sell my company.
>
>
>> The 2023 Q1 board report for the UIMA project is due next week. I noted
>> in the draft that the UIMA C++ maintainer role is in the process of being
>> transferred. The report is due next week Wednesday. Do you think we might
>> be able to complete the transfer until then?
>>
>
> Yes, I have the time to make it happen. I'll look at the ICLA link and
> follow up with it.
>
>
>> What we would need:
>>
>> * ICLA
>> * Establishment of committer role in the project and associated
>> permissions
>>   in the ASF infrastructure
>>
>> Looking forward to your response!
>>
>
> Let's do it. Thanks again!
>
> P
>
>

Re: UIMA-CPP update

Posted by Pablo Duboue <pa...@gmail.com>.
On Tue, Feb 28, 2023 at 11:32 PM Richard Eckart de Castilho <re...@apache.org>
wrote:

> Hi Pablo,
>

Hi!

I am sorry, I have been teaching this month and was hosed. Thanks so much
for the ping.

> have you been able yet to check if you already have an ICLA on file?


I believe I filed one with Marshall, but let's do it again just to make
sure. I'm employed by my own company so a CCLA wouldn't be needed but
filing one might be good in case I sell my company.


> The 2023 Q1 board report for the UIMA project is due next week. I noted
> in the draft that the UIMA C++ maintainer role is in the process of being
> transferred. The report is due next week Wednesday. Do you think we might
> be able to complete the transfer until then?
>

Yes, I have the time to make it happen. I'll look at the ICLA link and
follow up with it.


> What we would need:
>
> * ICLA
> * Establishment of committer role in the project and associated permissions
>   in the ASF infrastructure
>
> Looking forward to your response!
>

Let's do it. Thanks again!

P

Re: UIMA-CPP update

Posted by Richard Eckart de Castilho <re...@apache.org>.
Hi Pablo,

have you been able yet to check if you already have an ICLA on file?

The 2023 Q1 board report for the UIMA project is due next week. I noted
in the draft that the UIMA C++ maintainer role is in the process of being
transferred. The report is due next week Wednesday. Do you think we might
be able to complete the transfer until then?

What we would need:

* ICLA
* Establishment of committer role in the project and associated permissions
  in the ASF infrastructure

Looking forward to your response!

Cheers,

-- Richard


Re: UIMA-CPP update

Posted by Richard Eckart de Castilho <re...@apache.org>.
Hi Pablo,

> On 2. Feb 2023, at 12:26, Pablo Duboue <pa...@gmail.com> wrote:
> 
> With that understanding, I can say I can maintain UIMA-CPP and make a new
> release for Linux. We can discuss what exactly should go into that release
> and then I can set a potential date. UIMA-CPP is a great codebase, I would
> love to see it used more widely.

Excellent! 

Did you already file an ICLA with the Apache Foundation?

  https://www.apache.org/licenses/contributor-agreements.html

Please also consider if you need to file a CCLA - this is important in particular
if you make your contributions during work time and/or if the project potentially
touches on the interest of your employer(s). In general, it is a good idea to
obtain a CCLA and put it on file to avoid potential discussions with employers at
a later stage.

Cheers,

-- Richard


Re: UIMA-CPP update

Posted by Pablo Duboue <pa...@gmail.com>.
Hi Eddie!

On Mon, Jan 9, 2023 at 11:14 AM Eddie Epstein <ea...@gmail.com> wrote:

> > What is ActiveMQ support enabling in UIMA-CPP?
> >
>
> > Is it for interaction with UIMA-AS (which is also currently unmaintained
> > and has been dropped from the website as part of dropping all the other
> > v2-only code)?
> >
>
> Yea, UIMA-AS. From my perspective (being away from the code for a few
> years) If activeMQ is not desirable, I think it would be straightforward to
> use the XmiCas management from UIMA-AS with a different RPC mechanism to
> utilize a standalone python annotator based on UIMA-CPP. This would support
> a python or C++ client calling a remote python or C++ annotator service.
>

I have studied the code and spent some time looking at the state of message
queues in C++. ActiveMQ is in a state of transition and its C++ library is
not shipped by Linux distributions (at least the ones I checked). The
established player in the space seems to be RabbitMQ.

The good news is that we're talking about only 3,000 lines of code, so
putting them aside for the time being is not an issue.


> Much more work, but possibly much more user friendly, would be to add an
> annotator controller mechanism with RPC support around cassis, and add
> delta-CAS support to cassis serialization.
>

I found this part really confusing, I don't understand why you are bringing
dkpro-cassis into the discussion. It has different goals than improving the
existing interface of UIMA-CPP to scripting languages (Python being one
example, I'm also thinking node.js and PHP are suitable targets, both
languages I'm involved in NLP efforts [1][2]).

From my perspective, dkpro-cassis exists to help Apache UIMA Java users
access Python annotators. The Python Apache UIMA-CPP packages will help
Python users make better NLP systems. I'll be happy to take requests from
dkpro-cassis that can help it fulfill its goals. But Python Apache UIMA-CPP
builds on the SWIG adapters that predate dkpro-cassis and it is to be an
independent effort.

With that understanding, I can say I can maintain UIMA-CPP and make a new
release for Linux. We can discuss what exactly should go into that release
and then I can set a potential date. UIMA-CPP is a great codebase, I would
love to see it used more widely.

P



[1] https://github.com/NaturalNode/natural
[2] https://github.com/RubixML/ML

Re: UIMA-CPP update

Posted by Eddie Epstein <ea...@gmail.com>.
> What is ActiveMQ support enabling in UIMA-CPP?
>

> Is it for interaction with UIMA-AS (which is also currently unmaintained
> and has been dropped from the website as part of dropping all the other
> v2-only code)?
>

Yea, UIMA-AS. From my perspective (being away from the code for a few
years) If activeMQ is not desirable, I think it would be straightforward to
use the XmiCas management from UIMA-AS with a different RPC mechanism to
utilize a standalone python annotator based on UIMA-CPP. This would support
a python or C++ client calling a remote python or C++ annotator service.

Much more work, but possibly much more user friendly, would be to add an
annotator controller mechanism with RPC support around cassis, and add
delta-CAS support to cassis serialization.

Eddie

Re: UIMA-CPP update

Posted by Richard Eckart de Castilho <re...@apache.org>.
Hey, that sounds great!

> On 6. Jan 2023, at 15:07, Pablo Duboue <pa...@gmail.com> wrote:
> 
> * Compiling with ActiveMQ (and test it)

What is ActiveMQ support enabling in UIMA-CPP? 

Is it for interaction with UIMA-AS (which is also currently unmaintained and has been dropped from the website as part of dropping all the other v2-only code)?

Cheers,

-- Richard