Posted to dev@uima.apache.org by Benedict Holland <be...@gmail.com> on 2017/09/27 17:27:12 UTC

UIMA on top of Spark example

Hello all,

My name is Ben Holland and I am a data scientist at Abt Associates. We are
developing a scalable NLP engine and selected UIMA with OpenNLP as our
technology stack. We also wanted to run it on Spark.

The huge draw for me was the awesome set of examples and documentation that
UIMA provided so that I could easily get up and running. With that in mind,
I am working with my company to put together code that I can give to the
UIMA team using only open source libraries (specifically UIMA, Hadoop,
Spark, and OpenNLP). I want to provide you with a fully functional example
developed in Eclipse.

I will need a contact within the UIMA team at Apache. If someone could
please get back to me on this, I would be most grateful.

The goal of this process is to entirely mimic the CPE (Collection Processing
Engine) using the UIMA XML descriptor files over a Spark cluster. I do not
rely on uimaFIT or any third-party libraries apart from the JDBC driver. For
bonus points, I hooked this up to a database: the application reads text
from it, populates N CAS objects with database values, processes the text,
and saves particularly interesting text back to the database. Currently, I
pull out names.
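
For readers who want a feel for the shape of such a pipeline, here is a
minimal plain-Java sketch of the per-partition pattern described above. It
is not the code offered in this message and it uses neither UIMA nor Spark:
the Analyzer interface, the CapitalizedPairAnalyzer class, and the nested
lists standing in for cluster partitions are all hypothetical stand-ins
chosen so the example runs with the JDK alone. The assumption it illustrates
is that an expensive analyzer is built once per partition and then reused
for every document in that partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PartitionedNameExtraction {

    /** Hypothetical stand-in for an analysis engine built from a descriptor. */
    interface Analyzer {
        List<String> findNames(String documentText);
    }

    /**
     * Toy analyzer (an assumption for illustration only): treats two
     * consecutive capitalized words as a name. A real pipeline would
     * delegate to an NLP name finder instead of a regular expression.
     */
    static class CapitalizedPairAnalyzer implements Analyzer {
        private static final Pattern NAME =
                Pattern.compile("\\b[A-Z][a-z]+ [A-Z][a-z]+\\b");

        @Override
        public List<String> findNames(String documentText) {
            List<String> names = new ArrayList<>();
            Matcher matcher = NAME.matcher(documentText);
            while (matcher.find()) {
                names.add(matcher.group());
            }
            return names;
        }
    }

    /** Builds one analyzer for the partition and reuses it for each document. */
    static List<String> processPartition(List<String> documents,
                                         Supplier<Analyzer> analyzerFactory) {
        Analyzer analyzer = analyzerFactory.get(); // created once per partition
        List<String> results = new ArrayList<>();
        for (String document : documents) {
            results.addAll(analyzer.findNames(document));
        }
        return results;
    }

    /** Runs every partition in turn; a cluster would run them in parallel. */
    static List<String> run(List<List<String>> partitions,
                            Supplier<Analyzer> analyzerFactory) {
        List<String> all = new ArrayList<>();
        for (List<String> partition : partitions) {
            all.addAll(processPartition(partition, analyzerFactory));
        }
        return all;
    }

    public static void main(String[] args) {
        // In-memory rows stand in for text read from a database.
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("Ben Holland wrote to the list.", "no names here"),
                Arrays.asList("A reply came from Marshall Schor."));
        System.out.println(run(partitions, CapitalizedPairAnalyzer::new));
    }
}
```

Passing a factory rather than a ready-made analyzer keeps construction
inside the partition, which matters when the analyzer is costly to build or
cannot be serialized and shipped between machines.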

Why am I coming to you? This is a very simple application. It really is a
proof-of-concept example, but it is enough to put the architecture in place
so that it can be expanded.

I hope this interests you. I found it fascinating to work on this.

BTW, you should all feel extremely proud of your work. I don't make these
offers often, but the UIMA documentation, architecture, and code
readability/stability are incredible. Within a few months, we were able to
get an NLP engine into a process chain. I am very impressed.

Thank you all so much,
~Ben

Re: UIMA on top of Spark example

Posted by Marshall Schor <ms...@schor.com>.

This sounds like it would be a valuable addition.

I would be happy to help you with it where I can; others on the team also may be
interested.

-Marshall Schor


Re: UIMA on top of Spark example

Posted by Richard Eckart de Castilho <re...@apache.org>.

Hi Benedict,

> On 27.09.2017, at 19:27, Benedict Holland <be...@gmail.com> wrote:
> 
> The huge draw for me was the awesome set of examples and documentation that
> UIMA provided so that I could easily get up and running. With that in mind,
> I am working with my company to put together code that I can give to the
> UIMA team using only open source libraries (specifically UIMA, Hadoop,
> Spark, and OpenNLP). I want to provide you with a fully functional example
> developed in eclipse.
> 
> I will need a contact within the UIMA team at Apache. If someone could
> please get back to me on this, I would be most grateful.

It is great to hear that you have a successful project based on UIMA!

And of course we are always grateful for any kind of contribution.

The Apache Software Foundation is a community of communities which values
transparency and openness. Communication is usually handled via the users
and developers mailing lists of the respective projects, such as UIMA, or
via Jira or other issue trackers whose activity is also reflected on the
mailing lists and archived. In fact, a saying at Apache goes "if it didn't
happen on the mailing list, it didn't happen".

Thus, the best way to contact and communicate with any Apache community
(including the developer part of the respective community) is via
the mailing lists.

So whatever you wish to discuss: we are here!

Best,

-- Richard

P.S.: If you wish to learn more about the way the ASF works, have a look
here: https://www.apache.org/foundation/how-it-works.html