Posted to user@uima.apache.org by Benedict Holland <be...@gmail.com> on 2017/09/26 19:02:21 UTC

UIMA on Spark mimicking CPE pipelines

Hello all,

I have a working application that essentially implements the CPE within a
Spark context. The best part about this is that it does not use uimaFIT or
any third-party applications. It simply uses Hadoop, Spark, UIMA, and
OpenNLP.

Users are able to configure, design, and build the UIMA pipeline using all
of the Eclipse XML plugin applications. Instead of running the application
via the CPE.process() driver from a main class, it runs from the
foreach() function on the Dataset<Row> object.
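
As a rough illustration of the pattern described above, here is a minimal
sketch in plain Java. It uses no Spark or UIMA classes: TextAnalyzer,
buildAnalyzer(), and processPartition() are hypothetical stand-ins, and
building the engine once per partition is an assumption, since nothing here
says how the real application manages engine instances. In a real job the
loop body would live in the function handed to foreach() (or to
Dataset.foreachPartition(), which Spark also provides).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

final class PartitionPipelineSketch {

    // Hypothetical stand-in for a configured UIMA analysis engine (not a UIMA type).
    interface TextAnalyzer {
        String analyze(String text);
    }

    // Counts engine constructions so the once-per-partition behaviour is visible.
    static int analyzersBuilt = 0;

    // Stand-in for producing an engine from an XML descriptor, which is
    // assumed to be expensive in a real pipeline.
    static TextAnalyzer buildAnalyzer() {
        analyzersBuilt++;
        return text -> text.trim().toUpperCase(Locale.ROOT);
    }

    // What the per-partition function body might do: build the engine once,
    // then run every row's text through it and collect the results.
    static List<String> processPartition(Iterator<String> rows) {
        TextAnalyzer analyzer = buildAnalyzer();
        List<String> results = new ArrayList<>();
        while (rows.hasNext()) {
            results.add(analyzer.analyze(rows.next()));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> partition = Arrays.asList(" first document ", "second document");
        System.out.println(processPartition(partition.iterator()));
        System.out.println("engines built: " + analyzersBuilt);
    }
}
```

The point of the design is that the costly setup happens once per batch of
rows rather than once per row; the toy "analysis" here only trims and
upper-cases the text.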

Oh also, it plugs into a database to get the text and to write results.

Would the UIMA community be interested in getting a working example put
together? If so, please feel free to contact me. I think this could be an
excellent example of what people would like to use and your examples are
particularly good.

Thanks,
~Ben

Re: UIMA on Spark mimicking CPE pipelines

Posted by Benedict Holland <be...@gmail.com>.

That is a great suggestion. I will add it to the list of project tasks
since that would also be a smart extension to get working soon.

Thanks,
~Ben

On Thu, Sep 28, 2017 at 12:24 PM, Nicolas Paris <ni...@gmail.com> wrote:

> hi ben
>
> you can mimic a YARN instance by creating a slave and a master. This
> would confirm that no serialization problems are involved.

Re: UIMA on Spark mimicking CPE pipelines

Posted by Nicolas Paris <ni...@gmail.com>.

hi ben

you can mimic a YARN instance by creating a slave and a master. This
would confirm that no serialization problems are involved.
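
Serialization problems of this kind can often be caught locally as well,
before any cluster is involved, by round-tripping the task object through
Java serialization (which Spark uses by default to ship functions to
executors). A minimal sketch follows; RowTask and BrokenTask are
hypothetical classes invented for the illustration, and the descriptor path
is a placeholder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializationCheck {

    // Hypothetical task that carries only serializable configuration.
    static final class RowTask implements Serializable {
        private static final long serialVersionUID = 1L;
        final String descriptorPath;
        RowTask(String descriptorPath) { this.descriptorPath = descriptorPath; }
    }

    // Hypothetical task that captures a non-serializable object, standing in
    // for a live engine or an open database connection.
    static final class BrokenTask implements Serializable {
        private static final long serialVersionUID = 1L;
        final Object liveResource = new Object();
    }

    // Returns true if the value can be written and read back with Java serialization.
    static boolean survivesRoundTrip(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
            }
            return true;
        } catch (IOException | ClassNotFoundException e) {
            // NotSerializableException is an IOException and lands here.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("RowTask ok: " + survivesRoundTrip(new RowTask("desc/pipeline.xml")));
        System.out.println("BrokenTask ok: " + survivesRoundTrip(new BrokenTask()));
    }
}
```

A check like this does not replace a run against a real master and worker,
but it flags the common case of a task object holding on to something that
cannot be shipped.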

On 28 Sept. 2017 at 16:55, Benedict Holland wrote:
> Hello All,
> 
> It does, in fact, look like it works with standalone instances. We don't
> have an environment to test with YARN, but given how it works, it looks
> like it should work fine. The only caveat is that each node will need
> access to the database that the CPE runs over. I was actually thinking
> about building the Dataset<Row> collection by calling the CPE getNext()
> method until hasNext() returns false, but I think that will cause memory
> problems with huge databases.
> 
> Hopefully, I will have more information on exactly what I can release over
> the coming days. I am pushing to provide a minimal working example
> with a MySQL schema and a small setup guide.
> 
> ~Ben

Re: UIMA on Spark mimicking CPE pipelines

Posted by Benedict Holland <be...@gmail.com>.

Hello All,

It does, in fact, look like it works with standalone instances. We don't
have an environment to test with YARN, but given how it works, it looks
like it should work fine. The only caveat is that each node will need
access to the database that the CPE runs over. I was actually thinking
about building the Dataset<Row> collection by calling the CPE getNext()
method until hasNext() returns false, but I think that will cause memory
problems with huge databases.
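
The memory concern can be sketched independently of Spark: rather than
draining a reader into one collection, pull bounded batches so that only one
batch is resident at a time. BatchingIterator below is a hypothetical helper
over a plain java.util.Iterator, which stands in for the reader's
hasNext()/getNext() loop; it is not part of UIMA or Spark, and the batch size
is arbitrary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical helper: groups a stream of items into fixed-size batches lazily.
final class BatchingIterator<T> implements Iterator<List<T>> {

    private final Iterator<T> source;
    private final int batchSize;

    BatchingIterator(Iterator<T> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.source = source;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    // Reads at most batchSize items from the source; the last batch may be shorter.
    @Override
    public List<T> next() {
        if (!source.hasNext()) {
            throw new NoSuchElementException();
        }
        List<T> batch = new ArrayList<>(batchSize);
        while (batch.size() < batchSize && source.hasNext()) {
            batch.add(source.next());
        }
        return batch;
    }

    public static void main(String[] args) {
        Iterator<String> documents = Arrays.asList("d1", "d2", "d3", "d4", "d5").iterator();
        BatchingIterator<String> batches = new BatchingIterator<>(documents, 2);
        while (batches.hasNext()) {
            // Each batch could be handed off for processing and then discarded.
            System.out.println(batches.next());
        }
    }
}
```

Whether batching on the driver or letting each executor read its own slice
of the database is the better fit depends on the setup; this only shows the
bounded-memory idea.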

Hopefully, I will have more information on exactly what I can release over
the coming days. I am pushing to provide a minimal working example
with a MySQL schema and a small setup guide.

~Ben

On Thu, Sep 28, 2017 at 3:29 AM, Nicolas Paris <ni...@gmail.com> wrote:

> Hey ben
>
> thanks for the feedback, this looks like an interesting approach.
> Have you validated your approach on both standalone and YARN Spark
> instances?
>
> thanks

Re: UIMA on Spark mimicking CPE pipelines

Posted by Nicolas Paris <ni...@gmail.com>.

Hey ben

thanks for the feedback, this looks like an interesting approach.
Have you validated your approach on both standalone and YARN Spark
instances?

thanks

Re: UIMA on Spark mimicking CPE pipelines

Posted by Hugues de Mazancourt <hu...@mazancourt.com>.

Hi,

I would be very interested also. We are working with both UIMA and Spark, but the two are not directly connected.
Some insight into how this could be done would certainly open up some possibilities.

Best,


Hugues de Mazancourt



> On 27 Sept. 2017 at 19:10, Benedict Holland <be...@gmail.com> wrote:
> 
> Hello All,
> 
> I am very happy to hear that there is interest in this. I work at a for-profit
> company, but we have a process to release full working examples of this.
> We call it technical dissemination. I will work through my organization and
> hopefully provide a bit more than a simple driver.
> 
> Thanks,
> ~Ben

Re: UIMA on Spark mimicking CPE pipelines

Posted by Benedict Holland <be...@gmail.com>.

Hello All,

I am very happy to hear that there is interest in this. I work at a for-profit
company, but we have a process to release full working examples of this.
We call it technical dissemination. I will work through my organization and
hopefully provide a bit more than a simple driver.

Thanks,
~Ben



On Wed, Sep 27, 2017 at 6:07 AM, Benjamin De Boe <
Benjamin.DeBoe@intersystems.com> wrote:

> Hi Benedict,
>
> I'd be very interested to see an example of this, as we've been playing
> with the very same idea, but haven't yet gotten to any actual trial (and
> error).
>
> Many thanks in advance,
> benjamin
>
>
> Benjamin De Boe
> Product Manager | InterSystems
> T: +32 2 464 97 33 | M: +32 495 19 19 27
> http://www.intersystems.com/

RE: UIMA on Spark mimicking CPE pipelines

Posted by Benjamin De Boe <Be...@intersystems.com>.

Hi Benedict,

I'd be very interested to see an example of this, as we've been playing with the very same idea, but haven't yet gotten to any actual trial (and error).

Many thanks in advance,
benjamin


Benjamin De Boe 
Product Manager | InterSystems 
T: +32 2 464 97 33 | M: +32 495 19 19 27  
http://www.intersystems.com/ 


