Posted to user@ctakes.apache.org by Bandeep Singh <bs...@phemi.com> on 2016/08/12 17:05:25 UTC

How to use cTakes with SPARK

Hi Team,

I am very new to cTAKES and have just started learning how to use it.
I am wondering how to use the cTAKES API with Spark (preferably PySpark) for
big data.
Can somebody point me in the right direction?

So far I have downloaded the cTAKES jars and tried building them with Spark,
but it threw some resource allocation exception.

Any response will be highly appreciated.

Thanks,
Bandeep

Re: How to use cTakes with SPARK

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Hi Bandeep, my team has done some work here, see:

Spark / cTAKES – Giuseppe Totaro
https://github.com/giuseppetotaro/ctakes-clinical-pipeline

UIMA/DUCC/cTAKES – Yi-Wen Liu
https://github.com/yiwenliuable/ctakes-scale-out-with-uima-ducc

UIMA/DUCC/cTAKES – Selina Chu
https://github.com/selinachu/DUCC-cTAKES-AWS

Comments + Feedback welcome.

Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect, Instrument Software and Science Data Systems Section (398)
Manager, Open Source Projects Formulation and Development Office (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++



RE: [External] Re: How to use cTakes with SPARK

Posted by "Geise, Brandon D." <bd...@geisinger.edu>.
Apologies, I missed that.


________________________________

IMPORTANT WARNING: The information in this message (and the documents attached to it, if any) is confidential and may be legally privileged. It is intended solely for the addressee. Access to this message by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken, or omitted to be taken, in reliance on it is prohibited and may be unlawful. If you have received this message in error, please delete all electronic copies of this message (and the documents attached to it, if any), destroy any hard copies you may have created and notify me immediately by replying to this email. Thank you. Geisinger Health System utilizes an encryption process to safeguard Protected Health Information and other confidential data contained in external e-mail messages. If email is encrypted, the recipient will receive an e-mail instructing them to sign on to the Geisinger Health System Secure E-mail Message Center to retrieve the encrypted e-mail.


Re: [External] Re: How to use cTakes with SPARK

Posted by Bandeep Singh <bs...@phemi.com>.
Thanks, Brandon!

I tried that example, but it no longer works with the latest cTAKES libraries.

Regards,
Bandeep




RE: [External] Re: How to use cTakes with SPARK

Posted by "Geise, Brandon D." <bd...@geisinger.edu>.
I haven’t tried this, but maybe have a look here: https://svn.apache.org/repos/asf/ctakes/sandbox/ctakes-spark-streaming-twitter/

Thanks,
Brandon


Re: How to use cTakes with SPARK

Posted by Bandeep Singh <bs...@phemi.com>.
Hi,

Thanks for your reply!

But I am wondering if somebody has run cTAKES using Spark and actually
succeeded. If so, some resources/examples would be really helpful.

Buddha, I tried building Spark with cTAKES; however, when I executed a simple
HelloWorldAnnotator.java function, it threw exceptions, which I suspect is
because the example was written long ago and no longer complies with the
current libraries.

Thanks Again,
Bandeep


Re: How to use cTakes with SPARK

Posted by buddha <bu...@yahoo.com>.
cTAKES is a Java project, so it should work “out of the box” with the Java Spark libraries.  If you’re not used to using Spark + Java, then I would not recommend starting with cTAKES.  I suggest you start by using cTAKES as a Maven dependency alongside the Spark Maven dependencies.
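
As a rough illustration of the Spark + library pattern described above, the sketch below shows the per-partition setup commonly used when a library is expensive to initialize: build the pipeline once per partition, then reuse it for every document. It is written in Python only because the original question mentions pyspark. `StandInPipeline` is a hypothetical placeholder, not a cTAKES or UIMA class, and any real cTAKES calls would have to run on the JVM side.

```python
from typing import Iterable, Iterator, Tuple


class StandInPipeline:
    """Hypothetical placeholder for an expensive-to-build NLP pipeline.

    This is NOT a cTAKES or UIMA class; it only counts how often it is built.
    """

    instances_created = 0

    def __init__(self) -> None:
        # In a real job this is where models and dictionaries would be loaded.
        StandInPipeline.instances_created += 1

    def process(self, text: str) -> int:
        # Toy "annotation": the number of whitespace-separated tokens.
        return len(text.split())


def annotate_partition(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    """Annotate one partition of (doc_id, text) pairs, building the pipeline once."""
    pipeline = StandInPipeline()  # one construction per partition, not per record
    for doc_id, text in records:
        yield doc_id, pipeline.process(text)


if __name__ == "__main__":
    # Plain lists stand in for two Spark partitions so the sketch runs without Spark.
    partitions = [
        [("note-1", "patient denies chest pain"), ("note-2", "no known allergies")],
        [("note-3", "history of type 2 diabetes")],
    ]
    for part in partitions:
        print(list(annotate_partition(part)))
    # With PySpark installed, the same function could be passed to
    # rdd.mapPartitions(annotate_partition) on an RDD of (doc_id, text) pairs.
```

The point of the design is that the costly constructor runs once per partition rather than once per document.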

If you want to use PySpark, then you are in the business of using Java libs from Python, like in http://stackoverflow.com/questions/476968/using-a-java-library-from-python and there is nothing special about cTAKES.
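
One low-tech way to use a Java library from Python, alongside the bridge libraries discussed in the linked question, is to launch a JVM as a child process. A minimal sketch follows. The jar paths and the `com.example.AnnotateNotes` main class are hypothetical, and nothing here is a cTAKES API.

```python
import os
import subprocess
from typing import List, Sequence


def build_java_command(classpath: Sequence[str], main_class: str, args: Sequence[str]) -> List[str]:
    """Build an argv list for `java -cp <classpath> <main_class> <args...>`."""
    if not classpath:
        raise ValueError("classpath must contain at least one entry")
    # os.pathsep is ':' on POSIX and ';' on Windows, matching the JVM's convention.
    return ["java", "-cp", os.pathsep.join(classpath), main_class, *args]


def run_java(classpath: Sequence[str], main_class: str, args: Sequence[str]) -> str:
    """Run the Java program and return its standard output as text."""
    result = subprocess.run(
        build_java_command(classpath, main_class, args),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical jar locations and main class, for illustration only.
    cmd = build_java_command(
        ["lib/my-pipeline.jar", "lib/deps/*"],
        "com.example.AnnotateNotes",
        ["input.txt"],
    )
    print(cmd)
```

Starting a JVM on every call is slow, so this approach suits batch invocations rather than per-document calls.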

cTAKES uses UIMA on the backend, and this can be extremely confusing to new users.  Maybe you should isolate your problems:

1. Use Spark + Java libs
2. Use Python + Java libs
3. Learn cTAKES on its own turf.  Namely, Java

Apache projects notoriously have dependency problems, and Spark is no exception.  HA!  “Exception”-- I’m funny.  Anyway, don’t expect the two to play together nicely at first.
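
When two large projects are combined, a frequent source of trouble is the same library arriving on the classpath in two versions. The sketch below groups jar file names by artifact and reports any artifact that appears with more than one version. The file-name pattern (`artifact-version.jar`) is an assumption, and the example names and versions are made up; they are not taken from the actual cTAKES or Spark dependency trees.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Assumes Maven-style names such as "guava-14.0.1.jar": artifact, hyphen, then a
# version that starts with a digit. Names that do not match are ignored.
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_version_conflicts(jar_names: Iterable[str]) -> Dict[str, Set[str]]:
    """Return {artifact: versions} for artifacts present in more than one version."""
    versions: Dict[str, Set[str]] = defaultdict(set)
    for name in jar_names:
        match = JAR_NAME.match(name)
        if match:
            versions[match.group("artifact")].add(match.group("version"))
    return {artifact: found for artifact, found in versions.items() if len(found) > 1}


if __name__ == "__main__":
    # Made-up classpath listing; the versions are illustrative only.
    jars = ["guava-14.0.1.jar", "guava-18.0.jar", "commons-lang3-3.4.jar"]
    for artifact, found in find_version_conflicts(jars).items():
        print(artifact, sorted(found))
```

In a Maven build, `mvn dependency:tree` gives the authoritative view of which versions are pulled in; a file-name scan like this is only a quick first check.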

b

~~~~~
May All Your Sequences Converge
