Posted to user@spark.apache.org by jatinpreet <ja...@gmail.com> on 2014/11/21 08:39:37 UTC

Spark serialization issues with third-party libraries

Hi,

I am planning to use the UIMA library to process data in my RDDs. I have had bad
experiences using third-party libraries inside worker tasks: the
system gets plagued with serialization issues. Since UIMA classes are not
necessarily Serializable, I am not sure whether it will work.

Please explain which classes need to be Serializable and which can
be left as is. A clear understanding will help me a lot.

Thanks,
Jatin



-----
Novice Big Data Programmer
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-serialization-issues-with-third-party-libraries-tp19454.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Spark serialization issues with third-party libraries

Posted by jatinpreet <ja...@gmail.com>.
Thanks Arush! Your example is nice and easy to understand. I am implementing
it in Java, though.

Jatin



Re: Spark serialization issues with third-party libraries

Posted by Arush Kharbanda <ar...@sigmoidanalytics.com>.
Hi

You can see my code here.

It's a POC that implements UIMA on Spark:

https://bitbucket.org/SigmoidDev/uimaspark

https://bitbucket.org/SigmoidDev/uimaspark/src/8476fdf16d84d0f517cce45a8bc1bd3410927464/UIMASpark/src/main/scala/UIMAProcessor.scala?at=master

This is the class where most of the integration happens.

Thanks
Arush

On Sun, Nov 23, 2014 at 7:52 PM, jatinpreet <ja...@gmail.com> wrote:

> Thanks Sean. I was actually using instances created elsewhere inside my RDD
> transformations, which, as I understand it, goes against the Spark programming
> model. I was referred to a talk about UIMA and Spark integration from this
> year's Spark Summit, which had a workaround for this problem. I just had to
> make some class members transient.
>
> http://spark-summit.org/2014/talk/leveraging-uima-in-spark
>
> Thanks


-- 

*Arush Kharbanda* || Technical Teamlead

arush@sigmoidanalytics.com || www.sigmoidanalytics.com

Re: Spark serialization issues with third-party libraries

Posted by jatinpreet <ja...@gmail.com>.
Thanks Sean. I was actually using instances created elsewhere inside my RDD
transformations, which, as I understand it, goes against the Spark programming
model. I was referred to a talk about UIMA and Spark integration from this
year's Spark Summit, which had a workaround for this problem. I just had to
make some class members transient.

http://spark-summit.org/2014/talk/leveraging-uima-in-spark
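
For readers following along, here is a minimal sketch of what "making class members transient" can look like. It uses only plain Java serialization and no Spark. FakeAnalyzer, AnalyzerHolder and TransientFieldDemo are hypothetical names invented for this illustration; they are not UIMA classes and are not taken from the talk linked above, so treat this as one possible reading of the workaround rather than the talk's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a third-party object that is NOT Serializable.
class FakeAnalyzer {
    int length(String s) {
        return s.length();
    }
}

// Serializable wrapper. The analyzer field is transient, so Java
// serialization skips it; it is rebuilt lazily on first use.
class AnalyzerHolder implements Serializable {
    private static final long serialVersionUID = 1L;

    private transient FakeAnalyzer analyzer;

    FakeAnalyzer get() {
        if (analyzer == null) {
            // After deserialization the field is null, so create it again.
            analyzer = new FakeAnalyzer();
        }
        return analyzer;
    }
}

class TransientFieldDemo {

    // Serializes the holder to bytes and reads it back.
    // Returns null if either step fails.
    static AnalyzerHolder roundTrip(AnalyzerHolder holder) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(holder);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (AnalyzerHolder) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        AnalyzerHolder holder = new AnalyzerHolder();
        holder.get(); // the analyzer now exists on the sending side
        AnalyzerHolder copy = roundTrip(holder);
        // The copy arrives without an analyzer and builds its own on demand.
        System.out.println(copy.get().length("UIMA"));
    }
}
```

Without the transient keyword, writeObject would fail with a NotSerializableException as soon as the analyzer field had been populated. The lazy getter is what makes the field usable again on the receiving side.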

Thanks



Re: Spark serialization issues with third-party libraries

Posted by Sean Owen <so...@cloudera.com>.
You are probably casually sending UIMA objects from the driver to
executors in a closure. You'll have to design your program so that you
do not need to ship these objects to or from the remote task workers.
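
To illustrate the point about closures, here is a small sketch in plain Java, with no Spark dependency. It relies on the fact that Spark serializes task closures with Java serialization. FakeEngine, SerFunction and ClosureCaptureDemo are hypothetical names made up for this example; FakeEngine merely stands in for a non-Serializable third-party object such as a UIMA engine.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

// Hypothetical stand-in for a third-party class that is NOT Serializable.
class FakeEngine {
    String annotate(String text) {
        return "[" + text + "]";
    }
}

// A function type that can itself be serialized, similar in spirit to
// the function interfaces a distributed framework ships to workers.
interface SerFunction<T, R> extends Function<T, R>, Serializable {}

class ClosureCaptureDemo {

    // Returns true if Java serialization can write the object.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            // NotSerializableException is a subclass of IOException.
            return false;
        }
    }

    // The engine is created outside the function and captured by it,
    // so it must be serialized along with the function.
    static SerFunction<String, String> capturing() {
        FakeEngine engine = new FakeEngine();
        return text -> engine.annotate(text);
    }

    // The engine is created inside the function body, so nothing
    // non-serializable travels with the function.
    static SerFunction<String, String> selfContained() {
        return text -> new FakeEngine().annotate(text);
    }

    public static void main(String[] args) {
        System.out.println("capturing: " + canSerialize(capturing()));
        System.out.println("self-contained: " + canSerialize(selfContained()));
    }
}
```

The first function cannot be serialized because the captured engine goes with it; the second can. In a real Spark job the same idea is usually applied once per partition, for example inside mapPartitions, so that the engine is not rebuilt for every record.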

On Fri, Nov 21, 2014 at 8:39 AM, jatinpreet <ja...@gmail.com> wrote:
> Hi,
>
> I am planning to use the UIMA library to process data in my RDDs. I have had bad
> experiences using third-party libraries inside worker tasks: the
> system gets plagued with serialization issues. Since UIMA classes are not
> necessarily Serializable, I am not sure whether it will work.
>
> Please explain which classes need to be Serializable and which can
> be left as is. A clear understanding will help me a lot.
>
> Thanks,
> Jatin
