Posted to user@spark.apache.org by jatinpreet <ja...@gmail.com> on 2014/11/21 08:39:37 UTC
Spark serialization issues with third-party libraries
Hi,
I am planning to use the UIMA library to process data in my RDDs. I have had bad
experiences using third-party libraries inside worker tasks: the system gets
plagued with serialization issues. Since UIMA classes are not necessarily
Serializable, I am not sure whether it will work.
Please explain which classes need to be Serializable and which can be left as
they are. A clear understanding will help me a lot.
Thanks,
Jatin
-----
Novice Big Data Programmer
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-serialization-issues-with-third-party-libraries-tp19454.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: Spark serialization issues with third-party libraries
Posted by jatinpreet <ja...@gmail.com>.
Thanks Arush! Your example is nice and easy to understand. I am implementing
it in Java, though.
Jatin
-----
Novice Big Data Programmer
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-serialization-issues-with-third-party-libraries-tp19454p19624.html
Re: Spark serialization issues with third-party libraries
Posted by Arush Kharbanda <ar...@sigmoidanalytics.com>.
Hi
You can see my code here. It's a POC to implement UIMA on Spark:
https://bitbucket.org/SigmoidDev/uimaspark
https://bitbucket.org/SigmoidDev/uimaspark/src/8476fdf16d84d0f517cce45a8bc1bd3410927464/UIMASpark/src/main/scala/UIMAProcessor.scala?at=master
This is the class where the major part of the integration happens.
Thanks
Arush
On Sun, Nov 23, 2014 at 7:52 PM, jatinpreet <ja...@gmail.com> wrote:
> Thanks Sean, I was actually using instances created elsewhere inside my RDD
> transformations which as I understand is against Spark programming model. I
> was referred to a talk about UIMA and Spark integration from this year's
> Spark summit, which had a workaround for this problem. I just had to make
> some class members transient.
>
> http://spark-summit.org/2014/talk/leveraging-uima-in-spark
>
> Thanks
>
--
*Arush Kharbanda* || Technical Teamlead
arush@sigmoidanalytics.com || www.sigmoidanalytics.com
Re: Spark serialization issues with third-party libraries
Posted by jatinpreet <ja...@gmail.com>.
Thanks Sean, I was actually using instances created elsewhere inside my RDD
transformations, which, as I understand it, goes against the Spark programming
model. I was referred to a talk about UIMA and Spark integration from this
year's Spark Summit, which had a workaround for this problem. I just had to
make some class members transient.
http://spark-summit.org/2014/talk/leveraging-uima-in-spark
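A minimal sketch of the transient-member workaround, using only the JDK so it runs without Spark. It assumes the default Java serialization path; the names Engine, Processor and TransientFieldDemo are hypothetical stand-ins and are not taken from UIMA or from the talk. The non-serializable member is marked transient, so serialization skips it, and it is rebuilt lazily on first use after deserialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientFieldDemo {

    // Hypothetical stand-in for a third-party class that is NOT Serializable.
    static class Engine {
        String process(String text) {
            return text.toUpperCase();
        }
    }

    // Serializable wrapper around the engine.
    static class Processor implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String prefix;      // plain data: serialized as usual
        private transient Engine engine;  // transient: skipped by serialization

        Processor(String prefix) {
            this.prefix = prefix;
            this.engine = new Engine();
        }

        // After deserialization the transient field is null, so rebuild it on demand.
        private Engine engine() {
            if (engine == null) {
                engine = new Engine();
            }
            return engine;
        }

        String process(String text) {
            return prefix + engine().process(text);
        }
    }

    // Serializes and deserializes the processor, as would happen when an
    // object is shipped to another JVM.
    static Processor roundTrip(Processor p) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Processor) in.readObject();
        }
    }

    // Runs the round trip and uses the deserialized copy.
    static String demo() {
        try {
            return roundTrip(new Processor("uima:")).process("hello");
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "uima:HELLO"
    }
}
```

Without the transient keyword, writeObject would fail with a NotSerializableException naming the Engine class. The lazy accessor matters because a transient field comes back as null on the receiving side.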
Thanks
-----
Novice Big Data Programmer
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-serialization-issues-with-third-party-libraries-tp19454p19589.html
Re: Spark serialization issues with third-party libraries
Posted by Sean Owen <so...@cloudera.com>.
You are probably casually sending UIMA objects from the driver to
executors in a closure. You'll have to design your program so that you
do not need to ship these objects to or from the remote task workers.
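The point above can be sketched in plain Java, without a Spark dependency. The sketch assumes Java serialization of the function object, which is how Spark ships closures by default; Annotator, SerializableFunction and ClosureCaptureDemo are hypothetical names, with SerializableFunction standing in for Spark's Java function interfaces, which extend Serializable. A function that captures a non-serializable object created outside it cannot be serialized, while one that creates the object inside its own body can.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Function;

public class ClosureCaptureDemo {

    // Hypothetical stand-in for a third-party class that is NOT Serializable.
    static class Annotator {
        String annotate(String text) {
            return "[" + text + "]";
        }
    }

    // Hypothetical stand-in for a serializable function interface.
    interface SerializableFunction<T, R> extends Function<T, R>, Serializable {}

    // Returns true if Java serialization accepts the object, false if it
    // meets a non-serializable object somewhere in the object graph.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Problem case: the Annotator is created outside the function and
    // captured by it, so it would have to travel with the function.
    static SerializableFunction<String, String> capturing() {
        Annotator annotator = new Annotator();
        return text -> annotator.annotate(text);
    }

    // Fix: the Annotator is created inside the function body, so nothing
    // non-serializable is captured.
    static SerializableFunction<String, String> selfContained() {
        return text -> new Annotator().annotate(text);
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(capturing()));       // prints "false"
        System.out.println(canSerialize(selfContained()));   // prints "true"
        System.out.println(selfContained().apply("token"));  // prints "[token]"
    }
}
```

In a real job, creating an expensive object for every record would be wasteful. A common pattern is to create it once per partition, for example inside mapPartitions, so that it still never leaves the executor.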
On Fri, Nov 21, 2014 at 8:39 AM, jatinpreet <ja...@gmail.com> wrote:
> Hi,
>
> I am planning to use UIMA library to process data in my RDDs. I have had bad
> experiences while using third party libraries inside worker tasks. The
> system gets plagued with Serialization issues. But as UIMA classes are not
> necessarily Serializable, I am not sure if it will work.
>
> Please explain which classes need to be Serializable and which of them can
> be left as it is? A clear understanding will help me a lot.
>
> Thanks,
> Jatin