Posted to dev@spark.apache.org by Chetan Khatri <ch...@gmail.com> on 2016/12/23 06:44:02 UTC

Best Practice for Spark Job Jar Generation

Hello Spark Community,

For Spark job creation I use sbt-assembly to build an uber ("super") jar and
then submit it with spark-submit.

Example,

bin/spark-submit --class hbase.spark.chetan.com.SparkHbaseJob
/home/chetan/hbase-spark/SparkMSAPoc-assembly-1.0.jar

But other folks argue for the non-uber (thin) jar approach. Could you please
explain the industry-standard best practice for this?

Thanks,

Chetan Khatri.
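
For context on the assembly route described above, a minimal sbt-assembly
setup usually looks roughly like the following. This is only a sketch: the
plugin version, Spark/Scala versions, and merge strategy are illustrative
assumptions, not the actual project's build.

// project/plugins.sbt (illustrative plugin version)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

// build.sbt (illustrative versions)
name := "SparkMSAPoc"
version := "1.0"
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // mark Spark itself as "provided" so it is not bundled into the uber jar;
  // the cluster already supplies these classes at runtime
  "org.apache.spark" %% "spark-core" % "2.0.2" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.0.2" % "provided"
)

// resolve files that appear in more than one dependency jar
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}

With a build along these lines, running sbt assembly produces the single
SparkMSAPoc-assembly-1.0.jar that is passed to spark-submit above.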

Re: Best Practice for Spark Job Jar Generation

Posted by Chetan Khatri <ch...@gmail.com>.
Correct, so there are two approaches: the one you suggested and the uber jar
approach. I think the uber jar approach is the best practice because it makes
environment migration easy, and performance-wise the uber jar approach would
also be more optimized than the non-uber approach.

Thanks.

On Fri, Dec 23, 2016 at 11:41 PM, Andy Dang <na...@gmail.com> wrote:

> We repackage Spark's dependencies and ours together and put them under the
> /jars path. There are other ways to do it, but we want the runtime
> classpath to stay as close to the development classpath as possible.
>
> -------
> Regards,
> Andy
>
> On Fri, Dec 23, 2016 at 6:00 PM, Chetan Khatri <
> chetan.opensource@gmail.com> wrote:
>
>> Andy, thanks for the reply.
>>
>> If we download all the dependencies to a separate location and link them
>> with the Spark job jar on the cluster, is that the best way to execute a
>> Spark job?
>>
>> Thanks.
>>
>> On Fri, Dec 23, 2016 at 8:34 PM, Andy Dang <na...@gmail.com> wrote:
>>
>>> I used to use an uber jar in Spark 1.x because of classpath issues (we
>>> couldn't re-model our dependencies based on our code, and thus the
>>> cluster's runtime dependencies could differ greatly from those seen when
>>> running Spark directly in the IDE). We had to use the userClassPathFirst
>>> "hack" to work around this.
>>>
>>> With Spark 2, it's easier to replace dependencies (say, Guava) than
>>> before. We moved away from deploying a super jar and instead pass the
>>> libraries as part of Spark's jars (we still can't use Guava v19 or later
>>> because Spark uses a deprecated method that is no longer available there,
>>> but that's not a big issue for us).
>>>
>>> -------
>>> Regards,
>>> Andy
>>>
>>> On Fri, Dec 23, 2016 at 6:44 AM, Chetan Khatri <
>>> chetan.opensource@gmail.com> wrote:
>>>
>>>> Hello Spark Community,
>>>>
>>>> For Spark job creation I use sbt-assembly to build an uber ("super")
>>>> jar and then submit it with spark-submit.
>>>>
>>>> Example,
>>>>
>>>> bin/spark-submit --class hbase.spark.chetan.com.SparkHbaseJob
>>>> /home/chetan/hbase-spark/SparkMSAPoc-assembly-1.0.jar
>>>>
>>>> But other folks argue for the non-uber (thin) jar approach. Could you
>>>> please explain the industry-standard best practice for this?
>>>>
>>>> Thanks,
>>>>
>>>> Chetan Khatri.
>>>>
>>>
>>>
>>
>

Re: Best Practice for Spark Job Jar Generation

Posted by Andy Dang <na...@gmail.com>.
We repackage Spark's dependencies and ours together and put them under the
/jars path. There are other ways to do it, but we want the runtime classpath
to stay as close to the development classpath as possible.
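
As an illustration only (the directory layout and the thin jar name below
are assumptions, not the actual setup), that style of deployment can look
like this:

# copy the application's dependency jars into the Spark distribution's jars/
# directory so the runtime classpath matches the one used in development
cp deps/*.jar "$SPARK_HOME/jars/"

# then submit a thin application jar containing only the job's own classes
bin/spark-submit --class hbase.spark.chetan.com.SparkHbaseJob \
  /home/chetan/hbase-spark/SparkMSAPoc-1.0.jar

An alternative that leaves the distribution untouched is to pass the extra
jars to spark-submit with its --jars option instead.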

-------
Regards,
Andy

On Fri, Dec 23, 2016 at 6:00 PM, Chetan Khatri <ch...@gmail.com>
wrote:

> Andy, thanks for the reply.
>
> If we download all the dependencies to a separate location and link them
> with the Spark job jar on the cluster, is that the best way to execute a
> Spark job?
>
> Thanks.
>
> On Fri, Dec 23, 2016 at 8:34 PM, Andy Dang <na...@gmail.com> wrote:
>
>> I used to use an uber jar in Spark 1.x because of classpath issues (we
>> couldn't re-model our dependencies based on our code, and thus the
>> cluster's runtime dependencies could differ greatly from those seen when
>> running Spark directly in the IDE). We had to use the userClassPathFirst
>> "hack" to work around this.
>>
>> With Spark 2, it's easier to replace dependencies (say, Guava) than
>> before. We moved away from deploying a super jar and instead pass the
>> libraries as part of Spark's jars (we still can't use Guava v19 or later
>> because Spark uses a deprecated method that is no longer available there,
>> but that's not a big issue for us).
>>
>> -------
>> Regards,
>> Andy
>>
>> On Fri, Dec 23, 2016 at 6:44 AM, Chetan Khatri <
>> chetan.opensource@gmail.com> wrote:
>>
>>> Hello Spark Community,
>>>
>>> For Spark job creation I use sbt-assembly to build an uber ("super") jar
>>> and then submit it with spark-submit.
>>>
>>> Example,
>>>
>>> bin/spark-submit --class hbase.spark.chetan.com.SparkHbaseJob
>>> /home/chetan/hbase-spark/SparkMSAPoc-assembly-1.0.jar
>>>
>>> But other folks argue for the non-uber (thin) jar approach. Could you
>>> please explain the industry-standard best practice for this?
>>>
>>> Thanks,
>>>
>>> Chetan Khatri.
>>>
>>
>>
>

Re: Best Practice for Spark Job Jar Generation

Posted by Chetan Khatri <ch...@gmail.com>.
Andy, thanks for the reply.

If we download all the dependencies to a separate location and link them
with the Spark job jar on the cluster, is that the best way to execute a
Spark job?
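
Concretely, one way to stage the dependencies with sbt would be something
like the sketch below (the task name and output directory are made up for
illustration, not an established convention):

// build.sbt (sketch): collect the job's runtime dependency jars into
// target/deps so they can be shipped to the cluster next to a thin job jar
lazy val copyDeps = taskKey[Unit]("Copy dependency jars into target/deps")

copyDeps := {
  val out = target.value / "deps"
  IO.createDirectory(out)
  (dependencyClasspath in Runtime).value
    .map(_.data)                           // Attributed[File] -> File
    .filter(_.getName.endsWith(".jar"))
    .foreach(jar => IO.copyFile(jar, out / jar.getName))
}

After running copyDeps, those jars could be handed to spark-submit as a
comma-separated list via --jars, or copied into the cluster as in your /jars
approach, while the job jar itself stays thin.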

Thanks.

On Fri, Dec 23, 2016 at 8:34 PM, Andy Dang <na...@gmail.com> wrote:

> I used to use an uber jar in Spark 1.x because of classpath issues (we
> couldn't re-model our dependencies based on our code, and thus the
> cluster's runtime dependencies could differ greatly from those seen when
> running Spark directly in the IDE). We had to use the userClassPathFirst
> "hack" to work around this.
>
> With Spark 2, it's easier to replace dependencies (say, Guava) than
> before. We moved away from deploying a super jar and instead pass the
> libraries as part of Spark's jars (we still can't use Guava v19 or later
> because Spark uses a deprecated method that is no longer available there,
> but that's not a big issue for us).
>
> -------
> Regards,
> Andy
>
> On Fri, Dec 23, 2016 at 6:44 AM, Chetan Khatri <
> chetan.opensource@gmail.com> wrote:
>
>> Hello Spark Community,
>>
>> For Spark job creation I use sbt-assembly to build an uber ("super") jar
>> and then submit it with spark-submit.
>>
>> Example,
>>
>> bin/spark-submit --class hbase.spark.chetan.com.SparkHbaseJob
>> /home/chetan/hbase-spark/SparkMSAPoc-assembly-1.0.jar
>>
>> But other folks argue for the non-uber (thin) jar approach. Could you
>> please explain the industry-standard best practice for this?
>>
>> Thanks,
>>
>> Chetan Khatri.
>>
>
>

Re: Best Practice for Spark Job Jar Generation

Posted by Andy Dang <na...@gmail.com>.
I used to use an uber jar in Spark 1.x because of classpath issues (we
couldn't re-model our dependencies based on our code, and thus the cluster's
runtime dependencies could differ greatly from those seen when running Spark
directly in the IDE). We had to use the userClassPathFirst "hack" to work
around this.
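
(For reference, the switches behind that workaround are Spark's experimental
user-classpath-first settings; the invocation below is illustrative only.)

bin/spark-submit \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --class hbase.spark.chetan.com.SparkHbaseJob \
  /home/chetan/hbase-spark/SparkMSAPoc-assembly-1.0.jar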

With Spark 2, it's easier to replace dependencies (say, Guava) than before.
We moved away from deploying a super jar and instead pass the libraries as
part of Spark's jars (we still can't use Guava v19 or later because Spark
uses a deprecated method that is no longer available there, but that's not a
big issue for us).

-------
Regards,
Andy

On Fri, Dec 23, 2016 at 6:44 AM, Chetan Khatri <ch...@gmail.com>
wrote:

> Hello Spark Community,
>
> For Spark job creation I use sbt-assembly to build an uber ("super") jar
> and then submit it with spark-submit.
>
> Example,
>
> bin/spark-submit --class hbase.spark.chetan.com.SparkHbaseJob
> /home/chetan/hbase-spark/SparkMSAPoc-assembly-1.0.jar
>
> But other folks argue for the non-uber (thin) jar approach. Could you
> please explain the industry-standard best practice for this?
>
> Thanks,
>
> Chetan Khatri.
>
