Posted to dev@spark.apache.org by "S. Kai Chen" <se...@gmail.com> on 2016/06/06 22:40:24 UTC

Add hot-deploy capability in Spark Shell

Hi,

We use spark-shell heavily for ad-hoc data analysis as well as iterative
development of the analytics code. A common workflow consists of the
following steps:

   1. Write a small Scala module, assemble the fat jar
   2. Start spark-shell with the assembly jar file
   3. Try out some ideas in the shell, then capture the code back into the
   module
   4. Go back to step 1 and restart the shell
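
Concretely, step 1 might produce a small module like the one below
(WordStats and every name in it are hypothetical, just to make the cycle
tangible). The build and launch commands in the comments assume the
sbt-assembly plugin:

```scala
// Hypothetical module that gets assembled into the fat jar.
// Build:   sbt assembly
// Launch:  spark-shell --jars path/to/your-assembly.jar
object WordStats {
  // Pure helper: easy to exercise from the shell, and easy to capture
  // REPL experiments back into (step 3 of the workflow).
  def topWords(lines: Seq[String], n: Int): Seq[(String, Int)] =
    lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (w, ws) => (w, ws.size) }
      .toSeq
      .sortBy { case (w, c) => (-c, w) }   // most frequent first, ties by word
      .take(n)
}
```

In the shell you would call `WordStats.topWords` on data pulled out of an
RDD; any change to the module then forces the assemble-and-restart loop
described above.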

This is very similar to what people do in web-app development, and the pain
point is similar: in web-app development, a lot of time is spent waiting
for new code to be deployed; here, a lot of time is spent waiting for Spark
to restart. Having the ability to hot-deploy code in the REPL would help a
lot, just as hot-reloading in frameworks like Play, or tools like JRebel,
has boosted productivity tremendously.

I do have code that works with the 1.5.2 release.  Is this something that's
interesting enough to be included in Spark proper?  If so, should I create
a JIRA ticket or open a GitHub PR against the master branch?


Cheers,

Kai

Re: Add hot-deploy capability in Spark Shell

Posted by Kai Chen <se...@gmail.com>.
I don't.  The hot-deploy shouldn't happen while a job is running; at least
in the REPL it wouldn't make much sense.  It's a development-only feature
meant to shorten the iterative coding cycle.  In a production environment
it would not be enabled ... though there might be situations where it would
be desirable.  But currently I'm not handling that case, as it's much more
complex.
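
One plausible shape for such a dev-only mechanism (a sketch under assumed
names, not the actual patch) is a fresh classloader per rebuilt jar:

```scala
import java.io.File
import java.net.URLClassLoader

// Sketch only: each reload opens a fresh URLClassLoader over the rebuilt
// jars, so subsequently loaded classes come from the new assembly while the
// running shell's own loader is left untouched.
// Note: URLClassLoader delegates to its parent first, so this assumes the
// reloadable classes live only in these jars, not on the shell's own
// classpath.
class Reloader(parent: ClassLoader) {
  @volatile private var current: ClassLoader = parent

  // Call between jobs, after the rebuild finishes.
  def reload(jars: Seq[File]): Unit = {
    current = new URLClassLoader(jars.map(_.toURI.toURL).toArray, parent)
  }

  // Classes not found in the new jars fall back to the parent loader.
  def loadClass(name: String): Class[_] = current.loadClass(name)
}
```

Instances created before a reload still carry the old class, which is why
reloading mid-job would not make sense.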

On Mon, Jun 6, 2016 at 4:16 PM, Reynold Xin <rx...@databricks.com> wrote:

> Thanks for the email. How do you deal with in-memory state that
> references the classes? This can happen in streaming, in RDD caching, and
> in temporary view creation in SQL.

Re: Add hot-deploy capability in Spark Shell

Posted by Reynold Xin <rx...@databricks.com>.
Thanks for the email. How do you deal with in-memory state that references
the classes? This can happen in streaming, in RDD caching, and in temporary
view creation in SQL.
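
The hazard here can be shown without Spark at all: once a class is
redefined in a fresh loader, objects created from the old version (say, a
cached RDD's elements) still carry the old Class, and the two are
incompatible. A minimal, self-contained sketch (all names are made up; it
uses the in-process compiler, so it requires a JDK):

```scala
import java.io.File
import java.net.URLClassLoader
import java.nio.file.Files
import javax.tools.ToolProvider

object StaleStateDemo {
  // Compile a Greeter.greet() that returns `version` into `dir`, then open
  // a fresh loader over that directory.
  def compileAndLoad(dir: File, version: String): ClassLoader = {
    val src  = new File(dir, "Greeter.java")
    val code = "public class Greeter { public String greet() { return \"" +
      version + "\"; } }"
    Files.write(src.toPath, code.getBytes("UTF-8"))
    ToolProvider.getSystemJavaCompiler.run(null, null, null, src.getPath)
    // The parent has no Greeter, so delegation falls through to `dir`.
    new URLClassLoader(Array(dir.toURI.toURL), getClass.getClassLoader)
  }

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("hot-deploy").toFile

    val c1 = compileAndLoad(dir, "v1").loadClass("Greeter")
    val oldInstance = c1.getDeclaredConstructor().newInstance()

    val c2 = compileAndLoad(dir, "v2").loadClass("Greeter") // "redeploy"

    // New calls pick up the new code:
    println(c2.getMethod("greet").invoke(c2.getDeclaredConstructor().newInstance())) // v2
    println(c1 eq c2)                   // false: two distinct Class objects
    println(c2.isInstance(oldInstance)) // false: old instances keep the old class
  }
}
```

Any cached or streaming state built before the reload is in the position of
`oldInstance` above, which is exactly the question being raised.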
