Posted to dev@spark.apache.org by Stephen Boesch <ja...@gmail.com> on 2015/03/27 18:02:27 UTC

Iterative pyspark / scala codebase development

I am iteratively making changes to the Scala side of some new pyspark code
and re-testing from the python/pyspark side.

Presently my only solution is a complete rebuild,

      sbt assembly

after any Scala-side change, no matter how small.

Is there a better / faster way for pyspark to pick up small Scala-side updates?

Re: Iterative pyspark / scala codebase development

Posted by Davies Liu <da...@databricks.com>.
On Fri, Mar 27, 2015 at 4:16 PM, Stephen Boesch <ja...@gmail.com> wrote:
> Thx much! This works.
>
> My workflow is making changes to files in IntelliJ and running IPython to
> execute pyspark.
>
> Is there any way for IPython to see the updated class files without first
> exiting?

No. The IPython shell is stateful; it will behave unexpectedly if you
reload the library.
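
To illustrate why (a sketch, not from the original thread; Python 3
`importlib` shown): reloading re-executes the Python module source only,
while the py4j gateway's JVM keeps the old Scala classes on its classpath:

    # Hypothetical IPython session. importlib.reload() re-runs the
    # Python code of the module, but the already-running JVM behind
    # the py4j gateway is untouched, so old Scala classes stay loaded.
    import importlib
    import pyspark
    importlib.reload(pyspark)  # does NOT restart the JVM gateway

The reliable workaround is to exit IPython (which stops the gateway) and
start a fresh session.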

> 2015-03-27 10:21 GMT-07:00 Davies Liu <da...@databricks.com>:
>
>> Put these lines in your ~/.bash_profile:
>>
>> export SPARK_PREPEND_CLASSES=true
>> export SPARK_HOME=path_to_spark
>> export PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip:${SPARK_HOME}/python:${PYTHONPATH}"
>>
>> $ source ~/.bash_profile
>> $ build/sbt assembly
>> $ build/sbt ~compile  # keep this running; do not stop it
>>
>> Then, in another terminal, you can run the Python tests:
>> $ cd python/pyspark/
>> $ python rdd.py
>>
>>
>> cc to dev list
>>
>>
>> On Fri, Mar 27, 2015 at 10:15 AM, Stephen Boesch <ja...@gmail.com>
>> wrote:
>> > Which aspect of that page are you suggesting provides a more optimized
>> > alternative?
>> >
>> > 2015-03-27 10:13 GMT-07:00 Davies Liu <da...@databricks.com>:
>> >
>> >> see
>> >>
>> >> https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools
>> >>
>> >> On Fri, Mar 27, 2015 at 10:02 AM, Stephen Boesch <ja...@gmail.com>
>> >> wrote:
>> >> > I am iteratively making changes to the Scala side of some new pyspark
>> >> > code
>> >> > and re-testing from the python/pyspark side.
>> >> >
>> >> > Presently my only solution is a complete rebuild,
>> >> >
>> >> >       sbt assembly
>> >> >
>> >> > after any Scala-side change, no matter how small.
>> >> >
>> >> > Is there a better / faster way for pyspark to pick up small Scala-side
>> >> > updates?
>> >
>> >
>
>



Re: Iterative pyspark / scala codebase development

Posted by Stephen Boesch <ja...@gmail.com>.
Thx much! This works.

My workflow is making changes to files in IntelliJ and running IPython to
execute pyspark.

Is there any way for IPython to see the updated class files without first
exiting?

2015-03-27 10:21 GMT-07:00 Davies Liu <da...@databricks.com>:

> Put these lines in your ~/.bash_profile:
>
> export SPARK_PREPEND_CLASSES=true
> export SPARK_HOME=path_to_spark
> export PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip:${SPARK_HOME}/python:${PYTHONPATH}"
>
> $ source ~/.bash_profile
> $ build/sbt assembly
> $ build/sbt ~compile  # keep this running; do not stop it
>
> Then, in another terminal, you can run the Python tests:
> $ cd python/pyspark/
> $ python rdd.py
>
>
> cc to dev list
>
>
> On Fri, Mar 27, 2015 at 10:15 AM, Stephen Boesch <ja...@gmail.com>
> wrote:
> > Which aspect of that page are you suggesting provides a more optimized
> > alternative?
> >
> > 2015-03-27 10:13 GMT-07:00 Davies Liu <da...@databricks.com>:
> >
> >> see
> >>
> https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools
> >>
> >> On Fri, Mar 27, 2015 at 10:02 AM, Stephen Boesch <ja...@gmail.com>
> >> wrote:
> >> > I am iteratively making changes to the Scala side of some new pyspark
> >> > code
> >> > and re-testing from the python/pyspark side.
> >> >
> >> > Presently my only solution is a complete rebuild,
> >> >
> >> >       sbt assembly
> >> >
> >> > after any Scala-side change, no matter how small.
> >> >
> >> > Is there a better / faster way for pyspark to pick up small Scala-side
> >> > updates?
> >
> >
>

Re: Iterative pyspark / scala codebase development

Posted by Davies Liu <da...@databricks.com>.
Put these lines in your ~/.bash_profile:

export SPARK_PREPEND_CLASSES=true
export SPARK_HOME=path_to_spark
export PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip:${SPARK_HOME}/python:${PYTHONPATH}"

$ source ~/.bash_profile
$ build/sbt assembly
$ build/sbt ~compile  # keep this running; do not stop it

Then, in another terminal, you can run the Python tests:
$ cd python/pyspark/
$ python rdd.py
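
For context: SPARK_PREPEND_CLASSES=true tells Spark's launch scripts to put
the freshly compiled class directories ahead of the assembly jar on the
classpath, which is why a running `~compile` suffices after the initial
`assembly`. As a quick end-to-end check (a sketch; any tiny job will do),
a fresh Python process should now see the updated classes:

    # smoke_test.py -- a minimal, hypothetical sketch. Run it in a NEW
    # python process after ~compile finishes, so the JVM gateway starts
    # with the updated Scala classes on its classpath.
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "smoke-test")
    doubled = sc.parallelize(range(10)).map(lambda x: x * 2)
    assert doubled.sum() == 90  # 2 * (0 + 1 + ... + 9) = 90
    sc.stop()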


cc to dev list


On Fri, Mar 27, 2015 at 10:15 AM, Stephen Boesch <ja...@gmail.com> wrote:
> Which aspect of that page are you suggesting provides a more optimized
> alternative?
>
> 2015-03-27 10:13 GMT-07:00 Davies Liu <da...@databricks.com>:
>
>> see
>> https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools
>>
>> On Fri, Mar 27, 2015 at 10:02 AM, Stephen Boesch <ja...@gmail.com>
>> wrote:
>> > I am iteratively making changes to the Scala side of some new pyspark
>> > code
>> > and re-testing from the python/pyspark side.
>> >
>> > Presently my only solution is a complete rebuild,
>> >
>> >       sbt assembly
>> >
>> > after any Scala-side change, no matter how small.
>> >
>> > Is there a better / faster way for pyspark to pick up small Scala-side updates?
>
>



Re: Iterative pyspark / scala codebase development

Posted by Davies Liu <da...@databricks.com>.
see https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools

On Fri, Mar 27, 2015 at 10:02 AM, Stephen Boesch <ja...@gmail.com> wrote:
> I am iteratively making changes to the Scala side of some new pyspark code
> and re-testing from the python/pyspark side.
>
> Presently my only solution is a complete rebuild,
>
>       sbt assembly
>
> after any Scala-side change, no matter how small.
>
> Is there a better / faster way for pyspark to pick up small Scala-side updates?



Re: Iterative pyspark / scala codebase development

Posted by Stephen Boesch <ja...@gmail.com>.
Compile alone did not pick up the Scala code changes, AFAICT. I will re-verify.
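
One thing worth checking if compile alone does not take effect (a
hypothetical troubleshooting sketch, not from the thread): confirm that
SPARK_PREPEND_CLASSES is actually exported in the environment that
launches the tests.

    import os
    # Must be "true" in the shell that starts the JVM gateway;
    # otherwise only the old assembly jar is on the classpath.
    print(os.environ.get("SPARK_PREPEND_CLASSES"))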

2015-03-27 10:16 GMT-07:00 Davies Liu <da...@databricks.com>:

> I usually just keep `build/sbt ~compile` running in one terminal, code in
> IntelliJ, and run the Python tests in another terminal once it has
> compiled successfully.
>
> On Fri, Mar 27, 2015 at 10:11 AM, Reynold Xin <rx...@databricks.com> wrote:
> > Python is tough if you need to change Scala at the same time.
> >
> > sbt/sbt assembly/assembly
> >
> > can be slightly faster than just assembly.
> >
> >
> > On Fri, Mar 27, 2015 at 10:02 AM, Stephen Boesch <ja...@gmail.com>
> wrote:
> >
> >> I am iteratively making changes to the Scala side of some new pyspark
> >> code
> >> and re-testing from the python/pyspark side.
> >>
> >> Presently my only solution is a complete rebuild,
> >>
> >>       sbt assembly
> >>
> >> after any Scala-side change, no matter how small.
> >>
> >> Is there a better / faster way for pyspark to pick up small Scala-side updates?
> >>
>

Re: Iterative pyspark / scala codebase development

Posted by Davies Liu <da...@databricks.com>.
I usually just keep `build/sbt ~compile` running in one terminal, code in
IntelliJ, and run the Python tests in another terminal once it has compiled
successfully.
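
Before running the tests, it can help to confirm that `import pyspark`
resolves to the source tree on PYTHONPATH rather than to an installed copy
(a minimal sketch; the exact path depends on your SPARK_HOME):

    # Sanity check: pyspark should come from the checkout, not from
    # site-packages, or your changes will not be exercised.
    import pyspark
    print(pyspark.__file__)  # expect a path under $SPARK_HOME/python/pyspark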

On Fri, Mar 27, 2015 at 10:11 AM, Reynold Xin <rx...@databricks.com> wrote:
> Python is tough if you need to change Scala at the same time.
>
> sbt/sbt assembly/assembly
>
> can be slightly faster than just assembly.
>
>
> On Fri, Mar 27, 2015 at 10:02 AM, Stephen Boesch <ja...@gmail.com> wrote:
>
>> I am iteratively making changes to the Scala side of some new pyspark code
>> and re-testing from the python/pyspark side.
>>
>> Presently my only solution is a complete rebuild,
>>
>>       sbt assembly
>>
>> after any Scala-side change, no matter how small.
>>
>> Is there a better / faster way for pyspark to pick up small Scala-side updates?
>>



Re: Iterative pyspark / scala codebase development

Posted by Reynold Xin <rx...@databricks.com>.
Python is tough if you need to change Scala at the same time.

sbt/sbt assembly/assembly

(which scopes the assembly task to just the assembly subproject) can be
slightly faster than just assembly.


On Fri, Mar 27, 2015 at 10:02 AM, Stephen Boesch <ja...@gmail.com> wrote:

> I am iteratively making changes to the Scala side of some new pyspark code
> and re-testing from the python/pyspark side.
>
> Presently my only solution is a complete rebuild,
>
>       sbt assembly
>
> after any Scala-side change, no matter how small.
>
> Is there a better / faster way for pyspark to pick up small Scala-side updates?
>