Posted to dev@spark.apache.org by Aniket <an...@gmail.com> on 2014/05/30 10:12:32 UTC

Why does spark REPL not embed scala REPL?

My apologies in advance if this is not a dev mailing list topic. I am
working on a small project to provide a web interface to the spark
REPL. The interface will allow people to use the spark REPL and perform
exploratory analysis on their data. I already have a Play application
running that provides a web interface to the standard scala REPL, and I
am looking to extend it to optionally support the spark REPL. My
initial idea was to include the spark dependencies in the project,
create a new instance of SparkContext and bind it to a variable (let's
say 'sc') using imain.bind("sc", sparkContext). While this may work in
theory, I am trying to understand why the spark REPL takes a different
path by creating its own SparkILoop, SparkIMain, etc. Can anyone help
me understand why there was a need to provide custom versions of IMain,
ILoop, etc. instead of embedding the standard scala REPL and binding a
SparkContext instance?
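
For concreteness, here is a minimal sketch of the embedding I have in
mind (Scala 2.10-era API; the settings, app name and local master URL
are just for illustration):

    import scala.tools.nsc.Settings
    import scala.tools.nsc.interpreter.IMain
    import org.apache.spark.{SparkConf, SparkContext}

    val settings = new Settings
    settings.usejavacp.value = true  // reuse the host JVM's classpath

    val imain = new IMain(settings)
    val sparkContext = new SparkContext(
      new SparkConf().setAppName("web-repl").setMaster("local[*]"))

    // Bind the context under the conventional name 'sc', then interpret.
    imain.bind("sc", "org.apache.spark.SparkContext", sparkContext)
    imain.interpret("val n = sc.parallelize(1 to 100).count()")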

Here is my analysis so far:
1. ExecutorClassLoader - I understand this is needed to load classes
from HDFS. Perhaps this could have been plugged into the standard scala
REPL using settings.embeddedDefaults(classLoaderInstance); see the
sketch after this list. Also, it's not clear what ConstructorCleaner
does.

2. SparkCommandLine & SparkRunnerSettings - Allow for providing an
extra -i file argument to the REPL. Wouldn't the standard sourcepath
have sufficed?

3. SparkExprTyper - The only difference between standard ExprTyper and
SparkExprTyper is that repldbg is replaced with logDebug. Not sure if this
was intentional/needed.

4. SparkILoop - Has a few deviations from the standard ILoop class, but
these could have been managed by extending or wrapping ILoop or using
settings. Not sure what triggered the need to copy the source code and
make edits.

5. SparkILoopInit - Changes the welcome message and binds the
SparkContext in the interpreter. The welcome message could have been
changed by extending ILoopInit.

6. SparkIMain - Contains quite a few changes around class loading,
logging, etc., but I found it very hard to figure out whether extending
IMain was an option and what exactly didn't work (or wouldn't work)
with IMain.
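
Regarding point 1, here is roughly what I meant by plugging a loader
into the stock REPL via embeddedDefaults. The loader below is a
stand-in for illustration, not Spark's actual ExecutorClassLoader:

    import scala.tools.nsc.Settings

    val settings = new Settings
    settings.usejavacp.value = true

    // embeddedDefaults makes the REPL inherit its classpath and parent
    // class loader from the given loader instead of the system loader.
    val customLoader: ClassLoader = getClass.getClassLoader // stand-in
    settings.embeddedDefaults(customLoader)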

The rest of the classes seem very similar to their standard
counterparts. I have a feeling the spark REPL could be refactored to
embed the standard scala REPL. I know such refactoring would not help
the Spark project as such, but it would help people embed the spark
REPL in much the same way it's done with the standard scala REPL.
Thoughts?



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Why-does-spark-REPL-not-embed-scala-REPL-tp6871.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Re: Why does spark REPL not embed scala REPL?

Posted by Aaron Davidson <il...@gmail.com>.
There's some discussion here as well on just using the Scala REPL for 2.11:
http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-on-Scala-2-11-td6506.html#a6523

Matei's response mentions the features we needed to change in the Scala
REPL (class-based wrappers and where the generated classes are output),
which have been added as options to the 2.11 REPL, so we may be able to
trim down a bunch of our custom code once 2.11 becomes standard.
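
For reference, a rough sketch of turning those options on in 2.11 (the
flag names are from my reading of the 2.11 sources, so worth
double-checking against the release you target):

    import scala.tools.nsc.Settings

    val settings = new Settings
    // -Yrepl-class-based wraps REPL lines in classes instead of objects;
    // -Yrepl-outdir writes the generated class files to a real directory.
    settings.processArguments(
      List("-Yrepl-class-based", "-Yrepl-outdir", "/tmp/repl-classes"),
      processAll = true)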



Re: Why does spark REPL not embed scala REPL?

Posted by Kan Zhang <kz...@apache.org>.
One reason is that the standard Scala REPL uses object-based wrappers,
whose static initializers are run on remote worker nodes; this may fail
due to differences between the driver and worker nodes. See the
discussion here:
https://groups.google.com/d/msg/scala-internals/h27CFLoJXjE/JoobM6NiUMQJ
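
To make that concrete, here is an illustrative (simplified, not the
exact generated source) shape of what the 2.10 REPL produces for a
line, and why it bites on workers:

    // Each REPL line ends up inside object wrappers, so the line's code
    // runs in a static initializer. The wrapper name $iw is illustrative.
    object $iw {
      val driverOnly = sys.env("DRIVER_ONLY_VAR") // fine on the driver
      val f = (x: Int) => x + driverOnly.length   // closure references $iw
    }
    // When a worker deserializes f, loading $iw re-runs the initializer;
    // if DRIVER_ONLY_VAR is unset there, it fails with
    // ExceptionInInitializerError. Class-based wrappers avoid this by
    // serializing the captured instance instead of re-running the code.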

