Posted to dev@toree.apache.org by "Patrick McCarty (JIRA)" <ji...@apache.org> on 2017/10/09 05:33:00 UTC

[jira] [Commented] (TOREE-399) Make Spark Kernel work on Windows

    [ https://issues.apache.org/jira/browse/TOREE-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196515#comment-16196515 ] 

Patrick McCarty commented on TOREE-399:
---------------------------------------

I wrote my own hacky run.cmd and managed to get it working with Spark 2.2.0 and toree-assembly-0.2.0.dev1-incubating-SNAPSHOT. I'm a complete newbie to Spark, Scala, and Jupyter (I'm learning them for a college class), so hopefully someone with greater familiarity can clean up what I've done and implement it properly.

Firstly, I found that I needed to set -Dscala.usejavacp=true; otherwise you get the error mentioned by the previous two posters:
Failed to initialize compiler: object scala in compiler mirror not found.
** Note that as of 2.8 scala does not assume use of the java classpath.
** For the old behavior pass -usejavacp to scala, or if using a Settings
** object programmatically, settings.usejavacp.value = true.
Exception in thread "main" java.lang.NullPointerException

Secondly, I found that I needed to include the toree-assembly jar on the classpath; otherwise I got "error: object toree is not a member of package org.apache".

Thirdly, I found that I couldn't achieve the above classpath change cleanly by simply invoking spark-submit.cmd. I tried the --jars argument to the SparkSubmit program, which feels like the proper way to put toree on the classpath, but it didn't work; maybe that's a bug, or maybe I wasn't using it correctly. Because of the way the spark-submit2.cmd and spark-class2.cmd files are written, I could not find a way to use those scripts unmodified and still add toree to the classpath in addition to the normal classpath entries. So the only way I was able to get this working was to bypass the spark-submit.cmd script and hardcode the SparkSubmit java command line directly, as shown in the following run.cmd file:

{code}
@echo off

set PROG_HOME=%~dp0..

if not defined SPARK_HOME (
  echo SPARK_HOME must be set to the location of a Spark distribution!
  REM exit /b so a missing SPARK_HOME doesn't kill the calling shell
  exit /b 1
)

REM disable randomized hash for string in Python 3.3+
set PYTHONHASHSEED=0

REM The SPARK_OPTS values during installation are stored in __TOREE_SPARK_OPTS__. This allows values to be specified during
REM install, but also during runtime. The runtime options take precedence over the install options.

if not defined SPARK_OPTS (
  set SPARK_OPTS=%__TOREE_SPARK_OPTS__%
) else (
  if "%SPARK_OPTS%" == "" (
    set SPARK_OPTS=%__TOREE_SPARK_OPTS__%
  )
)

if not defined TOREE_OPTS (
  set TOREE_OPTS=%__TOREE_OPTS__%
) else (
  if "%TOREE_OPTS%" == "" (
    set TOREE_OPTS=%__TOREE_OPTS__%
  )
)

echo Starting Spark Kernel with SPARK_HOME=%SPARK_HOME%

REM This doesn't work because the classpath doesn't get set properly, unless you hardcode it in SPARK_SUBMIT_OPTS using
REM forward slashes or double backslashes, but then you can't use the SPARK_HOME and PROG_HOME variables.
REM set SPARK_SUBMIT_OPTS=-cp "%SPARK_HOME%\conf\;%SPARK_HOME%\jars\*;%PROG_HOME%\lib\toree-assembly-0.2.0.dev1-incubating-SNAPSHOT.jar" -Dscala.usejavacp=true
REM set TOREE_COMMAND="%SPARK_HOME%\bin\spark-submit.cmd" %SPARK_OPTS% --class org.apache.toree.Main %PROG_HOME%\lib\toree-assembly-0.2.0.dev1-incubating-SNAPSHOT.jar %TOREE_OPTS% %*

REM The two important things that we must do differently on Windows are that we must add toree-assembly-0.2.0.dev1-incubating-SNAPSHOT.jar
REM to the classpath, and we must define the java property scala.usejavacp=true.
REM (java is quoted because JAVA_HOME often contains spaces, e.g. C:\Program Files\Java\...)
set TOREE_COMMAND="%JAVA_HOME%\bin\java" -cp "%SPARK_HOME%\conf\;%SPARK_HOME%\jars\*;%PROG_HOME%\lib\toree-assembly-0.2.0.dev1-incubating-SNAPSHOT.jar" -Dscala.usejavacp=true -Xmx1g org.apache.spark.deploy.SparkSubmit %SPARK_OPTS% --class org.apache.toree.Main %PROG_HOME%\lib\toree-assembly-0.2.0.dev1-incubating-SNAPSHOT.jar %TOREE_OPTS% %*

echo.
echo %TOREE_COMMAND%
echo.

%TOREE_COMMAND%
{code}


The run.cmd file should be placed in C:\ProgramData\jupyter\kernels\apache_toree_scala\bin\
Additionally, you need to edit kernel.json in the folder above that to change run.sh to run.cmd.
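For reference, the edited kernel.json ends up looking roughly like this. The display name, env values, and paths are illustrative and will vary by install; the essential change is pointing the first argv entry at run.cmd instead of run.sh:

{code}
{
  "display_name": "Apache Toree - Scala",
  "language": "scala",
  "argv": [
    "C:\\ProgramData\\jupyter\\kernels\\apache_toree_scala\\bin\\run.cmd",
    "--profile",
    "{connection_file}"
  ],
  "env": {
    "DEFAULT_INTERPRETER": "Scala",
    "__TOREE_SPARK_OPTS__": "",
    "__TOREE_OPTS__": "",
    "SPARK_HOME": "C:\\spark\\spark-2.2.0-bin-hadoop2.7"
  }
}
{code}

Jupyter substitutes {connection_file} itself, so that placeholder should be left exactly as-is.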
If you want to allow installing additional Toree kernels, you should also edit toreeapp.py to change run.sh to run.cmd (obviously the real solution will need code to detect the OS and reference the appropriate script).
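The OS detection in toreeapp.py could be as simple as the sketch below. The function name is hypothetical (I haven't checked what toreeapp.py actually calls this); the point is just picking the script by platform:

{code}
import os

def toree_run_script():
    # On Windows, os.name is "nt", so use the batch launcher;
    # everywhere else fall back to the existing bash script.
    return "run.cmd" if os.name == "nt" else "run.sh"
{code}

The install code would then write toree_run_script() into the kernel.json argv instead of the hardcoded "run.sh".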

> Make Spark Kernel work on Windows
> ---------------------------------
>
>                 Key: TOREE-399
>                 URL: https://issues.apache.org/jira/browse/TOREE-399
>             Project: TOREE
>          Issue Type: New Feature
>         Environment: Windows 7/8/10
>            Reporter: aldo
>         Attachments: run.bat
>
>
> After a successful install of the Spark Kernel the error: "Failed to run command:" occurs when from jupyter we select a Scala Notebook.
> The error happens because kernel.json runs C:\\ProgramData\\jupyter\\kernels\\apache_toree_scala\\bin\\run.sh, which is a bash shell script and hence cannot work on Windows.
> Can you give me some direction to fix this, and I will implement it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)