Posted to dev@zeppelin.apache.org by AhyoungRyu <gi...@git.apache.org> on 2016/08/18 02:59:32 UTC

[GitHub] zeppelin pull request #1339: [WIP][ZEPPELIN-1332] Remove spark-dependencies ...

GitHub user AhyoungRyu opened a pull request:

    https://github.com/apache/zeppelin/pull/1339

    [WIP][ZEPPELIN-1332] Remove spark-dependencies & suggest new way

    ### What is this PR for?
    Currently, Zeppelin's embedded Spark is located under `interpreter/spark/`. For users who **build Zeppelin from source**, this Spark is downloaded when they build the source with [build profiles](https://github.com/apache/zeppelin#spark-interpreter). These build profiles are useful for customizing the embedded Spark, but many Spark users use their own Spark rather than Zeppelin's embedded one. Nowadays, mostly Spark & Zeppelin beginners use the embedded Spark, and for them there are too many build profiles (it's quite complicated, I think). In the case of the **Zeppelin binary package**, the embedded Spark is included by default under `interpreter/spark/`. That's why the Zeppelin package size is so huge.
    
    This PR changes the embedded Spark binary downloading mechanism as follows.
    
    1. If the user hasn't set their own `SPARK_HOME`, [bin/download-spark.sh](https://github.com/AhyoungRyu/zeppelin/blob/5703fbf27fedda9ec7dd142e275b8654c9bc6296/bin/download-spark.sh) runs when they start the Zeppelin server using `bin/zeppelin-daemon.sh` or `bin/zeppelin.sh`.
    2. [bin/download-spark.sh](https://github.com/AhyoungRyu/zeppelin/blob/5703fbf27fedda9ec7dd142e275b8654c9bc6296/bin/download-spark.sh): downloads `spark-2.0.0-bin-hadoop2.7.tgz` from a mirror site to `$ZEPPELIN_HOME/.spark-dist/` and untars it -> sets `SPARK_HOME` to `$ZEPPELIN_HOME/.spark-dist/spark-2.0.0-bin-hadoop2.7` -> adds this `SPARK_HOME` to `conf/zeppelin-env.sh`
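    The flow in step 2 can be sketched roughly like below (just a sketch — the function and helper names such as `ensure_spark_home` and `download_and_untar` are illustrative, not taken verbatim from `bin/download-spark.sh`):
    
    ```shell
    # Sketch of the download-and-persist flow; names are illustrative.
    SPARK_DIST="spark-2.0.0-bin-hadoop2.7"
    
    # ensure_spark_home <zeppelin_home>: make sure the Spark dist exists under
    # <zeppelin_home>/.spark-dist/, persist SPARK_HOME in conf/zeppelin-env.sh,
    # and print the resulting SPARK_HOME path.
    ensure_spark_home() {
      zeppelin_home="$1"
      cache_dir="${zeppelin_home}/.spark-dist"
      mkdir -p "${cache_dir}"
      if [ ! -d "${cache_dir}/${SPARK_DIST}" ]; then
        # The real script fetches the archive from an Apache mirror and untars it
        download_and_untar "${cache_dir}" "${SPARK_DIST}"
      fi
      # Persist the setting so later starts skip the download
      echo "export SPARK_HOME=\"${cache_dir}/${SPARK_DIST}\"" \
        >> "${zeppelin_home}/conf/zeppelin-env.sh"
      echo "${cache_dir}/${SPARK_DIST}"
    }
    ```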
    
    With this new mechanism, we not only reduce the overall Zeppelin binary package size, but users also no longer need to type complicated build profiles when they build Zeppelin from source.
    
    ### What type of PR is it?
    Improvement
    
    ### Todos
    * [ ] - update [README.md](https://github.com/apache/zeppelin/blob/master/README.md)
    * [ ] - add `download-spark.cmd` for Windows users
    
    ### What is the Jira issue?
    See [ZEPPELIN-1332](https://issues.apache.org/jira/browse/ZEPPELIN-1332)'s description for details about **why we need to remove spark-dependencies** & **the new suggestion for Zeppelin's embedded Spark binary**.
    
    
    ### How should this be tested?
    After applying this patch, build with `mvn clean package -DskipTests`. Please note that you need to verify that `spark-dependencies` has actually been removed.
     - Without prespecified `SPARK_HOME` 
      1. Start Zeppelin daemon
      <img width="975" alt="screen shot 2016-08-18 at 11 20 27 am" src="https://cloud.githubusercontent.com/assets/10060731/17759836/e3c16022-6535-11e6-8576-43975c3293c3.png">
      2. Check `conf/zeppelin-env.sh` line 46. `SPARK_HOME` will be set like below 
      ```
      export SPARK_HOME="/YOUR_ZEPPELIN_HOME/.spark-dist/spark-2.0.0-bin-hadoop2.7"
      ```
      3. Go to the Zeppelin web UI and run `sc.version` with the Spark interpreter & `echo $SPARK_HOME` with the sh interpreter.
      <img width="1030" alt="screen shot 2016-08-18 at 11 26 21 am" src="https://cloud.githubusercontent.com/assets/10060731/17759937/a7bcc584-6536-11e6-9664-cffdc6e5bdf8.png">
    
     - With prespecified `SPARK_HOME`
    Nothing happens. Zeppelin starts as before.
     
    ### Screenshots (if appropriate)
    
    ### Questions:
    * Do the license files need to be updated? No.
    * Are there breaking changes for older versions? No.
    * Does this need documentation? Yes, we need to update [README.md](https://github.com/apache/zeppelin/blob/master/README.md).


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/AhyoungRyu/zeppelin ZEPPELIN-1332

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/1339.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1339
    
----
commit ae74e90f8409b7396eeebf34c103a6db071b1771
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-16T15:08:19Z

    Fix typo comment in interpreter.sh

commit ada6f37d1df60f37740d63c913cdd89f7b919269
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-17T01:52:06Z

    Remove spark-dependencies

commit 87b929d7d38e447306796cec44b35cb7317b9bb3
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-17T07:14:35Z

    Add spark-2.*-bin-hadoop* to .gitignore

commit 5703fbf27fedda9ec7dd142e275b8654c9bc6296
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-17T15:22:25Z

    Add download-spark.sh file

commit 35350bb9990436cd7ede1e611f0b94a56ed24793
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-17T15:28:51Z

    Remove useless comment line in common.sh

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    @Leemoonsoo Rebased, but there is an issue raised from #1564, as below:
    
    ```
    16/11/05 06:04:34 ERROR PySparkInterpreter: Error
    java.util.NoSuchElementException: spark.submit.pyFiles
    	at org.apache.spark.SparkConf$$anonfun$get$1.apply(SparkConf.scala:235)
    	at org.apache.spark.SparkConf$$anonfun$get$1.apply(SparkConf.scala:235)
    	at scala.Option.getOrElse(Option.scala:121)
    	at org.apache.spark.SparkConf.get(SparkConf.scala:235)
    	at org.apache.zeppelin.spark.PySparkInterpreter.setupPySparkEnv(PySparkInterpreter.java:172)
    	at org.apache.zeppelin.spark.PySparkInterpreter.createGatewayServerAndStartScript(PySparkInterpreter.java:209)
    	at org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:159)
    	at org.apache.zeppelin.spark.PySparkInterpreterTest.setUp(PySparkInterpreterTest.java:98)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:606)
    	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
    	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
    	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
    	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
    	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
    	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
    	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
    Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 15.942 sec <<< FAILURE! - in org.apache.zeppelin.spark.PySparkInterpreterTest
    ```
    
    It seems #1564 and this change conflict due to removing `spark-dependency`. Let me fix this first.


---

[GitHub] zeppelin issue #1339: [WIP][ZEPPELIN-1332] Remove spark-dependencies & sugge...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    @bzz Thank you for such a precise comment! Let me break down your feedback one by one (just to make it clear) :)
    
    1.
    >/.spark-dist/ is under cache on TravisCI which is S3 bucket that gets synced automatically with the content of this folder while running a build. 
    
    Right, that's my bad. I'll change the dir to another one. Then how about `ZEPPELIN_HOME/interpreter/spark/`, as before?
    
    2, 3, 4.
    >what is the benefit and what problem does this change solves?
    
    Actually, I tried to describe the current problem & the advantages of this change in the Jira issue and the PR description, but I guess I didn't do it clearly enough. Let me explain more here with actual numbers. (I'll update the Jira & PR description as well)
    
     - **What was the problem?**
    
    As you said above, yes, the main problem is the Zeppelin binary package size. The latest Zeppelin bin package sizes were
    ```
    zeppelin-0.6.1-bin-all.tgz: 517MB
    zeppelin-0.6.1-bin-netinst.tgz: 236MB
    ```
    Didn't we have to ask the ASF infra team every release because of Zeppelin's huge package size?
    
     - **What is the benefit?**
    
    When I created binary packages without `spark-dependencies`, each bin package size was
    ```
    zeppelin-0.6.1-bin-all.tgz: 344MB
    zeppelin-0.6.1-bin-netinst.tgz: 64MB
    ```
    As you can see above, the size difference between the two cases is about `170MB`! Moreover, users don't need to type build profiles, i.e. `-Pr` or `-Psparkr`. I've seen many users who try to use `%sparkr` in Zeppelin hit an NPE because they didn't build with `-Psparkr`. It's truly confusing, since they may not know the Maven build mechanism well. But with this change, they don't need to know about the complicated Maven build profiles.
    
    5.
    > Also regarding user experience - while running zeppelin-demon.sh user does not usually expect it to be network-dependant and download 100Mb archives - is there at least a user notification\progress indicator
    
    So far, I just added the line below, which is shown in the console after users start `zeppelin-daemon.sh`:
    ```
    echo "There is no SPARK_HOME in your system. After successful Spark bin installation, Zeppelin will be started."
    ``` 
    Then it starts downloading the Spark binary from the mirror site. I'm planning to add some description to the README, as we have provided a lot of build profile information there. I also agree there must be a better way to notify users than just writing "We will download a 100MB Spark binary package if you don't set SPARK_HOME" in the README.
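    One way to make the download opt-in rather than silent could be an explicit confirmation prompt before fetching anything, e.g. (a sketch only — the wording and the function name `confirm_spark_download` are illustrative, not the script's actual text):
    
    ```shell
    # Sketch: ask before a large download instead of starting it silently.
    confirm_spark_download() {
      printf "SPARK_HOME is not set. Zeppelin can download Spark (~180MB) now. Continue? [y/N] "
      read -r answer
      case "${answer}" in
        y|Y|yes|YES) return 0 ;;
        *) return 1 ;;  # default to "no" on anything else
      esac
    }
    ```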
    
    After I first came up with removing `spark-dependencies` to reduce the Zeppelin bin package size, I spent a long time thinking about how we can seamlessly replace the preexisting way of providing embedded Spark in Zeppelin. Please regard this PR as a first initiative. I would appreciate it if you could share your ideas about this issue! :)
    
    
    



---

[GitHub] zeppelin pull request #1339: [ZEPPELIN-1332] Remove spark-dependencies & sug...

Posted by AhyoungRyu <gi...@git.apache.org>.
GitHub user AhyoungRyu reopened a pull request:

    https://github.com/apache/zeppelin/pull/1339

    [ZEPPELIN-1332] Remove spark-dependencies & suggest new way

    ### What is this PR for?
    Currently, Zeppelin's embedded Spark is located under `interpreter/spark/`. 
    For users who **build Zeppelin from source**, this Spark is downloaded when they build the source with [build profiles](https://github.com/apache/zeppelin#spark-interpreter). These build profiles are useful for customizing the embedded Spark, but many Spark users use their own Spark rather than Zeppelin's embedded one. Nowadays, mostly Spark & Zeppelin beginners use the embedded Spark, and for them there are too many build profiles (it's quite complicated, I think). 
    In the case of the **Zeppelin binary package**, the embedded Spark is included by default under `interpreter/spark/`. That's why the Zeppelin package size is so huge.
    
    #### New suggestions
    This PR changes the embedded Spark binary downloading mechanism as follows.
    
    1. Run `./bin/zeppelin-daemon.sh get-spark` or `./bin/zeppelin.sh get-spark`.
    2. It creates `ZEPPELIN_HOME/local-spark/`, downloads `spark-2.0.1-bin-hadoop2.7.tgz`, and untars it.
    3. We can then use this local Spark without any configuration (e.g. setting `SPARK_HOME`), just as before.
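    The `get-spark` subcommand could hook into the daemon script's dispatch roughly like this (a sketch under assumptions — the case labels and the `get_spark` helper name are illustrative, and the real `zeppelin-daemon.sh` may differ):
    
    ```shell
    # Sketch of the get-spark subcommand; names are illustrative.
    SPARK_DIST="spark-2.0.1-bin-hadoop2.7"
    
    get_spark() {
      dest="${ZEPPELIN_HOME:-.}/local-spark"
      if [ -d "${dest}/${SPARK_DIST}" ]; then
        echo "${SPARK_DIST} already exists under local-spark."
        return 0
      fi
      mkdir -p "${dest}"
      echo "Download ${SPARK_DIST}.tgz from mirror ..."
      # The real script would curl the archive from an Apache mirror
      # and untar it into ${dest} at this point.
    }
    
    case "${1:-}" in
      get-spark) get_spark ;;
    esac
    ```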
    
    ### What type of PR is it?
    Improvement
    
    ### Todos
    - [x] - trap `ctrl+c` & `ctrl+z` key interruptions while downloading Spark
    - [x] - test on different OSes
    - [x] - update related document pages again after getting feedback
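    The first Todo (trapping `ctrl+c` / `ctrl+z` during the download) can be done with a shell trap along these lines (a sketch — the handler name and archive path are illustrative, not taken from the actual script):
    
    ```shell
    # Sketch: remove a half-downloaded archive if the user interrupts.
    PARTIAL_ARCHIVE="local-spark/spark-2.0.1-bin-hadoop2.7.tgz"
    
    cleanup_partial_download() {
      echo "Download interrupted. Removing the partial archive ..."
      rm -f "${PARTIAL_ARCHIVE}"
      exit 1
    }
    
    # INT covers ctrl+c; TSTP covers ctrl+z
    trap cleanup_partial_download INT TSTP
    ```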
    
    ### What is the Jira issue?
    [ZEPPELIN-1332](https://issues.apache.org/jira/browse/ZEPPELIN-1332)
    
    ### How should this be tested?
    1. `rm -r spark-dependencies`
    2. Apply this patch and build with `mvn clean package -DskipTests`
    3. Try `bin/zeppelin-daemon.sh get-spark` or `bin/zeppelin.sh get-spark`
    4. You should be able to run `sc.version` without setting an external `SPARK_HOME`
    
    ### Screenshots (if appropriate)
    - `./bin/zeppelin-daemon.sh get-spark`
    ```
    $ ./bin/zeppelin-daemon.sh get-spark
    Download spark-2.0.1-bin-hadoop2.7.tgz from mirror ...
    
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100  178M  100  178M    0     0  10.4M      0  0:00:17  0:00:17 --:--:-- 10.2M
    
    spark-2.0.1-bin-hadoop2.7 is successfully downloaded and saved under /Users/ahyoungryu/Dev/zeppelin-development/zeppelin/local-spark
    ```
    - if `ZEPPELIN_HOME/local-spark/spark-2.0.1-bin-hadoop2.7` already exists
    ```
    $ ./bin/zeppelin-daemon.sh get-spark
    spark-2.0.1-bin-hadoop2.7 already exists under local-spark.
    ```
    
    ### Questions:
    - Do the license files need to be updated? No.
    - Are there breaking changes for older versions? No.
    - Does this need documentation? Yes, we need to update some related documents (e.g. README.md, spark.md and perhaps install.md).


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/AhyoungRyu/zeppelin ZEPPELIN-1332

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/1339.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1339
    
----
commit d377cc6f28dd6cae43364f61135ed8abcba3b269
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-16T15:08:19Z

    Fix typo comment in interpreter.sh

commit 4f3edfd87e84e65789e0e937b5330c16442fcfbe
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-17T01:52:06Z

    Remove spark-dependencies

commit 99ef019521ca1fd0fc41958b20da8642773825d5
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-17T07:14:35Z

    Add spark-2.*-bin-hadoop* to .gitignore

commit 4e8d5ff067c5428a5254e45b4de533c56393f7b4
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-17T15:22:25Z

    Add download-spark.sh file

commit 6784015b8da439894dd09bbc3e54477a0f3cba84
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-17T15:28:51Z

    Remove useless comment line in common.sh

commit c866f0b231432b14c092a365d270e81a2222f54a
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-18T03:32:11Z

    Remove zeppelin-spark-dependencies from r/pom.xml

commit 3fe19bff1bdbdccba63e3163bd7aabfe23a35777
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-21T05:38:55Z

    Change SPARK_HOME with proper message

commit 99545233c0e84f48fbf98da25ad131eeba6dd293
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-06T08:55:20Z

    Check interpreter/spark/ instead of SPARK_HOME

commit e6973b3887e9c0d50a1168f26e6f0337f9f78986
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-06T08:55:40Z

    Refactor download-spark.sh

commit 552185ac03f1b5edc9fabb4d381d471c59078903
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-07T07:48:15Z

    Revert: remove spark-dependencies

commit ffe64d9b264ab3db67d28a045e34c9c4d471058a
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-07T13:23:11Z

    Remove useless ZEPPELIN_HOME

commit 5ed33112d64dc3063a29d515d4987e193a909dd0
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-08T05:51:40Z

    Change dir of Spark bin to 'local-spark'

commit 1419f0b8d76a8e15ac7646e3827dd536246038d1
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-08T06:07:20Z

    Set timeout for travis test

commit a813d922ba29b5c392a908c3199050884266b969
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-08T06:16:54Z

    Add license header to download-spark.cmd

commit 368c15aefd650a59c6fb0fdd040efe1bbb2618cc
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-08T11:48:43Z

    Fix wrong check condition in common.sh

commit e58075d046f65ae173fecc31c0b648b87f445af4
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-08T13:14:29Z

    Add travis condition to download-spark.sh

commit 89be91b049a646b1a0fc7dcfeb5e8bfde68bdab4
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-12T05:42:29Z

    Remove bin/download-spark.cmd again

commit b22364ddba120842933e96eca1e082680cd5407a
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-12T16:25:31Z

    Remove spark-dependency profiles & reorganize some titles in README.md

commit 24dc95faa39586be323365f21a2beb1f683becf8
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-12T18:30:41Z

    Update spark.md to add a guide for local-spark mode

commit 2537fa14d5e13c34be9eeab932bf5dc853bda5d4
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-12T18:49:49Z

    Remove '-Ppyspark' build options

commit ca534e596c36ced04f832b0a7ab7e78e951929e1
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-13T08:09:18Z

    Remove useless creating .bak file process

commit edd525d0f6eac0a956bc64f58e77ac3afc423f58
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-13T11:21:10Z

    Update install.md & spark.md

commit a9b110a809463ac1795e76a30b9cd2df6c40292d
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-14T09:35:37Z

    Resolve 'sed' command issue between OSX & Linux

commit f383d3afb8f9e2c1e240f69d8d970c469d0a9ced
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-14T11:20:31Z

    Trap ctrl+c during downloading Spark

commit 527ef5b6518d3477d9731422cad190a59df11d1e
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-14T11:26:56Z

    Remove useless condition

commit 555372a655b788b3b0fdd85d430b6f063ce13834
Author: AhyoungRyu <ah...@apache.org>
Date:   2016-09-20T17:05:16Z

    Make local spark mode with zero-configuration as @moon suggested

commit de87cb2adf5ad510a712e4f696ae127c7a414077
Author: AhyoungRyu <ah...@apache.org>
Date:   2016-09-22T14:20:31Z

    Modify SparkRInterpreter.java to enable SparkR without SPARK_HOME

commit 1dd51d8e1dcb8d65e22a1cc67a5d089c5d7c196b
Author: AhyoungRyu <ah...@apache.org>
Date:   2016-09-22T17:01:40Z

    Remove duplicated variable declaration

commit f068bef554507e7125865f77816986d5b085a7b3
Author: AhyoungRyu <ah...@apache.org>
Date:   2016-09-22T17:02:01Z

    Update related docs again

commit 437f2063a39d2a7a583bb647cb885e51a0990098
Author: AhyoungRyu <ah...@apache.org>
Date:   2016-09-23T05:37:57Z

    Fix typo in SparkRInterpreter.java

----


---

[GitHub] zeppelin pull request #1339: [ZEPPELIN-1332] Remove spark-dependencies & sug...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu closed the pull request at:

    https://github.com/apache/zeppelin/pull/1339


---

[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    ping 👯


---

[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    @bzz @Leemoonsoo 
    Sorry for my late response. I spent some time testing various cases on different OSes.
    I think it's ready for review (CI is green at last!).
    I updated the PR description accordingly. I hope it helps remind you of the purpose of this PR :)
    
    Here is the list of changes since my initial commits:
    
     - Directory of Spark bin
    I changed the dir for the Spark bin from `interpreter/spark/` to `local-spark/`. Since `mvn clean` removes `interpreter/`, users would see "Do you want to download local Spark?" whenever they re-build and restart Zeppelin. So I think creating a new dir (`local-spark`) is better in this case.
    
     - Supporting Windows users
    I wanted to create `download-spark.cmd` for Windows, but sadly we can't use many shell commands such as `curl`, `tar` and `sed` in a batch script, and it was hard to find 100% compatible equivalents for Windows. Maybe we could guide Windows users to install those commands themselves, but I think that's overdoing it. Actually, the `download-spark` script only downloads the latest version of Spark and sets `SPARK_HOME`, so I updated some docs to explain this as an alternative for Windows users.
    
     - Documentations
    I think it's quite a big change that people need to enter "Yes/No" when they start Zeppelin, even though it's only one time. So I updated `README.md`, `install.md` and `spark.md`. The screenshot below is from `spark.md`.
    ![screen shot 2016-09-13 at 8 22 34 pm](https://cloud.githubusercontent.com/assets/10060731/18516345/8b0820ac-7ad3-11e6-8581-11e3cb12c57a.png)
    
    
    And as @bzz said before, 
    >How about EMR\Dataproc\Juju\BigTop users, will the proposed change affect them? 
    
    Do we need to provide this local Spark mode for them? That's actually my question.. :D
    And please feel free to point out anything else that's needed.


---

[GitHub] zeppelin pull request #1339: [ZEPPELIN-1332] Remove spark-dependencies & sug...

Posted by AhyoungRyu <gi...@git.apache.org>.
GitHub user AhyoungRyu reopened a pull request:

    https://github.com/apache/zeppelin/pull/1339

    [ZEPPELIN-1332] Remove spark-dependencies & suggest new way

    ### What is this PR for?
    Currently, Zeppelin's embedded Spark is located under `interpreter/spark/`. 
    For users who **build Zeppelin from source**, this Spark is downloaded when they build the source with [build profiles](https://github.com/apache/zeppelin#spark-interpreter). These build profiles are useful for customizing the embedded Spark, but many Spark users use their own Spark rather than Zeppelin's embedded one. Nowadays, mostly Spark & Zeppelin beginners use the embedded Spark, and for them there are too many build profiles (it's quite complicated, I think). 
    In the case of the **Zeppelin binary package**, the embedded Spark is included by default under `interpreter/spark/`. That's why the Zeppelin package size is so huge.
    
    #### New suggestions
    This PR changes the embedded Spark binary downloading mechanism as follows.
    
    1. Run `./bin/zeppelin-daemon.sh get-spark` or `./bin/zeppelin.sh get-spark`.
    2. It creates `ZEPPELIN_HOME/local-spark/`, downloads `spark-2.0.1-bin-hadoop2.7.tgz`, and untars it.
    3. We can then use this local Spark without any configuration (e.g. setting `SPARK_HOME`), just as before.
    
    ### What type of PR is it?
    Improvement
    
    ### Todos
    - [x] - trap `ctrl+c` & `ctrl+z` key interruptions while downloading Spark
    - [x] - test on different OSes
    - [x] - update related document pages again after getting feedback
    
    ### What is the Jira issue?
    [ZEPPELIN-1332](https://issues.apache.org/jira/browse/ZEPPELIN-1332)
    
    ### How should this be tested?
    1. `rm -r spark-dependencies`
    2. Apply this patch and build with `mvn clean package -DskipTests`
    3. Try `bin/zeppelin-daemon.sh get-spark` or `bin/zeppelin.sh get-spark`
    
    ### Screenshots (if appropriate)
    - `./bin/zeppelin-daemon.sh get-spark`
    ```
    $ ./bin/zeppelin-daemon.sh get-spark
    Download spark-2.0.1-bin-hadoop2.7.tgz from mirror ...
    
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100  178M  100  178M    0     0  10.4M      0  0:00:17  0:00:17 --:--:-- 10.2M
    
    spark-2.0.1-bin-hadoop2.7 is successfully downloaded and saved under /Users/ahyoungryu/Dev/zeppelin-development/zeppelin/local-spark
    ```
    - if `ZEPPELIN_HOME/local-spark/spark-2.0.1-bin-hadoop2.7` already exists
    ```
    $ ./bin/zeppelin-daemon.sh get-spark
    spark-2.0.1-bin-hadoop2.7 already exists under local-spark.
    ```
    
    ### Questions:
    - Do the license files need to be updated? No.
    - Are there breaking changes for older versions? No.
    - Does this need documentation? Yes, we need to update some related documents (e.g. README.md, spark.md and perhaps install.md).


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/AhyoungRyu/zeppelin ZEPPELIN-1332

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/1339.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1339
    
----
commit cf91a45420ea3047522998238beba274db9a5fca
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-16T15:08:19Z

    Fix typo comment in interpreter.sh

commit 6c08f5207cb2286b6072b3dcd5cc882b4dbca39b
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-17T01:52:06Z

    Remove spark-dependencies

commit a36702f8b35d7ee0d269190fe42ac8a2ff5d5b6e
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-17T07:14:35Z

    Add spark-2.*-bin-hadoop* to .gitignore

commit 31b04f58491502d5b4ea7c1800f7606013a8ae74
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-17T15:22:25Z

    Add download-spark.sh file

commit fd87a09d83000c94ced1b04b4254de9b35e4ccc5
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-17T15:28:51Z

    Remove useless comment line in common.sh

commit e0fc280de061f7ee06603d5bc9ab41b5219a749d
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-18T03:32:11Z

    Remove zeppelin-spark-dependencies from r/pom.xml

commit bf06931b988aee4d9dfc3c173cac18a740666e36
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-21T05:38:55Z

    Change SPARK_HOME with proper message

commit dceb74fff19eac2071eed0d661c0571eceeada54
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-06T08:55:20Z

    Check interpreter/spark/ instead of SPARK_HOME

commit e2a078ab87deba8cdf4a99f6c3642e3d4b41f3d8
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-06T08:55:40Z

    Refactor download-spark.sh

commit 3c792d07c6d6b55896ae5b0e3e2b0d08f70fafb1
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-07T07:48:15Z

    Revert: remove spark-dependencies

commit 1071566f442b9cf01c7145fd9dcbb48eb343f81a
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-07T13:23:11Z

    Remove useless ZEPPELIN_HOME

commit 0c7e1b73299634f9fc5c579c54d4cff49449f910
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-08T05:51:40Z

    Change dir of Spark bin to 'local-spark'

commit 787cec50ce796ddae9a1302e1ce376b2f3e5c5be
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-08T06:07:20Z

    Set timeout for travis test

commit b5fc541a96d17db513dcd7d5c1ec5671e85733f0
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-08T06:16:54Z

    Add license header to download-spark.cmd

commit c4d39f1df4dfeed1ad8544fe75621dc1aac693da
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-08T11:48:43Z

    Fix wrong check condition in common.sh

commit 5c631477133253506490744abb54a0582a066f6c
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-08T13:14:29Z

    Add travis condition to download-spark.sh

commit e91e7f83da0c44d91f53ae94a9d2f7f8117b86ae
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-12T05:42:29Z

    Remove bin/download-spark.cmd again

commit f40fd2f13071647e812ac54278b1fbff87b808e7
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-12T16:25:31Z

    Remove spark-dependency profiles & reorganize some titles in README.md

commit 31ebd191203139ee6f6bd794375c64c4f66cd28a
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-12T18:30:41Z

    Update spark.md to add a guide for local-spark mode

commit 803f21cbfff07deaa7cd5d5be1e423b4db4802c7
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-12T18:49:49Z

    Remove '-Ppyspark' build options

commit d5882554562e0244cb063186630b9e952fdf1c1c
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-13T08:09:18Z

    Remove useless creating .bak file process

commit b7a91453255cced389bc639e27dc8b2232afd19f
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-13T11:21:10Z

    Update install.md & spark.md

commit 63f29e91c3bd22df44aae91929468ea6a9516474
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-14T09:35:37Z

    Resolve 'sed' command issue between OSX & Linux

commit 6e329a7b832cf9e526b72aa5e3eb32ab697ebfd7
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-14T11:20:31Z

    Trap ctrl+c during downloading Spark

commit 1205f2d67f8116353f30128b367cebe2d35fd344
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-14T11:26:56Z

    Remove useless condition

commit ff069af7023132ea4478ad38ea1164176b753ab9
Author: AhyoungRyu <ah...@apache.org>
Date:   2016-09-20T17:05:16Z

    Make local spark mode with zero-configuration as @moon suggested

commit c818cf766a60ef432d5310a664aeded7d9a58ab3
Author: AhyoungRyu <ah...@apache.org>
Date:   2016-09-22T06:47:05Z

    Put 'autodetect HADOOP_CONF_HOME by heuristic' back code blocks

commit b2dca36e25b03ac56a9ee221c1dff2d1ed105c95
Author: AhyoungRyu <ah...@apache.org>
Date:   2016-09-22T14:20:31Z

    Modify SparkRInterpreter.java to enable SparkR without SPARK_HOME

commit 310d607564e156b85a687e37c2c1d14d00ad1348
Author: AhyoungRyu <ah...@apache.org>
Date:   2016-09-22T17:01:40Z

    Remove duplicated variable declaration

commit 1ee4325aea1f761765018e15396860e9ca2bc538
Author: AhyoungRyu <ah...@apache.org>
Date:   2016-09-22T17:02:01Z

    Update related docs again

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    All tests pass (except for the Selenium test) in my own Travis [AhyoungRyu/zeppelin/builds](https://travis-ci.org/AhyoungRyu/zeppelin/builds/174094481), but Zeppelin's Travis build hasn't even started... [apache/zeppelin/build](https://travis-ci.org/apache/zeppelin/builds/174094499). 



[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by astroshim <gi...@git.apache.org>.
Github user astroshim commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    PySparkInterpreter needs a [python library for pyspark](https://github.com/apache/zeppelin/blob/master/spark/src/main/java/org/apache/zeppelin/spark/SparkInterpreter.java#L483), so I think we need [pyspark](https://github.com/apache/zeppelin/blob/1cde24665180e8f10651012f53d7bcb58ea2eb44/spark-dependencies/pom.xml#L819). 
    What do you think?




[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    ### To whoever is concerned about breaking the current UX with this change
    
    This change has many benefits compared to the current embedded Spark, as I wrote in the PR description (and as @tae-jun mentioned in [this comment](https://github.com/apache/zeppelin/pull/1339#issuecomment-259486249) as well. Thanks!).
    But as always, this kind of big change brings downsides too (e.g. breaking the current UX). So I want to write down how we can address some major cases, as below. I think it would be better to share my opinion and get more feedback before merging. :)
    
    1. New Spark/Zeppelin user, running Zeppelin for the first time
     : Quite easy to cover, and already handled by updating the related docs pages, I guess.
    
    2. Existing Spark/Zeppelin user, running a new Zeppelin installation (e.g. upgrading the version)
     : This case is definitely harder to handle than 1. The user already expects that local mode will **just work**, and surely they won't read the docs. To resolve this, I'll update `bin/download-spark.sh` to print something like "You don't have local-spark/; you can download the embedded Spark with the `get-spark` option." when the user runs `./bin/zeppelin-daemon.sh start`. This message can be removed in the future, once Zeppelin users have become accustomed to the `get-spark` option.
    
    3. Docker user, starting `bin/zeppelin.sh` inside the container
    : This one can also be hard to handle, because the user might assume that Spark just works. So I would suggest applying this change to #1538 as a first step, since it can be a Zeppelin-provided official Docker script.
    
    4. CI issue
    Since @bzz raised some concerns about the CI, let me answer again here to make sure. :)
    The reason I removed `-Ppyspark` in `.travis` is that the `pyspark` profile only exists in `spark-dependencies/pom.xml`, so the `pyspark` profile won't exist anymore after this PR is merged. The PySpark test case that @astroshim added recently did conflict with this change, but we solved it by simply adding `export SPARK_HOME=$(pwd)/spark-$SPARK_VER-bin-hadoop$HADOOP_VER` to `.travis.yml` so that Travis sets it before running the script. So there are no more CI issues concerning the removal of the `spark-dependencies`-related build profiles.
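    Item 2 above could be sketched as a small startup check (a hypothetical illustration; the function name and the `local-spark/` location are my assumptions, not the actual contents of the patch):
    
    ```sh
    # Hypothetical startup check: warn when neither SPARK_HOME nor the
    # downloaded local-spark/ directory is available.
    check_local_spark() {
      ZEPPELIN_HOME="${ZEPPELIN_HOME:-$(pwd)}"
      if [ -z "${SPARK_HOME:-}" ] && [ ! -d "${ZEPPELIN_HOME}/local-spark" ]; then
        echo "You don't have local-spark/. You can download the embedded Spark with:"
        echo "  ./bin/zeppelin-daemon.sh get-spark"
      fi
    }
    check_local_spark
    ```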



[GitHub] zeppelin issue #1339: [WIP][ZEPPELIN-1332] Remove spark-dependencies & sugge...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    @AhyoungRyu great initiative, but while making these changes you also have to think about the CI use case of the Zeppelin build.
    
    I.e. so far `/.spark-dist/` is under cache on TravisCI, which is an S3 bucket that gets synced automatically with the content of this folder while running a build. If you un-tar the whole archive there, it will take forever to sync with S3 and will defeat the purpose of the cache on the CI side, making build times longer.
    
    If you ask me - I would say that before making such big changes as refactoring the build structure, we all need a very clear understanding and explanation of `what is the benefit` and what problem this change solves.
    
    So far I have not understood the answer to the questions above from the PR description (may be my fault). But, in case of voting for such a change, that will make me at least `-0` for it, if not `-1`, due to the potential bugs that such changes will bring.
    
    If it is reduction of the convenience binary size - then we need to know how much the size changes with the proposed changes, to understand whether that is worth it. If it impacts CI build times - we also need to know how much.
    
    Also, regarding user experience - while running `zeppelin-daemon.sh` a user does not usually expect it to be network-dependent and to download 100 MB archives - is there at least a user notification/progress indicator? Otherwise there are going to be bug reports like "Zeppelin is not starting" as soon as such a change is introduced. 
    And how about Windows users of Zeppelin? How about EMR/Dataproc/Juju/BigTop users - will the proposed change affect them?
    
    Please take it with a grain of salt, and of course I will be happy to help address each item one by one.



[GitHub] zeppelin issue #1339: [WIP][ZEPPELIN-1332] Remove spark-dependencies & sugge...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    Thank you for the kind explanation and feedback. I think your proposal and implementation, with the recent updates, make perfect sense.
    
    Please keep up the good work and ping me back for the final review once you think it's ready!



[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by tae-jun <gi...@git.apache.org>.
Github user tae-jun commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    Tested and worked as expected 😄
    
    Fantastic work! Also, there are other benefits brought by this change.
    
    Before this change, users couldn't use the `SPARK_SUBMIT_OPTIONS` env variable with the embedded Spark. (Am I right?)
    
    But now it's possible! I tested with `export SPARK_SUBMIT_OPTIONS="--driver-memory 4G"`, and on the Spark UI I could confirm it works.
    
    ![image](https://cloud.githubusercontent.com/assets/8201019/20149088/128d4bc4-a6f3-11e6-9990-705040e04a59.png)
    
    Therefore, I think it would be better to update `conf/zeppelin-env.sh`. There is a comment which reads:
    
    ```sh
    ## Use provided spark installation ##
    ## defining SPARK_HOME makes Zeppelin run spark interpreter process using spark-submit
    ##
    # export SPARK_HOME                             # (required) When it is defined, load it instead of Zeppelin embedded Spark libraries
    # export SPARK_SUBMIT_OPTIONS                   # (optional) extra options to pass to spark submit. eg) "--driver-memory 512M --executor-memory 1G".
    # export SPARK_APP_NAME                         # (optional) The name of spark application.
    
    ## Use embedded spark binaries ##
    ## without SPARK_HOME defined, Zeppelin still able to run spark interpreter process using embedded spark binaries.
    ## however, it is not encouraged when you can define SPARK_HOME
    ##
    ```
    
    This should be updated properly. In my opinion, we don't need to encourage using an external Spark anymore :-)
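    After this change, the comment block above could hypothetically be reworded like this (a draft for discussion, not the actual patched file):
    
    ```sh
    ## Spark configuration ##
    ## If SPARK_HOME is defined, Zeppelin runs the Spark interpreter process via
    ## spark-submit using that installation. Otherwise the local Spark downloaded
    ## with `bin/zeppelin-daemon.sh get-spark` is used with zero configuration.
    ##
    # export SPARK_HOME                             # (optional) External Spark installation to use
    # export SPARK_SUBMIT_OPTIONS                   # (optional) Extra options to pass to spark-submit. eg) "--driver-memory 512M --executor-memory 1G".
    # export SPARK_APP_NAME                         # (optional) The name of the spark application.
    ```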
    
    And, is it possible to use the embedded Spark without `get-spark`? If not, I think it should be stated clearly in the README.
    
    LGTM 👍



[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    I'm closing this PR since there'll be a better solution for this (e.g. [ZEPPELIN-1993](https://issues.apache.org/jira/browse/ZEPPELIN-1993)) :)



[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    @astroshim It passed at last!! Thanks again. 
    
    Will update the related docs if there are no further discussions about these changes :) 



[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    @bzz Yeah, I also wanted to get more feedback on this change, since it's a huge change as you said. Thanks for asking, and I'm happy to explain again :)
    
    > **1.** Is the comment above how it works now? Meaning, on the first run of `./bin/zeppelin-daemon.sh` or `./bin/zeppelin.sh`, does a download of Apache Spark (100+ MB) happen without asking the user?
    
    At first, I intended to ask something like "Do you want to download a local Spark?" when the user starts the Zeppelin daemon. But there are lots of things to think about, since this question would be added before the Zeppelin server starts. E.g. [some people are using Zeppelin as a startup service](https://github.com/apache/zeppelin/pull/1339#issuecomment-250672904) with their own scripts, as @jongyoul said. This kind of interactive mode would interfere with their environments. 
    So I decided to download this local Spark with `./bin/zeppelin-daemon.sh get-spark` or `./bin/zeppelin.sh get-spark`. With the `get-spark` option, users aren't prompted, and they can choose whether or not to download this local-mode Spark. They can also use this local Spark without any configuration, aka `zero configuration`. But we need to notify them of the existence of the `get-spark` option. That's why I updated the documentation pages to let them know. 
    
    > **2.** Does this also mean that on CI it will happen on every run of the Selenium tests as well?
    
    This change won't affect the CI build. I added `./bin/download-spark.sh` to download Spark only when the user runs `./bin/zeppelin-daemon.sh get-spark`.  
    
    > **3.** -Ppyspark disappeared, but I remember it was added because we need to re-pack some files from Apache Spark to incorporate them in Zeppelin build in order for it to work on a cluster. Is it not the case any more? For Spark standalone and YARN, etc
    
    The `pyspark` profile only exists in `spark-dependencies` (please see [here](https://github.com/apache/zeppelin/blob/master/spark-dependencies/pom.xml#L820)). Since `spark-dependencies` won't exist anymore, `-Ppyspark` needs to be removed accordingly, I guess. 




[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    Guys, great work here simplifying the build! 
    
    A quick question, @AhyoungRyu, as it's kind of a big change, and I'm sorry if this was explained before, but could you please recap:
    
    ```
    AhyoungRyu commented on Sep 17
    @Leemoonsoo Thanks for your quick feedback! 
    The "zero configuration like before" makes sense. Let me update and will ping you again.
    ```
    
     1. Is the comment above how it works now? Meaning, on the first run of `./bin/zeppelin-daemon.sh` or `./bin/zeppelin.sh`, does a download of Apache Spark (100+ MB) happen without asking the user?
     2. Does this also mean that on CI it will happen on every run of the Selenium tests as well?
     3. `-Ppyspark` disappeared, but I remember it was added because we needed to re-pack some files from Apache Spark to incorporate them into the Zeppelin build in order for it to work on a cluster. Is that no longer the case? For Spark standalone, YARN, etc.
    
    Thanks in advance!



[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    ping 💃 



[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    @jongyoul Thanks. Yeah, I just wanted to get feedback about the change before updating the docs. 
    @Leemoonsoo What do you think? 



[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    I think [ZEPPELIN-1101](https://issues.apache.org/jira/browse/ZEPPELIN-1101) can also be resolved by this change. 
    
    >It looks related to ZEPPELIN-1099 which is about removing dependencies from Spark. I think we don't need to build spark-dependencies by ourselves. we'd better support script to download spark binary and set SPARK_HOME. How about it?
    
    @jongyoul As you replied above in ZEPPELIN-1101, could you please take a look at this one? :)
    




[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    @AhyoungRyu Could you rebase and see if CI test goes green?



[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    @tae-jun Appreciate your nice feedback! Will update `zeppelin-env.sh` and `install.md` again instead of `README.md`, as you suggested (since #1615 is trying to make the README simpler and deliver only the key content). 



[GitHub] zeppelin pull request #1339: [ZEPPELIN-1332] Remove spark-dependencies & sug...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu closed the pull request at:

    https://github.com/apache/zeppelin/pull/1339



[GitHub] zeppelin pull request #1339: [ZEPPELIN-1332] Remove spark-dependencies & sug...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu closed the pull request at:

    https://github.com/apache/zeppelin/pull/1339



[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    Thank you @AhyoungRyu for great job and taking care in addressing the [user experience concerns](https://github.com/apache/zeppelin/pull/1339#issuecomment-259683752)!
    




[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by jongyoul <gi...@git.apache.org>.
Github user jongyoul commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    @AhyoungRyu Basically, it's a simple solution, but you need to update the docs. Except for that, LGTM



[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    @bzz Just updated `upgrade.md` per your feedback.
    @1ambda Sure. Thanks! Please do :)



[GitHub] zeppelin pull request #1339: [ZEPPELIN-1332] Remove spark-dependencies & sug...

Posted by AhyoungRyu <gi...@git.apache.org>.
GitHub user AhyoungRyu reopened a pull request:

    https://github.com/apache/zeppelin/pull/1339

    [ZEPPELIN-1332] Remove spark-dependencies & suggest new way

    ### What is this PR for?
    Currently, Zeppelin's embedded Spark is located under `interpreter/spark/`. 
    For those who **build Zeppelin from source**, this Spark is downloaded when they build the source with [build profiles](https://github.com/apache/zeppelin#spark-interpreter). These various build profiles are useful for customizing the embedded Spark, but many Spark users use their own Spark, not Zeppelin's embedded one. Nowadays, only Spark & Zeppelin beginners use this embedded Spark, and for them there are too many build profiles (it's quite complicated, I think). 
    In the case of the **Zeppelin binary package**, the embedded Spark is included by default under `interpreter/spark/`. That's why the Zeppelin package size is so huge. 
    
    #### New suggestions
    This PR changes the embedded Spark binary downloading mechanism as below.
    
    1. Run `./bin/zeppelin-daemon.sh get-spark` or `./bin/zeppelin.sh get-spark`
    2. It creates `ZEPPELIN_HOME/local-spark/`, downloads `spark-2.0.1-bin-hadoop2.7.tgz`, and untars it
    3. We can use this local Spark without any configuration (e.g. setting `SPARK_HOME`), like before
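    The three steps above could be sketched roughly like this (a simplified illustration; the real `bin/download-spark.sh` in this PR may differ, and the mirror URL here is an assumption):
    
    ```sh
    # Simplified sketch of the `get-spark` flow: skip if already present,
    # otherwise download the Spark tarball into local-spark/ and untar it.
    get_spark() {
      ZEPPELIN_HOME="${ZEPPELIN_HOME:-$(pwd)}"
      SPARK_DIST="spark-2.0.1-bin-hadoop2.7"
      LOCAL_SPARK="${ZEPPELIN_HOME}/local-spark"
    
      if [ -d "${LOCAL_SPARK}/${SPARK_DIST}" ]; then
        echo "${SPARK_DIST} already exists under local-spark."
        return 0
      fi
    
      mkdir -p "${LOCAL_SPARK}"
      echo "Download ${SPARK_DIST}.tgz from mirror ..."
      curl -fL "https://archive.apache.org/dist/spark/spark-2.0.1/${SPARK_DIST}.tgz" \
        -o "${LOCAL_SPARK}/${SPARK_DIST}.tgz"
      tar -xzf "${LOCAL_SPARK}/${SPARK_DIST}.tgz" -C "${LOCAL_SPARK}"
      rm -f "${LOCAL_SPARK}/${SPARK_DIST}.tgz"
      echo "${SPARK_DIST} is successfully downloaded and saved under ${LOCAL_SPARK}"
    }
    ```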
    
    ### What type of PR is it?
    Improvement
    
    ### Todos
    - [x] - trap `ctrl+c` & `ctrl+z` key interruptions while downloading Spark
    - [x] - test on different OSes 
    - [x] - update related document pages again after getting feedback
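    The first todo (trapping `ctrl+c` / `ctrl+z`) could look roughly like this (illustrative only; the cleanup path is an assumption):
    
    ```sh
    # Remove a half-downloaded archive if the user interrupts the download.
    TARBALL="${TARBALL:-local-spark/spark-2.0.1-bin-hadoop2.7.tgz}"
    
    cleanup() {
      echo "Download interrupted; removing incomplete ${TARBALL}"
      rm -f "${TARBALL}"
      exit 1
    }
    
    # INT is ctrl+c, TSTP is ctrl+z
    trap cleanup INT TSTP
    ```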
    
    ### What is the Jira issue?
    [ZEPPELIN-1332](https://issues.apache.org/jira/browse/ZEPPELIN-1332)
    
    ### How should this be tested?
    1. `rm -r spark-dependencies` 
    2. Apply this patch and build with `mvn clean package -DskipTests`
    3. Try `bin/zeppelin-daemon.sh get-spark` or `bin/zeppelin.sh get-spark`
    4. You should be able to run `sc.version` without setting an external `SPARK_HOME`
    
    ### Screenshots (if appropriate)
    - `./bin/zeppelin-daemon.sh get-spark`
    ```
    $ ./bin/zeppelin-daemon.sh get-spark
    Download spark-2.0.1-bin-hadoop2.7.tgz from mirror ...
    
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100  178M  100  178M    0     0  10.4M      0  0:00:17  0:00:17 --:--:-- 10.2M
    
    spark-2.0.1-bin-hadoop2.7 is successfully downloaded and saved under /Users/ahyoungryu/Dev/zeppelin-development/zeppelin/local-spark
    ```
    - if `ZEPPELIN_HOME/local-spark/spark-2.0.1-bin-hadoop2.7` already exists
    ```
    $ ./bin/zeppelin-daemon.sh get-spark
    spark-2.0.1-bin-hadoop2.7 already exists under local-spark.
    ```
    
    ### Questions:
    - Do the license files need an update? No
    - Are there breaking changes for older versions? No
    - Does this need documentation? Need to update some related documents (e.g. README.md, spark.md, and install.md?)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/AhyoungRyu/zeppelin ZEPPELIN-1332

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/1339.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1339
    
----
commit d377cc6f28dd6cae43364f61135ed8abcba3b269
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-16T15:08:19Z

    Fix typo comment in interpreter.sh

commit 4f3edfd87e84e65789e0e937b5330c16442fcfbe
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-17T01:52:06Z

    Remove spark-dependencies

commit 99ef019521ca1fd0fc41958b20da8642773825d5
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-17T07:14:35Z

    Add spark-2.*-bin-hadoop* to .gitignore

commit 4e8d5ff067c5428a5254e45b4de533c56393f7b4
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-17T15:22:25Z

    Add download-spark.sh file

commit 6784015b8da439894dd09bbc3e54477a0f3cba84
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-17T15:28:51Z

    Remove useless comment line in common.sh

commit c866f0b231432b14c092a365d270e81a2222f54a
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-18T03:32:11Z

    Remove zeppelin-spark-dependencies from r/pom.xml

commit 3fe19bff1bdbdccba63e3163bd7aabfe23a35777
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-21T05:38:55Z

    Change SPARK_HOME with proper message

commit 99545233c0e84f48fbf98da25ad131eeba6dd293
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-06T08:55:20Z

    Check interpreter/spark/ instead of SPARK_HOME

commit e6973b3887e9c0d50a1168f26e6f0337f9f78986
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-06T08:55:40Z

    Refactor download-spark.sh

commit 552185ac03f1b5edc9fabb4d381d471c59078903
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-07T07:48:15Z

    Revert: remove spark-dependencies

commit ffe64d9b264ab3db67d28a045e34c9c4d471058a
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-07T13:23:11Z

    Remove useless ZEPPELIN_HOME

commit 5ed33112d64dc3063a29d515d4987e193a909dd0
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-08T05:51:40Z

    Change dir of Spark bin to 'local-spark'

commit 1419f0b8d76a8e15ac7646e3827dd536246038d1
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-08T06:07:20Z

    Set timeout for travis test

commit a813d922ba29b5c392a908c3199050884266b969
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-08T06:16:54Z

    Add license header to download-spark.cmd

commit 368c15aefd650a59c6fb0fdd040efe1bbb2618cc
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-08T11:48:43Z

    Fix wrong check condition in common.sh

commit e58075d046f65ae173fecc31c0b648b87f445af4
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-08T13:14:29Z

    Add travis condition to download-spark.sh

commit 89be91b049a646b1a0fc7dcfeb5e8bfde68bdab4
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-12T05:42:29Z

    Remove bin/download-spark.cmd again

commit b22364ddba120842933e96eca1e082680cd5407a
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-12T16:25:31Z

    Remove spark-dependency profiles & reorganize some titles in README.md

commit 24dc95faa39586be323365f21a2beb1f683becf8
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-12T18:30:41Z

    Update spark.md to add a guide for local-spark mode

commit 2537fa14d5e13c34be9eeab932bf5dc853bda5d4
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-12T18:49:49Z

    Remove '-Ppyspark' build options

commit ca534e596c36ced04f832b0a7ab7e78e951929e1
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-13T08:09:18Z

    Remove useless creating .bak file process

commit edd525d0f6eac0a956bc64f58e77ac3afc423f58
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-13T11:21:10Z

    Update install.md & spark.md

commit a9b110a809463ac1795e76a30b9cd2df6c40292d
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-14T09:35:37Z

    Resolve 'sed' command issue between OSX & Linux

commit f383d3afb8f9e2c1e240f69d8d970c469d0a9ced
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-14T11:20:31Z

    Trap ctrl+c during downloading Spark

commit 527ef5b6518d3477d9731422cad190a59df11d1e
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-14T11:26:56Z

    Remove useless condition

commit 555372a655b788b3b0fdd85d430b6f063ce13834
Author: AhyoungRyu <ah...@apache.org>
Date:   2016-09-20T17:05:16Z

    Make local spark mode with zero-configuration as @moon suggested

commit de87cb2adf5ad510a712e4f696ae127c7a414077
Author: AhyoungRyu <ah...@apache.org>
Date:   2016-09-22T14:20:31Z

    Modify SparkRInterpreter.java to enable SparkR without SPARK_HOME

commit 1dd51d8e1dcb8d65e22a1cc67a5d089c5d7c196b
Author: AhyoungRyu <ah...@apache.org>
Date:   2016-09-22T17:01:40Z

    Remove duplicated variable declaration

commit f068bef554507e7125865f77816986d5b085a7b3
Author: AhyoungRyu <ah...@apache.org>
Date:   2016-09-22T17:02:01Z

    Update related docs again

commit 437f2063a39d2a7a583bb647cb885e51a0990098
Author: AhyoungRyu <ah...@apache.org>
Date:   2016-09-23T05:37:57Z

    Fix typo in SparkRInterpreter.java

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by tae-jun <gi...@git.apache.org>.
Github user tae-jun commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    @AhyoungRyu Thanks for taking care of my feedback 😄 


---

[GitHub] zeppelin pull request #1339: [ZEPPELIN-1332] Remove spark-dependencies & sug...

Posted by AhyoungRyu <gi...@git.apache.org>.
GitHub user AhyoungRyu reopened a pull request:

    https://github.com/apache/zeppelin/pull/1339

    [ZEPPELIN-1332] Remove spark-dependencies & suggest new way

    ### What is this PR for?
    
    Currently, Zeppelin's embedded Spark is located under `interpreter/spark/`. 
    For those who **build Zeppelin from source**, this Spark is downloaded when they build with the [build profiles](https://github.com/apache/zeppelin#spark-interpreter). These build profiles are useful for customizing the embedded Spark, but many Spark users use their own Spark rather than Zeppelin's embedded one. Nowadays only Spark & Zeppelin beginners use the embedded Spark, and for them there are too many build profiles (it's quite complicated). 
    In the case of the **Zeppelin binary package**, the embedded Spark is included by default under `interpreter/spark/`, which is why the Zeppelin package is so large. 
    #### New suggestions
    
    This PR changes the embedded Spark binary download mechanism as shown below.
    
    ![flowchart](https://cloud.githubusercontent.com/assets/10060731/18757089/6034ceb0-812d-11e6-9094-768bee257c9c.png)
    
    The following text will be saved in `spark-2.0.0-hadoop2.7/README.txt` if the user answers "N/n": 
    
    ```
    Please note that you answered 'No' when we asked whether you want to download a local Spark binary under ZEPPELIN_HOME/local-spark/.
    
    If you want to use Spark interpreter in Apache Zeppelin, you need to set your own SPARK_HOME.
    
    See http://zeppelin.apache.org/docs/ZEPPELIN_VERSION/interpreter/spark.html#configuration for further details about Spark configuration in Zeppelin.
    
    ```
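    To make the "N/n" branch of the flowchart concrete, here is a minimal sketch of how the answer could be recorded (hypothetical helper and paths, not the actual `download-spark.sh` contents):

    ```
    # Hypothetical sketch: record the user's 'No' answer in a README.txt so
    # they can later find the SPARK_HOME configuration instructions.
    write_skip_marker() {
      dist_dir="$1"
      mkdir -p "${dist_dir}"
      printf '%s\n' \
        "You answered 'No' to downloading a local Spark binary." \
        "To use the Spark interpreter, set your own SPARK_HOME." \
        > "${dist_dir}/README.txt"
    }

    dist="$(mktemp -d)/spark-2.0.0-hadoop2.7"
    write_skip_marker "${dist}"
    ```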
    ### What type of PR is it?
    
    Improvement
    ### Todos
    - [x] - trap `ctrl+c` & `ctrl+z` key interruption during downloading Spark
    - [x] - test in the different OS 
    - [ ] - update related document pages again after getting feedback
    ### What is the Jira issue?
    
    [ZEPPELIN-1332](https://issues.apache.org/jira/browse/ZEPPELIN-1332)
    ### How should this be tested?
    1. `rm -r spark-dependencies` 
    2. Apply this patch and build with `mvn clean package -DskipTests`
    3. Start Zeppelin with `bin/zeppelin-daemon.sh` or `bin/zeppelin.sh`
    ### Screenshots (if appropriate)
    - Without `ZEPPELIN_HOME/local-spark/spark-2.0.0-hadoop2.7`
      - Do you want to download local Spark?  "Yes"
        <img width="939" alt="screen shot 2016-09-23 at 1 33 03 am" src="https://cloud.githubusercontent.com/assets/10060731/18757222/ea30add2-812d-11e6-97e8-b31199b15283.png">
      - "No"
        <img width="953" alt="screen shot 2016-09-23 at 1 34 12 am" src="https://cloud.githubusercontent.com/assets/10060731/18757229/ee6f330a-812d-11e6-84c9-db5fe4d5a35b.png">
    - With `ZEPPELIN_HOME/local-spark/spark-2.0.0-hadoop2.7`
      Nothing happens; Zeppelin starts as before.
    ### Questions:
    - Does the licenses files need update? no
    - Is there breaking changes for older versions? no
    - Does this need documentation? Yes, some related documents need to be updated (e.g. README.md, spark.md and install.md)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/AhyoungRyu/zeppelin ZEPPELIN-1332

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/1339.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1339
    
----
commit aaabb9a274810b9bbc903587c715d2589b8ecc0a
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-16T15:08:19Z

    Fix typo comment in interpreter.sh

commit 9b5e7eacc72c613c1dc66502df6d54f82e51d937
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-17T01:52:06Z

    Remove spark-dependencies

commit cb65e7e5b56dab01412c2cbe8a17e36335f6e4eb
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-17T07:14:35Z

    Add spark-2.*-bin-hadoop* to .gitignore

commit 126a7470c40518f857db85fc5a003bd8ff5d209e
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-17T15:22:25Z

    Add download-spark.sh file

commit 40276e19b3cd394301f0d13869f45c53e0408024
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-17T15:28:51Z

    Remove useless comment line in common.sh

commit 8e827577538fc406ddc1c02aa5f618981fc840b8
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-18T03:32:11Z

    Remove zeppelin-spark-dependencies from r/pom.xml

commit 33b9dce0c4cdf23056eb128a35ba65cbb1021b28
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-21T05:38:55Z

    Change SPARK_HOME with proper message

commit 050877c60a9ee9d320134939decf0b3cd8e9c4a3
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-06T08:55:20Z

    Check interpreter/spark/ instead of SPARK_HOME

commit 7990c5aae5da4f7234ebf26e913f2cf7b434d1fb
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-06T08:55:40Z

    Refactor download-spark.sh

commit db53a9e63edb82d417593f5373a9652dc065fcbd
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-07T07:48:15Z

    Revert: remove spark-dependencies

commit f7c5a23199a289bf3941978860ace489e8dff1fe
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-07T13:23:11Z

    Remove useless ZEPPELIN_HOME

commit 455417450fcfa85c71fb0c3d965ffcaded289f4a
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-08T05:51:40Z

    Change dir of Spark bin to 'local-spark'

commit cc4012eb2a664ac79425cdb0bf6e849ffd87b83b
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-08T06:07:20Z

    Set timeout for travis test

commit f3ab4756b749841ffe034a5c57f3494889ae87f2
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-08T06:16:54Z

    Add license header to download-spark.cmd

commit 7cce923097ed48a0bd4873c4de77f56b656fe44f
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-08T11:48:43Z

    Fix wrong check condition in common.sh

commit ce7766775dd98cf9d7a76984ef5d8cc93977dfef
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-08T13:14:29Z

    Add travis condition to download-spark.sh

commit a5ef077e339b6e45d3d9908ac0102aba3a7f65f1
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-12T05:42:29Z

    Remove bin/download-spark.cmd again

commit 1edd5fb67486c2b51e51f74ce6fa3fb6962abb41
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-12T16:25:31Z

    Remove spark-dependency profiles & reorganize some titles in README.md

commit 132d24b35d47dc384999405f90b62706307dc0c4
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-12T18:30:41Z

    Update spark.md to add a guide for local-spark mode

commit 8e4a256036b8a067114b3b40d6434e6bb478caaa
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-12T18:49:49Z

    Remove '-Ppyspark' build options

commit 117c52d26bcb55dda15b36e416597b6b73803ef9
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-13T08:09:18Z

    Remove useless creating .bak file process

commit 5ba99ea598f091e668fdc3b0f3005a3b2ceb6ea5
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-13T11:21:10Z

    Update install.md & spark.md

commit 075195ddc43bbc5f3798d4ac73e7b5365c0881cb
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-14T09:35:37Z

    Resolve 'sed' command issue between OSX & Linux

commit b4ef1f54730374e269a405a4f7a9b306cbcdbc24
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-14T11:20:31Z

    Trap ctrl+c during downloading Spark

commit b21188b3f4b7c5c6c60aa80bd44c061438462914
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-09-14T11:26:56Z

    Remove useless condition

commit ce1d0c44d4755cd9007443ad0178093b691665cd
Author: AhyoungRyu <ah...@apache.org>
Date:   2016-09-20T17:05:16Z

    Make local spark mode with zero-configuration as @moon suggested

commit 4440554d63ff17ce4e34d9472dc1932a04ca917f
Author: AhyoungRyu <ah...@apache.org>
Date:   2016-09-22T06:47:05Z

    Put 'autodetect HADOOP_CONF_HOME by heuristic' back code blocks

commit fb27690d98616156c0a3059b6c748860a7d64788
Author: AhyoungRyu <ah...@apache.org>
Date:   2016-09-22T14:20:31Z

    Modify SparkRInterpreter.java to enable SparkR without SPARK_HOME

commit fea61077273e62778eef2f1a637a7dd9b3df5f6d
Author: AhyoungRyu <ah...@apache.org>
Date:   2016-09-22T17:01:40Z

    Remove duplicated variable declaration

commit 972fb3e6bab4b41164f70c2abe5b06fc7d09aa6b
Author: AhyoungRyu <ah...@apache.org>
Date:   2016-09-22T17:02:01Z

    Update related docs again

----


---

[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    CI is green now, so ready for review.
    I updated the related docs again based on #1615 and @tae-jun's feedback as well.
    
    @bzz Could you take a look at this again? 
    As I mentioned in [this comment](https://github.com/apache/zeppelin/pull/1339#issuecomment-259683752), I added the message `You do not have neither local-spark, nor external SPARK_HOME set up.\nIf you want to use Spark interpreter, you need to run get-spark at least one time or set SPARK_HOME.` This message is printed on Zeppelin startup if the user has neither a `local-spark/` directory nor an external `SPARK_HOME` set on their machine. Please see [my latest commit](https://github.com/apache/zeppelin/pull/1339/commits/2747d9eec49aa04f92ac93408f4c00cb101cb23e) :) 
    
    Maybe this message can be removed in the future, once Zeppelin users have become accustomed to this change. 


---

[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    In the case of:
    
      1) Users who don't plan to use the Spark interpreter and just want other interpreters like Python or BigQuery.
      2) Users who set SPARK_HOME as an interpreter property instead of in conf/zeppelin-env.sh.
    
    Such users may not be interested in local-spark, but they will keep seeing messages like:
    
    ```
    Lees-MacBook:pr1339 moon$ bin/zeppelin-daemon.sh start
    
    You do not have neither local-spark, nor external SPARK_HOME set up.
    If you want to use Spark interpreter, you need to run get-spark at least one time or set SPARK_HOME.
    
    Zeppelin start                                             [  OK  ]
    Lees-MacBook:pr1339 moon$ bin/zeppelin-daemon.sh stop
    Zeppelin stop                                              [  OK  ]
    Lees-MacBook:pr1339 moon$ bin/zeppelin-daemon.sh start
    
    You do not have neither local-spark, nor external SPARK_HOME set up.
    If you want to use Spark interpreter, you need to run get-spark at least one time or set SPARK_HOME.
    
    Zeppelin start                                             [  OK  ]
    ```
    
    @AhyoungRyu  What do you think?


---

[GitHub] zeppelin pull request #1339: [ZEPPELIN-1332] Remove spark-dependencies & sug...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu closed the pull request at:

    https://github.com/apache/zeppelin/pull/1339


---

[GitHub] zeppelin pull request #1339: [WIP][ZEPPELIN-1332] Remove spark-dependencies ...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu closed the pull request at:

    https://github.com/apache/zeppelin/pull/1339


---

[GitHub] zeppelin pull request #1339: [WIP][ZEPPELIN-1332] Remove spark-dependencies ...

Posted by AhyoungRyu <gi...@git.apache.org>.
GitHub user AhyoungRyu reopened a pull request:

    https://github.com/apache/zeppelin/pull/1339

    [WIP][ZEPPELIN-1332] Remove spark-dependencies & suggest new way

    ### What is this PR for?
    Currently, Zeppelin's embedded Spark is located under `interpreter/spark/`. 
    For those who **build Zeppelin from source**, this Spark is downloaded when they build with the [build profiles](https://github.com/apache/zeppelin#spark-interpreter). These build profiles are useful for customizing the embedded Spark, but many Spark users use their own Spark rather than Zeppelin's embedded one. Nowadays only Spark & Zeppelin beginners use the embedded Spark, and for them there are too many build profiles (it's quite complicated). 
    In the case of the **Zeppelin binary package**, the embedded Spark is included by default under `interpreter/spark/`, which is why the Zeppelin package is so large. 
    
    This PR changes the embedded Spark binary download mechanism as follows.
    
    1. If users haven't set their own `SPARK_HOME`, [bin/download-spark.sh](https://github.com/AhyoungRyu/zeppelin/blob/5703fbf27fedda9ec7dd142e275b8654c9bc6296/bin/download-spark.sh) will be run when they start the Zeppelin server using `bin/zeppelin-daemon.sh` or `bin/zeppelin.sh`.
    2. [bin/download-spark.sh](https://github.com/AhyoungRyu/zeppelin/blob/5703fbf27fedda9ec7dd142e275b8654c9bc6296/bin/download-spark.sh): download `spark-2.0.0-bin-hadoop2.7.tgz` from a mirror site to `$ZEPPELIN_HOME/.spark-dist/` and untar it -> set `SPARK_HOME` to `$ZEPPELIN_HOME/.spark-dist/spark-2.0.0-bin-hadoop2.7` -> add this `SPARK_HOME` to `conf/zeppelin-env.sh`
    
    With this new mechanism, we can not only reduce the overall Zeppelin binary package size, but users also no longer need to type complicated build profiles when building Zeppelin from source.
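    The "add `SPARK_HOME` to `conf/zeppelin-env.sh`" part of step 2 could look roughly like this (an illustrative sketch, not the script's actual contents; the download via curl/tar is elided). The append is guarded so repeated startups stay idempotent:

    ```
    # Sketch only: record SPARK_HOME in the env file, once.
    set_spark_home_in_env() {
      conf_file="$1"; spark_home="$2"
      grep -q '^export SPARK_HOME=' "${conf_file}" 2>/dev/null ||
        echo "export SPARK_HOME=\"${spark_home}\"" >> "${conf_file}"
    }

    conf="$(mktemp)"   # stands in for conf/zeppelin-env.sh
    set_spark_home_in_env "${conf}" "/opt/zeppelin/.spark-dist/spark-2.0.0-bin-hadoop2.7"
    set_spark_home_in_env "${conf}" "/opt/zeppelin/.spark-dist/spark-2.0.0-bin-hadoop2.7"  # no duplicate
    ```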
    
    ### What type of PR is it?
    Improvement
    
    ### Todos
    * [ ] - update [README.md](https://github.com/apache/zeppelin/blob/master/README.md)
    * [ ] - add `download-spark.cmd` for Windows users 
    
    ### What is the Jira issue?
    See [ZEPPELIN-1332](https://issues.apache.org/jira/browse/ZEPPELIN-1332)'s description for the details about **Why we need to remove spark-dependencies** & **New suggestion for Zeppelin's embedded Spark binary**.
    
    
    ### How should this be tested?
    After applying this patch, build with `mvn clean package -DskipTests`. Please note that you need to verify that `spark-dependencies` has actually been removed.
     - Without prespecified `SPARK_HOME` 
      1. Start Zeppelin daemon
      <img width="975" alt="screen shot 2016-08-18 at 11 20 27 am" src="https://cloud.githubusercontent.com/assets/10060731/17759836/e3c16022-6535-11e6-8576-43975c3293c3.png">
      2. Check `conf/zeppelin-env.sh` line 46. `SPARK_HOME` will be set like below 
      ```
      export SPARK_HOME="/YOUR_ZEPPELIN_HOME/.spark-dist/spark-2.0.0-bin-hadoop2.7"
      ```
      3. Go to the Zeppelin web UI and run `sc.version` with the Spark interpreter & `echo $SPARK_HOME` with the sh interpreter.
      <img width="1030" alt="screen shot 2016-08-18 at 11 26 21 am" src="https://cloud.githubusercontent.com/assets/10060731/17759937/a7bcc584-6536-11e6-9664-cffdc6e5bdf8.png">
    
     - With prespecified `SPARK_HOME`
    Nothing happens; Zeppelin starts as before.
     
    ### Screenshots (if appropriate)
    
    ### Questions:
    * Does the licenses files need update? no
    * Is there breaking changes for older versions? no
    * Does this need documentation? Yes, [README.md](https://github.com/apache/zeppelin/blob/master/README.md) needs to be updated


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/AhyoungRyu/zeppelin ZEPPELIN-1332

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/1339.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1339
    
----
commit ae74e90f8409b7396eeebf34c103a6db071b1771
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-16T15:08:19Z

    Fix typo comment in interpreter.sh

commit ada6f37d1df60f37740d63c913cdd89f7b919269
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-17T01:52:06Z

    Remove spark-dependencies

commit 87b929d7d38e447306796cec44b35cb7317b9bb3
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-17T07:14:35Z

    Add spark-2.*-bin-hadoop* to .gitignore

commit 5703fbf27fedda9ec7dd142e275b8654c9bc6296
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-17T15:22:25Z

    Add download-spark.sh file

commit 35350bb9990436cd7ede1e611f0b94a56ed24793
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-17T15:28:51Z

    Remove useless comment line in common.sh

commit d6500a854c0a6a3616023c507fbdd061ae731288
Author: AhyoungRyu <fb...@hanmail.net>
Date:   2016-08-18T03:32:11Z

    Remove zeppelin-spark-dependencies from r/pom.xml

----


---

[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by 1ambda <gi...@git.apache.org>.
Github user 1ambda commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    Short summary and some small thoughts about #1339 
    
    1. Using a symlink like `local-spark/master` would be safer, I think. It lets users replace their local Spark without renaming directories; currently we use a hard-coded name:
    
    ```
    SPARK_CACHE="local-spark"
    SPARK_ARCHIVE="spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}"
    ```
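    The symlink idea could be sketched like this (hypothetical layout, not what the PR currently does): `local-spark/master` always points at the active distribution, so swapping Spark versions means re-pointing one link.

    ```
    SPARK_CACHE="$(mktemp -d)"                     # stands in for local-spark/
    mkdir -p "${SPARK_CACHE}/spark-2.0.1-bin-hadoop2.7"
    # -sfn replaces an existing link atomically instead of descending into it
    ln -sfn "spark-2.0.1-bin-hadoop2.7" "${SPARK_CACHE}/master"
    # Scripts would then resolve SPARK_HOME through the link:
    SPARK_HOME="${SPARK_CACHE}/$(readlink "${SPARK_CACHE}/master")"
    ```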
    
    2. About UX:
    
    - most users experienced with Zeppelin do not use local Spark
    - for newcomers, we can provide an embedded Spark via the Docker image that will be shipped by #1538 
    - and running `get-spark` is not too hard even if new users do not use the Docker images.
    
    3. Now users need to type `get-spark`. It works as described: 
    
    ```
    $  zeppelin-review git:(pr/1339) ./bin/zeppelin-daemon.sh start
    Log dir doesn't exist, create /Users/1ambda/github/apache-zeppelin/zeppelin-review/logs
    Pid dir doesn't exist, create /Users/1ambda/github/apache-zeppelin/zeppelin-review/run
    
    You do not have neither local-spark, nor external SPARK_HOME set up.
    If you want to use Spark interpreter, you need to run get-spark at least one time or set SPARK_HOME.
    
    Zeppelin start                                             [  OK  ]
    $  zeppelin-review git:(pr/1339) ./bin/zeppelin-daemon.sh stop
    Zeppelin stop                                              [  OK  ]
    $  zeppelin-review git:(pr/1339) ./bin/zeppelin-daemon.sh get-spark
    Download spark-2.0.1-bin-hadoop2.7.tgz from mirror ...
    
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100  178M  100  178M    0     0  7157k      0  0:00:25  0:00:25 --:--:-- 6953k
    
    spark-2.0.1-bin-hadoop2.7 is successfully downloaded and saved under /Users/lambda/github/apache-zeppelin/zeppelin-review/local-spark
    
    $  zeppelin-review git:(pr/1339) ./bin/zeppelin-daemon.sh start
    Zeppelin start                                             [  OK  ]
    ```
    
    
    



---

[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    @jongyoul Thanks for your feedback! Yeah, I didn't try to cover that case. So you mean we need to support people who are using [this upstart option](http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/install/install.html#optional-start-apache-zeppelin-with-a-service-manager), am I right? :)


---

[GitHub] zeppelin pull request #1339: [ZEPPELIN-1332] Remove spark-dependencies & sug...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu closed the pull request at:

    https://github.com/apache/zeppelin/pull/1339


---

[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    @astroshim I appreciate your help! I've just pushed it; let's wait until it finishes :) 


---

[GitHub] zeppelin issue #1339: [WIP][ZEPPELIN-1332] Remove spark-dependencies & sugge...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    @bzz Thank you for saying so! Then I'll continue my work in here and let you know :)


---

[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by jongyoul <gi...@git.apache.org>.
Github user jongyoul commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    @AhyoungRyu Thanks for your effort. LGTM. But I think it would be better to support a non-interactive mode for running the server, because some users launch Zeppelin as a start-up service on their servers, and interactive mode would break that.
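    One common pattern for the non-interactive case is to prompt only when stdin is a terminal and otherwise fall back to a default (a generic sketch, not the code in this PR):

    ```
    prompt_or_default() {
      # $1 is the default used when stdin is not a terminal
      if [ -t 0 ]; then
        printf 'Download local Spark binary? (y/N) ' >&2
        read -r reply && printf '%s' "${reply}"
      else
        printf '%s' "$1"
      fi
    }

    # In a start-up service, stdin is not a tty, so the default wins:
    answer="$(prompt_or_default n < /dev/null)"
    ```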


---

[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by 1ambda <gi...@git.apache.org>.
Github user 1ambda commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    Let me also review this great PR and then give some feedback 👍 


---

[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    I think this PR is working as expected (at least for me, haha), so it's ready for review again.
    @moon If possible, could you please check this one again? :)


---

[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    @Leemoonsoo @jongyoul Sorry for my late update. 
    I've just added a new option, `get-spark`, to [zeppelin-daemon.sh](https://github.com/apache/zeppelin/pull/1339/files#diff-bd1714fd11d1853b691468647374113dR23) and [zeppelin.sh](https://github.com/apache/zeppelin/pull/1339/files#diff-1724182f3ebaf54f5c9e202dcdf82415R46) to download the local Spark binary. I think this is simpler than prompting for the user's answer and then separating an "interactive mode" and "non-interactive mode", as @jongyoul mentioned [here](https://github.com/apache/zeppelin/pull/1339#issuecomment-250672904).
    
    So to sum up, with my latest update people can download a local Spark with `./bin/zeppelin-daemon.sh get-spark` or `./bin/zeppelin.sh get-spark`. If this way is okay, I'll update the related docs pages accordingly. We probably need to document the `get-spark` option so people know it exists.
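    The `get-spark` action would slot into the daemon script's usual case dispatch roughly like this (stub bodies for illustration; the real `zeppelin-daemon.sh` does much more per action):

    ```
    dispatch() {
      case "$1" in
        start)     echo "Zeppelin start" ;;
        stop)      echo "Zeppelin stop" ;;
        get-spark) echo "Downloading local Spark binary..." ;;
        *)         echo "Usage: $0 {start|stop|get-spark}" >&2; return 1 ;;
      esac
    }

    dispatch get-spark
    ```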
    
    What do you think? 


---

[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    @Leemoonsoo "zero configuration like before" makes sense. Let me update it and I'll ping you again.


---

[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...

Posted by AhyoungRyu <gi...@git.apache.org>.
Github user AhyoungRyu commented on the issue:

    https://github.com/apache/zeppelin/pull/1339
  
    Updated the related docs pages ([README.md](https://github.com/apache/zeppelin/pull/1339/files#diff-04c6e90faac2675aa89e2176d2eec7d8), [spark.md](https://github.com/apache/zeppelin/pull/1339/files#diff-83df2e7970d5a53a9028d05098bc626d), [upgrade.md](https://github.com/apache/zeppelin/pull/1339/files#diff-f472957a611b3e4d6c1171edca51cf93) and [install.md](https://github.com/apache/zeppelin/pull/1339/files#diff-f472957a611b3e4d6c1171edca51cf93)) and CI has passed now. 


---