Posted to dev@zeppelin.apache.org by echarles <gi...@git.apache.org> on 2016/06/09 14:36:58 UTC

[GitHub] incubator-zeppelin pull request #980: [ZEPPELIN-871] [WIP] spark 2.0 interpr...

GitHub user echarles opened a pull request:

    https://github.com/apache/incubator-zeppelin/pull/980

    [ZEPPELIN-871] [WIP] spark 2.0 interpreter on scala 2.11

    ### What is this PR for?
    Spark interpreter for Spark 2.0.0 and Scala 2.11 (implemented in Scala)
    
    ### What type of PR is it?
    [Feature]
    
    ### Todos
    * [ ] - Clean code
    * [ ] - Test SQL
    * [ ] - Test Python
    * [ ] - Test R
    
    
    ### What is the Jira issue?
    * https://issues.apache.org/jira/browse/ZEPPELIN-871
    
    ### How should this be tested?
    Build it with 
    ```
    mvn install \
      -Pscala-2.11 -Dscala.binary.version=2.11 -Dscala.version=2.11.8 \
      -Pspark-2.0 -Dspark.version=2.0.0-SNAPSHOT \
      -Phadoop-2.6 -Dhadoop.version=2.7.2 -Pyarn \
      -Dmaven.findbugs.enable=false \
      -Drat.skip=true \
      -Ppyspark \
      -Psparkr \
      -Dcheckstyle.skip=true \
      -Dcobertura.skip=true \
      -Pbuild-distr \
      -pl '!alluxio,!cassandra,!elasticsearch,!file,!flink,!hbase,!hive,!ignite,!jdbc,!kylin,!lens,!phoenix,!postgresql,!tajo' \
      -DskipTests
    ```
    Run and test the spark paragraph.
    
    ### Screenshots (if appropriate)
    
    ### Questions:
    * Do the license files need an update? N
    * Are there breaking changes for older versions? N
    * Does this need documentation? Y


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/datalayer/zeppelin-datalayer spark-2.0

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-zeppelin/pull/980.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #980
    
----
commit f041ec3e4b524594a2a32692c5418e04f7335323
Author: Eric Charles <er...@datalayer.io>
Date:   2016-06-09T14:31:50Z

    Initial implementation for spark 2.0 interpreter on scala 2.11

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin issue #980: [ZEPPELIN-871] [WIP] spark 2.0 interpreter on...

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the issue:

    https://github.com/apache/incubator-zeppelin/pull/980
  
    @echarles Thanks for the contribution.
    
    How about we divide the problem into
     - scala 2.11 support
     - spark 2.0 support
     - reimplement spark interpreter in scala
    
    @lresende is working on Scala 2.11 support in #747. I'm also trying to help merge the code for Scala 2.10 and 2.11 into one via https://github.com/lresende/incubator-zeppelin/pull/1
    One of the goals of #747 is to support Scala 2.11 and 2.10 from a single Zeppelin binary package.
    Once #747 is done, this PR can take its Scala 2.11 support from #747.
    
    Regarding Spark 2.0 and the reimplementation in Scala,
    what do you think about supporting Spark 2.0 as well as Spark 1.x, since many users will keep using Zeppelin with Spark 1.x for some time?



[GitHub] incubator-zeppelin issue #980: [ZEPPELIN-871] [WIP] spark 2.0 interpreter on...

Posted by echarles <gi...@git.apache.org>.
Github user echarles commented on the issue:

    https://github.com/apache/incubator-zeppelin/pull/980
  
    Sure, we can wait for #747 to be merged.
    
    I see there is a `spark/src/main/scala-2.11/org/apache/zeppelin/spark/SparkInterpreter.java`: does this mean that there will be two separate implementations, one for 2.10 and one for 2.11?




[GitHub] incubator-zeppelin issue #980: [ZEPPELIN-871] [WIP] spark 2.0 interpreter on...

Posted by echarles <gi...@git.apache.org>.
Github user echarles commented on the issue:

    https://github.com/apache/incubator-zeppelin/pull/980
  
    This is WIP.
    
    I'd like to get feedback on the approach (a Scala implementation in the current spark module), taking into consideration:
    
    + Spark 2.0 will ship on Scala 2.11 by default (it makes sense to me to develop the interpreter on Scala 2.11).
    + SqlContext and HiveContext are deprecated in favor of SparkSession.
    + HttpServer is removed, and some classes and methods have their access restricted to Spark-only packages.
    + SparkIMain and SparkJLineCompletion don't exist anymore.
    
    Building on the current Java classes with reflective method invocation may be possible, but it would make the code very difficult to read and develop.
    
    This PR proposes separate Scala classes for these very specific 2.0 API-breaking changes.
    
    WDYT?
    
    Based on feedback, I will further validate the functionalities (for now, a simple spark 2.0 call works well on my local env).
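
To illustrate the trade-off described above, here is a minimal Java sketch of what probing for version-specific classes and calling them reflectively might look like. It is not code from this PR or from Zeppelin: the class `ApiProbe` and its methods `firstAvailable` and `invokeNoArg` are hypothetical names chosen for illustration. The only Spark identifiers used are the two class names mentioned in the thread, and the sketch runs without Spark on the classpath (it then simply reports that neither entry point was found).

```java
import java.lang.reflect.Method;
import java.util.Optional;

/**
 * Hypothetical helper (not part of Zeppelin): probes the classpath for
 * version-specific entry points and calls methods reflectively.
 */
public class ApiProbe {

    /** Returns the first class name that can be loaded, if any. */
    public static Optional<String> firstAvailable(String... classNames) {
        for (String name : classNames) {
            try {
                // initialize=false: only check presence, do not run static initializers.
                Class.forName(name, false, ApiProbe.class.getClassLoader());
                return Optional.of(name);
            } catch (ClassNotFoundException | LinkageError e) {
                // Not usable on this classpath; try the next candidate.
            }
        }
        return Optional.empty();
    }

    /** Invokes a public no-argument method by name via reflection. */
    public static Object invokeNoArg(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            // Errors the compiler would normally catch only surface at runtime here.
            throw new IllegalStateException("Reflective call failed: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        // Order matters: SparkSession is listed first so that a Spark 2.0
        // classpath is recognized before falling back to the 1.x entry point.
        Optional<String> entryPoint = firstAvailable(
                "org.apache.spark.sql.SparkSession",
                "org.apache.spark.sql.SQLContext");
        System.out.println("Entry point: "
                + entryPoint.orElse("none (Spark is not on the classpath)"));

        // A stand-in reflective call on a JDK type, to keep the sketch runnable.
        System.out.println(invokeNoArg("zeppelin", "length"));
    }
}
```

Each API difference handled this way needs its own string-based lookup and runtime error handling, which is one way to read the readability concern raised above in favor of separate classes per Spark version.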

