You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zeppelin.apache.org by sriramkrishnan <gi...@git.apache.org> on 2015/12/22 01:26:35 UTC

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

GitHub user sriramkrishnan opened a pull request:

    https://github.com/apache/incubator-zeppelin/pull/561

    Add a (local mode) Scalding Interpreter to Zeppelin

    ### What is this PR for?
    Scalding (https://github.com/twitter/scalding) is a Scala library for writing MapReduce jobs.
    This issue tracks the addition of a Scalding interpreter for Zeppelin. To keep this work incremental, this PR will focus on just a local mode implementation. The Hadoop mode can be a subsequent addition.
    
    ### What type of PR is it?
    Feature
    
    ### Todos
    * Addition of Hadoop mode for Scalding
    
    ### Is there a relevant Jira issue?
    https://issues.apache.org/jira/browse/ZEPPELIN-526
    
    ### How should this be tested?
    Run the tests in: scalding/src/test/java/org/apache/zeppelin/scalding/ScaldingInterpreterTest.java
    
    ### Screenshots
    <img width="1167" alt="scalding-example" src="https://cloud.githubusercontent.com/assets/1509691/11944979/8788d5c2-a7ff-11e5-9863-2a1216c51896.png">
    <img width="1151" alt="scalding-screenshot" src="https://cloud.githubusercontent.com/assets/1509691/11944978/8787e3ec-a7ff-11e5-8383-456adb16b977.png">
    
    
    ### Questions:
    * This could use documentation, which could just be the example in the screenshot. Where can I contribute that?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sriramkrishnan/incubator-zeppelin scalding

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-zeppelin/pull/561.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #561
    
----
commit 721dcb7bcfbc56d923c101e5b701cd9c2ad23372
Author: Sriram Krishnan <sk...@twitter.com>
Date:   2015-12-17T03:34:17Z

    Getting a basic interpreter going. Next step is to hook in the Scalding REPL.

commit 35fc032d7e6cc679cf3604f44cf6ba4a3545a53b
Author: Sriram Krishnan <sk...@twitter.com>
Date:   2015-12-17T21:22:52Z

    Initial version of a ScaldingInterpreter running in local mode. Need to get console output next.
    And add tests, and make it work for HDFS.

commit e13576fcfea4fb86f8fafd46df716d2d50ef0b3c
Author: Sriram Krishnan <sk...@twitter.com>
Date:   2015-12-17T22:35:53Z

    Now seem to be getting the console out, but only for last line. Will need some debugging.

commit b19fda41b586486e315f9dc6eaa3a71e4ecf7632
Author: Sriram Krishnan <sk...@twitter.com>
Date:   2015-12-18T00:20:02Z

    More cleanup - flushing output stream. Still can't seem to get the Scala console output. Need to figure that out.

commit 1ffbb3b2d70d57ff65d0d43ef7f02defe263ac41
Author: Sriram Krishnan <sk...@twitter.com>
Date:   2015-12-18T05:00:13Z

    Fixing output of stdout from console

commit d3916b751627b7fd02f9b583214d578236178864
Author: Sriram Krishnan <sk...@twitter.com>
Date:   2015-12-18T05:01:02Z

    Adding modified version of ScaldingILoop for grabbing Console output - will want to move this to Scalding itself

commit 36a2dace020bea6bf9787778bfa6bf2ca5900132
Author: Sriram Krishnan <sk...@twitter.com>
Date:   2015-12-18T19:58:30Z

    Added a link to the scalding code where the ILoop was lifted from.

commit 8944b0c3aebd2c4c66a12518613cb5f1724c0dee
Author: Sriram Krishnan <sk...@twitter.com>
Date:   2015-12-21T22:08:59Z

    Merge remote-tracking branch 'upstream/master' into scalding

commit c27ec48ff73093ebfd9a71a3f0d31b9626edc91c
Author: Sriram Krishnan <sk...@twitter.com>
Date:   2015-12-21T22:25:34Z

    Formatting, license

commit 368dc042ef71c08d3e0b5ccc879bef0b1bce2da1
Author: Sriram Krishnan <sk...@twitter.com>
Date:   2015-12-21T23:00:25Z

    Cleaning up imports, comments, etc

commit 7ec294138078bc3a196e985a3c3f211c51fe2566
Author: Sriram Krishnan <sk...@twitter.com>
Date:   2015-12-21T23:22:08Z

    More code cleanup

commit 7a9ceeb5b801017fb7354ef5b3d0e2b3256d7ae9
Author: Sriram Krishnan <sk...@twitter.com>
Date:   2015-12-22T00:04:21Z

    Adding some tests for the Scalding interpreter

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

Posted by sriramkrishnan <gi...@git.apache.org>.
Github user sriramkrishnan commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/561#issuecomment-168097018
  
    Thanks @Leemoonsoo. So how should we proceed? Should I add it as LGPL?
    
    Also, are you OK with using the conjars repo now? Or do you recommend that I add a -Pscalding flag. Would be great to get this merged either ways! 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

Posted by sriramkrishnan <gi...@git.apache.org>.
Github user sriramkrishnan commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/561#issuecomment-167517423
  
    Thanks @jongyoul!
     
    @bzz I have gone ahead and manually added all the licenses I could find (phew!). There were some duplicate jars with different versions, which I didn't re-add.
    
    I think I am mostly done with this PR. Let me know if this is good to merge, or if there is anything else I need to do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

Posted by sriramkrishnan <gi...@git.apache.org>.
Github user sriramkrishnan commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/561#issuecomment-167619360
  
    Thanks @Leemoonsoo I believe I have addressed all your comments, except for the jgrapht.
    
    Note that the jgrapht here is thirdparty:jgrapht, which seems to be a version that Cascading has forked. I am inclined to leave it as it, but open to suggestions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/561#issuecomment-168101636
  
    Apart from license for jgrapht, using 3rd party repo (conjars) need to be avoided. How about revert commits for the LICENSE / files added under licenses dir and add -Pscalding flag, for this pullrequest. After it is merged, create another issue for removing -Pscalding and continue to work. Would this way works for you?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

Posted by sriramkrishnan <gi...@git.apache.org>.
Github user sriramkrishnan commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/561#issuecomment-167937452
  
    I am OK making the Scalding interpreter optional with a -Pscalding flag. Does that also mean I have to revert all my changes to the LICENSE file?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

Posted by prasadwagle <gi...@git.apache.org>.
Github user prasadwagle commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/561#issuecomment-166755452
  
    Should we add ScaldingInterpreter to ZEPPELIN_INTERPRETERS in ZeppelinConfiguration.java?
    
    tw-172-25-131-152 incubator-zeppelin (presto) $ git diff master -- zeppelin-zengine/src/main/java/org/apache/zeppelin/conf/ZeppelinConfiguration.java
    diff --git a/zeppelin-zengine/src/main/java/org/apache/zeppelin/conf/ZeppelinConfiguration.java b/zeppelin-zengine/src/main/java/org/apache/zeppelin/conf/ZeppelinConfiguration.java
    index 909345a..0c6e9f0 100755
    --- a/zeppelin-zengine/src/main/java/org/apache/zeppelin/conf/ZeppelinConfiguration.java
    +++ b/zeppelin-zengine/src/main/java/org/apache/zeppelin/conf/ZeppelinConfiguration.java
    @@ -415,6 +415,7 @@ public class ZeppelinConfiguration extends XMLConfiguration {
             + "org.apache.zeppelin.cassandra.CassandraInterpreter,"
             + "org.apache.zeppelin.geode.GeodeOqlInterpreter,"
             + "org.apache.zeppelin.postgresql.PostgreSqlInterpreter,"
    +        + "org.apache.zeppelin.scalding.ScaldingInterpreter,"
             + "org.apache.zeppelin.kylin.KylinInterpreter"),
         ZEPPELIN_INTERPRETER_DIR("zeppelin.interpreter.dir", "interpreter"),
         ZEPPELIN_INTERPRETER_CONNECT_TIMEOUT("zeppelin.interpreter.connect.timeout", 30000),



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/561#issuecomment-168023202
  
    fyi, i have got replied about license
    
    >> moon soo Lee
    >> Subject: Questions about license of jgrapht-jdk1.6:0.8.1 binary, like
    >
    > DEC 30, 2015  |  07:13AM PST 
    > Ryan Desmond replied:
    > Hi Moon,
    >
    > Chris has informed me that you can choose either license. That said, he also mentioned that you will need to speak with a lawyer for proper consultation on this issue.
    >
    > Best Regards,
    > Ryan


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/561#issuecomment-168290811
  
    I'm merging it, if there're no more discussions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

Posted by sriramkrishnan <gi...@git.apache.org>.
Github user sriramkrishnan commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/561#issuecomment-167203417
  
    Addressed all PR comments, except for docs, license as @bzz has suggested, which I will do next. Build is also green now.
    
    ps: I do need the hadoop-client jar as the Scalding REPL uses it even for local mode (and provided doesn't work there AFAICT).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

Posted by sriramkrishnan <gi...@git.apache.org>.
Github user sriramkrishnan commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/561#issuecomment-167264787
  
    Added docs.
    
    @bzz I believe I have addressed all comments, except adding LICENSE info to zeppelin-distribution/src/bin_license/LICENSE. Do you happen to have a script for doing that already? I am dreading doing this by hand.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/561#issuecomment-168273426
  
    LGTM. @sriramkrishnan Thanks for the contribution!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

Posted by jongyoul <gi...@git.apache.org>.
Github user jongyoul commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/561#issuecomment-167418142
  
    I've tested it in my local machine and worked well. Thanks for this contribution.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-zeppelin/pull/561


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/561#issuecomment-167936558
  
    I have queried and waiting for answer about license of thirdparty:jgraph.
    
    One of the fastest way to get this merged, making it optional like we did it for geode https://github.com/apache/incubator-zeppelin/pull/379/files.
    
    That would avoid license problem and 3rd party maven repo. Then i think it can be merged.
    
    Eventually, i hope https://issues.apache.org/jira/browse/ZEPPELIN-546 solve the problem.
    i.e. Does not include scaling dependency in binary package of Zeppelin. but release scalding interpreter as a maven artifact. And then Zeppelin can load it, on runtime.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

Posted by sriramkrishnan <gi...@git.apache.org>.
Github user sriramkrishnan commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/561#issuecomment-167935776
  
    \*bump\*
    
    What else do we need to do to get this merged? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/561#issuecomment-167533045
  
    Thanks @sriramkrishnan for great work.
    
    I think following files
    ```
    LICENSE-paranamer-2.3
    LICENSE-protobuf-2.5.0
    LICENSE-xmlenc-0.52
    ```
    need to be moved from `/licenses` to `zeppelin-distribution/src/bin_license/licenses` directory.
    And if `LICENSE-paranamer-2.3` and `LICENSE-protobuf-2.5.0` can have text format instead of html, that would be easier to read.
    
    
    
    
    For `org.tukaani:xz:jar:1.0`, how about make a section, such as
    ```
    ========================================================================
    For the org.tukaani:xz:jar:1.0
    ========================================================================
    See licenses/LICENSE-tukaani-xy-1.0
    ```
    and copy contents of `http://git.tukaani.org/?p=xz-java.git;a=blob;f=COPYING` into `zeppelin-distribution/src/bin_license/licenses/LICENSE-tukaani-xy-1.0`?
    
    
    
    
    For `jgrapht-jdk1.6`, it became LGPL and EPL dual license since 0.9.x. But I'm not sure we can assume the license of 0.8.1 as EPL, since it was released with LGPL at that time. Can someone clarify?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

Posted by sriramkrishnan <gi...@git.apache.org>.
Github user sriramkrishnan commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/561#issuecomment-168543837
  
    Filed https://issues.apache.org/jira/browse/ZEPPELIN-555 to track addition of Hadoop mode to the Scalding interpreter. I may pick it up in my "free time", but I have added a description there if anyone wants to tackle it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

Posted by sriramkrishnan <gi...@git.apache.org>.
Github user sriramkrishnan commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/561#issuecomment-166764926
  
    @bzz thanks for the review. I will address your other comments and update the PR in a day or so.
    
    @prasadwagle good point. I will add that too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/561#issuecomment-167944693
  
    Yes it is, changes to LICENSE file and files that has been added to licenses directory need to be reverted. And you probably want to describe -Pscalding flag in https://github.com/apache/incubator-zeppelin/blob/master/README.md


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

Posted by sriramkrishnan <gi...@git.apache.org>.
Github user sriramkrishnan commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/561#issuecomment-168134305
  
    @Leemoonsoo as you suggested, I have reverted commits for the LICENSE / files added under licenses dir, added -Pscalding flag, and updated the docs.
    
    Are you OK with shipping/merging this PR now?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/561#issuecomment-166489064
  
    @sriramkrishnan could you please also add some docs for the new interpreter here http://zeppelin.incubator.apache.org/docs/0.6.0-incubating-SNAPSHOT/interpreter/ ?
    
    Here are some details on how to [add it](http://zeppelin.incubator.apache.org/docs/0.6.0-incubating-SNAPSHOT/development/howtocontributewebsite.html).



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

Posted by sriramkrishnan <gi...@git.apache.org>.
Github user sriramkrishnan commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/561#issuecomment-168152743
  
    And we also have a green build now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

Posted by sriramkrishnan <gi...@git.apache.org>.
Github user sriramkrishnan commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/561#issuecomment-168290915
  
    Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Add a (local mode) Scalding Inter...

Posted by sriramkrishnan <gi...@git.apache.org>.
Github user sriramkrishnan commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/561#issuecomment-168290262
  
    Thanks @Leemoonsoo for the review. Could one of the committers please merge if there are no other objections?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---