Posted to dev@zeppelin.apache.org by zjffdu <gi...@git.apache.org> on 2017/06/30 08:00:12 UTC

[GitHub] zeppelin pull request #2455: ZEPPELIN-1515. Notebook: HDFS as a backend stor...

GitHub user zjffdu opened a pull request:

    https://github.com/apache/zeppelin/pull/2455

    ZEPPELIN-1515. Notebook: HDFS as a backend storage (Read & Write Mode)

    ### What is this PR for?
    This PR adds HDFS as another implementation of `NotebookRepo`. There is another PR that uses WebHDFS to implement this. However, the HDFS client library is compatible across major versions; see http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html#Wire_compatibility
    
    This PR is also required for Zeppelin HA, so that multiple Zeppelin instances can share notes via HDFS. In addition, I added hadoop-client to the pom file, so Zeppelin will package the Hadoop client jar into its binary distribution. This is because Zeppelin may be installed on a gateway machine where Hadoop is not installed (only the Hadoop configuration files exist on that machine).
    
    ### What type of PR is it?
    [Feature]
    
    ### Todos
    * [ ] - Task
    
    ### What is the Jira issue?
    * https://issues.apache.org/jira/browse/ZEPPELIN-1515
    
    ### How should this be tested?
    A unit test is added. I also verified it manually on a single-node cluster.
    
    ### Screenshots (if appropriate)
    
    ### Questions:
    * Do the license files need an update? No
    * Are there breaking changes for older versions? No
    * Does this need documentation? No


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zjffdu/zeppelin ZEPPELIN-1515

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/2455.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2455
    
----
commit 2857369734d8eb698cde2286c49fdccfd6ed2eea
Author: Jeff Zhang <zj...@apache.org>
Date:   2017-06-30T06:48:22Z

    ZEPPELIN-1515. Notebook: HDFS as a backend storage (Read & Write Mode)

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #2455: ZEPPELIN-1515. Notebook: HDFS as a backend storage (Us...

Posted by hayssams <gi...@git.apache.org>.
Github user hayssams commented on the issue:

    https://github.com/apache/zeppelin/pull/2455
  
    @zjffdu I think the zeppelin.sh file should be updated to include HADOOP_CONF_DIR in the classpath.
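
The classpath change suggested above can be sketched as follows. This is a minimal illustration, not the code from the PR: the helper name `add_to_classpath` is hypothetical, and `ZEPPELIN_CLASSPATH` is assumed here to be the colon-separated variable that the launcher script hands to the JVM.

```shell
# Hypothetical helper (not taken from the PR): append a directory to the
# colon-separated classpath variable, but only if the directory exists.
add_to_classpath() {
  dir="$1"
  if [ -n "$dir" ] && [ -d "$dir" ]; then
    if [ -z "${ZEPPELIN_CLASSPATH:-}" ]; then
      ZEPPELIN_CLASSPATH="$dir"
    else
      ZEPPELIN_CLASSPATH="${ZEPPELIN_CLASSPATH}:${dir}"
    fi
  fi
}

# HADOOP_CONF_DIR may be unset on machines without Hadoop configuration;
# in that case the call is a no-op.
add_to_classpath "${HADOOP_CONF_DIR:-}"
```

Putting the configuration directory itself on the classpath, rather than a jar, is what allows the Hadoop client to pick up files such as `core-site.xml` and `hdfs-site.xml` from it.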


[GitHub] zeppelin issue #2455: ZEPPELIN-1515. Notebook: HDFS as a backend storage (Re...

Posted by jongyoul <gi...@git.apache.org>.
Github user jongyoul commented on the issue:

    https://github.com/apache/zeppelin/pull/2455
  
    Hi, I recently found that some versions of Hadoop-related libraries crash the Jetty server used by Zeppelin Server. I've also changed the YARN cluster manager to use a separate classloader. What do you think about using a separate classloader for the HDFS notebook storage as well? It would require more changes on the Zeppelin server side, but it would be better for the future.


[GitHub] zeppelin issue #2455: ZEPPELIN-1515. Notebook: HDFS as a backend storage (Us...

Posted by hayssams <gi...@git.apache.org>.
Github user hayssams commented on the issue:

    https://github.com/apache/zeppelin/pull/2455
  
    @zjffdu Yes, when you want to start Zeppelin not as a daemon.
    Please also note that the Docker image relies on zeppelin.sh.



[GitHub] zeppelin pull request #2455: ZEPPELIN-1515. Notebook: HDFS as a backend stor...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu closed the pull request at:

    https://github.com/apache/zeppelin/pull/2455


[GitHub] zeppelin issue #2455: ZEPPELIN-1515. Notebook: HDFS as a backend storage (Us...

Posted by hayssams <gi...@git.apache.org>.
Github user hayssams commented on the issue:

    https://github.com/apache/zeppelin/pull/2455
  
    Yes, but sometimes we need to launch Zeppelin through zeppelin.sh, for instance on Mesos via Marathon.


[GitHub] zeppelin issue #2455: ZEPPELIN-1515. Notebook: HDFS as a backend storage (Us...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/2455
  
    I also added HADOOP_CONF_DIR in zeppelin.sh. However, there appears to be some code duplication between `zeppelin.sh` and `zeppelin-daemon.sh`; we need a follow-up PR to remove it.


[GitHub] zeppelin issue #2455: ZEPPELIN-1515. Notebook: HDFS as a backend storage (Us...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/2455
  
    @jongyoul Yeah, since your PR also uses the Hadoop jar, I think we need to do two things:
    * Use the same Hadoop jar version.
    * Decide whether to include it in the distribution. I personally prefer to include the Hadoop jar in the distribution, for several reasons:
      - It is much easier to implement; we don't need to find the jars at runtime ourselves.
      - Zeppelin might be installed on a gateway machine where Hadoop is not installed; in that case, Zeppelin would not work because it could not find the Hadoop jars.
      - Spark ships the Hadoop jar in its distribution and has proven that this works, so I think we can trust this approach.
    
    Let me know your concerns, thanks.



[GitHub] zeppelin pull request #2455: ZEPPELIN-1515. Notebook: HDFS as a backend stor...

Posted by zjffdu <gi...@git.apache.org>.
GitHub user zjffdu reopened a pull request:

    https://github.com/apache/zeppelin/pull/2455

    ZEPPELIN-1515. Notebook: HDFS as a backend storage (Use hadoop client jar)

    ### What is this PR for?
    This PR adds HDFS as another implementation of `NotebookRepo`. There is another PR that uses WebHDFS to implement this. However, the HDFS client library is compatible across major versions; see http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html#Wire_compatibility. With WebHDFS, the code becomes more complicated and may lose some HDFS features.
    
    This PR is also required for Zeppelin HA, so that multiple Zeppelin instances can share notes via HDFS. In addition, I added hadoop-client to the pom file, so Zeppelin will package the Hadoop client jar into its binary distribution. This is because Zeppelin may be installed on a gateway machine where Hadoop is not installed (only the Hadoop configuration files exist on that machine). Since the Hadoop client works with multiple versions of Hadoop, it is fine to package it into the binary distribution. Spark also packages the Hadoop client jar in its binary distribution.
    
    ### What type of PR is it?
    [Feature]
    
    ### Todos
    * [ ] - Task
    
    ### What is the Jira issue?
    * https://issues.apache.org/jira/browse/ZEPPELIN-1515
    
    ### How should this be tested?
    A unit test is added. I also verified it manually on a single-node cluster.
    
    ### Screenshots (if appropriate)
    
    ### Questions:
    * Do the license files need an update? No
    * Are there breaking changes for older versions? No
    * Does this need documentation? No


----
commit 71b978d3d24afe32470b0516eeb6f2caff947cc1
Author: Jeff Zhang <zj...@apache.org>
Date:   2017-06-30T06:48:22Z

    ZEPPELIN-1515. Notebook: HDFS as a backend storage (Read & Write Mode)

----


[GitHub] zeppelin pull request #2455: ZEPPELIN-1515. Notebook: HDFS as a backend stor...

Posted by zjffdu <gi...@git.apache.org>.
GitHub user zjffdu reopened a pull request:

    https://github.com/apache/zeppelin/pull/2455

    ZEPPELIN-1515. Notebook: HDFS as a backend storage (Use hadoop client jar)

    ### What is this PR for?
    This PR adds HDFS as another implementation of `NotebookRepo`. There is another PR that uses WebHDFS to implement this. However, the HDFS client library is compatible across major versions; see http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html#Wire_compatibility. With WebHDFS, the code becomes more complicated and may lose some HDFS features.
    
    This PR is also required for Zeppelin HA, so that multiple Zeppelin instances can share notes via HDFS. I added hadoop-client to the pom file, so Zeppelin will package the Hadoop client jar into its binary distribution. This is because Zeppelin may be installed on a gateway machine where Hadoop is not installed (only the Hadoop configuration files exist on that machine). Since the Hadoop client works with multiple versions of Hadoop, it is fine to package it into the binary distribution. Spark also packages the Hadoop client jar in its binary distribution.
    
    ### What type of PR is it?
    [Feature]
    
    ### Todos
    * [ ] - Task
    
    ### What is the Jira issue?
    * https://issues.apache.org/jira/browse/ZEPPELIN-1515
    
    ### How should this be tested?
    A unit test is added. I also verified it manually on a single-node cluster.
    
    ### Screenshots (if appropriate)
    
    ### Questions:
    * Do the license files need an update? No
    * Are there breaking changes for older versions? No
    * Does this need documentation? No


----
commit 7059244479eb3299c99bc930b29061d051862a25
Author: Jeff Zhang <zj...@apache.org>
Date:   2017-06-30T06:48:22Z

    ZEPPELIN-1515. Notebook: HDFS as a backend storage (Read & Write Mode)

----


[GitHub] zeppelin issue #2455: ZEPPELIN-1515. Notebook: HDFS as a backend storage (Us...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/2455
  
    I also verified it manually against Kerberized HDFS on my local box. @jongyoul @Leemoonsoo @felixcheung @prabhjyotsingh @hayssams Please help review it, thanks.


[GitHub] zeppelin issue #2455: ZEPPELIN-1515. Notebook: HDFS as a backend storage (Re...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/2455
  
    @Leemoonsoo @felixcheung @khalidhuseynov Could you help review it? Thanks


[GitHub] zeppelin pull request #2455: ZEPPELIN-1515. Notebook: HDFS as a backend stor...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu closed the pull request at:

    https://github.com/apache/zeppelin/pull/2455


[GitHub] zeppelin issue #2455: ZEPPELIN-1515. Notebook: HDFS as a backend storage (Us...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/2455
  
    @hayssams Is there any reason why `zeppelin-daemon.sh` doesn't work for you? Zeppelin does not guarantee that `zeppelin.sh` can launch the Zeppelin server properly. `zeppelin-daemon.sh` adds not only `HADOOP_CONF_DIR` but also other libraries; if you don't use `zeppelin-daemon.sh`, zeppelin-server may fail to launch.


[GitHub] zeppelin issue #2455: ZEPPELIN-1515. Notebook: HDFS as a backend storage (Re...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/2455
  
    @jongyoul What kind of issue do you see in the cluster manager PR? Usually this is due to a jar version conflict. You can exclude the conflicting transitive jar from the Hadoop client library.
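
One rough way to look for the kind of jar version conflict mentioned above is to list artifacts that appear with more than one version across the directories that end up on the classpath. The sketch below is only a diagnostic illustration and is not part of Zeppelin: the function name `find_duplicate_jars` is hypothetical, and it assumes jars follow the usual `name-version.jar` naming convention.

```shell
# Hypothetical diagnostic helper: print the names of artifacts that occur
# with more than one distinct version among the *.jar files found under
# the given directories. Assumes file names of the form name-version.jar.
find_duplicate_jars() {
  find "$@" -name '*.jar' -exec basename {} \; 2>/dev/null |
    sort -u |                                       # one entry per distinct file name
    sed -E 's/-[0-9][0-9A-Za-z._-]*\.jar$//' |      # strip "-version.jar"
    sort | uniq -d                                  # keep names seen more than once
}
```

Called with the library directories of an installation, for example `find_duplicate_jars lib interpreter` (placeholder paths), it prints one artifact name per line. Any artifact it reports is a candidate for an exclusion on the Hadoop client dependency.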


[GitHub] zeppelin issue #2455: ZEPPELIN-1515. Notebook: HDFS as a backend storage (Us...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/2455
  
    Thanks for the review @prabhjyotsingh @hayssams, I will merge it if there are no more comments.


[GitHub] zeppelin issue #2455: ZEPPELIN-1515. Notebook: HDFS as a backend storage (Us...

Posted by prabhjyotsingh <gi...@git.apache.org>.
Github user prabhjyotsingh commented on the issue:

    https://github.com/apache/zeppelin/pull/2455
  
    Tested locally; it works as expected. I tried both environments, with and without Kerberos.
    LGTM, except for a minor comment.


[GitHub] zeppelin issue #2455: ZEPPELIN-1515. Notebook: HDFS as a backend storage (Us...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/2455
  
    Thanks for the review. @hayssams I did that in zeppelin-daemon.sh.


[GitHub] zeppelin pull request #2455: ZEPPELIN-1515. Notebook: HDFS as a backend stor...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/zeppelin/pull/2455


[GitHub] zeppelin issue #2455: ZEPPELIN-1515. Notebook: HDFS as a backend storage (Us...

Posted by jongyoul <gi...@git.apache.org>.
Github user jongyoul commented on the issue:

    https://github.com/apache/zeppelin/pull/2455
  
    @zjffdu Most conflicts will be removed by excluding jars, but how can we guarantee that those libraries are not related to my features? I'm just curious.


[GitHub] zeppelin issue #2455: ZEPPELIN-1515. Notebook: HDFS as a backend storage (Us...

Posted by hayssams <gi...@git.apache.org>.
Github user hayssams commented on the issue:

    https://github.com/apache/zeppelin/pull/2455
  
    LGTM 👍 

