Posted to dev@zeppelin.apache.org by aspen01 <gi...@git.apache.org> on 2016/11/04 15:41:10 UTC

[GitHub] zeppelin pull request #1600: Using HDFS to backup and restore notebook

GitHub user aspen01 opened a pull request:

    https://github.com/apache/zeppelin/pull/1600

    Using HDFS to backup and restore notebook

    ### What is this PR for?
    This PR adds support for backing up and restoring notebooks with HDFS.
    It is similar to https://github.com/apache/zeppelin/pull/1479.
    However, this PR uses only the WebHDFS API, so we don't need to worry about Hadoop library dependencies.
    
    ### What type of PR is it?
    Improvement
    
    ### Todos
    * [ ] - Task
    
    ### What is the Jira issue?
    [https://issues.apache.org/jira/browse/ZEPPELIN-1515](https://issues.apache.org/jira/browse/ZEPPELIN-1515)
    
    ### How should this be tested?
    
    Set the variables in zeppelin-site.xml
    - zeppelin.notebook.storage : org.apache.zeppelin.notebook.repo.HDFSNotebookRepo
    - hdfs.url : Hadoop WebHDFS URL. default: http://localhost:50070/webhdfs/v1/
    - hdfs.user : HDFS user. default: hdfs
    - hdfs.maxlength : Maximum number of lines of results fetched. default: 100000
    - hdfs.notebook.dir : notebook location directory in HDFS. default: /tmp
    
    After the Zeppelin daemon starts, check the notebook directories in ``hdfs.notebook.dir``.
    
    ### Screenshots (if appropriate)
    
    ### Questions:
    * Do the license files need an update? No
    * Are there breaking changes for older versions? No
    * Does this need documentation? Yes
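
The settings listed under "How should this be tested?" could be written in zeppelin-site.xml roughly as follows. This is a sketch rather than something taken from the PR: it assumes the usual `<property>` layout of zeppelin-site.xml and plugs in the property names and default values quoted above. The entries would go inside the file's existing `<configuration>` element.

```xml
<!-- Sketch only: names and defaults are the ones quoted in the PR description. -->
<property>
  <name>zeppelin.notebook.storage</name>
  <value>org.apache.zeppelin.notebook.repo.HDFSNotebookRepo</value>
</property>
<property>
  <name>hdfs.url</name>
  <value>http://localhost:50070/webhdfs/v1/</value>
</property>
<property>
  <name>hdfs.user</name>
  <value>hdfs</value>
</property>
<property>
  <name>hdfs.maxlength</name>
  <value>100000</value>
</property>
<property>
  <name>hdfs.notebook.dir</name>
  <value>/tmp</value>
</property>
```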


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aspen01/zeppelin branch-0.6

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/1600.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1600
    
----
commit 4b57a2faefdd34f4675e1acd78b25677111a5e57
Author: miru <d....@navercorp.com>
Date:   2016-10-26T09:38:21Z

    Using HDFS to backup and restore notebook

commit 2cc65adabc760c5da032cf1e7525aeece6705799
Author: miru <d....@navercorp.com>
Date:   2016-11-01T09:14:53Z

    add hadoop pseudo auth in GETFILESTATUS, LISTSTATUS
    add header for uploading file using REST API

commit 858e31beed3903fbdd12fe08b88ee028d397903b
Author: miru <d....@navercorp.com>
Date:   2016-11-04T03:19:47Z

    always add hadoop pseudo authentication

commit 45f182419fe451cdee970c95b67704fa6cff09a4
Author: miru <d....@navercorp.com>
Date:   2016-11-04T14:39:35Z

    add document for HDFSNotebookRepo

----
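
The commit messages above mention Hadoop pseudo authentication for GETFILESTATUS and LISTSTATUS. In the WebHDFS REST API, pseudo authentication is normally expressed by adding a `user.name` query parameter to each request. The sketch below is not taken from the PR: the `webhdfs_url` helper and the shell variable names are hypothetical, and the defaults simply mirror the `hdfs.url`, `hdfs.user`, and `hdfs.notebook.dir` values from the PR description.

```shell
# Assumed defaults, copied from the PR description; override for your cluster.
HDFS_URL="${HDFS_URL:-http://localhost:50070/webhdfs/v1/}"
HDFS_USER="${HDFS_USER:-hdfs}"
NOTEBOOK_DIR="${NOTEBOOK_DIR:-/tmp}"

# Hypothetical helper: print a WebHDFS URL with pseudo authentication.
#   $1 = WebHDFS operation (e.g. LISTSTATUS, GETFILESTATUS)
#   $2 = absolute HDFS path
webhdfs_url() {
  base="${HDFS_URL%/}"   # strip one trailing slash so the path joins cleanly
  printf '%s%s?op=%s&user.name=%s\n' "$base" "$2" "$1" "$HDFS_USER"
}

# Example calls (these need a reachable NameNode, so they are left commented out):
# curl -s "$(webhdfs_url LISTSTATUS "$NOTEBOOK_DIR")"
# curl -s "$(webhdfs_url GETFILESTATUS "$NOTEBOOK_DIR")"
```

Running the commented `curl` lines against a test cluster is one way to check that the notebook directory is visible to the configured user before starting Zeppelin.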


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1600: Using HDFS to backup and restore notebook

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1600
  
    @aspen01 thank you for the contribution!
    
    Could you please double-check that all new files, like every other file in Apache Zeppelin, have [ASF's Apache 2.0 license header](http://www.apache.org/legal/src-headers.html)?
    
    As for copyright notices, could you please help us understand whether your contribution contains any [third-party works](http://www.apache.org/legal/src-headers.html#3party), or whether all of the code was [developed for the ASF](http://www.apache.org/legal/src-headers.html#headers)?
    
    That would determine whether the existing copyright notice in the files should be moved to the root NOTICE file. Thanks!


---

[GitHub] zeppelin issue #1600: Using HDFS to backup and restore notebook

Posted by aspen01 <gi...@git.apache.org>.
Github user aspen01 commented on the issue:

    https://github.com/apache/zeppelin/pull/1600
  
    @EronWright Thank you for your feedback. When I first developed the feature, I assumed that HDFS would not be used as the default storage, because a remote FS failure could affect Zeppelin usage. So I extended VFSNotebookRepo to keep the default storage on the local FS and use HDFS as backup storage for failover. This keeps the implementation simple, but it may have introduced complexity.
    I'm using this feature, but I am also considering third-party storage such as HBase, because HDFS latency affects the latency of Zeppelin notebooks.


---

[GitHub] zeppelin issue #1600: Using HDFS to backup and restore notebook

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1600
  
    I see! This looks like something failing on the TravisCI side though. 
    Could you please rebase this branch on the latest master and force-push it here again? This will trigger CI with the latest code and fixes from master.


---

[GitHub] zeppelin issue #1600: Using HDFS to backup and restore notebook

Posted by aspen01 <gi...@git.apache.org>.
Github user aspen01 commented on the issue:

    https://github.com/apache/zeppelin/pull/1600
  
    @bzz Thank you for the advice. But the build still fails even though I fixed the license and checkstyle issues.



---

[GitHub] zeppelin issue #1600: Using HDFS to backup and restore notebook

Posted by aspen01 <gi...@git.apache.org>.
Github user aspen01 commented on the issue:

    https://github.com/apache/zeppelin/pull/1600
  
    I committed this branch on top of the latest master and force-pushed it.


---

[GitHub] zeppelin issue #1600: Using HDFS to backup and restore notebook

Posted by keithchambers <gi...@git.apache.org>.
Github user keithchambers commented on the issue:

    https://github.com/apache/zeppelin/pull/1600
  
    Is this ready to be merged?


---

[GitHub] zeppelin issue #1600: Using HDFS to backup and restore notebook

Posted by khalidhuseynov <gi...@git.apache.org>.
Github user khalidhuseynov commented on the issue:

    https://github.com/apache/zeppelin/pull/1600
  
    @aspen01 I wonder if there is any progress on this one?


---

[GitHub] zeppelin pull request #1600: Using HDFS to backup and restore notebook

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/zeppelin/pull/1600


---

[GitHub] zeppelin issue #1600: Using HDFS to backup and restore notebook

Posted by placeybordeaux <gi...@git.apache.org>.
Github user placeybordeaux commented on the issue:

    https://github.com/apache/zeppelin/pull/1600
  
    I built with ```mvn clean package```, but got the same results with ```mvn clean package -Pspark-1.6 -Phadoop-2.6 -Pyarn -Ppyspark -Psparkr -Pscala-2.10```. It's likely this is just a problem on my side.


---

[GitHub] zeppelin issue #1600: Using HDFS to backup and restore notebook

Posted by aspen01 <gi...@git.apache.org>.
Github user aspen01 commented on the issue:

    https://github.com/apache/zeppelin/pull/1600
  
    @placeybordeaux Thank you.
    @bzz Do you mean I have to merge these commits into master, not branch-0.6?



---

[GitHub] zeppelin issue #1600: Using HDFS to backup and restore notebook

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the issue:

    https://github.com/apache/zeppelin/pull/1600
  
    Thank you @aspen01, @placeybordeaux 
    
    In order to be merged, this branch has to be rebased on top of master, which already includes the CI fixes, i.e. 1c7d8fb.
    
    This branch must not include any other commits, except for ones that implement "HDFS to backup and restore notebook" - edae7bc, d94eb16 need to be removed.
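
The request above amounts to a rebase onto the latest master, dropping the unrelated commits, followed by a forced push. A minimal sketch of that workflow is shown below. It is an illustration, not a command sequence from this thread: the function name `rebase_pr_branch` is hypothetical, `upstream` is an assumed remote name for apache/zeppelin, and `origin` is assumed to point at the contributor's fork.

```shell
# Sketch only: "upstream" and "origin" are assumed remote names.
#   $1 = name of the pull request branch in the fork
rebase_pr_branch() {
  branch="$1"
  # Register the Apache repository once; ignore the error if it already exists.
  git remote add upstream https://github.com/apache/zeppelin.git 2>/dev/null || true
  git fetch upstream &&
  git checkout "$branch" &&
  # In the editor that opens, delete the lines for the unrelated commits
  # (edae7bc and d94eb16 in this thread) and keep only the HDFS commits.
  git rebase -i upstream/master &&
  # Rewritten history needs a forced push to update the pull request.
  git push --force origin "$branch"
}

# Usage, with the branch named in the pull request notification:
# rebase_pr_branch branch-0.6
```

The interactive rebase is used here because it lets the unrelated commits be removed in the same step as the rebase itself.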
    



---

[GitHub] zeppelin issue #1600: Using HDFS to backup and restore notebook

Posted by placeybordeaux <gi...@git.apache.org>.
Github user placeybordeaux commented on the issue:

    https://github.com/apache/zeppelin/pull/1600
  
    Not sure why the CI hasn't kicked in, but I tried compiling this locally and I am getting some test failures:
    
    ```
    Results :
    
    Failed tests:
      NotebookTest.testAbortParagraphStatusOnInterpreterRestart:760 expected:<ABORT> but was:<RUNNING>
    
    Tests in error:
      HeliumApplicationFactoryTest.testUnloadOnInterpreterUnbind:232 » ClassCast jav...
      HeliumApplicationFactoryTest.testLoadRunUnloadApplication:148 » ClassCast java...
      HeliumApplicationFactoryTest.testUnloadOnInterpreterRestart:299 » ClassCast ja...
      HeliumApplicationFactoryTest.testUnloadOnParagraphRemove:193 » ClassCast java....
    
    Tests run: 149, Failures: 1, Errors: 4, Skipped: 0
    ```
    
    Not sure if these are related to the PR at all.


---

[GitHub] zeppelin issue #1600: Using HDFS to backup and restore notebook

Posted by aspen01 <gi...@git.apache.org>.
Github user aspen01 commented on the issue:

    https://github.com/apache/zeppelin/pull/1600
  
    @placeybordeaux How did you build the package?
    
    ```
    $ mvn clean package -Pspark-1.6 -Phadoop-2.6 -Pyarn -Ppyspark -Psparkr -Pscala-2.10
    
    ...
    
    
    Results :
    
    Tests run: 66, Failures: 0, Errors: 0, Skipped: 0
    
    ...
    
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 19:23 min
    [INFO] Finished at: 2017-01-11T23:00:58+09:00
    [INFO] Final Memory: 160M/815M
    [INFO] ------------------------------------------------------------------------
    
    ```


---

[GitHub] zeppelin issue #1600: Using HDFS to backup and restore notebook

Posted by placeybordeaux <gi...@git.apache.org>.
Github user placeybordeaux commented on the issue:

    https://github.com/apache/zeppelin/pull/1600
  
    Looks like you're missing a hotfix commit: 
    
    https://github.com/apache/zeppelin/commit/1c7d8fb0f7f8dd3cb6ce4837053c11c6453b5f18


---

[GitHub] zeppelin issue #1600: Using HDFS to backup and restore notebook

Posted by EronWright <gi...@git.apache.org>.
Github user EronWright commented on the issue:

    https://github.com/apache/zeppelin/pull/1600
  
    Seems overcomplicated for `HDFSNotebookRepo` to extend `VFSNotebookRepo`, and to be synchronizing the local FS with the remote FS.   Just implement `NotebookRepo` directly.


---