You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zeppelin.apache.org by khalidhuseynov <gi...@git.apache.org> on 2015/06/26 11:41:58 UTC

[GitHub] incubator-zeppelin pull request: Sync functionality with another s...

GitHub user khalidhuseynov opened a pull request:

    https://github.com/apache/incubator-zeppelin/pull/123

    Sync functionality with another storage system

    This PR is to handle the synchronization of notebooks between local Zeppelin fs and remote storage system. 
    
    TODO: 
    - [ ] sync non existing files at start
    - [ ] sync modified files at start
    - [ ] save to both storages on save()
    - [ ] delete from both storages on remove()

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/khalidhuseynov/incubator-zeppelin sync-with-remote

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-zeppelin/pull/123.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #123
    
----
commit e778c4e95da6bf84ef782c9445ced1ae340f3289
Author: Khalid Huseynov <kh...@nflabs.com>
Date:   2015-06-26T08:50:33Z

    initial sync implementation
    
    sync of modified notes at the start should be handled

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Sync functionality with another s...

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/123#issuecomment-123124310
  
    Thanks for the contribution. LGTM!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Sync functionality with another s...

Posted by khalidhuseynov <gi...@git.apache.org>.
Github user khalidhuseynov commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/123#issuecomment-121494695
  
    closing to trigger the build again


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Sync functionality with another s...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-zeppelin/pull/123


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Sync functionality with another s...

Posted by khalidhuseynov <gi...@git.apache.org>.
Github user khalidhuseynov commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/123#issuecomment-117411618
  
    @guptarajat @Leemoonsoo As @bzz mentioned, with default settings only local storage is used. When  `ConfVars.ZEPPELIN_NOTEBOOK_STORAGE` is exported new value with class name implementing the interface for storage, this sync functionality will be activated. Let me know if some explicit flag you think is better.
    @Leemoonsoo @bzz tests will be added soon.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Sync functionality with another s...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/123#issuecomment-115633893
  
    Thank you for contribution, looks very interesting!
    
    Shall we add [WIP] as prefix for PR name, as AFAIK this is work in progress?
    
    Also, another thing is - as this is a part of Apache project, please create a [JIRA Issue](https://github.com/apache/incubator-zeppelin/blob/master/CONTRIBUTING.md#jira) with brief description of changes you are about to propose and post a link here - this way in case of community feedback to your work it would be saved in Foundation-hosted JIRA history.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Sync functionality with another s...

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/123#issuecomment-117413041
  
    @khalidhuseynov Can we have flag to skips local storage sync when `ZEPPELIN_NOTEBOOK_STORAGE` is defined? for some use case, user might not want to keep local copy.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Sync functionality with another s...

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/123#issuecomment-117427149
  
    @khalidhuseynov Thanks for the suggestion. But here's couple of reason i suggest.
    
    1. having `ZEPPELIN_NOTEBOOK_LOCAL_STORAGE` and `ZEPPELIN_NOTEBOOK_REMOTE_STORAGE` means changing configuration. That impact existing user.
    
    2. name 'LOCAL_STORAGE'  and 'REMOTE_STORAGE' is not appropriate. because user provided storage layer can be anything. It can be based on local filesystem, it can be based on remote storage, it can be based on something other.
    
    
    Here's my suggestion,
    
    Let ZEPPELIN_NOTEBOOK_STORAGE receive comma separated class name, so user can use 1 to n 
    storage (for now, i think it's okay only handle up to 2 storages). And always synchronize all the storage.
    
    I think we can keep compatibility of the configuration and generalize this feature in this say.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Sync functionality with another s...

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/123#issuecomment-123553057
  
    master is merged to this branch and i'm merging it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Sync functionality with another s...

Posted by corneadoug <gi...@git.apache.org>.
Github user corneadoug commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/123#issuecomment-123128036
  
    Need merge or rebase first.
    This branch is far from master


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Sync functionality with another s...

Posted by khalidhuseynov <gi...@git.apache.org>.
Github user khalidhuseynov commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/123#issuecomment-117052462
  
    @bzz it's valid concerns and will be addressed now


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Sync functionality with another s...

Posted by bzz <gi...@git.apache.org>.
Github user bzz commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/123#issuecomment-117382324
  
    I think right now, with default configuration, Zeppelin is only going to use local storage, as before, but in case user put new value to  `ConfVars.ZEPPELIN_NOTEBOOK_STORAGE` - it will only then sync between local and a new one.
    
    @guptarajat @Leemoonsoo will this be enough, or do you think we need an explicit `feature toggle` for the same behaviour?
    
    
    @khalidhuseynov adding tests for `NotebookRepoSync`, to cover the implemented cases is really a must.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Sync functionality with another s...

Posted by khalidhuseynov <gi...@git.apache.org>.
Github user khalidhuseynov commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/123#issuecomment-117423228
  
    @Leemoonsoo How about having two env. variables: `ZEPPELIN_NOTEBOOK_LOCAL_STORAGE` and `ZEPPELIN_NOTEBOOK_REMOTE_STORAGE`. Initially `ZEPPELIN_NOTEBOOK_LOCAL_STORAGE` is set to default local storage class name, and if user wants to skip it, then user may update it, say, with empty string ` ` or `none` keyword. `ZEPPELIN_NOTEBOOK_REMOTE_STORAGE` will be needed to explicitly be defined though. Let me know what other options you think would be better.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Sync functionality with another s...

Posted by khalidhuseynov <gi...@git.apache.org>.
Github user khalidhuseynov commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/123#issuecomment-117428043
  
    @Leemoonsoo thanks for explanation and suggestion as well. It'll be changed to handle comma separated class names then. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Sync functionality with another s...

Posted by guptarajat <gi...@git.apache.org>.
Github user guptarajat commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/123#issuecomment-117250817
  
    is this feature guarded by some flag in case I don't want to use this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: [WIP] Sync functionality with ano...

Posted by borisclemencon <gi...@git.apache.org>.
Github user borisclemencon commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/123#issuecomment-119952798
  
    Thanks @khalidhuseynov for this cool feature. Me and my colleagues work with Amamzon EMR on temporary clusters. We would love to store our notebooks on S3 rather than on the cluster's master node.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Sync functionality with another s...

Posted by guptarajat <gi...@git.apache.org>.
Github user guptarajat commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/123#issuecomment-117530109
  
    +1 on @Leemoonsoo 's last comment on having a comma separated list.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Sync functionality with another s...

Posted by khalidhuseynov <gi...@git.apache.org>.
Github user khalidhuseynov closed the pull request at:

    https://github.com/apache/incubator-zeppelin/pull/123


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Sync functionality with another s...

Posted by khalidhuseynov <gi...@git.apache.org>.
Github user khalidhuseynov commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/123#issuecomment-121477384
  
    @Leemoonsoo please could you review it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Sync functionality with another s...

Posted by khalidhuseynov <gi...@git.apache.org>.
GitHub user khalidhuseynov reopened a pull request:

    https://github.com/apache/incubator-zeppelin/pull/123

    Sync functionality with another storage system

    This PR is to handle the synchronization of notebooks between local Zeppelin fs and remote storage system. 
    
    Corresponding issue: https://issues.apache.org/jira/browse/ZEPPELIN-133
    
    TODO: 
    - [x] sync non existing files at start
    - [x] sync modified files at start
    - [x] save to both storages on save()
    - [x] delete from both storages on remove()
    - [x] pull new/updated notes from secondary storage 
    - [x] add comma separated storage class definitions
    - [x] corresponding tests
    - [x] handle failure cases of secondary storage
    - [x] handle updated notes on Paragraph level instead of Note class itself

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/khalidhuseynov/incubator-zeppelin sync-with-remote

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-zeppelin/pull/123.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #123
    
----
commit e778c4e95da6bf84ef782c9445ced1ae340f3289
Author: Khalid Huseynov <kh...@nflabs.com>
Date:   2015-06-26T08:50:33Z

    initial sync implementation
    
    sync of modified notes at the start should be handled

commit 032e59b1244f7c4a66a49ea216065299cd511eed
Author: Khalid Huseynov <kh...@nflabs.com>
Date:   2015-06-29T05:40:36Z

    add modification time field for Note and NoteInfo

commit 5b8d99486506ae0eb8190821290e7cb035ab2970
Author: Khalid Huseynov <kh...@nflabs.com>
Date:   2015-06-29T05:51:14Z

    modify containsID() function to return object instead of boolean

commit 5f7ff18d13ba1a967cd7857f0da89e1eafe887e5
Author: Khalid Huseynov <kh...@nflabs.com>
Date:   2015-06-30T06:31:34Z

    use modTime field for uploading modified notes

commit 51e6271250f9573c0081161de11586597e0daab9
Author: Khalid Huseynov <kh...@nflabs.com>
Date:   2015-06-30T06:33:25Z

    change modTime of Note object on save()

commit 359a41e1fe551816fb8aed3d2e13a3df22efec14
Author: Khalid Huseynov <kh...@nflabs.com>
Date:   2015-06-30T06:52:59Z

    minor fix of default classname

commit 3899f8eda2ac708c9a6429cd7c935133015c014c
Author: Khalid Huseynov <kh...@nflabs.com>
Date:   2015-06-30T07:02:57Z

    minor changes to log comments

commit d001ebf6037852ccd41b79c487e82986ab100198
Author: Khalid Huseynov <kh...@nflabs.com>
Date:   2015-06-30T07:14:44Z

    function name/description change

commit 7c4366d4c18d0dc0973042ec09a43e4dd7d587e3
Author: Khalid Huseynov <kh...@nflabs.com>
Date:   2015-06-30T08:26:19Z

    remove commented code section and redundant type conversion

commit b6a3590971ced0558b1c4877e68ec698160147f7
Author: Khalid Huseynov <kh...@nflabs.com>
Date:   2015-06-30T08:38:40Z

    minor refactoring of getting storage class

commit 3314971c75d7818ab5e7c79a6677308d7a4d2d4b
Author: Khalid Huseynov <kh...@nflabs.com>
Date:   2015-06-30T08:51:14Z

    remove more comments

commit 5a967f1838bce2da24e06fbeec8c7e0b9ad02569
Author: Khalid Huseynov <kh...@nflabs.com>
Date:   2015-07-02T19:31:40Z

    logic ramification; add comma-separated storage class names; use list of repos

commit a441f9d46507774472364005f2ffc680ed92b8fa
Author: Khalid Huseynov <kh...@nflabs.com>
Date:   2015-07-03T08:33:57Z

    catch write to secondary storage failed exception, write to main storage

commit 429372366fe9bc974a8a9231ce04b21d5c8aa20e
Author: Khalid Huseynov <kh...@nflabs.com>
Date:   2015-07-13T09:37:05Z

    add initial tests

commit c2827ff8e7da3eb442d0c2fba5f48e262fde5982
Author: Khalid Huseynov <kh...@nflabs.com>
Date:   2015-07-14T08:13:02Z

    delete modTime field from Note and NoteInfo

commit d33ffc79f34fe286d7e78509d7761ee2ad147ef9
Author: Khalid Huseynov <kh...@nflabs.com>
Date:   2015-07-14T08:57:36Z

    check latest modified from Paragraph fields
    
    dateCreated, dateStarted, dateFinished fields of Paragraph class are used for finding latest modification time of Note

commit cbe167cdbe385881ed7c709053e0c4cad1be9a4b
Author: Khalid Huseynov <kh...@nflabs.com>
Date:   2015-07-15T04:23:31Z

    minor test change + log removal

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: [WIP] Sync functionality with ano...

Posted by khalidhuseynov <gi...@git.apache.org>.
Github user khalidhuseynov commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/123#issuecomment-115684564
  
    https://issues.apache.org/jira/browse/ZEPPELIN-133


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: Sync functionality with another s...

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/123#issuecomment-117267929
  
    I think @guptarajat suggestion make sense. If there're flag to turn on/off, it'll be nice.
    @khalidhuseynov Do you mind add some unittest for it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---