Posted to dev@zeppelin.apache.org by tae-jun <gi...@git.apache.org> on 2017/02/13 07:34:12 UTC

[GitHub] zeppelin pull request #1826: [ZEPPELIN-1859] Add MongoNotebookRepo

GitHub user tae-jun reopened a pull request:

    https://github.com/apache/zeppelin/pull/1826

    [ZEPPELIN-1859] Add MongoNotebookRepo

    ### What is this PR for?
    This PR adds Mongo notebook storage.
    
    I built this feature for HA (High Availability).
    As far as I know, S3 and Git storage are currently the only options for HA.
    
    I manage an Ambari cluster in my lab, and Zeppelin is the most vulnerable part of it,
    because a single server holds all the Zeppelin notes.
    
    Therefore, by deploying MongoDB's [replica set](https://docs.mongodb.com/manual/replication/) and using it as Zeppelin notebook storage, I would like to achieve HA.
    
    #### How to use MongoDB as notebook storage
    ```sh
    export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.MongoNotebookRepo 
    ```
    
    or in `zeppelin-site.xml`:
    ```xml
    <property>
      <name>zeppelin.notebook.storage</name>
      <value>org.apache.zeppelin.notebook.repo.MongoNotebookRepo</value>
      <description>notebook persistence layer implementation</description>
    </property>
    ```
    #### Configurable environment variables
    * `ZEPPELIN_NOTEBOOK_MONGO_URI` MongoDB connection URI
    * `ZEPPELIN_NOTEBOOK_MONGO_DATABASE` Database name
    * `ZEPPELIN_NOTEBOOK_MONGO_COLLECTION` Collection name
    * `ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT` If `true`, automatically import your local notes. Default `false`
    
    They can be configured in `zeppelin-site.xml` as well:
    * `zeppelin.notebook.mongo.uri`
    * `zeppelin.notebook.mongo.database`
    * `zeppelin.notebook.mongo.collection`
    * `zeppelin.notebook.mongo.autoimport`
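    
    For the HA setup described above, the connection URI can name every member of the replica set. A minimal sketch, assuming a three-member replica set called `rs0` on the hypothetical hosts `mongo1`, `mongo2`, and `mongo3` (none of these names come from this PR):
    ```sh
    # Hypothetical hosts and replica set name; replace them with your own.
    export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.MongoNotebookRepo
    export ZEPPELIN_NOTEBOOK_MONGO_URI='mongodb://mongo1:27017,mongo2:27017,mongo3:27017/?replicaSet=rs0'
    # Same values as the defaults mentioned in this PR, spelled out for clarity.
    export ZEPPELIN_NOTEBOOK_MONGO_DATABASE=zeppelin
    export ZEPPELIN_NOTEBOOK_MONGO_COLLECTION=notes
    ```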
    
    
    #### Future work
    If we use MongoDB's [oplog tailing](https://docs.mongodb.com/manual/core/replica-set-oplog/), a multi-server architecture may be possible.
    
    ### What type of PR is it?
    [Feature]
    
    ### Todos
    * [ ] - Write documentation for Mongo storage
    
    ### What is the Jira issue?
    https://issues.apache.org/jira/browse/ZEPPELIN-1859
    
    ### How should this be tested?
    #### Install MongoDB (if you don't have it)
    ```sh
    brew update
    brew install mongodb
    ```
    #### Build Zeppelin
    ```sh
    mvn clean package -DskipTests
    ```
    #### Run Zeppelin with Mongo storage
    ```sh
    export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.MongoNotebookRepo
    export ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT=true
    bin/zeppelin-daemon.sh restart
    ```
    The default database and collection names are `zeppelin` and `notes`, respectively.
    The `ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT` option automatically imports any of your local notes that don't exist in MongoDB yet.
    #### Check whether a document in MongoDB is updated
    Create, update, or remove a note, then open the mongo shell:
    ```sh
    mongo zeppelin
    ```
    Then check that the state of the note is what you expect:
    ```sh
    db.notes.findOne({_id: '<NOTE_ID_THAT_YOU_WANT_TO_SEE>'})
    ```
    #### Confirm that the configuration works
    ```sh
    export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.MongoNotebookRepo
    export ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT=true
    export ZEPPELIN_NOTEBOOK_MONGO_DATABASE=otherdb
    export ZEPPELIN_NOTEBOOK_MONGO_COLLECTION=mynotes
    export ZEPPELIN_NOTEBOOK_MONGO_URI=mongodb://localhost:27017
    bin/zeppelin-daemon.sh restart
    ```
    The collection `mynotes` should be created in db `otherdb`.
    Let's check it!
    ```sh
    mongo otherdb
    db.mynotes.count()
    ```
    The result should not be zero.
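    
    The same check can be scripted. A minimal sketch, assuming the legacy `mongo` shell is on your `PATH`; the function name `check_notes_count` is made up for this example:
    ```sh
    # Succeeds only if the given collection holds at least one document.
    # Usage: check_notes_count <database> <collection>
    check_notes_count() {
      # --quiet suppresses the connection banner so only the count is printed.
      count=$(mongo "$1" --quiet --eval "db.getCollection('$2').count()") || return 1
      echo "documents in $1.$2: $count"
      [ "$count" -gt 0 ]
    }
    ```
    For example, `check_notes_count otherdb mynotes` should succeed after the restart above.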
    
    #### Confirm that the configuration from `zeppelin-site.xml` works
    Open your `conf/zeppelin-site.xml` file (copy it from `zeppelin-site.xml.template` if you don't have one), and comment out the lines below:
    ```xml
    <!--
    <property>
      <name>zeppelin.notebook.storage</name>
      <value>org.apache.zeppelin.notebook.repo.VFSNotebookRepo</value>
      <description>notebook persistence layer implementation</description>
    </property>
    -->
    ```
    Then add the lines below:
    ```xml
    <property>
      <name>zeppelin.notebook.storage</name>
      <value>org.apache.zeppelin.notebook.repo.MongoNotebookRepo</value>
      <description>notebook persistence layer implementation</description>
    </property>
    
    <property>
      <name>zeppelin.notebook.mongo.uri</name>
      <value>mongodb://localhost</value>
      <description>MongoDB connection URI used to connect to a MongoDB database server</description>
    </property>
    
    <property>
      <name>zeppelin.notebook.mongo.database</name>
      <value>zepl</value>
      <description>database name for notebook storage</description>
    </property>
    
    <property>
      <name>zeppelin.notebook.mongo.collection</name>
      <value>notes</value>
      <description>collection name for notebook storage</description>
    </property>
    
    <property>
      <name>zeppelin.notebook.mongo.autoimport</name>
      <value>false</value>
      <description>import local notes into MongoDB automatically on startup</description>
    </property>
    ```
    
    This time we will import a note via `mongoimport`. I made it possible to import a note from JSON just in case.
    ```sh
    cd $ZEPPELIN_HOME/notebook/<NOTE_ID_YOU_WANT_TO_IMPORT>
    mongoimport --db zepl --collection notes --file note.json
    ```
    
    Ensure that your environment variables are clean (just reopen your terminal if they are not), and restart Zeppelin:
    ```sh
    bin/zeppelin-daemon.sh restart
    ```
    Open a browser and go to `localhost:8080`. The note that you imported should be shown.
    
    ### Questions:
    * Do the license files need an update? Maybe...? I used [mongo-java-driver](https://mvnrepository.com/artifact/org.mongodb/mongo-java-driver/3.4.1), which is licensed under *The Apache Software License, Version 2.0*
    * Are there breaking changes for older versions? NO
    * Does this need documentation? YES

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tae-jun/zeppelin ZEPPELIN-1859

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/1826.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1826
    
----
commit a4fba8c1ce50aa2a6b1734ff02cd3c954c840352
Author: Jun Kim <i2...@gmail.com>
Date:   2016-12-25T13:24:31Z

    Add MongoNotebookRepo

commit 08eee3dacc880929b630981688ac0b08b60eb8e8
Author: Jun Kim <i2...@gmail.com>
Date:   2016-12-30T12:27:22Z

    fix style check error

commit 77947b81804fc327d70d6c524edfb804a2d889e2
Author: Jun Kim <i2...@gmail.com>
Date:   2017-01-15T17:16:17Z

    Add license information of mongo-java-driver

commit 98282ae5e8906bec4d0a60238db51f8ba24274be
Author: Jun Kim <i2...@gmail.com>
Date:   2017-02-12T11:21:59Z

    Add a documentation for MongoDB notebook storage

----

