Posted to commits@openwhisk.apache.org by "Jiang PengCheng (Confluence)" <no...@apache.org> on 2021/05/25 08:50:15 UTC

[CONF] OpenWhisk > MongoDB Artifact Store

Jiang PengCheng **created** a page: [MongoDB Artifact Store](https://cwiki.apache.org/confluence/display/OPENWHISK/MongoDB+Artifact+Store)

## Introduction

This page describes `MongoDBArtifactStore`, an artifact store implementation that
can replace CouchDB. It can also work alongside ElasticSearch as the activation
store backend.

  

## Design

### Data Schema

The data schema in MongoDB is almost the same as in CouchDB, with 4 differences:

  1. The `annotations` field is a string in MongoDB instead of an array of objects as in CouchDB. Annotations may have an arbitrary structure, and MongoDB does not support "$" as the first character of a field name, so the field is converted to a raw JSON string before it is stored in MongoDB and converted back to an array of objects when it is fetched.
  2. The `parameters` field is handled the same way as `annotations`.
  3. The `_rev` field is not generated automatically in MongoDB, so it is calculated and inserted explicitly in code.
  4. There is a `_computed` field, which stores some extra fields that help with queries.

Below is an example:

CouchDB:

    {
      "_id": "whisk.system/invokerHealthTestAction0",
      "_rev": "68-e72440f911c64ab11441c09e730e5ab8",
      "name": "invokerHealthTestAction0",
      "publish": false,
      "annotations": [],
      "version": "0.0.1",
      "updated": 1524476933182,
      "entityType": "action",
      "exec": {
        "kind": "nodejs:6",
        "code": "function main(params) { return params; }",
        "binary": false
      },
      "parameters": [],
      "limits": {
        "timeout": 60000,
        "memory": 256,
        "logs": 10
      },
      "namespace": "whisk.system"
    }

MongoDB:

    {
      "_id" : "whisk.system/invokerHealthTestAction0",
      "name" : "invokerHealthTestAction0",
      "_computed" : {
        "rootns" : "whisk.system"
      },
      "publish" : false,
      "annotations" : "[ ]",
      "version" : "0.0.1",
      "updated" : NumberLong("1524473794826"),
      "entityType" : "action",
      "exec" : {
        "kind" : "nodejs:6",
        "code" : "function main(params) { return params; }",
        "binary" : false
      },
      "parameters" : "[ ]",
      "limits" : {
        "timeout" : 60000,
        "memory" : 256,
        "logs" : 10
      },
      "namespace" : "whisk.system"
    }
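
The string encoding of `annotations` and `parameters` can be illustrated with a minimal sketch. It uses a tiny stand-in JSON model (`Json`, `JStr`, `JArr`, `JObj`) and a hand-written `render`, all of which are hypothetical names chosen for this example; the real store works on spray-json values and also decodes the string back when reading, which is not shown here.

```scala
object FieldEncodingSketch {
  // Minimal stand-in JSON model (hypothetical; not the types used by the real store).
  sealed trait Json
  final case class JStr(value: String) extends Json
  final case class JArr(items: List[Json]) extends Json
  final case class JObj(fields: List[(String, Json)]) extends Json

  // Quote a string, escaping backslashes and double quotes.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Render a value as compact JSON text.
  def render(json: Json): String = json match {
    case JStr(v)   => quote(v)
    case JArr(xs)  => xs.map(render).mkString("[", ",", "]")
    case JObj(kvs) => kvs.map { case (k, v) => quote(k) + ":" + render(v) }.mkString("{", ",", "}")
  }

  // Replace each listed field with a string holding its JSON text, so that keys
  // such as "$key" inside it never become MongoDB field names.
  def encodeFields(names: Set[String], doc: JObj): JObj =
    JObj(doc.fields.map {
      case (k, v) if names.contains(k) => k -> JStr(render(v))
      case other                       => other
    })
}
```

Fields that are not listed are left untouched, so only the arbitrarily structured parts of a document are stored as opaque text.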
  
### Attachment

MongoDB uses [GridFS](https://docs.mongodb.com/manual/core/gridfs/index.html)
to store and retrieve files that exceed the BSON-document size limit of 16 MB.

Attachments in MongoDB are stored in a separate collection with an independent
`_id`. This PR combines the document `_id` and the file name (`s"$id/$name"` in
the code) to form the attachment's `_id`, so the related attachment can be
found easily.
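
As a small sketch of this naming scheme, the helpers below compose an attachment id and split it again. `attachmentId` and `split` are hypothetical names for illustration, and the split relies on the assumption that attachment names never contain `/` (document ids such as `whisk.system/invokerHealthTestAction0` do).

```scala
object AttachmentIdSketch {
  // Compose the GridFS file id from the owning document id and the attachment name.
  def attachmentId(docId: String, name: String): String = s"$docId/$name"

  // Recover (docId, name). Document ids may contain '/', so split at the last one.
  // Assumption: attachment names themselves never contain '/'.
  def split(id: String): Option[(String, String)] = {
    val i = id.lastIndexOf('/')
    if (i <= 0 || i == id.length - 1) None
    else Some((id.substring(0, i), id.substring(i + 1)))
  }
}
```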

  

## Implementation

Apart from the Ansible scripts and test files, there are 5 brand-new files:

### common/scala/src/main/scala/org/apache/openwhisk/core/database/mongodb/MongoDBArtifactStoreProvider.scala

It works just like `CouchDbStoreProvider`.

It creates a singleton MongoDB client so that
WhiskAuthStore/WhiskEntityStore/WhiskActivationStore share one client.
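
The shared-client idea can be sketched as follows. `FakeMongoClient`, `Store`, `Provider`, the collection names and the connection string are all hypothetical stand-ins for illustration; the actual provider builds a real driver client from the OpenWhisk configuration.

```scala
object SharedClientSketch {
  // Stand-in for the real MongoDB client (hypothetical type for illustration).
  final class FakeMongoClient(val uri: String)

  // Hypothetical store that only remembers which client and collection it uses.
  final case class Store(client: FakeMongoClient, collection: String)

  object Provider {
    // A `lazy val` in an object is initialised once, on first use, in a thread-safe way.
    // The URI is a placeholder, not a value taken from the original page.
    lazy val client: FakeMongoClient = new FakeMongoClient("mongodb://localhost:27017")

    // Every store created by the provider reuses the same client instance.
    def makeStore(collection: String): Store = Store(client, collection)
  }
}
```

Sharing one client this way means the connection pool is created once, no matter how many stores are instantiated.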

### common/scala/src/main/scala/org/apache/openwhisk/core/database/mongodb/MongoDBArtifactStore.scala

An implementation of `trait ArtifactStore[DocumentAbstraction]`.

Some private methods are worth mentioning:

  * **attach(d: DocumentAbstraction, name: String, contentType: ContentType, docStream: Source[ByteString, _])(implicit transid: TransactionId): Future[AttachResult]** :

    This saves an **action's** attachment to MongoDB's GridFSBucket. Since the attachment arrives as a **Source**, a **Sink** is needed to process it; this is what **MongoDBAsyncStreamSink** is used for:
    
    
        val uploadStream = gridFSBucket.openUploadStream(BsonString(s"$id/$name"), name, option)
        val sink = MongoDBAsyncStreamSink(uploadStream)
    
        val f = docStream
          .runWith(combinedSink(sink))
          .map { r =>
            transid
              .finished(this, start, s"[ATT_PUT] '$collName' completed uploading attachment '$name' of document '$id'")
            AttachResult(r.digest, r.length)
          }
          .recover {
            case t: MongoException =>
              transid.failed(
                this,
                start,
                s"[ATT_PUT] '$collName' failed to upload attachment '$name' of document '$id'; error code '${t.getCode}'",
                ErrorLevel)
              throw new Exception("Unexpected mongodb server error: " + t.getMessage)
          }
    
    

  * **readAttachmentFromMongo[T](doc: DocInfo, attachmentUri: Uri, sink: Sink[ByteString, Future[T]])(implicit transid: TransactionId): Future[T]:**

    In contrast to the **attach** method, this reads an attachment from MongoDB. Since a **Sink** is used to collect the result, a **Source** has to be constructed for the MongoDB attachment, which is what **MongoDBAsyncStreamSource** is for:

    
    
        val downloadStream = gridFSBucket.openDownloadStream(BsonString(s"${doc.id.id}/$attachmentName"))
    
        def readStream(file: GridFSFile) = {
          val source = MongoDBAsyncStreamSource(downloadStream)
          source
            .runWith(sink)
            .map { result =>
              transid
                .finished(
                  this,
                  start,
                  s"[ATT_GET] '$collName' completed: found attachment '$attachmentName' of document '$doc'")
              result
            }
        }
    
        def getGridFSFile = {
          downloadStream
            .gridFSFile()
            .head()
            .transform(
              identity, {
                case ex: MongoGridFSException if ex.getMessage.contains("File not found") =>
                  transid.finished(
                    this,
                    start,
                    s"[ATT_GET] '$collName', retrieving attachment '$attachmentName' of document '$doc'; not found.")
                  NoDocumentException("Not found on 'readAttachment'.")
                case ex: MongoGridFSException =>
                  transid.failed(
                    this,
                    start,
                    s"[ATT_GET] '$collName' failed to get attachment '$attachmentName' of document '$doc'; error code: '${ex.getCode}'",
                    ErrorLevel)
                  throw new Exception("Unexpected mongodb server error: " + ex.getMessage)
                case t => t
              })
        }
    
        val f = for {
          file <- getGridFSFile
          result <- readStream(file)
        } yield result

  * **revisionCalculate(doc: JsObject): (String, String):**

    Calculates the revision based on the entity content. Returns **old_revision** (empty if it does not exist) and **new_revision**.

  * **getCollectionAndCreateIndexes(): MongoCollection[Document]:**

    Creates the indices for the collection if they do not exist yet.

  * **encodeFields(fields: Seq[String], jsValue: JsObject): JsObject:**

    Encodes each listed field, a JsValue with a complex and arbitrary structure, as a JsString. Used for the **annotations** and **parameters** fields.

  * **decodeFields(fields: Seq[String], jsValue: JsObject): JsObject:**

    Decodes the listed fields back from JsString.
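
A revision calculation along these lines might look like the sketch below. It is only an assumption that the new revision follows the CouchDB-style `counter-hash` shape seen in the example document (`68-e724...`); the MD5 choice, the helper names and the simplified signature (an optional old revision plus the serialised content instead of a `JsObject`) are hypothetical and not taken from the actual implementation.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object RevisionSketch {
  // Hex-encoded MD5 of a string (hash choice is an assumption for this sketch).
  private def md5Hex(s: String): String =
    MessageDigest.getInstance("MD5")
      .digest(s.getBytes(StandardCharsets.UTF_8))
      .map(b => f"${b & 0xff}%02x")
      .mkString

  // Returns (oldRevision, newRevision). `oldRev` is None for a new document.
  // `content` stands for the serialised document without its `_rev` field.
  def revisionCalculate(oldRev: Option[String], content: String): (String, String) = {
    val old = oldRev.getOrElse("")
    // Take the counter in front of the first '-', defaulting to 0.
    val counter = old.takeWhile(_ != '-') match {
      case digits if digits.nonEmpty && digits.forall(_.isDigit) => digits.toInt
      case _                                                      => 0
    }
    (old, s"${counter + 1}-${md5Hex(content)}")
  }
}
```

Returning the old revision together with the new one lets the caller use the old value as an update condition, which is one plausible way to detect conflicting writes.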

### common/scala/src/main/scala/org/apache/openwhisk/core/database/mongodb/MongoDBAsyncStreamSink.scala

This file defines a **Sink** that consumes an attachment **Source** and saves
it to the MongoDB GridFSBucket.

### common/scala/src/main/scala/org/apache/openwhisk/core/database/mongodb/MongoDBAsyncStreamSource.scala

This file defines a **Source** that reads an attachment from the MongoDB
GridFSBucket so that it can be processed into a ByteString through a **Sink**.

### common/scala/src/main/scala/org/apache/openwhisk/core/database/mongodb/MongoDBViewMapper.scala

To simulate the design doc features of CouchDB/Cloudant, the community has
defined **CosmosDBViewMapper** for CosmosDB and **MemoryViewMapper** for the
in-memory database (for testing and development only). This file plays the
same role for MongoDB.

A CouchDB design doc can be used to query against a computed key. For example,
given a document

    
    
    {
      "name": "Jack",
      "nation": "Eng"
    }

we can create a design doc with the following function:

    
    
    function (doc) {
      emit(doc.name + "/" + doc.nation, 1);
    }
    

Then we can query with the new computed key **doc.name + "/" + doc.nation**.

To simulate this, some extra fields under **_computed** are saved to MongoDB.
For activations, they are:

    
    
        "_computed" : {
            "deleteLogs" : true,
            "nspath" : "whisk.system/hello"
        },

And for other entities, they are:

    
    
        "_computed" : {
            "rootns" : "whisk.system"
        },

With these extra computed fields, we can do similar design doc queries on
MongoDB.
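
A rough sketch of how such `_computed` values could be derived is shown below. The page does not describe the exact derivation, so the rules here (root namespace as the part before the first `/`, `nspath` as namespace plus name, `deleteLogs` passed in by the caller) and all function names are assumptions for illustration.

```scala
object ComputedFieldsSketch {
  // Root namespace: the part of the namespace before the first '/' (assumed rule).
  def rootNamespace(namespace: String): String = namespace.takeWhile(_ != '/')

  // `_computed` for non-activation entities.
  def forEntity(namespace: String): Map[String, Any] =
    Map("rootns" -> rootNamespace(namespace))

  // `_computed` for activations. How the real code derives these values is not
  // covered by the page; joining namespace and name is an assumption.
  def forActivation(namespace: String, name: String, deleteLogs: Boolean): Map[String, Any] =
    Map("nspath" -> s"$namespace/$name", "deleteLogs" -> deleteLogs)
}
```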

The trait **MongoDBViewMapper** has 1 variable and 2 methods that can be
overridden:

  1. **val indexes: List[Document]:** defines the indices required for the related collection, so that the system can create them to speed up queries.
  2. **def filter(ddoc: String, view: String, startKey: List[Any], endKey: List[Any]): Bson:** generates a MongoDB query that should be equivalent to the given design doc query for CouchDB.
  3. **def sort(ddoc: String, view: String, descending: Boolean): Option[Bson]:** generates a sort specification for the MongoDB query.
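
To make the role of `filter` and `sort` more concrete, here is a self-contained sketch that translates a CouchDB-style key range into a query and evaluates it in memory. The `Query` ADT stands in for `Bson`, documents are flat maps with dotted keys, and the key layout (`[namespace, since]` to `[namespace, upto, ...]`) is an assumption for illustration rather than the mapper's actual behaviour.

```scala
object ViewMapperSketch {
  // Stand-in for a MongoDB query document (hypothetical; the real trait returns Bson).
  sealed trait Query
  final case class Eq(field: String, value: Any) extends Query
  final case class Gte(field: String, value: Long) extends Query
  final case class Lte(field: String, value: Long) extends Query
  final case class And(parts: List[Query]) extends Query

  // Flat document with dotted keys, e.g. "_computed.rootns".
  type Doc = Map[String, Any]

  // Evaluate a query against an in-memory document.
  def matches(q: Query, doc: Doc): Boolean = q match {
    case Eq(f, v)  => doc.get(f).contains(v)
    case Gte(f, v) => doc.get(f).exists { case n: Long => n >= v; case _ => false }
    case Lte(f, v) => doc.get(f).exists { case n: Long => n <= v; case _ => false }
    case And(ps)   => ps.forall(matches(_, doc))
  }

  // Translate a key range into equality on the computed field plus a range on `updated`.
  def filter(startKey: List[Any], endKey: List[Any]): Query = (startKey, endKey) match {
    case ((ns: String) :: Nil, _) => Eq("_computed.rootns", ns)
    case ((ns: String) :: (since: Long) :: Nil, _ :: (upto: Long) :: _) =>
      And(List(Eq("_computed.rootns", ns), Gte("updated", since), Lte("updated", upto)))
    case _ => throw new IllegalArgumentException(s"unsupported key range: $startKey .. $endKey")
  }

  // Sort field and direction (1 ascending, -1 descending).
  def sort(descending: Boolean): (String, Int) = ("updated", if (descending) -1 else 1)
}
```

Because the equality condition targets a `_computed` field, an index on that field together with `updated` would let a query of this shape avoid a full collection scan.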

  