Posted to issues@openwhisk.apache.org by GitBox <gi...@apache.org> on 2018/04/24 07:10:28 UTC

[GitHub] jiangpengcheng opened a new pull request #3570: ArtifactStore implementation for MongoDB

URL: https://github.com/apache/incubator-openwhisk/pull/3570
 
 
   This PR provides a `MongoDBArtifactStore` implementation that enables storing subjects, whisks and activations in [MongoDB](https://www.mongodb.com/).
   
   This is currently a work in progress. I am opening it early to get feedback on the design.
   
   ## Description
   Some users may want to use a database backend other than CouchDB to store entities. This PR provides MongoDB as an alternative for these users.
   
   
   ## MongoDB driver
   MongoDB provides [official drivers](https://docs.mongodb.com/ecosystem/drivers/) for many languages. This PR uses the [mongo-scala-driver](http://mongodb.github.io/mongo-scala-driver/2.2/), which provides an idiomatic Scala API built on top of the MongoDB Async Java driver.
   
   ## Design Considerations
   ### Data Model
   The data schema in MongoDB is the same as in CouchDB, except that MongoDB has no `_rev` field. Below is the document for `whisk.system/invokerHealthTestAction0` in CouchDB and in MongoDB:
   
   <table>
   <tr>
   <td>CouchDB</td>
   <td>MongoDB</td>
   </tr>
   <tr>
   <td>
   <pre>
   {
     "_id": "whisk.system/invokerHealthTestAction0",
     "_rev": "68-e72440f911c64ab11441c09e730e5ab8",
     "name": "invokerHealthTestAction0",
     "publish": false,
     "annotations": [],
     "version": "0.0.1",
     "updated": 1524476933182,
     "entityType": "action",
     "exec": {
       "kind": "nodejs:6",
       "code": "function main(params) { return params; }",
       "binary": false
     },
     "parameters": [],
     "limits": {
       "timeout": 60000,
       "memory": 256,
       "logs": 10
     },
     "namespace": "whisk.system"
   }
   </pre>
   </td>
   <td>
   <pre>
   {
     "_id" : "whisk.system/invokerHealthTestAction0",
     "name" : "invokerHealthTestAction0",
     "publish" : false,
     "annotations" : [ ],
     "version" : "0.0.1",
     "updated" : NumberLong("1524473794826"),
     "entityType" : "action",
     "exec" : {
       "kind" : "nodejs:6",
       "code" : "function main(params) { return params; }",
       "binary" : false
     },
     "parameters" : [ ],
     "limits" : {
       "timeout" : 60000,
       "memory" : 256,
       "logs" : 10
     },
     "namespace" : "whisk.system"
   }
   </pre>
   </td>
   </tr>
   </table>
   
   *MongoDB uses BSON to store data; `MongoDbStore` converts BSON to JSON, which is transparent to users.*
   
   MongoDB has no equivalent of the `_rev` field, so `MongoDbStore` ignores it (it is always empty).
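   
   As a rough illustration of the conversion described above, the sketch below turns an extended-JSON rendering of a MongoDB document into the plain JSON shape that callers see. The function names `toPlainJson` and `toWhiskJson` are hypothetical. It assumes that 64-bit integers arrive in the `{ "$numberLong": "..." }` form and that the missing `_rev` is reported as an empty string; the actual `MongoDbStore` code may differ.
   
   ```js
   // Hypothetical helper: convert extended-JSON values into plain JSON values.
   function toPlainJson(value) {
     if (Array.isArray(value)) return value.map(toPlainJson);
     if (value !== null && typeof value === "object") {
       const keys = Object.keys(value);
       // { "$numberLong": "1524473794826" } becomes the number 1524473794826.
       // Note: values beyond Number.MAX_SAFE_INTEGER would lose precision here.
       if (keys.length === 1 && keys[0] === "$numberLong") {
         return Number(value.$numberLong);
       }
       const out = {};
       for (const k of keys) out[k] = toPlainJson(value[k]);
       return out;
     }
     return value;
   }
   
   // Assumption: `_rev` is always reported as an empty string.
   function toWhiskJson(mongoDoc) {
     return { ...toPlainJson(mongoDoc), _rev: "" };
   }
   
   const converted = toWhiskJson({
     _id: "whisk.system/invokerHealthTestAction0",
     updated: { $numberLong: "1524473794826" },
     annotations: [],
   });
   console.log(converted.updated, JSON.stringify(converted._rev));
   ```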
   
   ### Attachment
   MongoDB uses [GridFS](https://docs.mongodb.com/manual/core/gridfs/index.html) to store and retrieve files that exceed the BSON document size limit of 16 MB.
   
   An attachment in MongoDB is stored in a separate collection with an independent `_id`. This PR uses `doc._id + doc.file_name` as the attachment's `_id`, so the attachment related to a document can be found easily.
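   
   The naming rule can be sketched as below. This is only an illustration: `attachmentId`, the `Map` standing in for the attachment collection, and the sample document are all hypothetical, and the concatenation uses no separator because none is mentioned above.
   
   ```js
   // Hypothetical helper: build the attachment `_id` by plain concatenation.
   function attachmentId(doc) {
     return doc._id + doc.file_name;
   }
   
   // Hypothetical in-memory stand-in for the attachment collection.
   const attachments = new Map();
   const action = { _id: "guest/hello", file_name: "codefile" };
   attachments.set(attachmentId(action), "<attachment bytes>");
   
   // Later, the attachment is found again from the document alone.
   console.log(attachments.has(attachmentId(action))); // true
   ```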
   
   ### Query
   CouchDB uses a `designDoc` for query operations; MongoDB can achieve the same effect with its `view` feature. Some examples follow:
   
   1. whisks-filters.v2.1.0/activations
   ```
   db.createView(
       "whisk_local_whisks.whisks-filters.v2.1.0/activations",
       "whisk_local_whisks",
       [
           { "$match": { "activationId": { "$exists": true } } },
           {
               "$addFields": {
                   "flag": {
                       "$arrayElemAt": [{ "$filter": { "input": "$annotations", "cond": { "$eq": ["$$this.key", "path"] } } }, 0]
                   }
               }
           },
           { "$lookup": { "from": "whisk_local_whisks", "localField": "_id", "foreignField": "_id", "as": "doc" } },
           {
               "$project": {
                   "key": [{ "$ifNull": ["$flag.value", { "$concat": ["$namespace", "/", "$name"] }] }, "$start"],
                   "value": {
                       "namespace": "$namespace",
                       "name": "$name",
                       "version": "$version",
                       "publish": "$publish",
                       "annotations": "$annotations",
                       "activationId": "$activationId",
                       "start": "$start",
                       "end": "$end",
                       "duration": "$duration",
                       "cause": "$cause",
                       "statusCode": "$response.statusCode"
                   },
                   "_id": 0,
                   "id": "$_id",
                   "doc": { "$arrayElemAt": ["$doc", 0] }
               }
           }
       ]
   )
   ```
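   
   To make the `key` computation of this view easier to follow, here is a plain JavaScript sketch of the same logic running over an in-memory array. The function name `activationsViewKeys` and the sample documents are hypothetical; the sketch only mirrors the `$match`, `$filter`, `$ifNull` and `$concat` steps and leaves out `value` and `doc`.
   
   ```js
   // Mirror of the view's `key` logic: use the `path` annotation when present,
   // otherwise fall back to "namespace/name"; the second element is `start`.
   function activationsViewKeys(docs) {
     return docs
       .filter((d) => "activationId" in d) // $match: activationId exists
       .map((d) => {
         const path = (d.annotations || []).find((a) => a.key === "path");
         const first =
           path && path.value != null ? path.value : `${d.namespace}/${d.name}`;
         return { id: d._id, key: [first, d.start] };
       });
   }
   
   // Hypothetical sample data.
   const activationDocs = [
     { _id: "guest/abc", activationId: "abc", namespace: "guest", name: "hello",
       start: 100, annotations: [{ key: "path", value: "guest/pkg/hello" }] },
     { _id: "guest/def", activationId: "def", namespace: "guest", name: "echo",
       start: 200, annotations: [] },
     { _id: "guest/echo", namespace: "guest", name: "echo", exec: {} }, // not an activation
   ];
   console.log(JSON.stringify(activationsViewKeys(activationDocs)));
   ```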
   
   2. whisks.v2.1.0/packages
   ```
   db.createView(
       "whisk_local_whisks.whisks.v2.1.0/packages",
       "whisk_local_whisks",
       [
           { "$match": { "binding": { "$exists": true } } },
           { "$lookup": { "from": "whisk_local_whisks", "localField": "_id", "foreignField": "_id", "as": "doc" } },
           {
               "$project": {
                   "key": ["$namespace", "$updated"],
                   "value": {
                       "namespace": "$namespace",
                       "name": "$name",
                       "version": "$version",
                       "publish": "$publish",
                       "annotations": "$annotations",
                       "updated": "$updated"
                   },
                   "_id": 0,
                   "id": "$_id",
                   "doc": { "$arrayElemAt": ["$doc", 0] }
               }
           }
       ]
   )
   ```
   
   3. whisks.v2.1.0/actions
   ```
   db.createView(
       "whisk_local_whisks.whisks.v2.1.0/actions",
       "whisk_local_whisks",
       [
           { "$match": { "exec": { "$exists": true } } },
           {
               "$addFields": {
                   "root": { "$arrayElemAt": [{ "$split": ["$namespace", "/"] }, 0] },
                   "value": {
                       "namespace": "$namespace",
                       "name": "$name",
                       "version": "$version",
                       "publish": "$publish",
                       "annotations": "$annotations",
                       "limits": "$limits",
                       "exec": {
                           "binary": { "$ifNull": ["$exec.binary", false] }
                       }
                   },
                   "updated": "$updated",
                   "flag": { "$indexOfBytes": ["$namespace", "/"] }
               }
           },
           { "$lookup": { "from": "whisk_local_whisks", "localField": "_id", "foreignField": "_id", "as": "doc" } },
           {
               "$facet": {
                   "namespaces": [
                       { "$addFields": { "key": ["$namespace", "$updated"] } }
                   ],
                   "packages": [
                       { "$match": { "flag": { "$ne": -1 } } },
                       { "$addFields": { "key": ["$root", "$updated"] } }
                   ]
               }
           },
           { "$project": { "final": { "$concatArrays": ["$namespaces", "$packages"] } } },
           { "$unwind": "$final" },
           {
               "$project": {
                   "key": "$final.key",
                   "value": "$final.value",
                   "doc": { "$arrayElemAt": ["$final.doc", 0] },
                   "id": "$final._id"
               }
           }
       ]
   )
   ```
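   
   The `$facet` stage in this view emits each action once under its full namespace and, if the namespace contains a `/` (that is, the action lives in a package), a second time under the root namespace. The sketch below reproduces only that key logic in plain JavaScript; `actionsViewKeys` and the sample documents are hypothetical.
   
   ```js
   // One row per action keyed by its namespace, followed by an extra row keyed
   // by the root namespace for every action whose namespace contains "/".
   function actionsViewKeys(docs) {
     const actions = docs.filter((d) => "exec" in d); // $match: exec exists
     const namespaces = actions.map((d) => ({
       id: d._id,
       key: [d.namespace, d.updated],
     }));
     const packages = actions
       .filter((d) => d.namespace.indexOf("/") !== -1) // flag != -1
       .map((d) => ({ id: d._id, key: [d.namespace.split("/")[0], d.updated] }));
     return namespaces.concat(packages); // $concatArrays + $unwind
   }
   
   // Hypothetical sample data.
   const actionDocs = [
     { _id: "guest/hello", namespace: "guest", updated: 1, exec: {} },
     { _id: "guest/pkg/hello", namespace: "guest/pkg", updated: 2, exec: {} },
   ];
   console.log(JSON.stringify(actionsViewKeys(actionDocs)));
   ```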
   
   ### Reduce Design Doc
   Unfortunately, MongoDB has no easy equivalent of a CouchDB-style `reduce`: in CouchDB, `reduce` can use `start_key` and `end_key` as filters to produce different results, while `reduce` in MongoDB cannot. As a trade-off, this PR simulates some simple CouchDB-style `reduce` functions as follows:
   
   1. First, create a reduce view in MongoDB:
   
   ```js
   //namespaceThrottlings/blockedNamespaces.reduce
   db.createView(
   	"whisk_local_whisks.namespaceThrottlings/blockedNamespaces.reduce",
   	"whisk_local_whisks",
   	[
   		{
   			"$facet": {
   				"limits": [
   					{ "$match": { "$and": [{ "_id": { "$regex": "/limits$" } }, { "$or": [{ "concurrentInvocations": 0 }, { "invocationsPerMinute": 0 }] }] } },
   					{ "$addFields": { "key": { "$substrBytes": ["$_id", 0, { "$add": [{ "$strLenBytes": "$_id" }, -7] }] } } }
   				],
   				"subjects": [
   					{ "$match": { "$and": [{ "blocked": { "$eq": true } }, { "subject": { "$ne": null } }, { "namespaces": { "$ne": null } }] } },
   					{ "$unwind": "$namespaces" }, { "$addFields": { "key": "$namespaces.name" } }]
   			}
   		},
   		{ "$project": { "final": { "$concatArrays": ["$limits", "$subjects"] } } },
   		{ "$unwind": "$final" },
   		{
   			"$project": {
   				"id": "$final._id",
   				"key": "$final.key",
   				"value": { "$literal": 1 }
   			}
   		}
   	]
   )
   ```
   2. Then run an aggregation on the reduce view:
   ```scala
     override protected[core] def query(table: String,
                                        startKey: List[Any],
                                        endKey: List[Any],
                                        skip: Int,
                                        limit: Int,
                                        includeDocs: Boolean,
                                        descending: Boolean,
                                        reduce: Boolean,
                                        stale: StaleParameter)(implicit transid: TransactionId): Future[List[JsObject]] = {
       //FIXME: staleParameter is meaningless in mongodb
   
       require(!(reduce && includeDocs), "reduce and includeDocs cannot both be true")
   
       val query_collection =
         if (reduce) database.getCollection(s"$collection.$table.reduce")
         else database.getCollection(s"$collection.$table")
   
       val start = transid.started(this, LoggingMarkers.DATABASE_QUERY, s"[QUERY] '$collection' searching '$table'")
   
       val fields = if (includeDocs) IndexedSeq("id", "key", "value", "doc") else IndexedSeq("id", "key", "value")
       val sort = if (descending) Sorts.descending("key") else Sorts.ascending("key")
   
       // An empty startKey works as-is with gte, but an empty endKey must be skipped,
       // otherwise the lte filter would match nothing
       val query =
         if (endKey.isEmpty) Filters.gte("key", startKey)
         else Filters.and(Filters.gte("key", startKey), Filters.lte("key", endKey))
   
       val ob =
         if (reduce) {
           query_collection.aggregate(
             List(
               Aggregates.`match`(query),
               Aggregates.group("", Accumulators.sum("value", "$value")),
               Aggregates.project(Projections
                 .fields(Projections.computed("key", "$_id"), Projections.excludeId(), Projections.include("value"))),
               // Stage order matters in an aggregation pipeline: sort, then skip, then limit
               Aggregates.sort(sort),
               Aggregates.skip(skip),
               Aggregates.limit(limit)))
         } else {
           query_collection
             .find(query)
             .limit(limit)
             .skip(skip)
             .projection(Projections.include(fields: _*))
             .sort(sort)
         }
   
       val f = ob.toFuture.map { results =>
         transid.finished(this, start, s"[QUERY] '$collection' completed: matched ${results.size}")
         results.map(result => result.toJson(jsonWriteSettings).parseJson.convertTo[JsObject]).toList
       }
   
       reportFailure(
         f,
         failure =>
           transid
             .failed(this, start, s"[QUERY] '$collection' internal error, failure: '${failure.getMessage}'", ErrorLevel))
     }
   ```
   
   This can simulate simple `reduce` functions such as `_sum` and `_count`.
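   
   The effect of the `match` and `group` stages can be sketched in plain JavaScript as below. This is a simplified model, not the PR's code: `compareKeys` and `reduceSum` are hypothetical names, keys are assumed to be arrays compared element by element, and real CouchDB or MongoDB ordering rules are more involved. Because every row of the reduce view carries `value: 1`, summing the values also gives a count.
   
   ```js
   // Simplified key ordering: compare key arrays element by element.
   function compareKeys(a, b) {
     const n = Math.min(a.length, b.length);
     for (let i = 0; i < n; i++) {
       if (a[i] < b[i]) return -1;
       if (a[i] > b[i]) return 1;
     }
     return a.length - b.length;
   }
   
   // Keep rows with startKey <= key <= endKey, then sum their values.
   // As in the Scala code, an empty endKey means "no upper bound".
   function reduceSum(rows, startKey, endKey) {
     return rows
       .filter(
         (r) =>
           compareKeys(r.key, startKey) >= 0 &&
           (endKey.length === 0 || compareKeys(r.key, endKey) <= 0)
       )
       .reduce((total, r) => total + r.value, 0);
   }
   
   // Hypothetical rows of a reduce view.
   const reduceRows = [
     { key: ["guest"], value: 1 },
     { key: ["guest"], value: 1 },
     { key: ["other"], value: 1 },
   ];
   console.log(reduceSum(reduceRows, ["guest"], ["guest"])); // 2
   console.log(reduceSum(reduceRows, [], [])); // 3
   ```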
   
   ## MongoDB Deployment
   Currently, only deployment as a MongoDB replica set is supported.
   
   ## Progress
   
   ### Finished Work
   - [x] Basic usage (create, update, query, delete)
   - [x] Attachment Support
   - [x] Automated deployment of MongoDB using Ansible
   
   ### Pending Work
   - [ ] CI Integration
   - [ ] Add MongoDB support for wskadmin
   - [ ] Documentation
   - [ ] MongoDB Sharding Cluster deployment
   
   ## My changes affect the following components
   - [x] Data stores (e.g., CouchDB)
   - [x] Tests
   - [x] Deployment
   - [x] CLI
   - [x] Documentation
   
   ## Types of changes
   - [x] Enhancement or new feature (adds new functionality).
   
   
   ## Checklist:
   
   - [x] I signed an [Apache CLA](https://github.com/apache/incubator-openwhisk/blob/master/CONTRIBUTING.md).
   - [x] I reviewed the [style guides](https://github.com/apache/incubator-openwhisk/wiki/Contributing:-Git-guidelines#code-readiness) and followed the recommendations (Travis CI will check :).
   - [ ] I added tests to cover my changes.
   - [ ] My changes require further changes to the documentation.
   - [ ] I updated the documentation where necessary.
   
   
