Posted to issues@openwhisk.apache.org by GitBox <gi...@apache.org> on 2018/06/19 08:00:20 UTC

chetanmeh opened a new pull request #3779: S3AttachmentStore
URL: https://github.com/apache/incubator-openwhisk/pull/3779
 
 
   This PR introduces an `S3AttachmentStore`, an `AttachmentStore` implementation for storing attachments in [S3][3]-API-compatible object stores.
   
   It should be possible to use this with [IBM Cloud Object Storage][5], as it supports the S3 API.
   
   ## Description
   
   `S3AttachmentStore` uses the [Alpakka S3 connector][1] to make calls to S3. The object key is constructed in the following form (a small code sketch follows the list below)
   
       <namespace>/<docId>/<name>
   
   Where
   
   * `namespace` - the entity type in lowercase. For now only `whiskentity` is used with attachments; the possible values are
       - whiskentity
       - whiskauth
       - whiskactivation
   * `docId` - the document entity path
   * `name` - the attachment name, which is a UUID
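   
   As a minimal sketch (the helper name and example values below are hypothetical, not the PR's actual code), such a key can be assembled like this:
   
   ```scala
   // Illustration of the <namespace>/<docId>/<name> key scheme described above.
   def objectKey(namespace: String, docId: String, name: String): String =
     s"$namespace/$docId/$name"
   
   // For example, an attachment of a whisk entity document:
   // objectKey("whiskentity", "guest/myAction", "c4f9a1d2-...")
   //   == "whiskentity/guest/myAction/c4f9a1d2-..."
   ```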
   
   
   1. `readAttachment` - reads are done in a streaming manner using [GET Object][6].
   2. `attach` - the stream is uploaded via the [multipart upload][7] API: the upload is done in chunks of 5 MB which are finally stitched together (see the sketch after this list). This avoids buffering the whole stream on disk, so any upload makes 2+ remote calls to complete.
   3. `deleteAttachments` - done via the [List Objects v2][8] API to list the objects under the prefix `<namespace>/<docId>`, which are then removed via the [Delete Object][9] API.
   4. `deleteAttachment` - done via a single [Delete Object][9] API call.
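   
   As a rough sketch only (not the PR's actual code; exact method signatures may differ slightly between Alpakka versions), these operations map onto Alpakka's `S3Client` roughly as follows:
   
   ```scala
   import akka.actor.ActorSystem
   import akka.stream.ActorMaterializer
   import akka.stream.alpakka.s3.scaladsl.S3Client
   import akka.stream.scaladsl.{Sink, Source}
   import akka.util.ByteString
   
   implicit val system: ActorSystem = ActorSystem()
   implicit val materializer: ActorMaterializer = ActorMaterializer()
   
   val s3 = S3Client() // region and credentials are picked up from the Alpakka config
   val bucket = "whisk-attachments"                 // hypothetical bucket name
   val key = "whiskentity/guest/myAction/c4f9a1d2"  // <namespace>/<docId>/<name>
   
   // readAttachment: obtain the object contents as a stream (GET Object)
   val body = s3.download(bucket, key)
   
   // attach: push the payload through the multipart upload sink (uploaded in 5 MB parts)
   val uploaded = Source.single(ByteString("attachment bytes"))
     .runWith(s3.multipartUpload(bucket, key))
   
   // deleteAttachments: list everything under the <namespace>/<docId> prefix and delete it
   val deletedAll = s3.listBucket(bucket, Some("whiskentity/guest/myAction"))
     .mapAsync(parallelism = 1)(obj => s3.deleteObject(bucket, obj.key))
     .runWith(Sink.ignore)
   
   // deleteAttachment: delete a single object
   val deletedOne = s3.deleteObject(bucket, key)
   ```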
   
   ### Usage
   
   To enable use of `S3AttachmentStore`, the following environment variables need to be set:
   
   1. `CONFIG_whisk_spi_AttachmentStoreProvider` = `whisk.core.database.s3.S3AttachmentStoreProvider`
   2. `CONFIG_whisk_db_s3_bucket` - Bucket name
   3. `AWS_ACCESS_KEY_ID` - AWS Access Key ID
    4. `AWS_SECRET_ACCESS_KEY` - AWS Secret Access Key
    5. `AWS_REGION` - AWS region of the bucket
   
   See [Alpakka S3][1] for more configuration details. In the OpenWhisk setup the configuration lives under the `whisk.db.s3.alpakka` namespace (see `s3-reference.conf` for details).
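   
   As an example, an environment-based setup could look like the following (bucket name and credential values are placeholders):
   
   ```
   export CONFIG_whisk_spi_AttachmentStoreProvider=whisk.core.database.s3.S3AttachmentStoreProvider
   export CONFIG_whisk_db_s3_bucket=my-whisk-attachments
   export AWS_ACCESS_KEY_ID=AKIA...
   export AWS_SECRET_ACCESS_KEY=...
   export AWS_REGION=us-east-1
   ```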
   
   ### Testing
   
   By default, testing is done by starting a [Minio S3 proxy][2] via the `S3Minio` trait; a manual equivalent of that setup is sketched at the end of this section. These tests only need Docker to be present.
   
    1. `S3AttachmentStoreMinioTests` - Runs the `AttachmentStoreBehaviors` TCK
   2. `S3MemoryArtifactStoreTests` - Runs the `ArtifactStoreAttachmentBehaviors` with a `MemoryArtifactStore` configured with `S3AttachmentStore`
   
   We could also use [S3Mock][4], which is built on top of the Akka stack, but it may later pose problems if our dependency versions diverge, so Minio is used for now.
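   
   For reference, running Minio by hand for such tests looks roughly like this (the access and secret keys are placeholders; newer Minio releases use `MINIO_ROOT_USER`/`MINIO_ROOT_PASSWORD` instead):
   
   ```
   docker run -d -p 9000:9000 \
     -e MINIO_ACCESS_KEY=TESTKEY \
     -e MINIO_SECRET_KEY=TESTSECRET \
     minio/minio server /data
   ```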
   
   ### Dependencies
   
   S3 support pulls in the following dependencies:
   
   ```
   $ ./gradlew :core:controller:dependencies 
        \--- com.lightbend.akka:akka-stream-alpakka-s3_2.11:0.19
             +--- org.scala-lang:scala-library:2.11.12
             +--- com.typesafe.akka:akka-stream_2.11:2.5.12 (*)
             +--- com.typesafe.akka:akka-http_2.11:10.0.13 -> 10.1.1 (*)
             +--- com.typesafe.akka:akka-http-xml_2.11:10.0.13
             |    +--- org.scala-lang:scala-library:2.11.11 -> 2.11.12
             |    +--- com.typesafe.akka:akka-http_2.11:10.0.13 -> 10.1.1 (*)
             |    +--- com.typesafe.akka:akka-stream_2.11:2.4.20 -> 2.5.12 (*)
             |    \--- org.scala-lang.modules:scala-xml_2.11:1.0.6 (*)
             \--- com.amazonaws:aws-java-sdk-core:1.11.295
                  +--- software.amazon.ion:ion-java:1.0.2
                  \--- joda-time:joda-time:2.8.1
   ```
   
   Adding S3 support increases the size of controller.tar from 88 MB (92 jars) to 91 MB (97 jars). The following are the version changes and new jars being pulled in:
   
   ```diff
   14a15
   > akka-http-xml_2.11-10.0.13.jar
   22a24
   > akka-stream-alpakka-s3_2.11-0.19.jar
   30a33
   > aws-java-sdk-core-1.11.295.jar
   40a44
   > ion-java-1.0.2.jar
   52a57
   > joda-time-2.8.1.jar
   84c89
   < scala-xml_2.11-1.0.5.jar
   ---
   > scala-xml_2.11-1.0.6.jar
   
   ```
   
   Below is the original dependency list, which was trimmed down to the previous one based on the assumption that the Alpakka S3 client uses the Akka HTTP stack, so Commons Logging, Apache HttpClient, and Jackson can be excluded. All tests pass with these dependencies removed.
   
   ```
        \--- com.lightbend.akka:akka-stream-alpakka-s3_2.11:0.19
             +--- org.scala-lang:scala-library:2.11.12
             +--- com.typesafe.akka:akka-stream_2.11:2.5.12 (*)
             +--- com.typesafe.akka:akka-http_2.11:10.0.13 -> 10.1.1 (*)
             +--- com.typesafe.akka:akka-http-xml_2.11:10.0.13
             |    +--- org.scala-lang:scala-library:2.11.11 -> 2.11.12
             |    +--- com.typesafe.akka:akka-http_2.11:10.0.13 -> 10.1.1 (*)
             |    +--- com.typesafe.akka:akka-stream_2.11:2.4.20 -> 2.5.12 (*)
             |    \--- org.scala-lang.modules:scala-xml_2.11:1.0.6 (*)
             \--- com.amazonaws:aws-java-sdk-core:1.11.295
                  +--- commons-logging:commons-logging:1.1.3 -> 1.2
                  +--- org.apache.httpcomponents:httpclient:4.5.5 (*)
                  +--- software.amazon.ion:ion-java:1.0.2
                  +--- com.fasterxml.jackson.core:jackson-databind:2.6.7.1 -> 2.7.7 (*)
                  +--- com.fasterxml.jackson.dataformat:jackson-dataformat-cbor:2.6.7
                  |    \--- com.fasterxml.jackson.core:jackson-core:2.6.7 -> 2.7.7
                  \--- joda-time:joda-time:2.8.1
   
   ```
   
   #### TODO
   
    - [ ] Add test against an actual S3 bucket
    - [ ] Fix attachmentScheme name from `s3s` -> `s3` (see akka/akka-http#2080)
   
   
   ## Related issue and scope
   <!--- Please include a link to a related issue if there is one. -->
   - [ ] I opened an issue to propose and discuss this change (#3450)
   
   ## My changes affect the following components
   <!--- Select below all system components are affected by your change. -->
   <!--- Enter an `x` in all applicable boxes. -->
   - [ ] API
   - [ ] Controller
   - [ ] Message Bus (e.g., Kafka)
   - [ ] Loadbalancer
   - [ ] Invoker
   - [ ] Intrinsic actions (e.g., sequences, conductors)
   - [x] Data stores (e.g., CouchDB)
   - [ ] Tests
   - [ ] Deployment
   - [ ] CLI
   - [ ] General tooling
   - [ ] Documentation
   
   ## Types of changes
   <!--- What types of changes does your code introduce? Use `x` in all the boxes that apply: -->
   - [ ] Bug fix (generally a non-breaking change which closes an issue).
   - [x] Enhancement or new feature (adds new functionality).
   - [ ] Breaking change (a bug fix or enhancement which changes existing behavior).
   
   ## Checklist:
   <!--- Please review the points below which help you make sure you've covered all aspects of the change you're making. -->
   
   - [x] I signed an [Apache CLA](https://github.com/apache/incubator-openwhisk/blob/master/CONTRIBUTING.md).
   - [x] I reviewed the [style guides](https://github.com/apache/incubator-openwhisk/wiki/Contributing:-Git-guidelines#code-readiness) and followed the recommendations (Travis CI will check :).
   - [x] I added tests to cover my changes.
   - [ ] My changes require further changes to the documentation.
   - [ ] I updated the documentation where necessary.
   
   [1]: https://developer.lightbend.com/docs/alpakka/current/s3.html#usage
   [2]: https://github.com/minio/minio
   [3]: https://aws.amazon.com/s3/
   [4]: https://github.com/findify/s3mock
   [5]: https://www.ibm.com/cloud/object-storage
   [6]: https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectGET.html
   [7]: https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
   [8]: https://docs.aws.amazon.com/AmazonS3/latest/API/v2-RESTBucketGET.html
   [9]: https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectDELETE.html
