Posted to dev@openwhisk.apache.org by Chetan Mehrotra <ch...@gmail.com> on 2018/03/25 19:02:19 UTC

Possible enhancements around attachment handling

Last week I had a Slack call with Rodric about the AttachmentStore PR,
and we discussed the following possible enhancements to attachment
handling in OpenWhisk. These enhancements would enable efficient
handling of code content, especially for large actions.

A - Attachment Inlining
------------------------------

Currently the code for an action can be stored in one of the following forms:

1. Plaintext string - Code is stored as is in plain text form in
`code` attribute

2. Base64 encoded zip content - Code is stored as base64 encoded
zipped content in `code` attribute with `binary` set to true

3. CouchDB attachment - Code content is stored as raw bytes via
CouchDB attachment support [1]

In all cases, when the Invoker invokes the action, the code content is
passed as a string as part of the initializer JSON. Likewise, when an
action is created, the code is passed inline as part of the JSON. Going
forward (especially with the switch to an external AttachmentStore) the
code content would always be stored as an attachment (PR #2847). For
such cases it would be better to have a way to store smaller code
inline (as is currently done for non-Java actions) to avoid extra
calls to the AttachmentStore.

This inlining logic would be an internal detail of the AttachmentStore
and would be driven by a configurable maxInlineSize setting,
irrespective of action kind.
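
A minimal sketch of what such a size-based decision might look like, in
Python for brevity. The names `MAX_INLINE_SIZE`, `store_code` and
`load_code`, the 16 KB default and the in-memory `attachments` dict are
all hypothetical stand-ins, not the actual AttachmentStore API:

```python
import base64
import uuid

MAX_INLINE_SIZE = 16 * 1024  # hypothetical default threshold, in bytes


def store_code(code: bytes, attachments: dict,
               max_inline_size: int = MAX_INLINE_SIZE) -> dict:
    """Return the document fields describing where the code lives."""
    if len(code) <= max_inline_size:
        # Small code: keep it in the document itself, no extra store call.
        return {"inline": True, "code": base64.b64encode(code).decode("ascii")}
    # Large code: write it to the (stand-in) attachment store, keep a reference.
    name = str(uuid.uuid4())
    attachments[name] = code
    return {"inline": False, "attachment": name}


def load_code(doc: dict, attachments: dict) -> bytes:
    """Resolve the code regardless of how it was stored."""
    if doc["inline"]:
        return base64.b64decode(doc["code"])
    return attachments[doc["attachment"]]
```

Callers only see `store_code` and `load_code`, so the threshold can
change without affecting them, which is the point of keeping the
decision internal to the store.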

B - Streaming Action Code - Action Creation
----------------------------------------------------------

Currently the action code is provided as an inline string attribute in
the JSON payload POSTed to the
namespaces/<package>/actions/<actionName> endpoint. Because of this,
the whole code content is processed in memory as a byte array. In
contrast, the ArtifactStore abstraction supports streams for creating
and reading attachments.

To support large action code, it would be better to stream the action
code from the client to the ArtifactStore without holding it as a byte
array on the heap. This can be done by supporting multipart upload [2]
on the action endpoint and then just passing the byteSource to the
ArtifactStore.
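
To illustrate the difference from the byte-array approach, here is a
rough Python sketch of a chunked copy in which only one chunk is in
memory at a time. `stream_to_store` and `CHUNK_SIZE` are hypothetical
names, and plain file-like objects stand in for the upload stream and
the store:

```python
import hashlib
import io
from typing import BinaryIO, Tuple

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, in bytes


def stream_to_store(source: BinaryIO, sink: BinaryIO,
                    chunk_size: int = CHUNK_SIZE) -> Tuple[int, str]:
    """Copy source to sink chunk by chunk.

    Returns (bytes copied, SHA-256 hex digest of the content).
    """
    digest = hashlib.sha256()
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        sink.write(chunk)
        digest.update(chunk)
        total += len(chunk)
    return total, digest.hexdigest()
```

The digest is computed on the fly, so an integrity check does not
require a second pass over the content.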

C - Streaming Action Code - Action Execution
-------------------------------------------------------------

This complements option #B. Currently the action code is passed as part
of the JSON in the action initializer call. Going forward, with support
for AttachmentStore, we can make use of constructs like signed URLs [3]
(supported by most stores, such as S3, IBM COS and Azure Blob Storage).
Here, instead of passing the action code in the JSON, the Invoker would
obtain a signed URL and pass it to the container, which would then
stream the URL content directly.
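
For readers unfamiliar with the idea, the following Python sketch shows
the general principle behind a signed URL: an expiry time plus an HMAC
over the path, checked by the store. This is a simplified, hypothetical
scheme for illustration only; it is not the signature algorithm used by
S3 or any other provider, and `sign_url` / `verify_url` are made-up
names:

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse


def sign_url(base_url: str, secret: bytes, expires_at: int) -> str:
    """Append an expiry and an HMAC-SHA256 signature to base_url."""
    path = urlparse(base_url).path
    message = f"{path}\n{expires_at}".encode("utf-8")
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{base_url}?{urlencode({'expires': expires_at, 'sig': signature})}"


def verify_url(url: str, secret: bytes, now: int) -> bool:
    """Check that the signature matches and the URL has not expired."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires_at = int(query["expires"][0])
        signature = query["sig"][0]
    except (KeyError, ValueError):
        return False  # missing or malformed parameters
    message = f"{parsed.path}\n{expires_at}".encode("utf-8")
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(signature, expected) and now <= expires_at
```

The holder of the URL never sees the secret, which is why the Invoker
could hand such a URL to a container without sharing store credentials.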

Kindly share your thoughts and comments on this topic.

Chetan Mehrotra
[1] http://docs.couchdb.org/en/2.0.0/api/document/attachments.html
[2] https://doc.akka.io/docs/akka-http/current/routing-dsl/directives/file-upload-directives/fileUpload.html
[3] https://docs.aws.amazon.com/AmazonS3/latest/dev/ShareObjectPreSignedURL.html

Re: Possible enhancements around attachment handling

Posted by Chetan Mehrotra <ch...@gmail.com>.
> Are there security implications of #C we should consider?  For example, an
> OpenWhisk deployment might have a network policy in place that prevented
> user action containers from directly communicating with the ArtifactStore.

Yes, that would need to be considered. In such a case we would need to
implement it as Markus suggested, where the Invoker and the runtime
coordinate to stream the binary.

Chetan Mehrotra

Re: Possible enhancements around attachment handling

Posted by David P Grove <gr...@us.ibm.com>.


Chetan Mehrotra <ch...@gmail.com> wrote on 03/27/2018 02:50:59
AM:
>
> > That'd put the "burden" of downloading the action code into each
> and every runtime, right? Do you think that is necessary?
>
> #C would need few prototypes to figure out the right approach. My
> current thinking was that each runtime can have at minimum wget like
> app embedded and then Invoker would just pass in the signed url which
> is then passed to wget to do actual streaming
>


Are there security implications of #C we should consider?  For example, an
OpenWhisk deployment might have a network policy in place that prevented
user action containers from directly communicating with the ArtifactStore.

--dave



Re: Possible enhancements around attachment handling

Posted by Chetan Mehrotra <ch...@gmail.com>.
> This should be absolutely doable without a change in the runtimes at all, as they "shouldn't" (to be verified) need a change to support TCP streaming.

I think some of the runtimes currently read the whole request into
memory. For example, the Java runtime [1] constructs a JSON instance
from it. So they would need some logic to stream the content to the
filesystem (possibly via multipart support). For Node.js, some sort of
multipart handling or stream handling would be required.

Chetan Mehrotra
[1] https://github.com/apache/incubator-openwhisk-runtime-java/blob/master/core/javaAction/proxy/src/main/java/openwhisk/java/action/Proxy.java#L80


On Thu, Mar 29, 2018 at 4:40 PM, Markus Thoemmes
<ma...@de.ibm.com> wrote:
> Heya,
>
>>> So for #C is there a way to proxy the stream without changing the runtime http api where binary is passed embedded in the json object on /init?
>
>> They would need to be modified. A less intrusive approach would have
>> been to pass signed url and use client like wget to fetch the binary.
>> But as David mentioned some setups may lock down access to external
>> services. For such cases we would need to implement the support in
>> client runtimes as part of init protocol
>
>> Chetan Mehrotra
>
>
> I believe they don't need to be changed at all. TCP as the transport protocol is streamed by default (stuff is broken down into smallish packets). Say, for example, the payload we need to pass is: {"main": "foo", "code": "Some veeeeery long string here"}. Let's also assume our attachment is available as a Source[ByteString]. What we can do is generate the first bit up to the part we *need* to stream, in this case: {"main": "foo", "code": ". Now we consume the chunks of the attachment Source and put them on the downstream. At the end, we put an element like "} on the stream to close the JSON object.
>
> Pseudocode: Source.single(ByteString("{\"main\": \"foo\", \"code\": \"")).concat(attachmentSource).concat(Source.single(ByteString("\"}"))).to(httpConnectionToContainer).run()
>
> This should be absolutely doable without a change in the runtimes at all, as they "shouldn't" (to be verified) need a change to support TCP streaming.
>
> Does that make sense?
>
> Cheers,
> Markus
>

Re: Possible enhancements around attachment handling

Posted by Markus Thoemmes <ma...@de.ibm.com>.
Heya,

>> So for #C is there a way to proxy the stream without changing the runtime http api where binary is passed embedded in the json object on /init?

> They would need to be modified. A less intrusive approach would have
> been to pass signed url and use client like wget to fetch the binary.
> But as David mentioned some setups may lock down access to external
> services. For such cases we would need to implement the support in
> client runtimes as part of init protocol

> Chetan Mehrotra


I believe they don't need to be changed at all. TCP as the transport protocol is streamed by default (stuff is broken down into smallish packets). Say, for example, the payload we need to pass is: {"main": "foo", "code": "Some veeeeery long string here"}. Let's also assume our attachment is available as a Source[ByteString]. What we can do is generate the first bit up to the part we *need* to stream, in this case: {"main": "foo", "code": ". Now we consume the chunks of the attachment Source and put them on the downstream. At the end, we put an element like "} on the stream to close the JSON object.

Pseudocode: Source.single(ByteString("{\"main\": \"foo\", \"code\": \"")).concat(attachmentSource).concat(Source.single(ByteString("\"}"))).to(httpConnectionToContainer).run()

This should be absolutely doable without a change in the runtimes at all, as they "shouldn't" (to be verified) need a change to support TCP streaming.
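
As a rough, runnable illustration of the prefix / chunks / suffix idea
(in Python rather than Akka Streams, with the hypothetical names
`init_payload_chunks` and `code_chunks`): one detail worth noting is
that the chunks cannot be written as-is, since quotes, backslashes and
control characters inside the code must be JSON-escaped for the result
to stay valid JSON. The sketch assumes the code arrives as text chunks;
binary content would need something like base64 instead.

```python
import json
from typing import Iterable, Iterator


def init_payload_chunks(main: str, code_chunks: Iterable[str]) -> Iterator[bytes]:
    """Yield the JSON body piece by piece, never holding all the code at once."""
    # Everything up to and including the opening quote of the "code" value.
    yield ('{"main": ' + json.dumps(main) + ', "code": "').encode("utf-8")
    for chunk in code_chunks:
        # json.dumps escapes the chunk and wraps it in quotes; strip the
        # quotes so consecutive chunks form one continuous JSON string.
        yield json.dumps(chunk)[1:-1].encode("utf-8")
    # Close the string and the JSON object.
    yield b'"}'
```

Each yielded element could be written to the connection as soon as it
is produced, so memory use is bounded by the chunk size rather than by
the size of the action code.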

Does that make sense?

Cheers,
Markus


Re: Possible enhancements around attachment handling

Posted by Chetan Mehrotra <ch...@gmail.com>.
> So for #C is there a way to proxy the stream without changing the runtime http api where binary is passed embedded in the json object on /init?

They would need to be modified. A less intrusive approach would have
been to pass a signed URL and use a client like wget to fetch the
binary. But as David mentioned, some setups may lock down access to
external services. For such cases we would need to implement the
support in the client runtimes as part of the init protocol.

Chetan Mehrotra


On Wed, Mar 28, 2018 at 11:57 PM, Tyson Norris
<tn...@adobe.com.invalid> wrote:
> So for #C is there a way to proxy the stream without changing the runtime http api where binary is passed embedded in the json object on /init?
> It’s not obvious to me how this can be done without changing the runtimes (and the invoker), since currently the whole entity is loaded to memory to get the JSON sent to /init?

Re: Possible enhancements around attachment handling

Posted by Tyson Norris <tn...@adobe.com.INVALID>.
So for #C is there a way to proxy the stream without changing the runtime http api where binary is passed embedded in the json object on /init?
It’s not obvious to me how this can be done without changing the runtimes (and the invoker), since currently the whole entity is loaded to memory to get the JSON sent to /init?


> On Mar 26, 2018, at 11:50 PM, Chetan Mehrotra <ch...@gmail.com> wrote:
> 
>> That'd put the "burden" of downloading the action code into each and every runtime, right? Do you think that is necessary?
> 
> #C would need few prototypes to figure out the right approach. My
> current thinking was that each runtime can have at minimum wget like
> app embedded and then Invoker would just pass in the signed url which
> is then passed to wget to do actual streaming
> 
> If each runtime webserver can efficiently stream the pushed stream
> without in memory buffering then that would indeed be much better
> approach to take.
> 
> Chetan Mehrotra


Re: Possible enhancements around attachment handling

Posted by Chetan Mehrotra <ch...@gmail.com>.
> That'd put the "burden" of downloading the action code into each and every runtime, right? Do you think that is necessary?

#C would need a few prototypes to figure out the right approach. My
current thinking was that each runtime could embed, at minimum, a
wget-like app; the Invoker would then just pass in the signed URL,
which is handed to wget to do the actual streaming.

If each runtime's web server can efficiently consume the pushed stream
without in-memory buffering, then that would indeed be a much better
approach to take.
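
A minimal sketch of the wget-like option, assuming the runtime has
Python available and can reach the store. `fetch_to_tempfile` is a
hypothetical helper, not part of any runtime today:

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def fetch_to_tempfile(url: str, chunk_size: int = 64 * 1024) -> Path:
    """Stream the content behind url into a temporary file; return its path."""
    with urllib.request.urlopen(url) as response:
        # delete=False keeps the file around after it is closed, so the
        # runtime can unpack or load the code from disk afterwards.
        with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as out:
            # copyfileobj reads and writes chunk_size bytes at a time.
            shutil.copyfileobj(response, out, chunk_size)
            return Path(out.name)
```

The caller is responsible for deleting the file once the action has
been initialized.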

Chetan Mehrotra


On Mon, Mar 26, 2018 at 4:42 PM, Markus Thoemmes
<ma...@de.ibm.com> wrote:
> Hi Chetan,
>
> thanks a lot for capturing this. It's about time we get streaming for the action code!
>
> One comment to C: That'd put the "burden" of downloading the action code into each and every runtime, right? Do you think that is necessary? We could keep the notion of the ArtifactStore as of today and just proxy the TCP stream through the invoker (which has credentials and everything) into the runtime itself.
>
> Benefits from my PoV:
> - No needed change to the runtimes at all.
> - Credential handling doesn't need to rely on the ObjectStore provider's featureset (as in: signed URLs)
> - The invoker can potentially "deduplicate"/cache multiple calls. Under a burst for example it'd be neat if we only needed to download the code once per invoker
>
> What do you think?
>
> Other than that: Go for it!
>
> Cheers,
> Markus
>
>

Re: Possible enhancements around attachment handling

Posted by Markus Thoemmes <ma...@de.ibm.com>.
Hi Chetan,

thanks a lot for capturing this. It's about time we get streaming for the action code!

One comment to C: That'd put the "burden" of downloading the action code into each and every runtime, right? Do you think that is necessary? We could keep the notion of the ArtifactStore as of today and just proxy the TCP stream through the invoker (which has credentials and everything) into the runtime itself.

Benefits from my PoV:
- No change needed to the runtimes at all.
- Credential handling doesn't need to rely on the ObjectStore provider's featureset (as in: signed URLs)
- The invoker can potentially "deduplicate"/cache multiple calls. Under a burst for example it'd be neat if we only needed to download the code once per invoker
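
The deduplication point could be sketched roughly as follows, in Python
for brevity. `CodeCache` and the `fetch` callback are hypothetical
names; a real invoker would also need eviction and error handling,
which are left out here:

```python
import threading
from typing import Callable, Dict


class CodeCache:
    """Per-invoker cache that downloads each action's code at most once."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._lock = threading.Lock()
        self._key_locks: Dict[str, threading.Lock] = {}
        self._values: Dict[str, bytes] = {}

    def get(self, key: str) -> bytes:
        # Look up (or create) the lock for this key under the global lock.
        with self._lock:
            key_lock = self._key_locks.setdefault(key, threading.Lock())
        # Only one caller per key performs the download; the rest wait
        # here and then read the cached value.
        with key_lock:
            if key not in self._values:
                self._values[key] = self._fetch(key)
            return self._values[key]
```

Under a burst of invocations of the same action, concurrent `get` calls
block on the per-key lock, so the store sees a single download.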

What do you think?

Other than that: Go for it!

Cheers,
Markus