Posted to dev@openwhisk.apache.org by James W Dubee <jw...@us.ibm.com> on 2018/09/21 17:09:25 UTC

Proposal to Remove Artifact Store Polling for Blocking Invocations




Hello OpenWhisk developers,

When a blocking action is invoked, the controller waits for that action's
response from the invoker and also polls the artifact store for the same
response. Usually blocking invocation responses are obtained from the
invoker. However, there are instances when the invocation response is
retrieved from the artifact store instead. From observation, the most
likely scenario for a blocking activation to be retrieved from the artifact
store is when an action generates a response that exceeds the maximum
allowed Kafka message size for the "completed" topic. However, this
situation should not occur, since the invoker is meant to truncate large
action responses to the maximum Kafka message size allowed for the
corresponding topic.
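A minimal sketch of the behavior described above, written in Python with asyncio. It is not OpenWhisk's controller code, which is written in Scala. The functions wait_for_completed_message, poll_artifact_store and blocking_invoke are hypothetical, and the timings are scaled down from seconds so the example runs quickly. The controller races two sources for the activation record and takes whichever finishes first. When the "completed" message never arrives, only the poll can win.

```python
import asyncio

# Hypothetical stand-ins, not OpenWhisk APIs. Time is scaled down: 0.15 s here
# plays the role of the 15 s polling interval.


async def wait_for_completed_message(delay: float, deliverable: bool) -> str:
    """Model the invoker's "completed" Kafka message for this activation."""
    if not deliverable:
        # The message exceeded the topic limit, so nothing ever arrives.
        await asyncio.Event().wait()
    await asyncio.sleep(delay)
    return "from-invoker"


async def poll_artifact_store(interval: float, ready_after: float) -> str:
    """Model polling: check the store every `interval` seconds."""
    loop = asyncio.get_running_loop()
    start = loop.time()
    while True:
        await asyncio.sleep(interval)
        # Assume the record is written as soon as the action completes.
        if loop.time() - start >= ready_after:
            return "from-artifact-store"


async def blocking_invoke(delay: float, deliverable: bool, interval: float) -> str:
    """Return whichever source yields the activation record first."""
    tasks = [
        asyncio.create_task(wait_for_completed_message(delay, deliverable)),
        asyncio.create_task(poll_artifact_store(interval, ready_after=delay)),
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # stop the losing source
    return done.pop().result()


print(asyncio.run(blocking_invoke(0.01, deliverable=True, interval=0.15)))
print(asyncio.run(blocking_invoke(0.01, deliverable=False, interval=0.15)))
```

The first call prints from-invoker because the Kafka message arrives before the first poll. The second prints from-artifact-store because the oversized message is never delivered, so the result only surfaces at the first polling tick.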

Currently, artifact store polling for activation records is masking a bug
involving large action responses. OpenWhisk provides a configuration
value, whisk.activation.payload.max, which one would assume adjusts the
maximum activation record size. However, this value only adjusts the Kafka
topic used to schedule actions for invocation. The Kafka topic used to
communicate the completion of an action instead always uses the default
value for KAFKA_MESSAGE_MAX_BYTES, which is ~1MB. Additionally, the invoker
truncates action responses to the whisk.activation.payload.max value even
though whisk.activation.payload.max is not being applied to the "completed"
Kafka topic. Moreover, this truncation does not account for data added to
the action response by the Kafka producer during serialization, so an
action response may fail to be sent to the "completed" topic even if the
response itself fits within the topic's size limit. As a result, any action
response whose size, plus the serialization overhead added by the Kafka
producer, exceeds ~1MB will be retrieved via artifact store polling.
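To make the size mismatch concrete, here is a small Python sketch. The JSON envelope, the serialize helper and both limits are assumptions for illustration (1 MiB stands in for "~1MB"), not OpenWhisk's actual message format or configuration. It shows that truncating only the result to the limit still yields a serialized message over the topic limit, and one possible way to reserve room for the envelope while truncating against the smaller of the two limits.

```python
import json

# Assumed limits for illustration only. 1 MiB stands in for "~1MB".
TOPIC_MAX_BYTES = 1_048_576    # "completed" topic limit (default, never adjusted)
PAYLOAD_MAX_BYTES = 1_048_576  # stand-in for whisk.activation.payload.max


def serialize(activation_id: str, result: str) -> bytes:
    """Wrap a result in a hypothetical JSON envelope, as a producer would."""
    message = {"activationId": activation_id, "response": {"result": result}}
    return json.dumps(message).encode("utf-8")


def truncate_naive(result: str, limit: int) -> str:
    """Truncate only the result, ignoring the envelope added afterwards."""
    return result[:limit]


def truncate_with_overhead(activation_id: str, result: str, limit: int) -> str:
    """Reserve room for the envelope so the serialized message fits in `limit`.

    Assumes an ASCII result with no characters JSON must escape, so that one
    character encodes to exactly one byte.
    """
    overhead = len(serialize(activation_id, ""))
    return result[: max(0, limit - overhead)]


big_result = "x" * 2_000_000
aid = "abc123"  # hypothetical activation id

naive = serialize(aid, truncate_naive(big_result, PAYLOAD_MAX_BYTES))

# Truncate against whichever limit is smaller, since the payload setting
# is not applied to the "completed" topic.
limit = min(PAYLOAD_MAX_BYTES, TOPIC_MAX_BYTES)
fixed = serialize(aid, truncate_with_overhead(aid, big_result, limit))

print(len(naive) > TOPIC_MAX_BYTES)   # True: too large for the topic
print(len(fixed) <= TOPIC_MAX_BYTES)  # True: fits
```

Raising PAYLOAD_MAX_BYTES above TOPIC_MAX_BYTES in this sketch makes the naive path fail by an even wider margin, which mirrors the configuration mismatch described above.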

Performance degradation appears to occur when an activation record is
retrieved via artifact store polling. Artifact store polling occurs every
15 seconds for a blocking invocation. Since the response of an action that
generates a payload greater than ~1MB cannot be sent through the
"completed" Kafka topic, that action's activation record must be retrieved
via polling. Even though such an action may complete in milliseconds, the
end user will not get back the activation response for at least 15 seconds
due to the polling logic in the controller.
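The latency penalty can be approximated with a simple model. This Python sketch assumes, hypothetically, that the first poll fires 15 seconds after the invocation starts and every 15 seconds thereafter, and it ignores database write latency. observed_latency is an illustrative helper, not part of OpenWhisk.

```python
import math

POLL_INTERVAL_S = 15  # polling period described in the proposal


def observed_latency(completion_s: float, delivered_via_kafka: bool) -> float:
    """Seconds until the user sees the result, under a simplified model."""
    if delivered_via_kafka:
        # The controller is notified as soon as the invoker finishes.
        return completion_s
    # Otherwise the record is only seen at the first poll at or after
    # completion; polls are assumed to fire at 15 s, 30 s, 45 s, ...
    polls = max(1, math.ceil(completion_s / POLL_INTERVAL_S))
    return polls * POLL_INTERVAL_S


print(observed_latency(0.05, delivered_via_kafka=True))   # 0.05
print(observed_latency(0.05, delivered_via_kafka=False))  # 15
print(observed_latency(16, delivered_via_kafka=False))    # 30
```

Under this model, an action that finishes in 50 ms but whose response cannot go through the "completed" topic is seen no earlier than 15 seconds later, roughly a 300x slowdown.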

I have submitted a pull request to remove the polling mechanism and also
fix the large action response bug. The pull request can be found here:
https://github.com/apache/incubator-openwhisk/pull/4033.

Regards,
James Dubee

Re: Proposal to Remove Artifact Store Polling for Blocking Invocations

Posted by Carlos Santana <cs...@gmail.com>.
Thanks James for sharing this with the dev list. I think you did a great job
uncovering the mystery behind polling and large payloads.

It was a hard nut to crack. +1

-- Carlos

On Mon, Sep 24, 2018 at 3:03 PM James W Dubee <jw...@us.ibm.com> wrote:

> Hey Rodric,
>
> Sure, I split up the two changes into different PRs. The defect fix is now
> located here: https://github.com/apache/incubator-openwhisk/pull/4040.
> I'll use the PR from my original email for removal of DB polling.
>
> Regards,
> James Dubee

Re: Proposal to Remove Artifact Store Polling for Blocking Invocations

Posted by James W Dubee <jw...@us.ibm.com>.

Hey Rodric,

Sure, I split up the two changes into different PRs. The defect fix is now
located here: https://github.com/apache/incubator-openwhisk/pull/4040. I'll
use the PR from my original email for removal of DB polling.

Regards,
James Dubee




From: Rodric Rabbah <ro...@gmail.com>
To: dev@openwhisk.apache.org
Date: 09/21/2018 02:34 PM
Subject: Re: Proposal to Remove Artifact Store Polling for Blocking Invocations



Thanks James for the explanation and patches. It sounds like there should
be two separate PRs, one to address the bug and the other to remove
polling. What do you think?

-r





Re: Proposal to Remove Artifact Store Polling for Blocking Invocations

Posted by Rodric Rabbah <ro...@gmail.com>.
Thanks James for the explanation and patches. It sounds like there should be two separate PRs, one to address the bug and the other to remove polling. What do you think?

-r
