You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@openwhisk.apache.org by GitBox <gi...@apache.org> on 2022/07/04 10:07:06 UTC

[GitHub] [openwhisk] style95 opened a new pull request, #5266: [Scheduler Enhancement] Increase the retention timeout for the blackbox action.

style95 opened a new pull request, #5266:
URL: https://github.com/apache/openwhisk/pull/5266

   ## Description
   For the blackbox actions, it may take more than 20 seconds which is a default in-progress timeout, to pull an image and it makes the activation timed out.
   A memory queue no longer drop black-box activations in such a case with this change.
   
   ## Related issue and scope
   <!--- Please include a link to a related issue if there is one. -->
   - [ ] I opened an issue to propose and discuss this change (#????)
   
   ## My changes affect the following components
   <!--- Select below all system components are affected by your change. -->
   <!--- Enter an `x` in all applicable boxes. -->
   - [ ] API
   - [ ] Controller
   - [ ] Message Bus (e.g., Kafka)
   - [ ] Loadbalancer
   - [ ] Scheduler
   - [ ] Invoker
   - [ ] Intrinsic actions (e.g., sequences, conductors)
   - [ ] Data stores (e.g., CouchDB)
   - [ ] Tests
   - [ ] Deployment
   - [ ] CLI
   - [ ] General tooling
   - [ ] Documentation
   
   ## Types of changes
   <!--- What types of changes does your code introduce? Use `x` in all the boxes that apply: -->
   - [ ] Bug fix (generally a non-breaking change which closes an issue).
   - [ ] Enhancement or new feature (adds new functionality).
   - [ ] Breaking change (a bug fix or enhancement which changes existing behavior).
   
   ## Checklist:
   <!--- Please review the points below which help you make sure you've covered all aspects of the change you're making. -->
   
   - [ ] I signed an [Apache CLA](https://github.com/apache/openwhisk/blob/master/CONTRIBUTING.md).
   - [ ] I reviewed the [style guides](https://github.com/apache/openwhisk/blob/master/CONTRIBUTING.md#coding-standards) and followed the recommendations (Travis CI will check :).
   - [ ] I added tests to cover my changes.
   - [ ] My changes require further changes to the documentation.
   - [ ] I updated the documentation where necessary.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@openwhisk.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [openwhisk] style95 commented on a diff in pull request #5266: [Scheduler Enhancement] Increase the retention timeout for the blackbox action.

Posted by GitBox <gi...@apache.org>.
style95 commented on code in PR #5266:
URL: https://github.com/apache/openwhisk/pull/5266#discussion_r916405916


##########
core/scheduler/src/main/scala/org/apache/openwhisk/core/scheduler/queue/MemoryQueue.scala:
##########
@@ -323,33 +332,50 @@ class MemoryQueue(private val etcdClient: EtcdClient,
       goto(Running) using RunningData(schedulerActor, droppingActor)
 
     // log the failed information
-    case Event(FailedCreationJob(creationId, _, _, _, _, message), data: FlushingData) =>
+    case Event(FailedCreationJob(creationId, _, _, _, error, message), data: FlushingData) =>
       creationIds -= creationId.asString
       logging.info(
         this,
         s"[$invocationNamespace:$action:$stateName][$creationId] Failed to create a container due to $message")
 
       // keep updating the reason
-      stay using data.copy(reason = message)
+      stay using data.copy(error = error, reason = message)
 
     // since there is no container, activations cannot be handled.
     case Event(msg: ActivationMessage, data: FlushingData) =>
-      completeErrorActivation(msg, data.reason, ContainerCreationError.whiskErrors.contains(data.error))
+      logging.info(this, s"[$invocationNamespace:$action:$stateName] got a new activation message ${msg.activationId}")(
+        msg.transid)
+      val whiskError = isWhiskError(data.error)
+      if (whiskError)
+        queue = queue.enqueue(TimeSeriesActivationEntry(Instant.now, msg))
+      else
+        completeErrorActivation(msg, data.reason, whiskError)
       stay() using data.copy(activeDuringFlush = true)
 
     // Since SchedulingDecisionMaker keep sending a message to create a container, this state is not automatically timed out.
     // Instead, StateTimeout message will be sent by a timer.
-    case Event(StateTimeout, data: FlushingData) =>
-      completeAllActivations(data.reason, ContainerCreationError.whiskErrors.contains(data.error))
-      if (data.activeDuringFlush)
+    case Event(StateTimeout | DropOld, data: FlushingData) =>
+      logging.info(this, s"[$invocationNamespace:$action:$stateName] Received StateTimeout, drop stale messages.")
+      queue =
+        MemoryQueue.dropOld(queue, Duration.ofMillis(actionRetentionTimeout), data.reason, completeErrorActivation)
+      if (data.activeDuringFlush || queue.nonEmpty)
         stay using data.copy(activeDuringFlush = false)
       else
         cleanUpActorsAndGotoRemoved(data)
 
-    case Event(GracefulShutdown, data: FlushingData) =>
-      completeAllActivations(data.reason, ContainerCreationError.whiskErrors.contains(data.error))
+      completeAllActivations(data.reason, isWhiskError(data.error))

Review Comment:
   Nice catch. I added the case back.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@openwhisk.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [openwhisk] ningyougang commented on a diff in pull request #5266: [Scheduler Enhancement] Increase the retention timeout for the blackbox action.

Posted by GitBox <gi...@apache.org>.
ningyougang commented on code in PR #5266:
URL: https://github.com/apache/openwhisk/pull/5266#discussion_r913337342


##########
core/scheduler/src/main/scala/org/apache/openwhisk/core/scheduler/queue/MemoryQueue.scala:
##########
@@ -323,33 +332,50 @@ class MemoryQueue(private val etcdClient: EtcdClient,
       goto(Running) using RunningData(schedulerActor, droppingActor)
 
     // log the failed information
-    case Event(FailedCreationJob(creationId, _, _, _, _, message), data: FlushingData) =>
+    case Event(FailedCreationJob(creationId, _, _, _, error, message), data: FlushingData) =>
       creationIds -= creationId.asString
       logging.info(
         this,
         s"[$invocationNamespace:$action:$stateName][$creationId] Failed to create a container due to $message")
 
       // keep updating the reason
-      stay using data.copy(reason = message)
+      stay using data.copy(error = error, reason = message)
 
     // since there is no container, activations cannot be handled.
     case Event(msg: ActivationMessage, data: FlushingData) =>
-      completeErrorActivation(msg, data.reason, ContainerCreationError.whiskErrors.contains(data.error))
+      logging.info(this, s"[$invocationNamespace:$action:$stateName] got a new activation message ${msg.activationId}")(
+        msg.transid)
+      val whiskError = isWhiskError(data.error)
+      if (whiskError)
+        queue = queue.enqueue(TimeSeriesActivationEntry(Instant.now, msg))
+      else
+        completeErrorActivation(msg, data.reason, whiskError)
       stay() using data.copy(activeDuringFlush = true)
 
     // Since SchedulingDecisionMaker keep sending a message to create a container, this state is not automatically timed out.
     // Instead, StateTimeout message will be sent by a timer.
-    case Event(StateTimeout, data: FlushingData) =>
-      completeAllActivations(data.reason, ContainerCreationError.whiskErrors.contains(data.error))
-      if (data.activeDuringFlush)
+    case Event(StateTimeout | DropOld, data: FlushingData) =>
+      logging.info(this, s"[$invocationNamespace:$action:$stateName] Received StateTimeout, drop stale messages.")
+      queue =
+        MemoryQueue.dropOld(queue, Duration.ofMillis(actionRetentionTimeout), data.reason, completeErrorActivation)
+      if (data.activeDuringFlush || queue.nonEmpty)
         stay using data.copy(activeDuringFlush = false)
       else
         cleanUpActorsAndGotoRemoved(data)
 
-    case Event(GracefulShutdown, data: FlushingData) =>
-      completeAllActivations(data.reason, ContainerCreationError.whiskErrors.contains(data.error))
+      completeAllActivations(data.reason, isWhiskError(data.error))

Review Comment:
   Seems our downstream didn't have this statement
   ```
   completeAllActivations(data.reason, isWhiskError(data.error))
   ```
   So, just add `completeAllActivations`, is it  for safe?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@openwhisk.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [openwhisk] bdoyle0182 commented on pull request #5266: [Scheduler Enhancement] Increase the retention timeout for the blackbox action.

Posted by GitBox <gi...@apache.org>.
bdoyle0182 commented on PR #5266:
URL: https://github.com/apache/openwhisk/pull/5266#issuecomment-1175449337

   LGTM


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@openwhisk.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [openwhisk] codecov-commenter commented on pull request #5266: [Scheduler Enhancement] Increase the retention timeout for the blackbox action.

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #5266:
URL: https://github.com/apache/openwhisk/pull/5266#issuecomment-1174526257

   # [Codecov](https://codecov.io/gh/apache/openwhisk/pull/5266?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#5266](https://codecov.io/gh/apache/openwhisk/pull/5266?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (3debd1e) into [master](https://codecov.io/gh/apache/openwhisk/commit/c5970a657a3070e9964cb14715b9df31819d3b75?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c5970a6) will **decrease** coverage by `75.68%`.
   > The diff coverage is `1.85%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #5266       +/-   ##
   ==========================================
   - Coverage   80.27%   4.59%   -75.69%     
   ==========================================
     Files         238     238               
     Lines       14066   14098       +32     
     Branches      574     570        -4     
   ==========================================
   - Hits        11292     648    -10644     
   - Misses       2774   13450    +10676     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/openwhisk/pull/5266?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...in/scala/org/apache/openwhisk/common/Logging.scala](https://codecov.io/gh/apache/openwhisk/pull/5266/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y29tbW9uL3NjYWxhL3NyYy9tYWluL3NjYWxhL29yZy9hcGFjaGUvb3BlbndoaXNrL2NvbW1vbi9Mb2dnaW5nLnNjYWxh) | `41.85% <0.00%> (-44.94%)` | :arrow_down: |
   | [.../core/scheduler/container/CreationJobManager.scala](https://codecov.io/gh/apache/openwhisk/pull/5266/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y29yZS9zY2hlZHVsZXIvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9vcGVud2hpc2svY29yZS9zY2hlZHVsZXIvY29udGFpbmVyL0NyZWF0aW9uSm9iTWFuYWdlci5zY2FsYQ==) | `0.00% <0.00%> (-83.08%)` | :arrow_down: |
   | [...e/openwhisk/core/scheduler/queue/MemoryQueue.scala](https://codecov.io/gh/apache/openwhisk/pull/5266/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y29yZS9zY2hlZHVsZXIvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9vcGVud2hpc2svY29yZS9zY2hlZHVsZXIvcXVldWUvTWVtb3J5UXVldWUuc2NhbGE=) | `0.00% <0.00%> (-84.16%)` | :arrow_down: |
   | [.../scala/org/apache/openwhisk/core/WhiskConfig.scala](https://codecov.io/gh/apache/openwhisk/pull/5266/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y29tbW9uL3NjYWxhL3NyYy9tYWluL3NjYWxhL29yZy9hcGFjaGUvb3BlbndoaXNrL2NvcmUvV2hpc2tDb25maWcuc2NhbGE=) | `68.20% <100.00%> (-27.73%)` | :arrow_down: |
   | [.../main/scala/org/apache/openwhisk/core/WarmUp.scala](https://codecov.io/gh/apache/openwhisk/pull/5266/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y29tbW9uL3NjYWxhL3NyYy9tYWluL3NjYWxhL29yZy9hcGFjaGUvb3BlbndoaXNrL2NvcmUvV2FybVVwLnNjYWxh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ain/scala/org/apache/openwhisk/spi/SpiLoader.scala](https://codecov.io/gh/apache/openwhisk/pull/5266/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y29tbW9uL3NjYWxhL3NyYy9tYWluL3NjYWxhL29yZy9hcGFjaGUvb3BlbndoaXNrL3NwaS9TcGlMb2FkZXIuc2NhbGE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...n/scala/org/apache/openwhisk/utils/JsHelpers.scala](https://codecov.io/gh/apache/openwhisk/pull/5266/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y29tbW9uL3NjYWxhL3NyYy9tYWluL3NjYWxhL29yZy9hcGFjaGUvb3BlbndoaXNrL3V0aWxzL0pzSGVscGVycy5zY2FsYQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...scala/org/apache/openwhisk/common/Prometheus.scala](https://codecov.io/gh/apache/openwhisk/pull/5266/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y29tbW9uL3NjYWxhL3NyYy9tYWluL3NjYWxhL29yZy9hcGFjaGUvb3BlbndoaXNrL2NvbW1vbi9Qcm9tZXRoZXVzLnNjYWxh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...scala/org/apache/openwhisk/core/FeatureFlags.scala](https://codecov.io/gh/apache/openwhisk/pull/5266/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y29tbW9uL3NjYWxhL3NyYy9tYWluL3NjYWxhL29yZy9hcGFjaGUvb3BlbndoaXNrL2NvcmUvRmVhdHVyZUZsYWdzLnNjYWxh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ala/org/apache/openwhisk/common/ConfigMXBean.scala](https://codecov.io/gh/apache/openwhisk/pull/5266/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y29tbW9uL3NjYWxhL3NyYy9tYWluL3NjYWxhL29yZy9hcGFjaGUvb3BlbndoaXNrL2NvbW1vbi9Db25maWdNWEJlYW4uc2NhbGE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [206 more](https://codecov.io/gh/apache/openwhisk/pull/5266/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/openwhisk/pull/5266?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/openwhisk/pull/5266?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [c5970a6...3debd1e](https://codecov.io/gh/apache/openwhisk/pull/5266?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@openwhisk.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [openwhisk] style95 merged pull request #5266: [Scheduler Enhancement] Increase the retention timeout for the blackbox action.

Posted by GitBox <gi...@apache.org>.
style95 merged PR #5266:
URL: https://github.com/apache/openwhisk/pull/5266


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@openwhisk.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [openwhisk] ningyougang commented on a diff in pull request #5266: [Scheduler Enhancement] Increase the retention timeout for the blackbox action.

Posted by GitBox <gi...@apache.org>.
ningyougang commented on code in PR #5266:
URL: https://github.com/apache/openwhisk/pull/5266#discussion_r913337342


##########
core/scheduler/src/main/scala/org/apache/openwhisk/core/scheduler/queue/MemoryQueue.scala:
##########
@@ -323,33 +332,50 @@ class MemoryQueue(private val etcdClient: EtcdClient,
       goto(Running) using RunningData(schedulerActor, droppingActor)
 
     // log the failed information
-    case Event(FailedCreationJob(creationId, _, _, _, _, message), data: FlushingData) =>
+    case Event(FailedCreationJob(creationId, _, _, _, error, message), data: FlushingData) =>
       creationIds -= creationId.asString
       logging.info(
         this,
         s"[$invocationNamespace:$action:$stateName][$creationId] Failed to create a container due to $message")
 
       // keep updating the reason
-      stay using data.copy(reason = message)
+      stay using data.copy(error = error, reason = message)
 
     // since there is no container, activations cannot be handled.
     case Event(msg: ActivationMessage, data: FlushingData) =>
-      completeErrorActivation(msg, data.reason, ContainerCreationError.whiskErrors.contains(data.error))
+      logging.info(this, s"[$invocationNamespace:$action:$stateName] got a new activation message ${msg.activationId}")(
+        msg.transid)
+      val whiskError = isWhiskError(data.error)
+      if (whiskError)
+        queue = queue.enqueue(TimeSeriesActivationEntry(Instant.now, msg))
+      else
+        completeErrorActivation(msg, data.reason, whiskError)
       stay() using data.copy(activeDuringFlush = true)
 
     // Since SchedulingDecisionMaker keep sending a message to create a container, this state is not automatically timed out.
     // Instead, StateTimeout message will be sent by a timer.
-    case Event(StateTimeout, data: FlushingData) =>
-      completeAllActivations(data.reason, ContainerCreationError.whiskErrors.contains(data.error))
-      if (data.activeDuringFlush)
+    case Event(StateTimeout | DropOld, data: FlushingData) =>
+      logging.info(this, s"[$invocationNamespace:$action:$stateName] Received StateTimeout, drop stale messages.")
+      queue =
+        MemoryQueue.dropOld(queue, Duration.ofMillis(actionRetentionTimeout), data.reason, completeErrorActivation)
+      if (data.activeDuringFlush || queue.nonEmpty)
         stay using data.copy(activeDuringFlush = false)
       else
         cleanUpActorsAndGotoRemoved(data)
 
-    case Event(GracefulShutdown, data: FlushingData) =>
-      completeAllActivations(data.reason, ContainerCreationError.whiskErrors.contains(data.error))
+      completeAllActivations(data.reason, isWhiskError(data.error))

Review Comment:
   Our downstream didn't have this statement
   ```
   completeAllActivations(data.reason, isWhiskError(data.error))
   ```
   So, just add `completeAllActivations`, is it  for safe?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@openwhisk.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org