You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@samza.apache.org by GitBox <gi...@apache.org> on 2019/12/20 23:01:28 UTC

[GitHub] [samza] prateekm opened a new pull request #1241: SAMZA-2424: AM should cache and serve serialized Job Model to containers

prateekm opened a new pull request #1241: SAMZA-2424: AM should cache and serve serialized Job Model to containers
URL: https://github.com/apache/samza/pull/1241
 
 
   Symptom: AM runs out of memory or gets stuck in GC loop during job deployment.
    
   Cause: During startup, containers fetch the JobModel and configs from the job coordinator http endpoint. For jobs with a large number of containers this can mean many concurrent requests to the JC Servlet for the large job model. The servlet currently serializes the JobModel to JSON for every incoming request. This can lead to excessive memory utilization and GC, leading to slow response times, AM running out of memory, or AM getting stuck in GC loop.
    
   Changes: JobModelManager now serializes the JobModel when creating the JobServlet. JobServlet now serves this serialized JobModel instead of serializing every time.
    
   Tests: Local performance test results using ab and monitoring memory utilization using YourKit: https://docs.google.com/document/d/1Qy4ScC9aLNZaJQuUBvtyoXdilUhkVtYjb5Mzwzv-DV8/edit?usp=sharing
   
   API Changes: None
    
   Upgrade Instructions: None
    
   Usage Instructions: None

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] rmatharu edited a comment on issue #1241: SAMZA-2424: AM should cache and serve serialized Job Model to containers

Posted by GitBox <gi...@apache.org>.
rmatharu edited a comment on issue #1241: SAMZA-2424: AM should cache and serve serialized Job Model to containers
URL: https://github.com/apache/samza/pull/1241#issuecomment-570322826
 
 
   As @shanthoosh  pointed out, state-store-GC-monitor uses containers' locality mapping to determine the tasks on a given host (if the app is currently running), with this change the monitor could end up using a previous app-attempt's locality mappings.
   
   One way out may be to expose separate calls for cached vs. non-cached JobModel?
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] prateekm commented on a change in pull request #1241: SAMZA-2424: AM should cache and serve serialized Job Model to containers

Posted by GitBox <gi...@apache.org>.
prateekm commented on a change in pull request #1241: SAMZA-2424: AM should cache and serve serialized Job Model to containers
URL: https://github.com/apache/samza/pull/1241#discussion_r363426900
 
 

 ##########
 File path: samza-core/src/main/scala/org/apache/samza/coordinator/server/JobServlet.scala
 ##########
 @@ -22,19 +22,22 @@ package org.apache.samza.coordinator.server
 
 import java.util.concurrent.atomic.AtomicReference
 
-import org.apache.samza.SamzaException
-import org.apache.samza.job.model.JobModel
-import org.apache.samza.util.Logging
+import javax.servlet.http.{HttpServlet, HttpServletRequest, HttpServletResponse}
 
 /**
- * A servlet that dumps the job model for a Samza job.
+ * Serves the JSON serialized job model for the job.
  */
-class JobServlet(jobModelRef: AtomicReference[JobModel]) extends ServletBase with Logging {
-  protected def getObjectToWrite() = {
+class JobServlet(jobModelRef: AtomicReference[Array[Byte]]) extends HttpServlet {
+  override protected def doGet(request: HttpServletRequest, response: HttpServletResponse) {
 
 Review comment:
   Good catch. Meant to delete `ServletBase`, it's no longer used.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] rmatharu commented on issue #1241: SAMZA-2424: AM should cache and serve serialized Job Model to containers

Posted by GitBox <gi...@apache.org>.
rmatharu commented on issue #1241: SAMZA-2424: AM should cache and serve serialized Job Model to containers
URL: https://github.com/apache/samza/pull/1241#issuecomment-570322826
 
 
   As @shanthoosh  pointed out, state-store-GC-monitor uses containers' locality mapping to determine the tasks on a given host (if the app is currently running), with this change the monitor could end up using a previous app-attempt's locality mappings.
   
   An alternative way may be to expose separate calls for cached vs. non-cached JobModel?
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] prateekm commented on issue #1241: SAMZA-2424: AM should cache and serve serialized Job Model to containers

Posted by GitBox <gi...@apache.org>.
prateekm commented on issue #1241: SAMZA-2424: AM should cache and serve serialized Job Model to containers
URL: https://github.com/apache/samza/pull/1241#issuecomment-571285288
 
 
   @shanthoosh clarified that the JobModel object isn't immutable, it re-reads the container locality from the metadata store every time getAllContainerLocality() is called (e.g.during serialization). Let me see what we can do to handle that, it might require a bigger change to how the JobModel is created.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] prateekm commented on issue #1241: SAMZA-2424: AM should cache and serve serialized Job Model to containers

Posted by GitBox <gi...@apache.org>.
prateekm commented on issue #1241: SAMZA-2424: AM should cache and serve serialized Job Model to containers
URL: https://github.com/apache/samza/pull/1241#issuecomment-571260241
 
 
   @shanthoosh @rmatharu Thanks for the review.
   
   This PR does not affect LocalStoreMonitor:
   1. This PR updates the only place where jobModelRef is set to store the serialized job model instead of unserialized job model. So this shouldn't affect any usages of jobModelRef. Btw, I don't think this field needs to be an AtomicReference since its only updated once. @shanthoosh do you know if JobModelManager#apply is called more than once in any scenario (YARN or Standalone)? If not, I can change it to a regular field.
   
   2. LocalStoreMonitor doesn't use the JobModel served from the AM. It reads the coordinator stream and creates the JobModel itself, in a rather roundabout way. The code path is: `LocalStoreMonitor -> JobsClient#getTasks ---HTTP Request---> TasksResource#getTasks -> SamzaTaskProxy#getTasks -> CoordinatorStreamSystemConsumer/LocalityManger read`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] rmatharu commented on a change in pull request #1241: SAMZA-2424: AM should cache and serve serialized Job Model to containers

Posted by GitBox <gi...@apache.org>.
rmatharu commented on a change in pull request #1241: SAMZA-2424: AM should cache and serve serialized Job Model to containers
URL: https://github.com/apache/samza/pull/1241#discussion_r362583103
 
 

 ##########
 File path: samza-core/src/main/scala/org/apache/samza/coordinator/JobModelManager.scala
 ##########
 @@ -93,15 +94,17 @@ object JobModelManager extends Logging {
       val streamMetadataCache = new StreamMetadataCache(systemAdmins, 0)
       val grouperMetadata: GrouperMetadata = getGrouperMetadata(config, localityManager, taskAssignmentManager, taskPartitionAssignmentManager)
 
-      val jobModel: JobModel = readJobModel(config, changelogPartitionMapping, streamMetadataCache, grouperMetadata)
-      jobModelRef.set(new JobModel(jobModel.getConfig, jobModel.getContainers, localityManager))
+      val jobModel = readJobModel(config, changelogPartitionMapping, streamMetadataCache, grouperMetadata)
+      val jobModelToServe = new JobModel(jobModel.getConfig, jobModel.getContainers, localityManager)
+      val serializedJobModelToServe = SamzaObjectMapper.getObjectMapper().writeValueAsBytes(jobModelToServe)
+      jobModelRef.set(serializedJobModelToServe)
 
 Review comment:
   Maybe serializedJobModelRef ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] prateekm commented on issue #1241: SAMZA-2424: AM should cache and serve serialized Job Model to containers

Posted by GitBox <gi...@apache.org>.
prateekm commented on issue #1241: SAMZA-2424: AM should cache and serve serialized Job Model to containers
URL: https://github.com/apache/samza/pull/1241#issuecomment-571280408
 
 
   @shanthoosh See point 1. If those tools are querying the AM for container locality, they're already affected since the job model served by AM is not currently updated with locality information after the initial read. This PR does not change that behavior. 
   
   Also, I think Samza dashboard uses the Yarn RM APIs for finding container location, not the AM job model.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] shanthoosh commented on issue #1241: SAMZA-2424: AM should cache and serve serialized Job Model to containers

Posted by GitBox <gi...@apache.org>.
shanthoosh commented on issue #1241: SAMZA-2424: AM should cache and serve serialized Job Model to containers
URL: https://github.com/apache/samza/pull/1241#issuecomment-571275976
 
 
   @prateekm
   
   1. LocalStoreMonitor in OSS is not used internally within LI. We'd re-implemented the same logic internally in C for GDPR purposes. The re-implementation queries the AM, AFAIK, to get the JobModel. 
   2. Last I checked, the samza dashboard also queries the AM endpoint to get the JobModel. Might be better to check if it would break any existing dashboard functionality with this caching change(if we return inconsistent container-to-host mapping).
   3. I think you're right in that we do not call the JobModelManager.apply is not invoked multiple times. 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] shanthoosh commented on issue #1241: SAMZA-2424: AM should cache and serve serialized Job Model to containers

Posted by GitBox <gi...@apache.org>.
shanthoosh commented on issue #1241: SAMZA-2424: AM should cache and serve serialized Job Model to containers
URL: https://github.com/apache/samza/pull/1241#issuecomment-568138070
 
 
   
   Please correct me if i'm wrong.
   
   
   JobModel currently contains the container-to-host-locality information, accessible via the high-level field `all-container-locality`.
   
   In a typical run of a samza-job, containers could move around different NM hosts due to transient h/w failures (or containers could be explicitly moved around by using our powerful container placement API)
   
   
   So if we cache the JobModel and after that the container-locality changes, then subsequent AM  HTTP reads of JobModel could potentially be serving inconsistent JobModel. 
   
   Sample JobModel taken from a samza job 
   
   ```bash
   [svenkata@host ~]$ cat ~/job-model  | jq . | head -n 10 
   {
     "all-container-locality": {
       "68-0": "host-1",
       "135-0": "host-2",
       "173-0": "host-3",
       "158-0": "host-4",
       "45-0": "host-5",
       "112-0": "host-6",
       "109": "host-7",
       "108": "host-8",
   ```
   
   In the above sample JobModel, the active container '68-0' runs in host-1. 
   
   Currently the ApplicationMaster process is used as the source of truth for JobModel. Some of our internal samza-admin monitors uses it to garbage-collect the left-over local RocksDB state-stores from disk(to reclaim disk-space). Just curious, if the above API change would affect our state-store GC-monitor.
    

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] shanthoosh edited a comment on issue #1241: SAMZA-2424: AM should cache and serve serialized Job Model to containers

Posted by GitBox <gi...@apache.org>.
shanthoosh edited a comment on issue #1241: SAMZA-2424: AM should cache and serve serialized Job Model to containers
URL: https://github.com/apache/samza/pull/1241#issuecomment-571275976
 
 
   @prateekm
   
   1. LocalStoreMonitor in OSS is not used internally within LI. We'd re-implemented the same logic internally in C for GDPR purposes. The re-implementation queries the AM, AFAIK, to get the JobModel. 
   2. Last I checked, the samza dashboard also queries the AM endpoint to get the JobModel. Might be better to check if it would break any existing dashboard functionality with this caching change(if we return inconsistent container-to-host mapping).
   3. I think you're right in that we do not call the JobModelManager.apply multiple times in YARN/standalone. 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] rmatharu commented on a change in pull request #1241: SAMZA-2424: AM should cache and serve serialized Job Model to containers

Posted by GitBox <gi...@apache.org>.
rmatharu commented on a change in pull request #1241: SAMZA-2424: AM should cache and serve serialized Job Model to containers
URL: https://github.com/apache/samza/pull/1241#discussion_r362586519
 
 

 ##########
 File path: samza-core/src/main/scala/org/apache/samza/coordinator/server/JobServlet.scala
 ##########
 @@ -22,19 +22,22 @@ package org.apache.samza.coordinator.server
 
 import java.util.concurrent.atomic.AtomicReference
 
-import org.apache.samza.SamzaException
-import org.apache.samza.job.model.JobModel
-import org.apache.samza.util.Logging
+import javax.servlet.http.{HttpServlet, HttpServletRequest, HttpServletResponse}
 
 /**
- * A servlet that dumps the job model for a Samza job.
+ * Serves the JSON serialized job model for the job.
  */
-class JobServlet(jobModelRef: AtomicReference[JobModel]) extends ServletBase with Logging {
-  protected def getObjectToWrite() = {
+class JobServlet(jobModelRef: AtomicReference[Array[Byte]]) extends HttpServlet {
+  override protected def doGet(request: HttpServletRequest, response: HttpServletResponse) {
 
 Review comment:
   Should we update the 
   doGet implementation in ServletBase ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services