You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@samza.apache.org by "Shadi A. Noghabi" <ab...@illinois.edu> on 2015/07/02 21:20:06 UTC

Re: Review Request 36006: Writing a tool to read from the coordinator stream and react to config changes accordingly.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/
-----------------------------------------------------------

(Updated July 2, 2015, 7:20 p.m.)


Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.


Summary (updated)
-----------------

Writing a tool to read from the coordinator stream and react to config changes accordingly. 


Repository: samza


Description (updated)
-------

After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 

This tool, called the Config Manager, is focusing on handling container changs by reacting to cset-config massages with key "yarn.container.count". 

If the config manager is enabled, it will come up after the JobRunner submits the job, and listen to config messages added there after. To enable the ConfigManager use the flag "--config-manager-is-active=true" when submitting the job. The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured using "--config-manager-polling-interval=200" when submitting the job. Thus, overal the command to run the config manager along with the job would be:


<path to samza deployment>/bin/run-job.sh --config-factory=<config factory> --config-path=<path to config file of a job> --config-manager-is-active=<false,true> --config-manager-polling-interval=<polling-interval>

And this command should be used to run a job instead.


Diffs
-----

  checkstyle/import-control.xml 3374f0c432e61ac4cda275377604cfd481f0cddf 
  samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamMessage.java 6c1e488d00d8593d59c89b57e673e0b6b90fd7d2 
  samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java b1078bdf7bddd16c9ccc6559b9efd40ca5ae67bc 
  samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamWriter.java PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/job/ConfigManager.java PRE-CREATION 
  samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 1c178a661e449c6bdfc4ce431aef9bb2d261a6c2 
  samza-core/src/main/scala/org/apache/samza/util/CommandLine.scala f26501b2820b99d1ad2964c6f7833ef7eaddba97 
  samza-core/src/test/java/org/apache/samza/coordinator/stream/MockCoordinatorStreamSystemFactory.java 647cadb3a4e51bec8204197d77ad35a6b29afcec 
  samza-core/src/test/java/org/apache/samza/coordinator/stream/TestCoordinatorStreamSystemProducer.java 68e32554c18f443565284b807f43f4a5b05afbce 
  samza-core/src/test/java/org/apache/samza/coordinator/stream/TestCoordinatorStreamWriter.java PRE-CREATION 
  samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala ea702a919348305ff95ce0b4ca1996a13aff04ec 
  samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMaster.scala 20aa3736a842c462449682f3fc54c7f66c356d7d 
  samza-yarn/src/test/scala/org/apache/samza/job/yarn/TestSamzaAppMaster.scala 7b7d86a43c69e72c47eaa91f68be24e0f4022891 

Diff: https://reviews.apache.org/r/36006/diff/


Testing
-------


Thanks,

Shadi A. Noghabi


Re: Review Request 36006: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by "Gustavo Anatoly F. V. Solís" <gu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/#review90929
-----------------------------------------------------------



checkstyle/import-control.xml (line 126)
<https://reviews.apache.org/r/36006/#comment144096>

    Remove trailing whitespaces


- Gustavo Anatoly F. V. Solís


On Julho 2, 2015, 7:20 p.m., Shadi A. Noghabi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36006/
> -----------------------------------------------------------
> 
> (Updated Julho 2, 2015, 7:20 p.m.)
> 
> 
> Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 
> 
> This tool, called the Config Manager, is focusing on handling container changs by reacting to cset-config massages with key "yarn.container.count". 
> 
> If the config manager is enabled, it will come up after the JobRunner submits the job, and listen to config messages added there after. To enable the ConfigManager use the flag "--config-manager-is-active=true" when submitting the job. The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured using "--config-manager-polling-interval=200" when submitting the job. Thus, overal the command to run the config manager along with the job would be:
> 
> 
> <path to samza deployment>/bin/run-job.sh --config-factory=<config factory> --config-path=<path to config file of a job> --config-manager-is-active=<false,true> --config-manager-polling-interval=<polling-interval>
> 
> And this command should be used to run a job instead.
> 
> 
> Diffs
> -----
> 
>   checkstyle/import-control.xml 3374f0c432e61ac4cda275377604cfd481f0cddf 
>   samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamMessage.java 6c1e488d00d8593d59c89b57e673e0b6b90fd7d2 
>   samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java b1078bdf7bddd16c9ccc6559b9efd40ca5ae67bc 
>   samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamWriter.java PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/job/ConfigManager.java PRE-CREATION 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 1c178a661e449c6bdfc4ce431aef9bb2d261a6c2 
>   samza-core/src/main/scala/org/apache/samza/util/CommandLine.scala f26501b2820b99d1ad2964c6f7833ef7eaddba97 
>   samza-core/src/test/java/org/apache/samza/coordinator/stream/MockCoordinatorStreamSystemFactory.java 647cadb3a4e51bec8204197d77ad35a6b29afcec 
>   samza-core/src/test/java/org/apache/samza/coordinator/stream/TestCoordinatorStreamSystemProducer.java 68e32554c18f443565284b807f43f4a5b05afbce 
>   samza-core/src/test/java/org/apache/samza/coordinator/stream/TestCoordinatorStreamWriter.java PRE-CREATION 
>   samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala ea702a919348305ff95ce0b4ca1996a13aff04ec 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMaster.scala 20aa3736a842c462449682f3fc54c7f66c356d7d 
>   samza-yarn/src/test/scala/org/apache/samza/job/yarn/TestSamzaAppMaster.scala 7b7d86a43c69e72c47eaa91f68be24e0f4022891 
> 
> Diff: https://reviews.apache.org/r/36006/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Shadi A. Noghabi
> 
>


Re: Review Request 36006: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by Navina Ramesh <nr...@linkedin.com>.

> On July 21, 2015, 9:53 p.m., Navina Ramesh wrote:
> > Ideally, the job coordinator should act as the config manager. Since the chain of control is still from AppMaster to JobCoordinator (instead of the other way around), the only reasonable way to get this working is to have it as a separate script (like run-config-manager.sh). Having said this, I wonder if it is necessary for the ConfigManger to remain in samza-core. @Shadi: Were there any issues in keeping this code within the auto-scaling module?
> 
> Shadi A. Noghabi wrote:
>     It is in the auto scaling module. It is in the package org.apache.samza.autoScaling.deployer. However the whole auto scaling is in samza-core. Is that a problem?

Ok. Miscommunication :) I meant to ask if there were any issues in keeping this code withing auto-scaling "sub-project". Looks fine now!


- Navina


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/#review92227
-----------------------------------------------------------


On July 28, 2015, 2:39 a.m., Shadi A. Noghabi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36006/
> -----------------------------------------------------------
> 
> (Updated July 28, 2015, 2:39 a.m.)
> 
> 
> Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 
> 
> This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 
> 
> The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
> 1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
> 2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >
> 
> The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:
> 
> 
> <path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>
> 
> 
> Diffs
> -----
> 
>   build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
>   checkstyle/import-control.xml 6654319392929857bb861d77763afd8a5ea7674c 
>   gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
>   samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java PRE-CREATION 
>   samza-autoscaling/src/main/java/org/apache/samza/autoScaling/utils/YarnUtil.java PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java b1078bdf7bddd16c9ccc6559b9efd40ca5ae67bc 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 1c178a661e449c6bdfc4ce431aef9bb2d261a6c2 
>   samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
>   samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala ea702a919348305ff95ce0b4ca1996a13aff04ec 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala ce88698c12c4bf6f4cf128f92d60b0b9496997d7 
>   settings.gradle 19bff971ad221084dac10d3f7f3facfa42b829a7 
> 
> Diff: https://reviews.apache.org/r/36006/diff/
> 
> 
> Testing
> -------
> 
> Tested with hello samza and works properly.
> 
> 
> Thanks,
> 
> Shadi A. Noghabi
> 
>


Re: Review Request 36006: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by "Shadi A. Noghabi" <ab...@illinois.edu>.

> On July 21, 2015, 9:53 p.m., Navina Ramesh wrote:
> > Ideally, the job coordinator should act as the config manager. Since the chain of control is still from AppMaster to JobCoordinator (instead of the other way around), the only reasonable way to get this working is to have it as a separate script (like run-config-manager.sh). Having said this, I wonder if it is necessary for the ConfigManger to remain in samza-core. @Shadi: Were there any issues in keeping this code within the auto-scaling module?

It is in the auto scaling module. It is in the package org.apache.samza.autoScaling.deployer. However the whole auto scaling is in samza-core. Is that a problem?


- Shadi


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/#review92227
-----------------------------------------------------------


On July 28, 2015, 2:39 a.m., Shadi A. Noghabi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36006/
> -----------------------------------------------------------
> 
> (Updated July 28, 2015, 2:39 a.m.)
> 
> 
> Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 
> 
> This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 
> 
> The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
> 1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
> 2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >
> 
> The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:
> 
> 
> <path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>
> 
> 
> Diffs
> -----
> 
>   build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
>   checkstyle/import-control.xml 6654319392929857bb861d77763afd8a5ea7674c 
>   gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
>   samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java PRE-CREATION 
>   samza-autoscaling/src/main/java/org/apache/samza/autoScaling/utils/YarnUtil.java PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java b1078bdf7bddd16c9ccc6559b9efd40ca5ae67bc 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 1c178a661e449c6bdfc4ce431aef9bb2d261a6c2 
>   samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
>   samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala ea702a919348305ff95ce0b4ca1996a13aff04ec 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala ce88698c12c4bf6f4cf128f92d60b0b9496997d7 
>   settings.gradle 19bff971ad221084dac10d3f7f3facfa42b829a7 
> 
> Diff: https://reviews.apache.org/r/36006/diff/
> 
> 
> Testing
> -------
> 
> Tested with hello samza and works properly.
> 
> 
> Thanks,
> 
> Shadi A. Noghabi
> 
>


Re: Review Request 36006: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by Navina Ramesh <nr...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/#review92227
-----------------------------------------------------------


Ideally, the job coordinator should act as the config manager. Since the chain of control is still from AppMaster to JobCoordinator (instead of the other way around), the only reasonable way to get this working is to have it as a separate script (like run-config-manager.sh). Having said this, I wonder if it is necessary for the ConfigManger to remain in samza-core. @Shadi: Were there any issues in keeping this code within the auto-scaling module?


samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java (line 47)
<https://reviews.apache.org/r/36006/#comment146268>

    Remove commented lines



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java (line 90)
<https://reviews.apache.org/r/36006/#comment146269>

    RM_ADDRESS_OPT, RM_PORT_OPT, POLLING_INTERVAL_OPT, SERVER_URL_OPT can be made private



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java (line 151)
<https://reviews.apache.org/r/36006/#comment146272>

    Use logger instance



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java (line 215)
<https://reviews.apache.org/r/36006/#comment146271>

    nit: typo in "server ural"



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java (line 236)
<https://reviews.apache.org/r/36006/#comment146381>

    We should, in the future, change the handlers to implement a common interface. This way, users can register custom handlers and also, perhaps precondition checks. Whether that will open up to mishandling of config changes is still up to debate. Just wanted to comment this here.



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java (line 310)
<https://reviews.apache.org/r/36006/#comment146679>

    Can you add a check here to verify that the job ha actually been killed? You can use the rest api to verify that the status of the job has been changed to "KILLED".



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java (line 317)
<https://reviews.apache.org/r/36006/#comment146680>

    I think it is fine to skip messages written to the stream after you have killed a job's application attempt and before the next attempt starts up. However you can't avoid the bootstrap code from reading these messages. I don't think there is clean solution for this here.



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java (line 330)
<https://reviews.apache.org/r/36006/#comment146682>

    Can you move the yarn rest api related code to a separate YarnUtils class? Makes the code more cleaner and it can be re-used in other places in the future.



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java (line 392)
<https://reviews.apache.org/r/36006/#comment146684>

    Why do you need this class here?



samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala (line 52)
<https://reviews.apache.org/r/36006/#comment146685>

    Can you please add a comment here for what "isFirstTime" is used for?



samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMaster.scala (line 123)
<https://reviews.apache.org/r/36006/#comment146687>

    Code here looks kind of arbitary here. Looks like you want to write the jobcoordinator url to the coordinator stream. You should perhaps move this to the init method in SamzaAppMasterService??
    This way you don't have to change the interface of the run method.


- Navina Ramesh


On July 18, 2015, 2:52 a.m., Shadi A. Noghabi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36006/
> -----------------------------------------------------------
> 
> (Updated July 18, 2015, 2:52 a.m.)
> 
> 
> Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 
> 
> This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 
> 
> The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
> 1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
> 2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >
> 
> The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:
> 
> 
> <path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>
> 
> 
> Diffs
> -----
> 
>   build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
>   checkstyle/import-control.xml 6654319392929857bb861d77763afd8a5ea7674c 
>   gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
>   samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java b1078bdf7bddd16c9ccc6559b9efd40ca5ae67bc 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 1c178a661e449c6bdfc4ce431aef9bb2d261a6c2 
>   samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
>   samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala ea702a919348305ff95ce0b4ca1996a13aff04ec 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMaster.scala af42c6a6636953a95f79837fe372e0dbd735df70 
>   samza-yarn/src/test/scala/org/apache/samza/job/yarn/TestSamzaAppMaster.scala 7b7d86a43c69e72c47eaa91f68be24e0f4022891 
> 
> Diff: https://reviews.apache.org/r/36006/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Shadi A. Noghabi
> 
>


Re: Review Request 36006: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by "Shadi A. Noghabi" <ab...@illinois.edu>.

> On July 22, 2015, 7:08 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java, line 188
> > <https://reviews.apache.org/r/36006/diff/2/?file=1015151#file1015151line188>
> >
> >     Any reason the boostrap, skipUnreadMessages, readConfigMessages need to be public?

bootstrap and skipUnreadMessages can be private. However, readConfigMessages is later used in other components of auto-scaling.


> On July 22, 2015, 7:08 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java, line 318
> > <https://reviews.apache.org/r/36006/diff/2/?file=1015151#file1015151line318>
> >
> >     Did you update the container count in this config object? I couldn't find the code to update this before you start the JobRunner?

There is no need to change the config, since the new configuration is written to the coordinator stream, and it will be picked up there. I thought changing the config file might make it confusing.


- Shadi


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/#review92625
-----------------------------------------------------------


On July 28, 2015, 2:39 a.m., Shadi A. Noghabi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36006/
> -----------------------------------------------------------
> 
> (Updated July 28, 2015, 2:39 a.m.)
> 
> 
> Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 
> 
> This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 
> 
> The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
> 1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
> 2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >
> 
> The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:
> 
> 
> <path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>
> 
> 
> Diffs
> -----
> 
>   build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
>   checkstyle/import-control.xml 6654319392929857bb861d77763afd8a5ea7674c 
>   gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
>   samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java PRE-CREATION 
>   samza-autoscaling/src/main/java/org/apache/samza/autoScaling/utils/YarnUtil.java PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java b1078bdf7bddd16c9ccc6559b9efd40ca5ae67bc 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 1c178a661e449c6bdfc4ce431aef9bb2d261a6c2 
>   samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
>   samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala ea702a919348305ff95ce0b4ca1996a13aff04ec 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala ce88698c12c4bf6f4cf128f92d60b0b9496997d7 
>   settings.gradle 19bff971ad221084dac10d3f7f3facfa42b829a7 
> 
> Diff: https://reviews.apache.org/r/36006/diff/
> 
> 
> Testing
> -------
> 
> Tested with hello samza and works properly.
> 
> 
> Thanks,
> 
> Shadi A. Noghabi
> 
>


Re: Review Request 36006: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by "Yi Pan (Data Infrastructure)" <yi...@linkedin.com>.

> On July 22, 2015, 7:08 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java, line 318
> > <https://reviews.apache.org/r/36006/diff/2/?file=1015151#file1015151line318>
> >
> >     Did you update the container count in this config object? I couldn't find the code to update this before you start the JobRunner?
> 
> Shadi A. Noghabi wrote:
>     There is no need to change the config, since the new configuration is written to the coordinator stream, and it will be picked up there. I thought changing the config file might make it confusing.

OK. This is not clear to me. Who is writing the new configuration to the coordinator stream? What's the sequence of messages written in the coordinator stream, between the new job configuration, the set-configure messages changing the JobCoordinator URL, and the set-configure messages changing the number of containers? Let's sync up tomorrow.


- Yi


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/#review92625
-----------------------------------------------------------


On July 28, 2015, 2:39 a.m., Shadi A. Noghabi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36006/
> -----------------------------------------------------------
> 
> (Updated July 28, 2015, 2:39 a.m.)
> 
> 
> Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 
> 
> This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 
> 
> The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
> 1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
> 2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >
> 
> The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:
> 
> 
> <path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>
> 
> 
> Diffs
> -----
> 
>   build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
>   checkstyle/import-control.xml 6654319392929857bb861d77763afd8a5ea7674c 
>   gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
>   samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java PRE-CREATION 
>   samza-autoscaling/src/main/java/org/apache/samza/autoScaling/utils/YarnUtil.java PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java b1078bdf7bddd16c9ccc6559b9efd40ca5ae67bc 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 1c178a661e449c6bdfc4ce431aef9bb2d261a6c2 
>   samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
>   samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala ea702a919348305ff95ce0b4ca1996a13aff04ec 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala ce88698c12c4bf6f4cf128f92d60b0b9496997d7 
>   settings.gradle 19bff971ad221084dac10d3f7f3facfa42b829a7 
> 
> Diff: https://reviews.apache.org/r/36006/diff/
> 
> 
> Testing
> -------
> 
> Tested with hello samza and works properly.
> 
> 
> Thanks,
> 
> Shadi A. Noghabi
> 
>


Re: Review Request 36006: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by "Yi Pan (Data Infrastructure)" <yi...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/#review92625
-----------------------------------------------------------



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java (line 21)
<https://reviews.apache.org/r/36006/#comment146833>

    We don't use camel style package names in Samza. It should be org.apache.samza.autoscaling.deployer



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java (line 123)
<https://reviews.apache.org/r/36006/#comment146838>

    Question: shouldn't the JobConfig be returned from the JobCoordinator's web interface? Or, more precisely, this ConfigManager should only need to know the CoordinatorStream topic to start with, since the JobCoordinator will send its url info into the CoordinatorStream topic. Even the initial config can be read from that topic as well.



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java (line 131)
<https://reviews.apache.org/r/36006/#comment146839>

    As Navina pointed out, we could have wrapped the Yarn related variables/functions into a YarnUtils class, which can hide all the details about yarnClient, hConfig, etc.



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java (line 188)
<https://reviews.apache.org/r/36006/#comment146841>

    Any reason the boostrap, skipUnreadMessages, readConfigMessages need to be public?



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java (line 217)
<https://reviews.apache.org/r/36006/#comment146842>

    The function name does not convey what it actually does. This is *not* just readConfigMessages, this method is *processing* messages in coordinator streams. I am still a bit confused among the three modes you defined here. If I read correctly, you are referring to three different *process_mode* here:
    1) SKIP_ALL
    2) PROCESS_SERVER_URL
    3) PROCESS_CONTAINER_COUNT_AND_SERVER_URL
    Does the above make sense? I would suggest that you define a set-config message filter here that can define the list of messages that you want to react on. Therefore, if you want to skip all, the filter filters everything. And you can put arbitrary combination of different types of messages you want to react to, instead of just 3 different combinations here.



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java (line 303)
<https://reviews.apache.org/r/36006/#comment146843>

    Change to logger-based logging. And the message should be "killing the current job".



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java (line 310)
<https://reviews.apache.org/r/36006/#comment146844>

    And add a "Killed the current job" log line after confirming the job is dead.



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java (line 318)
<https://reviews.apache.org/r/36006/#comment146850>

    Did you update the container count in this config object? I couldn't find the code to update this before you start the JobRunner?



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java (line 338)
<https://reviews.apache.org/r/36006/#comment146848>

    I don't see why jackson lib can't do this? Can we avoid adding the Gson dependency here?



samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java (line 92)
<https://reviews.apache.org/r/36006/#comment146851>

    How is it used? I don't see it used in ConfigManager. Where else could it be used? If no one uses it, we should remove this function.


- Yi Pan (Data Infrastructure)


On July 18, 2015, 2:52 a.m., Shadi A. Noghabi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36006/
> -----------------------------------------------------------
> 
> (Updated July 18, 2015, 2:52 a.m.)
> 
> 
> Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 
> 
> This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 
> 
> The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
> 1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
> 2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >
> 
> The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:
> 
> 
> <path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>
> 
> 
> Diffs
> -----
> 
>   build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
>   checkstyle/import-control.xml 6654319392929857bb861d77763afd8a5ea7674c 
>   gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
>   samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java b1078bdf7bddd16c9ccc6559b9efd40ca5ae67bc 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 1c178a661e449c6bdfc4ce431aef9bb2d261a6c2 
>   samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
>   samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala ea702a919348305ff95ce0b4ca1996a13aff04ec 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMaster.scala af42c6a6636953a95f79837fe372e0dbd735df70 
>   samza-yarn/src/test/scala/org/apache/samza/job/yarn/TestSamzaAppMaster.scala 7b7d86a43c69e72c47eaa91f68be24e0f4022891 
> 
> Diff: https://reviews.apache.org/r/36006/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Shadi A. Noghabi
> 
>


Re: Review Request 36006: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by Navina Ramesh <nr...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/#review94005
-----------------------------------------------------------



samza-autoscaling/src/main/java/org/apache/samza/autoScaling/utils/YarnUtil.java (line 68)
<https://reviews.apache.org/r/36006/#comment148507>

    If application id is not found , it returns null. Please add this to the doc.



samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala (line 52)
<https://reviews.apache.org/r/36006/#comment148511>

    Can you please add a one-line comment about the shouldRewriteConfigToCoordinatorStream variable? Someone without the context of ConfigManager may not understand the motivation behind it.


Overall, looks good. Please address Yi's comments as well and put up a patch.

- Navina Ramesh


On July 28, 2015, 2:39 a.m., Shadi A. Noghabi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36006/
> -----------------------------------------------------------
> 
> (Updated July 28, 2015, 2:39 a.m.)
> 
> 
> Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 
> 
> This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 
> 
> The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
> 1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
> 2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >
> 
> The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:
> 
> 
> <path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>
> 
> 
> Diffs
> -----
> 
>   build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
>   checkstyle/import-control.xml 6654319392929857bb861d77763afd8a5ea7674c 
>   gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
>   samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java PRE-CREATION 
>   samza-autoscaling/src/main/java/org/apache/samza/autoScaling/utils/YarnUtil.java PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java b1078bdf7bddd16c9ccc6559b9efd40ca5ae67bc 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 1c178a661e449c6bdfc4ce431aef9bb2d261a6c2 
>   samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
>   samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala ea702a919348305ff95ce0b4ca1996a13aff04ec 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala ce88698c12c4bf6f4cf128f92d60b0b9496997d7 
>   settings.gradle 19bff971ad221084dac10d3f7f3facfa42b829a7 
> 
> Diff: https://reviews.apache.org/r/36006/diff/
> 
> 
> Testing
> -------
> 
> Tested with hello samza and works properly.
> 
> 
> Thanks,
> 
> Shadi A. Noghabi
> 
>


Re: Review Request 36006: SAMZA-704: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by "Shadi A. Noghabi" <ab...@illinois.edu>.

> On Aug. 1, 2015, 12:07 a.m., Yi Pan (Data Infrastructure) wrote:
> > samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java, line 307
> > <https://reviews.apache.org/r/36006/diff/3/?file=1023300#file1023300line307>
> >
> >     This also reminds me of one thing: where do we assume that the ConfigManager will be running? If we assumes a specific host or hosts to run this ConfigManager (e.g. RM nodes), we better call it out and add some docs to describe it.

It can run any where. However, in order to run it you have to set these configs:
yarn.rm.address
yarn.rm.port


> On Aug. 1, 2015, 12:07 a.m., Yi Pan (Data Infrastructure) wrote:
> > samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java, line 1
> > <https://reviews.apache.org/r/36006/diff/3/?file=1023300#file1023300line1>
> >
> >     The file directory name is still autoScaling. Isn't it a problem?

I have fixed it to autoscaling not "autoScaling". I don't know why the RB is showing so.


- Shadi


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/#review93820
-----------------------------------------------------------


On Aug. 8, 2015, 1:26 a.m., Shadi A. Noghabi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36006/
> -----------------------------------------------------------
> 
> (Updated Aug. 8, 2015, 1:26 a.m.)
> 
> 
> Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 
> 
> This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 
> 
> The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
> 1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
> 2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >
> 
> The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:
> 
> 
> <path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>
> 
> 
> Diffs
> -----
> 
>   build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
>   checkstyle/import-control.xml 24ed680785175f3cdf955602b7a813684edd813e 
>   gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
>   samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java PRE-CREATION 
>   samza-autoscaling/src/main/java/org/apache/samza/autoScaling/utils/YarnUtil.java PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java 2277a732b9ab763edf19a0fbec288ff72b27583b 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala d7c928c7401e539a370d4e82276e7dabbce1b638 
>   samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
>   samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala ea702a919348305ff95ce0b4ca1996a13aff04ec 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala ce88698c12c4bf6f4cf128f92d60b0b9496997d7 
>   settings.gradle 19bff971ad221084dac10d3f7f3facfa42b829a7 
> 
> Diff: https://reviews.apache.org/r/36006/diff/
> 
> 
> Testing
> -------
> 
> Tested with hello samza and works properly.
> 
> 
> Thanks,
> 
> Shadi A. Noghabi
> 
>


Re: Review Request 36006: SAMZA-704: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by "Shadi A. Noghabi" <ab...@illinois.edu>.

> On Aug. 1, 2015, 12:07 a.m., Yi Pan (Data Infrastructure) wrote:
> > samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java, line 1
> > <https://reviews.apache.org/r/36006/diff/3/?file=1023300#file1023300line1>
> >
> >     The file directory name is still autoScaling. Isn't it a problem?
> 
> Shadi A. Noghabi wrote:
>     I have fixed it to autoscaling not "autoScaling". I don't know why the RB is showing so.

This was due to an issue in mac, where it is not case sensitive. I fixed the problem, and put up a update.


- Shadi


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/#review93820
-----------------------------------------------------------


On Aug. 8, 2015, 6:12 p.m., Shadi A. Noghabi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36006/
> -----------------------------------------------------------
> 
> (Updated Aug. 8, 2015, 6:12 p.m.)
> 
> 
> Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 
> 
> This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 
> 
> The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
> 1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
> 2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >
> 
> The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:
> 
> 
> <path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>
> 
> 
> Diffs
> -----
> 
>   build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
>   checkstyle/import-control.xml 24ed680785175f3cdf955602b7a813684edd813e 
>   gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
>   samza-autoscaling/src/main/java/org/apache/samza/autoscaling/deployer/ConfigManager.java PRE-CREATION 
>   samza-autoscaling/src/main/java/org/apache/samza/autoscaling/utils/YarnUtil.java PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java 2277a732b9ab763edf19a0fbec288ff72b27583b 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala d7c928c7401e539a370d4e82276e7dabbce1b638 
>   samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
>   samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala ea702a919348305ff95ce0b4ca1996a13aff04ec 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala ce88698c12c4bf6f4cf128f92d60b0b9496997d7 
>   settings.gradle 19bff971ad221084dac10d3f7f3facfa42b829a7 
> 
> Diff: https://reviews.apache.org/r/36006/diff/
> 
> 
> Testing
> -------
> 
> Tested with hello samza and works properly.
> 
> 
> Thanks,
> 
> Shadi A. Noghabi
> 
>


Re: Review Request 36006: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by "Yi Pan (Data Infrastructure)" <yi...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/#review93820
-----------------------------------------------------------



samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala (line 70)
<https://reviews.apache.org/r/36006/#comment148216>

    shouldRewriteConfigToCoordinatorStream is the action, not the job-level functionality this variable is controlling. I would prefer "overwriteJobConfig" or "resetJobConfig" which tells more explicitly what the job-level function this controls.



samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java (line 1)
<https://reviews.apache.org/r/36006/#comment148213>

    The file directory name is still autoScaling. Isn't it a problem?



samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java (line 226)
<https://reviews.apache.org/r/36006/#comment148214>

    Can we lower it down to info? This should not trigger alerts.



samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java (line 307)
<https://reviews.apache.org/r/36006/#comment148215>

    This also reminds me of one thing: where do we assume that the ConfigManager will be running? If we assumes a specific host or hosts to run this ConfigManager (e.g. RM nodes), we better call it out and add some docs to describe it.



samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala (line 61)
<https://reviews.apache.org/r/36006/#comment148217>

    Can we use CoordinatorStreamMessage defined constants, instead of hard-code strings here?


- Yi Pan (Data Infrastructure)


On July 28, 2015, 2:39 a.m., Shadi A. Noghabi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36006/
> -----------------------------------------------------------
> 
> (Updated July 28, 2015, 2:39 a.m.)
> 
> 
> Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 
> 
> This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 
> 
> The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
> 1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
> 2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >
> 
> The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:
> 
> 
> <path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>
> 
> 
> Diffs
> -----
> 
>   build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
>   checkstyle/import-control.xml 6654319392929857bb861d77763afd8a5ea7674c 
>   gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
>   samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java PRE-CREATION 
>   samza-autoscaling/src/main/java/org/apache/samza/autoScaling/utils/YarnUtil.java PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java b1078bdf7bddd16c9ccc6559b9efd40ca5ae67bc 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 1c178a661e449c6bdfc4ce431aef9bb2d261a6c2 
>   samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
>   samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala ea702a919348305ff95ce0b4ca1996a13aff04ec 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala ce88698c12c4bf6f4cf128f92d60b0b9496997d7 
>   settings.gradle 19bff971ad221084dac10d3f7f3facfa42b829a7 
> 
> Diff: https://reviews.apache.org/r/36006/diff/
> 
> 
> Testing
> -------
> 
> Tested with hello samza and works properly.
> 
> 
> Thanks,
> 
> Shadi A. Noghabi
> 
>


Re: Review Request 36006: SAMZA-724: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by "Yi Pan (Data Infrastructure)" <yi...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/#review94705
-----------------------------------------------------------

Ship it!


LGTM. Just two nits. Thanks!


samza-autoscaling/src/main/java/org/apache/samza/autoscaling/utils/YarnUtil.java (line 70)
<https://reviews.apache.org/r/36006/#comment149321>

    Question: why is this function only limited to running application instances? It seems maybe getRunningAppId() is more accurate?


- Yi Pan (Data Infrastructure)


On Aug. 8, 2015, 6:19 p.m., Shadi A. Noghabi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36006/
> -----------------------------------------------------------
> 
> (Updated Aug. 8, 2015, 6:19 p.m.)
> 
> 
> Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> This RBis for SAMZA-724.
> 
> After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 
> 
> This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 
> 
> The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
> 1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
> 2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >
> 
> The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:
> 
> 
> <path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>
> 
> 
> Diffs
> -----
> 
>   build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
>   checkstyle/import-control.xml 24ed680785175f3cdf955602b7a813684edd813e 
>   gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
>   samza-autoscaling/src/main/java/org/apache/samza/autoscaling/deployer/ConfigManager.java PRE-CREATION 
>   samza-autoscaling/src/main/java/org/apache/samza/autoscaling/utils/YarnUtil.java PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java 2277a732b9ab763edf19a0fbec288ff72b27583b 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala d7c928c7401e539a370d4e82276e7dabbce1b638 
>   samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
>   samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala ea702a919348305ff95ce0b4ca1996a13aff04ec 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala ce88698c12c4bf6f4cf128f92d60b0b9496997d7 
>   settings.gradle 19bff971ad221084dac10d3f7f3facfa42b829a7 
> 
> Diff: https://reviews.apache.org/r/36006/diff/
> 
> 
> Testing
> -------
> 
> Tested with hello samza and works properly.
> 
> 
> Thanks,
> 
> Shadi A. Noghabi
> 
>


Re: Review Request 36006: SAMZA-724: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by Navina Ramesh <nr...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/#review95359
-----------------------------------------------------------

Ship it!


Tried it out. Works as expected. Thanks!

- Navina Ramesh


On Aug. 13, 2015, 11:54 p.m., Shadi A. Noghabi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36006/
> -----------------------------------------------------------
> 
> (Updated Aug. 13, 2015, 11:54 p.m.)
> 
> 
> Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> This RBis for SAMZA-724.
> 
> After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 
> 
> This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 
> 
> The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
> 1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
> 2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >
> 
> The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:
> 
> 
> <path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>
> 
> 
> Diffs
> -----
> 
>   build.gradle a935088eccb3aee4fbde21275fa8e701c215a69e 
>   checkstyle/import-control.xml 24ed680785175f3cdf955602b7a813684edd813e 
>   gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
>   samza-autoscaling/src/main/java/org/apache/samza/autoscaling/deployer/ConfigManager.java PRE-CREATION 
>   samza-autoscaling/src/main/java/org/apache/samza/autoscaling/utils/YarnUtil.java PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java 2277a732b9ab763edf19a0fbec288ff72b27583b 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala d7c928c7401e539a370d4e82276e7dabbce1b638 
>   samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
>   samza-test/src/test/scala/org/apache/samza/test/integration/StreamTaskTestUtil.scala c6e994ff707af802ded57c3bc1762971892014da 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala ce88698c12c4bf6f4cf128f92d60b0b9496997d7 
>   settings.gradle a8d2c885254ca3994327fda18e09c49bc9c5e830 
> 
> Diff: https://reviews.apache.org/r/36006/diff/
> 
> 
> Testing
> -------
> 
> Tested with hello samza and works properly.
> 
> 
> Thanks,
> 
> Shadi A. Noghabi
> 
>


Re: Review Request 36006: SAMZA-724: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by "Shadi A. Noghabi" <ab...@illinois.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/
-----------------------------------------------------------

(Updated Aug. 14, 2015, 1:25 a.m.)


Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.


Changes
-------

Fixed some java doc nits.


Repository: samza


Description
-------

This RBis for SAMZA-724.

After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 

This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 

The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >

The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:


<path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>


Diffs (updated)
-----

  build.gradle a935088eccb3aee4fbde21275fa8e701c215a69e 
  checkstyle/import-control.xml 24ed680785175f3cdf955602b7a813684edd813e 
  gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
  samza-autoscaling/src/main/java/org/apache/samza/autoscaling/deployer/ConfigManager.java PRE-CREATION 
  samza-autoscaling/src/main/java/org/apache/samza/autoscaling/utils/YarnUtil.java PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java 2277a732b9ab763edf19a0fbec288ff72b27583b 
  samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala d7c928c7401e539a370d4e82276e7dabbce1b638 
  samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
  samza-test/src/test/scala/org/apache/samza/test/integration/StreamTaskTestUtil.scala c6e994ff707af802ded57c3bc1762971892014da 
  samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala ce88698c12c4bf6f4cf128f92d60b0b9496997d7 
  settings.gradle a8d2c885254ca3994327fda18e09c49bc9c5e830 

Diff: https://reviews.apache.org/r/36006/diff/


Testing
-------

Tested with hello samza and works properly.


Thanks,

Shadi A. Noghabi


Re: Review Request 36006: SAMZA-724: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by "Shadi A. Noghabi" <ab...@illinois.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/
-----------------------------------------------------------

(Updated Aug. 13, 2015, 11:54 p.m.)


Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.


Repository: samza


Description
-------

This RBis for SAMZA-724.

After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 

This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 

The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >

The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:


<path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>


Diffs (updated)
-----

  build.gradle a935088eccb3aee4fbde21275fa8e701c215a69e 
  checkstyle/import-control.xml 24ed680785175f3cdf955602b7a813684edd813e 
  gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
  samza-autoscaling/src/main/java/org/apache/samza/autoscaling/deployer/ConfigManager.java PRE-CREATION 
  samza-autoscaling/src/main/java/org/apache/samza/autoscaling/utils/YarnUtil.java PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java 2277a732b9ab763edf19a0fbec288ff72b27583b 
  samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala d7c928c7401e539a370d4e82276e7dabbce1b638 
  samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
  samza-test/src/test/scala/org/apache/samza/test/integration/StreamTaskTestUtil.scala c6e994ff707af802ded57c3bc1762971892014da 
  samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala ce88698c12c4bf6f4cf128f92d60b0b9496997d7 
  settings.gradle a8d2c885254ca3994327fda18e09c49bc9c5e830 

Diff: https://reviews.apache.org/r/36006/diff/


Testing
-------

Tested with hello samza and works properly.


Thanks,

Shadi A. Noghabi


Re: Review Request 36006: SAMZA-724: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by Navina Ramesh <nr...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/#review94992
-----------------------------------------------------------



samza-autoscaling/src/main/java/org/apache/samza/autoscaling/deployer/ConfigManager.java (line 98)
<https://reviews.apache.org/r/36006/#comment149763>

    The javadoc does not indicate that the polling Interval is configurable.



samza-autoscaling/src/main/java/org/apache/samza/autoscaling/deployer/ConfigManager.java (line 289)
<https://reviews.apache.org/r/36006/#comment149771>

    I was trying out your patch. 1000 seconds (~16 min) seems a litte too much time to wait. I would expect the overall time for wait (including multiple kill attempts) to be no more than 5 minutes. Can you change this hard-coded sleep time and make it configurable?



samza-autoscaling/src/main/java/org/apache/samza/autoscaling/deployer/ConfigManager.java (line 291)
<https://reviews.apache.org/r/36006/#comment149775>

    You are reading the application state only once outside the while loop in Line 286. So, the state never changes when the check happens in line 291. I think you should attempt to kill more than once and check the state each time.
    
    When I was trying out the patch, it killed the job fine. But it never came back up!



samza-autoscaling/src/main/java/org/apache/samza/autoscaling/utils/YarnUtil.java (line 126)
<https://reviews.apache.org/r/36006/#comment149741>

    Can you add a short description for these exceptions here to get rid of these warning?
    
    samza-autoscaling/src/main/java/org/apache/samza/autoscaling/deployer/ConfigManager.java:355: warning: no @param for args
      public static void main(String[] args) {
                         ^
    samza-autoscaling/src/main/java/org/apache/samza/autoscaling/utils/YarnUtil.java:126: warning: no description for @throws
       * @throws IOException
         ^
    samza-autoscaling/src/main/java/org/apache/samza/autoscaling/utils/YarnUtil.java:127: warning: no description for @throws
       * @throws YarnException
         ^


- Navina Ramesh


On Aug. 11, 2015, 6:38 p.m., Shadi A. Noghabi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36006/
> -----------------------------------------------------------
> 
> (Updated Aug. 11, 2015, 6:38 p.m.)
> 
> 
> Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> This RBis for SAMZA-724.
> 
> After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 
> 
> This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 
> 
> The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
> 1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
> 2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >
> 
> The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:
> 
> 
> <path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>
> 
> 
> Diffs
> -----
> 
>   build.gradle a935088eccb3aee4fbde21275fa8e701c215a69e 
>   checkstyle/import-control.xml 24ed680785175f3cdf955602b7a813684edd813e 
>   gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
>   samza-autoscaling/src/main/java/org/apache/samza/autoscaling/deployer/ConfigManager.java PRE-CREATION 
>   samza-autoscaling/src/main/java/org/apache/samza/autoscaling/utils/YarnUtil.java PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java 2277a732b9ab763edf19a0fbec288ff72b27583b 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala d7c928c7401e539a370d4e82276e7dabbce1b638 
>   samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
>   samza-test/src/test/scala/org/apache/samza/test/integration/StreamTaskTestUtil.scala c6e994ff707af802ded57c3bc1762971892014da 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala ce88698c12c4bf6f4cf128f92d60b0b9496997d7 
>   settings.gradle a8d2c885254ca3994327fda18e09c49bc9c5e830 
> 
> Diff: https://reviews.apache.org/r/36006/diff/
> 
> 
> Testing
> -------
> 
> Tested with hello samza and works properly.
> 
> 
> Thanks,
> 
> Shadi A. Noghabi
> 
>


Re: Review Request 36006: SAMZA-724: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by "Shadi A. Noghabi" <ab...@illinois.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/
-----------------------------------------------------------

(Updated Aug. 11, 2015, 6:38 p.m.)


Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.


Repository: samza


Description
-------

This RBis for SAMZA-724.

After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 

This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 

The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >

The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:


<path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>


Diffs (updated)
-----

  build.gradle a935088eccb3aee4fbde21275fa8e701c215a69e 
  checkstyle/import-control.xml 24ed680785175f3cdf955602b7a813684edd813e 
  gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
  samza-autoscaling/src/main/java/org/apache/samza/autoscaling/deployer/ConfigManager.java PRE-CREATION 
  samza-autoscaling/src/main/java/org/apache/samza/autoscaling/utils/YarnUtil.java PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java 2277a732b9ab763edf19a0fbec288ff72b27583b 
  samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala d7c928c7401e539a370d4e82276e7dabbce1b638 
  samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
  samza-test/src/test/scala/org/apache/samza/test/integration/StreamTaskTestUtil.scala c6e994ff707af802ded57c3bc1762971892014da 
  samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala ce88698c12c4bf6f4cf128f92d60b0b9496997d7 
  settings.gradle a8d2c885254ca3994327fda18e09c49bc9c5e830 

Diff: https://reviews.apache.org/r/36006/diff/


Testing
-------

Tested with hello samza and works properly.


Thanks,

Shadi A. Noghabi


Re: Review Request 36006: SAMZA-724: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by "Shadi A. Noghabi" <ab...@illinois.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/
-----------------------------------------------------------

(Updated Aug. 10, 2015, 6:35 p.m.)


Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.


Repository: samza


Description
-------

This RBis for SAMZA-724.

After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 

This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 

The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >

The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:


<path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>


Diffs (updated)
-----

  build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
  checkstyle/import-control.xml 24ed680785175f3cdf955602b7a813684edd813e 
  gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
  samza-autoscaling/src/main/java/org/apache/samza/autoscaling/deployer/ConfigManager.java PRE-CREATION 
  samza-autoscaling/src/main/java/org/apache/samza/autoscaling/utils/YarnUtil.java PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java 2277a732b9ab763edf19a0fbec288ff72b27583b 
  samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala d7c928c7401e539a370d4e82276e7dabbce1b638 
  samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
  samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala ea702a919348305ff95ce0b4ca1996a13aff04ec 
  samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala ce88698c12c4bf6f4cf128f92d60b0b9496997d7 
  settings.gradle 19bff971ad221084dac10d3f7f3facfa42b829a7 

Diff: https://reviews.apache.org/r/36006/diff/


Testing
-------

Tested with hello samza and works properly.


Thanks,

Shadi A. Noghabi


Re: Review Request 36006: SAMZA-724: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by "Shadi A. Noghabi" <ab...@illinois.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/
-----------------------------------------------------------

(Updated Aug. 8, 2015, 6:19 p.m.)


Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.


Summary (updated)
-----------------

SAMZA-724: Writing a tool to read from the coordinator stream and react to config changes accordingly. 


Repository: samza


Description (updated)
-------

This RBis for SAMZA-724.

After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 

This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 

The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >

The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:


<path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>


Diffs
-----

  build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
  checkstyle/import-control.xml 24ed680785175f3cdf955602b7a813684edd813e 
  gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
  samza-autoscaling/src/main/java/org/apache/samza/autoscaling/deployer/ConfigManager.java PRE-CREATION 
  samza-autoscaling/src/main/java/org/apache/samza/autoscaling/utils/YarnUtil.java PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java 2277a732b9ab763edf19a0fbec288ff72b27583b 
  samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala d7c928c7401e539a370d4e82276e7dabbce1b638 
  samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
  samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala ea702a919348305ff95ce0b4ca1996a13aff04ec 
  samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala ce88698c12c4bf6f4cf128f92d60b0b9496997d7 
  settings.gradle 19bff971ad221084dac10d3f7f3facfa42b829a7 

Diff: https://reviews.apache.org/r/36006/diff/


Testing
-------

Tested with hello samza and works properly.


Thanks,

Shadi A. Noghabi


Re: Review Request 36006: SAMZA-704: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by "Shadi A. Noghabi" <ab...@illinois.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/
-----------------------------------------------------------

(Updated Aug. 8, 2015, 6:12 p.m.)


Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.


Repository: samza


Description
-------

After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 

This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 

The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >

The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:


<path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>


Diffs (updated)
-----

  build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
  checkstyle/import-control.xml 24ed680785175f3cdf955602b7a813684edd813e 
  gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
  samza-autoscaling/src/main/java/org/apache/samza/autoscaling/deployer/ConfigManager.java PRE-CREATION 
  samza-autoscaling/src/main/java/org/apache/samza/autoscaling/utils/YarnUtil.java PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java 2277a732b9ab763edf19a0fbec288ff72b27583b 
  samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala d7c928c7401e539a370d4e82276e7dabbce1b638 
  samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
  samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala ea702a919348305ff95ce0b4ca1996a13aff04ec 
  samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala ce88698c12c4bf6f4cf128f92d60b0b9496997d7 
  settings.gradle 19bff971ad221084dac10d3f7f3facfa42b829a7 

Diff: https://reviews.apache.org/r/36006/diff/


Testing
-------

Tested with hello samza and works properly.


Thanks,

Shadi A. Noghabi


Re: Review Request 36006: SAMZA-704: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by "Shadi A. Noghabi" <ab...@illinois.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/
-----------------------------------------------------------

(Updated Aug. 8, 2015, 1:26 a.m.)


Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.


Summary (updated)
-----------------

SAMZA-704: Writing a tool to read from the coordinator stream and react to config changes accordingly. 


Repository: samza


Description
-------

After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 

This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 

The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >

The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:


<path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>


Diffs (updated)
-----

  build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
  checkstyle/import-control.xml 24ed680785175f3cdf955602b7a813684edd813e 
  gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
  samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java PRE-CREATION 
  samza-autoscaling/src/main/java/org/apache/samza/autoScaling/utils/YarnUtil.java PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java 2277a732b9ab763edf19a0fbec288ff72b27583b 
  samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala d7c928c7401e539a370d4e82276e7dabbce1b638 
  samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
  samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala ea702a919348305ff95ce0b4ca1996a13aff04ec 
  samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala ce88698c12c4bf6f4cf128f92d60b0b9496997d7 
  settings.gradle 19bff971ad221084dac10d3f7f3facfa42b829a7 

Diff: https://reviews.apache.org/r/36006/diff/


Testing
-------

Tested with hello samza and works properly.


Thanks,

Shadi A. Noghabi


Re: Review Request 36006: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by "Shadi A. Noghabi" <ab...@illinois.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/
-----------------------------------------------------------

(Updated July 28, 2015, 2:39 a.m.)


Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.


Repository: samza


Description
-------

After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 

This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 

The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >

The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:


<path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>


Diffs
-----

  build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
  checkstyle/import-control.xml 6654319392929857bb861d77763afd8a5ea7674c 
  gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
  samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java PRE-CREATION 
  samza-autoscaling/src/main/java/org/apache/samza/autoScaling/utils/YarnUtil.java PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java b1078bdf7bddd16c9ccc6559b9efd40ca5ae67bc 
  samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 1c178a661e449c6bdfc4ce431aef9bb2d261a6c2 
  samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
  samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala ea702a919348305ff95ce0b4ca1996a13aff04ec 
  samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala ce88698c12c4bf6f4cf128f92d60b0b9496997d7 
  settings.gradle 19bff971ad221084dac10d3f7f3facfa42b829a7 

Diff: https://reviews.apache.org/r/36006/diff/


Testing (updated)
-------

Tested with hello samza and works properly.


Thanks,

Shadi A. Noghabi


Re: Review Request 36006: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by "Shadi A. Noghabi" <ab...@illinois.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/
-----------------------------------------------------------

(Updated July 28, 2015, 2:22 a.m.)


Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.


Repository: samza


Description
-------

After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 

This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 

The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >

The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:


<path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>


Diffs (updated)
-----

  build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
  checkstyle/import-control.xml 6654319392929857bb861d77763afd8a5ea7674c 
  gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
  samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java PRE-CREATION 
  samza-autoscaling/src/main/java/org/apache/samza/autoScaling/utils/YarnUtil.java PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java b1078bdf7bddd16c9ccc6559b9efd40ca5ae67bc 
  samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 1c178a661e449c6bdfc4ce431aef9bb2d261a6c2 
  samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
  samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala ea702a919348305ff95ce0b4ca1996a13aff04ec 
  samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala ce88698c12c4bf6f4cf128f92d60b0b9496997d7 
  settings.gradle 19bff971ad221084dac10d3f7f3facfa42b829a7 

Diff: https://reviews.apache.org/r/36006/diff/


Testing
-------


Thanks,

Shadi A. Noghabi


Re: Review Request 36006: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by "Shadi A. Noghabi" <ab...@illinois.edu>.

> On July 22, 2015, 6:25 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMaster.scala, line 130
> > <https://reviews.apache.org/r/36006/diff/2/?file=1015156#file1015156line130>
> >
> >     Question: can we move this code into the JobCoordinator.start()? I don't see any reason why this has to be outside JobCoordinator?

It cannot be moved to there since the server url is not set yet in JobCoordinator.start(). I have moved this to SamzaAppMasterService.onInit()


On July 22, 2015, 6:25 p.m., Shadi A. Noghabi wrote:
> > Some high-level comments: I don't see a reason that these tools classes have to be in samza-core. Can we move it to a separate module, like samza-man? That will also allow us to keep the minimum dependencies in samza-core. I will continue w/ the implementation of ConfigManager.

I moved the config manager to a separate module. What do you mean by "I will continue w/ the implementation of ConfigManager" ?


- Shadi


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/#review92616
-----------------------------------------------------------


On July 28, 2015, 2:39 a.m., Shadi A. Noghabi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36006/
> -----------------------------------------------------------
> 
> (Updated July 28, 2015, 2:39 a.m.)
> 
> 
> Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 
> 
> This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 
> 
> The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
> 1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
> 2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >
> 
> The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:
> 
> 
> <path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>
> 
> 
> Diffs
> -----
> 
>   build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
>   checkstyle/import-control.xml 6654319392929857bb861d77763afd8a5ea7674c 
>   gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
>   samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java PRE-CREATION 
>   samza-autoscaling/src/main/java/org/apache/samza/autoScaling/utils/YarnUtil.java PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java b1078bdf7bddd16c9ccc6559b9efd40ca5ae67bc 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 1c178a661e449c6bdfc4ce431aef9bb2d261a6c2 
>   samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
>   samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala ea702a919348305ff95ce0b4ca1996a13aff04ec 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala ce88698c12c4bf6f4cf128f92d60b0b9496997d7 
>   settings.gradle 19bff971ad221084dac10d3f7f3facfa42b829a7 
> 
> Diff: https://reviews.apache.org/r/36006/diff/
> 
> 
> Testing
> -------
> 
> Tested with hello samza and works properly.
> 
> 
> Thanks,
> 
> Shadi A. Noghabi
> 
>


Re: Review Request 36006: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by "Yi Pan (Data Infrastructure)" <yi...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/#review92616
-----------------------------------------------------------



build.gradle (line 150)
<https://reviews.apache.org/r/36006/#comment146824>

    We already are including jackson libraries as dependencies. What are additional functions that must require gson lib here? Generally, we want to be stingy in expanding out dependencies unless absolutely necessary.



build.gradle (line 151)
<https://reviews.apache.org/r/36006/#comment146825>

    Same question here. We want to make samza-core depends on less externals. I start to think that it might be reasonable to move the tools to a separate module like samza-man?



samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala (line 52)
<https://reviews.apache.org/r/36006/#comment146826>

    And how does the JobRunner knows whether it is starting the job the first time, if the first time is defined across job restarts?



samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMaster.scala (line 127)
<https://reviews.apache.org/r/36006/#comment146828>

    Question: can we move this code into the JobCoordinator.start()? I don't see any reason why this has to be outside JobCoordinator?


Some high-level comments: I don't see a reason that these tools classes have to be in samza-core. Can we move it to a separate module, like samza-man? That will also allow us to keep the minimum dependencies in samza-core. I will continue w/ the implementation of ConfigManager.

- Yi Pan (Data Infrastructure)


On July 18, 2015, 2:52 a.m., Shadi A. Noghabi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36006/
> -----------------------------------------------------------
> 
> (Updated July 18, 2015, 2:52 a.m.)
> 
> 
> Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 
> 
> This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 
> 
> The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
> 1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
> 2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >
> 
> The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:
> 
> 
> <path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>
> 
> 
> Diffs
> -----
> 
>   build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
>   checkstyle/import-control.xml 6654319392929857bb861d77763afd8a5ea7674c 
>   gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
>   samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java b1078bdf7bddd16c9ccc6559b9efd40ca5ae67bc 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 1c178a661e449c6bdfc4ce431aef9bb2d261a6c2 
>   samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
>   samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala ea702a919348305ff95ce0b4ca1996a13aff04ec 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMaster.scala af42c6a6636953a95f79837fe372e0dbd735df70 
>   samza-yarn/src/test/scala/org/apache/samza/job/yarn/TestSamzaAppMaster.scala 7b7d86a43c69e72c47eaa91f68be24e0f4022891 
> 
> Diff: https://reviews.apache.org/r/36006/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Shadi A. Noghabi
> 
>


Re: Review Request 36006: Writing a tool to read from the coordinator stream and react to config changes accordingly.

Posted by "Shadi A. Noghabi" <ab...@illinois.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/
-----------------------------------------------------------

(Updated July 18, 2015, 2:52 a.m.)


Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram.


Repository: samza


Description (updated)
-------

After a job is submitted, it might need some configuration change, specifically it might need more containers. In SAMZA-704 a tool is being added to write to the coordinator stream (CoordinatorStreamWriter).  This tool can be used to write new configurations to the coordinator stream. However, another tool (ConfigManager) is needed to read the config changes and react to them, which is the goal of this task. This tool should be brought up after the job is submitted and read any config changes added to the coordinator stream, and react to each accordingly. 

This tool, called the Config Manager, is focusing on handling container changs by reacting to set-config massages with key "yarn.container.count". 

The config manager is a separate standa alone module, that should be brought up separately after the submission of a job. Therefore, you have to add two configurations to the input config file:
1. yarn.rm.address= <ip of resource manager in yarn. ex: localhost >
2. yarn.rm.port= <the port of the resource manager http server. ex: 8088 >

The config manger will periodically poll the coordinator stream to see if there are any new messages. This period is set to 100 ms by deafualt. However, it can be configured by adding configManager.polling.interval=<polling interval> to the input config file. Thus, overal the command to run the config manager along with the job would be:


<path to samza deployment>/bin/run-config-manager.sh --config-factory=<config factory> --config-path=<path to config file of a job>


Diffs (updated)
-----

  build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
  checkstyle/import-control.xml 6654319392929857bb861d77763afd8a5ea7674c 
  gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
  samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java b1078bdf7bddd16c9ccc6559b9efd40ca5ae67bc 
  samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 1c178a661e449c6bdfc4ce431aef9bb2d261a6c2 
  samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
  samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala ea702a919348305ff95ce0b4ca1996a13aff04ec 
  samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMaster.scala af42c6a6636953a95f79837fe372e0dbd735df70 
  samza-yarn/src/test/scala/org/apache/samza/job/yarn/TestSamzaAppMaster.scala 7b7d86a43c69e72c47eaa91f68be24e0f4022891 

Diff: https://reviews.apache.org/r/36006/diff/


Testing
-------


Thanks,

Shadi A. Noghabi