Posted to commits@samza.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/12/05 08:08:00 UTC

[jira] [Commented] (SAMZA-1835) Consolidate all processorId generation code to StreamProcessor

    [ https://issues.apache.org/jira/browse/SAMZA-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709715#comment-16709715 ] 

ASF GitHub Bot commented on SAMZA-1835:
---------------------------------------

GitHub user shanthoosh opened a pull request:

    https://github.com/apache/samza/pull/844

    SAMZA-1835: Consolidate all processorId generation code.

    Currently, the processorId creation function `createProcessorId()` is repeated in three different implementations of `JobCoordinator`, viz. `ZkJobCoordinator`, `PassthroughJobCoordinator`, and `AzureJobCoordinator`. Here are a few problems that stem from this duplication.
    
    1. `processorId` is passed into the `MetricsReporterFactory` through the factory create method: `MetricsReporter getMetricsReporter(String name, String processorId, Config config);`. Custom `MetricsReporter` implementations currently use the processorId as a component of the generated metric names. Metrics reporters are instantiated from `LocalApplicationRunner`, and `processorId` is currently passed in as null to `MetricsReporterFactory.getMetricsReporter`. This corrupts the generated metric names.
    2. `ZkJobCoordinator`, `ZkUtils`, `ZkLeaderElector`, and other downstream components of `LocalApplicationRunner` currently instantiate and manage their own private reporters rather than sharing the common `MetricsRegistry` managed by `LocalApplicationRunner`. Since the reported metrics share no common namespace or reporter, generating metrics dashboards for standalone is a hassle.
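
The first problem above can be illustrated with a minimal sketch. The class `MetricNameSketch` and its `<reporterName>.<processorId>.<metric>` naming scheme are hypothetical and are not taken from Samza; the sketch only assumes that a custom reporter concatenates the processorId into the metric name, in which case a null identifier ends up as the literal text "null".

```java
// Hypothetical reporter-side naming helper; not part of Samza.
final class MetricNameSketch {
    // Assumed scheme: "<reporterName>.<processorId>.<metric>".
    static String metricName(String reporterName, String processorId, String metric) {
        // Java string concatenation renders a null reference as "null".
        return reporterName + "." + processorId + "." + metric;
    }

    public static void main(String[] args) {
        // processorId passed as null, as described in item 1.
        System.out.println(metricName("jmx", null, "process-calls"));
        // processorId supplied by the caller.
        System.out.println(metricName("jmx", "processor-1", "process-calls"));
    }
}
```

With a null identifier the first name comes out as `jmx.null.process-calls`, so every processor would report under the same, meaningless component.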
    
    This PR is comprised of the following changes:
    
    1. Moved the processorId generation to `LocalApplicationRunner` and injected the generated identifier into all the downstream layers.
    2. Deprecated the `getProcessorId` API in the `JobCoordinator` interface.
    3. Added the `processorId` and `metricsRegistry` arguments to the `getJobCoordinator` method of `JobCoordinatorFactory` t
    4. Fixed the unit tests and added unit tests for `LocalApplicationRunner.createProcessorId`.
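
The changes listed above could be sketched roughly as follows. Every type here (`SketchRegistry`, `SketchCoordinator`, `SketchCoordinatorFactory`, `SketchRunner`) is a simplified, hypothetical stand-in and not Samza's actual API, and the UUID-based identifier is an assumption, since the real generation scheme is not described here. The sketch only shows the shape of the idea: the runner creates the identifier and the registry once and hands both to the factory.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a shared metrics registry; not Samza's MetricsRegistry.
final class SketchRegistry {
    final Map<String, Long> counters = new ConcurrentHashMap<>();

    void increment(String name) {
        counters.merge(name, 1L, Long::sum);
    }
}

// Hypothetical coordinator that receives its identifier instead of generating one.
final class SketchCoordinator {
    final String processorId;
    final SketchRegistry registry;

    SketchCoordinator(String processorId, SketchRegistry registry) {
        this.processorId = processorId;
        this.registry = registry;
    }

    void start() {
        // Metrics share one namespace keyed by the injected processorId.
        registry.increment(processorId + ".coordinator-starts");
    }
}

// Hypothetical factory taking the identifier and the shared registry as arguments.
interface SketchCoordinatorFactory {
    SketchCoordinator getJobCoordinator(String processorId, SketchRegistry registry);
}

// Hypothetical runner that owns identifier generation and the shared registry.
final class SketchRunner {
    final String processorId = createProcessorId();
    final SketchRegistry registry = new SketchRegistry();

    // Assumption: a random UUID stands in for the real generation scheme.
    static String createProcessorId() {
        return UUID.randomUUID().toString();
    }

    SketchCoordinator createCoordinator(SketchCoordinatorFactory factory) {
        return factory.getJobCoordinator(processorId, registry);
    }

    public static void main(String[] args) {
        SketchRunner runner = new SketchRunner();
        SketchCoordinator coordinator = runner.createCoordinator(SketchCoordinator::new);
        coordinator.start();
        System.out.println(runner.registry.counters.keySet());
    }
}
```

Because the identifier is created in one place, the coordinator and any reporter built by the runner see the same value, and no coordinator implementation needs its own copy of the generation code.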


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shanthoosh/samza SAMZA-1835

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/samza/pull/844.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #844
    
----
commit 6afe2b27c595b2870cc979f2f48c0af47b0fde84
Author: Shanthoosh Venkataraman <sp...@...>
Date:   2018-11-30T20:11:26Z

    SAMZA-1835: Consolidate all processorId generation code.

----


> Consolidate all processorId generation code to StreamProcessor
> --------------------------------------------------------------
>
>                 Key: SAMZA-1835
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1835
>             Project: Samza
>          Issue Type: Improvement
>            Reporter: Yi Pan (Data Infrastructure)
>            Assignee: Sanil Jain
>            Priority: Major
>             Fix For: 1.0
>
>
> Currently, the processorId creation function createProcessorId() is repeated in three different implementations of JobCoordinator: ZkJobCoordinator, PassthroughJobCoordinator, and AzureJobCoordinator.
> Making processorId generation depend on JobCoordinator is no longer required: each processor should know its durable processorId at startup, independent of JobCoordinator creation. Hence, consolidating the createProcessorId() code into StreamProcessor and passing the processorId to the JobCoordinator constructor should be the right thing to do.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)