Posted to commits@samza.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/01/11 03:39:00 UTC
[jira] [Commented] (SAMZA-2058) Integrate the input partition expansion aware SystemStreamGrouper changes to JobModel generation flow.

    [ https://issues.apache.org/jira/browse/SAMZA-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739993#comment-16739993 ] 

ASF GitHub Bot commented on SAMZA-2058:
---------------------------------------

GitHub user shanthoosh opened a pull request:

    https://github.com/apache/samza/pull/874

    SAMZA-2058: Integrate the input partition expansion aware SystemStreamGrouper to JobModel generation flow.

    SAMZA-1989 added a partition-expansion-aware SystemStreamPartitionGrouper to Samza. This PR integrates that grouper into Samza's job model generation workflow
    and makes it work for both the YARN and standalone deployment models.
    
    **Changes:** 
    
    1. Added TaskPartitionAssignmentManager, which stores the task-to-partition assignments present in the JobModel in the underlying metadata store. This is essential for persisting the task-to-SystemStreamPartition assignments from the previous run of a Samza job. Currently samza-yarn stores the metadata for an execution of a job in the coordinator stream. The maximum supported Kafka message size within LI is 1 MB. This limitation drove the decision to denormalize the task-to-SystemStreamPartition map into individual messages and store them in the coordinator stream.
    2. Used the existing coordinator stream JSON serde to serialize the task-to-partition assignments to raw bytes before writing them to the coordinator stream, and to deserialize them after reading.
    3. Changes in JobModelManager to integrate the input partition expansion aware SSPGrouper changes.
    4. Cleaned up code and JavaDoc in the MetadataStore utility classes.
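
    The denormalization described in item 1 can be sketched roughly as follows. This is only an illustration, not the code in this PR: the class name AssignmentDenormalizer, its methods, and the string keys are hypothetical, and it assumes each partition is assigned to exactly one task. The point is that every record stays small regardless of how many partitions the job has, so no single message approaches the size limit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Samza: shows the idea of splitting one
// large task -> partitions map into many small per-partition records.
public class AssignmentDenormalizer {

    // Produces one (partition -> task) record per assigned partition.
    // Assumption: a partition belongs to exactly one task.
    public static Map<String, String> denormalize(Map<String, List<String>> taskToPartitions) {
        Map<String, String> records = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : taskToPartitions.entrySet()) {
            for (String partition : entry.getValue()) {
                records.put(partition, entry.getKey());
            }
        }
        return records;
    }

    // Rebuilds the task -> partitions map from the individual records,
    // as a reader of the stored records would do on the next run.
    public static Map<String, List<String>> rebuild(Map<String, String> records) {
        Map<String, List<String>> taskToPartitions = new LinkedHashMap<>();
        for (Map.Entry<String, String> record : records.entrySet()) {
            taskToPartitions
                .computeIfAbsent(record.getValue(), task -> new ArrayList<>())
                .add(record.getKey());
        }
        return taskToPartitions;
    }
}
```

    In a real store each record would be serialized separately (for example as JSON bytes) and written as its own message, which is what keeps the per-message size bounded.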
    
    **Testing**:
    
    1. Added unit tests for all the newly added classes and updated the existing unit tests affected by the changes.
    2. Standalone: Wrote a few integration tests in TestZkLocalApplicationRunner to test input stream partition expansion.
    3. YARN: Tested this patch with a sample stream-to-table join high-level job from samza-hello-samza. The relevant logs are here: https://gist.github.com/shanthoosh/07357bb615d9cbbfa23cc02b98c9d142. They verify that the AM is restarted when the partitions of an input stream are expanded and that the correct task-to-partition assignments are generated.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shanthoosh/samza SEP-5_left-over

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/samza/pull/874.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #874
    
----
commit 7bedd46ba98ebb18bdcbf6e3feace7188ac9af20
Author: Shanthoosh Venkataraman <sp...@...>
Date:   2018-12-07T02:04:18Z

    Initial commit.

----


> Integrate the input partition expansion aware SystemStreamGrouper changes to JobModel generation flow.
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SAMZA-2058
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2058
>             Project: Samza
>          Issue Type: New Feature
>            Reporter: Shanthoosh Venkataraman
>            Assignee: Shanthoosh Venkataraman
>            Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)