Posted to commits@samza.apache.org by "Navina Ramesh (JIRA)" <ji...@apache.org> on 2015/05/14 01:45:00 UTC

[jira] [Commented] (SAMZA-676) Implement Broadcast Stream

    [ https://issues.apache.org/jira/browse/SAMZA-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542925#comment-14542925 ] 

Navina Ramesh commented on SAMZA-676:
-------------------------------------

Hi Yan,
I went through your design document and have a few questions/comments.

1. I think use cases #3 and #4 are very similar. I have seen many instances of #4 come up (regarding bootstrap streams), and those would work well with a global state implementation. For now, the broadcast stream is a convenience feature. I think we have been suggesting workarounds on the mailing lists :) Thanks for picking it up!
2.
|TaskName|Partition 0|Partition 1|Partition 2|
|Stream A|Partition 0|Partition 1|Partition 2|
|Stream B|Partition 0|Partition 1|Partition 2|
|Stream C|Partition 0|Partition 1| |
|*Broadcast Stream*|*Partition 0*|*Partition 0*|*Partition 0*|
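To make the table concrete, here is a minimal sketch of the assignment it shows, assuming one task per partition id and a single-partition broadcast stream. The class and method names are hypothetical (this is not Samza's grouper API), and SSPs are modeled as plain "stream#partition" strings:

```java
import java.util.*;

// Hypothetical sketch, not Samza's API: group regular input partitions by
// partition id (one task per id), then add the broadcast SSP to every task.
class BroadcastAssignment {

    // Returns task name -> sorted set of "stream#partition" strings.
    static Map<String, SortedSet<String>> assign(
            Map<String, Integer> partitionCounts, // regular stream -> partition count
            String broadcastSsp) {                // e.g. "Broadcast#0"
        Map<String, SortedSet<String>> tasks = new TreeMap<>();
        for (Map.Entry<String, Integer> e : partitionCounts.entrySet()) {
            for (int p = 0; p < e.getValue(); p++) {
                // Task "Partition p" gets partition p of every stream that has it.
                tasks.computeIfAbsent("Partition " + p, k -> new TreeSet<>())
                     .add(e.getKey() + "#" + p);
            }
        }
        // The broadcast SSP is shared: every task receives the same partition.
        for (SortedSet<String> ssps : tasks.values()) {
            ssps.add(broadcastSsp);
        }
        return tasks;
    }
}
```

With Stream A and B at 3 partitions and Stream C at 2, task "Partition 2" ends up with A#2, B#2 and the broadcast SSP, matching the empty Stream C cell in the table.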

bq. a. Do all broadcast streams have only one partition?
bq. b. How does this affect the consumer's MessageChooser priority? Does it give the broadcast stream higher priority by default? More generally, how will each task progress at the same rate? We could have hot partitions, and those tasks may not react to the broadcast stream at the same time as the other tasks.
bq. c. Is the broadcast stream also intended for making config changes at the task level? Isn't that functionality that belongs in the JC?
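For (b), a chooser that gives broadcast messages priority might look like the sketch below. It is hypothetical (not Samza's MessageChooser interface) and only orders messages within one container, so by itself it would not keep tasks with hot partitions in step with other tasks:

```java
import java.util.*;

// Hypothetical sketch, not Samza's MessageChooser: pending messages from
// broadcast inputs are always returned before pending regular messages.
class BroadcastFirstChooser {
    private final Set<String> broadcastInputs;                   // SSPs treated as broadcast
    private final Deque<String> broadcastPending = new ArrayDeque<>();
    private final Deque<String> regularPending = new ArrayDeque<>();

    BroadcastFirstChooser(Set<String> broadcastInputs) {
        this.broadcastInputs = broadcastInputs;
    }

    // Buffer a newly fetched message from the given SSP.
    void update(String ssp, String message) {
        (broadcastInputs.contains(ssp) ? broadcastPending : regularPending).addLast(message);
    }

    // Next message to process, or null if nothing is pending.
    String choose() {
        return !broadcastPending.isEmpty() ? broadcastPending.pollFirst()
                                           : regularPending.pollFirst();
    }
}
```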

3.
bq. However, this is the feature we will need for the broadcast stream, because every task will consume the broadcast stream. When two or more tasks are assigned to the same container, their copies of the broadcast stream can have different offsets, so the consumer needs to consume the same stream more than once, with different offsets.
> Can you explain this better?
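My reading of the quoted paragraph, as a minimal sketch: if the consumer tracked the next offset per (task, SSP) pair instead of per SSP, two tasks in the same container could read the same broadcast partition from different positions. This is an assumption about the intent, and all names here are hypothetical:

```java
import java.util.*;

// Hypothetical sketch: offsets keyed by (task, SSP) rather than by SSP alone,
// so tasks sharing one container can each read the broadcast SSP independently.
class PerTaskOffsets {
    // task name -> (SSP -> next offset to read)
    private final Map<String, Map<String, Long>> nextOffset = new HashMap<>();

    void register(String task, String ssp, long startOffset) {
        nextOffset.computeIfAbsent(task, k -> new HashMap<>()).put(ssp, startOffset);
    }

    // Returns the offset this task should read next from the SSP, then advances it.
    long poll(String task, String ssp) {
        Map<String, Long> byssp = nextOffset.get(task);
        if (byssp == null || !byssp.containsKey(ssp)) {
            throw new IllegalStateException("not registered: " + task + " / " + ssp);
        }
        long off = byssp.get(ssp);
        byssp.put(ssp, off + 1);
        return off;
    }
}
```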

4. 
bq. task.global.input=kafka.foo#1,kafka.doo#0
Why is the partition number needed here? Are you suggesting that tasks can consume from only one partition of the broadcast stream?
If I have a broadcast topic with 32 partitions and I want all tasks to consume from all of them, specifying the config will be tedious.
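To illustrate the concern, here is a hypothetical parser for the proposed value format (system.stream#partition). The fallback where an entry without #partition expands to every partition is my assumption of what a less tedious syntax could look like, not part of the design; the partition counts are assumed to come from stream metadata:

```java
import java.util.*;

// Hypothetical parser for values like "kafka.foo#1,kafka.doo#0".
// Assumed extension (not in the design doc): an entry without "#partition"
// expands to all partitions, using partition counts from stream metadata.
class GlobalInputConfig {
    static List<String> parse(String value, Map<String, Integer> partitionCounts) {
        List<String> ssps = new ArrayList<>();
        for (String raw : value.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) continue;
            int hash = entry.indexOf('#');
            if (hash >= 0) {
                // Explicit single partition, as in the proposal.
                String stream = entry.substring(0, hash);
                int partition = Integer.parseInt(entry.substring(hash + 1));
                ssps.add(stream + "#" + partition);
            } else {
                // No partition given: expand to every partition of the stream.
                Integer count = partitionCounts.get(entry);
                if (count == null) {
                    throw new IllegalArgumentException("unknown partition count for " + entry);
                }
                for (int p = 0; p < count; p++) ssps.add(entry + "#" + p);
            }
        }
        return ssps;
    }
}
```

With this fallback, a 32-partition broadcast topic could be written as a single entry instead of 32.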


> Implement Broadcast Stream
> --------------------------
>
>                 Key: SAMZA-676
>                 URL: https://issues.apache.org/jira/browse/SAMZA-676
>             Project: Samza
>          Issue Type: Improvement
>          Components: container
>            Reporter: Yan Fang
>            Assignee: Yan Fang
>         Attachments: BroadcastStreamDesign.md, BroadcastStreamDesign.pdf
>
>
> There is a lot of discussion in SAMZA-353 about assigning the same SSP to multiple taskNames. This ticket covers a subset of that discussion and focuses only on the broadcast stream implementation.
> The goal is to assign one SSP to all the taskNames. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)