You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@samza.apache.org by mikel roah <mi...@outlook.com> on 2015/02/06 14:00:31 UTC

Message delivery

Hello,I'm working on a project with Samza. The system receives streams of messages and if a single message matches a set of keywords, it performs an action on it (i.e. deliver it outside the system or update the internal state). There is one job that performs the keyword matching and it should scale in 2 ways:with the number of eventswith the number of keywordsThe first point is achieved by controlling the number of partitions and containers. Instead the second one by splitting the set of keywords over different tasks that run in containers like this:
              This design would allow to handle messages and split the matching job over different tasks. How hard is to deliver the message to task 1 on partition X and to task 4 on partition Y?ThanksMikel 		 	   		  

Re: Message delivery

Posted by Chris Riccomini <cr...@apache.org>.
Hey Mikel,

Another thing that you can do is store your keywords in a remote store, and
query them from the job. This comes with some operational overhead, but
some people put an in-memory cache in the job to reduce RPC calls.

Cheers,
Chris

On Fri, Feb 6, 2015 at 8:41 AM, Chris Riccomini <cr...@apache.org>
wrote:

> Hey Mikel,
>
> This use case has been discussed in some detail here:
>
>   https://issues.apache.org/jira/browse/SAMZA-353
>
> Samza currently doesn't allow a single partition to be consumed by more
> than one task. You can, however, send the same message to multiple
> partitions. Within Samza, this can be achieved using this
> OutgoingMessageEnvelope constructor:
>
>   OutgoingMessageEnvelope(SystemStream systemStream, Object partitionKey,
> Object key, Object message)
>
> If you specify the partitionKey to be the partition #, you can send the
> same message to partition X and partition Y (in your example). Obviously,
> this isn't ideal (you have to write the same message N times), but it
> should work.
>
> Martin Kleppmann had a very similar use case, which he describes here:
>
>
> https://issues.apache.org/jira/browse/SAMZA-353?focusedCommentId=14205216&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14205216
>
> Cheers,
> Chris
>
> On Fri, Feb 6, 2015 at 5:00 AM, mikel roah <mi...@outlook.com> wrote:
>
>> Hello,
>>
>> I'm working on a project with Samza. The system receives streams of
>> messages and if a single message matches a set of keywords, it performs an
>> action on it (i.e. deliver it outside the system or update the internal
>> state). There is one job that performs the keyword matching and it should
>> scale in 2 ways:
>>
>>    - with the number of events
>>    - with the number of keywords
>>
>>
>> The first point is achieved by controlling the number of partitions and
>> containers. Instead the second one by splitting the set of keywords over
>> different tasks that run in containers like this:
>>
>>
>>
>>
>>
>> This design would allow to handle messages and split the matching job
>> over different tasks. How hard is to deliver the message to task 1 on
>> partition X and to task 4 on partition Y?
>>
>>
>> Thanks
>>
>>
>> Mikel
>>
>
>

Re: Message delivery

Posted by Chris Riccomini <cr...@apache.org>.
Hey Mikel,

This use case has been discussed in some detail here:

  https://issues.apache.org/jira/browse/SAMZA-353

Samza currently doesn't allow a single partition to be consumed by more
than one task. You can, however, send the same message to multiple
partitions. Within Samza, this can be achieved using this
OutgoingMessageEnvelope constructor:

  OutgoingMessageEnvelope(SystemStream systemStream, Object partitionKey,
Object key, Object message)

If you specify the partitionKey to be the partition #, you can send the
same message to partition X and partition Y (in your example). Obviously,
this isn't ideal (you have to write the same message N times), but it
should work.

Martin Kleppmann had a very similar use case, which he describes here:


https://issues.apache.org/jira/browse/SAMZA-353?focusedCommentId=14205216&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14205216

Cheers,
Chris

On Fri, Feb 6, 2015 at 5:00 AM, mikel roah <mi...@outlook.com> wrote:

> Hello,
>
> I'm working on a project with Samza. The system receives streams of
> messages and if a single message matches a set of keywords, it performs an
> action on it (i.e. deliver it outside the system or update the internal
> state). There is one job that performs the keyword matching and it should
> scale in 2 ways:
>
>    - with the number of events
>    - with the number of keywords
>
>
> The first point is achieved by controlling the number of partitions and
> containers. Instead the second one by splitting the set of keywords over
> different tasks that run in containers like this:
>
>
>
>
>
> This design would allow to handle messages and split the matching job over
> different tasks. How hard is to deliver the message to task 1 on
> partition X and to task 4 on partition Y?
>
>
> Thanks
>
>
> Mikel
>