You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@samza.apache.org by "Balusamy, Elangovan" <El...@altisource.com> on 2015/09/02 09:19:24 UTC

One task sending payload to multiple output streams

Folks,

We are running a multi-node Samza cluster with multiple partitions for each task. In one of the tasks, we would like to send output to two different tasks and the payload also is different. Below is the code that does it

messageCollector.send(new OutgoingMessageEnvelope(output_stream_1, "this is for you"));

messageCollector.send(new OutgoingMessageEnvelope(output_stream_2, "this is for the other guy"));

output_stream_1 has 4 partitions
output_stream_2 has 2 partitions

We see that  only 50% of the  partitions are being used, the other 50% doesn't get any messages.

output_stream_1 has messages only in 2 partitions and output_stream_2 has messages only in 1 partition.



Samza version: 0.9.0


Kafka Version:  0.8.2.1


Thanks
Elango
***********************************************************************************************************************

This email message and any attachments are intended solely for the use of the addressee. If you are not the intended recipient, you are prohibited from reading, disclosing, reproducing, distributing, disseminating or otherwise using this transmission. If you have received this message in error, please promptly notify the sender by reply email and immediately delete this message from your system. This message and any attachments may contain information that is confidential, privileged or exempt from disclosure. Delivery of this message to any person other than the intended recipient is not intended to waive any right or privilege. Message transmission is not guaranteed to be secure or free of software viruses.
***********************************************************************************************************************

Re: One task sending payload to multiple output streams

Posted by Yan Fang <ya...@gmail.com>.
Hi Elangovan,

I think not providing the partition key is the reason. Can you try to put
the partition key, such as 0,1,2,4 to see how it works? This uses the
default partition class. For better control, you may provide your own
partition class. This is to demonstrate the cause of your problem.

Thanks,

Fang, Yan
yanfang724@gmail.com

On Wed, Sep 2, 2015 at 10:27 AM, Balusamy, Elangovan <
Elangovan.Balusamy@altisource.com> wrote:

> We have one container for each partition, we are not providing any
> partition key.
>
> messageCollector.send(new OutgoingMessageEnvelope(output_stream_1, "this
> is for you"));
> messageCollector.send(new OutgoingMessageEnvelope(output_stream_2,  "this
> is for the other guy"));
>
> this was an example, the actual message payload is a HashMap serialized
> into bytes.
>
>   ByteArrayOutputStream b = new ByteArrayOutputStream();
>   ObjectOutputStream o = new ObjectOutputStream(b);
>   o.writeObject(map);
>   bytes = b.toByteArray();
>
> -----Original Message-----
> From: Yi Pan [mailto:nickpan47@gmail.com]
> Sent: Wednesday, September 02, 2015 10:31 PM
> To: dev@samza.apache.org
> Subject: Re: One task sending payload to multiple output streams
>
> Hi, Elangovan,
>
> Could you confirm how many containers in your job? And how is the outgoing
> messages partitioned on? Most likely, this is related to the choice on the
> outgoing message partition key, which is the only deciding factor for which
> partition of a topic the message is sent to.
>
> -Yi
>
> On Wed, Sep 2, 2015 at 12:58 AM, Balusamy, Elangovan <
> Elangovan.Balusamy@altisource.com> wrote:
>
> > The task consumes from one stream with 2 partitions.
> >
> > -----Original Message-----
> > From: Garry Turkington [mailto:g.turkington@improvedigital.com]
> > Sent: Wednesday, September 02, 2015 1:12 PM
> > To: dev@samza.apache.org
> > Subject: RE: One task sending payload to multiple output streams
> >
> > Hi,
> >
> > How many input streams does this task consume and how are they
> partitioned?
> >
> > Garry
> >
> > -----Original Message-----
> > From: Balusamy, Elangovan [mailto:Elangovan.Balusamy@altisource.com]
> > Sent: 02 September 2015 08:19
> > To: dev@samza.apache.org
> > Cc: Chandra, Saurabh
> > Subject: One task sending payload to multiple output streams
> >
> > Folks,
> >
> > We are running a multi-node Samza cluster with multiple partitions for
> > each task. In one of the tasks, we would like to send output to two
> > different tasks and the payload also is different. Below is the code
> > that does it
> >
> > messageCollector.send(new OutgoingMessageEnvelope(output_stream_1,
> > "this is for you"));
> >
> > messageCollector.send(new OutgoingMessageEnvelope(output_stream_2,
> > "this is for the other guy"));
> >
> > output_stream_1 has 4 partitions
> > output_stream_2 has 2 partitions
> >
> > We see that  only 50% of the  partitions are being used, the other 50%
> > doesn't get any messages.
> >
> > output_stream_1 has messages only in 2 partitions and output_stream_2
> > has messages only in 1 partition.
> >
> >
> >
> > Samza version: 0.9.0
> >
> >
> > Kafka Version:  0.8.2.1
> >
> >
> > Thanks
> > Elango
> >
> > **********************************************************************
> > *************************************************
> >
> > This email message and any attachments are intended solely for the use
> > of the addressee. If you are not the intended recipient, you are
> > prohibited from reading, disclosing, reproducing, distributing,
> > disseminating or otherwise using this transmission. If you have
> > received this message in error, please promptly notify the sender by
> > reply email and immediately delete this message from your system. This
> > message and any attachments may contain information that is
> > confidential, privileged or exempt from disclosure. Delivery of this
> > message to any person other than the intended recipient is not
> > intended to waive any right or privilege. Message transmission is not
> guaranteed to be secure or free of software viruses.
> >
> > **********************************************************************
> > *************************************************
> >
> > -----
> > No virus found in this message.
> > Checked by AVG - www.avg.com
> > Version: 2014.0.4830 / Virus Database: 4365/10512 - Release Date:
> > 08/25/15 Internal Virus Database is out of date.
> >
> > **********************************************************************
> > *************************************************
> >
> > This email message and any attachments are intended solely for the use
> > of the addressee. If you are not the intended recipient, you are
> > prohibited from reading, disclosing, reproducing, distributing,
> > disseminating or otherwise using this transmission. If you have
> > received this message in error, please promptly notify the sender by
> > reply email and immediately delete this message from your system. This
> > message and any attachments may contain information that is
> > confidential, privileged or exempt from disclosure. Delivery of this
> > message to any person other than the intended recipient is not
> > intended to waive any right or privilege. Message transmission is not
> guaranteed to be secure or free of software viruses.
> >
> > **********************************************************************
> > *************************************************
> >
> >
>
> ***********************************************************************************************************************
>
> This email message and any attachments are intended solely for the use of
> the addressee. If you are not the intended recipient, you are prohibited
> from reading, disclosing, reproducing, distributing, disseminating or
> otherwise using this transmission. If you have received this message in
> error, please promptly notify the sender by reply email and immediately
> delete this message from your system. This message and any attachments may
> contain information that is confidential, privileged or exempt from
> disclosure. Delivery of this message to any person other than the intended
> recipient is not intended to waive any right or privilege. Message
> transmission is not guaranteed to be secure or free of software viruses.
>
> ***********************************************************************************************************************
>

RE: One task sending payload to multiple output streams

Posted by "Balusamy, Elangovan" <El...@altisource.com>.
We have one container for each partition, we are not providing any partition key.

messageCollector.send(new OutgoingMessageEnvelope(output_stream_1, "this is for you"));
messageCollector.send(new OutgoingMessageEnvelope(output_stream_2,  "this is for the other guy"));

this was an example, the actual message payload is a HashMap serialized into bytes.

  ByteArrayOutputStream b = new ByteArrayOutputStream();
  ObjectOutputStream o = new ObjectOutputStream(b);
  o.writeObject(map);
  bytes = b.toByteArray();

-----Original Message-----
From: Yi Pan [mailto:nickpan47@gmail.com] 
Sent: Wednesday, September 02, 2015 10:31 PM
To: dev@samza.apache.org
Subject: Re: One task sending payload to multiple output streams

Hi, Elangovan,

Could you confirm how many containers in your job? And how is the outgoing messages partitioned on? Most likely, this is related to the choice on the outgoing message partition key, which is the only deciding factor for which partition of a topic the message is sent to.

-Yi

On Wed, Sep 2, 2015 at 12:58 AM, Balusamy, Elangovan < Elangovan.Balusamy@altisource.com> wrote:

> The task consumes from one stream with 2 partitions.
>
> -----Original Message-----
> From: Garry Turkington [mailto:g.turkington@improvedigital.com]
> Sent: Wednesday, September 02, 2015 1:12 PM
> To: dev@samza.apache.org
> Subject: RE: One task sending payload to multiple output streams
>
> Hi,
>
> How many input streams does this task consume and how are they partitioned?
>
> Garry
>
> -----Original Message-----
> From: Balusamy, Elangovan [mailto:Elangovan.Balusamy@altisource.com]
> Sent: 02 September 2015 08:19
> To: dev@samza.apache.org
> Cc: Chandra, Saurabh
> Subject: One task sending payload to multiple output streams
>
> Folks,
>
> We are running a multi-node Samza cluster with multiple partitions for 
> each task. In one of the tasks, we would like to send output to two 
> different tasks and the payload also is different. Below is the code 
> that does it
>
> messageCollector.send(new OutgoingMessageEnvelope(output_stream_1, 
> "this is for you"));
>
> messageCollector.send(new OutgoingMessageEnvelope(output_stream_2, 
> "this is for the other guy"));
>
> output_stream_1 has 4 partitions
> output_stream_2 has 2 partitions
>
> We see that  only 50% of the  partitions are being used, the other 50% 
> doesn't get any messages.
>
> output_stream_1 has messages only in 2 partitions and output_stream_2 
> has messages only in 1 partition.
>
>
>
> Samza version: 0.9.0
>
>
> Kafka Version:  0.8.2.1
>
>
> Thanks
> Elango
>
> **********************************************************************
> *************************************************
>
> This email message and any attachments are intended solely for the use 
> of the addressee. If you are not the intended recipient, you are 
> prohibited from reading, disclosing, reproducing, distributing, 
> disseminating or otherwise using this transmission. If you have 
> received this message in error, please promptly notify the sender by 
> reply email and immediately delete this message from your system. This 
> message and any attachments may contain information that is 
> confidential, privileged or exempt from disclosure. Delivery of this 
> message to any person other than the intended recipient is not 
> intended to waive any right or privilege. Message transmission is not guaranteed to be secure or free of software viruses.
>
> **********************************************************************
> *************************************************
>
> -----
> No virus found in this message.
> Checked by AVG - www.avg.com
> Version: 2014.0.4830 / Virus Database: 4365/10512 - Release Date: 
> 08/25/15 Internal Virus Database is out of date.
>
> **********************************************************************
> *************************************************
>
> This email message and any attachments are intended solely for the use 
> of the addressee. If you are not the intended recipient, you are 
> prohibited from reading, disclosing, reproducing, distributing, 
> disseminating or otherwise using this transmission. If you have 
> received this message in error, please promptly notify the sender by 
> reply email and immediately delete this message from your system. This 
> message and any attachments may contain information that is 
> confidential, privileged or exempt from disclosure. Delivery of this 
> message to any person other than the intended recipient is not 
> intended to waive any right or privilege. Message transmission is not guaranteed to be secure or free of software viruses.
>
> **********************************************************************
> *************************************************
>
>
***********************************************************************************************************************

This email message and any attachments are intended solely for the use of the addressee. If you are not the intended recipient, you are prohibited from reading, disclosing, reproducing, distributing, disseminating or otherwise using this transmission. If you have received this message in error, please promptly notify the sender by reply email and immediately delete this message from your system. This message and any attachments may contain information that is confidential, privileged or exempt from disclosure. Delivery of this message to any person other than the intended recipient is not intended to waive any right or privilege. Message transmission is not guaranteed to be secure or free of software viruses.
***********************************************************************************************************************

Re: One task sending payload to multiple output streams

Posted by Yi Pan <ni...@gmail.com>.
Hi, Elangovan,

Could you confirm how many containers in your job? And how is the outgoing
messages partitioned on? Most likely, this is related to the choice on the
outgoing message partition key, which is the only deciding factor for which
partition of a topic the message is sent to.

-Yi

On Wed, Sep 2, 2015 at 12:58 AM, Balusamy, Elangovan <
Elangovan.Balusamy@altisource.com> wrote:

> The task consumes from one stream with 2 partitions.
>
> -----Original Message-----
> From: Garry Turkington [mailto:g.turkington@improvedigital.com]
> Sent: Wednesday, September 02, 2015 1:12 PM
> To: dev@samza.apache.org
> Subject: RE: One task sending payload to multiple output streams
>
> Hi,
>
> How many input streams does this task consume and how are they partitioned?
>
> Garry
>
> -----Original Message-----
> From: Balusamy, Elangovan [mailto:Elangovan.Balusamy@altisource.com]
> Sent: 02 September 2015 08:19
> To: dev@samza.apache.org
> Cc: Chandra, Saurabh
> Subject: One task sending payload to multiple output streams
>
> Folks,
>
> We are running a multi-node Samza cluster with multiple partitions for
> each task. In one of the tasks, we would like to send output to two
> different tasks and the payload also is different. Below is the code that
> does it
>
> messageCollector.send(new OutgoingMessageEnvelope(output_stream_1, "this
> is for you"));
>
> messageCollector.send(new OutgoingMessageEnvelope(output_stream_2, "this
> is for the other guy"));
>
> output_stream_1 has 4 partitions
> output_stream_2 has 2 partitions
>
> We see that  only 50% of the  partitions are being used, the other 50%
> doesn't get any messages.
>
> output_stream_1 has messages only in 2 partitions and output_stream_2 has
> messages only in 1 partition.
>
>
>
> Samza version: 0.9.0
>
>
> Kafka Version:  0.8.2.1
>
>
> Thanks
> Elango
>
> ***********************************************************************************************************************
>
> This email message and any attachments are intended solely for the use of
> the addressee. If you are not the intended recipient, you are prohibited
> from reading, disclosing, reproducing, distributing, disseminating or
> otherwise using this transmission. If you have received this message in
> error, please promptly notify the sender by reply email and immediately
> delete this message from your system. This message and any attachments may
> contain information that is confidential, privileged or exempt from
> disclosure. Delivery of this message to any person other than the intended
> recipient is not intended to waive any right or privilege. Message
> transmission is not guaranteed to be secure or free of software viruses.
>
> ***********************************************************************************************************************
>
> -----
> No virus found in this message.
> Checked by AVG - www.avg.com
> Version: 2014.0.4830 / Virus Database: 4365/10512 - Release Date: 08/25/15
> Internal Virus Database is out of date.
>
> ***********************************************************************************************************************
>
> This email message and any attachments are intended solely for the use of
> the addressee. If you are not the intended recipient, you are prohibited
> from reading, disclosing, reproducing, distributing, disseminating or
> otherwise using this transmission. If you have received this message in
> error, please promptly notify the sender by reply email and immediately
> delete this message from your system. This message and any attachments may
> contain information that is confidential, privileged or exempt from
> disclosure. Delivery of this message to any person other than the intended
> recipient is not intended to waive any right or privilege. Message
> transmission is not guaranteed to be secure or free of software viruses.
>
> ***********************************************************************************************************************
>
>

RE: One task sending payload to multiple output streams

Posted by "Balusamy, Elangovan" <El...@altisource.com>.
The task consumes from one stream with 2 partitions. 

-----Original Message-----
From: Garry Turkington [mailto:g.turkington@improvedigital.com] 
Sent: Wednesday, September 02, 2015 1:12 PM
To: dev@samza.apache.org
Subject: RE: One task sending payload to multiple output streams

Hi,

How many input streams does this task consume and how are they partitioned?

Garry

-----Original Message-----
From: Balusamy, Elangovan [mailto:Elangovan.Balusamy@altisource.com] 
Sent: 02 September 2015 08:19
To: dev@samza.apache.org
Cc: Chandra, Saurabh
Subject: One task sending payload to multiple output streams

Folks,

We are running a multi-node Samza cluster with multiple partitions for each task. In one of the tasks, we would like to send output to two different tasks and the payload also is different. Below is the code that does it

messageCollector.send(new OutgoingMessageEnvelope(output_stream_1, "this is for you"));

messageCollector.send(new OutgoingMessageEnvelope(output_stream_2, "this is for the other guy"));

output_stream_1 has 4 partitions
output_stream_2 has 2 partitions

We see that  only 50% of the  partitions are being used, the other 50% doesn't get any messages.

output_stream_1 has messages only in 2 partitions and output_stream_2 has messages only in 1 partition.



Samza version: 0.9.0


Kafka Version:  0.8.2.1


Thanks
Elango
***********************************************************************************************************************

This email message and any attachments are intended solely for the use of the addressee. If you are not the intended recipient, you are prohibited from reading, disclosing, reproducing, distributing, disseminating or otherwise using this transmission. If you have received this message in error, please promptly notify the sender by reply email and immediately delete this message from your system. This message and any attachments may contain information that is confidential, privileged or exempt from disclosure. Delivery of this message to any person other than the intended recipient is not intended to waive any right or privilege. Message transmission is not guaranteed to be secure or free of software viruses.
***********************************************************************************************************************

-----
No virus found in this message.
Checked by AVG - www.avg.com
Version: 2014.0.4830 / Virus Database: 4365/10512 - Release Date: 08/25/15 Internal Virus Database is out of date.
***********************************************************************************************************************

This email message and any attachments are intended solely for the use of the addressee. If you are not the intended recipient, you are prohibited from reading, disclosing, reproducing, distributing, disseminating or otherwise using this transmission. If you have received this message in error, please promptly notify the sender by reply email and immediately delete this message from your system. This message and any attachments may contain information that is confidential, privileged or exempt from disclosure. Delivery of this message to any person other than the intended recipient is not intended to waive any right or privilege. Message transmission is not guaranteed to be secure or free of software viruses.
***********************************************************************************************************************


RE: One task sending payload to multiple output streams

Posted by Garry Turkington <g....@improvedigital.com>.
Hi,

How many input streams does this task consume and how are they partitioned?

Garry

-----Original Message-----
From: Balusamy, Elangovan [mailto:Elangovan.Balusamy@altisource.com] 
Sent: 02 September 2015 08:19
To: dev@samza.apache.org
Cc: Chandra, Saurabh
Subject: One task sending payload to multiple output streams

Folks,

We are running a multi-node Samza cluster with multiple partitions for each task. In one of the tasks, we would like to send output to two different tasks and the payload also is different. Below is the code that does it

messageCollector.send(new OutgoingMessageEnvelope(output_stream_1, "this is for you"));

messageCollector.send(new OutgoingMessageEnvelope(output_stream_2, "this is for the other guy"));

output_stream_1 has 4 partitions
output_stream_2 has 2 partitions

We see that  only 50% of the  partitions are being used, the other 50% doesn't get any messages.

output_stream_1 has messages only in 2 partitions and output_stream_2 has messages only in 1 partition.



Samza version: 0.9.0


Kafka Version:  0.8.2.1


Thanks
Elango
***********************************************************************************************************************

This email message and any attachments are intended solely for the use of the addressee. If you are not the intended recipient, you are prohibited from reading, disclosing, reproducing, distributing, disseminating or otherwise using this transmission. If you have received this message in error, please promptly notify the sender by reply email and immediately delete this message from your system. This message and any attachments may contain information that is confidential, privileged or exempt from disclosure. Delivery of this message to any person other than the intended recipient is not intended to waive any right or privilege. Message transmission is not guaranteed to be secure or free of software viruses.
***********************************************************************************************************************

-----
No virus found in this message.
Checked by AVG - www.avg.com
Version: 2014.0.4830 / Virus Database: 4365/10512 - Release Date: 08/25/15 Internal Virus Database is out of date.