Posted to user@storm.apache.org by "Alexander S. Klimov" <al...@microsoft.com> on 2014/01/09 17:37:06 UTC

spout instance id - can be used for partitioning?

Hi guys,

From a given topic in Kafka I can read messages from multiple partitions. Let's say we have 100 partitions.

Can this load be evenly distributed across 10 spouts reading from Kafka? If a spout instance has a persistent id, I can use a hash function to determine which subset of the partitions a given spout instance should read.

Is there a notion of a spout instance id accessible in the Storm Java API? Even a simple number from 1 to 10 would work.

Thanks,
Alex

RE: spout instance id - can be used for partitioning?

Posted by "Alexander S. Klimov" <al...@microsoft.com>.
Thank you!

From: Guillaume Perrot [mailto:gperrot@ubikod.com]
Sent: Thursday, January 9, 2014 8:48 AM
To: user@storm.incubator.apache.org
Subject: Re: spout instance id - can be used for partitioning?

Hi, yes you can do that. DRPCSpout uses the following code in its open(Map, TopologyContext, SpoutOutputCollector) method:

int numTasks = context.getComponentTasks(context.getThisComponentId()).size();  // number of tasks for this spout
int index = context.getThisTaskIndex();                                         // index of this task
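
[Editor's note: a minimal sketch of how these two values could be combined with the modulo/hashing idea from the question. The class name PartitionedKafkaSpout, the totalPartitions field, and the Kafka consumer wiring are illustrative assumptions, not part of Storm or storm-kafka.]

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;

// Sketch only: PartitionedKafkaSpout and totalPartitions are illustrative names.
public class PartitionedKafkaSpout extends BaseRichSpout {

    private final int totalPartitions;        // e.g. 100, known from the Kafka topic
    private List<Integer> myPartitions;

    public PartitionedKafkaSpout(int totalPartitions) {
        this.totalPartitions = totalPartitions;
    }

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        int numTasks = context.getComponentTasks(context.getThisComponentId()).size();
        int index = context.getThisTaskIndex();

        // Modulo assignment: with 100 partitions and 10 tasks, task 0 reads
        // partitions 0, 10, 20, ... and task 9 reads partitions 9, 19, 29, ...
        myPartitions = new ArrayList<Integer>();
        for (int partition = 0; partition < totalPartitions; partition++) {
            if (partition % numTasks == index) {
                myPartitions.add(partition);
            }
        }
        // ... open Kafka consumers for the partitions in myPartitions ...
    }

    @Override
    public void nextTuple() {
        // ... poll the assigned partitions and emit tuples ...
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("message"));
    }
}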

2014/1/9 Alexander S. Klimov <al...@microsoft.com>
Hi guys,

From a given topic in Kafka I can read messages from multiple partitions. Let’s say we have 100 partitions.

Can this load be evenly distributed across 10 spouts reading from Kafka? If a spout instance has a persistent id, I can use a hash function to determine which subset of the partitions a given spout instance should read.

Is there a notion of a spout instance id accessible in the Storm Java API? Even a simple number from 1 to 10 would work.

Thanks,
Alex

Re: spout instance id - can be used for partitioning?

Posted by Guillaume Perrot <gp...@ubikod.com>.
Hi, yes you can do that. DRPCSpout uses the following code in its
open(Map, TopologyContext, SpoutOutputCollector) method:

int numTasks = context.getComponentTasks(context.getThisComponentId()).size();  // number of tasks for this spout
int index = context.getThisTaskIndex();                                         // index of this task
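
[Editor's note: for context, the number of tasks reported by getComponentTasks() is simply the parallelism hint given when the topology is built. A hedged usage sketch, reusing the illustrative PartitionedKafkaSpout from the note above (the class and stream names here are assumptions, not from the thread):]

import backtype.storm.topology.TopologyBuilder;

public class PartitionedTopologyExample {   // illustrative name
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        // 10 parallel spout tasks; with 100 Kafka partitions, the modulo
        // assignment sketched above gives each task 10 partitions to read.
        builder.setSpout("kafka-spout", new PartitionedKafkaSpout(100), 10);
        // ... add bolts and submit the topology as usual ...
    }
}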

2014/1/9 Alexander S. Klimov <al...@microsoft.com>

> Hi guys,
>
> From a given topic in Kafka I can read messages from multiple partitions.
> Let’s say we have 100 partitions.
>
> Can this load be evenly distributed across 10 spouts reading from Kafka?
> If a spout instance has a persistent id, I can use a hash function to
> determine which subset of the partitions a given spout instance should read.
>
> Is there a notion of a spout instance id accessible in the Storm Java API?
> Even a simple number from 1 to 10 would work.
>
> Thanks,
> Alex