Posted to user@storm.apache.org by prasad ch <ch...@outlook.com> on 2015/05/07 07:20:23 UTC

Does Increasing Workers or Executors Cause a Problem?

Hi,

I built a small example on Storm in cluster mode with one spout and one bolt. The spout reads a list of files (10 files, each containing 100 records), and the bolt simply writes the received tuples to a file.

When I run this topology with 2 executors for the spout, 2 executors for the bolt, and 2 workers, it executes fine: there are no duplicate tuples, and I receive 1000 records in my result file.

When I run the same topology on the same files with 4 executors for the spout, 4 executors for the bolt, and 4 workers, I receive duplicate records (1200+ records). I have also observed that however many workers there are, the data is always shared equally among all of them.

To avoid duplicate tuples while still processing as fast as possible, is there any relation between executors and workers that I should follow? Please help me.

Q) When I use 2 executors for the spout, 2 for the bolt, and 2 workers, the UI shows 6 executors instead of 4, i.e. 2 (spout) + 2 (bolt) + 2 (worker process) = 6. Is that correct? Please clarify; it is not mentioned in the documentation.
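[For reference, a minimal sketch of how a topology like the one described might be wired up; the class names FileSpout and FileWriterBolt are hypothetical placeholders for the poster's own components, not code from the thread. The parallelism hint passed to setSpout/setBolt is the executor count, and Config.setNumWorkers controls the number of worker processes.]

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class FileTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // 2 executors for the spout, 2 executors for the bolt (hypothetical classes).
        builder.setSpout("file-spout", new FileSpout(), 2);
        builder.setBolt("file-writer", new FileWriterBolt(), 2)
               .shuffleGrouping("file-spout");

        Config conf = new Config();
        conf.setNumWorkers(2);   // 2 worker processes across the cluster

        StormSubmitter.submitTopology("file-topology", conf, builder.createTopology());
    }
}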



Thank you.
Regards,
prasad.ch

Re: Does Increasing Workers or Executors Cause a Problem?

Posted by Jeff Maass <JM...@cccis.com>.
If you are going to run 2 spouts, your spout code needs to be aware of the other instances. All Storm does is run your code and manage the outputs between code instances. That's it. If each instance of your spout opens every file and emits every record, then all of those records will be sent on by Storm for processing.
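[To make that concrete, here is a minimal sketch, not the poster's actual code, of a spout whose instances split the input files among themselves by task index, so no file is read by more than one instance and no record is emitted twice.]

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class PartitionedFileSpout extends BaseRichSpout {
    private final List<String> allFiles;      // every instance receives the same full file list
    private SpoutOutputCollector collector;
    private Iterator<String> pendingRecords;

    public PartitionedFileSpout(List<String> allFiles) {
        this.allFiles = allFiles;
    }

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;

        // Find out which task this instance is and how many tasks the spout has in total.
        int taskIndex = context.getThisTaskIndex();
        int totalTasks = context.getComponentTasks(context.getThisComponentId()).size();

        // Claim only every totalTasks-th file, so no two instances read the same file.
        List<String> records = new ArrayList<String>();
        for (int i = 0; i < allFiles.size(); i++) {
            if (i % totalTasks == taskIndex) {
                try {
                    records.addAll(Files.readAllLines(Paths.get(allFiles.get(i)), StandardCharsets.UTF_8));
                } catch (IOException e) {
                    throw new RuntimeException("Could not read " + allFiles.get(i), e);
                }
            }
        }
        pendingRecords = records.iterator();
    }

    @Override
    public void nextTuple() {
        // Emit one record per call; Storm invokes nextTuple() in a loop.
        if (pendingRecords.hasNext()) {
            collector.emit(new Values(pendingRecords.next()));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("record"));
    }
}

[With this kind of partitioning, raising the spout's executor count spreads the files across more instances instead of multiplying the output.]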


