Posted to user@storm.apache.org by Milind Vaidya <ka...@gmail.com> on 2016/11/15 21:59:02 UTC

Multi Threading precautions for multiple executers / tasks

Hi

I have a use case where a few files in a directory need to be processed by
a certain bolt x written in Java.

I am setting the number of executors and the number of tasks to the same
value, which is > 1. Say I have 4 executors and 4 tasks.

As I understand it, these are essentially threads in the worker process. Now
I want to make sure that each executor / task processes a distinct file, so
that no file is handled twice. How can I ensure that?

Should I put a synchronized block inside the execute method and make sure
the processing is done in a thread-safe manner?


This is actually to be done when the topology is launched. As the worker
starts, the corresponding bolts will scan a specific directory and process
the files that were previously generated but not yet processed.

In the normal scenario, another bolt will pass the name and path of the
file to be processed to this bolt.
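
On the synchronized question above, one thing to keep in mind: each task
gets its own bolt instance, and with more than one worker the executors of
bolt x may sit in different JVMs, so a synchronized block by itself does not
necessarily coordinate them. A minimal sketch of one alternative, assuming
the pending directory and the claimed directory are on the same filesystem
so that an atomic rename is available: each task tries to move a file out
of the pending directory, and only the task whose move succeeds processes
it. FileClaim and tryClaim are hypothetical names used for illustration,
not Storm APIs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Storm: claims a file by renaming it.
final class FileClaim {

    /**
     * Tries to claim a file by atomically moving it into claimedDir.
     * Returns true if this caller won the claim, false if the file was
     * already gone (most likely claimed by another task).
     * Assumption: both directories are on the same filesystem; otherwise
     * Files.move may throw AtomicMoveNotSupportedException.
     */
    static boolean tryClaim(Path file, Path claimedDir) throws IOException {
        Path target = claimedDir.resolve(file.getFileName());
        try {
            Files.move(file, target, StandardCopyOption.ATOMIC_MOVE);
            return true;
        } catch (NoSuchFileException e) {
            // The source no longer exists, so someone else moved it first.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path pending = Files.createTempDirectory("pending");
        Path claimed = Files.createTempDirectory("claimed");
        Path file = Files.createFile(pending.resolve("part-0001.log"));

        // Two attempts on the same file: only one of them can win.
        System.out.println("first attempt:  " + tryClaim(file, claimed));
        System.out.println("second attempt: " + tryClaim(file, claimed));
    }
}
```

A task that wins the claim would then process the file from its new
location. This sketch does not cover a task that dies after claiming a file
but before finishing it; that case would need separate handling.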

Re: Multi Threading precautions for multiple executers / tasks

Posted by Milind Vaidya <ka...@gmail.com>.
Thanks Jacob.

I understand what you are saying, but that applies when a bolt-to-bolt
connection is in place. When a previous bolt sends a file to this bolt
under some grouping, the file is passed on to a single thread.

In my case, a few files are already lying in a directory, and when the
topology comes up, the init method of bolt x checks whether there are any
files in this directory, which is shared across the whole topology. All 4
threads of bolt x will be able to see these files, but I want to ensure
that each thread handles a unique file.
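
One way to get this without locking is to partition the directory listing
by task. A minimal sketch, assuming every task sees the same directory and
the same file names: each task keeps only the files whose name hashes to
its own index. StartupFilePartitioner, isMine and filesFor are hypothetical
names. In a Storm bolt the two numbers would presumably come from the
TopologyContext passed to prepare, i.e. context.getThisTaskIndex() and
context.getComponentTasks(context.getThisComponentId()).size(); the sketch
takes them as plain parameters so it runs without Storm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Storm: splits file names across tasks.
final class StartupFilePartitioner {

    /** True if the task with this index owns the given file name. */
    static boolean isMine(String fileName, int taskIndex, int numTasks) {
        // floorMod keeps the result in [0, numTasks) even for negative hashes.
        return Math.floorMod(fileName.hashCode(), numTasks) == taskIndex;
    }

    /** Returns the subset of allFiles that this task should process. */
    static List<String> filesFor(List<String> allFiles, int taskIndex, int numTasks) {
        List<String> mine = new ArrayList<>();
        for (String name : allFiles) {
            if (isMine(name, taskIndex, numTasks)) {
                mine.add(name);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        // Assumed example listing; in the bolt this would be the directory scan.
        List<String> files = Arrays.asList("a.log", "b.log", "c.log", "d.log", "e.log");
        int numTasks = 4;
        for (int task = 0; task < numTasks; task++) {
            System.out.println("task " + task + " -> " + filesFor(files, task, numTasks));
        }
    }
}
```

Every file name maps to exactly one task index, so no two tasks pick up the
same file, although the split may be uneven for a small number of files.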


I hope I have explained the situation more clearly this time.



On Tue, Nov 15, 2016 at 2:45 PM, Jacob Johansen <jo...@gmail.com>
wrote:

> You need to partition on something; the bolt grouping should do
> your partitioning and eliminate the need for locking. Remove the
> synchronized block, as it will slow down processing.
>
> Jacob Johansen

Re: Multi Threading precautions for multiple executers / tasks

Posted by Jacob Johansen <jo...@gmail.com>.
You need to partition on something; the bolt grouping should do
your partitioning and eliminate the need for locking. Remove the
synchronized block, as it will slow down processing.

Jacob Johansen
