Posted to user@storm.apache.org by Kushan Maskey <ku...@mmillerassociates.com> on 2014/11/20 04:15:36 UTC

Thread safe function

I have the following scenario.

I have a common project with a synchronized function that checks whether
a text file contains a string. For example, the text file contains a list
of foods that need to be excluded. My data comes through Kafka and Storm
and contains a list of foods.

The function I created uses a synchronized Set holding all of these foods.
When a food arrives in my data, I look it up to see whether it should be
excluded from being inserted into the database. Everything worked in my
local environment, but when I deploy this code in a clustered environment,
the exclusion is hit or miss. The data that gets loaded is now incorrect
because foods that are supposed to be excluded still show up. I am 100%
sure this is a thread-safety issue with that function. How do I achieve
this functionality in a clustered environment? Please advise. Thanks.

--
Kushan Maskey
817.403.7500
M. Miller & Associates <http://mmillerassociates.com/>
kushan.maskey@mmillerassociates.com

Re: Thread safe function

Posted by Toby Hobson <to...@gmail.com>.
I'm not clear at what point you are modifying this set. You mentioned that
you have a text file of foods to exclude. Do you use this to populate the
set, which you then subsequently modify? If so, what is the logic (in simple
terms) that dictates whether elements are checked, added or removed from
the set?

It sounds like there could be many different ways to tackle your problem
(distributed locks, consistent hashing, RDBMS transactions, etc.), but I
think we would all need to know a bit more about your setup.

Re: Thread safe function

Posted by Itai Frenkel <It...@forter.com>.
This is what we do when the "list of excluded foods" is immutable (constant) and fits in memory:


1. Store the list in source control (manually curated) or in a database (automatically copied from some other source of truth).


2. Use the bolt's prepare() method to load it into memory:

2.1 Hold it in a singleton class (we use the initialization-on-demand holder idiom: http://en.wikipedia.org/wiki/Singleton_pattern#Initialization-on-demand_holder_idiom ).

2.2 We use Guava's ImmutableSet. An immutable (constant) set is thread safe.

2.3 This means one copy per worker, which is not so bad if the data is small compared to Xmx. If it is too big, copy it to Redis instead (see http://redis.io/commands/sismember ).


3. Access the immutable set from the bolt's execute() method.
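
The steps above could look roughly like the sketch below. It is only an illustration under a few assumptions: the class name ExcludedFoods, the load() and normalize() helpers, and the hard-coded sample entries are all hypothetical, and Collections.unmodifiableSet from the JDK stands in for Guava's ImmutableSet so that the snippet compiles without extra dependencies. In a real topology load() would read the curated file or database table, and the first call would typically happen from the bolt's prepare().

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical per-worker (per-JVM) holder for the exclusion list. */
final class ExcludedFoods {

    private ExcludedFoods() {}

    // Initialization-on-demand holder idiom: the JVM initializes Holder
    // exactly once, on first access, so no explicit locking is needed.
    private static final class Holder {
        static final Set<String> FOODS = load();
    }

    // Placeholder loader (assumption): a real one would read the list from
    // the curated file or database instead of these sample values.
    private static Set<String> load() {
        Set<String> foods = new HashSet<>();
        for (String food : Arrays.asList("Peanut", "shellfish ", "gluten")) {
            foods.add(normalize(food));
        }
        // Read-only view over a set that is never modified after this point.
        // Stand-in for Guava's ImmutableSet mentioned in step 2.2.
        return Collections.unmodifiableSet(foods);
    }

    // Normalizing on both load and lookup avoids misses caused by case or
    // stray whitespace differences.
    private static String normalize(String food) {
        return food.trim().toLowerCase(Locale.ROOT);
    }

    /** Read-only lookup, safe to call from any executor thread. */
    static boolean isExcluded(String food) {
        return food != null && Holder.FOODS.contains(normalize(food));
    }
}
```

From execute() the check would then be a plain call such as ExcludedFoods.isExcluded(food). Because the set is never modified after loading, lookups need no synchronized keyword, and every worker builds its own identical copy from the same source.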

