Posted to user@storm.apache.org by Nadav Glickman <na...@orchestra.group> on 2022/08/07 15:25:29 UTC

Global Topology Cache

Hi all,

We're looking for a caching solution for data read from the DB that is
available to the entire topology.
We need the data to be up to date and identical for all the bolts, so we
can't have the cache split across different executors.
Ideally we'd have some in-memory solution within Storm. We tried an enum
and singletons, but they aren't shared between executors.
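
For concreteness, here is a minimal sketch of the kind of singleton-style
cache we tried (class and field names are made up, and I'm assuming the
Storm 2.x packages). A static map like this is only visible to executors
running inside the same worker JVM, so with topology.workers > 1 each
worker process ends up with its own independent copy:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class CachedLookupBolt extends BaseRichBolt {

    // Shared only by executors (threads) inside the same worker JVM; other
    // worker processes each get their own, independent copy of this map.
    private static final ConcurrentHashMap<String, Object> CACHE = new ConcurrentHashMap<>();

    private OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        String key = tuple.getStringByField("key");
        // Fall back to the database only on a cache miss.
        Object value = CACHE.computeIfAbsent(key, this::loadFromDb);
        // ... use 'value', emit downstream, etc.
        collector.ack(tuple);
    }

    private Object loadFromDb(String key) {
        // Placeholder for the real Cassandra read.
        return null;
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // No output fields declared in this sketch.
    }
}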

I know distributed caching DBs like memcached and Redis are viable options,
but I'd really like to find a solution that won't require another machine
and another piece of technology in our stack.

Looking forward to your ideas!
Thanks,
Nadav

Re: Global Topology Cache

Posted by Bipin Prasad via user <us...@storm.apache.org>.
Does the Cassandra update trigger use a class that generates events for a Storm topology spout? If that is the case, then the events could use fieldsGrouping instead of the shuffleGrouping that your description seems to imply.

A very minimal version of the topology builder code would be useful for the discussion.
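
To make the distinction concrete, roughly the shape I have in mind is below. The spout and bolt classes, component ids, and the "rowKey" field are all placeholders for whatever you actually use:

import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class TopologySketch {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();

        // Placeholder spout that emits the Cassandra change events, keyed by "rowKey".
        builder.setSpout("change-spout", new CassandraChangeSpout(), 1);

        // fieldsGrouping: every event with the same "rowKey" goes to the same
        // executor, so each executor only ever sees (and caches) its own keys.
        builder.setBolt("cache-bolt", new CacheBolt(), 4)
               .fieldsGrouping("change-spout", new Fields("rowKey"));

        // With shuffleGrouping instead, events are spread randomly, so any
        // executor may see any key and the per-executor caches drift apart:
        // builder.setBolt("cache-bolt", new CacheBolt(), 4)
        //        .shuffleGrouping("change-spout");

        // ... conf and StormSubmitter / LocalCluster submission omitted.
    }
}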

Re: Global Topology Cache

Posted by Nadav Glickman <na...@orchestra.group>.
Thanks for the quick answer!

Our DB is Cassandra. It's updated by Storm as well as by another service,
and it's also read by both Storm and that other service. One of the main
issues we are facing is that when a table is updated by the other service,
Storm detects the change in one of the executors, but the others remain
oblivious to it because the cache is not shared between them.
The cache would be updated and read quite frequently and should be able to
handle roughly 50-100k entries, each a complex Java class. Naturally we're
looking for minimal network overhead, which is why a caching solution
inside the topology itself would be ideal.

How would you update all the executors but query with only one of them? Is
there a way to partition data between them, similar to bolt grouping but
for the entire executor?
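
Just to check I understand the suggestion, is the wiring below roughly what you mean? The component names are invented, and I'm assuming two separate spouts, one for the updates and one for the tuples that need lookups:

import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class SharedCacheWiring {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();

        // Hypothetical spouts: one emits cache updates, one emits lookups.
        builder.setSpout("update-spout", new UpdateSpout(), 1);
        builder.setSpout("query-spout", new QuerySpout(), 4);

        builder.setBolt("cache-bolt", new SharedCacheBolt(), 4)
               // allGrouping: every update is replicated to every executor,
               // so all the local copies of the cache stay in sync.
               .allGrouping("update-spout")
               // fieldsGrouping: each key is always routed to the same single
               // executor, so a given query only ever hits one of them.
               .fieldsGrouping("query-spout", new Fields("key"));

        // ... conf and StormSubmitter / LocalCluster submission omitted.
    }
}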


Re: Global Topology Cache

Posted by Bipin Prasad via user <us...@storm.apache.org>.
Hello Nadav,
Is the database updated by some data flow other than this topology? How are the database changes detected and the “cache” updated? The size of the cached data and the update volume will also influence the design.
Without knowing some crucial details (the data itself, its size, update frequency, natural partitioning, network speed, etc.), it is hard to give a general answer.
But assuming that you want the Storm topology itself to serve as the cache provider and the cache is small, one “possible” way to do this would be to have updates hit all executors, but have each query hit only one of the “many” executors.
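
A very rough sketch of what the bolt side of that could look like is below. The names are illustrative only, and the spout ids have to match whatever ids the topology builder uses (updates wired with allGrouping, queries with fieldsGrouping):

import java.util.HashMap;
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class SharedCacheBolt extends BaseRichBolt {

    // Per-executor copy of the cache; it stays in sync across executors
    // because every executor receives every update (allGrouping).
    private transient Map<String, Object> cache;
    private transient OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
        this.cache = new HashMap<>();
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        // Route by the component the tuple came from (ids are hypothetical
        // and must match the ones used in the TopologyBuilder).
        if ("update-spout".equals(tuple.getSourceComponent())) {
            cache.put(tuple.getStringByField("key"), tuple.getValueByField("value"));
        } else {
            Object value = cache.get(tuple.getStringByField("key"));
            collector.emit(tuple, new Values(tuple.getStringByField("key"), value));
        }
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("key", "value"));
    }
}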