Posted to user@storm.apache.org by I PVP <ip...@hotmail.com> on 2017/09/11 22:08:51 UTC

sharing across Bolts

What is the best-practice approach to sharing, across bolts, a Collection that will be used by many bolts, each of which performs a specific summarization and statistics calculation?
The objective is to retrieve the collection only once, instead of retrieving it separately in each bolt.

Should I just emit the collection from the intermediary bolt, or is there a better way, something like an internal cache?
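
To make the "internal cache" option concrete, here is a minimal sketch of what I have in mind, using only the JDK. TransactionCache and its loader function are hypothetical names for illustration, not part of the Storm API, and the loader stands in for whatever call actually retrieves the transactions. Note the assumption: a cache like this lives in one JVM, so it would only be shared by bolt instances that run in the same worker process.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical per-JVM cache: loads the collection for an id at most once while it is cached. */
final class TransactionCache<T> {
    private final ConcurrentMap<UUID, List<T>> cache = new ConcurrentHashMap<>();
    // Assumption: the loader wraps the real retrieval (database, service call, ...).
    private final Function<UUID, List<T>> loader;

    TransactionCache(Function<UUID, List<T>> loader) {
        this.loader = loader;
    }

    /** Returns the cached list; the loader runs only on the first request for an id. */
    List<T> get(UUID id) {
        // computeIfAbsent is atomic on ConcurrentHashMap, so concurrent callers
        // asking for the same id trigger a single load.
        return cache.computeIfAbsent(id,
                key -> Collections.unmodifiableList(new ArrayList<>(loader.apply(key))));
    }

    /** Removes the entry once every calculation for the id is done, to bound memory use. */
    void evict(UUID id) {
        cache.remove(id);
    }
}
```

Each calculation bolt would then receive only the UUID and call get(uuid). The list is wrapped as unmodifiable so that one bolt cannot change what the others see, and something still has to call evict(uuid), otherwise the map grows with every identifier processed.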

The overall topology approach, using fieldsGrouping, is:
---
1) KafkaSpout
Receives the identifier (UUID) that drives the retrieval of a collection of retail transactions, for example: List<Transaction>

2) Bolt
Retrieves and emits (collector.emit) the collection of transactions that will be subject to multiple calculations. (Is this correct, or could it cause a memory issue as the number of Bolts grows?)

3) Around 6 other Bolts
Each uses that same collection of transactions to execute a different type of summarization and statistics calculation, and writes the metrics to Cassandra.
---

Thanks
IPVP