Posted to user@storm.apache.org by Marco Costantini <mc...@gmail.com> on 2017/11/20 09:00:30 UTC

A Batching Bolt

Hello,
I need to group/batch tuples. I've seen an excellent tutorial which does
this. It handles timeouts and batch size breaches. Great. However, there,
all of the logic takes place in the final bolt. That means it does not have
the problem of "emitting batched information".

Sadly for me, I want to create a distinct bolt in the middle of a topology
for batching. This means I have to worry about emitting batches of
information.

I tried it both ways: with the batching done in the final bolt, and with the
batching done in a separate bolt. When it's done in the final bolt, all is
well. When it's done in a separate bolt, performance suffers greatly. By
"performance" I mean the indexing rate of ElasticSearch (probably not a good
measure, I know). The batching method is the same in both cases.

Question: Is it bad to emit a Map or a List of objects? What are the best
practices for batching in a distinct batching bolt?

Please and thank you,
Marco.
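
For reference, the size-or-timeout batching described above can be sketched
independently of Storm. This is only a minimal illustration, not the tutorial's
code: the class name SizeOrTimeBatcher and its methods are hypothetical, and
the clock is passed in explicitly. In a bolt, one would presumably call add()
for each regular tuple and onTick() on each tick tuple, emit the returned list
as a single value anchored to the buffered input tuples, and ack those tuples
only after the emit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical helper: accumulates items and hands back a batch when either
 * the size limit or the age limit is reached. Time is supplied by the caller
 * so the logic can be driven by any clock (for example, a periodic tick).
 */
final class SizeOrTimeBatcher<T> {
    private final int maxSize;
    private final long maxAgeMillis;
    private List<T> buffer = new ArrayList<>();
    private long oldestMillis = -1; // arrival time of the oldest buffered item

    SizeOrTimeBatcher(int maxSize, long maxAgeMillis) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be >= 1");
        }
        this.maxSize = maxSize;
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Adds one item; returns a batch if the size limit was just reached. */
    Optional<List<T>> add(T item, long nowMillis) {
        if (buffer.isEmpty()) {
            oldestMillis = nowMillis;
        }
        buffer.add(item);
        return buffer.size() >= maxSize ? Optional.of(drain()) : Optional.empty();
    }

    /** Call periodically; returns a batch if the oldest item is too old. */
    Optional<List<T>> onTick(long nowMillis) {
        boolean expired = !buffer.isEmpty() && nowMillis - oldestMillis >= maxAgeMillis;
        return expired ? Optional.of(drain()) : Optional.empty();
    }

    /** Hands the current buffer to the caller and starts a fresh one. */
    private List<T> drain() {
        List<T> batch = buffer;
        buffer = new ArrayList<>();
        oldestMillis = -1;
        return batch;
    }
}
```

Keeping the batching logic free of Storm types makes it easy to unit-test the
size and timeout paths separately from the topology.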

RE: A Batching Bolt

Posted by Marco Costantini <mc...@gmail.com>.
Thanks Mauro. I think my situation is different. I still need to emit the
information from each tuple; it's just that I have to restructure it and
perform some grouping. What is the best way to emit these maps and
collections in batches? I tried emitting the whole map, but the performance
of that seemed low.

Marco.
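
One alternative to emitting the whole map as a single value is to emit one
flat row per group. The sketch below is only an illustration of that
restructuring step in plain Java: GroupRows, group() and toRows() are
hypothetical names, and the (key, value) string pairs stand in for whatever
the real tuples carry. In a topology, each row could presumably be emitted as
its own tuple with two declared fields. Whether this helps the indexing rate
is not something this thread establishes; it would need measuring.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: regroups buffered (key, value) pairs into rows. */
final class GroupRows {
    /** Groups (key, value) pairs by key, keeping first-seen key order. */
    static Map<String, List<String>> group(List<String[]> pairs) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String[] p : pairs) {
            grouped.computeIfAbsent(p[0], k -> new ArrayList<>()).add(p[1]);
        }
        return grouped;
    }

    /** Turns each group into one flat row of the form [key, values]. */
    static List<List<Object>> toRows(Map<String, List<String>> grouped) {
        List<List<Object>> rows = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            List<Object> row = new ArrayList<>();
            row.add(e.getKey());
            row.add(e.getValue());
            rows.add(row);
        }
        return rows;
    }
}
```

Rows built only from strings and standard collections also avoid depending on
how custom classes get serialized between workers, which is worth checking
when a mid-topology bolt is slower than expected.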

On 20 Nov 2017 17:46, "Mauro Giusti" <ma...@microsoft.com> wrote:

> Marco,
>
> Our first bolt emits a summarized record of the info we received from the
> spouts.
>
> It is time based: every 30 seconds we emit one record that summarizes all
> the records we received from the spout.
>
> We don’t re-emit the source records that we received from the spouts. They
> are persisted on cold-path storage, though, and we can access them offline
> for detailed analysis.
>
> Is this similar to what you are trying to do?
>
> Thx,
>
> Mauro.

RE: A Batching Bolt

Posted by Mauro Giusti <ma...@microsoft.com>.
Marco,
Our first bolt emits a summarized record of the info we received from the spouts.
It is time based: every 30 seconds we emit one record that summarizes all the records we received from the spout.
We don’t re-emit the source records that we received from the spouts. They are persisted on cold-path storage, though, and we can access them offline for detailed analysis.

Is this similar to what you are trying to do?

Thx,
Mauro.
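
The time-based summarization described above might look roughly like the
following. This is a sketch under stated assumptions, not the actual bolt: the
message does not say what the summary contains, so a per-key count is assumed
here, and WindowSummarizer, record() and maybeFlush() are hypothetical names.
The 30-second window is the only detail taken from the message.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: counts records per key and flushes once per window. */
final class WindowSummarizer {
    private final long windowMillis;
    private long windowStart;
    private Map<String, Long> counts = new HashMap<>();

    WindowSummarizer(long windowMillis, long startMillis) {
        this.windowMillis = windowMillis;
        this.windowStart = startMillis;
    }

    /** Folds one source record into the running summary; nothing is emitted. */
    void record(String key) {
        counts.merge(key, 1L, Long::sum);
    }

    /**
     * Returns the summary and starts a new window once the current window has
     * elapsed; otherwise returns empty. An idle window yields an empty map.
     */
    Optional<Map<String, Long>> maybeFlush(long nowMillis) {
        if (nowMillis - windowStart < windowMillis) {
            return Optional.empty();
        }
        Map<String, Long> summary = counts;
        counts = new HashMap<>();
        windowStart = nowMillis;
        return Optional.of(summary);
    }
}
```

With `new WindowSummarizer(30_000, startMillis)`, downstream components see
one small record per window instead of every source record.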
