Posted to user@storm.apache.org by Eric Ruel <er...@wantedanalytics.com> on 2015/02/10 23:37:03 UTC

RE: Trident topology - can we deactivate batch ordering

The spout emits batches of 100 ids to process.

Some steps are faster when executed in batches, such as fetching data from the database, which is done with an aggregator that emits all the same rows with additional values.

We need Trident because we need joins, merges, aggregators, etc.

But the batches are independent of each other and, as my colleague said, we run with a max spout pending > 1.

In our context it is acceptable for the second batch to finish before the first one, but currently it waits until the first batch is completed, which makes our processing slower.

Is it possible to keep Trident and its features while allowing unordered batch processing?

Is the problem the kind of spout, or is it because we use a StateUpdater at the end?

We tried removing the StateUpdater and using an aggregator instead, but it did not help.

Is it clearer?



________________________________
From: Pascal Arnal <pa...@wantedanalytics.com>
Sent: November 21, 2014 12:15
To: user@storm.apache.org
Subject: RE: Trident topology


This post from a colleague is about the same thing.


https://mail-archives.apache.org/mod_mbox/storm-user/201401.mbox/%3C2730f9f8f8a44d16858c346886978886@BY2PR08MB144.namprd08.prod.outlook.com%3E



________________________________
From: Brunner, Bill <bi...@baml.com>
Sent: November 21, 2014 12:04
To: user@storm.apache.org
Subject: RE: Trident topology

Still not very clear

From: Pascal Arnal [mailto:pascal.arnal@wantedanalytics.com]
Sent: Friday, November 21, 2014 9:33 AM
To: user@storm.apache.org
Subject: RE: Trident topology


Any help?



________________________________
From: Pascal Arnal <pa...@wantedanalytics.com>
Sent: November 20, 2014 14:01
To: user@storm.apache.org
Subject: RE: Trident topology


If I run a topology with a max spout pending of 3, the StateUpdater actually executes batch 1, then batch 2, then batch 3; a new batch 4 is generated after batch 1 commits, batch 5 after batch 2, and so on.

If batch 2 finishes executing before batch 1, it has to wait until batch 1 is committed.
I don't want it to wait. I want the StateUpdater sequence to be batch 2, then batch 1, then batch 3, ...
with a new batch 4 generated after batch 2, batch 5 after batch 1, and so on.
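
The difference between the two sequences described above can be sketched with a small stand-alone simulation. This is only an illustration of the ordering rule, not Trident code: the class name CommitOrderSketch and both methods are hypothetical, and batch ids are assumed to start at 1 and be contiguous.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Storm or Trident.
final class CommitOrderSketch {

    // Strictly ordered commits: a finished batch is held back until every
    // lower-numbered batch has committed (the behaviour observed today).
    static List<Integer> orderedCommits(List<Integer> finishOrder) {
        List<Integer> committed = new ArrayList<>();
        Set<Integer> finished = new HashSet<>();
        int next = 1; // assumption: batch ids start at 1 and are contiguous
        for (int batch : finishOrder) {
            finished.add(batch);
            // Commit every batch that is now unblocked, in id order.
            while (finished.remove(next)) {
                committed.add(next);
                next++;
            }
        }
        return committed;
    }

    // Desired behaviour: each batch commits as soon as it finishes.
    static List<Integer> unorderedCommits(List<Integer> finishOrder) {
        return new ArrayList<>(finishOrder);
    }

    public static void main(String[] args) {
        // Batch 2 finishes before batch 1, as in the example above.
        List<Integer> finishOrder = Arrays.asList(2, 1, 3);
        System.out.println("ordered:   " + orderedCommits(finishOrder));   // [1, 2, 3]
        System.out.println("unordered: " + unorderedCommits(finishOrder)); // [2, 1, 3]
    }
}
```

With ordered commits, batch 2 sits idle until batch 1 is done, which is the delay I would like to avoid.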

Is it clearer, and is it possible?

Thanks





________________________________
From: P. Taylor Goetz <pt...@gmail.com>
Sent: November 20, 2014 12:53
To: user@storm.apache.org
Subject: Re: Trident topology

Hi Pascal,

I'm not sure I understand what you are asking. Could you elaborate?

-Taylor

On Nov 20, 2014, at 10:52 AM, Pascal Arnal <pa...@wantedanalytics.com> wrote:


No response from anyone?
Should I create an issue / feature request in Jira?
________________________________
From: Pascal Arnal <pa...@wantedanalytics.com>
Sent: November 19, 2014 10:58
To: user@storm.apache.org
Subject: Trident topology

Hi,

I am trying to build a topology with Trident using some functions, filters and aggregators.
I don't care about transactions and I would like my batches to be unordered.
I use IBatchSpout for the spout and BaseStateUpdater for the updater with TridentState.

Is it possible to build a topology that meets these requirements?
Maybe with another state updater, or simply by using an aggregator?

Thanks
