Posted to user@storm.apache.org by Eric Ruel <er...@wantedanalytics.com> on 2015/02/10 23:37:03 UTC
RE: Trident topology - can we deactivate batch ordering
The spout emits batches of 100 ids to process.
Some steps run faster on a whole batch, such as fetching data from the database, which is done with an aggregator that emits the same rows with additional values.
We need Trident because we need joins, merges, aggregators, etc.
But the batches are independent of each other and, as my colleague said, we run with maxSpoutPending > 1.
In our context it is acceptable for the second batch to finish before the first one, but currently it waits until the first batch has completed, which makes our processing slower.
Is it possible to keep Trident and its features while allowing batches to be processed out of order?
Is the problem caused by the kind of spout, or by the fact that we use a StateUpdater at the end?
We tried removing the StateUpdater and using an aggregator instead, but that did not help.
Is it clearer?
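
To make the slowdown described above concrete, here is a minimal sketch in plain Java (it does not use the Storm API). It assumes that a batch may only commit once every earlier batch has committed; the class name, the method name and the millisecond figures are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;

// Plain-Java illustration only; this is NOT Storm or Trident code.
final class CommitDelaySketch {

    /**
     * finishTimes[i] is the (hypothetical) time in ms at which batch i+1
     * finishes processing. Under in-order commits, a batch commits no earlier
     * than the slowest batch that precedes it.
     */
    public static long[] orderedCommitTimes(long[] finishTimes) {
        long[] commitTimes = new long[finishTimes.length];
        long latestSoFar = 0;
        for (int i = 0; i < finishTimes.length; i++) {
            // A batch waits for itself and for every earlier batch.
            latestSoFar = Math.max(latestSoFar, finishTimes[i]);
            commitTimes[i] = latestSoFar;
        }
        return commitTimes;
    }

    public static void main(String[] args) {
        // Batch 1 is slow (500 ms); batches 2 and 3 are fast.
        long[] finishTimes = {500, 200, 300};
        System.out.println(Arrays.toString(orderedCommitTimes(finishTimes)));
    }
}
```

With these assumed figures, batches 2 and 3 are ready at 200 ms and 300 ms but cannot commit before batch 1 does at 500 ms. With independent batches they could commit at their own finish times.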
________________________________
From: Pascal Arnal <pa...@wantedanalytics.com>
Sent: November 21, 2014 12:15
To: user@storm.apache.org
Subject: RE: Trident topology
This post from a colleague is about the same thing.
https://mail-archives.apache.org/mod_mbox/storm-user/201401.mbox/%3C2730f9f8f8a44d16858c346886978886@BY2PR08MB144.namprd08.prod.outlook.com%3E
________________________________
From: Brunner, Bill <bi...@baml.com>
Sent: November 21, 2014 12:04
To: user@storm.apache.org
Subject: RE: Trident topology
Still not very clear
From: Pascal Arnal [mailto:pascal.arnal@wantedanalytics.com]
Sent: Friday, November 21, 2014 9:33 AM
To: user@storm.apache.org
Subject: RE: Trident topology
any help?
________________________________
From: Pascal Arnal <pa...@wantedanalytics.com>
Sent: November 20, 2014 14:01
To: user@storm.apache.org
Subject: RE: Trident topology
If I run a topology with a max spout pending of 3, the StateUpdater actually executes batch 1, then batch 2, then batch 3; a new batch 4 is generated after batch 1 commits, batch 5 after batch 2, and so on.
If batch 2 finishes executing before batch 1, it has to wait until batch 1 is committed.
I don't want it to wait: I want the StateUpdater sequence to be batch 2, then batch 1, then batch 3, ...
with a new batch 4 after batch 2, batch 5 after batch 1, and so on.
Is that clearer, and is it possible?
Thanks
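
The ordering described above can be sketched as a small plain-Java simulation (it does not use the Storm API). It assumes batch ids start at 1 and that a finished batch is held back until every lower-numbered batch has committed; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Plain-Java illustration only; this is NOT Storm or Trident code.
final class OrderedCommitSketch {

    /**
     * Given the order in which batches finish processing, returns the order
     * in which they commit when commits must follow batch ids 1, 2, 3, ...
     */
    public static List<Integer> commitOrder(List<Integer> finishOrder) {
        List<Integer> committed = new ArrayList<>();
        TreeSet<Integer> finishedButWaiting = new TreeSet<>();
        int nextToCommit = 1;
        for (int batchId : finishOrder) {
            finishedButWaiting.add(batchId);
            // Commit every waiting batch whose turn has now come.
            while (!finishedButWaiting.isEmpty()
                    && finishedButWaiting.first() == nextToCommit) {
                committed.add(finishedButWaiting.pollFirst());
                nextToCommit++;
            }
        }
        return committed;
    }

    public static void main(String[] args) {
        List<Integer> finishOrder = List.of(2, 1, 3);
        // Ordered commits: batch 2 is held until batch 1 has committed.
        System.out.println("ordered commits: " + commitOrder(finishOrder));
        // The behaviour asked for would simply commit in finish order.
        System.out.println("wanted commits:  " + finishOrder);
    }
}
```

In this sketch batch 2 finishes first but is only committed after batch 1, which is the wait the question asks to avoid.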
________________________________
From: P. Taylor Goetz <pt...@gmail.com>
Sent: November 20, 2014 12:53
To: user@storm.apache.org
Subject: Re: Trident topology
Hi Pascal,
I'm not sure I understand what you are asking. Could you elaborate?
-Taylor
On Nov 20, 2014, at 10:52 AM, Pascal Arnal <pa...@wantedanalytics.com> wrote:
No response from anyone?
Should I create an issue / feature request in Jira?
________________________________
From: Pascal Arnal <pa...@wantedanalytics.com>
Sent: November 19, 2014 10:58
To: user@storm.apache.org
Subject: Trident topology
Hi,
I am trying to build a topology with Trident that uses some functions, filters and aggregators.
I don't care about transactions, and I would like my batches to be unordered.
I use IBatchSpout for the spout and BaseStateUpdater for the updater, with TridentState.
Is it possible to build a topology that meets these requirements?
Maybe with another state updater, or simply by using an aggregator?
Thanks