Posted to users@nifi.apache.org by "wanglei2@geekplus.com.cn" <wa...@geekplus.com.cn> on 2019/10/15 11:56:15 UTC
MergeRecord can not guarantee the ordering of the input sequence?
If FlowFiles A, B, and C enter MergeRecord sequentially, the output should be one FlowFile {A, B, C}.
However, when testing with a large data volume, the output order is sometimes not the same as the order in which they entered, and the result is nondeterministic.
This really confuses me.
Does anybody have any insight on this?
Thanks,
Lei
wanglei2@geekplus.com.cn
Re: Re: MergeRecord can not guarantee the ordering of the input sequence?
Posted by "wanglei2@geekplus.com.cn" <wa...@geekplus.com.cn>.
Hi Koji,
My test is as follows.
ProcessorA is scheduled on the primary node only, with a single concurrent task.
The output of ProcessorA is load balanced to ProcessorB. The strategy is partition by attribute. All FlowFiles emitted by ProcessorA have the same value for the attribute used for balancing, so they are all sent to the same node.
The order in which ProcessorB receives the FlowFiles is often not the same as the order in which ProcessorA emitted them, and the order is nondeterministic.
Thanks,
Lei
wanglei2@geekplus.com.cn
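A possible workaround, sketched under assumptions: if the load-balanced connection can reorder FlowFiles even when they all go to one node, the single-threaded source could stamp each FlowFile with an increasing sequence number, and the receiving side could re-sequence them before merging. NiFi's EnforceOrder processor appears to be aimed at this kind of problem, though it has not been tried against this flow. The code below is a plain Python model of the idea, not NiFi code; `Resequencer`, `resequence`, and the integer sequence numbers are hypothetical names introduced only for illustration.

```python
class Resequencer:
    """Buffers out-of-order items and releases them in sequence order."""

    def __init__(self, first_seq=0):
        self._next = first_seq   # next sequence number that may be released
        self._pending = {}       # seq -> payload, for items that arrived early

    def offer(self, seq, payload):
        """Accept one item and return the payloads that are now releasable."""
        if seq < self._next:
            return []            # already released: treat as a late duplicate
        self._pending[seq] = payload
        released = []
        # Release the longest gap-free run starting at the expected number.
        while self._next in self._pending:
            released.append(self._pending.pop(self._next))
            self._next += 1
        return released


def resequence(arrivals, first_seq=0):
    """Run a whole list of (seq, payload) arrivals through a Resequencer."""
    resequencer = Resequencer(first_seq)
    ordered = []
    for seq, payload in arrivals:
        ordered.extend(resequencer.offer(seq, payload))
    return ordered
```

Feeding it the arrivals (0, "A"), (2, "C"), (1, "B") yields A, B, C: C is held back until B has arrived. A real flow would also need a timeout for sequence numbers that never show up, which this sketch leaves out.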
From: Koji Kawamura
Date: 2019-10-20 18:02
To: users
Subject: Re: Re: MergeRecord can not guarantee the ordering of the input sequence?
Hi Lei,
Does 'balance strategy' mean load balance strategy? Which strategy
are you using? I thought Prioritizers are applied on the destination
node after load balancing has transferred FlowFiles. Are those A, B
and C flow files generated on different nodes and sent to a single
node to merge them?
Thanks,
Koji
On Fri, Oct 18, 2019 at 7:12 PM wanglei2@geekplus.com.cn
<wa...@geekplus.com.cn> wrote:
>
>
> It seems this is caused by the load balance strategy in use.
> Load balancing does not guarantee the order.
>
> Thanks,
> Lei
>
> ________________________________
> wanglei2@geekplus.com.cn
>
>
> From: wanglei2@geekplus.com.cn
> Date: 2019-10-16 10:21
> To: dev; users
> CC: dev
> Subject: Re: Re: MergeRecord can not guarantee the ordering of the input sequence?
> Hi Koji,
> Actually, I have set all connections to FIFO and concurrent tasks to 1 for all processors.
> I added a LogAttribute before and after the MergeRecord to debug.
>
> Before MergeRecord, the order in the log file is A, B, C in three FlowFiles.
> After MergeRecord, the order becomes {A, C, B} in one FlowFile.
> This is nondeterministic.
>
> I think I should look at the MergeRecord code and debug further.
>
> Thanks,
> Lei
>
>
>
>
> wanglei2@geekplus.com.cn
> From: Koji Kawamura
> Date: 2019-10-16 09:46
> To: users
> CC: dev
> Subject: Re: MergeRecord can not guarantee the ordering of the input sequence?
> Hi Lei,
> How about setting a FIFO prioritizer on all the preceding connections
> before the MergeRecord?
> Without setting any prioritizer, FlowFile ordering is nondeterministic.
> Thanks,
> Koji
> On Tue, Oct 15, 2019 at 8:56 PM wanglei2@geekplus.com.cn
> <wa...@geekplus.com.cn> wrote:
> >
> >
> > If FlowFiles A, B, and C enter MergeRecord sequentially, the output should be one FlowFile {A, B, C}.
> > However, when testing with a large data volume, the output order is sometimes not the same as the order in which they entered, and the result is nondeterministic.
> >
> > This really confuses me.
> > Does anybody have any insight on this?
> >
> > Thanks,
> > Lei
> >
> > ________________________________
> > wanglei2@geekplus.com.cn
Re: Re: MergeRecord can not guarantee the ordering of the input sequence?
Posted by Koji Kawamura <ij...@gmail.com>.
Hi Lei,
Does 'balance strategy' mean load balance strategy? Which strategy
are you using? I thought Prioritizers are applied on the destination
node after load balancing has transferred FlowFiles. Are those A, B
and C flow files generated on different nodes and sent to a single
node to merge them?
Thanks,
Koji
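The point about prioritizers being applied on the destination node can be illustrated with a toy model. It assumes, as a simplification and not as a description of NiFi's internals, that a FIFO prioritizer orders FlowFiles by when they arrived in the destination queue. `FifoQueue` and `deliver` are hypothetical names, and the arrival counter stands in for a timestamp.

```python
import heapq
import itertools


class FifoQueue:
    """Toy connection queue that releases items by arrival time at THIS queue."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # stand-in for an arrival timestamp

    def enqueue(self, name):
        # The priority is assigned on arrival; the source's order is unknown here.
        heapq.heappush(self._heap, (next(self._arrival), name))

    def drain(self):
        out = []
        while self._heap:
            out.append(heapq.heappop(self._heap)[1])
        return out


def deliver(emitted, transfer_order):
    """Enqueue items on the destination in the order the transfer delivered them."""
    queue = FifoQueue()
    for index in transfer_order:
        queue.enqueue(emitted[index])
    return queue.drain()
```

Under this model, if the source emits A, B, C but the transfer delivers them as A, C, B, the destination queue hands them on as A, C, B. FIFO preserves arrival order at the destination, not the order in which the source emitted them, so it cannot undo reordering that happened during the transfer.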
Re: Re: MergeRecord can not guarantee the ordering of the input sequence?
Posted by "wanglei2@geekplus.com.cn" <wa...@geekplus.com.cn>.
It seems this is caused by the load balance strategy in use.
Load balancing does not guarantee the order.
Thanks,
Lei
wanglei2@geekplus.com.cn
Re: Re: MergeRecord can not guarantee the ordering of the input sequence?
Posted by "wanglei2@geekplus.com.cn" <wa...@geekplus.com.cn>.
Hi Koji,
Actually, I have set all connections to FIFO and concurrent tasks to 1 for all processors.
I added a LogAttribute before and after the MergeRecord to debug.
Before MergeRecord, the order in the log file is A, B, C in three FlowFiles.
After MergeRecord, the order becomes {A, C, B} in one FlowFile.
This is nondeterministic.
I think I should look at the MergeRecord code and debug further.
Thanks,
Lei
wanglei2@geekplus.com.cn
Re: MergeRecord can not guarantee the ordering of the input sequence?
Posted by Koji Kawamura <ij...@gmail.com>.
Hi Lei,
How about setting a FIFO prioritizer on all the preceding connections
before the MergeRecord?
Without setting any prioritizer, FlowFile ordering is nondeterministic.
Thanks,
Koji