Posted to users@nifi.apache.org by "wanglei2@geekplus.com.cn" <wa...@geekplus.com.cn> on 2019/10/15 11:56:15 UTC

MergeRecord can not guarantee the ordering of the input sequence?

If FlowFiles A, B, and C enter MergeRecord sequentially, the output should be one FlowFile {A, B, C}.
However, when testing with a large data volume, the output order sometimes differs from the order in which they entered, and the result is nondeterministic.

This really confuses me.
Does anybody have any insight on this?

Thanks,
Lei



wanglei2@geekplus.com.cn

Re: Re: MergeRecord can not guarantee the ordering of the input sequence?

Posted by "wanglei2@geekplus.com.cn" <wa...@geekplus.com.cn>.
Hi Koji, 

My test is as follows.
ProcessorA is scheduled only on the primary node, with a single concurrent task.
The output of ProcessorA is load balanced to ProcessorB. The load balance strategy is by attribute. All output FlowFiles of ProcessorA have the same value for the attribute used for balancing, so all FlowFiles are sent to the same node.
The order in which ProcessorB receives the FlowFiles is often not the same as the order in which ProcessorA emitted them, and the order is nondeterministic.

Thanks,
Lei



wanglei2@geekplus.com.cn
 

Re: Re: MergeRecord can not guarantee the ordering of the input sequence?

Posted by Koji Kawamura <ij...@gmail.com>.
Hi Lei,

Does 'balance strategy' mean the load balance strategy? Which strategy
are you using? I thought prioritizers are applied on the destination
node after load balancing has transferred the FlowFiles. Are FlowFiles
A, B, and C generated on different nodes and sent to a single node to
be merged?

Thanks,
Koji


Re: Re: MergeRecord can not guarantee the ordering of the input sequence?

Posted by "wanglei2@geekplus.com.cn" <wa...@geekplus.com.cn>.
It seems to be caused by the load balance strategy that is used.
Load balancing does not guarantee the order.

Thanks,
Lei



wanglei2@geekplus.com.cn
 

Re: Re: MergeRecord can not guarantee the ordering of the input sequence?

Posted by "wanglei2@geekplus.com.cn" <wa...@geekplus.com.cn>.
Hi Koji, 
Actually, I have set all connections to FIFO and set concurrent tasks to 1 for all processors.
I added a LogAttribute before and after MergeRecord to debug.

Before MergeRecord, the order in the log file is A, B, C, in three FlowFiles.
After MergeRecord, the order becomes {A, C, B} in one FlowFile.
This is nondeterministic.

I think I should look at the MergeRecord code and debug further.

Thanks, 
Lei




wanglei2@geekplus.com.cn
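
The before/after comparison described above can be automated once the two orders have been extracted from the logs. The following is only a minimal sketch and is not part of NiFi: the function name `first_out_of_order` and the two sample lists are hypothetical, and the values are assumed to have been copied by hand from the two LogAttribute outputs.

```python
from typing import Sequence


def first_out_of_order(expected: Sequence[str], actual: Sequence[str]) -> int:
    """Return the index of the first position where `actual` departs from
    `expected`, or -1 if the two sequences match exactly."""
    for i, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            return i
    # One sequence may be a strict prefix of the other.
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return -1


# Hypothetical values, standing in for what the two LogAttribute
# processors reported before and after MergeRecord.
before_merge = ["A", "B", "C"]
after_merge = ["A", "C", "B"]

print(first_out_of_order(before_merge, after_merge))  # prints 1
```

Running the check over many merged outputs would show how often, and at which position, the order first diverges.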
 


Re: MergeRecord can not guarantee the ordering of the input sequence?

Posted by Koji Kawamura <ij...@gmail.com>.
Hi Lei,

How about setting the FIFO prioritizer on all the connections that
precede MergeRecord?
Without any prioritizer set, FlowFile ordering is nondeterministic.

Thanks,
Koji
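
To illustrate the idea behind the suggestion above, here is a toy model of a connection queue. It is not NiFi's implementation: the `QueuedFlowFile` class, its `queued_at` field, and the `fifo_order` function are all hypothetical names, and the sketch simply assumes that each queued FlowFile carries the time at which it entered the queue.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QueuedFlowFile:
    name: str
    queued_at: int  # hypothetical enqueue timestamp (assumption, not a NiFi field)


def fifo_order(queue: List[QueuedFlowFile]) -> List[QueuedFlowFile]:
    """Return the queue contents ordered by enqueue time, oldest first."""
    return sorted(queue, key=lambda f: f.queued_at)


arrived = [QueuedFlowFile("A", 1), QueuedFlowFile("B", 2), QueuedFlowFile("C", 3)]

# Without any ordering rule, the queue may hand FlowFiles out in any order.
unordered = [arrived[0], arrived[2], arrived[1]]

print([f.name for f in fifo_order(unordered)])  # prints ['A', 'B', 'C']
```

In this model, ordering is only restored within a single queue, which is why the suggestion applies to every connection upstream of the merge.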

