You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Yuqi Du (Code Review)" <ge...@cloudera.org> on 2023/01/31 08:30:02 UTC

[kudu-CR] [tserver] keep the order of appling ops the same as raft wal

Yuqi Du has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19463


Change subject: [tserver] keep the order of appling ops the same as raft wal
......................................................................

[tserver] keep the order of appling ops the same as raft wal

The order of tablet's write ops(including other raft logs) is an total order
in server-side. Apply them into kudu engine should be the same as raft wal.
But now apply ops by 'apply_pool->Submit' that's a concurrent threadpool,
may cause some ops out of order. For example, 4 logs of 2 ops, we expected:
    replicate 1.1
    commit 1.1
    replicate 1.2
    commit 1.2
or
    replicate 1.1
    replicate 1.2
    commit 1.1
    commit 1.2

A incorrect order is:
    replicate 1.1
    replicate 1.2
    commit 1.2
    commit 1.1

That means 1.2 would valid before 1.1  in kudu engine, that's not our expected.
'apply_pool->Submit' would cause the incorrect order.

This patch try to fix the problem by using apply_pool_token with SERIAL_MODE
created by apply_pool.

Change-Id: Id040122ab9473d1a8b03a2a1e751454a32b89d87
---
M src/kudu/tablet/ops/op_driver.cc
M src/kudu/tablet/ops/op_driver.h
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
4 files changed, 50 insertions(+), 10 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/19463/1
-- 
To view, visit http://gerrit.cloudera.org:8080/19463
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id040122ab9473d1a8b03a2a1e751454a32b89d87
Gerrit-Change-Number: 19463
Gerrit-PatchSet: 1
Gerrit-Owner: Yuqi Du <sh...@gmail.com>

[kudu-CR] [tserver] keep the order of appling ops the same as raft wal

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/19463 )

Change subject: [tserver] keep the order of appling ops the same as raft wal
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19463/1//COMMIT_MSG
Commit Message:

PS1: 
What exactly the problem with reordering REPLICATE and their COMMIT counterparts like in the example you mentioned?  Does it lead to any consistency issue or what?

IIUC, COMMIT operations are just some synthetic operations, so as far as REPLICATEs are in correct order, I don't see any problem with that.  Maybe, I'm missing something here.  If so, it would be great if you could clarify on that.

Thanks!



-- 
To view, visit http://gerrit.cloudera.org:8080/19463
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id040122ab9473d1a8b03a2a1e751454a32b89d87
Gerrit-Change-Number: 19463
Gerrit-PatchSet: 1
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Comment-Date: Thu, 02 Feb 2023 16:17:05 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tserver] Keep the order of appling the ops same as raft WAL

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Ashwani Raina, Yingchun Lai, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19463

to look at the new patch set (#2).

Change subject: [tserver] Keep the order of appling the ops same as raft WAL
......................................................................

[tserver] Keep the order of appling the ops same as raft WAL

The order of tablet's write ops(including other raft logs) is in total order
on server-side. Appling them into kudu engine should be same as raft wal.
Currently, appling the ops using 'apply_pool->Submit'(i.e concurrent thread-pool),
may cause some ops to go out of order.

For example, 4 logs of 2 ops, we expected:
    replicate 1.1
    commit 1.1
    replicate 1.2
    commit 1.2
or
    replicate 1.1
    replicate 1.2
    commit 1.1
    commit 1.2

A incorrect order is:
    replicate 1.1
    replicate 1.2
    commit 1.2
    commit 1.1

That means 1.2 would become valid before 1.1 in kudu engine, that's not expected.
'apply_pool->Submit' cauld cause the incorrect order.

This patch try to fix the problem by using apply_pool_token with SERIAL_MODE
created by apply_pool.

Change-Id: Id040122ab9473d1a8b03a2a1e751454a32b89d87
---
M src/kudu/tablet/ops/op_driver.cc
M src/kudu/tablet/ops/op_driver.h
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
4 files changed, 50 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/19463/2
-- 
To view, visit http://gerrit.cloudera.org:8080/19463
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id040122ab9473d1a8b03a2a1e751454a32b89d87
Gerrit-Change-Number: 19463
Gerrit-PatchSet: 2
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [tserver] KUDU-3446 Keep the order of appling the ops same as raft WAL

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Yingchun Lai, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19463

to look at the new patch set (#6).

Change subject: [tserver] KUDU-3446 Keep the order of appling the ops same as raft WAL
......................................................................

[tserver] KUDU-3446 Keep the order of appling the ops same as raft WAL

The order of tablet's write ops(including other raft logs) is in total order
on server-side. Appling them into kudu engine should be same as raft wal.
Currently, appling the ops using 'apply_pool->Submit'(i.e concurrent thread-pool),
may cause some ops to go out of order.

For example, 4 logs of 2 ops, we expected:
    replicate 1.1
    commit 1.1
    replicate 1.2
    commit 1.2
or
    replicate 1.1
    replicate 1.2
    commit 1.1
    commit 1.2

A incorrect order is:
    replicate 1.1
    replicate 1.2
    commit 1.2
    commit 1.1

That means 1.2 would become valid before 1.1 in kudu engine, that's not expected.
'apply_pool->Submit' cauld cause the incorrect order. More details can be found:
https://issues.apache.org/jira/browse/KUDU-3446.

A test added for checking out of order of commit messages, this can help
us disscuss about the problem KUDU-3446:
https://issues.apache.org/jira/browse/KUDU-3446. I think
kudu should Keep the order of appling the ops same as raft WAL.

This patch try to fix the problem by using apply_pool_token with SERIAL_MODE
created by apply_pool.

Change-Id: Id040122ab9473d1a8b03a2a1e751454a32b89d87
---
M src/kudu/tablet/ops/op_driver.cc
M src/kudu/tablet/ops/op_driver.h
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica-test-base.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
7 files changed, 143 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/19463/6
-- 
To view, visit http://gerrit.cloudera.org:8080/19463
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id040122ab9473d1a8b03a2a1e751454a32b89d87
Gerrit-Change-Number: 19463
Gerrit-PatchSet: 6
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [tserver] keep the order of appling ops the same as raft wal

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/19463 )

Change subject: [tserver] keep the order of appling ops the same as raft wal
......................................................................


Patch Set 1:

Could you add some tests?


-- 
To view, visit http://gerrit.cloudera.org:8080/19463
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id040122ab9473d1a8b03a2a1e751454a32b89d87
Gerrit-Change-Number: 19463
Gerrit-PatchSet: 1
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Comment-Date: Thu, 02 Feb 2023 16:04:11 +0000
Gerrit-HasComments: No

[kudu-CR] [tserver] KUDU-3446 Keep the order of appling the ops same as raft WAL

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Yingchun Lai, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19463

to look at the new patch set (#5).

Change subject: [tserver] KUDU-3446 Keep the order of appling the ops same as raft WAL
......................................................................

[tserver] KUDU-3446 Keep the order of appling the ops same as raft WAL

The order of tablet's write ops(including other raft logs) is in total order
on server-side. Appling them into kudu engine should be same as raft wal.
Currently, appling the ops using 'apply_pool->Submit'(i.e concurrent thread-pool),
may cause some ops to go out of order.

For example, 4 logs of 2 ops, we expected:
    replicate 1.1
    commit 1.1
    replicate 1.2
    commit 1.2
or
    replicate 1.1
    replicate 1.2
    commit 1.1
    commit 1.2

A incorrect order is:
    replicate 1.1
    replicate 1.2
    commit 1.2
    commit 1.1

That means 1.2 would become valid before 1.1 in kudu engine, that's not expected.
'apply_pool->Submit' cauld cause the incorrect order. More details can be found:
https://issues.apache.org/jira/browse/KUDU-3446.

A test added for checking out of order of commit messages, this can help
us disscuss about the problem KUDU-3446:
https://issues.apache.org/jira/browse/KUDU-3446. I think
kudu should Keep the order of appling the ops same as raft WAL.

This patch try to fix the problem by using apply_pool_token with SERIAL_MODE
created by apply_pool.

Change-Id: Id040122ab9473d1a8b03a2a1e751454a32b89d87
---
M src/kudu/tablet/ops/op_driver.cc
M src/kudu/tablet/ops/op_driver.h
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica-test-base.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
7 files changed, 143 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/19463/5
-- 
To view, visit http://gerrit.cloudera.org:8080/19463
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id040122ab9473d1a8b03a2a1e751454a32b89d87
Gerrit-Change-Number: 19463
Gerrit-PatchSet: 5
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [tserver] keep the order of appling ops the same as raft wal

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/19463 )

Change subject: [tserver] keep the order of appling ops the same as raft wal
......................................................................


Patch Set 1:

(10 comments)

> Patch Set 1:
> 
> (11 comments)
> 
> Is this specific issue of out-of-order ops seen in any environment?

Yes.
The environment can get easily if provide enough qps of WriteRequest, I'll provide a test for this.

> It would really help if you could also capture one liners about following points:
> 1. How can out-of-order ops happen given the current concurrent thread-pool implementation?

I have described that, and it's too simple. I will provide a method of reproducing the case later and add a new jira issue. 

> 2. What would be the repercussions if we have these out-of-order ops? Any corruption?

I think, kudu currently has ignored the problem or think it not a problem. I
don't think that is not correct, we should talk about that.

> 3. Any performance impact due to this serialisation?
Yes, I will take a little cost about performance, but I think it's will be no confluence about that. The real performance impact, we should show the experiment metrics.

http://gerrit.cloudera.org:8080/#/c/19463/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19463/1//COMMIT_MSG@7
PS1, Line 7: keep the order of appling ops the same as raft wal
> nit: Keep the order of applying the ops same as raft WAL.
Done


http://gerrit.cloudera.org:8080/#/c/19463/1//COMMIT_MSG@10
PS1, Line 10: be the same
> nit: be same
Done


http://gerrit.cloudera.org:8080/#/c/19463/1//COMMIT_MSG@10
PS1, Line 10: Apply
> nit: Applying
Done


http://gerrit.cloudera.org:8080/#/c/19463/1//COMMIT_MSG@9
PS1, Line 9: is an total order
           : in server-side
> nit: is in total order on server side
Done


http://gerrit.cloudera.org:8080/#/c/19463/1//COMMIT_MSG@11
PS1, Line 11: But now apply ops by 'apply_pool->Submit' that's a concurrent threadpool,
> nit: Currently, applying ops using "apply_pool_->Submit" (i.e. concurrent t
Done


http://gerrit.cloudera.org:8080/#/c/19463/1//COMMIT_MSG@12
PS1, Line 12: some ops out
> nit: some ops to go out
Done


http://gerrit.cloudera.org:8080/#/c/19463/1//COMMIT_MSG@29
PS1, Line 29: would valid 
> nit: would become valid
Done


http://gerrit.cloudera.org:8080/#/c/19463/1//COMMIT_MSG@29
PS1, Line 29:   
> nit: extra whitespace
Done


http://gerrit.cloudera.org:8080/#/c/19463/1//COMMIT_MSG@29
PS1, Line 29: not our expected
> nit: not expected
Done


http://gerrit.cloudera.org:8080/#/c/19463/1//COMMIT_MSG@30
PS1, Line 30: would
> nit: could
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/19463
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id040122ab9473d1a8b03a2a1e751454a32b89d87
Gerrit-Change-Number: 19463
Gerrit-PatchSet: 1
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Fri, 03 Feb 2023 12:05:46 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tserver] KUDU-3446 Keep the order of appling the ops same as raft WAL

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/19463 )

Change subject: [tserver] KUDU-3446 Keep the order of appling the ops same as raft WAL
......................................................................


Patch Set 3:

(2 comments)

Your advices are very helpful. Thank you.

http://gerrit.cloudera.org:8080/#/c/19463/1//COMMIT_MSG
Commit Message:

PS1: 
> What exactly the problem with reordering REPLICATE and their COMMIT counter
I am sorry, the commit message is too simple. Detail information can be found at the new jira: https://issues.apache.org/jira/browse/KUDU-3446.

We can talk about it.


http://gerrit.cloudera.org:8080/#/c/19463/1/src/kudu/tablet/tablet_replica.h
File src/kudu/tablet/tablet_replica.h:

http://gerrit.cloudera.org:8080/#/c/19463/1/src/kudu/tablet/tablet_replica.h@604
PS1, Line 604: seria
> nit: Serial
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/19463
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id040122ab9473d1a8b03a2a1e751454a32b89d87
Gerrit-Change-Number: 19463
Gerrit-PatchSet: 3
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Thu, 09 Feb 2023 08:10:25 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tserver] KUDU-3446 Keep the order of appling the ops same as raft WAL

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Ashwani Raina, Yingchun Lai, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19463

to look at the new patch set (#4).

Change subject: [tserver] KUDU-3446 Keep the order of appling the ops same as raft WAL
......................................................................

[tserver] KUDU-3446 Keep the order of appling the ops same as raft WAL

The order of tablet's write ops(including other raft logs) is in total order
on server-side. Appling them into kudu engine should be same as raft wal.
Currently, appling the ops using 'apply_pool->Submit'(i.e concurrent thread-pool),
may cause some ops to go out of order.

For example, 4 logs of 2 ops, we expected:
    replicate 1.1
    commit 1.1
    replicate 1.2
    commit 1.2
or
    replicate 1.1
    replicate 1.2
    commit 1.1
    commit 1.2

A incorrect order is:
    replicate 1.1
    replicate 1.2
    commit 1.2
    commit 1.1

That means 1.2 would become valid before 1.1 in kudu engine, that's not expected.
'apply_pool->Submit' cauld cause the incorrect order. More details can be found:
https://issues.apache.org/jira/browse/KUDU-3446.

A test added for checking out of order of commit messages, this can help
us disscuss about the problem KUDU-3446:
https://issues.apache.org/jira/browse/KUDU-3446. I think
kudu should Keep the order of appling the ops same as raft WAL.

This patch try to fix the problem by using apply_pool_token with SERIAL_MODE
created by apply_pool.

Change-Id: Id040122ab9473d1a8b03a2a1e751454a32b89d87
---
M src/kudu/tablet/ops/op_driver.cc
M src/kudu/tablet/ops/op_driver.h
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica-test-base.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
7 files changed, 153 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/19463/4
-- 
To view, visit http://gerrit.cloudera.org:8080/19463
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id040122ab9473d1a8b03a2a1e751454a32b89d87
Gerrit-Change-Number: 19463
Gerrit-PatchSet: 4
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [tserver] KUDU-3446 Keep the order of appling the ops same as raft WAL

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Yingchun Lai, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19463

to look at the new patch set (#8).

Change subject: [tserver] KUDU-3446 Keep the order of appling the ops same as raft WAL
......................................................................

[tserver] KUDU-3446 Keep the order of appling the ops same as raft WAL

The order of tablet's write ops(including other raft logs) is in total order
on server-side. Appling them into kudu engine should be same as raft wal.
Currently, appling the ops using 'apply_pool->Submit'(i.e concurrent thread-pool),
may cause some ops to go out of order.

For example, 4 logs of 2 ops, we expected:
    replicate 1.1
    commit 1.1
    replicate 1.2
    commit 1.2
or
    replicate 1.1
    replicate 1.2
    commit 1.1
    commit 1.2

A incorrect order is:
    replicate 1.1
    replicate 1.2
    commit 1.2
    commit 1.1

That means 1.2 would become valid before 1.1 in kudu engine, that's not expected.
'apply_pool->Submit' cauld cause the incorrect order. More details can be found:
https://issues.apache.org/jira/browse/KUDU-3446.

A test added for checking out of order of commit messages, this can help
us disscuss about the problem KUDU-3446:
https://issues.apache.org/jira/browse/KUDU-3446. I think
kudu should Keep the order of appling the ops same as raft WAL.

This patch try to fix the problem by using apply_pool_token with SERIAL_MODE
created by apply_pool. This patch add two test cases:
1. A case reproduce the commit message orders what I said.
2. A test case check external consistency about causual consistency.
The two cases would be failed if using 'apply_pool->Submit()'.

Change-Id: Id040122ab9473d1a8b03a2a1e751454a32b89d87
---
M src/kudu/tablet/ops/op_driver.cc
M src/kudu/tablet/ops/op_driver.h
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica-test-base.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
7 files changed, 296 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/19463/8
-- 
To view, visit http://gerrit.cloudera.org:8080/19463
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id040122ab9473d1a8b03a2a1e751454a32b89d87
Gerrit-Change-Number: 19463
Gerrit-PatchSet: 8
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [tserver] KUDU-3446 Keep the order of appling the ops same as raft WAL

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Ashwani Raina, Yingchun Lai, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19463

to look at the new patch set (#3).

Change subject: [tserver] KUDU-3446 Keep the order of appling the ops same as raft WAL
......................................................................

[tserver] KUDU-3446 Keep the order of appling the ops same as raft WAL

The order of tablet's write ops(including other raft logs) is in total order
on server-side. Appling them into kudu engine should be same as raft wal.
Currently, appling the ops using 'apply_pool->Submit'(i.e concurrent thread-pool),
may cause some ops to go out of order.

For example, 4 logs of 2 ops, we expected:
    replicate 1.1
    commit 1.1
    replicate 1.2
    commit 1.2
or
    replicate 1.1
    replicate 1.2
    commit 1.1
    commit 1.2

A incorrect order is:
    replicate 1.1
    replicate 1.2
    commit 1.2
    commit 1.1

That means 1.2 would become valid before 1.1 in kudu engine, that's not expected.
'apply_pool->Submit' cauld cause the incorrect order. More details can be found:
https://issues.apache.org/jira/browse/KUDU-3446.

This patch try to fix the problem by using apply_pool_token with SERIAL_MODE
created by apply_pool.

Change-Id: Id040122ab9473d1a8b03a2a1e751454a32b89d87
---
M src/kudu/tablet/ops/op_driver.cc
M src/kudu/tablet/ops/op_driver.h
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
4 files changed, 50 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/19463/3
-- 
To view, visit http://gerrit.cloudera.org:8080/19463
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id040122ab9473d1a8b03a2a1e751454a32b89d87
Gerrit-Change-Number: 19463
Gerrit-PatchSet: 3
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [tserver] keep the order of appling ops the same as raft wal

Posted by "Ashwani Raina (Code Review)" <ge...@cloudera.org>.
Ashwani Raina has posted comments on this change. ( http://gerrit.cloudera.org:8080/19463 )

Change subject: [tserver] keep the order of appling ops the same as raft wal
......................................................................


Patch Set 1:

(11 comments)

Is this specific issue of out-of-order ops seen in any environment?
It would really help if you could also capture one liners about following points:
1. How can out-of-order ops happen given the current concurrent thread-pool implementation?
2. What would be the repercussions if we have these out-of-order ops? Any corruption?
3. Any performance impact due to this serialisation?

http://gerrit.cloudera.org:8080/#/c/19463/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19463/1//COMMIT_MSG@7
PS1, Line 7: keep the order of appling ops the same as raft wal
nit: Keep the order of applying the ops same as raft WAL.


http://gerrit.cloudera.org:8080/#/c/19463/1//COMMIT_MSG@10
PS1, Line 10: Apply
nit: Applying


http://gerrit.cloudera.org:8080/#/c/19463/1//COMMIT_MSG@10
PS1, Line 10: be the same
nit: be same


http://gerrit.cloudera.org:8080/#/c/19463/1//COMMIT_MSG@9
PS1, Line 9: is an total order
           : in server-side
nit: is in total order on server side


http://gerrit.cloudera.org:8080/#/c/19463/1//COMMIT_MSG@11
PS1, Line 11: But now apply ops by 'apply_pool->Submit' that's a concurrent threadpool,
nit: Currently, applying ops using "apply_pool_->Submit" (i.e. concurrent thread-pool)


http://gerrit.cloudera.org:8080/#/c/19463/1//COMMIT_MSG@12
PS1, Line 12: some ops out
nit: some ops to go out


http://gerrit.cloudera.org:8080/#/c/19463/1//COMMIT_MSG@29
PS1, Line 29: would valid 
nit: would become valid


http://gerrit.cloudera.org:8080/#/c/19463/1//COMMIT_MSG@29
PS1, Line 29:   
nit: extra whitespace


http://gerrit.cloudera.org:8080/#/c/19463/1//COMMIT_MSG@29
PS1, Line 29: not our expected
nit: not expected


http://gerrit.cloudera.org:8080/#/c/19463/1//COMMIT_MSG@30
PS1, Line 30: would
nit: could


http://gerrit.cloudera.org:8080/#/c/19463/1/src/kudu/tablet/tablet_replica.h
File src/kudu/tablet/tablet_replica.h:

http://gerrit.cloudera.org:8080/#/c/19463/1/src/kudu/tablet/tablet_replica.h@604
PS1, Line 604: seral
nit: Serial



-- 
To view, visit http://gerrit.cloudera.org:8080/19463
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id040122ab9473d1a8b03a2a1e751454a32b89d87
Gerrit-Change-Number: 19463
Gerrit-PatchSet: 1
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 31 Jan 2023 13:27:02 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tserver] KUDU-3446 Keep the order of appling the ops same as raft WAL

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Yingchun Lai, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19463

to look at the new patch set (#7).

Change subject: [tserver] KUDU-3446 Keep the order of appling the ops same as raft WAL
......................................................................

[tserver] KUDU-3446 Keep the order of appling the ops same as raft WAL

The order of tablet's write ops(including other raft logs) is in total order
on server-side. Appling them into kudu engine should be same as raft wal.
Currently, appling the ops using 'apply_pool->Submit'(i.e concurrent thread-pool),
may cause some ops to go out of order.

For example, 4 logs of 2 ops, we expected:
    replicate 1.1
    commit 1.1
    replicate 1.2
    commit 1.2
or
    replicate 1.1
    replicate 1.2
    commit 1.1
    commit 1.2

A incorrect order is:
    replicate 1.1
    replicate 1.2
    commit 1.2
    commit 1.1

That means 1.2 would become valid before 1.1 in kudu engine, that's not expected.
'apply_pool->Submit' cauld cause the incorrect order. More details can be found:
https://issues.apache.org/jira/browse/KUDU-3446.

A test added for checking out of order of commit messages, this can help
us disscuss about the problem KUDU-3446:
https://issues.apache.org/jira/browse/KUDU-3446. I think
kudu should Keep the order of appling the ops same as raft WAL.

This patch try to fix the problem by using apply_pool_token with SERIAL_MODE
created by apply_pool. This patch add two test cases:
1. A case reproduce the commit message orders what I said.
2. A test case check external consistency about causual consistency.
The two cases would be failed if using 'apply_pool->Submit()'.

Change-Id: Id040122ab9473d1a8b03a2a1e751454a32b89d87
---
M src/kudu/tablet/ops/op_driver.cc
M src/kudu/tablet/ops/op_driver.h
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica-test-base.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
7 files changed, 298 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/19463/7
-- 
To view, visit http://gerrit.cloudera.org:8080/19463
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id040122ab9473d1a8b03a2a1e751454a32b89d87
Gerrit-Change-Number: 19463
Gerrit-PatchSet: 7
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>