You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2021/05/22 06:30:03 UTC

[kudu-CR] KUDU-3277 fix concurrent session flush issue in Java client

Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17486


Change subject: KUDU-3277 fix concurrent session flush issue in Java client
......................................................................

KUDU-3277 fix concurrent session flush issue in Java client

This patch fixes the issue reported by KUDU-3277.  The bug manifested
itself in rare cases when a session running in the AUTO_FLUSH_BACKGROUND
mode was either being flushed explicitly or implicitly (i.e. upon
closing the session) while the AUTO_FLUSH_BACKGROUND session's logic was
flushing its data buffers concurrently as well.

This patch also adds a reproduction scenario for KUDU-3277.  The newly
introduced test scenario was reliably failing before the fix:

  * 'java.lang.AssertionError: This Deferred was already called'
    messages were encountered in the log multiple times with the stack
    exactly as described in KUDU-3277

  * some flusher threads were unable to join since KuduSession.flush()
    would hang (i.e. would not return)

Change-Id: If6aaccc06abf1a2673620ab7c649f51f91999ad9
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
2 files changed, 163 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/86/17486/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17486
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If6aaccc06abf1a2673620ab7c649f51f91999ad9
Gerrit-Change-Number: 17486
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] KUDU-3277 fix concurrent session flush issue in Java client

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/17486 )

Change subject: KUDU-3277 fix concurrent session flush issue in Java client
......................................................................


Patch Set 2:

I can be convinced otherwise. My gut is that unless there is something that makes running under TSAN impossible/untenable, we should try to run the tests across all of the build types. Looking at the 4 examples you mention, that is why they 3/4 are disabled. TestReadWriteLargeStrings is the only one disabled due to running slowly in TSAN. I am okay with this approach if making the test run well under TSAN isn't feasible.


-- 
To view, visit http://gerrit.cloudera.org:8080/17486
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6aaccc06abf1a2673620ab7c649f51f91999ad9
Gerrit-Change-Number: 17486
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sun, 23 May 2021 22:03:05 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-3277 fix concurrent session flush issue in Java client

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17486 )

Change subject: KUDU-3277 fix concurrent session flush issue in Java client
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17486/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java:

http://gerrit.cloudera.org:8080/#/c/17486/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java@894
PS1, Line 894:       if (injectLatencyBufferFlushCb) {
             :         try {
             :           Thread.sleep(randomizer.nextInt(16));
             :         } catch (InterruptedException e) {
             :           Thread.currentThread().interrupt();
             :         }
             :       }
I would be happy to have this only in 'debug' mode, but I didn't find a way to do so in Java :(



-- 
To view, visit http://gerrit.cloudera.org:8080/17486
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6aaccc06abf1a2673620ab7c649f51f91999ad9
Gerrit-Change-Number: 17486
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 22 May 2021 06:33:30 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3277 fix concurrent session flush issue in Java client

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/17486 )

Change subject: KUDU-3277 fix concurrent session flush issue in Java client
......................................................................


Patch Set 2:

I am not sure I like the idea of skipping tests under certain test run types. Can we reduce the work the test is doing and still reliably reproduce so that the test doesn't timeout in tsan? Reducing numSessions and numRowsPerSession will likely help.


-- 
To view, visit http://gerrit.cloudera.org:8080/17486
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6aaccc06abf1a2673620ab7c649f51f91999ad9
Gerrit-Change-Number: 17486
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sun, 23 May 2021 16:59:27 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-3277 fix concurrent session flush issue in Java client

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed a vote on this change.

Change subject: KUDU-3277 fix concurrent session flush issue in Java client
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/17486
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: If6aaccc06abf1a2673620ab7c649f51f91999ad9
Gerrit-Change-Number: 17486
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-3277 fix concurrent session flush issue in Java client

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Grant Henke, Bankim Bhavsar, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17486

to look at the new patch set (#2).

Change subject: KUDU-3277 fix concurrent session flush issue in Java client
......................................................................

KUDU-3277 fix concurrent session flush issue in Java client

This patch fixes the issue reported by KUDU-3277.  The bug manifested
itself in rare cases when a session running in the AUTO_FLUSH_BACKGROUND
mode was either being flushed explicitly or implicitly (i.e. upon
closing the session) while the AUTO_FLUSH_BACKGROUND session's logic was
flushing its data buffers concurrently as well.

This patch also adds a reproduction scenario for KUDU-3277.  The newly
introduced test scenario was reliably failing before the fix:

  * 'java.lang.AssertionError: This Deferred was already called'
    messages were encountered in the log multiple times with the stack
    exactly as described in KUDU-3277

  * some flusher threads were unable to join since KuduSession.flush()
    would hang (i.e. would not return)

Change-Id: If6aaccc06abf1a2673620ab7c649f51f91999ad9
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
2 files changed, 170 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/86/17486/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17486
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If6aaccc06abf1a2673620ab7c649f51f91999ad9
Gerrit-Change-Number: 17486
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-3277 fix concurrent session flush issue in Java client

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17486 )

Change subject: KUDU-3277 fix concurrent session flush issue in Java client
......................................................................


Patch Set 1:

(3 comments)

> (3 comments)
 > 
 > Looks like the test failed in TSAN. The workload might be a little
 > too intense and the slower TSAN environment times out.
 > 1) testConcurrentFlush(org.apache.kudu.client.TestKuduClient)
 > org.junit.runners.model.TestTimedOutException: test timed out after
 > 100000 milliseconds
 > at sun.misc.Unsafe.park(Native Method)
 > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 > at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
 > at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
 > at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
 > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
 > at org.apache.kudu.client.TestKuduClient.testConcurrentFlush(TestKuduClient.java:1530)

Thank you for taking a look at that.  I added logic to skip the testConcurrentFlush scenario for sanitized builds (the new functionality depends on one extra patch: https://gerrit.cloudera.org/#/c/17487/)

http://gerrit.cloudera.org:8080/#/c/17486/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java:

http://gerrit.cloudera.org:8080/#/c/17486/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java@894
PS1, Line 894:       if (injectLatencyBufferFlushCb) {
             :         try {
             :           Thread.sleep(randomizer.nextInt(16));
             :         } catch (InterruptedException e) {
             :           Thread.currentThread().interrupt();
             :         }
             :       }
> Java does't really have multiple binary types. This approach makes sense.
Yep, it seems pre-processor could be useful here, but Java lacks one.


http://gerrit.cloudera.org:8080/#/c/17486/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java@1005
PS1, Line 1005:     injectLatencyBufferFlushCb = injectLatency;
> Maybe log a warning here so it's easy to tell from the logs that latency wi
Done


http://gerrit.cloudera.org:8080/#/c/17486/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java:

http://gerrit.cloudera.org:8080/#/c/17486/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java@1468
PS1, Line 1468: 501
> Is this specific value particularly significant? Looks like it's just over 
Ah, that's doesn't matter much -- it's a left-over from one of many iterations of this test.  Removed.



-- 
To view, visit http://gerrit.cloudera.org:8080/17486
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6aaccc06abf1a2673620ab7c649f51f91999ad9
Gerrit-Change-Number: 17486
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 22 May 2021 21:16:56 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3277 fix concurrent session flush issue in Java client

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/17486 )

Change subject: KUDU-3277 fix concurrent session flush issue in Java client
......................................................................


Patch Set 2: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17486
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6aaccc06abf1a2673620ab7c649f51f91999ad9
Gerrit-Change-Number: 17486
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sun, 23 May 2021 22:03:10 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-3277 fix concurrent session flush issue in Java client

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17486 )

Change subject: KUDU-3277 fix concurrent session flush issue in Java client
......................................................................


Patch Set 2: Verified+1

unrelated test failure: TxnCommitITest.TestRestartingWhileCommitting


-- 
To view, visit http://gerrit.cloudera.org:8080/17486
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6aaccc06abf1a2673620ab7c649f51f91999ad9
Gerrit-Change-Number: 17486
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 22 May 2021 22:29:11 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-3277 fix concurrent session flush issue in Java client

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17486 )

Change subject: KUDU-3277 fix concurrent session flush issue in Java client
......................................................................


Patch Set 2:

> I am not sure I like the idea of skipping tests under certain test
 > run types. Can we reduce the work the test is doing and still
 > reliably reproduce so that the test doesn't timeout in tsan?
 > Reducing numSessions and numRowsPerSession will likely help.

I didn't see firm reproduction of the issue if drastically reducing the number of sessions and rows written, and the scenario under TSAN is still flaky even if reducing the load to barely have the issue reproducing itself: http://dist-test.cloudera.org//job?job_id=aserbin.1621804245.42503#

What aspect you don't like particularly in skipping tests for ASAN/TSAN binaries?  We have multiple tests like that for the backend, where the whole test scenario is skipped:
  * TestCFileBothCacheMemoryTypes.TestReadWriteLargeStrings (skipped)
  * ThreadPoolTest.TestDeadlocks (skipped)
  * DebugUtilTest.TestUnwindWhileUnsafe (skipped)
  * TsRecoveryITestDeathTest.TestRecoverFromOpIdOverflow (skipped)

Also, about hundred of test scenarios has special settings for ASAN/TSAN builds. 

In this context we are not testing the back-end as is, so what's wrong with skipping the test if running against binaries built with ASAN/TSAN?


-- 
To view, visit http://gerrit.cloudera.org:8080/17486
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6aaccc06abf1a2673620ab7c649f51f91999ad9
Gerrit-Change-Number: 17486
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sun, 23 May 2021 21:39:42 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-3277 fix concurrent session flush issue in Java client

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17486 )

Change subject: KUDU-3277 fix concurrent session flush issue in Java client
......................................................................

KUDU-3277 fix concurrent session flush issue in Java client

This patch fixes the issue reported by KUDU-3277.  The bug manifested
itself in rare cases when a session running in the AUTO_FLUSH_BACKGROUND
mode was either being flushed explicitly or implicitly (i.e. upon
closing the session) while the AUTO_FLUSH_BACKGROUND session's logic was
flushing its data buffers concurrently as well.

This patch also adds a reproduction scenario for KUDU-3277.  The newly
introduced test scenario was reliably failing before the fix:

  * 'java.lang.AssertionError: This Deferred was already called'
    messages were encountered in the log multiple times with the stack
    exactly as described in KUDU-3277

  * some flusher threads were unable to join since KuduSession.flush()
    would hang (i.e. would not return)

Change-Id: If6aaccc06abf1a2673620ab7c649f51f91999ad9
Reviewed-on: http://gerrit.cloudera.org:8080/17486
Tested-by: Alexey Serbin <as...@cloudera.com>
Reviewed-by: Grant Henke <gr...@apache.org>
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
2 files changed, 170 insertions(+), 8 deletions(-)

Approvals:
  Alexey Serbin: Verified
  Grant Henke: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/17486
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: If6aaccc06abf1a2673620ab7c649f51f91999ad9
Gerrit-Change-Number: 17486
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-3277 fix concurrent session flush issue in Java client

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/17486 )

Change subject: KUDU-3277 fix concurrent session flush issue in Java client
......................................................................


Patch Set 1:

(3 comments)

Looks like the test failed in TSAN. The workload might be a little too intense and the slower TSAN environment times out. 
1) testConcurrentFlush(org.apache.kudu.client.TestKuduClient)
org.junit.runners.model.TestTimedOutException: test timed out after 100000 milliseconds
	at sun.misc.Unsafe.park(Native Method)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
	at org.apache.kudu.client.TestKuduClient.testConcurrentFlush(TestKuduClient.java:1530)

http://gerrit.cloudera.org:8080/#/c/17486/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java:

http://gerrit.cloudera.org:8080/#/c/17486/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java@894
PS1, Line 894:       if (injectLatencyBufferFlushCb) {
             :         try {
             :           Thread.sleep(randomizer.nextInt(16));
             :         } catch (InterruptedException e) {
             :           Thread.currentThread().interrupt();
             :         }
             :       }
> I would be happy to have this only in 'debug' mode, but I didn't find a way
Java does't really have multiple binary types. This approach makes sense.


http://gerrit.cloudera.org:8080/#/c/17486/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java@1005
PS1, Line 1005:     injectLatencyBufferFlushCb = injectLatency;
Maybe log a warning here so it's easy to tell from the logs that latency will be injected just in case.


http://gerrit.cloudera.org:8080/#/c/17486/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java:

http://gerrit.cloudera.org:8080/#/c/17486/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java@1468
PS1, Line 1468: 501
Is this specific value particularly significant? Looks like it's just over half the default of 1000.



-- 
To view, visit http://gerrit.cloudera.org:8080/17486
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6aaccc06abf1a2673620ab7c649f51f91999ad9
Gerrit-Change-Number: 17486
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 22 May 2021 13:35:51 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3277 fix concurrent session flush issue in Java client

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17486 )

Change subject: KUDU-3277 fix concurrent session flush issue in Java client
......................................................................


Patch Set 2:

> I can be convinced otherwise. My gut is that unless there is
 > something that makes running under TSAN impossible/untenable, we
 > should try to run the tests across all of the build types. Looking
 > at the 4 examples you mention, that is why they 3/4 are disabled.
 > TestReadWriteLargeStrings is the only one disabled due to running
 > slowly in TSAN. I am okay with this approach if making the test run
 > well under TSAN isn't feasible.

I tried to make it work under TSAN: 32 sessions, 5000 rows per session (vs original 50 and 10000 rows).  Unfortunately, in that case the issue isn't reliably reproducible under any build type, but for TSAN it's still 15% flaky :(  I'm striving to have reproduction rate close to 100% here.

I'd be happy to have a way to resolve this without introducing TSAN-specific nonsense, but unfortunately I didn't find a way to do so, at least this time.

I guess we could remove that newly introduced TSAN-related nonsense one day once we have a firm repro.  Maybe, that means we need to tweak the RPC timeout and make it longer?  We will see.

Thank you very much for the review!


-- 
To view, visit http://gerrit.cloudera.org:8080/17486
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6aaccc06abf1a2673620ab7c649f51f91999ad9
Gerrit-Change-Number: 17486
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sun, 23 May 2021 22:20:06 +0000
Gerrit-HasComments: No