You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2019/10/17 21:43:30 UTC

[kudu-CR] [nvm cache] proper retry logic for AllocateAndRetry()

Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14498


Change subject: [nvm_cache] proper retry logic for AllocateAndRetry()
......................................................................

[nvm_cache] proper retry logic for AllocateAndRetry()

This patch addresses the issue with allocation retry logic
in NvmLRUCache::AllocateAndRetry(), properly deallocating
the memory before making another allocation attempt.

This is a follow-up to 0897b6b0e47a64c0cf30c956c020371258f2bacd.

Change-Id: I5671b2413160ca10b7e41c38d7e6164efdccfb8b
---
M src/kudu/util/nvm_cache.cc
1 file changed, 10 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/98/14498/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14498
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5671b2413160ca10b7e41c38d7e6164efdccfb8b
Gerrit-Change-Number: 14498
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] [nvm cache] remove allocation retry logic

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, ye yuqiang, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14498

to look at the new patch set (#2).

Change subject: [nvm_cache] remove allocation retry logic
......................................................................

[nvm_cache] remove allocation retry logic

This patch addresses the issue with allocation retry logic
in NvmLRUCache::AllocateAndRetry(), removing it altogether.

The idea of removing some entries from the cache and trying again
in case of allocation failure is:
  * not aligned with DRAM-based cache behavior
  * not robust enough

First, from the conceptual point, DRAM block cache doesn't retry
in such cases (i.e. in the sense that new[] may raise an exception and
crash Kudu if it runs out of memory).  Second, there is no guarantee
there would be some entries in the cache not referenced from anywhere
else.  Third, since the block cache performs clean-up of the cache
on its own by running a periodic grooming task, the value of such
de-allocation attempt is greatly diminished.

This is a follow-up to 0897b6b0e47a64c0cf30c956c020371258f2bacd.

Change-Id: I5671b2413160ca10b7e41c38d7e6164efdccfb8b
---
M src/kudu/util/nvm_cache.cc
1 file changed, 5 insertions(+), 40 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/98/14498/2
-- 
To view, visit http://gerrit.cloudera.org:8080/14498
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5671b2413160ca10b7e41c38d7e6164efdccfb8b
Gerrit-Change-Number: 14498
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: ye yuqiang <yu...@intel.com>

[kudu-CR] [nvm cache] proper retry logic for AllocateAndRetry()

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14498 )

Change subject: [nvm_cache] proper retry logic for AllocateAndRetry()
......................................................................


Patch Set 1:

Conceptually, why retry at all? The DRAM block cache doesn't retry (in the sense that new[] may raise an exception and crash Kudu if it runs out of memory). Is that morally equivalent? Or is the NVM case different?


-- 
To view, visit http://gerrit.cloudera.org:8080/14498
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5671b2413160ca10b7e41c38d7e6164efdccfb8b
Gerrit-Change-Number: 14498
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 17 Oct 2019 22:18:54 +0000
Gerrit-HasComments: No

[kudu-CR] [nvm cache] proper retry logic for AllocateAndRetry()

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14498 )

Change subject: [nvm_cache] proper retry logic for AllocateAndRetry()
......................................................................


Patch Set 1:

> Conceptually, why retry at all? The DRAM block cache doesn't retry
 > (in the sense that new[] may raise an exception and crash Kudu if
 > it runs out of memory). Is that morally equivalent? Or is the NVM
 > case different?

I completely agree.  In addition to the DRAM-related story you rightfully mentioned, I think the whole idea of removing those entries from the cache is not robust enough.  Even with the fixed logic of freeing some space in the cache (as in this patch), there is no guarantee there should would be some entries in the cache not referenced from anywhere else.  Also, since the block cache performs clean-up of the cache on its own (by running a periodic task), the value of such de-allocation attempt is diminishing.

OK, let me post a patch which removes the retrial logic at all.

Thank you!


-- 
To view, visit http://gerrit.cloudera.org:8080/14498
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5671b2413160ca10b7e41c38d7e6164efdccfb8b
Gerrit-Change-Number: 14498
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 17 Oct 2019 23:16:18 +0000
Gerrit-HasComments: No

[kudu-CR] [nvm cache] remove allocation retry logic

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14498 )

Change subject: [nvm_cache] remove allocation retry logic
......................................................................


Patch Set 2: Code-Review+2

Would be good to get feedback from Yuqiang though.


-- 
To view, visit http://gerrit.cloudera.org:8080/14498
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5671b2413160ca10b7e41c38d7e6164efdccfb8b
Gerrit-Change-Number: 14498
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: ye yuqiang <yu...@intel.com>
Gerrit-Comment-Date: Fri, 18 Oct 2019 00:04:35 +0000
Gerrit-HasComments: No

[kudu-CR] [nvm cache] remove allocation retry logic

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14498 )

Change subject: [nvm_cache] remove allocation retry logic
......................................................................


Patch Set 2:

> Patch Set 2:
> 
> The retry logic prevent from memkind/jemalloc allocation failure occasionally. It's better to keep these code there in case running to crash.

Could you describe this in more detail? Under what circumstances do allocation failures happen?

We're sniffing around this code because we saw that since the switch from libpmem to memkind, cfile-test has been crashing with a SIGSEGV in CacheMemoryTypes/TestCFileBothCacheMemoryTypes.TestReadWriteLargeStrings/1, which is the NVMCache parameterization of the test.


-- 
To view, visit http://gerrit.cloudera.org:8080/14498
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5671b2413160ca10b7e41c38d7e6164efdccfb8b
Gerrit-Change-Number: 14498
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: ye yuqiang <yu...@intel.com>
Gerrit-Comment-Date: Fri, 18 Oct 2019 03:10:21 +0000
Gerrit-HasComments: No

[kudu-CR] [nvm cache] remove allocation retry logic

Posted by "ye yuqiang (Code Review)" <ge...@cloudera.org>.
ye yuqiang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14498 )

Change subject: [nvm_cache] remove allocation retry logic
......................................................................


Patch Set 2:

The retry logic prevent from memkind/jemalloc allocation failure occasionally. It's better to keep these code there in case running to crash.


-- 
To view, visit http://gerrit.cloudera.org:8080/14498
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5671b2413160ca10b7e41c38d7e6164efdccfb8b
Gerrit-Change-Number: 14498
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: ye yuqiang <yu...@intel.com>
Gerrit-Comment-Date: Fri, 18 Oct 2019 02:57:23 +0000
Gerrit-HasComments: No

[kudu-CR] [nvm cache] remove allocation retry logic

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14498 )

Change subject: [nvm_cache] remove allocation retry logic
......................................................................

[nvm_cache] remove allocation retry logic

This patch addresses the issue with allocation retry logic
in NvmLRUCache::AllocateAndRetry(), removing it altogether.

The idea of removing some entries from the cache and trying again
in case of allocation failure is:
  * not aligned with DRAM-based cache behavior
  * not robust enough

First, from the conceptual point, DRAM block cache doesn't retry
in such cases (i.e. in the sense that new[] may raise an exception and
crash Kudu if it runs out of memory).  Second, there is no guarantee
there would be some entries in the cache not referenced from anywhere
else.  Third, since the block cache performs clean-up of the cache
on its own by running a periodic grooming task, the value of such
de-allocation attempt is greatly diminished.

This is a follow-up to 0897b6b0e47a64c0cf30c956c020371258f2bacd.

Change-Id: I5671b2413160ca10b7e41c38d7e6164efdccfb8b
Reviewed-on: http://gerrit.cloudera.org:8080/14498
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Kudu Jenkins
---
M src/kudu/util/nvm_cache.cc
1 file changed, 5 insertions(+), 40 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/14498
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I5671b2413160ca10b7e41c38d7e6164efdccfb8b
Gerrit-Change-Number: 14498
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: ye yuqiang <yu...@intel.com>

[kudu-CR] [nvm cache] remove allocation retry logic

Posted by "ye yuqiang (Code Review)" <ge...@cloudera.org>.
ye yuqiang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14498 )

Change subject: [nvm_cache] remove allocation retry logic
......................................................................


Patch Set 2:

> > Patch Set 2:
 > >
 > > The retry logic prevent from memkind/jemalloc allocation failure
 > occasionally. It's better to keep these code there in case running
 > to crash. 
 > 
 > Could you describe this in more detail? Under what circumstances do
 > allocation failures happen?
 > 
 > We're sniffing around this code because we saw that since the
 > switch from libpmem to memkind, cfile-test has been crashing with a
 > SIGSEGV in CacheMemoryTypes/TestCFileBothCacheMemoryTypes.TestReadWriteLargeStrings/1,
 > which is the NVMCache parameterization of the test.

The allocation failure may occur when eviction happens. This caused by fragmentation issue from memkind/jemalloc. The retry logic is target to solve this issue. But looks like it may not related to SIGSEGV in unit test. Another option is try to create memkind with conservative policy. https://github.com/memkind/memkind/blob/074e47e81dd825630fb099407366ddddd333ac93/examples/pmem_config.c#L75


-- 
To view, visit http://gerrit.cloudera.org:8080/14498
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5671b2413160ca10b7e41c38d7e6164efdccfb8b
Gerrit-Change-Number: 14498
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: ye yuqiang <yu...@intel.com>
Gerrit-Comment-Date: Fri, 18 Oct 2019 03:34:33 +0000
Gerrit-HasComments: No