Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/04/09 12:59:44 UTC

[GitHub] [spark] xuechendi commented on issue #24322: [SPARK-27412][Shuffle]Add a new shuffle manager: PmemShuffleManager

xuechendi commented on issue #24322: [SPARK-27412][Shuffle]Add a new shuffle manager: PmemShuffleManager
URL: https://github.com/apache/spark/pull/24322#issuecomment-481240897
 
 
   @attilapiros, thanks for the comment. I am proposing this idea to Spark and have also filed a JIRA here: https://issues.apache.org/jira/browse/SPARK-27412
   I opened this PR so that anyone interested can not only read the design doc but also check the code.
   
   As for your questions:
   1. My ultimate goal is to add an abstraction layer above DiskBlockObjectWriter, let's call it 'BlockObjectStream', that hides the file channel behind an InputStream and an OutputStream. That way, anyone who wants to implement a new storage backend for shuffle and the external sorter only needs to derive from 'BlockObjectStream' and implement an InputStream and an OutputStream for it, as PmemBlockObjectWriter does in my current code. So my current code is submitted more as a proposal.
   
   2. For the C++ code, thanks for the suggestion; I will split it out into a separate artifact later.
   
   3. If you want to give it a try, add the configuration below and run with Persistent Memory or with emulation (https://pmem.io/2016/02/22/pm-emulation.html).
   I will also add some shuffle manager tests to cover the PmemShuffle path; thanks for the suggestion.
   
   Key | Value | Description
   -- | -- | --
   spark.shuffle.manager | org.apache.spark.shuffle.pmem.PmemShuffleManager | Enable PmemShuffleManager
   spark.shuffle.spill.pmem.MemoryThreshold | 16777216 | When inMemoryData exceeds 16 MB, spill to pmem
   spark.shuffle.pmem.pmem_list | /dev/dax0.0,/dev/dax1.0 | Comma-separated list of pmem devices
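
For anyone who wants to pass the settings in the table above on the command line, here is a minimal sketch that turns them into `spark-submit --conf key=value` arguments. The keys and values are taken from the table; the class name `PmemShuffleConf` and its helper methods are hypothetical and only for illustration, and the pmem-specific keys are assumed to be recognized only by a Spark build that includes this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Spark or of this PR.
public class PmemShuffleConf {

    /** Returns the settings from the table, in the order they are listed. */
    static Map<String, String> settings() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.shuffle.manager",
                 "org.apache.spark.shuffle.pmem.PmemShuffleManager");
        // 16 MB spill threshold, expressed in bytes (16 * 1024 * 1024 = 16777216).
        conf.put("spark.shuffle.spill.pmem.MemoryThreshold",
                 String.valueOf(16L * 1024 * 1024));
        // Device paths are the examples from the table; adjust to the local setup.
        conf.put("spark.shuffle.pmem.pmem_list", "/dev/dax0.0,/dev/dax1.0");
        return conf;
    }

    /** Renders the settings as "--conf key=value" arguments for spark-submit. */
    static String toSubmitArgs(Map<String, String> conf) {
        StringJoiner args = new StringJoiner(" ");
        for (Map.Entry<String, String> e : conf.entrySet()) {
            args.add("--conf").add(e.getKey() + "=" + e.getValue());
        }
        return args.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSubmitArgs(settings()));
    }
}
```

The same three key/value pairs could equally be placed in `spark-defaults.conf`; the helper only saves retyping them.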
   
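
To make the 'BlockObjectStream' idea from point 1 more concrete, here is a rough sketch of what such a layer might look like. It is not the code in this PR: the method names `openForWrite` and `openForRead` are assumptions, and `InMemoryBlockObjectStream` is a hypothetical stand-in that keeps data on the heap where a real backend would talk to a file or a pmem device.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class BlockObjectStreamSketch {

    /** Hypothetical abstraction: a block is reachable only through streams. */
    static abstract class BlockObjectStream {
        abstract OutputStream openForWrite() throws IOException;
        abstract InputStream openForRead() throws IOException;
    }

    /** Stand-in backend that stores the block in a heap buffer. */
    static final class InMemoryBlockObjectStream extends BlockObjectStream {
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        @Override
        OutputStream openForWrite() {
            return buffer;
        }

        @Override
        InputStream openForRead() {
            return new ByteArrayInputStream(buffer.toByteArray());
        }
    }

    /** Caller code depends only on the abstraction, never on a file channel. */
    static String roundTrip(BlockObjectStream block, String record) throws IOException {
        try (OutputStream out = block.openForWrite()) {
            out.write(record.getBytes(StandardCharsets.UTF_8));
        }
        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        try (InputStream in = block.openForRead()) {
            byte[] chunk = new byte[4096];
            int n;
            // Read until the backend signals end of stream.
            while ((n = in.read(chunk)) != -1) {
                copy.write(chunk, 0, n);
            }
        }
        return new String(copy.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip(new InMemoryBlockObjectStream(), "key1,value1"));
    }
}
```

Since `roundTrip` only sees the abstract type, swapping in a different backend would mean writing one more subclass, which is the property point 1 is after.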

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org