Posted to dev@crail.apache.org by David Crespi <da...@storedgesystems.com> on 2019/07/09 01:05:37 UTC
Clarifying questions...
Hi,
I wanted to ask whether there is a way of using a local SSD via the RdmaStorageTier, so a couple of questions.
From the blog example there were these three storage classes:
crail@clustermaster:~$ cat $CRAIL_HOME/conf/slaves
clusternode1 -t org.apache.crail.storage.rdma.RdmaStorageTier -c 0
clusternode1 -t org.apache.crail.storage.nvmf.NvmfStorageTier -c 1
disaggnode -t org.apache.crail.storage.nvmf.NvmfStorageTier -c 2
1. Is there a way of using the RdmaStorageTier directly with an SSD that is local to the server “clusternode1”?
Or does the local SSD have to be included in an NVMf subsystem on that local server, so that the NvmfStorageTier
is used on that same server in order to access the SSD locally via an NVMf subsystem?
2. I asked a few days ago about how to use the same Subsystem NQN, which I can’t do with a single
instance of SPDK. Is this how using the same NQN becomes possible: that different instances of SPDK would be used, one on each server (i.e. clusternode1 & clusternode2), each with its own “version” of that same Subsystem?
BTW…
I have my environment all running now, and all in containers. Everything appears to be working as advertised.
The Spark shuffle seems to be filling up the memory tier, then continuing on to the SSD tier. I haven’t done anything
over 300G yet, but it’s coming. I’m clarifying the above to be sure I’m not missing one of the configs. I’m
currently also using HDFS for the tmp results, since I currently only have one instance of SPDK, so
NVMf classes 1 and 2 can’t both exist for me (assuming the answers above, that is 😊).
Regards,
David
Re: Clarifying questions...
Posted by Jonas Pfefferle <pe...@japf.ch>.
Hi David,
Good to hear things work now.
1) Technically, you can use the RdmaStorageTier "directly" with an SSD, since
it allocates its data in its "datapath" (and then mmaps it). This path is
typically a hugetlbfs mount, but it can be a standard mount point. However, there
are a few drawbacks with this approach: all IO is buffered, you have no
control over when it is written to the SSD, and since RDMA requires that all
memory is pinned, you have to allocate as much memory as your SSD has capacity.
So overall that is not really feasible.
My recommendation is to use the NVMf storage tier locally.
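For reference, the mount point in question is set by the RDMA tier's datapath property in crail-site.conf. A hedged sketch is below; the /mnt/ssd path is a placeholder, not something from this thread:

```
# crail-site.conf (excerpt): point the RDMA storage tier's data path
# at a mount on the local SSD instead of the usual hugetlbfs mount.
# /mnt/ssd is a hypothetical mount point, for illustration only.
crail.storage.rdma.datapath    /mnt/ssd/crail-data

# Caveat (as explained above): IO to this path is buffered by the page
# cache, and the full capacity must be backed by pinned memory.
```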
2) Correct, at the moment that is the only way to do this: start
multiple instances of SPDK, or use SPDK RAID0 if you just want to use
multiple devices in the same storage class.
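For the RAID0 route, the rough shape with SPDK's rpc.py would be as below. This is only a sketch: the PCI addresses, bdev names, listener address, and the NQN are placeholders, and the exact RPC names vary between SPDK releases, so check your version's documentation.

```shell
# Hypothetical SPDK target setup: attach two local NVMe devices,
# combine them into one RAID0 bdev, and expose it via a single
# NVMf subsystem over RDMA. All names/addresses are placeholders.
./scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t PCIe -a 0000:01:00.0
./scripts/rpc.py bdev_nvme_attach_controller -b Nvme1 -t PCIe -a 0000:02:00.0
./scripts/rpc.py bdev_raid_create -n Raid0 -z 64 -r 0 -b "Nvme0n1 Nvme1n1"
./scripts/rpc.py nvmf_create_transport -t RDMA
./scripts/rpc.py nvmf_create_subsystem nqn.2019-07.io.example:ssd -a
./scripts/rpc.py nvmf_subsystem_add_ns nqn.2019-07.io.example:ssd Raid0
./scripts/rpc.py nvmf_subsystem_add_listener nqn.2019-07.io.example:ssd \
    -t rdma -a 192.168.1.10 -s 4420
```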
FYI, the shuffle plugin also supports configuring the storage class it should
write to: "spark.crail.shuffle.storageclass" (put it into your Spark config)
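For example, to steer shuffle data to storage class 1 (the local NVMf tier in the slaves file above), the Spark config entry would look like this; the class value 1 is just an assumption matching that example:

```
# spark-defaults.conf (excerpt): make the Crail shuffle plugin write
# to storage class 1 instead of the default.
spark.crail.shuffle.storageclass    1
```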
Regards,
Jonas