Posted to dev@crail.apache.org by David Crespi <da...@storedgesystems.com> on 2019/07/09 01:05:37 UTC
Clarifying questions...
Hi,
I wanted to ask whether there is a way of using a local SSD via the RdmaStorageTier, so a couple of questions.
From the blog example there were these three storage classes:
crail@clustermaster:~$ cat $CRAIL_HOME/conf/slaves
clusternode1 -t org.apache.crail.storage.rdma.RdmaStorageTier -c 0
clusternode1 -t org.apache.crail.storage.nvmf.NvmfStorageTier -c 1
disaggnode -t org.apache.crail.storage.nvmf.NvmfStorageTier -c 2
1. Is there a way of using the RdmaStorageTier directly with an SSD that is local to the server “clusternode1”?
Or does the local SSD have to be included in an NVMf subsystem on that local server, so that the NvmfStorageTier
is used on that same server in order to access the SSD locally via an NVMf subsystem?
2. I asked a few days ago about how to use the same Subsystem NQN, which I can’t do with a single
instance of SPDK. Is this how using the same NQN becomes possible: that different instances of SPDK would be used, one on each server (i.e. clusternode1 & clusternode2), each with its own “version” of that same Subsystem?
BTW…
I have my environment all running now, and all in containers. Everything appears to be working as advertised.
The Spark shuffle seems to be filling up the memory tier, then continuing on to the SSD tier. I haven’t done anything
over 300G yet, but it’s coming. I’m clarifying the above to be sure I’m not missing one of the configs. I’m
currently also using HDFS for the tmp results, since I currently only have one instance of SPDK, so
NVMf classes 1 and 2 can’t both exist for me (assuming the answers above, that is 😊).
Regards,
David
Re: Clarifying questions...
Posted by Jonas Pfefferle <pe...@japf.ch>.
Hi David,
Good to hear things work now.
1) Technically, you can use the RdmaStorageTier "directly" with an SSD, since
it allocates its data in its "datapath" (and then mmaps it). This path is
typically a hugetlbfs mount, but it can be a standard mount point. However, there
are a few drawbacks with this approach: all IO is buffered, you have no
control over when it is written to the SSD, and since RDMA requires that all
memory is pinned, you have to allocate as much memory as your SSD has capacity.
So overall that is not really feasible.
My recommendation is to use the NVMf storage tier locally.
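For reference, the mount point in question is set by the RDMA tier's datapath property in crail-site.conf. A hedged sketch is below; the /mnt/ssd path is a placeholder, not something from this thread:

```
# crail-site.conf (excerpt): point the RDMA storage tier's data path
# at a mount on the local SSD instead of the usual hugetlbfs mount.
# /mnt/ssd is a hypothetical mount point, for illustration only.
crail.storage.rdma.datapath    /mnt/ssd/crail-data

# Caveat (as explained above): IO to this path is buffered by the page
# cache, and the full capacity must be backed by pinned memory.
```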
2) Correct, at the moment that is the only way to do this: start
multiple instances of SPDK, or use SPDK RAID0 if you just want to use
multiple devices in the same storage class.
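For the RAID0 route, the rough shape with SPDK's rpc.py would be as below. This is only a sketch: the PCI addresses, bdev names, listener address, and the NQN are placeholders, and the exact RPC names vary between SPDK releases, so check your version's documentation.

```shell
# Hypothetical SPDK target setup: attach two local NVMe devices,
# combine them into one RAID0 bdev, and expose it via a single
# NVMf subsystem over RDMA. All names/addresses are placeholders.
./scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t PCIe -a 0000:01:00.0
./scripts/rpc.py bdev_nvme_attach_controller -b Nvme1 -t PCIe -a 0000:02:00.0
./scripts/rpc.py bdev_raid_create -n Raid0 -z 64 -r 0 -b "Nvme0n1 Nvme1n1"
./scripts/rpc.py nvmf_create_transport -t RDMA
./scripts/rpc.py nvmf_create_subsystem nqn.2019-07.io.example:ssd -a
./scripts/rpc.py nvmf_subsystem_add_ns nqn.2019-07.io.example:ssd Raid0
./scripts/rpc.py nvmf_subsystem_add_listener nqn.2019-07.io.example:ssd \
    -t rdma -a 192.168.1.10 -s 4420
```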
FYI, the shuffle plugin also supports configuring the storage class it should
write to: "spark.crail.shuffle.storageclass" (put it into your Spark config)
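For example, to steer shuffle data to storage class 1 (the local NVMf tier in the slaves file above), the Spark config entry would look like this; the class value 1 is just an assumption matching that example:

```
# spark-defaults.conf (excerpt): make the Crail shuffle plugin write
# to storage class 1 instead of the default.
spark.crail.shuffle.storageclass    1
```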
Regards,
Jonas