Posted to dev@crail.apache.org by sy...@gmail.com, sy...@gmail.com on 2019/03/23 20:07:36 UTC

RDMA and Crail Implementation

Hello,

I am learning about Crail and its RDMA implementation. I have a question regarding RDMA + Crail and I was hoping you could help me figure it out. In RDMA, a memory space must be registered with a protection domain to obtain a memory region (IbvMr) and the corresponding rkey. However, each registration incurs extra overhead.

After studying the Crail code, it seems to me that Crail registers a memory space only once, with a single protection domain, to get a single memory region and save registration overhead, and that all incoming connections/storageClients share the same memory space, memory region, and protection domain with the same RDMA privileges. Did I understand this correctly?

Reference code in Crail: 
1 [create endpoint]. https://github.com/apache/incubator-crail/blob/v1.1/storage-rdma/src/main/java/org/apache/crail/storage/rdma/RdmaStorageServer.java#L68
2 [register memory]. https://github.com/apache/incubator-crail/blob/v1.1/storage-rdma/src/main/java/org/apache/crail/storage/rdma/RdmaStorageServer.java#L112

William

Re: RDMA and Crail Implementation

Posted by West Lafayette L <sy...@gmail.com>.
Hello Patrick and Animesh,

Thank you for the explanation. This answers my question clearly.

Sincerely,
William


Re: RDMA and Crail Implementation

Posted by Animesh Trivedi <an...@gmail.com>.
You are right to assume that all connections on the server side are put in
a single protection domain, so they share the memory registrations done in
that protection domain and, consequently, the same access rights. Using
multiple protection domains within a single server process on a NIC would
have been another possible design, but the current design is meant to be
deployed and developed the way Patrick described.
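
To make the sharing concrete, here is a minimal sketch in plain Java. It
is only a toy model, not Crail or DiSNI code: the classes ProtectionDomain
and Connection and their methods are hypothetical names invented for this
illustration. It models the rule that a connection can use a memory
registration only if both belong to the same protection domain, so one
registration serves every connection in that domain.

```java
import java.util.HashSet;
import java.util.Set;

public class SharedPdModel {
    // Hypothetical stand-in for a protection domain: it tracks the keys
    // of the memory registrations made inside it.
    static final class ProtectionDomain {
        private final Set<Integer> rkeys = new HashSet<>();
        private int nextKey = 1;

        // Register one memory area and return its (made-up) rkey.
        int register() {
            int key = nextKey++;
            rkeys.add(key);
            return key;
        }

        boolean owns(int rkey) {
            return rkeys.contains(rkey);
        }

        int registrationCount() {
            return rkeys.size();
        }
    }

    // Hypothetical stand-in for a client connection bound to one domain.
    static final class Connection {
        private final ProtectionDomain pd;

        Connection(ProtectionDomain pd) {
            this.pd = pd;
        }

        // Access is allowed only for registrations in the same domain.
        boolean canAccess(int rkey) {
            return pd.owns(rkey);
        }
    }

    public static void main(String[] args) {
        ProtectionDomain serverPd = new ProtectionDomain();
        int rkey = serverPd.register(); // registered once

        Connection a = new Connection(serverPd);
        Connection b = new Connection(serverPd);
        Connection other = new Connection(new ProtectionDomain());

        System.out.println(a.canAccess(rkey));            // true
        System.out.println(b.canAccess(rkey));            // true
        System.out.println(other.canAccess(rkey));        // false
        System.out.println(serverPd.registrationCount()); // 1
    }
}
```

In this model the number of registrations stays at one no matter how many
connections join the domain, which is the property the shared design
relies on.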

Cheers,
--
Animesh



Re: RDMA and Crail Implementation

Posted by Patrick Stuedi <ps...@gmail.com>.
Hi William,

Not sure I completely understand the question, but here is an answer to
what I think you're asking. A storage server is always bound to one NIC, so
to one protection domain. You can start multiple storage servers per node,
for instance one per NIC (or any number of storage servers you like), which
would then involve different protection domains. Running multiple storage
servers per node is a perfectly valid configuration. Crail storage servers
are designed to be "micro-servers" and are meant to be deployed like this.

From the moment the storage server registers a region with the metadata
server, the region itself loses any meaning. Regions are broken into
blocks, blocks are assigned to files, and different clients may read or
write different files concurrently.
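
The step of breaking a region into blocks can be sketched as follows. This
is a simplified illustration in plain Java, not the actual Crail code: the
Block class, the split method, and the example address and key values are
all hypothetical. It assumes every block is described by an address, a
length, and the key of the one registration that covers the whole region.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionBlocks {
    // Hypothetical block descriptor: where the block starts, how long it
    // is, and the key of the registration that covers it.
    static final class Block {
        final long address;
        final int length;
        final int rkey;

        Block(long address, int length, int rkey) {
            this.address = address;
            this.length = length;
            this.rkey = rkey;
        }
    }

    // Split one registered region into equally sized blocks. All blocks
    // reuse the same rkey because they lie in the same registration.
    static List<Block> split(long baseAddress, long regionSize,
                             int blockSize, int rkey) {
        if (blockSize <= 0 || regionSize % blockSize != 0) {
            throw new IllegalArgumentException(
                "regionSize must be a positive multiple of blockSize");
        }
        List<Block> blocks = new ArrayList<>();
        for (long offset = 0; offset < regionSize; offset += blockSize) {
            blocks.add(new Block(baseAddress + offset, blockSize, rkey));
        }
        return blocks;
    }

    public static void main(String[] args) {
        int blockSize = 1 << 20;           // 1 MiB, illustrative value
        long regionSize = 4L * blockSize;  // 4 MiB, illustrative value
        List<Block> blocks = split(0x10000L, regionSize, blockSize, 42);

        System.out.println(blocks.size()); // 4
        for (Block b : blocks) {
            System.out.println(Long.toHexString(b.address) + " " + b.rkey);
        }
    }
}
```

Once the blocks are handed out like this, a client only needs the block
descriptor, which is why the region boundary no longer matters to it.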

-Patrick




Re: RDMA and Crail Implementation

Posted by West Lafayette L <sy...@gmail.com>.
Hello Patrick,

Thank you for your detailed explanation. My question is actually about
the storage server side.

I understand that multiple memory registrations consume significant
state on the RDMA NIC and also increase the number of messages exchanged
between the storage server and the metadata server.

More specifically, my question is: does Crail register a Crail region
(assume crail.regionsize = 1GB) as different RDMA memory regions (IbvMr)
for different clients? In Crail, a storage server hosts connections to
multiple clients, and clients use these connections to issue RDMA requests
to get data from the Crail storage server. Does Crail
1. use different RDMA memory regions (IbvMr) for a single Crail region
(put each connection under a different protection domain, register the
single Crail region N times to get N different RDMA memory regions, and
give different connections different memory regions), or
2. use a shared memory region (put all connections under the same
protection domain, register the Crail region once, and share the single
RDMA memory region between the connections, which keeps the state on the
NIC small)?
Based on my code study and this discussion, I suppose it's close to 2.

Again, many thanks.

William


Re: RDMA and Crail Implementation

Posted by Patrick Stuedi <ps...@gmail.com>.
Hi William,

You have to differentiate the server-side registration from the client-side
registration. The links above are server side. There we allocate memory in
larger segments (defined by the config variable crail.allocationSize).
The allocation size must be a multiple of crail.bufferSize, which is the
basic allocation unit for files. The reason we chose to allocate larger
segments is to keep the state on the NIC small (each registration consumes
state) and to amortize the regMr calls, but also to minimize the number of
messages between the storage server and the metadata server (a storage
server needs to inform the metadata server about each allocated segment).
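
A small worked example may help show the effect of the segment size. The
sketch below is plain Java arithmetic, not Crail code; the class and
method names are hypothetical, and the sizes (1 MiB buffers, 1 GiB
segments, 8 GiB of storage) are illustrative values rather than Crail
defaults. It counts how many registrations (and therefore how many
messages to the metadata server) are needed when registering per segment
versus per buffer.

```java
public class SegmentMath {
    // Number of file blocks carved out of one registered segment.
    // Enforces the rule that the segment is a multiple of the buffer.
    static long blocksPerSegment(long allocationSize, long bufferSize) {
        if (bufferSize <= 0 || allocationSize % bufferSize != 0) {
            throw new IllegalArgumentException(
                "allocationSize must be a multiple of bufferSize");
        }
        return allocationSize / bufferSize;
    }

    // Registrations needed to cover `capacity` bytes when each
    // registration covers `unitSize` bytes (rounded up).
    static long registrations(long capacity, long unitSize) {
        return (capacity + unitSize - 1) / unitSize;
    }

    public static void main(String[] args) {
        long mib = 1L << 20;
        long gib = 1L << 30;

        long bufferSize = 1 * mib;      // illustrative value
        long allocationSize = 1 * gib;  // illustrative value
        long capacity = 8 * gib;        // illustrative value

        System.out.println(blocksPerSegment(allocationSize, bufferSize)); // 1024
        System.out.println(registrations(capacity, allocationSize));      // 8
        System.out.println(registrations(capacity, bufferSize));          // 8192
    }
}
```

With these numbers, registering per segment needs 8 registrations where
registering per buffer would need 8192, a factor of 1024.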

On the client side we allocate memory that is used for buffered streams.
This memory is also allocated in larger segments for the same reasons: to
keep the registration state on the NIC small and to amortize the overhead
of regMr calls. We also try to use huge pages on both the client and the
server side, which further reduces the state on the NIC.
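
The effect of huge pages can be estimated with a quick calculation. The
sketch below is plain Java, not Crail code, and it rests on a simplifying
assumption: that the NIC keeps roughly one address translation entry per
page of registered memory. The class and method names are hypothetical,
the 1 GiB region is an illustrative size, and 4 KiB and 2 MiB are the
common base and huge page sizes on x86-64 Linux.

```java
public class PageEntries {
    // Pages needed to cover a registered region (rounded up). Under the
    // stated assumption this is also the number of translation entries.
    static long pagesCovering(long regionBytes, long pageBytes) {
        if (pageBytes <= 0) {
            throw new IllegalArgumentException("pageBytes must be positive");
        }
        return (regionBytes + pageBytes - 1) / pageBytes;
    }

    public static void main(String[] args) {
        long region = 1L << 30;   // 1 GiB, illustrative value
        long basePage = 4L << 10; // 4 KiB
        long hugePage = 2L << 20; // 2 MiB

        System.out.println(pagesCovering(region, basePage)); // 262144
        System.out.println(pagesCovering(region, hugePage)); // 512
    }
}
```

Under that assumption, the same 1 GiB registration needs 512 entries with
2 MiB pages instead of 262144 with 4 KiB pages.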

Let me know if you need further information.

-Patrick

