Posted to dev@cloudstack.apache.org by Marcus Sorensen <sh...@gmail.com> on 2012/11/01 01:12:33 UTC

Re: Requirements of Storage Orchestration

Ok. Thanks for the detail. I agree that creating volumes would be a
great place to start and build from there. I think #1 is ideal on the
more advanced backend features.

On Wed, Oct 31, 2012 at 3:05 PM, Edison Su <Ed...@citrix.com> wrote:
>
>
>> -----Original Message-----
>> From: Marcus Sorensen [mailto:shadowsor@gmail.com]
>> Sent: Wednesday, October 31, 2012 12:35 PM
>> To: cloudstack-dev@incubator.apache.org
>> Subject: Re: Requirements of Storage Orchestration
>>
>> I just don't see how this would be easy to implement. Actually creating an
>> iscsi lun on a target would be fine, there are a few known parameters and
>> the plugin would do the work. Allowing someone to configure any arbitrary
>> storage array via plugin seems tricky. I'm not an expert in the code though,
>> nor do I really understand the storage framework, but let me explain why I
>> think it's tricky.
>>
>>  Admin wants to create new primary storage, there's a plugin call provided to
>> list devices/disks attached to an appliance maybe? And it's up to the plugin to
>> do the actual work of collecting that, but it returns a list of objects to
>> cloudstack with a vendor-agnostic string for disk identifier and an arbitrary
>> set of properties based on storage appliance capabilities, like physical
>> blocksize of disks, disk size, controller/backplane location, maybe some
>> SMART attributes. Then we query the storage plugin for methods that can be
>> used on that appliance to create pools out of those disks (raid levels, zpools,
>> etc), and what features can be set, and what values those features accept.
>> Then we present all of this to the admin in some way that allows him/her to
>> define a storage pool with compression, dedup, encryption, ashift, and
>> whatever features the admin wants. Then we accept admin's input and send
>> a create pool command to the plugin.
>> This command might let us set a feature on the pool to let us know if we're
>> dealing with a filesystem or if we need to carve that pool into volumes. Then
>> we can call the plugin to create volumes if necessary, and perhaps format
>> those volumes with some filesystem. Then call the plugin for exporting them
>> via NFS, or export those volumes directly as iscsi or FC or whatever.
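
The plugin workflow sketched in the quoted paragraph above (list the appliance's disks with vendor-agnostic identifiers, query which pool features the plugin supports, then create a pool from the admin's choices) could look roughly like the sketch below. Every interface, class, and method name here is hypothetical — this is an illustration of the idea, not an existing CloudStack interface:

```java
import java.util.List;
import java.util.Map;

// Discovery flow: the plugin lists disks with opaque identifiers plus
// arbitrary properties, advertises which pool features (and accepted
// values) it supports, and accepts a create-pool request built from the
// admin's choices.
interface ApplianceStoragePlugin {
    // Each disk: opaque id -> properties such as "size", "blocksize", "slot".
    Map<String, Map<String, String>> listDevices();

    // Feature name -> accepted values, e.g. "raidLevel" -> ["mirror", "raidz"].
    Map<String, List<String>> getSupportedPoolFeatures();

    // diskIds plus the admin's chosen feature values; returns a pool id.
    String createPool(List<String> diskIds, Map<String, String> features);
}

// A fake ZFS-style plugin, just enough to exercise the flow.
class FakeZfsPlugin implements ApplianceStoragePlugin {
    public Map<String, Map<String, String>> listDevices() {
        return Map.of("disk-0", Map.of("size", "2TB", "blocksize", "4096"));
    }

    public Map<String, List<String>> getSupportedPoolFeatures() {
        return Map.of("compression", List.of("off", "lz4"),
                      "raidLevel", List.of("mirror", "raidz"));
    }

    public String createPool(List<String> diskIds, Map<String, String> features) {
        // Reject any feature the appliance did not advertise.
        if (!getSupportedPoolFeatures().keySet().containsAll(features.keySet())) {
            throw new IllegalArgumentException("unsupported feature");
        }
        return "pool-" + diskIds.size() + "-" + features.getOrDefault("raidLevel", "stripe");
    }
}

public class DiscoverySketch {
    public static void main(String[] args) {
        ApplianceStoragePlugin plugin = new FakeZfsPlugin();
        System.out.println(plugin.listDevices().keySet());
        System.out.println(plugin.createPool(List.of("disk-0"),
                Map.of("raidLevel", "mirror", "compression", "lz4")));
    }
}
```

The point of the shape above is that CloudStack itself would stay vendor-agnostic: everything appliance-specific lives behind the property maps and feature lists.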
>
> Hand the complicated tasks over to the storage provider itself. The life cycle of a data store provider is:
> Register -> enable -> disable -> deregister
> Before enabling the data store provider, the admin should call some APIs to initialize or configure the storage backend. Each storage provider may have its own specific features and configuration. If we can't generalize it into one set of standard APIs, there are two ways to deal with it in my mind:
> 1. The provider can expose its own APIs to the admin. This is what we are doing for network providers.
> 2. Make the API itself extensible. We define a big API, like initializeProvider(map<string, Object> parameters), and the admin passes a map of parameters that are specific to the provider.
> If the provider is configured properly by the admin, the storage backend is ready to create storage pools, and the admin can call an API to enable it.
> Once the provider is enabled, the admin can call another API to create storage pools on it. Again, this API, like the initialize API above, can be specific to each provider, but what it returns is a token or a URI that uniquely identifies the storage pool. During the API call, the provider can format the disk, create an NFS export, or perform any kind of operation specific to the storage backend. After the API call, the storage pool is ready to create volumes on.
> Once the storage pool is created, the admin can attach it to a specific scope (a zone/cluster/host) and then create volumes on it. CreateVolume is a standard API; the parameters are volume size and volume format (vhd/raw/qcow2, etc.). CloudStack will then call the storage pool's provider to create a volume, which returns a URI or token to CloudStack. The volume URI or token is opaque to CloudStack; it can be in the form of "iscsi://whatever", "nfs://whatever", or just a simple uuid.
> After we get the volume URI or token, we pass it to the hypervisor to launch the VM. On the hypervisor side, there should be code to decode the URI or token, which can be specific to each storage provider.
> Right now, I am not focusing on how to initialize or configure the storage backend, but rather on how to discover storage pools from an existing, properly configured storage backend, and then create volumes on them. If a storage provider is interested in integrating its storage into CloudStack seamlessly, we can definitely work together to get it working.
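
Edison's option #2 above — one extensible initializeProvider entry point taking a provider-specific parameter map, and pool/volume creation calls that hand back an opaque URI or token — could be sketched in Java roughly as follows. The interface and class names are illustrative only, not the actual CloudStack storage framework:

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// The orchestrator-facing surface: lifecycle plus token-returning creation.
interface DataStoreProvider {
    void initializeProvider(Map<String, Object> parameters); // provider-specific keys
    void enable();
    URI createStoragePool(Map<String, Object> poolParams);       // opaque pool token
    URI createVolume(URI pool, long sizeInBytes, String format); // vhd/raw/qcow2 ...
}

// Toy in-memory provider: a real one would talk to an iSCSI target, an NFS
// server, etc. The URI scheme is deliberately opaque to the orchestrator;
// only hypervisor-side code needs to decode it.
class DemoNfsProvider implements DataStoreProvider {
    private final Map<String, Object> config = new HashMap<>();
    private boolean enabled;

    public void initializeProvider(Map<String, Object> parameters) {
        config.putAll(parameters); // e.g. "server", "exportRoot" -- provider-specific
    }

    public void enable() {
        if (!config.containsKey("server")) {
            throw new IllegalStateException("provider not configured");
        }
        enabled = true;
    }

    public URI createStoragePool(Map<String, Object> poolParams) {
        if (!enabled) throw new IllegalStateException("provider not enabled");
        // A real provider would format disks, create the NFS export, etc. here.
        return URI.create("nfs://" + config.get("server") + "/" + poolParams.get("name"));
    }

    public URI createVolume(URI pool, long sizeInBytes, String format) {
        // CloudStack only keeps the returned token; size/format would be used
        // by the backend to allocate the actual volume.
        return URI.create(pool + "/" + UUID.randomUUID() + "." + format);
    }
}

public class ProviderSketch {
    public static void main(String[] args) {
        DataStoreProvider provider = new DemoNfsProvider();
        Map<String, Object> params = new HashMap<>();
        params.put("server", "10.0.0.5");
        provider.initializeProvider(params);
        provider.enable();
        Map<String, Object> poolParams = new HashMap<>();
        poolParams.put("name", "primary1");
        URI pool = provider.createStoragePool(poolParams);
        URI volume = provider.createVolume(pool, 8L << 30, "qcow2");
        System.out.println(pool);
        System.out.println(volume);
    }
}
```

Because the map keys and the URI scheme are both provider-defined, the orchestrator never needs to know whether it is holding "iscsi://...", "nfs://...", or a bare uuid.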
>
>>
>> That last sentence makes sense to me. OpenStack for example allows you to
>> create iscsi luns on various appliances, to use for VMs. It's at the point where
>> we're actually configuring and managing the appliance's disks and filesystems
>> that seems redundant to the vendor tools, difficult to do, and rarely
>> used/useful. Even if it were available, I wonder who would write such a
>> plugin?
>>
>> On Wed, Oct 31, 2012 at 12:39 PM, Edison Su <Ed...@citrix.com> wrote:
>> >
>> >
>> >> -----Original Message-----
>> >> From: Marcus Sorensen [mailto:shadowsor@gmail.com]
>> >> Sent: Wednesday, October 31, 2012 11:13 AM
>> >> To: cloudstack-dev@incubator.apache.org
>> >> Subject: Re: Requirements of Storage Orchestration
>> >>
>> >> It seems to me that for the most part these things are a function of
>> >> managing the storage backend, along with creating the disk pools,
>> >> formatting filesystems, and the like, that are used as primary storage by
>> CloudStack.
>> >>
>> >> Should there be plugins to manage storage backends? Does any
>> >> competing project in the segment do this? It seems extremely complex
>> >> to add in functionality to expose disks and arrays from a SAN or NAS,
>> >> allow the admin to configure them into pools, choose filesystems,
>> >> manage NFS/iSCSI/RBD exports, and configure filesystem features all
>> >> through cloudstack. The root admin would be the only one with access
>> >> and they likely would find it just as easy to do it with the tools the storage
>> vendor provides.
>> >
>> > It should be easy to add storage pool management functionality into the
>> new storage framework.
>> > The primary storage layer looks like this:
>> > Volume service -> primary data store provider -> primary data store -> volume
>> > The lifecycle of a primary data store is:
>> > Create -> attach -> detach -> delete
>> > Whenever the admin wants to create a storage pool, CloudStack calls an API on the
>> data store provider; the provider can talk to its storage backend and create the
>> storage (an iSCSI target or NFS mount point, etc.).
>> > The admin can attach the storage to a zone, pod, cluster, or host, so
>> CloudStack can use it to create volumes.
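
The data store lifecycle described above (Create -> attach -> detach -> delete, with attachment to a zone, pod, cluster, or host scope) can be captured as a small state machine. This is a minimal sketch under assumed names, not CloudStack's actual classes:

```java
// Minimal lifecycle sketch: a pool must be attached to a scope before
// volumes can be created on it, and must be detached before deletion.
public class DataStoreLifecycle {
    enum Scope { ZONE, POD, CLUSTER, HOST }
    enum State { CREATED, ATTACHED, DETACHED, DELETED }

    private State state = State.CREATED; // "Create" is the initial state
    private Scope scope;

    void attach(Scope s) {
        if (state != State.CREATED && state != State.DETACHED) {
            throw new IllegalStateException("cannot attach from " + state);
        }
        scope = s;
        state = State.ATTACHED;
    }

    void detach() {
        if (state != State.ATTACHED) throw new IllegalStateException("not attached");
        scope = null;
        state = State.DETACHED;
    }

    void delete() {
        if (state == State.ATTACHED) throw new IllegalStateException("detach first");
        state = State.DELETED;
    }

    boolean canCreateVolume() {
        // Volumes are only created once the pool is attached to a scope.
        return state == State.ATTACHED;
    }

    public static void main(String[] args) {
        DataStoreLifecycle pool = new DataStoreLifecycle();
        pool.attach(Scope.CLUSTER);
        System.out.println(pool.canCreateVolume());
        pool.detach();
        pool.delete();
    }
}
```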
>> >
>> >>
>> >> To me the only way it makes sense to roll those things in is if
>> >> there's some way to do it at the VM image level. I believe qcow2
>> >> supports encryption, and we can probably do encrypted lvm volumes as
>> >> well. I'd actually like to look into this. We also need to realize
>> >> that encrypting the disk doesn't do much good if someone gets access
>> >> to the VM host or cloudstack, they could likely see the encryption
>> >> key as well, but it does help in a case where someone tries to
>> >> download a copy of the disk image, if someone takes the physical disk
>> array, or something like that.
>> >>
>> >> Dedup will likely always be a function of the filesystem or storage
>> >> array, and I don't see a way for cloudstack to work at that level.
>> >>
>> >> On Wed, Oct 31, 2012 at 11:40 AM, Nguyen Anh Tu <ng...@gmail.com>
>> >> wrote:
>> >> > Love to hear that!!! Some days ago I posted a mail to ask the community
>> >> > about encrypting VM data in CloudStack, but seemingly not many people
>> >> > took notice.
>> >> > I'm writing an encryption service based on TrueCrypt, running in the
>> >> > background inside the VM. It is separate from CloudStack. Great to
>> >> > hear about the API idea. I think it's a good choice. Some questions
>> >> > about the API scenario: how to generate the passphrase/key? how to keep it?
>> >> >
>> >> > 2012/10/31 Edison Su <Ed...@citrix.com>
>> >> >
>> >> >>
>> >> >>
>> >> >> > -----Original Message-----
>> >> >> > From: Umasankar Mukkara
>> [mailto:umasankar.mukkara@cloudbyte.co]
>> >> >> > Sent: Tuesday, October 30, 2012 9:20 AM
>> >> >> > To: cloudstack-dev@incubator.apache.org
>> >> >> > Subject: Requirements of Storage Orchestration
>> >> >> >
>> >> >> > Today I had the opportunity to listen to Kevin Kluge at the
>> >> >> > inauguration
>> >> >> event
>> >> >> > of Bangalore CloudStack Group. Some thoughts around new storage
>> >> >> > requirements popped out after this event. I thought I would post to the
>> >> >> > dev group and check what is already in progress. Kevin said Edison Su
>> >> >> > is already in the process of designing and implementing/re-factoring
>> >> >> > some portions of the storage orchestrator.
>> >> >> >
>> >> >> > I could think of the following extensions to the current
>> >> >> > CloudStack:
>> >> >> >
>> >> >> >    1. Ability to offload the data protection capabilities to the
>> >> >> >    storage array (like dedup/snapshot/backup/encrypt/compress)
>> >> >> >    2. Ability to provide an API at the storage orchestrator so that
>> >> >> >    the storage array can write to this API
>> >> >>
>> >> >> Only snapshot/backup are taken into consideration. Any details
>> >> >> about the scenarios for encrypt/compress/dedup?
>> >> >> For example, how would these functionalities be used, and what should the API look like?
>> >> >> We can expose more capabilities at the API and storage driver layer.
>> >> >>
>> >> >> >    3. Extend the current storage offerings to include some of the
>> >> >> >    storage array capabilities such as IOPS guarantee (or throttle) and
>> >> >> >    throughput guarantee (or throttle)
>> >> >> >
>> >> >> > Where can I learn about the current development threads around these
>> >> >> > in cloudstack? Edison Su (or someone who is working on this), could
>> >> >> > you please provide some pointers to these so that I can pull myself
>> >> >> > up to speed? I would
>> >> >> > like to actively participate and hack some parts of it :)
>> >> >> Oh, great! There is so much code I want to change; I really need
>> >> >> help and feedback from other people.
>> >> >> I'll send out the status of my current work and what I am trying
>> >> >> to do in another thread.
>> >> >>
>> >> >> >
>> >> >> > --
>> >> >> >
>> >> >> > Regards,
>> >> >> > Uma.
>> >> >> >
>> >> >> ---------------------------------------------------------------------------
>> >> >> > CloudByte ElastiStor 1.0 is now available under Early Access
>> >> >> > Program<http://www.cloudbyte.com/eap.aspx>
>> >> >> ---------------------------------------------------------------------------
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> >
>> >> > N.g.U.y.e.N.A.n.H.t.U

Re: Requirements of Storage Orchestration

Posted by Umasankar Mukkara <um...@cloudbyte.co>.
Thanks Edison for the details. Yes, it is always good to offload
storage features such as dedup and encryption to the storage array. I
understand that the volume API you are referring to will be able to
call the storage array's API to perform specific operations on NFS, iSCSI,
or FC volumes.

The next generation of storage arrays provides advanced features such as the
ability to control QoS on a per-volume basis. It would be a phenomenal
feature to control IOPS and throughput on a per-VM basis from CloudStack
through the use of storage offerings.

Looking forward to getting more details on the new storage manager code.


Regards,
Uma.


On Thu, Nov 1, 2012 at 5:42 AM, Marcus Sorensen <sh...@gmail.com> wrote:

> Ok. Thanks for the detail. I agree that creating volumes would be a
> great place to start and build from there. I think #1 is ideal on the
> more advanced backend features.