Posted to issues@kudu.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2023/06/16 09:25:00 UTC

[jira] [Commented] (KUDU-3413) Kudu multi-tenancy

    [ https://issues.apache.org/jira/browse/KUDU-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733418#comment-17733418 ] 

ASF subversion and git services commented on KUDU-3413:
-------------------------------------------------------

Commit 8e8a397415c819c5a454460028b4f4397fd18ae8 in kudu's branch refs/heads/master from kedeng
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=8e8a39741 ]

KUDU-3413 add tenant info in metadata for multi-tenancy

To introduce the multi-tenancy feature, the relevant metadata must be
defined first. This patch adds the definition of tenant information to
the metadata and refactors some metadata loading to distinguish the
data at rest encryption feature from the multi-tenancy feature.

In subsequent use, if both a server key and a tenant key are present
in the metadata, note that the tenant key takes higher priority: we
treat this as the multi-tenancy feature being enabled and skip
processing of the server key.
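
For illustration, a minimal sketch of this precedence rule (not the
actual patch code; the struct and function names are hypothetical):

{code:c++}
#include <string>

// Stand-in for the relevant fields of the instance metadata.
struct InstanceMetadata {
  std::string server_key;  // set when data at rest encryption is enabled
  std::string tenant_key;  // set when multi-tenancy is enabled
};

// The tenant key has higher priority: if it is present, multi-tenancy is
// considered enabled and server-key processing is skipped entirely.
bool UseTenantKey(const InstanceMetadata& meta) {
  return !meta.tenant_key.empty();
}
{code}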

A way to upgrade a server key to a tenant key is also needed; I plan
to implement that in the next patch.

This commit also adds a new flag, '--enable_multi_tenancy', indicating
whether the multi-tenancy feature is enabled. Since development of the
whole feature is not yet complete, setting this flag is not recommended
for now; we should wait until the entire feature is done before
considering whether to set it.

I also added unit tests for the new open process to ensure the open
logic works as intended.

Change-Id: I9e450d73940eb1dbaac6f905a46d6ccd084f15cf
Reviewed-on: http://gerrit.cloudera.org:8080/19622
Tested-by: Kudu Jenkins
Reviewed-by: Yingchun Lai <la...@apache.org>


> Kudu multi-tenancy
> ------------------
>
>                 Key: KUDU-3413
>                 URL: https://issues.apache.org/jira/browse/KUDU-3413
>             Project: Kudu
>          Issue Type: New Feature
>            Reporter: dengke
>            Assignee: dengke
>            Priority: Major
>         Attachments: data_and_metadata.png, kudu table topology.png, metadata_record.png, new_fs_manager.png, tablet_rowsets.png, zonekey_update.png
>
>
> h1. 1. Definition
>  * Tenant: A cluster user; tenants may be divided by project or by actual application. Each tenant is equivalent to a resource pool, all users under a tenant share all of that pool's resources, and multiple tenants share one cluster's resources.
>  * User: A consumer of cluster resources.
>  * Multi-tenancy: Database-level control ensuring tenants cannot access each other, with resources kept private and independent. (Note: Kudu has no concept of a database; here it is simply understood as multiple tables.)
> h1. 2. Current situation
>         The latest version of Kudu implements 'data at rest encryption': mainly cluster-level authentication and encryption, plus data-storage encryption at the single-server level. This meets the needs of basic encryption scenarios, but still falls somewhat short of the tenant-level encryption we are pursuing.
> h1. 3. Outline design
>         In general, tenant-level encryption differs from cluster-level encryption in the following ways:
>  * Tenant-level encryption requires data-storage isolation, i.e. data belonging to different tenants must be separated (a new namespace layer may be added to the storage topology so that data of the same tenant is stored under the same namespace path, minimizing mutual impact);
>  * Generation and use of tenants' keys: in a multi-tenant scenario, we need to replace the cluster key with per-tenant keys.
> h1. 4. Design
> h2. 4.1 Namespace
>         In the storage industry, a namespace mainly maintains file attributes, the directory-tree structure, and other file-system metadata, and is compatible with POSIX directory trees and file operations; it is a core concept in file storage. Taking the common HDFS as an example, its namespace achieves resource isolation essentially by logically partitioning the disk, attaching partition files to different directories, and finally modifying the directory owners' permissions.
>         In the Kudu system, the current storage topology is relatively mature: a Kudu client's read/write requests must be processed by a tserver before the corresponding data can be obtained. Requests never manipulate raw data directly; that is, the client does not perceive the data distribution inside the storage engine at all, so there is a natural degree of data isolation.
>         However, data in the storage engine is intertwined, and in some extreme cases interference is still possible. The cleanest solution would be to completely separate the read/write, compaction, and other processing paths of different tenants, but that would require extensive changes and might destabilize the system. Instead, we can make minimal per-tenant changes to achieve physical isolation of data.
>         First, we need to analyze the current storage topology: a table in Kudu is divided into multiple tablet partitions. Each tablet includes metadata (meta information) and several RowSets. A RowSet contains a 'MemRowSet' (the data in memory) and multiple 'DiskRowSets' (the data on disk). A 'DiskRowSet' contains a 'BloomFile', an 'Ad_hoc Index', 'BaseData', 'DeltaMem', and several 'RedoFiles' and 'UndoFiles' (generally there is only one 'UndoFile'). For more specific distribution information, please refer to the following figure.
> !kudu table topology.png|width=1282,height=721!
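> For illustration only, a minimal C++ sketch that mirrors the containment described above (these are not Kudu's actual classes; all names are illustrative):
> {code:c++}
> #include <string>
> #include <vector>
>
> // One DiskRowSet is made up of six kinds of parts, each ultimately
> // stored as its own CFile (see section 4.1).
> struct DiskRowSetSketch {
>   std::string bloom_file;
>   std::string ad_hoc_index;
>   std::string base_data;
>   std::string delta_mem;
>   std::vector<std::string> redo_files;
>   std::vector<std::string> undo_files;  // generally only one
> };
>
> // A RowSet pairs the in-memory data with the on-disk data.
> struct RowSetSketch {
>   std::string mem_rowset;                      // data in memory
>   std::vector<DiskRowSetSketch> disk_rowsets;  // data on disk
> };
>
> // A tablet holds its metadata plus several RowSets; a table is divided
> // into multiple such tablets.
> struct TabletSketch {
>   std::string meta;
>   std::vector<RowSetSketch> rowsets;
> };
> {code}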
>         *The simplest way to achieve physical isolation is to give the data of different tenants different storage paths.* For now, we only need to consider physical isolation of the 'DiskRowSet'.
>         Kudu writes to disk through containers. Each container owns a large contiguous stretch of disk space used for writing data into a CFile (the actual storage form of a 'DiskRowSet'). When one CFile is fully written, the container is returned to the 'BlockManager', after which it can be used to write the next CFile. When no container is available in the BlockManager, a new one is created for the new CFile. Each container consists of a *.metadata file and a *.data file. Each DiskRowSet has several blocks, and all the blocks of one DiskRowSet may be distributed across multiple containers; conversely, a container may contain data from multiple DiskRowSets. A sketch of this container lifecycle is shown below.
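> A minimal sketch of that container lifecycle (illustrative names, not Kudu's actual BlockManager API):
> {code:c++}
> #include <memory>
> #include <vector>
>
> // Stand-in: each container backs one *.metadata file and one *.data file.
> struct Container {};
>
> class BlockManagerSketch {
>  public:
>   // Reuse an idle container if one exists; otherwise create a new one
>   // for the next CFile to be written.
>   Container* GetContainerForNewCFile() {
>     if (!idle_.empty()) {
>       Container* c = idle_.back();
>       idle_.pop_back();
>       return c;
>     }
>     owned_.push_back(std::make_unique<Container>());
>     return owned_.back().get();
>   }
>
>   // When a CFile is fully written, its container is returned and
>   // becomes available for the next CFile.
>   void ReturnContainer(Container* c) { idle_.push_back(c); }
>
>  private:
>   std::vector<std::unique_ptr<Container>> owned_;
>   std::vector<Container*> idle_;
> };
> {code}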
>         It can be simply understood that one DiskRowSet corresponds to one CFile in the single-column case (with multiple columns, it corresponds to multiple CFiles). The difference is that the DiskRowSet is our logical organization, while the CFile is our physical storage. For the six parts of a DiskRowSet (BloomFile, BaseData, UndoFile, RedoFile, DeltaMem, and AdhocIndex, as shown in the figure above), it is neither the case that one CFile corresponds to one DiskRowSet nor that one CFile contains all six parts: the six parts are spread across multiple CFiles, each part being a separate CFile. As shown in the figure below, in an actual production environment we can only find *.data and *.metadata files; no CFile file exists.
> !data_and_metadata.png|width=1298,height=395!
>         This is because a large number of CFiles are merged and written by the container into a *.data file; a *.data file is in fact a collection of CFiles. The mapping between each part of a DiskRowSet and its corresponding CFiles is recorded in the tablet-meta/<tablet_id> file, with each mapping relationship saved separately, keyed by tablet_id.
>         In the current storage topology, the *.metadata file holds the metadata of the blocks (the final representation of CFiles in the fs layer) at the lowest fs level. It is not in the same dimension as concepts such as CFile and BlockManager; rather, it records per-block information. The figure below shows one record from a *.metadata file.
> !metadata_record.png!
>         From the above description, we can draw the approximate correspondence shown in the figure below:
> !tablet_rowsets.png|width=1315,height=695!
>         Based on the above, the *.data file is the actual storage location of tenant data, so isolating the *.data files achieves data isolation. To this end, we can create a separate BlockManager for each tenant, each maintaining its own *.data files. *_In the default scenario (no tenant name is specified), data goes through a default block_manager. If multi-tenant encryption is enabled, fs_manager creates a new tenant_block_manager keyed by the tenant name, and the data for that tenant is stored in the tenant_block_manager corresponding to its name, achieving physical isolation of data._* The modified schematic diagram is as follows:
> !new_fs_manager.png|width=1306,height=552!
>         We add the correspondence between tenant and block_manager to fs_manager and maintain it in memory. The tenant information also needs to be persisted: we can consider appending to the existing metadata, or adding a new metadata file that is updated in real time, for example:
> {code:java}
> message TenantMetadataPB {
>   message TenantMeta {
>     // The name of tenant.
>     optional string tenant_name = 1;
>     // Encrypted tenant key used to encrypt/decrypt file keys for tenant.
>     optional string tenant_key = 2;
>     // Initialization vector for the tenant key.
>     optional string tenant_key_iv = 3;
>   }
>   repeated TenantMeta tenant_meta = 1;
>   // Tenant key version.
>   optional string tenant_key_version = 2;
> }
> {code}
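> Complementing the persisted metadata above, here is a minimal sketch (hypothetical names, not the actual patch) of the in-memory tenant-to-block_manager correspondence in fs_manager:
> {code:c++}
> #include <map>
> #include <memory>
> #include <string>
>
> class BlockManager {};  // stand-in for Kudu's block manager
>
> class FsManagerSketch {
>  public:
>   // No tenant name: use the default block_manager. Otherwise create (on
>   // first use) and return the tenant_block_manager for this tenant.
>   BlockManager* block_manager_for(const std::string& tenant_name) {
>     if (tenant_name.empty()) {
>       return default_block_manager_.get();
>     }
>     auto& bm = tenant_block_managers_[tenant_name];
>     if (!bm) {
>       // A real implementation would also set up a per-tenant data path.
>       bm = std::make_unique<BlockManager>();
>     }
>     return bm.get();
>   }
>
>  private:
>   std::unique_ptr<BlockManager> default_block_manager_{
>       std::make_unique<BlockManager>()};
>   std::map<std::string, std::unique_ptr<BlockManager>> tenant_block_managers_;
> };
> {code}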
> h2. 4.2 Tenant Key
>         There are currently two implementations of the key:
>  * When static encryption is enabled, the server_key is randomly generated by default;
>  * When the address and cluster name of the KMS are specified, we try to get the server_key from the KMS.
>         The server_key is mainly used for encryption and decryption of sensitive files. We should split the work modes into 'no encryption', 'default cluster static encryption', 'KMS cluster static encryption', and 'KMS multi-tenant encryption'. In the 'KMS multi-tenant encryption' mode, a new tenant name needs to be added; the tenant name distinguishes different tenants and is used to obtain the corresponding key. If no tenant name is set, the mode corresponds to 'default cluster static encryption', i.e. sharing the randomly generated server_key by default. The four modes are enumerated below.
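> For clarity, the four work modes could be enumerated as follows (a hypothetical sketch, not existing Kudu code):
> {code:c++}
> enum class EncryptionMode {
>   kNoEncryption,
>   kDefaultClusterStaticEncryption,  // randomly generated server_key
>   kKmsClusterStaticEncryption,      // server_key obtained from the KMS
>   kKmsMultiTenantEncryption,        // per-tenant keys; tenant name required
> };
> {code}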
>         In the previous cluster-encryption scenario, kms_client fetches the cluster's zonekey information. However, the Ranger system only has zonekey information and no tenant information, so we need to maintain the correspondence between tenant names and zonekeys ourselves. To do this, we add a configuration file (perhaps in JSON format) recording the mapping between tenant name and zonekey. Every time a tenant name changes, we first add a zonekey in Ranger, then update the entry in the configuration file, and finally use the new tenant name when creating tables.
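> For illustration, such a configuration file might look like this (the field names and values are hypothetical):
> {code:json}
> {
>   "tenant_zonekeys": [
>     { "tenant_name": "tenant_a", "zonekey": "zonekey_a" },
>     { "tenant_name": "tenant_b", "zonekey": "zonekey_b" }
>   ]
> }
> {code}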
> {code:c++}
> // KMS-backed client: the APIs take the tenant name so the client can
> // resolve the tenant's zonekey and fetch the matching key from Ranger KMS.
> class RangerKMSClient {
>  public:
>   explicit RangerKMSClient(std::string kms_url)
>     : kms_url_(std::move(kms_url)) {}
>
>   // Decrypts 'encrypted_key' for the given tenant.
>   Status DecryptKey(const std::string& tenant_name,
>                     const std::string& encrypted_key,
>                     const std::string& iv,
>                     const std::string& key_version,
>                     std::string* decrypted_key);
>
>   // Generates a new encrypted key for the given tenant.
>   Status GenerateEncryptedServerKey(const std::string& tenant_name,
>                                     std::string* encrypted_key,
>                                     std::string* iv,
>                                     std::string* key_version);
>
>  private:
>   std::string kms_url_;
> };
>
> // Default provider used when no KMS is configured; it has no notion of
> // tenants, so its signatures take no tenant name.
> class DefaultKeyProvider : public KeyProvider {
>  public:
>   ~DefaultKeyProvider() override {}
>
>   Status DecryptServerKey(const std::string& encrypted_server_key,
>                           const std::string& /*iv*/,
>                           const std::string& /*key_version*/,
>                           std::string* server_key);
>
>   Status GenerateEncryptedServerKey(std::string* server_key,
>                                     std::string* iv,
>                                     std::string* key_version);
> };
> {code}
>         The encryption and decryption API of the KMS client takes the tenant name, and the correspondence between tenant name and zonekey is maintained in memory. On each use, we first search in memory; if that lookup fails, we search the configuration file and update the in-memory data at the same time; if that also fails, we return an error. Otherwise, we use the resolved zonekey to obtain the key, as sketched after the figure below.
> !zonekey_update.png|width=1273,height=754!
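> A minimal sketch (illustrative names) of that lookup order:
> {code:c++}
> #include <map>
> #include <string>
>
> class TenantZonekeyResolver {
>  public:
>   // Memory first; on a miss, fall back to the configuration file and
>   // refresh the in-memory data; fail only if both lookups miss.
>   bool ResolveZonekey(const std::string& tenant_name, std::string* zonekey) {
>     auto it = cache_.find(tenant_name);
>     if (it != cache_.end()) {
>       *zonekey = it->second;
>       return true;
>     }
>     std::string from_file;
>     if (LookupInConfigFile(tenant_name, &from_file)) {
>       cache_[tenant_name] = from_file;  // update the in-memory data
>       *zonekey = from_file;
>       return true;
>     }
>     return false;  // neither memory nor the config file knows this tenant
>   }
>
>  private:
>   // Parsing of the tenant-name/zonekey configuration file is omitted here.
>   bool LookupInConfigFile(const std::string& /*tenant_name*/,
>                           std::string* /*zonekey*/) {
>     return false;
>   }
>
>   std::map<std::string, std::string> cache_;
> };
> {code}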
> h1. 5. Follow-up work
>  * Add a tenant-name parameter;
>  * Add parameter control for the multi-tenant encryption mode;
>  * Modify block_manager usage to suit multi-tenant scenarios;
>  * Modify key acquisition;
>  * Add multi-tenant key acquisition and sensitive-data encryption;
>  * Modify the key-acquisition and sensitive-data-encryption behavior of the default scenario (no tenant specified).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)