Posted to dev@carbondata.apache.org by Indhumathi M <in...@gmail.com> on 2020/02/13 06:58:57 UTC

[DISCUSSION] Multi-tenant support by refactoring datamaps

Hello all,

Currently, when a user creates a datamap, the system stores the datamap
metadata in a configurable system folder in HDFS or S3. Also, since the
datamap schema file is named after the datamap, users cannot create a
datamap with a name that is already present in storage.

The system folder currently holds the following files:
1. DataMapSchema -> a JSON file containing the schema for one datamap.
2. DataMapStatus -> the status of each datamap.

In cloud scenarios, when one user creates SYSTEM_FOLDER and stores metadata
for materialized views and index datamaps such as bloom and lucene, other
users are not able to access the SYSTEM_FOLDER.

In order to support multi-tenancy for datamaps, I am planning to move the
system folder under each database folder, so that users can access it.
Because the system folder then exists per database, users can create
datamaps with the same name under different databases.

Datamaps will be saved to the folder of the database specified while
creating the datamap.
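
The layout change can be sketched as follows. This is only an illustration under stated assumptions: the folder name `_system`, the `.dmschema` file extension, and the `<store>/<database>/` layout are placeholders chosen for the example, not confirmed CarbonData paths.

```python
import posixpath

# Assumed names, for illustration only.
SYSTEM_FOLDER = "_system"
SCHEMA_SUFFIX = ".dmschema"


def global_schema_path(store_root: str, datamap: str) -> str:
    """Current layout: one system folder shared by all databases."""
    return posixpath.join(store_root, SYSTEM_FOLDER, datamap + SCHEMA_SUFFIX)


def per_database_schema_path(store_root: str, database: str, datamap: str) -> str:
    """Proposed layout: one system folder under each database folder."""
    return posixpath.join(store_root, database, SYSTEM_FOLDER, datamap + SCHEMA_SUFFIX)


if __name__ == "__main__":
    # Shared folder: the same datamap name always maps to the same file.
    print(global_schema_path("/store", "dm1"))
    # Per-database folders: the same name maps to different files.
    print(per_database_schema_path("/store", "db1", "dm1"))
    print(per_database_schema_path("/store", "db2", "dm1"))
```

With a shared folder, two datamaps named `dm1` collide on one file. With per-database folders they resolve to different files, which is why the same name can be reused across databases.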

Any suggestions or inputs from the community are appreciated.

Thanks
Indhumathi

Re: Re: [DISCUSSION] Multi-tenant support by refactoring datamaps

Posted by akashrn5 <ak...@gmail.com>.
Hi, 

+1

I agree with Jacky, we can store the info in the table metadata. But there
is one problem we may face here: metastore connections. If there are a lot
of tables and datamaps, opening many connections to the metastore reduces
performance. In that case, reading from one schema file would be better.

So if we plan to store it in the metadata, then while refactoring we
should take care to minimize the metastore connections needed to get
datamap info, reconnecting only when the table is altered or in a similar
scenario.
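
One way to picture this is a small cache that goes to the metastore only on a miss and is invalidated when a table is altered. This is a minimal sketch, not CarbonData code: the class `DataMapInfoCache`, its methods, and the loader callback are all hypothetical names introduced for the example.

```python
from typing import Callable, Dict, List, Tuple


class DataMapInfoCache:
    """Caches datamap info per (database, table) to avoid repeated metastore calls."""

    def __init__(self, load_from_metastore: Callable[[str, str], List[str]]):
        # Hypothetical loader standing in for a real metastore lookup.
        self._load = load_from_metastore
        self._entries: Dict[Tuple[str, str], List[str]] = {}
        self.metastore_calls = 0

    def get(self, database: str, table: str) -> List[str]:
        key = (database, table)
        if key not in self._entries:
            # Cache miss: this is the only place the metastore is contacted.
            self.metastore_calls += 1
            self._entries[key] = self._load(database, table)
        return self._entries[key]

    def invalidate(self, database: str, table: str) -> None:
        """Call after ALTER TABLE or any change to the table's datamaps."""
        self._entries.pop((database, table), None)


if __name__ == "__main__":
    cache = DataMapInfoCache(lambda db, tbl: [db + "." + tbl + ".bloom_dm"])
    cache.get("db1", "t1")
    cache.get("db1", "t1")          # served from the cache
    cache.invalidate("db1", "t1")   # table altered
    cache.get("db1", "t1")          # reloaded from the metastore
    print(cache.metastore_calls)
```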

Regards
Akash 



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [DISCUSSION] Multi-tenant support by refactoring datamaps

Posted by Jacky Li <ja...@qq.com>.
Hi,


+1 for moving the DataMapSchema JSON file to the database folder to support multi-tenancy.


Furthermore, I suggest we refactor the datamap further. The reason is that the Secondary Index (SI) feature has now been introduced into CarbonData, and it stores the index metadata as a table property of the main table. An index datamap is likewise associated with only one main table, so we can do the same for index datamaps.


I propose to refactor as follows:
1. For index datamaps such as the bloom filter and lucene datamaps, move their metadata (DataMapSchema) to a table property of the main table, just as SI does.


2. DataMapSchema is then used only for Materialized Views. We can rename it to MVSchema and clean it up to keep only the fields required for MV.


3. Add separate commands for CREATE MATERIALIZED VIEW and CREATE INDEX, and unify the index SQL syntax for bloomfilter, lucene and SI.


4. After this refactoring, we can extend MV support to non-carbon tables. This could be a big benefit for users, who could then accelerate OLAP queries on orc/parquet tables, for example.
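
Point 1 above can be pictured as keeping index metadata in the main table's properties as a serialized string. The sketch below is an assumption-laden illustration: the property key `indexInfo`, the JSON layout, and both helper functions are hypothetical and are not the format SI actually uses.

```python
import json
from typing import Dict, List

# Hypothetical property key, for illustration only.
INDEX_PROPERTY_KEY = "indexInfo"


def add_index(table_properties: Dict[str, str], name: str,
              provider: str, columns: List[str]) -> None:
    """Record an index (e.g. bloomfilter, lucene) in the main table's properties."""
    indexes = json.loads(table_properties.get(INDEX_PROPERTY_KEY, "{}"))
    if name in indexes:
        raise ValueError("index already exists on this table: " + name)
    indexes[name] = {"provider": provider, "columns": columns}
    table_properties[INDEX_PROPERTY_KEY] = json.dumps(indexes, sort_keys=True)


def list_indexes(table_properties: Dict[str, str]) -> Dict[str, dict]:
    """Read index metadata back without touching any separate schema file."""
    return json.loads(table_properties.get(INDEX_PROPERTY_KEY, "{}"))


if __name__ == "__main__":
    props: Dict[str, str] = {}
    add_index(props, "idx_bloom", "bloomfilter", ["city"])
    add_index(props, "idx_text", "lucene", ["comment"])
    print(list_indexes(props))
```

Because the metadata travels with the table, index names only need to be unique per table, and no shared system folder is involved for index datamaps.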


Regards,
Jacky





Re: [DISCUSSION] Multi-tenant support by refactoring datamaps

Posted by David CaiQiang <da...@gmail.com>.
+1

Please take care of any performance impact when refactoring datamaps.



-----
Best Regards
David Cai