Posted to user@kylin.apache.org by ☼ R Nair <ra...@gmail.com> on 2018/09/01 17:50:22 UTC

Data Duplication

Hi all,

I am new to Kylin, so here is a fundamental question: when I create a cube,
as it's MOLAP, I believe that irrespective of the data already existing in
HBase, Kylin will create a copy of the data in a cube/multidimensional
format (separate from the underlying base data) to help slice/dice faster.
Any idea of the size of the duplicate copy created? Thanks

Best,
Ravion

Re: Data Duplication

Posted by ☼ R Nair <ra...@gmail.com>.
Thanks, that answers my question well.

Best, Ravion

On Sun, Sep 2, 2018, 7:30 AM <ro...@stratebi.com> wrote:

RE: Data Duplication

Posted by ro...@stratebi.com.
Hello Ravion,

Indeed, Kylin generates a MOLAP cube from data source tables (Hive tables, or other systems such as Kafka queues or JDBC sources like MySQL and Oracle). In a Kylin project, data sources are defined in the "Data Sources" section. A "Data Model" then has to be created, which specifies the relationships between the source tables (joins in a star or snowflake schema), as well as the columns of each table that will be used as dimensions and those that will be used as measures. After this, the last metadata layer, the "Cube", is defined; it is closely tied to how the MOLAP cube is generated and stored in HBase. After the first build, the generated MOLAP cube is stored in HBase.
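To make the layering concrete, here is a minimal Python sketch of how a "Cube" sits on top of a "Data Model". This is only an illustration: the class names, fields, table names and column names (`DataModel`, `CubeDefinition`, `sales`, `dim_date`, ...) are hypothetical and are not Kylin's actual metadata format or API.

```python
from dataclasses import dataclass


@dataclass
class DataModel:
    """Hypothetical stand-in for a Kylin "Data Model" (not Kylin's metadata format)."""
    fact_table: str
    lookup_joins: dict   # lookup table -> join condition against the fact table
    dimensions: set      # columns that may be used as dimensions
    measures: set        # columns that may be used as measures


@dataclass
class CubeDefinition:
    """Hypothetical stand-in for the "Cube" layer defined on top of a model."""
    name: str
    model: DataModel
    dimensions: set
    measures: set

    def validate(self):
        # A cube can only use columns that its data model already exposes.
        unknown = (self.dimensions - self.model.dimensions) | (
            self.measures - self.model.measures
        )
        if unknown:
            raise ValueError(f"columns not in the data model: {sorted(unknown)}")
        return True


# Illustrative star schema: one fact table joined to one lookup table.
model = DataModel(
    fact_table="sales",
    lookup_joins={"dim_date": "sales.date_id = dim_date.date_id"},
    dimensions={"sales.date_id", "dim_date.year", "sales.country"},
    measures={"sales.amount"},
)
cube = CubeDefinition(
    name="sales_cube",
    model=model,
    dimensions={"dim_date.year", "sales.country"},
    measures={"sales.amount"},
)
cube.validate()
```

The point of the sketch is only that several cubes, each with its own selection of dimensions and measures, can be defined over the same Data Model.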

 

The size of the generated MOLAP cube therefore depends on the "Cube" definition, which determines the level of pre-aggregation of the stored data through settings such as whether a dimension is Normal or Derived. For example, I have 2 Kylin cubes built on a Data Model over a DW in Hive. The DW fact table is 1 GB (ORC format with Snappy compression). One of the generated Kylin cubes is 1 GB, that is, almost the same size as the DW in the Hive source (1 GB in Hive + 1 GB cube in HBase). However, the other Kylin cube, with a different cube definition over the same Data Model, is 10 GB. It is larger because I defined more dimensions as Normal type in the cube definition, in order to get better query times.
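A rough way to see why more Normal dimensions make a cube grow: without any pruning, a cube pre-aggregates one cuboid per combination of its Normal dimensions, so n Normal dimensions give 2^n cuboids. The Python sketch below only counts those combinations. It assumes no pruning (for example, no aggregation groups), it leaves out Derived dimensions on the assumption that they are resolved from the lookup table rather than adding combinations of their own, and its dimension names are made up. Actual size on disk also depends on column cardinality and encoding, which the sketch ignores.

```python
from itertools import combinations


def cuboid_count(normal_dimensions):
    """Number of dimension combinations (cuboids) in an unpruned cube: 2**n."""
    return 2 ** len(normal_dimensions)


def cuboids(normal_dimensions):
    """Enumerate every combination of Normal dimensions, including the empty one."""
    dims = sorted(normal_dimensions)
    for size in range(len(dims) + 1):
        yield from combinations(dims, size)


# Hypothetical dimension sets for two cube definitions over the same model.
small = {"year", "country"}
large = {"year", "country", "product", "channel", "store"}

print(cuboid_count(small))   # 4
print(cuboid_count(large))   # 32
print(list(cuboids(small)))  # [(), ('country',), ('year',), ('country', 'year')]
```

Going from 2 to 5 Normal dimensions multiplies the number of pre-aggregated combinations by 8, which is the kind of growth that trades storage for faster queries.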

 

I hope this clears up your doubts.

Best Regards,

Roberto Tardío Olmos
Head of Big Data Analytics
Avenida de Brasil, 17, Planta 16. 28020 Madrid
Landline: 91.788.34.10

http://bigdata.stratebi.com/
http://www.stratebi.com