Posted to user@kylin.apache.org by BELLIER Jean-luc <je...@rte-france.com> on 2018/03/01 08:30:15 UTC

RE: Questions about 'RAW' measure

Hello Alberto,

Thank you for your answer. I will look further into this error in the cube build.

Concerning the RAW measure, are you referring to this discussion?
I can still see this option in the measures section of Kylin 2.2, which is why it caught my attention.
Does it mean that to access raw data, we first need to use an aggregated measure? My end users mainly work with raw data (e.g. slicing), so I want to be sure about that.

What about building cubes from a single fact table that contains all the data? Is that a reasonable approach (in terms of storage space and efficiency), or is it preferable to use separate tables for dimensions, and why?

Thank you in advance for your help.
Have a good day.

Best regards,
Jean-Luc.

From: Alberto Ramón [mailto:a.ramonportoles@gmail.com]
Sent: Wednesday 28 February 2018 19:04
To: user <us...@kylin.apache.org>
Subject: Re: Questions about 'RAW' measure

Hello
- The RAW format is deprecated. You will find the thread on this mailing list.
- "Job hasn't been submitted after" sounds like a configuration problem with your YARN. Please search for it on Google and review your CPU and RAM resources.

On 28 February 2018 at 16:44, BELLIER Jean-luc <je...@rte-france.com> wrote:
Hello

I discovered that there is a RAW measure for retrieving raw data instead of aggregated data (http://kylin.apache.org/blog/2016/05/29/raw-measure-in-kylin/).

My assumption is that this raw data is stored in HBase, as the aggregated data is, i.e. the data is duplicated from Hive into HBase.
So my question is: are there limitations on the data volume? My fact tables contain billions of rows and we need to get detailed information from them. What are the restrictions, and what are the benefits compared with querying the data directly in Hive?

I have another question: I tried creating a model directly from a fact table containing raw data, as a feasibility test and to avoid transformations (the table is a CSV file provided by an external team). As a first step, I wanted to avoid creating files for the corresponding dimensions and generating a "clean" fact table with foreign keys that match the primary keys of the dimension tables.
The creation of the model was OK.
However, the cube build failed at the first step, and I got this message:

INFO : Query ID = hive_20180228120101_6990f9d4-182d-4dd9-b319-fce02caf75ef
INFO : Total jobs = 3
INFO : Launching Job 1 out of 3
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : Starting Spark Job = 3556ecc6-2609-4085-bcca-b1b81fa9855c
ERROR : Job hasn't been submitted after 61s. Aborting it.

How should I proceed to avoid this? Are there Kylin parameters (or others) to adjust?
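
For readers triaging the same failure, here is a minimal Python sketch that scans a Hive log for the abort line shown above and reports the Spark job ID and the number of seconds waited. The function name `find_submission_timeout` and the regular expressions are hypothetical and written only against the log lines quoted in this message. The log shows Hive running on Spark; in that setup the wait is, as far as I know, controlled by the Hive property `hive.spark.job.monitor.timeout`, but that is an assumption to verify against your Hive version, and raising it does not fix a cluster that lacks the resources to start the job.

```python
import re

# Patterns written against the log lines quoted in this thread (assumption:
# other Hive versions may word these messages differently).
SPARK_JOB_RE = re.compile(r"Starting Spark Job = ([0-9a-f-]+)")
ABORT_RE = re.compile(r"Job hasn't been submitted after (\d+)s\. Aborting it\.")


def find_submission_timeout(log_text):
    """Return (spark_job_id, waited_seconds) if the log shows a
    submission timeout, otherwise None."""
    job_id = None
    for line in log_text.splitlines():
        started = SPARK_JOB_RE.search(line)
        if started:
            # Remember the most recent Spark job announced in the log.
            job_id = started.group(1)
            continue
        aborted = ABORT_RE.search(line)
        if aborted:
            return job_id, int(aborted.group(1))
    return None


sample = """INFO : Starting Spark Job = 3556ecc6-2609-4085-bcca-b1b81fa9855c
ERROR : Job hasn't been submitted after 61s. Aborting it."""

print(find_submission_timeout(sample))
```

A result other than `None` suggests the job was never accepted by the cluster within the wait period, which is consistent with the advice to review the resources available to YARN.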

Thank you in advance for your help. Have a good day.
Best regards,
Jean-Luc

"This message is solely intended for the use of the individual or entity to which it is addressed and may contain information that is privileged or confidential. If you have received this communication by error, please notify us immediately by electronic mail, do not disclose it and delete the original message."

Re: Questions about 'RAW' measure

Posted by Alberto Ramón <a....@gmail.com>.

Mailing list thread
<http://apache-kylin.74782.x6.nabble.com/Discuss-Disable-hide-RAW-measure-in-Kylin-web-GUI-tp6636.html>:
KYLIN-3062 <https://issues.apache.org/jira/browse/KYLIN-3062> (v2.3) proposes
disabling RAW in the UI.

Currently you cannot control whether the flat table is created or not;
there is a proposal in KYLIN-2532
<https://issues.apache.org/jira/browse/KYLIN-2532?focusedCommentId=15956535&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15956535>
(v2.1).


