You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@kylin.apache.org by "Zhang, Zhong" <zz...@cardlytics.com> on 2016/01/05 20:07:55 UTC

missing a field in the SQL when creating cube

Hi All,

I've encountered a kind of weird problem. I just normally created a cube step by step.
Based on my understanding, after filling all the information in cube info, data model,
dimensions, and etc, a SQL will be generated in the SQL column. The following is the
generated SQL:

SELECT
...
SPENDTRIPINC.POSTALCD
,FLATGEO2.TOWN
,FLATGEO2.DMA_CODE
,FLATGEO2.PROVINCECD
,FLATGEO2.REGIONNAME
,FLATGEO2.MSA
...
FROM FACT.SPENDTRIPINC as SPENDTRIPINC
INNER JOIN FACT.CALENDARDATES as CALENDARDATES
ON SPENDTRIPINC.DIMDATEKEY = CALENDARDATES.CALENDARDAYID
INNER JOIN FACT.FLATGEO2 as FLATGEO2
ON SPENDTRIPINC.POSTALCD = FLATGEO2.POSTALCD

I do put fields "POSTALCD","TOWN","DMA_CODE","PROVINCECD","REGIONNAME" and "MSA"
from table FLATGEO2 as a hierarchy dimension. There should be a filed POSTALCD
in the SQL. But unfortunately it is not there just as you see.

The weird thing is that if checking at JSON(cube) column, POSTALCD is there:

{

      "id": 12,

      "name": "FACTGEO",

      "table": "FACT.FLATGEO2",

      "column": [

        "POSTALCD",

        "TOWN",

        "DMA_CODE",

        "PROVINCECD",

        "REGIONNAME",

        "MSA"

      ],

      "derived": null,

      "hierarchy": true

    }

Best regards,
Zhong


Re: missing a field in the SQL when creating cube

Posted by Li Yang <li...@apache.org>.
Ah, I see your point!!

It's expected.

We don't need "lookuptable2.col3" because its value always equals to
"facttable.col3". Note "facttable.col3" is already pulled.

On Fri, Jan 8, 2016 at 5:41 AM, Zhang, Zhong <zz...@cardlytics.com> wrote:

> Hi Yang,
>
>
>
> The generated SQL query for supplement.
>
>
>
> SELECT
>
> ...
>
> ,facttable.col3
>
> ...
>
> ,lookuptable2.h1
>
> ,lookuptable2.h2
>
> ,lookuptable2.h3
>
> ,lookuptable2.h4
>
> ,lookuptable2.h5
>
> FROM FACT.facttable as facttable
>
> INNER JOIN FACT.lookuptable1 as lookuptable1
>
> ON facttable.col6 = lookuptable1.p1
>
> INNER JOIN FACT.lookuptable2 as lookuptable2
>
> ON facttable.col3 = lookuptable2.col3
>
>
>
> There is no field “col3” in lookuptable2.
>
>
>
> Best regards,
>
> Zhong
>
>
>
> *From:* Zhang, Zhong [mailto:zzhang@cardlytics.com]
> *Sent:* Thursday, January 07, 2016 4:25 PM
> *To:* user@kylin.apache.org
> *Subject:* RE: missing a field in the SQL when creating cube
>
>
>
> Hi Yang,
>
>
>
> Thanks so much for your reply.
>
>
>
> Let me give a little bit introduction. There is a column “col3” in the
> facttable and lookuptable2.
>
> “col3” is the primary key and foreign key in the lookuptable2. “col3” is
> also the missing field
>
> in the SQL query.
>
>
>
> JSON(Cube):
>
>
>
> {
>
>   "uuid": "14834977-626c-4f24-8d78-adf73cbf8cae",
>
>   "name": "missing_a_field_cube",
>
>   "description": "",
>
>   "dimensions": [
>
>     {
>
>       "id": 1,
>
>       "name": "col1",
>
>       "table": "FACT.facttable",
>
>       "column": [
>
>         "col1"
>
>       ],
>
>       "derived": null,
>
>       "hierarchy": false
>
>     },
>
>     {
>
>       "id": 2,
>
>       "name": "col2",
>
>       "table": "FACT.facttable",
>
>       "column": [
>
>         "col2"
>
>       ],
>
>       "derived": null,
>
>       "hierarchy": false
>
>     },
>
>     {
>
>       "id": 3,
>
>       "name": "col3",
>
>       "table": "FACT.facttable",
>
>       "column": [
>
>         "col3"
>
>       ],
>
>       "derived": null,
>
>       "hierarchy": false
>
>     },
>
>     {
>
>       "id": 4,
>
>       "name": "col4",
>
>       "table": "FACT.facttable",
>
>       "column": [
>
>         "col4"
>
>       ],
>
>       "derived": null,
>
>       "hierarchy": false
>
>     },
>
>     {
>
>       "id": 5,
>
>       "name": "col5",
>
>       "table": "FACT.facttable",
>
>       "column": [
>
>         "col5"
>
>       ],
>
>       "derived": null,
>
>       "hierarchy": false
>
>     },
>
>     {
>
>       "id": 6,
>
>       "name": "col6",
>
>       "table": "FACT.facttable",
>
>       "column": [
>
>         "col6"
>
>       ],
>
>       "derived": null,
>
>       "hierarchy": false
>
>     },
>
>     {
>
>       "id": 7,
>
>       "name": "col7",
>
>       "table": "FACT.facttable",
>
>       "column": [
>
>         "col7"
>
>       ],
>
>       "derived": null,
>
>       "hierarchy": false
>
>     },
>
>     {
>
>       "id": 8,
>
>       "name": "col8",
>
>       "table": "FACT.facttable",
>
>       "column": [
>
>         "col8"
>
>       ],
>
>       "derived": null,
>
>       "hierarchy": false
>
>     },
>
>     {
>
>       "id": 9,
>
>       "name": "col9",
>
>       "table": "FACT.facttable",
>
>       "column": [
>
>         "col9"
>
>       ],
>
>       "derived": null,
>
>       "hierarchy": false
>
>     },
>
>     {
>
>       "id": 10,
>
>       "name": "col10",
>
>       "table": "FACT.facttable",
>
>       "column": [
>
>         "col10"
>
>       ],
>
>       "derived": null,
>
>       "hierarchy": false
>
>     },
>
>     {
>
>       "id": 11,
>
>       "name": "L_col1",
>
>       "table": "FACT.lookuptable1",
>
>       "column": [
>
>         "L_col1"
>
>       ],
>
>       "derived": null,
>
>       "hierarchy": false
>
>     },
>
>     {
>
>       "id": 12,
>
>       "name": "hier_dim",
>
>       "table": "FACT.lookuptable2",
>
>       "column": [
>
>         "col3",
>
>         "h1",
>
>         "h2",
>
>         "h3",
>
>         "h4",
>
>        "h5"
>
>       ],
>
>       "derived": null,
>
>       "hierarchy": true
>
>     }
>
>   ],
>
>   "measures": [
>
>     {
>
>       "id": 1,
>
>       "name": "_COUNT_",
>
>       "function": {
>
>         "expression": "COUNT",
>
>         "parameter": {
>
>           "type": "constant",
>
>           "value": "1"
>
>         },
>
>         "returntype": "bigint"
>
>       },
>
>       "dependent_measure_ref": null
>
>     },
>
>     {
>
>       "id": 2,
>
>       "name": "SUMcol9",
>
>       "function": {
>
>         "expression": "SUM",
>
>         "parameter": {
>
>           "type": "column",
>
>           "value": "col9"
>
>         },
>
>         "returntype": "bigint"
>
>       },
>
>       "dependent_measure_ref": null
>
>     },
>
>     {
>
>       "id": 3,
>
>       "name": "SUMcol10",
>
>       "function": {
>
>         "expression": "SUM",
>
>         "parameter": {
>
>           "type": "column",
>
>           "value": "col10"
>
>         },
>
>         "returntype": "bigint"
>
>       },
>
>       "dependent_measure_ref": null
>
>     }
>
>   ],
>
>   "rowkey": {
>
>     "rowkey_columns": [
>
>       {
>
>         "column": "col1",
>
>         "length": 0,
>
>         "dictionary": "true",
>
>         "mandatory": false
>
>       },
>
>       {
>
>         "column": "col2",
>
>         "length": 0,
>
>         "dictionary": "true",
>
>         "mandatory": false
>
>       },
>
>       {
>
>         "column": "col3",
>
>         "length": 0,
>
>         "dictionary": "true",
>
>         "mandatory": false
>
>       },
>
>       {
>
>         "column": "col4",
>
>         "length": 0,
>
>         "dictionary": "true",
>
>         "mandatory": false
>
>       },
>
>       {
>
>         "column": "col5",
>
>         "length": 0,
>
>         "dictionary": "true",
>
>         "mandatory": false
>
>       },
>
>       {
>
>         "column": "col6",
>
>         "length": 0,
>
>         "dictionary": "true",
>
>         "mandatory": false
>
>       },
>
>       {
>
>         "column": "col7",
>
>         "length": 0,
>
>         "dictionary": "true",
>
>         "mandatory": false
>
>       },
>
>       {
>
>         "column": "col8",
>
>         "length": 0,
>
>         "dictionary": "true",
>
>         "mandatory": false
>
>       },
>
>       {
>
>         "column": "col9",
>
>         "length": 0,
>
>         "dictionary": "true",
>
>         "mandatory": false
>
>       },
>
>       {
>
>         "column": "col10",
>
>         "length": 0,
>
>         "dictionary": "true",
>
>         "mandatory": false
>
>       },
>
>       {
>
>         "column": "l_col1",
>
>         "length": 0,
>
>         "dictionary": "true",
>
>         "mandatory": false
>
>       },
>
>       {
>
>         "column": "h1",
>
>         "length": 0,
>
>         "dictionary": "true",
>
>         "mandatory": false
>
>       },
>
>       {
>
>         "column": "h2",
>
>         "length": 0,
>
>         "dictionary": "true",
>
>         "mandatory": false
>
>       },
>
>       {
>
>         "column": "h3",
>
>         "length": 0,
>
>         "dictionary": "true",
>
>         "mandatory": false
>
>       },
>
>       {
>
>         "column": "h4",
>
>         "length": 0,
>
>         "dictionary": "true",
>
>         "mandatory": false
>
>       },
>
>       {
>
>        "column": "h5",
>
>         "length": 0,
>
>         "dictionary": "true",
>
>         "mandatory": false
>
>       }
>
>     ],
>
>     "aggregation_groups": [
>
>       [
>
>         "col1",
>
>         "col2",
>
>         "col4",
>
>         "col5",
>
>         "col6",
>
>         "col7",
>
>         "col8",
>
>         "col9",
>
>         "col10",
>
>         "l_col1"
>
>       ],
>
>       [
>
>         "col3",
>
>         "h1",
>
>         "h2",
>
>         "h3",
>
>         "h4",
>
>         "h5"
>
>       ]
>
>     ]
>
>   },
>
>   "signature": "ZGOjdHfxChhWpdXXAIbImQ==",
>
>   "last_modified": 1452199211425,
>
>   "model_name": "missing_a_field_cube",
>
>   "null_string": null,
>
>   "hbase_mapping": {
>
>     "column_family": [
>
>       {
>
>         "name": "F1",
>
>         "columns": [
>
>           {
>
>             "qualifier": "M",
>
>             "measure_refs": [
>
>               "_COUNT_",
>
>               "SUMcol9",
>
>               "SUMcol10"
>
>             ]
>
>           }
>
>         ]
>
>       }
>
>     ]
>
>   },
>
>   "notify_list": [],
>
>   "auto_merge_time_ranges": [
>
>     604800000,
>
>     2419200000
>
>   ],
>
>   "retention_range": 0
>
> }
>
>
>
>
>
>
>
> ========================================
>
> JSON(Model):
>
>
>
> {
>
>   "uuid": null,
>
>   "name": "missing_a_field_cube",
>
>   "lookups": [
>
>     {
>
>       "table": "FACT.lookuptable1",
>
>       "join": {
>
>         "type": "inner",
>
>         "primary_key": [
>
>           "p1"
>
>         ],
>
>         "foreign_key": [
>
>           "col6"
>
>         ]
>
>       }
>
>     },
>
>     {
>
>       "table": "FACT.lookuptable2",
>
>       "join": {
>
>         "type": "inner",
>
>         "primary_key": [
>
>           "col3"
>
>         ],
>
>         "foreign_key": [
>
>           "col3"
>
>         ]
>
>      }
>
>     }
>
>   ],
>
>   "capacity": "LARGE",
>
>   "last_modified": 1452199211170,
>
>   "fact_table": "FACT.facttable",
>
>   "filter_condition": "",
>
>   "partition_desc": {
>
>     "partition_date_column": "FACT.facttable.SPT_DATE",
>
>     "partition_date_start": 1325289600000,
>
>     "partition_type": "APPEND",
>
>     "partition_condition_builder":
> "org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
>
>   }
>
> }
>
>
>
> Best regards,
>
> Zhong
>
>
>
> *From:* Li Yang [mailto:liyang@apache.org <li...@apache.org>]
> *Sent:* Thursday, January 07, 2016 3:03 AM
> *To:* Li Yang <li...@apache.org>
> *Cc:* user@kylin.apache.org; Jones, James <jj...@cardlytics.com>
> *Subject:* Re: missing a field in the SQL when creating cube
>
>
>
> I mean need the two json files "model json and cube json" to further
> troubleshoot.
>
>
>
> On Thu, Jan 7, 2016 at 4:02 PM, Li Yang <li...@apache.org> wrote:
>
> Em... derived dimensions are not in extraction hive SQL, but hierarchy
> dimensions should appear.
>
>
>
> If you could both the model json and cube json files, we could try to
> reproduce and debug.
>
>
>
> On Wed, Jan 6, 2016 at 8:16 AM, Luke Han <lu...@gmail.com> wrote:
>
> it is by design, hierarchy/derived dimension will not be included in
> generated SQL which read data from Hive, which associate with PK/FK.
>
>
>
> Thanks.
>
>
>
>
>
>
>
>
> Best Regards!
> ---------------------
>
> Luke Han
>
>
>
> On Wed, Jan 6, 2016 at 3:07 AM, Zhang, Zhong <zz...@cardlytics.com>
> wrote:
>
> Hi All,
>
>
>
> I’ve encountered a kind of weird problem. I just normally created a cube
> step by step.
>
> Based on my understanding, after filling all the information in cube info,
> data model,
>
> dimensions, and etc, a SQL will be generated in the SQL column. The
> following is the
>
> generated SQL:
>
>
>
> SELECT
>
> ...
>
> SPENDTRIPINC.POSTALCD
>
> ,FLATGEO2.TOWN
>
> ,FLATGEO2.DMA_CODE
>
> ,FLATGEO2.PROVINCECD
>
> ,FLATGEO2.REGIONNAME
>
> ,FLATGEO2.MSA
>
> ...
>
> FROM FACT.SPENDTRIPINC as SPENDTRIPINC
>
> INNER JOIN FACT.CALENDARDATES as CALENDARDATES
>
> ON SPENDTRIPINC.DIMDATEKEY = CALENDARDATES.CALENDARDAYID
>
> INNER JOIN FACT.FLATGEO2 as FLATGEO2
>
> ON SPENDTRIPINC.POSTALCD = FLATGEO2.POSTALCD
>
>
>
> I do put fields "POSTALCD","TOWN","DMA_CODE","PROVINCECD","REGIONNAME"
> and "MSA"
>
> from table FLATGEO2 as a hierarchy dimension. There should be a filed
> POSTALCD
>
> in the SQL. But unfortunately it is not there just as you see.
>
>
>
> The weird thing is that if checking at JSON(cube) column, POSTALCD is
> there:
>
> {
>
>       "id": 12,
>
>       "name": "FACTGEO",
>
>       "table": "FACT.FLATGEO2",
>
>       "column": [
>
>         "POSTALCD",
>
>         "TOWN",
>
>         "DMA_CODE",
>
>         "PROVINCECD",
>
>         "REGIONNAME",
>
>         "MSA"
>
>       ],
>
>       "derived": null,
>
>       "hierarchy": true
>
>     }
>
>
>
> Best regards,
>
> Zhong
>
>
>
>
>
>
>
>
>

RE: missing a field in the SQL when creating cube

Posted by "Zhang, Zhong" <zz...@cardlytics.com>.
Hi Yang,

The generated SQL query for supplement.

SELECT
...
,facttable.col3
...
,lookuptable2.h1
,lookuptable2.h2
,lookuptable2.h3
,lookuptable2.h4
,lookuptable2.h5
FROM FACT.facttable as facttable
INNER JOIN FACT.lookuptable1 as lookuptable1
ON facttable.col6 = lookuptable1.p1
INNER JOIN FACT.lookuptable2 as lookuptable2
ON facttable.col3 = lookuptable2.col3

There is no field “col3” in lookuptable2.

Best regards,
Zhong

From: Zhang, Zhong [mailto:zzhang@cardlytics.com]
Sent: Thursday, January 07, 2016 4:25 PM
To: user@kylin.apache.org
Subject: RE: missing a field in the SQL when creating cube

Hi Yang,

Thanks so much for your reply.

Let me give a little bit introduction. There is a column “col3” in the facttable and lookuptable2.
“col3” is the primary key and foreign key in the lookuptable2. “col3” is also the missing field
in the SQL query.

JSON(Cube):

{
  "uuid": "14834977-626c-4f24-8d78-adf73cbf8cae",
  "name": "missing_a_field_cube",
  "description": "",
  "dimensions": [
    {
      "id": 1,
      "name": "col1",
      "table": "FACT.facttable",
      "column": [
        "col1"
      ],
      "derived": null,
      "hierarchy": false
    },
    {
      "id": 2,
      "name": "col2",
      "table": "FACT.facttable",
      "column": [
        "col2"
      ],
      "derived": null,
      "hierarchy": false
    },
    {
      "id": 3,
      "name": "col3",
      "table": "FACT.facttable",
      "column": [
        "col3"
      ],
      "derived": null,
      "hierarchy": false
    },
    {
      "id": 4,
      "name": "col4",
      "table": "FACT.facttable",
      "column": [
        "col4"
      ],
      "derived": null,
      "hierarchy": false
    },
    {
      "id": 5,
      "name": "col5",
      "table": "FACT.facttable",
      "column": [
        "col5"
      ],
      "derived": null,
      "hierarchy": false
    },
    {
      "id": 6,
      "name": "col6",
      "table": "FACT.facttable",
      "column": [
        "col6"
      ],
      "derived": null,
      "hierarchy": false
    },
    {
      "id": 7,
      "name": "col7",
      "table": "FACT.facttable",
      "column": [
        "col7"
      ],
      "derived": null,
      "hierarchy": false
    },
    {
      "id": 8,
      "name": "col8",
      "table": "FACT.facttable",
      "column": [
        "col8"
      ],
      "derived": null,
      "hierarchy": false
    },
    {
      "id": 9,
      "name": "col9",
      "table": "FACT.facttable",
      "column": [
        "col9"
      ],
      "derived": null,
      "hierarchy": false
    },
    {
      "id": 10,
      "name": "col10",
      "table": "FACT.facttable",
      "column": [
        "col10"
      ],
      "derived": null,
      "hierarchy": false
    },
    {
      "id": 11,
      "name": "L_col1",
      "table": "FACT.lookuptable1",
      "column": [
        "L_col1"
      ],
      "derived": null,
      "hierarchy": false
    },
    {
      "id": 12,
      "name": "hier_dim",
      "table": "FACT.lookuptable2",
      "column": [
        "col3",
        "h1",
        "h2",
        "h3",
        "h4",
       "h5"
      ],
      "derived": null,
      "hierarchy": true
    }
  ],
  "measures": [
    {
      "id": 1,
      "name": "_COUNT_",
      "function": {
        "expression": "COUNT",
        "parameter": {
          "type": "constant",
          "value": "1"
        },
        "returntype": "bigint"
      },
      "dependent_measure_ref": null
    },
    {
      "id": 2,
      "name": "SUMcol9",
      "function": {
        "expression": "SUM",
        "parameter": {
          "type": "column",
          "value": "col9"
        },
        "returntype": "bigint"
      },
      "dependent_measure_ref": null
    },
    {
      "id": 3,
      "name": "SUMcol10",
      "function": {
        "expression": "SUM",
        "parameter": {
          "type": "column",
          "value": "col10"
        },
        "returntype": "bigint"
      },
      "dependent_measure_ref": null
    }
  ],
  "rowkey": {
    "rowkey_columns": [
      {
        "column": "col1",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "col2",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "col3",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "col4",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "col5",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "col6",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "col7",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "col8",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "col9",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "col10",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "l_col1",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "h1",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "h2",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "h3",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "h4",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
       "column": "h5",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      }
    ],
    "aggregation_groups": [
      [
        "col1",
        "col2",
        "col4",
        "col5",
        "col6",
        "col7",
        "col8",
        "col9",
        "col10",
        "l_col1"
      ],
      [
        "col3",
        "h1",
        "h2",
        "h3",
        "h4",
        "h5"
      ]
    ]
  },
  "signature": "ZGOjdHfxChhWpdXXAIbImQ==",
  "last_modified": 1452199211425,
  "model_name": "missing_a_field_cube",
  "null_string": null,
  "hbase_mapping": {
    "column_family": [
      {
        "name": "F1",
        "columns": [
          {
            "qualifier": "M",
            "measure_refs": [
              "_COUNT_",
              "SUMcol9",
              "SUMcol10"
            ]
          }
        ]
      }
    ]
  },
  "notify_list": [],
  "auto_merge_time_ranges": [
    604800000,
    2419200000
  ],
  "retention_range": 0
}



========================================
JSON(Model):

{
  "uuid": null,
  "name": "missing_a_field_cube",
  "lookups": [
    {
      "table": "FACT.lookuptable1",
      "join": {
        "type": "inner",
        "primary_key": [
          "p1"
        ],
        "foreign_key": [
          "col6"
        ]
      }
    },
    {
      "table": "FACT.lookuptable2",
      "join": {
        "type": "inner",
        "primary_key": [
          "col3"
        ],
        "foreign_key": [
          "col3"
        ]
     }
    }
  ],
  "capacity": "LARGE",
  "last_modified": 1452199211170,
  "fact_table": "FACT.facttable",
  "filter_condition": "",
  "partition_desc": {
    "partition_date_column": "FACT.facttable.SPT_DATE",
    "partition_date_start": 1325289600000,
    "partition_type": "APPEND",
    "partition_condition_builder": "org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
  }
}

Best regards,
Zhong

From: Li Yang [mailto:liyang@apache.org]
Sent: Thursday, January 07, 2016 3:03 AM
To: Li Yang <li...@apache.org>>
Cc: user@kylin.apache.org<ma...@kylin.apache.org>; Jones, James <jj...@cardlytics.com>>
Subject: Re: missing a field in the SQL when creating cube

I mean need the two json files "model json and cube json" to further troubleshoot.

On Thu, Jan 7, 2016 at 4:02 PM, Li Yang <li...@apache.org>> wrote:
Em... derived dimensions are not in extraction hive SQL, but hierarchy dimensions should appear.

If you could both the model json and cube json files, we could try to reproduce and debug.

On Wed, Jan 6, 2016 at 8:16 AM, Luke Han <lu...@gmail.com>> wrote:
it is by design, hierarchy/derived dimension will not be included in generated SQL which read data from Hive, which associate with PK/FK.

Thanks.




Best Regards!
---------------------

Luke Han

On Wed, Jan 6, 2016 at 3:07 AM, Zhang, Zhong <zz...@cardlytics.com>> wrote:
Hi All,

I’ve encountered a kind of weird problem. I just normally created a cube step by step.
Based on my understanding, after filling all the information in cube info, data model,
dimensions, and etc, a SQL will be generated in the SQL column. The following is the
generated SQL:

SELECT
...
SPENDTRIPINC.POSTALCD
,FLATGEO2.TOWN
,FLATGEO2.DMA_CODE
,FLATGEO2.PROVINCECD
,FLATGEO2.REGIONNAME
,FLATGEO2.MSA
...
FROM FACT.SPENDTRIPINC as SPENDTRIPINC
INNER JOIN FACT.CALENDARDATES as CALENDARDATES
ON SPENDTRIPINC.DIMDATEKEY = CALENDARDATES.CALENDARDAYID
INNER JOIN FACT.FLATGEO2 as FLATGEO2
ON SPENDTRIPINC.POSTALCD = FLATGEO2.POSTALCD

I do put fields "POSTALCD","TOWN","DMA_CODE","PROVINCECD","REGIONNAME" and "MSA"
from table FLATGEO2 as a hierarchy dimension. There should be a filed POSTALCD
in the SQL. But unfortunately it is not there just as you see.

The weird thing is that if checking at JSON(cube) column, POSTALCD is there:

{

      "id": 12,

      "name": "FACTGEO",

      "table": "FACT.FLATGEO2",

      "column": [

        "POSTALCD",

        "TOWN",

        "DMA_CODE",

        "PROVINCECD",

        "REGIONNAME",

        "MSA"

      ],

      "derived": null,

      "hierarchy": true

    }

Best regards,
Zhong





RE: missing a field in the SQL when creating cube

Posted by "Zhang, Zhong" <zz...@cardlytics.com>.
Hi Yang,

Thanks so much for your reply.

Let me give a little bit introduction. There is a column “col3” in the facttable and lookuptable2.
“col3” is the primary key and foreign key in the lookuptable2. “col3” is also the missing field
in the SQL query.

JSON(Cube):

{
  "uuid": "14834977-626c-4f24-8d78-adf73cbf8cae",
  "name": "missing_a_field_cube",
  "description": "",
  "dimensions": [
    {
      "id": 1,
      "name": "col1",
      "table": "FACT.facttable",
      "column": [
        "col1"
      ],
      "derived": null,
      "hierarchy": false
    },
    {
      "id": 2,
      "name": "col2",
      "table": "FACT.facttable",
      "column": [
        "col2"
      ],
      "derived": null,
      "hierarchy": false
    },
    {
      "id": 3,
      "name": "col3",
      "table": "FACT.facttable",
      "column": [
        "col3"
      ],
      "derived": null,
      "hierarchy": false
    },
    {
      "id": 4,
      "name": "col4",
      "table": "FACT.facttable",
      "column": [
        "col4"
      ],
      "derived": null,
      "hierarchy": false
    },
    {
      "id": 5,
      "name": "col5",
      "table": "FACT.facttable",
      "column": [
        "col5"
      ],
      "derived": null,
      "hierarchy": false
    },
    {
      "id": 6,
      "name": "col6",
      "table": "FACT.facttable",
      "column": [
        "col6"
      ],
      "derived": null,
      "hierarchy": false
    },
    {
      "id": 7,
      "name": "col7",
      "table": "FACT.facttable",
      "column": [
        "col7"
      ],
      "derived": null,
      "hierarchy": false
    },
    {
      "id": 8,
      "name": "col8",
      "table": "FACT.facttable",
      "column": [
        "col8"
      ],
      "derived": null,
      "hierarchy": false
    },
    {
      "id": 9,
      "name": "col9",
      "table": "FACT.facttable",
      "column": [
        "col9"
      ],
      "derived": null,
      "hierarchy": false
    },
    {
      "id": 10,
      "name": "col10",
      "table": "FACT.facttable",
      "column": [
        "col10"
      ],
      "derived": null,
      "hierarchy": false
    },
    {
      "id": 11,
      "name": "L_col1",
      "table": "FACT.lookuptable1",
      "column": [
        "L_col1"
      ],
      "derived": null,
      "hierarchy": false
    },
    {
      "id": 12,
      "name": "hier_dim",
      "table": "FACT.lookuptable2",
      "column": [
        "col3",
        "h1",
        "h2",
        "h3",
        "h4",
       "h5"
      ],
      "derived": null,
      "hierarchy": true
    }
  ],
  "measures": [
    {
      "id": 1,
      "name": "_COUNT_",
      "function": {
        "expression": "COUNT",
        "parameter": {
          "type": "constant",
          "value": "1"
        },
        "returntype": "bigint"
      },
      "dependent_measure_ref": null
    },
    {
      "id": 2,
      "name": "SUMcol9",
      "function": {
        "expression": "SUM",
        "parameter": {
          "type": "column",
          "value": "col9"
        },
        "returntype": "bigint"
      },
      "dependent_measure_ref": null
    },
    {
      "id": 3,
      "name": "SUMcol10",
      "function": {
        "expression": "SUM",
        "parameter": {
          "type": "column",
          "value": "col10"
        },
        "returntype": "bigint"
      },
      "dependent_measure_ref": null
    }
  ],
  "rowkey": {
    "rowkey_columns": [
      {
        "column": "col1",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "col2",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "col3",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "col4",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "col5",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "col6",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "col7",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "col8",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "col9",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "col10",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "l_col1",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "h1",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "h2",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "h3",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
        "column": "h4",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      },
      {
       "column": "h5",
        "length": 0,
        "dictionary": "true",
        "mandatory": false
      }
    ],
    "aggregation_groups": [
      [
        "col1",
        "col2",
        "col4",
        "col5",
        "col6",
        "col7",
        "col8",
        "col9",
        "col10",
        "l_col1"
      ],
      [
        "col3",
        "h1",
        "h2",
        "h3",
        "h4",
        "h5"
      ]
    ]
  },
  "signature": "ZGOjdHfxChhWpdXXAIbImQ==",
  "last_modified": 1452199211425,
  "model_name": "missing_a_field_cube",
  "null_string": null,
  "hbase_mapping": {
    "column_family": [
      {
        "name": "F1",
        "columns": [
          {
            "qualifier": "M",
            "measure_refs": [
              "_COUNT_",
              "SUMcol9",
              "SUMcol10"
            ]
          }
        ]
      }
    ]
  },
  "notify_list": [],
  "auto_merge_time_ranges": [
    604800000,
    2419200000
  ],
  "retention_range": 0
}



========================================
JSON(Model):

{
  "uuid": null,
  "name": "missing_a_field_cube",
  "lookups": [
    {
      "table": "FACT.lookuptable1",
      "join": {
        "type": "inner",
        "primary_key": [
          "p1"
        ],
        "foreign_key": [
          "col6"
        ]
      }
    },
    {
      "table": "FACT.lookuptable2",
      "join": {
        "type": "inner",
        "primary_key": [
          "col3"
        ],
        "foreign_key": [
          "col3"
        ]
     }
    }
  ],
  "capacity": "LARGE",
  "last_modified": 1452199211170,
  "fact_table": "FACT.facttable",
  "filter_condition": "",
  "partition_desc": {
    "partition_date_column": "FACT.facttable.SPT_DATE",
    "partition_date_start": 1325289600000,
    "partition_type": "APPEND",
    "partition_condition_builder": "org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
  }
}

Best regards,
Zhong

From: Li Yang [mailto:liyang@apache.org]
Sent: Thursday, January 07, 2016 3:03 AM
To: Li Yang <li...@apache.org>
Cc: user@kylin.apache.org; Jones, James <jj...@cardlytics.com>
Subject: Re: missing a field in the SQL when creating cube

I mean need the two json files "model json and cube json" to further troubleshoot.

On Thu, Jan 7, 2016 at 4:02 PM, Li Yang <li...@apache.org>> wrote:
Em... derived dimensions are not in extraction hive SQL, but hierarchy dimensions should appear.

If you could both the model json and cube json files, we could try to reproduce and debug.

On Wed, Jan 6, 2016 at 8:16 AM, Luke Han <lu...@gmail.com>> wrote:
it is by design, hierarchy/derived dimension will not be included in generated SQL which read data from Hive, which associate with PK/FK.

Thanks.




Best Regards!
---------------------

Luke Han

On Wed, Jan 6, 2016 at 3:07 AM, Zhang, Zhong <zz...@cardlytics.com>> wrote:
Hi All,

I’ve encountered a kind of weird problem. I just normally created a cube step by step.
Based on my understanding, after filling all the information in cube info, data model,
dimensions, and etc, a SQL will be generated in the SQL column. The following is the
generated SQL:

SELECT
...
SPENDTRIPINC.POSTALCD
,FLATGEO2.TOWN
,FLATGEO2.DMA_CODE
,FLATGEO2.PROVINCECD
,FLATGEO2.REGIONNAME
,FLATGEO2.MSA
...
FROM FACT.SPENDTRIPINC as SPENDTRIPINC
INNER JOIN FACT.CALENDARDATES as CALENDARDATES
ON SPENDTRIPINC.DIMDATEKEY = CALENDARDATES.CALENDARDAYID
INNER JOIN FACT.FLATGEO2 as FLATGEO2
ON SPENDTRIPINC.POSTALCD = FLATGEO2.POSTALCD

I do put fields "POSTALCD","TOWN","DMA_CODE","PROVINCECD","REGIONNAME" and "MSA"
from table FLATGEO2 as a hierarchy dimension. There should be a filed POSTALCD
in the SQL. But unfortunately it is not there just as you see.

The weird thing is that if checking at JSON(cube) column, POSTALCD is there:

{

      "id": 12,

      "name": "FACTGEO",

      "table": "FACT.FLATGEO2",

      "column": [

        "POSTALCD",

        "TOWN",

        "DMA_CODE",

        "PROVINCECD",

        "REGIONNAME",

        "MSA"

      ],

      "derived": null,

      "hierarchy": true

    }

Best regards,
Zhong





Re: missing a field in the SQL when creating cube

Posted by Li Yang <li...@apache.org>.
I mean need the two json files "model json and cube json" to further
troubleshoot.

On Thu, Jan 7, 2016 at 4:02 PM, Li Yang <li...@apache.org> wrote:

> Em... derived dimensions are not in extraction hive SQL, but hierarchy
> dimensions should appear.
>
> If you could both the model json and cube json files, we could try to
> reproduce and debug.
>
> On Wed, Jan 6, 2016 at 8:16 AM, Luke Han <lu...@gmail.com> wrote:
>
>> it is by design, hierarchy/derived dimension will not be included in
>> generated SQL which read data from Hive, which associate with PK/FK.
>>
>> Thanks.
>>
>>
>>
>>
>> Best Regards!
>> ---------------------
>>
>> Luke Han
>>
>> On Wed, Jan 6, 2016 at 3:07 AM, Zhang, Zhong <zz...@cardlytics.com>
>> wrote:
>>
>>> Hi All,
>>>
>>>
>>>
>>> I’ve encountered a kind of weird problem. I just normally created a cube
>>> step by step.
>>>
>>> Based on my understanding, after filling all the information in cube
>>> info, data model,
>>>
>>> dimensions, and etc, a SQL will be generated in the SQL column. The
>>> following is the
>>>
>>> generated SQL:
>>>
>>>
>>>
>>> SELECT
>>>
>>> ...
>>>
>>> SPENDTRIPINC.POSTALCD
>>>
>>> ,FLATGEO2.TOWN
>>>
>>> ,FLATGEO2.DMA_CODE
>>>
>>> ,FLATGEO2.PROVINCECD
>>>
>>> ,FLATGEO2.REGIONNAME
>>>
>>> ,FLATGEO2.MSA
>>>
>>> ...
>>>
>>> FROM FACT.SPENDTRIPINC as SPENDTRIPINC
>>>
>>> INNER JOIN FACT.CALENDARDATES as CALENDARDATES
>>>
>>> ON SPENDTRIPINC.DIMDATEKEY = CALENDARDATES.CALENDARDAYID
>>>
>>> INNER JOIN FACT.FLATGEO2 as FLATGEO2
>>>
>>> ON SPENDTRIPINC.POSTALCD = FLATGEO2.POSTALCD
>>>
>>>
>>>
>>> I do put fields "POSTALCD","TOWN","DMA_CODE","PROVINCECD","REGIONNAME"
>>> and "MSA"
>>>
>>> from table FLATGEO2 as a hierarchy dimension. There should be a filed
>>> POSTALCD
>>>
>>> in the SQL. But unfortunately it is not there just as you see.
>>>
>>>
>>>
>>> The weird thing is that if checking at JSON(cube) column, POSTALCD is
>>> there:
>>>
>>> {
>>>
>>>       "id": 12,
>>>
>>>       "name": "FACTGEO",
>>>
>>>       "table": "FACT.FLATGEO2",
>>>
>>>       "column": [
>>>
>>>         "POSTALCD",
>>>
>>>         "TOWN",
>>>
>>>         "DMA_CODE",
>>>
>>>         "PROVINCECD",
>>>
>>>         "REGIONNAME",
>>>
>>>         "MSA"
>>>
>>>       ],
>>>
>>>       "derived": null,
>>>
>>>       "hierarchy": true
>>>
>>>     }
>>>
>>>
>>>
>>> Best regards,
>>>
>>> Zhong
>>>
>>>
>>>
>>
>>
>

Re: missing a field in the SQL when creating cube

Posted by Li Yang <li...@apache.org>.
Em... derived dimensions are not in extraction hive SQL, but hierarchy
dimensions should appear.

If you could both the model json and cube json files, we could try to
reproduce and debug.

On Wed, Jan 6, 2016 at 8:16 AM, Luke Han <lu...@gmail.com> wrote:

> it is by design, hierarchy/derived dimension will not be included in
> generated SQL which read data from Hive, which associate with PK/FK.
>
> Thanks.
>
>
>
>
> Best Regards!
> ---------------------
>
> Luke Han
>
> On Wed, Jan 6, 2016 at 3:07 AM, Zhang, Zhong <zz...@cardlytics.com>
> wrote:
>
>> Hi All,
>>
>>
>>
>> I’ve encountered a kind of weird problem. I just normally created a cube
>> step by step.
>>
>> Based on my understanding, after filling all the information in cube
>> info, data model,
>>
>> dimensions, and etc, a SQL will be generated in the SQL column. The
>> following is the
>>
>> generated SQL:
>>
>>
>>
>> SELECT
>>
>> ...
>>
>> SPENDTRIPINC.POSTALCD
>>
>> ,FLATGEO2.TOWN
>>
>> ,FLATGEO2.DMA_CODE
>>
>> ,FLATGEO2.PROVINCECD
>>
>> ,FLATGEO2.REGIONNAME
>>
>> ,FLATGEO2.MSA
>>
>> ...
>>
>> FROM FACT.SPENDTRIPINC as SPENDTRIPINC
>>
>> INNER JOIN FACT.CALENDARDATES as CALENDARDATES
>>
>> ON SPENDTRIPINC.DIMDATEKEY = CALENDARDATES.CALENDARDAYID
>>
>> INNER JOIN FACT.FLATGEO2 as FLATGEO2
>>
>> ON SPENDTRIPINC.POSTALCD = FLATGEO2.POSTALCD
>>
>>
>>
>> I do put fields "POSTALCD","TOWN","DMA_CODE","PROVINCECD","REGIONNAME"
>> and "MSA"
>>
>> from table FLATGEO2 as a hierarchy dimension. There should be a filed
>> POSTALCD
>>
>> in the SQL. But unfortunately it is not there just as you see.
>>
>>
>>
>> The weird thing is that if checking at JSON(cube) column, POSTALCD is
>> there:
>>
>> {
>>
>>       "id": 12,
>>
>>       "name": "FACTGEO",
>>
>>       "table": "FACT.FLATGEO2",
>>
>>       "column": [
>>
>>         "POSTALCD",
>>
>>         "TOWN",
>>
>>         "DMA_CODE",
>>
>>         "PROVINCECD",
>>
>>         "REGIONNAME",
>>
>>         "MSA"
>>
>>       ],
>>
>>       "derived": null,
>>
>>       "hierarchy": true
>>
>>     }
>>
>>
>>
>> Best regards,
>>
>> Zhong
>>
>>
>>
>
>

Re: missing a field in the SQL when creating cube

Posted by Luke Han <lu...@gmail.com>.
it is by design, hierarchy/derived dimension will not be included in
generated SQL which read data from Hive, which associate with PK/FK.

Thanks.




Best Regards!
---------------------

Luke Han

On Wed, Jan 6, 2016 at 3:07 AM, Zhang, Zhong <zz...@cardlytics.com> wrote:

> Hi All,
>
>
>
> I’ve encountered a kind of weird problem. I just normally created a cube
> step by step.
>
> Based on my understanding, after filling all the information in cube info,
> data model,
>
> dimensions, and etc, a SQL will be generated in the SQL column. The
> following is the
>
> generated SQL:
>
>
>
> SELECT
>
> ...
>
> SPENDTRIPINC.POSTALCD
>
> ,FLATGEO2.TOWN
>
> ,FLATGEO2.DMA_CODE
>
> ,FLATGEO2.PROVINCECD
>
> ,FLATGEO2.REGIONNAME
>
> ,FLATGEO2.MSA
>
> ...
>
> FROM FACT.SPENDTRIPINC as SPENDTRIPINC
>
> INNER JOIN FACT.CALENDARDATES as CALENDARDATES
>
> ON SPENDTRIPINC.DIMDATEKEY = CALENDARDATES.CALENDARDAYID
>
> INNER JOIN FACT.FLATGEO2 as FLATGEO2
>
> ON SPENDTRIPINC.POSTALCD = FLATGEO2.POSTALCD
>
>
>
> I do put fields "POSTALCD","TOWN","DMA_CODE","PROVINCECD","REGIONNAME"
> and "MSA"
>
> from table FLATGEO2 as a hierarchy dimension. There should be a filed
> POSTALCD
>
> in the SQL. But unfortunately it is not there just as you see.
>
>
>
> The weird thing is that if checking at JSON(cube) column, POSTALCD is
> there:
>
> {
>
>       "id": 12,
>
>       "name": "FACTGEO",
>
>       "table": "FACT.FLATGEO2",
>
>       "column": [
>
>         "POSTALCD",
>
>         "TOWN",
>
>         "DMA_CODE",
>
>         "PROVINCECD",
>
>         "REGIONNAME",
>
>         "MSA"
>
>       ],
>
>       "derived": null,
>
>       "hierarchy": true
>
>     }
>
>
>
> Best regards,
>
> Zhong
>
>
>