You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@ofbiz.apache.org by "Pierre Smits (Jira)" <ji...@apache.org> on 2020/03/17 08:48:00 UTC

[jira] [Comment Edited] (OFBIZ-10953) have CurrencyDimension have a dimensionId that is based on the natural key

    [ https://issues.apache.org/jira/browse/OFBIZ-10953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032320#comment-17032320 ] 

Pierre Smits edited comment on OFBIZ-10953 at 3/17/20, 8:47 AM:
----------------------------------------------------------------

The reasoning we applied to having a different primary key for the Uom table in the transaction db (an exception to the primary key definition for majority of the tables) applies also to the DWH.

For those not having access to the Data Warehouse Toolkit book (which is the starting point for the project's approach to Bi through the same named component) the first line states:

??Surrogate keys are used to implement the primary keys of *almost all* dimension tables.??

So, which dimension tables can the project consider to be the exceptions?
 These are the generic dimension tables that are uniform in meaning across businesses, e.g. those data sets defined by standardisation bodies. 
 Measurement dimension tables, like date, time, but also country, currency and all those other groups of records which are relating to records in the Uom table in the transaction db are such generic tables. These tables don't need a 'self defined' primary keys added, because the natural key is self-explanatory.

That is why Kimball in his book suggests for the Date dimension table to use the ISO 8601 (specifically YYMMDD) for the date as the definition for the primary key. And the same is suggested for the Time dimension table (HHMM in that case).

The validation for these exceptions are:
 # the natural key is self-explanatory,
 # data abstraction negatively impacts query performance and legibility.

Compare following examples:
{code:java}
<SalesInvoiceItemFact billToCustomerDimId="_NA_" contributionAmount="2250.000" createdStamp="2019-05-29 10:05:01.0" createdTxStamp="2019-05-29 10:03:24.0" customerDimId="12001" extDiscountAmount="0.000" extDistributionCost="0.000" extGrossAmount="2250.000" extManFixedCost="0.000" extManVarCost="0.000" extNetAmount="2250.000" extStorageCost="0.000" extTaxAmount="0.000" *invoiceDateDimId="20151008"* invoiceId="95" invoiceItemSeqId="00001" lastUpdatedStamp="2019-05-29 10:05:01.0" lastUpdatedTxStamp="2019-05-29 10:03:24.0" orderId="_NA_" organisationDimId="10010" *origCurrencyDimId="EUR"* productDimId="10087" quantity="1.000000" quantityUomDimId="10"/>
{code}
vs
{code:java}
<SalesInvoiceItemFact billToCustomerDimId="_NA_" contributionAmount="2250.000" createdStamp="2019-05-29 10:05:01.0" createdTxStamp="2019-05-29 10:03:24.0" customerDimId="12001_" extDiscountAmount="0.000" extDistributionCost="0.000" extGrossAmount="2250.000" extManFixedCost="0.000" extManVarCost="0.000" extNetAmount="2250.000" extStorageCost="0.000" extTaxAmount="0.000" *invoiceDateDimId="10000"* invoiceId="95" invoiceItemSeqId="00001" lastUpdatedStamp="2019-05-29 10:05:01.0" lastUpdatedTxStamp="2019-05-29 10:03:24.0" orderId="_NA_" organisationDimId="10010" *origCurrencyDimId="10000"* productDimId="10087" quantity="1.000000" quantityUomDimId="LEN_m"/>
{code}
The latter means that in a production infrastructure the using company is penalised (performance and cost-wise) with an additional query (for the *origCurrencyDimId*) to the currency dimension to retrieve the underlying explanation/meaning (EUR). And similarly for the *quantityUomDimId* and other generically defined measurements (e.g. in the examples the  *invoiceDateDimId*).

This may not mean much in DHW implementations with a low number of records in fact tables. But when there a multitude of such fact tables with each having millions/billions of records in those tables using such keys of generic tables (these measurement tables), then it adds up significantly. 


was (Author: pfm.smits):
The reasoning we applied to having a different primary key for the Uom table in the transaction db (an exception to the primary key definition for majority of the tables) applies also to the DWH.

For those not having access to the Data Warehouse Toolkit book (which is the starting point for the project's approach to Bi through the same named component) the first line states:

??Surrogate keys are used to implement the primary keys of *almost all* dimension tables.??

So, which dimension tables can the project consider to be the exceptions?
 These are the generic dimension tables that are uniform in meaning across businesses, e.g. those data sets defined by standardisation bodies. 
 Measurement dimension tables, like date, time, but also country, currency and all those other groups of records which are relating to records in the Uom table in the transaction db are such generic tables. These tables don't need a 'self defined' primary keys added, because the natural key is self-explanatory.

That is why Kimball in his book suggests for the Date dimension table to use the ISO 8601 (specifically YYMMDD) for the date as the definition for the primary key. And the same is suggested for the Time dimension table (HHMM in that case).

The validation for these exceptions are:
 # the natural key is self-explanatory,
 # data abstraction negatively impacts query performance and legibility.

Compare following examples:
{code:java}
<SalesInvoiceItemFact billToCustomerDimId="_NA_" contributionAmount="2250.000" createdStamp="2019-05-29 10:05:01.0" createdTxStamp="2019-05-29 10:03:24.0" customerDimId="12001" extDiscountAmount="0.000" extDistributionCost="0.000" extGrossAmount="2250.000" extManFixedCost="0.000" extManVarCost="0.000" extNetAmount="2250.000" extStorageCost="0.000" extTaxAmount="0.000" invoiceDateDimId="20151008" invoiceId="95" invoiceItemSeqId="00001" lastUpdatedStamp="2019-05-29 10:05:01.0" lastUpdatedTxStamp="2019-05-29 10:03:24.0" orderId="_NA_" organisationDimId="10010" origCurrencyDimId="EUR" productDimId="10087" quantity="1.000000" quantityUomDimId="10"/>
{code}
vs
{code:java}
<SalesInvoiceItemFact billToCustomerDimId="_NA_" contributionAmount="2250.000" createdStamp="2019-05-29 10:05:01.0" createdTxStamp="2019-05-29 10:03:24.0" customerDimId="12001_" extDiscountAmount="0.000" extDistributionCost="0.000" extGrossAmount="2250.000" extManFixedCost="0.000" extManVarCost="0.000" extNetAmount="2250.000" extStorageCost="0.000" extTaxAmount="0.000" invoiceDateDimId="20151008" invoiceId="95" invoiceItemSeqId="00001" lastUpdatedStamp="2019-05-29 10:05:01.0" lastUpdatedTxStamp="2019-05-29 10:03:24.0" orderId="_NA_" organisationDimId="10010" origCurrencyDimId="10000" productDimId="10087" quantity="1.000000" quantityUomDimId="LEN_m"/>
{code}
The latter means that in a production infrastructure the using company is penalised (performance and cost-wise) with an additional query (for the *origCurrencyDimId*) to the currency dimension to retrieve the underlying explanation/meaning (EUR). And similarly for the *quantityUomDimId* and other generically defined measurements (e.g. in the examples the  *invoiceDateDimId*).

This may not mean much in DHW implementations with a low number of records in fact tables. But when there a multitude of such fact tables with each having millions/billions of records in those tables using such keys of generic tables (these measurement tables), then it adds up significantly. 

> have CurrencyDimension have a dimensionId that is based on the natural key
> --------------------------------------------------------------------------
>
>                 Key: OFBIZ-10953
>                 URL: https://issues.apache.org/jira/browse/OFBIZ-10953
>             Project: OFBiz
>          Issue Type: Sub-task
>          Components: bi
>    Affects Versions: Release Branch 18.12, Release Branch 17.12, Trunk
>            Reporter: Pierre Smits
>            Assignee: Pierre Smits
>            Priority: Major
>              Labels: CurrencyDimension, birt, currency, dimension, dwh
>         Attachments: 20200208_094840.jpg
>
>
> Currently the record sequencer (delegator.getNextSeqId) is used to determine the dimensionId for the CurrencyDimension. This is unnecessary as the uomId from the UOM table can be used for currency.
> It also makes it easier to set the foreign-key in fact tables by generating it based on the date provided, than by retrieving the dimensionId based on a retrieval through the getDimensionIdFromNaturalKey service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)