You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@lens.apache.org by Tao Yan <ty...@linkedin.com> on 2016/08/05 20:37:14 UTC

Lens Query Conversion

Hi Lens Developers,

I am running the example queries, and found the following Lens query:

query execute cube select customer_city_name, store_cost from sales where
time_range_in(order_time, '2015-04-13-03', '2015-04-13-04')

Was converted to the following hive Query:

SELECT ( customer_city . name ), sum(( sales . store_cost )) FROM
local_sales_aggr_fact2 sales join local_city_table customer_city on
sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest')
WHERE ((((*(( sales . dt ) =  '2015-04-13-04' ) or (( sales . dt ) =
 '2015-04-13-05' ))* and ((( *sales . order_time* ) >=  '2015-04-13
03:00:00' ) and (( *sales . order_time* ) <  '2015-04-13 04:00:00' )))))
GROUP BY ( customer_city . name )

The property shows (! grep -A 33333333 <x_
examples/resources/sales-cube.xml):
  <properties>
    <property name="cube.sales.timed.dimensions.list"
value="order_time,delivery_time" />
    *<property name="cube.timedim.partition.order_time" value="ot" />*
    <property name="cube.timedim.partition.delivery_time" value="dt" />
    *<property name="cube.timedim.relation.order_time"
value="delivery_time+[-20 days,-1 hour]" />*
    <!-- means dt-20days <= ot <= dt-1hour -->
  </properties>


Which means that "order time can be between delivery time -20 days to
delivery time - 1 hour" according to the 20 minutes demo.

However, the Hive query's 'where' condition use deliver_time equal '
*2015-04-13-04*' or '*2015-04-13-05*', and how does it find the column
'sales.order_time'? Shouldn't it match order_time with partition 'ot'?

Thanks,
-- 

*Tao Yan*
Software Engineer
Data Analytics Infrastructure Tools and Services



206.250.5345
tyan@linkedin.com
https://www.linkedin.com/in/taousc

Re: Lens Query Conversion

Posted by Dayou Jiang <fe...@gmail.com>.
Logged a bug here: https://issues.apache.org/jira/browse/LENS-1276

Thanks,
Dayou Jiang

On Wed, Aug 10, 2016 at 9:05 PM, amareshwarisr . <am...@gmail.com>
wrote:

> Dayou Jiang,
>
> Seems a bug. Please file a jira at https://issues.apache.org/
> jira/browse/LENS.
>
> Thanks
>
> On Thu, Aug 11, 2016 at 12:34 AM, Dayou Jiang <fe...@gmail.com> wrote:
>
>> Hi,
>>
>> I would like to add something in this partition discussion:
>>
>> I found out that Lens doesn't allow me to use the "update fact" command
>> to change the partition update periods. For example, i have the "<
>> update_period>DAILY</update_period>" tag in one of my fact table
>> definition and i now decide to have the
>> MONTHLY partition too, but i couldn't update my fact table to accept
>> that. Even after i dropped the entire fact table
>> and tried to create it again with both "DAILY" and "MONTHLY" as the
>> update periods i still couldn't add the MONTHLY partition.
>> I think some of the meta data still got hanging around and not being
>> deleted after i dropped the table.
>> Is this a bug at Lens?
>>
>>
>> Thanks,
>> Dayou Jiang
>>
>> On Wed, Aug 10, 2016 at 11:36 AM, Tao Yan <ty...@linkedin.com> wrote:
>>
>>> Thanks for answering the question! It is helpful.
>>>
>>> On Wed, Aug 10, 2016 at 1:15 AM, amareshwarisr . <am...@gmail.com>
>>> wrote:
>>>
>>>> We have a mapping time dimension columns to its partition columns,
>>>> because system's like HIVE would require partition column to be separate
>>>> column than table columns.
>>>>
>>>> So, if tables already contain order_time as a column, and the data is
>>>> partitioned by order_time - then partition column name has to be a separate
>>>> column - which is defined through the mapping.
>>>>
>>>> Thanks
>>>>
>>>> On Wed, Aug 10, 2016 at 1:40 PM, Rajat Khandelwal <pr...@apache.org>
>>>> wrote:
>>>>
>>>>> "value" is required in a "property" tag. A property tag is name and
>>>>> value. This special property defines that whenever queried on order_time
>>>>> time dimension, try to answer with facts that have "ot" as
>>>>> partition column. If not, it falls back to delivery_time time dimension.
>>>>> There again, the mapping is checked, which is "dt", so facts that have "dt"
>>>>> as partition column are preferred. This can happen in a chain until a valid
>>>>> combination is found which can answer the query.
>>>>>
>>>>>
>>>>> On Wed, Aug 10, 2016 at 12:13 AM Tao Yan <ty...@linkedin.com> wrote:
>>>>>
>>>>>> So, the field 'value' can be used to define both alias and real
>>>>>> values?
>>>>>>
>>>>>> On Tue, Aug 9, 2016 at 11:40 AM, Rajat Khandelwal <pr...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Yes, that is correct.
>>>>>>>
>>>>>>> On Wed, Aug 10, 2016, 00:09 Tao Yan <ty...@linkedin.com> wrote:
>>>>>>>
>>>>>>>> Thanks for the explanation. I got the idea of tentative partitions.
>>>>>>>>
>>>>>>>> Is the property
>>>>>>>> *<property name="cube.timedim.partition.order_time" value="ot" />*
>>>>>>>>  referring to the dim_attribute
>>>>>>>> *<dim_attribute name="order_time" _type="TIMESTAMP" /> *?
>>>>>>>>
>>>>>>>> Is the value 'ot' an alias of the partition column name?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Aug 9, 2016 at 12:47 AM, Rajat Khandelwal <
>>>>>>>> prongs@apache.org> wrote:
>>>>>>>>
>>>>>>>>> The idea of partitions is that there is a stream of data that
>>>>>>>>> keeps flowing in, each line of which has delivery time and order time both.
>>>>>>>>> The data is partitioned and stored in partitions later and facts are
>>>>>>>>> created on top of that partitioned data. So if data is partitioned in order
>>>>>>>>> time and the time range queried is also order time, then we can pick the
>>>>>>>>> partitions directly. But if the data is partitioned on delivery time, as is
>>>>>>>>> the case for this fact, and the query is on order time, then you pick some
>>>>>>>>> tentative partitions of delivery time, but you still have to filter over
>>>>>>>>> order time. Here the purpose of picking partitions is to reduce the amount
>>>>>>>>> of data read. When you pick delivery time partitions, it doesn't guarantee
>>>>>>>>> what order times data has, it only provides a tentative range. Hence, the
>>>>>>>>> partitions are picked based on the relation of time dimensions, but the
>>>>>>>>> extra filter still has to be supplied.
>>>>>>>>>
>>>>>>>>> Hope it's clear.
>>>>>>>>>
>>>>>>>>> On Tue, Aug 9, 2016 at 10:30 AM Rajat Khandelwal <
>>>>>>>>> prongs@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> That's the original time range asked in the cube query.
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 9, 2016, 01:46 Tao Yan <ty...@linkedin.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Rajat,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the explain. I have a clear understanding about dim
>>>>>>>>>>> relation now, however, how does the following condition generated:
>>>>>>>>>>>
>>>>>>>>>>> and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00')))
>>>>>>>>>>>
>>>>>>>>>>> order_time is a dim attributes defined in sales cube, and it does not have any relationships with properties or other dim attributes (The property cube.timedim.partition.order_time does, but is it the same thing?), I wonder how the query could pick up this column.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 8, 2016 at 4:12 AM, Rajat Khandelwal <
>>>>>>>>>>> prongs@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I've created a jira for this: https://issues.apache.or
>>>>>>>>>>>> g/jira/browse/LENS-1269
>>>>>>>>>>>>
>>>>>>>>>>>> You should be able to make change in your data locally and see
>>>>>>>>>>>> the non-empty results:
>>>>>>>>>>>>
>>>>>>>>>>>> cat /Users/rajat.khandelwal/Git/le
>>>>>>>>>>>> ns/lens-dist/target/apache-lens-2.7.0-SNAPSHOT-bin/apache-le
>>>>>>>>>>>> ns-2.7.0-SNAPSHOT-bin/client/examples/data/sales_aggr_fact2_
>>>>>>>>>>>> local3/sales-aggr-fact2-local3.data
>>>>>>>>>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,1,1,1,1,1,1,5,0
>>>>>>>>>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,2,1,2,2,2,1,8,2
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Aug 8, 2016 at 2:46 PM Rajat Khandelwal <
>>>>>>>>>>>> prongs@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> More details on how it converts to the fallback range:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Range asked:
>>>>>>>>>>>>> (order_time, '2015-04-13-03', '2015-04-13-04')
>>>>>>>>>>>>>
>>>>>>>>>>>>> Timedim Relation :
>>>>>>>>>>>>>
>>>>>>>>>>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-20 days,-1 hour]" />
>>>>>>>>>>>>>
>>>>>>>>>>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-2 hours,-1hour]" />
>>>>>>>>>>>>>
>>>>>>>>>>>>> The cube defines the first relation, the fact defines the second relation. A fact can optionally override the timedim relation defined by cube. So the second one is picked :
>>>>>>>>>>>>>
>>>>>>>>>>>>> Derivation:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-04-13-03 <=ot < 2015-04-13-04
>>>>>>>>>>>>> dt - 2 hours <= ot <= dt - 1 hour => ot + 1 hour <= dt <= ot +
>>>>>>>>>>>>> 2 hour
>>>>>>>>>>>>> => 2015-04-13-04 <= dt < 2015-04-13-06.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hope it's clearer now.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Aug 8, 2016 at 1:29 PM Rajat Khandelwal <
>>>>>>>>>>>>> prongs@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So it seems right now the query is designed to return 0 rows.
>>>>>>>>>>>>>> The query is there just to demonstrate the fallback feature.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cube query is
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> cube select customer_city_name, store_cost from sales where time_range_in(order_time,
>>>>>>>>>>>>>> '2015-04-13-03', '2015-04-13-04')
>>>>>>>>>>>>>> Driver query comes out to be
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> SELECT (customer_city.name), sum((sales.store_cost)) FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05')) and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00'))) GROUP BY (customer_city.name)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then I modified the driver query and ran:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> SELECT (customer_city.name), sales.order_time, sales.store_cost FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05'))))
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Found the following results:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Bangalore^A2015-04-13 00:00:00^A0.0
>>>>>>>>>>>>>> Hyderabad^A2015-04-13 00:00:00^A2.0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It's apparent that both the rows will be filtered out by the
>>>>>>>>>>>>>> translated query.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'll look deeper and replace the example query with something
>>>>>>>>>>>>>> that demonstrates the fallback feature and also returns non-empty results.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Let us know if you have any confusion understanding the
>>>>>>>>>>>>>> feature itself. I think the video should have covered it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> *Tao Yan*
>>>>>>>>>>> Software Engineer
>>>>>>>>>>> Data Analytics Infrastructure Tools and Services
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 206.250.5345
>>>>>>>>>>> tyan@linkedin.com
>>>>>>>>>>> https://www.linkedin.com/in/taousc
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> *Tao Yan*
>>>>>>>> Software Engineer
>>>>>>>> Data Analytics Infrastructure Tools and Services
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 206.250.5345
>>>>>>>> tyan@linkedin.com
>>>>>>>> https://www.linkedin.com/in/taousc
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> *Tao Yan*
>>>>>> Software Engineer
>>>>>> Data Analytics Infrastructure Tools and Services
>>>>>>
>>>>>>
>>>>>>
>>>>>> 206.250.5345
>>>>>> tyan@linkedin.com
>>>>>> https://www.linkedin.com/in/taousc
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> *Tao Yan*
>>> Software Engineer
>>> Data Analytics Infrastructure Tools and Services
>>>
>>>
>>>
>>> 206.250.5345
>>> tyan@linkedin.com
>>> https://www.linkedin.com/in/taousc
>>>
>>
>>
>

Re: Lens Query Conversion

Posted by "amareshwarisr ." <am...@gmail.com>.
Dayou Jiang,

Seems a bug. Please file a jira at
https://issues.apache.org/jira/browse/LENS.

Thanks

On Thu, Aug 11, 2016 at 12:34 AM, Dayou Jiang <fe...@gmail.com> wrote:

> Hi,
>
> I would like to add something in this partition discussion:
>
> I found out that Lens doesn't allow me to use the "update fact" command to
> change the partition update periods. For example, i have the "<
> update_period>DAILY</update_period>" tag in one of my fact table
> definition and i now decide to have the
> MONTHLY partition too, but i couldn't update my fact table to accept that.
> Even after i dropped the entire fact table
> and tried to create it again with both "DAILY" and "MONTHLY" as the update
> periods i still couldn't add the MONTHLY partition.
> I think some of the meta data still got hanging around and not being
> deleted after i dropped the table.
> Is this a bug at Lens?
>
>
> Thanks,
> Dayou Jiang
>
> On Wed, Aug 10, 2016 at 11:36 AM, Tao Yan <ty...@linkedin.com> wrote:
>
>> Thanks for answering the question! It is helpful.
>>
>> On Wed, Aug 10, 2016 at 1:15 AM, amareshwarisr . <am...@gmail.com>
>> wrote:
>>
>>> We have a mapping time dimension columns to its partition columns,
>>> because system's like HIVE would require partition column to be separate
>>> column than table columns.
>>>
>>> So, if tables already contain order_time as a column, and the data is
>>> partitioned by order_time - then partition column name has to be a separate
>>> column - which is defined through the mapping.
>>>
>>> Thanks
>>>
>>> On Wed, Aug 10, 2016 at 1:40 PM, Rajat Khandelwal <pr...@apache.org>
>>> wrote:
>>>
>>>> "value" is required in a "property" tag. A property tag is name and
>>>> value. This special property defines that whenever queried on order_time
>>>> time dimension, try to answer with facts that have "ot" as
>>>> partition column. If not, it falls back to delivery_time time dimension.
>>>> There again, the mapping is checked, which is "dt", so facts that have "dt"
>>>> as partition column are preferred. This can happen in a chain until a valid
>>>> combination is found which can answer the query.
>>>>
>>>>
>>>> On Wed, Aug 10, 2016 at 12:13 AM Tao Yan <ty...@linkedin.com> wrote:
>>>>
>>>>> So, the field 'value' can be used to define both alias and real values?
>>>>>
>>>>> On Tue, Aug 9, 2016 at 11:40 AM, Rajat Khandelwal <pr...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Yes, that is correct.
>>>>>>
>>>>>> On Wed, Aug 10, 2016, 00:09 Tao Yan <ty...@linkedin.com> wrote:
>>>>>>
>>>>>>> Thanks for the explanation. I got the idea of tentative partitions.
>>>>>>>
>>>>>>> Is the property
>>>>>>> *<property name="cube.timedim.partition.order_time" value="ot" />*
>>>>>>>  referring to the dim_attribute
>>>>>>> *<dim_attribute name="order_time" _type="TIMESTAMP" /> *?
>>>>>>>
>>>>>>> Is the value 'ot' an alias of the partition column name?
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 9, 2016 at 12:47 AM, Rajat Khandelwal <prongs@apache.org
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> The idea of partitions is that there is a stream of data that keeps
>>>>>>>> flowing in, each line of which has delivery time and order time both. The
>>>>>>>> data is partitioned and stored in partitions later and facts are created on
>>>>>>>> top of that partitioned data. So if data is partitioned in order time and
>>>>>>>> the time range queried is also order time, then we can pick the partitions
>>>>>>>> directly. But if the data is partitioned on delivery time, as is the case
>>>>>>>> for this fact, and the query is on order time, then you pick some tentative
>>>>>>>> partitions of delivery time, but you still have to filter over order time.
>>>>>>>> Here the purpose of picking partitions is to reduce the amount of data
>>>>>>>> read. When you pick delivery time partitions, it doesn't guarantee what
>>>>>>>> order times data has, it only provides a tentative range. Hence, the
>>>>>>>> partitions are picked based on the relation of time dimensions, but the
>>>>>>>> extra filter still has to be supplied.
>>>>>>>>
>>>>>>>> Hope it's clear.
>>>>>>>>
>>>>>>>> On Tue, Aug 9, 2016 at 10:30 AM Rajat Khandelwal <pr...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> That's the original time range asked in the cube query.
>>>>>>>>>
>>>>>>>>> On Tue, Aug 9, 2016, 01:46 Tao Yan <ty...@linkedin.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Rajat,
>>>>>>>>>>
>>>>>>>>>> Thanks for the explain. I have a clear understanding about dim
>>>>>>>>>> relation now, however, how does the following condition generated:
>>>>>>>>>>
>>>>>>>>>> and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00')))
>>>>>>>>>>
>>>>>>>>>> order_time is a dim attributes defined in sales cube, and it does not have any relationships with properties or other dim attributes (The property cube.timedim.partition.order_time does, but is it the same thing?), I wonder how the query could pick up this column.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 8, 2016 at 4:12 AM, Rajat Khandelwal <
>>>>>>>>>> prongs@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> I've created a jira for this: https://issues.apache.or
>>>>>>>>>>> g/jira/browse/LENS-1269
>>>>>>>>>>>
>>>>>>>>>>> You should be able to make change in your data locally and see
>>>>>>>>>>> the non-empty results:
>>>>>>>>>>>
>>>>>>>>>>> cat /Users/rajat.khandelwal/Git/lens/lens-dist/target/apache-len
>>>>>>>>>>> s-2.7.0-SNAPSHOT-bin/apache-lens-2.7.0-SNAPSHOT-bin/client/e
>>>>>>>>>>> xamples/data/sales_aggr_fact2_local3/sales-aggr-fact2-local3
>>>>>>>>>>> .data
>>>>>>>>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,1,1,1,1,1,1,5,0
>>>>>>>>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,2,1,2,2,2,1,8,2
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 8, 2016 at 2:46 PM Rajat Khandelwal <
>>>>>>>>>>> prongs@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> More details on how it converts to the fallback range:
>>>>>>>>>>>>
>>>>>>>>>>>> Range asked:
>>>>>>>>>>>> (order_time, '2015-04-13-03', '2015-04-13-04')
>>>>>>>>>>>>
>>>>>>>>>>>> Timedim Relation :
>>>>>>>>>>>>
>>>>>>>>>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-20 days,-1 hour]" />
>>>>>>>>>>>>
>>>>>>>>>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-2 hours,-1hour]" />
>>>>>>>>>>>>
>>>>>>>>>>>> The cube defines the first relation, the fact defines the second relation. A fact can optionally override the timedim relation defined by cube. So the second one is picked :
>>>>>>>>>>>>
>>>>>>>>>>>> Derivation:
>>>>>>>>>>>>
>>>>>>>>>>>> 2015-04-13-03 <=ot < 2015-04-13-04
>>>>>>>>>>>> dt - 2 hours <= ot <= dt - 1 hour => ot + 1 hour <= dt <= ot +
>>>>>>>>>>>> 2 hour
>>>>>>>>>>>> => 2015-04-13-04 <= dt < 2015-04-13-06.
>>>>>>>>>>>>
>>>>>>>>>>>> Hope it's clearer now.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Aug 8, 2016 at 1:29 PM Rajat Khandelwal <
>>>>>>>>>>>> prongs@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi
>>>>>>>>>>>>>
>>>>>>>>>>>>> So it seems right now the query is designed to return 0 rows.
>>>>>>>>>>>>> The query is there just to demonstrate the fallback feature.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cube query is
>>>>>>>>>>>>>
>>>>>>>>>>>>> cube select customer_city_name, store_cost from sales where time_range_in(order_time,
>>>>>>>>>>>>> '2015-04-13-03', '2015-04-13-04')
>>>>>>>>>>>>> Driver query comes out to be
>>>>>>>>>>>>>
>>>>>>>>>>>>> SELECT (customer_city.name), sum((sales.store_cost)) FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05')) and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00'))) GROUP BY (customer_city.name)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then I modified the driver query and ran:
>>>>>>>>>>>>>
>>>>>>>>>>>>> SELECT (customer_city.name), sales.order_time, sales.store_cost FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05'))))
>>>>>>>>>>>>>
>>>>>>>>>>>>> Found the following results:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Bangalore^A2015-04-13 00:00:00^A0.0
>>>>>>>>>>>>> Hyderabad^A2015-04-13 00:00:00^A2.0
>>>>>>>>>>>>>
>>>>>>>>>>>>> It's apparent that both the rows will be filtered out by the
>>>>>>>>>>>>> translated query.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'll look deeper and replace the example query with something
>>>>>>>>>>>>> that demonstrates the fallback feature and also returns non-empty results.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let us know if you have any confusion understanding the
>>>>>>>>>>>>> feature itself. I think the video should have covered it.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> *Tao Yan*
>>>>>>>>>> Software Engineer
>>>>>>>>>> Data Analytics Infrastructure Tools and Services
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 206.250.5345
>>>>>>>>>> tyan@linkedin.com
>>>>>>>>>> https://www.linkedin.com/in/taousc
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> *Tao Yan*
>>>>>>> Software Engineer
>>>>>>> Data Analytics Infrastructure Tools and Services
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 206.250.5345
>>>>>>> tyan@linkedin.com
>>>>>>> https://www.linkedin.com/in/taousc
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> *Tao Yan*
>>>>> Software Engineer
>>>>> Data Analytics Infrastructure Tools and Services
>>>>>
>>>>>
>>>>>
>>>>> 206.250.5345
>>>>> tyan@linkedin.com
>>>>> https://www.linkedin.com/in/taousc
>>>>>
>>>>
>>>
>>
>>
>> --
>>
>> *Tao Yan*
>> Software Engineer
>> Data Analytics Infrastructure Tools and Services
>>
>>
>>
>> 206.250.5345
>> tyan@linkedin.com
>> https://www.linkedin.com/in/taousc
>>
>
>

Re: Lens Query Conversion

Posted by Dayou Jiang <fe...@gmail.com>.
Hi,

I would like to add something in this partition discussion:

I found out that Lens doesn't allow me to use the "update fact" command to
change the partition update periods. For example, i have the "<update_period
>DAILY</update_period>" tag in one of my fact table definition and i now
decide to have the
MONTHLY partition too, but i couldn't update my fact table to accept that.
Even after i dropped the entire fact table
and tried to create it again with both "DAILY" and "MONTHLY" as the update
periods i still couldn't add the MONTHLY partition.
I think some of the meta data still got hanging around and not being
deleted after i dropped the table.
Is this a bug at Lens?


Thanks,
Dayou Jiang

On Wed, Aug 10, 2016 at 11:36 AM, Tao Yan <ty...@linkedin.com> wrote:

> Thanks for answering the question! It is helpful.
>
> On Wed, Aug 10, 2016 at 1:15 AM, amareshwarisr . <am...@gmail.com>
> wrote:
>
>> We have a mapping time dimension columns to its partition columns,
>> because system's like HIVE would require partition column to be separate
>> column than table columns.
>>
>> So, if tables already contain order_time as a column, and the data is
>> partitioned by order_time - then partition column name has to be a separate
>> column - which is defined through the mapping.
>>
>> Thanks
>>
>> On Wed, Aug 10, 2016 at 1:40 PM, Rajat Khandelwal <pr...@apache.org>
>> wrote:
>>
>>> "value" is required in a "property" tag. A property tag is name and
>>> value. This special property defines that whenever queried on order_time
>>> time dimension, try to answer with facts that have "ot" as
>>> partition column. If not, it falls back to delivery_time time dimension.
>>> There again, the mapping is checked, which is "dt", so facts that have "dt"
>>> as partition column are preferred. This can happen in a chain until a valid
>>> combination is found which can answer the query.
>>>
>>>
>>> On Wed, Aug 10, 2016 at 12:13 AM Tao Yan <ty...@linkedin.com> wrote:
>>>
>>>> So, the field 'value' can be used to define both alias and real values?
>>>>
>>>> On Tue, Aug 9, 2016 at 11:40 AM, Rajat Khandelwal <pr...@apache.org>
>>>> wrote:
>>>>
>>>>> Yes, that is correct.
>>>>>
>>>>> On Wed, Aug 10, 2016, 00:09 Tao Yan <ty...@linkedin.com> wrote:
>>>>>
>>>>>> Thanks for the explanation. I got the idea of tentative partitions.
>>>>>>
>>>>>> Is the property
>>>>>> *<property name="cube.timedim.partition.order_time" value="ot" />*
>>>>>>  referring to the dim_attribute
>>>>>> *<dim_attribute name="order_time" _type="TIMESTAMP" /> *?
>>>>>>
>>>>>> Is the value 'ot' an alias of the partition column name?
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 9, 2016 at 12:47 AM, Rajat Khandelwal <pr...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> The idea of partitions is that there is a stream of data that keeps
>>>>>>> flowing in, each line of which has delivery time and order time both. The
>>>>>>> data is partitioned and stored in partitions later and facts are created on
>>>>>>> top of that partitioned data. So if data is partitioned in order time and
>>>>>>> the time range queried is also order time, then we can pick the partitions
>>>>>>> directly. But if the data is partitioned on delivery time, as is the case
>>>>>>> for this fact, and the query is on order time, then you pick some tentative
>>>>>>> partitions of delivery time, but you still have to filter over order time.
>>>>>>> Here the purpose of picking partitions is to reduce the amount of data
>>>>>>> read. When you pick delivery time partitions, it doesn't guarantee what
>>>>>>> order times data has, it only provides a tentative range. Hence, the
>>>>>>> partitions are picked based on the relation of time dimensions, but the
>>>>>>> extra filter still has to be supplied.
>>>>>>>
>>>>>>> Hope it's clear.
>>>>>>>
>>>>>>> On Tue, Aug 9, 2016 at 10:30 AM Rajat Khandelwal <pr...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> That's the original time range asked in the cube query.
>>>>>>>>
>>>>>>>> On Tue, Aug 9, 2016, 01:46 Tao Yan <ty...@linkedin.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Rajat,
>>>>>>>>>
>>>>>>>>> Thanks for the explain. I have a clear understanding about dim
>>>>>>>>> relation now, however, how does the following condition generated:
>>>>>>>>>
>>>>>>>>> and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00')))
>>>>>>>>>
>>>>>>>>> order_time is a dim attributes defined in sales cube, and it does not have any relationships with properties or other dim attributes (The property cube.timedim.partition.order_time does, but is it the same thing?), I wonder how the query could pick up this column.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Aug 8, 2016 at 4:12 AM, Rajat Khandelwal <
>>>>>>>>> prongs@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> I've created a jira for this: https://issues.apache.or
>>>>>>>>>> g/jira/browse/LENS-1269
>>>>>>>>>>
>>>>>>>>>> You should be able to make change in your data locally and see
>>>>>>>>>> the non-empty results:
>>>>>>>>>>
>>>>>>>>>> cat /Users/rajat.khandelwal/Git/lens/lens-dist/target/apache-len
>>>>>>>>>> s-2.7.0-SNAPSHOT-bin/apache-lens-2.7.0-SNAPSHOT-bin/client/
>>>>>>>>>> examples/data/sales_aggr_fact2_local3/sales-aggr-fact2-
>>>>>>>>>> local3.data
>>>>>>>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,1,1,1,1,1,1,5,0
>>>>>>>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,2,1,2,2,2,1,8,2
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 8, 2016 at 2:46 PM Rajat Khandelwal <
>>>>>>>>>> prongs@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> More details on how it converts to the fallback range:
>>>>>>>>>>>
>>>>>>>>>>> Range asked:
>>>>>>>>>>> (order_time, '2015-04-13-03', '2015-04-13-04')
>>>>>>>>>>>
>>>>>>>>>>> Timedim Relation :
>>>>>>>>>>>
>>>>>>>>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-20 days,-1 hour]" />
>>>>>>>>>>>
>>>>>>>>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-2 hours,-1hour]" />
>>>>>>>>>>>
>>>>>>>>>>> The cube defines the first relation, the fact defines the second relation. A fact can optionally override the timedim relation defined by cube. So the second one is picked :
>>>>>>>>>>>
>>>>>>>>>>> Derivation:
>>>>>>>>>>>
>>>>>>>>>>> 2015-04-13-03 <=ot < 2015-04-13-04
>>>>>>>>>>> dt - 2 hours <= ot <= dt - 1 hour => ot + 1 hour <= dt <= ot + 2
>>>>>>>>>>> hour
>>>>>>>>>>> => 2015-04-13-04 <= dt < 2015-04-13-06.
>>>>>>>>>>>
>>>>>>>>>>> Hope it's clearer now.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 8, 2016 at 1:29 PM Rajat Khandelwal <
>>>>>>>>>>> prongs@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi
>>>>>>>>>>>>
>>>>>>>>>>>> So it seems right now the query is designed to return 0 rows.
>>>>>>>>>>>> The query is there just to demonstrate the fallback feature.
>>>>>>>>>>>>
>>>>>>>>>>>> Cube query is
>>>>>>>>>>>>
>>>>>>>>>>>> cube select customer_city_name, store_cost from sales where time_range_in(order_time,
>>>>>>>>>>>> '2015-04-13-03', '2015-04-13-04')
>>>>>>>>>>>> Driver query comes out to be
>>>>>>>>>>>>
>>>>>>>>>>>> SELECT (customer_city.name), sum((sales.store_cost)) FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05')) and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00'))) GROUP BY (customer_city.name)
>>>>>>>>>>>>
>>>>>>>>>>>> Then I modified the driver query and ran:
>>>>>>>>>>>>
>>>>>>>>>>>> SELECT (customer_city.name), sales.order_time, sales.store_cost FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05'))))
>>>>>>>>>>>>
>>>>>>>>>>>> Found the following results:
>>>>>>>>>>>>
>>>>>>>>>>>> Bangalore^A2015-04-13 00:00:00^A0.0
>>>>>>>>>>>> Hyderabad^A2015-04-13 00:00:00^A2.0
>>>>>>>>>>>>
>>>>>>>>>>>> It's apparent that both the rows will be filtered out by the
>>>>>>>>>>>> translated query.
>>>>>>>>>>>>
>>>>>>>>>>>> I'll look deeper and replace the example query with something
>>>>>>>>>>>> that demonstrates the fallback feature and also returns non-empty results.
>>>>>>>>>>>>
>>>>>>>>>>>> Let us know if you have any confusion understanding the feature
>>>>>>>>>>>> itself. I think the video should have covered it.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Regards
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> *Tao Yan*
>>>>>>>>> Software Engineer
>>>>>>>>> Data Analytics Infrastructure Tools and Services
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 206.250.5345
>>>>>>>>> tyan@linkedin.com
>>>>>>>>> https://www.linkedin.com/in/taousc
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> *Tao Yan*
>>>>>> Software Engineer
>>>>>> Data Analytics Infrastructure Tools and Services
>>>>>>
>>>>>>
>>>>>>
>>>>>> 206.250.5345
>>>>>> tyan@linkedin.com
>>>>>> https://www.linkedin.com/in/taousc
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> *Tao Yan*
>>>> Software Engineer
>>>> Data Analytics Infrastructure Tools and Services
>>>>
>>>>
>>>>
>>>> 206.250.5345
>>>> tyan@linkedin.com
>>>> https://www.linkedin.com/in/taousc
>>>>
>>>
>>
>
>
> --
>
> *Tao Yan*
> Software Engineer
> Data Analytics Infrastructure Tools and Services
>
>
>
> 206.250.5345
> tyan@linkedin.com
> https://www.linkedin.com/in/taousc
>

Re: Lens Query Conversion

Posted by Tao Yan <ty...@linkedin.com>.
Thanks for answering the question! It is helpful.

On Wed, Aug 10, 2016 at 1:15 AM, amareshwarisr . <am...@gmail.com>
wrote:

> We have a mapping time dimension columns to its partition columns, because
> system's like HIVE would require partition column to be separate column
> than table columns.
>
> So, if tables already contain order_time as a column, and the data is
> partitioned by order_time - then partition column name has to be a separate
> column - which is defined through the mapping.
>
> Thanks
>
> On Wed, Aug 10, 2016 at 1:40 PM, Rajat Khandelwal <pr...@apache.org>
> wrote:
>
>> "value" is required in a "property" tag. A property tag is name and
>> value. This special property defines that whenever queried on order_time
>> time dimension, try to answer with facts that have "ot" as
>> partition column. If not, it falls back to delivery_time time dimension.
>> There again, the mapping is checked, which is "dt", so facts that have "dt"
>> as partition column are preferred. This can happen in a chain until a valid
>> combination is found which can answer the query.
>>
>>
>> On Wed, Aug 10, 2016 at 12:13 AM Tao Yan <ty...@linkedin.com> wrote:
>>
>>> So, the field 'value' can be used to define both alias and real values?
>>>
>>> On Tue, Aug 9, 2016 at 11:40 AM, Rajat Khandelwal <pr...@apache.org>
>>> wrote:
>>>
>>>> Yes, that is correct.
>>>>
>>>> On Wed, Aug 10, 2016, 00:09 Tao Yan <ty...@linkedin.com> wrote:
>>>>
>>>>> Thanks for the explanation. I got the idea of tentative partitions.
>>>>>
>>>>> Is the property
>>>>> *<property name="cube.timedim.partition.order_time" value="ot" />*
>>>>>  referring to the dim_attribute
>>>>> *<dim_attribute name="order_time" _type="TIMESTAMP" /> *?
>>>>>
>>>>> Is the value 'ot' an alias of the partition column name?
>>>>>
>>>>>
>>>>> On Tue, Aug 9, 2016 at 12:47 AM, Rajat Khandelwal <pr...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> The idea of partitions is that there is a stream of data that keeps
>>>>>> flowing in, each line of which has delivery time and order time both. The
>>>>>> data is partitioned and stored in partitions later and facts are created on
>>>>>> top of that partitioned data. So if data is partitioned in order time and
>>>>>> the time range queried is also order time, then we can pick the partitions
>>>>>> directly. But if the data is partitioned on delivery time, as is the case
>>>>>> for this fact, and the query is on order time, then you pick some tentative
>>>>>> partitions of delivery time, but you still have to filter over order time.
>>>>>> Here the purpose of picking partitions is to reduce the amount of data
>>>>>> read. When you pick delivery time partitions, it doesn't guarantee what
>>>>>> order times data has, it only provides a tentative range. Hence, the
>>>>>> partitions are picked based on the relation of time dimensions, but the
>>>>>> extra filter still has to be supplied.
>>>>>>
>>>>>> Hope it's clear.
>>>>>>
>>>>>> On Tue, Aug 9, 2016 at 10:30 AM Rajat Khandelwal <pr...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> That's the original time range asked in the cube query.
>>>>>>>
>>>>>>> On Tue, Aug 9, 2016, 01:46 Tao Yan <ty...@linkedin.com> wrote:
>>>>>>>
>>>>>>>> Hi Rajat,
>>>>>>>>
>>>>>>>> Thanks for the explain. I have a clear understanding about dim
>>>>>>>> relation now, however, how does the following condition generated:
>>>>>>>>
>>>>>>>> and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00')))
>>>>>>>>
>>>>>>>> order_time is a dim attributes defined in sales cube, and it does not have any relationships with properties or other dim attributes (The property cube.timedim.partition.order_time does, but is it the same thing?), I wonder how the query could pick up this column.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Aug 8, 2016 at 4:12 AM, Rajat Khandelwal <prongs@apache.org
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> I've created a jira for this: https://issues.apache.or
>>>>>>>>> g/jira/browse/LENS-1269
>>>>>>>>>
>>>>>>>>> You should be able to make change in your data locally and see the
>>>>>>>>> non-empty results:
>>>>>>>>>
>>>>>>>>> cat /Users/rajat.khandelwal/Git/lens/lens-dist/target/apache-len
>>>>>>>>> s-2.7.0-SNAPSHOT-bin/apache-lens-2.7.0-SNAPSHOT-bin/
>>>>>>>>> client/examples/data/sales_aggr_fact2_local3/sales-aggr-
>>>>>>>>> fact2-local3.data
>>>>>>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,1,1,1,1,1,1,5,0
>>>>>>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,2,1,2,2,2,1,8,2
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Aug 8, 2016 at 2:46 PM Rajat Khandelwal <pr...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> More details on how it converts to the fallback range:
>>>>>>>>>>
>>>>>>>>>> Range asked:
>>>>>>>>>> (order_time, '2015-04-13-03', '2015-04-13-04')
>>>>>>>>>>
>>>>>>>>>> Timedim Relation :
>>>>>>>>>>
>>>>>>>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-20 days,-1 hour]" />
>>>>>>>>>>
>>>>>>>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-2 hours,-1hour]" />
>>>>>>>>>>
>>>>>>>>>> The cube defines the first relation, the fact defines the second relation. A fact can optionally override the timedim relation defined by cube. So the second one is picked :
>>>>>>>>>>
>>>>>>>>>> Derivation:
>>>>>>>>>>
>>>>>>>>>> 2015-04-13-03 <=ot < 2015-04-13-04
>>>>>>>>>> dt - 2 hours <= ot <= dt - 1 hour => ot + 1 hour <= dt <= ot + 2
>>>>>>>>>> hour
>>>>>>>>>> => 2015-04-13-04 <= dt < 2015-04-13-06.
>>>>>>>>>>
>>>>>>>>>> Hope it's clearer now.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 8, 2016 at 1:29 PM Rajat Khandelwal <
>>>>>>>>>> prongs@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi
>>>>>>>>>>>
>>>>>>>>>>> So it seems right now the query is designed to return 0 rows.
>>>>>>>>>>> The query is there just to demonstrate the fallback feature.
>>>>>>>>>>>
>>>>>>>>>>> Cube query is
>>>>>>>>>>>
>>>>>>>>>>> cube select customer_city_name, store_cost from sales where time_range_in(order_time,
>>>>>>>>>>> '2015-04-13-03', '2015-04-13-04')
>>>>>>>>>>> Driver query comes out to be
>>>>>>>>>>>
>>>>>>>>>>> SELECT (customer_city.name), sum((sales.store_cost)) FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05')) and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00'))) GROUP BY (customer_city.name)
>>>>>>>>>>>
>>>>>>>>>>> Then I modified the driver query and ran:
>>>>>>>>>>>
>>>>>>>>>>> SELECT (customer_city.name), sales.order_time, sales.store_cost FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05'))))
>>>>>>>>>>>
>>>>>>>>>>> Found the following results:
>>>>>>>>>>>
>>>>>>>>>>> Bangalore^A2015-04-13 00:00:00^A0.0
>>>>>>>>>>> Hyderabad^A2015-04-13 00:00:00^A2.0
>>>>>>>>>>>
>>>>>>>>>>> It's apparent that both the rows will be filtered out by the
>>>>>>>>>>> translated query.
>>>>>>>>>>>
>>>>>>>>>>> I'll look deeper and replace the example query with something
>>>>>>>>>>> that demonstrates the fallback feature and also returns non-empty results.
>>>>>>>>>>>
>>>>>>>>>>> Let us know if you have any confusion understanding the feature
>>>>>>>>>>> itself. I think the video should have covered it.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> *Tao Yan*
>>>>>>>> Software Engineer
>>>>>>>> Data Analytics Infrastructure Tools and Services
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 206.250.5345
>>>>>>>> tyan@linkedin.com
>>>>>>>> https://www.linkedin.com/in/taousc
>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> *Tao Yan*
>>>>> Software Engineer
>>>>> Data Analytics Infrastructure Tools and Services
>>>>>
>>>>>
>>>>>
>>>>> 206.250.5345
>>>>> tyan@linkedin.com
>>>>> https://www.linkedin.com/in/taousc
>>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> *Tao Yan*
>>> Software Engineer
>>> Data Analytics Infrastructure Tools and Services
>>>
>>>
>>>
>>> 206.250.5345
>>> tyan@linkedin.com
>>> https://www.linkedin.com/in/taousc
>>>
>>
>


-- 

*Tao Yan*
Software Engineer
Data Analytics Infrastructure Tools and Services



206.250.5345
tyan@linkedin.com
https://www.linkedin.com/in/taousc

Re: Lens Query Conversion

Posted by "amareshwarisr ." <am...@gmail.com>.
We have a mapping time dimension columns to its partition columns, because
system's like HIVE would require partition column to be separate column
than table columns.

So, if tables already contain order_time as a column, and the data is
partitioned by order_time - then partition column name has to be a separate
column - which is defined through the mapping.

Thanks

On Wed, Aug 10, 2016 at 1:40 PM, Rajat Khandelwal <pr...@apache.org> wrote:

> "value" is required in a "property" tag. A property tag is name and value.
> This special property defines that whenever queried on order_time time
> dimension, try to answer with facts that have "ot" as partition column. If
> not, it falls back to delivery_time time dimension. There again, the
> mapping is checked, which is "dt", so facts that have "dt" as partition
> column are preferred. This can happen in a chain until a valid combination
> is found which can answer the query.
>
>
> On Wed, Aug 10, 2016 at 12:13 AM Tao Yan <ty...@linkedin.com> wrote:
>
>> So, the field 'value' can be used to define both alias and real values?
>>
>> On Tue, Aug 9, 2016 at 11:40 AM, Rajat Khandelwal <pr...@apache.org>
>> wrote:
>>
>>> Yes, that is correct.
>>>
>>> On Wed, Aug 10, 2016, 00:09 Tao Yan <ty...@linkedin.com> wrote:
>>>
>>>> Thanks for the explanation. I got the idea of tentative partitions.
>>>>
>>>> Is the property
>>>> *<property name="cube.timedim.partition.order_time" value="ot" />*
>>>>  referring to the dim_attribute
>>>> *<dim_attribute name="order_time" _type="TIMESTAMP" /> *?
>>>>
>>>> Is the value 'ot' an alias of the partition column name?
>>>>
>>>>
>>>> On Tue, Aug 9, 2016 at 12:47 AM, Rajat Khandelwal <pr...@apache.org>
>>>> wrote:
>>>>
>>>>> The idea of partitions is that there is a stream of data that keeps
>>>>> flowing in, each line of which has delivery time and order time both. The
>>>>> data is partitioned and stored in partitions later and facts are created on
>>>>> top of that partitioned data. So if data is partitioned in order time and
>>>>> the time range queried is also order time, then we can pick the partitions
>>>>> directly. But if the data is partitioned on delivery time, as is the case
>>>>> for this fact, and the query is on order time, then you pick some tentative
>>>>> partitions of delivery time, but you still have to filter over order time.
>>>>> Here the purpose of picking partitions is to reduce the amount of data
>>>>> read. When you pick delivery time partitions, it doesn't guarantee what
>>>>> order times data has, it only provides a tentative range. Hence, the
>>>>> partitions are picked based on the relation of time dimensions, but the
>>>>> extra filter still has to be supplied.
>>>>>
>>>>> Hope it's clear.
>>>>>
>>>>> On Tue, Aug 9, 2016 at 10:30 AM Rajat Khandelwal <pr...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> That's the original time range asked in the cube query.
>>>>>>
>>>>>> On Tue, Aug 9, 2016, 01:46 Tao Yan <ty...@linkedin.com> wrote:
>>>>>>
>>>>>>> Hi Rajat,
>>>>>>>
>>>>>>> Thanks for the explain. I have a clear understanding about dim
>>>>>>> relation now, however, how does the following condition generated:
>>>>>>>
>>>>>>> and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00')))
>>>>>>>
>>>>>>> order_time is a dim attributes defined in sales cube, and it does not have any relationships with properties or other dim attributes (The property cube.timedim.partition.order_time does, but is it the same thing?), I wonder how the query could pick up this column.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 8, 2016 at 4:12 AM, Rajat Khandelwal <pr...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I've created a jira for this: https://issues.apache.
>>>>>>>> org/jira/browse/LENS-1269
>>>>>>>>
>>>>>>>> You should be able to make change in your data locally and see the
>>>>>>>> non-empty results:
>>>>>>>>
>>>>>>>> cat /Users/rajat.khandelwal/Git/lens/lens-dist/target/apache-
>>>>>>>> lens-2.7.0-SNAPSHOT-bin/apache-lens-2.7.0-SNAPSHOT-
>>>>>>>> bin/client/examples/data/sales_aggr_fact2_local3/sales-
>>>>>>>> aggr-fact2-local3.data
>>>>>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,1,1,1,1,1,1,5,0
>>>>>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,2,1,2,2,2,1,8,2
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Aug 8, 2016 at 2:46 PM Rajat Khandelwal <pr...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> More details on how it converts to the fallback range:
>>>>>>>>>
>>>>>>>>> Range asked:
>>>>>>>>> (order_time, '2015-04-13-03', '2015-04-13-04')
>>>>>>>>>
>>>>>>>>> Timedim Relation :
>>>>>>>>>
>>>>>>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-20 days,-1 hour]" />
>>>>>>>>>
>>>>>>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-2 hours,-1hour]" />
>>>>>>>>>
>>>>>>>>> The cube defines the first relation, the fact defines the second relation. A fact can optionally override the timedim relation defined by cube. So the second one is picked :
>>>>>>>>>
>>>>>>>>> Derivation:
>>>>>>>>>
>>>>>>>>> 2015-04-13-03 <=ot < 2015-04-13-04
>>>>>>>>> dt - 2 hours <= ot <= dt - 1 hour => ot + 1 hour <= dt <= ot + 2
>>>>>>>>> hour
>>>>>>>>> => 2015-04-13-04 <= dt < 2015-04-13-06.
>>>>>>>>>
>>>>>>>>> Hope it's clearer now.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Aug 8, 2016 at 1:29 PM Rajat Khandelwal <pr...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi
>>>>>>>>>>
>>>>>>>>>> So it seems right now the query is designed to return 0 rows. The
>>>>>>>>>> query is there just to demonstrate the fallback feature.
>>>>>>>>>>
>>>>>>>>>> Cube query is
>>>>>>>>>>
>>>>>>>>>> cube select customer_city_name, store_cost from sales where time_range_in(order_time,
>>>>>>>>>> '2015-04-13-03', '2015-04-13-04')
>>>>>>>>>> Driver query comes out to be
>>>>>>>>>>
>>>>>>>>>> SELECT (customer_city.name), sum((sales.store_cost)) FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05')) and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00'))) GROUP BY (customer_city.name)
>>>>>>>>>>
>>>>>>>>>> Then I modified the driver query and ran:
>>>>>>>>>>
>>>>>>>>>> SELECT (customer_city.name), sales.order_time, sales.store_cost FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05'))))
>>>>>>>>>>
>>>>>>>>>> Found the following results:
>>>>>>>>>>
>>>>>>>>>> Bangalore^A2015-04-13 00:00:00^A0.0
>>>>>>>>>> Hyderabad^A2015-04-13 00:00:00^A2.0
>>>>>>>>>>
>>>>>>>>>> It's apparent that both the rows will be filtered out by the
>>>>>>>>>> translated query.
>>>>>>>>>>
>>>>>>>>>> I'll look deeper and replace the example query with something
>>>>>>>>>> that demonstrates the fallback feature and also returns non-empty results.
>>>>>>>>>>
>>>>>>>>>> Let us know if you have any confusion understanding the feature
>>>>>>>>>> itself. I think the video should have covered it.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> *Tao Yan*
>>>>>>> Software Engineer
>>>>>>> Data Analytics Infrastructure Tools and Services
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 206.250.5345
>>>>>>> tyan@linkedin.com
>>>>>>> https://www.linkedin.com/in/taousc
>>>>>>>
>>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> *Tao Yan*
>>>> Software Engineer
>>>> Data Analytics Infrastructure Tools and Services
>>>>
>>>>
>>>>
>>>> 206.250.5345
>>>> tyan@linkedin.com
>>>> https://www.linkedin.com/in/taousc
>>>>
>>>
>>
>>
>> --
>>
>> *Tao Yan*
>> Software Engineer
>> Data Analytics Infrastructure Tools and Services
>>
>>
>>
>> 206.250.5345
>> tyan@linkedin.com
>> https://www.linkedin.com/in/taousc
>>
>

Re: Lens Query Conversion

Posted by Rajat Khandelwal <pr...@apache.org>.
"value" is required in a "property" tag. A property tag is name and value.
This special property defines that whenever queried on order_time time
dimension, try to answer with facts that have "ot" as partition column. If
not, it falls back to delivery_time time dimension. There again, the
mapping is checked, which is "dt", so facts that have "dt" as partition
column are preferred. This can happen in a chain until a valid combination
is found which can answer the query.


On Wed, Aug 10, 2016 at 12:13 AM Tao Yan <ty...@linkedin.com> wrote:

> So, the field 'value' can be used to define both alias and real values?
>
> On Tue, Aug 9, 2016 at 11:40 AM, Rajat Khandelwal <pr...@apache.org>
> wrote:
>
>> Yes, that is correct.
>>
>> On Wed, Aug 10, 2016, 00:09 Tao Yan <ty...@linkedin.com> wrote:
>>
>>> Thanks for the explanation. I got the idea of tentative partitions.
>>>
>>> Is the property
>>> *<property name="cube.timedim.partition.order_time" value="ot" />*
>>>  referring to the dim_attribute
>>> *<dim_attribute name="order_time" _type="TIMESTAMP" /> *?
>>>
>>> Is the value 'ot' an alias of the partition column name?
>>>
>>>
>>> On Tue, Aug 9, 2016 at 12:47 AM, Rajat Khandelwal <pr...@apache.org>
>>> wrote:
>>>
>>>> The idea of partitions is that there is a stream of data that keeps
>>>> flowing in, each line of which has delivery time and order time both. The
>>>> data is partitioned and stored in partitions later and facts are created on
>>>> top of that partitioned data. So if data is partitioned in order time and
>>>> the time range queried is also order time, then we can pick the partitions
>>>> directly. But if the data is partitioned on delivery time, as is the case
>>>> for this fact, and the query is on order time, then you pick some tentative
>>>> partitions of delivery time, but you still have to filter over order time.
>>>> Here the purpose of picking partitions is to reduce the amount of data
>>>> read. When you pick delivery time partitions, it doesn't guarantee what
>>>> order times data has, it only provides a tentative range. Hence, the
>>>> partitions are picked based on the relation of time dimensions, but the
>>>> extra filter still has to be supplied.
>>>>
>>>> Hope it's clear.
>>>>
>>>> On Tue, Aug 9, 2016 at 10:30 AM Rajat Khandelwal <pr...@apache.org>
>>>> wrote:
>>>>
>>>>> That's the original time range asked in the cube query.
>>>>>
>>>>> On Tue, Aug 9, 2016, 01:46 Tao Yan <ty...@linkedin.com> wrote:
>>>>>
>>>>>> Hi Rajat,
>>>>>>
>>>>>> Thanks for the explain. I have a clear understanding about dim
>>>>>> relation now, however, how does the following condition generated:
>>>>>>
>>>>>> and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00')))
>>>>>>
>>>>>> order_time is a dim attributes defined in sales cube, and it does not have any relationships with properties or other dim attributes (The property cube.timedim.partition.order_time does, but is it the same thing?), I wonder how the query could pick up this column.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 8, 2016 at 4:12 AM, Rajat Khandelwal <pr...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> I've created a jira for this:
>>>>>>> https://issues.apache.org/jira/browse/LENS-1269
>>>>>>>
>>>>>>> You should be able to make change in your data locally and see the
>>>>>>> non-empty results:
>>>>>>>
>>>>>>> cat
>>>>>>> /Users/rajat.khandelwal/Git/lens/lens-dist/target/apache-lens-2.7.0-SNAPSHOT-bin/apache-lens-2.7.0-SNAPSHOT-bin/client/examples/data/sales_aggr_fact2_local3/sales-aggr-fact2-local3.data
>>>>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,1,1,1,1,1,1,5,0
>>>>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,2,1,2,2,2,1,8,2
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 8, 2016 at 2:46 PM Rajat Khandelwal <pr...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> More details on how it converts to the fallback range:
>>>>>>>>
>>>>>>>> Range asked:
>>>>>>>> (order_time, '2015-04-13-03', '2015-04-13-04')
>>>>>>>>
>>>>>>>> Timedim Relation :
>>>>>>>>
>>>>>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-20 days,-1 hour]" />
>>>>>>>>
>>>>>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-2 hours,-1hour]" />
>>>>>>>>
>>>>>>>> The cube defines the first relation, the fact defines the second relation. A fact can optionally override the timedim relation defined by cube. So the second one is picked :
>>>>>>>>
>>>>>>>> Derivation:
>>>>>>>>
>>>>>>>> 2015-04-13-03 <=ot < 2015-04-13-04
>>>>>>>> dt - 2 hours <= ot <= dt - 1 hour => ot + 1 hour <= dt <= ot + 2
>>>>>>>> hour
>>>>>>>> => 2015-04-13-04 <= dt < 2015-04-13-06.
>>>>>>>>
>>>>>>>> Hope it's clearer now.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Aug 8, 2016 at 1:29 PM Rajat Khandelwal <pr...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi
>>>>>>>>>
>>>>>>>>> So it seems right now the query is designed to return 0 rows. The
>>>>>>>>> query is there just to demonstrate the fallback feature.
>>>>>>>>>
>>>>>>>>> Cube query is
>>>>>>>>>
>>>>>>>>> cube select customer_city_name, store_cost from sales where time_range_in(order_time,
>>>>>>>>> '2015-04-13-03', '2015-04-13-04')
>>>>>>>>> Driver query comes out to be
>>>>>>>>>
>>>>>>>>> SELECT (customer_city.name), sum((sales.store_cost)) FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05')) and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00'))) GROUP BY (customer_city.name)
>>>>>>>>>
>>>>>>>>> Then I modified the driver query and ran:
>>>>>>>>>
>>>>>>>>> SELECT (customer_city.name), sales.order_time, sales.store_cost FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05'))))
>>>>>>>>>
>>>>>>>>> Found the following results:
>>>>>>>>>
>>>>>>>>> Bangalore^A2015-04-13 00:00:00^A0.0
>>>>>>>>> Hyderabad^A2015-04-13 00:00:00^A2.0
>>>>>>>>>
>>>>>>>>> It's apparent that both the rows will be filtered out by the
>>>>>>>>> translated query.
>>>>>>>>>
>>>>>>>>> I'll look deeper and replace the example query with something that
>>>>>>>>> demonstrates the fallback feature and also returns non-empty results.
>>>>>>>>>
>>>>>>>>> Let us know if you have any confusion understanding the feature
>>>>>>>>> itself. I think the video should have covered it.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> *Tao Yan*
>>>>>> Software Engineer
>>>>>> Data Analytics Infrastructure Tools and Services
>>>>>>
>>>>>>
>>>>>>
>>>>>> 206.250.5345
>>>>>> tyan@linkedin.com
>>>>>> https://www.linkedin.com/in/taousc
>>>>>>
>>>>>
>>>
>>>
>>> --
>>>
>>> *Tao Yan*
>>> Software Engineer
>>> Data Analytics Infrastructure Tools and Services
>>>
>>>
>>>
>>> 206.250.5345
>>> tyan@linkedin.com
>>> https://www.linkedin.com/in/taousc
>>>
>>
>
>
> --
>
> *Tao Yan*
> Software Engineer
> Data Analytics Infrastructure Tools and Services
>
>
>
> 206.250.5345
> tyan@linkedin.com
> https://www.linkedin.com/in/taousc
>

Re: Lens Query Conversion

Posted by Tao Yan <ty...@linkedin.com>.
So, the field 'value' can be used to define both alias and real values?

On Tue, Aug 9, 2016 at 11:40 AM, Rajat Khandelwal <pr...@apache.org> wrote:

> Yes, that is correct.
>
> On Wed, Aug 10, 2016, 00:09 Tao Yan <ty...@linkedin.com> wrote:
>
>> Thanks for the explanation. I got the idea of tentative partitions.
>>
>> Is the property
>> *<property name="cube.timedim.partition.order_time" value="ot" />*
>>  referring to the dim_attribute
>> *<dim_attribute name="order_time" _type="TIMESTAMP" /> *?
>>
>> Is the value 'ot' an alias of the partition column name?
>>
>>
>> On Tue, Aug 9, 2016 at 12:47 AM, Rajat Khandelwal <pr...@apache.org>
>> wrote:
>>
>>> The idea of partitions is that there is a stream of data that keeps
>>> flowing in, each line of which has delivery time and order time both. The
>>> data is partitioned and stored in partitions later and facts are created on
>>> top of that partitioned data. So if data is partitioned in order time and
>>> the time range queried is also order time, then we can pick the partitions
>>> directly. But if the data is partitioned on delivery time, as is the case
>>> for this fact, and the query is on order time, then you pick some tentative
>>> partitions of delivery time, but you still have to filter over order time.
>>> Here the purpose of picking partitions is to reduce the amount of data
>>> read. When you pick delivery time partitions, it doesn't guarantee what
>>> order times data has, it only provides a tentative range. Hence, the
>>> partitions are picked based on the relation of time dimensions, but the
>>> extra filter still has to be supplied.
>>>
>>> Hope it's clear.
>>>
>>> On Tue, Aug 9, 2016 at 10:30 AM Rajat Khandelwal <pr...@apache.org>
>>> wrote:
>>>
>>>> That's the original time range asked in the cube query.
>>>>
>>>> On Tue, Aug 9, 2016, 01:46 Tao Yan <ty...@linkedin.com> wrote:
>>>>
>>>>> Hi Rajat,
>>>>>
>>>>> Thanks for the explain. I have a clear understanding about dim
>>>>> relation now, however, how does the following condition generated:
>>>>>
>>>>> and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00')))
>>>>>
>>>>> order_time is a dim attributes defined in sales cube, and it does not have any relationships with properties or other dim attributes (The property cube.timedim.partition.order_time does, but is it the same thing?), I wonder how the query could pick up this column.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>> On Mon, Aug 8, 2016 at 4:12 AM, Rajat Khandelwal <pr...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> I've created a jira for this: https://issues.apache.
>>>>>> org/jira/browse/LENS-1269
>>>>>>
>>>>>> You should be able to make change in your data locally and see the
>>>>>> non-empty results:
>>>>>>
>>>>>> cat /Users/rajat.khandelwal/Git/lens/lens-dist/target/apache-
>>>>>> lens-2.7.0-SNAPSHOT-bin/apache-lens-2.7.0-SNAPSHOT-
>>>>>> bin/client/examples/data/sales_aggr_fact2_local3/sales-
>>>>>> aggr-fact2-local3.data
>>>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,1,1,1,1,1,1,5,0
>>>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,2,1,2,2,2,1,8,2
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 8, 2016 at 2:46 PM Rajat Khandelwal <pr...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> More details on how it converts to the fallback range:
>>>>>>>
>>>>>>> Range asked:
>>>>>>> (order_time, '2015-04-13-03', '2015-04-13-04')
>>>>>>>
>>>>>>> Timedim Relation :
>>>>>>>
>>>>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-20 days,-1 hour]" />
>>>>>>>
>>>>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-2 hours,-1hour]" />
>>>>>>>
>>>>>>> The cube defines the first relation, the fact defines the second relation. A fact can optionally override the timedim relation defined by cube. So the second one is picked :
>>>>>>>
>>>>>>> Derivation:
>>>>>>>
>>>>>>> 2015-04-13-03 <=ot < 2015-04-13-04
>>>>>>> dt - 2 hours <= ot <= dt - 1 hour => ot + 1 hour <= dt <= ot + 2 hour
>>>>>>> => 2015-04-13-04 <= dt < 2015-04-13-06.
>>>>>>>
>>>>>>> Hope it's clearer now.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 8, 2016 at 1:29 PM Rajat Khandelwal <pr...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> So it seems right now the query is designed to return 0 rows. The
>>>>>>>> query is there just to demonstrate the fallback feature.
>>>>>>>>
>>>>>>>> Cube query is
>>>>>>>>
>>>>>>>> cube select customer_city_name, store_cost from sales where time_range_in(order_time,
>>>>>>>> '2015-04-13-03', '2015-04-13-04')
>>>>>>>> Driver query comes out to be
>>>>>>>>
>>>>>>>> SELECT (customer_city.name), sum((sales.store_cost)) FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05')) and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00'))) GROUP BY (customer_city.name)
>>>>>>>>
>>>>>>>> Then I modified the driver query and ran:
>>>>>>>>
>>>>>>>> SELECT (customer_city.name), sales.order_time, sales.store_cost FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05'))))
>>>>>>>>
>>>>>>>> Found the following results:
>>>>>>>>
>>>>>>>> Bangalore^A2015-04-13 00:00:00^A0.0
>>>>>>>> Hyderabad^A2015-04-13 00:00:00^A2.0
>>>>>>>>
>>>>>>>> It's apparent that both the rows will be filtered out by the
>>>>>>>> translated query.
>>>>>>>>
>>>>>>>> I'll look deeper and replace the example query with something that
>>>>>>>> demonstrates the fallback feature and also returns non-empty results.
>>>>>>>>
>>>>>>>> Let us know if you have any confusion understanding the feature
>>>>>>>> itself. I think the video should have covered it.
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> *Tao Yan*
>>>>> Software Engineer
>>>>> Data Analytics Infrastructure Tools and Services
>>>>>
>>>>>
>>>>>
>>>>> 206.250.5345
>>>>> tyan@linkedin.com
>>>>> https://www.linkedin.com/in/taousc
>>>>>
>>>>
>>
>>
>> --
>>
>> *Tao Yan*
>> Software Engineer
>> Data Analytics Infrastructure Tools and Services
>>
>>
>>
>> 206.250.5345
>> tyan@linkedin.com
>> https://www.linkedin.com/in/taousc
>>
>


-- 

*Tao Yan*
Software Engineer
Data Analytics Infrastructure Tools and Services



206.250.5345
tyan@linkedin.com
https://www.linkedin.com/in/taousc

Re: Lens Query Conversion

Posted by Rajat Khandelwal <pr...@apache.org>.
Yes, that is correct.

On Wed, Aug 10, 2016, 00:09 Tao Yan <ty...@linkedin.com> wrote:

> Thanks for the explanation. I got the idea of tentative partitions.
>
> Is the property
> *<property name="cube.timedim.partition.order_time" value="ot" />*
>  referring to the dim_attribute
> *<dim_attribute name="order_time" _type="TIMESTAMP" /> *?
>
> Is the value 'ot' an alias of the partition column name?
>
>
> On Tue, Aug 9, 2016 at 12:47 AM, Rajat Khandelwal <pr...@apache.org>
> wrote:
>
>> The idea of partitions is that there is a stream of data that keeps
>> flowing in, each line of which has delivery time and order time both. The
>> data is partitioned and stored in partitions later and facts are created on
>> top of that partitioned data. So if data is partitioned in order time and
>> the time range queried is also order time, then we can pick the partitions
>> directly. But if the data is partitioned on delivery time, as is the case
>> for this fact, and the query is on order time, then you pick some tentative
>> partitions of delivery time, but you still have to filter over order time.
>> Here the purpose of picking partitions is to reduce the amount of data
>> read. When you pick delivery time partitions, it doesn't guarantee what
>> order times data has, it only provides a tentative range. Hence, the
>> partitions are picked based on the relation of time dimensions, but the
>> extra filter still has to be supplied.
>>
>> Hope it's clear.
>>
>> On Tue, Aug 9, 2016 at 10:30 AM Rajat Khandelwal <pr...@apache.org>
>> wrote:
>>
>>> That's the original time range asked in the cube query.
>>>
>>> On Tue, Aug 9, 2016, 01:46 Tao Yan <ty...@linkedin.com> wrote:
>>>
>>>> Hi Rajat,
>>>>
>>>> Thanks for the explain. I have a clear understanding about dim relation
>>>> now, however, how does the following condition generated:
>>>>
>>>> and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00')))
>>>>
>>>> order_time is a dim attributes defined in sales cube, and it does not have any relationships with properties or other dim attributes (The property cube.timedim.partition.order_time does, but is it the same thing?), I wonder how the query could pick up this column.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>>
>>>> On Mon, Aug 8, 2016 at 4:12 AM, Rajat Khandelwal <pr...@apache.org>
>>>> wrote:
>>>>
>>>>> I've created a jira for this:
>>>>> https://issues.apache.org/jira/browse/LENS-1269
>>>>>
>>>>> You should be able to make change in your data locally and see the
>>>>> non-empty results:
>>>>>
>>>>> cat
>>>>> /Users/rajat.khandelwal/Git/lens/lens-dist/target/apache-lens-2.7.0-SNAPSHOT-bin/apache-lens-2.7.0-SNAPSHOT-bin/client/examples/data/sales_aggr_fact2_local3/sales-aggr-fact2-local3.data
>>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,1,1,1,1,1,1,5,0
>>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,2,1,2,2,2,1,8,2
>>>>>
>>>>>
>>>>> On Mon, Aug 8, 2016 at 2:46 PM Rajat Khandelwal <pr...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> More details on how it converts to the fallback range:
>>>>>>
>>>>>> Range asked:
>>>>>> (order_time, '2015-04-13-03', '2015-04-13-04')
>>>>>>
>>>>>> Timedim Relation :
>>>>>>
>>>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-20 days,-1 hour]" />
>>>>>>
>>>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-2 hours,-1hour]" />
>>>>>>
>>>>>> The cube defines the first relation, the fact defines the second relation. A fact can optionally override the timedim relation defined by cube. So the second one is picked :
>>>>>>
>>>>>> Derivation:
>>>>>>
>>>>>> 2015-04-13-03 <=ot < 2015-04-13-04
>>>>>> dt - 2 hours <= ot <= dt - 1 hour => ot + 1 hour <= dt <= ot + 2 hour
>>>>>> => 2015-04-13-04 <= dt < 2015-04-13-06.
>>>>>>
>>>>>> Hope it's clearer now.
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 8, 2016 at 1:29 PM Rajat Khandelwal <pr...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> So it seems right now the query is designed to return 0 rows. The
>>>>>>> query is there just to demonstrate the fallback feature.
>>>>>>>
>>>>>>> Cube query is
>>>>>>>
>>>>>>> cube select customer_city_name, store_cost from sales where time_range_in(order_time,
>>>>>>> '2015-04-13-03', '2015-04-13-04')
>>>>>>> Driver query comes out to be
>>>>>>>
>>>>>>> SELECT (customer_city.name), sum((sales.store_cost)) FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05')) and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00'))) GROUP BY (customer_city.name)
>>>>>>>
>>>>>>> Then I modified the driver query and ran:
>>>>>>>
>>>>>>> SELECT (customer_city.name), sales.order_time, sales.store_cost FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05'))))
>>>>>>>
>>>>>>> Found the following results:
>>>>>>>
>>>>>>> Bangalore^A2015-04-13 00:00:00^A0.0
>>>>>>> Hyderabad^A2015-04-13 00:00:00^A2.0
>>>>>>>
>>>>>>> It's apparent that both the rows will be filtered out by the
>>>>>>> translated query.
>>>>>>>
>>>>>>> I'll look deeper and replace the example query with something that
>>>>>>> demonstrates the fallback feature and also returns non-empty results.
>>>>>>>
>>>>>>> Let us know if you have any confusion understanding the feature
>>>>>>> itself. I think the video should have covered it.
>>>>>>>
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> *Tao Yan*
>>>> Software Engineer
>>>> Data Analytics Infrastructure Tools and Services
>>>>
>>>>
>>>>
>>>> 206.250.5345
>>>> tyan@linkedin.com
>>>> https://www.linkedin.com/in/taousc
>>>>
>>>
>
>
> --
>
> *Tao Yan*
> Software Engineer
> Data Analytics Infrastructure Tools and Services
>
>
>
> 206.250.5345
> tyan@linkedin.com
> https://www.linkedin.com/in/taousc
>

Re: Lens Query Conversion

Posted by Tao Yan <ty...@linkedin.com>.
Thanks for the explanation. I got the idea of tentative partitions.

Is the property
*<property name="cube.timedim.partition.order_time" value="ot" />*
 referring to the dim_attribute
*<dim_attribute name="order_time" _type="TIMESTAMP" /> *?

Is the value 'ot' an alias of the partition column name?


On Tue, Aug 9, 2016 at 12:47 AM, Rajat Khandelwal <pr...@apache.org> wrote:

> The idea of partitions is that there is a stream of data that keeps
> flowing in, each line of which has delivery time and order time both. The
> data is partitioned and stored in partitions later and facts are created on
> top of that partitioned data. So if data is partitioned in order time and
> the time range queried is also order time, then we can pick the partitions
> directly. But if the data is partitioned on delivery time, as is the case
> for this fact, and the query is on order time, then you pick some tentative
> partitions of delivery time, but you still have to filter over order time.
> Here the purpose of picking partitions is to reduce the amount of data
> read. When you pick delivery time partitions, it doesn't guarantee what
> order times data has, it only provides a tentative range. Hence, the
> partitions are picked based on the relation of time dimensions, but the
> extra filter still has to be supplied.
>
> Hope it's clear.
>
> On Tue, Aug 9, 2016 at 10:30 AM Rajat Khandelwal <pr...@apache.org>
> wrote:
>
>> That's the original time range asked in the cube query.
>>
>> On Tue, Aug 9, 2016, 01:46 Tao Yan <ty...@linkedin.com> wrote:
>>
>>> Hi Rajat,
>>>
>>> Thanks for the explain. I have a clear understanding about dim relation
>>> now, however, how does the following condition generated:
>>>
>>> and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00')))
>>>
>>> order_time is a dim attributes defined in sales cube, and it does not have any relationships with properties or other dim attributes (The property cube.timedim.partition.order_time does, but is it the same thing?), I wonder how the query could pick up this column.
>>>
>>>
>>> Thanks,
>>>
>>>
>>> On Mon, Aug 8, 2016 at 4:12 AM, Rajat Khandelwal <pr...@apache.org>
>>> wrote:
>>>
>>>> I've created a jira for this: https://issues.apache.
>>>> org/jira/browse/LENS-1269
>>>>
>>>> You should be able to make change in your data locally and see the
>>>> non-empty results:
>>>>
>>>> cat /Users/rajat.khandelwal/Git/lens/lens-dist/target/apache-
>>>> lens-2.7.0-SNAPSHOT-bin/apache-lens-2.7.0-SNAPSHOT-
>>>> bin/client/examples/data/sales_aggr_fact2_local3/sales-
>>>> aggr-fact2-local3.data
>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,1,1,1,1,1,1,5,0
>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,2,1,2,2,2,1,8,2
>>>>
>>>>
>>>> On Mon, Aug 8, 2016 at 2:46 PM Rajat Khandelwal <pr...@apache.org>
>>>> wrote:
>>>>
>>>>> More details on how it converts to the fallback range:
>>>>>
>>>>> Range asked:
>>>>> (order_time, '2015-04-13-03', '2015-04-13-04')
>>>>>
>>>>> Timedim Relation :
>>>>>
>>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-20 days,-1 hour]" />
>>>>>
>>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-2 hours,-1hour]" />
>>>>>
>>>>> The cube defines the first relation, the fact defines the second relation. A fact can optionally override the timedim relation defined by cube. So the second one is picked :
>>>>>
>>>>> Derivation:
>>>>>
>>>>> 2015-04-13-03 <=ot < 2015-04-13-04
>>>>> dt - 2 hours <= ot <= dt - 1 hour => ot + 1 hour <= dt <= ot + 2 hour
>>>>> => 2015-04-13-04 <= dt < 2015-04-13-06.
>>>>>
>>>>> Hope it's clearer now.
>>>>>
>>>>>
>>>>> On Mon, Aug 8, 2016 at 1:29 PM Rajat Khandelwal <pr...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> So it seems right now the query is designed to return 0 rows. The
>>>>>> query is there just to demonstrate the fallback feature.
>>>>>>
>>>>>> Cube query is
>>>>>>
>>>>>> cube select customer_city_name, store_cost from sales where time_range_in(order_time,
>>>>>> '2015-04-13-03', '2015-04-13-04')
>>>>>> Driver query comes out to be
>>>>>>
>>>>>> SELECT (customer_city.name), sum((sales.store_cost)) FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05')) and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00'))) GROUP BY (customer_city.name)
>>>>>>
>>>>>> Then I modified the driver query and ran:
>>>>>>
>>>>>> SELECT (customer_city.name), sales.order_time, sales.store_cost FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05'))))
>>>>>>
>>>>>> Found the following results:
>>>>>>
>>>>>> Bangalore^A2015-04-13 00:00:00^A0.0
>>>>>> Hyderabad^A2015-04-13 00:00:00^A2.0
>>>>>>
>>>>>> It's apparent that both the rows will be filtered out by the
>>>>>> translated query.
>>>>>>
>>>>>> I'll look deeper and replace the example query with something that
>>>>>> demonstrates the fallback feature and also returns non-empty results.
>>>>>>
>>>>>> Let us know if you have any confusion understanding the feature
>>>>>> itself. I think the video should have covered it.
>>>>>>
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>
>>>
>>>
>>> --
>>>
>>> *Tao Yan*
>>> Software Engineer
>>> Data Analytics Infrastructure Tools and Services
>>>
>>>
>>>
>>> 206.250.5345
>>> tyan@linkedin.com
>>> https://www.linkedin.com/in/taousc
>>>
>>


-- 

*Tao Yan*
Software Engineer
Data Analytics Infrastructure Tools and Services



206.250.5345
tyan@linkedin.com
https://www.linkedin.com/in/taousc

Re: Lens Query Conversion

Posted by Rajat Khandelwal <pr...@apache.org>.
The idea of partitions is that there is a stream of data that keeps flowing
in, each line of which has delivery time and order time both. The data is
partitioned and stored in partitions later and facts are created on top of
that partitioned data. So if data is partitioned in order time and the time
range queried is also order time, then we can pick the partitions directly.
But if the data is partitioned on delivery time, as is the case for this
fact, and the query is on order time, then you pick some tentative
partitions of delivery time, but you still have to filter over order time.
Here the purpose of picking partitions is to reduce the amount of data
read. When you pick delivery time partitions, it doesn't guarantee what
order times data has, it only provides a tentative range. Hence, the
partitions are picked based on the relation of time dimensions, but the
extra filter still has to be supplied.

Hope it's clear.

On Tue, Aug 9, 2016 at 10:30 AM Rajat Khandelwal <pr...@apache.org> wrote:

> That's the original time range asked in the cube query.
>
> On Tue, Aug 9, 2016, 01:46 Tao Yan <ty...@linkedin.com> wrote:
>
>> Hi Rajat,
>>
>> Thanks for the explain. I have a clear understanding about dim relation
>> now, however, how does the following condition generated:
>>
>> and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00')))
>>
>> order_time is a dim attributes defined in sales cube, and it does not have any relationships with properties or other dim attributes (The property cube.timedim.partition.order_time does, but is it the same thing?), I wonder how the query could pick up this column.
>>
>>
>> Thanks,
>>
>>
>> On Mon, Aug 8, 2016 at 4:12 AM, Rajat Khandelwal <pr...@apache.org>
>> wrote:
>>
>>> I've created a jira for this:
>>> https://issues.apache.org/jira/browse/LENS-1269
>>>
>>> You should be able to make change in your data locally and see the
>>> non-empty results:
>>>
>>> cat
>>> /Users/rajat.khandelwal/Git/lens/lens-dist/target/apache-lens-2.7.0-SNAPSHOT-bin/apache-lens-2.7.0-SNAPSHOT-bin/client/examples/data/sales_aggr_fact2_local3/sales-aggr-fact2-local3.data
>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,1,1,1,1,1,1,5,0
>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,2,1,2,2,2,1,8,2
>>>
>>>
>>> On Mon, Aug 8, 2016 at 2:46 PM Rajat Khandelwal <pr...@apache.org>
>>> wrote:
>>>
>>>> More details on how it converts to the fallback range:
>>>>
>>>> Range asked:
>>>> (order_time, '2015-04-13-03', '2015-04-13-04')
>>>>
>>>> Timedim Relation :
>>>>
>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-20 days,-1 hour]" />
>>>>
>>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-2 hours,-1hour]" />
>>>>
>>>> The cube defines the first relation, the fact defines the second relation. A fact can optionally override the timedim relation defined by cube. So the second one is picked :
>>>>
>>>> Derivation:
>>>>
>>>> 2015-04-13-03 <=ot < 2015-04-13-04
>>>> dt - 2 hours <= ot <= dt - 1 hour => ot + 1 hour <= dt <= ot + 2 hour
>>>> => 2015-04-13-04 <= dt < 2015-04-13-06.
>>>>
>>>> Hope it's clearer now.
>>>>
>>>>
>>>> On Mon, Aug 8, 2016 at 1:29 PM Rajat Khandelwal <pr...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> So it seems right now the query is designed to return 0 rows. The
>>>>> query is there just to demonstrate the fallback feature.
>>>>>
>>>>> Cube query is
>>>>>
>>>>> cube select customer_city_name, store_cost from sales where time_range_in(order_time,
>>>>> '2015-04-13-03', '2015-04-13-04')
>>>>> Driver query comes out to be
>>>>>
>>>>> SELECT (customer_city.name), sum((sales.store_cost)) FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05')) and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00'))) GROUP BY (customer_city.name)
>>>>>
>>>>> Then I modified the driver query and ran:
>>>>>
>>>>> SELECT (customer_city.name), sales.order_time, sales.store_cost FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05'))))
>>>>>
>>>>> Found the following results:
>>>>>
>>>>> Bangalore^A2015-04-13 00:00:00^A0.0
>>>>> Hyderabad^A2015-04-13 00:00:00^A2.0
>>>>>
>>>>> It's apparent that both the rows will be filtered out by the
>>>>> translated query.
>>>>>
>>>>> I'll look deeper and replace the example query with something that
>>>>> demonstrates the fallback feature and also returns non-empty results.
>>>>>
>>>>> Let us know if you have any confusion understanding the feature
>>>>> itself. I think the video should have covered it.
>>>>>
>>>>>
>>>>> Regards
>>>>>
>>>>
>>
>>
>> --
>>
>> *Tao Yan*
>> Software Engineer
>> Data Analytics Infrastructure Tools and Services
>>
>>
>>
>> 206.250.5345
>> tyan@linkedin.com
>> https://www.linkedin.com/in/taousc
>>
>

Re: Lens Query Conversion

Posted by Rajat Khandelwal <pr...@apache.org>.
That's the original time range asked in the cube query.

On Tue, Aug 9, 2016, 01:46 Tao Yan <ty...@linkedin.com> wrote:

> Hi Rajat,
>
> Thanks for the explain. I have a clear understanding about dim relation
> now, however, how does the following condition generated:
>
> and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00')))
>
> order_time is a dim attributes defined in sales cube, and it does not have any relationships with properties or other dim attributes (The property cube.timedim.partition.order_time does, but is it the same thing?), I wonder how the query could pick up this column.
>
>
> Thanks,
>
>
> On Mon, Aug 8, 2016 at 4:12 AM, Rajat Khandelwal <pr...@apache.org>
> wrote:
>
>> I've created a jira for this:
>> https://issues.apache.org/jira/browse/LENS-1269
>>
>> You should be able to make change in your data locally and see the
>> non-empty results:
>>
>> cat
>> /Users/rajat.khandelwal/Git/lens/lens-dist/target/apache-lens-2.7.0-SNAPSHOT-bin/apache-lens-2.7.0-SNAPSHOT-bin/client/examples/data/sales_aggr_fact2_local3/sales-aggr-fact2-local3.data
>> 2015-04-13 03:00:00,2015-04-13 04:00:00,1,1,1,1,1,1,5,0
>> 2015-04-13 03:00:00,2015-04-13 04:00:00,2,1,2,2,2,1,8,2
>>
>>
>> On Mon, Aug 8, 2016 at 2:46 PM Rajat Khandelwal <pr...@apache.org>
>> wrote:
>>
>>> More details on how it converts to the fallback range:
>>>
>>> Range asked:
>>> (order_time, '2015-04-13-03', '2015-04-13-04')
>>>
>>> Timedim Relation :
>>>
>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-20 days,-1 hour]" />
>>>
>>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-2 hours,-1hour]" />
>>>
>>> The cube defines the first relation, the fact defines the second relation. A fact can optionally override the timedim relation defined by cube. So the second one is picked :
>>>
>>> Derivation:
>>>
>>> 2015-04-13-03 <=ot < 2015-04-13-04
>>> dt - 2 hours <= ot <= dt - 1 hour => ot + 1 hour <= dt <= ot + 2 hour
>>> => 2015-04-13-04 <= dt < 2015-04-13-06.
>>>
>>> Hope it's clearer now.
>>>
>>>
>>> On Mon, Aug 8, 2016 at 1:29 PM Rajat Khandelwal <pr...@apache.org>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> So it seems right now the query is designed to return 0 rows. The query
>>>> is there just to demonstrate the fallback feature.
>>>>
>>>> Cube query is
>>>>
>>>> cube select customer_city_name, store_cost from sales where time_range_in(order_time,
>>>> '2015-04-13-03', '2015-04-13-04')
>>>> Driver query comes out to be
>>>>
>>>> SELECT (customer_city.name), sum((sales.store_cost)) FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05')) and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00'))) GROUP BY (customer_city.name)
>>>>
>>>> Then I modified the driver query and ran:
>>>>
>>>> SELECT (customer_city.name), sales.order_time, sales.store_cost FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05'))))
>>>>
>>>> Found the following results:
>>>>
>>>> Bangalore^A2015-04-13 00:00:00^A0.0
>>>> Hyderabad^A2015-04-13 00:00:00^A2.0
>>>>
>>>> It's apparent that both the rows will be filtered out by the translated
>>>> query.
>>>>
>>>> I'll look deeper and replace the example query with something that
>>>> demonstrates the fallback feature and also returns non-empty results.
>>>>
>>>> Let us know if you have any confusion understanding the feature itself.
>>>> I think the video should have covered it.
>>>>
>>>>
>>>> Regards
>>>>
>>>
>
>
> --
>
> *Tao Yan*
> Software Engineer
> Data Analytics Infrastructure Tools and Services
>
>
>
> 206.250.5345
> tyan@linkedin.com
> https://www.linkedin.com/in/taousc
>

Re: Lens Query Conversion

Posted by Tao Yan <ty...@linkedin.com>.
Hi Rajat,

Thanks for the explain. I have a clear understanding about dim relation
now, however, how does the following condition generated:

and ((sales.order_time) >= '2015-04-13 03:00:00') and
((sales.order_time) < '2015-04-13 04:00:00')))

order_time is a dim attributes defined in sales cube, and it does not
have any relationships with properties or other dim attributes (The
property cube.timedim.partition.order_time does, but is it the same
thing?), I wonder how the query could pick up this column.


Thanks,


On Mon, Aug 8, 2016 at 4:12 AM, Rajat Khandelwal <pr...@apache.org> wrote:

> I've created a jira for this: https://issues.apache.
> org/jira/browse/LENS-1269
>
> You should be able to make change in your data locally and see the
> non-empty results:
>
> cat /Users/rajat.khandelwal/Git/lens/lens-dist/target/apache-
> lens-2.7.0-SNAPSHOT-bin/apache-lens-2.7.0-SNAPSHOT-
> bin/client/examples/data/sales_aggr_fact2_local3/sales-
> aggr-fact2-local3.data
> 2015-04-13 03:00:00,2015-04-13 04:00:00,1,1,1,1,1,1,5,0
> 2015-04-13 03:00:00,2015-04-13 04:00:00,2,1,2,2,2,1,8,2
>
>
> On Mon, Aug 8, 2016 at 2:46 PM Rajat Khandelwal <pr...@apache.org> wrote:
>
>> More details on how it converts to the fallback range:
>>
>> Range asked:
>> (order_time, '2015-04-13-03', '2015-04-13-04')
>>
>> Timedim Relation :
>>
>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-20 days,-1 hour]" />
>>
>> <property name="cube.timedim.relation.order_time" value="delivery_time+[-2 hours,-1hour]" />
>>
>> The cube defines the first relation, the fact defines the second relation. A fact can optionally override the timedim relation defined by cube. So the second one is picked :
>>
>> Derivation:
>>
>> 2015-04-13-03 <=ot < 2015-04-13-04
>> dt - 2 hours <= ot <= dt - 1 hour => ot + 1 hour <= dt <= ot + 2 hour
>> => 2015-04-13-04 <= dt < 2015-04-13-06.
>>
>> Hope it's clearer now.
>>
>>
>> On Mon, Aug 8, 2016 at 1:29 PM Rajat Khandelwal <pr...@apache.org>
>> wrote:
>>
>>> Hi
>>>
>>> So it seems right now the query is designed to return 0 rows. The query
>>> is there just to demonstrate the fallback feature.
>>>
>>> Cube query is
>>>
>>> cube select customer_city_name, store_cost from sales where time_range_in(order_time,
>>> '2015-04-13-03', '2015-04-13-04')
>>> Driver query comes out to be
>>>
>>> SELECT (customer_city.name), sum((sales.store_cost)) FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05')) and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00'))) GROUP BY (customer_city.name)
>>>
>>> Then I modified the driver query and ran:
>>>
>>> SELECT (customer_city.name), sales.order_time, sales.store_cost FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05'))))
>>>
>>> Found the following results:
>>>
>>> Bangalore^A2015-04-13 00:00:00^A0.0
>>> Hyderabad^A2015-04-13 00:00:00^A2.0
>>>
>>> It's apparent that both the rows will be filtered out by the translated
>>> query.
>>>
>>> I'll look deeper and replace the example query with something that
>>> demonstrates the fallback feature and also returns non-empty results.
>>>
>>> Let us know if you have any confusion understanding the feature itself.
>>> I think the video should have covered it.
>>>
>>>
>>> Regards
>>>
>>


-- 

*Tao Yan*
Software Engineer
Data Analytics Infrastructure Tools and Services



206.250.5345
tyan@linkedin.com
https://www.linkedin.com/in/taousc

Re: Lens Query Conversion

Posted by Rajat Khandelwal <pr...@apache.org>.
I've created a jira for this:
https://issues.apache.org/jira/browse/LENS-1269

You should be able to make change in your data locally and see the
non-empty results:

cat
/Users/rajat.khandelwal/Git/lens/lens-dist/target/apache-lens-2.7.0-SNAPSHOT-bin/apache-lens-2.7.0-SNAPSHOT-bin/client/examples/data/sales_aggr_fact2_local3/sales-aggr-fact2-local3.data
2015-04-13 03:00:00,2015-04-13 04:00:00,1,1,1,1,1,1,5,0
2015-04-13 03:00:00,2015-04-13 04:00:00,2,1,2,2,2,1,8,2


On Mon, Aug 8, 2016 at 2:46 PM Rajat Khandelwal <pr...@apache.org> wrote:

> More details on how it converts to the fallback range:
>
> Range asked:
> (order_time, '2015-04-13-03', '2015-04-13-04')
>
> Timedim Relation :
>
> <property name="cube.timedim.relation.order_time" value="delivery_time+[-20 days,-1 hour]" />
>
> <property name="cube.timedim.relation.order_time" value="delivery_time+[-2 hours,-1hour]" />
>
> The cube defines the first relation, the fact defines the second relation. A fact can optionally override the timedim relation defined by cube. So the second one is picked :
>
> Derivation:
>
> 2015-04-13-03 <=ot < 2015-04-13-04
> dt - 2 hours <= ot <= dt - 1 hour => ot + 1 hour <= dt <= ot + 2 hour
> => 2015-04-13-04 <= dt < 2015-04-13-06.
>
> Hope it's clearer now.
>
>
> On Mon, Aug 8, 2016 at 1:29 PM Rajat Khandelwal <pr...@apache.org> wrote:
>
>> Hi
>>
>> So it seems right now the query is designed to return 0 rows. The query
>> is there just to demonstrate the fallback feature.
>>
>> Cube query is
>>
>> cube select customer_city_name, store_cost from sales where time_range_in(order_time,
>> '2015-04-13-03', '2015-04-13-04')
>> Driver query comes out to be
>>
>> SELECT (customer_city.name), sum((sales.store_cost)) FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05')) and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00'))) GROUP BY (customer_city.name)
>>
>> Then I modified the driver query and ran:
>>
>> SELECT (customer_city.name), sales.order_time, sales.store_cost FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05'))))
>>
>> Found the following results:
>>
>> Bangalore^A2015-04-13 00:00:00^A0.0
>> Hyderabad^A2015-04-13 00:00:00^A2.0
>>
>> It's apparent that both the rows will be filtered out by the translated
>> query.
>>
>> I'll look deeper and replace the example query with something that
>> demonstrates the fallback feature and also returns non-empty results.
>>
>> Let us know if you have any confusion understanding the feature itself. I
>> think the video should have covered it.
>>
>>
>> Regards
>>
>

Re: Lens Query Conversion

Posted by Rajat Khandelwal <pr...@apache.org>.
More details on how it converts to the fallback range:

Range asked:
(order_time, '2015-04-13-03', '2015-04-13-04')

Timedim Relation :

<property name="cube.timedim.relation.order_time"
value="delivery_time+[-20 days,-1 hour]" />

<property name="cube.timedim.relation.order_time"
value="delivery_time+[-2 hours,-1hour]" />

The cube defines the first relation, the fact defines the second
relation. A fact can optionally override the timedim relation defined
by cube. So the second one is picked :

Derivation:

2015-04-13-03 <=ot < 2015-04-13-04
dt - 2 hours <= ot <= dt - 1 hour => ot + 1 hour <= dt <= ot + 2 hour
=> 2015-04-13-04 <= dt < 2015-04-13-06.

Hope it's clearer now.


On Mon, Aug 8, 2016 at 1:29 PM Rajat Khandelwal <pr...@apache.org> wrote:

> Hi
>
> So it seems right now the query is designed to return 0 rows. The query is
> there just to demonstrate the fallback feature.
>
> Cube query is
>
> cube select customer_city_name, store_cost from sales where time_range_in(order_time,
> '2015-04-13-03', '2015-04-13-04')
> Driver query comes out to be
>
> SELECT (customer_city.name), sum((sales.store_cost)) FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05')) and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < '2015-04-13 04:00:00'))) GROUP BY (customer_city.name)
>
> Then I modified the driver query and ran:
>
> SELECT (customer_city.name), sales.order_time, sales.store_cost FROM a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on sales.customer_city_id = customer_city.id and (customer_city.dt = 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = '2015-04-13-05'))))
>
> Found the following results:
>
> Bangalore^A2015-04-13 00:00:00^A0.0
> Hyderabad^A2015-04-13 00:00:00^A2.0
>
> It's apparent that both the rows will be filtered out by the translated
> query.
>
> I'll look deeper and replace the example query with something that
> demonstrates the fallback feature and also returns non-empty results.
>
> Let us know if you have any confusion understanding the feature itself. I
> think the video should have covered it.
>
>
> Regards
>

Re: Lens Query Conversion

Posted by Rajat Khandelwal <pr...@apache.org>.
Hi

So it seems right now the query is designed to return 0 rows. The query is
there just to demonstrate the fallback feature.

Cube query is

cube select customer_city_name, store_cost from sales where
time_range_in(order_time,
'2015-04-13-03', '2015-04-13-04')

Driver query comes out to be

SELECT (customer_city.name), sum((sales.store_cost)) FROM
a.local_sales_aggr_fact2 sales join a.local_city_table customer_city
on sales.customer_city_id = customer_city.id and (customer_city.dt =
'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) =
'2015-04-13-05')) and ((sales.order_time) >= '2015-04-13 03:00:00')
and ((sales.order_time) < '2015-04-13 04:00:00'))) GROUP BY
(customer_city.name)

Then I modified the driver query and ran:

SELECT (customer_city.name), sales.order_time, sales.store_cost FROM
a.local_sales_aggr_fact2 sales join a.local_city_table customer_city
on sales.customer_city_id = customer_city.id and (customer_city.dt =
'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) =
'2015-04-13-05'))))

Found the following results:

Bangalore^A2015-04-13 00:00:00^A0.0
Hyderabad^A2015-04-13 00:00:00^A2.0

It's apparent that both the rows will be filtered out by the translated
query.

I'll look deeper and replace the example query with something that
demonstrates the fallback feature and also returns non-empty results.

Let us know if you have any confusion understanding the feature itself. I
think the video should have covered it.


Regards

Re: Lens Query Conversion

Posted by "amareshwarisr ." <am...@gmail.com>.
Tao,

Not sure if it is bug in recent code. But we are using this feature in
production. Will look at whats happening and update on this thread.

Rajat, Can you help to figure out whats the issue here?

Thanks

On Sat, Aug 6, 2016 at 2:26 AM, Tao Yan <ty...@linkedin.com> wrote:

> BTW, the query end up with empty result other than the following result in
> the 20 minutes demo:
>
> -----------my query result------------
> customer_city.name    _c1
> Result available in memory, attaching here:
>
> 0 rows processed in (57) seconds.
>
>
> ------------demo query result-----------
> customer_city.name    _c1
>
>
> Bangalore    0.0
>
>
> Hyderabad    2.0
>
>
> 2 rows process in (8) seconds.
>
> On Fri, Aug 5, 2016 at 1:37 PM, Tao Yan <ty...@linkedin.com> wrote:
>
>> Hi Lens Developers,
>>
>> I am running the example queries, and found the following Lens query:
>>
>> query execute cube select customer_city_name, store_cost from sales where
>> time_range_in(order_time, '2015-04-13-03', '2015-04-13-04')
>>
>> Was converted to the following hive Query:
>>
>> SELECT ( customer_city . name ), sum(( sales . store_cost )) FROM
>> local_sales_aggr_fact2 sales join local_city_table customer_city on
>> sales.customer_city_id = customer_city.id and (customer_city.dt =
>> 'latest') WHERE ((((*(( sales . dt ) =  '2015-04-13-04' ) or (( sales .
>> dt ) =  '2015-04-13-05' ))* and ((( *sales . order_time* ) >=
>>  '2015-04-13 03:00:00' ) and (( *sales . order_time* ) <  '2015-04-13
>> 04:00:00' ))))) GROUP BY ( customer_city . name )
>>
>> The property shows (! grep -A 33333333 <x_ examples/resources/sales-cube.
>> xml):
>>   <properties>
>>     <property name="cube.sales.timed.dimensions.list"
>> value="order_time,delivery_time" />
>>     *<property name="cube.timedim.partition.order_time" value="ot" />*
>>     <property name="cube.timedim.partition.delivery_time" value="dt" />
>>     *<property name="cube.timedim.relation.order_time"
>> value="delivery_time+[-20 days,-1 hour]" />*
>>     <!-- means dt-20days <= ot <= dt-1hour -->
>>   </properties>
>>
>>
>> Which means that "order time can be between delivery time -20 days to
>> delivery time - 1 hour" according to the 20 minutes demo.
>>
>> However, the Hive query's 'where' condition use deliver_time equal '
>> *2015-04-13-04*' or '*2015-04-13-05*', and how does it find the column
>> 'sales.order_time'? Shouldn't it match order_time with partition 'ot'?
>>
>> Thanks,
>> --
>>
>> *Tao Yan*
>> Software Engineer
>> Data Analytics Infrastructure Tools and Services
>>
>>
>>
>> 206.250.5345
>> tyan@linkedin.com
>> https://www.linkedin.com/in/taousc
>>
>
>
>
> --
>
> *Tao Yan*
> Software Engineer
> Data Analytics Infrastructure Tools and Services
>
>
>
> 206.250.5345
> tyan@linkedin.com
> https://www.linkedin.com/in/taousc
>

Re: Lens Query Conversion

Posted by Tao Yan <ty...@linkedin.com>.
BTW, the query end up with empty result other than the following result in
the 20 minutes demo:

-----------my query result------------
customer_city.name    _c1
Result available in memory, attaching here:

0 rows processed in (57) seconds.


------------demo query result-----------
customer_city.name    _c1

Bangalore    0.0

Hyderabad    2.0

2 rows process in (8) seconds.

On Fri, Aug 5, 2016 at 1:37 PM, Tao Yan <ty...@linkedin.com> wrote:

> Hi Lens Developers,
>
> I am running the example queries, and found the following Lens query:
>
> query execute cube select customer_city_name, store_cost from sales where
> time_range_in(order_time, '2015-04-13-03', '2015-04-13-04')
>
> Was converted to the following hive Query:
>
> SELECT ( customer_city . name ), sum(( sales . store_cost )) FROM
> local_sales_aggr_fact2 sales join local_city_table customer_city on
> sales.customer_city_id = customer_city.id and (customer_city.dt =
> 'latest') WHERE ((((*(( sales . dt ) =  '2015-04-13-04' ) or (( sales .
> dt ) =  '2015-04-13-05' ))* and ((( *sales . order_time* ) >=
>  '2015-04-13 03:00:00' ) and (( *sales . order_time* ) <  '2015-04-13
> 04:00:00' ))))) GROUP BY ( customer_city . name )
>
> The property shows (! grep -A 33333333 <x_ examples/resources/sales-cube.
> xml):
>   <properties>
>     <property name="cube.sales.timed.dimensions.list"
> value="order_time,delivery_time" />
>     *<property name="cube.timedim.partition.order_time" value="ot" />*
>     <property name="cube.timedim.partition.delivery_time" value="dt" />
>     *<property name="cube.timedim.relation.order_time"
> value="delivery_time+[-20 days,-1 hour]" />*
>     <!-- means dt-20days <= ot <= dt-1hour -->
>   </properties>
>
>
> Which means that "order time can be between delivery time -20 days to
> delivery time - 1 hour" according to the 20 minutes demo.
>
> However, the Hive query's 'where' condition use deliver_time equal '
> *2015-04-13-04*' or '*2015-04-13-05*', and how does it find the column
> 'sales.order_time'? Shouldn't it match order_time with partition 'ot'?
>
> Thanks,
> --
>
> *Tao Yan*
> Software Engineer
> Data Analytics Infrastructure Tools and Services
>
>
>
> 206.250.5345
> tyan@linkedin.com
> https://www.linkedin.com/in/taousc
>



-- 

*Tao Yan*
Software Engineer
Data Analytics Infrastructure Tools and Services



206.250.5345
tyan@linkedin.com
https://www.linkedin.com/in/taousc