Posted to dev@lens.apache.org by "Rajat Khandelwal (JIRA)" <ji...@apache.org> on 2015/06/09 12:53:00 UTC

[jira] [Commented] (LENS-582) in lens query fact table update period weekly throws error if start and end date is not sunday

    [ https://issues.apache.org/jira/browse/LENS-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578715#comment-14578715 ] 

Rajat Khandelwal commented on LENS-582:
---------------------------------------

The first queried range, 2015-01-19 to 2015-01-21, can't be answered by weekly facts, Sunday or not: it spans less than a week.

As far as the "Sunday" issue is concerned, Lens only requires that you roll up your weekly data on this boundary. There are no other restrictions.

I hope your data model is not creating one fact for each update period. I understand that's a valid case, but a more practical case is that the weekly data is just a rollup of the daily data. The fact would then have both daily and weekly update periods, and any range on day boundaries would be queryable.

The error occurs because Lens first checks whether a fact can cover the given range. A fact can cover the range only if its update periods can cover the range. A fact that has only a weekly update period can't be expected to cover arbitrary day-boundary ranges. A fact that has both daily and weekly update periods *can*.
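To make the coverage check concrete, here is a minimal sketch in Python. It is a simplified model of the rule described above, not Lens code: the function name can_cover, the string update-period names, and the end-exclusive range are assumptions made for illustration.

```python
from datetime import date

def is_sunday(d):
    # date.weekday(): Monday is 0, Sunday is 6
    return d.weekday() == 6

def can_cover(update_periods, start, end):
    """Hypothetical model: can a fact with these update periods
    cover the end-exclusive range [start, end)?"""
    if start >= end:
        return False
    if "DAILY" in update_periods:
        # Daily partitions can tile any day-boundary range.
        return True
    if "WEEKLY" in update_periods:
        # Weekly-only facts need both ends on the Sunday boundary.
        return is_sunday(start) and is_sunday(end)
    return False
```

Under this model, a weekly-only fact rejects 2015-01-19 to 2015-01-21 and accepts 2015-02-08 to 2015-03-01, which matches the two queries in the issue description. A fact with both daily and weekly update periods accepts either range.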

So if you create the fact with two update periods, it can cover any day-boundary range. Partitions will then be picked accordingly: weekly partitions are used wherever possible, and the rest of the range is covered by daily partitions.
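The partition picking described above can be sketched as a greedy split. This is an illustrative Python model, not the actual Lens implementation; pick_partitions is a hypothetical name, weeks are assumed to start on Sunday, and the range is assumed to be end-exclusive.

```python
from datetime import date, timedelta

def pick_partitions(start, end):
    """Hypothetical sketch: split [start, end) into weekly partitions
    (starting on Sunday) where a full week fits, and daily ones elsewhere."""
    parts = []
    cur = start
    week = timedelta(days=7)
    while cur < end:
        # weekday() == 6 means Sunday
        if cur.weekday() == 6 and cur + week <= end:
            parts.append(("WEEKLY", cur))
            cur += week
        else:
            parts.append(("DAILY", cur))
            cur += timedelta(days=1)
    return parts
```

For example, the range 2015-02-06 to 2015-02-17 would be split into daily partitions for the 6th and 7th, one weekly partition starting on Sunday the 8th, and daily partitions for the 15th and 16th.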

With this knowledge, you can even game the system by registering daily as an update period without adding any actual daily partitions. You'd have to set fail on partial = false while querying, and Lens will form the weekly sub-range for you. Anything before the first Sunday and after the last Sunday in the range will be ignored.
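As a rough illustration of what that weekly sub-range would look like, the Python sketch below trims a range to its enclosing Sunday boundaries. It only models the behaviour described above and is not taken from Lens; weekly_subrange is a hypothetical name and the range is assumed to be end-exclusive.

```python
from datetime import date, timedelta

def weekly_subrange(start, end):
    """Hypothetical sketch: trim [start, end) to the first Sunday on or
    after start and the last Sunday on or before end."""
    # weekday(): Monday is 0, Sunday is 6
    first = start + timedelta(days=(6 - start.weekday()) % 7)
    last = end - timedelta(days=(end.weekday() + 1) % 7)
    # No full week fits if the trimmed range is empty.
    return (first, last) if first < last else None
```

With this model, 2015-02-06 to 2015-02-17 trims to 2015-02-08 to 2015-02-15, while 2015-01-19 to 2015-01-21 contains no full week and yields None.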



> in lens query fact table update period weekly throws error if start and end date is not sunday
> ----------------------------------------------------------------------------------------------
>
>                 Key: LENS-582
>                 URL: https://issues.apache.org/jira/browse/LENS-582
>             Project: Apache Lens
>          Issue Type: Bug
>          Components: build
>            Reporter: Biru Kumar
>            Assignee: Rajat Khandelwal
>
> Lens query
> {noformat}
> lens-shell>query execute cube select  avg(servedImpressions) from user_activity where time_range_in(dt, '2015-01-19', '2015-01-21')
> Launching query failed cause: Driver :org.apache.lens.driver.hive.HiveDriver Cause :No candidate fact table available to answer the query, because {"brief":"No fact update periods for given range","details":{"user_attributestore_er_fact_supply_site_burn,user_attributestore_er_fact_demandcategory_click,user_attributestore_er_fact_supplycategory_visits,user_attributestore_er_fact_supply_site_impressions_rendered,user_attributestore_er_fact_adgroup_click,user_attributestore_er_fact_adgroup_impression_time_install,user_attributestore_er_fact_app_impression_time_install,user_attributestore_er_fact_supply_site_impressions_served,user_attributestore_er_fact_adgroup_burn,hive_fact_user_curation_good_traffic,user_attributestore_er_fact_app_visits,user_attributestore_er_fact_app_click,user_attributestore_er_fact_supply_site_click,user_attributestore_er_fact_adgroup_impressions_rendered":[{"cause":"COLUMN_NOT_FOUND","missingColumns":["servedimpressions"]}],"user_attributestore_er_fact_adgroup_view":[{"cause":"NO_FACT_UPDATE_PERIODS_FOR_GIVEN_RANGE"}]}}
> {noformat}
> The fact table user_attributestore_er_fact_adgroup_view has the column servedImpressions, and its update period is weekly.
> In the above query I have selected start and end dates that do not fall on a Sunday.
> Below is the definition of the fact table user_attributestore_er_fact_adgroup_view:
> {noformat}
> lens-shell>describe fact user_attributestore_er_fact_adgroup_view
> columns :
> column :
>    name : userid  type : string
>    name : timestamp  type : timestamp
>    name : adgroupguid  type : string
>    name : servedimpressions  type : bigint
> properties :
> property :
>    name : cube.table.user_attributestore_er_fact_adgroup_view.weight  value : 0.1
>    name : cube.table.type  value : FACT
>    name : cube.fact.user_attributestore_er_fact_adgroup_view.uh1_hdfs.updateperiods  value : WEEKLY
>    name : cube.fact.is.aggregated  value : false
>    name : cube.fact.user_attributestore_er_fact_adgroup_view.cubename  value : user_activity
>    name : transient_lastDdlTime  value : 1431973737
>    name : cube.fact.user_attributestore_er_fact_adgroup_view.storages  value : uh1_hdfs
> storageTables :
> storageTable :
>    updatePeriods :   updatePeriod :  WEEKLY
>     storageName : uh1_hdfs  tableDesc :   partCols :   column :
>    name : dt  type : string  comment : Date partition field
>     tableParameters :   property :
>    name : cube.storagetable.partition.timeline.cache.WEEKLY.dt.latest  value : 2015-W08     name : conf.attributestore.schema  value : \granularity\:\weekly_dailycumulative\\erinfo\:\source\:null\erid\:\8589934594\\entityType\:\ADGROUP\\entityAlias\:\adGroupGuid\\relationshipType\:\VIEW\\fields\:\alias\:\servedImpressions\\name\:\count\\type\:\LONG\
> \stores\:\viewvisit\
>      name : cube.storagetable.partition.timeline.cache.present  value : true     name : EXTERNAL  value : TRUE     name : cube.storagetable.time.partcols  value : dt     name : cube.storagetable.partition.timeline.cache.WEEKLY.dt.first  value : 2015-W07     name : transient_lastDdlTime  value : 1432301397     name : cube.storagetable.partition.timeline.cache.WEEKLY.dt.holes.size  value : 0     name : conf.debugmode  value : true     name : cube.storagetable.partition.timeline.cache.WEEKLY.dt.storage.class  value : org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>     serdeParameters :   property :
>    name : serialization.format  value : 1
>     timePartCols :  dt
>   external : true  tableLocation : hdfs://hostname:8020/user/hive/warehouse/user.db/uh1_hdfs_user_attributestore_er_fact_adgroup_view  inputFormat : com.inmobi.user.analytics.storage.inputformat.UserAttributeThriftInputFormat  outputFormat : org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat  serdeClassName : com.inmobi.user.analytics.storage.serde.UserAttributeThriftSerde  storageHandlerName :   numBuckets : 0  compressed : false
> name : user_attributestore_er_fact_adgroup_view
> cubeName : user_activity
> weight : 0.1
> {noformat} 
> However, the query below runs successfully because its start and end dates are both Sundays.
> {noformat}
> lens-shell>query execute cube select avg(servedImpressions) from user_activity where time_range_in(dt, '2015-02-08', '2015-03-01')
> _c0
> 2.7083333333333335
> 1 rows process in (20) seconds.
> {noformat}


