Posted to dev@lens.apache.org by "Rajat Khandelwal (JIRA)" <ji...@apache.org> on 2015/06/09 12:53:00 UTC
[jira] [Commented] (LENS-582) in lens query fact table update period weekly throws error if start and end date is not sunday
[ https://issues.apache.org/jira/browse/LENS-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578715#comment-14578715 ]
Rajat Khandelwal commented on LENS-582:
---------------------------------------
The first queried range, 2015-01-19 to 2015-01-21, can't be answered by weekly facts, Sunday or not: it is shorter than a week.
As far as the "Sunday" issue is concerned, Lens just requires you to roll up your weekly data on this boundary, and that is all. There are no other restrictions.
I hope your data model is not creating one fact for each update period. I understand that's a valid case, but a more practical one is that the weekly data is just a rollup of the daily data. The fact would then have both daily and weekly update periods, and any query on day boundaries would be answerable.
The error occurs because Lens first checks whether a fact can cover the given range. A fact can cover the range only if its update periods can cover it. A fact that has only a weekly update period can't be expected to cover arbitrary day-boundary ranges. A fact that has both daily and weekly update periods *can*.
So if you create the fact with two update periods, it can cover any day-boundary range. Partitions will then be picked accordingly: weekly partitions will be used as much as possible, and the rest will be covered by daily partitions.
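To illustrate the coverage check and partition picking described above, here is a minimal sketch in Python. It is not Lens code: the function names `week_start` and `cover` are hypothetical, and it assumes that weeks start on Sunday and that the end of the range is exclusive.

```python
from datetime import date, timedelta


def week_start(d):
    """Return the Sunday on or before d (assumption: weeks start on Sunday)."""
    # date.weekday() gives Monday=0 ... Sunday=6
    return d - timedelta(days=(d.weekday() + 1) % 7)


def cover(start, end, update_periods):
    """Cover [start, end) with partitions, preferring WEEKLY over DAILY.

    Returns a list of (period, partition_start) tuples, or None when the
    given update periods cannot cover the whole range.
    Hypothetical helper, not the actual Lens implementation.
    """
    parts = []
    cur = start
    while cur < end:
        fits_week = week_start(cur) == cur and cur + timedelta(days=7) <= end
        if "WEEKLY" in update_periods and fits_week:
            # A whole Sunday-aligned week fits: use one weekly partition.
            parts.append(("WEEKLY", cur))
            cur += timedelta(days=7)
        elif "DAILY" in update_periods:
            # Otherwise fall back to a single daily partition.
            parts.append(("DAILY", cur))
            cur += timedelta(days=1)
        else:
            # No update period can cover this day: the fact is not a candidate.
            return None
    return parts
```

Under these assumptions, `cover(date(2015, 1, 19), date(2015, 1, 21), {"WEEKLY"})` returns `None`, which mirrors the "no fact update periods for given range" failure, while the same call over 2015-02-08 to 2015-03-01 returns three weekly partitions. With `{"DAILY", "WEEKLY"}`, a range such as 2015-02-06 to 2015-02-17 is covered by two daily partitions, one weekly partition, and two more daily partitions.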
With this knowledge, you can even game the system by registering daily as an update period without adding any actual daily partitions. You'd have to set fail on partial = false while querying, and Lens will form the weekly sub-range for you. Anything before the first Sunday and after the last Sunday will be ignored.
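As a rough illustration of which part of a range would survive in that setup, the sketch below trims a day-boundary range to its Sunday-aligned sub-range. The helper name `weekly_subrange` is hypothetical, and the same assumptions apply (weeks start on Sunday, end date exclusive); it only models the arithmetic, not what Lens does internally.

```python
from datetime import date, timedelta


def weekly_subrange(start, end):
    """Trim [start, end) to the largest Sunday-to-Sunday sub-range.

    Returns (first_sunday, last_sunday), or None if no full week fits.
    Hypothetical helper; assumes weeks start on Sunday.
    """
    # date.weekday() gives Monday=0 ... Sunday=6
    first = start + timedelta(days=(6 - start.weekday()) % 7)  # Sunday on/after start
    last = end - timedelta(days=(end.weekday() + 1) % 7)       # Sunday on/before end
    return (first, last) if first < last else None
```

For example, a query over 2015-02-06 to 2015-02-17 would, under these assumptions, be answered from the week starting 2015-02-08 only, while 2015-01-19 to 2015-01-21 contains no full week and yields `None`.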
> in lens query fact table update period weekly throws error if start and end date is not sunday
> ----------------------------------------------------------------------------------------------
>
> Key: LENS-582
> URL: https://issues.apache.org/jira/browse/LENS-582
> Project: Apache Lens
> Issue Type: Bug
> Components: build
> Reporter: Biru Kumar
> Assignee: Rajat Khandelwal
>
> Lens query
> {noformat}
> lens-shell>query execute cube select avg(servedImpressions) from user_activity where time_range_in(dt, '2015-01-19', '2015-01-21')
> Launching query failed cause: Driver :org.apache.lens.driver.hive.HiveDriver Cause :No candidate fact table available to answer the query, because {"brief":"No fact update periods for given range","details":{"user_attributestore_er_fact_supply_site_burn,user_attributestore_er_fact_demandcategory_click,user_attributestore_er_fact_supplycategory_visits,user_attributestore_er_fact_supply_site_impressions_rendered,user_attributestore_er_fact_adgroup_click,user_attributestore_er_fact_adgroup_impression_time_install,user_attributestore_er_fact_app_impression_time_install,user_attributestore_er_fact_supply_site_impressions_served,user_attributestore_er_fact_adgroup_burn,hive_fact_user_curation_good_traffic,user_attributestore_er_fact_app_visits,user_attributestore_er_fact_app_click,user_attributestore_er_fact_supply_site_click,user_attributestore_er_fact_adgroup_impressions_rendered":[{"cause":"COLUMN_NOT_FOUND","missingColumns":["servedimpressions"]}],"user_attributestore_er_fact_adgroup_view":[{"cause":"NO_FACT_UPDATE_PERIODS_FOR_GIVEN_RANGE"}]}}
> {noformat}
> The fact table user_attributestore_er_fact_adgroup_view has the column servedImpressions, and its update period is weekly.
> In the above query I have selected start and end dates that do not fall on a Sunday.
> Below is the definition of the fact table user_attributestore_er_fact_adgroup_view:
> {noformat}
> lens-shell>describe fact user_attributestore_er_fact_adgroup_view
> columns :
> column :
> name : userid type : string
> name : timestamp type : timestamp
> name : adgroupguid type : string
> name : servedimpressions type : bigint
> properties :
> property :
> name : cube.table.user_attributestore_er_fact_adgroup_view.weight value : 0.1
> name : cube.table.type value : FACT
> name : cube.fact.user_attributestore_er_fact_adgroup_view.uh1_hdfs.updateperiods value : WEEKLY
> name : cube.fact.is.aggregated value : false
> name : cube.fact.user_attributestore_er_fact_adgroup_view.cubename value : user_activity
> name : transient_lastDdlTime value : 1431973737
> name : cube.fact.user_attributestore_er_fact_adgroup_view.storages value : uh1_hdfs
> storageTables :
> storageTable :
> updatePeriods : updatePeriod : WEEKLY
> storageName : uh1_hdfs tableDesc : partCols : column :
> name : dt type : string comment : Date partition field
> tableParameters : property :
> name : cube.storagetable.partition.timeline.cache.WEEKLY.dt.latest value : 2015-W08 name : conf.attributestore.schema value : \granularity\:\weekly_dailycumulative\\erinfo\:\source\:null\erid\:\8589934594\\entityType\:\ADGROUP\\entityAlias\:\adGroupGuid\\relationshipType\:\VIEW\\fields\:\alias\:\servedImpressions\\name\:\count\\type\:\LONG\
> \stores\:\viewvisit\
> name : cube.storagetable.partition.timeline.cache.present value : true name : EXTERNAL value : TRUE name : cube.storagetable.time.partcols value : dt name : cube.storagetable.partition.timeline.cache.WEEKLY.dt.first value : 2015-W07 name : transient_lastDdlTime value : 1432301397 name : cube.storagetable.partition.timeline.cache.WEEKLY.dt.holes.size value : 0 name : conf.debugmode value : true name : cube.storagetable.partition.timeline.cache.WEEKLY.dt.storage.class value : org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
> serdeParameters : property :
> name : serialization.format value : 1
> timePartCols : dt
> external : true tableLocation : hdfs://hostname:8020/user/hive/warehouse/user.db/uh1_hdfs_user_attributestore_er_fact_adgroup_view inputFormat : com.inmobi.user.analytics.storage.inputformat.UserAttributeThriftInputFormat outputFormat : org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serdeClassName : com.inmobi.user.analytics.storage.serde.UserAttributeThriftSerde storageHandlerName : numBuckets : 0 compressed : false
> name : user_attributestore_er_fact_adgroup_view
> cubeName : user_activity
> weight : 0.1
> {noformat}
> However, the query below runs successfully because its start and end dates are Sundays.
> {noformat}
> lens-shell>query execute cube select avg(servedImpressions) from user_activity where time_range_in(dt, '2015-02-08', '2015-03-01')
> _c0
> 2.7083333333333335
> 1 rows process in (20) seconds.
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)