Posted to dev@lens.apache.org by "Puneet Gupta (JIRA)" <ji...@apache.org> on 2016/12/13 02:59:59 UTC

[jira] [Commented] (LENS-1381) Support Fact to Fact Union

    [ https://issues.apache.org/jira/browse/LENS-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743967#comment-15743967 ] 

Puneet Gupta commented on LENS-1381:
------------------------------------

Design for this requirement:

*Current rewrite flow*
- Currently the rewrite flow relies on Set<CandidateFact> and Set<Set<CandidateFact>>, which represent, respectively, the participating facts and the combinations of facts (in case of a join between two or more facts) that can answer the user query.

- Set<CandidateFact> is initially populated on the assumption that all facts will participate, and Set<Set<CandidateFact>> is created based on the joins required to answer the query (assuming that two facts can be joined if they have the dimensions being queried by the user; after joining the facts, the queried measures, which are split across facts, are picked). Along the rewrite flow, these data structures are pruned based on column availability, data availability, storage validity, fact validity, cost, etc. Finally, a CandidateFact combination is picked from Set<Set<CandidateFact>>.

- To write the rewritten query for the picked candidate combination, one of the following contexts is created:
-- SingleFactSingleStorageHQLContext (candidate combination has a single fact and a single storage)
-- SingleFactMultiStorageHQLContext (candidate combination has a single fact and multiple storages within that fact - Union Query)
-- MultiFactHQLContext (candidate combination has multiple facts - Join Query)
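
To make the choice between the three contexts concrete, here is a minimal sketch of the decision as described above. It is not Lens code: ContextChoiceSketch, ContextKind, PickedFact and chooseContext are hypothetical names invented for illustration, and the sketch assumes every picked fact carries at least one storage.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: these types are illustrative stand-ins, not Lens classes.
final class ContextChoiceSketch {

    /** Names mirror the three HQL contexts listed above. */
    enum ContextKind {
        SINGLE_FACT_SINGLE_STORAGE,
        SINGLE_FACT_MULTI_STORAGE,
        MULTI_FACT
    }

    /** A picked fact together with the storages of that fact chosen for the query. */
    static final class PickedFact {
        final String name;
        final Set<String> storages;

        PickedFact(String name, Set<String> storages) {
            // Assumption: a fact without any storage cannot be part of a picked combination.
            if (storages.isEmpty()) {
                throw new IllegalArgumentException("fact needs at least one storage");
            }
            this.name = name;
            this.storages = Collections.unmodifiableSet(new HashSet<>(storages));
        }
    }

    /** Maps the shape of the picked combination to the kind of context to create. */
    static ContextKind chooseContext(List<PickedFact> combination) {
        if (combination.isEmpty()) {
            throw new IllegalArgumentException("no candidate combination picked");
        }
        if (combination.size() > 1) {
            return ContextKind.MULTI_FACT; // several facts: join query
        }
        return combination.get(0).storages.size() > 1
            ? ContextKind.SINGLE_FACT_MULTI_STORAGE   // one fact, several storages: union query
            : ContextKind.SINGLE_FACT_SINGLE_STORAGE; // one fact, one storage
    }

    public static void main(String[] args) {
        PickedFact sales = new PickedFact("sales_fact", new HashSet<>(Arrays.asList("hdfs", "db")));
        System.out.println(chooseContext(Collections.singletonList(sales))); // prints SINGLE_FACT_MULTI_STORAGE
    }
}
```

Note that none of the three shapes covers a union across facts, which is the gap this JIRA addresses.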

*New Flow*
# The new flow will work at the storage level and will use a list of StorageCandidates. Initially all storages are candidates.
# The list of StorageCandidates is pruned based on column availability, storage validity, fact validity, update period validity, etc.
# The StorageCandidates are then grouped so that each group can cover the entire time range queried by the user. It is possible for a group to have a single StorageCandidate in case this storage alone can cover the queried time ranges. If a group has more than one storage, then this group is represented as a UnionCandidate.
# The groups created in step 3 (UnionCandidates and StorageCandidates) are used to find a measure-covering group such that the members of this group cover all the measures queried by the user. Again, it is possible for this group to have a single member (which can be a StorageCandidate or a UnionCandidate) that can answer all the measures. If the group has more than one member, then that group is represented as a JoinCandidate.
# JoinCandidate, UnionCandidate and StorageCandidate extend the same Candidate interface.
# The groups created in step 4 are further pruned based on data availability, cost, etc., and a winning group (Candidate) is picked.
# The query is then written for this winning Candidate.
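
The candidate hierarchy from steps 3 to 5 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed Lens implementation: the method names measures, coversRange and answers are hypothetical, time ranges are simplified to half-open integer intervals [start, end), and the sketch assumes a UnionCandidate can answer only the measures common to all of its storages, while a JoinCandidate answers the union of its members' measures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the Candidate hierarchy; names and signatures are assumptions.
final class CandidateSketch {

    /** Common view shared by all three candidate kinds (step 5). */
    interface Candidate {
        Set<String> measures();                  // measures this candidate can answer
        boolean coversRange(int start, int end); // half-open range [start, end)
    }

    /** A single storage of a fact, valid for [from, to). */
    static final class StorageCandidate implements Candidate {
        final String name; final int from; final int to; final Set<String> measures;

        StorageCandidate(String name, int from, int to, String... measures) {
            this.name = name; this.from = from; this.to = to;
            this.measures = new HashSet<>(Arrays.asList(measures));
        }
        public Set<String> measures() { return new HashSet<>(measures); }
        public boolean coversRange(int start, int end) { return from <= start && end <= to; }
    }

    /** Several storages that together cover the queried time range (step 3). */
    static final class UnionCandidate implements Candidate {
        final List<StorageCandidate> children;

        UnionCandidate(StorageCandidate... children) {
            if (children.length == 0) throw new IllegalArgumentException("empty union");
            this.children = new ArrayList<>(Arrays.asList(children));
        }
        public Set<String> measures() {
            // Assumption: every member must be able to answer a measure for the union to answer it.
            Set<String> common = children.get(0).measures();
            for (StorageCandidate c : children) common.retainAll(c.measures);
            return common;
        }
        public boolean coversRange(int start, int end) {
            List<StorageCandidate> sorted = new ArrayList<>(children);
            sorted.sort(Comparator.comparingInt((StorageCandidate c) -> c.from));
            int covered = start; // everything in [start, covered) is covered so far
            for (StorageCandidate c : sorted) {
                if (c.from > covered) break; // gap that no later storage can fill
                covered = Math.max(covered, c.to);
            }
            return covered >= end;
        }
    }

    /** Several candidates that together cover the queried measures (step 4). */
    static final class JoinCandidate implements Candidate {
        final List<Candidate> children;

        JoinCandidate(Candidate... children) { this.children = Arrays.asList(children); }
        public Set<String> measures() {
            Set<String> all = new HashSet<>();
            for (Candidate c : children) all.addAll(c.measures());
            return all;
        }
        public boolean coversRange(int start, int end) {
            for (Candidate c : children) if (!c.coversRange(start, end)) return false;
            return true;
        }
    }

    /** True if the candidate covers both the queried measures and the queried time range. */
    static boolean answers(Candidate c, Set<String> queried, int start, int end) {
        return c.measures().containsAll(queried) && c.coversRange(start, end);
    }

    public static void main(String[] args) {
        Candidate union = new UnionCandidate(
            new StorageCandidate("fact1_hdfs", 0, 10, "m1"),
            new StorageCandidate("fact1_db", 10, 30, "m1"));
        Candidate join = new JoinCandidate(union, new StorageCandidate("fact2_hdfs", 0, 30, "m2"));
        System.out.println(answers(join, new HashSet<>(Arrays.asList("m1", "m2")), 0, 30)); // prints true
    }
}
```

Because all three kinds share one interface, later pruning and query writing can treat a single storage, a union, or a join of unions uniformly.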


 



> Support Fact to Fact Union
> --------------------------
>
>                 Key: LENS-1381
>                 URL: https://issues.apache.org/jira/browse/LENS-1381
>             Project: Apache Lens
>          Issue Type: New Feature
>            Reporter: Puneet Gupta
>
> Currently Lens supports Union-ing data across different storages in a single Fact. With this JIRA Lens server will be able to Union Data Across Facts too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)