Posted to issues@kylin.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2019/12/06 12:37:00 UTC

[jira] [Commented] (KYLIN-4167) Refactor streaming coordinator

    [ https://issues.apache.org/jira/browse/KYLIN-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989711#comment-16989711 ] 

ASF GitHub Bot commented on KYLIN-4167:
---------------------------------------

nichunen commented on pull request #961: KYLIN-4167 Phase2
URL: https://github.com/apache/kylin/pull/961
 
 
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Refactor streaming coordinator
> ------------------------------
>
>                 Key: KYLIN-4167
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4167
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Real-time Streaming
>            Reporter: Xiaoxiang Yu
>            Assignee: Xiaoxiang Yu
>            Priority: Major
>             Fix For: v3.0.0
>
>
> h2. Summary
>  # Currently, the *coordinator* has too many responsibilities, which violates the single responsibility principle and makes it hard to extend. A clear separation of responsibilities is recommended.
>  # Some cluster-level operations have no atomicity guarantee; we should implement them in an idempotent way to achieve eventual consistency.
>  # Resubmit the build job when it has been discarded.
>  # Clarify the overall design for realtime OLAP.
>  
> h4. StreamingCoordinator
> Facade of the coordinator; it controls BuildJobSubmitter and ReceiverClusterManager and delegates operations to them.
> h4. BuildJobSubmitter
> The main responsibilities of BuildJobSubmitter include:
> 1. Try to find candidate segments that are ready to have a build job submitted
> 2. Trace the status of each candidate segment's build job and promote the segment once it has met the requirements
> h4. ReceiverClusterManager
> This class manages operations that involve multiple streaming receivers. They are often not atomic, but may be idempotent.
> h4. ClusterStateChecker
> Basic steps of this class:
> 1. Stop/pause the coordinator to avoid underlying concurrency issues
> 2. Check for inconsistent state across the whole receiver cluster
> 3. Send a summary by mail to the Kylin admin
> 4. If needed, call ClusterDoctor to repair the inconsistencies
> h4. ClusterDoctor
> Repairs inconsistent state according to the result of ClusterStateChecker.
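The four ClusterStateChecker steps, with ClusterDoctor as the optional last step, can be sketched roughly as below. This is only an illustration: `ClusterCheckSketch`, its nested interfaces, and `runCheck` are hypothetical names, not Kylin classes, and resuming the coordinator in a `finally` block is an assumption the description does not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the check-then-repair flow; none of these types exist in Kylin.
final class ClusterCheckSketch {

    /** Stand-in for the coordinator, reduced to the two calls this flow needs. */
    interface Coordinator {
        void pause();
        void resume();
    }

    /** Stand-in for the logic that compares expected and actual receiver state. */
    interface Inspector {
        List<String> findInconsistencies();
    }

    static List<String> runCheck(Coordinator coordinator,
                                 Inspector inspector,
                                 Consumer<String> mailer,
                                 Consumer<List<String>> doctor) {
        // Step 1: pause the coordinator so nothing changes while we inspect.
        coordinator.pause();
        try {
            // Step 2: collect inconsistent state across the receiver cluster.
            List<String> issues = new ArrayList<>(inspector.findInconsistencies());

            // Step 3: send a summary to the admin, whether or not issues were found.
            String summary = issues.isEmpty()
                    ? "cluster state is consistent"
                    : issues.size() + " inconsistent item(s): " + issues;
            mailer.accept(summary);

            // Step 4: only call the doctor when there is something to repair.
            if (!issues.isEmpty()) {
                doctor.accept(issues);
            }
            return issues;
        } finally {
            // Assumption: the coordinator is resumed even if a step throws.
            coordinator.resume();
        }
    }
}
```

Passing the mailer and the doctor as plain callbacks keeps the checker free of repair logic, which matches the separation of responsibilities the summary asks for.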
>  
> ----
> h3. Candidate Segment
> Candidate segments are those segments that can be seen/perceived by the streaming coordinator.
> A candidate segment can be in one of the following states/queues:
> 1. segment whose data is *PARTLY* uploaded
> 2. segment whose data is completely uploaded and is *WAITING* to be built
> 3. segment in the *BUILDING* state; the job's state should be one of (NEW/RUNNING/ERROR/DISCARD)
> 4. segment that was built *successfully* and is waiting to be delivered to the historical part (and to be deleted from the realtime part)
> 5. segment that is *in the historical part* (HBase Ready Segment)
>  
> By design, a segment should move to the next queue sequentially (it must not jump the queue); do not break this rule.
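One way to enforce the "no queue jumping" rule is to model the five states as an ordered enum and reject any transition that skips a step. This is a minimal sketch, not the Kylin implementation: `CandidateSegmentState`, `CandidateSegment`, and the constants `BUILT` and `HISTORICAL` are hypothetical names chosen for states 4 and 5 of the list above.

```java
// Hypothetical sketch; the enum order mirrors the five queues listed above.
enum CandidateSegmentState {
    PARTLY,      // 1. data partly uploaded
    WAITING,     // 2. data fully uploaded, waiting to be built
    BUILDING,    // 3. build job submitted (job may be NEW/RUNNING/ERROR/DISCARD)
    BUILT,       // 4. built successfully, waiting to be delivered to the historical part
    HISTORICAL;  // 5. in the historical part (HBase Ready Segment)

    /** Returns the only state this one may move to, or null for the last state. */
    CandidateSegmentState next() {
        int i = ordinal() + 1;
        return i < values().length ? values()[i] : null;
    }

    /** A transition is legal only when it goes to the immediately following queue. */
    boolean canMoveTo(CandidateSegmentState target) {
        return target != null && target == next();
    }
}

final class CandidateSegment {
    private final String name;
    private CandidateSegmentState state = CandidateSegmentState.PARTLY;

    CandidateSegment(String name) {
        this.name = name;
    }

    CandidateSegmentState getState() {
        return state;
    }

    /** Moves the segment forward by exactly one queue, or fails without changing state. */
    void moveTo(CandidateSegmentState target) {
        if (!state.canMoveTo(target)) {
            throw new IllegalStateException(
                    name + ": illegal transition " + state + " -> " + target);
        }
        state = target;
    }
}
```

With this shape, a caller that tries to promote a `WAITING` segment straight to `HISTORICAL` gets an `IllegalStateException` and the segment stays in `WAITING`.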
> h3. Atomicity
> In a multi-step transaction, the following aspects should be considered carefully:
> 1. whether to *fail fast* or continue when an exception is thrown
> 2. whether an API (remote call) should be *synchronous* or asynchronous
> 3. when a transaction fails, whether *rollback* can always succeed
> 4. a transaction should be *idempotent*, so that when it fails it can be fixed by retrying
>  
> How can we keep the whole cluster operating smoothly, without blocking problems? I divided all multi-step transactions into three kinds:
> NotAtomicIdempotent
> NotAtomicAndNotIdempotent
> NonSideEffect
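The three kinds could be attached to operations as marker annotations, so that a caller can decide mechanically whether a failed operation is safe to retry. The sketch below is an assumption about how such a classification might be used, not a description of the pull request: the annotation form, `RetryPolicy`, `DemoClusterOps`, and its method names are all hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.function.Supplier;

// Marker annotations named after the three kinds above (the annotation form is an assumption).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface NotAtomicIdempotent {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface NotAtomicAndNotIdempotent {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface NonSideEffect {}

// Hypothetical helper: decides whether an operation may be retried, and retries it.
final class RetryPolicy {

    /** Idempotent and side-effect-free operations can be repeated safely. */
    static boolean isSafeToRetry(Class<?> owner, String methodName) {
        try {
            Method m = owner.getDeclaredMethod(methodName);
            return m.isAnnotationPresent(NotAtomicIdempotent.class)
                    || m.isAnnotationPresent(NonSideEffect.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + methodName, e);
        }
    }

    /** Calls op until it succeeds or maxAttempts is reached, then rethrows the last failure. */
    static <T> T callWithRetry(Supplier<T> op, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }
}

// Hypothetical operations, only here to show the annotations in use.
class DemoClusterOps {
    @NotAtomicIdempotent
    void assignCube() {}

    @NotAtomicAndNotIdempotent
    void reBalance() {}

    @NonSideEffect
    void listReceivers() {}
}
```

An operation marked `NotAtomicAndNotIdempotent` is deliberately excluded from `isSafeToRetry`: repeating it after a partial failure could apply the same change twice, so it would need manual inspection or a dedicated repair step instead.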



--
This message was sent by Atlassian Jira
(v8.3.4#803005)