Posted to issues@kylin.apache.org by "Xiaoxiang Yu (Jira)" <ji...@apache.org> on 2019/12/26 06:26:00 UTC
[jira] [Resolved] (KYLIN-4167) Refactor streaming coordinator

     [ https://issues.apache.org/jira/browse/KYLIN-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoxiang Yu resolved KYLIN-4167.
---------------------------------
    Resolution: Fixed

> Refactor streaming coordinator
> ------------------------------
>
>                 Key: KYLIN-4167
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4167
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Real-time Streaming
>            Reporter: Xiaoxiang Yu
>            Assignee: Xiaoxiang Yu
>            Priority: Major
>             Fix For: v3.0.0
>
>
> h2. Summary
>  # Currently, the *coordinator* has too many responsibilities, which violates the single responsibility principle and makes it hard to extend; a clear separation of responsibilities is recommended.
>  # Some cluster-level operations have no atomicity guarantee; we should implement them in an idempotent way to achieve eventual consistency.
>  # Resubmit the build job when it was discarded.
>  # Clarify the overall design for realtime OLAP.
>  
> h4. StreamingCoordinator
> Facade of the coordinator; it controls BuildJobSubmitter/ReceiverClusterManager and delegates operations to them.
> h4. BuildJobSubmitter
> The main responsibilities of BuildJobSubmitter include:
> 1. Try to find candidate segments which are ready to have a build job submitted
> 2. Track the status of each candidate segment's build job and promote the segment once it has met the requirements
> h4. ReceiverClusterManager
> This class manages operations that involve multiple streaming receivers. These operations are often not atomic, but may be idempotent.
> h4. ClusterStateChecker
> The basic steps of this class:
> 1. stop/pause the coordinator to avoid underlying concurrency issues
> 2. check for inconsistent state across the whole receiver cluster
> 3. send a summary via mail to the Kylin admin
> 4. if needed, call ClusterDoctor to repair the inconsistencies
> h4. ClusterDoctor
> Repairs inconsistent state according to the result of ClusterStateChecker.
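
The four ClusterStateChecker steps, ending in the hand-off to ClusterDoctor, can be sketched roughly as follows. This is only an illustration, not Kylin's implementation: `ClusterCheckSketch`, `PausableCoordinator`, and the `mailer`/`doctor` callbacks are hypothetical names, detection of inconsistencies is reduced to a list passed in by the caller, and resuming the coordinator after the check is an assumption that the description does not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class ClusterCheckSketch {

    /** Hypothetical stand-in for the coordinator's pause/resume switch. */
    static final class PausableCoordinator {
        private boolean paused;
        void pause() { paused = true; }
        void resume() { paused = false; }
        boolean isPaused() { return paused; }
    }

    /**
     * Runs one check round and returns the issues that were found.
     * observedIssues stands in for the real inspection of the receiver cluster.
     */
    static List<String> runCheck(PausableCoordinator coordinator,
                                 List<String> observedIssues,
                                 Consumer<String> mailer,
                                 Consumer<List<String>> doctor) {
        coordinator.pause();                                           // step 1
        try {
            List<String> issues = new ArrayList<>(observedIssues);     // step 2
            mailer.accept("inconsistencies found: " + issues.size());  // step 3
            if (!issues.isEmpty()) {
                doctor.accept(issues);                                 // step 4
            }
            return issues;
        } finally {
            // Assumption: the coordinator is resumed even if a step throws.
            coordinator.resume();
        }
    }
}
```

Putting the resume call in a `finally` block is one way to make sure a failed check round does not leave the coordinator paused indefinitely.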
>  
> ----
> h3. Candidate Segment
> Candidate segments are those segments which can be seen/perceived by the streaming coordinator.
> A candidate segment can be in one of the following states/queues:
> 1. segment whose data is *PARTLY* uploaded
> 2. segment whose data is completely uploaded and is *WAITING* to be built
> 3. segment in *BUILDING* state; its job's state should be one of (NEW/RUNNING/ERROR/DISCARD)
> 4. segment which was built *successfully* and waits to be delivered to the historical part (and to be deleted from the realtime part)
> 5. segment which is *in the historical part* (HBase Ready Segment)
>  
> By design, a segment should move to the next queue sequentially (it shouldn't jump the queue); do not break this.
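
The "no queue jumping" rule can be illustrated with a small state machine. The sketch below is an assumption-laden illustration rather than Kylin code: the enum `CandidateSegmentState`, its constant names, and the class `CandidateSegment` are hypothetical and only mirror the five queues listed above.

```java
import java.util.Optional;

/** Hypothetical states mirroring the five queues above, in their required order. */
enum CandidateSegmentState {
    PARTLY_UPLOADED,
    WAITING,
    BUILDING,
    BUILT,
    HISTORICAL;

    /** The only state a segment may enter next; empty for the final state. */
    Optional<CandidateSegmentState> next() {
        int i = ordinal() + 1;
        return i < values().length ? Optional.of(values()[i]) : Optional.empty();
    }

    /** A transition is legal only if it moves exactly one queue forward. */
    boolean canMoveTo(CandidateSegmentState target) {
        return next().map(n -> n == target).orElse(false);
    }
}

/** Hypothetical segment holder that rejects any out-of-order transition. */
final class CandidateSegment {
    private final String name;
    private CandidateSegmentState state = CandidateSegmentState.PARTLY_UPLOADED;

    CandidateSegment(String name) {
        this.name = name;
    }

    CandidateSegmentState state() {
        return state;
    }

    void moveTo(CandidateSegmentState target) {
        if (!state.canMoveTo(target)) {
            throw new IllegalStateException(
                    name + ": illegal transition " + state + " -> " + target);
        }
        state = target;
    }
}
```

With this shape, a call such as `moveTo(HISTORICAL)` on a segment that is still `WAITING` fails immediately instead of silently skipping the build step.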
> h3. Atomicity
> In a multi-step transaction, the following aspects should be considered carefully:
> 1. should it *fail fast* or continue when an exception is thrown?
> 2. should the API (remote call) be *synchronous* or asynchronous?
> 3. when the transaction fails, can the *rollback* always succeed?
> 4. the transaction should be *idempotent*, so that when it fails it can be fixed by a retry
>  
> How can we ensure that the whole cluster operates smoothly without blocking problems? I divided all multi-step transactions into three kinds:
> NotAtomicIdempotent
> NotAtomicAndNotIdempotent
> NonSideEffect
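
To illustrate why the NotAtomicIdempotent kind can be fixed by a retry, here is a minimal sketch. All names (`IdempotentRetry`, `ReceiverAssignment`, the injected failures) are hypothetical and not taken from Kylin; the point is only that an operation whose effect is "make sure X is present" can be repeated safely after a partial failure.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

final class IdempotentRetry {
    /** Re-runs the step until it succeeds or maxAttempts is exhausted. */
    static <T> T retry(int maxAttempts, Supplier<T> step) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return step.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed
    }
}

/** Hypothetical assignment of receivers; adding to a set is naturally idempotent. */
final class ReceiverAssignment {
    private final Set<String> receivers = new HashSet<>();
    private int failuresToInject;

    ReceiverAssignment(int failuresToInject) {
        this.failuresToInject = failuresToInject;
    }

    /** Repeating this call with the same receiver always yields the same final state. */
    boolean assign(String receiver) {
        receivers.add(receiver); // the state change happens first
        if (failuresToInject > 0) {
            failuresToInject--;
            // Simulates a failure after the change was applied, e.g. a lost acknowledgement.
            throw new IllegalStateException("simulated lost acknowledgement");
        }
        return true;
    }

    Set<String> receivers() {
        return receivers;
    }
}
```

A call such as `IdempotentRetry.retry(3, () -> assignment.assign("receiver-1"))` leaves exactly one entry in the set no matter how many attempts were needed. An operation of the NotAtomicAndNotIdempotent kind would not tolerate this blind retry and would need a repair step instead.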



--
This message was sent by Atlassian Jira
(v8.3.4#803005)