Posted to yarn-issues@hadoop.apache.org by "Li Lu (JIRA)" <ji...@apache.org> on 2016/08/04 01:06:20 UTC

[jira] [Commented] (YARN-4061) [Fault tolerance] Fault tolerant writer for timeline v2

    [ https://issues.apache.org/jira/browse/YARN-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15406938#comment-15406938 ] 

Li Lu commented on YARN-4061:
-----------------------------

Let me revive this thread after the branch merge. From some offline discussion, I think our plan is to implement a specialized BufferedMutator that can retry when the HBase cluster is down. The benefit of this approach is that we do not need to repost the data to the buffered mutator, which saves a lot of memory operations while the cluster is down. We can also reuse much of the retry logic that already exists in our codebase.
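To make the retry idea concrete, here is a minimal sketch of a buffered writer that owns its pending mutations and retries flushing them itself. It assumes a hypothetical `StorageBackend` interface standing in for the HBase connection, and the class `RetryingBufferedWriter` and its methods are illustrative names only, not HBase's actual `BufferedMutator` API or the YARN implementation.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the HBase connection; not a real HBase interface.
interface StorageBackend {
    void write(List<String> batch) throws IOException;
}

// Sketch: a buffered writer that owns its pending mutations and retries
// flushing them itself, so callers never have to repost data.
class RetryingBufferedWriter {
    private final StorageBackend backend;
    private final Deque<String> pending = new ArrayDeque<>();
    private final int maxAttempts;
    private final long initialBackoffMs;

    RetryingBufferedWriter(StorageBackend backend, int maxAttempts, long initialBackoffMs) {
        this.backend = backend;
        this.maxAttempts = maxAttempts;
        this.initialBackoffMs = initialBackoffMs;
    }

    // Buffer a mutation in memory; nothing is sent until flush().
    synchronized void mutate(String mutation) {
        pending.addLast(mutation);
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    // Push all buffered mutations, retrying with exponential backoff.
    // On failure the mutations stay buffered for the next flush() call.
    // For simplicity the lock is held while retrying, so mutate() blocks meanwhile.
    synchronized boolean flush() {
        if (pending.isEmpty()) {
            return true;
        }
        List<String> batch = new ArrayList<>(pending);
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                backend.write(batch);
                pending.clear(); // acknowledged by storage; safe to drop
                return true;
            } catch (IOException e) {
                if (attempt == maxAttempts) {
                    break;
                }
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;
                }
                backoffMs *= 2;
            }
        }
        return false;
    }
}
```

Because the writer keeps the batch itself, a storage outage costs only the retry loop rather than a second copy of the data from the caller. The trade-off is that the buffer lives only in the collector's memory.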

The challenge with this design is that nothing is persisted until the data reaches the HBase cluster. In other words, this change lets us handle the case where the HBase cluster is down, but not the case where the collectors themselves are down: if a collector fails while it is retrying, we lose the data. To address this problem, we may use a local journal file to persist the state of the buffered mutator.
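One way to picture the local journal is as a simple write-ahead log: append each mutation to a local file before buffering it, replay the file when a collector restarts, and clear it once the backend acknowledges a flush. The sketch below assumes a hypothetical `LocalJournal` class with a one-mutation-per-line text format (so mutations must not contain newlines); real HBase mutations would need a binary serialization, and a durable version would also fsync each append.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical local journal: every mutation is appended to a file before it
// is buffered, so a restarted collector can replay what was not yet flushed.
class LocalJournal {
    private final Path file;

    LocalJournal(Path file) {
        this.file = file;
    }

    // Append one mutation per line (assumes no embedded newlines).
    // This does not fsync; a durable version would force the write to disk.
    void append(String mutation) throws IOException {
        Files.write(file, (mutation + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // On collector restart: return every mutation recorded since the last checkpoint.
    List<String> replay() throws IOException {
        if (!Files.exists(file)) {
            return new ArrayList<>();
        }
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    // After the backend acknowledges a flush, the journaled data is stored
    // in HBase, so the journal can be discarded.
    void checkpoint() throws IOException {
        Files.deleteIfExists(file);
    }
}
```

In this scheme the in-memory buffer still serves normal writes; the journal is read only after a collector failure, so its cost on the happy path is one local append per mutation.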

We also need to recover the aggregation status if collectors fail. However, as a first phase, maybe we can simply restart everything in the aggregation table when collectors restart?

I know this is an old thread, but please feel free to chime in, since we are targeting this feature for the Alpha 2 phase of timeline v2.

> [Fault tolerance] Fault tolerant writer for timeline v2
> -------------------------------------------------------
>
>                 Key: YARN-4061
>                 URL: https://issues.apache.org/jira/browse/YARN-4061
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Li Lu
>            Assignee: Li Lu
>              Labels: YARN-5355
>         Attachments: FaulttolerantwriterforTimelinev2.pdf
>
>
> We need to build a timeline writer that is resilient to backend storage downtime and timeline collector failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
