
Posted to yarn-issues@hadoop.apache.org by "Joep Rottinghuis (JIRA)" <ji...@apache.org> on 2015/09/16 21:20:45 UTC

[jira] [Commented] (YARN-4062) Add the flush and compaction functionality via coprocessors and scanners for flow run table

    [ https://issues.apache.org/jira/browse/YARN-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790987#comment-14790987 ] 

Joep Rottinghuis commented on YARN-4062:
----------------------------------------

While discussing flush and compaction with [~vrushalic], I just realized that there might be a complication with cross-DC replication.

Potentially, the region servers (RS) in two different datacenters might decide to flush/compact values for the same row at the same time. We need to think through the consequences if they make different decisions (for example, because one DC has later information that hasn't been replicated across yet, such as app completion). Even if the order and the decisions are deterministic, we need to consider what happens when two regions modify the same row.
With hRaven we have been able to make master-master replication work because we were guaranteed that every row is "owned" by one datacenter and is therefore manipulated only locally.

Perhaps we can do the same here, so that flushes and compactions happen only in the HBase cluster located in the datacenter that owns the row. For example, the coprocessor would act on a row only if the rowkey starts with the same datacenter as the one where the coprocessor runs. This would ensure that each row is flushed/compacted in only one DC, and the other DCs would be followers.

This would have to be configurable, and disabled for installations with a single HBase instance that multiple datacenters write to remotely; otherwise no compaction would happen at all (which would perhaps still be functionally correct, even if not optimal for space usage).
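A minimal sketch of the ownership check proposed above, written as plain Java so it runs without HBase on the classpath. It assumes, for illustration only, that a flow run rowkey begins with the owning cluster/datacenter ID followed by a '!' separator. The class name RowOwnershipPolicy, the method mayCompact, and the ownershipCheckEnabled switch are hypothetical; they are not part of the YARN-4062 patch or of the HBase coprocessor API. In a real coprocessor, a check like this would gate whether the flush/compaction scanner collapses cells for a row or passes them through unchanged.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch: decides whether flush/compaction logic running in this
 * datacenter may act on a given flow run row.
 * ASSUMPTION: the rowkey starts with "<owningClusterId>!".
 */
public final class RowOwnershipPolicy {
    private static final byte SEPARATOR = (byte) '!';

    // Hypothetical config switch; disable for a single shared HBase instance.
    private final boolean ownershipCheckEnabled;
    private final byte[] localClusterId;

    public RowOwnershipPolicy(boolean ownershipCheckEnabled, String localClusterId) {
        this.ownershipCheckEnabled = ownershipCheckEnabled;
        this.localClusterId = localClusterId.getBytes(StandardCharsets.UTF_8);
    }

    /** Returns true if this datacenter should apply flush/compaction logic to the row. */
    public boolean mayCompact(byte[] rowKey) {
        if (!ownershipCheckEnabled) {
            // Check disabled: every row is processed wherever the coprocessor runs.
            return true;
        }
        // The key must be longer than the local ID, since a separator has to follow it.
        if (rowKey.length <= localClusterId.length) {
            return false;
        }
        // Compare the rowkey prefix byte by byte against the local cluster ID.
        for (int i = 0; i < localClusterId.length; i++) {
            if (rowKey[i] != localClusterId[i]) {
                return false;
            }
        }
        // Require the separator so that "dc1" does not match a row owned by "dc10".
        return rowKey[localClusterId.length] == SEPARATOR;
    }

    public static void main(String[] args) {
        RowOwnershipPolicy policy = new RowOwnershipPolicy(true, "dc1");
        byte[] local = "dc1!alice!wordcount!42".getBytes(StandardCharsets.UTF_8);
        byte[] remote = "dc2!alice!wordcount!42".getBytes(StandardCharsets.UTF_8);
        System.out.println(policy.mayCompact(local));  // true
        System.out.println(policy.mayCompact(remote)); // false
    }
}
```

The separator check is the main design point: a plain prefix comparison would let a cluster named "dc1" claim rows owned by "dc10". With the switch turned off, every row is treated as locally owned, which corresponds to the single-HBase-instance case described above.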


> Add the flush and compaction functionality via coprocessors and scanners for flow run table
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-4062
>                 URL: https://issues.apache.org/jira/browse/YARN-4062
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Vrushali C
>            Assignee: Vrushali C
>
> As part of YARN-3901, a coprocessor and scanner are being added for storing into the flow_run table. It also needs flush and compaction processing in the coprocessor, and perhaps a new scanner to deal with the data during the flushing and compaction stages.


