Posted to yarn-issues@hadoop.apache.org by "Joep Rottinghuis (JIRA)" <ji...@apache.org> on 2016/09/28 19:07:20 UTC

[jira] [Commented] (YARN-4561) Compaction coprocessor enhancements: On/Off, whitelisting, blacklisting

    [ https://issues.apache.org/jira/browse/YARN-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15530564#comment-15530564 ] 

Joep Rottinghuis commented on YARN-4561:
----------------------------------------

For late-arriving data that was spooled using whatever we come up with in YARN-4061, we can still recognize records that came in after we got rid of the finalized application's last write.
If we mark the _first_ write with a special value, just as the final value is marked, then we can later distinguish the case where a late-arriving update has come in.

Aside from the very first record (marked special), a new record will always find an existing record, and it will slide in front of or behind the existing values. In other words, we either see a new value that is the first one, or we see multiple values. We could use that to detect late-arriving values. The trick will be to see if we can do this after the fact (in the read or flush compaction) rather than during writes. This may have to be a write-time coprocessor that processes data only if it is later than the normal time window (more than a day old). For those cases we might have to do reads during writes, or at least mark records as suspicious for later analysis.
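
To make the idea concrete, here is a rough sketch of the write-time check in Python (not coprocessor code). Everything in it is an assumption for illustration: the FIRST/FINAL marker names, the Cell type, the one-day window constant and the read_existing callback are all made up, and it reads "no existing values and not marked first" as the sign of a late arrival.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical markers; the real tag names are not decided in this thread.
FIRST = "FIRST"
FINAL = "FINAL"
ONE_DAY_MS = 24 * 60 * 60 * 1000  # assumed "normal time window"


@dataclass(frozen=True)
class Cell:
    timestamp_ms: int
    value: int
    marker: Optional[str] = None


def outside_normal_window(cell: Cell, now_ms: int,
                          window_ms: int = ONE_DAY_MS) -> bool:
    """True when the cell is older than the normal write window."""
    return now_ms - cell.timestamp_ms > window_ms


def check_on_write(read_existing: Callable[[], Sequence[Cell]],
                   incoming: Cell, now_ms: int,
                   window_ms: int = ONE_DAY_MS) -> str:
    """Return "normal" or "suspicious" for an incoming write.

    The read of existing cells only happens for old data, so the
    common write path stays read-free.
    """
    if not outside_normal_window(incoming, now_ms, window_ms):
        return "normal"
    if incoming.marker == FIRST:
        # A first write is expected to find nothing there.
        return "normal"
    existing = read_existing()
    # Unmarked write that finds no siblings: the application's values
    # were probably compacted away already, so flag it for later analysis.
    return "suspicious" if len(existing) == 0 else "normal"
```

For example, with now_ms set to ten days, an unmarked cell stamped two days earlier that finds no existing values comes back as "suspicious", while the same cell carrying the FIRST marker comes back as "normal".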

> Compaction coprocessor enhancements: On/Off, whitelisting, blacklisting
> -----------------------------------------------------------------------
>
>                 Key: YARN-4561
>                 URL: https://issues.apache.org/jira/browse/YARN-4561
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Vrushali C
>            Assignee: Vrushali C
>              Labels: YARN-5355
>
> YARN-4062 deals with the basic functionality of the flush and compaction coprocessor. We also need to ensure we can turn compaction on/off as a whole (for example when dealing with production issues), as well as provide a way to blacklist and whitelist compaction processing for certain records.
> For instance, we may want to compact only those records which belong to applications in that datacenter. This way we do not interfere with HBase replication, which could otherwise cause coprocessors to process the same record in more than one datacenter at the same time.
> Also, we might want to not compact/process certain records, perhaps those whose rowkey matches certain criteria.
> Filing this JIRA to track these enhancements.
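
The three controls in the description above (global on/off, whitelist by datacenter, blacklist by rowkey) could be combined roughly as in the Python sketch below. It is only an illustration under assumed names: CompactionPolicy, its fields, and the "!"-separated example rowkeys are hypothetical and are not taken from the YARN or HBase code.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class CompactionPolicy:
    # Global switch, e.g. to turn compaction off during a production issue.
    enabled: bool = True
    # Datacenters whose records may be compacted; empty means "all".
    whitelisted_datacenters: FrozenSet[str] = frozenset()
    # Rowkey patterns that must never be compacted.
    blacklisted_rowkeys: Tuple[re.Pattern, ...] = ()

    def should_compact(self, datacenter: str, row_key: str) -> bool:
        if not self.enabled:
            return False
        if (self.whitelisted_datacenters
                and datacenter not in self.whitelisted_datacenters):
            return False
        # The blacklist wins over the whitelist.
        return not any(p.search(row_key) for p in self.blacklisted_rowkeys)


# Example: compact only dc1 records, and skip rowkeys containing "!test_".
policy = CompactionPolicy(
    whitelisted_datacenters=frozenset({"dc1"}),
    blacklisted_rowkeys=(re.compile(r"!test_"),),
)
```

Letting the blacklist override the whitelist is a design choice in this sketch, not something the description specifies.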



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)