Posted to yarn-issues@hadoop.apache.org by "Junping Du (JIRA)" <ji...@apache.org> on 2015/06/01 18:59:17 UTC

[jira] [Commented] (YARN-3699) Decide if flow version should be part of row key or column

    [ https://issues.apache.org/jira/browse/YARN-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567573#comment-14567573 ] 

Junping Du commented on YARN-3699:
----------------------------------

Hi [~jrottinghuis] and [~vrushalic], thanks for your comments, and sorry for the late reply; I was traveling last week.
I fully agree with Joep's comments above that there is no right or wrong schema, only one that fits the priority scenarios:
- If we mostly need to query flow_run records under a specific flow (or flows), then making the flow version a column will make that query more efficient.
- If we equally (or more often) need to query flow_run records under specific flow version(s), then our decision here could be different.
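To make the two cases concrete, here is a minimal sketch in Python that models a table as a list of rows sorted by row key and uses a prefix scan to stand in for an HBase scan. The key layouts, the "!" separator, and the sample flow names are all assumptions made for illustration; this is not the actual timeline service or hRaven schema.

```python
from bisect import bisect_left

SEP = "!"  # hypothetical row key separator, chosen only for this sketch


def prefix_scan(sorted_rows, prefix):
    """Return the contiguous rows whose key starts with prefix.

    sorted_rows is a list of (row_key, columns) pairs sorted by row_key,
    which mimics how a scan walks lexicographically ordered row keys.
    """
    keys = [key for key, _ in sorted_rows]
    i = bisect_left(keys, prefix)
    matched = []
    while i < len(sorted_rows) and sorted_rows[i][0].startswith(prefix):
        matched.append(sorted_rows[i])
        i += 1
    return matched


# Hypothetical sample data: (flow, flow version, flow run id)
runs = [
    ("wordcount", "v1", "run1"),
    ("wordcount", "v2", "run2"),
    ("wordcount", "v1", "run3"),
    ("sort", "v1", "run4"),
]

# Layout A (assumed): flow version stored as a column, key = flow!run
layout_a = sorted(
    ((SEP.join([flow, run]), {"version": ver}) for flow, ver, run in runs),
    key=lambda row: row[0],
)

# Layout B (assumed): flow version inside the row key, key = flow!version!run
layout_b = sorted(
    ((SEP.join([flow, ver, run]), {}) for flow, ver, run in runs),
    key=lambda row: row[0],
)

# Query 1: all runs of one flow.
# Layout A returns them in run order; layout B returns them grouped by version.
flow_runs_a = [key for key, _ in prefix_scan(layout_a, "wordcount" + SEP)]
flow_runs_b = [key for key, _ in prefix_scan(layout_b, "wordcount" + SEP)]

# Query 2: all runs of one flow version.
# Layout B can use a narrower prefix; layout A must scan the whole flow
# and filter on the version column.
version_runs_b = [
    key for key, _ in prefix_scan(layout_b, SEP.join(["wordcount", "v1"]) + SEP)
]
version_runs_a = [
    key
    for key, cols in prefix_scan(layout_a, "wordcount" + SEP)
    if cols["version"] == "v1"
]
```

In this sketch, layout A keeps the runs of a flow adjacent and in run order regardless of version, while layout B answers the per-version query with a narrower scan but interleaves run order across versions.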
To me, the tricky/interesting part here is that the boundary between different flows and flow versions can be vague in practice: how big or small must a change to a flow be before it starts a new flow rather than a new flow version? Why would we have several active flow versions instead of only one active flow version (and more flows)? These trade-offs in application concepts also affect our trade-offs in schema design, which is a pretty common thing that I have also seen in other apps.
I would like to trust your priorities here, given your experience with hRaven, which has been running well in production for years. So I agree the Phoenix schema should be adjusted slightly to get closer to the HBase one.
Maybe we should have a new JIRA for this (Phoenix schema) change? We can either keep this JIRA open for discussion or resolve it as Later, so that in future, if others from the community bring other solid scenarios from practice, we can continue the discussion here and try to make a better trade-off or innovation. Thoughts?

> Decide if  flow version should be part of row key or column
> -----------------------------------------------------------
>
>                 Key: YARN-3699
>                 URL: https://issues.apache.org/jira/browse/YARN-3699
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Vrushali C
>
> Based on discussions in YARN-3411 with [~djp], filing jira for continuing discussion on putting the flow version in rowkey or column. 
> Either phoenix/hbase approach will update the jira with the conclusions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)