Posted to issues@tez.apache.org by "Hitesh Shah (JIRA)" <ji...@apache.org> on 2013/08/30 02:04:52 UTC
[jira] [Commented] (TEZ-410) Refactor Edge Connection Pattern to be more clear
[ https://issues.apache.org/jira/browse/TEZ-410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754234#comment-13754234 ]
Hitesh Shah commented on TEZ-410:
---------------------------------
Comments:
{code}
+ default : throw new RuntimeException("unknown 'SchedulingType'");
{code}
- It might help to include the actual enum value that was not handled in the exception message.
- The same change may be needed in other places in the same class ( DagTypeConverters.java ).
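As a rough illustration of the first point, the default branch could append the unhandled value to the message. This is only a sketch: the SchedulingTypeConverter class, its convert method, and the enum constants declared here are hypothetical stand-ins, not the actual code in DagTypeConverters.java.

```java
public class SchedulingTypeConverter {
    // Hypothetical stand-in for the SchedulingType enum touched by the patch.
    enum SchedulingType { SEQUENTIAL, CONCURRENT }

    // Hypothetical converter; the returned string is illustrative only.
    static String convert(SchedulingType type) {
        switch (type) {
            case SEQUENTIAL:
                return "SEQUENTIAL";
            default:
                // Include the offending value so the failure can be diagnosed
                // from the exception message alone.
                throw new RuntimeException("unknown 'SchedulingType': " + type);
        }
    }
}
```

With this shape, an unhandled constant shows up by name in the stack trace instead of requiring a debugger to find out which value reached the default branch.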
{code}
+ /**
+ * Data produced by the source task is persisted and available even when the
+ * task is not running. The data may be unavailable and may cause the source
+ * task to be re-executed.
+ */
+ PERSISTED,
{code}
- "... data may be*come* unavailable ... "
- "source task is stored in reliably" --> remove the "in" ?
Looks good apart from the above minor nits. Good to commit after addressing above.
> Refactor Edge Connection Pattern to be more clear
> -------------------------------------------------
>
> Key: TEZ-410
> URL: https://issues.apache.org/jira/browse/TEZ-410
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Bikas Saha
> Assignee: Bikas Saha
> Attachments: TEZ-410.1.patch, TEZ-410.2.patch, TEZ-410.3.patch, TEZ-410.4.patch
>
>
> During discussion with users there was feedback that edge properties need to be named better to make them clearer. There was a suggestion to look at MPI for inspiration. Based on that feedback, the proposal is to rename ConnectionPattern to DataMovement, as that is essentially what the property defines. A Bipartite connection pattern can be constructed from both broadcast and scatter-gather data movement types. There will be 3 kinds of data movements initially.
> ONE_TO_ONE - Defines that an output produced by the ith upstream task is available to the ith downstream task.
> BROADCAST - Defines that an output produced by any upstream task is available to all downstream tasks.
> SCATTER_GATHER - Defines that the ith output produced by all upstream tasks is available to the same downstream task. Upstream tasks scatter their outputs and they are gathered by designated downstream tasks.
> To be clear, output being available to a task does not imply that the entire output is transferred/read by it. The task can choose to read any amount of the total data.
> Current users: In the EdgeProperty object
> Please change EdgeConnectionPattern.BIPARTITE -> DataMovementType.SCATTER_GATHER
> Please change SourceType.STABLE -> DataSourceType.PERSISTED
> Please add SchedulingType.SEQUENTIAL to EdgeProperty objects.
> The getter methods have similar name changes.
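The three data movement types in the quoted description can be sketched as a mapping from an upstream output to the downstream tasks it is available to. This is a minimal illustration, not Tez code: the DataMovementSketch class and its consumers method are hypothetical, the enum only mirrors the names in the description, and the sketch assumes that the ith output in SCATTER_GATHER goes to downstream task i and that ONE_TO_ONE has equal upstream and downstream task counts.

```java
import java.util.ArrayList;
import java.util.List;

public class DataMovementSketch {
    // Mirrors the three names in the description; not the Tez enum itself.
    enum DataMovementType { ONE_TO_ONE, BROADCAST, SCATTER_GATHER }

    /**
     * Returns the downstream task indices to which output number outputIndex
     * of upstream task upstreamTask is available (hypothetical helper).
     */
    static List<Integer> consumers(DataMovementType type, int upstreamTask,
                                   int outputIndex, int numDownstream) {
        List<Integer> result = new ArrayList<>();
        switch (type) {
            case ONE_TO_ONE:
                // Upstream task i feeds downstream task i.
                result.add(upstreamTask);
                break;
            case BROADCAST:
                // Every downstream task can read the output.
                for (int d = 0; d < numDownstream; d++) {
                    result.add(d);
                }
                break;
            case SCATTER_GATHER:
                // Assumption: the ith output of every upstream task is
                // gathered by downstream task i.
                result.add(outputIndex);
                break;
            default:
                throw new IllegalArgumentException("unknown DataMovementType: " + type);
        }
        return result;
    }
}
```

Availability here only says which tasks may read an output; as the description notes, a downstream task can still choose to read any amount of it.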