Posted to issues@tez.apache.org by "Ming Ma (JIRA)" <ji...@apache.org> on 2016/08/05 18:27:20 UTC

[jira] [Commented] (TEZ-3230) Implement vertex manager and edge manager of cartesian product edge

    [ https://issues.apache.org/jira/browse/TEZ-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15409872#comment-15409872 ] 

Ming Ma commented on TEZ-3230:
------------------------------

Thanks [~aplusplus]. Nice patch! Some of the comments below are about design; others may be things you already plan to do later.

* Does it support the product of a partitioned vertex and an unpartitioned vertex, e.g. each destination vertex task joins one partition from source vertex A with the full output of one task from source vertex B?

* {{new TezException("Only on demand routing is supported in cartesian product edge manager");}} in {{CartesianProductEdgeManager#routeDataMovementEventToDestination}}. It appears this method is called when the source sends DATA_MOVEMENT_EVENT instead of COMPOSITE_DATA_MOVEMENT_EVENT, and that this path has been deprecated. Is that correct?

* Does it handle the special case where one source vertex has 0 tasks?

* When some partitions/source tasks don't have any data, does it still schedule the no-op join vertex tasks that use the empty partition/source task?

* {{CartesianProductPayload.proto}} defines {{repeated string sourceVertices = 2; repeated int32 numPartitions = 3;}}. Alternatively, you could define a separate protobuf message for the sourceVertex -> numPartition map, e.g.

{noformat}
message VertexToNumOfPartitionProto {
  required string sourceVertices = 1;
  required int32 numPartitions = 2;
}

...
// the field number below is illustrative
repeated VertexToNumOfPartitionProto sourceVerticesToNumPartitions = 2;
{noformat}

* Currently CartesianProductFilter uses only static state, the "sourceVertex -> numPartition" map. I wonder if you have a concrete use case from Hive. Also, if we later need to support filtering based on runtime state such as partition stats, would CartesianProductFilter need to be extended?

* A concrete example in CartesianProductCombination will be useful, to show the values in combination, factor array, and how a destination task is resolved, etc.

* {{CartesianProductCombination(int[] numPartitionOrTask)}} assumes {{numPartitionOrTask.length >= 2}}. Is there a precondition check somewhere else already?

* {{CartesianProductCombination fromTaskId}} implements computation from destination task id to the combination. Maybe move the computation to a member function in {{CartesianProductCombination}}.

* {{CartesianProductVertexManagerUnpartitioned#handleCompletedSrcTask}} generates all combinations that include the completed task id. As an optimization, if any source vertex doesn't have any completed tasks yet, we can skip the scheduling. BTW, another way to find all matching combinations is to start from the combinations of completed source task ids, e.g. the values set in the BitSet.

* Is there any perf data in terms of memory and CPU?
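
To make the CartesianProductCombination comments above more concrete, here is a minimal sketch of what I have in mind. It assumes destination tasks are numbered in row-major (mixed-radix) order over the source vertices, which may not match what the patch does. The class {{CombinationSketch}} and all of its method names are hypothetical and are not taken from the patch. It shows the factor array, the mapping between destination task id and combination as member functions, a precondition check on the array length, and enumeration that starts from the completed source task ids in the BitSets.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration only; this is NOT the class from the TEZ-3230 patch.
// Assumption: destination task ids are assigned in row-major (mixed-radix) order.
final class CombinationSketch {
  private final int[] sizes;   // number of partitions or tasks of each source vertex
  private final int[] factors; // factors[i] = product of sizes[i+1 .. n-1]

  CombinationSketch(int[] sizes) {
    // Precondition check: a cartesian product needs at least two sources.
    if (sizes == null || sizes.length < 2) {
      throw new IllegalArgumentException("need at least 2 source vertices");
    }
    for (int s : sizes) {
      // Zero-task sources are rejected here; how to handle them is an open question.
      if (s <= 0) {
        throw new IllegalArgumentException("each source needs at least 1 partition/task");
      }
    }
    this.sizes = sizes.clone();
    this.factors = new int[sizes.length];
    factors[sizes.length - 1] = 1;
    for (int i = sizes.length - 2; i >= 0; i--) {
      factors[i] = factors[i + 1] * sizes[i + 1];
    }
  }

  int[] factors() {
    return factors.clone();
  }

  int numTasks() {
    return factors[0] * sizes[0];
  }

  // Combination (one index per source vertex) -> destination task id.
  int toTaskId(int[] combination) {
    int id = 0;
    for (int i = 0; i < sizes.length; i++) {
      id += combination[i] * factors[i];
    }
    return id;
  }

  // Destination task id -> combination, as a member function.
  int[] fromTaskId(int taskId) {
    int[] combination = new int[sizes.length];
    for (int i = 0; i < sizes.length; i++) {
      combination[i] = taskId / factors[i];
      taskId %= factors[i];
    }
    return combination;
  }

  // completed[i] holds the completed task ids of source vertex i.
  // Returns the destination task ids whose inputs have all completed.
  List<Integer> schedulableTaskIds(BitSet[] completed) {
    List<Integer> ids = new ArrayList<>();
    for (BitSet b : completed) {
      if (b.isEmpty()) {
        return ids; // some source has no completed task: nothing to schedule
      }
    }
    collect(completed, 0, 0, ids);
    return ids;
  }

  // Walks only the set bits, so the work is proportional to the number of matches.
  private void collect(BitSet[] completed, int pos, int partialId, List<Integer> out) {
    if (pos == sizes.length) {
      out.add(partialId);
      return;
    }
    for (int t = completed[pos].nextSetBit(0); t >= 0; t = completed[pos].nextSetBit(t + 1)) {
      collect(completed, pos + 1, partialId + t * factors[pos], out);
    }
  }
}
```

Under these assumptions, three sources with sizes \{2, 3, 4\} give the factor array \{12, 4, 1\} and 24 destination tasks. Destination task 17 resolves to the combination \{1, 1, 1\}, since 17 = 1*12 + 1*4 + 1*1. If the completed tasks are \{1\}, \{0, 2\} and \{3\} for the three sources, the schedulable destination tasks are 15 and 23.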

> Implement vertex manager and edge manager of cartesian product edge
> -------------------------------------------------------------------
>
>                 Key: TEZ-3230
>                 URL: https://issues.apache.org/jira/browse/TEZ-3230
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Zhiyuan Yang
>            Assignee: Zhiyuan Yang
>         Attachments: TEZ-3230.1.patch, TEZ-3230.2.patch, TEZ-3230.3.patch, TEZ-3230.WIP.1.patch, TEZ-3230.WIP.2.patch, TEZ-3230.WIP.3.patch
>
>



