Posted to dev@tez.apache.org by Gurleen Dhody <Gu...@microsoft.com.INVALID> on 2019/03/26 21:18:22 UTC

OutputDescriptor OutputSpec physicalEdgeCount set to 0 for DataSink?

Hello Tez devs,

My question concerns how the OutputDescriptor of a DataSink attached to a vertex in the DAG is handled.

We attach a DataSink to the vertex and specify its DataSinkDescriptor. The OutputDescriptor in the DataSinkDescriptor specifies the output type that the vertex's processor/task will receive. However, in VertexImpl, the setAdditionalOutputs function creates the OutputSpec with physicalEdgeCount set to 0. As a result, the LogicalOutput that tasks in that vertex receive has numPhysicalOutputs equal to 0.
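To make the behavior described above concrete, here is a simplified model in plain Java. It is a hypothetical sketch, not Tez source code: the `Spec` class only stands in for Tez's internal OutputSpec, and the vertex, sink and output class names are made up. It contrasts an output on a DAG edge, whose physical count is supplied by the edge's routing (for example, one physical output per destination task on a scatter-gather edge), with a DataSink output, which is always created with a count of 0, as observed in setAdditionalOutputs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the behavior described above; this is NOT Tez source code.
public class OutputSpecModel {

    // Stand-in for Tez's internal OutputSpec: a destination name, the output class
    // named by the OutputDescriptor, and the number of physical outputs (edges).
    static final class Spec {
        private final String destinationName;
        private final String outputClassName;
        private final int physicalEdgeCount;

        Spec(String destinationName, String outputClassName, int physicalEdgeCount) {
            this.destinationName = destinationName;
            this.outputClassName = outputClassName;
            this.physicalEdgeCount = physicalEdgeCount;
        }

        String destinationName() { return destinationName; }
        String outputClassName() { return outputClassName; }
        int physicalEdgeCount() { return physicalEdgeCount; }
    }

    // Output on a DAG edge: the count is supplied by the edge's routing logic
    // (e.g. one physical output per destination task on a scatter-gather edge).
    static Spec forEdge(String destVertex, String outputClass, int countFromEdgeRouting) {
        return new Spec(destVertex, outputClass, countFromEdgeRouting);
    }

    // Output on a DataSink: mirrors the observed setAdditionalOutputs behavior,
    // where the count is always 0 regardless of the OutputDescriptor.
    static Spec forDataSink(String sinkName, String outputClass) {
        return new Spec(sinkName, outputClass, 0);
    }

    public static void main(String[] args) {
        List<Spec> specs = new ArrayList<>();
        specs.add(forEdge("v2", "com.example.EdgeOutput", 10));   // hypothetical names
        specs.add(forDataSink("sink", "com.example.SinkOutput")); // hypothetical names
        for (Spec s : specs) {
            System.out.println(s.destinationName() + " (" + s.outputClassName()
                    + "): physicalEdgeCount=" + s.physicalEdgeCount());
        }
    }
}
```

In this model the sink's output class name is still carried through to the spec; only the physical count is fixed at 0.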

Is there any other way to set the LogicalOutput's numPhysicalOutputs to a value determined by our internal routing logic, given that this LogicalOutput's OutputSpec is derived from the DataSink's OutputDescriptor? We need this because the task later uses the LogicalOutput to parameterize a call to our internal framework, which produces the global output.

Also, why is physicalEdgeCount explicitly set to 0? If the LogicalOutput will always have 0 channels, what purpose does the OutputDescriptor in the DataSinkDescriptor serve?

Thank You,
Gurleen