Posted to issues@tez.apache.org by "Kostas Tzoumas (JIRA)" <ji...@apache.org> on 2014/10/27 18:59:34 UTC

[jira] [Created] (TEZ-1704) Derive from Edge configs

Kostas Tzoumas created TEZ-1704:
-----------------------------------

             Summary: Derive from Edge configs
                 Key: TEZ-1704
                 URL: https://issues.apache.org/jira/browse/TEZ-1704
             Project: Apache Tez
          Issue Type: Wish
    Affects Versions: 0.5.2
            Reporter: Kostas Tzoumas


I am working on making Apache Flink run on top of Tez.

Flink uses its own serialization and deserialization machinery and
does not rely on Hadoop Writables. 

To pass data between Tez processors, we encapsulate objects that are
(de)serialized by Flink inside a Hadoop Writable, and use that
Writable as the value in the Tez key-value pairs that are read
and written by operators. This requires a Flink type serializer object
to be available in the Tez reader and input classes.
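To illustrate why the serializer has to be present on the reading side, here is a minimal Java sketch that avoids Hadoop and Flink dependencies. WritableLike is a stand-in with the same write/readFields methods as Hadoop's Writable. RecordSerializer is a hypothetical, simplified substitute for a Flink type serializer. SerializedRecordWritable, StringSerializer and WrapperDemo are also names made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in with the same two methods as Hadoop's Writable, so the sketch
// compiles without Hadoop on the classpath.
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical, simplified stand-in for a Flink type serializer
// (the real Flink serializers work on their own view types, not DataOutput).
interface RecordSerializer<T> {
    void serialize(T record, DataOutput out) throws IOException;
    T deserialize(DataInput in) throws IOException;
}

// Wraps a record so Tez can move it as an opaque Writable-style value.
// The serializer is a constructor argument: without it the wrapper cannot
// decode anything, so whoever creates it on the read side must supply one.
class SerializedRecordWritable<T> implements WritableLike {
    private final RecordSerializer<T> serializer;
    private T record;

    SerializedRecordWritable(RecordSerializer<T> serializer) {
        this.serializer = serializer;
    }

    void set(T record) { this.record = record; }

    T get() { return record; }

    @Override
    public void write(DataOutput out) throws IOException {
        serializer.serialize(record, out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        record = serializer.deserialize(in);
    }
}

// Example serializer for strings, used only to exercise the wrapper.
class StringSerializer implements RecordSerializer<String> {
    @Override
    public void serialize(String record, DataOutput out) throws IOException {
        out.writeUTF(record);
    }

    @Override
    public String deserialize(DataInput in) throws IOException {
        return in.readUTF();
    }
}

class WrapperDemo {
    // Writes a value through one wrapper and reads it back through another.
    static String roundTrip(String value) {
        try {
            SerializedRecordWritable<String> writer =
                    new SerializedRecordWritable<>(new StringSerializer());
            writer.set(value);
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            writer.write(out);
            out.flush();

            SerializedRecordWritable<String> reader =
                    new SerializedRecordWritable<>(new StringSerializer());
            reader.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return reader.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because readFields() can only decode bytes through the serializer it was given, whatever instantiates the wrapper on the reading side has to pass that serializer in. That is the gap a custom reader and input need to fill.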

To do that, we had to create a custom input reader and a custom input, deriving from KeyValueReader and AbstractLogicalInput respectively:

https://github.com/ktzoumas/incubator-flink/blob/tez-support/flink-addons/flink-tez/src/main/java/org/apache/flink/tez/runtime/input/FlinkUnorderedKVInput.java

https://github.com/ktzoumas/incubator-flink/blob/tez-support/flink-addons/flink-tez/src/main/java/org/apache/flink/tez/runtime/input/FlinkUnorderedKVReader.java
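The reader side of this arrangement can be sketched as follows. The sketch assumes a simplified contract modeled on KeyValueReader: next(), getCurrentKey() and getCurrentValue(). KeyValueReaderLike, ValueDecoder, DecodingKVReader and ReaderDemo are hypothetical names, not the code linked above. Raw values are plain byte arrays rather than Tez shuffle data. The decoder stands in for the Flink serializer that a custom input could hand to its reader.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Simplified stand-in modeled on Tez's KeyValueReader contract.
abstract class KeyValueReaderLike {
    public abstract boolean next() throws IOException;
    public abstract Object getCurrentKey() throws IOException;
    public abstract Object getCurrentValue() throws IOException;
}

// Hypothetical stand-in for the serializer the input passes to its reader.
interface ValueDecoder<T> {
    T decode(DataInput in) throws IOException;
}

// Reader that decodes each raw value with the decoder it was constructed with.
class DecodingKVReader<T> extends KeyValueReaderLike {
    private final Iterator<Map.Entry<String, byte[]>> source;
    private final ValueDecoder<T> decoder;
    private String currentKey;
    private T currentValue;

    DecodingKVReader(Iterator<Map.Entry<String, byte[]>> source, ValueDecoder<T> decoder) {
        this.source = source;
        this.decoder = decoder;
    }

    @Override
    public boolean next() throws IOException {
        if (!source.hasNext()) {
            currentKey = null;
            currentValue = null;
            return false;
        }
        Map.Entry<String, byte[]> raw = source.next();
        currentKey = raw.getKey();
        currentValue = decoder.decode(new DataInputStream(new ByteArrayInputStream(raw.getValue())));
        return true;
    }

    @Override
    public Object getCurrentKey() { return currentKey; }

    @Override
    public Object getCurrentValue() { return currentValue; }
}

class ReaderDemo {
    // Encodes each int as a raw value, reads them back through the reader, and sums them.
    static int sumValues(int... values) {
        try {
            List<Map.Entry<String, byte[]>> raw = new ArrayList<>();
            for (int i = 0; i < values.length; i++) {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                DataOutputStream out = new DataOutputStream(bytes);
                out.writeInt(values[i]);
                out.flush();
                raw.add(new AbstractMap.SimpleEntry<>("key" + i, bytes.toByteArray()));
            }
            DecodingKVReader<Integer> reader =
                    new DecodingKVReader<>(raw.iterator(), in -> in.readInt());
            int sum = 0;
            while (reader.next()) {
                sum += (Integer) reader.getCurrentValue();
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```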

This also meant creating custom edge configs to return the correct
input type (in this case FlinkUnorderedKVInput):

https://github.com/ktzoumas/incubator-flink/blob/tez-support/flink-addons/flink-tez/src/main/java/org/apache/flink/tez/dag/FlinkUnorderedKVEdgeConfig.java

https://github.com/ktzoumas/incubator-flink/blob/tez-support/flink-addons/flink-tez/src/main/java/org/apache/flink/tez/dag/FlinkUnorderedPartitionedKVEdgeConfig.java

To create these, we needed to derive from UnorderedKVEdgeConfig and
UnorderedPartitionedKVEdgeConfig respectively, and change some fields
from private to protected (a patch showing the changes is attached).
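The subclassing pattern can be sketched with two hypothetical classes, UnorderedEdgeConfigSketch and FlinkEdgeConfigSketch. They only mimic the shape of the problem and are not the real Tez API. The base class keeps its configuration in a protected field. That lets the subclass add its own setting and override which input class name it reports. The key flink.serializer.class is invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for an unordered edge config. It is NOT
// the real Tez class; it only carries a configuration map and an input class name.
class UnorderedEdgeConfigSketch {
    // Protected instead of private so subclasses can extend the configuration.
    protected final Map<String, String> inputConf;

    UnorderedEdgeConfigSketch(Map<String, String> conf) {
        this.inputConf = new HashMap<>(conf);
    }

    // Name of the input class the consuming vertex should instantiate.
    String getInputClassName() {
        return "org.apache.tez.runtime.library.input.UnorderedKVInput";
    }
}

// Hypothetical Flink-specific subclass: same edge semantics, different input class.
class FlinkEdgeConfigSketch extends UnorderedEdgeConfigSketch {
    // Invented configuration key, used only in this sketch.
    static final String SERIALIZER_KEY = "flink.serializer.class";

    FlinkEdgeConfigSketch(Map<String, String> conf, String serializerClassName) {
        super(conf);
        // This would not compile if inputConf were private in the base class.
        inputConf.put(SERIALIZER_KEY, serializerClassName);
    }

    @Override
    String getInputClassName() {
        return "org.apache.flink.tez.runtime.input.FlinkUnorderedKVInput";
    }
}
```

If inputConf were private, the subclass would have to duplicate the base class's configuration handling instead of extending it. That is why deriving from the real classes required changing some fields from private to protected.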

We do not use the sorting facilities of Tez; instead, we use the
Flink sort operators inside Tez processors. This is why the
Ordered classes are not modified.

I was wondering if there might be a better way to do this, and if not,
whether the change described in the patch would be acceptable for the next Tez release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)