Posted to issues@tez.apache.org by "Kostas Tzoumas (JIRA)" <ji...@apache.org> on 2014/10/27 18:59:34 UTC
[jira] [Created] (TEZ-1704) Derive from Edge configs
Kostas Tzoumas created TEZ-1704:
-----------------------------------
Summary: Derive from Edge configs
Key: TEZ-1704
URL: https://issues.apache.org/jira/browse/TEZ-1704
Project: Apache Tez
Issue Type: Wish
Affects Versions: 0.5.2
Reporter: Kostas Tzoumas
I am working on making Apache Flink run on top of Tez.
Flink uses its own serialization and deserialization machinery and
does not rely on Hadoop Writables.
To pass data between Tez processors, we encapsulate objects that are
(de)serialized by Flink inside a Hadoop Writable, and use that
Writable as the value in the Tez key-value pairs that are read
and written by operators. This requires a Flink type serializer object
to be available in the Tez reader and input classes.
To do that, we had to create a custom input reader and a custom input that derive from KeyValueReader and AbstractLogicalInput, respectively:
https://github.com/ktzoumas/incubator-flink/blob/tez-support/flink-addons/flink-tez/src/main/java/org/apache/flink/tez/runtime/input/FlinkUnorderedKVInput.java
https://github.com/ktzoumas/incubator-flink/blob/tez-support/flink-addons/flink-tez/src/main/java/org/apache/flink/tez/runtime/input/FlinkUnorderedKVReader.java
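As a rough illustration of the wrapping idea (not the code in the linked files), the sketch below shows a Writable whose write and readFields methods delegate to an external serializer. To keep it compilable without Hadoop or Flink on the classpath, Writable is re-declared locally with the same two methods as org.apache.hadoop.io.Writable, and SimpleSerializer, SerializerBackedWritable, StringSerializer, and WritableRoundTrip are hypothetical names invented for this example; SimpleSerializer is a simplified stand-in and not Flink's actual serializer API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in with the same two methods as org.apache.hadoop.io.Writable,
// declared here only so the sketch compiles without Hadoop.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical, simplified serializer interface (assumption: not Flink's real API).
interface SimpleSerializer<T> {
    void serialize(T value, DataOutput out) throws IOException;
    T deserialize(DataInput in) throws IOException;
}

// Hypothetical wrapper: the Writable carries a value but leaves the byte
// format entirely to the serializer it was constructed with.
final class SerializerBackedWritable<T> implements Writable {
    private final SimpleSerializer<T> serializer;
    private T value;

    SerializerBackedWritable(SimpleSerializer<T> serializer) {
        this.serializer = serializer;
    }

    T get() { return value; }
    void set(T value) { this.value = value; }

    @Override
    public void write(DataOutput out) throws IOException {
        serializer.serialize(value, out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        value = serializer.deserialize(in);
    }
}

// Toy serializer used only to exercise the wrapper.
final class StringSerializer implements SimpleSerializer<String> {
    @Override
    public void serialize(String value, DataOutput out) throws IOException {
        out.writeUTF(value);
    }

    @Override
    public String deserialize(DataInput in) throws IOException {
        return in.readUTF();
    }
}

final class WritableRoundTrip {
    // Writes a value through one wrapper and reads it back through another,
    // mimicking a producer and a consumer on either side of an edge.
    static String roundTrip(String input) {
        try {
            SerializerBackedWritable<String> source =
                new SerializerBackedWritable<>(new StringSerializer());
            source.set(input);
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            source.write(new DataOutputStream(bytes));

            SerializerBackedWritable<String> target =
                new SerializerBackedWritable<>(new StringSerializer());
            target.readFields(
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return target.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the sketch is that the receiving side has to construct the wrapper with a serializer before readFields can be called, which is why the serializer must be available wherever the reader and input are instantiated.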
This also meant creating custom edge configs to return the correct
input type (in this case FlinkUnorderedKVInput):
https://github.com/ktzoumas/incubator-flink/blob/tez-support/flink-addons/flink-tez/src/main/java/org/apache/flink/tez/dag/FlinkUnorderedKVEdgeConfig.java
https://github.com/ktzoumas/incubator-flink/blob/tez-support/flink-addons/flink-tez/src/main/java/org/apache/flink/tez/dag/FlinkUnorderedPartitionedKVEdgeConfig.java
To create these, we needed to derive from UnorderedKVEdgeConfig and
UnorderedPartitionedKVEdgeConfig respectively, and change some fields
from private to protected (a patch showing the changes is attached).
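To show why the visibility change matters, here is a minimal sketch that uses made-up classes rather than the real Tez edge configs. BaseEdgeConfig, CustomEdgeConfig, the settings field, and both methods are hypothetical names chosen for illustration; the actual fields touched by the patch are not reproduced here.

```java
// Hypothetical base class standing in for an edge config.
class BaseEdgeConfig {
    // Declared protected: with private, the subclass below would not compile,
    // because it reads this field directly.
    protected final String settings;

    BaseEdgeConfig(String settings) {
        this.settings = settings;
    }

    // Name of the input class this edge should instantiate (placeholder value).
    String getInputClassName() {
        return "DefaultUnorderedKVInput";
    }

    String describeEdge() {
        return getInputClassName() + "[" + settings + "]";
    }
}

// Hypothetical subclass that swaps in a custom input type and reuses
// the parent's state when building its own description.
class CustomEdgeConfig extends BaseEdgeConfig {
    CustomEdgeConfig(String settings) {
        super(settings);
    }

    @Override
    String getInputClassName() {
        return "FlinkUnorderedKVInput";
    }

    @Override
    String describeEdge() {
        // Direct access to the inherited field is what requires 'protected'.
        return getInputClassName() + "{" + settings + "}";
    }
}
```

An alternative that avoids widening field visibility would be for the base class to expose protected accessor methods or an overridable hook for the input class name, which may be closer to the "better way" asked about below.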
We are not using the sorting facilities of Tez; instead, we use the
Flink sort operators inside Tez processors. This is why
the Ordered classes are not modified.
I was wondering if there might be a better way to do this, and if not,
whether the change described in the patch would be acceptable for the next Tez release.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)