Posted to issues@flink.apache.org by "Greg Hogan (JIRA)" <ji...@apache.org> on 2016/05/13 18:18:13 UTC
[jira] [Created] (FLINK-3910) New self-join operator
Greg Hogan created FLINK-3910:
---------------------------------
Summary: New self-join operator
Key: FLINK-3910
URL: https://issues.apache.org/jira/browse/FLINK-3910
Project: Flink
Issue Type: New Feature
Components: DataSet API, Java API, Scala API
Affects Versions: 1.1.0
Reporter: Greg Hogan
Assignee: Greg Hogan
Flink currently provides inner- and outer-joins as well as cogroup and the non-keyed cross. {{JoinOperator}} hints at future support for semi- and anti-joins.
Many Gelly algorithms perform a self-join [0]. Two tickets still pending review illustrate this: FLINK-3768 performs a self-join on non-skewed data in TriangleListing.java, and FLINK-3780 performs a self-join on skewed data in JaccardSimilarity.java. A {{SelfJoinHint}} will select between the skewed and non-skewed implementations.
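For readers unfamiliar with the pattern, a minimal sketch of a self-join in plain Java (no Flink dependency) follows. It groups edges by source vertex and emits every unordered pair of targets within a group, which is roughly the shape of join used when listing triangles. The class {{SelfJoinSketch}} and its method names are hypothetical and are not taken from FLINK-3768 or FLINK-3780.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the proposed Flink operator.
public class SelfJoinSketch {

    /**
     * Joins an edge list with itself on the source vertex.
     * Each input edge is {source, target}; each output triple is
     * {source, targetA, targetB}, emitted once per unordered pair.
     */
    public static List<int[]> selfJoin(List<int[]> edges) {
        // Group the targets of all edges sharing a source (the join key).
        Map<Integer, List<Integer>> bySource = new LinkedHashMap<>();
        for (int[] edge : edges) {
            bySource.computeIfAbsent(edge[0], k -> new ArrayList<>()).add(edge[1]);
        }

        // Within each group, pair every element with each later element,
        // so (a, b) is emitted but neither (b, a) nor (a, a).
        List<int[]> pairs = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> group : bySource.entrySet()) {
            List<Integer> targets = group.getValue();
            for (int i = 0; i < targets.size(); i++) {
                for (int j = i + 1; j < targets.size(); j++) {
                    pairs.add(new int[] {group.getKey(), targets.get(i), targets.get(j)});
                }
            }
        }
        return pairs;
    }
}
```

A group of n elements produces n(n-1)/2 pairs, so a single high-degree key can dominate the output. That quadratic growth is why a skewed and a non-skewed implementation are worth distinguishing.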
The object-reuse-disabled case can be handled simply with a new {{Operator}}. The object-reuse-enabled case requires either {{CopyableValue}} types (as in the code referenced above) or a custom driver with access to the serializer. A third option is to make the serializer accessible to rich functions, but I think there be dragons.
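The sketch below, in plain Java, shows one reading of why object reuse complicates a self-join: the operator has to buffer a group's elements to pair them, and if the runtime hands out the same instance for every element, the buffer ends up holding one object many times unless each element is copied. {{MutableLong}} and {{reusingIterator}} are hypothetical stand-ins, not Flink classes. The {{copy()}} method only plays the role that a copyable value type would play.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration only; none of these types are Flink classes.
public class ObjectReuseSketch {

    /** Mutable value with a copy() method, standing in for a copyable type. */
    public static final class MutableLong {
        public long value;

        public MutableLong copy() {
            MutableLong c = new MutableLong();
            c.value = value;
            return c;
        }
    }

    /** Mimics object reuse: every call to next() returns the same instance. */
    public static Iterator<MutableLong> reusingIterator(long[] values) {
        MutableLong reused = new MutableLong();
        return new Iterator<MutableLong>() {
            private int position = 0;

            @Override
            public boolean hasNext() {
                return position < values.length;
            }

            @Override
            public MutableLong next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                reused.value = values[position++];
                return reused;
            }
        };
    }

    /** Buffers a group, copying each element only if copyElements is true. */
    public static List<Long> buffer(Iterator<MutableLong> group, boolean copyElements) {
        List<MutableLong> buffered = new ArrayList<>();
        while (group.hasNext()) {
            MutableLong element = group.next();
            buffered.add(copyElements ? element.copy() : element);
        }
        // Read the values back only after the whole group has been consumed.
        List<Long> result = new ArrayList<>();
        for (MutableLong element : buffered) {
            result.add(element.value);
        }
        return result;
    }
}
```

Buffering the values 1, 2, 3 with copying yields [1, 2, 3]. Without copying it yields [3, 3, 3], because all three buffer slots point at the single reused instance.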
If the idea of a self-join is agreeable, I'd like to work out a rough implementation and go from there.
[0] https://en.wikipedia.org/wiki/Join_%28SQL%29#Self-join