Posted to issues@flink.apache.org by "Robert Metzger (JIRA)" <ji...@apache.org> on 2015/01/25 11:35:34 UTC
[jira] [Commented] (FLINK-1444) Add data properties for data sources
[ https://issues.apache.org/jira/browse/FLINK-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291033#comment-14291033 ]
Robert Metzger commented on FLINK-1444:
---------------------------------------
I've prototyped something like this here: https://github.com/rmetzger/flink/compare/local_joins
Maybe it's helpful for you.
> Add data properties for data sources
> ------------------------------------
>
> Key: FLINK-1444
> URL: https://issues.apache.org/jira/browse/FLINK-1444
> Project: Flink
> Issue Type: New Feature
> Components: Java API, JobManager, Optimizer
> Affects Versions: 0.9
> Reporter: Fabian Hueske
> Priority: Minor
>
> This issue proposes to add support for attaching data properties to data sources. These data properties are defined with respect to input splits.
> Possible properties are:
> - partitioning across splits: all elements of the same key (combination) are contained in one split
> - sorting / grouping within splits: elements are sorted or grouped on certain keys within a split
> - key uniqueness: a certain key (combination) is unique for all elements of the data source. This property is not defined with respect to input splits.
> The optimizer can leverage this information to generate more efficient execution plans.
> The InputFormat will be responsible for generating input splits such that the promised data properties actually hold. Otherwise, the program will produce invalid results.
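
To make the properties listed in the issue concrete, here is a minimal sketch in plain Java. It is not Flink's API and not taken from the prototype branch: the class `SplitProperties` and its methods are hypothetical, and each input split is modelled simply as an in-memory list of integer keys. It only shows what an InputFormat would be promising when it declares each property.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Flink. Each split is modelled as a
// list of integer keys; real splits would carry full records.
final class SplitProperties {

    // Partitioning across splits: every key occurs in at most one split.
    static boolean isPartitioned(List<List<Integer>> splits) {
        Set<Integer> seenInEarlierSplits = new HashSet<>();
        for (List<Integer> split : splits) {
            Set<Integer> keysInThisSplit = new HashSet<>(split);
            for (Integer key : keysInThisSplit) {
                if (seenInEarlierSplits.contains(key)) {
                    return false;
                }
            }
            seenInEarlierSplits.addAll(keysInThisSplit);
        }
        return true;
    }

    // Sorting within splits: keys are non-decreasing inside each split.
    static boolean isSortedWithinSplits(List<List<Integer>> splits) {
        for (List<Integer> split : splits) {
            for (int i = 1; i < split.size(); i++) {
                if (split.get(i - 1) > split.get(i)) {
                    return false;
                }
            }
        }
        return true;
    }

    // Grouping within splits: equal keys are adjacent inside each split.
    static boolean isGroupedWithinSplits(List<List<Integer>> splits) {
        for (List<Integer> split : splits) {
            Set<Integer> finishedGroups = new HashSet<>();
            Integer previous = null;
            for (Integer key : split) {
                if (!key.equals(previous)) {
                    if (previous != null) {
                        finishedGroups.add(previous);
                    }
                    if (finishedGroups.contains(key)) {
                        return false; // key reappears after its group ended
                    }
                    previous = key;
                }
            }
        }
        return true;
    }

    // Key uniqueness: each key occurs exactly once in the whole source,
    // regardless of how the elements are distributed over splits.
    static boolean hasUniqueKeys(List<List<Integer>> splits) {
        Set<Integer> seen = new HashSet<>();
        for (List<Integer> split : splits) {
            for (Integer key : split) {
                if (!seen.add(key)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

With splits `[1, 1, 2]` and `[3, 4, 4]`, the first three checks hold and key uniqueness does not. An optimizer that trusts a declared property would skip the corresponding shuffle or sort, which is why a wrong declaration leads to invalid results rather than an error.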
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)