Posted to issues@flink.apache.org by "Dawid Wysakowicz (JIRA)" <ji...@apache.org> on 2016/04/01 13:29:25 UTC
[jira] [Commented] (FLINK-2946) Add orderBy() to Table API
[ https://issues.apache.org/jira/browse/FLINK-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15221567#comment-15221567 ]
Dawid Wysakowicz commented on FLINK-2946:
-----------------------------------------
I still have some problems with range partitioning and parallelism.
* First of all, the {{org.apache.flink.api.java.DataSet}} that I get from {{translateToPlan}} does not have a {{getParallelism}} method. But that's a minor issue.
* I am not sure how to determine the eventual parallelism of the input, or whether I need to do so at all. Take this example:
{code}
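// Imports assumed for the Flink 1.0-era Scala DataSet and Table APIs
import org.apache.flink.api.scala._
import org.apache.flink.api.scala.table._

val env = ExecutionEnvironment.getExecutionEnvironment
env.setParallelism(1) // environment default
val t = env.fromElements((1, 3, "Third"), (1, 2, "Fourth"), (1, 4, "Second"),
    (2, 1, "Sixth"), (1, 5, "First"), (1, 1, "Fifth"))
  .setParallelism(4) // only the source runs with parallelism 4
  .toTable.orderBy('_1.asc, '_2.desc)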
{code}
The resulting plan then looks like this (the numbers in brackets are each operator's parallelism, with -1 meaning that the default is used): DataSource(4) -> MapOperator(-1) -> ?, where in place of the question mark I must apply either a SortOperator alone, or a PartitionOperator followed by a SortOperator.
Which parallelism should I look at to decide whether the PartitionOperator should be applied? And what should the parallelism of the PartitionOperator itself be? (By default it is the one from the ExecutionEnvironment.)
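To make the two candidate plans concrete, here is a minimal sketch in plain Scala. It does not use Flink at all: {{RangeSortSketch}}, {{rangePartition}} and {{orderBy}} are hypothetical names introduced only for illustration, and the rule "skip partitioning when the parallelism is 1" is one possible answer to the question above, not a confirmed design. It only models why range partitioning on the first sort key, followed by a local sort of each partition, yields the same global order as a single sort.

```scala
object RangeSortSketch {
  type Row = (Int, Int, String)

  // Ordering that mirrors orderBy('_1.asc, '_2.desc)
  val rowOrdering: Ordering[Row] = new Ordering[Row] {
    def compare(a: Row, b: Row): Int = {
      val c = Integer.compare(a._1, b._1)
      if (c != 0) c else Integer.compare(b._2, a._2)
    }
  }

  // Hypothetical stand-in for a range partitioner: every distinct first key
  // is mapped to a partition index that grows with the key, so all rows in
  // partition i sort before all rows in partition i + 1.
  def rangePartition(rows: Seq[Row], parallelism: Int): Seq[Seq[Row]] = {
    val keys = rows.map(_._1).distinct.sorted
    val partitionOf: Map[Int, Int] =
      keys.zipWithIndex.map { case (k, i) => k -> (i * parallelism / keys.size) }.toMap
    (0 until parallelism).map(p => rows.filter(r => partitionOf(r._1) == p))
  }

  // Assumed decision rule: with parallelism 1 a single sort is enough,
  // otherwise partition by range first and sort each partition locally.
  def orderBy(rows: Seq[Row], parallelism: Int): Seq[Row] = {
    val partitions =
      if (parallelism == 1) Seq(rows) else rangePartition(rows, parallelism)
    // Reading the partitions in index order gives the globally sorted result.
    partitions.flatMap(_.sorted(rowOrdering))
  }
}
```

With the six tuples from the example above, {{RangeSortSketch.orderBy(rows, 1)}} and {{RangeSortSketch.orderBy(rows, 4)}} return the rows in the same order, from "First" to "Sixth". Which parallelism value to feed into such a decision is exactly the open question.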
I hope I have stated my problems clearly.
> Add orderBy() to Table API
> --------------------------
>
> Key: FLINK-2946
> URL: https://issues.apache.org/jira/browse/FLINK-2946
> Project: Flink
> Issue Type: New Feature
> Components: Table API
> Reporter: Timo Walther
> Assignee: Dawid Wysakowicz
>
> In order to implement a FLINK-2099 prototype that uses the Table API's code generation facilities, the Table API needs a sorting feature.
> I would implement it in the next few days. Ideas on how to implement such a sorting feature are very welcome. Is there a more efficient way than {{.sortPartition(...).setParallelism(1)}}? Is it better to sort locally on the nodes first and then do a final sort on one node?