Posted to issues@flink.apache.org by "Dawid Wysakowicz (JIRA)" <ji...@apache.org> on 2016/04/01 13:29:25 UTC

[jira] [Commented] (FLINK-2946) Add orderBy() to Table API

    [ https://issues.apache.org/jira/browse/FLINK-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15221567#comment-15221567 ] 

Dawid Wysakowicz commented on FLINK-2946:
-----------------------------------------

I still have some problems with range partitioning and parallelism. 

* First of all, the {{org.apache.flink.api.java.DataSet}} that I get from {{translateToPlan}} does not have a {{getParallelism}} method. But that's a minor issue.
* I am not sure how to determine the eventual parallelism of the input, or whether I need to do so at all. Let's take this as an example:

{code}
    val env = ExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    val t = env.fromElements((1, 3, "Third"), (1, 2, "Fourth"), (1, 4, "Second"),
      (2, 1, "Sixth"), (1, 5, "First"), (1, 1, "Fifth")).setParallelism(4)
      .toTable.orderBy('_1.asc, '_2.desc)
{code}

The dataset then looks like this (the numbers in brackets are the parallelism of each operator): DataSource(4) -> MapOperator(-1) -> at this point I must apply either a SortOperator alone, or a PartitionOperator followed by a SortOperator.

Based on which parallelism should I decide whether the PartitionOperator needs to be applied? And what should the parallelism of the PartitionOperator itself be? (By default it is the one from the ExecutionEnvironment.)
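
To make the two alternatives concrete, here is a minimal plain-Scala sketch of the idea behind "PartitionOperator -> SortOperator": rows are first split into key ranges, each range is sorted on its own, and the ranges are concatenated in order. It uses no Flink classes at all; {{rangePartition}}, {{sortedViaPartitions}} and the choice of boundaries are hypothetical names and values picked only for illustration. In the DataSet API the corresponding calls would presumably be {{partitionByRange}} followed by {{sortPartition}}, but that mapping is an assumption here, not something this sketch verifies.

```scala
object RangePartitionSortSketch {
  type Row = (Int, Int, String)

  // Same ordering as orderBy('_1.asc, '_2.desc) in the example above:
  // first field ascending, second field descending.
  val rowOrdering: Ordering[Row] =
    Ordering.by[Row, (Int, Int)](r => (r._1, r._2))(
      Ordering.Tuple2(Ordering.Int, Ordering.Int.reverse))

  val exampleRows: Seq[Row] = Seq(
    (1, 3, "Third"), (1, 2, "Fourth"), (1, 4, "Second"),
    (2, 1, "Sixth"), (1, 5, "First"), (1, 1, "Fifth"))

  // Hypothetical helper: assign each row to a range by its first field.
  // `boundaries` holds the inclusive upper bounds of all but the last range,
  // in ascending order, so there are boundaries.length + 1 partitions.
  def rangePartition(rows: Seq[Row], boundaries: Seq[Int]): Seq[Seq[Row]] = {
    val numPartitions = boundaries.length + 1
    val buckets = rows.groupBy { r =>
      val idx = boundaries.indexWhere(b => r._1 <= b)
      if (idx == -1) numPartitions - 1 else idx
    }
    (0 until numPartitions).map(i => buckets.getOrElse(i, Seq.empty[Row]))
  }

  // Sort every range independently, then read the ranges in order.
  // Because the ranges do not overlap on the leading sort key,
  // the concatenation is globally sorted.
  def sortedViaPartitions(rows: Seq[Row], boundaries: Seq[Int]): Seq[Row] =
    rangePartition(rows, boundaries).flatMap(_.sorted(rowOrdering))

  def main(args: Array[String]): Unit = {
    // Two partitions: first field <= 1, and everything else.
    val result = sortedViaPartitions(exampleRows, Seq(1))
    println(result.map(_._3).mkString(", "))
  }
}
```

With a single partition (empty {{boundaries}}) this degenerates to one plain sort, which corresponds to the "SortOperator only" alternative. The sketch does not answer how many partitions to pick, which is exactly the open question.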

I hope I have stated my problems clearly.

> Add orderBy() to Table API
> --------------------------
>
>                 Key: FLINK-2946
>                 URL: https://issues.apache.org/jira/browse/FLINK-2946
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API
>            Reporter: Timo Walther
>            Assignee: Dawid Wysakowicz
>
> In order to implement a FLINK-2099 prototype that uses the Table API's code generation facilities, the Table API needs a sorting feature.
> I will implement it in the next few days. Ideas on how to implement such a sorting feature are very welcome. Is there a more efficient way than {{.sortPartition(...).setParallelism(1)}}? Is it better to sort locally on the nodes first and then do a final sort on one node afterwards?
