Posted to user@flink.apache.org by Thomas FOURNIER <th...@gmail.com> on 2016/11/13 22:38:49 UTC
Hello,
I'm trying to assign a unique (and deterministic) ID to a globally sorted
DataSet.
Given a DataSet of String, I can compute the frequency of each label as
follows:
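import org.apache.flink.api.scala._
import org.apache.flink.api.common.operators.Order

val env = ExecutionEnvironment.getExecutionEnvironment
val data = env.fromCollection(List("a","b","c","a","a","d","a","a","a","b","b","c","a","c","b","c"))
// Count each label, then range-partition and sort by count (tuple field 1).
val mapping = data.map(s => (s, 1))
  .groupBy(0)
  .reduce((a, b) => (a._1, a._2 + b._2))
  .partitionByRange(1)
  .sortPartition(1, Order.DESCENDING)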
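For a quick local check of what the counts should look like for this sample, here is a minimal sketch on plain Scala collections. It runs in a single JVM and is not a Flink job. The names LabelRanking, labels, counts and ranked are only illustrative, and breaking ties by label is an assumption (b and c both occur 4 times, so some tie-break rule is needed to get a deterministic order).

```scala
object LabelRanking {
  // Same sample labels as in the Flink snippet above.
  val labels: List[String] =
    List("a","b","c","a","a","d","a","a","a","b","b","c","a","c","b","c")

  // Count the occurrences of each label.
  val counts: Map[String, Int] =
    labels.groupBy(identity).map { case (label, occurrences) => (label, occurrences.size) }

  // Sort by descending count. Ties are broken by label so the order is
  // deterministic (assumption: any fixed tie-break rule would do).
  // zipWithIndex then pairs each entry with its position in the sorted list.
  val ranked: List[((String, Int), Int)] =
    counts.toList
      .sortBy { case (label, count) => (-count, label) }
      .zipWithIndex

  def main(args: Array[String]): Unit =
    ranked.foreach { case ((label, count), id) => println(s"$id\t$label\t$count") }
}
```

On this input the counts are a=7, b=4, c=4, d=1, so the positions come out as a=0, b=1, c=2, d=3.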
I want the most frequent label to be ID 0 and so on in decreasing order. My
idea was to use zipWithIndex. But this does not guarantee that my DataSet
will be