Posted to user@spark.apache.org by talgr <ta...@gmail.com> on 2016/06/20 14:00:28 UTC
dense_rank skips ranks on cube
I have a DataFrame with 7 dimensions and built a cube on them:
val cube = df.cube('d1, 'd2, 'd3, 'd4, 'd5, 'd6, 'd7)
val cc = cube.agg(sum('p1).as("p1"), sum('p2).as("p2")).cache
and then defined a rank function on a window:
val rankSpec = Window.partitionBy('d1, 'd2, 'd3, 'd4, 'd5, 'd6).orderBy('p1.desc)
val grank = dense_rank().over(rankSpec)
val cubed = cc.withColumn("rank", grank)
When I do:
cubed.filter('d1.isNull && 'd2.isNull && 'd3.isNull && 'd4.isNull &&
'd5.isNull && 'd6.isNull && 'd7.isNotNull).sort('rank).show
I see that the first ranks are 3, 5, 9, 10, 11, 12, 13, 15, ...
It seems they become denser at higher ranks.
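One thing I suspect is that several cube grouping sets end up in the same window partition: rows where 'd1..'d6 are NULL include not just the (d7) grouping set I want, but also the grand-total row and any rows whose dimensions are genuinely NULL in the raw data. Those rows consume ranks inside the partition, and my filter afterwards discards them, leaving gaps. A sketch of a possible workaround (assuming a Spark version where grouping_id is available, 2.0+; the "gid" column name is mine) that keeps each grouping set in its own partition:

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Carry the grouping id of each cube row alongside the measures, so rows
// produced by different grouping sets can be told apart from real NULLs.
val cc = df.cube('d1, 'd2, 'd3, 'd4, 'd5, 'd6, 'd7)
  .agg(sum('p1).as("p1"), sum('p2).as("p2"), grouping_id().as("gid"))
  .cache

// Including gid in the partition keeps rows from other grouping sets
// (e.g. the grand total) out of the ranking for the (d7) slice.
val rankSpec = Window.partitionBy('gid, 'd1, 'd2, 'd3, 'd4, 'd5, 'd6).orderBy('p1.desc)
val cubed = cc.withColumn("rank", dense_rank().over(rankSpec))
```

On versions without grouping_id, applying the filter on the dimension columns before computing the rank should have the same effect.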
Any idea?
Thanks
Tal
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/dense-rank-skips-ranks-on-cube-tp27196.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org