Posted to user@spark.apache.org by talgr <ta...@gmail.com> on 2016/06/20 14:00:28 UTC

dense_rank skips ranks on cube

I have a DataFrame with 7 dimension columns, and I built a cube on them:

val cube = df.cube('d1,'d2,'d3,'d4,'d5,'d6,'d7)
val cc = cube.agg(sum('p1).as("p1"),sum('p2).as("p2")).cache

and then defined a rank function on a window:

val rankSpec = Window.partitionBy('d1,'d2,'d3,'d4,'d5,'d6).orderBy('p1.desc)
val grank = dense_rank().over(rankSpec)
val cubed = cc.withColumn("rank", grank)

When I do:
cubed.filter('d1.isNull && 'd2.isNull && 'd3.isNull && 'd4.isNull &&
'd5.isNull && 'd6.isNull && 'd7.isNotNull).sort('rank).show

I see that the first ranks are 3, 5, 9, 10, 11, 12, 13, 15, ...

It seems that the ranks become denser (fewer gaps) at higher values.
Any idea?
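For what it's worth, dense_rank by definition never skips values within a partition, so gaps like these in a filtered result would suggest the window partition contains additional rows that the later filter removes (for a cube, possibly rows from other grouping sets that share the same null pattern on d1..d6). A minimal plain-Python sketch of that hypothesis (not Spark code; the row labels and values are made up for illustration):

```python
def dense_rank_desc(rows, key):
    """Assign dense ranks over rows sorted descending by `key`:
    equal keys share a rank, and ranks are consecutive -- no gaps."""
    ranked, rank, prev = [], 0, object()
    for row in sorted(rows, key=key, reverse=True):
        k = key(row)
        if k != prev:          # new key value -> next consecutive rank
            rank += 1
            prev = k
        ranked.append((row, rank))
    return ranked

# One window partition that mixes two kinds of rows (standing in for
# cube rows from different grouping sets landing in the same partition).
partition = [
    ("keep", 90), ("other", 80), ("keep", 70),
    ("other", 60), ("other", 50), ("keep", 40),
]
ranked = dense_rank_desc(partition, key=lambda r: r[1])

# Over the whole partition the ranks are consecutive: 1..6.
# Filtering to one kind of row *after* ranking exposes gaps: 1, 3, 6.
print([rank for (row, rank) in ranked if row[0] == "keep"])  # [1, 3, 6]
```

So the skipped ranks would be held by rows that were inside the partition at ranking time but excluded by the filter afterwards.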

Thanks
Tal



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/dense-rank-skips-ranks-on-cube-tp27196.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org