Posted to user@spark.apache.org by infa elance <in...@gmail.com> on 2017/04/14 15:19:33 UTC

PySpark row_number Question

Hi All,
I am trying to understand how row_number is applied in the code below. Does
Spark first load the data into a DataFrame and then apply the row_number
function, or does it apply it while reading from Hive?

from pyspark.sql import HiveContext

# sc is the existing SparkContext (e.g. the one provided by the pyspark shell)
hiveContext = HiveContext(sc)
df = hiveContext.sql("""
    SELECT column1, column2, column3,
           ROW_NUMBER() OVER (ORDER BY columnname) AS RowNum
    FROM tablename
""")
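As a rough mental model (a sketch, not Spark's actual implementation): a window function such as ROW_NUMBER() OVER (ORDER BY columnname) is evaluated over rows that have already been read. In Spark the physical plan typically shows a scan of the Hive table followed by an exchange/sort and a separate Window operator, which can be checked by calling df.explain() on the DataFrame returned above. The pure-Python function below only mimics what that Window step computes once all rows are available. The function name row_number_over and the sample rows are hypothetical.

```python
from operator import itemgetter


def row_number_over(rows, order_key):
    """Mimic ROW_NUMBER() OVER (ORDER BY order_key) on a list of dicts.

    The whole input must be available before numbering starts: rows are
    sorted by the ordering column, then numbered 1, 2, 3, ...
    Ties keep their input order here (Python's sort is stable); Spark makes
    no such guarantee for tied rows.
    """
    ordered = sorted(rows, key=itemgetter(order_key))
    # dict(row, RowNum=i) copies the row and adds the new column
    return [dict(row, RowNum=i) for i, row in enumerate(ordered, start=1)]


# Hypothetical sample rows standing in for tablename
rows = [
    {"column1": "c", "columnname": 30},
    {"column1": "a", "columnname": 10},
    {"column1": "b", "columnname": 20},
]

for r in row_number_over(rows, "columnname"):
    print(r["column1"], r["RowNum"])  # a 1, b 2, c 3
```

Because the OVER clause has no PARTITION BY, a single global ordering is needed, so Spark moves all rows into one partition for this step and logs a warning that this can hurt performance on large tables.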

Appreciate any guidance.