Posted to issues@spark.apache.org by "Asif Khan (JIRA)" <ji...@apache.org> on 2018/09/22 14:41:00 UTC

[jira] [Created] (SPARK-25512) Using RowNumbers in SparkR Dataframe

Asif Khan created SPARK-25512:
---------------------------------

             Summary: Using RowNumbers in SparkR Dataframe
                 Key: SPARK-25512
                 URL: https://issues.apache.org/jira/browse/SPARK-25512
             Project: Spark
          Issue Type: Bug
          Components: SparkR
    Affects Versions: 2.3.1
            Reporter: Asif Khan


Hi,

I have a use case where I have a SparkR DataFrame and I want to iterate over it in a for loop using the row numbers of the DataFrame. Is that possible?
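For example, something like the sketch below is what I am after. This is only a sketch: createDataFrame(faithful) is just a stand-in for my real data, and as far as I understand, an unpartitioned window like this forces all rows into a single partition, so it is not a real fix for large data.

    library(SparkR)
    sparkR.session()

    df <- createDataFrame(faithful)  # stand-in dataset, not my real data

    # Attach a global row number via a window function. Note: an
    # unpartitioned window pulls all rows into one partition, which
    # has its own performance cost on large data.
    ws <- windowOrderBy(df$eruptions)
    df_rn <- withColumn(df, "row_num", over(row_number(), ws))
    head(df_rn)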

The only solution I have now is to collect() the SparkR DataFrame into an R data.frame, which brings the entire dataset onto the driver node, and then iterate over it there using row numbers. But since the for loop then executes only on the driver, I lose the parallel processing that was the whole point of using Spark. Please help.
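Roughly what I do today is shown below, along with the only parallel alternative I have found so far, dapply(), which seems to apply an R function to each partition of the DataFrame. Again a sketch only, using the faithful dataset as a stand-in, and untested against my real workload:

    library(SparkR)
    sparkR.session()

    df <- createDataFrame(faithful)  # stand-in dataset, not my real data

    # Current workaround: collect() pulls everything to the driver,
    # then an ordinary R for loop walks the rows serially.
    local_df <- collect(df)
    for (i in seq_len(nrow(local_df))) {
      row <- local_df[i, ]
      # ... per-row work runs on the driver only ...
    }

    # Possible alternative: dapply() runs an R function on each
    # partition, so the per-row work stays on the executors.
    schema <- structType(structField("eruptions", "double"),
                         structField("waiting", "double"))
    result <- dapply(df, function(part) {
      # 'part' is a plain R data.frame holding one partition
      for (i in seq_len(nrow(part))) {
        # ... per-row work, executed in parallel across partitions ...
      }
      part
    }, schema)
    head(collect(result))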

Thank you,

Asif Khan


