Posted to user@spark.apache.org by satyajit vegesna <sa...@gmail.com> on 2017/12/08 04:25:16 UTC

RDD[internalRow] -> DataSet

Hi All,

Is there a way to convert an RDD[InternalRow] to a Dataset from outside the
Spark SQL package?

Regards,
Satyajit.

Re: RDD[internalRow] -> DataSet

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi Satyajit,

That's exactly what Dataset.rdd does -->
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala?utf8=%E2%9C%93#L2916-L2921

Pozdrawiam,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Fri, Dec 8, 2017 at 5:25 AM, satyajit vegesna <satyajit.apasprk@gmail.com> wrote:

> Hi All,
>
> Is there a way to convert RDD[internalRow] to Dataset , from outside spark
> sql package.
>
> Regards,
> Satyajit.
>

Re: RDD[internalRow] -> DataSet

Posted by Vadim Semenov <va...@datadoghq.com>.
Not directly possible, but you can add your own object under Spark's
`org.apache.spark.sql` package in your project, which gives you access to
its package-private methods:

package org.apache.spark.sql

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.LogicalRDD
import org.apache.spark.sql.types.StructType

object DataFrameUtil {
  /**
    * Creates a DataFrame out of an RDD[InternalRow] that you can get
    * using `df.queryExecution.toRdd`.
    */
  def createFromInternalRows(sparkSession: SparkSession, schema: StructType, rdd: RDD[InternalRow]): DataFrame = {
    val logicalPlan = LogicalRDD(schema.toAttributes, rdd)(sparkSession)
    Dataset.ofRows(sparkSession, logicalPlan)
  }
}
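
For reference, a hypothetical usage sketch (assuming a local SparkSession named `spark` and an existing DataFrame `df`; `DataFrameUtil` is the helper object above, not part of Spark's public API):

```scala
import org.apache.spark.sql.{DataFrameUtil, SparkSession}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val df = spark.range(10).toDF("id")

// Drop down to the internal row representation.
val internalRdd = df.queryExecution.toRdd

// Rebuild a DataFrame from the internal rows using the helper above.
val df2 = DataFrameUtil.createFromInternalRows(spark, df.schema, internalRdd)
df2.show()
```

Note that this relies on Spark internals (`LogicalRDD`, `Dataset.ofRows`), which are package-private and may change between Spark versions.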
