Posted to user@spark.apache.org by satyajit vegesna <sa...@gmail.com> on 2017/12/08 04:25:16 UTC
RDD[internalRow] -> DataSet
Hi All,
Is there a way to convert an RDD[InternalRow] to a Dataset from outside the
Spark SQL package?
Regards,
Satyajit.
Re: RDD[internalRow] -> DataSet
Posted by Jacek Laskowski <ja...@japila.pl>.
Hi Satyajit,
That's exactly what Dataset.rdd does -->
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala?utf8=%E2%9C%93#L2916-L2921
Pozdrawiam,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
On Fri, Dec 8, 2017 at 5:25 AM, satyajit vegesna <satyajit.apasprk@gmail.com> wrote:
> Hi All,
>
> Is there a way to convert an RDD[InternalRow] to a Dataset from outside the
> Spark SQL package?
>
> Regards,
> Satyajit.
>
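A minimal sketch of what the linked code path refers to (assuming a running SparkSession; the variable names are illustrative): `df.queryExecution.toRdd` exposes the raw RDD[InternalRow] behind a Dataset, while `Dataset.rdd` goes the other way and deserializes each InternalRow back to the user-facing type.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.InternalRow

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("internal-rows")
  .getOrCreate()

val df = spark.range(3).toDF("id")

// queryExecution.toRdd yields the internal, untranslated representation;
// this is the RDD[InternalRow] the original question asks about.
val internal: RDD[InternalRow] = df.queryExecution.toRdd
println(internal.count())
```

Note that the InternalRow objects produced this way may be reused across records by Spark, so call `copy()` on each row before collecting or caching them.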
Re: RDD[internalRow] -> DataSet
Posted by Vadim Semenov <va...@datadoghq.com>.
Not possible directly, but you can add your own object in your project under
Spark's package, which gives you access to the package-private methods:
package org.apache.spark.sql

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.LogicalRDD
import org.apache.spark.sql.types.StructType

object DataFrameUtil {
  /**
   * Creates a DataFrame out of the RDD[InternalRow] that you can get
   * using `df.queryExecution.toRdd`.
   */
  def createFromInternalRows(sparkSession: SparkSession, schema: StructType,
      rdd: RDD[InternalRow]): DataFrame = {
    val logicalPlan = LogicalRDD(schema.toAttributes, rdd)(sparkSession)
    Dataset.ofRows(sparkSession, logicalPlan)
  }
}
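A usage sketch for the DataFrameUtil helper above (assuming it has been compiled into your project under the org.apache.spark.sql package; the session setup and variable names are illustrative):

```scala
import org.apache.spark.sql.{DataFrame, DataFrameUtil, SparkSession}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("from-internal-rows")
  .getOrCreate()

val source: DataFrame = spark.range(5).toDF("id")

// Grab the untranslated internal rows and rebuild a DataFrame around them,
// reusing the source's schema so the attributes line up.
val rebuilt: DataFrame = DataFrameUtil.createFromInternalRows(
  spark, source.schema, source.queryExecution.toRdd)

rebuilt.show()
```

Since the helper lives inside Spark's own package, it can call the package-private `Dataset.ofRows`; the trade-off is that it may break across Spark versions, as LogicalRDD is internal API with no compatibility guarantees.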