Posted to user@spark.apache.org by Dan Bikle <bi...@gmail.com> on 2016/09/25 11:57:19 UTC
In Spark-Scala, how to copy Array of Lists into new DataFrame?
Hello World,
I am familiar with Python and I am learning Spark-Scala.
I want to build a DataFrame whose structure is described by this syntax:
// Prepare training data from a list of (label, features) tuples.
val training = spark.createDataFrame(Seq(
  (1.1, Vectors.dense(1.1, 0.1)),
  (0.2, Vectors.dense(1.0, -1.0)),
  (3.0, Vectors.dense(1.3, 1.0)),
  (1.0, Vectors.dense(1.2, -0.5))
)).toDF("label", "features")
I got the above syntax from this URL:
http://spark.apache.org/docs/latest/ml-pipeline.html
Currently my data is in an array which I pulled out of a DataFrame:
val my_a = gspc17_df.collect().map { row =>
  Seq(row(2), Vectors.dense(row(3).asInstanceOf[Double], row(4).asInstanceOf[Double]))
}
The structure of my array is very similar to the above DF:
my_a: Array[Seq[Any]] = Array(
  List(-1.4830674013266898, [-0.004192832940431825,-0.003170667657263393]),
  List(-0.05876766500768526, [-0.008462913654529357,-0.006880595828929472]),
  List(1.0109273250546658, [-3.1816797620416693E-4,-0.006502619326182358]))
How do I copy the data from my array into a DataFrame with the above
structure?
I tried this syntax:
val my_df = spark.createDataFrame(my_a).toDF("label","features")
Spark barked at me:
<console>:105: error: inferred type arguments [Seq[Any]] do not conform to
method createDataFrame's type parameter bounds [A <: Product]
       val my_df = spark.createDataFrame(my_a).toDF("label","features")
<console>:105: error: type mismatch;
 found   : scala.collection.mutable.WrappedArray[Seq[Any]]
 required: Seq[A]
       val my_df = spark.createDataFrame(my_a).toDF("label","features")
Re: In Spark-Scala, how to copy Array of Lists into new DataFrame?
Posted by Marco Mistroni <mm...@gmail.com>.
Hi,
In fact I just found some written notes in my code. See if this doc helps
you (it works with later Spark versions as well, not only 1.3.0):
https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#creating-dataframes
hth
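For reference, the pattern that section of the guide describes is reflection-based schema inference from an RDD of case classes. The sketch below is illustrative only: the case class name, its fields, and the sample values are made up here, not taken from the guide.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.{DataFrame, SQLContext}

// Illustrative case class: Spark infers the column names and types
// from its fields, and case classes satisfy the A <: Product bound
// that createDataFrame/toDF require.
case class Point(label: Double, x: Double, y: Double)

def makeDf(sc: SparkContext, sqlContext: SQLContext): DataFrame = {
  import sqlContext.implicits._  // enables .toDF() on RDDs of Products
  sc.parallelize(Seq(Point(1.1, 0.1, 0.2), Point(0.2, 1.0, -1.0))).toDF()
}
```

The resulting DataFrame has columns `label`, `x`, and `y`, named after the case class fields.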
Re: In Spark-Scala, how to copy Array of Lists into new DataFrame?
Posted by Marco Mistroni <mm...@gmail.com>.
Hi,
I must admit I had issues as well finding a sample that does that
(hopefully the Spark folks can add more examples, or someone on the list
can post sample code?).
Hopefully you can reuse the sample below. You start from an RDD whose
elements are sequences of Doubles (myRdd).
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// make a Row out of each element
val toRddOfRows = myRdd.map(doubleValues => Row.fromSeq(doubleValues))
// then you can either call toDF directly and Spark will build a schema
// for you. Beware: you will need the SQL implicits in scope
// (import sqlContext.implicits._)
val df = toRddOfRows.toDF()
// or you can create a schema yourself
def createSchema(row: Row) = {
  val first = row.toSeq
  val firstWithIdx = first.zipWithIndex
  val fields = firstWithIdx.map(tpl =>
    StructField("Col" + tpl._2, DoubleType, nullable = false))
  StructType(fields)
}
val mySchema = createSchema(toRddOfRows.first())

// returning a DataFrame
val mydf = sqlContext.createDataFrame(toRddOfRows, mySchema)
hth
You need to define a schema to make a DataFrame out of your list... check
the Spark docs on how to make a DataFrame, or some of the machine learning
examples.
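As one concrete illustration of that advice applied to the original post (a sketch only: `gspc17_df` and the column indices 2-4 come from the question, and I'm assuming Spark 2.x with org.apache.spark.ml.linalg.Vectors), you can skip the Seq[Any] intermediate and map each Row straight to a typed (Double, Vector) tuple. Tuples are Products, so they satisfy the A <: Product bound that the error message complains about:

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.{DataFrame, SparkSession}

// Tuples (unlike Seq[Any]) extend Product, so createDataFrame can
// infer the schema: a Double "label" column and a Vector "features" column.
def toTrainingDf(spark: SparkSession, gspc17_df: DataFrame): DataFrame = {
  val tuples = gspc17_df.collect().map { row =>
    (row(2).asInstanceOf[Double],
     Vectors.dense(row(3).asInstanceOf[Double], row(4).asInstanceOf[Double]))
  }.toSeq
  spark.createDataFrame(tuples).toDF("label", "features")
}
```

Note that collect() pulls everything to the driver; for large data you would keep it distributed, but for a small training set like the one in the question this should be fine.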