Posted to dev@spark.apache.org by "Ulanov, Alexander" <al...@hpe.com> on 2015/09/14 19:50:29 UTC

Data frame with one column

Dear Spark developers,

I would like to create a DataFrame with one column. However, the createDataFrame overloads require the element type to be a Product:

val data = Seq(1.0, 2.0)
val rdd = sc.parallelize(data, 2)
val df = sqlContext.createDataFrame(rdd)
[fail]<console>:25: error: overloaded method value createDataFrame with alternatives:
 [A <: Product](data: Seq[A])(implicit evidence$2: reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame <and>
  [A <: Product](rdd: org.apache.spark.rdd.RDD[A])(implicit evidence$1: reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame
cannot be applied to (org.apache.spark.rdd.RDD[Double])
       val df = sqlContext.createDataFrame(rdd)

If I zip the RDD with an index, it works:
val df = sqlContext.createDataFrame(rdd.zipWithIndex)
[success]df: org.apache.spark.sql.DataFrame = [_1: double, _2: bigint]
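The extra bigint index column can then be dropped with a select, for example (the column name "value" below is just illustrative):

// Keep only the original values and rename the generated _1 column
val singleCol = df.select("_1").toDF("value")
// singleCol: org.apache.spark.sql.DataFrame = [value: double]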

Wrapping the values in a case class also works:
case class Hack(x: Double)
val caseRDD = rdd.map(x => Hack(x))
val df = sqlContext.createDataFrame(caseRDD)
[success]df: org.apache.spark.sql.DataFrame = [x: double]

What is the recommended way of creating a DataFrame with one column?

Best regards, Alexander

RE: Data frame with one column

Posted by "Ulanov, Alexander" <al...@hpe.com>.
Thank you for the quick response! I’ll use Tuple1.

Re: Data frame with one column

Posted by Feynman Liang <fl...@databricks.com>.
For an example, see the ml-feature word2vec user guide: <https://spark.apache.org/docs/latest/ml-features.html#word2vec>
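
The guide builds its single-column input the same way; a minimal sketch of that pattern (names here are illustrative, see the linked page for the exact example):

val documentDF = sqlContext.createDataFrame(Seq(
  Tuple1("Hi I heard about Spark".split(" ")),
  Tuple1("I wish Java could use case classes".split(" "))
)).toDF("text")
// documentDF has a single column "text" of type array<string>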

Re: Data frame with one column

Posted by Feynman Liang <fl...@databricks.com>.
You could use `Tuple1(x)` instead of `Hack`
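
Applied to the RDD from the original question, a minimal sketch of that suggestion (the column name "x" is just illustrative):

val data = Seq(1.0, 2.0)
val rdd = sc.parallelize(data, 2)
// Tuple1 is a Product, so the Product-bounded createDataFrame overload applies
val df = sqlContext.createDataFrame(rdd.map(x => Tuple1(x))).toDF("x")
// df: org.apache.spark.sql.DataFrame = [x: double]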
