Posted to user@spark.apache.org by Joanne Contact <jo...@gmail.com> on 2015/07/31 23:50:24 UTC

How to convert a sequence of Timestamps to a DataFrame

Hi Guys,

I have struggled for a while with this seemingly simple thing:

I have a sequence of timestamps and want to create a DataFrame with one column.

Seq[java.sql.Timestamp]

//import collection.breakOut

var seqTimestamp = scala.collection.Seq(listTs:_*)

seqTimestamp: Seq[java.sql.Timestamp] = List(2015-07-22 16:52:00.0,
2015-07-22 16:53:00.0, ....., )

I have tried many ways to create a DataFrame; below is one more failed attempt:

import sqlContext.implicits._
var rddTs = sc.parallelize(seqTimestamp)
rddTs.toDF("minInterval")

<console>:108: error: value toDF is not a member of org.apache.spark.rdd.RDD[java.sql.Timestamp]
       rddTs.toDF("minInterval")
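[Editor's note: the implicit toDF in Spark 1.x only covers RDDs of case classes, tuples, and a handful of primitive types, so an RDD[java.sql.Timestamp] is not converted. A minimal sketch of one workaround that avoids the implicits entirely, using createDataFrame with an explicit schema; it assumes the sc and sqlContext of a spark-shell session and the seqTimestamp built above.]

```scala
import java.sql.Timestamp
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructField, StructType, TimestampType}

// Wrap each Timestamp in a Row and pair the RDD with an explicit schema,
// sidestepping the toDF implicits entirely.
val rowRdd = sc.parallelize(seqTimestamp.map(ts => Row(ts)))
val schema = StructType(Seq(StructField("minInterval", TimestampType, nullable = true)))
val df = sqlContext.createDataFrame(rowRdd, schema)

df.printSchema()
```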

So, could any guru please tell me how to do this?

I am not familiar with Scala or Spark. I wonder if learning Scala would
help here at all? It just sounds like a lot of trial and error and
googling.

Docs like
https://spark.apache.org/docs/1.3.0/api/java/org/apache/spark/sql/DataFrame.html
https://spark.apache.org/docs/1.3.0/api/java/org/apache/spark/sql/SQLContext.html#createDataFrame(scala.collection.Seq,scala.reflect.api.TypeTags.TypeTag)
do not help.

Btw, I am using Spark 1.4.

Thanks in advance,

J

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: How to convert a sequence of Timestamps to a DataFrame

Posted by Michael Armbrust <mi...@databricks.com>.
In general it needs to be a Seq of Tuples for the implicit toDF to work
(which is a little tricky when there is only one column).

scala> Seq(Tuple1(new java.sql.Timestamp(System.currentTimeMillis))).toDF("a")
res3: org.apache.spark.sql.DataFrame = [a: timestamp]

or with multiple columns

scala> Seq(("1", new java.sql.Timestamp(System.currentTimeMillis))).toDF("a", "b")
res4: org.apache.spark.sql.DataFrame = [a: string, b: timestamp]
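[Editor's note: applied to the asker's seqTimestamp, the fix is to wrap each value in Tuple1 before calling toDF. A minimal sketch, assuming the spark-shell sc/sqlContext and the seqTimestamp from the original post.]

```scala
import sqlContext.implicits._

// Seq[Tuple1[Timestamp]] is a Seq of Products, so the implicit toDF applies.
val df = seqTimestamp.map(Tuple1(_)).toDF("minInterval")

// The same trick works when starting from the RDD, as in the failed attempt:
val df2 = sc.parallelize(seqTimestamp).map(Tuple1(_)).toDF("minInterval")
```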


Re: How to convert a sequence of Timestamps to a DataFrame

Posted by Ted Yu <yu...@gmail.com>.
Please take a look at stringToTimestamp() in
./sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala

Representing the timestamp as a long should work.
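[Editor's note: a sketch of that idea — store epoch milliseconds as a Long column, which the implicits handle directly, then cast back to a timestamp in SQL. It assumes the spark-shell sqlContext and the seqTimestamp from the original post; the numeric-to-timestamp cast interprets the value as seconds, hence the division.]

```scala
import sqlContext.implicits._

// Longs are supported by the Seq-of-tuples implicit, unlike java.sql.Timestamp.
val millis: Seq[Long] = seqTimestamp.map(_.getTime)
val df = millis.map(Tuple1(_)).toDF("millis")
  .selectExpr("CAST(millis / 1000 AS timestamp) AS minInterval")
```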

Cheers
