Posted to user@spark.apache.org by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/02/02 02:51:05 UTC

how to convert millisecond time to SQL timeStamp

What little I know about working with timestamps is based on
https://databricks.com/blog/2015/09/16/spark-1-5-dataframe-api-highlights-datetimestring-handling-time-intervals-and-udafs.html

Using the example of dates formatted into human-friendly strings -> timestamps,
I was able to figure out how to convert epoch times (in seconds) to timestamps.
The same trick did not work for millisecond times.

Any suggestions would be greatly appreciated.


Andy

Working with epoch times
<http://localhost:8888/notebooks/sparkTimeSeriesExperiments.ipynb#Working-with-epock-times>
ref: http://www.epochconverter.com/
Epoch timestamp:  1456050620
Timestamp in milliseconds: 1456050620000
Human time (GMT): Sun, 21 Feb 2016 10:30:20 GMT
Human time (your time zone): 2/21/2016, 2:30:20 AM

# Epoch time stamp example
data = [
  ("1456050620", "1456050621", 1),
  ("1456050622", "14560506203", 2),
  ("14560506204", "14560506205", 3)]
df = sqlContext.createDataFrame(data, ["start_time", "end_time", "id"])

# convert epoch time strings into longs
df = df.select(
  df.start_time.cast("long").alias("start_time"),
  df.end_time.cast("long").alias("end_time"),
  df.id)
df.printSchema()
df.show(truncate=False)

# convert longs to timestamps
df = df.select(
  df.start_time.cast("timestamp").alias("start_time"),
  df.end_time.cast("timestamp").alias("end_time"),
  df.id)
df.printSchema()
df.show(truncate=False)
root
 |-- start_time: long (nullable = true)
 |-- end_time: long (nullable = true)
 |-- id: long (nullable = true)

+-----------+-----------+---+
|start_time |end_time   |id |
+-----------+-----------+---+
|1456050620 |1456050621 |1  |
|1456050622 |14560506203|2  |
|14560506204|14560506205|3  |
+-----------+-----------+---+

root
 |-- start_time: timestamp (nullable = true)
 |-- end_time: timestamp (nullable = true)
 |-- id: long (nullable = true)

+---------------------+---------------------+---+
|start_time           |end_time             |id |
+---------------------+---------------------+---+
|2016-02-21 02:30:20.0|2016-02-21 02:30:21.0|1  |
|2016-02-21 02:30:22.0|2431-05-28 02:03:23.0|2  |
|2431-05-28 02:03:24.0|2431-05-28 02:03:25.0|3  |
+---------------------+---------------------+---+

# working with millisecond times
data = [
  ("1456050620000", "1456050620000", 1)]
  
df = sqlContext.createDataFrame(data, ["start_time", "end_time", "id"])

# convert epoch time strings into longs
df = df.select(
  df.start_time.cast("long").alias("start_time"),
  df.end_time.cast("long").alias("end_time"),
  df.id)
df.printSchema()
df.show(truncate=False)

# convert longs to timestamps
df = df.select(
  df.start_time.cast("timestamp").alias("start_time"),
  df.end_time.cast("timestamp").alias("end_time"),
  df.id)
df.printSchema()
df.show(truncate=False)
root
 |-- start_time: long (nullable = true)
 |-- end_time: long (nullable = true)
 |-- id: long (nullable = true)

+-------------+-------------+---+
|start_time   |end_time     |id |
+-------------+-------------+---+
|1456050620000|1456050620000|1  |
+-------------+-------------+---+

root
 |-- start_time: timestamp (nullable = true)
 |-- end_time: timestamp (nullable = true)
 |-- id: long (nullable = true)

+----------------------+----------------------+---+
|start_time            |end_time              |id |
+----------------------+----------------------+---+
|48110-05-29 10:33:20.0|48110-05-29 10:33:20.0|1  |
+----------------------+----------------------+---+
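The overshoot is consistent with the cast reading the long as seconds since
1970-01-01 rather than milliseconds, so the value is 1000x too large. A quick
sanity check in plain Python (just arithmetic, not Spark code):

# 1456050620000 interpreted as SECONDS instead of milliseconds
seconds = 1456050620000
years_past_1970 = seconds / (365.2425 * 24 * 3600)  # average Gregorian year in seconds
print(1970 + int(years_past_1970))  # ~48110, matching the timestamp shown above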




Re: how to convert millisecond time to SQL timeStamp

Posted by VISHNU SUBRAMANIAN <jo...@gmail.com>.
Hi,

If you need a DataFrame-specific solution, you can try the following:

df.select(from_unixtime(col("max(utcTimestamp)")/1000))
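For example, here is a minimal sketch applying that idea to the columns from the
original post (the column names and the trailing cast to timestamp are my
additions; note that from_unixtime truncates to whole seconds):

from pyspark.sql.functions import col, from_unixtime

data = [("1456050620000", "1456050620000", 1)]
df = sqlContext.createDataFrame(data, ["start_time", "end_time", "id"])

# divide the millisecond value by 1000 so from_unixtime sees seconds,
# then cast the formatted string back into a real timestamp column
df = df.select(
  from_unixtime(col("start_time") / 1000).cast("timestamp").alias("start_time"),
  from_unixtime(col("end_time") / 1000).cast("timestamp").alias("end_time"),
  df.id)
df.printSchema()
df.show(truncate=False)

If the sub-second part matters, (col("start_time").cast("double") / 1000).cast("timestamp")
should keep the millisecond fraction, since a double cast to timestamp is also
interpreted as seconds.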


Re: how to convert millisecond time to SQL timeStamp

Posted by Ted Yu <yu...@gmail.com>.
See related thread on using Joda DateTime:
http://search-hadoop.com/m/q3RTtSfi342nveex1&subj=RE+NPE+when+using+Joda+DateTime


Re: how to convert millisecond time to SQL timeStamp

Posted by Kevin Mellott <ke...@gmail.com>.
I've had pretty good success using Joda-Time
<http://www.joda.org/joda-time/index.html> for date/time manipulations
within Spark applications. You may be able to use the *DateTime* constructor
below, if you are starting with milliseconds.

DateTime

public DateTime(long instant)

Constructs an instance set to the milliseconds from 1970-01-01T00:00:00Z
using ISOChronology in the default time zone.
Parameters: instant - the milliseconds from 1970-01-01T00:00:00Z
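
If you would rather stay in PySpark, a rough analogue of the same idea (not
Joda-Time itself) is to build the timestamp from the milliseconds inside a UDF;
the data below just reuses the millisecond value from the original post:

from datetime import datetime
from pyspark.sql.functions import udf
from pyspark.sql.types import TimestampType

# utcfromtimestamp expects seconds, so divide the milliseconds by 1000.0
millis_to_ts = udf(
    lambda ms: datetime.utcfromtimestamp(int(ms) / 1000.0) if ms is not None else None,
    TimestampType())

df = sqlContext.createDataFrame([("1456050620000", 1)], ["start_time", "id"])
df = df.withColumn("start_time", millis_to_ts(df.start_time))
df.show(truncate=False)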
