Posted to user@spark.apache.org by David Hodefi <da...@gmail.com> on 2017/11/09 11:05:49 UTC

Spark SQL - Truncate Day / Hour

I would like to truncate a date to its day or hour. Currently it is only
possible to truncate to MONTH or YEAR.
1. How can I achieve that?
2. Is there any pull request about this issue?
3. If there is no open pull request about this issue, what are the
implications that I should be aware of when coding/contributing it as a
pull request?

Last question: looking at the DateTimeUtils class code, it seems the
implementation does not use any open-source library for handling dates, e.g.
Apache Commons. Why implement it instead of reusing open source?

Thanks, David

Re: Spark SQL - Truncate Day / Hour

Posted by Eike von Seggern <ei...@sevenval.com>.
Hi,

You can truncate datetimes like this (in PySpark), e.g. to 5 minutes:

import pyspark.sql.functions as F
df.select((F.floor(F.col('myDateColumn').cast('long') / 300) *
300).cast('timestamp'))
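
To make the arithmetic in the snippet above concrete, here is a minimal plain-Python sketch of the same floor-to-a-multiple idea, with no Spark involved. The helper name truncate_epoch and the sample timestamp are illustrative assumptions, not part of any Spark API.

```python
from datetime import datetime, timezone


def truncate_epoch(ts: datetime, width_seconds: int) -> datetime:
    """Floor a timezone-aware datetime to a multiple of width_seconds since the Unix epoch."""
    epoch_seconds = int(ts.timestamp())
    # Integer division drops the remainder, which is the "floor(x / 300) * 300" step.
    floored = (epoch_seconds // width_seconds) * width_seconds
    return datetime.fromtimestamp(floored, tz=timezone.utc)


sample = datetime(2017, 11, 9, 11, 5, 49, tzinfo=timezone.utc)

five_min = truncate_epoch(sample, 300)    # 5-minute bucket
hour = truncate_epoch(sample, 3600)       # hour bucket
day = truncate_epoch(sample, 86400)       # day bucket, aligned to UTC midnight
```

The same pattern with 3600 or 86400 in place of 300 should give hour or day buckets. Note that a day bucket computed this way starts at UTC midnight, which may not match midnight in the session time zone.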

Best,
Eike

David Hodefi <da...@gmail.com> wrote on Mon, 13 Nov 2017 at 12:27:

> I am familiar with those functions; none of them actually truncates a
> date. We can use them to help implement a truncate method. I think
> truncating to a day or hour should be as simple as truncate(..., "DD") or
> truncate(..., "HH").

Re: Spark SQL - Truncate Day / Hour

Posted by David Hodefi <da...@gmail.com>.
I am familiar with those functions; none of them actually truncates a
date. We can use them to help implement a truncate method. I think
truncating to a day or hour should be as simple as truncate(..., "DD") or
truncate(..., "HH").
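
As a rough illustration of the semantics being asked for, the following plain-Python sketch resets every field below the requested unit. The function name truncate and the format codes are hypothetical, modelled on the "DD" / "HH" strings suggested above; this is not Spark code. As far as I know, Spark releases from 2.3.0 onward provide a date_trunc function for timestamps that covers this case.

```python
from datetime import datetime

# Hypothetical format codes, modelled on the "DD" / "HH" strings suggested above.
# Each entry lists the fields that are reset to their smallest value.
_FIELDS_TO_RESET = {
    "YEAR": dict(month=1, day=1, hour=0, minute=0, second=0, microsecond=0),
    "MONTH": dict(day=1, hour=0, minute=0, second=0, microsecond=0),
    "DD": dict(hour=0, minute=0, second=0, microsecond=0),
    "HH": dict(minute=0, second=0, microsecond=0),
}


def truncate(ts: datetime, fmt: str) -> datetime:
    """Return ts with every field below the unit named by fmt reset."""
    try:
        reset = _FIELDS_TO_RESET[fmt.upper()]
    except KeyError:
        raise ValueError(f"unsupported truncation format: {fmt!r}")
    return ts.replace(**reset)


sample = datetime(2017, 11, 9, 11, 5, 49)
by_day = truncate(sample, "DD")
by_hour = truncate(sample, "HH")
by_month = truncate(sample, "month")
```

Because this works on calendar fields rather than on epoch seconds, the result follows the wall-clock values of the input instead of UTC boundaries.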

On Thu, Nov 9, 2017 at 8:23 PM, Gaspar Muñoz <gm...@datiobd.com> wrote:

> There are functions for day (called dayofmonth and dayofyear) and hour
> (called hour). You can view them here:
> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions
>
> Example:
>
> import org.apache.spark.sql.functions._
> // $"..." needs import spark.implicits._ (assuming a SparkSession named spark)
> val result = df.select(hour($"myDateColumn"), dayofmonth($"myDateColumn"),
>   dayofyear($"myDateColumn"))

Re: Spark SQL - Truncate Day / Hour

Posted by Gaspar Muñoz <gm...@datiobd.com>.
There are functions for day (called dayofmonth and dayofyear) and hour
(called hour). You can view them here:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions

Example:

import org.apache.spark.sql.functions._
// $"..." needs import spark.implicits._ (assuming a SparkSession named spark)
val result = df.select(hour($"myDateColumn"), dayofmonth($"myDateColumn"),
  dayofyear($"myDateColumn"))
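
For readers comparing the two approaches in this thread, here is a small plain-Python sketch of the difference between extracting a field and truncating a timestamp. The variable names and the sample value are illustrative assumptions; Spark is not involved.

```python
from datetime import datetime

sample = datetime(2017, 11, 9, 11, 5, 49)

# Extraction: each result is a bare integer, and the rest of the timestamp is lost.
hour_of_day = sample.hour
day_of_month = sample.day
day_of_year = sample.timetuple().tm_yday

# Truncation: the result is still a full timestamp, with the smaller fields reset.
start_of_hour = sample.replace(minute=0, second=0, microsecond=0)
```

Extracted fields are handy for grouping by hour of day, while a truncated timestamp still identifies the specific hour on the specific date.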

-- 
Gaspar Muñoz Soria

Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473