Posted to dev@spark.apache.org by Enrico Minack <ma...@Enrico.Minack.dev> on 2020/02/11 16:00:52 UTC

comparable and orderable CalendarInterval

Hi Devs,

I would like to know what the current roadmap is for making
CalendarInterval comparable and orderable again (SPARK-29679,
SPARK-29385, #26337).

With #27262 this was reverted, but SPARK-30551 does not mention how to
move forward on this. I have found SPARK-28494, but it seems to be
stale.

While I find it useful to compare such intervals, I cannot find a way to
work around the missing comparability. Is there a way to get, e.g., the
number of seconds that an interval represents, in order to compare
intervals? org.apache.spark.sql.catalyst.util.IntervalUtils has methods
like getEpoch and getDuration, but as far as I can see they are not
exposed to SQL or through the org.apache.spark.sql.functions package.

Thanks for the insights,
Enrico


---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: comparable and orderable CalendarInterval

Posted by Joseph Torres <jo...@databricks.com>.
The problem is that there isn't a consistent number of seconds an interval
represents - as Wenchen mentioned, a month interval isn't a fixed number of
days. If your use case can account for that, maybe you could add the
interval to a fixed reference date and then compare the result.
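
Something along these lines might work (an untested sketch; df, iv1 and
iv2 are made-up names for a DataFrame with two interval columns, and for
month-bearing intervals the outcome can depend on the reference date you
pick):

import org.apache.spark.sql.functions.{expr, lit}

// Anchor both intervals to the same fixed reference timestamp and
// compare the resulting timestamps instead of the intervals.
val result = df
  .withColumn("ref", lit("2020-01-01 00:00:00").cast("timestamp"))
  .where(expr("ref + iv1") < expr("ref + iv2"))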


Re: comparable and orderable CalendarInterval

Posted by Enrico Minack <ma...@Enrico.Minack.dev>.
There is another feature missing for CalendarInterval, which is related
to comparability: measuring the length of an interval.

It would be nice if one could access the length of an interval; then you
could compute something like this:

Seq((Timestamp.valueOf("2020-02-01 12:00:00"), Timestamp.valueOf("2020-02-01 13:30:25")))
  .toDF("start", "end")
  .withColumn("interval", $"end" - $"start")
  .withColumn("interval [h]", *INTERVAL LENGTH IN HOURS*)
  .withColumn("rate [€/h]", lit(1.45))
  .withColumn("price [€]", $"interval [h]" * $"rate [€/h]")
  .show(false)

+-------------------+-------------------+-----------------------------+------------------+----------+------------------+
|start              |end                |interval                     |interval [h]      |rate [€/h]|price [€]         |
+-------------------+-------------------+-----------------------------+------------------+----------+------------------+
|2020-02-01 12:00:00|2020-02-01 13:30:25|1 hours 30 minutes 25 seconds|1.5069444444444444|1.45      |2.1850694444444443|
+-------------------+-------------------+-----------------------------+------------------+----------+------------------+


The length of an interval could be measured by dividing it by the length
of your measuring unit, e.g. "1 hour":

$"interval" / lit("1 hour").cast(CalendarIntervalType)


This brings us to CalendarInterval division:
https://github.com/apache/spark/pull/27805
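
In the meantime, a possible workaround for the timestamp case above is
to skip the interval entirely (an untested sketch; it relies on Spark's
cast from timestamp to double, which yields fractional seconds since the
epoch):

import java.sql.Timestamp
import org.apache.spark.sql.functions.lit
import spark.implicits._

// The difference of the epoch-second values, divided by 3600, is the
// interval length in hours; no CalendarInterval arithmetic involved.
Seq((Timestamp.valueOf("2020-02-01 12:00:00"), Timestamp.valueOf("2020-02-01 13:30:25")))
  .toDF("start", "end")
  .withColumn("interval [h]", ($"end".cast("double") - $"start".cast("double")) / 3600)
  .withColumn("rate [€/h]", lit(1.45))
  .withColumn("price [€]", $"interval [h]" * $"rate [€/h]")
  .show(false)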

Enrico




Re: comparable and orderable CalendarInterval

Posted by Enrico Minack <ma...@Enrico.Minack.dev>.
I compute the difference of two timestamps and compare it with a
constant interval:

Seq(("2019-01-02 12:00:00", "2019-01-02 13:30:00"))
   .toDF("start", "end")
   .select($"start".cast(TimestampType), $"end".cast(TimestampType))
   .select($"start", $"end", ($"end" - $"start").as("diff"))
   .where($"diff" < lit("INTERVAL 2 HOUR").cast(CalendarIntervalType))
   .show

Since the interval comes from subtracting two timestamps, it carries
only a time component (no month part of variable length), so comparing
it with the "right kind" of intervals should always be well-defined.

Enrico




Re: comparable and orderable CalendarInterval

Posted by Wenchen Fan <cl...@gmail.com>.
What's your use case to compare intervals? It's tricky in Spark as there is
only one interval type and you can't really compare one month with 30 days.
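
For illustration (an untested sketch; the dates are arbitrary):

import org.apache.spark.sql.functions.expr
import spark.implicits._

// Adding "1 month" and "30 days" to the same date can give different
// results, so neither interval is consistently larger than the other.
Seq("2020-01-31", "2020-04-30").toDF("d")
  .select($"d".cast("date").as("d"))
  .select($"d",
    expr("d + interval 1 month").as("plus_1_month"),
    expr("d + interval 30 days").as("plus_30_days"))
  .show
// 2020-01-31: 2020-02-29 vs 2020-03-01 -> here "1 month" < "30 days"
// 2020-04-30: 2020-05-30 vs 2020-05-30 -> here they are equal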
