You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/03/06 16:49:01 UTC

[GitHub] [spark] MaxGekk opened a new pull request #31765: [WIP][SPARK-34615][SQL] Support `java.time.Period` as an external type of the year-month interval type

MaxGekk opened a new pull request #31765:
URL: https://github.com/apache/spark/pull/31765


   ### What changes were proposed in this pull request?
   TODO
   
   ### Why are the changes needed?
   1. To allow users parallelization of `java.time.Period` collections, and construct year-month interval columns. Also to collect such columns back to the driver side.
   2. This will allow to write tests in other sub-tasks of SPARK-27790.
   
   ### Does this PR introduce _any_ user-facing change?
   The PR extends existing functionality. So, users can parallelize instances of the `java.time.Duration` class and collect them back:
   
   ```scala
   scala> val ds = Seq(java.time.Period.ofYears(10).withMonths(2)).toDS
   ds: org.apache.spark.sql.Dataset[java.time.Period] = [value: yearmonthinterval]
   
   scala> ds.collect
   res0: Array[java.time.Period] = Array(P10Y2M)
   ```
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31765: [WIP][SPARK-34615][SQL] Support `java.time.Period` as an external type of the year-month interval type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31765:
URL: https://github.com/apache/spark/pull/31765#issuecomment-792001291


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135832/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #31765: [SPARK-34615][SQL] Support `java.time.Period` as an external type of the year-month interval type

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #31765:
URL: https://github.com/apache/spark/pull/31765#discussion_r589241739



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
##########
@@ -791,4 +791,35 @@ object IntervalUtils {
    * @return A [[Duration]], not null
    */
   def microsToDuration(micros: Long): Duration = Duration.of(micros, ChronoUnit.MICROS)
+
+  /**
+   * Gets the total number of months in this period.
+   * <p>
+   * This returns the total number of months in the period by multiplying the
+   * number of years by 12 and adding the number of months.
+   * <p>
+   *
+   * @return The total number of months in the period, may be negative
+   * @throws ArithmeticException If numeric overflow occurs
+   */
+  def periodToMonths(period: Period): Int = {

Review comment:
       We don't fail when we convert:
   0. `java.sql.Date` has time component with millisecond precision but we ignore it when we convert to days at https://github.com/apache/spark/blob/56e664c7179eadeb5134b4418f3aaa6a9d742ef6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L93
   1. `java.sql.Timestamp` which has nanoseconds precision: https://github.com/apache/spark/blob/56e664c7179eadeb5134b4418f3aaa6a9d742ef6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L166
   2. `java.time.Instant` which contains nanoseconds, and we don't fail when we convert it to microseconds: https://github.com/apache/spark/blob/56e664c7179eadeb5134b4418f3aaa6a9d742ef6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L383
   
   To be consistent with current implementation for other types, I do believe we should not fail.
   
   > Or at least give a warning?
   
   This will just fill in the logs by useless records, and this is again inconsistent with current implementation.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #31765: [SPARK-34615][SQL] Support `java.time.Period` as an external type of the year-month interval type

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31765:
URL: https://github.com/apache/spark/pull/31765#discussion_r589229213



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
##########
@@ -791,4 +791,35 @@ object IntervalUtils {
    * @return A [[Duration]], not null
    */
   def microsToDuration(micros: Long): Duration = Duration.of(micros, ChronoUnit.MICROS)
+
+  /**
+   * Gets the total number of months in this period.
+   * <p>
+   * This returns the total number of months in the period by multiplying the
+   * number of years by 12 and adding the number of months.
+   * <p>
+   *
+   * @return The total number of months in the period, may be negative
+   * @throws ArithmeticException If numeric overflow occurs
+   */
+  def periodToMonths(period: Period): Int = {

Review comment:
       Shall we fail if the day field is not 0? Or at least give a warning?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #31765: [SPARK-34615][SQL] Support `java.time.Period` as an external type of the year-month interval type

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #31765:
URL: https://github.com/apache/spark/pull/31765#issuecomment-792340919


   @cloud-fan @yaooqinn @srielau @HyukjinKwon Could you review this PR, please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31765: [WIP][SPARK-34615][SQL] Support `java.time.Period` as an external type of the year-month interval type

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31765:
URL: https://github.com/apache/spark/pull/31765#issuecomment-792001837


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40414/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31765: [WIP][SPARK-34615][SQL] Support `java.time.Period` as an external type of the year-month interval type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31765:
URL: https://github.com/apache/spark/pull/31765#issuecomment-792026646


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40414/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #31765: [SPARK-34615][SQL] Support `java.time.Period` as an external type of the year-month interval type

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #31765:
URL: https://github.com/apache/spark/pull/31765#issuecomment-792576411


   thanks, merging to master!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yaooqinn commented on pull request #31765: [SPARK-34615][SQL] Support `java.time.Period` as an external type of the year-month interval type

Posted by GitBox <gi...@apache.org>.
yaooqinn commented on pull request #31765:
URL: https://github.com/apache/spark/pull/31765#issuecomment-792577302


   Late LGTM!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31765: [WIP][SPARK-34615][SQL] Support `java.time.Period` as an external type of the year-month interval type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31765:
URL: https://github.com/apache/spark/pull/31765#issuecomment-792001291


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135832/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #31765: [SPARK-34615][SQL] Support `java.time.Period` as an external type of the year-month interval type

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #31765:
URL: https://github.com/apache/spark/pull/31765#discussion_r589241739



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
##########
@@ -791,4 +791,35 @@ object IntervalUtils {
    * @return A [[Duration]], not null
    */
   def microsToDuration(micros: Long): Duration = Duration.of(micros, ChronoUnit.MICROS)
+
+  /**
+   * Gets the total number of months in this period.
+   * <p>
+   * This returns the total number of months in the period by multiplying the
+   * number of years by 12 and adding the number of months.
+   * <p>
+   *
+   * @return The total number of months in the period, may be negative
+   * @throws ArithmeticException If numeric overflow occurs
+   */
+  def periodToMonths(period: Period): Int = {

Review comment:
       We don't fail when we convert:
   
   1. `java.sql.Date` has time component with millisecond precision but we ignore it when we convert to days at https://github.com/apache/spark/blob/56e664c7179eadeb5134b4418f3aaa6a9d742ef6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L93
   2. `java.sql.Timestamp` which has nanoseconds precision: https://github.com/apache/spark/blob/56e664c7179eadeb5134b4418f3aaa6a9d742ef6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L166
   3. `java.time.Instant` which contains nanoseconds, and we don't fail when we convert it to microseconds: https://github.com/apache/spark/blob/56e664c7179eadeb5134b4418f3aaa6a9d742ef6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L383
   
   To be consistent with current implementation for other types, I do believe we should not fail.
   
   > Or at least give a warning?
   
   This will just fill in the logs by useless records, and this is again inconsistent with current implementation.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31765: [WIP][SPARK-34615][SQL] Support `java.time.Period` as an external type of the year-month interval type

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31765:
URL: https://github.com/apache/spark/pull/31765#issuecomment-792011694


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40414/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan closed pull request #31765: [SPARK-34615][SQL] Support `java.time.Period` as an external type of the year-month interval type

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #31765:
URL: https://github.com/apache/spark/pull/31765


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31765: [WIP][SPARK-34615][SQL] Support `java.time.Period` as an external type of the year-month interval type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31765:
URL: https://github.com/apache/spark/pull/31765#issuecomment-792026646


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40414/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org