Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/10/02 15:29:12 UTC

[GitHub] [spark] zero323 opened a new pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

zero323 opened a new pull request #29935:
URL: https://github.com/apache/spark/pull/29935


   
   ### What changes were proposed in this pull request?
   
   
   ### Why are the changes needed?
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   ### How was this patch tested?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-795278517


   **[Test build #135934 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135934/testReport)** for PR 29935 at commit [`69438b4`](https://github.com/apache/spark/commit/69438b418aafa76d6d23f6ffbb146ac2b6e10019).




[GitHub] [spark] SparkQA commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702831957


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33963/
   




[GitHub] [spark] SparkQA commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-755922081


   **[Test build #133778 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133778/testReport)** for PR 29935 at commit [`69438b4`](https://github.com/apache/spark/commit/69438b418aafa76d6d23f6ffbb146ac2b6e10019).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class CalendarIntervalType(DataType, metaclass=DataTypeSingleton):`




[GitHub] [spark] zero323 commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
zero323 commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-803946095


   Thank you for your input @MaxGekk!
   
   > I would like to propose to not expose `CalendarIntervalType` (consider it as a legacy one), and focus on new types `YearMonthIntervalType` and `DayTimeIntervalType` (see [SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790)).
   
   In my opinion, SPARK-27790 would definitely mark SPARK-33056 as obsolete, not that we had very good proposals for a useful implementation. However, I am not sure if this really affects SPARK-33055. I'd argue it is not so much a new feature (something new is exported) as a bugfix (a component that was accidentally omitted is included) ‒ I can run the queries included in the JIRA in Scala, Java, SparkR and even some 3rd party bindings ‒ even if legacy, they are supported.
   
   You can even add cast
   
   ```python
   spark.sql("SELECT CAST(current_date() - current_date() AS string)")
   ```
   
   and PySpark won't see a problem.
   
   An unhandled exception in such a case is just not good.
   
   Even if `CalendarIntervalType` is considered legacy, I'd still consider starting a discussion about backporting this minimal fix to 3.0 and 3.1.
   
   
   > If you have a proposal of mapping `YearMonthIntervalType`/`DayTimeIntervalType` to python types (from standard lib) like we did for Java/Scala already:
   
   I'll take a look when I have a chance, but if I am not mistaken equivalents of these are already supported in Arrow, so that's probably where we should start looking.
   
   If you don't mind a QQ about the future of `CalendarIntervalType` ‒ can it be decomposed into `YearMonthIntervalType` + `DayTimeIntervalType`?
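
For illustration only, the decomposition asked about above can be sketched in plain Python. This assumes an interval is a triple of months, days and microseconds; the `CalendarInterval` tuple and the `decompose` helper below are hypothetical names, not part of PySpark, and whether Spark would actually treat the legacy type this way is exactly the open question.

```python
import datetime
from typing import NamedTuple


class CalendarInterval(NamedTuple):
    # Hypothetical Python stand-in with three fields: months, days, microseconds.
    months: int
    days: int
    microseconds: int


def decompose(interval):
    """Split an interval into a year-month part (whole months) and a day-time part."""
    year_month = interval.months
    day_time = datetime.timedelta(
        days=interval.days, microseconds=interval.microseconds
    )
    return year_month, day_time


year_month, day_time = decompose(
    CalendarInterval(months=14, days=3, microseconds=1_500_000)
)
```

The day-time part fits `datetime.timedelta`, while the year-month part stays a bare month count because the standard library has no month-based duration type.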
   




[GitHub] [spark] github-actions[bot] closed pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #29935:
URL: https://github.com/apache/spark/pull/29935


   




[GitHub] [spark] AmplabJenkins commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702872827








[GitHub] [spark] SparkQA commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-806341260


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41073/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702862115








[GitHub] [spark] SparkQA commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702859045


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33967/
   




[GitHub] [spark] SparkQA commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-795337759


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40517/
   




[GitHub] [spark] zero323 commented on a change in pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #29935:
URL: https://github.com/apache/spark/pull/29935#discussion_r562216651



##########
File path: python/pyspark/sql/types.py
##########
@@ -186,6 +186,30 @@ def fromInternal(self, ts):
             return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
 
 
+class CalendarIntervalType(DataType, metaclass=DataTypeSingleton):
+    """Calendar Interval type
+    """
+    @classmethod
+    def typeName(cls):
+        return "interval"
+
+    def needConversion(self):
+        return True
+
+    def simpleString(self):
+        return "interval"
+
+    def toInternal(self, di):
+        raise NotImplementedError(
+            "Conversion from external Python types to interval not supported"
+        )

Review comment:
       For full support we might need both a Python and a JVM component. If I recall correctly, `timedelta` has a razorvine mapping to their internal `net.razorvine.pickle.objects.TimeDelta`.
   
   In the opposite direction we could, if I am not mistaken, start with making `CalendarInterval` bean compatible, but there is a compatibility issue ‒ we'd have to map from Spark's months to Python's days.
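
A small sketch of why the months-to-days mapping is awkward, using only the standard library. The helpers `add_months` and `one_month_in_days` are hypothetical and written for this illustration; the clamping rule in `add_months` is an assumption, not Spark's documented behaviour.

```python
import calendar
import datetime


def add_months(d, months):
    """Add calendar months to a date, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return d.replace(year=year, month=month, day=min(d.day, last_day))


def one_month_in_days(start):
    """Number of days covered by 'one month' when counted from the given date."""
    return (add_months(start, 1) - start).days


feb = one_month_in_days(datetime.date(2021, 2, 1))  # counted from 1 February 2021
mar = one_month_in_days(datetime.date(2021, 3, 1))  # counted from 1 March 2021
```

Here `feb` and `mar` differ, so a month count cannot be turned into a fixed number of `timedelta` days without knowing the date it is applied to.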
   
   






[GitHub] [spark] SparkQA removed a comment on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702848018


   **[Test build #129359 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129359/testReport)** for PR 29935 at commit [`7d6ebd2`](https://github.com/apache/spark/commit/7d6ebd28b4e9a7b42f743b1fd63f53b51b46575d).




[GitHub] [spark] nchammas commented on a change in pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
nchammas commented on a change in pull request #29935:
URL: https://github.com/apache/spark/pull/29935#discussion_r562180659



##########
File path: python/pyspark/sql/types.py
##########
@@ -186,6 +186,30 @@ def fromInternal(self, ts):
             return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
 
 
+class CalendarIntervalType(DataType, metaclass=DataTypeSingleton):

Review comment:
       Doesn't @zero323's example from the PR description show that Spark already exposes this type? 
   
   ```python
   spark.sql("SELECT current_date() - current_date()")
   ```
   
   For the record, btw, Postgres supports [an `interval` type](https://www.postgresql.org/docs/current/datatype-datetime.html) and has done so since at least [version 7.1](https://www.postgresql.org/docs/7.1/datatype-datetime.html), which was released in 2001. (I mention this since Postgres often comes up as a reference for whether Spark SQL should support a feature or not.)






[GitHub] [spark] AmplabJenkins commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702868171








[GitHub] [spark] AmplabJenkins commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702862115








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702842684








[GitHub] [spark] MrPowers commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
MrPowers commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-802289372


   @MaxGekk - thanks for pointing this out.  I wasn't aware of the `YearMonthIntervalType` / `DayTimeIntervalType`.
   
   Do you plan on building `make_year_month_interval` and `make_day_time_interval` SQL functions, similar to `make_interval`?
   
   When users do `col("time1")` - `col("time2")` will two values be returned?  I understand the problem with `CalendarInterval` (thanks to your wonderful description in the JIRA) and want to better understand the user facing APIs with your new proposal.




[GitHub] [spark] SparkQA commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702842449


   **[Test build #129357 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129357/testReport)** for PR 29935 at commit [`80fbf32`](https://github.com/apache/spark/commit/80fbf32b85755c0cdfcbd97a2fe1b8d7695b8c54).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class CalendarIntervalType(DataType, metaclass=DataTypeSingleton):`




[GitHub] [spark] nchammas commented on a change in pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
nchammas commented on a change in pull request #29935:
URL: https://github.com/apache/spark/pull/29935#discussion_r562175621



##########
File path: python/pyspark/sql/types.py
##########
@@ -186,6 +186,30 @@ def fromInternal(self, ts):
             return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
 
 
+class CalendarIntervalType(DataType, metaclass=DataTypeSingleton):
+    """Calendar Interval type
+    """
+    @classmethod
+    def typeName(cls):
+        return "interval"
+
+    def needConversion(self):
+        return True
+
+    def simpleString(self):
+        return "interval"
+
+    def toInternal(self, di):
+        raise NotImplementedError(
+            "Conversion from external Python types to interval not supported"
+        )

Review comment:
       I suppose in the future if we want to support conversion of Python's [`datetime.timedelta`](https://docs.python.org/3/library/datetime.html#datetime.timedelta), it would happen here, right?
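
As a rough sketch of what such a conversion might look like, assuming the internal form were a plain `(months, days, microseconds)` triple. That representation and the helper name `timedelta_to_interval_fields` are assumptions made for illustration; nothing in this PR defines them.

```python
import datetime


def timedelta_to_interval_fields(delta):
    """Hypothetical helper: map a timedelta to a (months, days, microseconds) triple."""
    if not isinstance(delta, datetime.timedelta):
        raise TypeError("expected datetime.timedelta, got %s" % type(delta).__name__)
    # timedelta stores days, seconds and microseconds; it has no months component,
    # so the months field is always 0 in this direction.
    microseconds = delta.seconds * 1_000_000 + delta.microseconds
    return (0, delta.days, microseconds)


fields = timedelta_to_interval_fields(
    datetime.timedelta(days=2, hours=1, microseconds=5)
)
```

This direction is lossless because `timedelta` never carries months; the harder part is the reverse mapping, where an interval with a non-zero month count has no `timedelta` equivalent.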

##########
File path: python/pyspark/sql/types.py
##########
@@ -186,6 +186,30 @@ def fromInternal(self, ts):
             return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
 
 
+class CalendarIntervalType(DataType, metaclass=DataTypeSingleton):

Review comment:
       Doesn't @zero323's example from the PR description show that Spark already exposes this type? 
   
   ```python
   spark.sql("SELECT current_date() - current_date()")
   ```
   
   For the record, btw, Postgres supports [an `interval` type](https://www.postgresql.org/docs/current/datatype-datetime.html) and has done so since at least [version 7.1](https://www.postgresql.org/docs/7.1/datatype-datetime.html), which was released in 2001.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702808582


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702872827


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702861067


   **[Test build #129359 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129359/testReport)** for PR 29935 at commit [`7d6ebd2`](https://github.com/apache/spark/commit/7d6ebd28b4e9a7b42f743b1fd63f53b51b46575d).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class CalendarIntervalType(DataType, metaclass=DataTypeSingleton):`




[GitHub] [spark] MaxGekk commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-801738281


   I would like to propose to not expose `CalendarIntervalType` (consider it as a legacy one), and focus on new types `YearMonthIntervalType` and `DayTimeIntervalType` (see SPARK-27790).
   
   If you have a proposal of mapping `YearMonthIntervalType`/`DayTimeIntervalType` to python types (from standard lib) like we did for Java/Scala already:
   - [Support java.time.Duration as an external type of the day-time interval type](https://issues.apache.org/jira/browse/SPARK-34605)
   - [Support java.time.Period as an external type of the year-month interval type](https://issues.apache.org/jira/browse/SPARK-34615)
   
   Please, open a sub-task JIRA in SPARK-27790, and we will discuss your proposal in the tickets.




[GitHub] [spark] SparkQA removed a comment on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702835086


   **[Test build #129357 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129357/testReport)** for PR 29935 at commit [`80fbf32`](https://github.com/apache/spark/commit/80fbf32b85755c0cdfcbd97a2fe1b8d7695b8c54).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702872843








[GitHub] [spark] zero323 commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
zero323 commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-764961743


   > Let me know if there is anything I can do to help you move this PR forward @zero323!
   
   Thanks @MrPowers! I reckon we mostly need some attention here.




[GitHub] [spark] SparkQA commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-795347852


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40517/
   




[GitHub] [spark] MrPowers edited a comment on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
MrPowers edited a comment on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-761705061


   @zero323 - Adding `CalendarIntervalType` to PySpark is a great idea.
   
   Additional context for others: [CalendarIntervalType](https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/types/CalendarIntervalType.html) is already in the Scala API and allows for some awesome functionality.
   
   Here's the Spark 3.0.1 behavior with Scala:
   
   ```scala
   import java.sql.Date
   import org.apache.spark.sql.functions._
   val df = Seq(
     (Date.valueOf("2021-01-23"), Date.valueOf("2021-01-21"))
   ).toDF("date1", "date2")
   df.withColumn("new_datediff", $"date1" - $"date2").show()
   //+----------+----------+------------+
   //|     date1|     date2|new_datediff|
   //+----------+----------+------------+
   //|2021-01-23|2021-01-21|      2 days|
   //+----------+----------+------------+
   
   df.withColumn("new_datediff", $"date1" - $"date2").printSchema()
   //root
   // |-- date1: date (nullable = true)
   // |-- date2: date (nullable = true)
   // |-- new_datediff: interval (nullable = true)
   ```
   
   Getting this functionality in PySpark would be a huge win.  
   
   Let me know if there is anything I can do to help you move this PR forward @zero323!




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-806350801


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41073/
   




[GitHub] [spark] zero323 commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
zero323 commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-761542926


   FYI @MrPowers (in case you are going to work on the Python API for `make_interval` and run into this).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-795347896


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40517/
   




[GitHub] [spark] AmplabJenkins commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-795347896


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40517/
   




[GitHub] [spark] github-actions[bot] commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-873487919


   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!




[GitHub] [spark] SparkQA commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-795306113


   **[Test build #135934 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135934/testReport)** for PR 29935 at commit [`69438b4`](https://github.com/apache/spark/commit/69438b418aafa76d6d23f6ffbb146ac2b6e10019).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class CalendarIntervalType(DataType, metaclass=DataTypeSingleton):`




[GitHub] [spark] MrPowers commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
MrPowers commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-792419286


   Just wanted to check in and see if we can potentially get this merged, so we can add `make_interval` to the PySpark API.
   
   Seems like @HyukjinKwon is [cool with this getting added](https://github.com/apache/spark/pull/29935#discussion_r562301831).
   
   Should we check with anyone else in particular?




[GitHub] [spark] zero323 commented on a change in pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #29935:
URL: https://github.com/apache/spark/pull/29935#discussion_r562216651



##########
File path: python/pyspark/sql/types.py
##########
@@ -186,6 +186,30 @@ def fromInternal(self, ts):
             return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
 
 
+class CalendarIntervalType(DataType, metaclass=DataTypeSingleton):
+    """Calendar Interval type
+    """
+    @classmethod
+    def typeName(cls):
+        return "interval"
+
+    def needConversion(self):
+        return True
+
+    def simpleString(self):
+        return "interval"
+
+    def toInternal(self, di):
+        raise NotImplementedError(
+            "Conversion from external Python types to interval not supported"
+        )

Review comment:
       For full support we might need both Python and JVM components. If I recall correctly, Razorvine maps `timedelta` to its internal `net.razorvine.pickle.objects.TimeDelta`.
   
   In the opposite direction we could, if I am not mistaken, start by making `CalendarInterval` bean compatible, but there is a compatibility issue: we'd have to map from Spark's months to Python's days.
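
To make the months-versus-days mismatch concrete, here is a minimal Python sketch. It assumes an interval is carried as a (months, days, microseconds) triple, as in Spark's `CalendarInterval`. The `Interval` tuple and the `to_timedelta` / `from_timedelta` helpers are hypothetical names used only for illustration; they are not part of this PR or of PySpark.

```python
import datetime
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical Python-side mirror of a (months, days, microseconds) triple."""
    months: int
    days: int
    microseconds: int


def to_timedelta(interval: Interval) -> datetime.timedelta:
    """Convert to timedelta; only well defined when there is no month component."""
    if interval.months != 0:
        # A month has no fixed length in days, so any conversion would be arbitrary.
        raise ValueError("interval with a non-zero month component has no exact timedelta")
    return datetime.timedelta(days=interval.days, microseconds=interval.microseconds)


def from_timedelta(delta: datetime.timedelta) -> Interval:
    """The reverse direction loses nothing: timedelta never carries months."""
    micros = delta.seconds * 1_000_000 + delta.microseconds
    return Interval(months=0, days=delta.days, microseconds=micros)
```

Under these assumptions, `timedelta` round-trips through the triple without loss, while an interval with a month component can only be rejected or approximated.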
   
   






[GitHub] [spark] AmplabJenkins commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-795335039








[GitHub] [spark] AmplabJenkins commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-721069588








[GitHub] [spark] HyukjinKwon commented on a change in pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29935:
URL: https://github.com/apache/spark/pull/29935#discussion_r562301332



##########
File path: python/pyspark/sql/types.py
##########
@@ -186,6 +186,30 @@ def fromInternal(self, ts):
             return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
 
 
+class CalendarIntervalType(DataType, metaclass=DataTypeSingleton):

Review comment:
       The problem is that it has only been half-exposed so far. There have been many discussions about how far we should support it; e.g., `CalendarInterval` is marked as `Unstable`.






[GitHub] [spark] HyukjinKwon commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-795274495


   retest this please




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702832071








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702868171








[GitHub] [spark] SparkQA removed a comment on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702807683


   **[Test build #129354 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129354/testReport)** for PR 29935 at commit [`d9a6e44`](https://github.com/apache/spark/commit/d9a6e44cd3458cbff86057cb80849c78e8b0f2f3).




[GitHub] [spark] SparkQA commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-721059608








[GitHub] [spark] SparkQA commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702823843


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33963/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702808602


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129354/
   Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-806350801


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41073/
   




[GitHub] [spark] SparkQA commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-806338741


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41073/
   




[GitHub] [spark] SparkQA commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-795278517


   **[Test build #135934 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135934/testReport)** for PR 29935 at commit [`69438b4`](https://github.com/apache/spark/commit/69438b418aafa76d6d23f6ffbb146ac2b6e10019).




[GitHub] [spark] zero323 commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
zero323 commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-799784361


   > before we go further, can we file an umbrella JIRA to address the full support in PySpark?
   
   @HyukjinKwon I think we can use SPARK-33054 for that. I originally suspected that SPARK-33056 might be a no-go, based on some discussions I've seen around the Scala implementation.




[GitHub] [spark] zero323 edited a comment on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
zero323 edited a comment on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-798601773


   > Yeah, it would be nice if we map it to timedelta.
   
   My biggest concern is that the Spark implementation doesn't really map to `timedelta`. Take `INTERVAL 2 years` or `INTERVAL 1 month`: we could derive some arbitrary rules for handling intervals expressed in units larger than weeks, but I am not sure how useful these would be in practice.
   
   Ultimately I'd like the following to hold
   
   ```python
   cd, i, yfn = spark.sql("""
       SELECT *, cd + i AS yfn FROM (SELECT current_date() as cd, INTERVAL 1 year AS i) t
   """).first()
   
   assert cd + i == yfn
   ```
   
   for an arbitrary `Interval` `i`, which, if I am not missing anything here, won't be possible with `timedelta`.
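
A small standard-library sketch of why a fixed-length `timedelta` cannot stand in for a month-based interval. The `add_months` helper is hypothetical, and its choice to clamp the day to the end of the target month is an assumption of this sketch rather than a statement about Spark's exact semantics.

```python
import calendar
import datetime


def add_months(d: datetime.date, months: int) -> datetime.date:
    """Hypothetical helper: calendar-aware month addition, clamping the day to month end."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return datetime.date(year, month_index + 1, min(d.day, last_day))


# The same "1 month" spans a different number of days depending on the start date,
# so no single timedelta value can represent it.
jan = datetime.date(2021, 1, 15)
feb = datetime.date(2021, 2, 15)
span_from_jan = add_months(jan, 1) - jan  # timedelta of 31 days
span_from_feb = add_months(feb, 1) - feb  # timedelta of 28 days
print(span_from_jan.days, span_from_feb.days)
```

An object that satisfied `cd + i == yfn` for month-based intervals would therefore need to know the date it is being added to, which `timedelta` does not.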




[GitHub] [spark] SparkQA removed a comment on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-721059608


   **[Test build #130565 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130565/testReport)** for PR 29935 at commit [`6ca50c5`](https://github.com/apache/spark/commit/6ca50c56c615f9d43da10f83d9c567e7aa72db86).




[GitHub] [spark] nchammas commented on a change in pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
nchammas commented on a change in pull request #29935:
URL: https://github.com/apache/spark/pull/29935#discussion_r562175621



##########
File path: python/pyspark/sql/types.py
##########
@@ -186,6 +186,30 @@ def fromInternal(self, ts):
             return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
 
 
+class CalendarIntervalType(DataType, metaclass=DataTypeSingleton):
+    """Calendar Interval type
+    """
+    @classmethod
+    def typeName(cls):
+        return "interval"
+
+    def needConversion(self):
+        return True
+
+    def simpleString(self):
+        return "interval"
+
+    def toInternal(self, di):
+        raise NotImplementedError(
+            "Conversion from external Python types to interval not supported"
+        )

Review comment:
       I suppose in the future if we want to support conversion of Python's [`datetime.timedelta`](https://docs.python.org/3/library/datetime.html#datetime.timedelta), it would happen here, right?

##########
File path: python/pyspark/sql/types.py
##########
@@ -186,6 +186,30 @@ def fromInternal(self, ts):
             return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
 
 
+class CalendarIntervalType(DataType, metaclass=DataTypeSingleton):

Review comment:
       Doesn't @zero323's example from the PR description show that Spark already exposes this type? 
   
   ```python
   spark.sql("SELECT current_date() - current_date()")
   ```
   
   For the record, btw, Postgres supports [an `interval` type](https://www.postgresql.org/docs/current/datatype-datetime.html) and has done so since at least [version 7.1](https://www.postgresql.org/docs/7.1/datatype-datetime.html), which was released in 2001. (I mention this since Postgres often comes up as a reference for whether Spark SQL should support a feature or not.)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702835086


   **[Test build #129357 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129357/testReport)** for PR 29935 at commit [`80fbf32`](https://github.com/apache/spark/commit/80fbf32b85755c0cdfcbd97a2fe1b8d7695b8c54).




[GitHub] [spark] SparkQA commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702807683


   **[Test build #129354 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129354/testReport)** for PR 29935 at commit [`d9a6e44`](https://github.com/apache/spark/commit/d9a6e44cd3458cbff86057cb80849c78e8b0f2f3).




[GitHub] [spark] zero323 commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
zero323 commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-798601773


   > Yeah, it would be nice if we map it to timedelta.
   
   My biggest concern is that the Spark implementation doesn't really map to `timedelta`. Take `INTERVAL 2 years` or `INTERVAL 1 month`: we could derive some arbitrary rules for handling intervals expressed in units larger than weeks, but I am not sure how useful these would be in practice.
   
   Ultimately, I'd like the following to hold:
   
   ```python
   cd, i, yfn = spark.sql("""
       SELECT *, cd + i AS yfn FROM (SELECT current_date() as cd, INTERVAL 1 year AS i) t
   """).first()
   
   assert cd + i == yfn
   ```
   
   which, if I am not missing anything here, won't be possible with `timedelta`.
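
To illustrate the mismatch with plain Python (a sketch only, independent of Spark): `datetime.timedelta` accepts weeks and smaller units, so "1 year" can only be approximated by a fixed number of days, and that approximation drifts from the calendar result whenever a leap day falls inside the range. The dates below are arbitrary examples.

```python
import datetime

start = datetime.date(2020, 2, 1)  # 2020 is a leap year

# Calendar semantics of "start + 1 year": same month and day, next year.
one_year_later = start.replace(year=start.year + 1)

# timedelta has no year or month unit, so the closest approximation
# is a fixed count of days.
approx = start + datetime.timedelta(days=365)

print(one_year_later)  # 2021-02-01
print(approx)          # 2021-01-31
```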




[GitHub] [spark] HyukjinKwon commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-795274212


   Yeah, it would be nice if we map it to timedelta.




[GitHub] [spark] AmplabJenkins commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-755931641


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133778/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29935:
URL: https://github.com/apache/spark/pull/29935#discussion_r562301831



##########
File path: python/pyspark/sql/types.py
##########
@@ -186,6 +186,30 @@ def fromInternal(self, ts):
             return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
 
 
+class CalendarIntervalType(DataType, metaclass=DataTypeSingleton):

Review comment:
       BTW, I personally agree with adding this, in particular because of @nchammas's point here: https://github.com/apache/spark/pull/29935#discussion_r562175621






[GitHub] [spark] SparkQA commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702808309


   **[Test build #129354 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129354/testReport)** for PR 29935 at commit [`d9a6e44`](https://github.com/apache/spark/commit/d9a6e44cd3458cbff86057cb80849c78e8b0f2f3).
    * This patch **fails Python style tests**.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class CalendarIntervalType(DataType, metaclass=DataTypeSingleton):`




[GitHub] [spark] SparkQA commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702848018


   **[Test build #129359 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129359/testReport)** for PR 29935 at commit [`7d6ebd2`](https://github.com/apache/spark/commit/7d6ebd28b4e9a7b42f743b1fd63f53b51b46575d).




[GitHub] [spark] zero323 commented on a change in pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #29935:
URL: https://github.com/apache/spark/pull/29935#discussion_r499119929



##########
File path: python/pyspark/sql/types.py
##########
@@ -186,6 +186,30 @@ def fromInternal(self, ts):
             return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
 
 
+class CalendarIntervalType(DataType, metaclass=DataTypeSingleton):

Review comment:
       Yes, I glanced over a few threads and couldn't really figure out where this is going, hence the limited scope of this PR. However, if the type is supported in multiple contexts, the current behavior doesn't seem intended.






[GitHub] [spark] HyukjinKwon commented on a change in pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29935:
URL: https://github.com/apache/spark/pull/29935#discussion_r562301332



##########
File path: python/pyspark/sql/types.py
##########
@@ -186,6 +186,30 @@ def fromInternal(self, ts):
             return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
 
 
+class CalendarIntervalType(DataType, metaclass=DataTypeSingleton):

Review comment:
       The problem is that it has only been half-exposed so far. There have been many discussions about which contexts we should support it in; for example, `CalendarInterval` is marked as `Unstable`.






[GitHub] [spark] HyukjinKwon commented on a change in pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29935:
URL: https://github.com/apache/spark/pull/29935#discussion_r499113675



##########
File path: python/pyspark/sql/types.py
##########
@@ -186,6 +186,30 @@ def fromInternal(self, ts):
             return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
 
 
+class CalendarIntervalType(DataType, metaclass=DataTypeSingleton):

Review comment:
       There have been a lot of discussions about exposing the interval type in other language APIs, but I lost track of them. @yaooqinn and @cloud-fan, are we going to make interval a properly exposed type? Or only support it in some contexts?






[GitHub] [spark] SparkQA commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-755912415


   **[Test build #133778 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133778/testReport)** for PR 29935 at commit [`69438b4`](https://github.com/apache/spark/commit/69438b418aafa76d6d23f6ffbb146ac2b6e10019).




[GitHub] [spark] MrPowers commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
MrPowers commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-761705061


   @zero323 - Adding `CalendarIntervalType` to PySpark is a great idea.
   
   [CalendarIntervalType](https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/types/CalendarIntervalType.html) is already in the Scala API and allows for some awesome functionality.
   
   Here's the Spark 3.0.1 behavior with Scala:
   
   ```scala
   import java.sql.Date
   import org.apache.spark.sql.functions._
   val df = Seq(
     (Date.valueOf("2021-01-23"), Date.valueOf("2021-01-21"))
   ).toDF("date1", "date2")
   df.withColumn("new_datediff", $"date1" - $"date2").show()
   //+----------+----------+------------+
   //|     date1|     date2|new_datediff|
   //+----------+----------+------------+
   //|2021-01-23|2021-01-21|      2 days|
   //+----------+----------+------------+
   
   df.withColumn("new_datediff", $"date1" - $"date2").printSchema()
   //root
   // |-- date1: date (nullable = true)
   // |-- date2: date (nullable = true)
   // |-- new_datediff: interval (nullable = true)
   ```
   
   Getting this functionality in PySpark would be a huge win.  
   
   Let me know if there is anything I can do to help you move this PR forward @zero323!




[GitHub] [spark] SparkQA commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702868160


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33967/
   




[GitHub] [spark] HyukjinKwon commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-795278405


   Okay, looks pretty good. I will take another look tomorrow. @zero323, before we go further, can we file an umbrella JIRA to track full support in PySpark? I would like to make sure we complete it once we add this type to PySpark.




[GitHub] [spark] SparkQA commented on pull request #29935: [SPARK-33055][PYTHON][SQL] Add Python CalendarIntervalType

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29935:
URL: https://github.com/apache/spark/pull/29935#issuecomment-702869952


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33969/
   

