Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/01 09:22:21 UTC

[GitHub] [spark] HyukjinKwon opened a new pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

HyukjinKwon opened a new pull request #33877:
URL: https://github.com/apache/spark/pull/33877


   ### What changes were proposed in this pull request?
   
   This PR adds support for `TimestampNTZType` to pandas API on Spark. For now, it is handled identically to `TimestampType`. In fact, pandas API on Spark does not support `datetime` with a time zone yet.
   
   This PR is dependent on #33876 and #33875.
   
   ### Why are the changes needed?
   
   To complete `TimestampNTZ` support.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, now pandas API on Spark can handle `TimestampNTZType`. 
   
   ```python
   import datetime
   spark.createDataFrame([(datetime.datetime.now(),)], schema="dt timestamp_ntz").to_pandas_on_spark()
   ```
   
   ```
                             dt
   0 2021-08-31 19:58:55.024410
   ```
   
   ### How was this patch tested?
   
   Unit tests were added.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-914020548


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47541/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912847527


   **[Test build #142985 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142985/testReport)** for PR 33877 at commit [`83afb0b`](https://github.com/apache/spark/commit/83afb0b01a2059473801a99db8b41fe2354ccca7).
    * This patch **fails from timeout after a configured wait of `500m`**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] HyukjinKwon commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913599261


   This PR is ready for a review.




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913457456


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47518/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-914095069


   **[Test build #143037 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143037/testReport)** for PR 33877 at commit [`6620ade`](https://github.com/apache/spark/commit/6620adeb54fe95870c6de1604dfc2cd32028bed2).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class DatetimeNTZOps(DatetimeOps):`
     * `class DatetimeNTZConverter(object):`
     * `case class CastTimestampNTZToLong(child: Expression) extends TimestampToLongBase `




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-909168750


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47386/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912493897


   **[Test build #142983 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142983/testReport)** for PR 33877 at commit [`b4fc1fa`](https://github.com/apache/spark/commit/b4fc1fa67a66e433b1ec01278e8e96ac52531acc).




[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913457538


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47518/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913606738


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143016/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r703672741



##########
File path: python/pyspark/pandas/data_type_ops/datetime_ops.py
##########
@@ -58,15 +66,18 @@ def sub(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
             "The timestamp subtraction returns an integer in seconds, "
             "whereas pandas returns 'timedelta64[ns]'."
         )
-        if isinstance(right, IndexOpsMixin) and isinstance(right.spark.data_type, TimestampType):
+        if isinstance(right, IndexOpsMixin) and isinstance(
+            right.spark.data_type, (TimestampType, TimestampNTZType)
+        ):
             warnings.warn(msg, UserWarning)
             return left.astype("long") - right.astype("long")

Review comment:
       > pd.Timestamp - pd.Timestamp
   
   So this means both operands have the same timestamp type, right? Either TIMESTAMP_LTZ - TIMESTAMP_LTZ, or TIMESTAMP_NTZ - TIMESTAMP_NTZ.
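
As a rough illustration of the same-type question, here is a minimal sketch using only the Python standard library. It is an analogy, not the PR's implementation: `diff_seconds` is a hypothetical helper, and the thread does not say how pandas API on Spark treats mixed operands. Plain `datetime` allows naive minus naive and aware minus aware, but rejects mixing the two.

```python
from datetime import datetime, timezone


def diff_seconds(left, right):
    """Hypothetical helper: difference between two datetimes in whole seconds."""
    return int((left - right).total_seconds())


a = datetime(2021, 9, 1, 0, 0, 10)
b = datetime(2021, 9, 1, 0, 0, 0)

# Both operands without a time zone (analogous to NTZ - NTZ).
same_naive = diff_seconds(a, b)

# Both operands with a time zone (analogous to LTZ - LTZ).
same_aware = diff_seconds(
    a.replace(tzinfo=timezone.utc), b.replace(tzinfo=timezone.utc)
)

# Mixing a naive and an aware operand is rejected by the standard library.
try:
    diff_seconds(a, b.replace(tzinfo=timezone.utc))
    mixed_error = None
except TypeError as exc:
    mixed_error = type(exc).__name__

print(same_naive, same_aware, mixed_error)
```

The integer result mirrors the warning text in the diff, which says the subtraction returns seconds rather than `timedelta64[ns]`.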






[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912767698


   **[Test build #142982 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142982/testReport)** for PR 33877 at commit [`6bdfad2`](https://github.com/apache/spark/commit/6bdfad2e703a16676894ef4ab2e17eff60db539d).
    * This patch **fails from timeout after a configured wait of `500m`**.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class DatetimeNTZConverter(object):`
     * `case class UnixSecondsUTC(child: Expression) extends TimestampToLongBase `




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r701909457



##########
File path: python/pyspark/pandas/tests/test_dataframe.py
##########
@@ -5172,6 +5172,7 @@ def test_explode(self):
         self.assertRaises(TypeError, lambda: psdf.explode(["A", "B"]))
         self.assertRaises(ValueError, lambda: psdf.explode("A"))
 
+    @unittest.skip("Skip for testing")

Review comment:
       This should be reverted before merging.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913408571


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47515/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913379381


   **[Test build #143013 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143013/testReport)** for PR 33877 at commit [`7a920ad`](https://github.com/apache/spark/commit/7a920adc906bac33ede8a8d81a11d5c8f01a88ad).




[GitHub] [spark] HyukjinKwon edited a comment on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon edited a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-909131812


   cc @ueshin, @xinrong-databricks and @itholic FYI




[GitHub] [spark] SparkQA removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913417594


   **[Test build #143016 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143016/testReport)** for PR 33877 at commit [`128f796`](https://github.com/apache/spark/commit/128f796bb02f41a1a551bd0cd6b0571f3318d35d).




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913372210


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47513/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913997050


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47541/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912688012


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142984/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913688387


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47524/
   




[GitHub] [spark] SparkQA removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913983644


   **[Test build #143038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143038/testReport)** for PR 33877 at commit [`f83511b`](https://github.com/apache/spark/commit/f83511b4729f5d4906205f400cba57f5ab0dcd3b).




[GitHub] [spark] SparkQA removed a comment on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-909137450


   **[Test build #142883 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142883/testReport)** for PR 33877 at commit [`2354abf`](https://github.com/apache/spark/commit/2354abf497c27b15b3222e2b053372119e4ba28f).




[GitHub] [spark] HyukjinKwon commented on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-909131812


   cc @ueshin FYI




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r703373865



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##########
@@ -708,6 +708,19 @@ case class UnixSeconds(child: Expression) extends TimestampToLongBase {
     copy(child = newChild)
 }
 
+// Internal expression used to get the raw UTC timestamp in pandas API on Spark.

Review comment:
       The timestamp will be treated as being in the local time zone and will be normalized to UTC internally:
   
   https://github.com/apache/spark/blob/f2492772baf1d00d802e704f84c22a9c410929e9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L510
   
   e.g.)
   
   ```scala
   scala> sql("SELECT CAST(CAST(TIMESTAMP_NTZ '1970-01-01 00:00:00' AS TIMESTAMP) AS LONG)").show()
   ```
   ```
   +----------------------------------------------------------------------+
   |CAST(CAST(TIMESTAMP_NTZ '1970-01-01 00:00:00' AS TIMESTAMP) AS BIGINT)|
   +----------------------------------------------------------------------+
   |                                                                -32400|
   +----------------------------------------------------------------------+
   ```
   
   This has to be in UTC to mimic pandas' behaviour.
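
The `-32400` above can be reproduced with a minimal standard-library sketch. It assumes the session time zone in the example was UTC+9, which is inferred from the offset and not stated in the thread, and `epoch_seconds` is a hypothetical helper, not part of Spark or pandas.

```python
from datetime import datetime, timedelta, timezone


def epoch_seconds(naive, tz):
    """Hypothetical helper: read a naive datetime as wall-clock time in `tz`
    and return whole seconds since the Unix epoch."""
    return int(naive.replace(tzinfo=tz).timestamp())


naive_epoch = datetime(1970, 1, 1, 0, 0, 0)

# ASSUMPTION: a fixed UTC+9 offset stands in for the session time zone.
session_tz = timezone(timedelta(hours=9))

# Read as local time: 1970-01-01 00:00 at UTC+9 is nine hours before the epoch.
local_seconds = epoch_seconds(naive_epoch, session_tz)

# Read as UTC: the value sits exactly on the epoch.
utc_seconds = epoch_seconds(naive_epoch, timezone.utc)

print(local_seconds)  # -32400
print(utc_seconds)    # 0
```

Reading the value as UTC gives a result that does not depend on the session time zone, which appears to be the reason for the separate internal expression.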






[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-914098150


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143037/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913403585


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47515/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913606738


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143016/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r701796691



##########
File path: python/pyspark/testing/sqlutils.py
##########
@@ -249,6 +249,7 @@ class ReusedSQLTestCase(ReusedPySparkTestCase, SQLTestUtils):
     def setUpClass(cls):
         super(ReusedSQLTestCase, cls).setUpClass()
         cls.spark = SparkSession(cls.sc)
+        cls.spark.conf.set("spark.sql.timestampType", "TIMESTAMP_NTZ")

Review comment:
       This should be reverted once the tests pass.






[GitHub] [spark] HyukjinKwon commented on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-909131812


   cc @ueshin FYI




[GitHub] [spark] xinrong-databricks commented on a change in pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r699640407



##########
File path: python/pyspark/pandas/groupby.py
##########
@@ -1439,8 +1439,15 @@ def _make_pandas_df_builder_func(
         the same pandas DataFrame as if the pandas-on-Spark DataFrame is collected to driver side.
         The index, column labels, etc. are re-constructed within the function.
         """
+        from pyspark.pandas.utils import default_session
+
         arguments_for_restore_index = psdf._internal.arguments_for_restore_index
 
+        prefer_timestamp_ntz = (

Review comment:
       I wonder whether we could save it as a global variable so that we can reuse it at https://github.com/apache/spark/pull/33877/files#diff-fac4c35e2182657dfceedcaa20fd78963573ad34f08fd597067652e66dad53eeR1455-R1457 as well.
   
   The current approach looks good enough though.






[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912232111


   **[Test build #142957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142957/testReport)** for PR 33877 at commit [`abce481`](https://github.com/apache/spark/commit/abce4819be4142f24a01e6259f231fc24c1577ea).




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r701796298



##########
File path: python/pyspark/pandas/utils.py
##########
@@ -470,6 +470,7 @@ def default_session(conf: Optional[Dict[str, Any]] = None) -> SparkSession:
     # Currently, pandas-on-Spark is dependent on such join due to 'compute.ops_on_diff_frames'
     # configuration. This is needed with Spark 3.0+.
     builder.config("spark.sql.analyzer.failAmbiguousSelfJoin", False)
+    builder.config("spark.sql.timestampType", "TIMESTAMP_NTZ")

Review comment:
       This should be reverted once the tests pass.






[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912546168


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47486/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r703562782



##########
File path: python/pyspark/pandas/data_type_ops/datetime_ops.py
##########
@@ -58,15 +66,18 @@ def sub(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
             "The timestamp subtraction returns an integer in seconds, "
             "whereas pandas returns 'timedelta64[ns]'."
         )
-        if isinstance(right, IndexOpsMixin) and isinstance(right.spark.data_type, TimestampType):
+        if isinstance(right, IndexOpsMixin) and isinstance(
+            right.spark.data_type, (TimestampType, TimestampNTZType)
+        ):
             warnings.warn(msg, UserWarning)
             return left.astype("long") - right.astype("long")

Review comment:
       This is not related to the change. This code path is hit when computing:
   
   `pd.Timestamp - pd.Timestamp`, which would result in an interval type.
   
   Once interval types are implemented in PySpark, we should switch this to return them. Currently, we return long values instead because PySpark has no interval type.
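
A minimal sketch of the difference described above, using only the standard library rather than pandas or PySpark. Subtracting two timestamps naturally yields an interval-like value, whereas the current code path casts both sides to epoch seconds and subtracts the integers. The helper `to_epoch_seconds` and the sample values are illustrative assumptions and are not part of the PR.

```python
import calendar
import datetime


def to_epoch_seconds(dt):
    """Treat a naive datetime as UTC and return whole seconds since the epoch."""
    return calendar.timegm(dt.utctimetuple())


left = datetime.datetime(2021, 9, 1, 9, 22, 21)
right = datetime.datetime(2021, 8, 31, 19, 58, 55)

# Interval-like result, analogous to what pandas gives for Timestamp subtraction.
interval = left - right

# Integer result, analogous to left.astype("long") - right.astype("long").
seconds = to_epoch_seconds(left) - to_epoch_seconds(right)

print(type(interval).__name__, interval)
print(type(seconds).__name__, seconds)
```

The integer form loses the unit information that an interval type carries, which is why the warning message in the diff calls out the mismatch with pandas.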






[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913321114


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47510/
   




[GitHub] [spark] HyukjinKwon commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912500264


   cc @MaxGekk @gengliangwang @cloud-fan too FYI




[GitHub] [spark] MaxGekk commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r703059104



##########
File path: python/pyspark/sql/types.py
##########
@@ -1665,7 +1665,35 @@ def convert(self, obj, gateway_client):
         t.setNanos(obj.microsecond * 1000)
         return t
 
+
+class DatetimeNTZConverter(object):
+    def can_convert(self, obj):
+        def prefer_timestamp_ntz():
+            from pyspark.sql import SparkSession
+
+            session = SparkSession._activeSession
+
+            with SparkContext._lock:
+                context = SparkContext._active_spark_context
+                if context is not None and session is not None:
+                    return session._is_timestamp_ntz_preferred()
+                else:
+                    return False
+
+        return isinstance(obj, datetime.datetime) and obj.tzinfo is None and prefer_timestamp_ntz()
+
+    def convert(self, obj, gateway_client):
+        from pyspark import SparkContext
+
+        seconds = calendar.timegm(obj.utctimetuple())
+        jvm = SparkContext._jvm
+        return jvm.org.apache.spark.sql.catalyst.util.DateTimeUtils.microsToLocalDateTime(
+            int(seconds) * 1000000 + obj.microsecond

Review comment:
       Aren't you afraid of overflowing int?
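
One way to check this concern, sketched under assumptions: Python integers have arbitrary precision, so the multiplication cannot overflow on the Python side. What remains to verify is whether the result fits the signed 64-bit range, assuming the JVM method receives the value as a `long`. The helper `naive_datetime_to_micros` mirrors the arithmetic in `convert` above but is a hypothetical stand-in that does not call into the JVM.

```python
import calendar
import datetime

INT32_MAX = 2**31 - 1
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def naive_datetime_to_micros(dt):
    """Same arithmetic as the diff: UTC epoch seconds scaled to microseconds."""
    seconds = calendar.timegm(dt.utctimetuple())
    return int(seconds) * 1000000 + dt.microsecond


# Extremes of what a Python datetime can represent.
smallest = naive_datetime_to_micros(datetime.datetime.min)
largest = naive_datetime_to_micros(datetime.datetime.max)

fits_in_int64 = INT64_MIN <= smallest and largest <= INT64_MAX
exceeds_int32 = largest > INT32_MAX

print(fits_in_int64, exceeds_int32)
```

Under these assumptions the full `datetime` range stays within 64 bits, while a 32-bit integer would be far too small.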






[GitHub] [spark] HyukjinKwon commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r703373865



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##########
@@ -708,6 +708,19 @@ case class UnixSeconds(child: Expression) extends TimestampToLongBase {
     copy(child = newChild)
 }
 
+// Internal expression used to get the raw UTC timestamp in pandas API on Spark.

Review comment:
       The timestamp will be treated as being in the local time zone and will be normalized to UTC internally:
   
   https://github.com/apache/spark/blob/f2492772baf1d00d802e704f84c22a9c410929e9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L510
   
   e.g.)
   
   ```scala
   scala> sql("SELECT CAST(CAST(TIMESTAMP_NTZ '1970-01-01 00:00:00' AS TIMESTAMP) AS LONG)").show()
   +----------------------------------------------------------------------+
   |CAST(CAST(TIMESTAMP_NTZ '1970-01-01 00:00:00' AS TIMESTAMP) AS BIGINT)|
   +----------------------------------------------------------------------+
   |                                                                -32400|
   +----------------------------------------------------------------------+
   ```
   
   This has to be in UTC to mimic pandas' behaviour.
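
The same effect can be reproduced with the standard library. This is only a sketch: the UTC+9 offset is an assumption chosen because it is consistent with the `-32400` shown above, and the variable names are illustrative.

```python
import datetime

# A timestamp without time zone, as in TIMESTAMP_NTZ '1970-01-01 00:00:00'.
naive = datetime.datetime(1970, 1, 1, 0, 0, 0)

# Assumed session time zone of UTC+9 (fixed offset, no DST).
local_zone = datetime.timezone(datetime.timedelta(hours=9))

# Interpreting the wall-clock value as local time shifts the epoch value.
as_local_seconds = int(naive.replace(tzinfo=local_zone).timestamp())

# Interpreting the same wall-clock value as UTC keeps it at the epoch.
as_utc_seconds = int(naive.replace(tzinfo=datetime.timezone.utc).timestamp())

print(as_local_seconds, as_utc_seconds)
```

The UTC interpretation is the one that does not depend on where the session runs.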






[GitHub] [spark] ueshin commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r702168125



##########
File path: python/pyspark/pandas/data_type_ops/base.py
##########
@@ -184,10 +186,21 @@ def _as_other_type(
     )
     assert not need_pre_process, "Pre-processing is needed before the type casting."
 
-    scol = index_ops.spark.column.cast(spark_type)
+    if isinstance(index_ops.spark.data_type, TimestampNTZType) and isinstance(
+        spark_type, NumericType
+    ):
+        scol = _cast_spark_column_timestamp_ntz_to_long(index_ops.spark.column).cast(spark_type)
+    else:
+        scol = index_ops.spark.column.cast(spark_type)

Review comment:
       Shouldn't we put the code that depends on `index_ops`'s data type in `DatetimeOps.astype` instead of here?

##########
File path: python/pyspark/pandas/data_type_ops/base.py
##########
@@ -184,10 +186,21 @@ def _as_other_type(
     )
     assert not need_pre_process, "Pre-processing is needed before the type casting."
 
-    scol = index_ops.spark.column.cast(spark_type)
+    if isinstance(index_ops.spark.data_type, TimestampNTZType) and isinstance(
+        spark_type, NumericType
+    ):
+        scol = _cast_spark_column_timestamp_ntz_to_long(index_ops.spark.column).cast(spark_type)
+    else:
+        scol = index_ops.spark.column.cast(spark_type)
+
     return index_ops._with_new_scol(scol, field=InternalField(dtype=dtype))
 
 
+def _cast_spark_column_timestamp_ntz_to_long(scol: Column) -> Column:
+    jvm = SparkContext._active_spark_context._jvm  # type: ignore
+    return Column(jvm.PythonSQLUtils.unixSecondsUTC(scol._jc))

Review comment:
       This should be placed in `.../data_type_ops/datetime_ops.py`?
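
A rough sketch of the structure these two comments suggest, with the type-specific cast living in the datetime ops class rather than in the shared helper. The classes below are simplified, hypothetical stand-ins: they build SQL-like strings instead of Spark columns, and `unix_seconds_utc` is a made-up name used only for illustration.

```python
class DataTypeOps:
    """Hypothetical base class: generic cast with no type-specific branches."""

    def astype(self, column, target_type):
        return "CAST({} AS {})".format(column, target_type)


class DatetimeNTZOps(DataTypeOps):
    """Hypothetical subclass that owns the TimestampNTZ-specific behaviour."""

    NUMERIC_TYPES = {"int", "long", "float", "double"}

    def astype(self, column, target_type):
        if target_type in self.NUMERIC_TYPES:
            # Go through UTC epoch seconds first, then cast to the numeric type.
            return "CAST(unix_seconds_utc({}) AS {})".format(column, target_type)
        # Everything else falls back to the generic cast.
        return super().astype(column, target_type)


generic_cast = DataTypeOps().astype("dt", "long")
ntz_numeric_cast = DatetimeNTZOps().astype("dt", "long")
ntz_string_cast = DatetimeNTZOps().astype("dt", "string")

print(generic_cast)
print(ntz_numeric_cast)
print(ntz_string_cast)
```

With this layout the shared helper needs no `isinstance` check on the source data type.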






[GitHub] [spark] SparkQA removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913353643


   **[Test build #143011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143011/testReport)** for PR 33877 at commit [`178c70b`](https://github.com/apache/spark/commit/178c70bfb9937ab6e94baabccad5af619ce1680a).




[GitHub] [spark] xinrong-databricks commented on a change in pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r699575464



##########
File path: python/pyspark/pandas/typedef/typehints.py
##########
@@ -313,7 +315,9 @@ def pandas_on_spark_type(tpe: Union[str, type, Dtype]) -> Tuple[Dtype, types.Dat
     return dtype, spark_type
 
 
-def infer_pd_series_spark_type(pser: pd.Series, dtype: Dtype) -> types.DataType:
+def infer_pd_series_spark_type(
+    pser: pd.Series, dtype: Dtype, prefer_timestamp_ntz: bool = False
+) -> types.DataType:
     """Infer Spark DataType from pandas Series dtype.
 
     :param pser: :class:`pandas.Series` to be inferred

Review comment:
       nit: Shall we add a docstring for the `prefer_timestamp_ntz` parameter?







[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913592649


   **[Test build #143016 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143016/testReport)** for PR 33877 at commit [`128f796`](https://github.com/apache/spark/commit/128f796bb02f41a1a551bd0cd6b0571f3318d35d).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class DatetimeNTZOps(DatetimeOps):`
     * `class DatetimeNTZConverter(object):`
     * `case class CastTimestampNTZToLong(child: Expression) extends TimestampToLongBase `




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r703916826



##########
File path: python/pyspark/pandas/data_type_ops/datetime_ops.py
##########
@@ -58,15 +66,18 @@ def sub(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
             "The timestamp subtraction returns an integer in seconds, "
             "whereas pandas returns 'timedelta64[ns]'."
         )
-        if isinstance(right, IndexOpsMixin) and isinstance(right.spark.data_type, TimestampType):
+        if isinstance(right, IndexOpsMixin) and isinstance(
+            right.spark.data_type, (TimestampType, TimestampNTZType)
+        ):
             warnings.warn(msg, UserWarning)
             return left.astype("long") - right.astype("long")

Review comment:
       Yeah. So this one has to be removed once intervals are implemented in PySpark. At the moment, we cannot remove this or let Spark SQL decide via an implicit cast. The reason is not that Spark SQL lacks an implicit cast between NTZ and LTZ, but that PySpark has no interval implementation.
   
   Or are you suggesting that we implement the type coercion between NTZ and LTZ in this PR and use something like `(right - left).astype("int")`?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-909173739








[GitHub] [spark] gengliangwang commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-914019836


   Overall LGTM. Thanks for the work!




[GitHub] [spark] SparkQA removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913379381


   **[Test build #143013 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143013/testReport)** for PR 33877 at commit [`7a920ad`](https://github.com/apache/spark/commit/7a920adc906bac33ede8a8d81a11d5c8f01a88ad).




[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-914000087


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47540/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912616348


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47487/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912571866


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47484/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913704128


   **[Test build #143013 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143013/testReport)** for PR 33877 at commit [`7a920ad`](https://github.com/apache/spark/commit/7a920adc906bac33ede8a8d81a11d5c8f01a88ad).
    * This patch **fails from timeout after a configured wait of `500m`**.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class DatetimeNTZOps(DatetimeOps):`
     * `class DatetimeNTZConverter(object):`
     * `case class CastTimestampNTZToLong(child: Expression) extends TimestampToLongBase `




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913977412


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47540/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r701796298



##########
File path: python/pyspark/pandas/utils.py
##########
@@ -470,6 +470,7 @@ def default_session(conf: Optional[Dict[str, Any]] = None) -> SparkSession:
     # Currently, pandas-on-Spark is dependent on such join due to 'compute.ops_on_diff_frames'
     # configuration. This is needed with Spark 3.0+.
     builder.config("spark.sql.analyzer.failAmbiguousSelfJoin", False)
+    builder.config("spark.sql.timestampType", "TIMESTAMP_NTZ")

Review comment:
       This should be reverted once the tests pass.

##########
File path: python/pyspark/testing/sqlutils.py
##########
@@ -249,6 +249,7 @@ class ReusedSQLTestCase(ReusedPySparkTestCase, SQLTestUtils):
     def setUpClass(cls):
         super(ReusedSQLTestCase, cls).setUpClass()
         cls.spark = SparkSession(cls.sc)
+        cls.spark.conf.set("spark.sql.timestampType", "TIMESTAMP_NTZ")

Review comment:
       This should be reverted once the tests pass.






[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913335390


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47510/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913408541


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47515/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912546143


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47486/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913417594


   **[Test build #143016 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143016/testReport)** for PR 33877 at commit [`128f796`](https://github.com/apache/spark/commit/128f796bb02f41a1a551bd0cd6b0571f3318d35d).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913607208


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47523/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913779282


   **[Test build #143022 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143022/testReport)** for PR 33877 at commit [`488f1e6`](https://github.com/apache/spark/commit/488f1e66e7b21e289a72119ded412a81f9051e4e).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class DatetimeNTZOps(DatetimeOps):`
     * `class DatetimeNTZConverter(object):`
     * `case class CastTimestampNTZToLong(child: Expression) extends TimestampToLongBase `




[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913424406


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143008/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913607183


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47523/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913367978


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47513/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r703368553



##########
File path: python/pyspark/sql/utils.py
##########
@@ -210,3 +210,14 @@ def to_str(value):
         return value
     else:
         return str(value)
+
+
+def is_timestamp_ntz_preferred():

Review comment:
       We can. I took this out of `SparkSession` because:
   1. there are some places, like `types.py`, where some methods can be called without initializing a `SparkSession`
   2. it doesn't require any properties of the `SparkSession` instance
   3. some tests fail for `SQLContext` (e.g., https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143016/console)
   4. when `SparkContext` or `SparkSession` is stopped, the JVM is still supposed to be alive in PySpark, so it's safer not to depend on `SparkSession`.






[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912540183


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47486/
   




[GitHub] [spark] SparkQA removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912462593


   **[Test build #142982 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142982/testReport)** for PR 33877 at commit [`6bdfad2`](https://github.com/apache/spark/commit/6bdfad2e703a16676894ef4ab2e17eff60db539d).




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r701796596



##########
File path: python/pyspark/sql/types.py
##########
@@ -202,7 +202,7 @@ def typeName(cls):
 
     def toInternal(self, dt):
         if dt is not None:
-            seconds = calendar.timegm(dt.timetuple())
+            seconds = calendar.timegm(dt.utctimetuple())

Review comment:
       This doesn't make any difference in Spark; it is just a semantic correction.
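
For reference, a minimal standard-library sketch of what the `timetuple()` to `utctimetuple()` change means. It assumes the values reaching `toInternal` here are naive datetimes, which is presumably why nothing changes in Spark; the variable names and the UTC+9 offset are only illustrative.

```python
import calendar
from datetime import datetime, timedelta, timezone

# Naive datetime: timetuple() and utctimetuple() carry the same
# year..second fields, so calendar.timegm() returns the same value.
naive = datetime(2021, 8, 31, 19, 58, 55)
naive_local = calendar.timegm(naive.timetuple())
naive_utc = calendar.timegm(naive.utctimetuple())

# Aware datetime at UTC+9: timetuple() keeps the wall-clock fields,
# while utctimetuple() shifts them to UTC first.
kst = timezone(timedelta(hours=9))
aware = naive.replace(tzinfo=kst)
aware_local = calendar.timegm(aware.timetuple())
aware_utc = calendar.timegm(aware.utctimetuple())

print(naive_local == naive_utc)   # same value for naive input
print(aware_local - aware_utc)    # offset in seconds for aware input
```

Only the aware case differs, by the UTC offset in seconds, so `utctimetuple()` is the semantically accurate argument for `calendar.timegm`, which expects a UTC time tuple.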






[GitHub] [spark] SparkQA removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913310625


   **[Test build #143008 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143008/testReport)** for PR 33877 at commit [`3f30ab9`](https://github.com/apache/spark/commit/3f30ab9b35fc54679e4900905f8516cd1162fcf5).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-914026546


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47541/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-914152414


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143038/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912242735


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142957/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913500934


   **[Test build #143011 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143011/testReport)** for PR 33877 at commit [`178c70b`](https://github.com/apache/spark/commit/178c70bfb9937ab6e94baabccad5af619ce1680a).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-909173739








[GitHub] [spark] SparkQA removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913964045


   **[Test build #143037 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143037/testReport)** for PR 33877 at commit [`6620ade`](https://github.com/apache/spark/commit/6620adeb54fe95870c6de1604dfc2cd32028bed2).




[GitHub] [spark] cloud-fan commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r703294771



##########
File path: python/pyspark/pandas/data_type_ops/datetime_ops.py
##########
@@ -58,15 +66,18 @@ def sub(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
             "The timestamp subtraction returns an integer in seconds, "
             "whereas pandas returns 'timedelta64[ns]'."
         )
-        if isinstance(right, IndexOpsMixin) and isinstance(right.spark.data_type, TimestampType):
+        if isinstance(right, IndexOpsMixin) and isinstance(
+            right.spark.data_type, (TimestampType, TimestampNTZType)
+        ):
             warnings.warn(msg, UserWarning)
             return left.astype("long") - right.astype("long")

Review comment:
       If the left side is LTZ and the right side is NTZ, we need to let Spark add an implicit cast.






[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-914150506


   **[Test build #143038 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143038/testReport)** for PR 33877 at commit [`f83511b`](https://github.com/apache/spark/commit/f83511b4729f5d4906205f400cba57f5ab0dcd3b).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class DatetimeNTZOps(DatetimeOps):`
     * `class DatetimeNTZConverter(object):`
     * `case class CastTimestampNTZToLong(child: Expression) extends TimestampToLongBase `






[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912847871


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142985/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r703936859



##########
File path: python/pyspark/pandas/data_type_ops/datetime_ops.py
##########
@@ -58,15 +66,18 @@ def sub(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
             "The timestamp subtraction returns an integer in seconds, "
             "whereas pandas returns 'timedelta64[ns]'."
         )
-        if isinstance(right, IndexOpsMixin) and isinstance(right.spark.data_type, TimestampType):
+        if isinstance(right, IndexOpsMixin) and isinstance(
+            right.spark.data_type, (TimestampType, TimestampNTZType)
+        ):
             warnings.warn(msg, UserWarning)
             return left.astype("long") - right.astype("long")

Review comment:
       Do you suggest something like `(right - left).astype("int")`? This won't work because:
   1. Intervals can't be converted to longs. Supporting this natively requires an interval implementation in PySpark.
   2. `TimestampNTZ` is treated as a Unix timestamp in UTC, but `TIMESTAMP_LTZ - TIMESTAMP_NTZ` or `TIMESTAMP_NTZ - TIMESTAMP_LTZ` will assume `TIMESTAMP_NTZ` is in the local session time zone. For example:
       ```scala
       scala> sql("SELECT TIMESTAMP '1970-01-01 00:00:00' - TIMESTAMP_NTZ '1970-01-01 00:00:00'").show(false)
       ```
       should result in something like `INTERVAL '0 09:00:00' DAY TO SECOND` (I am in KST), but it results in `INTERVAL '0 00:00:00' DAY TO SECOND`.
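
To make the second point concrete, here is a minimal sketch in plain Python (no Spark involved) of the two readings of an NTZ value. The helper `epoch_seconds`, the variable names, and the fixed UTC+9 offset standing in for a KST session time zone are all assumptions made for illustration; this is not how PySpark or Spark SQL implement the cast.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
KST = timezone(timedelta(hours=9))  # fixed-offset stand-in for a KST session time zone


def epoch_seconds(wall_clock, tz):
    """Epoch seconds when a naive wall-clock value is read in time zone `tz`."""
    return int(wall_clock.replace(tzinfo=tz).timestamp())


wall_clock = datetime(1970, 1, 1, 0, 0, 0)

# LTZ value: the literal is read in the session time zone (KST here).
ltz = epoch_seconds(wall_clock, KST)

# Reading 1: NTZ treated as a Unix timestamp in UTC (the cast-to-long view).
ntz_as_utc = epoch_seconds(wall_clock, UTC)

# Reading 2: NTZ assumed to be in the session time zone (the implicit-cast view).
ntz_as_session = epoch_seconds(wall_clock, KST)

gap_utc_reading = abs(ltz - ntz_as_utc)          # nine hours, in seconds
gap_session_reading = abs(ltz - ntz_as_session)  # zero
print(gap_utc_reading, gap_session_reading)
```

Under the first reading the two values are nine hours apart, and the sign depends on operand order. Under the second they coincide, which matches the `INTERVAL '0 00:00:00'` result shown above.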
   
   

##########
File path: python/pyspark/pandas/data_type_ops/datetime_ops.py
##########
@@ -58,15 +66,18 @@ def sub(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
             "The timestamp subtraction returns an integer in seconds, "
             "whereas pandas returns 'timedelta64[ns]'."
         )
-        if isinstance(right, IndexOpsMixin) and isinstance(right.spark.data_type, TimestampType):
+        if isinstance(right, IndexOpsMixin) and isinstance(
+            right.spark.data_type, (TimestampType, TimestampNTZType)
+        ):
             warnings.warn(msg, UserWarning)
             return left.astype("long") - right.astype("long")

Review comment:
       Yeah. So this one has to be removed once intervals are implemented in PySpark. At the moment, we cannot remove this or let Spark SQL decide it via an implicit cast. That is not because Spark SQL lacks the implicit cast between NTZ and LTZ, but because PySpark doesn't have an interval implementation.
   
   Or do you suggest implementing the type coercion between NTZ and LTZ in this PR, and using something like `(right - left).astype("int")` in this PR?






[GitHub] [spark] HyukjinKwon removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913964232


   I cleaned up a bit more.




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r702536157



##########
File path: python/pyspark/pandas/spark/accessors.py
##########
@@ -352,33 +338,6 @@ def print_schema(self, index_col: Optional[Union[str, List[str]]] = None) -> Non
         Returns
         -------
         None
-

Review comment:
       should revert

##########
File path: python/pyspark/pandas/spark/accessors.py
##########
@@ -322,20 +322,6 @@ def schema(self, index_col: Optional[Union[str, List[str]]] = None) -> StructTyp
         index_col: str or list of str, optional, default: None
             Column names to be used in Spark to represent pandas-on-Spark's index. The index name
             in pandas-on-Spark is ignored. By default, the index is always lost.
-

Review comment:
       should revert






[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913582274


   **[Test build #143021 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143021/testReport)** for PR 33877 at commit [`7485ced`](https://github.com/apache/spark/commit/7485ced7690e1dee9134c1d1aaedff1d0cbfdde2).




[GitHub] [spark] HyukjinKwon commented on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-911488421


   I set `spark.sql.timestampType` to `TIMESTAMP_NTZ` to see if tests pass at https://github.com/apache/spark/pull/33877/commits/59efcacf74272634bcde0b24007a8a87161da25f. Should revert before merging.




[GitHub] [spark] gengliangwang commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r703207364



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##########
@@ -708,6 +708,19 @@ case class UnixSeconds(child: Expression) extends TimestampToLongBase {
     copy(child = newChild)
 }
 
+// Internal expression used to get the raw UTC timestamp in pandas API on Spark.
+// This is to work around casting timestamp_ntz to long disallowed by ANSI.

Review comment:
       Thanks for the comment here!






[GitHub] [spark] HyukjinKwon commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r702536157



##########
File path: python/pyspark/pandas/spark/accessors.py
##########
@@ -352,33 +338,6 @@ def print_schema(self, index_col: Optional[Union[str, List[str]]] = None) -> Non
         Returns
         -------
         None
-

Review comment:
       should revert

##########
File path: python/pyspark/pandas/spark/accessors.py
##########
@@ -322,20 +322,6 @@ def schema(self, index_col: Optional[Union[str, List[str]]] = None) -> StructTyp
         index_col: str or list of str, optional, default: None
             Column names to be used in Spark to represent pandas-on-Spark's index. The index name
             in pandas-on-Spark is ignored. By default, the index is always lost.
-

Review comment:
       should revert

##########
File path: python/pyspark/pandas/tests/test_dataframe.py
##########
@@ -5172,6 +5172,7 @@ def test_explode(self):
         self.assertRaises(TypeError, lambda: psdf.explode(["A", "B"]))
         self.assertRaises(ValueError, lambda: psdf.explode("A"))
 
+    @unittest.skip("Skip for testing")

Review comment:
       should revert this back before merging.






[GitHub] [spark] HyukjinKwon commented on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-909131812


   cc @ueshin FYI




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913310625


   **[Test build #143008 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143008/testReport)** for PR 33877 at commit [`3f30ab9`](https://github.com/apache/spark/commit/3f30ab9b35fc54679e4900905f8516cd1162fcf5).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913424406


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143008/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912462593


   **[Test build #142982 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142982/testReport)** for PR 33877 at commit [`6bdfad2`](https://github.com/apache/spark/commit/6bdfad2e703a16676894ef4ab2e17eff60db539d).




[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913352501


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47510/
   




[GitHub] [spark] HyukjinKwon commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913964232


   I cleaned up a bit more.




[GitHub] [spark] SparkQA removed a comment on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-909137450


   **[Test build #142883 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142883/testReport)** for PR 33877 at commit [`2354abf`](https://github.com/apache/spark/commit/2354abf497c27b15b3222e2b053372119e4ba28f).




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912608267


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47487/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912662146


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142983/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913502058


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143011/
   




[GitHub] [spark] SparkQA removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912504131


   **[Test build #142984 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142984/testReport)** for PR 33877 at commit [`785d89f`](https://github.com/apache/spark/commit/785d89f1199582db1083e2a41cdfbd571efca28d).




[GitHub] [spark] HyukjinKwon edited a comment on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon edited a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-909131812


   cc @ueshin, @xinrong-databricks and @itholic FYI




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912542839








[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912688012


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142984/
   




[GitHub] [spark] SparkQA removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912575453


   **[Test build #142985 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142985/testReport)** for PR 33877 at commit [`83afb0b`](https://github.com/apache/spark/commit/83afb0b01a2059473801a99db8b41fe2354ccca7).




[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912662146


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142983/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-909155647


   **[Test build #142883 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142883/testReport)** for PR 33877 at commit [`2354abf`](https://github.com/apache/spark/commit/2354abf497c27b15b3222e2b053372119e4ba28f).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913421364


   **[Test build #143008 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143008/testReport)** for PR 33877 at commit [`3f30ab9`](https://github.com/apache/spark/commit/3f30ab9b35fc54679e4900905f8516cd1162fcf5).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class DatetimeNTZOps(DatetimeOps):`
     * `case class CastTimestampNTZToLong(child: Expression) extends TimestampToLongBase `




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-909174922


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47386/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913772990


   **[Test build #143021 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143021/testReport)** for PR 33877 at commit [`7485ced`](https://github.com/apache/spark/commit/7485ced7690e1dee9134c1d1aaedff1d0cbfdde2).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class DatetimeNTZOps(DatetimeOps):`
     * `class DatetimeNTZConverter(object):`
     * `case class CastTimestampNTZToLong(child: Expression) extends TimestampToLongBase `




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-909173739


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142883/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-914152414


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143038/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913688387


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47524/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r703075783



##########
File path: python/pyspark/sql/types.py
##########
@@ -1665,7 +1665,35 @@ def convert(self, obj, gateway_client):
         t.setNanos(obj.microsecond * 1000)
         return t
 
+
+class DatetimeNTZConverter(object):
+    def can_convert(self, obj):
+        def prefer_timestamp_ntz():
+            from pyspark.sql import SparkSession
+
+            session = SparkSession._activeSession
+
+            with SparkContext._lock:
+                context = SparkContext._active_spark_context
+                if context is not None and session is not None:
+                    return session._is_timestamp_ntz_preferred()
+                else:
+                    return False
+
+        return isinstance(obj, datetime.datetime) and obj.tzinfo is None and prefer_timestamp_ntz()
+
+    def convert(self, obj, gateway_client):
+        from pyspark import SparkContext
+
+        seconds = calendar.timegm(obj.utctimetuple())
+        jvm = SparkContext._jvm
+        return jvm.org.apache.spark.sql.catalyst.util.DateTimeUtils.microsToLocalDateTime(
+            int(seconds) * 1000000 + obj.microsecond

Review comment:
       Python integers don't overflow :). This is also consistent with the other conversion code paths in Python.
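
A minimal sketch of the point above, assuming the remark refers to Python's arbitrary-precision `int`. The helper name `to_epoch_micros` is hypothetical; it repeats only the arithmetic from the quoted `convert` body and leaves out the JVM call.

```python
import calendar
import datetime


def to_epoch_micros(dt):
    # Hypothetical helper: same arithmetic as the quoted convert() body,
    # without the call into the JVM.
    seconds = calendar.timegm(dt.utctimetuple())
    return int(seconds) * 1000000 + dt.microsecond


# One second and five microseconds after the epoch.
one_second_in = to_epoch_micros(datetime.datetime(1970, 1, 1, 0, 0, 1, 5))

# Python ints are arbitrary precision, so multiplying past the signed
# 64-bit range yields a larger int instead of wrapping around.
int64_max = 2**63 - 1
beyond_int64 = int64_max * 1000000
```

The multiplication on the Python side therefore cannot silently wrap, which is why no explicit overflow check appears in the diff.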






[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912536843


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47485/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912768521


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142982/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r703936859



##########
File path: python/pyspark/pandas/data_type_ops/datetime_ops.py
##########
@@ -58,15 +66,18 @@ def sub(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
             "The timestamp subtraction returns an integer in seconds, "
             "whereas pandas returns 'timedelta64[ns]'."
         )
-        if isinstance(right, IndexOpsMixin) and isinstance(right.spark.data_type, TimestampType):
+        if isinstance(right, IndexOpsMixin) and isinstance(
+            right.spark.data_type, (TimestampType, TimestampNTZType)
+        ):
             warnings.warn(msg, UserWarning)
             return left.astype("long") - right.astype("long")

Review comment:
       Do you suggest something like `(right - left).astype("int")`? This won't work because:
   1. Intervals can't be converted to longs. Supporting this natively requires an interval implementation in PySpark.
   2. Here in the pandas context, `TIMESTAMP_NTZ` is considered a Unix timestamp in UTC, and `TIMESTAMP_LTZ` is considered a local (session) time. But in Spark SQL, `TIMESTAMP_LTZ - TIMESTAMP_NTZ` or `TIMESTAMP_NTZ - TIMESTAMP_LTZ` will assume both are in the local session timezone or an unknown timezone. For example:
       ```scala
       scala> sql("SELECT TIMESTAMP '1970-01-01 00:00:00' - TIMESTAMP_NTZ '1970-01-01 00:00:00'").show(false)
       ```
       should result in something like `INTERVAL '0 09:00:00' DAY TO SECOND` (I am in KST), but it results in `INTERVAL '0 00:00:00' DAY TO SECOND`.
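
The gap described in point 2 can be reproduced outside Spark. This is a sketch in plain Python, not Spark's implementation: it assumes a fixed +09:00 offset as a stand-in for KST, and the names `KST`, `as_ntz`, and `as_ltz` are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +09:00 offset stands in for the KST session timezone.
KST = timezone(timedelta(hours=9))

wall_clock = datetime(1970, 1, 1, 0, 0, 0)

# NTZ reading used in the pandas context: the fields are taken as UTC.
as_ntz = wall_clock.replace(tzinfo=timezone.utc)

# LTZ reading: the same fields are taken as local (session) time.
as_ltz = wall_clock.replace(tzinfo=KST)

# The two readings name instants that are nine hours apart.
gap = abs(as_ltz - as_ntz)

# Treating both values as if they shared one timezone hides that gap.
same_zone_gap = wall_clock - wall_clock
```

Only the size of the gap is computed here; the sign depends on which operand is subtracted from which.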
   
   






[GitHub] [spark] HyukjinKwon commented on a change in pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r701796596



##########
File path: python/pyspark/sql/types.py
##########
@@ -202,7 +202,7 @@ def typeName(cls):
 
     def toInternal(self, dt):
         if dt is not None:
-            seconds = calendar.timegm(dt.timetuple())
+            seconds = calendar.timegm(dt.utctimetuple())

Review comment:
       This doesn't make any difference in Spark; it is just a semantic correction.
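
A short check of why the change is behavior-neutral for timezone-naive values, using only the standard library. The sample datetimes and the fixed +09:00 offset are assumptions made for illustration.

```python
import calendar
import datetime

naive = datetime.datetime(2021, 8, 31, 19, 58, 55, 24410)

# For a naive datetime, timetuple() and utctimetuple() carry the same
# date and time fields, so calendar.timegm() returns the same seconds.
naive_via_timetuple = calendar.timegm(naive.timetuple())
naive_via_utctimetuple = calendar.timegm(naive.utctimetuple())

# The two only diverge for an aware datetime, where utctimetuple()
# first shifts the fields to UTC.
aware = naive.replace(tzinfo=datetime.timezone(datetime.timedelta(hours=9)))
aware_difference = (
    calendar.timegm(aware.timetuple()) - calendar.timegm(aware.utctimetuple())
)
```

For the aware value the difference equals the UTC offset in seconds, which is the case `utctimetuple()` handles more precisely.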






[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913352501


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47510/
   




[GitHub] [spark] HyukjinKwon commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-915675903


   Merged to master.




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912242495


   **[Test build #142957 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142957/testReport)** for PR 33877 at commit [`abce481`](https://github.com/apache/spark/commit/abce4819be4142f24a01e6259f231fc24c1577ea).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912504131


   **[Test build #142984 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142984/testReport)** for PR 33877 at commit [`785d89f`](https://github.com/apache/spark/commit/785d89f1199582db1083e2a41cdfbd571efca28d).




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912673172


   **[Test build #142984 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142984/testReport)** for PR 33877 at commit [`785d89f`](https://github.com/apache/spark/commit/785d89f1199582db1083e2a41cdfbd571efca28d).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class DatetimeNTZConverter(object):`
     * `case class UnixSecondsUTC(child: Expression) extends TimestampToLongBase `




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912260710


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47457/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912616388


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47487/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913983644


   **[Test build #143038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143038/testReport)** for PR 33877 at commit [`f83511b`](https://github.com/apache/spark/commit/f83511b4729f5d4906205f400cba57f5ab0dcd3b).




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912558148


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47484/
   




[GitHub] [spark] HyukjinKwon commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913986456


   I cleaned up a bit more.




[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913607208


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47523/
   




[GitHub] [spark] HyukjinKwon removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913378239


   This PR is ready for a review.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-909174994


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47386/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-909137450








[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913704693


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143013/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r704158782



##########
File path: python/pyspark/pandas/data_type_ops/datetime_ops.py
##########
@@ -58,15 +66,18 @@ def sub(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
             "The timestamp subtraction returns an integer in seconds, "
             "whereas pandas returns 'timedelta64[ns]'."
         )
-        if isinstance(right, IndexOpsMixin) and isinstance(right.spark.data_type, TimestampType):
+        if isinstance(right, IndexOpsMixin) and isinstance(
+            right.spark.data_type, (TimestampType, TimestampNTZType)
+        ):
             warnings.warn(msg, UserWarning)
             return left.astype("long") - right.astype("long")

Review comment:
       Discussed offline.






[GitHub] [spark] SparkQA removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913582274








[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912542839


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47485/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912575453


   **[Test build #142985 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142985/testReport)** for PR 33877 at commit [`83afb0b`](https://github.com/apache/spark/commit/83afb0b01a2059473801a99db8b41fe2354ccca7).




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913990550


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47540/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913376499


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47513/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912260673


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47457/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912616388


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47487/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912527524


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47484/
   




[GitHub] [spark] HyukjinKwon closed pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #33877:
URL: https://github.com/apache/spark/pull/33877


   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r703364329



##########
File path: python/pyspark/pandas/data_type_ops/datetime_ops.py
##########
@@ -58,15 +66,18 @@ def sub(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
             "The timestamp subtraction returns an integer in seconds, "
             "whereas pandas returns 'timedelta64[ns]'."
         )
-        if isinstance(right, IndexOpsMixin) and isinstance(right.spark.data_type, TimestampType):
+        if isinstance(right, IndexOpsMixin) and isinstance(
+            right.spark.data_type, (TimestampType, TimestampNTZType)
+        ):
             warnings.warn(msg, UserWarning)
             return left.astype("long") - right.astype("long")

Review comment:
       This works around pandas' behaviour on interval types. PySpark has not implemented interval types yet, so we can't let Spark do the implicit cast here.
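
A minimal sketch of what the cast-and-subtract workaround in the hunk above amounts to, written in plain Python so it runs without Spark. `to_epoch_seconds` is a hypothetical stand-in for `astype("long")` on a timestamp column, and reading naive values as UTC is an assumption made for illustration only.

```python
from datetime import datetime, timezone


def to_epoch_seconds(ts: datetime) -> int:
    """Hypothetical stand-in for astype("long"): whole seconds since the Unix epoch.

    Naive values are read as UTC here; that is an assumption of this sketch.
    """
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return int(ts.timestamp())


left = datetime(2021, 9, 1, 9, 22, 21)
right = datetime(2021, 8, 31, 19, 58, 55)

# The workaround: cast both sides to long seconds, then subtract integers.
seconds_apart = to_epoch_seconds(left) - to_epoch_seconds(right)

# Native subtraction yields a timedelta (pandas would give timedelta64[ns]).
native_delta = left - right
```

The integer result carries the same information as the timedelta, which is why the code path emits a `UserWarning` about the differing return type rather than failing.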






[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-914098150


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143037/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913610119


   **[Test build #143022 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143022/testReport)** for PR 33877 at commit [`488f1e6`](https://github.com/apache/spark/commit/488f1e66e7b21e289a72119ded412a81f9051e4e).




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r703936859



##########
File path: python/pyspark/pandas/data_type_ops/datetime_ops.py
##########
@@ -58,15 +66,18 @@ def sub(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
             "The timestamp subtraction returns an integer in seconds, "
             "whereas pandas returns 'timedelta64[ns]'."
         )
-        if isinstance(right, IndexOpsMixin) and isinstance(right.spark.data_type, TimestampType):
+        if isinstance(right, IndexOpsMixin) and isinstance(
+            right.spark.data_type, (TimestampType, TimestampNTZType)
+        ):
             warnings.warn(msg, UserWarning)
             return left.astype("long") - right.astype("long")

Review comment:
       Do you suggest something like `(right - left).astype("int")`? This won't work because:
   1. Intervals can't be converted to longs. Supporting this natively requires an internal implementation in PySpark.
   2. Here, in the pandas context, `TimestampNTZType` is treated as a Unix timestamp in UTC, and `TimestampType` is treated as a local (session) time. But in Spark SQL, `TIMESTAMP_LTZ - TIMESTAMP_NTZ` or `TIMESTAMP_NTZ - TIMESTAMP_LTZ` assumes both are in the local session timezone or an unknown timezone. For example:
       ```scala
       scala> sql("SELECT TIMESTAMP '1970-01-01 00:00:00' - TIMESTAMP_NTZ '1970-01-01 00:00:00'").show(false)
       ```
       should result in something like `INTERVAL '0 09:00:00' DAY TO SECOND` (I am in KST), but it results in `INTERVAL '0 00:00:00' DAY TO SECOND`.
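
The nine-hour gap described in point 2 can be reproduced with the standard library alone. This is only a sketch under stated assumptions: `KST` is a fixed UTC+9 offset standing in for the session time zone, and the two readings of the same wall-clock value mirror how the comment describes `TimestampNTZType` and `TimestampType` in the pandas context.

```python
from datetime import datetime, timedelta, timezone

# Assumed session time zone for illustration (KST is UTC+9).
KST = timezone(timedelta(hours=9))

wall_clock = datetime(1970, 1, 1, 0, 0, 0)

# TimestampNTZType reading: the wall-clock value is taken as UTC.
ntz_epoch = int(wall_clock.replace(tzinfo=timezone.utc).timestamp())

# TimestampType reading: the same wall-clock value is local session time.
ltz_epoch = int(wall_clock.replace(tzinfo=KST).timestamp())

# Casting each side to epoch seconds keeps the offset between the two readings.
gap_seconds = abs(ltz_epoch - ntz_epoch)
```

Casting each side to long separately preserves this offset, whereas a direct timestamp subtraction that treats both operands as being in the same zone would report zero.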
   
   






[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913353643


   **[Test build #143011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143011/testReport)** for PR 33877 at commit [`178c70b`](https://github.com/apache/spark/commit/178c70bfb9937ab6e94baabccad5af619ce1680a).




[GitHub] [spark] HyukjinKwon commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r703936859



##########
File path: python/pyspark/pandas/data_type_ops/datetime_ops.py
##########
@@ -58,15 +66,18 @@ def sub(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
             "The timestamp subtraction returns an integer in seconds, "
             "whereas pandas returns 'timedelta64[ns]'."
         )
-        if isinstance(right, IndexOpsMixin) and isinstance(right.spark.data_type, TimestampType):
+        if isinstance(right, IndexOpsMixin) and isinstance(
+            right.spark.data_type, (TimestampType, TimestampNTZType)
+        ):
             warnings.warn(msg, UserWarning)
             return left.astype("long") - right.astype("long")

Review comment:
       Do you suggest something like `(right - left).astype("int")`? This won't work because:
   1. Intervals can't be converted to longs. Supporting this natively requires an internal implementation in PySpark.
   2. Here, in the pandas context, `TIMESTAMP_NTZ` is treated as a Unix timestamp in UTC, and `TIMESTAMP_LTZ` is treated as a local (session) time. But in Spark SQL, `TIMESTAMP_LTZ - TIMESTAMP_NTZ` or `TIMESTAMP_NTZ - TIMESTAMP_LTZ` assumes both are in the local session timezone or an unknown timezone. For example:
       ```scala
       scala> sql("SELECT TIMESTAMP '1970-01-01 00:00:00' - TIMESTAMP_NTZ '1970-01-01 00:00:00'").show(false)
       ```
       should result in something like `INTERVAL '0 09:00:00' DAY TO SECOND` (I am in KST), but it results in `INTERVAL '0 00:00:00' DAY TO SECOND`.
   
   






[GitHub] [spark] HyukjinKwon commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r703936859



##########
File path: python/pyspark/pandas/data_type_ops/datetime_ops.py
##########
@@ -58,15 +66,18 @@ def sub(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
             "The timestamp subtraction returns an integer in seconds, "
             "whereas pandas returns 'timedelta64[ns]'."
         )
-        if isinstance(right, IndexOpsMixin) and isinstance(right.spark.data_type, TimestampType):
+        if isinstance(right, IndexOpsMixin) and isinstance(
+            right.spark.data_type, (TimestampType, TimestampNTZType)
+        ):
             warnings.warn(msg, UserWarning)
             return left.astype("long") - right.astype("long")

Review comment:
       Do you suggest something like `(right - left).astype("int")`? This won't work because:
   1. Intervals can't be converted to longs. Supporting this natively requires an internal implementation in PySpark.
   2. Here, `TimestampNTZType` is treated as a Unix timestamp in UTC, and `TimestampType` is treated as a local (session) time. But `TIMESTAMP_LTZ - TIMESTAMP_NTZ` or `TIMESTAMP_NTZ - TIMESTAMP_LTZ` assumes both are in the local session timezone or an unknown timezone. For example:
       ```scala
       scala> sql("SELECT TIMESTAMP '1970-01-01 00:00:00' - TIMESTAMP_NTZ '1970-01-01 00:00:00'").show(false)
       ```
       should result in something like `INTERVAL '0 09:00:00' DAY TO SECOND` (I am in KST), but it results in `INTERVAL '0 00:00:00' DAY TO SECOND`.
   
   






[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912661061


   **[Test build #142983 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142983/testReport)** for PR 33877 at commit [`b4fc1fa`](https://github.com/apache/spark/commit/b4fc1fa67a66e433b1ec01278e8e96ac52531acc).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class DatetimeNTZConverter(object):`
     * `case class UnixSecondsUTC(child: Expression) extends TimestampToLongBase `




[GitHub] [spark] cloud-fan commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r703517710



##########
File path: python/pyspark/pandas/data_type_ops/datetime_ops.py
##########
@@ -58,15 +66,18 @@ def sub(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
             "The timestamp subtraction returns an integer in seconds, "
             "whereas pandas returns 'timedelta64[ns]'."
         )
-        if isinstance(right, IndexOpsMixin) and isinstance(right.spark.data_type, TimestampType):
+        if isinstance(right, IndexOpsMixin) and isinstance(
+            right.spark.data_type, (TimestampType, TimestampNTZType)
+        ):
             warnings.warn(msg, UserWarning)
             return left.astype("long") - right.astype("long")

Review comment:
       Not very familiar with pandas. Can you give a real example that will hit this code path?






[GitHub] [spark] xinrong-databricks commented on a change in pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r699575464



##########
File path: python/pyspark/pandas/typedef/typehints.py
##########
@@ -313,7 +315,9 @@ def pandas_on_spark_type(tpe: Union[str, type, Dtype]) -> Tuple[Dtype, types.Dat
     return dtype, spark_type
 
 
-def infer_pd_series_spark_type(pser: pd.Series, dtype: Dtype) -> types.DataType:
+def infer_pd_series_spark_type(
+    pser: pd.Series, dtype: Dtype, prefer_timestamp_ntz: bool = False
+) -> types.DataType:
     """Infer Spark DataType from pandas Series dtype.
 
     :param pser: :class:`pandas.Series` to be inferred

Review comment:
       nit: Shall we add a docstring for the `prefer_timestamp_ntz` parameter?
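
One way the requested entry could look, following the `:param ...:` style already used in the docstring above. This is a sketch only: `infer_series_spark_type_name` is a hypothetical, dependency-free stand-in that works on dtype and type names as strings, and the described meaning of the flag is an assumption rather than the PR's actual wording.

```python
def infer_series_spark_type_name(dtype_name: str, prefer_timestamp_ntz: bool = False) -> str:
    """Infer a Spark type name from a pandas dtype name (hypothetical stand-in).

    :param dtype_name: name of the pandas dtype, e.g. ``"datetime64[ns]"``
    :param prefer_timestamp_ntz: if True, infer datetime values as
        ``TimestampNTZType`` instead of ``TimestampType`` (assumed meaning)
    :return: the name of the inferred Spark type
    """
    if dtype_name == "datetime64[ns]":
        return "TimestampNTZType" if prefer_timestamp_ntz else "TimestampType"
    # Other dtypes are out of scope for this sketch.
    raise TypeError(f"Unsupported dtype in this sketch: {dtype_name}")
```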






[GitHub] [spark] HyukjinKwon edited a comment on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon edited a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-909131812


   cc @ueshin, @xinrong-databricks and @itholic FYI




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912489818


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47483/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913502058


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143011/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913964045


   **[Test build #143037 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143037/testReport)** for PR 33877 at commit [`6620ade`](https://github.com/apache/spark/commit/6620adeb54fe95870c6de1604dfc2cd32028bed2).




[GitHub] [spark] HyukjinKwon commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913378239


   This PR is ready for a review.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-909173739








[GitHub] [spark] SparkQA removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912493897


   **[Test build #142983 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142983/testReport)** for PR 33877 at commit [`b4fc1fa`](https://github.com/apache/spark/commit/b4fc1fa67a66e433b1ec01278e8e96ac52531acc).




[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912491861


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47483/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913782393








[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912260710


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47457/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912242735


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142957/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913452049


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47518/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912768521


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142982/
   




[GitHub] [spark] HyukjinKwon commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-915676100


   FYI, I'm going to revisit all the dt operations in pandas API on Spark in Q4.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912847871


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142985/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r703299464



##########
File path: python/pyspark/sql/utils.py
##########
@@ -210,3 +210,14 @@ def to_str(value):
         return value
     else:
         return str(value)
+
+
+def is_timestamp_ntz_preferred():

Review comment:
       Shouldn't this be a method of `SparkSession` in PySpark?






[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-909137450


   **[Test build #142883 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142883/testReport)** for PR 33877 at commit [`2354abf`](https://github.com/apache/spark/commit/2354abf497c27b15b3222e2b053372119e4ba28f).




[GitHub] [spark] SparkQA removed a comment on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912232111


   **[Test build #142957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142957/testReport)** for PR 33877 at commit [`abce481`](https://github.com/apache/spark/commit/abce4819be4142f24a01e6259f231fc24c1577ea).




[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-909173739


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142883/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913636837


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47524/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912257202


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47457/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913376499


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47513/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-909174994


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47386/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-914026546


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47541/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913674972


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47524/
   




[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912542798


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47485/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-914000087


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47540/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912491861


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47483/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913704693








[GitHub] [spark] cloud-fan commented on a change in pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33877:
URL: https://github.com/apache/spark/pull/33877#discussion_r703300116



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##########
@@ -708,6 +708,19 @@ case class UnixSeconds(child: Expression) extends TimestampToLongBase {
     copy(child = newChild)
 }
 
+// Internal expression used to get the raw UTC timestamp in pandas API on Spark.

Review comment:
       Shouldn't we cast NTZ to LTZ first, and then cast LTZ to long?
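
       Illustration (not part of the original review comment): a minimal plain-Python sketch of why the order of casts asked about above can matter. It does not use Spark and is not the PR's implementation. The helper names `ntz_as_utc_micros` and `ntz_via_ltz_micros`, the sample timestamp, and the fixed +09:00 session offset are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MICROSECOND = timedelta(microseconds=1)


def ntz_as_utc_micros(wall_clock):
    """Hypothetical helper: read the naive wall-clock value as if it were UTC."""
    return (wall_clock.replace(tzinfo=timezone.utc) - EPOCH) // ONE_MICROSECOND


def ntz_via_ltz_micros(wall_clock, session_tz):
    """Hypothetical helper: attach the session time zone first (NTZ -> LTZ),
    then take microseconds since the epoch (LTZ -> long)."""
    return (wall_clock.replace(tzinfo=session_tz) - EPOCH) // ONE_MICROSECOND


# Sample naive timestamp and an assumed session time zone of UTC+09:00.
wall_clock = datetime(2021, 8, 31, 19, 58, 55, 24410)
session_tz = timezone(timedelta(hours=9))

raw = ntz_as_utc_micros(wall_clock)
via_ltz = ntz_via_ltz_micros(wall_clock, session_tz)

# The two readings differ by exactly the session offset.
diff_hours = (raw - via_ltz) / 3_600_000_000
print(raw, via_ltz, diff_hours)
```

       In this sketch the two results differ by the session time zone offset, so the choice between the two paths only becomes visible when the session time zone is not UTC.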






[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-912484359


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47483/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33877:
URL: https://github.com/apache/spark/pull/33877#issuecomment-913408571


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47515/
   

