Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/04/14 20:37:36 UTC

[GitHub] [spark] xinrong-databricks opened a new pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

xinrong-databricks opened a new pull request #32177:
URL: https://github.com/apache/spark/pull/32177


   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
   -->
   
   ### What changes were proposed in this pull request?
   Consolidate PySpark testing utils by merging python/pyspark/pandas/testing/utils.py into python/pyspark/testing/utils.py.
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   python/pyspark/pandas/testing/utils.py and python/pyspark/testing/utils.py serve the same purpose. Merging them makes the code cleaner and easier to maintain.
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   No.
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   Unit tests under python/pyspark/pandas/tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820033615


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137392/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820612588


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42006/
   




[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824324661


   **[Test build #137744 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137744/testReport)** for PR 32177 at commit [`5e6fa2a`](https://github.com/apache/spark/commit/5e6fa2ae9282b10d48539ce3d5784833061c62d9).




[GitHub] [spark] AmplabJenkins commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-823719173








[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-823701287


   **[Test build #137706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137706/testReport)** for PR 32177 at commit [`4be5e72`](https://github.com/apache/spark/commit/4be5e726713c1fd475ca5fa2f2710f386eaba21d).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-819827025


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137372/
   




[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-822683267


   **[Test build #137644 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137644/testReport)** for PR 32177 at commit [`ae73324`](https://github.com/apache/spark/commit/ae73324764d48c37dbf6caf5830145b45463ba63).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824389208


   **[Test build #137750 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137750/testReport)** for PR 32177 at commit [`9beea9d`](https://github.com/apache/spark/commit/9beea9dc03f01c9b681bd6798f2a7ec223fa6604).




[GitHub] [spark] SparkQA removed a comment on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820617285


   **[Test build #137432 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137432/testReport)** for PR 32177 at commit [`aa3590e`](https://github.com/apache/spark/commit/aa3590e655c7d0e9f5fb717aa633b843e42c4a53).




[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820067610


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41970/
   




[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820655853


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42009/
   




[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-821540797


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42069/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-821523085


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137493/
   




[GitHub] [spark] ueshin commented on a change in pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #32177:
URL: https://github.com/apache/spark/pull/32177#discussion_r617802092



##########
File path: python/pyspark/pandas/tests/plot/test_frame_plot_matplotlib.py
##########
@@ -25,7 +25,11 @@
 
 from pyspark import pandas as ps
 from pyspark.pandas.config import set_option, reset_option
-from pyspark.pandas.testing.utils import have_matplotlib, ReusedSQLTestCase, TestUtils
+from pyspark.testing.pandasutils import (
+    have_matplotlib,
+    matplotlib_requirement_message,
+    ReusedSQLTestCase, TestUtils

Review comment:
       nit: style. Shall we break the line between `ReusedSQLTestCase` and `TestUtils`?
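   For illustration, the import formatted as the reviewer suggests (one name per line) might look like the snippet below. The module path and names come from the diff above; `ast.parse` is used only to show the formatting is valid Python, since `pyspark.testing.pandasutils` is not importable here.

   ```python
   import ast

   # The import rewritten with one imported name per line, as the nit suggests.
   suggested = """\
   from pyspark.testing.pandasutils import (
       have_matplotlib,
       matplotlib_requirement_message,
       ReusedSQLTestCase,
       TestUtils,
   )
   """

   # ast.parse raises SyntaxError on invalid code, so this checks the style compiles.
   tree = ast.parse(suggested)
   imported = [alias.name for alias in tree.body[0].names]
   print(imported)
   ```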






[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824425738


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42277/
   




[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820025444


   **[Test build #137392 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137392/testReport)** for PR 32177 at commit [`9781e5d`](https://github.com/apache/spark/commit/9781e5dc8fbacfffa08e749a52a03e463af9b552).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-825043437


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42346/
   




[GitHub] [spark] SparkQA removed a comment on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-825034459


   **[Test build #137816 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137816/testReport)** for PR 32177 at commit [`95bdbcf`](https://github.com/apache/spark/commit/95bdbcf2b368f0a1f252f963a5c4fb534d387ec0).




[GitHub] [spark] xinrong-databricks commented on a change in pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #32177:
URL: https://github.com/apache/spark/pull/32177#discussion_r613723822



##########
File path: python/pyspark/testing/utils.py
##########
@@ -171,3 +209,393 @@ def search_jar(project_relative_path, sbt_jar_name_prefix, mvn_jar_name_prefix):
         raise Exception("Found multiple JARs: %s; please remove all but one" % (", ".join(jars)))
     else:
         return jars[0]
+
+
+# Utilities below are used mainly in pyspark/pandas
+class SQLTestUtils(object):
+    """
+    This util assumes the instance has a 'spark' attribute holding a Spark session.
+    It is usually used with the 'ReusedSQLTestCase' class, but can be used on its own
+    if you are sure the implementing class has a 'spark' attribute.
+    """
+
+    @contextmanager
+    def sql_conf(self, pairs):
+        """
+        A convenient context manager to test some configuration-specific logic. This sets
+        the configuration `key` to `value` and then restores it when it exits.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        with sqlc(pairs, spark=self.spark):
+            yield
+
+    @contextmanager
+    def database(self, *databases):
+        """
+        A convenient context manager to test with some specific databases. This drops the given
+        databases if they exist and sets the current database to "default" when it exits.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for db in databases:
+                self.spark.sql("DROP DATABASE IF EXISTS %s CASCADE" % db)
+            self.spark.catalog.setCurrentDatabase("default")
+
+    @contextmanager
+    def table(self, *tables):
+        """
+        A convenient context manager to test with some specific tables. This drops the given
+        tables if they exist.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for t in tables:
+                self.spark.sql("DROP TABLE IF EXISTS %s" % t)
+
+    @contextmanager
+    def tempView(self, *views):
+        """
+        A convenient context manager to test with some specific views. This drops the given
+        views if they exist.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for v in views:
+                self.spark.catalog.dropTempView(v)
+
+    @contextmanager
+    def function(self, *functions):
+        """
+        A convenient context manager to test with some specific functions. This drops the
+        given functions if they exist.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for f in functions:
+                self.spark.sql("DROP FUNCTION IF EXISTS %s" % f)
+
+
+class ReusedSQLTestCase(unittest.TestCase, SQLTestUtils):
+    @classmethod
+    def setUpClass(cls):
+        cls.spark = default_session()
+        cls.spark.conf.set(SPARK_CONF_ARROW_ENABLED, True)
+
+    @classmethod
+    def tearDownClass(cls):
+        # We don't stop Spark session to reuse across all tests.
+        # The Spark session will be started and stopped at PyTest session level.
+        # Please see databricks/koalas/conftest.py.
+        pass
+
+    def assertPandasEqual(self, left, right, check_exact=True):
+        if isinstance(left, pd.DataFrame) and isinstance(right, pd.DataFrame):
+            try:
+                if LooseVersion(pd.__version__) >= LooseVersion("1.1"):
+                    kwargs = dict(check_freq=False)
+                else:
+                    kwargs = dict()
+
+                assert_frame_equal(
+                    left,
+                    right,
+                    check_index_type=("equiv" if len(left.index) > 0 else False),
+                    check_column_type=("equiv" if len(left.columns) > 0 else False),
+                    check_exact=check_exact,
+                    **kwargs
+                )
+            except AssertionError as e:
+                msg = (
+                    str(e)
+                    + "\n\nLeft:\n%s\n%s" % (left, left.dtypes)
+                    + "\n\nRight:\n%s\n%s" % (right, right.dtypes)
+                )
+                raise AssertionError(msg) from e
+        elif isinstance(left, pd.Series) and isinstance(right, pd.Series):
+            try:
+                if LooseVersion(pd.__version__) >= LooseVersion("1.1"):
+                    kwargs = dict(check_freq=False)
+                else:
+                    kwargs = dict()
+
+                assert_series_equal(
+                    left,
+                    right,
+                    check_index_type=("equiv" if len(left.index) > 0 else False),
+                    check_exact=check_exact,
+                    **kwargs
+                )
+            except AssertionError as e:
+                msg = (
+                    str(e)
+                    + "\n\nLeft:\n%s\n%s" % (left, left.dtype)
+                    + "\n\nRight:\n%s\n%s" % (right, right.dtype)
+                )
+                raise AssertionError(msg) from e
+        elif isinstance(left, pd.Index) and isinstance(right, pd.Index):
+            try:
+                assert_index_equal(left, right, check_exact=check_exact)
+            except AssertionError as e:
+                msg = (
+                    str(e)
+                    + "\n\nLeft:\n%s\n%s" % (left, left.dtype)
+                    + "\n\nRight:\n%s\n%s" % (right, right.dtype)
+                )
+                raise AssertionError(msg) from e
+        else:
+            raise ValueError("Unexpected values: (%s, %s)" % (left, right))
+
+    def assertPandasAlmostEqual(self, left, right):
+        """
+        This function checks whether the given pandas objects are approximately equal,
+        meaning that the conditions below hold:
+          - Both objects are nullable
+          - Compare floats rounding to the number of decimal places, 7 after
+            dropping missing values (NaN, NaT, None)
+        """
+        if isinstance(left, pd.DataFrame) and isinstance(right, pd.DataFrame):
+            msg = (
+                "DataFrames are not almost equal: "
+                + "\n\nLeft:\n%s\n%s" % (left, left.dtypes)
+                + "\n\nRight:\n%s\n%s" % (right, right.dtypes)
+            )
+            self.assertEqual(left.shape, right.shape, msg=msg)
+            for lcol, rcol in zip(left.columns, right.columns):
+                self.assertEqual(lcol, rcol, msg=msg)
+                for lnull, rnull in zip(left[lcol].isnull(), right[rcol].isnull()):
+                    self.assertEqual(lnull, rnull, msg=msg)
+                for lval, rval in zip(left[lcol].dropna(), right[rcol].dropna()):
+                    self.assertAlmostEqual(lval, rval, msg=msg)
+            self.assertEqual(left.columns.names, right.columns.names, msg=msg)
+        elif isinstance(left, pd.Series) and isinstance(right, pd.Series):
+            msg = (
+                "Series are not almost equal: "
+                + "\n\nLeft:\n%s\n%s" % (left, left.dtype)
+                + "\n\nRight:\n%s\n%s" % (right, right.dtype)
+            )
+            self.assertEqual(left.name, right.name, msg=msg)
+            self.assertEqual(len(left), len(right), msg=msg)
+            for lnull, rnull in zip(left.isnull(), right.isnull()):
+                self.assertEqual(lnull, rnull, msg=msg)
+            for lval, rval in zip(left.dropna(), right.dropna()):
+                self.assertAlmostEqual(lval, rval, msg=msg)
+        elif isinstance(left, pd.MultiIndex) and isinstance(right, pd.MultiIndex):
+            msg = (
+                "MultiIndices are not almost equal: "
+                + "\n\nLeft:\n%s\n%s" % (left, left.dtype)
+                + "\n\nRight:\n%s\n%s" % (right, right.dtype)
+            )
+            self.assertEqual(len(left), len(right), msg=msg)
+            for lval, rval in zip(left, right):
+                self.assertAlmostEqual(lval, rval, msg=msg)
+        elif isinstance(left, pd.Index) and isinstance(right, pd.Index):
+            msg = (
+                "Indices are not almost equal: "
+                + "\n\nLeft:\n%s\n%s" % (left, left.dtype)
+                + "\n\nRight:\n%s\n%s" % (right, right.dtype)
+            )
+            self.assertEqual(len(left), len(right), msg=msg)
+            for lnull, rnull in zip(left.isnull(), right.isnull()):
+                self.assertEqual(lnull, rnull, msg=msg)
+            for lval, rval in zip(left.dropna(), right.dropna()):
+                self.assertAlmostEqual(lval, rval, msg=msg)
+        else:
+            raise ValueError("Unexpected values: (%s, %s)" % (left, right))
+
+    def assert_eq(self, left, right, check_exact=True, almost=False):
+        """
+        Asserts whether two arbitrary objects are equal. If the given objects are Koalas
+        DataFrame or Series objects, they are converted to pandas objects and compared.
+
+        :param left: object to compare
+        :param right: object to compare
+        :param check_exact: if this is False, the comparison is done less precisely.
+        :param almost: if this is enabled, the comparison is delegated to `unittest`'s
+                       `assertAlmostEqual`. See its documentation for more details.
+        """
+        lobj = self._to_pandas(left)
+        robj = self._to_pandas(right)
+        if isinstance(lobj, (pd.DataFrame, pd.Series, pd.Index)):
+            if almost:
+                self.assertPandasAlmostEqual(lobj, robj)
+            else:
+                self.assertPandasEqual(lobj, robj, check_exact=check_exact)
+        elif is_list_like(lobj) and is_list_like(robj):
+            self.assertTrue(len(left) == len(right))
+            for litem, ritem in zip(left, right):
+                self.assert_eq(litem, ritem, check_exact=check_exact, almost=almost)
+        elif (lobj is not None and pd.isna(lobj)) and (robj is not None and pd.isna(robj)):
+            pass
+        else:
+            if almost:
+                self.assertAlmostEqual(lobj, robj)
+            else:
+                self.assertEqual(lobj, robj)
+
+    @staticmethod
+    def _to_pandas(obj):
+        if isinstance(obj, (DataFrame, Series, Index)):
+            return obj.to_pandas()
+        else:
+            return obj
+
+
+class TestUtils(object):
+    @contextmanager
+    def temp_dir(self):
+        tmp = tempfile.mkdtemp()
+        try:
+            yield tmp
+        finally:
+            shutil.rmtree(tmp)
+
+    @contextmanager
+    def temp_file(self):
+        with self.temp_dir() as tmp:
+            yield tempfile.mktemp(dir=tmp)
+
+
+class ComparisonTestBase(ReusedSQLTestCase):
+    @property
+    def kdf(self):
+        return ps.from_pandas(self.pdf)
+
+    @property
+    def pdf(self):
+        return self.kdf.to_pandas()
+
+
+def compare_both(f=None, almost=True):
+
+    if f is None:
+        return functools.partial(compare_both, almost=almost)
+    elif isinstance(f, bool):
+        return functools.partial(compare_both, almost=f)
+
+    @functools.wraps(f)
+    def wrapped(self):
+        if almost:
+            compare = self.assertPandasAlmostEqual
+        else:
+            compare = self.assertPandasEqual
+
+        for result_pandas, result_spark in zip(f(self, self.pdf), f(self, self.kdf)):
+            compare(result_pandas, result_spark.to_pandas())
+
+    return wrapped
+
+
+@contextmanager
+def assert_produces_warning(
+        expected_warning=Warning,
+        filter_level="always",
+        check_stacklevel=True,
+        raise_on_extra_warnings=True,

Review comment:
       Cool!
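The `compare_both` helper in the hunk above relies on a common optional-argument decorator pattern: it can be applied bare (`@compare_both`), with a keyword (`@compare_both(almost=False)`), or with a bare boolean (`@compare_both(False)`). A minimal stand-alone sketch of just that dispatch, with a toy wrapper standing in for the pandas comparison:

```python
import functools


def compare_both(f=None, almost=True):
    if f is None:
        # @compare_both(almost=...): no function yet, wait for it.
        return functools.partial(compare_both, almost=almost)
    elif isinstance(f, bool):
        # @compare_both(True/False): the positional argument is `almost`.
        return functools.partial(compare_both, almost=f)

    @functools.wraps(f)
    def wrapped(*args, **kwargs):
        # Toy stand-in for the pandas/Koalas comparison in the real helper.
        mode = "almost" if almost else "exact"
        return mode, f(*args, **kwargs)

    return wrapped


@compare_both
def default_mode():
    return 1


@compare_both(almost=False)
def exact_mode():
    return 2
```

All three application forms funnel into the same final call `compare_both(func, almost=...)`, which is what makes the pattern work.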




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820653455


   **[Test build #137429 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137429/testReport)** for PR 32177 at commit [`6638d40`](https://github.com/apache/spark/commit/6638d40ca679abb911fdb0a0a9d50f84af563f57).
    * This patch **fails PySpark unit tests**.
    * This patch **does not merge cleanly**.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824388730


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42273/
   




[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-819861215


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41950/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-822701231








[GitHub] [spark] SparkQA removed a comment on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820614932


   **[Test build #137430 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137430/testReport)** for PR 32177 at commit [`73ce19a`](https://github.com/apache/spark/commit/73ce19a6a44933869db0be7dc02d03958cf0e44c).




[GitHub] [spark] AmplabJenkins commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824388730


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42273/
   




[GitHub] [spark] github-actions[bot] commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820603838


   **[Test build #752884941](https://github.com/xinrong-databricks/spark/actions/runs/752884941)** for PR 32177 at commit [`c79a606`](https://github.com/xinrong-databricks/spark/commit/c79a606c8d5103744c0bcc7935548da7c973cc52).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-821523085


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137493/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824360027


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137746/
   




[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-822632516


   **[Test build #137642 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137642/testReport)** for PR 32177 at commit [`7a1a08a`](https://github.com/apache/spark/commit/7a1a08a332492604afa3d92376c40a8aefc42fd3).




[GitHub] [spark] xinrong-databricks commented on a change in pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #32177:
URL: https://github.com/apache/spark/pull/32177#discussion_r613723592



##########
File path: python/pyspark/pandas/testing/utils.py
##########
@@ -1,448 +0,0 @@
-#

Review comment:
       Makes sense!






[GitHub] [spark] github-actions[bot] commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820615864


   **[Test build #752931779](https://github.com/xinrong-databricks/spark/actions/runs/752931779)** for PR 32177 at commit [`aa3590e`](https://github.com/xinrong-databricks/spark/commit/aa3590e655c7d0e9f5fb717aa633b843e42c4a53).




[GitHub] [spark] ueshin commented on a change in pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #32177:
URL: https://github.com/apache/spark/pull/32177#discussion_r613662307



##########
File path: python/pyspark/pandas/testing/utils.py
##########
@@ -1,448 +0,0 @@
-#

Review comment:
       If we remove this file, we can also remove `__init__.py`?

##########
File path: python/pyspark/testing/utils.py
##########
@@ -171,3 +209,393 @@ def search_jar(project_relative_path, sbt_jar_name_prefix, mvn_jar_name_prefix):
         raise Exception("Found multiple JARs: %s; please remove all but one" % (", ".join(jars)))
     else:
         return jars[0]
+
+
+# Utilities below are used mainly in pyspark/pandas
+class SQLTestUtils(object):
+    """
+    This util assumes the instance has a 'spark' attribute holding a Spark session.
+    It is usually used with the 'ReusedSQLTestCase' class, but can be used on its own
+    if you are sure the implementing class has a 'spark' attribute.
+    """
+
+    @contextmanager
+    def sql_conf(self, pairs):
+        """
+        A convenient context manager to test some configuration specific logic. This sets
+        `value` to the configuration `key` and then restores it back when it exits.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        with sqlc(pairs, spark=self.spark):
+            yield
+
+    @contextmanager
+    def database(self, *databases):
+        """
+        A convenient context manager to test with some specific databases. This drops the given
+        databases if they exist and resets the current database to "default" when it exits.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for db in databases:
+                self.spark.sql("DROP DATABASE IF EXISTS %s CASCADE" % db)
+            self.spark.catalog.setCurrentDatabase("default")
+
+    @contextmanager
+    def table(self, *tables):
+        """
+        A convenient context manager to test with some specific tables. This drops the given tables
+        if they exist.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for t in tables:
+                self.spark.sql("DROP TABLE IF EXISTS %s" % t)
+
+    @contextmanager
+    def tempView(self, *views):
+        """
+        A convenient context manager to test with some specific views. This drops the given views
+        if they exist.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for v in views:
+                self.spark.catalog.dropTempView(v)
+
+    @contextmanager
+    def function(self, *functions):
+        """
+        A convenient context manager to test with some specific functions. This drops the given
+        functions if they exist.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for f in functions:
+                self.spark.sql("DROP FUNCTION IF EXISTS %s" % f)
+
+
+class ReusedSQLTestCase(unittest.TestCase, SQLTestUtils):
+    @classmethod
+    def setUpClass(cls):
+        cls.spark = default_session()
+        cls.spark.conf.set(SPARK_CONF_ARROW_ENABLED, True)
+
+    @classmethod
+    def tearDownClass(cls):
+        # We don't stop Spark session to reuse across all tests.
+        # The Spark session will be started and stopped at PyTest session level.
+        # Please see databricks/koalas/conftest.py.
+        pass
+
+    def assertPandasEqual(self, left, right, check_exact=True):
+        if isinstance(left, pd.DataFrame) and isinstance(right, pd.DataFrame):
+            try:
+                if LooseVersion(pd.__version__) >= LooseVersion("1.1"):
+                    kwargs = dict(check_freq=False)
+                else:
+                    kwargs = dict()
+
+                assert_frame_equal(
+                    left,
+                    right,
+                    check_index_type=("equiv" if len(left.index) > 0 else False),
+                    check_column_type=("equiv" if len(left.columns) > 0 else False),
+                    check_exact=check_exact,
+                    **kwargs
+                )
+            except AssertionError as e:
+                msg = (
+                    str(e)
+                    + "\n\nLeft:\n%s\n%s" % (left, left.dtypes)
+                    + "\n\nRight:\n%s\n%s" % (right, right.dtypes)
+                )
+                raise AssertionError(msg) from e
+        elif isinstance(left, pd.Series) and isinstance(right, pd.Series):
+            try:
+                if LooseVersion(pd.__version__) >= LooseVersion("1.1"):
+                    kwargs = dict(check_freq=False)
+                else:
+                    kwargs = dict()
+
+                assert_series_equal(
+                    left,
+                    right,
+                    check_index_type=("equiv" if len(left.index) > 0 else False),
+                    check_exact=check_exact,
+                    **kwargs
+                )
+            except AssertionError as e:
+                msg = (
+                    str(e)
+                    + "\n\nLeft:\n%s\n%s" % (left, left.dtype)
+                    + "\n\nRight:\n%s\n%s" % (right, right.dtype)
+                )
+                raise AssertionError(msg) from e
+        elif isinstance(left, pd.Index) and isinstance(right, pd.Index):
+            try:
+                assert_index_equal(left, right, check_exact=check_exact)
+            except AssertionError as e:
+                msg = (
+                    str(e)
+                    + "\n\nLeft:\n%s\n%s" % (left, left.dtype)
+                    + "\n\nRight:\n%s\n%s" % (right, right.dtype)
+                )
+                raise AssertionError(msg) from e
+        else:
+            raise ValueError("Unexpected values: (%s, %s)" % (left, right))
+
+    def assertPandasAlmostEqual(self, left, right):
+        """
+        This function checks whether the given pandas objects are approximately the same,
+        which means the conditions below hold:
+          - Both objects are nullable
+          - Floats are compared after rounding to 7 decimal places, with missing
+            values (NaN, NaT, None) dropped first
+        """
+        if isinstance(left, pd.DataFrame) and isinstance(right, pd.DataFrame):
+            msg = (
+                "DataFrames are not almost equal: "
+                + "\n\nLeft:\n%s\n%s" % (left, left.dtypes)
+                + "\n\nRight:\n%s\n%s" % (right, right.dtypes)
+            )
+            self.assertEqual(left.shape, right.shape, msg=msg)
+            for lcol, rcol in zip(left.columns, right.columns):
+                self.assertEqual(lcol, rcol, msg=msg)
+                for lnull, rnull in zip(left[lcol].isnull(), right[rcol].isnull()):
+                    self.assertEqual(lnull, rnull, msg=msg)
+                for lval, rval in zip(left[lcol].dropna(), right[rcol].dropna()):
+                    self.assertAlmostEqual(lval, rval, msg=msg)
+            self.assertEqual(left.columns.names, right.columns.names, msg=msg)
+        elif isinstance(left, pd.Series) and isinstance(right, pd.Series):
+            msg = (
+                "Series are not almost equal: "
+                + "\n\nLeft:\n%s\n%s" % (left, left.dtype)
+                + "\n\nRight:\n%s\n%s" % (right, right.dtype)
+            )
+            self.assertEqual(left.name, right.name, msg=msg)
+            self.assertEqual(len(left), len(right), msg=msg)
+            for lnull, rnull in zip(left.isnull(), right.isnull()):
+                self.assertEqual(lnull, rnull, msg=msg)
+            for lval, rval in zip(left.dropna(), right.dropna()):
+                self.assertAlmostEqual(lval, rval, msg=msg)
+        elif isinstance(left, pd.MultiIndex) and isinstance(right, pd.MultiIndex):
+            msg = (
+                "MultiIndices are not almost equal: "
+                + "\n\nLeft:\n%s\n%s" % (left, left.dtype)
+                + "\n\nRight:\n%s\n%s" % (right, right.dtype)
+            )
+            self.assertEqual(len(left), len(right), msg=msg)
+            for lval, rval in zip(left, right):
+                self.assertAlmostEqual(lval, rval, msg=msg)
+        elif isinstance(left, pd.Index) and isinstance(right, pd.Index):
+            msg = (
+                "Indices are not almost equal: "
+                + "\n\nLeft:\n%s\n%s" % (left, left.dtype)
+                + "\n\nRight:\n%s\n%s" % (right, right.dtype)
+            )
+            self.assertEqual(len(left), len(right), msg=msg)
+            for lnull, rnull in zip(left.isnull(), right.isnull()):
+                self.assertEqual(lnull, rnull, msg=msg)
+            for lval, rval in zip(left.dropna(), right.dropna()):
+                self.assertAlmostEqual(lval, rval, msg=msg)
+        else:
+            raise ValueError("Unexpected values: (%s, %s)" % (left, right))
+
+    def assert_eq(self, left, right, check_exact=True, almost=False):
+        """
+        Asserts whether two arbitrary objects are equal. If the given objects are Koalas
+        DataFrames or Series, they are converted to pandas objects and then compared.
+
+        :param left: object to compare
+        :param right: object to compare
+        :param check_exact: if False, the comparison is done less precisely.
+        :param almost: if enabled, the comparison is delegated to `unittest`'s
+                       `assertAlmostEqual`. See its documentation for more details.
+        """
+        lobj = self._to_pandas(left)
+        robj = self._to_pandas(right)
+        if isinstance(lobj, (pd.DataFrame, pd.Series, pd.Index)):
+            if almost:
+                self.assertPandasAlmostEqual(lobj, robj)
+            else:
+                self.assertPandasEqual(lobj, robj, check_exact=check_exact)
+        elif is_list_like(lobj) and is_list_like(robj):
+            self.assertTrue(len(left) == len(right))
+            for litem, ritem in zip(left, right):
+                self.assert_eq(litem, ritem, check_exact=check_exact, almost=almost)
+        elif (lobj is not None and pd.isna(lobj)) and (robj is not None and pd.isna(robj)):
+            pass
+        else:
+            if almost:
+                self.assertAlmostEqual(lobj, robj)
+            else:
+                self.assertEqual(lobj, robj)
+
+    @staticmethod
+    def _to_pandas(obj):
+        if isinstance(obj, (DataFrame, Series, Index)):
+            return obj.to_pandas()
+        else:
+            return obj
+
+
+class TestUtils(object):
+    @contextmanager
+    def temp_dir(self):
+        tmp = tempfile.mkdtemp()
+        try:
+            yield tmp
+        finally:
+            shutil.rmtree(tmp)
+
+    @contextmanager
+    def temp_file(self):
+        with self.temp_dir() as tmp:
+            yield tempfile.mktemp(dir=tmp)
+
+
+class ComparisonTestBase(ReusedSQLTestCase):
+    @property
+    def kdf(self):
+        return ps.from_pandas(self.pdf)
+
+    @property
+    def pdf(self):
+        return self.kdf.to_pandas()
+
+
+def compare_both(f=None, almost=True):
+
+    if f is None:
+        return functools.partial(compare_both, almost=almost)
+    elif isinstance(f, bool):
+        return functools.partial(compare_both, almost=f)
+
+    @functools.wraps(f)
+    def wrapped(self):
+        if almost:
+            compare = self.assertPandasAlmostEqual
+        else:
+            compare = self.assertPandasEqual
+
+        for result_pandas, result_spark in zip(f(self, self.pdf), f(self, self.kdf)):
+            compare(result_pandas, result_spark.to_pandas())
+
+    return wrapped
+
+
+@contextmanager
+def assert_produces_warning(
+        expected_warning=Warning,
+        filter_level="always",
+        check_stacklevel=True,
+        raise_on_extra_warnings=True,

Review comment:
       nit: style? maybe 4-space indent?
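The nit is about the continuation style of the multi-line signature. A sketch of the PEP 8 hanging-indent form the reviewer suggests, with a deliberately simplified body (this sketch only checks that a matching warning was raised; the real implementation would also honor `check_stacklevel` and `raise_on_extra_warnings`):

```python
import warnings
from contextlib import contextmanager


@contextmanager
def assert_produces_warning(
    expected_warning=Warning,
    filter_level="always",
    check_stacklevel=True,          # accepted but unused in this sketch
    raise_on_extra_warnings=True,   # accepted but unused in this sketch
):
    # Record all warnings raised inside the with-block.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(filter_level)
        yield caught
    # After the block exits, at least one warning must match.
    if expected_warning is not None:
        assert any(
            issubclass(w.category, expected_warning) for w in caught
        ), "expected warning was not raised"
```

With the hanging indent, each parameter sits at a single 4-space level and the closing parenthesis returns to column zero, which keeps the signature readable as parameters are added.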

##########
File path: python/pyspark/testing/utils.py
##########
@@ -171,3 +209,393 @@ def search_jar(project_relative_path, sbt_jar_name_prefix, mvn_jar_name_prefix):
         raise Exception("Found multiple JARs: %s; please remove all but one" % (", ".join(jars)))
     else:
         return jars[0]
+
+
+# Utilities below are used mainly in pyspark/pandas
+class SQLTestUtils(object):
+    """
+    This util assumes the instance has a 'spark' attribute holding a Spark session.
+    It is usually used with the 'ReusedSQLTestCase' class, but can be used on its own
+    if you are sure the implementing class has a 'spark' attribute.
+    """
+
+    @contextmanager
+    def sql_conf(self, pairs):
+        """
+        A convenient context manager to test some configuration specific logic. This sets
+        `value` to the configuration `key` and then restores it back when it exits.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        with sqlc(pairs, spark=self.spark):
+            yield
+
+    @contextmanager
+    def database(self, *databases):
+        """
+        A convenient context manager to test with some specific databases. This drops the given
+        databases if they exist and resets the current database to "default" when it exits.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for db in databases:
+                self.spark.sql("DROP DATABASE IF EXISTS %s CASCADE" % db)
+            self.spark.catalog.setCurrentDatabase("default")
+
+    @contextmanager
+    def table(self, *tables):
+        """
+        A convenient context manager to test with some specific tables. This drops the given tables
+        if they exist.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for t in tables:
+                self.spark.sql("DROP TABLE IF EXISTS %s" % t)
+
+    @contextmanager
+    def tempView(self, *views):
+        """
+        A convenient context manager to test with some specific views. This drops the given views
+        if they exist.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for v in views:
+                self.spark.catalog.dropTempView(v)
+
+    @contextmanager
+    def function(self, *functions):
+        """
+        A convenient context manager to test with some specific functions. This drops the given
+        functions if they exist.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for f in functions:
+                self.spark.sql("DROP FUNCTION IF EXISTS %s" % f)
+
+
+class ReusedSQLTestCase(unittest.TestCase, SQLTestUtils):

Review comment:
       ditto, but without setting `spark.conf.set(SPARK_CONF_ARROW_ENABLED, True)`.

##########
File path: python/pyspark/testing/utils.py
##########
@@ -171,3 +209,393 @@ def search_jar(project_relative_path, sbt_jar_name_prefix, mvn_jar_name_prefix):
         raise Exception("Found multiple JARs: %s; please remove all but one" % (", ".join(jars)))
     else:
         return jars[0]
+
+
+# Utilities below are used mainly in pyspark/pandas
+class SQLTestUtils(object):

Review comment:
       Actually there is already `SQLTestUtils` in `pyspark.testing.sqlutils`.
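The `sql_conf` helper in the quoted hunk follows the usual set-and-restore pattern: apply the given key/value pairs, then put the previous values back on exit. A minimal stdlib sketch of that pattern, with a plain dict standing in for the Spark session conf (names here are illustrative, not from the PR):

```python
from contextlib import contextmanager


@contextmanager
def set_and_restore(conf, pairs):
    # Remember the previous values; None marks "key was unset before".
    old = {key: conf.get(key) for key in pairs}
    conf.update(pairs)
    try:
        yield
    finally:
        for key, value in old.items():
            if value is None:
                conf.pop(key, None)   # key was unset before; unset it again
            else:
                conf[key] = value     # restore the previous value
```

The `try/finally` is what makes the restoration reliable: the original values come back even if the body of the with-block raises.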

##########
File path: python/pyspark/testing/utils.py
##########
@@ -171,3 +209,393 @@ def search_jar(project_relative_path, sbt_jar_name_prefix, mvn_jar_name_prefix):
         raise Exception("Found multiple JARs: %s; please remove all but one" % (", ".join(jars)))
     else:
         return jars[0]
+
+
+# Utilities below are used mainly in pyspark/pandas
+class SQLTestUtils(object):
+    """
+    This util assumes the instance has a 'spark' attribute holding a Spark session.
+    It is usually used with the 'ReusedSQLTestCase' class, but can be used on its own
+    if you are sure the implementing class has a 'spark' attribute.
+    """
+
+    @contextmanager
+    def sql_conf(self, pairs):
+        """
+        A convenient context manager to test some configuration specific logic. This sets
+        `value` to the configuration `key` and then restores it back when it exits.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        with sqlc(pairs, spark=self.spark):
+            yield
+
+    @contextmanager
+    def database(self, *databases):
+        """
+        A convenient context manager to test with some specific databases. This drops the given
+        databases if they exist and resets the current database to "default" when it exits.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for db in databases:
+                self.spark.sql("DROP DATABASE IF EXISTS %s CASCADE" % db)
+            self.spark.catalog.setCurrentDatabase("default")
+
+    @contextmanager
+    def table(self, *tables):
+        """
+        A convenient context manager to test with some specific tables. This drops the given tables
+        if they exist.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for t in tables:
+                self.spark.sql("DROP TABLE IF EXISTS %s" % t)
+
+    @contextmanager
+    def tempView(self, *views):
+        """
+        A convenient context manager to test with some specific views. This drops the given views
+        if they exist.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for v in views:
+                self.spark.catalog.dropTempView(v)
+
+    @contextmanager
+    def function(self, *functions):
+        """
+        A convenient context manager to test with some specific functions. This drops the given
+        functions if they exist.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for f in functions:
+                self.spark.sql("DROP FUNCTION IF EXISTS %s" % f)
+
+
+class ReusedSQLTestCase(unittest.TestCase, SQLTestUtils):
+    @classmethod
+    def setUpClass(cls):
+        cls.spark = default_session()
+        cls.spark.conf.set(SPARK_CONF_ARROW_ENABLED, True)
+
+    @classmethod
+    def tearDownClass(cls):
+        # We don't stop Spark session to reuse across all tests.
+        # The Spark session will be started and stopped at PyTest session level.
+        # Please see databricks/koalas/conftest.py.
+        pass
+
+    def assertPandasEqual(self, left, right, check_exact=True):

Review comment:
       We should have these additional testing functions separately in `python/pyspark/testing/pandasutils.py` or somewhere?
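For readers skimming the diff above: each context manager follows the same drop-on-exit pattern via `try/finally`, so cleanup runs even when the test body raises. A minimal self-contained sketch of that pattern, using a fake `spark` object (an illustration-only stand-in for a real SparkSession, not part of the PR):

```python
from contextlib import contextmanager


class _FakeSpark:
    """Illustration-only stand-in that records issued SQL statements."""

    def __init__(self):
        self.statements = []

    def sql(self, query):
        self.statements.append(query)


class SQLTestUtils:
    @contextmanager
    def table(self, *tables):
        # Mirrors the utility above: the test body runs inside the
        # `with` block; tables are dropped on exit, even on failure.
        assert hasattr(self, "spark"), "requires a 'spark' attribute"
        try:
            yield
        finally:
            for t in tables:
                self.spark.sql("DROP TABLE IF EXISTS %s" % t)


class ExampleCase(SQLTestUtils):
    def __init__(self):
        self.spark = _FakeSpark()


case = ExampleCase()
with case.table("tbl_a", "tbl_b"):
    case.spark.sql("CREATE TABLE tbl_a (id INT)")
print(case.spark.statements)
# → ['CREATE TABLE tbl_a (id INT)', 'DROP TABLE IF EXISTS tbl_a', 'DROP TABLE IF EXISTS tbl_b']
```

The `database`, `tempView`, and `function` managers in the diff differ only in the cleanup statement they issue in the `finally` block.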




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824463699


   **[Test build #137760 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137760/testReport)** for PR 32177 at commit [`51c0198`](https://github.com/apache/spark/commit/51c01980a918b65edbfc56d11c7a85e46397dc5d).




[GitHub] [spark] SparkQA removed a comment on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820035283


   **[Test build #137395 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137395/testReport)** for PR 32177 at commit [`de6cb1e`](https://github.com/apache/spark/commit/de6cb1eb0ed7705dcb768ba5f8e93f38cac938ec).




[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824463699


   **[Test build #137760 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137760/testReport)** for PR 32177 at commit [`51c0198`](https://github.com/apache/spark/commit/51c01980a918b65edbfc56d11c7a85e46397dc5d).




[GitHub] [spark] github-actions[bot] commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-819843480


   **[Test build #749688774](https://github.com/xinrong-databricks/spark/actions/runs/749688774)** for PR 32177 at commit [`b31ab33`](https://github.com/xinrong-databricks/spark/commit/b31ab33730dbaafaac5832ea53c704aae771df59).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-822668634


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42171/
   




[GitHub] [spark] github-actions[bot] commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-821493829


   **[Test build #756639851](https://github.com/xinrong-databricks/spark/actions/runs/756639851)** for PR 32177 at commit [`279e655`](https://github.com/xinrong-databricks/spark/commit/279e65558ab655d93d2fe6765cfe17387146afa0).




[GitHub] [spark] AmplabJenkins commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-821554068


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42069/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-822668634


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42171/
   




[GitHub] [spark] github-actions[bot] commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820578341


   **[Test build #752784527](https://github.com/xinrong-databricks/spark/actions/runs/752784527)** for PR 32177 at commit [`6638d40`](https://github.com/xinrong-databricks/spark/commit/6638d40ca679abb911fdb0a0a9d50f84af563f57).




[GitHub] [spark] SparkQA removed a comment on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824505312


   **[Test build #137767 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137767/testReport)** for PR 32177 at commit [`8660ae8`](https://github.com/apache/spark/commit/8660ae8c6cfdeb45dd1d1a07b6ff1af01aa2dec1).




[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-822651600


   **[Test build #137641 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137641/testReport)** for PR 32177 at commit [`db7d470`](https://github.com/apache/spark/commit/db7d470e631ad51646df9aba162eb3d2e67dcb72).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-823701287


   **[Test build #137706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137706/testReport)** for PR 32177 at commit [`4be5e72`](https://github.com/apache/spark/commit/4be5e726713c1fd475ca5fa2f2710f386eaba21d).




[GitHub] [spark] SparkQA removed a comment on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820021211


   **[Test build #137392 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137392/testReport)** for PR 32177 at commit [`9781e5d`](https://github.com/apache/spark/commit/9781e5dc8fbacfffa08e749a52a03e463af9b552).




[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824360007


   **[Test build #137746 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137746/testReport)** for PR 32177 at commit [`5022a3b`](https://github.com/apache/spark/commit/5022a3b29bdcf662fa4d4778a66bf45b6e3e1f99).
    * This patch **fails Python style tests**.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class IndexesTest(PandasOnSparkTestCase, TestUtils):`
     * `class CategoricalIndexTest(PandasOnSparkTestCase, TestUtils):`
     * `class DatetimeIndexTest(PandasOnSparkTestCase, TestUtils):`
     * `class DataFramePlotTest(PandasOnSparkTestCase):`
     * `class DataFramePlotMatplotlibTest(PandasOnSparkTestCase, TestUtils):`
     * `class DataFramePlotPlotlyTest(PandasOnSparkTestCase, TestUtils):`
     * `class SeriesPlotMatplotlibTest(PandasOnSparkTestCase, TestUtils):`
     * `class SeriesPlotPlotlyTest(PandasOnSparkTestCase, TestUtils):`
     * `class CategoricalTest(PandasOnSparkTestCase, TestUtils):`
     * `class ConfigTest(PandasOnSparkTestCase):`
     * `class CsvTest(PandasOnSparkTestCase, TestUtils):`
     * `class DataFrameTest(PandasOnSparkTestCase, SQLTestUtils):`
     * `class DataFrameConversionTest(PandasOnSparkTestCase, SQLTestUtils, TestUtils):`
     * `class DataFrameSparkIOTest(PandasOnSparkTestCase, TestUtils):`
     * `class DefaultIndexTest(PandasOnSparkTestCase):`
     * `class ExpandingTest(PandasOnSparkTestCase, TestUtils):`
     * `class ExtensionTest(PandasOnSparkTestCase):`
     * `class SparkFrameMethodsTest(PandasOnSparkTestCase, SQLTestUtils, TestUtils):`
     * `class GroupByTest(PandasOnSparkTestCase, TestUtils):`
     * `class IndexingTest(PandasOnSparkTestCase):`
     * `class SparkIndexOpsMethodsTest(PandasOnSparkTestCase, SQLTestUtils):`
     * `class InternalFrameTest(PandasOnSparkTestCase, SQLTestUtils):`
     * `class NamespaceTest(PandasOnSparkTestCase, SQLTestUtils):`
     * `class NumPyCompatTest(PandasOnSparkTestCase, SQLTestUtils):`
     * `class OpsOnDiffFramesEnabledTest(PandasOnSparkTestCase, SQLTestUtils):`
     * `class OpsOnDiffFramesDisabledTest(PandasOnSparkTestCase, SQLTestUtils):`
     * `class OpsOnDiffFramesGroupByTest(PandasOnSparkTestCase, SQLTestUtils):`
     * `class OpsOnDiffFramesGroupByExpandingTest(PandasOnSparkTestCase, TestUtils):`
     * `class OpsOnDiffFramesGroupByRollingTest(PandasOnSparkTestCase, TestUtils):`
     * `class ReprTest(PandasOnSparkTestCase):`
     * `class ReshapeTest(PandasOnSparkTestCase):`
     * `class RollingTest(PandasOnSparkTestCase, TestUtils):`
     * `class SeriesTest(PandasOnSparkTestCase, SQLTestUtils):`
     * `class SeriesConversionTest(PandasOnSparkTestCase, SQLTestUtils):`
     * `class SeriesDateTimeTest(PandasOnSparkTestCase, SQLTestUtils):`
     * `class SeriesStringTest(PandasOnSparkTestCase, SQLTestUtils):`
     * `class SQLTest(PandasOnSparkTestCase, SQLTestUtils):`
     * `class StatsTest(PandasOnSparkTestCase, SQLTestUtils):`
     * `class UtilsTest(PandasOnSparkTestCase, SQLTestUtils):`
     * `class ExpandingRollingTest(PandasOnSparkTestCase, TestUtils):`
     * `class PandasOnSparkTestCase(unittest.TestCase, SQLTestUtils):`




[GitHub] [spark] SparkQA removed a comment on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-822663056


   **[Test build #137644 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137644/testReport)** for PR 32177 at commit [`ae73324`](https://github.com/apache/spark/commit/ae73324764d48c37dbf6caf5830145b45463ba63).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824420785


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137750/
   




[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-821515942


   **[Test build #137493 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137493/testReport)** for PR 32177 at commit [`279e655`](https://github.com/apache/spark/commit/279e65558ab655d93d2fe6765cfe17387146afa0).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-825043437


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42346/
   




[GitHub] [spark] ueshin closed pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
ueshin closed pull request #32177:
URL: https://github.com/apache/spark/pull/32177


   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820654642


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137429/
   




[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824378539


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42273/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-822662135








[GitHub] [spark] SparkQA removed a comment on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-819817791


   **[Test build #137372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137372/testReport)** for PR 32177 at commit [`b31ab33`](https://github.com/apache/spark/commit/b31ab33730dbaafaac5832ea53c704aae771df59).




[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820580982


   **[Test build #137429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137429/testReport)** for PR 32177 at commit [`6638d40`](https://github.com/apache/spark/commit/6638d40ca679abb911fdb0a0a9d50f84af563f57).




[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824478090








[GitHub] [spark] AmplabJenkins commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824357411








[GitHub] [spark] xinrong-databricks commented on a change in pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #32177:
URL: https://github.com/apache/spark/pull/32177#discussion_r614234839



##########
File path: python/pyspark/testing/utils.py
##########
@@ -171,3 +209,393 @@ def search_jar(project_relative_path, sbt_jar_name_prefix, mvn_jar_name_prefix):
         raise Exception("Found multiple JARs: %s; please remove all but one" % (", ".join(jars)))
     else:
         return jars[0]
+
+
+# Utilities below are used mainly in pyspark/pandas
+class SQLTestUtils(object):

Review comment:
       Cool, `SQLTestUtils` in `python/pyspark/testing/utils.py` is removed.






[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824359417


   **[Test build #137746 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137746/testReport)** for PR 32177 at commit [`5022a3b`](https://github.com/apache/spark/commit/5022a3b29bdcf662fa4d4778a66bf45b6e3e1f99).




[GitHub] [spark] github-actions[bot] commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820627230


   **[Test build #752986243](https://github.com/xinrong-databricks/spark/actions/runs/752986243)** for PR 32177 at commit [`b70bba9`](https://github.com/xinrong-databricks/spark/commit/b70bba943b2e502dd5252eb011adf9dc5715aa89).




[GitHub] [spark] Yikun commented on a change in pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
Yikun commented on a change in pull request #32177:
URL: https://github.com/apache/spark/pull/32177#discussion_r614085656



##########
File path: python/pyspark/testing/utils.py
##########
@@ -39,6 +54,29 @@
     # No NumPy, but that's okay, we'll skip those tests
     pass
 
+tabulate_requirement_message = None
+try:
+    from tabulate import tabulate  # noqa: F401
+except ImportError as e:
+    # If tabulate requirement is not satisfied, skip related tests.
+    tabulate_requirement_message = str(e)
+have_tabulate = tabulate_requirement_message is None

Review comment:
       I'm not sure why we need to introduce an extra `tabulate_requirement_message`?
   
   I would like the below code:
   ```suggestion
   have_tabulate = True
   try:
       from tabulate import tabulate  # noqa: F401
   except ImportError:
       # If tabulate requirement is not satisfied, skip related tests.
       have_tabulate = False
   ```
   or just keep the above style.

##########
File path: python/pyspark/pandas/tests/test_dataframe_conversion.py
##########
@@ -26,7 +26,7 @@
 
 from pyspark import pandas as pp

Review comment:
       unrelated: we did the rename pp --> ps in https://github.com/apache/spark/pull/32108.
   
   - `from pyspark import pandas as pd`  <-- this is the pandas developer style for naming pandas.
   - `from pyspark import pandas as ps`  <-- this means `pandas on Spark`.
   
   I know it's unrelated, but would you mind also doing it in your patch? Or I'm also okay with it being submitted in a separate patch.

##########
File path: python/pyspark/testing/utils.py
##########
@@ -39,6 +54,29 @@
     # No NumPy, but that's okay, we'll skip those tests

Review comment:
       unrelated, but better to change `except` to `except ImportError`
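   A minimal sketch of the narrowing being suggested, mirroring the optional-dependency probe in the file under review (the flag name is illustrative):

   ```python
   have_numpy = True
   try:
       import numpy  # noqa: F401
   except ImportError:
       # Catch only ImportError: a bare `except` would also swallow unrelated
       # errors raised during import and silently disable the related tests.
       have_numpy = False
   ```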

##########
File path: python/pyspark/testing/utils.py
##########
@@ -171,3 +209,393 @@ def search_jar(project_relative_path, sbt_jar_name_prefix, mvn_jar_name_prefix):
         raise Exception("Found multiple JARs: %s; please remove all but one" % (", ".join(jars)))
     else:
         return jars[0]
+
+
+# Utilities below are used mainly in pyspark/pandas
+class SQLTestUtils(object):
+    """
+    This util assumes the instance of this class has a 'spark' attribute holding a Spark
+    session. It is usually used with the 'ReusedSQLTestCase' class, but can be used on its
+    own if you are sure the implementing class has a 'spark' attribute.
+    """
+
+    @contextmanager
+    def sql_conf(self, pairs):
+        """
+        A convenient context manager to test configuration-specific logic. This sets each
+        `value` for its configuration `key` and restores the previous values when it exits.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        with sqlc(pairs, spark=self.spark):
+            yield
+
+    @contextmanager
+    def database(self, *databases):
+        """
+        A convenient context manager to test with some specific databases. This drops the
+        given databases if they exist and sets the current database to "default" when it exits.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for db in databases:
+                self.spark.sql("DROP DATABASE IF EXISTS %s CASCADE" % db)
+            self.spark.catalog.setCurrentDatabase("default")
+
+    @contextmanager
+    def table(self, *tables):
+        """
+        A convenient context manager to test with some specific tables. This drops the given
+        tables if they exist.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for t in tables:
+                self.spark.sql("DROP TABLE IF EXISTS %s" % t)
+
+    @contextmanager
+    def tempView(self, *views):
+        """
+        A convenient context manager to test with some specific views. This drops the given
+        views if they exist.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for v in views:
+                self.spark.catalog.dropTempView(v)
+
+    @contextmanager
+    def function(self, *functions):
+        """
+        A convenient context manager to test with some specific functions. This drops the given
+        functions if they exist.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for f in functions:
+                self.spark.sql("DROP FUNCTION IF EXISTS %s" % f)
+
+
+class ReusedSQLTestCase(unittest.TestCase, SQLTestUtils):
+    @classmethod
+    def setUpClass(cls):
+        cls.spark = default_session()
+        cls.spark.conf.set(SPARK_CONF_ARROW_ENABLED, True)
+
+    @classmethod
+    def tearDownClass(cls):
+        # We don't stop Spark session to reuse across all tests.
+        # The Spark session will be started and stopped at PyTest session level.
+        # Please see databricks/koalas/conftest.py.
+        pass
+
+    def assertPandasEqual(self, left, right, check_exact=True):
+        if isinstance(left, pd.DataFrame) and isinstance(right, pd.DataFrame):
+            try:
+                if LooseVersion(pd.__version__) >= LooseVersion("1.1"):
+                    kwargs = dict(check_freq=False)
+                else:
+                    kwargs = dict()
+
+                assert_frame_equal(
+                    left,
+                    right,
+                    check_index_type=("equiv" if len(left.index) > 0 else False),
+                    check_column_type=("equiv" if len(left.columns) > 0 else False),
+                    check_exact=check_exact,
+                    **kwargs
+                )
+            except AssertionError as e:
+                msg = (
+                    str(e)
+                    + "\n\nLeft:\n%s\n%s" % (left, left.dtypes)
+                    + "\n\nRight:\n%s\n%s" % (right, right.dtypes)
+                )
+                raise AssertionError(msg) from e
+        elif isinstance(left, pd.Series) and isinstance(right, pd.Series):
+            try:
+                if LooseVersion(pd.__version__) >= LooseVersion("1.1"):
+                    kwargs = dict(check_freq=False)
+                else:
+                    kwargs = dict()
+
+                assert_series_equal(
+                    left,
+                    right,
+                    check_index_type=("equiv" if len(left.index) > 0 else False),
+                    check_exact=check_exact,
+                    **kwargs
+                )
+            except AssertionError as e:
+                msg = (
+                    str(e)
+                    + "\n\nLeft:\n%s\n%s" % (left, left.dtype)
+                    + "\n\nRight:\n%s\n%s" % (right, right.dtype)
+                )
+                raise AssertionError(msg) from e
+        elif isinstance(left, pd.Index) and isinstance(right, pd.Index):
+            try:
+                assert_index_equal(left, right, check_exact=check_exact)
+            except AssertionError as e:
+                msg = (
+                    str(e)
+                    + "\n\nLeft:\n%s\n%s" % (left, left.dtype)
+                    + "\n\nRight:\n%s\n%s" % (right, right.dtype)
+                )
+                raise AssertionError(msg) from e
+        else:
+            raise ValueError("Unexpected values: (%s, %s)" % (left, right))
+
+    def assertPandasAlmostEqual(self, left, right):
+        """
+        This function checks whether the given pandas objects are approximately equal,
+        which means the conditions below:
+          - Both objects are nullable
+          - Floats are compared rounded to 7 decimal places, after
+            dropping missing values (NaN, NaT, None)
+        """
+        if isinstance(left, pd.DataFrame) and isinstance(right, pd.DataFrame):
+            msg = (
+                "DataFrames are not almost equal: "
+                + "\n\nLeft:\n%s\n%s" % (left, left.dtypes)
+                + "\n\nRight:\n%s\n%s" % (right, right.dtypes)
+            )
+            self.assertEqual(left.shape, right.shape, msg=msg)
+            for lcol, rcol in zip(left.columns, right.columns):
+                self.assertEqual(lcol, rcol, msg=msg)
+                for lnull, rnull in zip(left[lcol].isnull(), right[rcol].isnull()):
+                    self.assertEqual(lnull, rnull, msg=msg)
+                for lval, rval in zip(left[lcol].dropna(), right[rcol].dropna()):
+                    self.assertAlmostEqual(lval, rval, msg=msg)
+            self.assertEqual(left.columns.names, right.columns.names, msg=msg)
+        elif isinstance(left, pd.Series) and isinstance(right, pd.Series):
+            msg = (
+                "Series are not almost equal: "
+                + "\n\nLeft:\n%s\n%s" % (left, left.dtype)
+                + "\n\nRight:\n%s\n%s" % (right, right.dtype)
+            )
+            self.assertEqual(left.name, right.name, msg=msg)
+            self.assertEqual(len(left), len(right), msg=msg)
+            for lnull, rnull in zip(left.isnull(), right.isnull()):
+                self.assertEqual(lnull, rnull, msg=msg)
+            for lval, rval in zip(left.dropna(), right.dropna()):
+                self.assertAlmostEqual(lval, rval, msg=msg)
+        elif isinstance(left, pd.MultiIndex) and isinstance(right, pd.MultiIndex):
+            msg = (
+                "MultiIndices are not almost equal: "
+                + "\n\nLeft:\n%s\n%s" % (left, left.dtype)
+                + "\n\nRight:\n%s\n%s" % (right, right.dtype)
+            )
+            self.assertEqual(len(left), len(right), msg=msg)
+            for lval, rval in zip(left, right):
+                self.assertAlmostEqual(lval, rval, msg=msg)
+        elif isinstance(left, pd.Index) and isinstance(right, pd.Index):
+            msg = (
+                "Indices are not almost equal: "
+                + "\n\nLeft:\n%s\n%s" % (left, left.dtype)
+                + "\n\nRight:\n%s\n%s" % (right, right.dtype)
+            )
+            self.assertEqual(len(left), len(right), msg=msg)
+            for lnull, rnull in zip(left.isnull(), right.isnull()):
+                self.assertEqual(lnull, rnull, msg=msg)
+            for lval, rval in zip(left.dropna(), right.dropna()):
+                self.assertAlmostEqual(lval, rval, msg=msg)
+        else:
+            raise ValueError("Unexpected values: (%s, %s)" % (left, right))
+
+    def assert_eq(self, left, right, check_exact=True, almost=False):
+        """
+        Asserts whether two arbitrary objects are equal. If the given objects are Koalas
+        DataFrame or Series, they are converted into pandas objects and compared.
+
+        :param left: object to compare
+        :param right: object to compare
+        :param check_exact: if this is False, the comparison is done less precisely.
+        :param almost: if this is enabled, the comparison is delegated to `unittest`'s
+                       `assertAlmostEqual`. See its documentation for more details.
+        """
+        lobj = self._to_pandas(left)
+        robj = self._to_pandas(right)
+        if isinstance(lobj, (pd.DataFrame, pd.Series, pd.Index)):
+            if almost:
+                self.assertPandasAlmostEqual(lobj, robj)
+            else:
+                self.assertPandasEqual(lobj, robj, check_exact=check_exact)
+        elif is_list_like(lobj) and is_list_like(robj):
+            self.assertTrue(len(left) == len(right))
+            for litem, ritem in zip(left, right):
+                self.assert_eq(litem, ritem, check_exact=check_exact, almost=almost)
+        elif (lobj is not None and pd.isna(lobj)) and (robj is not None and pd.isna(robj)):
+            pass
+        else:
+            if almost:
+                self.assertAlmostEqual(lobj, robj)
+            else:
+                self.assertEqual(lobj, robj)
+
+    @staticmethod
+    def _to_pandas(obj):
+        if isinstance(obj, (DataFrame, Series, Index)):
+            return obj.to_pandas()
+        else:
+            return obj
+
+
+class TestUtils(object):
+    @contextmanager
+    def temp_dir(self):
+        tmp = tempfile.mkdtemp()
+        try:
+            yield tmp
+        finally:
+            shutil.rmtree(tmp)
+
+    @contextmanager
+    def temp_file(self):
+        with self.temp_dir() as tmp:
+            yield tempfile.mktemp(dir=tmp)
+
+
+class ComparisonTestBase(ReusedSQLTestCase):
+    @property
+    def kdf(self):
+        return ps.from_pandas(self.pdf)
+
+    @property
+    def pdf(self):
+        return self.kdf.to_pandas()
+
+
+def compare_both(f=None, almost=True):
+
+    if f is None:
+        return functools.partial(compare_both, almost=almost)
+    elif isinstance(f, bool):
+        return functools.partial(compare_both, almost=f)
+
+    @functools.wraps(f)
+    def wrapped(self):
+        if almost:
+            compare = self.assertPandasAlmostEqual
+        else:
+            compare = self.assertPandasEqual
+
+        for result_pandas, result_spark in zip(f(self, self.pdf), f(self, self.kdf)):
+            compare(result_pandas, result_spark.to_pandas())
+
+    return wrapped
+
+
+@contextmanager
+def assert_produces_warning(
+    expected_warning=Warning,
+    filter_level="always",
+    check_stacklevel=True,
+    raise_on_extra_warnings=True,
+):
+    """
+    Context manager for running code expected to either raise a specific
+    warning, or not raise any warnings. Verifies that the code raises the
+    expected warning, and that it does not raise any other unexpected
+    warnings. It is basically a wrapper around ``warnings.catch_warnings``.
+
+    Notes
+    -----
+    Replicated from pandas._testing.

Review comment:
       This note is outdated.






[GitHub] [spark] xinrong-databricks commented on a change in pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #32177:
URL: https://github.com/apache/spark/pull/32177#discussion_r614272785



##########
File path: python/pyspark/testing/utils.py
##########
@@ -39,6 +54,29 @@
     # No NumPy, but that's okay, we'll skip those tests

Review comment:
       Cool!






[GitHub] [spark] AmplabJenkins removed a comment on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820033615


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137392/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-819865909


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41950/
   




[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-822697390








[GitHub] [spark] ueshin commented on a change in pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #32177:
URL: https://github.com/apache/spark/pull/32177#discussion_r614276237



##########
File path: python/pyspark/testing/utils.py
##########
@@ -39,6 +54,29 @@
     # No NumPy, but that's okay, we'll skip those tests
     pass
 
+tabulate_requirement_message = None
+try:
+    from tabulate import tabulate  # noqa: F401
+except ImportError as e:
+    # If tabulate requirement is not satisfied, skip related tests.
+    tabulate_requirement_message = str(e)
+have_tabulate = tabulate_requirement_message is None

Review comment:
       If we follow the similar cases for `pandas` and `pyarrow` in the sql module, we could use `tabulate_requirement_message` as the message in `@unittest.skipIf(not have_tabulate, tabulate_requirement_message)`.
   
   https://github.com/apache/spark/blob/aa3590e655c7d0e9f5fb717aa633b843e42c4a53/python/pyspark/sql/tests/test_arrow.py#L44-L46
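   The pattern referenced there can be sketched as follows (the test class is hypothetical; the probe follows the snippet above):

   ```python
   import unittest

   # Probe the optional dependency, keeping the ImportError text so a skipped
   # test reports the real reason the requirement is unsatisfied.
   tabulate_requirement_message = None
   try:
       from tabulate import tabulate  # noqa: F401
   except ImportError as e:
       tabulate_requirement_message = str(e)
   have_tabulate = tabulate_requirement_message is None


   @unittest.skipIf(not have_tabulate, tabulate_requirement_message)
   class TabulateDependentTest(unittest.TestCase):
       def test_something(self):
           pass  # runs only when tabulate is importable
   ```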






[GitHub] [spark] AmplabJenkins removed a comment on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820655884


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42009/
   




[GitHub] [spark] xinrong-databricks commented on a change in pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #32177:
URL: https://github.com/apache/spark/pull/32177#discussion_r616020120



##########
File path: python/pyspark/testing/utils.py
##########
@@ -39,6 +54,29 @@
     # No NumPy, but that's okay, we'll skip those tests
     pass
 
+tabulate_requirement_message = None
+try:
+    from tabulate import tabulate  # noqa: F401
+except ImportError as e:
+    # If tabulate requirement is not satisfied, skip related tests.
+    tabulate_requirement_message = str(e)
+have_tabulate = tabulate_requirement_message is None

Review comment:
       Okay I see. `..._requirement_message` is used.






[GitHub] [spark] SparkQA removed a comment on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824389208


   **[Test build #137750 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137750/testReport)** for PR 32177 at commit [`9beea9d`](https://github.com/apache/spark/commit/9beea9dc03f01c9b681bd6798f2a7ec223fa6604).




[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-822668600


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42171/
   




[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820634567


   **[Test build #137432 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137432/testReport)** for PR 32177 at commit [`aa3590e`](https://github.com/apache/spark/commit/aa3590e655c7d0e9f5fb717aa633b843e42c4a53).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824359417


   **[Test build #137746 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137746/testReport)** for PR 32177 at commit [`5022a3b`](https://github.com/apache/spark/commit/5022a3b29bdcf662fa4d4778a66bf45b6e3e1f99).




[GitHub] [spark] AmplabJenkins commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824427374


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42277/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-819865909


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41950/
   




[GitHub] [spark] xinrong-databricks commented on a change in pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #32177:
URL: https://github.com/apache/spark/pull/32177#discussion_r614263215



##########
File path: python/pyspark/testing/utils.py
##########
@@ -171,3 +209,393 @@ def search_jar(project_relative_path, sbt_jar_name_prefix, mvn_jar_name_prefix):
         raise Exception("Found multiple JARs: %s; please remove all but one" % (", ".join(jars)))
     else:
         return jars[0]
+
+
+# Utilities below are used mainly in pyspark/pandas
+class SQLTestUtils(object):
+    """
+    This util assumes the instance of this class has a 'spark' attribute holding a Spark
+    session. It is usually used with the 'ReusedSQLTestCase' class, but can be used on its
+    own if you are sure the implementing class has a 'spark' attribute.
+    """
+
+    @contextmanager
+    def sql_conf(self, pairs):
+        """
+        A convenient context manager to test configuration-specific logic. This sets each
+        `value` for its configuration `key` and restores the previous values when it exits.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        with sqlc(pairs, spark=self.spark):
+            yield
+
+    @contextmanager
+    def database(self, *databases):
+        """
+        A convenient context manager to test with some specific databases. This drops the
+        given databases if they exist and sets the current database to "default" when it exits.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for db in databases:
+                self.spark.sql("DROP DATABASE IF EXISTS %s CASCADE" % db)
+            self.spark.catalog.setCurrentDatabase("default")
+
+    @contextmanager
+    def table(self, *tables):
+        """
+        A convenient context manager to test with some specific tables. This drops the given
+        tables if they exist.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for t in tables:
+                self.spark.sql("DROP TABLE IF EXISTS %s" % t)
+
+    @contextmanager
+    def tempView(self, *views):
+        """
+        A convenient context manager to test with some specific views. This drops the given
+        views if they exist.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for v in views:
+                self.spark.catalog.dropTempView(v)
+
+    @contextmanager
+    def function(self, *functions):
+        """
+        A convenient context manager to test with some specific functions. This drops the given
+        functions if they exist.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for f in functions:
+                self.spark.sql("DROP FUNCTION IF EXISTS %s" % f)
+
+
+class ReusedSQLTestCase(unittest.TestCase, SQLTestUtils):
+    @classmethod
+    def setUpClass(cls):
+        cls.spark = default_session()
+        cls.spark.conf.set(SPARK_CONF_ARROW_ENABLED, True)
+
+    @classmethod
+    def tearDownClass(cls):
+        # We don't stop Spark session to reuse across all tests.
+        # The Spark session will be started and stopped at PyTest session level.
+        # Please see databricks/koalas/conftest.py.
+        pass
+
+    def assertPandasEqual(self, left, right, check_exact=True):

Review comment:
       `python/pyspark/testing/pandasutils.py` is created.






[GitHub] [spark] xinrong-databricks commented on a change in pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #32177:
URL: https://github.com/apache/spark/pull/32177#discussion_r617873462



##########
File path: python/pyspark/testing/utils.py
##########
@@ -171,3 +209,393 @@ def search_jar(project_relative_path, sbt_jar_name_prefix, mvn_jar_name_prefix):
         raise Exception("Found multiple JARs: %s; please remove all but one" % (", ".join(jars)))
     else:
         return jars[0]
+
+
+# Utilities below are used mainly in pyspark/pandas
+class SQLTestUtils(object):
+    """
+    This util assumes that the instance has a 'spark' attribute holding a Spark session.
+    It is usually mixed into the 'ReusedSQLTestCase' class, but can also be used directly
+    if you are sure the implementing class has a 'spark' attribute.
+    """
+
+    @contextmanager
+    def sql_conf(self, pairs):
+        """
+        A convenient context manager to test configuration-specific logic. This sets each
+        configuration `key` in `pairs` to its `value` and restores the originals on exit.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        with sqlc(pairs, spark=self.spark):
+            yield
+
+    @contextmanager
+    def database(self, *databases):
+        """
+        A convenient context manager to test with some specific databases. This drops the
+        given databases, if they exist, and resets the current database to "default" on exit.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for db in databases:
+                self.spark.sql("DROP DATABASE IF EXISTS %s CASCADE" % db)
+            self.spark.catalog.setCurrentDatabase("default")
+
+    @contextmanager
+    def table(self, *tables):
+        """
+        A convenient context manager to test with some specific tables. This drops the given
+        tables, if they exist, when it exits.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for t in tables:
+                self.spark.sql("DROP TABLE IF EXISTS %s" % t)
+
+    @contextmanager
+    def tempView(self, *views):
+        """
+        A convenient context manager to test with some specific views. This drops the given
+        views, if they exist, when it exits.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for v in views:
+                self.spark.catalog.dropTempView(v)
+
+    @contextmanager
+    def function(self, *functions):
+        """
+        A convenient context manager to test with some specific functions. This drops the
+        given functions, if they exist, when it exits.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for f in functions:
+                self.spark.sql("DROP FUNCTION IF EXISTS %s" % f)
+
+
+class ReusedSQLTestCase(unittest.TestCase, SQLTestUtils):
+    @classmethod
+    def setUpClass(cls):
+        cls.spark = default_session()
+        cls.spark.conf.set(SPARK_CONF_ARROW_ENABLED, True)
+
+    @classmethod
+    def tearDownClass(cls):
+        # We don't stop Spark session to reuse across all tests.
+        # The Spark session will be started and stopped at PyTest session level.
+        # Please see databricks/koalas/conftest.py.
+        pass
+
+    def assertPandasEqual(self, left, right, check_exact=True):

Review comment:
       `ReusedSQLTestCase` in `pandasutils.py` is renamed to `PandasOnSparkTestCase`.
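The `sql_conf` helper above delegates to `sqlc`, whose contract is: set each configuration key to the given value, then restore the previous values when the block exits. That contract can be sketched against a plain dict (the `conf` dict here is a hypothetical stand-in for `spark.conf`, not the PR's implementation):

```python
from contextlib import contextmanager

# Stand-in for spark.conf: a plain dict of settings.
conf = {"spark.sql.shuffle.partitions": "200"}

@contextmanager
def sql_conf(pairs):
    """Set each key to its value, then restore the original values
    on exit -- the behavior sql_conf() in the diff delegates to sqlc()."""
    old = {k: conf.get(k) for k in pairs}
    try:
        conf.update(pairs)
        yield
    finally:
        for k, v in old.items():
            if v is None:
                conf.pop(k, None)  # key did not exist before the block
            else:
                conf[k] = v

with sql_conf({"spark.sql.shuffle.partitions": "4"}):
    assert conf["spark.sql.shuffle.partitions"] == "4"
assert conf["spark.sql.shuffle.partitions"] == "200"
```

Restoring in `finally` keeps one test's configuration from leaking into the next, which matters because `ReusedSQLTestCase` shares a single Spark session across the whole test class.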






[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824405478


   **[Test build #137750 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137750/testReport)** for PR 32177 at commit [`9beea9d`](https://github.com/apache/spark/commit/9beea9dc03f01c9b681bd6798f2a7ec223fa6604).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class ComparisonTestBase(PandasOnSparkTestCase):`




[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820673153


   **[Test build #137438 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137438/testReport)** for PR 32177 at commit [`b70bba9`](https://github.com/apache/spark/commit/b70bba943b2e502dd5252eb011adf9dc5715aa89).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820652663


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42009/
   




[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820614932


   **[Test build #137430 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137430/testReport)** for PR 32177 at commit [`73ce19a`](https://github.com/apache/spark/commit/73ce19a6a44933869db0be7dc02d03958cf0e44c).




[GitHub] [spark] AmplabJenkins commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-819827025


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137372/
   




[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824505312


   **[Test build #137767 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137767/testReport)** for PR 32177 at commit [`8660ae8`](https://github.com/apache/spark/commit/8660ae8c6cfdeb45dd1d1a07b6ff1af01aa2dec1).




[GitHub] [spark] AmplabJenkins commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824483920








[GitHub] [spark] AmplabJenkins commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824420785


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137750/
   




[GitHub] [spark] SparkQA removed a comment on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820580982


   **[Test build #137429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137429/testReport)** for PR 32177 at commit [`6638d40`](https://github.com/apache/spark/commit/6638d40ca679abb911fdb0a0a9d50f84af563f57).




[GitHub] [spark] AmplabJenkins commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820655884


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42009/
   




[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820675969


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42013/
   




[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820606899


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42006/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820654642


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137429/
   




[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-822653432


   **[Test build #137642 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137642/testReport)** for PR 32177 at commit [`7a1a08a`](https://github.com/apache/spark/commit/7a1a08a332492604afa3d92376c40a8aefc42fd3).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824351319








[GitHub] [spark] xinrong-databricks commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824990976


   The error in "Build and test" is
   ```
   build/mvn: line 126: /__w/spark/spark/build/apache-maven-3.6.3/bin/mvn: No such file or directory
   + OLD_VERSION=
   Error while getting version string from Maven:
   + '[' 1 '!=' 0 ']'
   + echo -e 'Error while getting version string from Maven:\n'
   ```
    I'm afraid this is not relevant to my PR.




[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-822664838


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42171/
   




[GitHub] [spark] xinrong-databricks edited a comment on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
xinrong-databricks edited a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-819815996


   [DONE] This PR ought to be adjusted and merged after https://github.com/apache/spark/pull/32139 and https://github.com/apache/spark/pull/32152.




[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-825034459


   **[Test build #137816 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137816/testReport)** for PR 32177 at commit [`95bdbcf`](https://github.com/apache/spark/commit/95bdbcf2b368f0a1f252f963a5c4fb534d387ec0).




[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-822663056


   **[Test build #137644 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137644/testReport)** for PR 32177 at commit [`ae73324`](https://github.com/apache/spark/commit/ae73324764d48c37dbf6caf5830145b45463ba63).




[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824533317


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42295/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824541701


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42295/
   




[GitHub] [spark] SparkQA removed a comment on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-822632516


   **[Test build #137642 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137642/testReport)** for PR 32177 at commit [`7a1a08a`](https://github.com/apache/spark/commit/7a1a08a332492604afa3d92376c40a8aefc42fd3).




[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-821494924


   **[Test build #137493 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137493/testReport)** for PR 32177 at commit [`279e655`](https://github.com/apache/spark/commit/279e65558ab655d93d2fe6765cfe17387146afa0).




[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820021211


   **[Test build #137392 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137392/testReport)** for PR 32177 at commit [`9781e5d`](https://github.com/apache/spark/commit/9781e5dc8fbacfffa08e749a52a03e463af9b552).




[GitHub] [spark] SparkQA removed a comment on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-821494924


   **[Test build #137493 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137493/testReport)** for PR 32177 at commit [`279e655`](https://github.com/apache/spark/commit/279e65558ab655d93d2fe6765cfe17387146afa0).




[GitHub] [spark] AmplabJenkins commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820685919








[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820634154


   **[Test build #137430 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137430/testReport)** for PR 32177 at commit [`73ce19a`](https://github.com/apache/spark/commit/73ce19a6a44933869db0be7dc02d03958cf0e44c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] ueshin commented on a change in pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #32177:
URL: https://github.com/apache/spark/pull/32177#discussion_r614276237



##########
File path: python/pyspark/testing/utils.py
##########
@@ -39,6 +54,29 @@
     # No NumPy, but that's okay, we'll skip those tests
     pass
 
+tabulate_requirement_message = None
+try:
+    from tabulate import tabulate  # noqa: F401
+except ImportError as e:
+    # If tabulate requirement is not satisfied, skip related tests.
+    tabulate_requirement_message = str(e)
+have_tabulate = tabulate_requirement_message is None

Review comment:
       If we would follow the similar case, for `pandas` and `pyarrow` in the sql module, we could use the `tabulate_requirement_message` for the message like `@unittest.skipIf(not have_tabulate, tabulate_requirement_message)`.
   
   https://github.com/apache/spark/blob/aa3590e655c7d0e9f5fb717aa633b843e42c4a53/python/pyspark/sql/tests/test_arrow.py#L44-L46
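The pattern ueshin points to — capture the `ImportError` message so a skipped test reports why it was skipped — can be sketched as follows (the class name `TabulateDependentTest` is illustrative, not from the PR):

```python
import unittest

# Capture why the optional dependency is missing, as in the diff above.
tabulate_requirement_message = None
try:
    from tabulate import tabulate  # noqa: F401
except ImportError as e:
    # If the tabulate requirement is not satisfied, skip related tests
    # and keep the reason for the skip report.
    tabulate_requirement_message = str(e)
have_tabulate = tabulate_requirement_message is None

@unittest.skipIf(not have_tabulate, tabulate_requirement_message)
class TabulateDependentTest(unittest.TestCase):
    def test_render(self):
        # Only runs when tabulate is importable.
        self.assertTrue(have_tabulate)
```

Passing the captured message as the `skipIf` reason means the test report shows the actual import failure (e.g. a missing module name) rather than a generic "skipped".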






[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824338984


   **[Test build #137744 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137744/testReport)** for PR 32177 at commit [`5e6fa2a`](https://github.com/apache/spark/commit/5e6fa2ae9282b10d48539ce3d5784833061c62d9).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824535083


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42295/
   








[GitHub] [spark] AmplabJenkins commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824522506


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137767/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820061103


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137395/
   




[GitHub] [spark] github-actions[bot] commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820021055


   **[Test build #750536294](https://github.com/xinrong-databricks/spark/actions/runs/750536294)** for PR 32177 at commit [`9781e5d`](https://github.com/xinrong-databricks/spark/commit/9781e5dc8fbacfffa08e749a52a03e463af9b552).




[GitHub] [spark] xinrong-databricks commented on a change in pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #32177:
URL: https://github.com/apache/spark/pull/32177#discussion_r614264742



##########
File path: python/pyspark/testing/utils.py
##########
@@ -171,3 +209,393 @@ def search_jar(project_relative_path, sbt_jar_name_prefix, mvn_jar_name_prefix):
         raise Exception("Found multiple JARs: %s; please remove all but one" % (", ".join(jars)))
     else:
         return jars[0]
+
+
+# Utilities below are used mainly in pyspark/pandas
+class SQLTestUtils(object):
+    """
+    This util assumes the instance has a 'spark' attribute holding a SparkSession.
+    It is usually used with the 'ReusedSQLTestCase' class, but can be used on its own if you
+    are sure the implementing class has a 'spark' attribute.
+    """
+
+    @contextmanager
+    def sql_conf(self, pairs):
+        """
+        A convenience context manager for testing configuration-specific logic. It sets
+        each configuration `key` to its `value` and restores the original values on exit.
+        """
+        assert hasattr(self, "spark"), "the instance should have a 'spark' attribute holding a SparkSession."
+
+        with sqlc(pairs, spark=self.spark):
+            yield
+
+    @contextmanager
+    def database(self, *databases):
+        """
+        A convenience context manager for testing with specific databases. It drops the given
+        databases if they exist and sets the current database back to "default" on exit.
+        """
+        assert hasattr(self, "spark"), "the instance should have a 'spark' attribute holding a SparkSession."
+
+        try:
+            yield
+        finally:
+            for db in databases:
+                self.spark.sql("DROP DATABASE IF EXISTS %s CASCADE" % db)
+            self.spark.catalog.setCurrentDatabase("default")
+
+    @contextmanager
+    def table(self, *tables):
+        """
+        A convenience context manager for testing with specific tables. It drops the given
+        tables if they exist.
+        """
+        assert hasattr(self, "spark"), "the instance should have a 'spark' attribute holding a SparkSession."
+
+        try:
+            yield
+        finally:
+            for t in tables:
+                self.spark.sql("DROP TABLE IF EXISTS %s" % t)
+
+    @contextmanager
+    def tempView(self, *views):
+        """
+        A convenience context manager for testing with specific views. It drops the given
+        views if they exist.
+        """
+        assert hasattr(self, "spark"), "the instance should have a 'spark' attribute holding a SparkSession."
+
+        try:
+            yield
+        finally:
+            for v in views:
+                self.spark.catalog.dropTempView(v)
+
+    @contextmanager
+    def function(self, *functions):
+        """
+        A convenience context manager for testing with specific functions. It drops the given
+        functions if they exist.
+        """
+        assert hasattr(self, "spark"), "the instance should have a 'spark' attribute holding a SparkSession."
+
+        try:
+            yield
+        finally:
+            for f in functions:
+                self.spark.sql("DROP FUNCTION IF EXISTS %s" % f)
+
+
+class ReusedSQLTestCase(unittest.TestCase, SQLTestUtils):
+    @classmethod
+    def setUpClass(cls):
+        cls.spark = default_session()
+        cls.spark.conf.set(SPARK_CONF_ARROW_ENABLED, True)
+
+    @classmethod
+    def tearDownClass(cls):
+        # We don't stop the Spark session so it can be reused across all tests.
+        # The Spark session will be started and stopped at the PyTest session level.
+        # Please see databricks/koalas/conftest.py.
+        pass
+
+    def assertPandasEqual(self, left, right, check_exact=True):

Review comment:
       Would it be fine to keep `ReusedSQLTestCase` in `python/pyspark/testing/pandasutils.py` since it has pandas-only testing functions? 
   Shall we rename it to avoid confusion with `ReusedSQLTestCase` in `python/pyspark/testing/sqlutils.py`?
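   
   The quoted `sql_conf` delegates the save-and-restore work to the imported `sqlc` helper. A minimal standalone sketch of those semantics, using a hypothetical `FakeConf` stand-in for `spark.conf` (illustration only, not the PySpark implementation):
   
   ```python
   from contextlib import contextmanager
   
   
   class FakeConf:
       """Hypothetical stand-in for spark.conf, for illustration only."""
   
       def __init__(self):
           self._store = {}
   
       def get(self, key, default=None):
           return self._store.get(key, default)
   
       def set(self, key, value):
           self._store[key] = value
   
       def unset(self, key):
           self._store.pop(key, None)
   
   
   @contextmanager
   def sql_conf(conf, pairs):
       # Save the old values, apply the new ones, and restore in a
       # finally block -- the same shape SQLTestUtils.sql_conf relies on.
       keys = list(pairs.keys())
       old_values = [conf.get(key) for key in keys]
       for key in keys:
           conf.set(key, pairs[key])
       try:
           yield
       finally:
           for key, old in zip(keys, old_values):
               if old is None:
                   conf.unset(key)
               else:
                   conf.set(key, old)
   ```
   
   Inside the `with` block the overridden values are visible; keys that were previously unset are removed again on exit, so one test's configuration cannot leak into the next.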






[GitHub] [spark] ueshin commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
ueshin commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-825149668


   Thanks! merging to master.




[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820617285


   **[Test build #137432 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137432/testReport)** for PR 32177 at commit [`aa3590e`](https://github.com/apache/spark/commit/aa3590e655c7d0e9f5fb717aa633b843e42c4a53).






[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820060610


   **[Test build #137395 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137395/testReport)** for PR 32177 at commit [`de6cb1e`](https://github.com/apache/spark/commit/de6cb1eb0ed7705dcb768ba5f8e93f38cac938ec).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820067686


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41970/
   






[GitHub] [spark] AmplabJenkins commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-825073349


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137816/
   




[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-819851404


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41950/
   




[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824427358


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42277/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820612588


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42006/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820061103


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137395/
   




[GitHub] [spark] SparkQA removed a comment on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824324661


   **[Test build #137744 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137744/testReport)** for PR 32177 at commit [`5e6fa2a`](https://github.com/apache/spark/commit/5e6fa2ae9282b10d48539ce3d5784833061c62d9).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824522506


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137767/
   






[GitHub] [spark] github-actions[bot] commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820021361


   **[Test build #750538121](https://github.com/xinrong-databricks/spark/actions/runs/750538121)** for PR 32177 at commit [`de6cb1e`](https://github.com/xinrong-databricks/spark/commit/de6cb1eb0ed7705dcb768ba5f8e93f38cac938ec).




[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820035283


   **[Test build #137395 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137395/testReport)** for PR 32177 at commit [`de6cb1e`](https://github.com/apache/spark/commit/de6cb1eb0ed7705dcb768ba5f8e93f38cac938ec).




[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-822630372


   **[Test build #137641 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137641/testReport)** for PR 32177 at commit [`db7d470`](https://github.com/apache/spark/commit/db7d470e631ad51646df9aba162eb3d2e67dcb72).




[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824471804


   **[Test build #137760 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137760/testReport)** for PR 32177 at commit [`51c0198`](https://github.com/apache/spark/commit/51c01980a918b65edbfc56d11c7a85e46397dc5d).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] ueshin commented on a change in pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #32177:
URL: https://github.com/apache/spark/pull/32177#discussion_r617812408



##########
File path: python/pyspark/testing/utils.py
##########
@@ -171,3 +209,393 @@ def search_jar(project_relative_path, sbt_jar_name_prefix, mvn_jar_name_prefix):
         raise Exception("Found multiple JARs: %s; please remove all but one" % (", ".join(jars)))
     else:
         return jars[0]

Review comment:
       Yes, sounds good. 
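The try/finally shape shared by these context managers can be sketched without a Spark session. Below, the `catalog` dict is a hypothetical stand-in for the Spark catalog; in the real utility the cleanup is `self.spark.sql("DROP TABLE IF EXISTS ...")`:

```python
from contextlib import contextmanager

# Stand-in "catalog" so this sketch runs without Spark.
catalog = {}

@contextmanager
def table(*tables):
    """Drop the given tables on exit, mirroring the SQLTestUtils.table pattern."""
    try:
        yield
    finally:
        for t in tables:
            catalog.pop(t, None)  # analogous to DROP TABLE IF EXISTS

with table("tmp"):
    catalog["tmp"] = "rows"   # a test creates the table...
assert "tmp" not in catalog   # ...and it is dropped afterwards
```

Because the drop happens in `finally`, the cleanup runs even when the test body raises, which is why the managers wrap a bare `yield` rather than dropping eagerly.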




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824357411








[GitHub] [spark] SparkQA removed a comment on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820653214


   **[Test build #137438 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137438/testReport)** for PR 32177 at commit [`b70bba9`](https://github.com/apache/spark/commit/b70bba943b2e502dd5252eb011adf9dc5715aa89).




[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820653214


   **[Test build #137438 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137438/testReport)** for PR 32177 at commit [`b70bba9`](https://github.com/apache/spark/commit/b70bba943b2e502dd5252eb011adf9dc5715aa89).




[GitHub] [spark] xinrong-databricks commented on a change in pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #32177:
URL: https://github.com/apache/spark/pull/32177#discussion_r614269855



##########
File path: python/pyspark/testing/utils.py
##########
@@ -39,6 +54,29 @@
     # No NumPy, but that's okay, we'll skip those tests
     pass
 
+tabulate_requirement_message = None
+try:
+    from tabulate import tabulate  # noqa: F401
+except ImportError as e:
+    # If tabulate requirement is not satisfied, skip related tests.
+    tabulate_requirement_message = str(e)
+have_tabulate = tabulate_requirement_message is None

Review comment:
       Good idea! It's updated.
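The guard in the hunk above is a reusable pattern for optional test dependencies. A minimal sketch, with `some_optional_mod` as a placeholder module name (not a real dependency):

```python
import unittest

# Same guard pattern as the tabulate hunk above: record the ImportError
# message so skipped tests can report why they were skipped.
requirement_message = None
try:
    import some_optional_mod  # noqa: F401
except ImportError as e:
    requirement_message = str(e)
have_mod = requirement_message is None

@unittest.skipIf(not have_mod, requirement_message)
class OptionalDepTests(unittest.TestCase):
    def test_something(self):
        ...
```

Keeping the message (rather than just a boolean) means the skip reason shown in test output explains exactly which requirement was missing.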






[GitHub] [spark] xinrong-databricks commented on a change in pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #32177:
URL: https://github.com/apache/spark/pull/32177#discussion_r616024865



##########
File path: python/pyspark/testing/utils.py
##########
@@ -171,3 +209,393 @@ def search_jar(project_relative_path, sbt_jar_name_prefix, mvn_jar_name_prefix):
         raise Exception("Found multiple JARs: %s; please remove all but one" % (", ".join(jars)))
     else:
         return jars[0]
+
+
+# Utilities below are used mainly in pyspark/pandas
+class SQLTestUtils(object):
+    """
+    This util assumes the instance of this class has a 'spark' attribute holding a Spark session.
+    It is usually used with the 'ReusedSQLTestCase' class, but can be used on its own as long as
+    the implementation of this class has a 'spark' attribute.
+    """
+
+    @contextmanager
+    def sql_conf(self, pairs):
+        """
+        A convenient context manager to test some configuration specific logic. This sets
+        `value` to the configuration `key` and then restores it back when it exits.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        with sqlc(pairs, spark=self.spark):
+            yield
+
+    @contextmanager
+    def database(self, *databases):
+        """
+        A convenient context manager to test with some specific databases. This drops the given
+        databases if they exist and sets the current database to "default" when it exits.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for db in databases:
+                self.spark.sql("DROP DATABASE IF EXISTS %s CASCADE" % db)
+            self.spark.catalog.setCurrentDatabase("default")
+
+    @contextmanager
+    def table(self, *tables):
+        """
+        A convenient context manager to test with some specific tables. This drops the given
+        tables if they exist.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for t in tables:
+                self.spark.sql("DROP TABLE IF EXISTS %s" % t)
+
+    @contextmanager
+    def tempView(self, *views):
+        """
+        A convenient context manager to test with some specific views. This drops the given
+        views if they exist.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for v in views:
+                self.spark.catalog.dropTempView(v)
+
+    @contextmanager
+    def function(self, *functions):
+        """
+        A convenient context manager to test with some specific functions. This drops the given
+        functions if they exist.
+        """
+        assert hasattr(self, "spark"), "it should have 'spark' attribute, having a spark session."
+
+        try:
+            yield
+        finally:
+            for f in functions:
+                self.spark.sql("DROP FUNCTION IF EXISTS %s" % f)
+
+
+class ReusedSQLTestCase(unittest.TestCase, SQLTestUtils):
+    @classmethod
+    def setUpClass(cls):
+        cls.spark = default_session()
+        cls.spark.conf.set(SPARK_CONF_ARROW_ENABLED, True)
+
+    @classmethod
+    def tearDownClass(cls):
+        # We don't stop the Spark session here so that it can be reused across all tests.
+        # The Spark session will be started and stopped at the PyTest session level.
+        # Please see databricks/koalas/conftest.py.
+        pass
+
+    def assertPandasEqual(self, left, right, check_exact=True):
+        if isinstance(left, pd.DataFrame) and isinstance(right, pd.DataFrame):
+            try:
+                if LooseVersion(pd.__version__) >= LooseVersion("1.1"):
+                    kwargs = dict(check_freq=False)
+                else:
+                    kwargs = dict()
+
+                assert_frame_equal(
+                    left,
+                    right,
+                    check_index_type=("equiv" if len(left.index) > 0 else False),
+                    check_column_type=("equiv" if len(left.columns) > 0 else False),
+                    check_exact=check_exact,
+                    **kwargs
+                )
+            except AssertionError as e:
+                msg = (
+                    str(e)
+                    + "\n\nLeft:\n%s\n%s" % (left, left.dtypes)
+                    + "\n\nRight:\n%s\n%s" % (right, right.dtypes)
+                )
+                raise AssertionError(msg) from e
+        elif isinstance(left, pd.Series) and isinstance(right, pd.Series):
+            try:
+                if LooseVersion(pd.__version__) >= LooseVersion("1.1"):
+                    kwargs = dict(check_freq=False)
+                else:
+                    kwargs = dict()
+
+                assert_series_equal(
+                    left,
+                    right,
+                    check_index_type=("equiv" if len(left.index) > 0 else False),
+                    check_exact=check_exact,
+                    **kwargs
+                )
+            except AssertionError as e:
+                msg = (
+                    str(e)
+                    + "\n\nLeft:\n%s\n%s" % (left, left.dtype)
+                    + "\n\nRight:\n%s\n%s" % (right, right.dtype)
+                )
+                raise AssertionError(msg) from e
+        elif isinstance(left, pd.Index) and isinstance(right, pd.Index):
+            try:
+                assert_index_equal(left, right, check_exact=check_exact)
+            except AssertionError as e:
+                msg = (
+                    str(e)
+                    + "\n\nLeft:\n%s\n%s" % (left, left.dtype)
+                    + "\n\nRight:\n%s\n%s" % (right, right.dtype)
+                )
+                raise AssertionError(msg) from e
+        else:
+            raise ValueError("Unexpected values: (%s, %s)" % (left, right))
+
+    def assertPandasAlmostEqual(self, left, right):
+        """
+        This function checks whether the given pandas objects are approximately the same,
+        which means the conditions below hold:
+          - Both objects are nullable
+          - Floats are compared rounded to 7 decimal places, after
+            dropping missing values (NaN, NaT, None)
+        """
+        if isinstance(left, pd.DataFrame) and isinstance(right, pd.DataFrame):
+            msg = (
+                "DataFrames are not almost equal: "
+                + "\n\nLeft:\n%s\n%s" % (left, left.dtypes)
+                + "\n\nRight:\n%s\n%s" % (right, right.dtypes)
+            )
+            self.assertEqual(left.shape, right.shape, msg=msg)
+            for lcol, rcol in zip(left.columns, right.columns):
+                self.assertEqual(lcol, rcol, msg=msg)
+                for lnull, rnull in zip(left[lcol].isnull(), right[rcol].isnull()):
+                    self.assertEqual(lnull, rnull, msg=msg)
+                for lval, rval in zip(left[lcol].dropna(), right[rcol].dropna()):
+                    self.assertAlmostEqual(lval, rval, msg=msg)
+            self.assertEqual(left.columns.names, right.columns.names, msg=msg)
+        elif isinstance(left, pd.Series) and isinstance(right, pd.Series):
+            msg = (
+                "Series are not almost equal: "
+                + "\n\nLeft:\n%s\n%s" % (left, left.dtype)
+                + "\n\nRight:\n%s\n%s" % (right, right.dtype)
+            )
+            self.assertEqual(left.name, right.name, msg=msg)
+            self.assertEqual(len(left), len(right), msg=msg)
+            for lnull, rnull in zip(left.isnull(), right.isnull()):
+                self.assertEqual(lnull, rnull, msg=msg)
+            for lval, rval in zip(left.dropna(), right.dropna()):
+                self.assertAlmostEqual(lval, rval, msg=msg)
+        elif isinstance(left, pd.MultiIndex) and isinstance(right, pd.MultiIndex):
+            msg = (
+                "MultiIndices are not almost equal: "
+                + "\n\nLeft:\n%s\n%s" % (left, left.dtype)
+                + "\n\nRight:\n%s\n%s" % (right, right.dtype)
+            )
+            self.assertEqual(len(left), len(right), msg=msg)
+            for lval, rval in zip(left, right):
+                self.assertAlmostEqual(lval, rval, msg=msg)
+        elif isinstance(left, pd.Index) and isinstance(right, pd.Index):
+            msg = (
+                "Indices are not almost equal: "
+                + "\n\nLeft:\n%s\n%s" % (left, left.dtype)
+                + "\n\nRight:\n%s\n%s" % (right, right.dtype)
+            )
+            self.assertEqual(len(left), len(right), msg=msg)
+            for lnull, rnull in zip(left.isnull(), right.isnull()):
+                self.assertEqual(lnull, rnull, msg=msg)
+            for lval, rval in zip(left.dropna(), right.dropna()):
+                self.assertAlmostEqual(lval, rval, msg=msg)
+        else:
+            raise ValueError("Unexpected values: (%s, %s)" % (left, right))
+
+    def assert_eq(self, left, right, check_exact=True, almost=False):
+        """
+        Asserts that two arbitrary objects are equal. If the given objects are Koalas DataFrames
+        or Series, they are converted to pandas objects and compared.
+
+        :param left: object to compare
+        :param right: object to compare
+        :param check_exact: if this is False, the comparison is done less precisely.
+        :param almost: if this is enabled, the comparison is delegated to `unittest`'s
+                       `assertAlmostEqual`. See its documentation for more details.
+        """
+        lobj = self._to_pandas(left)
+        robj = self._to_pandas(right)
+        if isinstance(lobj, (pd.DataFrame, pd.Series, pd.Index)):
+            if almost:
+                self.assertPandasAlmostEqual(lobj, robj)
+            else:
+                self.assertPandasEqual(lobj, robj, check_exact=check_exact)
+        elif is_list_like(lobj) and is_list_like(robj):
+            self.assertTrue(len(left) == len(right))
+            for litem, ritem in zip(left, right):
+                self.assert_eq(litem, ritem, check_exact=check_exact, almost=almost)
+        elif (lobj is not None and pd.isna(lobj)) and (robj is not None and pd.isna(robj)):
+            pass
+        else:
+            if almost:
+                self.assertAlmostEqual(lobj, robj)
+            else:
+                self.assertEqual(lobj, robj)
+
+    @staticmethod
+    def _to_pandas(obj):
+        if isinstance(obj, (DataFrame, Series, Index)):
+            return obj.to_pandas()
+        else:
+            return obj
+
+
+class TestUtils(object):
+    @contextmanager
+    def temp_dir(self):
+        tmp = tempfile.mkdtemp()
+        try:
+            yield tmp
+        finally:
+            shutil.rmtree(tmp)
+
+    @contextmanager
+    def temp_file(self):
+        with self.temp_dir() as tmp:
+            yield tempfile.mktemp(dir=tmp)
+
+
+class ComparisonTestBase(ReusedSQLTestCase):
+    @property
+    def kdf(self):
+        return ps.from_pandas(self.pdf)
+
+    @property
+    def pdf(self):
+        return self.kdf.to_pandas()
+
+
+def compare_both(f=None, almost=True):
+
+    if f is None:
+        return functools.partial(compare_both, almost=almost)
+    elif isinstance(f, bool):
+        return functools.partial(compare_both, almost=f)
+
+    @functools.wraps(f)
+    def wrapped(self):
+        if almost:
+            compare = self.assertPandasAlmostEqual
+        else:
+            compare = self.assertPandasEqual
+
+        for result_pandas, result_spark in zip(f(self, self.pdf), f(self, self.kdf)):
+            compare(result_pandas, result_spark.to_pandas())
+
+    return wrapped
+
+
+@contextmanager
+def assert_produces_warning(
+    expected_warning=Warning,
+    filter_level="always",
+    check_stacklevel=True,
+    raise_on_extra_warnings=True,
+):
+    """
+    Context manager for running code expected to either raise a specific
+    warning, or not raise any warnings. Verifies that the code raises the
+    expected warning, and that it does not raise any other unexpected
+    warnings. It is basically a wrapper around ``warnings.catch_warnings``.
+
+    Notes
+    -----
+    Replicated from pandas._testing.

Review comment:
       It is changed to pandas/_testing/_warnings.py. Hopefully, it's clearer.
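A minimal sketch of the behavior this helper verifies, using only the standard library. The real helper (replicated from pandas/_testing/_warnings.py) additionally checks stack levels and fails on unexpected extra warnings:

```python
import warnings

# Record all warnings raised inside the block, then assert on what was caught.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # corresponds to filter_level="always"
    warnings.warn("deprecated", DeprecationWarning)

# The expected warning category was produced exactly once.
assert len(caught) == 1
assert issubclass(caught[0].category, DeprecationWarning)
```

Wrapping `warnings.catch_warnings` this way keeps the filter change scoped to the block, so tests don't leak warning-filter state into each other.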






[GitHub] [spark] AmplabJenkins removed a comment on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820651256








[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-819826888


   **[Test build #137372 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137372/testReport)** for PR 32177 at commit [`b31ab33`](https://github.com/apache/spark/commit/b31ab33730dbaafaac5832ea53c704aae771df59).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] xinrong-databricks commented on a change in pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #32177:
URL: https://github.com/apache/spark/pull/32177#discussion_r614266929



##########
File path: python/pyspark/pandas/tests/test_dataframe_conversion.py
##########
@@ -26,7 +26,7 @@
 
 from pyspark import pandas as pp

Review comment:
       I will do the renaming as my last commit of this patch for easier reviews.
   Thanks for the reminder!






[GitHub] [spark] AmplabJenkins removed a comment on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824427374


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42277/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-822701231








[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824516680


   **[Test build #137767 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137767/testReport)** for PR 32177 at commit [`8660ae8`](https://github.com/apache/spark/commit/8660ae8c6cfdeb45dd1d1a07b6ff1af01aa2dec1).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824360027


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137746/
   




[GitHub] [spark] SparkQA commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820644117








[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-823710538


   **[Test build #137706 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137706/testReport)** for PR 32177 at commit [`4be5e72`](https://github.com/apache/spark/commit/4be5e726713c1fd475ca5fa2f2710f386eaba21d).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] xinrong-databricks commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-819815996


   This PR ought to be adjusted and merged after https://github.com/apache/spark/pull/32139 and https://github.com/apache/spark/pull/32152.




[GitHub] [spark] SparkQA removed a comment on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-822630372


   **[Test build #137641 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137641/testReport)** for PR 32177 at commit [`db7d470`](https://github.com/apache/spark/commit/db7d470e631ad51646df9aba162eb3d2e67dcb72).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-825073349


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137816/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-824541701


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42295/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-821554068


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42069/
   




[GitHub] [spark] AmplabJenkins commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820067686


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41970/
   




[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-819817791


   **[Test build #137372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137372/testReport)** for PR 32177 at commit [`b31ab33`](https://github.com/apache/spark/commit/b31ab33730dbaafaac5832ea53c704aae771df59).




[GitHub] [spark] xinrong-databricks commented on a change in pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
xinrong-databricks commented on a change in pull request #32177:
URL: https://github.com/apache/spark/pull/32177#discussion_r617873875



##########
File path: python/pyspark/pandas/tests/plot/test_frame_plot_matplotlib.py
##########
@@ -25,7 +25,11 @@
 
 from pyspark import pandas as ps
 from pyspark.pandas.config import set_option, reset_option
-from pyspark.pandas.testing.utils import have_matplotlib, ReusedSQLTestCase, TestUtils
+from pyspark.testing.pandasutils import (
+    have_matplotlib,
+    matplotlib_requirement_message,
+    ReusedSQLTestCase, TestUtils

Review comment:
       Certainly.
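   (For context on the guard being imported above: a minimal sketch of the skip pattern that `have_matplotlib` / `matplotlib_requirement_message` support in the consolidated `pyspark.testing.pandasutils`. The names mirror the diff; the detection logic shown here is an illustrative assumption, not PySpark's actual implementation.)

   ```python
   import importlib.util
   import unittest

   # Assumed implementation of the optional-dependency guard: detect matplotlib
   # once at import time, and reuse a shared skip message across test modules.
   have_matplotlib = importlib.util.find_spec("matplotlib") is not None
   matplotlib_requirement_message = "matplotlib is required to run this test"


   @unittest.skipIf(not have_matplotlib, matplotlib_requirement_message)
   class PlotTests(unittest.TestCase):
       # Hypothetical test class; real PySpark tests also mix in
       # ReusedSQLTestCase and TestUtils from pyspark.testing.pandasutils.
       def test_placeholder(self):
           self.assertTrue(True)
   ```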






[GitHub] [spark] github-actions[bot] commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820605317


   **[Test build #752891640](https://github.com/xinrong-databricks/spark/actions/runs/752891640)** for PR 32177 at commit [`47af208`](https://github.com/xinrong-databricks/spark/commit/47af2085387fb7119fa374d65f29a9a891ae34e4).




[GitHub] [spark] SparkQA commented on pull request #32177: [SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-825054806


   **[Test build #137816 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137816/testReport)** for PR 32177 at commit [`95bdbcf`](https://github.com/apache/spark/commit/95bdbcf2b368f0a1f252f963a5c4fb534d387ec0).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] github-actions[bot] commented on pull request #32177: [WIP][SPARK-34999][PYTHON] Consolidate PySpark testing utils

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #32177:
URL: https://github.com/apache/spark/pull/32177#issuecomment-820613990


   **[Test build #752927701](https://github.com/xinrong-databricks/spark/actions/runs/752927701)** for PR 32177 at commit [`73ce19a`](https://github.com/xinrong-databricks/spark/commit/73ce19a6a44933869db0be7dc02d03958cf0e44c).

