You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by "ueshin (via GitHub)" <gi...@apache.org> on 2023/03/09 20:35:04 UTC

[GitHub] [spark] ueshin opened a new pull request, #40356: [SPARK-42733][CONNECT][PYTHON] Fix DataFrameWriter.save to work without path parameter

ueshin opened a new pull request, #40356:
URL: https://github.com/apache/spark/pull/40356

   ### What changes were proposed in this pull request?
   
   Fixes `DataFrameWriter.save` to work without path parameter.
   
   ### Why are the changes needed?
   
   `DataFrameWriter.save` should work without table name or path parameter because some data sources, such as jdbc, noop, works without those parameters.
   
   ```py
   >>> print(spark.range(10).write.format("noop").mode("append").save())
   Traceback (most recent call last):
   ...
   AssertionError: Invalid configuration of WriteCommand, neither path or table present.
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   
   The data sources that don't need path or table parameter will work.
   
   ```py
   >>> print(spark.range(10).write.format("noop").mode("append").save())
   None
   ```
   
   ### How was this patch tested?
   
   Added a test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] zhengruifeng commented on pull request #40356: [SPARK-42733][CONNECT][PYTHON] Fix DataFrameWriter.save to work without path parameter

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.

zhengruifeng commented on PR #40356:
URL: https://github.com/apache/spark/pull/40356#issuecomment-1463045799

   merged to master/branch-3.4


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] ueshin commented on a diff in pull request #40356: [SPARK-42733][CONNECT][PYTHON] Fix DataFrameWriter.save to work without path parameter

Posted by "ueshin (via GitHub)" <gi...@apache.org>.

ueshin commented on code in PR #40356:
URL: https://github.com/apache/spark/pull/40356#discussion_r1131677278


##########
connector/connect/server/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -554,7 +554,8 @@ class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
       parameters = Map("columnName" -> "`duplicatedcol`"))
   }
 
-  test("Writes fails without path or table") {
+  // TODO(SPARK-42733): Writes without path or table should work.
+  ignore("Writes fails without path or table") {

Review Comment:
   @zhenlineo May I ask you to fix this test later?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] zhenlineo commented on a diff in pull request #40356: [SPARK-42733][CONNECT][PYTHON] Fix DataFrameWriter.save to work without path parameter

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.

zhenlineo commented on code in PR #40356:
URL: https://github.com/apache/spark/pull/40356#discussion_r1131922084


##########
python/pyspark/sql/tests/test_datasources.py:
##########
@@ -192,6 +193,23 @@ def test_ignore_column_of_all_nulls(self):
         finally:
             shutil.rmtree(path)
 
+    def test_jdbc(self):
+        db = f"memory:{uuid.uuid4()}"

Review Comment:
   @ueshin What kind of config do I need to make this work? I got error: Database xxx not found.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] zhenlineo commented on a diff in pull request #40356: [SPARK-42733][CONNECT][PYTHON] Fix DataFrameWriter.save to work without path parameter

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.

zhenlineo commented on code in PR #40356:
URL: https://github.com/apache/spark/pull/40356#discussion_r1132520281


##########
python/pyspark/sql/tests/test_datasources.py:
##########
@@ -192,6 +193,23 @@ def test_ignore_column_of_all_nulls(self):
         finally:
             shutil.rmtree(path)
 
+    def test_jdbc(self):
+        db = f"memory:{uuid.uuid4()}"

Review Comment:
   @ueshin Yeah, the python test works. I want to do the same for scala client, but I am missing some spark config?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] zhengruifeng closed pull request #40356: [SPARK-42733][CONNECT][PYTHON] Fix DataFrameWriter.save to work without path parameter

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.

zhengruifeng closed pull request #40356: [SPARK-42733][CONNECT][PYTHON] Fix DataFrameWriter.save to work without path parameter
URL: https://github.com/apache/spark/pull/40356


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] zhenlineo commented on a diff in pull request #40356: [SPARK-42733][CONNECT][PYTHON] Fix DataFrameWriter.save to work without path parameter

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.

zhenlineo commented on code in PR #40356:
URL: https://github.com/apache/spark/pull/40356#discussion_r1132734151


##########
python/pyspark/sql/tests/test_datasources.py:
##########
@@ -192,6 +193,23 @@ def test_ignore_column_of_all_nulls(self):
         finally:
             shutil.rmtree(path)
 
+    def test_jdbc(self):
+        db = f"memory:{uuid.uuid4()}"

Review Comment:
   Oh I missed "create", thanks. Works for scala client too!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] zhenlineo commented on a diff in pull request #40356: [SPARK-42733][CONNECT][PYTHON] Fix DataFrameWriter.save to work without path parameter

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.

zhenlineo commented on code in PR #40356:
URL: https://github.com/apache/spark/pull/40356#discussion_r1131689057


##########
connector/connect/server/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -554,7 +554,8 @@ class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
       parameters = Map("columnName" -> "`duplicatedcol`"))
   }
 
-  test("Writes fails without path or table") {
+  // TODO(SPARK-42733): Writes without path or table should work.
+  ignore("Writes fails without path or table") {

Review Comment:
   Yeah, sure. Thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] ueshin commented on a diff in pull request #40356: [SPARK-42733][CONNECT][PYTHON] Fix DataFrameWriter.save to work without path parameter

Posted by "ueshin (via GitHub)" <gi...@apache.org>.

ueshin commented on code in PR #40356:
URL: https://github.com/apache/spark/pull/40356#discussion_r1131958688


##########
python/pyspark/sql/tests/test_datasources.py:
##########
@@ -192,6 +193,23 @@ def test_ignore_column_of_all_nulls(self):
         finally:
             shutil.rmtree(path)
 
+    def test_jdbc(self):
+        db = f"memory:{uuid.uuid4()}"

Review Comment:
   I think it should work without any configs.
   Could you try to rebuild?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] ueshin commented on a diff in pull request #40356: [SPARK-42733][CONNECT][PYTHON] Fix DataFrameWriter.save to work without path parameter

Posted by "ueshin (via GitHub)" <gi...@apache.org>.

ueshin commented on code in PR #40356:
URL: https://github.com/apache/spark/pull/40356#discussion_r1132717595


##########
python/pyspark/sql/tests/test_datasources.py:
##########
@@ -192,6 +193,23 @@ def test_ignore_column_of_all_nulls(self):
         finally:
             shutil.rmtree(path)
 
+    def test_jdbc(self):
+        db = f"memory:{uuid.uuid4()}"

Review Comment:
   Does #40358 include the test? I can try it and see.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org