You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/06/22 12:46:27 UTC

[GitHub] [spark] zhengruifeng opened a new pull request, #36957: [SPARK-39555][PYTHON][WIP] Make createTable and listTables in the python side support 3-layer-namespace

zhengruifeng opened a new pull request, #36957:
URL: https://github.com/apache/spark/pull/36957

   ### What changes were proposed in this pull request?
   
   Corresponding changes in the python side of [SPARK-39236](https://issues.apache.org/jira/browse/SPARK-39236) (Make CreateTable API and ListTables API compatible )
   
   ### Why are the changes needed?
   
   to support 3-layer-namespace in the python side
   
   ### Does this PR introduce _any_ user-facing change?
   
   yes
   
   
   ### How was this patch tested?
   1, for existing UT, update them to cover original cases;
   
   2, for 3-layer-namespace, maually test:
     - 2.1 move all `InMemoryCatalog` related files from `test` to `main`, by `mv sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemory*.scala sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/`
    - 2.2 fix a few conflicts
    - 2.3 compile and then maually test in the pyspark repl, for example:
   
   ```python
   
   spark.sql("CREATE DATABASE my_db")
   spark.createDataFrame([(1, 1)]).createOrReplaceTempView("temp_tab")
   spark.sql("CREATE TABLE tab1 (name STRING, age INT) USING parquet")
   spark.sql("CREATE TABLE my_db.tab2 (name STRING, age INT) USING parquet")
   
   schema = StructType([StructField("a", IntegerType(), True)])
   description = "this is a test table"
   
   spark.conf.set("spark.sql.catalog.testcat", "org.apache.spark.sql.connector.catalog.InMemoryCatalog")
   
   spark.catalog.createTable("testcat.my_db.my_table", source="json", schema=schema, description=description)
   
   
   In [2]: spark.catalog.listTables()
   Out[2]: 
   [Table(name='tab1', catalog='spark_catalog', namespace=['default'], description=None, tableType='MANAGED', isTemporary=False),
    Table(name='temp_tab', catalog='spark_catalog', namespace=None, description=None, tableType='TEMPORARY', isTemporary=True)]
   
   In [3]: spark.catalog.listTables("my_db")
   Out[3]: 
   [Table(name='tab2', catalog='spark_catalog', namespace=['my_db'], description=None, tableType='MANAGED', isTemporary=False),
    Table(name='temp_tab', catalog='spark_catalog', namespace=None, description=None, tableType='TEMPORARY', isTemporary=True)]
   
   In [4]: spark.catalog.listTables("testcat.my_db")
   Out[4]: [Table(name='my_table', catalog='testcat', namespace=['my_db'], description='this is a test table', tableType='MANAGED', isTemporary=False)]
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #36957: [SPARK-39555][PYTHON] Make createTable and listTables in the python side support 3-layer-namespace

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #36957:
URL: https://github.com/apache/spark/pull/36957#issuecomment-1163845192

   @HyukjinKwon @cloud-fan @amaliujia @huaxingao 
   
   Please take a look when you are free.
   
   I encountered a problem that the python side also need `InMemoryCatalog` if we want to add UT for 3-layer-namespace, but `InMemoryCatalog` is only for `test` and not accessable from the python side. Is there some approach to work around? Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36957: [SPARK-39555][PYTHON] Make createTable and listTables in the python side support 3-layer-namespace

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #36957:
URL: https://github.com/apache/spark/pull/36957#discussion_r905614422


##########
python/pyspark/sql/catalog.py:
##########
@@ -37,11 +37,19 @@ class Database(NamedTuple):
 
 class Table(NamedTuple):

Review Comment:
   We will have to document this at https://github.com/apache/spark/tree/master/python/docs/source/reference but it's not there already. Let's do it later separately.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #36957: [SPARK-39555][PYTHON] Make createTable and listTables in the python side support 3-layer-namespace

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #36957:
URL: https://github.com/apache/spark/pull/36957#discussion_r904633505


##########
python/pyspark/sql/catalog.py:
##########
@@ -341,6 +357,9 @@ def createTable(
 
         .. versionchanged:: 3.1
            Added the ``description`` parameter.
+
+        .. versionchanged:: 3.4
+           Made ``tableName`` support 3-layer namespace.

Review Comment:
   OK, let me update it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #36957: [SPARK-39555][PYTHON] Make createTable and listTables in the python side support 3-layer-namespace

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #36957:
URL: https://github.com/apache/spark/pull/36957#issuecomment-1165043720

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #36957: [SPARK-39555][PYTHON] Make createTable and listTables in the python side support 3-layer-namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36957:
URL: https://github.com/apache/spark/pull/36957#discussion_r904529034


##########
python/pyspark/sql/catalog.py:
##########
@@ -341,6 +357,9 @@ def createTable(
 
         .. versionchanged:: 3.1
            Added the ``description`` parameter.
+
+        .. versionchanged:: 3.4
+           Made ``tableName`` support 3-layer namespace.

Review Comment:
   to be more general:
   ```
   Allowed ``tableName`` to be qualified with catalog name.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #36957: [SPARK-39555][PYTHON] Make createTable and listTables in the python side support 3-layer-namespace

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #36957: [SPARK-39555][PYTHON] Make createTable and listTables in the python side support 3-layer-namespace
URL: https://github.com/apache/spark/pull/36957


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #36957: [SPARK-39555][PYTHON] Make createTable and listTables in the python side support 3-layer-namespace

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #36957:
URL: https://github.com/apache/spark/pull/36957#issuecomment-1163848082

   That's sort of annoying problem yeah. We should build Spark w/ `sbt test:package`, and then run the tests. Should probably have a logic like https://github.com/apache/spark/blob/master/python/pyspark/sql/tests/test_dataframe.py#L1204-L1238. Another option is to move the test codes into somewhere within the main code with `private[sql]`. the classes or methods with `private[sql]` can be accessed via Py4J (because that syntax is specific to Scala, and Java class files don't understand them).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on pull request #36957: [SPARK-39555][PYTHON] Make createTable and listTables in the python side support 3-layer-namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on PR #36957:
URL: https://github.com/apache/spark/pull/36957#issuecomment-1164705164

   nice work!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org