Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/06/25 01:58:11 UTC

[GitHub] [spark] zhengruifeng opened a new pull request, #36985: [SPARK-39597][PYTHON] Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace

zhengruifeng opened a new pull request, #36985:
URL: https://github.com/apache/spark/pull/36985

   ### What changes were proposed in this pull request?
   1. Make `TableExists` and `DatabaseExists` support 3-layer-namespace.
   2. Add `GetTable` on the Python side.
   
   
   ### Why are the changes needed?
   to support 3-layer-namespace
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. In `TableExists`, when `dbName` is empty, Spark will first try to treat `tableName` as a multi-layer-namespace name.
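
To make the idea of a 3-layer-namespace name concrete, here is a minimal sketch of splitting a dotted name into catalog, database and table parts. It is not how Spark resolves names: `TableIdent` and `split_table_name` are hypothetical helpers, and the sketch assumes plain dot-separated names with no backtick quoting and a single-level database.

```python
from typing import List, NamedTuple, Optional


class TableIdent(NamedTuple):
    """Hypothetical container for the three layers of a table name."""

    catalog: Optional[str]
    database: Optional[str]
    table: str


def split_table_name(name: str) -> TableIdent:
    """Split 'catalog.db.table', 'db.table' or 'table' into its layers.

    Assumption: names are plain dot-separated strings without quoting.
    """
    parts = name.split(".")
    if len(parts) > 3 or not all(parts):
        raise ValueError(f"expected 1 to 3 non-empty dot-separated parts: {name!r}")
    # Left-pad with None so that the last element is always the table name.
    padded: List[Optional[str]] = [None] * (3 - len(parts)) + parts
    return TableIdent(*padded)


print(split_table_name("some_catalog.some_db.some_table"))
print(split_table_name("some_table"))
```

With this change, a call such as `spark.catalog.tableExists("some_catalog.some_db.some_table")` (placeholder names) would presumably be interpreted along these lines when `dbName` is not given.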
   
   
   ### How was this patch tested?
   Added unit tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #36985: [SPARK-39597][PYTHON] Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #36985:
URL: https://github.com/apache/spark/pull/36985#discussion_r913435747


##########
python/pyspark/sql/catalog.py:
##########
@@ -164,6 +169,47 @@ def listTables(self, dbName: Optional[str] = None) -> List[Table]:
             )
         return tables
 
+    def getTable(self, tableName: str) -> Table:
+        """Get the table or view with the specified name. This table can be a temporary view or a
+        table/view. This throws an AnalysisException when no Table can be found.

Review Comment:
   sure!
   





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36985: [SPARK-39597][PYTHON] Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #36985:
URL: https://github.com/apache/spark/pull/36985#discussion_r913377603


##########
python/pyspark/sql/catalog.py:
##########
@@ -164,6 +169,47 @@ def listTables(self, dbName: Optional[str] = None) -> List[Table]:
             )
         return tables
 
+    def getTable(self, tableName: str) -> Table:
+        """Get the table or view with the specified name. This table can be a temporary view or a
+        table/view. This throws an AnalysisException when no Table can be found.
+
+        .. versionadded:: 3.4.0
+
+        Parameters
+        ----------
+        tableName : str
+                    name of the table to check existence.

Review Comment:
   nit: Indentation here seems weird.
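
The comment does not say which indentation is expected. Assuming the usual numpydoc layout, where a parameter description is indented four spaces relative to the parameter name, the docstring might look like the sketch below. `get_table_stub` is a hypothetical stand-in for `getTable`, with an empty body because only the docstring layout matters here.

```python
import inspect


def get_table_stub(tableName: str) -> None:
    """Get the table or view with the specified name.

    .. versionadded:: 3.4.0

    Parameters
    ----------
    tableName : str
        name of the table to check existence.
    """
    # Stub only: this sketch is about docstring layout, not behavior.


# inspect.getdoc() strips the common leading indentation, so the remaining
# indent of the description line is relative to the parameter name.
doc_lines = inspect.getdoc(get_table_stub).splitlines()
description = doc_lines[doc_lines.index("tableName : str") + 1]
indent = len(description) - len(description.lstrip())
print(indent)  # prints 4
```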





[GitHub] [spark] zhengruifeng commented on pull request #36985: [SPARK-39597][PYTHON] Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #36985:
URL: https://github.com/apache/spark/pull/36985#issuecomment-1166206143

   All tests passed except for a documentation build failure, which should be unrelated:
   
   ```
     x Failed to parse Rd in histogram.Rd
   ℹ there is no package called ‘ggplot2’
   Caused by error in `loadNamespace()`:
   ! there is no package called ‘ggplot2’ 
   ```




[GitHub] [spark] zhengruifeng commented on pull request #36985: [SPARK-39597][PYTHON] Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #36985:
URL: https://github.com/apache/spark/pull/36985#issuecomment-1166206220

   cc @cloud-fan @HyukjinKwon @amaliujia Could you please take a look when you have some time?




[GitHub] [spark] cloud-fan commented on a diff in pull request #36985: [SPARK-39597][PYTHON] Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36985:
URL: https://github.com/apache/spark/pull/36985#discussion_r906999977


##########
python/pyspark/sql/catalog.py:
##########
@@ -164,6 +169,65 @@ def listTables(self, dbName: Optional[str] = None) -> List[Table]:
             )
         return tables
 
+    def getTable(self, tableName: str, dbName: Optional[str] = None) -> Table:

Review Comment:
   Can we remove the `dbName` parameter? I think we should also deprecate the table-related APIs that take separate db and table name parameters on the Scala side, and promote the single-string-parameter APIs. cc @amaliujia





[GitHub] [spark] dongjoon-hyun commented on pull request #36985: [SPARK-39597][PYTHON] Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on PR #36985:
URL: https://github.com/apache/spark/pull/36985#issuecomment-1166211504

   I made a fix here to recover the master branch, @zhengruifeng.
   - https://github.com/apache/spark/pull/36987




[GitHub] [spark] zhengruifeng commented on a diff in pull request #36985: [SPARK-39597][PYTHON] Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #36985:
URL: https://github.com/apache/spark/pull/36985#discussion_r907041945


##########
python/pyspark/sql/catalog.py:
##########
@@ -164,6 +169,65 @@ def listTables(self, dbName: Optional[str] = None) -> List[Table]:
             )
         return tables
 
+    def getTable(self, tableName: str, dbName: Optional[str] = None) -> Table:

Review Comment:
   I think it's a good idea.
   Functions that take both `dbName` and `tableName` are now somewhat confusing, since `tableName` is starting to support the 3L namespace.
   
   Let me update this PR.





[GitHub] [spark] cloud-fan closed pull request #36985: [SPARK-39597][PYTHON] Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #36985: [SPARK-39597][PYTHON] Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace
URL: https://github.com/apache/spark/pull/36985




[GitHub] [spark] cloud-fan commented on pull request #36985: [SPARK-39597][PYTHON] Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #36985:
URL: https://github.com/apache/spark/pull/36985#issuecomment-1170841456

   thanks, merging to master!




[GitHub] [spark] zhengruifeng commented on a diff in pull request #36985: [SPARK-39597][PYTHON] Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #36985:
URL: https://github.com/apache/spark/pull/36985#discussion_r913419191


##########
python/pyspark/sql/catalog.py:
##########
@@ -164,6 +169,47 @@ def listTables(self, dbName: Optional[str] = None) -> List[Table]:
             )
         return tables
 
+    def getTable(self, tableName: str) -> Table:
+        """Get the table or view with the specified name. This table can be a temporary view or a
+        table/view. This throws an AnalysisException when no Table can be found.

Review Comment:
   let me send a PR to update them.





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36985: [SPARK-39597][PYTHON] Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #36985:
URL: https://github.com/apache/spark/pull/36985#discussion_r913377704


##########
python/pyspark/sql/catalog.py:
##########
@@ -164,6 +169,47 @@ def listTables(self, dbName: Optional[str] = None) -> List[Table]:
             )
         return tables
 
+    def getTable(self, tableName: str) -> Table:
+        """Get the table or view with the specified name. This table can be a temporary view or a
+        table/view. This throws an AnalysisException when no Table can be found.

Review Comment:
   I would follow Sphinx syntax in the docs, e.g., :class:`AnalysisException`.





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36985: [SPARK-39597][PYTHON] Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #36985:
URL: https://github.com/apache/spark/pull/36985#discussion_r913434775


##########
python/pyspark/sql/catalog.py:
##########
@@ -164,6 +169,47 @@ def listTables(self, dbName: Optional[str] = None) -> List[Table]:
             )
         return tables
 
+    def getTable(self, tableName: str) -> Table:
+        """Get the table or view with the specified name. This table can be a temporary view or a
+        table/view. This throws an AnalysisException when no Table can be found.

Review Comment:
   It's a nit. Maybe we can fix it together when you touch this code :-).





[GitHub] [spark] zhengruifeng commented on pull request #36985: [SPARK-39597][PYTHON] Make GetTable, TableExists and DatabaseExists in the python side support 3-layer-namespace

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #36985:
URL: https://github.com/apache/spark/pull/36985#issuecomment-1170865705

   Thank you @cloud-fan for reviewing!

