Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/02/04 14:45:15 UTC

[GitHub] [spark] MaxGekk opened a new pull request #31475: [SPARK-34360][SQL] Support table truncation by v2 Table Catalogs

MaxGekk opened a new pull request #31475:
URL: https://github.com/apache/spark/pull/31475


   ### What changes were proposed in this pull request?
   1. Extend `TableCatalog` with a new method, `truncateTable()`.
   2. Implement the new method in `BasicInMemoryTableCatalog`.
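   
   A minimal sketch of what a catalog-level truncation method could look like, using stand-in types rather than Spark's own. `SketchCatalog`, `InMemorySketchCatalog`, the `String` identifier, and the `boolean` return value are all assumptions for illustration, not the signatures proposed in this PR:
   
   ```java
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   // Hypothetical stand-in for a catalog; this is NOT Spark's TableCatalog.
   interface SketchCatalog {
     // Assumed contract: true if the table existed and was emptied, false otherwise.
     boolean truncateTable(String ident);
   }

   // Hypothetical in-memory catalog that keeps each table's rows in a list.
   final class InMemorySketchCatalog implements SketchCatalog {
     private final Map<String, List<String>> tables = new HashMap<>();

     void createTable(String ident) {
       tables.putIfAbsent(ident, new ArrayList<>());
     }

     void append(String ident, String row) {
       tables.get(ident).add(row);
     }

     int rowCount(String ident) {
       return tables.get(ident).size();
     }

     @Override
     public boolean truncateTable(String ident) {
       List<String> rows = tables.get(ident);
       if (rows == null) {
         return false; // unknown table: nothing to truncate
       }
       rows.clear(); // drop all rows but keep the table itself
       return true;
     }
   }
   ```
   
   In this sketch the table definition survives truncation and only its rows are removed, which is the behavior `TRUNCATE TABLE` is generally expected to have.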
   
   ### Why are the changes needed?
   To support `TRUNCATE TABLE` for v2 tables.
   
   ### Does this PR introduce _any_ user-facing change?
   Should not.
   
   ### How was this patch tested?
   Added new tests to `TableCatalogSuite`:
   ```
   $ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *TableCatalogSuite"
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #31475:
URL: https://github.com/apache/spark/pull/31475


   




[GitHub] [spark] AmplabJenkins commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-776673234


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39680/
   




[GitHub] [spark] SparkQA commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-781231945


   **[Test build #135225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135225/testReport)** for PR 31475 at commit [`b758060`](https://github.com/apache/spark/commit/b758060948cdb73e939eeb8ca923bc00caf111dd).




[GitHub] [spark] SparkQA commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-776929251


   **[Test build #135103 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135103/testReport)** for PR 31475 at commit [`bcc01e1`](https://github.com/apache/spark/commit/bcc01e17ab3bd829292b4855ff0e9b418ec5b887).




[GitHub] [spark] AmplabJenkins commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-781264320


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39807/
   




[GitHub] [spark] MaxGekk commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r578171819



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsDelete.java
##########
@@ -68,4 +69,15 @@ default boolean canDeleteWhere(Filter[] filters) {
    * @throws IllegalArgumentException If the delete is rejected due to required effort
    */
   void deleteWhere(Filter[] filters);
+
+  Filter[] ALWAYS_TRUE_FILTER = new Filter[] { new AlwaysTrue() };

Review comment:
       How about reverting commit https://github.com/apache/spark/pull/31475/commits/d1e5a18066f9fb2ff0ca1504e7c3f0802905febd and implementing it as:
   ```java
   default boolean truncateTable() {
       Filter[] filters = new Filter[] { new AlwaysTrue() };
      ...
   ```
   I am not sure this premature optimization is worthwhile. Compared to the truncation operation itself, the allocation overhead is small. If it turns out to be a hot spot, I believe the JVM should optimize it for us. @rdblue @cloud-fan @HyukjinKwon WDYT?






[GitHub] [spark] SparkQA commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-777256476


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39694/
   




[GitHub] [spark] SparkQA commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-778103797


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39702/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-776317964


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135081/
   




[GitHub] [spark] SparkQA commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-781379576


   **[Test build #135225 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135225/testReport)** for PR 31475 at commit [`b758060`](https://github.com/apache/spark/commit/b758060948cdb73e939eeb8ca923bc00caf111dd).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r573190005



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.catalog;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * Represents a table which can be atomically truncated.
+ */
+@Evolving
+public interface TruncatableTable extends Table {

Review comment:
       Do we perhaps need `Atomic` in the name? For example, would `SupportsAtomicTruncate` be better?






[GitHub] [spark] SparkQA commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-776798748


   **[Test build #135098 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135098/testReport)** for PR 31475 at commit [`2741a40`](https://github.com/apache/spark/commit/2741a401c61d68e7bee61ec17eb6a74cefe3c6cf).
    * This patch **fails SparkR unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-778115631


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39702/
   




[GitHub] [spark] MaxGekk commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r577632407



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsDelete.java
##########
@@ -68,4 +69,15 @@ default boolean canDeleteWhere(Filter[] filters) {
    * @throws IllegalArgumentException If the delete is rejected due to required effort
    */
   void deleteWhere(Filter[] filters);
+
+  Filter[] ALWAYS_TRUE_FILTER = new Filter[] { new AlwaysTrue() };

Review comment:
       I wouldn't do that, for these reasons:
   - I believe interfaces should be as independent of internals as possible.
   - `ALWAYS_TRUE_FILTER` can be used with other methods of the interface, such as `deleteWhere()`. For example, when an implementation overrides `truncateTable()` but a user wants to delete all rows via `deleteWhere()`, they can reuse the constant.
   - Other interfaces have constants too, for example https://github.com/apache/spark/blob/ec1560af251d2c3580f5bccfabc750f1c7af09df/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java#L47 . I don't see much difference between `PROP_LOCATION` and `ALWAYS_TRUE_FILTER`.
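   
   The constant-based approach under discussion can be sketched roughly as follows. This is a self-contained illustration with stand-in types: `RowFilter`, `AlwaysTrueFilter`, `SketchSupportsDelete`, and `InMemoryRows` are hypothetical names that mimic, but are not, Spark's `Filter`, `AlwaysTrue`, and `SupportsDelete`:
   
   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;

   // Hypothetical stand-in for a row filter.
   interface RowFilter {
     boolean matches(String row);
   }

   // Hypothetical stand-in for a filter that accepts every row.
   final class AlwaysTrueFilter implements RowFilter {
     @Override
     public boolean matches(String row) {
       return true;
     }
   }

   // Hypothetical stand-in for a delete-capable table interface.
   interface SketchSupportsDelete {
     // Interface fields are implicitly public static final,
     // so this array is allocated once and shared by all callers.
     RowFilter[] ALWAYS_TRUE_FILTER = new RowFilter[] { new AlwaysTrueFilter() };

     void deleteWhere(RowFilter[] filters);

     // Default truncation: delete every row by reusing the shared constant.
     default boolean truncateTable() {
       deleteWhere(ALWAYS_TRUE_FILTER);
       return true;
     }
   }

   // Hypothetical in-memory table used only to exercise the default method.
   final class InMemoryRows implements SketchSupportsDelete {
     final List<String> rows = new ArrayList<>();

     @Override
     public void deleteWhere(RowFilter[] filters) {
       // Remove the rows that satisfy all of the given filters.
       rows.removeIf(row -> Arrays.stream(filters).allMatch(f -> f.matches(row)));
     }
   }
   ```
   
   One trade-off worth keeping in mind with this design: Java arrays are mutable, so a shared array constant can in principle be modified by any code that can see it, whereas an array allocated per call cannot leak changes between calls.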






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-778198991


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135121/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-778057750


   **[Test build #135121 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135121/testReport)** for PR 31475 at commit [`d1e5a18`](https://github.com/apache/spark/commit/d1e5a18066f9fb2ff0ca1504e7c3f0802905febd).




[GitHub] [spark] SparkQA commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-776191153


   **[Test build #135081 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135081/testReport)** for PR 31475 at commit [`141919c`](https://github.com/apache/spark/commit/141919c6543ee01d41dc02a14cd04df9edb7f6f9).




[GitHub] [spark] SparkQA commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-781259100


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39807/
   




[GitHub] [spark] MaxGekk commented on pull request #31475: [SPARK-34360][SQL] Support table truncation by v2 Table Catalogs

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-775901454


   > ... can you share the use case that you have for this?
   
   @rdblue For instance, v2 table catalog for JDBC:
   
   0. I assume that if a table supports the `SupportsTruncate` interface, it must support atomically removing all data and writing any set of rows (empty or **non-empty**).
   1. Supporting atomic `SupportsTruncate` is not easy in the case of JDBC. For example, we still don't support it in the JDBC v2 Table Catalog; see SPARK-32595.
   2. A DBMS usually provides a special command for table truncation (see [DB2](https://www.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/sqlref/src/tpc/db2z_sql_truncate.html), [Oracle](https://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_10007.htm#SQLRF01707), [PostgreSQL](https://www.postgresql.org/docs/9.1/sql-truncate.html), [Hive](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-TruncateTable)). So we could map the new method `truncateTable` to that DBMS command.
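   
   As a rough illustration of the mapping described in point 2, a JDBC-backed implementation might issue the DBMS's own truncation statement through a plain `java.sql.Statement`. The class `JdbcTruncateSketch` and its methods are hypothetical and are not part of Spark's JDBC catalog; the sketch also assumes the caller passes an already validated and quoted table name, and the exact `TRUNCATE` syntax and options vary by DBMS:
   
   ```java
   import java.sql.Connection;
   import java.sql.SQLException;
   import java.sql.Statement;

   // Hypothetical helper; not part of Spark.
   final class JdbcTruncateSketch {
     private JdbcTruncateSketch() {}

     // Builds the statement text. The table name is assumed to be
     // validated and quoted by the caller for the target DBMS.
     static String truncateSql(String quotedTableName) {
       return "TRUNCATE TABLE " + quotedTableName;
     }

     // Runs the truncation and reports success as a boolean,
     // mirroring the boolean-returning truncateTable() discussed in this PR.
     static boolean truncateTable(Connection conn, String quotedTableName) {
       try (Statement stmt = conn.createStatement()) {
         stmt.executeUpdate(truncateSql(quotedTableName));
         return true;
       } catch (SQLException e) {
         return false; // a real implementation would likely surface the error
       }
     }
   }
   ```
   
   Whether such a statement is atomic or can be rolled back depends on the DBMS, which is part of why a dedicated truncation API is being discussed separately from `SupportsTruncate`.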




[GitHub] [spark] rdblue commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r573200376



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.catalog;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * Represents a table which can be atomically truncated.
+ */
+@Evolving
+public interface TruncatableTable extends Table {

Review comment:
       All of the operations should be atomic, right? Certainly, that's the expectation of a write that truncates and then appends data. @MaxGekk cited that as a reason above why this interface is needed.






[GitHub] [spark] cloud-fan commented on pull request #31475: [SPARK-34360][SQL] Support table truncation by v2 Table Catalogs

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-775093012


   LGTM, also cc @rdblue @imback82 @dongjoon-hyun 




[GitHub] [spark] SparkQA commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-777246209


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39694/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-776221082


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39663/
   




[GitHub] [spark] rdblue commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r574840886



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsDelete.java
##########
@@ -68,4 +69,14 @@ default boolean canDeleteWhere(Filter[] filters) {
    * @throws IllegalArgumentException If the delete is rejected due to required effort
    */
   void deleteWhere(Filter[] filters);
+
+  @Override
+  default boolean truncateTable() {
+    Filter[] filters = new Filter[] { new AlwaysTrue() };

Review comment:
       Could you make this a constant instead of creating a new filter and array instance here?






[GitHub] [spark] cloud-fan commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-776399717


   I like the idea of allowing catalog implementations to distinguish between `TRUNCATE TABLE t` and `INSERT OVERWRITE t SELECT * FROM empty_table`. Moving the API to the table side seems fine, but the name conflict is a bit annoying.
   
   One idea is to keep the name `SupportsTruncate` and put it in `org.apache.spark.sql.connector.catalog`, the same package as `SupportsDelete`. Then there is no conflict.




[GitHub] [spark] rdblue commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r573200820



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.catalog;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * Represents a table which can be atomically truncated.
+ */
+@Evolving
+public interface TruncatableTable extends Table {
+  /**
+   * Truncate a table by removing all rows from the table atomically.
+   * If the table supports partitions, the method removes all existing partitions.
+   *
+   * @return true if a table was truncated successfully otherwise false
+   *
+   * @since 3.2.0
+   */
+  boolean truncateTable();

Review comment:
       Actually, this seems to mirror the other uses so I'm okay with it. I'd probably remove it but it seems okay to leave as is if you feel strongly about it.






[GitHub] [spark] AmplabJenkins commented on pull request #31475: [SPARK-34360][SQL] Support table truncation by v2 Table Catalogs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-773570066


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134876/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31475: [SPARK-34360][SQL] Support table truncation by v2 Table Catalogs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-773460642


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39464/
   




[GitHub] [spark] SparkQA commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-778057750


   **[Test build #135121 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135121/testReport)** for PR 31475 at commit [`d1e5a18`](https://github.com/apache/spark/commit/d1e5a18066f9fb2ff0ca1504e7c3f0802905febd).




[GitHub] [spark] HyukjinKwon commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-782821920


   Merged to master.




[GitHub] [spark] SparkQA commented on pull request #31475: [SPARK-34360][SQL] Support table truncation by v2 Table Catalogs

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-773436343


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39464/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-778198991


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135121/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31475: [SPARK-34360][SQL] Support table truncation by v2 Table Catalogs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-773460642








[GitHub] [spark] rdblue commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r573187249



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.catalog;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * Represents a table which can be atomically truncated.
+ */
+@Evolving
+public interface TruncatableTable extends Table {
+  /**
+   * Truncate a table by removing all rows from the table atomically.
+   * If the table supports partitions, the method removes all existing partitions.
+   *
+   * @return true if a table was truncated successfully otherwise false
+   *
+   * @since 3.2.0
+   */
+  boolean truncateTable();

Review comment:
       Why include "table" in the method name? `table.truncate()` seems clear enough to me.






[GitHub] [spark] MaxGekk commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r578171819



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsDelete.java
##########
@@ -68,4 +69,15 @@ default boolean canDeleteWhere(Filter[] filters) {
    * @throws IllegalArgumentException If the delete is rejected due to required effort
    */
   void deleteWhere(Filter[] filters);
+
+  Filter[] ALWAYS_TRUE_FILTER = new Filter[] { new AlwaysTrue() };

Review comment:
       How about reverting this commit https://github.com/apache/spark/pull/31475/commits/d1e5a18066f9fb2ff0ca1504e7c3f0802905febd and implementing it as:
   ```java
   default boolean truncateTable() {
     Filter[] filters = new Filter[] { new AlwaysTrue() };
     ...
   ```
   I am not sure this premature optimization is worth it. Compared to the truncation operation itself, the allocation overhead is small. If it ever becomes a hot spot, I believe the JVM should be able to optimize it for us.






[GitHub] [spark] rdblue commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-777806353


   Looks good now. Thanks for fixing this, @MaxGekk!




[GitHub] [spark] SparkQA commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-776221049


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39663/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31475: [SPARK-34360][SQL] Support table truncation by v2 Table Catalogs

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-773379588


   **[Test build #134876 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134876/testReport)** for PR 31475 at commit [`b77a210`](https://github.com/apache/spark/commit/b77a2103356ffcf5ecaf34ec55d4d471c50a4edf).




[GitHub] [spark] SparkQA commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-776618014


   **[Test build #135098 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135098/testReport)** for PR 31475 at commit [`2741a40`](https://github.com/apache/spark/commit/2741a401c61d68e7bee61ec17eb6a74cefe3c6cf).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-777008303


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39685/
   




[GitHub] [spark] MaxGekk commented on pull request #31475: [SPARK-34360][SQL] Support table truncation by v2 Table Catalogs

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-775373033


   > ... why is this necessary instead of deleting from the table or overwriting everything with no new records?
   
   1. Emulating table truncation by inserting no rows requires an atomic delete + insert, which a concrete implementation might not support even though it can atomically truncate a table.
   2. It leaves no room for truncation-specific optimizations. If a catalog implementation knew in advance that we want to truncate the entire table rather than delete all rows, it could do so more efficiently. For example, a file-based implementation could move the table folder to a trash folder with a single atomic syscall.
   3. From a security or permissions point of view, we could distinguish insert overwrite (or delete) from truncation. I can imagine a case where some roles/users have truncation permission only, without insert or delete permissions.
   4. It is also possible that a truncation is just a record in a catalog-level log, while inserts/deletes are records in table-level logs. We cannot map cleanly onto such an implementation if we emulate table truncation via inserts/deletes.
   
   In general, I believe we should not hide our intention from catalog implementations: truncation should be explicit. The table catalog implementation should decide how to implement it in the most efficient way. If it can emulate truncation by overwriting with no rows, fine, that is up to the implementation.
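
Point 2 above can be illustrated with a minimal sketch. `FileBackedTable`, its constructor arguments, and the trash-directory layout are all hypothetical and are not part of Spark's connector API; the only library calls used are `java.nio.file.Files.move` with `StandardCopyOption.ATOMIC_MOVE` and `Files.createDirectories`. The sketch assumes the table directory and the trash directory live on the same file system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical file-backed table; a stand-in, not a Spark class.
final class FileBackedTable {
  private final Path tableDir;
  private final Path trashDir;

  FileBackedTable(Path tableDir, Path trashDir) {
    this.tableDir = tableDir;
    this.trashDir = trashDir;
  }

  /** Truncates by renaming the data directory into the trash, then recreating it empty. */
  boolean truncateTable() {
    try {
      Files.createDirectories(trashDir);
      Path target = trashDir.resolve(tableDir.getFileName() + "-" + System.nanoTime());
      // ATOMIC_MOVE requests a single atomic rename; the call throws
      // AtomicMoveNotSupportedException (an IOException) if that is not possible.
      Files.move(tableDir, target, StandardCopyOption.ATOMIC_MOVE);
      // Recreating the empty directory is a separate, second step.
      Files.createDirectories(tableDir);
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  /** Number of files currently in the table directory. */
  long fileCount() throws IOException {
    try (Stream<Path> files = Files.list(tableDir)) {
      return files.count();
    }
  }
}
```

A row-by-row delete or an overwrite with an empty result would have to touch every data file, whereas this version needs one rename no matter how much data the table holds. That shortcut is only available when the implementation is told that a truncation is intended.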
   




[GitHub] [spark] SparkQA removed a comment on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-776929251


   **[Test build #135103 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135103/testReport)** for PR 31475 at commit [`bcc01e1`](https://github.com/apache/spark/commit/bcc01e17ab3bd829292b4855ff0e9b418ec5b887).




[GitHub] [spark] SparkQA removed a comment on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-776191153


   **[Test build #135081 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135081/testReport)** for PR 31475 at commit [`141919c`](https://github.com/apache/spark/commit/141919c6543ee01d41dc02a14cd04df9edb7f6f9).




[GitHub] [spark] rdblue commented on pull request #31475: [SPARK-34360][SQL] Support table truncation by v2 Table Catalogs

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-776119870


   @MaxGekk, thanks. Then let's work on updating this to fit more cleanly with the design of v2 catalogs and tables. This should be a `Table` interface, not an extension to `TableCatalog`.




[GitHub] [spark] rdblue commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r573190852



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.catalog;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * Represents a table which can be atomically truncated.
+ */
+@Evolving
+public interface TruncatableTable extends Table {
+  /**
+   * Truncate a table by removing all rows from the table atomically.
+   * If the table supports partitions, the method removes all existing partitions.

Review comment:
       `SupportsPartitionManagement` has a `truncatePartition` method that will "Truncate a partition in the table by completely removing partition data."
   
   That conflicts with the behavior of truncate here, which drops partitions. I think this requirement should be removed.
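
One possible reading of this comment, sketched with a toy class: if truncating a partition removes its data but keeps the partition, a table-level truncate can follow the same rule and empty every partition without dropping any. `PartitionedRows` and all of its methods are hypothetical stand-ins written for this illustration; they are not Spark's `SupportsPartitionManagement` API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory table keyed by partition name; not a Spark class.
final class PartitionedRows {
  private final Map<String, List<String>> partitions = new LinkedHashMap<>();

  void insert(String partition, String row) {
    partitions.computeIfAbsent(partition, k -> new ArrayList<>()).add(row);
  }

  /** Removes the data of one partition but keeps the partition itself. */
  boolean truncatePartition(String partition) {
    List<String> rows = partitions.get(partition);
    if (rows == null) {
      return false;
    }
    rows.clear();
    return true;
  }

  /** Table-level truncate with the same meaning: every partition is emptied, none is dropped. */
  boolean truncateTable() {
    partitions.values().forEach(List::clear);
    return true;
  }

  int partitionCount() {
    return partitions.size();
  }

  int rowCount() {
    return partitions.values().stream().mapToInt(List::size).sum();
  }
}
```

Whether a real implementation keeps or drops partitions on truncate would then be its own decision rather than something the interface's Javadoc requires.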






[GitHub] [spark] SparkQA commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-777235749


   **[Test build #135112 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135112/testReport)** for PR 31475 at commit [`5af950a`](https://github.com/apache/spark/commit/5af950afa6f4f89269a2e6aea2d62b48d5f818d3).




[GitHub] [spark] AmplabJenkins commented on pull request #31475: [SPARK-34360][SQL] Support table truncation by v2 Table Catalogs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-773460642








[GitHub] [spark] rdblue commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r573188836



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.catalog;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * Represents a table which can be atomically truncated.
+ */
+@Evolving
+public interface TruncatableTable extends Table {

Review comment:
       The other traits use `Supports` as the first word, but `SupportsTruncate` already exists. This name is okay, but others may want to change it.






[GitHub] [spark] MaxGekk commented on pull request #31475: [SPARK-34360][SQL] Support table truncation by v2 Table Catalogs

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-775029242


   @cloud-fan @HyukjinKwon Any objections to this PR?




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-776834995


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135098/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r578179343



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsDelete.java
##########
@@ -68,4 +69,15 @@ default boolean canDeleteWhere(Filter[] filters) {
    * @throws IllegalArgumentException If the delete is rejected due to required effort
    */
   void deleteWhere(Filter[] filters);
+
+  Filter[] ALWAYS_TRUE_FILTER = new Filter[] { new AlwaysTrue() };

Review comment:
       Okay, I just noticed that it was changed per the review comment above. I think it's fine to remove it since the overhead is small, and doing so avoids exposing `ALWAYS_TRUE_FILTER` as an API.






[GitHub] [spark] rdblue commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r573189665



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala
##########
@@ -51,7 +51,7 @@ class InMemoryTable(
     val distribution: Distribution = Distributions.unspecified(),
     val ordering: Array[SortOrder] = Array.empty)
   extends Table with SupportsRead with SupportsWrite with SupportsDelete
-      with SupportsMetadataColumns {
+      with SupportsMetadataColumns with TruncatableTable {

Review comment:
       I think that `SupportsDelete` should implement `TruncatableTable`, similar to how `SupportsOverwrite` implements `SupportsTruncate`. That way, implementations don't need to support delete and truncate separately.
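
The suggestion above can be sketched in Java as follows. This is a minimal illustration, not the PR's code: `Filter`, `AlwaysTrue`, `Table`, `TruncatableTable` and `SupportsDelete` are simplified stand-ins declared locally, and `ToyTable` is a hypothetical implementation. The sketch assumes truncation can be expressed as a delete with an always-true filter, so a table that implements only `deleteWhere` picks up `truncateTable()` through a default method.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types for illustration only; these are NOT Spark's real classes.
interface Filter {}

final class AlwaysTrue implements Filter {}

interface Table {
  String name();
}

// Hypothetical shape of the truncation mix-in under review.
interface TruncatableTable extends Table {
  boolean truncateTable();
}

// Delete support provides truncation through a default method.
interface SupportsDelete extends TruncatableTable {
  void deleteWhere(Filter[] filters);

  @Override
  default boolean truncateTable() {
    // Truncation expressed as "delete where true".
    deleteWhere(new Filter[] { new AlwaysTrue() });
    return true;
  }
}

// A toy table that implements only deleteWhere.
final class ToyTable implements SupportsDelete {
  final List<Integer> rows = new ArrayList<>();

  @Override
  public String name() {
    return "toy";
  }

  @Override
  public void deleteWhere(Filter[] filters) {
    // This toy understands nothing but the always-true filter.
    for (Filter f : filters) {
      if (!(f instanceof AlwaysTrue)) {
        throw new IllegalArgumentException("unsupported filter");
      }
    }
    rows.clear();
  }
}
```

A source with a cheaper truncation path could still override `truncateTable()` directly instead of relying on the default.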






[GitHub] [spark] AmplabJenkins commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-777048542


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135103/
   




[GitHub] [spark] rdblue commented on pull request #31475: [SPARK-34360][SQL] Support table truncation by v2 Table Catalogs

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-775490751


   @MaxGekk, can you share the use case that you have for this? You mentioned truncation-specific optimizations. I think working with concrete use cases is usually a good idea. If these are theoretical only -- like a user that can drop all data but not a subset -- then we should put this off. If there's a specific case, then let's discuss it.
   
   I agree that there _may_ be a good reason to pass along that the engine's intent was to truncate. That's why we have `SupportsTruncate` for the write builder. And I agree with you that we don't necessarily need to use an atomic operation that could truncate and add data at the same time. Your point about not having insert permissions is a good one to justify not using `SupportsTruncate`, although the case of a user that can drop all data but not subsets doesn't sound real. The point about truncation possibly being a metadata operation is why we added `SupportsDelete` at the table level.
   
   Those points may indicate that an interface to truncate a table as a stand-alone operation is valid, although I still think that it is a bad idea to add more interfaces to v2 without a reasonable expectation that they will actually be used.
   
   Another problem here is that this operation is proposed at the catalog level, which does not fit with how v2 works. I think that the reason for this is emulating what Hive does, but that's not usually a good choice.
   
   In v2, catalogs load tables and tables are modified. That's why `SupportsDelete` extends `Table` and not `TableCatalog`. This keeps concerns separate, so we have a way to handle tables that don't exist and a separate way to handle tables that don't support a certain operation. Mixing those two together at the catalog level over-complicates the API, requiring a source to throw one exception if the table doesn't exist and another if it doesn't support truncation. (We also went through this discussion with the recently added interfaces to add/drop partitions.)
   
   Assuming that it is worth adding this interface, I would expect it to be a mix-in for `Table`. And like `SupportsOverwrite` that implements `SupportsTruncate`, I think this should also update `SupportsDelete` so that tables don't need to implement both interfaces.
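
The separation of concerns described above might look like the following Java sketch. All types here are hypothetical stand-ins declared locally (`ToyCatalog`, `TruncateCommand`, and the two toy tables are not Spark classes). The catalog only resolves a name to a table; whether the table supports truncation is checked afterwards on the table itself, so "table not found" and "operation not supported" are reported from two distinct places.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in types for illustration only; not Spark's real API.
interface Table {
  String name();
}

interface TruncatableTable extends Table {
  boolean truncateTable();
}

final class NoSuchTableException extends RuntimeException {
  NoSuchTableException(String name) {
    super("Table not found: " + name);
  }
}

// The catalog's only job here is to resolve a name to a table.
final class ToyCatalog {
  private final Map<String, Table> tables = new HashMap<>();

  void register(Table t) {
    tables.put(t.name(), t);
  }

  Table loadTable(String name) {
    Table t = tables.get(name);
    if (t == null) {
      throw new NoSuchTableException(name);
    }
    return t;
  }
}

final class ReadOnlyTable implements Table {
  @Override
  public String name() {
    return "ro";
  }
}

final class ClearableTable implements TruncatableTable {
  boolean truncated = false;

  @Override
  public String name() {
    return "rw";
  }

  @Override
  public boolean truncateTable() {
    truncated = true;
    return true;
  }
}

final class TruncateCommand {
  // Engine-side logic: a missing table and a missing capability fail separately.
  static boolean run(ToyCatalog catalog, String name) {
    Table table = catalog.loadTable(name); // throws NoSuchTableException if absent
    if (!(table instanceof TruncatableTable)) {
      throw new UnsupportedOperationException(name + " does not support truncation");
    }
    return ((TruncatableTable) table).truncateTable();
  }
}
```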




[GitHub] [spark] MaxGekk commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r573538823



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.catalog;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * Represents a table which can be atomically truncated.
+ */
+@Evolving
+public interface TruncatableTable extends Table {

Review comment:
       If we name this interface `SupportsAtomicTruncate`, someone may guess that `SupportsTruncate` is its opposite (meaning non-atomic truncate), or that it covers both atomic and non-atomic truncate. But actually, those two interfaces are orthogonal in some sense.






[GitHub] [spark] rdblue commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r573910127



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.catalog;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * Represents a table which can be atomically truncated.
+ */
+@Evolving
+public interface TruncatableTable extends Table {

Review comment:
       I don't think that adding `Atomic` to the name is a good idea. The other operations don't specify whether an operation is atomic and I don't think that this should necessarily either. If I understand correctly, the purpose of this is to allow using `TRUNCATE` for JDBC or similar optimizations. That's more likely to be atomic, but may not be. It looks like the Hive implementation would not be because the partitions are kept.






[GitHub] [spark] rdblue commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r573907465



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala
##########
@@ -51,7 +51,7 @@ class InMemoryTable(
     val distribution: Distribution = Distributions.unspecified(),
     val ordering: Array[SortOrder] = Array.empty)
   extends Table with SupportsRead with SupportsWrite with SupportsDelete
-      with SupportsMetadataColumns {
+      with SupportsMetadataColumns with TruncatableTable {

Review comment:
       Truncate is equivalent to `deleteWhere(true)`. Why would that not be equivalent?






[GitHub] [spark] SparkQA removed a comment on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-781231945


   **[Test build #135225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135225/testReport)** for PR 31475 at commit [`b758060`](https://github.com/apache/spark/commit/b758060948cdb73e939eeb8ca923bc00caf111dd).




[GitHub] [spark] SparkQA commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-777374418


   **[Test build #135112 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135112/testReport)** for PR 31475 at commit [`5af950a`](https://github.com/apache/spark/commit/5af950afa6f4f89269a2e6aea2d62b48d5f818d3).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] MaxGekk commented on pull request #31475: [SPARK-34360][SQL] Support table truncation by v2 Table Catalogs

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-773557994


   @cloud-fan @HyukjinKwon Could you review this, please?




[GitHub] [spark] SparkQA commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-778083724


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39702/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r577614309



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsDelete.java
##########
@@ -68,4 +69,15 @@ default boolean canDeleteWhere(Filter[] filters) {
    * @throws IllegalArgumentException If the delete is rejected due to required effort
    */
   void deleteWhere(Filter[] filters);
+
+  Filter[] ALWAYS_TRUE_FILTER = new Filter[] { new AlwaysTrue() };

Review comment:
       This becomes a public API as well. Shall we put it in an internal object like `CatalogV2Util`?
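
The concern follows from a Java language rule: every field declared in an interface is implicitly `public static final`. The sketch below checks this with reflection. `Filter`, `AlwaysTrue`, `SupportsDeleteSketch`, `InternalFilters` and `VisibilityCheck` are hypothetical names used for illustration only; `InternalFilters` stands in for some internal utility holder and is not a Spark class.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Stand-in types for illustration only; not Spark's real classes.
interface Filter {}

final class AlwaysTrue implements Filter {}

interface SupportsDeleteSketch {
  // Implicitly public static final: visible to every caller and implementer.
  Filter[] ALWAYS_TRUE_FILTER = new Filter[] { new AlwaysTrue() };
}

// Package-private holder: the same constant without widening the public surface.
final class InternalFilters {
  static final Filter[] ALWAYS_TRUE_FILTER = new Filter[] { new AlwaysTrue() };

  private InternalFilters() {}
}

final class VisibilityCheck {
  // Reports whether the named field of the given type carries the public modifier.
  static boolean isPublicField(Class<?> owner, String name) {
    try {
      Field f = owner.getDeclaredField(name);
      return Modifier.isPublic(f.getModifiers());
    } catch (NoSuchFieldException e) {
      throw new IllegalArgumentException("no such field: " + name, e);
    }
  }
}
```

Calling `VisibilityCheck.isPublicField(SupportsDeleteSketch.class, "ALWAYS_TRUE_FILTER")` reports the interface constant as public, while the same call on `InternalFilters.class` does not. Creating the array at the call site, as the later comments in this thread settle on, avoids the constant altogether.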






[GitHub] [spark] SparkQA commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-777042665


   **[Test build #135103 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135103/testReport)** for PR 31475 at commit [`bcc01e1`](https://github.com/apache/spark/commit/bcc01e17ab3bd829292b4855ff0e9b418ec5b887).
    * This patch passes all tests.
    * This patch **does not merge cleanly**.
    * This patch adds no public classes.




[GitHub] [spark] MaxGekk commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r573528484



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.catalog;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * Represents a table which can be atomically truncated.
+ */
+@Evolving
+public interface TruncatableTable extends Table {
+  /**
+   * Truncate a table by removing all rows from the table atomically.
+   * If the table supports partitions, the method removes all existing partitions.

Review comment:
       But in this PR, I drop empty partitions as well. The question is: should v2 `TRUNCATE TABLE` be aligned with the v1 implementation and preserve empty partitions? I guess the answer is yes.






[GitHub] [spark] SparkQA commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-781264295


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39807/
   




[GitHub] [spark] MaxGekk commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r573518309



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.catalog;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * Represents a table which can be atomically truncated.
+ */
+@Evolving
+public interface TruncatableTable extends Table {
+  /**
+   * Truncate a table by removing all rows from the table atomically.
+   * If the table supports partitions, the method removes all existing partitions.

Review comment:
       > Is there a compelling reason to force this behavior?
   
   Yes, I wanted to highlight that it can remove not only all rows but also partitions, to align with Spark's v1 (and Hive) `TRUNCATE TABLE`. In fact, though, the v1 command doesn't drop partitions:
   ```sql
   spark-sql> CREATE TABLE tbl (col0 INT) PARTITIONED BY (part INT);
   spark-sql> INSERT INTO tbl PARTITION (part=0) SELECT 0;
   spark-sql> ALTER TABLE tbl ADD PARTITION (part=1);
   spark-sql> SHOW PARTITIONS tbl;
   part=0
   part=1
   spark-sql> SELECT * FROM tbl;
   0	0
   spark-sql> TRUNCATE TABLE tbl;
   spark-sql> SHOW PARTITIONS tbl;
   part=0
   part=1
   spark-sql> SELECT * FROM tbl;
   spark-sql>
   ```
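
The two candidate semantics can be contrasted with a toy Java model. This is only a sketch under assumptions: `PartitionedToyTable` and its methods are hypothetical, partitions are keyed by a single integer as in the transcript, and nothing here reflects how Spark stores partitions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model for illustration only; not Spark's implementation.
final class PartitionedToyTable {
  // Partition value -> rows stored in that partition.
  private final Map<Integer, List<Integer>> partitions = new TreeMap<>();

  void addPartition(int part) {
    partitions.putIfAbsent(part, new ArrayList<>());
  }

  void insert(int part, int value) {
    partitions.computeIfAbsent(part, p -> new ArrayList<>()).add(value);
  }

  // Behaviour seen in the transcript: rows are removed, partitions stay.
  void truncateKeepingPartitions() {
    for (List<Integer> rows : partitions.values()) {
      rows.clear();
    }
  }

  // Alternative raised in the review: the partitions are dropped as well.
  void truncateDroppingPartitions() {
    partitions.clear();
  }

  List<Integer> showPartitions() {
    return new ArrayList<>(partitions.keySet());
  }

  int rowCount() {
    int n = 0;
    for (List<Integer> rows : partitions.values()) {
      n += rows.size();
    }
    return n;
  }
}
```

Replaying the transcript against this model (insert into `part=0`, add an empty `part=1`, then truncate) leaves both partitions listed with `truncateKeepingPartitions()` and none with `truncateDroppingPartitions()`.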






[GitHub] [spark] MaxGekk commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r573521785



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.catalog;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * Represents a table which can be atomically truncated.
+ */
+@Evolving
+public interface TruncatableTable extends Table {
+  /**
+   * Truncate a table by removing all rows from the table atomically.
+   * If the table supports partitions, the method removes all existing partitions.
+   *
+   * @return true if a table was truncated successfully otherwise false
+   *
+   * @since 3.2.0
+   */
+  boolean truncateTable();

Review comment:
       I just followed the (maybe informal) naming convention of other interfaces. For instance, if a table implements `TruncatableTable`, `SupportsPartitionManagement` and `SupportsAtomicPartitionManagement`, we have 3 methods in the same namespace:
   - truncatePartition()
   - truncatePartitions()
   - and truncate()
   
   In that case, maybe it is better to name this method `truncateTable()`, which highlights that it applies to the entire table.
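
To make the naming point concrete, here is a small Java sketch of one class exposing all three methods. The interfaces are simplified stand-ins declared locally: partition identifiers are plain strings and each interface has a single method, both of which are assumptions for illustration rather than Spark's real signatures. `ToyPartitionedTable` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in interfaces for illustration only; string identifiers are an assumption.
interface TruncatableTable {
  boolean truncateTable();
}

interface SupportsPartitionManagement {
  boolean truncatePartition(String ident);
}

interface SupportsAtomicPartitionManagement extends SupportsPartitionManagement {
  boolean truncatePartitions(String[] idents);
}

final class ToyPartitionedTable
    implements TruncatableTable, SupportsAtomicPartitionManagement {
  // Partition identifier -> number of rows in that partition.
  final Map<String, Integer> rowCounts = new HashMap<>();

  @Override
  public boolean truncatePartition(String ident) {
    // Succeeds only for a known partition and resets its row count.
    return rowCounts.replace(ident, 0) != null;
  }

  @Override
  public boolean truncatePartitions(String[] idents) {
    // All-or-nothing: refuse if any partition is unknown.
    for (String id : idents) {
      if (!rowCounts.containsKey(id)) {
        return false;
      }
    }
    for (String id : idents) {
      rowCounts.put(id, 0);
    }
    return true;
  }

  @Override
  public boolean truncateTable() {
    // Whole-table scope: every partition is emptied.
    rowCounts.replaceAll((id, n) -> 0);
    return true;
  }
}
```

With the three methods side by side on one type, the `Table` suffix is what distinguishes the whole-table operation from the partition-scoped ones at a call site.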
   






[GitHub] [spark] AmplabJenkins commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-776834995


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135098/
   




[GitHub] [spark] SparkQA commented on pull request #31475: [SPARK-34360][SQL] Support table truncation by v2 Table Catalogs

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-773556511


   **[Test build #134876 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134876/testReport)** for PR 31475 at commit [`b77a210`](https://github.com/apache/spark/commit/b77a2103356ffcf5ecaf34ec55d4d471c50a4edf).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-777384077


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135112/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-777276880


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39694/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-778115631


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39702/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-777235749


   **[Test build #135112 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135112/testReport)** for PR 31475 at commit [`5af950a`](https://github.com/apache/spark/commit/5af950afa6f4f89269a2e6aea2d62b48d5f818d3).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-776673234


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39680/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-781403519


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135225/
   




[GitHub] [spark] rdblue commented on pull request #31475: [SPARK-34360][SQL] Support table truncation by v2 Table Catalogs

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-775318833


   @MaxGekk, why is this necessary instead of deleting from the table or overwriting everything with no new records? I don't see a good reason to do this, especially at the catalog level instead of the table level. Introducing new ways to do something that is already possible over-complicates the API and is a step in the wrong direction.
   
   Please consider this a -1 until we come to consensus -- I may support it in the end, but I don't want anyone choosing to commit despite disagreement in the meantime.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-781403519


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135225/
   




[GitHub] [spark] rdblue commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r573188214



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.catalog;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * Represents a table which can be atomically truncated.
+ */
+@Evolving
+public interface TruncatableTable extends Table {
+  /**
+   * Truncate a table by removing all rows from the table atomically.
+   * If the table supports partitions, the method removes all existing partitions.

Review comment:
       Why is this required? A table may support empty partitions that exist independent of rows, like Hive tables. Is there a compelling reason to force this behavior?
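
To make the two possible semantics concrete, here is a minimal Java sketch of a partitioned table in which partitions exist independently of rows. Every name in it (`PartitionedTableSketch`, `addPartition`, `truncateRows`, `truncateAndDropPartitions`) is hypothetical and not part of Spark's API; it only illustrates the difference between emptying partitions and removing them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical stand-in for a partitioned table; not a Spark class.
class PartitionedTableSketch {
  // Partition name -> rows stored in that partition. An empty list models a
  // partition that exists without any rows.
  private final Map<String, List<String>> partitions = new TreeMap<>();

  void addPartition(String name) {
    partitions.putIfAbsent(name, new ArrayList<>());
  }

  void insert(String partition, String row) {
    addPartition(partition);
    partitions.get(partition).add(row);
  }

  // Semantics 1: remove all rows but keep the partition metadata.
  void truncateRows() {
    partitions.values().forEach(List::clear);
  }

  // Semantics 2: remove all rows and all partitions.
  void truncateAndDropPartitions() {
    partitions.clear();
  }

  long rowCount() {
    return partitions.values().stream().mapToLong(List::size).sum();
  }

  Set<String> partitionNames() {
    return partitions.keySet();
  }

  public static void main(String[] args) {
    PartitionedTableSketch table = new PartitionedTableSketch();
    table.insert("p=1", "row-a");
    table.addPartition("p=2"); // an empty partition

    table.truncateRows();
    System.out.println(table.rowCount() + " rows, partitions " + table.partitionNames());

    table.truncateAndDropPartitions();
    System.out.println(table.rowCount() + " rows, partitions " + table.partitionNames());
  }
}
```

Under the first semantics both partitions survive the truncation; under the second none do. The Javadoc under review mandates the second, which is the behavior being questioned here.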






[GitHub] [spark] HyukjinKwon commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r578039193



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsDelete.java
##########
@@ -68,4 +69,15 @@ default boolean canDeleteWhere(Filter[] filters) {
    * @throws IllegalArgumentException If the delete is rejected due to required effort
    */
   void deleteWhere(Filter[] filters);
+
+  Filter[] ALWAYS_TRUE_FILTER = new Filter[] { new AlwaysTrue() };

Review comment:
       Can't people use `AlwaysTrue` directly if they want? I would also prefer to avoid having multiple ways to do the same thing. It looks odd to expose an `AlwaysTrue` predicate constant as an API under the `SupportsDelete` interface.
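
As an illustration of the alternative, the following Java sketch builds the always-true filter inside a default method instead of exposing it as a public constant. The types `RowFilter`, `AlwaysTrueFilter`, `DeletableTable`, and `InMemoryRows` are hypothetical stand-ins defined here so the example is self-contained; they are not Spark's `Filter`, `AlwaysTrue`, or `SupportsDelete`, and this is only one way the idea might be expressed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class DeleteAsTruncateSketch {
  // Hypothetical stand-in for a row filter; not Spark's Filter class.
  interface RowFilter extends Predicate<String> {}

  // Hypothetical stand-in for an always-true predicate.
  static final class AlwaysTrueFilter implements RowFilter {
    @Override
    public boolean test(String row) {
      return true;
    }
  }

  // Hypothetical delete-capable table.
  interface DeletableTable {
    void deleteWhere(RowFilter[] filters);

    // Truncation expressed through the existing delete path; the filter array
    // is created locally, so no extra constant becomes part of the API.
    default void truncate() {
      deleteWhere(new RowFilter[] { new AlwaysTrueFilter() });
    }
  }

  static final class InMemoryRows implements DeletableTable {
    final List<String> rows = new ArrayList<>();

    @Override
    public void deleteWhere(RowFilter[] filters) {
      // A row is deleted only if it matches every filter.
      rows.removeIf(row -> Arrays.stream(filters).allMatch(f -> f.test(row)));
    }
  }

  public static void main(String[] args) {
    InMemoryRows table = new InMemoryRows();
    table.rows.addAll(Arrays.asList("a", "b", "c"));
    table.truncate();
    System.out.println("rows left: " + table.rows.size());
  }
}
```

Implementations that want a different truncation strategy could still override `truncate()`, while callers never need to see the filter array.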






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-777276880


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39694/
   




[GitHub] [spark] SparkQA commented on pull request #31475: [SPARK-34360][SQL] Support table truncation by v2 Table Catalogs

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-773415971


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39464/
   




[GitHub] [spark] SparkQA commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-778180829


   **[Test build #135121 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135121/testReport)** for PR 31475 at commit [`d1e5a18`](https://github.com/apache/spark/commit/d1e5a18066f9fb2ff0ca1504e7c3f0802905febd).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-777384077


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135112/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31475: [SPARK-34360][SQL] Support table truncation by v2 Table Catalogs

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-773379588


   **[Test build #134876 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134876/testReport)** for PR 31475 at commit [`b77a210`](https://github.com/apache/spark/commit/b77a2103356ffcf5ecaf34ec55d4d471c50a4edf).




[GitHub] [spark] AmplabJenkins commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-777008303


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39685/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31475: [SPARK-34360][SQL] Support table truncation by v2 Table Catalogs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-773460642


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39464/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-776618014


   **[Test build #135098 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135098/testReport)** for PR 31475 at commit [`2741a40`](https://github.com/apache/spark/commit/2741a401c61d68e7bee61ec17eb6a74cefe3c6cf).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-781264320


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39807/
   




[GitHub] [spark] SparkQA commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-776317046


   **[Test build #135081 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135081/testReport)** for PR 31475 at commit [`141919c`](https://github.com/apache/spark/commit/141919c6543ee01d41dc02a14cd04df9edb7f6f9).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] rdblue commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r573200376



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.catalog;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * Represents a table which can be atomically truncated.
+ */
+@Evolving
+public interface TruncatableTable extends Table {

Review comment:
       All of the operations should be atomic, right? Certainly, that's the expectation of a write that truncates and then appends data.






[GitHub] [spark] AmplabJenkins commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-776221082


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39663/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31475: [SPARK-34360][SQL] Support table truncation by v2 Table Catalogs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-773570066


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134876/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-776317964


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135081/
   




[GitHub] [spark] SparkQA commented on pull request #31475: [SPARK-34360][SQL] Support table truncation by v2 Table Catalogs

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-773379588


   **[Test build #134876 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134876/testReport)** for PR 31475 at commit [`b77a210`](https://github.com/apache/spark/commit/b77a2103356ffcf5ecaf34ec55d4d471c50a4edf).




[GitHub] [spark] SparkQA commented on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-776218193


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39663/
   




[GitHub] [spark] MaxGekk commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r573829234



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.catalog;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * Represents a table which can be atomically truncated.
+ */
+@Evolving
+public interface TruncatableTable extends Table {
+  /**
+   * Truncate a table by removing all rows from the table atomically.
+   * If the table supports partitions, the method removes all existing partitions.

Review comment:
       Surprisingly, we don't have any tests that check whether partitions still exist after an entire table is truncated. I opened https://github.com/apache/spark/pull/31544 with such checks for tables from the v1 In-Memory and Hive external catalogs.






[GitHub] [spark] MaxGekk commented on pull request #31475: [SPARK-34360][SQL] Support table truncation by v2 Table Catalogs

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-773557994


   @cloud-fan @HyukjinKwon Could you review this, please?




[GitHub] [spark] MaxGekk commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r573532339



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala
##########
@@ -51,7 +51,7 @@ class InMemoryTable(
     val distribution: Distribution = Distributions.unspecified(),
     val ordering: Array[SortOrder] = Array.empty)
   extends Table with SupportsRead with SupportsWrite with SupportsDelete
-      with SupportsMetadataColumns {
+      with SupportsMetadataColumns with TruncatableTable {

Review comment:
       That works only if you are sure that any implementation supporting `SupportsDelete` must support `TruncatableTable` too, and I have some doubts about that. I can imagine an implementation that can delete rows by a filter but cannot guarantee atomic truncation.
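
A small Java sketch of that concern, with the two capabilities modeled as separate interfaces. All names (`Deletable`, `AtomicallyTruncatable`, `RowByRowTable`, `SnapshotTable`, `truncateIfSupported`) are hypothetical and chosen for illustration; none of them is a Spark type.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Predicate;

class CapabilitySketch {
  // Hypothetical capability: rows can be deleted by a predicate.
  interface Deletable {
    void deleteWhere(Predicate<String> condition);
  }

  // Hypothetical capability: the whole table can be emptied in one atomic step.
  interface AtomicallyTruncatable {
    void truncate();
  }

  // Deletes one row at a time, so a concurrent reader could observe a
  // half-deleted table. It therefore does not claim AtomicallyTruncatable.
  static final class RowByRowTable implements Deletable {
    final List<String> rows = new CopyOnWriteArrayList<>();

    @Override
    public void deleteWhere(Predicate<String> condition) {
      for (String row : rows) {
        if (condition.test(row)) {
          rows.remove(row);
        }
      }
    }
  }

  // Keeps an immutable snapshot and swaps it, so truncation is a single step.
  static final class SnapshotTable implements Deletable, AtomicallyTruncatable {
    final AtomicReference<List<String>> snapshot =
        new AtomicReference<>(Collections.emptyList());

    void append(String row) {
      snapshot.updateAndGet(old -> {
        List<String> next = new ArrayList<>(old);
        next.add(row);
        return Collections.unmodifiableList(next);
      });
    }

    @Override
    public void deleteWhere(Predicate<String> condition) {
      snapshot.updateAndGet(old -> {
        List<String> next = new ArrayList<>(old);
        next.removeIf(condition);
        return Collections.unmodifiableList(next);
      });
    }

    @Override
    public void truncate() {
      snapshot.set(Collections.emptyList());
    }
  }

  // Truncates only when the table declares the atomic capability.
  static boolean truncateIfSupported(Object table) {
    if (table instanceof AtomicallyTruncatable) {
      ((AtomicallyTruncatable) table).truncate();
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    SnapshotTable atomic = new SnapshotTable();
    atomic.append("a");
    System.out.println("snapshot table truncated: " + truncateIfSupported(atomic));

    RowByRowTable rowByRow = new RowByRowTable();
    rowByRow.rows.add("a");
    System.out.println("row-by-row table truncated: " + truncateIfSupported(rowByRow));
  }
}
```

Keeping the interfaces separate lets a caller reject or fall back for tables like `RowByRowTable`, at the cost of a second mix-in that every atomic implementation has to declare.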








[GitHub] [spark] AmplabJenkins removed a comment on pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-777048542


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135103/
   




[GitHub] [spark] MaxGekk commented on a change in pull request #31475: [SPARK-34360][SQL] Support truncation of v2 tables

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #31475:
URL: https://github.com/apache/spark/pull/31475#discussion_r573518309



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.catalog;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * Represents a table which can be atomically truncated.
+ */
+@Evolving
+public interface TruncatableTable extends Table {
+  /**
+   * Truncate a table by removing all rows from the table atomically.
+   * If the table supports partitions, the method removes all existing partitions.

Review comment:
       > Is there a compelling reason to force this behavior?
   
   Yes, I wanted to highlight that it can remove not only all rows but also partitions, to align with Spark's v1 (and Hive) `TRUNCATE TABLE`. In fact, though, the command doesn't drop partitions:
   ```sql
   spark-sql> CREATE TABLE tbl (col0 INT) PARTITIONED BY (part INT);
   spark-sql> INSERT INTO tbl PARTITION (part=0) SELECT 0;
   spark-sql> ALTER TABLE tbl ADD PARTITION (part=1);
   spark-sql> SHOW PARTITIONS tbl;
   part=0
   part=1
   spark-sql> SELECT * FROM tbl;
   0	0
   spark-sql> TRUNCATE TABLE tbl;
   spark-sql> SHOW PARTITIONS tbl;
   part=0
   part=1
   spark-sql> SELECT * FROM tbl;
   spark-sql>
   ```
   Let me remove the sentence.
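
   The behaviour in the session above can be modelled roughly as follows. This is only an illustrative in-memory sketch: `PartitionedTableSketch` and its methods are hypothetical names, not code from this PR or from Spark's catalog. Truncation clears the rows of every partition and leaves the partition entries in place:

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Map;
   import java.util.Set;
   import java.util.TreeMap;

   // Hypothetical in-memory model of a table partitioned by one INT column.
   final class PartitionedTableSketch {
     // Partition value -> rows (col0 values) stored in that partition.
     private final Map<Integer, List<Integer>> partitions = new TreeMap<>();

     // Mirrors ALTER TABLE ... ADD PARTITION: registers an empty partition.
     void addPartition(int part) {
       partitions.putIfAbsent(part, new ArrayList<>());
     }

     // Mirrors INSERT INTO ... PARTITION: creates the partition if needed.
     void insert(int part, int col0) {
       partitions.computeIfAbsent(part, p -> new ArrayList<>()).add(col0);
     }

     // Removes every row but keeps the partition entries themselves.
     void truncate() {
       partitions.values().forEach(List::clear);
     }

     Set<Integer> showPartitions() {
       return partitions.keySet();
     }

     int rowCount() {
       return partitions.values().stream().mapToInt(List::size).sum();
     }
   }
   ```

   After `insert(0, 0)`, `addPartition(1)` and `truncate()`, the sketch still lists partitions 0 and 1 while holding no rows, which matches the `SHOW PARTITIONS` and `SELECT` output above.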



