You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "gengliangwang (via GitHub)" <gi...@apache.org> on 2023/02/17 00:45:41 UTC

[GitHub] [spark] gengliangwang commented on a diff in pull request #40049: [SPARK-42398][SQL] Refine default column value DS v2 interface

gengliangwang commented on code in PR #40049:
URL: https://github.com/apache/spark/pull/40049#discussion_r1109103650


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/Column.java:
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.catalog;
+
+import java.util.Map;
+import javax.annotation.Nullable;
+
+import org.apache.spark.sql.connector.expressions.Transform;
+import org.apache.spark.sql.internal.connector.ColumnImpl;
+import org.apache.spark.sql.types.DataType;
+
+/**
+ * An interface representing a column of a {@link Table}. It defines basic properties of a column,
+ * such as name and data type, as well as some advanced ones like default column value.
+ * <p>
+ * Data Sources do not need to implement it. They should consume it in APIs like
+ * {@link TableCatalog#createTable(Identifier, Column[], Transform[], Map)}, and report it in
+ * {@link Table#columns()} by calling the static {@code create} functions of this interface to
+ * create it.
+ */
+public interface Column {

Review Comment:
   Mark it as `@Evolving`?



##########
sql/core/src/test/scala/org/apache/spark/sql/connector/TestV2SessionCatalogBase.scala:
##########
@@ -64,6 +64,15 @@ private[connector] trait TestV2SessionCatalogBase[T <: Table] extends Delegating
     }
   }
 
+  override def createTable(
+      ident: Identifier,
+      columns: Array[Column],
+      partitions: Array[Transform],
+      properties: java.util.Map[String, String]): Table = {
+    createTable(ident, CatalogV2Util.v2ColumnsToStructType(columns), partitions, properties)
+  }
+
+  // TODO: remove it when no tests calling this deprecated method.

Review Comment:
   Is there a follow-up ticket for this?



##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ColumnDefaultValue.java:
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.catalog;
+
+import java.util.Objects;
+import javax.annotation.Nonnull;
+
+import org.apache.spark.sql.connector.expressions.Literal;
+
+/**
+ * A class representing the default value of a column. It contains both the SQL string and literal
+ * value of the user-specified default value expression. The SQL string should be re-evaluated for
+ * each table writing command, which may produce different values if the default value expression is
+ * something like {@code CURRENT_DATE()}. The literal value is used to back-fill existing data if
+ * new columns with default value are added. Note: the back-fill can be lazy. The data sources can
+ * remember the column default value and let the reader fill the column value when reading existing
+ * data that do not have these new columns.
+ */
+public class ColumnDefaultValue {
+  private String sql;
+  private Literal<?> value;

Review Comment:
   The naming `value` is confusing. Shall we rename it as `initialValue`? Or change `sql` and `value` as `currentDefault` and `existingDefault`.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -3108,7 +3108,7 @@ object SQLConf {
         "provided values when the corresponding fields are not present in storage.")
       .version("3.4.0")
       .stringConf
-      .createWithDefault("csv,json,orc,parquet")
+      .createWithDefault("csv,json,orc,parquet,hive")

Review Comment:
   +1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org