Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/22 01:45:15 UTC

[GitHub] [spark] desmondcheongzx opened a new pull request, #38750: Refactor by introducing physical types

desmondcheongzx opened a new pull request, #38750:
URL: https://github.com/apache/spark/pull/38750

   ### What changes were proposed in this pull request?
   
   Refactor case matching for Spark types by introducing physical types. Since multiple logical types map to the same physical type (e.g. `DateType` and `YearMonthIntervalType` are both implemented using an integer), we can match on the physical type rather than enumerating every logical type.
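   
   A minimal, self-contained sketch of the idea (illustrative stand-ins only; the real Catalyst `DataType` and `PhysicalDataType` hierarchies are richer than this):
   
   ```scala
   // Simplified stand-ins for the real Catalyst types, for illustration only.
   sealed trait PhysicalDataType
   case object PhysicalIntegerType extends PhysicalDataType
   case object PhysicalLongType extends PhysicalDataType
   
   sealed trait LogicalType { def physicalDataType: PhysicalDataType }
   case object IntegerType extends LogicalType { def physicalDataType = PhysicalIntegerType }
   case object DateType extends LogicalType { def physicalDataType = PhysicalIntegerType }
   case object TimestampType extends LogicalType { def physicalDataType = PhysicalLongType }
   
   object PhysicalTypeSketch {
     // Before: case IntegerType | DateType | ... => JAVA_INT (one alternative per logical type)
     // After: one case per physical representation.
     def javaType(dt: LogicalType): String = dt.physicalDataType match {
       case PhysicalIntegerType => "int"
       case PhysicalLongType    => "long"
     }
   }
   ```
   
   In the PR itself the dispatch goes through `dt.physicalDataType`, as seen in the `CodeGenerator.javaType` and `InternalRow.getAccessor` diffs in the review thread below.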
   
   ### Why are the changes needed?
   
   These changes simplify the Spark type system: code that previously had to enumerate every logical type sharing the same in-memory representation can now match on a single physical type.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Since this change is a refactor of existing code, we rely on the existing tests.


[GitHub] [spark] cloud-fan closed pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types
URL: https://github.com/apache/spark/pull/38750


[GitHub] [spark] cloud-fan commented on pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #38750:
URL: https://github.com/apache/spark/pull/38750#issuecomment-1325955422

   Nice refactor! We should have done this earlier, before adding the ANSI interval types and timestamp NTZ. Now we can have more confidence in these new data types.


[GitHub] [spark] LuciferYang commented on pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #38750:
URL: https://github.com/apache/spark/pull/38750#issuecomment-1334713994

   @cloud-fan @desmondcheongzx is the following scenario currently unsuitable for using physical types:
   
   https://github.com/apache/spark/blob/3fc8a90267312fdfc42b7e6a16ee69f2507ef56b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java#L170-L172


[GitHub] [spark] cloud-fan commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38750:
URL: https://github.com/apache/spark/pull/38750#discussion_r1031052882


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala:
##########
@@ -1901,24 +1904,27 @@ object CodeGenerator extends Logging {
    * Returns the Java type for a DataType.
    */
   def javaType(dt: DataType): String = dt match {
-    case BooleanType => JAVA_BOOLEAN
-    case ByteType => JAVA_BYTE
-    case ShortType => JAVA_SHORT
-    case IntegerType | DateType | _: YearMonthIntervalType => JAVA_INT
-    case LongType | TimestampType | TimestampNTZType | _: DayTimeIntervalType => JAVA_LONG
-    case FloatType => JAVA_FLOAT
-    case DoubleType => JAVA_DOUBLE
-    case _: DecimalType => "Decimal"
-    case BinaryType => "byte[]"
-    case StringType => "UTF8String"
-    case CalendarIntervalType => "CalendarInterval"
-    case _: StructType => "InternalRow"
-    case _: ArrayType => "ArrayData"
-    case _: MapType => "MapData"
     case udt: UserDefinedType[_] => javaType(udt.sqlType)
-    case ObjectType(cls) if cls.isArray => s"${javaType(ObjectType(cls.getComponentType))}[]"
-    case ObjectType(cls) => cls.getName
-    case _ => "Object"
+    case _ => dt.physicalDataType match {

Review Comment:
   nit:
   ```
   dt.physicalDataType match {
     ...
   }
   ```



[GitHub] [spark] desmondcheongzx commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
desmondcheongzx commented on code in PR #38750:
URL: https://github.com/apache/spark/pull/38750#discussion_r1035431998


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala:
##########
@@ -214,6 +215,7 @@ object RowEncoder {
       } else {
         nonNullOutput
       }
+    case _ => inputObject

Review Comment:
   Done



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala:
##########
@@ -253,13 +255,16 @@ object RowEncoder {
       }
     case _: DayTimeIntervalType => ObjectType(classOf[java.time.Duration])
     case _: YearMonthIntervalType => ObjectType(classOf[java.time.Period])
-    case _: DecimalType => ObjectType(classOf[java.math.BigDecimal])
-    case StringType => ObjectType(classOf[java.lang.String])
-    case _: ArrayType => ObjectType(classOf[scala.collection.Seq[_]])
-    case _: MapType => ObjectType(classOf[scala.collection.Map[_, _]])
-    case _: StructType => ObjectType(classOf[Row])
     case p: PythonUserDefinedType => externalDataTypeFor(p.sqlType)
     case udt: UserDefinedType[_] => ObjectType(udt.userClass)
+    case _ => dt.physicalDataType match {
+      case _: PhysicalArrayType => ObjectType(classOf[scala.collection.Seq[_]])
+      case _: PhysicalDecimalType => ObjectType(classOf[java.math.BigDecimal])
+      case _: PhysicalMapType => ObjectType(classOf[scala.collection.Map[_, _]])
+      case _: PhysicalStringType => ObjectType(classOf[java.lang.String])
+      case _: PhysicalStructType => ObjectType(classOf[Row])
+      case _ => dt

Review Comment:
   Done



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala:
##########
@@ -358,6 +363,7 @@ object RowEncoder {
       If(IsNull(input),
         Literal.create(null, externalDataTypeFor(input.dataType)),
         CreateExternalRow(convertedFields, schema))
+    case _ => input

Review Comment:
   Done



[GitHub] [spark] cloud-fan commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38750:
URL: https://github.com/apache/spark/pull/38750#discussion_r1031054150


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.types
+
+import org.apache.spark.sql.types._
+
+sealed abstract class PhysicalDataType {}
+
+case class PhysicalArrayType(elementType: DataType, containsNull: Boolean)
+    extends PhysicalDataType {}
+
+case class PhysicalBinaryType() extends PhysicalDataType {}

Review Comment:
   should these be Scala `object`s?



[GitHub] [spark] cloud-fan commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38750:
URL: https://github.com/apache/spark/pull/38750#discussion_r1031053475


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.types
+
+import org.apache.spark.sql.types._
+
+sealed abstract class PhysicalDataType {}

Review Comment:
   nit: in Scala we can omit the `{}` when the body is empty.



[GitHub] [spark] cloud-fan commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38750:
URL: https://github.com/apache/spark/pull/38750#discussion_r1031050991


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala:
##########
@@ -129,24 +130,25 @@ object InternalRow {
    */
   def getAccessor(dt: DataType, nullable: Boolean = true): (SpecializedGetters, Int) => Any = {
     val getValueNullSafe: (SpecializedGetters, Int) => Any = dt match {

Review Comment:
   instead of
   ```
   dt match {
     case _ => dt.physicalDataType match ...
   }
   ```
   we can just do
   ```
   dt.physicalDataType match ...
   ```



[GitHub] [spark] desmondcheongzx commented on pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
desmondcheongzx commented on PR #38750:
URL: https://github.com/apache/spark/pull/38750#issuecomment-1331513899

   Thanks for the suggestions @cloud-fan! I removed `PhysicalObjectType` and added comments to the PR.


[GitHub] [spark] desmondcheongzx commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
desmondcheongzx commented on code in PR #38750:
URL: https://github.com/apache/spark/pull/38750#discussion_r1035432270


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala:
##########
@@ -129,24 +130,25 @@ object InternalRow {
    */
   def getAccessor(dt: DataType, nullable: Boolean = true): (SpecializedGetters, Int) => Any = {
     val getValueNullSafe: (SpecializedGetters, Int) => Any = dt match {

Review Comment:
   Left this here as discussed. This additional pattern match also lets us remove `PhysicalObjectType`, which is nice.



[GitHub] [spark] cloud-fan commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38750:
URL: https://github.com/apache/spark/pull/38750#discussion_r1031051704


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala:
##########
@@ -253,13 +255,16 @@ object RowEncoder {
       }
     case _: DayTimeIntervalType => ObjectType(classOf[java.time.Duration])
     case _: YearMonthIntervalType => ObjectType(classOf[java.time.Period])
-    case _: DecimalType => ObjectType(classOf[java.math.BigDecimal])
-    case StringType => ObjectType(classOf[java.lang.String])
-    case _: ArrayType => ObjectType(classOf[scala.collection.Seq[_]])
-    case _: MapType => ObjectType(classOf[scala.collection.Map[_, _]])
-    case _: StructType => ObjectType(classOf[Row])
     case p: PythonUserDefinedType => externalDataTypeFor(p.sqlType)
     case udt: UserDefinedType[_] => ObjectType(udt.userClass)
+    case _ => dt.physicalDataType match {
+      case _: PhysicalArrayType => ObjectType(classOf[scala.collection.Seq[_]])
+      case _: PhysicalDecimalType => ObjectType(classOf[java.math.BigDecimal])
+      case _: PhysicalMapType => ObjectType(classOf[scala.collection.Map[_, _]])
+      case _: PhysicalStringType => ObjectType(classOf[java.lang.String])
+      case _: PhysicalStructType => ObjectType(classOf[Row])
+      case _ => dt

Review Comment:
   ditto, add some comments



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala:
##########
@@ -358,6 +363,7 @@ object RowEncoder {
       If(IsNull(input),
         Literal.create(null, externalDataTypeFor(input.dataType)),
         CreateExternalRow(convertedFields, schema))
+    case _ => input

Review Comment:
   ditto



[GitHub] [spark] desmondcheongzx commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
desmondcheongzx commented on code in PR #38750:
URL: https://github.com/apache/spark/pull/38750#discussion_r1031316383


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.types
+
+import org.apache.spark.sql.types._
+
+sealed abstract class PhysicalDataType {}
+
+case class PhysicalArrayType(elementType: DataType, containsNull: Boolean)
+    extends PhysicalDataType {}
+
+case class PhysicalBinaryType() extends PhysicalDataType {}

Review Comment:
   I might be misunderstanding how Java and Scala classes work, but these were left as `case class` instead of `object` because of the `instanceof` matching in `SpecializedGettersReader.java`, `ColumnarBatchRow.java`, and `ColumnarRow.java`, which needs these types to be classes rather than objects.
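   
   A tiny, hypothetical demo of the distinction (not code from the PR): a Scala `object` compiles to a class named `Name$` reached through a static `MODULE$` field, so Java-side checks would have to be written against `PhysicalBinaryType$`, whereas a `case class` yields an ordinary class that `instanceof` can test directly:
   
   ```scala
   // Hypothetical illustration only; names merely echo the PR.
   object PhysicalBinaryTypeAsObject
   case class PhysicalBinaryTypeAsCaseClass()
   
   object InstanceofDemo {
     def main(args: Array[String]): Unit = {
       // Prints "PhysicalBinaryTypeAsObject$": Java code would need
       // `x instanceof PhysicalBinaryTypeAsObject$` or a comparison with
       // `PhysicalBinaryTypeAsObject$.MODULE$`.
       println(PhysicalBinaryTypeAsObject.getClass.getSimpleName)
       // Prints "PhysicalBinaryTypeAsCaseClass": a plain class, easy to test with `instanceof`.
       println(PhysicalBinaryTypeAsCaseClass().getClass.getSimpleName)
     }
   }
   ```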



[GitHub] [spark] cloud-fan commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38750:
URL: https://github.com/apache/spark/pull/38750#discussion_r1031053828


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.types
+
+import org.apache.spark.sql.types._
+
+sealed abstract class PhysicalDataType {}
+
+case class PhysicalArrayType(elementType: DataType, containsNull: Boolean)
+    extends PhysicalDataType {}
+
+case class PhysicalBinaryType() extends PhysicalDataType {}
+
+case class PhysicalBooleanType() extends PhysicalDataType {}
+
+case class PhysicalByteType() extends PhysicalDataType {}
+
+case class PhysicalCalendarIntervalType() extends PhysicalDataType {}
+
+case class PhysicalDecimalType(precision: Int, scale: Int) extends PhysicalDataType {}
+
+case class PhysicalDoubleType() extends PhysicalDataType {}
+
+case class PhysicalFloatType() extends PhysicalDataType {}
+
+case class PhysicalIntegerType() extends PhysicalDataType {}
+
+case class PhysicalLongType() extends PhysicalDataType {}
+
+case class PhysicalMapType(keyType: DataType, valueType: DataType, valueContainsNull: Boolean)
+    extends PhysicalDataType {}
+
+case class PhysicalNullType() extends PhysicalDataType {}
+
+case class PhysicalObjectType(cls: Class[_]) extends PhysicalDataType {}

Review Comment:
   Do we really need this? I feel it's like `UserDefinedType`, and we should never need its physical type.



[GitHub] [spark] desmondcheongzx commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
desmondcheongzx commented on code in PR #38750:
URL: https://github.com/apache/spark/pull/38750#discussion_r1035206672


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.types
+
+import org.apache.spark.sql.types._
+
+sealed abstract class PhysicalDataType {}
+
+case class PhysicalArrayType(elementType: DataType, containsNull: Boolean)
+    extends PhysicalDataType {}
+
+case class PhysicalBinaryType() extends PhysicalDataType {}
+
+case class PhysicalBooleanType() extends PhysicalDataType {}
+
+case class PhysicalByteType() extends PhysicalDataType {}
+
+case class PhysicalCalendarIntervalType() extends PhysicalDataType {}
+
+case class PhysicalDecimalType(precision: Int, scale: Int) extends PhysicalDataType {}
+
+case class PhysicalDoubleType() extends PhysicalDataType {}
+
+case class PhysicalFloatType() extends PhysicalDataType {}
+
+case class PhysicalIntegerType() extends PhysicalDataType {}
+
+case class PhysicalLongType() extends PhysicalDataType {}
+
+case class PhysicalMapType(keyType: DataType, valueType: DataType, valueContainsNull: Boolean)
+    extends PhysicalDataType {}
+
+case class PhysicalNullType() extends PhysicalDataType {}
+
+case class PhysicalObjectType(cls: Class[_]) extends PhysicalDataType {}

Review Comment:
   Yeah, agreed now that I think about it. `ObjectType` should just be a holder for the Scala type, and logical data types shouldn't be implemented on top of a `PhysicalObjectType`. I'll remove this.



[GitHub] [spark] cloud-fan commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38750:
URL: https://github.com/apache/spark/pull/38750#discussion_r1035869504


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala:
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.types
+
+import org.apache.spark.sql.types._
+
+sealed abstract class PhysicalDataType
+
+case class PhysicalArrayType(elementType: DataType, containsNull: Boolean) extends PhysicalDataType
+
+case class PhysicalBinaryType() extends PhysicalDataType

Review Comment:
   for physical types without parameters, shall we follow the logical types and have both a class and a case object?
   ```
   class LongType private() ...
   case object LongType extends LongType
   ```
   
   The benefit is that it's a singleton, so we save memory. The matching code in Scala can then be
   ```
   if (dt == PhysicalBinaryType)..
   pdt match {
     case PhysicalBinaryType => ...
   }
   ```
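   
   A self-contained sketch of that pattern (hypothetical code, not the PR's actual file): the companion case object is itself an instance of the class, so Scala can match on the singleton while Java-side `instanceof PhysicalBinaryType` keeps working.
   
   ```scala
   // Hypothetical sketch of the class + case object pattern for parameterless physical types.
   sealed abstract class PhysicalDataType
   
   class PhysicalBinaryType private() extends PhysicalDataType
   case object PhysicalBinaryType extends PhysicalBinaryType
   
   object PhysicalTypeMatchDemo {
     // Scala can match the singleton directly...
     def isBinary(pdt: PhysicalDataType): Boolean = pdt match {
       case PhysicalBinaryType => true
       case _ => false
     }
     // ...and since the case object extends the class, Java code can still
     // write `pdt instanceof PhysicalBinaryType`.
   }
   ```
   
   Only one instance of each parameterless physical type ever exists, which is where the memory saving comes from.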



[GitHub] [spark] LuciferYang commented on pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #38750:
URL: https://github.com/apache/spark/pull/38750#issuecomment-1334722571

   OK, let me find them as comprehensively as possible
   
   


[GitHub] [spark] desmondcheongzx commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
desmondcheongzx commented on code in PR #38750:
URL: https://github.com/apache/spark/pull/38750#discussion_r1035432531


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.types
+
+import org.apache.spark.sql.types._
+
+sealed abstract class PhysicalDataType {}
+
+case class PhysicalArrayType(elementType: DataType, containsNull: Boolean)
+    extends PhysicalDataType {}
+
+case class PhysicalBinaryType() extends PhysicalDataType {}
+
+case class PhysicalBooleanType() extends PhysicalDataType {}
+
+case class PhysicalByteType() extends PhysicalDataType {}
+
+case class PhysicalCalendarIntervalType() extends PhysicalDataType {}
+
+case class PhysicalDecimalType(precision: Int, scale: Int) extends PhysicalDataType {}
+
+case class PhysicalDoubleType() extends PhysicalDataType {}
+
+case class PhysicalFloatType() extends PhysicalDataType {}
+
+case class PhysicalIntegerType() extends PhysicalDataType {}
+
+case class PhysicalLongType() extends PhysicalDataType {}
+
+case class PhysicalMapType(keyType: DataType, valueType: DataType, valueContainsNull: Boolean)
+    extends PhysicalDataType {}
+
+case class PhysicalNullType() extends PhysicalDataType {}
+
+case class PhysicalObjectType(cls: Class[_]) extends PhysicalDataType {}

Review Comment:
   Removed `PhysicalObjectType`



[GitHub] [spark] cloud-fan commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38750:
URL: https://github.com/apache/spark/pull/38750#discussion_r1032058561


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala:
##########
@@ -129,24 +130,25 @@ object InternalRow {
    */
   def getAccessor(dt: DataType, nullable: Boolean = true): (SpecializedGetters, Int) => Any = {
     val getValueNullSafe: (SpecializedGetters, Int) => Any = dt match {

Review Comment:
   Ah, if we can remove `PhysicalUserDefinedType`, it's OK to make this pattern match a bit longer.



[GitHub] [spark] desmondcheongzx commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
desmondcheongzx commented on code in PR #38750:
URL: https://github.com/apache/spark/pull/38750#discussion_r1036314217


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala:
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.types
+
+import org.apache.spark.sql.types._
+
+sealed abstract class PhysicalDataType
+
+case class PhysicalArrayType(elementType: DataType, containsNull: Boolean) extends PhysicalDataType
+
+case class PhysicalBinaryType() extends PhysicalDataType

Review Comment:
   Nice suggestion! Just made the changes @cloud-fan 



[GitHub] [spark] cloud-fan commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38750:
URL: https://github.com/apache/spark/pull/38750#discussion_r1032058145


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.types
+
+import org.apache.spark.sql.types._
+
+sealed abstract class PhysicalDataType {}
+
+case class PhysicalArrayType(elementType: DataType, containsNull: Boolean)
+    extends PhysicalDataType {}
+
+case class PhysicalBinaryType() extends PhysicalDataType {}

Review Comment:
   Oh I see, Scala `object`s are not very Java-friendly.



[GitHub] [spark] AmplabJenkins commented on pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #38750:
URL: https://github.com/apache/spark/pull/38750#issuecomment-1324584432

   Can one of the admins verify this patch?


[GitHub] [spark] desmondcheongzx commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
desmondcheongzx commented on code in PR #38750:
URL: https://github.com/apache/spark/pull/38750#discussion_r1031292156


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala:
##########
@@ -129,24 +130,25 @@ object InternalRow {
    */
   def getAccessor(dt: DataType, nullable: Boolean = true): (SpecializedGetters, Int) => Any = {
     val getValueNullSafe: (SpecializedGetters, Int) => Any = dt match {

Review Comment:
   Makes sense. We would have to introduce a `PhysicalUserDefinedType` to remove this layer of indirection, but that is very doable.



[GitHub] [spark] cloud-fan commented on pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #38750:
URL: https://github.com/apache/spark/pull/38750#issuecomment-1333112105

   thanks, merging to master!


[GitHub] [spark] cloud-fan commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38750:
URL: https://github.com/apache/spark/pull/38750#discussion_r1031051302


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala:
##########
@@ -214,6 +215,7 @@ object RowEncoder {
       } else {
         nonNullOutput
       }
+    case _ => inputObject

Review Comment:
   can we add a comment? `For other data types, just return the internal catalyst value as it is.`



[GitHub] [spark] cloud-fan commented on pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #38750:
URL: https://github.com/apache/spark/pull/38750#issuecomment-1325955535

   cc @MaxGekk @gengliangwang 


[GitHub] [spark] desmondcheongzx commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
desmondcheongzx commented on code in PR #38750:
URL: https://github.com/apache/spark/pull/38750#discussion_r1035431902


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.types
+
+import org.apache.spark.sql.types._
+
+sealed abstract class PhysicalDataType {}

Review Comment:
   Done



[GitHub] [spark] desmondcheongzx commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
desmondcheongzx commented on code in PR #38750:
URL: https://github.com/apache/spark/pull/38750#discussion_r1035432421


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala:
##########
@@ -1901,24 +1904,27 @@ object CodeGenerator extends Logging {
    * Returns the Java type for a DataType.
    */
   def javaType(dt: DataType): String = dt match {
-    case BooleanType => JAVA_BOOLEAN
-    case ByteType => JAVA_BYTE
-    case ShortType => JAVA_SHORT
-    case IntegerType | DateType | _: YearMonthIntervalType => JAVA_INT
-    case LongType | TimestampType | TimestampNTZType | _: DayTimeIntervalType => JAVA_LONG
-    case FloatType => JAVA_FLOAT
-    case DoubleType => JAVA_DOUBLE
-    case _: DecimalType => "Decimal"
-    case BinaryType => "byte[]"
-    case StringType => "UTF8String"
-    case CalendarIntervalType => "CalendarInterval"
-    case _: StructType => "InternalRow"
-    case _: ArrayType => "ArrayData"
-    case _: MapType => "MapData"
     case udt: UserDefinedType[_] => javaType(udt.sqlType)
-    case ObjectType(cls) if cls.isArray => s"${javaType(ObjectType(cls.getComponentType))}[]"
-    case ObjectType(cls) => cls.getName
-    case _ => "Object"
+    case _ => dt.physicalDataType match {

Review Comment:
   Left this here for the same reasons as above.



[GitHub] [spark] cloud-fan commented on pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #38750:
URL: https://github.com/apache/spark/pull/38750#issuecomment-1334720137

   Good point, I think we should use the physical type there. We should probably find all usages of `DateType` in the codebase and see whether or not they need to use the physical type. @LuciferYang it would be great if you have time to help with this. Thanks in advance!

