Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/05/02 07:19:18 UTC

[GitHub] [spark] sadikovi opened a new pull request, #36427: [SPARK-39086] Parquet UDT support

sadikovi opened a new pull request, #36427:
URL: https://github.com/apache/spark/pull/36427

   




[GitHub] [spark] sadikovi commented on a diff in pull request #36427: [SPARK-39086][SQL] Support UDT in Spark Parquet vectorized reader

sadikovi commented on code in PR #36427:
URL: https://github.com/apache/spark/pull/36427#discussion_r867751155


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala:
##########
@@ -174,9 +174,15 @@ class ParquetToSparkSchemaConverter(
    */
   def convertField(
       field: ColumnIO,
-      sparkReadType: Option[DataType] = None): ParquetColumn = field match {
-    case primitiveColumn: PrimitiveColumnIO => convertPrimitiveField(primitiveColumn, sparkReadType)
-    case groupColumn: GroupColumnIO => convertGroupField(groupColumn, sparkReadType)
+      sparkReadType: Option[DataType] = None): ParquetColumn = {
+    val targetType = sparkReadType.map {

Review Comment:
   I added a TestArrayUDT type, which internally represents Array[Long], to ParquetQuerySuite. Do you think I should add a more elaborate test there with complex types?
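   For context, a minimal sketch of what an `Array[Long]`-backed UDT can look like (illustrative only: the class names and exact shape here are assumptions, not necessarily the suite's actual code):
   
   ```scala
   import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}
   import org.apache.spark.sql.types._
   
   // Hypothetical user-facing class wrapped by the UDT.
   case class TestArray(values: Seq[Long])
   
   // Illustrative UDT whose underlying SQL type is ArrayType(LongType): Parquet
   // stores the raw array, and Spark converts to/from TestArray at the boundary.
   class TestArrayUDT extends UserDefinedType[TestArray] {
     override def sqlType: DataType = ArrayType(LongType)
     override def serialize(obj: TestArray): Any = new GenericArrayData(obj.values.toArray)
     override def deserialize(datum: Any): TestArray = datum match {
       case data: ArrayData => TestArray(data.toLongArray.toSeq)
     }
     override def userClass: Class[TestArray] = classOf[TestArray]
   }
   ```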







[GitHub] [spark] sunchao commented on a diff in pull request #36427: [SPARK-39086][SQL] Support UDT in Spark Parquet vectorized reader

sunchao commented on code in PR #36427:
URL: https://github.com/apache/spark/pull/36427#discussion_r863072081


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetUtils.scala:
##########
@@ -208,6 +208,9 @@ object ParquetUtils {
     case st: StructType =>
       sqlConf.parquetVectorizedReaderNestedColumnEnabled &&
         st.fields.forall(f => isBatchReadSupported(sqlConf, f.dataType))
+    case udt: UserDefinedType[_] =>
+      sqlConf.parquetVectorizedReaderNestedColumnEnabled &&

Review Comment:
   Why do we need to check `sqlConf.parquetVectorizedReaderNestedColumnEnabled`? If `udt.sqlType` is an `AtomicType`, then this should return true.
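   A simplified standalone sketch of the dispatch being suggested (illustrative; the real `ParquetUtils.isBatchReadSupported` handles more cases):
   
   ```scala
   import org.apache.spark.sql.internal.SQLConf
   import org.apache.spark.sql.types._
   
   // Illustrative: recurse on the UDT's underlying SQL type, so a UDT backed by
   // an AtomicType is batch-readable without the nested-column flag, while a
   // struct-backed UDT still requires it.
   def isBatchReadSupported(sqlConf: SQLConf, dt: DataType): Boolean = dt match {
     case _: AtomicType => true
     case udt: UserDefinedType[_] => isBatchReadSupported(sqlConf, udt.sqlType)
     case st: StructType =>
       sqlConf.parquetVectorizedReaderNestedColumnEnabled &&
         st.fields.forall(f => isBatchReadSupported(sqlConf, f.dataType))
     case _ => false
   }
   ```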



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala:
##########
@@ -174,9 +174,15 @@ class ParquetToSparkSchemaConverter(
    */
   def convertField(
       field: ColumnIO,
-      sparkReadType: Option[DataType] = None): ParquetColumn = field match {
-    case primitiveColumn: PrimitiveColumnIO => convertPrimitiveField(primitiveColumn, sparkReadType)
-    case groupColumn: GroupColumnIO => convertGroupField(groupColumn, sparkReadType)
+      sparkReadType: Option[DataType] = None): ParquetColumn = {
+    val targetType = sparkReadType.map {

Review Comment:
   I think there is another edge case in `convertInternal` when `fieldReadType` is an `ArrayType` too: maybe we also need to check whether the field type is a UDT whose `sqlType` is an `ArrayType`.
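   For context, one common way to cover both the struct and array edge cases is to normalize the requested read type up front (illustrative sketch, not necessarily the PR's final code):
   
   ```scala
   import org.apache.spark.sql.types._
   
   // Illustrative: unwrap a UDT to its underlying SQL type before matching, so a
   // UDT whose sqlType is StructType or ArrayType takes the same conversion path
   // as the raw type would.
   def unwrapUDT(sparkReadType: Option[DataType]): Option[DataType] =
     sparkReadType.map {
       case udt: UserDefinedType[_] => udt.sqlType
       case other => other
     }
   ```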





[GitHub] [spark] sadikovi commented on pull request #36427: [SPARK-39086][SQL] Support UDT in Spark Parquet vectorized reader

sadikovi commented on PR #36427:
URL: https://github.com/apache/spark/pull/36427#issuecomment-1124446872

   Thanks for the review @sunchao!




[GitHub] [spark] AmplabJenkins commented on pull request #36427: [SPARK-39086] Support UDT in Spark Parquet vectorized reader

AmplabJenkins commented on PR #36427:
URL: https://github.com/apache/spark/pull/36427#issuecomment-1114581370

   Can one of the admins verify this patch?




[GitHub] [spark] sadikovi commented on a diff in pull request #36427: [SPARK-39086][SQL] Support UDT in Spark Parquet vectorized reader

sadikovi commented on code in PR #36427:
URL: https://github.com/apache/spark/pull/36427#discussion_r863240629


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetUtils.scala:
##########
@@ -208,6 +208,9 @@ object ParquetUtils {
     case st: StructType =>
       sqlConf.parquetVectorizedReaderNestedColumnEnabled &&
         st.fields.forall(f => isBatchReadSupported(sqlConf, f.dataType))
+    case udt: UserDefinedType[_] =>
+      sqlConf.parquetVectorizedReaderNestedColumnEnabled &&

Review Comment:
   Yes, I can update that.





[GitHub] [spark] sadikovi commented on pull request #36427: [SPARK-39086][SQL] Support UDT in Spark Parquet vectorized reader

sadikovi commented on PR #36427:
URL: https://github.com/apache/spark/pull/36427#issuecomment-1120790710

   Thanks @sunchao. I addressed your comments and fixed the code generation problem you pointed out. Could you review again? Thanks.




[GitHub] [spark] sadikovi commented on a diff in pull request #36427: [SPARK-39086][SQL] Support UDT in Spark Parquet vectorized reader

sadikovi commented on code in PR #36427:
URL: https://github.com/apache/spark/pull/36427#discussion_r869759054


##########
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java:
##########
@@ -816,8 +816,8 @@ protected boolean isArray() {
    * Sets up the common state and also handles creating the child columns if this is a nested
    * type.
    */
-  protected WritableColumnVector(int capacity, DataType type) {
-    super(type);
+  protected WritableColumnVector(int capacity, DataType dataType) {

Review Comment:
   Oh, I had to change the parameter name because it clashes with the `type` property in the class. I can keep the same name and instead use `this.type` in the code below if you like.





[GitHub] [spark] sunchao commented on pull request #36427: [SPARK-39086][SQL] Support UDT in Spark Parquet vectorized reader

sunchao commented on PR #36427:
URL: https://github.com/apache/spark/pull/36427#issuecomment-1121492836

   @sadikovi Hmm, it seems there are still some test failures.




[GitHub] [spark] sunchao commented on a diff in pull request #36427: [SPARK-39086][SQL] Support UDT in Spark Parquet vectorized reader

sunchao commented on code in PR #36427:
URL: https://github.com/apache/spark/pull/36427#discussion_r868880982


##########
sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java:
##########
@@ -310,6 +311,10 @@ public final CalendarInterval getInterval(int rowId) {
    * Sets up the data type of this column vector.
    */
   protected ColumnVector(DataType type) {
-    this.type = type;
+    if (type instanceof UserDefinedType) {

Review Comment:
   I wonder if it's better to move this to `reserveInternal`, since the `type` here is exposed via the `ColumnVector.dataType` method, and there may be situations where a caller expects it to be the original type that was passed into the constructor.
   
   For instance, this method is called in `ArrowEvalPythonExec`, where the type is compared to the output type of the physical node.



##########
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java:
##########
@@ -816,8 +816,8 @@ protected boolean isArray() {
    * Sets up the common state and also handles creating the child columns if this is a nested
    * type.
    */
-  protected WritableColumnVector(int capacity, DataType type) {
-    super(type);
+  protected WritableColumnVector(int capacity, DataType dataType) {

Review Comment:
   This is an unrelated change, but I'm fine with it since it makes the naming more consistent.





[GitHub] [spark] sadikovi commented on a diff in pull request #36427: [SPARK-39086][SQL] Support UDT in Spark Parquet vectorized reader

sadikovi commented on code in PR #36427:
URL: https://github.com/apache/spark/pull/36427#discussion_r869758806


##########
sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java:
##########
@@ -310,6 +311,10 @@ public final CalendarInterval getInterval(int rowId) {
    * Sets up the data type of this column vector.
    */
   protected ColumnVector(DataType type) {
-    this.type = type;
+    if (type instanceof UserDefinedType) {

Review Comment:
   My understanding is that ArrowEvalPythonExec/EvalPythonExec works with a list of attributes as its output, which is the actual Spark schema, not the column vectors' types, so it should work. I can add a test to make sure it does.
   
   I thought about moving this to `reserveInternal`, but then I would need to handle it in both the off-heap and on-heap vectors and call the method recursively. It seemed simpler to convert the type directly in ColumnVector and use the expanded type everywhere.
   
   Let me know if you would like me to follow up on anything mentioned above.
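   To make the idea concrete, a hedged Scala sketch of recursively expanding UDTs in a schema (the actual change lives in the Java `ColumnVector` constructor; the function name here is an assumption):
   
   ```scala
   import org.apache.spark.sql.types._
   
   // Illustrative: replace every UDT in a type with its underlying sqlType, so
   // that column vectors and their children only ever see "physical" types.
   def expandUDT(dt: DataType): DataType = dt match {
     case udt: UserDefinedType[_] => expandUDT(udt.sqlType)
     case ArrayType(elementType, containsNull) =>
       ArrayType(expandUDT(elementType), containsNull)
     case MapType(keyType, valueType, valueContainsNull) =>
       MapType(expandUDT(keyType), expandUDT(valueType), valueContainsNull)
     case st: StructType =>
       StructType(st.fields.map(f => f.copy(dataType = expandUDT(f.dataType))))
     case other => other
   }
   ```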





[GitHub] [spark] sunchao commented on pull request #36427: [SPARK-39086][SQL] Support UDT in Spark Parquet vectorized reader

sunchao commented on PR #36427:
URL: https://github.com/apache/spark/pull/36427#issuecomment-1124576200

   Thanks! Merged to master.




[GitHub] [spark] sunchao commented on a diff in pull request #36427: [SPARK-39086][SQL] Support UDT in Spark Parquet vectorized reader

sunchao commented on code in PR #36427:
URL: https://github.com/apache/spark/pull/36427#discussion_r869863029


##########
sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java:
##########
@@ -310,6 +311,10 @@ public final CalendarInterval getInterval(int rowId) {
    * Sets up the data type of this column vector.
    */
   protected ColumnVector(DataType type) {
-    this.type = type;
+    if (type instanceof UserDefinedType) {

Review Comment:
   I see, this looks good then.





[GitHub] [spark] sunchao closed pull request #36427: [SPARK-39086][SQL] Support UDT in Spark Parquet vectorized reader

sunchao closed pull request #36427: [SPARK-39086][SQL] Support UDT in Spark Parquet vectorized reader
URL: https://github.com/apache/spark/pull/36427




[GitHub] [spark] sadikovi commented on pull request #36427: [SPARK-39086][SQL] Support UDT in Spark Parquet vectorized reader

sadikovi commented on PR #36427:
URL: https://github.com/apache/spark/pull/36427#issuecomment-1121735595

   I fixed the tests; everything should be fine now.




[GitHub] [spark] sadikovi commented on pull request #36427: [SPARK-39086] Support UDT in Spark Parquet vectorized reader

sadikovi commented on PR #36427:
URL: https://github.com/apache/spark/pull/36427#issuecomment-1114574931

   @sunchao Can you review the PR? Thanks.




[GitHub] [spark] sadikovi commented on pull request #36427: [SPARK-39086][SQL] Support UDT in Spark Parquet vectorized reader

sadikovi commented on PR #36427:
URL: https://github.com/apache/spark/pull/36427#issuecomment-1115533499

   Does anyone know why the tests are failing? All of them seem to fail with:
   
   `[info]   Cause: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 114, Column 107: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 114, Column 107: No applicable constructor/method found for actual parameters "int, int"; candidates are: "public org.apache.spark.sql.vectorized.ColumnarRow org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"`
   
   




[GitHub] [spark] sunchao commented on pull request #36427: [SPARK-39086][SQL] Support UDT in Spark Parquet vectorized reader

sunchao commented on PR #36427:
URL: https://github.com/apache/spark/pull/36427#issuecomment-1115746980

   Looks like `CodeGenerator.getValueFromVector` and `CodeGenerator.getValue` need to be updated: previously a `ColumnVector` could not have a UDT type, but now it can. If the input `dataType` is a UDT:
   
   ```scala
     def getValueFromVector(vector: String, dataType: DataType, rowId: String): String = {
       if (dataType.isInstanceOf[StructType]) {
         // `ColumnVector.getStruct` is different from `InternalRow.getStruct`, it only takes an
         // `ordinal` parameter.
         s"$vector.getStruct($rowId)"
       } else {
         getValue(vector, dataType, rowId)
       }
     }
   ```
   
   this will call `getValue` instead, and
   ```scala
     def getValue(input: String, dataType: DataType, ordinal: String): String = {
       val jt = javaType(dataType)
       dataType match {
         case _ if isPrimitiveType(jt) => s"$input.get${primitiveTypeName(jt)}($ordinal)"
         case t: DecimalType => s"$input.getDecimal($ordinal, ${t.precision}, ${t.scale})"
         case StringType => s"$input.getUTF8String($ordinal)"
         case BinaryType => s"$input.getBinary($ordinal)"
         case CalendarIntervalType => s"$input.getInterval($ordinal)"
         case t: StructType => s"$input.getStruct($ordinal, ${t.size})"
         case _: ArrayType => s"$input.getArray($ordinal)"
         case _: MapType => s"$input.getMap($ordinal)"
         case NullType => "null"
         case udt: UserDefinedType[_] => getValue(input, udt.sqlType, ordinal)
         case _ => s"($jt)$input.get($ordinal, null)"
       }
     }
   ```
   
   and `getValue` will recursively call itself on `udt.sqlType`, which is a struct type, and thus will generate `$input.getStruct($ordinal, ${t.size})`, which fails to compile with the above error.
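   A hedged sketch of one possible fix (illustrative only; the actual change in the PR may differ): unwrap UDTs in `getValueFromVector` before dispatching, so a struct-backed UDT reaches the single-argument `ColumnVector.getStruct(rowId)` branch instead of falling through to `getValue`:
   
   ```scala
   import org.apache.spark.sql.types._
   
   // Illustrative sketch, reusing the getValue shown above: strip UDT wrappers
   // first, so a UDT whose sqlType is a struct generates vector.getStruct(rowId)
   // rather than the two-argument InternalRow-style call.
   def getValueFromVector(vector: String, dataType: DataType, rowId: String): String =
     dataType match {
       case udt: UserDefinedType[_] =>
         getValueFromVector(vector, udt.sqlType, rowId)
       case _: StructType =>
         // `ColumnVector.getStruct` only takes an `ordinal`, unlike `InternalRow.getStruct`.
         s"$vector.getStruct($rowId)"
       case _ =>
         getValue(vector, dataType, rowId)
     }
   ```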

