You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "chenhao-db (via GitHub)" <gi...@apache.org> on 2023/11/15 22:39:54 UTC

[PR] [SPARK-45827] Fix variant parquet reader. [spark]

chenhao-db opened a new pull request, #43825:
URL: https://github.com/apache/spark/pull/43825

   ## What changes were proposed in this pull request?
   
   This is a follow-up of https://github.com/apache/spark/pull/43707. The previous PR missed a piece in the variant parquet reader: we are treating the variant type as `struct<value binary, metadata binary>`, so it also needs a similar `assembleStruct` process in the Parquet reader to correctly set the nullness of variant values from def/rep levels.
   
   ## How was this patch tested?
   
   Extend the existing unit test. It would fail without the change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45827] Fix variant parquet reader. [spark]

Posted by "chenhao-db (via GitHub)" <gi...@apache.org>.
chenhao-db commented on code in PR #43825:
URL: https://github.com/apache/spark/pull/43825#discussion_r1395045528


##########
sql/core/src/test/scala/org/apache/spark/sql/VariantSuite.scala:
##########
@@ -73,5 +73,12 @@ class VariantSuite extends QueryTest with SharedSparkSession {
       values.map(v => if (v == null) "null" else v.debugString()).sorted
     }
     assert(prepareAnswer(input) == prepareAnswer(result))
+
+    withTempDir { dir =>

Review Comment:
   Because the variant values it writes are all non-null. This only causes an issue when there is a null variant value.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45827][SQL] Fix variant parquet reader. [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on PR #43825:
URL: https://github.com/apache/spark/pull/43825#issuecomment-1813730404

   thanks, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45827] Fix variant parquet reader. [spark]

Posted by "chenhao-db (via GitHub)" <gi...@apache.org>.
chenhao-db commented on PR #43825:
URL: https://github.com/apache/spark/pull/43825#issuecomment-1813534407

   @cloud-fan @HyukjinKwon could you help take a look? Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45827] Fix variant parquet reader. [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43825:
URL: https://github.com/apache/spark/pull/43825#discussion_r1395044163


##########
sql/core/src/test/scala/org/apache/spark/sql/VariantSuite.scala:
##########
@@ -73,5 +73,12 @@ class VariantSuite extends QueryTest with SharedSparkSession {
       values.map(v => if (v == null) "null" else v.debugString()).sorted
     }
     assert(prepareAnswer(input) == prepareAnswer(result))
+
+    withTempDir { dir =>

Review Comment:
   The `basic tests` test case also test parquet write an read, why it didn't expose the bug?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45827][SQL] Fix variant parquet reader. [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan closed pull request #43825: [SPARK-45827][SQL] Fix variant parquet reader.
URL: https://github.com/apache/spark/pull/43825


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org