Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/02/02 15:27:35 UTC

[GitHub] [spark] tgravescs commented on pull request #31284: [SPARK-34167][SQL]Reading parquet with IntDecimal written as a LongDecimal blows up

tgravescs commented on pull request #31284:
URL: https://github.com/apache/spark/pull/31284#issuecomment-771714621


   @razajafri I think @cloud-fan pointed to the code you would need to change, where the Parquet format infers the schema based on the decimal precision. I think he is saying to just have the makeDecimalType(Decimal.MAX_INT_DIGITS) case infer INT64 as well. https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L136
   moves to:
   https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L147
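   
   For anyone following along, here is a rough sketch of the precision-based mapping being discussed (illustrative only: `parquetPhysicalTypeFor` is a made-up helper name; the real logic lives in the ParquetSchemaConverter.scala linked above and builds actual Parquet types rather than strings):
   
   ```scala
   // Simplified sketch of how the writer picks a Parquet physical type from
   // decimal precision. In Spark, Decimal.MAX_INT_DIGITS is 9 and
   // Decimal.MAX_LONG_DIGITS is 18.
   import org.apache.spark.sql.types.Decimal
   
   def parquetPhysicalTypeFor(precision: Int): String = {
     if (precision <= Decimal.MAX_INT_DIGITS) {
       "INT32" // unscaled value fits in 4 bytes
     } else if (precision <= Decimal.MAX_LONG_DIGITS) {
       "INT64" // unscaled value fits in 8 bytes
     } else {
       "FIXED_LEN_BYTE_ARRAY"
     }
   }
   ```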
   
   
   @cloud-fan it seems like a waste to store these as longs when they fit in an int; you are just wasting space. I guess your thought is that it's more expensive to downcast on read than what you save in space? That might also depend on how big your column is, though.
   That also doesn't match the other code @revans2 pointed to above, where decimals that are small enough are stored as Int. It seems like the change as-is makes it consistent with those paths, whereas changing to use Long would be inconsistent. A sketch of the narrowing in question follows below.
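   
   To make the trade-off concrete, here is an illustrative sketch (not Spark's actual reader code; the values are made up) of narrowing a value that was written as an INT64 but whose declared precision fits in an int:
   
   ```scala
   // Illustrative only. A decimal whose unscaled value comes back as a long
   // can still be represented at int precision when the declared precision
   // is <= Decimal.MAX_INT_DIGITS (9), so the downcast itself is cheap per
   // value; the cost of the long encoding is the extra 4 bytes per value
   // before Parquet's encodings are applied.
   import org.apache.spark.sql.types.Decimal
   
   val unscaled: Long = 12345L // as it would come back from an INT64 column
   val precision = 7           // declared precision fits an int (<= 9)
   val scale = 2
   
   val d = Decimal(unscaled, precision, scale) // represents 123.45
   assert(d.toUnscaledLong == unscaled)
   ```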
   

