Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/12/23 22:36:52 UTC

[GitHub] [spark] sadikovi edited a comment on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

sadikovi edited a comment on pull request #34995:
URL: https://github.com/apache/spark/pull/34995#issuecomment-1000545069


   I think there is a bug in the partitioning cast: the type is inferred from the raw value, which can contain escaped characters. We unescape the column name but not the value! For example, if you have the double value 4.5, you end up trying to infer a double from `4%2E5`. See https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala#L307-L311 and https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala#L491. Timestamps unescape the value separately in the "inferPartitionColumnValue" method.
   
   IMHO, the "inferPartitionColumnValue" method should receive the already unescaped value, not the raw one. Because of this issue, this PR introduces a breaking change: type inference could be incorrect for doubles and decimals.
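
To illustrate the concern, here is a minimal Python sketch, not Spark's actual code. The `infer_type` helper is hypothetical and heavily simplified, and `urllib.parse.unquote` only stands in for Spark's path unescaping, on the assumption that the dot is escaped as `%2E` as proposed in this PR.

```python
from urllib.parse import unquote


def infer_type(value: str) -> str:
    """Hypothetical, simplified stand-in for partition value type inference."""
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "double"
    except ValueError:
        # Nothing numeric parsed, so fall back to string.
        return "string"


raw = "4%2E5"            # partition value as it appears in the path
unescaped = unquote(raw)  # percent-decoding restores the dot

print(infer_type(raw))        # inference on the raw, still escaped value
print(infer_type(unescaped))  # inference on the unescaped value
```

Under these assumptions the raw value falls back to a string, while the unescaped value is inferred as a double, which is the mismatch described above.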
   
   @cloud-fan is this a known issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
