Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/07/11 15:42:20 UTC

[GitHub] [spark] HyukjinKwon edited a comment on pull request #29063: [SPARK-32270][SQL] Use TextFileFormat in CSV's schema inference with a different encoding

HyukjinKwon edited a comment on pull request #29063:
URL: https://github.com/apache/spark/pull/29063#issuecomment-657081547


   Ah, it does. Ideally we should change spark-xml to use a SQL source instead of RDD APIs during schema inference, but I think that would be difficult: the record separator in CSV and JSON is a newline, whereas spark-xml depends on a custom Hadoop input format ...
   
   In most cases it wouldn't be a big deal, so I guess it's fine not to change it at this moment unless a major issue is found.
   I think we could solve this issue together once we migrate Spark XML from DSv1 to DSv2 .. but I guess that's a bit far in the future ..
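   To illustrate the record-separator point above, here is a minimal sketch in plain Python (not Spark or spark-xml code; the function names and the simplistic tag scanning are illustrative assumptions). Newline-delimited formats like CSV and JSON Lines can be split into records by an ordinary line reader, which is why a plain text source can feed schema inference, while XML records are delimited by start/end tags that may span lines and therefore need a tag-aware reader like spark-xml's custom Hadoop input format:

   ```python
   # Hypothetical sketch: splitting raw text into records for two formats.

   def split_newline_records(text):
       # CSV / JSON Lines: one record per line, so a line-oriented
       # text reader is enough to produce records for schema inference.
       return [line for line in text.split("\n") if line]

   def split_xml_records(text, row_tag):
       # XML: a record runs from <row_tag> to </row_tag> and may span
       # multiple lines, so a line-oriented reader is not enough.
       records = []
       start, end = f"<{row_tag}>", f"</{row_tag}>"
       pos = 0
       while True:
           i = text.find(start, pos)
           if i == -1:
               break
           j = text.find(end, i)
           if j == -1:
               break  # unterminated record; stop scanning
           records.append(text[i:j + len(end)])
           pos = j + len(end)
       return records

   csv_data = "a,1\nb,2\n"
   xml_data = "<row>\n  <k>a</k>\n</row>\n<row>\n  <k>b</k>\n</row>\n"

   print(split_newline_records(csv_data))        # one record per line
   print(split_xml_records(xml_data, "row"))     # records span lines
   ```

   This is only meant to show why the newline-based sources could share one code path while XML cannot; real record splitting in Hadoop/Spark also has to handle split boundaries, encodings, and malformed input.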


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org