Posted to reviews@spark.apache.org by "HyukjinKwon (via GitHub)" <gi...@apache.org> on 2023/02/07 11:20:28 UTC

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39926: [WIP][SQL] Remove skipComments function in CSVExprUtils

HyukjinKwon commented on code in PR #39926:
URL: https://github.com/apache/spark/pull/39926#discussion_r1098527228


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtils.scala:
##########
@@ -35,22 +35,11 @@ object CSVExprUtils {
     }
   }
 
-  def skipComments(iter: Iterator[String], options: CSVOptions): Iterator[String] = {
-    if (options.isCommentSet) {
-      val commentPrefix = options.comment.toString
-      iter.dropWhile { line =>

Review Comment:
   Because with `DataFrameReader.csv(Dataset[String])` (filtering), you don't know where the first block is when you apply this function. But when you read from a file, you can tell where the file starts (by partition number), and only in that case can we safely drop the leading lines. The latter is more correct.
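   A minimal Scala sketch of the distinction the comment draws, assuming a hypothetical `isStartOfFile` flag that a file-based reader could compute (the comment mentions deriving it from the partition number). `LeadingCommentSketch` and `skipLeadingComments` are illustrative names, not Spark APIs, and the "blank line or comment prefix" drop condition is an assumption, since the diff hunk above is cut off before the predicate.

```scala
// Hypothetical sketch, not Spark's implementation.
object LeadingCommentSketch {
  // Drops leading blank/comment lines only when the iterator is known to
  // begin at the start of a file. `isStartOfFile` is an assumed flag.
  def skipLeadingComments(
      lines: Iterator[String],
      commentPrefix: Option[String],
      isStartOfFile: Boolean): Iterator[String] = {
    if (!isStartOfFile) {
      // Arbitrary chunk (e.g. a Dataset[String] partition): we cannot tell
      // whether these lines are the head of the file, so keep them as-is.
      lines
    } else {
      // Known file start: skip blank lines and lines starting with the prefix.
      lines.dropWhile { line =>
        line.trim.isEmpty || commentPrefix.exists(p => line.startsWith(p))
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val head = Iterator("# header comment", "", "a,b", "1,2")
    println(skipLeadingComments(head, Some("#"), isStartOfFile = true).toList)
    // List(a,b, 1,2)

    val middle = Iterator("# not known to be a file header", "3,4")
    println(skipLeadingComments(middle, Some("#"), isStartOfFile = false).toList)
    // List(# not known to be a file header, 3,4)
  }
}
```

   The design choice here is that position-dependent dropping is only applied when the caller can vouch for the position; input with no known file start (the `Dataset[String]` case) is left for a position-independent, per-line filter.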



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org