Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2016/06/30 03:54:30 UTC

[GitHub] spark pull request #13988: [WIP][SPARK-16101][SQL] Refactoring CSV data sour...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/13988

    [WIP][SPARK-16101][SQL] Refactoring CSV data source to be consistent with JSON data source

    ## What changes were proposed in this pull request?
    
    This PR refactors the CSV data source to be consistent with the JSON data source.
    
    This PR removes `CSVParser` and introduces the new classes `UnivocityParser`, `UnivocityGenerator`, and `CSVUtils`, mirroring the JSON data source (`JacksonParser`, `JacksonGenerator`, and `JacksonUtils`). Also, `DefaultSource` moves to `CSVRelation`, just like `JSONRelation`.
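A minimal sketch of the read/write split described above, assuming a naive comma-delimited format with no quoting or escaping. `ToyCsvParser`, `ToyCsvGenerator`, `ToyCsvUtils`, and `dropCommentsAndBlanks` are hypothetical stand-ins for `UnivocityParser`, `UnivocityGenerator`, and `CSVUtils`, not their actual APIs:

```scala
import java.util.regex.Pattern

// Hypothetical shared helpers, in the role `CSVUtils` plays beside the parser and generator.
object ToyCsvUtils {
  // Drops blank lines and lines that start with the comment character.
  def dropCommentsAndBlanks(lines: Iterator[String], comment: Char): Iterator[String] =
    lines.filter(line => line.trim.nonEmpty && !line.startsWith(comment.toString))
}

// Read path: one text line -> one row of typed values (the role of `UnivocityParser`).
class ToyCsvParser(converters: Array[String => Any], delimiter: Char = ',') {
  def parse(line: String): Array[Any] = {
    // Naive split: no quoting or escaping; limit -1 keeps trailing empty tokens.
    val tokens = line.split(Pattern.quote(delimiter.toString), -1)
    tokens.zip(converters).map { case (token, convert) => convert(token) }
  }
}

// Write path: one row of values -> one text line (the role of `UnivocityGenerator`).
class ToyCsvGenerator(delimiter: Char = ',') {
  def write(row: Seq[Any]): String =
    row.map(v => if (v == null) "" else v.toString).mkString(delimiter.toString)
}
```

In the PR itself, tokenizing and writing are delegated to Univocity's `CsvParser` and `CsvWriter`; the sketch only shows why parser, generator, and shared helpers sit in separate units, as `JacksonParser`, `JacksonGenerator`, and `JacksonUtils` do.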
    
    ## How was this patch tested?
    
    Existing tests should cover this.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-16101

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13988.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13988
    
----
commit 211bfb47acc79c51327b3f1c40aa86802470f436
Author: hyukjinkwon <gu...@gmail.com>
Date:   2016-06-30T03:50:58Z

    Refactoring CSV data source to be consistent with JSON data source

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #62016 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62016/consoleFull)** for PR 13988 at commit [`97d7bd4`](https://github.com/apache/spark/commit/97d7bd43a04b9188ab111ba4e773e67a4fc4a3b6).



[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #63425 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63425/consoleFull)** for PR 13988 at commit [`a634435`](https://github.com/apache/spark/commit/a63443505483287fa9bb20312a24b38e75f90588).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #13988: [WIP][SPARK-16101][SQL] Refactoring CSV data source to b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70919/
    Test FAILed.



[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70923/
    Test PASSed.



[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #66035 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66035/consoleFull)** for PR 13988 at commit [`df69c8d`](https://github.com/apache/spark/commit/df69c8df4f5d45b8c66f9990e7ac74f3262179b8).



[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13988#discussion_r71097989
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityGenerator.scala ---
    @@ -0,0 +1,83 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources.csv
    +
    +import com.univocity.parsers.csv.{CsvWriter, CsvWriterSettings}
    +
    +import org.apache.spark.internal.Logging
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.types.StructType
    +
    +/**
    + * Converts a sequence of strings to a CSV string.
    + */
    +private[csv] object UnivocityGenerator extends Logging {
    +  /**
    +   * Transforms a single InternalRow to CSV using Univocity
    +   *
    +   * @param rowSchema the schema object used for conversion
    +   * @param writer a CsvWriter object
    +   * @param headers headers to write
    +   * @param writeHeader true if it needs to write header
    +   * @param options CSVOptions object containing options
    +   * @param row The row to convert
    +   */
    +  def apply(
    +      rowSchema: StructType,
    +      writer: CsvWriter,
    +      headers: Array[String],
    +      writeHeader: Boolean,
    +      options: CSVOptions)(row: InternalRow): Unit = {
    +    val tokens = {
    +      row.toSeq(rowSchema).map { field =>
    +        // TODO: It seems not all data types can be represented by `toString`.
    +        // For example, `DateType` and `TimestampType` values are stored as numbers (e.g. timestamps as long values).
    --- End diff --
    
    Actually, I opened another PR for this issue: https://github.com/apache/spark/pull/13912. It seems related to this TODO.
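The TODO's concern can be reproduced without Spark: Spark SQL holds a `DateType` value as an `Int` count of days since the epoch and a `TimestampType` value as a `Long` count of microseconds since the epoch, so `toString` prints the raw count. A minimal sketch of formatting such values, assuming a hypothetical `FieldType` tag in place of Spark's `DataType` and an ISO-8601-style pattern in UTC chosen for illustration (not the CSV source's `dateFormat`/`timestampFormat` defaults):

```scala
import java.time.{Instant, LocalDate, ZoneOffset}
import java.time.format.DateTimeFormatter

// Hypothetical type tags standing in for Spark's DataType in this sketch.
sealed trait FieldType
case object DateField extends FieldType
case object TimestampField extends FieldType
case object OtherField extends FieldType

object InternalValueFormatter {
  private val dateFormat = DateTimeFormatter.ISO_LOCAL_DATE
  // Illustrative output pattern, rendered in UTC (an assumption of the sketch).
  private val timestampFormat =
    DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX").withZone(ZoneOffset.UTC)

  def format(value: Any, fieldType: FieldType): String = (value, fieldType) match {
    case (null, _) => ""
    // Days since 1970-01-01 -> calendar date.
    case (days: Int, DateField) => LocalDate.ofEpochDay(days.toLong).format(dateFormat)
    // Microseconds since the epoch -> instant; floorDiv/floorMod keep pre-1970 values correct.
    case (micros: Long, TimestampField) =>
      val seconds = Math.floorDiv(micros, 1000000L)
      val nanos = Math.floorMod(micros, 1000000L) * 1000L
      timestampFormat.format(Instant.ofEpochSecond(seconds, nanos))
    // Everything else falls back to toString, which is what the TODO warns about.
    case (other, _) => other.toString
  }
}
```

For example, the internal date value `17167` would be written as `17167` by `toString` but as `2017-01-01` by `format(17167, DateField)`. A real writer would use the configured `dateFormat`/`timestampFormat` instead of the fixed patterns here.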



[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #61587 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61587/consoleFull)** for PR 13988 at commit [`b1050b5`](https://github.com/apache/spark/commit/b1050b51e24384e4f9af7c381bb4ad5376f36040).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    (@rxin gentle ping..)



[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13988#discussion_r94755764
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala ---
    @@ -0,0 +1,272 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources.csv
    +
    +import java.math.BigDecimal
    +import java.text.NumberFormat
    +import java.util.Locale
    +
    +import scala.util.Try
    +import scala.util.control.NonFatal
    +
    +import com.univocity.parsers.csv.CsvParser
    +
    +import org.apache.spark.internal.Logging
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
    +import org.apache.spark.sql.catalyst.util.DateTimeUtils
    +import org.apache.spark.sql.types._
    +import org.apache.spark.unsafe.types.UTF8String
    +
    +private[csv] class UnivocityParser(
    +    schema: StructType,
    +    requiredSchema: StructType,
    +    options: CSVOptions) extends Logging {
    +  def this(schema: StructType, options: CSVOptions) = this(schema, schema, options)
    +
    +  val valueConverters = makeConverters(schema, options)
    +  val parser = new CsvParser(options.asParserSettings)
    +
    +  // A `ValueConverter` is responsible for converting the given value to a desired type.
    +  private type ValueConverter = String => Any
    +
    +  var numMalformedRecords = 0
    +  val row = new GenericInternalRow(requiredSchema.length)
    +  val indexArr: Array[Int] = {
    +    val fields = if (options.dropMalformed) {
    +      // If `dropMalformed` is enabled, then it needs to parse all the values
    +      // so that we can decide which row is malformed.
    +      requiredSchema ++ schema.filterNot(requiredSchema.contains(_))
    +    } else {
    +      requiredSchema
    +    }
    +    fields.filter(schema.contains).map(schema.indexOf).toArray
    +  }
    +
    +  /**
    +   * Create converters which cast each given string datum to each specified type in given schema.
    +   * Currently, we do not support complex types (`ArrayType`, `MapType`, `StructType`).
    +   *
    +   * For string types, this is simply the datum.
    +   * For other types, this is converted into the value according to the type.
    +   * For other nullable types, returns null if it is null or equals the value specified
    +   * in the `nullValue` option.
    +   *
    +   * @param schema schema that contains data types to cast the given value into.
    +   * @param options CSV options.
    +   */
    +  private def makeConverters(
    +      schema: StructType,
    +      options: CSVOptions = CSVOptions()): Array[ValueConverter] = {
    +    schema.map(f => makeConverter(f.name, f.dataType, f.nullable, options)).toArray
    +  }
    +
    +  /**
    +   * Create a converter which converts the string value to a value according to a desired type.
    +   */
    +  def makeConverter(
    +      name: String,
    +      dataType: DataType,
    +      nullable: Boolean = true,
    +      options: CSVOptions = CSVOptions()): ValueConverter = dataType match {
    +    case _: ByteType => (d: String) =>
    +      nullSafeDatum(d, name, nullable, options)(_.toByte)
    +
    +    case _: ShortType => (d: String) =>
    +      nullSafeDatum(d, name, nullable, options)(_.toShort)
    +
    +    case _: IntegerType => (d: String) =>
    +      nullSafeDatum(d, name, nullable, options)(_.toInt)
    +
    +    case _: LongType => (d: String) =>
    +      nullSafeDatum(d, name, nullable, options)(_.toLong)
    +
    +    case _: FloatType => (d: String) =>
    +      nullSafeDatum(d, name, nullable, options) {
    +        case options.nanValue => Float.NaN
    +        case options.negativeInf => Float.NegativeInfinity
    +        case options.positiveInf => Float.PositiveInfinity
    +        case datum =>
    +          Try(datum.toFloat)
    +            .getOrElse(NumberFormat.getInstance(Locale.US).parse(datum).floatValue())
    +      }
    +
    +    case _: DoubleType => (d: String) =>
    +      nullSafeDatum(d, name, nullable, options) {
    +        case options.nanValue => Double.NaN
    +        case options.negativeInf => Double.NegativeInfinity
    +        case options.positiveInf => Double.PositiveInfinity
    +        case datum =>
    +          Try(datum.toDouble)
    +            .getOrElse(NumberFormat.getInstance(Locale.US).parse(datum).doubleValue())
    +      }
    +
    +    case _: BooleanType => (d: String) =>
    +      nullSafeDatum(d, name, nullable, options)(_.toBoolean)
    +
    +    case dt: DecimalType => (d: String) =>
    +      nullSafeDatum(d, name, nullable, options) { datum =>
    +        val value = new BigDecimal(datum.replaceAll(",", ""))
    +        Decimal(value, dt.precision, dt.scale)
    +      }
    +
    +    case _: TimestampType => (d: String) =>
    +      nullSafeDatum(d, name, nullable, options) { datum =>
    +        // This one will lose microseconds parts.
    +        // See https://issues.apache.org/jira/browse/SPARK-10681.
    +        Try(options.timestampFormat.parse(datum).getTime * 1000L)
    +          .getOrElse {
    +          // If it fails to parse, then tries the way used in 2.0 and 1.x for backwards
    +          // compatibility.
    +          DateTimeUtils.stringToTime(datum).getTime * 1000L
    +        }
    +      }
    +
    +    case _: DateType => (d: String) =>
    +      nullSafeDatum(d, name, nullable, options) { datum =>
    +        // This one will lose microseconds parts.
    +        // See https://issues.apache.org/jira/browse/SPARK-10681.
    +        Try(DateTimeUtils.millisToDays(options.dateFormat.parse(datum).getTime))
    +          .getOrElse {
    +          // If it fails to parse, then tries the way used in 2.0 and 1.x for backwards
    +          // compatibility.
    +          DateTimeUtils.millisToDays(DateTimeUtils.stringToTime(datum).getTime)
    +        }
    +      }
    +
    +    case _: StringType => (d: String) =>
    +      nullSafeDatum(d, name, nullable, options)(UTF8String.fromString(_))
    +
    +    case udt: UserDefinedType[_] =>
    +      makeConverter(name, udt.sqlType, nullable, options)
    +
    +    case _ => throw new RuntimeException(s"Unsupported type: ${dataType.typeName}")
    +  }
    +
    +  private def nullSafeDatum(
    +       datum: String,
    +       name: String,
    +       nullable: Boolean,
    +       options: CSVOptions)(converter: ValueConverter): Any = {
    +    if (datum == options.nullValue || datum == null) {
    +      if (!nullable) {
    +        throw new RuntimeException(s"null value found but field $name is not nullable.")
    +      }
    +      null
    +    } else {
    +      converter.apply(datum)
    +    }
    +  }
    +
    +  /**
    +   * Parses a single CSV record (in the form of an array of strings in which
    +   * each element represents a column) and turns it into either one resulting row or no row (if
    +   * the record is malformed).
    +   */
    +  def parse(input: String): Option[InternalRow] = {
    --- End diff --
    
    Here, I separated the parse-mode logic from the actual conversion logic.
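A minimal sketch of that separation, assuming a naive comma split with no quoting or escaping. `ModeAwareParser`, `ParseMode`, and its cases are hypothetical names that echo the CSV `mode` option values (`PERMISSIVE`, `DROPMALFORMED`, `FAILFAST`) but simplify their behavior: here a malformed record in permissive mode becomes an all-null row. `nullSafe` plays the role of `nullSafeDatum` in the diff:

```scala
import scala.util.control.NonFatal

// Hypothetical parse modes, named after the CSV `mode` option values.
sealed trait ParseMode
case object Permissive extends ParseMode
case object DropMalformed extends ParseMode
case object FailFast extends ParseMode

object ModeAwareParser {
  type ValueConverter = String => Any

  // Shared null handling, applied around every type-specific converter.
  def nullSafe(name: String, nullable: Boolean, nullValue: String)(convert: ValueConverter): ValueConverter =
    (datum: String) =>
      if (datum == null || datum == nullValue) {
        if (!nullable) throw new RuntimeException(s"null value found but field $name is not nullable.")
        null
      } else {
        convert(datum)
      }

  // Conversion logic only: tokens -> values; knows nothing about parse modes.
  def convert(tokens: Array[String], converters: Array[ValueConverter]): Array[Any] = {
    if (tokens.length != converters.length) {
      throw new IllegalArgumentException(s"Expected ${converters.length} tokens but got ${tokens.length}")
    }
    tokens.zip(converters).map { case (token, converter) => converter(token) }
  }

  // Mode logic only: decides what a failed conversion means for the record.
  def parse(line: String, converters: Array[ValueConverter], mode: ParseMode): Option[Array[Any]] = {
    val tokens = line.split(",", -1) // naive split: no quoting or escaping
    try {
      Some(convert(tokens, converters))
    } catch {
      case NonFatal(e) =>
        mode match {
          case Permissive => Some(Array.fill[Any](converters.length)(null))
          case DropMalformed => None
          case FailFast => throw new RuntimeException(s"Malformed record in FAILFAST mode: $line", e)
        }
    }
  }
}
```

Because `convert` never inspects the mode, a new mode only touches `parse`, and a new type only adds a converter.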



[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    @rxin Could you take a look please?



[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #70920 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70920/testReport)** for PR 13988 at commit [`08e5fe7`](https://github.com/apache/spark/commit/08e5fe758e701e50e9d0bec6d86200ec48465cb3).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #61588 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61588/consoleFull)** for PR 13988 at commit [`0d60c57`](https://github.com/apache/spark/commit/0d60c57cf1807c610bca66e06d24dfa2a0b553a0).



[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #66037 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66037/consoleFull)** for PR 13988 at commit [`a6d85b6`](https://github.com/apache/spark/commit/a6d85b69c3724a7f4228a8169bb44658d675a4fb).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #70917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70917/testReport)** for PR 13988 at commit [`72e7dcc`](https://github.com/apache/spark/commit/72e7dccd113079cf19b5ddb1de83ba24ee770d51).



[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #70917 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70917/testReport)** for PR 13988 at commit [`72e7dcc`](https://github.com/apache/spark/commit/72e7dccd113079cf19b5ddb1de83ba24ee770d51).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61588/
    Test PASSed.



[GitHub] spark issue #13988: [WIP][SPARK-16101][SQL] Refactoring CSV data source to b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #61523 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61523/consoleFull)** for PR 13988 at commit [`211bfb4`](https://github.com/apache/spark/commit/211bfb47acc79c51327b3f1c40aa86802470f436).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13988#discussion_r94754860
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala ---
    @@ -228,3 +150,35 @@ class CSVFileFormat extends TextBasedFileFormat with DataSourceRegister {
         schema.foreach(field => verifyType(field.dataType))
       }
     }
    +
    --- End diff --
    
    These just came from `CSVRelation`.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #65716 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65716/consoleFull)** for PR 13988 at commit [`015567e`](https://github.com/apache/spark/commit/015567e383c91b7621bf2911529ced8fdc30713e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64254/


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

Posted by deanchen <gi...@git.apache.org>.
Github user deanchen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13988#discussion_r71100480
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityGenerator.scala ---
    @@ -0,0 +1,83 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources.csv
    +
    +import com.univocity.parsers.csv.{CsvWriter, CsvWriterSettings}
    +
    +import org.apache.spark.internal.Logging
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.types.StructType
    +
    +/**
    + * Converts a sequence of string to CSV string
    + */
    +private[csv] object UnivocityGenerator extends Logging {
    +  /**
    +   * Transforms a single InternalRow to CSV using Univocity
    +   *
    +   * @param rowSchema the schema object used for conversion
    +   * @param writer a CsvWriter object
    +   * @param headers headers to write
    +   * @param writeHeader true if it needs to write header
    +   * @param options CSVOptions object containing options
    +   * @param row The row to convert
    +   */
    +  def apply(
    +      rowSchema: StructType,
    +      writer: CsvWriter,
    +      headers: Array[String],
    +      writeHeader: Boolean,
    +      options: CSVOptions)(row: InternalRow): Unit = {
    +    val tokens = {
    +      row.toSeq(rowSchema).map { field =>
    +        // TODO: It seems all the data types are not able to be represented by `toString`.
    +        // For example, `DateType` and `TimestampType` are being long values as timestamps.
    --- End diff --
    
    Ah, thanks, I commented on that PR. Glad to see someone showing some love to Spark's CSV data source!
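
The TODO in the `UnivocityGenerator` diff notes that not every data type can be written with `toString`: `DateType` and `TimestampType` values reach the writer as raw numbers. The following is a minimal sketch of formatting such values into tokens before they are handed to a CSV writer. It assumes Spark's internal encodings (`DateType` as an `Int` count of days since 1970-01-01, `TimestampType` as a `Long` count of microseconds since the epoch). `TokenFormatting` and the `FieldType` objects are hypothetical stand-ins for Spark's `DataType` classes, and the output patterns and UTC zone are assumptions, not what this PR does.

```scala
import java.time.{Instant, LocalDate, ZoneOffset}
import java.time.format.DateTimeFormatter

object TokenFormatting {
  // Hypothetical stand-ins for Spark's DataType classes.
  sealed trait FieldType
  case object StringField extends FieldType
  case object IntField extends FieldType
  case object DateField extends FieldType      // internal value: Int days since 1970-01-01
  case object TimestampField extends FieldType // internal value: Long microseconds since epoch

  private val dateFormat = DateTimeFormatter.ISO_LOCAL_DATE
  // Assumed output pattern and zone; a real writer would read these from its options.
  private val timestampFormat =
    DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS").withZone(ZoneOffset.UTC)

  /** Converts one internal value to a CSV token; null stays null for the writer to handle. */
  def toToken(value: Any, fieldType: FieldType): String = (value, fieldType) match {
    case (null, _) => null
    case (days: Int, DateField) => dateFormat.format(LocalDate.ofEpochDay(days.toLong))
    case (micros: Long, TimestampField) =>
      // Drops sub-millisecond precision; floorDiv keeps pre-1970 values correct.
      timestampFormat.format(Instant.ofEpochMilli(Math.floorDiv(micros, 1000L)))
    case (other, _) => other.toString
  }

  /** Converts a whole row, pairing each value with its field type. */
  def toTokens(row: Seq[Any], schema: Seq[FieldType]): Seq[String] =
    row.zip(schema).map { case (value, fieldType) => toToken(value, fieldType) }
}
```

Dispatching on both the value and the declared type keeps the plain `toString` fallback for types whose string form is already what a CSV reader expects.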


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63482/


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #66939 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66939/consoleFull)** for PR 13988 at commit [`697f276`](https://github.com/apache/spark/commit/697f276d4f539fc73189e1021694a313b30a5cbe).


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    cc @hvanhovell Would you mind reviewing this, please? I remember you reviewed the initial proposal. If this seems too big to review, I can split it into a reading path and a writing path. 


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    This is also loosely related to https://issues.apache.org/jira/browse/SPARK-15463. After this one is merged, we could easily mirror the JSON implementation there rather than introducing another refactoring.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13988#discussion_r94892718
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala ---
    @@ -0,0 +1,272 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources.csv
    +
    +import java.math.BigDecimal
    +import java.text.NumberFormat
    +import java.util.Locale
    +
    +import scala.util.Try
    +import scala.util.control.NonFatal
    +
    +import com.univocity.parsers.csv.CsvParser
    +
    +import org.apache.spark.internal.Logging
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
    +import org.apache.spark.sql.catalyst.util.DateTimeUtils
    +import org.apache.spark.sql.types._
    +import org.apache.spark.unsafe.types.UTF8String
    +
    +private[csv] class UnivocityParser(
    +    schema: StructType,
    +    requiredSchema: StructType,
    +    options: CSVOptions) extends Logging {
    +  def this(schema: StructType, options: CSVOptions) = this(schema, schema, options)
    +
    +  val valueConverters = makeConverters(schema, options)
    --- End diff --
    
    Some of the conversion logic here came from `CSVTypeCast`.
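
`CSVTypeCast.makeConverters`, which the `CSVInferSchema.scala` diff further down removes, builds one `String => Any` converter per schema field up front and then applies it to every token. The following is a minimal, Spark-free sketch of that pattern. `ConverterSketch`, `FieldKind` and `Field` are hypothetical names, only a few types are covered, and the null handling (a token equal to `nullValue` becomes `null` and is rejected for non-nullable fields) approximates what the diff's `nullSafeDatum` calls suggest rather than reproducing Spark's exact code.

```scala
object ConverterSketch {
  // Mirrors the `ValueConverter = String => Any` alias shown in the diff.
  type ValueConverter = String => Any

  // Hypothetical stand-ins for a few Spark SQL data types.
  sealed trait FieldKind
  case object ByteKind extends FieldKind
  case object ShortKind extends FieldKind
  case object IntKind extends FieldKind
  case object DoubleKind extends FieldKind
  case object StringKind extends FieldKind

  final case class Field(name: String, kind: FieldKind, nullable: Boolean = true)

  // Returns null for a missing token or one equal to `nullValue`; otherwise applies `cast`.
  private def nullSafeDatum(datum: String, name: String, nullable: Boolean, nullValue: String)(
      cast: String => Any): Any = {
    if (datum == null || datum == nullValue) {
      if (!nullable) {
        throw new RuntimeException(s"null value found but field $name is not nullable.")
      }
      null
    } else {
      cast(datum)
    }
  }

  def makeConverter(field: Field, nullValue: String): ValueConverter = field.kind match {
    case ByteKind => d => nullSafeDatum(d, field.name, field.nullable, nullValue)(_.toByte)
    case ShortKind => d => nullSafeDatum(d, field.name, field.nullable, nullValue)(_.toShort)
    case IntKind => d => nullSafeDatum(d, field.name, field.nullable, nullValue)(_.toInt)
    case DoubleKind => d => nullSafeDatum(d, field.name, field.nullable, nullValue)(_.toDouble)
    case StringKind => d => nullSafeDatum(d, field.name, field.nullable, nullValue)(s => s)
  }

  // Built once per schema, then reused for every parsed record.
  def makeConverters(schema: Seq[Field], nullValue: String = ""): Array[ValueConverter] =
    schema.map(f => makeConverter(f, nullValue)).toArray

  // Applies the i-th converter to the i-th token of one record.
  def convertRow(tokens: Array[String], converters: Array[ValueConverter]): Array[Any] =
    tokens.zip(converters).map { case (token, convert) => convert(token) }
}
```

Building the converter array once per schema keeps the type dispatch out of the per-record loop, which is presumably why both the old and the new code construct converters up front.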


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62014/


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #66936 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66936/consoleFull)** for PR 13988 at commit [`97eec87`](https://github.com/apache/spark/commit/97eec8741c16ad4f4effc03feeec0797a33eb4d9).


[GitHub] spark issue #13988: [WIP][SPARK-16101][SQL] Refactoring CSV data source to b...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    I still need to fix some nits and check consistency with the JSON data source, but I opened this just to check whether it breaks anything. I will submit more commits soon.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70920/


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66939/


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #63482 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63482/consoleFull)** for PR 13988 at commit [`c143a01`](https://github.com/apache/spark/commit/c143a0173365f890551fdc5f52eb3309baaab2b4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class CachedData(plan: LogicalPlan, cachedRepresentation: InMemoryRelation)`
      * `class CacheManager extends Logging `
      * `trait DataSourceScanExec extends LeafExecNode with CodegenSupport `
      * `case class RowDataSourceScanExec(`
      * `case class FileSourceScanExec(`
      * `case class ExternalRDD[T](`
      * `case class ExternalRDDScanExec[T](`
      * `case class LogicalRDD(`
      * `case class RDDScanExec(`
      * `trait FileRelation `
      * `case class LocalTableScanExec(`
      * `abstract class RowIterator `
      * `trait LeafExecNode extends SparkPlan `
      * `trait UnaryExecNode extends SparkPlan `
      * `trait BinaryExecNode extends SparkPlan `
      * `case class PlanLater(plan: LogicalPlan) extends LeafExecNode `
      * `abstract class SparkStrategies extends QueryPlanner[SparkPlan] `
      * `class UnsafeRowSerializer(`
      * `case class ScalaUDAF(`
      * `case class InMemoryRelation(`
      * `case class InMemoryTableScanExec(`
      * `trait RunnableCommand extends LogicalPlan with logical.Command `
      * `case class ExecutedCommandExec(cmd: RunnableCommand) extends SparkPlan `
      * `case class AlterTableRecoverPartitionsCommand(`
      * `case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] `
      * `class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] `
      * `case class InsertIntoDataSourceCommand(`
      * `case class InsertIntoHadoopFsRelationCommand(`
      * `case class PartitionDirectory(values: InternalRow, path: Path)`
      * `case class PartitionSpec(`
      * `class CSVOutputWriterFactory(options: CSVOptions) extends OutputWriterFactory `
      * `class CsvOutputWriter(`
      * `case class JDBCPartition(whereClause: String, idx: Int) extends Partition `
      * `class ResolveDataSource(sparkSession: SparkSession) extends Rule[LogicalPlan] `
      * `case class PreprocessTableInsertion(conf: SQLConf) extends Rule[LogicalPlan] `
      * `case class PreWriteCheck(conf: SQLConf, catalog: SessionCatalog)`
      * `  case class DebugExec(child: SparkPlan) extends UnaryExecNode with CodegenSupport `
      * `class ExchangeCoordinator(`
      * `case class MapPartitionsRWrapper(`
      * `class IncrementalExecution(`
      * `class ExecutionPage(parent: SQLTab) extends WebUIPage(\"execution\") with Logging `
      * `class SQLHistoryListenerFactory extends SparkHistoryListenerFactory `
      * `class SQLListener(conf: SparkConf) extends SparkListener with Logging `
      * `class SQLHistoryListener(conf: SparkConf, sparkUI: SparkUI)`
      * `class SQLTab(val listener: SQLListener, sparkUI: SparkUI)`
      * `case class SparkPlanGraph(`


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #66572 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66572/consoleFull)** for PR 13988 at commit [`3d01f8c`](https://github.com/apache/spark/commit/3d01f8c45751ca3ee8237e260c722f6014b6d91a).


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #62014 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62014/consoleFull)** for PR 13988 at commit [`97d7bd4`](https://github.com/apache/spark/commit/97d7bd43a04b9188ab111ba4e773e67a4fc4a3b6).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #70919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70919/testReport)** for PR 13988 at commit [`34fc6ca`](https://github.com/apache/spark/commit/34fc6ca6d6f3064a07cc116d183113379718c843).


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #62015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62015/consoleFull)** for PR 13988 at commit [`97d7bd4`](https://github.com/apache/spark/commit/97d7bd43a04b9188ab111ba4e773e67a4fc4a3b6).


[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13988#discussion_r94755096
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -214,127 +234,47 @@ private[csv] object CSVInferSchema {
     
         case _ => None
       }
    -}
    -
    -private[csv] object CSVTypeCast {
    -  // A `ValueConverter` is responsible for converting the given value to a desired type.
    -  private type ValueConverter = String => Any
     
       /**
    -   * Create converters which cast each given string datum to each specified type in given schema.
    -   * Currently, we do not support complex types (`ArrayType`, `MapType`, `StructType`).
    -   *
    -   * For string types, this is simply the datum.
    -   * For other types, this is converted into the value according to the type.
    -   * For other nullable types, returns null if it is null or equals to the value specified
    -   * in `nullValue` option.
    -   *
    -   * @param schema schema that contains data types to cast the given value into.
    -   * @param options CSV options.
    +   * Generates a header from the given row which is null-safe and duplicate-safe.
        */
    -  def makeConverters(
    -      schema: StructType,
    -      options: CSVOptions = CSVOptions()): Array[ValueConverter] = {
    -    schema.map(f => makeConverter(f.name, f.dataType, f.nullable, options)).toArray
    -  }
    -
    -  /**
    -   * Create a converter which converts the string value to a value according to a desired type.
    -   */
    -  def makeConverter(
    -       name: String,
    -       dataType: DataType,
    -       nullable: Boolean = true,
    -       options: CSVOptions = CSVOptions()): ValueConverter = dataType match {
    -    case _: ByteType => (d: String) =>
    -      nullSafeDatum(d, name, nullable, options)(_.toByte)
    -
    -    case _: ShortType => (d: String) =>
    -      nullSafeDatum(d, name, nullable, options)(_.toShort)
    -
    -    case _: IntegerType => (d: String) =>
    -      nullSafeDatum(d, name, nullable, options)(_.toInt)
    -
    -    case _: LongType => (d: String) =>
    -      nullSafeDatum(d, name, nullable, options)(_.toLong)
    -
    -    case _: FloatType => (d: String) =>
    -      nullSafeDatum(d, name, nullable, options) {
    -        case options.nanValue => Float.NaN
    -        case options.negativeInf => Float.NegativeInfinity
    -        case options.positiveInf => Float.PositiveInfinity
    -        case datum =>
    -          Try(datum.toFloat)
    -            .getOrElse(NumberFormat.getInstance(Locale.US).parse(datum).floatValue())
    -      }
    -
    -    case _: DoubleType => (d: String) =>
    -      nullSafeDatum(d, name, nullable, options) {
    -        case options.nanValue => Double.NaN
    -        case options.negativeInf => Double.NegativeInfinity
    -        case options.positiveInf => Double.PositiveInfinity
    -        case datum =>
    -          Try(datum.toDouble)
    -            .getOrElse(NumberFormat.getInstance(Locale.US).parse(datum).doubleValue())
    +  private def makeSafeHeader(
    --- End diff --
    
    This just came from `CSVFileFormat`.
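The body of `makeSafeHeader` is cut off in this diff excerpt. The following standalone Scala sketch only illustrates what a "null-safe and duplicate-safe" header could look like. It assumes, without confirmation from this thread, two rules. First, null, empty, or `nullValue` names become `_c<index>`. Second, duplicated names get their column index appended. `SafeHeaderSketch` and its parameters are hypothetical and are not Spark's API.

```scala
object SafeHeaderSketch {
  /**
   * Hypothetical sketch of a null-safe, duplicate-safe header builder.
   * `nullValue` and `caseSensitive` stand in for the corresponding options.
   */
  def makeSafeHeader(
      row: Array[String],
      nullValue: String = "",
      caseSensitive: Boolean = false): Array[String] = {
    // Normalise a name for duplicate detection according to case sensitivity.
    def key(name: String): String = if (caseSensitive) name else name.toLowerCase

    // Normalised names that appear more than once among the non-null names.
    val duplicates: Set[String] = row
      .filter(v => v != null && v.nonEmpty && v != nullValue)
      .groupBy(key)
      .collect { case (k, vs) if vs.length > 1 => k }
      .toSet

    row.zipWithIndex.map { case (value, index) =>
      if (value == null || value.isEmpty || value == nullValue) {
        // Missing names get a positional placeholder.
        s"_c$index"
      } else if (duplicates.contains(key(value))) {
        // Duplicated names get their column index appended.
        s"$value$index"
      } else {
        value
      }
    }
  }
}
```

For example, with case-insensitive matching, `makeSafeHeader(Array("a", null, "A", "b", ""))` yields `a0, _c1, A2, b, _c4`.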




[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon closed the pull request at:

    https://github.com/apache/spark/pull/13988




[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13988#discussion_r94755607
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityGenerator.scala ---
    @@ -0,0 +1,89 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources.csv
    +
    +import java.io.Writer
    +
    +import com.univocity.parsers.csv.CsvWriter
    +
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.catalyst.util.DateTimeUtils
    +import org.apache.spark.sql.types._
    +
    +private[csv] class UnivocityGenerator(
    +    schema: StructType,
    +    writer: Writer,
    +    options: CSVOptions = new CSVOptions(Map.empty[String, String])) {
    +  private val writerSettings = options.asWriterSettings
    +  writerSettings.setHeaders(schema.fieldNames: _*)
    +  private val gen = new CsvWriter(writer, writerSettings)
    +
    +  // A `ValueConverter` is responsible for converting a value of an `InternalRow` to `String`.
    --- End diff --
    
    The code below mostly came from `CSVTypeCast`.
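The diff describes a write-side `ValueConverter` that turns a value of an `InternalRow` into a `String`. The following simplified, dependency-free Scala sketch illustrates that idea. It is not the PR's code. `FieldType` and its cases are hypothetical stand-ins for Spark SQL types. Dates are assumed to be stored as days since the epoch. `java.time` formatting is swapped in for whatever formatter the generator actually uses.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object GeneratorConverterSketch {
  // Converts one field value of a row into the string written to the CSV output.
  type ValueConverter = Any => String

  // Hypothetical, simplified stand-ins for Spark SQL data types.
  sealed trait FieldType
  case object IntField extends FieldType
  case object DoubleField extends FieldType
  case object StringField extends FieldType
  // Assumption: dates are stored as an Int counting days since 1970-01-01.
  case object DateField extends FieldType

  def makeConverter(dataType: FieldType, datePattern: String): ValueConverter = dataType match {
    case DateField =>
      // Build the formatter once per field, not once per value.
      val formatter = DateTimeFormatter.ofPattern(datePattern)
      (v: Any) => LocalDate.ofEpochDay(v.asInstanceOf[Int].toLong).format(formatter)
    case _ =>
      (v: Any) => v.toString
  }

  // Builds one converter per field up front so they can be reused for every row.
  def makeConverters(
      schema: Seq[FieldType],
      datePattern: String = "yyyy-MM-dd"): Array[ValueConverter] =
    schema.map(makeConverter(_, datePattern)).toArray

  // Converts a row, writing `nullValue` for null fields instead of calling the converter.
  def convertRow(
      row: Seq[Any],
      converters: Array[ValueConverter],
      nullValue: String = ""): Seq[String] =
    row.zip(converters).map { case (v, conv) => if (v == null) nullValue else conv(v) }
}
```

Building the converters once per schema keeps the per-row work to a single function call per field. The alternative would be to match on the type for every value.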




[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #62011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62011/consoleFull)** for PR 13988 at commit [`f0d1512`](https://github.com/apache/spark/commit/f0d151209b8f626941ae8a1a8f69440051aa0d8a).




[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Sure! Let me split this into separate PRs for reading and writing. Thank you for your comments. Let me close this for now.




[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64635/




[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #70919 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70919/testReport)** for PR 13988 at commit [`34fc6ca`](https://github.com/apache/spark/commit/34fc6ca6d6f3064a07cc116d183113379718c843).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #66936 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66936/consoleFull)** for PR 13988 at commit [`97eec87`](https://github.com/apache/spark/commit/97eec8741c16ad4f4effc03feeec0797a33eb4d9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13988#discussion_r94773568
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala ---
    @@ -0,0 +1,272 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources.csv
    +
    +import java.math.BigDecimal
    +import java.text.NumberFormat
    +import java.util.Locale
    +
    +import scala.util.Try
    +import scala.util.control.NonFatal
    +
    +import com.univocity.parsers.csv.CsvParser
    +
    +import org.apache.spark.internal.Logging
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
    +import org.apache.spark.sql.catalyst.util.DateTimeUtils
    +import org.apache.spark.sql.types._
    +import org.apache.spark.unsafe.types.UTF8String
    +
    +private[csv] class UnivocityParser(
    +    schema: StructType,
    +    requiredSchema: StructType,
    +    options: CSVOptions) extends Logging {
    +  def this(schema: StructType, options: CSVOptions) = this(schema, schema, options)
    +
    +  val valueConverters = makeConverters(schema, options)
    +  val parser = new CsvParser(options.asParserSettings)
    +
    +  // A `ValueConverter` is responsible for converting the given value to a desired type.
    +  private type ValueConverter = String => Any
    +
    +  var numMalformedRecords = 0
    +  val row = new GenericInternalRow(requiredSchema.length)
    --- End diff --
    
    Also, `numMalformedRecords` is now kept separately as a field of the parser and updated when `parse(...)` is called.
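The following dependency-free Scala sketch roughly shows how a parser holding value converters and a per-instance `numMalformedRecords` counter might fit together. Everything in it is a hypothetical simplification: `ParserSketch`, `TokenParser`, `FieldType`, and the drop-on-failure policy. The exact malformed-record handling in `UnivocityParser` is not visible in this excerpt.

```scala
import scala.util.Try

object ParserSketch {
  // A `ValueConverter` turns one CSV token into a typed value (null for missing).
  type ValueConverter = String => Any

  // Hypothetical, simplified stand-ins for Spark SQL data types.
  sealed trait FieldType
  case object IntField extends FieldType
  case object DoubleField extends FieldType
  case object StringField extends FieldType

  // Returns null when the token is null or equals `nullValue`; otherwise applies `convert`.
  private def nullSafe(token: String, nullValue: String)(convert: String => Any): Any =
    if (token == null || token == nullValue) null else convert(token)

  def makeConverter(dataType: FieldType, nullValue: String): ValueConverter = dataType match {
    case IntField    => (t: String) => nullSafe(t, nullValue)(s => s.trim.toInt)
    case DoubleField => (t: String) => nullSafe(t, nullValue)(s => s.trim.toDouble)
    case StringField => (t: String) => nullSafe(t, nullValue)(s => s)
  }

  /** Hypothetical parser that converts token arrays and counts rows it had to drop. */
  final class TokenParser(schema: Seq[FieldType], nullValue: String = "") {
    private val converters: Array[ValueConverter] =
      schema.map(makeConverter(_, nullValue)).toArray

    // Per-instance counter, analogous in spirit to `numMalformedRecords` in the diff.
    var numMalformedRecords = 0

    // Returns Some(row) on success. On a length mismatch or a conversion failure,
    // increments the counter and returns None (a DROPMALFORMED-like policy).
    def parse(tokens: Array[String]): Option[Seq[Any]] = {
      val attempt: Option[Seq[Any]] =
        if (tokens.length != converters.length) None
        else Try(tokens.indices.map(i => converters(i)(tokens(i)))).toOption
      if (attempt.isEmpty) numMalformedRecords += 1
      attempt
    }
  }
}
```

In this sketch, the counter lives on the parser instance, so each parser counts only the records it has seen.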




[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #64254 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64254/consoleFull)** for PR 13988 at commit [`346c4a6`](https://github.com/apache/spark/commit/346c4a690565f689fa04db640403d78b2f87825b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #64635 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64635/consoleFull)** for PR 13988 at commit [`52ca52b`](https://github.com/apache/spark/commit/52ca52b02838ef514788b2a1675c9ed4382080ed).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62011/




[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #66035 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66035/consoleFull)** for PR 13988 at commit [`df69c8d`](https://github.com/apache/spark/commit/df69c8df4f5d45b8c66f9990e7ac74f3262179b8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #62016 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62016/consoleFull)** for PR 13988 at commit [`97d7bd4`](https://github.com/apache/spark/commit/97d7bd43a04b9188ab111ba4e773e67a4fc4a3b6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #61588 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61588/consoleFull)** for PR 13988 at commit [`0d60c57`](https://github.com/apache/spark/commit/0d60c57cf1807c610bca66e06d24dfa2a0b553a0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #70923 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70923/testReport)** for PR 13988 at commit [`4c6666d`](https://github.com/apache/spark/commit/4c6666dba3ae76034c95bc31bfcf97c458c59be9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67391/




[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #61587 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61587/consoleFull)** for PR 13988 at commit [`b1050b5`](https://github.com/apache/spark/commit/b1050b51e24384e4f9af7c381bb4ad5376f36040).




[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    I updated the PR description. I hope this is helpful for reviewing.




[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65716/




[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #67391 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67391/consoleFull)** for PR 13988 at commit [`346b1d2`](https://github.com/apache/spark/commit/346b1d2aece46bfa3da44606466af87b5aabad58).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #62011 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62011/consoleFull)** for PR 13988 at commit [`f0d1512`](https://github.com/apache/spark/commit/f0d151209b8f626941ae8a1a8f69440051aa0d8a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #62014 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62014/consoleFull)** for PR 13988 at commit [`97d7bd4`](https://github.com/apache/spark/commit/97d7bd43a04b9188ab111ba4e773e67a4fc4a3b6).


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #66038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66038/consoleFull)** for PR 13988 at commit [`ac94e67`](https://github.com/apache/spark/commit/ac94e670d49cea6ffc9beaab5f4d109a21e6924e).


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Can you split this into smaller PRs? It's really painful to review such a big refactor-only PR.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66936/
    Test PASSed.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #63425 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63425/consoleFull)** for PR 13988 at commit [`a634435`](https://github.com/apache/spark/commit/a63443505483287fa9bb20312a24b38e75f90588).


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66037/
    Test FAILed.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62015/
    Test FAILed.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #65716 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65716/consoleFull)** for PR 13988 at commit [`015567e`](https://github.com/apache/spark/commit/015567e383c91b7621bf2911529ced8fdc30713e).


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #66939 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66939/consoleFull)** for PR 13988 at commit [`697f276`](https://github.com/apache/spark/commit/697f276d4f539fc73189e1021694a313b30a5cbe).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Hi @cloud-fan, would you mind checking whether this makes sense?


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    retest this please


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #62015 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62015/consoleFull)** for PR 13988 at commit [`97d7bd4`](https://github.com/apache/spark/commit/97d7bd43a04b9188ab111ba4e773e67a4fc4a3b6).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #67391 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67391/consoleFull)** for PR 13988 at commit [`346b1d2`](https://github.com/apache/spark/commit/346b1d2aece46bfa3da44606466af87b5aabad58).


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63425/
    Test PASSed.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    @hvanhovell If this change looks too big, I will split it into a reading path and a writing path once you confirm.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66572/
    Test PASSed.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #64635 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64635/consoleFull)** for PR 13988 at commit [`52ca52b`](https://github.com/apache/spark/commit/52ca52b02838ef514788b2a1675c9ed4382080ed).


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #70920 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70920/testReport)** for PR 13988 at commit [`08e5fe7`](https://github.com/apache/spark/commit/08e5fe758e701e50e9d0bec6d86200ec48465cb3).


[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13988#discussion_r94755002
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -39,22 +38,43 @@ private[csv] object CSVInferSchema {
        *     3. Replace any null types with string type
        */
       def infer(
    --- End diff --
    
    This is a fairly important change: it introduces functionality similar to the JSON data source (e.g., creating a DataFrame from `RDD[String]` or `Dataset[String]`).
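
To illustrate the kind of inference this change makes reusable for in-memory input, here is a simplified, hypothetical sketch in plain Scala (no Spark dependency) that infers column types from a sequence of CSV lines standing in for an `RDD[String]` or `Dataset[String]`. It only widens null, int, double and string, and, like step 3 of the doc comment above, replaces any column type that is still null with string. `CsvTypeInferenceSketch` and its members are made-up names for illustration; this is not Spark's `CSVInferSchema`, which supports more types and options and tokenizes with Univocity.

```scala
import java.util.regex.Pattern
import scala.util.Try

// Hypothetical, simplified CSV schema inference over in-memory lines.
// Not Spark's CSVInferSchema: no quoting, escaping, or options support.
object CsvTypeInferenceSketch {
  sealed trait FieldType
  case object NullType extends FieldType
  case object IntType extends FieldType
  case object DoubleType extends FieldType
  case object StringType extends FieldType

  // Narrowest type that can represent a single token; empty tokens count as null.
  def inferField(token: String): FieldType =
    if (token == null || token.isEmpty) NullType
    else if (Try(token.toInt).isSuccess) IntType
    else if (Try(token.toDouble).isSuccess) DoubleType
    else StringType

  // Widen two observed types into one that can hold both.
  def merge(a: FieldType, b: FieldType): FieldType = (a, b) match {
    case (NullType, t) => t
    case (t, NullType) => t
    case (x, y) if x == y => x
    case (IntType, DoubleType) | (DoubleType, IntType) => DoubleType
    case _ => StringType
  }

  def inferSchema(lines: Seq[String], delimiter: Char = ','): Seq[FieldType] = {
    // Naive tokenizing; the -1 limit keeps trailing empty fields.
    val splitter = Pattern.quote(delimiter.toString)
    val rows = lines.map(_.split(splitter, -1).toSeq)
    val numCols = if (rows.isEmpty) 0 else rows.map(_.length).max
    val initial: Seq[FieldType] = Seq.fill(numCols)(NullType)
    val merged = rows.foldLeft(initial) { (acc, tokens) =>
      acc.zipWithIndex.map { case (t, i) =>
        // Short rows contribute null for their missing columns.
        val token = if (i < tokens.length) tokens(i) else null
        merge(t, inferField(token))
      }
    }
    // Like step 3 of the doc comment: any column still null becomes string.
    merged.map(t => if (t == NullType) StringType else t)
  }
}
```

For example, `inferSchema(Seq("1,2.5,a,", "3,4,b,"))` yields `Seq(IntType, DoubleType, StringType, StringType)`: the last column is empty in every row, so it falls back to string.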


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #66572 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66572/consoleFull)** for PR 13988 at commit [`3d01f8c`](https://github.com/apache/spark/commit/3d01f8c45751ca3ee8237e260c722f6014b6d91a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66035/
    Test PASSed.


[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

Posted by deanchen <gi...@git.apache.org>.
Github user deanchen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13988#discussion_r71097167
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityGenerator.scala ---
    @@ -0,0 +1,83 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources.csv
    +
    +import com.univocity.parsers.csv.{CsvWriter, CsvWriterSettings}
    +
    +import org.apache.spark.internal.Logging
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.types.StructType
    +
    +/**
    + * Converts a sequence of string to CSV string
    + */
    +private[csv] object UnivocityGenerator extends Logging {
    +  /**
    +   * Transforms a single InternalRow to CSV using Univocity
    +   *
    +   * @param rowSchema the schema object used for conversion
    +   * @param writer a CsvWriter object
    +   * @param headers headers to write
    +   * @param writeHeader true if it needs to write header
    +   * @param options CSVOptions object containing options
    +   * @param row The row to convert
    +   */
    +  def apply(
    +      rowSchema: StructType,
    +      writer: CsvWriter,
    +      headers: Array[String],
    +      writeHeader: Boolean,
    +      options: CSVOptions)(row: InternalRow): Unit = {
    +    val tokens = {
    +      row.toSeq(rowSchema).map { field =>
    +        // TODO: It seems all the data types are not able to be represented by `toString`.
    +        // For example, `DateType` and `TimestampType` are being long values as timestamps.
    --- End diff --
    
    @HyukjinKwon we ran into this issue, where CSV writes ints for `DateType` instead of a date string (https://issues.apache.org/jira/browse/SPARK-16597).
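
A minimal sketch of the direction the TODO points to, assuming Catalyst's internal encodings (a `DateType` value is an `Int` count of days since 1970-01-01 and a `TimestampType` value is a `Long` count of microseconds since the epoch): format such values explicitly before joining a row into a CSV line, instead of calling `toString`. The object, the `FieldKind` markers, the UTC zone and the output patterns are hypothetical choices for illustration, not the fix that went into Spark; the naive `mkString` join also skips the quoting and escaping that Univocity's `CsvWriter` performs.

```scala
import java.time.{Instant, LocalDate, ZoneOffset}
import java.time.format.DateTimeFormatter

// Hypothetical sketch: format Catalyst-style internal values before writing a CSV field.
// Assumes dates are Int days since 1970-01-01 and timestamps are Long microseconds (UTC).
object CsvValueFormatterSketch {
  sealed trait FieldKind
  case object DateKind extends FieldKind
  case object TimestampKind extends FieldKind
  case object OtherKind extends FieldKind

  private val timestampFormat =
    DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX").withZone(ZoneOffset.UTC)

  def formatField(value: Any, kind: FieldKind): String = (value, kind) match {
    case (null, _) => "" // nulls become empty fields in this sketch
    case (days: Int, DateKind) =>
      LocalDate.ofEpochDay(days.toLong).toString // ISO yyyy-MM-dd
    case (micros: Long, TimestampKind) =>
      // floorDiv/floorMod keep pre-1970 (negative) values correct.
      val instant = Instant.ofEpochSecond(
        Math.floorDiv(micros, 1000000L), Math.floorMod(micros, 1000000L) * 1000L)
      timestampFormat.format(instant)
    case (other, _) => other.toString
  }

  // Joins formatted fields with a delimiter; no quoting or escaping.
  def toCsvLine(values: Seq[Any], kinds: Seq[FieldKind], delimiter: String = ","): String =
    values.zip(kinds).map { case (v, k) => formatField(v, k) }.mkString(delimiter)
}
```

Here `toCsvLine(Seq(1, 0, null, "x"), Seq(OtherKind, DateKind, OtherKind, OtherKind))` produces `1,1970-01-01,,x` rather than writing the raw day count `0` for the date column.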


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61587/
    Test PASSed.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Hi @rxin, I think the change in this PR might still be pretty big. Should I split it into separate reading and writing parts?


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    I will try to split this into two PRs, one for the read path and one for the write path. Would that sound okay to you both, @rxin and @hvanhovell?


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13988#discussion_r94756023
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala ---
    @@ -0,0 +1,272 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources.csv
    +
    +import java.math.BigDecimal
    +import java.text.NumberFormat
    +import java.util.Locale
    +
    +import scala.util.Try
    +import scala.util.control.NonFatal
    +
    +import com.univocity.parsers.csv.CsvParser
    +
    +import org.apache.spark.internal.Logging
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
    +import org.apache.spark.sql.catalyst.util.DateTimeUtils
    +import org.apache.spark.sql.types._
    +import org.apache.spark.unsafe.types.UTF8String
    +
    +private[csv] class UnivocityParser(
    +    schema: StructType,
    +    requiredSchema: StructType,
    +    options: CSVOptions) extends Logging {
    +  def this(schema: StructType, options: CSVOptions) = this(schema, schema, options)
    +
    +  val valueConverters = makeConverters(schema, options)
    +  val parser = new CsvParser(options.asParserSettings)
    +
    +  // A `ValueConverter` is responsible for converting the given value to a desired type.
    +  private type ValueConverter = String => Any
    +
    +  var numMalformedRecords = 0
    +  val row = new GenericInternalRow(requiredSchema.length)
    +  val indexArr: Array[Int] = {
    +    val fields = if (options.dropMalformed) {
    +      // If `dropMalformed` is enabled, then it needs to parse all the values
    +      // so that we can decide which row is malformed.
    +      requiredSchema ++ schema.filterNot(requiredSchema.contains(_))
    +    } else {
    +      requiredSchema
    +    }
    +    fields.filter(schema.contains).map(schema.indexOf).toArray
    +  }
    +
    +  /**
    +   * Create converters that cast each string datum to the corresponding type in the given schema.
    +   * Currently, complex types (`ArrayType`, `MapType`, `StructType`) are not supported.
    +   *
    +   * For string types, this is simply the datum (as a `UTF8String`).
    +   * For other types, the datum is converted to a value of that type.
    +   * For nullable fields, the converter returns null if the datum is null or equals the value
    +   * specified in the `nullValue` option.
    +   *
    +   * @param schema schema that contains data types to cast the given value into.
    +   * @param options CSV options.
    +   */
    +  private def makeConverters(
    +      schema: StructType,
    +      options: CSVOptions = CSVOptions()): Array[ValueConverter] = {
    +    schema.map(f => makeConverter(f.name, f.dataType, f.nullable, options)).toArray
    +  }
    +
    +  /**
    +   * Create a converter which converts the string value to a value according to a desired type.
    +   */
    +  def makeConverter(
    +      name: String,
    +      dataType: DataType,
    +      nullable: Boolean = true,
    +      options: CSVOptions = CSVOptions()): ValueConverter = dataType match {
    +    case _: ByteType => (d: String) =>
    +      nullSafeDatum(d, name, nullable, options)(_.toByte)
    +
    +    case _: ShortType => (d: String) =>
    +      nullSafeDatum(d, name, nullable, options)(_.toShort)
    +
    +    case _: IntegerType => (d: String) =>
    +      nullSafeDatum(d, name, nullable, options)(_.toInt)
    +
    +    case _: LongType => (d: String) =>
    +      nullSafeDatum(d, name, nullable, options)(_.toLong)
    +
    +    case _: FloatType => (d: String) =>
    +      nullSafeDatum(d, name, nullable, options) {
    +        case options.nanValue => Float.NaN
    +        case options.negativeInf => Float.NegativeInfinity
    +        case options.positiveInf => Float.PositiveInfinity
    +        case datum =>
    +          Try(datum.toFloat)
    +            .getOrElse(NumberFormat.getInstance(Locale.US).parse(datum).floatValue())
    +      }
    +
    +    case _: DoubleType => (d: String) =>
    +      nullSafeDatum(d, name, nullable, options) {
    +        case options.nanValue => Double.NaN
    +        case options.negativeInf => Double.NegativeInfinity
    +        case options.positiveInf => Double.PositiveInfinity
    +        case datum =>
    +          Try(datum.toDouble)
    +            .getOrElse(NumberFormat.getInstance(Locale.US).parse(datum).doubleValue())
    +      }
    +
    +    case _: BooleanType => (d: String) =>
    +      nullSafeDatum(d, name, nullable, options)(_.toBoolean)
    +
    +    case dt: DecimalType => (d: String) =>
    +      nullSafeDatum(d, name, nullable, options) { datum =>
    +        val value = new BigDecimal(datum.replaceAll(",", ""))
    +        Decimal(value, dt.precision, dt.scale)
    +      }
    +
    +    case _: TimestampType => (d: String) =>
    +      nullSafeDatum(d, name, nullable, options) { datum =>
    +        // This loses the microsecond part.
    +        // See https://issues.apache.org/jira/browse/SPARK-10681.
    +        Try(options.timestampFormat.parse(datum).getTime * 1000L)
    +          .getOrElse {
    +            // If parsing fails, fall back to the behavior of 2.0 and 1.x for backwards
    +            // compatibility.
    +            DateTimeUtils.stringToTime(datum).getTime * 1000L
    +          }
    +      }
    +
    +    case _: DateType => (d: String) =>
    +      nullSafeDatum(d, name, nullable, options) { datum =>
    +        // This loses the microsecond part.
    +        // See https://issues.apache.org/jira/browse/SPARK-10681.
    +        Try(DateTimeUtils.millisToDays(options.dateFormat.parse(datum).getTime))
    +          .getOrElse {
    +            // If parsing fails, fall back to the behavior of 2.0 and 1.x for backwards
    +            // compatibility.
    +            DateTimeUtils.millisToDays(DateTimeUtils.stringToTime(datum).getTime)
    +          }
    +      }
    +
    +    case _: StringType => (d: String) =>
    +      nullSafeDatum(d, name, nullable, options)(UTF8String.fromString(_))
    +
    +    case udt: UserDefinedType[_] =>
    +      // Delegate to the converter for the UDT's underlying SQL type.
    +      makeConverter(name, udt.sqlType, nullable, options)
    +
    +    case _ => throw new RuntimeException(s"Unsupported type: ${dataType.typeName}")
    +  }
    +
    +  private def nullSafeDatum(
    +      datum: String,
    +      name: String,
    +      nullable: Boolean,
    +      options: CSVOptions)(converter: ValueConverter): Any = {
    +    if (datum == options.nullValue || datum == null) {
    +      if (!nullable) {
    +        throw new RuntimeException(s"null value found but field $name is not nullable.")
    +      }
    +      null
    +    } else {
    +      converter.apply(datum)
    +    }
    +  }
    +
    +  /**
    +   * Parses a single CSV record (given as one input string) and turns it into either one
    +   * resulting row or no row (if the record is malformed).
    +   */
    +  def parse(input: String): Option[InternalRow] = {
    --- End diff --
    
    Actually, the argument change (matching the signature of `JacksonParser`) is also important. It lets us avoid additional refactoring when introducing the same functionality that `JacksonParser` supports (e.g., the `from_json` and `to_json` functions).
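    The converter pattern quoted in the diff can be modeled without Spark. This sketch is illustrative only and assumes simplified stand-ins: a hypothetical `FieldType` ADT in place of Spark's `DataType`, a plain `Array[Any]` in place of `InternalRow`, and naive comma splitting in place of univocity (no quoting or escaping). It mirrors the shape of `UnivocityParser`: converters are built once per field, null handling lives in a single `nullSafeDatum`-style helper, and `parse(input: String)` returns `None` for a malformed record.

```scala
import scala.util.Try

// Hypothetical stand-ins for Spark's DataType hierarchy (illustration only).
sealed trait FieldType
case object IntField extends FieldType
case object DoubleField extends FieldType
case object StringField extends FieldType

final case class Field(name: String, dataType: FieldType, nullable: Boolean = true)

// Sketch of the UnivocityParser shape; not Spark's implementation.
final class SimpleCsvParser(schema: Seq[Field], nullValue: String = "") {
  // Converts one raw token to a typed value (mirrors `ValueConverter` in the diff).
  private type ValueConverter = String => Any

  // Built once per parser, like `valueConverters = makeConverters(schema, options)`.
  private val valueConverters: Array[ValueConverter] =
    schema.map(f => makeConverter(f.name, f.dataType, f.nullable)).toArray

  private def makeConverter(name: String, dataType: FieldType, nullable: Boolean): ValueConverter =
    dataType match {
      case IntField    => (d: String) => nullSafeDatum(d, name, nullable)(_.toInt)
      case DoubleField => (d: String) => nullSafeDatum(d, name, nullable)(_.toDouble)
      case StringField => (d: String) => nullSafeDatum(d, name, nullable)(s => s)
    }

  // A null datum or the configured `nullValue` becomes null; non-nullable fields reject it.
  private def nullSafeDatum(datum: String, name: String, nullable: Boolean)(
      converter: ValueConverter): Any = {
    if (datum == null || datum == nullValue) {
      if (!nullable) {
        throw new RuntimeException(s"null value found but field $name is not nullable.")
      }
      null
    } else {
      converter(datum)
    }
  }

  // One input string in, at most one row out; malformed records yield None.
  def parse(input: String): Option[Array[Any]] = {
    val tokens = input.split(",", -1) // naive split that keeps trailing empty tokens
    if (tokens.length != valueConverters.length) {
      None
    } else {
      // Any conversion failure marks the whole record as malformed in this sketch.
      Try(tokens.indices.map(i => valueConverters(i)(tokens(i))).toArray).toOption
    }
  }
}
```

    Unlike the diff, this sketch treats every conversion failure (including a null in a non-nullable field) as a malformed record instead of throwing; in Spark, how such failures surface depends on the parse mode.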


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #63482 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63482/consoleFull)** for PR 13988 at commit [`c143a01`](https://github.com/apache/spark/commit/c143a0173365f890551fdc5f52eb3309baaab2b4).


[GitHub] spark issue #13988: [WIP][SPARK-16101][SQL] Refactoring CSV data source to b...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #61523 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61523/consoleFull)** for PR 13988 at commit [`211bfb4`](https://github.com/apache/spark/commit/211bfb47acc79c51327b3f1c40aa86802470f436).


[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13988#discussion_r94755470
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala ---
    @@ -126,6 +127,39 @@ private[csv] class CSVOptions(@transient private val parameters: CaseInsensitive
       val inputBufferSize = 128
     
       val isCommentSet = this.comment != '\u0000'
    +
    +  def asWriterSettings: CsvWriterSettings = {
    --- End diff --
    
    These were moved as-is from `CSVParser`.


[GitHub] spark issue #13988: [WIP][SPARK-16101][SQL] Refactoring CSV data source to b...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61523/


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #64254 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64254/consoleFull)** for PR 13988 at commit [`346c4a6`](https://github.com/apache/spark/commit/346c4a690565f689fa04db640403d78b2f87825b).


[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13988#discussion_r94755659
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala ---
    @@ -0,0 +1,272 @@
    +  val row = new GenericInternalRow(requiredSchema.length)
    --- End diff --
    
    Now, a single row object is reused for every record.
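    Reusing one row object saves an allocation per record, but every `parse` call then overwrites the previous result, so a consumer that buffers rows has to copy them first (Spark's `InternalRow` provides `copy()` for this). A minimal standalone sketch of the hazard, assuming a hypothetical `ReusingParser` whose shared `Array[Any]` stands in for the reused `GenericInternalRow`:

```scala
// Hypothetical parser that reuses one mutable buffer, standing in for a reused row.
final class ReusingParser(numFields: Int) {
  private val row = new Array[Any](numFields)

  // Fills the shared buffer and returns it; the previous result is overwritten.
  def parse(input: String): Array[Any] = {
    val tokens = input.split(",", -1)
    for (i <- 0 until numFields) {
      row(i) = if (i < tokens.length) tokens(i) else null
    }
    row
  }
}

object RowReuseDemo {
  def main(args: Array[String]): Unit = {
    val lines = Seq("a,1", "b,2", "c,3")

    // Holding on to the returned buffer: every element aliases the same array.
    val p1 = new ReusingParser(2)
    val aliased = lines.map(p1.parse)

    // Copying each result before buffering keeps the records distinct.
    val p2 = new ReusingParser(2)
    val copied = lines.map(l => p2.parse(l).clone())

    println(aliased.map(_.mkString(","))) // all three entries show the last record
    println(copied.map(_.mkString(",")))
  }
}
```

    Streaming consumers that process each row before asking for the next one never observe the overwrite, which is why reuse is usually safe inside a scan iterator.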


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70917/


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66038/


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #66037 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66037/consoleFull)** for PR 13988 at commit [`a6d85b6`](https://github.com/apache/spark/commit/a6d85b69c3724a7f4228a8169bb44658d675a4fb).


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #70923 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70923/testReport)** for PR 13988 at commit [`4c6666d`](https://github.com/apache/spark/commit/4c6666dba3ae76034c95bc31bfcf97c458c59be9).


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62016/


[GitHub] spark pull request #13988: [SPARK-16101][SQL] Refactoring CSV data source to...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13988#discussion_r94892239
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala ---
    @@ -0,0 +1,272 @@
    +  def parse(input: String): Option[InternalRow] = {
    --- End diff --
    
    For example, PR #13300 introduces such a refactoring.
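    Because `parse` consumes a single `String` rather than an iterator over a whole file, the same parser can serve both a file scan and a per-value expression in the style of `from_json`. The following standalone sketch assumes a hypothetical `PairParser` and `fromCsv` helper; neither is Spark API, and the two-column format is purely illustrative.

```scala
import scala.util.Try

// Hypothetical record parser with the same shape as `parse(input: String): Option[Row]`.
final class PairParser {
  def parse(input: String): Option[(String, Int)] = input.split(",", -1) match {
    case Array(k, v) => Try(v.toInt).toOption.map(n => (k, n))
    case _           => None // wrong number of columns: malformed
  }
}

object FromCsvSketch {
  // File-style use: iterate over lines, dropping malformed records.
  def readLines(lines: Iterator[String], parser: PairParser): List[(String, Int)] =
    lines.flatMap(l => parser.parse(l).iterator).toList

  // Column-style use (like `from_json`): one value in, one optional result out.
  def fromCsv(values: Seq[String], parser: PairParser): Seq[Option[(String, Int)]] =
    values.map(parser.parse)
}
```

    Neither caller needs anything beyond the single-record signature, which is the point of aligning it with `JacksonParser`.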


[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13988
  
    **[Test build #66038 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66038/consoleFull)** for PR 13988 at commit [`ac94e67`](https://github.com/apache/spark/commit/ac94e670d49cea6ffc9beaab5f4d109a21e6924e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.

