Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2017/01/07 14:54:14 UTC

[GitHub] spark pull request #16496: [SPARK-16101][SQL] Refactoring CSV write path to ...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/16496

    [SPARK-16101][SQL] Refactoring CSV write path to be consistent with JSON data source

    ## What changes were proposed in this pull request?
    
    This PR refactors the CSV write path to be consistent with the JSON data source.
    
    - This PR makes the methods in the CSV classes take arguments consistent with their JSON counterparts.
      - `UnivocityGenerator` and `JacksonGenerator`
        
        ``` scala
        private[csv] class UnivocityGenerator(
            schema: StructType,
            writer: Writer,
            options: CSVOptions = new CSVOptions(Map.empty[String, String])) {
        ...
        
        def write ...
        def close ...
        def flush ...
        ```
        
        ``` scala
        private[sql] class JacksonGenerator(
            schema: StructType,
            writer: Writer,
            options: JSONOptions = new JSONOptions(Map.empty[String, String])) {
        ...
        
        def write ...
        def close ...
        def flush ...
        ```
    
    - This PR also groups the CSV classes together the same way the JSON classes are grouped.
      - `CsvFileFormat`
        
        ``` scala
        CsvFileFormat
        CsvOutputWriter
        ```
    
      - `JsonFileFormat`
        
        ``` scala
        JsonFileFormat
        JsonOutputWriter
        ```
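
    For illustration only, below is a minimal Spark-free sketch of the shared shape described above. A generator is built from (schema, writer, options), with a defaulted options argument, and exposes `write`, `flush` and `close`. `SimpleSchema`, `SimpleCsvOptions` and `SimpleCsvGenerator` are hypothetical stand-ins, not Spark classes. The sketch does no quoting or escaping, which the real Univocity-based writer takes care of.

    ```scala
    import java.io.{StringWriter, Writer}

    // Hypothetical stand-ins for StructType and CSVOptions (not Spark classes).
    final case class SimpleSchema(fieldNames: Seq[String])
    final case class SimpleCsvOptions(
        delimiter: String = ",",
        nullValue: String = "",
        headerFlag: Boolean = false)

    // Same constructor shape as the generators above: (schema, writer, options).
    // No quoting/escaping is done here; this is only a sketch of the API shape.
    class SimpleCsvGenerator(
        schema: SimpleSchema,
        writer: Writer,
        options: SimpleCsvOptions = SimpleCsvOptions()) {

      // The header is printed at most once, before the first row.
      private var printHeader = options.headerFlag

      def write(row: Seq[Any]): Unit = {
        if (printHeader) {
          writer.write(schema.fieldNames.mkString(options.delimiter))
          writer.write("\n")
          printHeader = false
        }
        // Nulls are replaced by the configured null value.
        val values = row.map(v => if (v == null) options.nullValue else v.toString)
        writer.write(values.mkString(options.delimiter))
        writer.write("\n")
      }

      def flush(): Unit = writer.flush()

      def close(): Unit = writer.close()
    }
    ```

    With `headerFlag = true`, writing `Seq(1, "a")` and then `Seq(2, null)` into a `StringWriter` produces the lines `id,name`, `1,a` and `2,`.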
    
    ## How was this patch tested?
    
    Existing tests should cover this.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-16101-write

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16496.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16496
    
----
commit b8d97f7c391ea1f833d7a4025e3c627fc6385cfb
Author: hyukjinkwon <gu...@gmail.com>
Date:   2017-01-07T14:47:25Z

    Refactoring CSV write path to be consistent with JSON data source

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16496
  
    retest this please


[GitHub] spark pull request #16496: [SPARK-16101][SQL] Refactoring CSV write path to ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16496#discussion_r97043612
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityGenerator.scala ---
    @@ -0,0 +1,91 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources.csv
    +
    +import java.io.Writer
    +
    +import com.univocity.parsers.csv.CsvWriter
    +
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.catalyst.util.DateTimeUtils
    +import org.apache.spark.sql.types._
    +
    +private[csv] class UnivocityGenerator(
    +    schema: StructType,
    +    writer: Writer,
    +    options: CSVOptions = new CSVOptions(Map.empty[String, String])) {
    +  private val writerSettings = options.asWriterSettings
    +  writerSettings.setHeaders(schema.fieldNames: _*)
    +  private val gen = new CsvWriter(writer, writerSettings)
    --- End diff --
    
    nit: why call it `gen`? How about `writer`?


[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16496
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16496
  
    **[Test build #71019 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71019/testReport)** for PR 16496 at commit [`943ebb7`](https://github.com/apache/spark/commit/943ebb7691f5c0591764492747c78e4bbba25b46).


[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16496
  
    **[Test build #71016 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71016/testReport)** for PR 16496 at commit [`b8d97f7`](https://github.com/apache/spark/commit/b8d97f7c391ea1f833d7a4025e3c627fc6385cfb).


[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16496
  
    **[Test build #71025 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71025/testReport)** for PR 16496 at commit [`943ebb7`](https://github.com/apache/spark/commit/943ebb7691f5c0591764492747c78e4bbba25b46).


[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16496
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16496
  
    **[Test build #71718 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71718/testReport)** for PR 16496 at commit [`a507795`](https://github.com/apache/spark/commit/a5077955f8648c17ed17c908dcbd188cdf57e829).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16496
  
    @cloud-fan, could you take a look please? 


[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16496
  
    cc @cloud-fan


[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16496
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71718/
    Test PASSed.


[GitHub] spark pull request #16496: [SPARK-16101][SQL] Refactoring CSV write path to ...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16496


[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16496
  
    **[Test build #71016 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71016/testReport)** for PR 16496 at commit [`b8d97f7`](https://github.com/apache/spark/commit/b8d97f7c391ea1f833d7a4025e3c627fc6385cfb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16496
  
    **[Test build #71019 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71019/testReport)** for PR 16496 at commit [`943ebb7`](https://github.com/apache/spark/commit/943ebb7691f5c0591764492747c78e4bbba25b46).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16496
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request #16496: [SPARK-16101][SQL] Refactoring CSV write path to ...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16496#discussion_r97051937
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityGenerator.scala ---
    @@ -0,0 +1,91 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources.csv
    +
    +import java.io.Writer
    +
    +import com.univocity.parsers.csv.CsvWriter
    +
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.catalyst.util.DateTimeUtils
    +import org.apache.spark.sql.types._
    +
    +private[csv] class UnivocityGenerator(
    +    schema: StructType,
    +    writer: Writer,
    +    options: CSVOptions = new CSVOptions(Map.empty[String, String])) {
    +  private val writerSettings = options.asWriterSettings
    +  writerSettings.setHeaders(schema.fieldNames: _*)
    +  private val gen = new CsvWriter(writer, writerSettings)
    +  private var printHeader = options.headerFlag
    +
    +  // A `ValueConverter` is responsible for converting a value of an `InternalRow` to `String`.
    +  // When the value is null, this converter should not be called.
    +  private type ValueConverter = (InternalRow, Int) => String
    +
    +  // `ValueConverter`s for all values in the fields of the schema
    +  private val valueConverters: Array[ValueConverter] =
    +    schema.map(_.dataType).map(makeConverter).toArray
    +
    +  private def makeConverter(dataType: DataType): ValueConverter = dataType match {
    +    case DateType =>
    +      (row: InternalRow, ordinal: Int) =>
    +        options.dateFormat.format(DateTimeUtils.toJavaDate(row.getInt(ordinal)))
    +
    +    case TimestampType =>
    +      (row: InternalRow, ordinal: Int) =>
    +        options.timestampFormat.format(DateTimeUtils.toJavaTimestamp(row.getLong(ordinal)))
    +
    +    case udt: UserDefinedType[_] => makeConverter(udt.sqlType)
    +
    +    case dt: DataType =>
    +      (row: InternalRow, ordinal: Int) =>
    +        row.get(ordinal, dt).toString
    +  }
    +
    +  private def convertRow(row: InternalRow): Seq[String] = {
    +    var i = 0
    +    val values = new Array[String](row.numFields)
    +    while (i < row.numFields) {
    +      if (!row.isNullAt(i)) {
    +        values(i) = valueConverters(i).apply(row, i)
    +      } else {
    +        values(i) = options.nullValue
    +      }
    +      i += 1
    +    }
    +    values
    +  }
    +
    +  /**
    +   * Writes a single InternalRow to CSV using Univocity
    +   *
    +   * @param row The row to convert
    --- End diff --
    
    Ah, let me just remove this.
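
    As an aside, the converter pattern in the quoted diff can be sketched without Spark. It builds one `ValueConverter` per field once from the schema. Null fields are replaced by `nullValue` rather than passed to a converter. Everything in the sketch is hypothetical: `SimpleType`, `DateT`, `TimestampT`, `OtherT` and `ConverterSketch` stand in for Spark's types. Fixed ISO-style formats in UTC stand in for the configurable `options.dateFormat` / `options.timestampFormat`. The sketch assumes Spark's internal encodings: dates are `Int` days since the epoch and timestamps are `Long` microseconds since the epoch.

    ```scala
    import java.time.{Instant, LocalDate, ZoneOffset}
    import java.time.format.DateTimeFormatter

    // Hypothetical type tags standing in for Spark's DataType hierarchy.
    sealed trait SimpleType
    case object DateT extends SimpleType      // value: Int days since 1970-01-01
    case object TimestampT extends SimpleType // value: Long microseconds since the epoch
    case object OtherT extends SimpleType     // value: anything, rendered with toString

    object ConverterSketch {
      // Mirrors `ValueConverter`: (row, ordinal) => String; never called on nulls.
      type ValueConverter = (IndexedSeq[Any], Int) => String

      // Assumed fixed formats; the real code uses configurable options.
      private val dateFormat = DateTimeFormatter.ISO_LOCAL_DATE
      private val timestampFormat =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss").withZone(ZoneOffset.UTC)

      def makeConverter(dataType: SimpleType): ValueConverter = dataType match {
        case DateT =>
          (row, i) => LocalDate.ofEpochDay(row(i).asInstanceOf[Int].toLong).format(dateFormat)
        case TimestampT =>
          (row, i) => {
            val micros = row(i).asInstanceOf[Long]
            // Split microseconds into whole seconds plus a non-negative nano adjustment.
            val instant = Instant.ofEpochSecond(
              Math.floorDiv(micros, 1000000L), Math.floorMod(micros, 1000000L) * 1000L)
            timestampFormat.format(instant)
          }
        case OtherT =>
          (row, i) => row(i).toString
      }

      // Converters are built once per schema and reused for every row;
      // null fields become `nullValue` without invoking a converter.
      def convertRow(converters: Array[ValueConverter], nullValue: String)(
          row: IndexedSeq[Any]): Array[String] = {
        val values = new Array[String](row.length)
        var i = 0
        while (i < row.length) {
          values(i) = if (row(i) == null) nullValue else converters(i)(row, i)
          i += 1
        }
        values
      }
    }
    ```

    Building the converters once avoids re-matching on the data type for every value of every row. That is the motivation for the `valueConverters` array in the diff.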


[GitHub] spark pull request #16496: [SPARK-16101][SQL] Refactoring CSV write path to ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16496#discussion_r97044269
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityGenerator.scala ---
    @@ -0,0 +1,91 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources.csv
    +
    +import java.io.Writer
    +
    +import com.univocity.parsers.csv.CsvWriter
    +
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.catalyst.util.DateTimeUtils
    +import org.apache.spark.sql.types._
    +
    +private[csv] class UnivocityGenerator(
    +    schema: StructType,
    +    writer: Writer,
    +    options: CSVOptions = new CSVOptions(Map.empty[String, String])) {
    +  private val writerSettings = options.asWriterSettings
    +  writerSettings.setHeaders(schema.fieldNames: _*)
    +  private val gen = new CsvWriter(writer, writerSettings)
    +  private var printHeader = options.headerFlag
    +
    +  // A `ValueConverter` is responsible for converting a value of an `InternalRow` to `String`.
    +  // When the value is null, this converter should not be called.
    +  private type ValueConverter = (InternalRow, Int) => String
    +
    +  // `ValueConverter`s for all values in the fields of the schema
    +  private val valueConverters: Array[ValueConverter] =
    +    schema.map(_.dataType).map(makeConverter).toArray
    +
    +  private def makeConverter(dataType: DataType): ValueConverter = dataType match {
    +    case DateType =>
    +      (row: InternalRow, ordinal: Int) =>
    +        options.dateFormat.format(DateTimeUtils.toJavaDate(row.getInt(ordinal)))
    +
    +    case TimestampType =>
    +      (row: InternalRow, ordinal: Int) =>
    +        options.timestampFormat.format(DateTimeUtils.toJavaTimestamp(row.getLong(ordinal)))
    +
    +    case udt: UserDefinedType[_] => makeConverter(udt.sqlType)
    +
    +    case dt: DataType =>
    +      (row: InternalRow, ordinal: Int) =>
    +        row.get(ordinal, dt).toString
    +  }
    +
    +  private def convertRow(row: InternalRow): Seq[String] = {
    +    var i = 0
    +    val values = new Array[String](row.numFields)
    +    while (i < row.numFields) {
    +      if (!row.isNullAt(i)) {
    +        values(i) = valueConverters(i).apply(row, i)
    +      } else {
    +        values(i) = options.nullValue
    +      }
    +      i += 1
    +    }
    +    values
    +  }
    +
    +  /**
    +   * Writes a single InternalRow to CSV using Univocity
    +   *
    +   * @param row The row to convert
    --- End diff --
    
    Actually, I think it's too obvious; we don't need this param doc.


[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16496
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16496
  
    LGTM


[GitHub] spark pull request #16496: [SPARK-16101][SQL] Refactoring CSV write path to ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16496#discussion_r97044177
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityGenerator.scala ---
    @@ -0,0 +1,91 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources.csv
    +
    +import java.io.Writer
    +
    +import com.univocity.parsers.csv.CsvWriter
    +
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.catalyst.util.DateTimeUtils
    +import org.apache.spark.sql.types._
    +
    +private[csv] class UnivocityGenerator(
    +    schema: StructType,
    +    writer: Writer,
    +    options: CSVOptions = new CSVOptions(Map.empty[String, String])) {
    +  private val writerSettings = options.asWriterSettings
    +  writerSettings.setHeaders(schema.fieldNames: _*)
    +  private val gen = new CsvWriter(writer, writerSettings)
    +  private var printHeader = options.headerFlag
    +
    +  // A `ValueConverter` is responsible for converting a value of an `InternalRow` to `String`.
    +  // When the value is null, this converter should not be called.
    +  private type ValueConverter = (InternalRow, Int) => String
    +
    +  // `ValueConverter`s for all values in the fields of the schema
    +  private val valueConverters: Array[ValueConverter] =
    +    schema.map(_.dataType).map(makeConverter).toArray
    +
    +  private def makeConverter(dataType: DataType): ValueConverter = dataType match {
    +    case DateType =>
    +      (row: InternalRow, ordinal: Int) =>
    +        options.dateFormat.format(DateTimeUtils.toJavaDate(row.getInt(ordinal)))
    +
    +    case TimestampType =>
    +      (row: InternalRow, ordinal: Int) =>
    +        options.timestampFormat.format(DateTimeUtils.toJavaTimestamp(row.getLong(ordinal)))
    +
    +    case udt: UserDefinedType[_] => makeConverter(udt.sqlType)
    +
    +    case dt: DataType =>
    +      (row: InternalRow, ordinal: Int) =>
    +        row.get(ordinal, dt).toString
    +  }
    +
    +  private def convertRow(row: InternalRow): Seq[String] = {
    +    var i = 0
    +    val values = new Array[String](row.numFields)
    +    while (i < row.numFields) {
    +      if (!row.isNullAt(i)) {
    +        values(i) = valueConverters(i).apply(row, i)
    +      } else {
    +        values(i) = options.nullValue
    +      }
    +      i += 1
    +    }
    +    values
    +  }
    +
    +  /**
    +   * Writes a single InternalRow to CSV using Univocity
    +   *
    +   * @param row The row to convert
    --- End diff --
    
    nit: the row to write


[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16496
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71019/
    Test FAILed.


[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16496
  
    **[Test build #71025 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71025/testReport)** for PR 16496 at commit [`943ebb7`](https://github.com/apache/spark/commit/943ebb7691f5c0591764492747c78e4bbba25b46).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16496
  
    **[Test build #71718 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71718/testReport)** for PR 16496 at commit [`a507795`](https://github.com/apache/spark/commit/a5077955f8648c17ed17c908dcbd188cdf57e829).


[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16496
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71025/
    Test FAILed.


[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16496
  
    thanks, merging to master!


[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16496
  
    Thank you!


[GitHub] spark issue #16496: [SPARK-16101][SQL] Refactoring CSV write path to be cons...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16496
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71016/
    Test PASSed.

