Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2017/03/21 14:27:23 UTC

[GitHub] spark pull request #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/17377

    [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as enum and update related comments

    ## What changes were proposed in this pull request?
    
    This PR proposes making the `mode` options in both CSV and JSON use an enumeration, and fixes some comments related to the previous fix.
    
    Also, this PR modifies some tests related to parse modes.
    
    ## How was this patch tested?
    
    Modified unit tests in both `CSVSuite.scala` and `JsonSuite.scala`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-19949

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17377.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17377
    

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as enum a...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17377
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as enum a...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17377
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17377#discussion_r107385007
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ParseMode.scala ---
    @@ -17,25 +17,35 @@
     
     package org.apache.spark.sql.catalyst.util
     
    -object ParseModes {
    -  val PERMISSIVE_MODE = "PERMISSIVE"
    -  val DROP_MALFORMED_MODE = "DROPMALFORMED"
    -  val FAIL_FAST_MODE = "FAILFAST"
    +import org.apache.spark.internal.Logging
     
    -  val DEFAULT = PERMISSIVE_MODE
    +object ParseMode extends Enumeration with Logging {
    --- End diff --
    
    it's not public, not a big deal




[GitHub] spark issue #17377: [SPARK-19949][SQL][FOLLOW-UP] Clean up parse modes and u...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17377
  
    **[Test build #75049 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75049/testReport)** for PR 17377 at commit [`4b32536`](https://github.com/apache/spark/commit/4b32536e141e652dc65c0623c48d16a124ea3568).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `sealed trait ParseMode `




[GitHub] spark pull request #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17377#discussion_r107169305
  
    --- Diff: python/pyspark/sql/readwriter.py ---
    @@ -369,10 +369,10 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
             :param maxCharsPerColumn: defines the maximum number of characters allowed for any given
                                       value being read. If None is set, it uses the default value,
                                       ``-1`` meaning unlimited length.
    -        :param maxMalformedLogPerPartition: sets the maximum number of malformed rows Spark will
    -                                            log for each partition. Malformed records beyond this
    -                                            number will be ignored. If None is set, it
    -                                            uses the default value, ``10``.
    +        :param maxMalformedLogPerPartition: previously sets the maximum number of malformed rows
    --- End diff --
    
    We can't just remove this option; doing so would break existing Python code that passes these options as positional arguments.
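
A minimal Python sketch of the concern above. The function names and the shortened parameter lists are hypothetical, chosen only for illustration; the real `csv` readers take many more parameters. It shows how dropping a parameter from the middle of a signature shifts every later positional argument:

```python
# Hypothetical, shortened signatures for illustration; not the real PySpark API.

def csv_before(path, max_chars=None, max_malformed_log=None, mode=None):
    """Original signature: `mode` is the fourth positional parameter."""
    return {"path": path, "max_chars": max_chars, "mode": mode}


def csv_param_removed(path, max_chars=None, mode=None):
    """The option is removed outright, so `mode` moves up one slot."""
    return {"path": path, "max_chars": max_chars, "mode": mode}


def csv_param_kept(path, max_chars=None, max_malformed_log=None, mode=None):
    """The option stays in the signature, but its value is ignored."""
    return {"path": path, "max_chars": max_chars, "mode": mode}


# An existing caller that passes everything positionally.
args = ("data.csv", -1, 10, "FAILFAST")

print(csv_before(*args)["mode"])
print(csv_param_kept(*args)["mode"])

try:
    csv_param_removed(*args)
except TypeError as err:
    # Four positional arguments no longer fit the three-parameter signature.
    print("removed parameter breaks the call:", err)

# With one argument fewer the call succeeds, but 10 is silently taken as the mode.
print(csv_param_removed("data.csv", -1, 10)["mode"])
```

Keeping the parameter in place and ignoring its value, which appears to be what the updated docstring describes, leaves positional callers unaffected.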




[GitHub] spark issue #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as enum a...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17377
  
    So far, the documentation of these data source options is missing. In the last release, we cleaned up the [JDBC options](http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases) in the documentation. Do you think you have the bandwidth to do the same for CSV and JSON?
    





[GitHub] spark issue #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as enum a...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17377
  
    **[Test build #75049 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75049/testReport)** for PR 17377 at commit [`4b32536`](https://github.com/apache/spark/commit/4b32536e141e652dc65c0623c48d16a124ea3568).




[GitHub] spark pull request #17377: [SPARK-19949][SQL][FOLLOW-UP] Clean up parse mode...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17377




[GitHub] spark pull request #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17377#discussion_r107169934
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ParseMode.scala ---
    @@ -17,25 +17,35 @@
     
     package org.apache.spark.sql.catalyst.util
     
    -object ParseModes {
    -  val PERMISSIVE_MODE = "PERMISSIVE"
    -  val DROP_MALFORMED_MODE = "DROPMALFORMED"
    -  val FAIL_FAST_MODE = "FAILFAST"
    +import org.apache.spark.internal.Logging
     
    -  val DEFAULT = PERMISSIVE_MODE
    +object ParseMode extends Enumeration with Logging {
    +  type ParseMode = Value
     
    -  def isValidMode(mode: String): Boolean = {
    -    mode.toUpperCase match {
    -      case PERMISSIVE_MODE | DROP_MALFORMED_MODE | FAIL_FAST_MODE => true
    -      case _ => false
    -    }
    -  }
    +  /**
    +   * This mode permissively parses the records.
    +   */
    +  val Permissive = Value("PERMISSIVE")
    +
    +  /**
    +   * This mode ignores the whole corrupted records.
    +   */
    +  val DropMalformed = Value("DROPMALFORMED")
    +
    +  /**
    +   * This mode throws an exception when it meets corrupted records.
    +   */
    +  val FailFast = Value("FAILFAST")
     
    -  def isDropMalformedMode(mode: String): Boolean = mode.toUpperCase == DROP_MALFORMED_MODE
    -  def isFailFastMode(mode: String): Boolean = mode.toUpperCase == FAIL_FAST_MODE
    -  def isPermissiveMode(mode: String): Boolean = if (isValidMode(mode))  {
    -    mode.toUpperCase == PERMISSIVE_MODE
    -  } else {
    -    true // We default to permissive is the mode string is not valid
    +  /**
    +   * Returns `ParseMode` enum from the given string.
    +   */
    +  def fromString(mode: String): ParseMode = mode.toUpperCase match {
    +    case "PERMISSIVE" => ParseMode.Permissive
    --- End diff --
    
    We can't use `Permissive.toString` as a pattern here, because a pattern needs a stable identifier and a method call is not one:
    
    ```
    Error:(34, 33) stable identifier required, but ParseMode.Permissive.toString found.
          case ParseMode.Permissive.toString => ParseMode.Permissive
                                    ^
    ```
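
For readers who want to experiment with the lookup outside Scala, here is a rough Python analog of `fromString` from the diff above. It is a sketch, not the PR's code: the names `ParseMode.from_string` and `logger` are made up for this example, and the fallback to `PERMISSIVE` on an unknown string is an assumption carried over from the removed `isPermissiveMode` helper.

```python
import enum
import logging

logger = logging.getLogger(__name__)


class ParseMode(enum.Enum):
    """Python stand-in for the three parse modes in the diff."""

    PERMISSIVE = "PERMISSIVE"
    DROP_MALFORMED = "DROPMALFORMED"
    FAIL_FAST = "FAILFAST"

    @classmethod
    def from_string(cls, mode: str) -> "ParseMode":
        """Return the mode matching `mode`, ignoring case."""
        try:
            # Enum lookup by value raises ValueError for unknown strings.
            return cls(mode.upper())
        except ValueError:
            # Assumption: unknown modes fall back to PERMISSIVE with a warning.
            logger.warning(
                "%s is not a valid parse mode. Using %s.",
                mode,
                cls.PERMISSIVE.value,
            )
            return cls.PERMISSIVE


print(ParseMode.from_string("failfast"))
print(ParseMode.from_string("no-such-mode"))
```

Callers can then compare against `ParseMode.FAIL_FAST` and the other members instead of repeating upper-cased string comparisons at every use site.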




[GitHub] spark issue #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as enum a...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17377
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74985/
    Test PASSed.




[GitHub] spark issue #17377: [SPARK-19949][SQL][FOLLOW-UP] Clean up parse modes and u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17377
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75049/
    Test PASSed.




[GitHub] spark issue #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as enum a...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17377
  
    **[Test build #74985 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74985/testReport)** for PR 17377 at commit [`bf155ab`](https://github.com/apache/spark/commit/bf155ab309cc23f880ff1cb27de1726a18530b25).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as enum a...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17377
  
    I raised a JIRA about the documentation with my humble suggestion in [SPARK-20055](https://issues.apache.org/jira/browse/SPARK-20055).




[GitHub] spark issue #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as enum a...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17377
  
    **[Test build #75040 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75040/testReport)** for PR 17377 at commit [`80e5be8`](https://github.com/apache/spark/commit/80e5be83c5449f0e24ea50c8002195fffcfa798a).




[GitHub] spark issue #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as enum a...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17377
  
    cc @cloud-fan and @gatorsmile 




[GitHub] spark pull request #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17377#discussion_r107171523
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -1083,83 +1083,59 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
       }
     
       test("Corrupt records: PERMISSIVE mode, without designated column for malformed records") {
    -    withTempView("jsonTable") {
    -      val schema = StructType(
    -        StructField("a", StringType, true) ::
    -          StructField("b", StringType, true) ::
    -          StructField("c", StringType, true) :: Nil)
    +    val schema = StructType(
    +      StructField("a", StringType, true) ::
    +        StructField("b", StringType, true) ::
    +        StructField("c", StringType, true) :: Nil)
     
    -      val jsonDF = spark.read.schema(schema).json(corruptRecords)
    -      jsonDF.createOrReplaceTempView("jsonTable")
    +    val jsonDF = spark.read.schema(schema).json(corruptRecords)
     
    -      checkAnswer(
    -        sql(
    -          """
    -            |SELECT a, b, c
    -            |FROM jsonTable
    -          """.stripMargin),
    -        Seq(
    -          // Corrupted records are replaced with null
    -          Row(null, null, null),
    -          Row(null, null, null),
    -          Row(null, null, null),
    -          Row("str_a_4", "str_b_4", "str_c_4"),
    -          Row(null, null, null))
    -      )
    -    }
    +    checkAnswer(
    +      jsonDF.select($"a", $"b", $"c"),
    +      Seq(
    +        // Corrupted records are replaced with null
    +        Row(null, null, null),
    +        Row(null, null, null),
    +        Row(null, null, null),
    +        Row("str_a_4", "str_b_4", "str_c_4"),
    +        Row(null, null, null))
    +    )
       }
     
       test("Corrupt records: PERMISSIVE mode, with designated column for malformed records") {
         // Test if we can query corrupt records.
         withSQLConf(SQLConf.COLUMN_NAME_OF_CORRUPT_RECORD.key -> "_unparsed") {
    -      withTempView("jsonTable") {
    -        val jsonDF = spark.read.json(corruptRecords)
    -        jsonDF.createOrReplaceTempView("jsonTable")
    -        val schema = StructType(
    -          StructField("_unparsed", StringType, true) ::
    +      val jsonDF = spark.read.json(corruptRecords)
    +      val schema = StructType(
    +        StructField("_unparsed", StringType, true) ::
               StructField("a", StringType, true) ::
               StructField("b", StringType, true) ::
               StructField("c", StringType, true) :: Nil)
     
    -        assert(schema === jsonDF.schema)
    --- End diff --
    
    Here too. While checking other related PRs, I saw some minor comments in https://github.com/apache/spark/pull/14929.
    
    The actual changes are as below:
    
    **From**
    
    ```
    withTempView("jsonTable") {
      ...
      jsonDF.createOrReplaceTempView("jsonTable")
      ...
        sql(
          """
            |SELECT a, b, c, _unparsed
            |FROM jsonTable
          """.stripMargin),
      ...
        sql(
          """
            |SELECT a, b, c
            |FROM jsonTable
            |WHERE _unparsed IS NULL
          """.stripMargin),
      ...
        sql(
         """
            |SELECT _unparsed
            |FROM jsonTable
            |WHERE _unparsed IS NOT NULL
          """.stripMargin),
    ...
    }
    ```
    
    **To**
    
    ```
    ...
    jsonDF.select($"a", $"b", $"c", $"_unparsed"),
    ...
    jsonDF.filter($"_unparsed".isNull).select($"a", $"b", $"c"),
    ...
    jsonDF.filter($"_unparsed".isNotNull).select($"_unparsed"),
    ...
    ```




[GitHub] spark issue #17377: [SPARK-19949][SQL][FOLLOW-UP] Clean up parse modes and u...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17377
  
    Thanks! Merging to master.




[GitHub] spark issue #17377: [SPARK-19949][SQL][FOLLOW-UP] Clean up parse modes and u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17377
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17377#discussion_r107169080
  
    --- Diff: python/pyspark/sql/streaming.py ---
    @@ -625,6 +625,10 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
             :param maxCharsPerColumn: defines the maximum number of characters allowed for any given
                                       value being read. If None is set, it uses the default value,
                                       ``-1`` meaning unlimited length.
    +        :param maxMalformedLogPerPartition: previously sets the maximum number of malformed rows
    --- End diff --
    
    It seems this documentation was missed. See above - https://github.com/apache/spark/pull/17377/files#diff-1ffa6007687db29eb32770f95d817144L572




[GitHub] spark issue #17377: [SPARK-19949][SQL][FOLLOW-UP] Clean up parse modes and u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17377
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as enum a...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17377
  
    **[Test build #74985 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74985/testReport)** for PR 17377 at commit [`bf155ab`](https://github.com/apache/spark/commit/bf155ab309cc23f880ff1cb27de1726a18530b25).




[GitHub] spark issue #17377: [SPARK-19949][SQL][FOLLOW-UP] Clean up parse modes and u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17377
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75050/
    Test PASSed.




[GitHub] spark issue #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as enum a...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17377
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75040/
    Test PASSed.




[GitHub] spark pull request #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17377#discussion_r107170585
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -1083,83 +1083,59 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
       }
     
       test("Corrupt records: PERMISSIVE mode, without designated column for malformed records") {
    -    withTempView("jsonTable") {
    -      val schema = StructType(
    -        StructField("a", StringType, true) ::
    -          StructField("b", StringType, true) ::
    -          StructField("c", StringType, true) :: Nil)
    +    val schema = StructType(
    --- End diff --
    
    While checking other related PRs, I saw some minor comments in https://github.com/apache/spark/pull/14929.
    
    The actual changes are as below:
    
    **From**
    
    ```
    withTempView("jsonTable") {
      ...
      jsonDF.createOrReplaceTempView("jsonTable")
      ...
        sql(
          """
            |SELECT a, b, c
            |FROM jsonTable
          """.stripMargin),
      ...
    }
    ```
    
    **To**
    
    ```
    jsonDF.select($"a", $"b", $"c"),
    ```




[GitHub] spark issue #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as enum a...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17377
  
    **[Test build #75050 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75050/testReport)** for PR 17377 at commit [`1f93927`](https://github.com/apache/spark/commit/1f939277eced6acd786a00a7c2e0d6a0113c1a86).


[GitHub] spark issue #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as enum a...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17377
  
    **[Test build #75040 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75040/testReport)** for PR 17377 at commit [`80e5be8`](https://github.com/apache/spark/commit/80e5be83c5449f0e24ea50c8002195fffcfa798a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17377#discussion_r107243921
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ParseMode.scala ---
    @@ -17,25 +17,35 @@
     
     package org.apache.spark.sql.catalyst.util
     
    -object ParseModes {
    -  val PERMISSIVE_MODE = "PERMISSIVE"
    -  val DROP_MALFORMED_MODE = "DROPMALFORMED"
    -  val FAIL_FAST_MODE = "FAILFAST"
    +import org.apache.spark.internal.Logging
     
    -  val DEFAULT = PERMISSIVE_MODE
    +object ParseMode extends Enumeration with Logging {
    --- End diff --
    
    Not sure whether we should use a Java enum instead. cc @cloud-fan
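    
    For context, a minimal sketch of the `scala.Enumeration` shape shown in the diff above. It leaves out Spark's `Logging` trait so that it stands alone, and the value names and the `fromString` helper are hypothetical. The fallback to `PERMISSIVE` is an assumption based on the old `DEFAULT` constant.
    
    ```scala
    // Sketch only: not the PR's actual code.
    object ParseMode extends Enumeration {
      type ParseMode = Value

      // Names match the string constants of the removed ParseModes object.
      val Permissive: Value = Value("PERMISSIVE")
      val DropMalformed: Value = Value("DROPMALFORMED")
      val FailFast: Value = Value("FAILFAST")

      // Assumption: lookup is case-insensitive and unknown names fall back
      // to PERMISSIVE, which was the old DEFAULT.
      def fromString(mode: String): Value = {
        val upper = mode.toUpperCase(java.util.Locale.ROOT)
        values.find(_.toString == upper).getOrElse(Permissive)
      }
    }
    ```
    
    One trade-off with this shape: all values share the single type `ParseMode.Value`, so the compiler cannot check a `match` over them for exhaustiveness.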


[GitHub] spark issue #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as enum a...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17377
  
    Definitely. Thanks for asking. Let me open another PR for both soon.


[GitHub] spark pull request #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17377#discussion_r107385370
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ParseMode.scala ---
    @@ -17,25 +17,35 @@
     
     package org.apache.spark.sql.catalyst.util
     
    -object ParseModes {
    -  val PERMISSIVE_MODE = "PERMISSIVE"
    -  val DROP_MALFORMED_MODE = "DROPMALFORMED"
    -  val FAIL_FAST_MODE = "FAILFAST"
    +import org.apache.spark.internal.Logging
     
    -  val DEFAULT = PERMISSIVE_MODE
    +object ParseMode extends Enumeration with Logging {
    --- End diff --
    
    It seems people usually use `sealed trait` and `case object` to implement enums in Scala; see http://stackoverflow.com/questions/1898932/case-objects-vs-enumerations-in-scala
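    
    A minimal sketch of that pattern applied to the parse modes, assuming hypothetical object names (`PermissiveMode`, `DropMalformedMode`, `FailFastMode`) and hypothetical helpers `fromString` and `describe`. The fallback to the permissive mode is also an assumption.
    
    ```scala
    // Sketch only: names are illustrative, not the PR's final code.
    sealed trait ParseMode {
      def name: String
    }

    case object PermissiveMode extends ParseMode { val name = "PERMISSIVE" }
    case object DropMalformedMode extends ParseMode { val name = "DROPMALFORMED" }
    case object FailFastMode extends ParseMode { val name = "FAILFAST" }

    object ParseMode {
      private val all: Seq[ParseMode] =
        Seq(PermissiveMode, DropMalformedMode, FailFastMode)

      // Assumption: lookup is case-insensitive and unknown names fall back
      // to PermissiveMode.
      def fromString(mode: String): ParseMode = {
        val upper = mode.toUpperCase(java.util.Locale.ROOT)
        all.find(_.name == upper).getOrElse(PermissiveMode)
      }

      // Because the trait is sealed, the compiler warns if a case is missing.
      def describe(mode: ParseMode): String = mode match {
        case PermissiveMode    => "keep malformed records"
        case DropMalformedMode => "drop malformed records"
        case FailFastMode      => "fail on the first malformed record"
      }
    }
    ```
    
    Compared with `scala.Enumeration`, each mode here has its own type, which is what makes the exhaustiveness check in `describe` possible.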


[GitHub] spark pull request #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17377#discussion_r107235659
  
    --- Diff: python/pyspark/sql/streaming.py ---
    @@ -625,6 +625,10 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
             :param maxCharsPerColumn: defines the maximum number of characters allowed for any given
                                       value being read. If None is set, it uses the default value,
                                       ``-1`` meaning unlimited length.
    +        :param maxMalformedLogPerPartition: previously sets the maximum number of malformed rows
    +                                            Spark will log. However, it does not log them after
    +                                            2.2.0. This parameter exists only for backwards
    +                                            compatibility for positional arguments.
    --- End diff --
    
    The same here.


[GitHub] spark issue #17377: [SPARK-19949][SQL][FOLLOW-UP] Clean up parse modes and u...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17377
  
    **[Test build #75050 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75050/testReport)** for PR 17377 at commit [`1f93927`](https://github.com/apache/spark/commit/1f939277eced6acd786a00a7c2e0d6a0113c1a86).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #17377: [SPARK-19949][SQL][FOLLOW-UP] Make parse modes as...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17377#discussion_r107235501
  
    --- Diff: python/pyspark/sql/readwriter.py ---
    @@ -369,10 +369,10 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
             :param maxCharsPerColumn: defines the maximum number of characters allowed for any given
                                       value being read. If None is set, it uses the default value,
                                       ``-1`` meaning unlimited length.
    -        :param maxMalformedLogPerPartition: sets the maximum number of malformed rows Spark will
    -                                            log for each partition. Malformed records beyond this
    -                                            number will be ignored. If None is set, it
    -                                            uses the default value, ``10``.
    +        :param maxMalformedLogPerPartition: previously sets the maximum number of malformed rows
    +                                            Spark will log. However, it does not log them after
    +                                            2.2.0. This parameter exists only for backwards
    +                                            compatibility for positional arguments.
    --- End diff --
    
    Let us simplify it to:
    > This parameter is no longer used since Spark 2.2.0. If specified, it is ignored.


[GitHub] spark issue #17377: [SPARK-19949][SQL][FOLLOW-UP] Clean up parse modes and u...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/17377
  
    LGTM

