Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2016/01/25 06:00:20 UTC

[GitHub] spark pull request: [SPARK-12901][SQL] Refactor options for JSON a...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/10895

    [SPARK-12901][SQL] Refactor options for JSON and CSV datasource (not case class and same format).

    https://issues.apache.org/jira/browse/SPARK-12901
    This PR refactors the options in JSON and CSV datasources.
    
    In more detail:
    
    1. `JSONOptions` uses the same format as `CSVOptions`.
    2. Both are now regular classes rather than case classes.
    3. `CSVRelation` no longer has to be serializable (it was declared `with Serializable`, which I removed).
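To illustrate the "same format" in item 1, here is a minimal sketch of an options class that is built from a `Map[String, String]` instead of taking one constructor field per option. The class name `ExampleJsonOptions` is hypothetical, and the map keys are assumed to mirror the former case-class field names; neither is taken from the Spark source. The defaults follow the removed `JSONOptions` case class quoted in the review diff.

```scala
// Hypothetical sketch, not the actual Spark class.
// Assumption: each map key has the same name as the old case-class field.
class ExampleJsonOptions(parameters: Map[String, String]) {

  // Each option is parsed once at construction time,
  // falling back to a default when the key is absent.
  val samplingRatio: Double =
    parameters.get("samplingRatio").map(_.toDouble).getOrElse(1.0)

  val primitivesAsString: Boolean =
    parameters.get("primitivesAsString").map(_.toBoolean).getOrElse(false)

  val allowComments: Boolean =
    parameters.get("allowComments").map(_.toBoolean).getOrElse(false)

  val allowSingleQuotes: Boolean =
    parameters.get("allowSingleQuotes").map(_.toBoolean).getOrElse(true)

  // Optional setting: stays None unless the caller supplies it.
  val compressionCodec: Option[String] = parameters.get("compressionCodec")
}
```

A caller would pass only the options it wants to override, for example `new ExampleJsonOptions(Map("samplingRatio" -> "0.5"))`, and every other field keeps its default.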

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-12901

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10895.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10895
    
----
commit 3770ffba766f014dc7a8b2332d7d8ab40dc17750
Author: hyukjinkwon <gu...@gmail.com>
Date:   2016-01-25T04:54:44Z

    Refactor CSVParamters and JSONOptions.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12901][SQL] Refactor options for JSON a...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10895#issuecomment-174418444
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49977/




[GitHub] spark pull request: [SPARK-12901][SQL] Refactor options for JSON a...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10895#discussion_r50709870
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala ---
    @@ -19,13 +19,12 @@ package org.apache.spark.sql.execution.datasources.csv
     
     import java.nio.charset.Charset
     
    -import org.apache.hadoop.io.compress._
    -
     import org.apache.spark.Logging
     import org.apache.spark.sql.execution.datasources.CompressionCodecs
    -import org.apache.spark.util.Utils
     
    -private[sql] case class CSVParameters(@transient parameters: Map[String, String]) extends Logging {
    +private[sql] class CSVOptions(
    +    @transient parameters: Map[String, String])
    --- End diff --
    
    I think you need `@transient private val` here.
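As a rough illustration of what the suggested `@transient private val` does, the sketch below declares the constructor parameter as a transient field and round-trips the object through plain Java serialization. `ExampleCsvOptions`, `TransientDemo`, `parametersDropped`, and the `"delimiter"` key are all hypothetical names chosen for this example; they are assumptions, not the Spark code under review.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.io.{ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for an options class; not the actual CSVOptions.
class ExampleCsvOptions(@transient private val parameters: Map[String, String])
    extends Serializable {

  // Parsed eagerly at construction, so the value is stored in its own
  // (non-transient) field and is written out with the object.
  val delimiter: String = parameters.getOrElse("delimiter", ",")

  // True when the transient map was skipped during serialization
  // and is therefore null on the deserialized copy.
  def parametersDropped: Boolean = parameters == null
}

object TransientDemo {
  // Serialize a value to bytes and read it back with Java serialization.
  def roundTrip[T <: java.io.Serializable](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

After `TransientDemo.roundTrip(new ExampleCsvOptions(Map("delimiter" -> "|")))`, the copy still reports `delimiter` as `"|"`, while the raw `parameters` map is not carried along. Options that must survive serialization therefore need to be parsed into ordinary `val`s up front, as in this sketch.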




[GitHub] spark pull request: [SPARK-12901][SQL] Refactor options for JSON a...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10895#issuecomment-174441100
  
    Thanks - I've merged this.





[GitHub] spark pull request: [SPARK-12901][SQL] Refactor options for JSON a...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10895#discussion_r50709959
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JSONOptions.scala ---
    @@ -26,16 +26,30 @@ import org.apache.spark.sql.execution.datasources.CompressionCodecs
      *
      * Most of these map directly to Jackson's internal options, specified in [[JsonParser.Feature]].
      */
    -case class JSONOptions(
    -    samplingRatio: Double = 1.0,
    -    primitivesAsString: Boolean = false,
    -    allowComments: Boolean = false,
    -    allowUnquotedFieldNames: Boolean = false,
    -    allowSingleQuotes: Boolean = true,
    -    allowNumericLeadingZeros: Boolean = false,
    -    allowNonNumericNumbers: Boolean = false,
    -    allowBackslashEscapingAnyCharacter: Boolean = false,
    -    compressionCodec: Option[String] = None) {
    +private[sql] class JSONOptions(
    +    @transient parameters: Map[String, String])
    --- End diff --
    
    same here




[GitHub] spark pull request: [SPARK-12901][SQL] Refactor options for JSON a...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10895#issuecomment-174405059
  
    **[Test build #49977 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49977/consoleFull)** for PR 10895 at commit [`3770ffb`](https://github.com/apache/spark/commit/3770ffba766f014dc7a8b2332d7d8ab40dc17750).




[GitHub] spark pull request: [SPARK-12901][SQL] Refactor options for JSON a...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/10895#issuecomment-174671310
  
    I have pushed https://github.com/apache/spark/commit/00026fa9912ecee5637f1e7dd222f977f31f6766 to fix the 2.11 build.




[GitHub] spark pull request: [SPARK-12901][SQL] Refactor options for JSON a...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/10895#issuecomment-174717824
  
    @yhuai Sorry, I just checked this notification. Thank you.




[GitHub] spark pull request: [SPARK-12901][SQL] Refactor options for JSON a...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10895#discussion_r50738824
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala ---
    @@ -19,13 +19,12 @@ package org.apache.spark.sql.execution.datasources.csv
     
     import java.nio.charset.Charset
     
    -import org.apache.hadoop.io.compress._
    -
     import org.apache.spark.Logging
     import org.apache.spark.sql.execution.datasources.CompressionCodecs
    -import org.apache.spark.util.Utils
     
    -private[sql] case class CSVParameters(@transient parameters: Map[String, String]) extends Logging {
    +private[sql] class CSVOptions(
    +    @transient parameters: Map[String, String])
    --- End diff --
    
    Yep, that's the right fix.




[GitHub] spark pull request: [SPARK-12901][SQL] Refactor options for JSON a...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10895#issuecomment-174418164
  
    **[Test build #49977 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49977/consoleFull)** for PR 10895 at commit [`3770ffb`](https://github.com/apache/spark/commit/3770ffba766f014dc7a8b2332d7d8ab40dc17750).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12901][SQL] Refactor options for JSON a...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10895#discussion_r50709990
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala ---
    @@ -19,13 +19,12 @@ package org.apache.spark.sql.execution.datasources.csv
     
     import java.nio.charset.Charset
     
    -import org.apache.hadoop.io.compress._
    -
     import org.apache.spark.Logging
     import org.apache.spark.sql.execution.datasources.CompressionCodecs
    -import org.apache.spark.util.Utils
     
    -private[sql] case class CSVParameters(@transient parameters: Map[String, String]) extends Logging {
    +private[sql] class CSVOptions(
    +    @transient parameters: Map[String, String])
    --- End diff --
    
    https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-sbt-scala-2.11/276/console is the failed build.




[GitHub] spark pull request: [SPARK-12901][SQL] Refactor options for JSON a...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10895




[GitHub] spark pull request: [SPARK-12901][SQL] Refactor options for JSON a...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10895#issuecomment-174418440
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-12901][SQL] Refactor options for JSON a...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/10895#issuecomment-174547794
  
    @HyukjinKwon If you get a chance, can you fix the Scala compilation?

