Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2016/05/22 13:14:01 UTC

[GitHub] spark pull request: [SPARK-15475][SQL] Support for reading text da...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/13254

    [SPARK-15475][SQL] Support for reading text data source without specifying schema

    ## What changes were proposed in this pull request?
    
    Currently, the text data source requires a schema.
    
    So the code below:
    
    ```scala
    emptyDf.write
      .format("text")
      .save(path.getCanonicalPath)
    
    val copyEmptyDf = spark.read
      .format("text")
      .load(path.getCanonicalPath)
    
    copyEmptyDf.show()
    ```
    
    throws the exception below:
    
    ```
    key not found: dataSchema
    java.util.NoSuchElementException: key not found: dataSchema
    	at scala.collection.MapLike$class.default(MapLike.scala:228)
    	at org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.default(ddl.scala:139)
    	at scala.collection.MapLike$class.apply(MapLike.scala:141)
    	at org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.apply(ddl.scala:139)
    	at org.apache.spark.sql.sources.SimpleTextSource.inferSchema(SimpleTextRelation.scala:43)
    	at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:347)
    	at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:347)
    	at scala.Option.orElse(Option.scala:289)
    ```
    
    This PR adds support for a default schema with unnamed columns, just like the CSV data source.
    
    ## How was this patch tested?
    
    Unit test in `SimpleTextHadoopFsRelationSuite`.
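
The stack trace in the description ends in a strict `Map` lookup (`CaseInsensitiveMap.apply`) for the `dataSchema` key. The following is a minimal, Spark-free sketch of that failure mode and of a fallback lookup. It is not the code from this patch: the object name, both helper functions, and the `"C0 STRING"` placeholder are assumptions made up for illustration, and the default schema the patch actually uses is not shown in this thread.

```scala
// Illustrative sketch only; none of these names come from Spark.
object DefaultSchemaSketch {
  // Hypothetical placeholder for "a default schema with unnamed columns".
  val PlaceholderSchema: String = "C0 STRING"

  // Strict lookup: Map.apply throws NoSuchElementException
  // when the "dataSchema" key is absent.
  def strictSchema(options: Map[String, String]): String =
    options("dataSchema")

  // Lenient lookup: fall back to the placeholder instead of throwing.
  def schemaOrDefault(options: Map[String, String]): String =
    options.getOrElse("dataSchema", PlaceholderSchema)
}
```

Calling `DefaultSchemaSketch.strictSchema(Map.empty)` fails in the same way as the trace above, while `schemaOrDefault(Map.empty)` returns the placeholder.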

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark text-msg

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13254.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13254
    
----
commit a542335f42956d63c7d4513bbaa9097406677c9d
Author: hyukjinkwon <gu...@gmail.com>
Date:   2016-05-22T13:09:17Z

    Support for reading text data source without specifying schema

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [WIP][SPARK-15475][SQL] Support for reading te...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13254#issuecomment-220834110
  
    **[Test build #59106 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59106/consoleFull)** for PR 13254 at commit [`0c58c4f`](https://github.com/apache/spark/commit/0c58c4ff9adf7bff61c3f07e39ec780b1599da55).




[GitHub] spark pull request: [SPARK-15476][SQL] Support for reading text da...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon closed the pull request at:

    https://github.com/apache/spark/pull/13254




[GitHub] spark pull request: [SPARK-15475][SQL] Support for reading text da...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13254#issuecomment-220870836
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [WIP][SPARK-15475][SQL] Support for reading te...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13254#issuecomment-220835089
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59104/
    Test PASSed.




[GitHub] spark pull request: [SPARK-15476][SQL] Support for reading text da...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/13254#issuecomment-221201817
  
    I was totally stupid. I was testing `test` instead of `text`. I am closing this; sorry for the cc.




[GitHub] spark pull request: [WIP][SPARK-15475][SQL] Support for reading te...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13254#issuecomment-220837154
  
    **[Test build #59106 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59106/consoleFull)** for PR 13254 at commit [`0c58c4f`](https://github.com/apache/spark/commit/0c58c4ff9adf7bff61c3f07e39ec780b1599da55).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [WIP][SPARK-15475][SQL] Support for reading te...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13254#issuecomment-220836507
  
    **[Test build #59105 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59105/consoleFull)** for PR 13254 at commit [`6e3a1b0`](https://github.com/apache/spark/commit/6e3a1b0e876ba2356a135d0231e66ab191301818).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15475][SQL] Support for reading text da...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/13254#issuecomment-220866532
  
    @andrewor14 Could you please take a look maybe?




[GitHub] spark pull request: [WIP][SPARK-15475][SQL] Support for reading te...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13254#issuecomment-220837218
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [WIP][SPARK-15475][SQL] Support for reading te...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13254#issuecomment-220837219
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59106/
    Test PASSed.




[GitHub] spark pull request: [SPARK-15475][SQL] Support for reading text da...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13254#issuecomment-220870762
  
    **[Test build #59117 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59117/consoleFull)** for PR 13254 at commit [`6239611`](https://github.com/apache/spark/commit/6239611ac0abff6f7468db7c3bebc855609ac265).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [WIP][SPARK-15475][SQL] Support for reading te...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13254#issuecomment-220835034
  
    **[Test build #59104 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59104/consoleFull)** for PR 13254 at commit [`a542335`](https://github.com/apache/spark/commit/a542335f42956d63c7d4513bbaa9097406677c9d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15475][SQL] Support for reading text da...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13254#issuecomment-220866525
  
    **[Test build #59117 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59117/consoleFull)** for PR 13254 at commit [`6239611`](https://github.com/apache/spark/commit/6239611ac0abff6f7468db7c3bebc855609ac265).




[GitHub] spark pull request: [SPARK-15475][SQL] Support for reading text da...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13254#issuecomment-220870837
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59117/
    Test PASSed.




[GitHub] spark pull request: [SPARK-15475][SQL] Support for reading text da...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13254#issuecomment-220831936
  
    **[Test build #59104 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59104/consoleFull)** for PR 13254 at commit [`a542335`](https://github.com/apache/spark/commit/a542335f42956d63c7d4513bbaa9097406677c9d).




[GitHub] spark pull request: [WIP][SPARK-15475][SQL] Support for reading te...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13254#issuecomment-220833389
  
    **[Test build #59105 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59105/consoleFull)** for PR 13254 at commit [`6e3a1b0`](https://github.com/apache/spark/commit/6e3a1b0e876ba2356a135d0231e66ab191301818).




[GitHub] spark pull request: [SPARK-15475][SQL] Support for reading text da...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13254#discussion_r64149589
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/HadoopFsRelationTest.scala ---
    @@ -515,20 +515,20 @@ abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils with Tes
             .save(subdir.getCanonicalPath)
     
           // Inferring schema should throw error as it should not find any file to infer
    -      val e = intercept[Exception] {
    -        spark.read.format(dataSourceName).load(dir.getCanonicalPath)
    -      }
    -
    -      e match {
    -        case _: AnalysisException =>
    -          assert(e.getMessage.contains("infer"))
    +      if (dataSourceName != classOf[SimpleTextSource].getCanonicalName) {
    --- End diff --
    
    A simple `dataSourceName != classOf[SimpleTextSource].getCanonicalName` check is added here because the text data source no longer throws an exception if `dataSchema` is not given.




[GitHub] spark pull request: [WIP][SPARK-15475][SQL] Support for reading te...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13254#issuecomment-220836541
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [WIP][SPARK-15475][SQL] Support for reading te...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13254#issuecomment-220835086
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-15476][SQL] Support for reading text da...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/13254#issuecomment-221199393
  
    Oh, I was doing this for a `test`. Sorry, I will update this.




[GitHub] spark pull request: [WIP][SPARK-15475][SQL] Support for reading te...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13254#issuecomment-220836542
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59105/
    Test PASSed.

