Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2016/02/19 11:21:08 UTC

[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/11270

    [SPARK-8000][SQL] Support for auto-detecting data sources.

    https://issues.apache.org/jira/browse/SPARK-8000
    
    This PR adds support for detecting the data source from the file extension.
    
    As I described in the comments, detection works as follows:
    
    If `format()` is not called, this tries to determine the data source from the file
    extension. The auto-detection is based on the given paths and recognizes glob patterns,
    but it does not recursively check sub-paths even if the given paths are directories.
    Source detection goes through the following steps:
    
       1. Check `provider` and use it if it is not `null`.
       2. If `provider` is not given, try to detect the source type by extension.
          At this point, it detects the source only if all the given paths have the same extension.
       3. If detection fails, use the data source given by `spark.sql.sources.default`.
    
    
    Tests have been added for each data source.
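
The three steps above can be sketched in plain Scala without any Spark dependencies. This is only an illustration under stated assumptions: `SourceDetection`, `detect`, and the extension table are hypothetical names and values, and `provider` is modelled as an `Option` rather than a nullable string. The PR's own logic lives in `ResolvedDataSource` and may differ.

```scala
import java.util.Locale

object SourceDetection {
  // Hypothetical extension-to-source table; the PR's actual mapping may differ.
  private val knownExtensions: Map[String, String] = Map(
    "json" -> "json",
    "parquet" -> "parquet",
    "orc" -> "orc",
    "csv" -> "csv",
    "txt" -> "text")

  // Lower-cased extension of the last path segment, or "" if it has none.
  private def extensionOf(path: String): String = {
    val name = path.substring(path.lastIndexOf('/') + 1)
    val dot = name.lastIndexOf('.')
    if (dot <= 0) "" else name.substring(dot + 1).toLowerCase(Locale.ROOT)
  }

  // Step 1: an explicit provider always wins.
  def detect(provider: Option[String], paths: Seq[String], default: String): String =
    provider.getOrElse {
      paths.map(extensionOf).distinct match {
        // Step 2: detect only if every path shares one known extension.
        case Seq(ext) => knownExtensions.getOrElse(ext, default)
        // Step 3: mixed extensions (or no paths) fall back to the default.
        case _ => default
      }
    }
}
```

For example, `SourceDetection.detect(None, Seq("/a/b.json", "/a/c.csv"), "parquet")` falls back to the default because the two paths disagree on the extension.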


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-8000

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11270.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11270
    
----
commit 23ba7266358a3de4800bb65da316c20f60bbf7a8
Author: hyukjinkwon <gu...@gmail.com>
Date:   2016-02-19T10:15:44Z

    Support for auto-detecting data sources.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186293737
  
    **[Test build #51560 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51560/consoleFull)** for PR 11270 at commit [`23ba726`](https://github.com/apache/spark/commit/23ba7266358a3de4800bb65da316c20f60bbf7a8).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187566782
  
    **[Test build #51740 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51740/consoleFull)** for PR 11270 at commit [`cb3d4e8`](https://github.com/apache/spark/commit/cb3d4e89afcd33e84cbc269a2f01c5311d8114fd).


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186250484
  
    **[Test build #51560 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51560/consoleFull)** for PR 11270 at commit [`23ba726`](https://github.com/apache/spark/commit/23ba7266358a3de4800bb65da316c20f60bbf7a8).


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186201103
  
    retest this please


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11270#discussion_r53447740
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ResolvedDataSource.scala ---
    @@ -130,7 +141,49 @@ object ResolvedDataSource extends Logging {
           bucketSpec: Option[BucketSpec],
           provider: String,
           options: Map[String, String]): ResolvedDataSource = {
    -    val clazz: Class[_] = lookupDataSource(provider)
    +    // Here, it tries to find out the data source by file extension if `format()` is not called.
    +    // The auto-detection is based on the given paths and it recognizes glob patterns as well, but
    +    // it does not recursively check the sub-paths even if the given paths are directories.
    +    // The source detection goes through the following steps:
    +    //
    +    //   1. Check `provider` and use it if it is not `null`.
    +    //   2. If `provider` is not given, try to detect the source type by extension.
    +    //      At this point, it detects only if all the given paths have the same extension.
    +    //   3. If it fails to detect, use the data source given by `spark.sql.sources.default`.
    +    //
    +    val paths = {
    +      val caseInsensitiveOptions = new CaseInsensitiveMap(options)
    +      if (caseInsensitiveOptions.contains("paths") &&
    +        caseInsensitiveOptions.contains("path")) {
    +        throw new AnalysisException(s"Both path and paths options are present.")
    +      }
    +      caseInsensitiveOptions.get("paths")
    +        .map(_.split("(?<!\\\\),").map(StringUtils.unEscapeString(_, '\\', ',')))
    +        .getOrElse(Array(caseInsensitiveOptions("path")))
    +        .flatMap{ pathString =>
    +        val hdfsPath = new Path(pathString)
    +        val fs = hdfsPath.getFileSystem(sqlContext.sparkContext.hadoopConfiguration)
    +        val qualified = hdfsPath.makeQualified(fs.getUri, fs.getWorkingDirectory)
    +        SparkHadoopUtil.get.globPathIfNecessary(qualified).map(_.toString)
    +      }
    +    }
    +    val safeProvider = Option(provider).getOrElse {
    +      val safePaths = paths.filterNot { path =>
    +        val name = FilenameUtils.getName(path)
    +        name.startsWith("_") || name.startsWith(".")
    +      }
    +      val extensions = safePaths.map { path =>
    +        FilenameUtils.getExtension(path).toLowerCase
    +      }
    +      val defaultDataSourceName = sqlContext.conf.defaultDataSourceName
    +      if (extensions.exists(extensions.head != _)) {
    +        defaultDataSourceName
    --- End diff --
    
    An alternative is to throw an exception immediately so that users can easily understand the cause of the error.
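
A fail-fast variant of the extension check might look like the sketch below. It is plain Scala with hypothetical names (`FailFastDetection`, `sharedExtension`), and it uses `require`, which throws `IllegalArgumentException`, as a stand-in for whatever exception type the PR would actually raise.

```scala
import java.util.Locale

object FailFastDetection {
  // Lower-cased extension of the last path segment, or "" if it has none.
  private def extensionOf(path: String): String = {
    val name = path.substring(path.lastIndexOf('/') + 1)
    val dot = name.lastIndexOf('.')
    if (dot <= 0) "" else name.substring(dot + 1).toLowerCase(Locale.ROOT)
  }

  // Returns the single extension shared by all paths, or throws right away
  // instead of silently falling back to a default data source.
  def sharedExtension(paths: Seq[String]): String = {
    val extensions = paths.map(extensionOf).distinct
    require(
      extensions.size == 1,
      s"Cannot infer a data source; found extensions: ${extensions.mkString(", ")}")
    extensions.head
  }
}
```

The trade-off is that callers see the mismatch at resolution time rather than a later, less obvious failure from reading files with the wrong source.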



[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187435536
  
    **[Test build #51671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51671/consoleFull)** for PR 11270 at commit [`856062a`](https://github.com/apache/spark/commit/856062ae9a5c8551fbab53795e9d88e298a38c1b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186294383
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186470429
  
    I'd be ok dropping spark.sql.sources.default in 2.0.
    
    I think that is rarely used.



[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-188751777
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51958/
    Test PASSed.


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187530058
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51718/
    Test FAILed.


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187455120
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-188707171
  
    Sorry, `listFiles()` calls `listStatus()` internally. It looks like there is no way to fetch a deeply nested file without listing files.


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186160336
  
    **[Test build #51552 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51552/consoleFull)** for PR 11270 at commit [`23ba726`](https://github.com/apache/spark/commit/23ba7266358a3de4800bb65da316c20f60bbf7a8).


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187603526
  
    I marked the tests for the default data source as `ignored` and commented out some PySpark tests instead of using `unittest.skip()`, as they include some primitive save and load tests.
    
    I think I should open another PR (or a follow-up) to remove the default data source option if this is merged.


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186459355
  
    retest this please


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-188696883
  
    **[Test build #51955 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51955/consoleFull)** for PR 11270 at commit [`d00a1b5`](https://github.com/apache/spark/commit/d00a1b56ed3ca1d77966f7b7b55abe79c5ddc36d).


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187462062
  
    **[Test build #51706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51706/consoleFull)** for PR 11270 at commit [`dc2c454`](https://github.com/apache/spark/commit/dc2c45488466e3b048363aabbaa15e4308edc170).


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187595896
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186480982
  
    @rxin thanks! I will update soon.


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187595303
  
    **[Test build #51740 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51740/consoleFull)** for PR 11270 at commit [`cb3d4e8`](https://github.com/apache/spark/commit/cb3d4e89afcd33e84cbc269a2f01c5311d8114fd).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-188736485
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-188698952
  
    **[Test build #51958 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51958/consoleFull)** for PR 11270 at commit [`8b06436`](https://github.com/apache/spark/commit/8b064367e3a7edd44b295882dbd06563e491d5c8).


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187529607
  
    **[Test build #51718 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51718/consoleFull)** for PR 11270 at commit [`28101a1`](https://github.com/apache/spark/commit/28101a1a1322b06898ff2b6e8325f3603b057ca0).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187435794
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51671/
    Test FAILed.


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186487159
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-193627655
  
    I will take action as soon as I get some feedback on this conflict.


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186487167
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51584/
    Test FAILed.


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11270#discussion_r53593587
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ResolvedDataSource.scala ---
    @@ -130,7 +131,28 @@ object ResolvedDataSource extends Logging {
           bucketSpec: Option[BucketSpec],
           provider: String,
           options: Map[String, String]): ResolvedDataSource = {
    -    val clazz: Class[_] = lookupDataSource(provider)
    +    val paths = {
    +      val caseInsensitiveOptions = new CaseInsensitiveMap(options)
    +      if (caseInsensitiveOptions.contains("paths") &&
    +        caseInsensitiveOptions.contains("path")) {
    +        throw new AnalysisException(s"Both path and paths options are present.")
    +      }
    +      caseInsensitiveOptions.get("paths")
    +        .map(_.split("(?<!\\\\),").map(StringUtils.unEscapeString(_, '\\', ',')))
    +        .getOrElse(Array(caseInsensitiveOptions.getOrElse("path", {
    +        throw new IllegalArgumentException("'path' is not specified")
    +      })))
    --- End diff --
    
    indentation is weird here
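
One conventional layout of the flagged expression is sketched below in plain Scala. `PathOptions` and `resolvePaths` are hypothetical names; the key lower-casing stands in for `CaseInsensitiveMap`, the `replace` call is a simplified stand-in for Hadoop's `StringUtils.unEscapeString`, and `IllegalArgumentException` replaces `AnalysisException` so that the snippet has no Spark or Hadoop dependencies.

```scala
object PathOptions {
  // Resolves the `path` / `paths` options into a list of path strings.
  def resolvePaths(options: Map[String, String]): Seq[String] = {
    // Lower-case the keys so that lookups are case-insensitive.
    val lowered = options.map { case (k, v) => k.toLowerCase -> v }
    if (lowered.contains("paths") && lowered.contains("path")) {
      throw new IllegalArgumentException("Both path and paths options are present.")
    }
    lowered.get("paths")
      // Split on commas that are not preceded by a backslash, then unescape "\,".
      .map(_.split("(?<!\\\\),").map(_.replace("\\,", ",")).toSeq)
      .getOrElse {
        Seq(lowered.getOrElse("path",
          throw new IllegalArgumentException("'path' is not specified")))
      }
  }
}
```

Moving the fallback into its own `getOrElse { ... }` block keeps the `throw` visually attached to the lookup it guards.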


[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-188705039
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-188706820
  
    **[Test build #51961 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51961/consoleFull)** for PR 11270 at commit [`d2a1ecf`](https://github.com/apache/spark/commit/d2a1ecfd61b79b9f598b6b8c6150f95b42a7b107).




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11270#discussion_r53595420
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceDetectionSuite.scala ---
    @@ -0,0 +1,70 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import java.io.File
    +
    +import org.apache.spark.SparkException
    +import org.apache.spark.sql.{QueryTest, SQLConf}
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class DataSourceDetectionSuite extends QueryTest with SharedSQLContext  {
    --- End diff --
    
    We need to add a test case for when detection fails to determine the type.
    
    If the behavior is not correct right now (because it always falls back to Parquet), add the test case and mark it as ignored for now.





[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187608053
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51752/
    Test FAILed.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11270#discussion_r53545637
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ResolvedDataSource.scala ---
    @@ -130,7 +141,49 @@ object ResolvedDataSource extends Logging {
           bucketSpec: Option[BucketSpec],
           provider: String,
           options: Map[String, String]): ResolvedDataSource = {
    -    val clazz: Class[_] = lookupDataSource(provider)
    +    // Here, it tries to find out data source by file extensions if the `format()` is not called.
    +    // The auto-detection is based on given paths and it recognizes glob pattern as well but
    +    // it does not recursively check the sub-paths even if the given paths are directories.
    +    // This source detection goes the following steps
    +    //
    +    //   1. Check `provider` and use this if this is not `null`.
    +    //   2. If `provider` is not given, then it tries to detect the source types by extension.
    +    //      at this point, if detects only if all the given paths have the same extension.
    +    //   3. if it fails to detect, use the datasource given to `spark.sql.sources.default`.
    +    //
    +    val paths = {
    --- End diff --
    
    note that i'd move this detection code into a separate class, so we can unit test it.
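
As a rough illustration of what a separately testable detection step could look like, here is a minimal sketch of the three steps described in the diff comment (explicit provider, then a shared extension, then the default). Everything in it is an assumption made for illustration: the object name, the method signature, and the extension-to-source mapping are hypothetical and are not the PR's actual code.

```scala
import java.util.Locale

// Hypothetical sketch; names and the extension mapping are assumptions.
object FormatDetectionSketch {
  // Assumed mapping from lower-cased file extension to data source name.
  private val knownExtensions: Map[String, String] = Map(
    "json" -> "json",
    "parquet" -> "parquet",
    "csv" -> "csv",
    "orc" -> "orc"
  )

  // Lower-cased extension of the last path segment, or "" when there is none.
  private def extensionOf(path: String): String = {
    val name = path.substring(path.lastIndexOf('/') + 1)
    val dot = name.lastIndexOf('.')
    if (dot <= 0) "" else name.substring(dot + 1).toLowerCase(Locale.ROOT)
  }

  def resolveFormat(provider: Option[String], paths: Seq[String], default: String): String =
    // Step 1: an explicitly given provider always wins.
    provider.getOrElse {
      paths.map(extensionOf).distinct match {
        // Step 2: detect only when all paths share one known extension.
        case Seq(ext) => knownExtensions.getOrElse(ext, default)
        // Step 3: otherwise fall back to the configured default.
        case _ => default
      }
    }
}
```

Because the function takes plain strings and has no dependency on `SQLContext` or a file system, each of the three branches can be covered by an ordinary unit test.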





[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-198225310
  
    Sorry I didn't have much time. I will think about it too.





[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11270#discussion_r53448744
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ResolvedDataSource.scala ---
    @@ -130,7 +141,49 @@ object ResolvedDataSource extends Logging {
           bucketSpec: Option[BucketSpec],
           provider: String,
           options: Map[String, String]): ResolvedDataSource = {
    -    val clazz: Class[_] = lookupDataSource(provider)
    +    // Here, it tries to find out data source by file extensions if the `format()` is not called.
    +    // The auto-detection is based on given paths and it recognizes glob pattern as well but
    +    // it does not recursively check the sub-paths even if the given paths are directories.
    +    // This source detection goes the following steps
    +    //
    +    //   1. Check `provider` and use this if this is not `null`.
    +    //   2. If `provider` is not given, then it tries to detect the source types by extension.
    +    //      at this point, if detects only if all the given paths have the same extension.
    +    //   3. if it fails to detect, use the datasource given to `spark.sql.sources.default`.
    +    //
    +    val paths = {
    +      val caseInsensitiveOptions = new CaseInsensitiveMap(options)
    +      if (caseInsensitiveOptions.contains("paths") &&
    +        caseInsensitiveOptions.contains("path")) {
    +        throw new AnalysisException(s"Both path and paths options are present.")
    +      }
    +      caseInsensitiveOptions.get("paths")
    +        .map(_.split("(?<!\\\\),").map(StringUtils.unEscapeString(_, '\\', ',')))
    +        .getOrElse(Array(caseInsensitiveOptions("path")))
    +        .flatMap{ pathString =>
    +        val hdfsPath = new Path(pathString)
    +        val fs = hdfsPath.getFileSystem(sqlContext.sparkContext.hadoopConfiguration)
    +        val qualified = hdfsPath.makeQualified(fs.getUri, fs.getWorkingDirectory)
    +        SparkHadoopUtil.get.globPathIfNecessary(qualified).map(_.toString)
    +      }
    +    }
    +    val safeProvider = Option(provider).getOrElse {
    +      val safePaths = paths.filterNot { path =>
    +        FilenameUtils.getBaseName(path)
    +        path.startsWith("_") || path.startsWith(".")
    +      }
    +      val extensions = safePaths.map { path =>
    +        FilenameUtils.getExtension(path).toLowerCase
    +      }
    +      val defaultDataSourceName = sqlContext.conf.defaultDataSourceName
    +      if (extensions.exists(extensions.head != _)) {
    +        defaultDataSourceName
    --- End diff --
    
    Aha, I see.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-188717294
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51961/
    Test FAILed.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-188717022
  
    **[Test build #51961 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51961/consoleFull)** for PR 11270 at commit [`d2a1ecf`](https://github.com/apache/spark/commit/d2a1ecfd61b79b9f598b6b8c6150f95b42a7b107).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-188697104
  
    @rxin I found an API, `listFiles()`, which returns an iterator. So this will not list all files in any case; it will just try to find a single file.
    
    Also, I added some more tests for all the cases.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187493096
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187031213
  
    I submitted some more commits. In summary:
    
    1. Added a separate `DataSourceDetect` class.
    2. Now it picks only a single file. If the given path is a directory without an extension, it descends into it and picks a single file.
    3. I did not remove `sqlContext.conf.defaultDataSourceName` here, as it is referenced from quite a few classes (so I thought I could do this in another PR).




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-188729134
  
    **[Test build #51964 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51964/consoleFull)** for PR 11270 at commit [`d2a1ecf`](https://github.com/apache/spark/commit/d2a1ecfd61b79b9f598b6b8c6150f95b42a7b107).




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-188699728
  
    @rxin I found an API, `listFiles()`, which returns an iterator. So this will not list all files in any case; it will just try to find a single file.
    
    Also, I added some more tests for all the cases.
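
To make the point about iterators concrete, here is a minimal sketch of picking a single file from a lazy listing. It uses a plain Scala `Iterator[String]` as a stand-in for the `RemoteIterator` that Hadoop's `FileSystem.listFiles` returns, and the names `SingleFilePickSketch` and `firstDataFile` are hypothetical. Skipping names that start with `_` or `.` mirrors the filtering in the earlier diff, applied here to the file name rather than the full path (an assumption).

```scala
// Hypothetical sketch; a plain Iterator stands in for Hadoop's RemoteIterator.
object SingleFilePickSketch {
  // Metadata and hidden files (names starting with "_" or ".") are skipped.
  private def isDataFile(path: String): Boolean = {
    val name = path.substring(path.lastIndexOf('/') + 1)
    !name.startsWith("_") && !name.startsWith(".")
  }

  // `find` pulls elements one at a time and stops at the first match,
  // so the rest of the listing is never requested.
  def firstDataFile(files: Iterator[String]): Option[String] =
    files.find(isDataFile)
}
```

A quick check that only the needed entries are consumed:

```scala
var consumed = 0
val listing = Iterator("/d/_SUCCESS", "/d/part-0.json", "/d/part-1.json")
  .map { p => consumed += 1; p }
val picked = SingleFilePickSketch.firstDataFile(listing)
// picked == Some("/d/part-0.json") and consumed == 2
```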




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-188717288
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186404419
  
    In order to avoid breaking changes (e.g. today we can always read Parquet with `load`), maybe we want to special-case Parquet beyond looking at file names.
    
    I looked at the binary format (see https://github.com/Parquet/parquet-format), and it looks like a Parquet file always starts with "PAR1". That is to say, if the first four bytes are 0x50, 0x41, 0x52, 0x31, then it is a Parquet file.
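
The magic-number check described above can be sketched as follows. This is a minimal illustration over a generic `java.io.InputStream`; the object and method names are hypothetical, and how the stream is opened (for example through a Hadoop `FileSystem`) is left out on purpose.

```scala
import java.io.{DataInputStream, EOFException, InputStream}
import java.nio.charset.StandardCharsets
import java.util.Arrays

// Hypothetical sketch; names are illustrative, not from the PR.
object ParquetMagicSketch {
  // "PAR1" in ASCII is 0x50, 0x41, 0x52, 0x31.
  private val Magic: Array[Byte] = "PAR1".getBytes(StandardCharsets.US_ASCII)

  // Reads at most four bytes from `in`. The caller owns and closes the stream.
  def startsWithParquetMagic(in: InputStream): Boolean = {
    val header = new Array[Byte](Magic.length)
    try {
      new DataInputStream(in).readFully(header)
      Arrays.equals(header, Magic)
    } catch {
      // Fewer than four bytes available, so this cannot be a Parquet file.
      case _: EOFException => false
    }
  }
}
```

Since only the first four bytes are read, the check stays cheap even for large files, which fits the best-effort intent of the detection.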





[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11270#discussion_r53445997
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -408,7 +408,7 @@ class DataFrameReader private[sql](sqlContext: SQLContext) extends Logging {
       // Builder pattern config options
       ///////////////////////////////////////////////////////////////////////////////////////
     
    -  private var source: String = sqlContext.conf.defaultDataSourceName
    +  private var source: String = _
    --- End diff --
    
    Then we need to change all the function calls from other classes.






[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187658545
  
    **[Test build #51755 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51755/consoleFull)** for PR 11270 at commit [`7a4c049`](https://github.com/apache/spark/commit/7a4c0497929ad85f007a53e2588de66349c2d8f3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11270#discussion_r53595470
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceDetection.scala ---
    @@ -0,0 +1,127 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import scala.util.Try
    +
    +import org.apache.commons.io.FilenameUtils
    +import org.apache.hadoop.fs.{FileStatus, Path}
    +import org.apache.hadoop.mapred.{FileInputFormat, JobConf}
    +
    +import org.apache.spark.{Logging, SparkException}
    +import org.apache.spark.sql.SQLContext
    +
    +object DataSourceDetection extends Logging {
    --- End diff --
    
    I'm going to take another look at this later, but can you make sure the performance is OK when querying a large number of files on S3? That is, ideally we should only need to read one file's metadata, rather than the metadata of all files.





[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-188726983
  
    retest this please




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186470543
  
    Also, this is a best-effort thing. I'd just pick one file to read and test that, rather than testing all the files.
    
    If a folder has mixed Parquet and other files, then unfortunately an error will be thrown at runtime. I don't know how users would be able to read that anyway.





[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-188736491
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51964/
    Test FAILed.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186242792
  
    retest this please




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186463565
  
    retest this please




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186238954
  
    **[Test build #51558 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51558/consoleFull)** for PR 11270 at commit [`23ba726`](https://github.com/apache/spark/commit/23ba7266358a3de4800bb65da316c20f60bbf7a8).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187455123
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51703/
    Test FAILed.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon closed the pull request at:

    https://github.com/apache/spark/pull/11270




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187492970
  
    **[Test build #51706 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51706/consoleFull)** for PR 11270 at commit [`dc2c454`](https://github.com/apache/spark/commit/dc2c45488466e3b048363aabbaa15e4308edc170).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187397375
  
    **[Test build #51671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51671/consoleFull)** for PR 11270 at commit [`856062a`](https://github.com/apache/spark/commit/856062ae9a5c8551fbab53795e9d88e298a38c1b).




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-188736377
  
    **[Test build #51964 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51964/consoleFull)** for PR 11270 at commit [`d2a1ecf`](https://github.com/apache/spark/commit/d2a1ecfd61b79b9f598b6b8c6150f95b42a7b107).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187595936
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51740/
    Test FAILed.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187608052
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186204891
  
    **[Test build #51558 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51558/consoleFull)** for PR 11270 at commit [`23ba726`](https://github.com/apache/spark/commit/23ba7266358a3de4800bb65da316c20f60bbf7a8).




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186467181
  
    **[Test build #51584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51584/consoleFull)** for PR 11270 at commit [`23ba726`](https://github.com/apache/spark/commit/23ba7266358a3de4800bb65da316c20f60bbf7a8).




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11270#discussion_r53445643
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -408,7 +408,7 @@ class DataFrameReader private[sql](sqlContext: SQLContext) extends Logging {
       // Builder pattern config options
       ///////////////////////////////////////////////////////////////////////////////////////
     
    -  private var source: String = sqlContext.conf.defaultDataSourceName
    +  private var source: String = _
    --- End diff --
    
    Would `Option[String]` be a better type here than a `null`-initialized `String`?




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11270#discussion_r53600012
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceDetectionSuite.scala ---
    @@ -0,0 +1,70 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import java.io.File
    +
    +import org.apache.spark.SparkException
    +import org.apache.spark.sql.{QueryTest, SQLConf}
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class DataSourceDetectionSuite extends QueryTest with SharedSQLContext  {
    --- End diff --
    
    The test case I added below covers detection failure. The data is JSON, but detection tries to read a magic number because neither the `part` files nor the given directory have extensions.
    
    ```scala
      test("detect datasource - fail to load json without extensions") {
        val data = (1 to 10).map(i => (i, i.toString))
        withTempPath { file =>
          val path = file.getCanonicalPath
          sqlContext.createDataFrame(data).write.json(path)
    
          val message = intercept[SparkException] {
            DataSourceDetection.detect(sqlContext, path)
          }.getMessage
          assert(message.contains("Detected data source was"))
        }
      }
    ```




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-192106315
  
    @rxin If you think we should not list files even once, should we instead detect the source only from the given paths, without listing, and otherwise fall back to the `sqlContext.conf.defaultDataSourceName` option?
    
    So, in other words,
    ```bash
    ├── iamjson.json                 # Detect success by the extension of `iamjson.json`
    │   ├── part-001
    │   └── part-002
    ├── iamjson                      # Try use `sqlContext.conf.defaultDataSourceName` and then
    │   │                            # throw an exception in Parquet-side.
    │   ├── part-001
    │   └── part-002
    ├── iamparquet.parquet           # Detect success by the extension of `iamparquet.parquet`
    │   ├── part-001.parquet
    │   └── part-002.parquet
    └── iamparquet                   # Just use `sqlContext.conf.defaultDataSourceName`
        ├── part-001.parquet
        └── part-002.parquet
    ```
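    
    A minimal sketch of the path-only detection outlined in the tree above, assuming hypothetical names (`ExtensionDetection`, `detect`, and the `knownSources` mapping) that are not taken from this PR. It looks only at the extension of each given path, never lists a directory, and falls back to a caller-supplied default when the paths do not agree on a single known extension:
    
    ```scala
    // Hypothetical helper for illustration; not the code from this PR.
    object ExtensionDetection {
      // Assumed mapping from file extension to data source short name.
      private val knownSources: Map[String, String] = Map(
        "json" -> "json",
        "parquet" -> "parquet",
        "orc" -> "orc",
        "csv" -> "csv")

      // Extension of the last path segment, e.g. "/data/iamjson.json" -> Some("json").
      private def extension(path: String): Option[String] = {
        val name = path.stripSuffix("/").split('/').last
        val dot = name.lastIndexOf('.')
        if (dot > 0 && dot < name.length - 1) Some(name.substring(dot + 1).toLowerCase)
        else None
      }

      // Returns the detected source only if every path maps to the same known source;
      // otherwise returns `default` (standing in for `spark.sql.sources.default`).
      def detect(paths: Seq[String], default: String): String = {
        val sources = paths.map(p => extension(p).flatMap(knownSources.get)).distinct
        sources match {
          case Seq(Some(source)) => source
          case _ => default
        }
      }
    }
    ```
    
    With this shape, `iamjson.json` resolves to `json` from its name alone, while `iamjson` resolves to whatever default the caller passes in, so any mismatch surfaces later in the default datasource's reader.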




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186294391
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51560/
    Test FAILed.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186199623
  
    **[Test build #51552 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51552/consoleFull)** for PR 11270 at commit [`23ba726`](https://github.com/apache/spark/commit/23ba7266358a3de4800bb65da316c20f60bbf7a8).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187530053
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11270#discussion_r53602790
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceDetectionSuite.scala ---
    @@ -0,0 +1,70 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import java.io.File
    +
    +import org.apache.spark.SparkException
    +import org.apache.spark.sql.{QueryTest, SQLConf}
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class DataSourceDetectionSuite extends QueryTest with SharedSQLContext  {
    --- End diff --
    
    Actually, shouldn't the JSON datasource write `part` files with a `.json` extension?




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-188705041
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51955/
    Test FAILed.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187607865
  
    retest this please




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-188750834
  
    **[Test build #51958 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51958/consoleFull)** for PR 11270 at commit [`8b06436`](https://github.com/apache/spark/commit/8b064367e3a7edd44b295882dbd06563e491d5c8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-198225157
  
    Let me close this for now because I could not come up with a good idea.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187493098
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51706/
    Test FAILed.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187609643
  
    **[Test build #51755 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51755/consoleFull)** for PR 11270 at commit [`7a4c049`](https://github.com/apache/spark/commit/7a4c0497929ad85f007a53e2588de66349c2d8f3).




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186239281
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187459776
  
    retest this please




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186457302
  
    @rxin As you know, `spark.sql.sources.default` can be set to a different datasource, so I think we either have to add logic that validates the files against every datasource in this way, or add nothing, to avoid breaking changes.
    
    If we go for the validation, there are several concerns.
    
    1. For Parquet we might be able to use the "magic number" you mentioned, but as far as I remember there is no such thing for ORC; it just starts with index data. (For CSV and JSON we might be able to do this by reading a little data from the beginning of the files.)
    
    2. Reading a few bytes can be done directly, but if we need to read other parts (for example, the ORC footer to validate), this will bring complexity just like [this in Parquet](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala#L757-L775).
    
    3. Driver-side overhead would increase considerably, because we basically need to touch each file to check it against all the datasources.
    
    Could we maybe handle this issue in a different PR?
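    
    To make concern 1 concrete, here is a minimal sketch of a magic-number check for Parquet, whose files begin with the four ASCII bytes `PAR1`. The object and method names are hypothetical, and the sketch assumes a local file read through `java.nio` rather than the Hadoop `FileSystem` API that Spark itself would go through:
    
    ```scala
    import java.nio.charset.StandardCharsets
    import java.nio.file.{Files, Path}

    // Hypothetical helper for illustration; not the code from this PR.
    object MagicNumberCheck {
      // Parquet files start (and end) with the 4-byte magic "PAR1".
      private val ParquetMagic: Array[Byte] = "PAR1".getBytes(StandardCharsets.US_ASCII)

      // True if the first four bytes of `file` equal the Parquet magic.
      def startsWithParquetMagic(file: Path): Boolean = {
        val in = Files.newInputStream(file)
        try {
          val head = new Array[Byte](ParquetMagic.length)
          var read = 0
          var n = 0
          // Keep reading until the buffer is full or the stream ends.
          while (read < head.length && { n = in.read(head, read, head.length - read); n != -1 }) {
            read += n
          }
          read == head.length && java.util.Arrays.equals(head, ParquetMagic)
        } finally {
          in.close()
        }
      }
    }
    ```
    
    Even this cheap check opens one file per call, which is the driver-side cost raised in concern 3.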




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186199782
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186486718
  
    **[Test build #51584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51584/consoleFull)** for PR 11270 at commit [`23ba726`](https://github.com/apache/spark/commit/23ba7266358a3de4800bb65da316c20f60bbf7a8).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187658940
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51755/
    Test PASSed.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187030511
  
    I submitted some more commits. In summary:
    
    1. Added a separate `DataSourceDetect` class.
    2. It now picks only a single file. If the given path is a directory without an extension, it descends into the directory and picks a single file.
    3. I did not remove `sqlContext.conf.defaultDataSourceName` here because it is referenced from quite a few classes (so I thought I could do this in another PR).
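
The "pick a single file" behaviour in item 2 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PR: it uses `java.nio.file` on the local file system instead of Hadoop's `FileSystem`, and the names `SingleFilePicker` and `pickOne` are hypothetical.

```scala
import java.nio.file.{Files, Path}

// Hypothetical helper, not part of the PR: local-filesystem stand-in for the
// Hadoop-based lookup described in the comment above.
object SingleFilePicker {
  // Last path segment as a string ("" for a root path, which has no file name).
  private def nameOf(p: Path): String =
    Option(p.getFileName).map(_.toString).getOrElse("")

  // Metadata and hidden entries such as _SUCCESS or .part-0.crc are skipped.
  private def isHidden(p: Path): Boolean = {
    val name = nameOf(p)
    name.startsWith("_") || name.startsWith(".")
  }

  // A leading dot does not count as an extension separator.
  private def hasExtension(p: Path): Boolean = nameOf(p).lastIndexOf('.') > 0

  // Returns one representative path for `path`, or None if nothing suitable is found.
  def pickOne(path: Path): Option[Path] =
    if (!Files.isDirectory(path) || hasExtension(path)) {
      // Not a directory, or a directory whose own name already carries an extension.
      Some(path)
    } else {
      val stream = Files.newDirectoryStream(path)
      try {
        val it = stream.iterator()
        var found: Option[Path] = None
        // Stop descending as soon as one entry yields a file.
        while (found.isEmpty && it.hasNext) {
          val child = it.next()
          if (!isHidden(child)) found = pickOne(child)
        }
        found
      } finally stream.close()
    }
}
```

The directory stream is consumed one entry at a time, so the search stops at the first visible file rather than collecting a full listing first.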




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187503949
  
    **[Test build #51718 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51718/consoleFull)** for PR 11270 at commit [`28101a1`](https://github.com/apache/spark/commit/28101a1a1322b06898ff2b6e8325f3603b057ca0).




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11270#discussion_r53448453
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ResolvedDataSource.scala ---
    @@ -130,7 +141,49 @@ object ResolvedDataSource extends Logging {
           bucketSpec: Option[BucketSpec],
           provider: String,
           options: Map[String, String]): ResolvedDataSource = {
    -    val clazz: Class[_] = lookupDataSource(provider)
    +    // If `format()` was not called, try to detect the data source from the file extensions.
    +    // Detection is based on the given paths and also recognizes glob patterns, but it does
    +    // not recursively check sub-paths even if the given paths are directories.
    +    // Source detection proceeds as follows:
    +    //
    +    //   1. Check `provider` and use it if it is not `null`.
    +    //   2. If `provider` is not given, try to detect the source type by extension.
    +    //      Detection succeeds only if all the given paths have the same extension.
    +    //   3. If detection fails, use the data source set in `spark.sql.sources.default`.
    +    //
    +    val paths = {
    +      val caseInsensitiveOptions = new CaseInsensitiveMap(options)
    +      if (caseInsensitiveOptions.contains("paths") &&
    +        caseInsensitiveOptions.contains("path")) {
    +        throw new AnalysisException(s"Both path and paths options are present.")
    +      }
    +      caseInsensitiveOptions.get("paths")
    +        .map(_.split("(?<!\\\\),").map(StringUtils.unEscapeString(_, '\\', ',')))
    +        .getOrElse(Array(caseInsensitiveOptions("path")))
    +        .flatMap{ pathString =>
    +        val hdfsPath = new Path(pathString)
    +        val fs = hdfsPath.getFileSystem(sqlContext.sparkContext.hadoopConfiguration)
    +        val qualified = hdfsPath.makeQualified(fs.getUri, fs.getWorkingDirectory)
    +        SparkHadoopUtil.get.globPathIfNecessary(qualified).map(_.toString)
    +      }
    +    }
    +    val safeProvider = Option(provider).getOrElse {
    +      val safePaths = paths.filterNot { path =>
    +        val name = FilenameUtils.getName(path)
    +        name.startsWith("_") || name.startsWith(".")
    +      }
    +      val extensions = safePaths.map { path =>
    +        FilenameUtils.getExtension(path).toLowerCase
    +      }
    +      val defaultDataSourceName = sqlContext.conf.defaultDataSourceName
    +      if (extensions.exists(extensions.head != _)) {
    +        defaultDataSourceName
    --- End diff --
    
    Then the original `read().load()` call for files without extensions would throw an exception, which breaks backward compatibility.
    
    If we drop this for Spark 2.0, I think that is a good idea.
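
To make the fallback under discussion concrete, here is a minimal sketch of the three-step resolution described in the diff comment, written over plain strings. It is not the PR's implementation: the object name `ProviderResolution`, the `resolve` signature, and the extension table are assumptions made for illustration only.

```scala
// Hypothetical sketch of the resolution order; not the code under review.
object ProviderResolution {
  // Assumed extension-to-source table; the real mapping is not shown in this thread.
  private val knownExtensions: Map[String, String] =
    Map("json" -> "json", "parquet" -> "parquet", "orc" -> "orc", "csv" -> "csv")

  // Lower-cased extension of the last path segment, or "" when there is none.
  def extensionOf(path: String): String = {
    val name = path.substring(path.lastIndexOf('/') + 1)
    val dot = name.lastIndexOf('.')
    if (dot > 0) name.substring(dot + 1).toLowerCase else ""
  }

  def resolve(provider: Option[String], paths: Seq[String], default: String): String =
    // Step 1: an explicitly given provider always wins.
    provider.getOrElse {
      // Ignore metadata and hidden files, judged by file name rather than full path.
      val visible = paths.filterNot { p =>
        val name = p.substring(p.lastIndexOf('/') + 1)
        name.startsWith("_") || name.startsWith(".")
      }
      visible.map(extensionOf).distinct match {
        // Step 2: detect only when every path shares a single extension.
        case Seq(ext) => knownExtensions.getOrElse(ext, default)
        // Step 3: mixed extensions or no paths fall back to the configured default.
        case _ => default
      }
    }
}
```

With this shape, `resolve(None, Seq("/data/part-00000"), "parquet")` returns the default instead of failing, which is the backward-compatible behaviour the comment above argues for.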




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187658938
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186199783
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51552/
    Test FAILed.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186239285
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51558/
    Test FAILed.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-186242687
  
    Retest this please




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11270#discussion_r53597525
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceDetection.scala ---
    @@ -0,0 +1,127 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import scala.util.Try
    +
    +import org.apache.commons.io.FilenameUtils
    +import org.apache.hadoop.fs.{FileStatus, Path}
    +import org.apache.hadoop.mapred.{FileInputFormat, JobConf}
    +
    +import org.apache.spark.{Logging, SparkException}
    +import org.apache.spark.sql.SQLContext
    +
    +object DataSourceDetection extends Logging {
    --- End diff --
    
    One thing I am worried about is that this will list a directory if one of the given paths is a directory without an extension (although it stops descending as soon as at least one file is found).
    
    Nevertheless, I think I should take another look so that we never list a directory and just pick a single file in every case.
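
One way to reason about the listing cost raised here is to separate the paths whose type can be decided from the name alone from those that would need a file-system probe. The sketch below assumes plain string paths; `ProbePlanner` and `plan` are hypothetical names, not part of the PR.

```scala
// Hypothetical helper for illustration; it touches no file system at all.
object ProbePlanner {
  // Extension of the last path segment, if any (a leading or trailing dot does not count).
  def extensionOf(path: String): Option[String] = {
    val name = path.substring(path.lastIndexOf('/') + 1)
    val dot = name.lastIndexOf('.')
    if (dot > 0 && dot < name.length - 1) Some(name.substring(dot + 1).toLowerCase)
    else None
  }

  // Splits paths into (decidable from the name, needing a file-system probe).
  def plan(paths: Seq[String]): (Seq[String], Seq[String]) =
    paths.partition(p => extensionOf(p).isDefined)
}
```

Only the second group could ever trigger a directory listing, so its size bounds how many paths have to be inspected on the file system.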




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-187435793
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-188751775
  
    Merged build finished. Test PASSed.






[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11270#issuecomment-188704925
  
    **[Test build #51955 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51955/consoleFull)** for PR 11270 at commit [`d00a1b5`](https://github.com/apache/spark/commit/d00a1b56ed3ca1d77966f7b7b55abe79c5ddc36d).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.

