You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by yanbohappy <gi...@git.apache.org> on 2015/02/15 06:56:00 UTC

[GitHub] spark pull request: JSON external data source INSERT improvements ...

GitHub user yanbohappy opened a pull request:

    https://github.com/apache/spark/pull/4610

    JSON external data source INSERT improvements initial draft

    
    
    JSON external data source INSERT operation improvements and bug fix:
    1, The path in "CREATE TABLE AS SELECT" must be a directory whether it exists, we use directory to represent the table.
    Because in this scenario we need to write(INSERT OVERWRITE) or append(INSERT INTO) data to the existing table, we can't append to HDFS files which represent RDD. If we want to implement append semantics, we need new files and add them to the specific directory.
    Another reason is that if the table based on a directory is more reasonable for access control, authentication and authorization. As SPARK-5821 mentioned, if we don't have write permission for the parent directory of the table, the CTAS command will failure. It's reasonable that it will not be granted some access rights to the directory which out of the table scope. So the table base on a directory may be a better choice. 
    This restriction is only to "CREATE TABLE AS SELECT", other DDL like "CREATE TABLE" can base on ordinary file or directory, due to the later one only scanning table without inserting new data.
    
    2, New INSERT OVERWRITE implementation.
    First insert the new generated table to a temporary directory which named as "_temporary" under the path directory. After insert finished, we deleted the original files. At last we rename "_temporary" for "data".
    This can fix the bug which mentioned at SPARK-5746.
    
    3, Why to rename "_temporary" for "data" rather than move all files in "_temporary" to path and then delete "_temporary"? Because that spark RDD.saveAsTextFile(path) related operation will store the whole RDD to HDFS files which named as "part-***** " like files under the path. If the original files were produced by this mean, and then we use "INSERT" without overwrite, the new generated table files are also named as "part-***** " which will produce corrupted table.
    
    Todo:
    1, If there is an existing RDD base on path a/b/c which has already been cached, after "INSERT" operation we need to recomputing this RDD by rescan the directory. Can we trigger a rescan execution operation after "INSERT"?
    2, Is it enough that rename  "_temporary" to "data" which mentioned above? If the base directory is produced by another "CTAS" command, there will be "data" directory under it. Can we append a unique number after "data" or just use the jobId or taskId to identify the subdirectory ?
    I think to resolve these problem in a follow up PR may be better.
    
    This is the initial draft and need optimization. Looking forward your comments.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanbohappy/spark jsonInsertImprovements

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4610.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4610
    
----
commit 307278ff6bdcd1d1a5a50650fb3dfa6da3db070f
Author: Yanbo Liang <ya...@gmail.com>
Date:   2015-02-15T05:23:14Z

    JSON external data source INSERT improvements initial draft

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-81588212
  
      [Test build #28646 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28646/consoleFull) for   PR 4610 at commit [`e2df8d5`](https://github.com/apache/spark/commit/e2df8d5438dfe0b945bed76ec8ac8e632ac7a9a5).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-81567278
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28643/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-84065407
  
    @liancheng Please review my fix for ParquetRelation2 in #5107  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-83532036
  
    @yhuai Thanks for cc-ing me this. Although `ParquetRelation2` checks for exception, it doesn't check return value of `fs.delete`, and thus suffer the same issue. Opening a separate PR for this.
    
    @yanboliang Thanks for pointing the bug and working on this!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-81626078
  
      [Test build #28646 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28646/consoleFull) for   PR 4610 at commit [`e2df8d5`](https://github.com/apache/spark/commit/e2df8d5438dfe0b945bed76ec8ac8e632ac7a9a5).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON external data source I...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4610#discussion_r24720107
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---
    @@ -76,12 +75,17 @@ private[sql] class DefaultSource
         } else {
           true
         }
    -    if (doSave) {
    +    val relation = if (doSave) {
           // Only save data when the save mode is not ignore.
    -      data.toJSON.saveAsTextFile(path)
    --- End diff --
    
    Why not just save the data?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-83548628
  
    @yanboliang Sure, please go ahead. I'm trying to verify this issue locally.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-82816330
  
      [Test build #28784 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28784/consoleFull) for   PR 4610 at commit [`46f0d9d`](https://github.com/apache/spark/commit/46f0d9d7e58cbc8f118e2d528d9241f8497bc73a).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-81626136
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28646/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4610#discussion_r26511261
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---
    @@ -66,9 +66,17 @@ private[sql] class DefaultSource
           mode match {
             case SaveMode.Append =>
               sys.error(s"Append mode is not supported by ${this.getClass.getCanonicalName}")
    -        case SaveMode.Overwrite =>
    -          fs.delete(filesystemPath, true)
    +        case SaveMode.Overwrite => {
    +          try {
    +            fs.delete(filesystemPath, true)
    --- End diff --
    
    Why not check if delete returns false at here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SPARK-5746] [SQL] JSON external ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-74405439
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27506/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-83433562
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28860/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON external data source I...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-74406236
  
      [Test build #27509 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27509/consoleFull) for   PR 4610 at commit [`3cdfb7a`](https://github.com/apache/spark/commit/3cdfb7aa739a37c6cb5a0254404d18eb6ef273c4).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4610#discussion_r24922476
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---
    @@ -66,9 +66,17 @@ private[sql] class DefaultSource
           mode match {
             case SaveMode.Append =>
               sys.error(s"Append mode is not supported by ${this.getClass.getCanonicalName}")
    -        case SaveMode.Overwrite =>
    -          fs.delete(filesystemPath, true)
    +        case SaveMode.Overwrite => {
    +          try {
    +            fs.delete(filesystemPath, true)
    +          } catch {
    +            case e: IOException =>
    +              throw new IOException(
    +                s"Unable to clear output directory ${filesystemPath.toString} prior"
    +                  + s" to CREATE a JSON table AS SELECT:\n${e.toString}")
    +          }
    --- End diff --
    
    @yanbohappy Seems we just throw another error message at here. Based on your JIRA description, I think you need to check if delete returns true or false when data already exists. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-83545467
  
    @yhuai Thank you for your comments on this PR
    @liancheng If this LGTM, can I open another PR for ```ParquetRelation2``` and work on it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SPARK-5746] [SQL] JSON external ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-74405412
  
      [Test build #27506 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27506/consoleFull) for   PR 4610 at commit [`307278f`](https://github.com/apache/spark/commit/307278ff6bdcd1d1a5a50650fb3dfa6da3db070f).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON external data source i...

Posted by yanbohappy <gi...@git.apache.org>.
Github user yanbohappy commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-74425262
  
    @yhuai 
    OK, if we don't support write to a table while reading it, the code will be more concise.
    I will try to rewrite the code to focus on the issue SPARK-5812.
    Thank you for your comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-84232332
  
    PS: #5107 has also been merged.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4610#discussion_r25000037
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---
    @@ -66,9 +66,17 @@ private[sql] class DefaultSource
           mode match {
             case SaveMode.Append =>
               sys.error(s"Append mode is not supported by ${this.getClass.getCanonicalName}")
    -        case SaveMode.Overwrite =>
    -          fs.delete(filesystemPath, true)
    +        case SaveMode.Overwrite => {
    +          try {
    +            fs.delete(filesystemPath, true)
    +          } catch {
    +            case e: IOException =>
    +              throw new IOException(
    +                s"Unable to clear output directory ${filesystemPath.toString} prior"
    +                  + s" to CREATE a JSON table AS SELECT:\n${e.toString}")
    +          }
    --- End diff --
    
    @yhuai Yes, it is expected. If there is no exception thrown, it can ensure delete success. I have investigated other codes in spark and they do the same thing for delete.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4610#discussion_r26726280
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---
    @@ -66,9 +66,21 @@ private[sql] class DefaultSource
           mode match {
             case SaveMode.Append =>
               sys.error(s"Append mode is not supported by ${this.getClass.getCanonicalName}")
    -        case SaveMode.Overwrite =>
    -          fs.delete(filesystemPath, true)
    +        case SaveMode.Overwrite => {
    +          try {
    +            if (!fs.delete(filesystemPath, true)) {
    +              throw new IOException(
    +                s"Unable to clear output directory ${filesystemPath.toString} prior"
    +                  + s" to INSERT OVERWRITE a JSON table:\n")
    +            }
    +          } catch {
    +            case e: IOException =>
    +              throw new IOException(
    +                s"Unable to clear output directory ${filesystemPath.toString} prior"
    +                  + s" to INSERT OVERWRITE a JSON table:\n${e.toString}")
    --- End diff --
    
    This will also catch the one thrown by `try`. Can we avoid this? If `delete` returns false, we just throw the exception after the `catch`. Also, you can put `e` as the cause of the new exception, like `IOException("...", e)`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-84232316
  
    Thanks, merged into master and branch-1.3.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4610#discussion_r26726167
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---
    @@ -66,9 +66,21 @@ private[sql] class DefaultSource
           mode match {
             case SaveMode.Append =>
               sys.error(s"Append mode is not supported by ${this.getClass.getCanonicalName}")
    -        case SaveMode.Overwrite =>
    -          fs.delete(filesystemPath, true)
    +        case SaveMode.Overwrite => {
    +          try {
    +            if (!fs.delete(filesystemPath, true)) {
    +              throw new IOException(
    +                s"Unable to clear output directory ${filesystemPath.toString} prior"
    +                  + s" to INSERT OVERWRITE a JSON table:\n")
    --- End diff --
    
    Also, get rid of `:\n` and add a `.` at the end of the message.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/4610


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-83950252
  
    @yanboliang Adding a check for return value at [this line] [1] should be enough.
    
    [1]: https://github.com/apache/spark/blob/0745a305fac622a6eeb8aa4a7401205a14252939/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala#L596


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-74494628
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27551/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-83433542
  
      [Test build #28860 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28860/consoleFull) for   PR 4610 at commit [`c387fce`](https://github.com/apache/spark/commit/c387fcef43ed45bc6469902216c57b9937ae5a1d).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON external data source I...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-74406171
  
    @yanbohappy It will be great if we use this PR to only do the task of throwing a nice error message when the `delete` returns false and data already exists in the code path for overwrite.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-81559306
  
      [Test build #28643 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28643/consoleFull) for   PR 4610 at commit [`e4bc229`](https://github.com/apache/spark/commit/e4bc2290e3a0c5dc1fe0073993cbba1f56e7e050).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SPARK-5746] [SQL] JSON external ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-74405438
  
      [Test build #27506 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27506/consoleFull) for   PR 4610 at commit [`307278f`](https://github.com/apache/spark/commit/307278ff6bdcd1d1a5a50650fb3dfa6da3db070f).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-81618205
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28645/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON external data source I...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-74406987
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27509/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON external data source I...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4610#discussion_r24720095
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---
    @@ -67,7 +67,6 @@ private[sql] class DefaultSource
             case SaveMode.Append =>
               sys.error(s"Append mode is not supported by ${this.getClass.getCanonicalName}")
             case SaveMode.Overwrite =>
    -          fs.delete(filesystemPath, true)
    --- End diff --
    
    Why remove it? I think we can first get the file status (we also need to handle FileNotFoundException). Then, we check if data already exists. For overwrite, we delete it and if `delete` returns false and data does exist, we throw an exception.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-83272562
  
    Thank you for the update! It's very close. I think we want to do the same thing for `ParquetRelation2`. Is it possible to add a test for this?
    
    cc @liancheng  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by hackingdata8 <gi...@git.apache.org>.
Github user hackingdata8 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4610#discussion_r24989501
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---
    @@ -66,9 +66,17 @@ private[sql] class DefaultSource
           mode match {
             case SaveMode.Append =>
               sys.error(s"Append mode is not supported by ${this.getClass.getCanonicalName}")
    -        case SaveMode.Overwrite =>
    -          fs.delete(filesystemPath, true)
    +        case SaveMode.Overwrite => {
    +          try {
    +            fs.delete(filesystemPath, true)
    +          } catch {
    +            case e: IOException =>
    +              throw new IOException(
    +                s"Unable to clear output directory ${filesystemPath.toString} prior"
    +                  + s" to CREATE a JSON table AS SELECT:\n${e.toString}")
    +          }
    --- End diff --
    
    @yhuai Yes, it is expected. If there is no exception thrown, it can ensure delete success. I have found that other codes in spark do the same thing for delete.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON external data source I...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-74405734
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27507/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-79517469
  
    @yhuai what is the status here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-79666339
  
    Sorry for not following it. I will investigate it and reproduce the error reported in jira. I should have more ideas on how we want to have the change early next week.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-83399347
  
      [Test build #28860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28860/consoleFull) for   PR 4610 at commit [`c387fce`](https://github.com/apache/spark/commit/c387fcef43ed45bc6469902216c57b9937ae5a1d).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON external data source I...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-74406985
  
      [Test build #27509 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27509/consoleFull) for   PR 4610 at commit [`3cdfb7a`](https://github.com/apache/spark/commit/3cdfb7aa739a37c6cb5a0254404d18eb6ef273c4).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-82787234
  
      [Test build #28784 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28784/consoleFull) for   PR 4610 at commit [`46f0d9d`](https://github.com/apache/spark/commit/46f0d9d7e58cbc8f118e2d528d9241f8497bc73a).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4610#discussion_r26726286
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---
    @@ -110,7 +122,11 @@ private[sql] case class JSONRelation(
     
         if (overwrite) {
           try {
    -        fs.delete(filesystemPath, true)
    +        if (fs.exists(filesystemPath) && !fs.delete(filesystemPath, true)) {
    +          throw new IOException(
    +            s"Unable to clear output directory ${filesystemPath.toString} prior"
    +              + s" to INSERT OVERWRITE a JSON table:\n")
    --- End diff --
    
    Get rid of `:\n`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON external data source i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-74484384
  
      [Test build #27551 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27551/consoleFull) for   PR 4610 at commit [`5a42d83`](https://github.com/apache/spark/commit/5a42d832e1a523f2b9048ee580f3845689e1e620).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4610#discussion_r26726099
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---
    @@ -66,9 +66,21 @@ private[sql] class DefaultSource
           mode match {
             case SaveMode.Append =>
               sys.error(s"Append mode is not supported by ${this.getClass.getCanonicalName}")
    -        case SaveMode.Overwrite =>
    -          fs.delete(filesystemPath, true)
    +        case SaveMode.Overwrite => {
    +          try {
    +            if (!fs.delete(filesystemPath, true)) {
    +              throw new IOException(
    +                s"Unable to clear output directory ${filesystemPath.toString} prior"
    +                  + s" to INSERT OVERWRITE a JSON table:\n")
    +            }
    +          } catch {
    +            case e: IOException =>
    +              throw new IOException(
    +                s"Unable to clear output directory ${filesystemPath.toString} prior"
    +                  + s" to INSERT OVERWRITE a JSON table:\n${e.toString}")
    --- End diff --
    
    Save here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON external data source I...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4610#discussion_r24720131
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---
    @@ -104,21 +116,35 @@ private[sql] case class JSONRelation(
       override def buildScan() =
         JsonRDD.jsonStringToRow(baseRDD, schema, sqlContext.conf.columnNameOfCorruptRecord)
     
    +  private def isTemporaryFile(file: Path): Boolean = {
    +    file.getName == "_temporary"
    +  }
    +
       override def insert(data: DataFrame, overwrite: Boolean) = {
    -    val filesystemPath = new Path(path)
    -    val fs = filesystemPath.getFileSystem(sqlContext.sparkContext.hadoopConfiguration)
    +
    +    // If the path exists, it must be a directory, error for not.
    +    // Otherwise we create a directory with the path name.
    +    if (fs.exists(filesystemPath) && !fs.getFileStatus(filesystemPath).isDirectory) {
    +      sys.error("a CREATE [TEMPORARY] TABLE AS SELECT statement need the path must be directory")
    +    }
     
         if (overwrite) {
    +      val temporaryPath = new Path(path, "_temporary")
    +      val dataPath = new Path(path, "data")
    +      // Write the data.
    +      data.toJSON.saveAsTextFile(temporaryPath.toUri.toString)
    +      val pathsToDelete = fs.listStatus(filesystemPath).filter(
    +        f => !isTemporaryFile(f.getPath)).map(_.getPath)
    +
           try {
    -        fs.delete(filesystemPath, true)
    --- End diff --
    
    We need to do the same thing for delete at here as what we will do for createRelation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON external data source I...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-74405515
  
      [Test build #27507 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27507/consoleFull) for   PR 4610 at commit [`3156c1b`](https://github.com/apache/spark/commit/3156c1be6896ff315423cd422a3630fb5cf7f48e).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-82816363
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28784/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4610#discussion_r26726093
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---
    @@ -66,9 +66,21 @@ private[sql] class DefaultSource
           mode match {
             case SaveMode.Append =>
               sys.error(s"Append mode is not supported by ${this.getClass.getCanonicalName}")
    -        case SaveMode.Overwrite =>
    -          fs.delete(filesystemPath, true)
    +        case SaveMode.Overwrite => {
    +          try {
    +            if (!fs.delete(filesystemPath, true)) {
    +              throw new IOException(
    +                s"Unable to clear output directory ${filesystemPath.toString} prior"
    +                  + s" to INSERT OVERWRITE a JSON table:\n")
    --- End diff --
    
    Can we say "prior to writing to JSON file" since a user can reach this code path through `DataFrame.save`, which is not related to table operation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-81567253
  
      [Test build #28643 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28643/consoleFull) for   PR 4610 at commit [`e4bc229`](https://github.com/apache/spark/commit/e4bc2290e3a0c5dc1fe0073993cbba1f56e7e050).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON external data source I...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4610#discussion_r24720121
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---
    @@ -104,21 +116,35 @@ private[sql] case class JSONRelation(
       override def buildScan() =
         JsonRDD.jsonStringToRow(baseRDD, schema, sqlContext.conf.columnNameOfCorruptRecord)
     
    +  private def isTemporaryFile(file: Path): Boolean = {
    +    file.getName == "_temporary"
    +  }
    +
       override def insert(data: DataFrame, overwrite: Boolean) = {
    -    val filesystemPath = new Path(path)
    -    val fs = filesystemPath.getFileSystem(sqlContext.sparkContext.hadoopConfiguration)
    +
    +    // If the path exists, it must be a directory, error for not.
    +    // Otherwise we create a directory with the path name.
    +    if (fs.exists(filesystemPath) && !fs.getFileStatus(filesystemPath).isDirectory) {
    +      sys.error("a CREATE [TEMPORARY] TABLE AS SELECT statement need the path must be directory")
    +    }
     
         if (overwrite) {
    +      val temporaryPath = new Path(path, "_temporary")
    --- End diff --
    
    Seems we do not need it for now because we will check if we are writing to a table while reading it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-81583301
  
      [Test build #28645 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28645/consoleFull) for   PR 4610 at commit [`79f7040`](https://github.com/apache/spark/commit/79f70407d78db3df93641f82c85df14ce00c2205).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-74494622
  
      [Test build #27551 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27551/consoleFull) for   PR 4610 at commit [`5a42d83`](https://github.com/apache/spark/commit/5a42d832e1a523f2b9048ee580f3845689e1e620).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON external data source I...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-74405732
  
      [Test build #27507 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27507/consoleFull) for   PR 4610 at commit [`3156c1b`](https://github.com/apache/spark/commit/3156c1be6896ff315423cd422a3630fb5cf7f48e).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4610#issuecomment-81618177
  
      [Test build #28645 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28645/consoleFull) for   PR 4610 at commit [`79f7040`](https://github.com/apache/spark/commit/79f70407d78db3df93641f82c85df14ce00c2205).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org