You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by liancheng <gi...@git.apache.org> on 2016/06/18 00:41:00 UTC

[GitHub] spark pull request #13747: [SPARK-16033][SQL] insertInto() can't be used tog...

GitHub user liancheng opened a pull request:

    https://github.com/apache/spark/pull/13747

    [SPARK-16033][SQL] insertInto() can't be used together with partitionBy()

    ## What changes were proposed in this pull request?
    
    When inserting into an existing partitioned table, partitioning columns should always be determined by catalog metadata of the existing table to be inserted. Extra `partitionBy()` calls don't make sense, and mess up existing data because newly inserted data may have wrong partitioning directory layout.
    
    ## How was this patch tested?
    
    New test case added in `InsertIntoHiveTableSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark spark-16033-insert-into-without-partition-by

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13747.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13747
    
----
commit d24ed58ba4d3b9b5ee545873ee10662d3a8498c7
Author: Cheng Lian <li...@databricks.com>
Date:   2016-06-18T00:39:43Z

    insertInto() can't be used together with partitionBy()

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13747: [SPARK-16033][SQL] insertInto() can't be used together w...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13747
  
    **[Test build #60736 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60736/consoleFull)** for PR 13747 at commit [`d24ed58`](https://github.com/apache/spark/commit/d24ed58ba4d3b9b5ee545873ee10662d3a8498c7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13747: [SPARK-16033][SQL] insertInto() can't be used together w...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on the issue:

    https://github.com/apache/spark/pull/13747
  
    LGTM. Merging to master and branch 2.0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13747: [SPARK-16033][SQL] insertInto() can't be used together w...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13747
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60736/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13747: [SPARK-16033][SQL] insertInto() can't be used together w...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13747
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #13747: [SPARK-16033][SQL] insertInto() can't be used tog...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13747#discussion_r67591585
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -243,7 +241,15 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
     
       private def insertInto(tableIdent: TableIdentifier): Unit = {
         assertNotBucketed("insertInto")
    -    val partitions = normalizedParCols.map(_.map(col => col -> (Option.empty[String])).toMap)
    +
    +    if (partitioningColumns.isDefined) {
    +      throw new AnalysisException(
    +        "insertInto() can't be used together with partitionBy(). " +
    +          "Partition columns are defined by the table into which is being inserted."
    --- End diff --
    
    Mention that `partitionBy` is not needed?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #13747: [SPARK-16033][SQL] insertInto() can't be used tog...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13747


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13747: [SPARK-16033][SQL] insertInto() can't be used together w...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13747
  
    **[Test build #60736 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60736/consoleFull)** for PR 13747 at commit [`d24ed58`](https://github.com/apache/spark/commit/d24ed58ba4d3b9b5ee545873ee10662d3a8498c7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13747: [SPARK-16033][SQL] insertInto() can't be used together w...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on the issue:

    https://github.com/apache/spark/pull/13747
  
    cc @cloud-fan @clockfly @yhuai 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org