Posted to reviews@spark.apache.org by chenghao-intel <gi...@git.apache.org> on 2014/10/30 09:25:54 UTC

[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

GitHub user chenghao-intel opened a pull request:

    https://github.com/apache/spark/pull/3013

    [SPARK-4152] [SQL] Avoid data change in CTAS while table already existed

    CREATE TABLE t1 (a String);
    CREATE TABLE t1 AS SELECT key FROM src; -- throws an exception
    CREATE TABLE if not exists t1 AS SELECT key FROM src; -- expected to do nothing, but actually overwrites t1
    
    This PR contains the following changes:
    1) Fix the bug described above.
    2) Whitelist the test case.
    3) Disable file splitting for a small single-file table (make input splits based on the HDFS block).
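
The intended CTAS behaviour described above can be sketched in plain Scala. This is only an illustration under stated assumptions: `CtasSketch`, `catalog`, `createTableAsSelect`, and `TableExistsException` are hypothetical names backed by an in-memory map, not Spark's or Hive's API.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for a metastore; not Spark's or Hive's API.
object CtasSketch {
  final class TableExistsException(name: String)
    extends RuntimeException(s"Table $name already exists")

  // Maps a table name to its rows.
  val catalog: mutable.Map[String, Seq[String]] = mutable.Map.empty

  /** Models CREATE TABLE [IF NOT EXISTS] name AS <query>.
    * Returns true only if the table was actually created and written. */
  def createTableAsSelect(name: String, ifNotExists: Boolean)(query: => Seq[String]): Boolean =
    if (catalog.contains(name)) {
      if (ifNotExists) false // do nothing: the existing data stays untouched
      else throw new TableExistsException(name)
    } else {
      catalog(name) = query // the by-name query runs only when the table is created
      true
    }
}
```

With `t1` already present, `createTableAsSelect("t1", ifNotExists = true)(...)` returns `false` and leaves the stored rows unchanged, while the same call with `ifNotExists = false` throws.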

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chenghao-intel/spark ctas_unittest

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3013.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3013
    
----
commit 0e8df89854c55f490d89cf8518cd56951c577a5c
Author: Cheng Hao <ha...@intel.com>
Date:   2014-10-10T05:26:09Z

    Keep 1 split for small file in table scanning

commit 1d3bc4dfe81f5c56f825487877ab248da0446918
Author: Cheng Hao <ha...@intel.com>
Date:   2014-10-30T07:55:59Z

    Fix bug for overwrite the existed table while CTAS & add whitelist the unit test

commit 6d18aa736bd039d1a8194c5014ad671b08e9e928
Author: Cheng Hao <ha...@intel.com>
Date:   2014-10-30T08:19:12Z

    Add missing golden file

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61434330
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22785/
    Test PASSed.


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61310021
  
      [Test build #22622 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22622/consoleFull) for   PR 3013 at commit [`ec72555`](https://github.com/apache/spark/commit/ec72555ef6669f2bc9c599ccf32e283e4aebf845).
     * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3013#discussion_r19701825
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
    @@ -320,8 +326,9 @@ object HiveMetastoreTypes extends RegexParsers {
         "double" ^^^ DoubleType |
         "bigint" ^^^ LongType |
         "binary" ^^^ BinaryType |
    -    "boolean" ^^^ BooleanType |
    -    HiveShim.metastoreDecimal ^^^ DecimalType |
    +    "boolean" ^^^ BooleanType | // TODO decimal Hive 0.12.0
    +    "decimal\\((\\d+),(\\d+)\\)".r ^^^ DecimalType | // TODO decimal Hive 0.13.1
    --- End diff --
    
    Do we need these TODOs here? This should work for both Hive 12 and 13, right?


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61157483
  
    I'm confused by the failure.  Do you possibly have HIVE_DEV_HOME set to Hive 12 or something, such that it's generating the wrong golden files?


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61075927
  
      [Test build #22538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22538/consoleFull) for   PR 3013 at commit [`c1ea850`](https://github.com/apache/spark/commit/c1ea850c30b3e2217c89edc60e9187b68f5e00e3).
     * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61427879
  
    Jenkins compiles with both versions just to make sure that we aren't breaking backwards compatibility (Hive 12 first).  Ideally, we'll set up another job to run the test for Hive 12 in parallel or at least periodically, but for now running both would take too much time.
    
    In terms of semantics I think it is too much overhead to try to faithfully mimic both versions since the primary goal here is metastore compatibility.  Thus, the query tests are based on Hive13 and the golden answers are too.  It is possible to run nearly all of the tests with the Hive12 library too, though in places we act like 13 even though we are compiling with the 12 library.  In the few cases where we can't run a given test with both versions of the library there is a special blacklist in the shim.
    
    Full Hive 13 decimal support is now merged, so hopefully we can remove all the special cases from this PR.


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3013#discussion_r19708931
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala ---
    @@ -175,6 +175,8 @@ abstract class HiveComparisonTest
         "last_modified_by",
         "last_modified_time",
         "Owner:",
    +    "numPartitions",
    +    "decimal", // TODO currently we don't support the decimal type of Hive 0.13
    --- End diff --
    
    I don't think we should just ignore these lines.  If there are specific tests that are failing due to decimal then we should add them to the black list with the others.


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61310518
  
    Yes, blacklist any tests that rely on fixed decimals.  This will be fixed by #2983.


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61409579
  
    Thank you @scwf @marmbrus for reviewing the code. Actually, I am a little confused by the Hive versions in Spark SQL. It seems the golden files are based on Hive 0.13, but Jenkins compiles the code with Hive 0.12. This is why I had to add some workaround code with TODOs.


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3013#discussion_r19712383
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala ---
    @@ -175,6 +175,8 @@ abstract class HiveComparisonTest
         "last_modified_by",
         "last_modified_time",
         "Owner:",
    +    "numPartitions",
    +    "decimal", // TODO currently we don't support the decimal type of Hive 0.13
    --- End diff --
    
    Sorry for the confusing TODO.
    This is for the `DescribeHiveTableCommand`. The golden files contain output like "decimal(xx,xx)", but the Hive 0.12 test answer is "decimal". The same goes for `numPartitions`. I know we have a PR to fix the decimal precision issue; this is a workaround for compatibility with Hive 0.12 testing.
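
To illustrate the workaround being discussed, here is a minimal sketch of line filtering before a golden-file comparison. It assumes the comparison drops every output line that contains one of the listed markers; `IgnoredLinesSketch`, `ignoredMarkers`, and `comparable` are hypothetical names, and the marker list holds only the entries visible in the diff above.

```scala
object IgnoredLinesSketch {
  // Markers copied from the diff; the list in the real test class is longer.
  val ignoredMarkers: Seq[String] =
    Seq("last_modified_by", "last_modified_time", "Owner:", "numPartitions", "decimal")

  /** Keeps only the lines that would take part in the comparison. */
  def comparable(output: Seq[String]): Seq[String] =
    output.filterNot(line => ignoredMarkers.exists(marker => line.contains(marker)))
}
```

Under this assumption, a line such as `key decimal(35,25)` and a line such as `key decimal` are both dropped, so the two outputs compare equal. The same filter would also hide a genuine difference between, say, `decimal(10,2)` and `decimal(5,1)`, which may be one reason a per-test blacklist was suggested instead.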


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61432158
  
    Thank you @marmbrus. I've updated the code to contain only the bug fix, and will create separate PRs for the Hive compatibility testing.


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61315655
  
      [Test build #22622 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22622/consoleFull) for   PR 3013 at commit [`ec72555`](https://github.com/apache/spark/commit/ec72555ef6669f2bc9c599ccf32e283e4aebf845).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61058954
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22526/
    Test FAILed.


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61307901
  
      [Test build #22619 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22619/consoleFull) for   PR 3013 at commit [`1acc914`](https://github.com/apache/spark/commit/1acc914748368d2f5790af4523e7ac947578c15e).
     * This patch **does not merge cleanly**.


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61357494
  
      [Test build #22678 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22678/consoleFull) for   PR 3013 at commit [`4085c67`](https://github.com/apache/spark/commit/4085c67fa3ecdbb81ac1a70c0ed3b7662d2c9741).
     * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61311415
  
    I don't think we need to blacklist the tests here. I've added a TODO in the code.


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61079218
  
      [Test build #22538 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22538/consoleFull) for   PR 3013 at commit [`c1ea850`](https://github.com/apache/spark/commit/c1ea850c30b3e2217c89edc60e9187b68f5e00e3).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61358398
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22678/
    Test PASSed.


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3013#discussion_r19712391
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
    @@ -320,8 +326,9 @@ object HiveMetastoreTypes extends RegexParsers {
         "double" ^^^ DoubleType |
         "bigint" ^^^ LongType |
         "binary" ^^^ BinaryType |
    -    "boolean" ^^^ BooleanType |
    -    HiveShim.metastoreDecimal ^^^ DecimalType |
    +    "boolean" ^^^ BooleanType | // TODO decimal Hive 0.12.0
    +    "decimal\\((\\d+),(\\d+)\\)".r ^^^ DecimalType | // TODO decimal Hive 0.13.1
    --- End diff --
    
    The ".q" files contain CREATE TABLE statements like "CREATE TABLE DECIMAL_4_1(key decimal(35,25), value int)"; however, Jenkins seems to compile Spark SQL with Hive 0.12, hence I had to put both patterns here.
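
The idea of accepting both spellings of the decimal type can be sketched with plain Scala regular expressions. This is not the parser-combinator code from the diff: `DecimalTypeStringSketch`, `DecimalInfo`, and `parse` are hypothetical names used only for illustration.

```scala
object DecimalTypeStringSketch {
  // Simplified stand-in for a decimal data type; None means no precision/scale was given.
  final case class DecimalInfo(precisionScale: Option[(Int, Int)])

  private val Bare = """decimal""".r                   // bare form, e.g. "decimal"
  private val Fixed = """decimal\((\d+),(\d+)\)""".r   // parameterised form, e.g. "decimal(35,25)"

  /** Returns Some(...) for either decimal spelling and None for any other type string. */
  def parse(typeString: String): Option[DecimalInfo] =
    typeString.trim.toLowerCase match {
      case Fixed(p, s) => Some(DecimalInfo(Some((p.toInt, s.toInt))))
      case Bare()      => Some(DecimalInfo(None))
      case _           => None
    }
}
```

Both `parse("decimal")` and `parse("decimal(35,25)")` succeed here; only the second carries a precision and scale.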


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61169005
  
    I think @chenghao-intel used Hive 0.12 to generate the golden files, while Jenkins tests with Hive 0.13.


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3013#discussion_r19712328
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveTableScanSuite.scala ---
    @@ -69,4 +69,10 @@ class HiveTableScanSuite extends HiveComparisonTest {
         TestHive.sql("DROP TABLE timestamp_query_null")
       }
       
    +  // In unit test, kv1.txt is a small file and will be loaded as table src by default
    +  // And since it's a small file, then it will be consider as a single input split.
    +  createQueryTest("file_split_for_small_table",
    +    """
    +      |SELECT key, value FROM src SORT BY key, value;
    +    """.stripMargin)
    --- End diff --
    
    Actually this test will fail without the change at https://github.com/chenghao-intel/spark/blob/ctas_unittest/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala#L63 . I described the root cause in #2589.
    Sorry, I should have put this in an independent PR; however, the 2 added test cases (`ctas.q`, `ctas_hadoop20.q`) will fail without it.
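
Why the number of requested splits matters for a small file can be sketched with the split-size arithmetic of Hadoop's old-API `FileInputFormat`, as I understand it: `splitSize = max(minSize, min(totalSize / requestedSplits, blockSize))`, with a slop factor of 1.1 for the last split. The code below is a simplified single-file model, not the actual change in `TableReader.scala`; `SplitCountSketch` and `splitCount` are hypothetical names and the file sizes are made up.

```scala
object SplitCountSketch {
  // The final split may be up to 10% larger than splitSize before another split is cut.
  private val SplitSlop = 1.1

  /** Counts the input splits produced for one non-empty file. */
  def splitCount(fileBytes: Long, blockBytes: Long, requestedSplits: Int,
                 minSplitBytes: Long = 1L): Int = {
    require(fileBytes > 0, "this sketch only models non-empty files")
    val goal = fileBytes / math.max(requestedSplits, 1)
    val splitSize = math.max(minSplitBytes, math.min(goal, blockBytes))
    var remaining = fileBytes
    var count = 0
    while (remaining.toDouble / splitSize > SplitSlop) {
      count += 1
      remaining -= splitSize
    }
    if (remaining != 0) count += 1
    count
  }
}
```

In this model a 6000-byte file with a 64 MB block size yields two splits when two are requested, but a single split when the request is 0 or 1, in which case only the block size limits the split. Since `SORT BY` orders rows within each partition rather than globally, the number of splits can presumably change the order of the query output that is compared against the golden file.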


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61432265
  
      [Test build #22785 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22785/consoleFull) for   PR 3013 at commit [`194113e`](https://github.com/apache/spark/commit/194113e1ede2fb4c7a1a18542caea1946d0ca776).
     * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61169200
  
    Yeah, sorry.  We have switched everything to Hive 13 (though we should still pass the tests when running in Hive 12 mode, otherwise they should be added to the HiveShim blacklist).


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61309696
  
    Thank you @scwf @marmbrus , I've updated the code for unit testing based on Hive 0.13, and it passed the test locally.


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61557236
  
    Thanks!  Merged to master.


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61310303
  
    I also noticed that the golden files changed when switching from Hive 0.12 to 0.13, probably due to the decimal type incompatibility. See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-Decimals


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61320640
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22619/
    Test FAILed.


[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61358397
  
      [Test build #22678 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22678/consoleFull) for   PR 3013 at commit [`4085c67`](https://github.com/apache/spark/commit/4085c67fa3ecdbb81ac1a70c0ed3b7662d2c9741).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61315664
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22622/




[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/3013




[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61079222
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22538/




[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61320629
  
      [Test build #22619 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22619/consoleFull) for   PR 3013 at commit [`1acc914`](https://github.com/apache/spark/commit/1acc914748368d2f5790af4523e7ac947578c15e).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61058848
  
      [Test build #22526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22526/consoleFull) for   PR 3013 at commit [`6d18aa7`](https://github.com/apache/spark/commit/6d18aa736bd039d1a8194c5014ad671b08e9e928).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61058953
  
      [Test build #22526 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22526/consoleFull) for   PR 3013 at commit [`6d18aa7`](https://github.com/apache/spark/commit/6d18aa736bd039d1a8194c5014ad671b08e9e928).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3013#discussion_r19708934
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveTableScanSuite.scala ---
    @@ -69,4 +69,10 @@ class HiveTableScanSuite extends HiveComparisonTest {
         TestHive.sql("DROP TABLE timestamp_query_null")
       }
       
    +  // In the unit tests, kv1.txt is a small file that is loaded as table src by default.
    +  // Because it is small, it should be treated as a single input split.
    +  createQueryTest("file_split_for_small_table",
    +    """
    +      |SELECT key, value FROM src SORT BY key, value;
    +    """.stripMargin)
    --- End diff --
    
    What is this testing?
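
As background for the question above, here is a minimal sketch of how the number of input splits for a single small file can depend on the requested split count. It is a simplified model of the split-size arithmetic in Hadoop's old-API `FileInputFormat` (goal size, block size, and a 10% slop factor), not code from this pull request. The object name `SplitSketch`, the function `numSplits`, and the 6000-byte file size used below are assumptions chosen for illustration.

```scala
// Simplified model of per-file split counting, for illustration only.
// It is NOT the code changed in this pull request.
object SplitSketch {
  // A final chunk up to 10% larger than splitSize is kept as one split.
  val SplitSlop = 1.1

  /** Number of splits for one non-empty file, given a requested split count. */
  def numSplits(
      fileSize: Long,
      blockSize: Long,
      requestedSplits: Int,
      minSize: Long = 1L): Int = {
    require(fileSize > 0, "empty files are not modelled in this sketch")
    // Target bytes per split if the file were divided evenly.
    val goalSize = fileSize / math.max(requestedSplits, 1)
    // Never exceed one block per split, never go below the minimum size.
    val splitSize = math.max(minSize, math.min(goalSize, blockSize))

    var remaining = fileSize
    var count = 0
    while (remaining.toDouble / splitSize > SplitSlop) {
      remaining -= splitSize
      count += 1
    }
    // Whatever is left over becomes the last split.
    if (remaining > 0) count += 1
    count
  }
}
```

Under these assumptions, `SplitSketch.numSplits(6000L, 64L * 1024 * 1024, 2)` yields two splits, while a requested count of 1 yields one. A difference like this changes the number of partitions scanned for a tiny table, which may be what the new test is meant to pin down.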




[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3013#issuecomment-61434327
  
      [Test build #22785 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22785/consoleFull) for   PR 3013 at commit [`194113e`](https://github.com/apache/spark/commit/194113e1ede2fb4c7a1a18542caea1946d0ca776).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class GenericStrategy[PhysicalPlan <: TreeNode[PhysicalPlan]] extends Logging `
      * `trait RunnableCommand extends logical.Command `
      * `case class ExecutedCommand(cmd: RunnableCommand) extends SparkPlan `
      * `  protected case class Keyword(str: String)`
      * `            sys.error(s"Failed to load class for data source: $provider")`
      * `case class EqualTo(attribute: String, value: Any) extends Filter`
      * `case class GreaterThan(attribute: String, value: Any) extends Filter`
      * `case class GreaterThanOrEqual(attribute: String, value: Any) extends Filter`
      * `case class LessThan(attribute: String, value: Any) extends Filter`
      * `case class LessThanOrEqual(attribute: String, value: Any) extends Filter`
      * `trait RelationProvider `
      * `abstract class BaseRelation `
      * `abstract class TableScan extends BaseRelation `
      * `abstract class PrunedScan extends BaseRelation `
      * `abstract class PrunedFilteredScan extends BaseRelation `


